
Unify summary/description logic to timetable #25045

Merged: uranusjr merged 1 commit into apache:main from astronomer:dataset-timetable on Jul 15, 2022

Conversation

@uranusjr (Member):

These values (the schedule summary and description) are pulled directly from the timetable, so we can fold the dataset case into it instead of needing extra if-else branches.
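A minimal sketch of the idea, assuming simplified stand-ins rather than Airflow's real classes: `Timetable`, `CronTimetable`, `DatasetTimetable`, `DagRow` and `sync_schedule` are all hypothetical names. Each timetable carries its own label (`summary`) and `description`, so the code that copies them onto the ORM row needs no dataset-specific branch.

```python
from dataclasses import dataclass
from typing import Optional


class Timetable:
    """Hypothetical, simplified stand-in: a timetable owns its UI label and description."""

    summary: str = ""
    description: Optional[str] = None


class CronTimetable(Timetable):
    def __init__(self, expression: str) -> None:
        self.summary = expression  # the UI label is the cron expression itself
        self.description = f"Runs on cron schedule {expression!r}"


class DatasetTimetable(Timetable):
    # Dataset-triggered DAGs have no interval, so the label is a fixed string.
    summary = "Dataset"
    description = "Triggered by dataset updates"


@dataclass
class DagRow:
    """Hypothetical stand-in for the ORM row (DagModel)."""

    schedule_interval: Optional[str] = None
    timetable_description: Optional[str] = None


def sync_schedule(timetable: Timetable, row: DagRow) -> DagRow:
    # No dataset-specific if/else: every timetable supplies its own values.
    row.schedule_interval = timetable.summary
    row.timetable_description = timetable.description
    return row


if __name__ == "__main__":
    print(sync_schedule(DatasetTimetable(), DagRow()))
    print(sync_schedule(CronTimetable("30 7 * * *"), DagRow()))
```

Adding a new kind of schedule then means adding a timetable class, not another branch in the sync code.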

@uranusjr marked this pull request as ready for review on July 14, 2022 05:55
@uranusjr requested review from XD-DENG, ashb and kaxil as code owners on July 14, 2022 05:55
```python
else:
    orm_dag.schedule_interval = dag.schedule_interval
orm_dag.timetable_description = dag.timetable.description
orm_dag.schedule_interval = dag.schedule_interval
```
Contributor:

@uranusjr, this needs to be 'Dataset', but right now it will be None.

@dstandish (Contributor), Jul 14, 2022:

Yesterday I tried doing the same thing, but using dag.timetable.summary, which would probably be the "right" way. But then tests started failing, so I aborted the mission.

Member (PR author):

Hmm, those two should be the same value (due to the assignment in __init__), so something somewhere is changing the value, and we need to find out what that is.
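To illustrate the reasoning, here is a minimal sketch assuming a hypothetical, simplified `Dag` whose `__init__` copies the timetable's label into `schedule_interval`; `Timetable`, `Dag` and `values_agree` are illustrative names, not Airflow's API. If the two values differ later, some code path must have reassigned one of them after construction.

```python
from typing import Optional, Sequence


class Timetable:
    """Hypothetical timetable stub; only its UI label matters here."""

    def __init__(self, summary: Optional[str]) -> None:
        self.summary = summary


class Dag:
    """Hypothetical, simplified DAG whose __init__ ties schedule_interval to the timetable."""

    def __init__(self, dag_id: str, datasets: Sequence[str] = ()) -> None:
        self.dag_id = dag_id
        # Dataset-triggered DAGs get the fixed label "Dataset"; otherwise there is no schedule.
        self.timetable = Timetable("Dataset" if datasets else None)
        # The assignment in __init__ that should keep both values equal.
        self.schedule_interval: Optional[str] = self.timetable.summary


def values_agree(dag: Dag) -> bool:
    """Return False if something changed either value after __init__ ran."""
    return dag.schedule_interval == dag.timetable.summary


dag = Dag("consumer", datasets=["s3://bucket/key"])
print(values_agree(dag))  # True
dag.schedule_interval = None  # a later code path overwriting the attribute
print(values_agree(dag))  # False
```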

Contributor:

Oh, maybe I have that wrong! Let me look.

@dstandish (Contributor), Jul 14, 2022:

OK, so here's what's going on: I didn't see that you set DAG.schedule_interval = 'Dataset' in DAG.__init__, so I assumed it would be None here.

The way I tried this before, I had DAG.schedule_interval == None and DagModel.schedule_interval == 'Dataset', on the thinking that DAG.schedule_interval should be None, while DagModel.schedule_interval would be better named DagModel.schedule_label or something similar.

The idea is that there are two different things going on here. One is the "label" shown under the "Schedule" column on the DAGs page. The other is the schedule_interval parameter of the DAG object, which for dataset-triggered DAGs is None because there is no schedule interval for this kind of DAG. It may be best to make this distinction clear: leave DAG.schedule_interval as None while setting DagModel.schedule_interval to 'Dataset' (and perhaps rename DagModel.schedule_interval to DagModel.schedule_summary or DagModel.schedule_label). But doing this runs afoul of some tests and will be more work, and we need not let perfect stand in the way of good, so I'll approve.

Member (PR author):

Yes, the key “problem” per se is that DAG.schedule_interval at this point does not have any functionality beyond determining the label we show in the UI. A rename is really needed to reflect that, and this change makes sense if you look past the attribute name.

Contributor:

Yeah, I guess what I'm saying, though, is that while DagModel.schedule_interval has no meaning and is just a label, DAG.schedule_interval does have meaning, and that's why it feels a little off to store the label 'Dataset' on DAG.schedule_interval. Storing it on the DagModel attribute makes sense, but in a perfect world we wouldn't also set it on DAG. Anyway, nothing really to worry about.

Contributor:

> while DagModel.schedule_interval does not have meaning and is just a label, DAG.schedule_interval does

Er, at least I think it does, doesn't it? That's where you set "@daily" or "30 7 * * *".
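To make the point concrete, here is a minimal sketch of why DAG.schedule_interval carries scheduling meaning: a value such as "@daily" expands to a cron expression, whereas a display label like "Dataset" is not a valid schedule at all. `CRON_PRESETS` and `to_cron` are hypothetical; the table is a small subset of standard cron presets, and the five-field check is a simplification, not how Airflow actually parses schedules.

```python
# Hypothetical subset of standard cron presets (not Airflow's actual table).
CRON_PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}


def to_cron(schedule_interval: str) -> str:
    """Expand a preset to a cron expression; pass five-field cron strings through."""
    if schedule_interval in CRON_PRESETS:
        return CRON_PRESETS[schedule_interval]
    fields = schedule_interval.split()
    if len(fields) != 5:
        # A UI label such as "Dataset" lands here: it has no scheduling meaning.
        raise ValueError(f"not a preset or five-field cron expression: {schedule_interval!r}")
    return schedule_interval


print(to_cron("@daily"))  # 0 0 * * *
print(to_cron("30 7 * * *"))  # 30 7 * * *
```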

@jedcunningham added the changelog:skip (Changes that should be skipped from the changelog: CI, tests, etc.) and AIP-48 labels on Jul 14, 2022
@dstandish (Contributor) left a comment:

I should have made my initial review "request changes" (just to prevent an inadvertent premature merge).


@uranusjr merged commit 3a7fec5 into apache:main on Jul 15, 2022
@uranusjr deleted the dataset-timetable branch on July 15, 2022 02:46
@jedcunningham added this to the Airflow 2.4.0 milestone on Sep 15, 2022
@eladkal added the area:data-aware-scheduling (assets, datasets, AIP-48) label and removed the AIP-48 label on Mar 25, 2025

Labels

area:data-aware-scheduling (assets, datasets, AIP-48)
changelog:skip (Changes that should be skipped from the changelog: CI, tests, etc.)


5 participants