New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relax mandatory requirement for start_date when schedule=None #35356
Conversation
tests/models/test_dag.py
Outdated
dag = DAG("dag_without_start_date") | ||
dag.add_task(BaseOperator(task_id="task_without_start_date")) | ||
dagrun = dag.create_dagrun( | ||
state=State.RUNNING, run_type=DagRunType.SCHEDULED, execution_date=DEFAULT_DATE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can DagRunType
be scheduled ? With scheduled runs we must have start_date
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can DagRunType be scheduled ? With scheduled runs we must have start_date
IMHO it should be ok if catchup is False
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scheduler creates runs with type scheduled. The fix in this PR is for manual runs that are created by the user.
catchup parameter has no affect on manual runs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated DagRunType.SCHEDULED
to DagRunType.MANUAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I wonder if we can do the same thing for dags with schedule!=None
if the catchup is set to False
, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think so because the first run depends on the start_date. If you set it in the past and interval passed when dag is activated it will create a run. If you set it for a future date then interval is not completed thus no run will be created when dag is set to active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but we can consider that this condition is always met when the user does not provide a start date.
50aa04a
to
4ce9181
Compare
4ce9181
to
fd17fe7
Compare
Do we have a test for a non-None schedule raising an exception? |
@uranusjr Do we need to fail explicitly for non None Schedule and empty start date? In this case, it will start the job on the next schedule |
Also want to mention that previously start date and task date were checked to throw the start date error which is part of add task. But schedule will not be empty then as it will be the default schedule interval value. |
@uranusjr @eladkal @hussein-awala |
If I remember correctly, we currently fail explicitly for that combination, so it’s better to continue the behaviour. Having a DAG created silently but can never fire can be confusing for users. Task start dates are fine, just consider the DAG date. |
fd17fe7
to
da7c7bf
Compare
Thanks for the input. Changes are added. |
Tests are failing:
|
da7c7bf
to
697e8ec
Compare
Fixed the failing tests |
697e8ec
to
922e8fb
Compare
922e8fb
to
9cdd32d
Compare
9cdd32d
to
18f2dba
Compare
* Relax mandatory requirement for start_date when schedule=None * Updated run_type in unit tests * Added check for empty start_date and non empty schedule * Fix the build failures * Fix the build failures * Update based on review comments (cherry picked from commit 930f165)
As a part of this PR, existing
start_date
validation is removed to handleschedule=None
Closes: #35199