Last automated data interval is always available in custom timetable #27672
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
The first ever run of a DAG is generally calculated when the DAG is detected and pushed into the system by the DAG parser, before a DAG run is ever created. You should be able to see these fields in the
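For reference, the branching the documentation describes can be sketched as below. This is a minimal, self-contained sketch, not the real Airflow API: `DataInterval` here is a stand-in dataclass, and `next_dagrun_info` is simplified to a plain function (the real method lives on a `Timetable` subclass and returns a `DagRunInfo`). The point is only the documented contract: `last_automated_data_interval` is `None` until an automated run exists.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Stand-in for Airflow's DataInterval (assumption, not the real class).
@dataclass
class DataInterval:
    start: datetime
    end: datetime

def next_dagrun_info(last_automated_data_interval, dag_start_date):
    """Sketch of the branching a custom timetable's next_dagrun_info
    is documented to need."""
    if last_automated_data_interval is None:
        # No automated run exists yet: anchor the first interval
        # at the DAG's start_date.
        start = dag_start_date
    else:
        # Subsequent runs continue from the previous interval's end.
        start = last_automated_data_interval.end
    return DataInterval(start=start, end=start + timedelta(days=1))
```

The issue is that, in practice, the `None` branch is never reached even on the very first call.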
@uranusjr ok, thanks for the clarification. This could be useful to add to the docs.
It kind of depends. I believe the DAG parser is a part of the scheduler process, so it should be somewhere in the scheduler logs. But if you add the DAG very early, it might be buried somewhere in Airflow startup and become invisible (due to log config). Honestly, there are many parts of how Airflow logs things that are unclear to me as well.
An entry under Concepts that goes through how a DAG run is created would likely be a good idea. Would you be interested in helping out with that?
Yeah sure, I could probably do it in the next couple of days.
@mrn-aglic and @eladkal - let me know if you want me to help here and close this.
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
I'm writing an example custom timetable and implemented `next_dagrun_info`. According to the docs and examples, the parameter `last_automated_data_interval` should be `None` if there are no previous runs.

However, when I start up the example:

- `dag_run` is empty.
- `last_automated_data_interval` is a data interval and not `None` as specified by the documentation.

This raises the question of how to determine the first DAG run (one could probably compare the DataInterval start with the start_date from the DAG, if possible).
Here is an example from the logs:
airflow-feat-scheduler | [2022-11-14 19:57:58,934] {WorkDayTimetable.py:28} INFO - last_automated_data_interval: DataInterval(start=DateTime(2022, 11, 10, 0, 0, 0, tzinfo=Timezone('UTC')), end=DateTime(2022, 11, 11, 0, 0, 0, tzinfo=Timezone('UTC')))
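The workaround suggested above can be sketched as a small helper. Note this is a hypothetical illustration, not an Airflow API: `is_first_run` is a name I made up, and it assumes both datetimes are comparable (same timezone awareness).

```python
from datetime import datetime

def is_first_run(interval_start: datetime, dag_start_date: datetime) -> bool:
    # If the received interval starts at (or before) the DAG's start_date,
    # no earlier automated run can exist, so treat this as the first run
    # even though last_automated_data_interval was not None.
    return interval_start <= dag_start_date
```

This only approximates the documented `None` check and breaks down if the DAG's `start_date` is later moved.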
I'm using Airflow 2.4.2.
What you think should happen instead
The value of the parameter should be None as specified in the docs.
How to reproduce
Should be reproducible by running the example given in the docs and logging the value of the parameter `last_automated_data_interval`. It should appear in the logs.

Operating System

macOS Ventura
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else
The problem occurs every time.
Are you willing to submit PR?
Code of Conduct