
Last automated data interval is always available in custom timetable #27672

Open
1 of 2 tasks
mrn-aglic opened this issue Nov 14, 2022 · 6 comments

Comments

@mrn-aglic

Apache Airflow version

Other Airflow 2 version (please specify below)

What happened

I'm writing an example custom timetable and have implemented `next_dagrun_info`.
According to the docs and examples, the parameter `last_automated_data_interval` should be `None` if there are no
previous runs.

However, when I start up the example:

  1. I can confirm that the table dag_run is empty.
  2. when starting (unpausing) the DAG for the first time, `last_automated_data_interval` is a data interval, not `None` as specified by the documentation.

This raises the question of how to determine the first DAG run (one workaround might be to compare the DataInterval start against the DAG's start_date, if possible).

Here is an example from the logs:
airflow-feat-scheduler | [2022-11-14 19:57:58,934] {WorkDayTimetable.py:28} INFO - last_automated_data_interval: DataInterval(start=DateTime(2022, 11, 10, 0, 0, 0, tzinfo=Timezone('UTC')), end=DateTime(2022, 11, 11, 0, 0, 0, tzinfo=Timezone('UTC')))
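For reference, the branching described in the docs can be sketched roughly like this. This is a standalone sketch with simplified stand-in types (`DataInterval` here is a plain `NamedTuple`, not Airflow's class), so it runs without an Airflow installation; the daily interval is just an illustrative schedule:

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Optional

# Simplified stand-in for airflow.timetables.base.DataInterval,
# so this sketch runs without Airflow installed.
class DataInterval(NamedTuple):
    start: datetime
    end: datetime

def next_dagrun_info(
    last_automated_data_interval: Optional[DataInterval],
    earliest: Optional[datetime],
) -> Optional[DataInterval]:
    """Mimic the branching in the docs' custom-timetable example:
    with no previous automated run, fall back to `earliest` (the
    DAG's start_date); otherwise continue from the last interval."""
    if last_automated_data_interval is None:
        # First ever run: per the docs this branch should be taken,
        # but the issue reports a non-None interval arriving instead.
        if earliest is None:
            return None  # no start_date either: never schedule
        start = earliest
    else:
        start = last_automated_data_interval.end
    # Illustrative daily schedule.
    return DataInterval(start=start, end=start + timedelta(days=1))
```

With an empty `dag_run` table I would expect the `None` branch above to fire on the first unpause, which is what the docs imply.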

I'm using Airflow 2.4.2.

What you think should happen instead

The value of the parameter should be None as specified in the docs.

How to reproduce

This should be reproducible by running the example given in the docs and logging the value of the `last_automated_data_interval` parameter; the value should appear in the logs.

Operating System

macOS Ventura

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else

The problem occurs every time.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@mrn-aglic mrn-aglic added area:core kind:bug This is a clearly a bug labels Nov 14, 2022
@boring-cyborg

boring-cyborg bot commented Nov 14, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@uranusjr
Member

The first ever run of a DAG is generally calculated when the DAG is detected and pushed into the system by the DAG parser, before a DAG run is ever created. You should be able to see that these fields in the dag table are already set at this point: next_dagrun, next_dagrun_data_interval_start, and next_dagrun_data_interval_end. The log you see is the scheduler retrieving values for the second run (and storing them in dag), after the first run is created from the aforementioned fields.
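If I understand this correctly, the two phases could be sketched like this (all names here are illustrative stand-ins, not actual Airflow internals):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

Interval = Tuple[datetime, datetime]

def compute_next(last: Optional[Interval], start_date: datetime) -> Interval:
    """Stand-in for Timetable.next_dagrun_info with a daily schedule."""
    start = start_date if last is None else last[1]
    return (start, start + timedelta(days=1))

# Phase 1: the DAG parser detects the DAG before any run exists and
# stores the *first* run's interval in the dag table. The timetable
# is called with last_automated_data_interval=None here.
start_date = datetime(2022, 11, 10, tzinfo=timezone.utc)
dag_row = {"next_dagrun_data_interval": compute_next(None, start_date)}

# Phase 2: the scheduler creates the first run from the stored fields,
# then asks the timetable for the *second* run, passing the first run's
# interval -- the non-None value observed in the log above.
first_run_interval = dag_row["next_dagrun_data_interval"]
second = compute_next(first_run_interval, start_date)
```

So the `None` case is only ever seen during parsing, not in the scheduler log line quoted in the issue.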

@mrn-aglic
Author

@uranusjr ok, thanks for the clarification. This could be useful to add to the docs.
While on topic, can we see the logs for the first dag run calculation anywhere?

@uranusjr
Member

can we see the logs for the first dag run calculation anywhere?

It kind of depends. I believe the DAG parser is part of the scheduler process, so it should be somewhere in the scheduler logs. But if you add the DAG very early, it might be buried somewhere in Airflow startup and become invisible (due to log config). Honestly, there are many parts of how Airflow logs things that are unclear to me as well.

This could be useful to add to the docs.

An entry under Concepts that goes through how a DAG run is created would likely be a good idea. Would you be interested in helping out with that?

@mrn-aglic
Author

Yeah, sure. I could probably do it in the next couple of days.

@eladkal eladkal added good first issue kind:documentation and removed kind:bug This is a clearly a bug labels Jan 30, 2023
@vijayasarathib
Contributor

@mrn-aglic and @eladkal - let me know if you want me to help here and close this.
