Fixing Task Duration view in case of manual DAG runs only (#22015) #29195

duke8585 · 2023-01-27T15:56:16Z

Currently, if a DAG history is solely comprised of manual runs, they cannot be filtered down by nums in the duration view, because in this case, the min_date defaults to the current utc_epoch. Consequently, an unlimited number of DAG runs is displayed. With the implemented fallback, the behavior for mixed (scheduled + manual) DAG runs is similar to the default behavior, as it is for manual-only runs. It was implemented this way to avoid breaking changes.

closes: #22015

boring-cyborg · 2023-01-27T15:56:20Z

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
Be sure to read the Airflow Coding style.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

Taragolis · 2023-01-27T16:09:47Z

@duke8585, thanks for contribution, however we also need tests for this changes, for more details see:

duke8585 · 2023-01-28T09:09:02Z

@duke8585, thanks for contribution, however we also need tests for this changes, for more details see:

TESTING

https://github.com/apache/airflow/blob/main/tests/models/test_dag.py

@Taragolis thank you. to my knowledge the function lacked tests to begin with and i was wondering how to test this as it requires a backend db with DAG runs. but i see some helpful examples in the reference you provided :)

i will see what we can do about that.

Taragolis · 2023-01-28T11:59:30Z

to my knowledge the function lacked tests to begin with and i was wondering how to test this as it requires a backend db with DAG runs

Yeah it required db backend, if you run in venv I thought it would be enough to use SQLite, Airflow created it by default if no backend provided (${AIRFLOW_HOME}/airflow.db or ${AIRFLOW_HOME}/unittests.db), if test DB not initialised you could run pytest with flag --with-db-init

If you run tests inside of Breeze then you already have DB backend by default i think it is Postgres.

And at least PyCharm/IDEA correctly detected all tests out of the box, so you could easily run it in it (just make sure you configure venv)

BTW, @boring-cyborg also provide useful links for First time contributor as first message in this PR

duke8585 · 2023-02-06T09:27:49Z

hello everyone,
i do not understand why the actions fail at the moment. is this s related to my changes? how to continue? thanks :)

duke8585 · 2023-02-10T10:02:18Z

hello again,
running the breeze static-checks --all-files --show-diff-on-failure --color always locally works for me. i do not get where the error occurs. help is greatly appreciated :)

potiuk · 2023-02-10T18:53:59Z

Static checks are fine. Those arenot static checks that are failing but some other tests "hang" (that seems to happen quite often with multiple PRs during the last few hours and we are still tracing the cause of it

duke8585 · 2023-02-13T13:05:02Z

just did another manual local test and the UI works as i expect it to.

potiuk · 2023-02-20T19:59:14Z

LGTM - anyone else :) ?

Taragolis

Minor suggestion for remove redundant part of code

airflow/models/dag.py

Currently, if a DAG history is solely comprised of manual runs, they cannot be filtered down by `nums` in the duration view, because in this case, the min_date defaults to the current utc_epoch. Consequently, an unlimited number of DAG runs is displayed. With the implemented fallback, the behavior for mixed (scheduled + manual) DAG runs is similar to the default behavior, as it is for manual-only runs. It was implemented this way to avoid breaking changes.

With this new fix, the determination of the min_date is done on a set of execution_date values - via querying - limited by `num`and taking the last element of it. the fallback value if none are found is simply the base_date. this way, the invoked get_task_instances cannot default to 30 days prior to the current time. Furthermore, the query itself is no longer confined to non-manual runs, but includes all DAGRun types. More context: During writing the tests, it became evident that the current implementation of the get_task_instances_before method has two problems: 1) With the implementation of a limit + offset determination of the min_date, no results can occur: * Either due to less DAGRuns in the DB than `num` or * Because of the constraining to non-manual runs (see more below). Consequently, the fallback in the invoked get_task_instances is then a -30d from datetime.now(), so an entirely different time dimension than the one selected in the UI (execution_date). 2) With the explicit omission of manual runs in the query, in case of n manual runs and 1 scheduled one before the base_date (e.g. S-M-M-...-M-M-BD), the scheduled one + n manual runs are returned. in the case of mainly triggered/manual DAGRuns, this leads to more results displayed than expected. in case of solely manual DAGRuns, the method will return no results and hence default to the above mentioned fallback, which operates on a different timeline. prior tp this fix, there was even a fallback to the start of the current epochal period, i.e. showing all DAGRuns in the DB. With the previously suggested non-breaking fix, it was not possible to avoid this scenario, with any scheduled run found, the fallback was never reached. The suggested changes may be considered breaking. the changes are confined to the scope of the get_task_instances_before method and usage is only in the views.py for the duration, tries and landing times, so where the changes meant to fix the existing behavior.

Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>

potiuk · 2023-02-25T13:57:13Z

Closed/reopened to rebuild

boring-cyborg · 2023-02-25T21:50:40Z

Awesome work, congrats on your first merged pull request!

duke8585 · 2023-02-26T07:49:21Z

thanks everyone. i learned quite some things here :)

…29195) * Fixing Task Duration view in case of manual DAG runs only (#22015) Currently, if a DAG history is solely comprised of manual runs, they cannot be filtered down by `nums` in the duration view, because in this case, the min_date defaults to the current utc_epoch. Consequently, an unlimited number of DAG runs is displayed. With the implemented fallback, the behavior for mixed (scheduled + manual) DAG runs is similar to the default behavior, as it is for manual-only runs. It was implemented this way to avoid breaking changes. * Fixing get_task_instances_before method and adding a unit test With this new fix, the determination of the min_date is done on a set of execution_date values - via querying - limited by `num`and taking the last element of it. the fallback value if none are found is simply the base_date. this way, the invoked get_task_instances cannot default to 30 days prior to the current time. Furthermore, the query itself is no longer confined to non-manual runs, but includes all DAGRun types. More context: During writing the tests, it became evident that the current implementation of the get_task_instances_before method has two problems: 1) With the implementation of a limit + offset determination of the min_date, no results can occur: * Either due to less DAGRuns in the DB than `num` or * Because of the constraining to non-manual runs (see more below). Consequently, the fallback in the invoked get_task_instances is then a -30d from datetime.now(), so an entirely different time dimension than the one selected in the UI (execution_date). 2) With the explicit omission of manual runs in the query, in case of n manual runs and 1 scheduled one before the base_date (e.g. S-M-M-...-M-M-BD), the scheduled one + n manual runs are returned. in the case of mainly triggered/manual DAGRuns, this leads to more results displayed than expected. in case of solely manual DAGRuns, the method will return no results and hence default to the above mentioned fallback, which operates on a different timeline. prior tp this fix, there was even a fallback to the start of the current epochal period, i.e. showing all DAGRuns in the DB. With the previously suggested non-breaking fix, it was not possible to avoid this scenario, with any scheduled run found, the fallback was never reached. The suggested changes may be considered breaking. the changes are confined to the scope of the get_task_instances_before method and usage is only in the views.py for the duration, tries and landing times, so where the changes meant to fix the existing behavior. * Update airflow/models/dag.py Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is> * Update airflow/models/dag.py Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is> * Update airflow/models/dag.py Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is> --------- Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is> (cherry picked from commit 8b8552f)

duke8585 requested review from kaxil, XD-DENG and ashb as code owners January 27, 2023 15:56

duke8585 force-pushed the main branch 3 times, most recently from ef36348 to cb8f523 Compare February 6, 2023 06:27

duke8585 force-pushed the main branch from cb8f523 to 2805045 Compare February 9, 2023 14:17

duke8585 force-pushed the main branch 2 times, most recently from 684e4ec to 4877b48 Compare February 12, 2023 20:56

duke8585 force-pushed the main branch 2 times, most recently from b1eed5c to 904027b Compare February 16, 2023 07:31

potiuk approved these changes Feb 20, 2023

View reviewed changes

Taragolis reviewed Feb 20, 2023

View reviewed changes

airflow/models/dag.py Outdated Show resolved Hide resolved

airflow/models/dag.py Outdated Show resolved Hide resolved

airflow/models/dag.py Outdated Show resolved Hide resolved

duke8585 and others added 5 commits February 22, 2023 15:46

Update airflow/models/dag.py

62ed07c

Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>

Update airflow/models/dag.py

a4dc399

Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>

Update airflow/models/dag.py

4603ffd

Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>

duke8585 force-pushed the main branch from 33befc5 to 4603ffd Compare February 22, 2023 14:46

potiuk closed this Feb 25, 2023

potiuk reopened this Feb 25, 2023

potiuk merged commit 8b8552f into apache:main Feb 25, 2023

eladkal added this to the Airflow 2.5.2 milestone Feb 26, 2023

pierrejeambrun added the type:bug-fix Changelog: Bug Fixes label Feb 27, 2023

pierrejeambrun mentioned this pull request Mar 10, 2023

Status of testing of Apache Airflow 2.5.2rc2 #30028

Closed

64 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing Task Duration view in case of manual DAG runs only (#22015) #29195

Fixing Task Duration view in case of manual DAG runs only (#22015) #29195

duke8585 commented Jan 27, 2023

boring-cyborg bot commented Jan 27, 2023

Taragolis commented Jan 27, 2023

duke8585 commented Jan 28, 2023 •

edited

Loading

Taragolis commented Jan 28, 2023 •

edited

Loading

duke8585 commented Feb 6, 2023

duke8585 commented Feb 10, 2023

potiuk commented Feb 10, 2023

duke8585 commented Feb 13, 2023

potiuk commented Feb 20, 2023

Taragolis left a comment

potiuk commented Feb 25, 2023

boring-cyborg bot commented Feb 25, 2023

duke8585 commented Feb 26, 2023

Fixing Task Duration view in case of manual DAG runs only (#22015) #29195

Fixing Task Duration view in case of manual DAG runs only (#22015) #29195

Conversation

duke8585 commented Jan 27, 2023

boring-cyborg bot commented Jan 27, 2023

Taragolis commented Jan 27, 2023

duke8585 commented Jan 28, 2023 • edited Loading

Taragolis commented Jan 28, 2023 • edited Loading

duke8585 commented Feb 6, 2023

duke8585 commented Feb 10, 2023

potiuk commented Feb 10, 2023

duke8585 commented Feb 13, 2023

potiuk commented Feb 20, 2023

Taragolis left a comment

Choose a reason for hiding this comment

potiuk commented Feb 25, 2023

boring-cyborg bot commented Feb 25, 2023

duke8585 commented Feb 26, 2023

duke8585 commented Jan 28, 2023 •

edited

Loading

Taragolis commented Jan 28, 2023 •

edited

Loading