Fixing Task Duration view in case of manual DAG runs only (#22015) #29195
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
@duke8585, thanks for the contribution. However, we also need tests for these changes. For more details, see:
@Taragolis thank you. To my knowledge, the function lacked tests to begin with, and I was wondering how to test this, as it requires a backend DB with DAG runs. But I see some helpful examples in the reference you provided :) I will see what we can do about that.
Yeah, it requires a DB backend. If you run in a venv, I think it would be enough to use SQLite; Airflow creates it by default if no backend is provided. If you run tests inside Breeze, then you already have a DB backend by default (I think it is Postgres). At least PyCharm/IDEA correctly detects all tests out of the box, so you can easily run them there (just make sure you configure the venv). BTW, @boring-cyborg also provides useful links for first-time contributors in the first message of this PR.
hello everyone, |
hello again, |
Static checks are fine. It is not the static checks that are failing; some other tests "hang" (that seems to happen quite often with multiple PRs during the last few hours, and we are still tracing the cause of it).
Just did another manual local test, and the UI works as I expect it to.
LGTM - anyone else :) ? |
Minor suggestion to remove a redundant part of the code
Currently, if a DAG's history consists solely of manual runs, the runs cannot be filtered down by `nums` in the duration view, because in this case min_date defaults to the utc_epoch. Consequently, an unlimited number of DAG runs is displayed. With the implemented fallback, the behavior for mixed (scheduled + manual) DAG runs is similar to the default behavior, and so is the behavior for manual-only runs. It was implemented this way to avoid breaking changes.
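The failure mode described above can be reproduced with a small in-memory sketch. It does not use Airflow or a database: the function name `old_min_date`, the `(execution_date, run_type)` tuple layout, and the list-based stand-in for the offset query are all assumptions made for illustration, based on the description in the commit messages of this PR rather than on the actual Airflow source.

```python
from datetime import datetime, timedelta, timezone

# Stand-in for the epoch fallback mentioned in the description (assumption).
UTC_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def old_min_date(runs, base_date, num):
    """Hypothetical model of the old lookup.

    runs: list of (execution_date, run_type) tuples standing in for DagRun rows.
    Manual runs are skipped, the rest are ordered newest first, and the row
    at offset `num` is taken. If there is no such row, fall back to the epoch.
    """
    candidates = sorted(
        (d for d, run_type in runs if d <= base_date and run_type != "manual"),
        reverse=True,
    )
    return candidates[num] if len(candidates) > num else UTC_EPOCH


base_date = datetime(2023, 1, 31, tzinfo=timezone.utc)
# Ten manual runs, one per day, ending at base_date.
manual_only = [(base_date - timedelta(days=i), "manual") for i in range(10)]

min_date = old_min_date(manual_only, base_date, num=3)
shown = [d for d, _ in manual_only if min_date <= d <= base_date]
print(min_date, len(shown))
```

In this model a request for 3 runs on a manual-only history ends up with a window starting at the epoch, so every run in the list falls inside it.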
With this new fix, min_date is determined from a set of execution_date values obtained by a query that is limited by `num`; the last element of that set is used. If no values are found, the fallback is simply base_date. This way, the invoked get_task_instances cannot default to 30 days prior to the current time. Furthermore, the query is no longer confined to non-manual runs but includes all DAGRun types.

More context: while writing the tests, it became evident that the current implementation of the get_task_instances_before method has two problems:

1) With the limit + offset determination of min_date, the query can return no results:
* either because there are fewer DAGRuns in the DB than `num`, or
* because of the restriction to non-manual runs (see more below).
Consequently, the fallback in the invoked get_task_instances is 30 days before datetime.now(), which is an entirely different time dimension than the one selected in the UI (execution_date).

2) With the explicit omission of manual runs in the query, in the case of n manual runs and 1 scheduled run before base_date (e.g. S-M-M-...-M-M-BD), the scheduled run plus the n manual runs are returned. With mainly triggered/manual DAGRuns, this leads to more results being displayed than expected. With solely manual DAGRuns, the method returns no results and hence defaults to the above-mentioned fallback, which operates on a different timeline. Prior to this fix, there was even a fallback to the start of the current epochal period, i.e. showing all DAGRuns in the DB. With the previously suggested non-breaking fix, it was not possible to avoid this scenario: as soon as any scheduled run was found, the fallback was never reached.

The suggested changes may be considered breaking. They are confined to the scope of the get_task_instances_before method, which is used only in views.py for the duration, tries and landing times views, i.e. exactly where the changes are meant to fix the existing behavior.
Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>
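The min_date selection described in the second commit message can be sketched in plain Python as follows. This is an illustrative model only, not the code merged into airflow/models/dag.py: `new_window`, the tuple layout, and the list slicing that stands in for a query limited by `num` are assumptions.

```python
from datetime import datetime, timedelta, timezone


def new_window(runs, base_date, num):
    """Hypothetical model of the fixed lookup.

    runs: list of (execution_date, run_type) tuples standing in for DagRun rows.
    All run types are considered. The `num` newest execution dates up to
    base_date are selected and the oldest of them becomes min_date.
    If nothing is found, min_date falls back to base_date itself.
    """
    newest_first = sorted((d for d, _ in runs if d <= base_date), reverse=True)[:num]
    min_date = newest_first[-1] if newest_first else base_date
    return min_date, base_date


base_date = datetime(2023, 1, 31, tzinfo=timezone.utc)
# The S-M-M-M-M-BD case: one scheduled run followed by four manual runs.
runs = [(base_date - timedelta(days=5), "scheduled")] + [
    (base_date - timedelta(days=i), "manual") for i in range(1, 5)
]

min_date, end_date = new_window(runs, base_date, num=3)
shown = [d for d, _ in runs if min_date <= d <= end_date]
print(min_date, len(shown))

# No runs at all: the window collapses to base_date instead of "now - 30 days".
empty_window = new_window([], base_date, num=3)

# Fewer runs than `num`: the oldest available run bounds the window.
short_window = new_window(runs[:2], base_date, num=5)
```

Under these assumptions the window always lives on the execution_date timeline selected in the UI, and the number of runs inside it never exceeds `num`.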
Closed/reopened to rebuild
Awesome work, congrats on your first merged pull request!
Thanks everyone. I learned quite some things here :)
…29195) (cherry picked from commit 8b8552f)
closes: #22015