Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The grid view shows incorrect DAG runs #26505

Closed
1 of 2 tasks
zachliu opened this issue Sep 19, 2022 · 21 comments
Closed
1 of 2 tasks

The grid view shows incorrect DAG runs #26505

zachliu opened this issue Sep 19, 2022 · 21 comments
Labels
affected_version:2.4 Issues Reported for 2.4 area:UI Related to UI/UX. For Frontend Developers. kind:bug This is a clearly a bug
Milestone

Comments

@zachliu
Copy link
Contributor

zachliu commented Sep 19, 2022

Apache Airflow version

2.4.0

What happened

Airflow 2.4.0's grid view has an issue:

Grid view stop showing the latest DAG runs in 2.4.0 (regardless of schedule, it shows old dag runs)

  • case 1 (all 25 are old dag runs)
    2022-09-19_16-12

  • case 2 (old and new dag runs are mixed)
    2022-09-20_17-59
    2022-09-20_17-50

What you think should happen instead

No response

How to reproduce

  1. Go to a DAG with >365 dag runs
  2. The place where it's supposed to show the latest dag run in the "grid view" now shows an old dag run ~1 year ago

Operating System

Linux Mint 20.3 Una

Versions of Apache Airflow Providers

No response

Deployment

Other Docker-based deployment

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@zachliu zachliu added area:core kind:bug This is a clearly a bug labels Sep 19, 2022
@eladkal eladkal added area:UI Related to UI/UX. For Frontend Developers. affected_version:2.4 Issues Reported for 2.4 and removed area:core labels Sep 19, 2022
@zachliu
Copy link
Contributor Author

zachliu commented Sep 20, 2022

most likely due to #25633 and #25880
@uranusjr @bbovenzi any idea how to solve this? i'm not familiar with the "timetable" part 😅

@kxepal
Copy link
Member

kxepal commented Sep 21, 2022

Hi there! Follow up from dev ML.

We faced exactly the same issue after upgrade from 2.3.4 to 2.4.0 for all our dags. All of them has quite a history up to 2016 year, so a lot of dag runs. Somehow grid show for us latest runs at 2021.11 while other views (calendar, graph) shows all of them and the most recent ones.

The exact fix which solved this problem is:

--- airflow/www/views.py
+++ airflow/www/views.py
@@ -3453,8 +3460,9 @@ class Airflow(AirflowBaseView):
             if run_state:
                 query = query.filter(DagRun.state == run_state)
 
-            ordering = (DagRun.__table__.columns[name].desc() for name in dag.timetable.run_ordering)
-            dag_runs = query.order_by(*ordering, DagRun.id.desc()).limit(num_runs).all()
+            dag_runs = query.order_by(DagRun.execution_date.desc()).limit(num_runs).all()
             dag_runs.reverse()
 
             encoded_runs = [wwwutils.encode_dag_run(dr) for dr in dag_runs]

It feels like there is a problem between data and logic that operates it, but not sure where and what is it.

@ashb ashb added this to the Airflow 2.4.1 milestone Sep 21, 2022
@ashb
Copy link
Member

ashb commented Sep 21, 2022

@uranusjr Can you take a look at this ASAP please so we can get a fix for this out in 2.4.1?

@ashb
Copy link
Member

ashb commented Sep 21, 2022

@kxepal What is the timetable/schedule parameter for the DAG(s) with this problem?

@kxepal
Copy link
Member

kxepal commented Sep 21, 2022

@ashb we have good old timedelta schedule interval value everywhere. No timetables at all.

@zachliu
Copy link
Contributor Author

zachliu commented Sep 21, 2022

@kxepal doesn't the fix actually revert the core of #25633? i also did that because i "accidentally" deployed airflow 2.4.0 to our production 🤫

@kxepal
Copy link
Member

kxepal commented Sep 21, 2022

@zachliu That PR contains a lot of changes, but for us was enough to revert those changes that in the diff. The rest of them causes no troubles.

@zachliu
Copy link
Contributor Author

zachliu commented Sep 21, 2022

@kxepal taking a wild guess here, maybe a if can solve all: if timetables are used, do that; else, do this

@kxepal
Copy link
Member

kxepal commented Sep 21, 2022

@zachliu probably. We don't use timetables, so it was clear fix for us.

@zachliu
Copy link
Contributor Author

zachliu commented Sep 21, 2022

@kxepal same here, i'm waiting for the distant future "calendar"-like scheduling system 😉

@ashb
Copy link
Member

ashb commented Sep 21, 2022

Everything is converted to a timetable now under the hood (and had been since timetables were introduced) so it's not as simple as an if.

@zachliu
Copy link
Contributor Author

zachliu commented Sep 21, 2022

aww... wrong guess 😿 it was based on the fact that some old dag runs don't have the Data interval start/end parameters (see case 2 screenshots)

@kxepal
Copy link
Member

kxepal commented Sep 21, 2022

@ashb What information could be useful to debug this issue?

@ashb
Copy link
Member

ashb commented Sep 21, 2022

@kxepal Creating a minimal stand-alone reproduction case if you can ("here's a dag, run it/backfill it for X" sort of thing)

@zachliu
Copy link
Contributor Author

zachliu commented Sep 21, 2022

@kxepal Creating a minimal stand-alone reproduction case if you can ("here's a dag, run it/backfill it for X" sort of thing)

i tried that, running the same dag many times in a short period (backfilling) doesn't help, they will be ordered properly. i guess because they are all converted to timetables uniformly

it seems the issue only occurs on dags that are older than the introduction of timetable, which suggests that during the conversion, older dag runs lack attribute(s) for a proper sorting as i mentioned before, for older dag runs whose data_interval_end is NULL, the order_by(*ordering, DagRun.id.desc()) doesn't work

it's basically

ordering = (DagRun.__table__.columns[name].desc() for name in dag.timetable.run_ordering)
dag_runs = query.order_by(*ordering, DagRun.id.desc()).limit(num_runs).all()
SELECT dag_id,
       run_id,
       data_interval_end
FROM dag_run
WHERE dag_id = '<dag_name>'
ORDER BY data_interval_end DESC,
         execution_date DESC
LIMIT 25;

vs

dag_runs = query.order_by(DagRun.execution_date.desc()).limit(num_runs).all()
SELECT dag_id,
       run_id,
       data_interval_end
FROM dag_run
WHERE dag_id = '<dag_name>'
ORDER BY execution_date DESC
LIMIT 25;

@kxepal
Copy link
Member

kxepal commented Sep 21, 2022

Creating a minimal stand-alone reproduction case if you can ("here's a dag, run it/backfill it for X" sort of thing)

Oblivious, but it be quite a hard to provide. @zachliu feels like walk on this path with no luck. But it looks like the problem could be solved by a simple new migration script. But what the cases it should handle and how it should fix them?

@zachliu
Copy link
Contributor Author

zachliu commented Sep 21, 2022

can the migration script simply fills all the NULL data_interval_end with execution_date?

@kxepal
Copy link
Member

kxepal commented Sep 21, 2022

But why it already doesn't? It doesn't feels like a problem with current logic - it works fine, but somewhere in changes between two versions. Actually, running new dagruns with 2.4 without a fix doesn't change anything.

@uranusjr
Copy link
Member

Filling all data interval with execution date would be excruciatingly slow, which is why it’s not done. Doing some fancy coalesce for ordering is probably possible.

@uranusjr
Copy link
Member

I created a PR for this. Would be awesome if someone could test it!

@ashb
Copy link
Member

ashb commented Sep 26, 2022

Fixed by #26626

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affected_version:2.4 Issues Reported for 2.4 area:UI Related to UI/UX. For Frontend Developers. kind:bug This is a clearly a bug
Projects
None yet
Development

No branches or pull requests

5 participants