Links to specific dagruns don't work when older than 25 runs #34723

Closed
1 of 2 tasks
hterik opened this issue Oct 3, 2023 · 13 comments · Fixed by #34887 or #37018
Labels
area:UI - Related to UI/UX. For Frontend Developers.
area:webserver - Webserver related Issues
kind:bug - This is a clearly a bug
priority:high - High priority bug that should be patched quickly but does not require immediate new release

@hterik
Contributor

hterik commented Oct 3, 2023

Apache Airflow version

2.7.1

What happened

  1. Browse to AIRFLOW/dagrun/list/?_flt_3_dag_id=my_dag
  2. Click any link from the Run Id column, below the most recent 25 runs.
    It should take you to AIRFLOW/dags/mydag/graph?run_id=scheduled__2023-09-21T00%3A00%3A00%2B00%3A00&execution_date=2023-09-21+00%3A00%3A00%2B00%3A00 and it does so correctly
  3. The graph view opens but no run is selected.
    When clicking on any item in the graph, the LATEST run is selected in the grid. A user who does not notice this is misled into thinking the results they are browsing in the graph are the ones they selected in step 2.

As a workaround, one can append &num_runs=365 to the URL, which takes you to the correct run from step 2. But eventually you will exhaust 365 runs as well.

This problem isn't limited to the dagrun list. Any link to the /dags/ graph is affected. We have external systems linking to individual dagruns, and those links now become unusable after 25 runs.
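
For external systems that already store such links, the num_runs workaround can be applied mechanically. The following is only a sketch: the host name `airflow.example.com` and the helper name `with_num_runs` are made up for illustration, and it uses nothing beyond Python's standard `urllib.parse`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_num_runs(url: str, num_runs: int = 365) -> str:
    """Return `url` with the num_runs query parameter set (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous num_runs value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "num_runs"
    ]
    query.append(("num_runs", str(num_runs)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example link in the shape described in step 2 (host is an assumption).
link = (
    "https://airflow.example.com/dags/my_dag/graph"
    "?run_id=scheduled__2023-09-21T00%3A00%3A00%2B00%3A00"
    "&execution_date=2023-09-21+00%3A00%3A00%2B00%3A00"
)
print(with_num_runs(link))
```

This only widens the window; as noted above, a run older than the chosen num_runs would still not be selected.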

What you think should happen instead

When navigating to a run via a link, the UI should always present the run that was provided in the run_id query string.

How to reproduce

See above

Operating System

Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers

No response

Deployment

Other Docker-based deployment

Deployment details

No response

Anything else

This was not a problem in Airflow 2.5. It is a regression in 2.7, probably introduced by the new embedded graph view (which otherwise is great :))

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@hterik hterik added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Oct 3, 2023
@hussein-awala
Member

I cannot reproduce with main:
image
image

Could you provide some screenshots for your problem?

@hussein-awala hussein-awala added Can't Reproduce The problem cannot be reproduced area:webserver Webserver related Issues and removed area:core labels Oct 3, 2023
@hterik
Contributor Author

hterik commented Oct 3, 2023

Sure, hope this is enough
image

The url was given by clicking a run in the List dag runs page.
Something interesting to observe is that after clicking the URL, it redirects twice: the first redirect adds the dag_run_id parameter (not run_id), which is then quickly removed again.

@hussein-awala
Member

hussein-awala commented Oct 3, 2023

This date is the max date for the X dag runs: when you choose 2023-01-01 as the start date and 25 as the number of runs, the UI shows you the 25 runs before 2023-01-01. To select a dag run, you need to click on the red/green bar in the grid view.
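
The windowing described in this comment can be modelled in a few lines. This is a simplified sketch, not Airflow's actual query: the function name `visible_runs` is hypothetical, and treating the base date as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone


def visible_runs(run_dates, base_date, num_runs=25):
    """Return the `num_runs` most recent run dates at or before `base_date`, oldest first."""
    if num_runs <= 0:
        return []
    eligible = sorted(d for d in run_dates if d <= base_date)
    return eligible[-num_runs:]


start = datetime(2023, 9, 1, tzinfo=timezone.utc)
runs = [start + timedelta(days=i) for i in range(40)]  # 40 daily runs
target = runs[5]  # an old run, well outside the latest 25

# Default view: base date is the latest run, so the old run falls outside the window.
print(target in visible_runs(runs, base_date=runs[-1]))  # False

# Moving the base date to the target run brings it back into the window.
print(target in visible_runs(runs, base_date=target))  # True
```

Under this model, a link that carries only a run_id cannot select an old run unless the base date (or the number of runs) is adjusted as well.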

@hterik
Contributor Author

hterik commented Oct 3, 2023

This was not the case in Airflow 2.5: I could click on links referring to any dagrun, no matter how old, and it would be correctly selected in the grid view.
The date and number of runs shouldn't have to be provided manually if run_id is part of the URL.

To clarify, I am browsing the dagrun list attached below and clicking an item in picture 1. This results in picture 2 (same as above), where no run is selected:

image

image

@jscheffl
Contributor

jscheffl commented Oct 3, 2023

I did a test with an example DAG and triggered it manually more than 25 times. As the GRID view shows 25 items by default, all is fine if you click on one of the most recent 25 items.

If you click on an item more than 25 runs back, the GRID view does not adjust its selection of 25 runs to include your run among the columns. In my setup it shows the grid view, but the "Details" tab shows the DAG overview, not the details of the (previously) selected DAG run. So overall it is hard to navigate from a DAG run search result to the GRID view to check for details (especially because the search options in the GRID view are limited compared to the DAG search screen).

@jscheffl jscheffl added area:UI Related to UI/UX. For Frontend Developers. and removed Can't Reproduce The problem cannot be reproduced needs-triage label for new issues that we didn't triage yet labels Oct 3, 2023
@hterik
Contributor Author

hterik commented Oct 5, 2023

I've done some debugging:

Problem 1
The Dag list links to the legacy /graph endpoint. This endpoint exists only for backwards compatibility and redirects to the new /grid endpoint.
The old function does calculate the base_date based on run_id, inside dt_nr_dr_data = get_date_time_num_runs_dag_runs_form_data(request, session, dag)
https://github.com/apache/airflow/blob/0c8e30e43b70e9d033e1686b327eb00aab82479c/airflow/www/views.py#L216-230
But it then forgets to forward dt_nr_dr_data["base_date"] in the returned kwargs:
https://github.com/apache/airflow/blob/0c8e30e43b70e9d033e1686b327eb00aab82479c/airflow/www/views.py#L2964-2969
Solution A: Forward base_date.
Solution B: Change the dag list to use the new URL.

Problem 2
The new grid endpoint receives the run_id parameter correctly after the redirection described above.
However, it then performs a JavaScript fetch call to the grid_data API, and this call does NOT include the run_id.
The grid_data function that serves this request does not handle the run_id parameter the way the old get_date_time_num_runs_dag_runs_form_data function did.
https://github.com/apache/airflow/blob/0c8e30e43b70e9d033e1686b327eb00aab82479c/airflow/www/views.py#L3475-3478
Solution C: Implement run_id handling in grid_data and forward the parameter from the frontend.

I think solutions A and C are both must-haves here. B is optional, a nice-to-have for getting rid of legacy code in the future.
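
To make solutions A and C concrete, here is a rough, framework-free sketch of the idea. It is not the code in airflow/www/views.py: the `RUNS` lookup table, both function names, and the precedence rule (an explicit base_date wins over the one derived from run_id) are assumptions made for illustration.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for a database lookup of run_id -> execution date.
RUNS = {
    "scheduled__2023-09-21T00:00:00+00:00": datetime(2023, 9, 21, tzinfo=timezone.utc),
}


def resolve_base_date(params, runs=RUNS, now=None):
    """Pick a base date: explicit base_date first, then the run's own date, then now."""
    if params.get("base_date"):
        return datetime.fromisoformat(params["base_date"])
    run_date = runs.get(params.get("run_id"))
    if run_date is not None:
        # Idea behind solution C: derive the window from the requested run.
        return run_date
    return now or datetime.now(timezone.utc)


def legacy_graph_redirect_kwargs(dag_id, params):
    """Idea behind solution A: forward base_date, not only the run id, on redirect."""
    kwargs = {"dag_id": dag_id, "tab": "graph"}
    if params.get("run_id"):
        kwargs["dag_run_id"] = params["run_id"]
        kwargs["base_date"] = resolve_base_date(params).isoformat()
    return kwargs


print(legacy_graph_redirect_kwargs(
    "my_dag", {"run_id": "scheduled__2023-09-21T00:00:00+00:00"}
))
```

With the base date forwarded, the grid window ends at the requested run, so that run is always among the displayed columns regardless of its age.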

@AlexandreGCastor

Hello,

This was OK in Airflow 2.6.2. Being able to get logs from the history (more than 25 runs ago) is very important for us, and it is way too hard to hack the URL each time. We will probably roll back while waiting for this issue to be fixed.

Alexandre Gué,
Lead DevOps CastorDoc

@jscheffl jscheffl added the priority:high High priority bug that should be patched quickly but does not require immediate new release label Oct 8, 2023
@hterik
Contributor Author

hterik commented Oct 30, 2023

Sorry for pinging but is there any way we can move this forward? I've already provided a fix in #34887, waiting for review.
IMO not being able to permalink to specific dagruns is quite a severe bug.

@QuintenBruynseraede

Also eagerly waiting for this fix. For the time being, is there a workaround to filter + navigate to a dagrun?

hterik added a commit to hterik/airflow that referenced this issue Nov 30, 2023
Previously, if the user set the dag_run_id parameter in the URL to
an old run that did not fit in the most recent 25 runs,
the requested run was not selected.

This change fixes this by setting the base_date to a time where
the run_id is known to exist if dag_run_id is provided as
an explicit query parameter.

closes: apache#34723
@jermaine151

I found a workaround to this problem in v2.7.3. If you add &base_date=<same as execution date> to the URL, it will locate runs older than the latest 25. Here's an example link that is working for me:

<base_url>/dags/<dag_id>/grid?execution_date=2023-12-14+19%3A10%3A24.218280%2B00%3A00&base_date=2023-12-14+19%3A10%3A24.218280%2B00%3A00&tab=graph&dag_run_id=<run_id>
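
For anyone generating these links from another system, a small sketch of building the workaround URL follows. The function name `grid_link` and the example host and run id are hypothetical; the query parameters are the ones from the example link above.

```python
from urllib.parse import quote, urlencode


def grid_link(base_url: str, dag_id: str, run_id: str, execution_date: str) -> str:
    """Build a grid link whose base_date equals the run's execution date."""
    query = urlencode({
        "execution_date": execution_date,
        "base_date": execution_date,  # the workaround: same value as execution_date
        "tab": "graph",
        "dag_run_id": run_id,
    })
    # Percent-encode the dag_id so it is safe to place in the URL path.
    return f"{base_url}/dags/{quote(dag_id, safe='')}/grid?{query}"


url = grid_link(
    "https://airflow.example.com",  # assumed host
    "my_dag",
    "manual__2023-12-14T19:10:24.218280+00:00",  # assumed run id
    "2023-12-14 19:10:24.218280+00:00",
)
print(url)
```

`urlencode` takes care of escaping the `:`, `+` and space characters in the timestamp, which is easy to get wrong when assembling the link by hand.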

@QuintenBruynseraede

QuintenBruynseraede commented Jan 2, 2024

Thanks @jermaine151, that seems to work indeed. It doesn't visually highlight a dagrun in the grid view (see Screenshot), but if you navigate to the Graph tab, then click on a Task it does lead you to the dagrun from the URL.

image

That last bit is important because in our case multiple runs are triggered in the same second, and the grid view doesn't distinguish between them.

@jermaine151

> Thanks @jermaine151, that seems to work indeed. It doesn't visually highlight a dagrun in the grid view (see Screenshot), but if you navigate to the Graph tab, then click on a Task it does lead you to the dagrun from the URL.
>
> That last bit is important because in our case multiple runs are triggered on the same second, the gridview doesn't distinguish between those.

Are you adding the &dag_run_id= to the URL? That should land you on the right page, with the run selected.

@potiuk
Member

potiuk commented Jan 20, 2024

cc: @bbovenzi @pierrejeambrun - would be cool to take a look and fix that one, as it seems to be an important feature for those who complain about losing the old grid view (see #36884), and it seems to be a bit of an architectural decision on the grid view behaviour.

@potiuk potiuk added this to the Airflow 2.8.2 milestone Jan 20, 2024
hterik added a commit to hterik/airflow that referenced this issue Feb 20, 2024
bbovenzi pushed a commit that referenced this issue Feb 20, 2024
…4887)

sunank200 pushed a commit to astronomer/airflow that referenced this issue Feb 21, 2024
…ache#34887)

ephraimbuddy pushed a commit that referenced this issue Feb 22, 2024
…4887)

(cherry picked from commit a0ebabb)
abhishekbhakat pushed a commit to abhishekbhakat/my_airflow that referenced this issue Mar 5, 2024
…ache#34887)

7 participants