We have lost the previous tasks in the WebServer #31179

Closed
1 of 2 tasks
PhoenixChamb opened this issue May 10, 2023 · 7 comments
Labels
affected_version:2.6 Issues Reported for 2.6 area:core area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues duplicate Issue that is duplicated kind:bug This is a clearly a bug
Milestone

Comments

@PhoenixChamb

PhoenixChamb commented May 10, 2023

Apache Airflow version

2.6.0

What happened

After upgrading Airflow from 2.5.1 to 2.6.0, the Grid view in the Airflow webserver no longer shows the tasks of previous DAG runs. The tasks execute correctly, but they are not displayed.
image

The behavior is odd, because the previous tasks have finished. For example, if I mark the last one as Failed, the Grid updates and shows the tasks again.

image

Also, if I select this option, you can see that the tasks are found, but they are never shown.

image

But in the end, even if I do not clear anything, the tasks in the Grid mysteriously load correctly (sometimes with the correct "successful" state, other times, as in the image, without any state).

image

As you can see in the image, all the tasks are 'found' but not loaded in the Grid, even though all the previous tasks finished successfully.

Also, if I check the logs of a previous task that is shown with status None, the logs are displayed.

image

We are also seeing another behavior. Since we 'lose' the previous tasks, the DagRuns never finish, so the next DagRuns are never executed. That is, our Dummy task (Start) is not launched because, without the previous tasks, we do not have their states.


Another example.

This is the Grid of a DAG.

image

Also, if I check previous DAG runs, this is what is shown.

image

If I mark all the 'start' tasks as success, the Grid suddenly shows the previous tasks, in this case with their correct state.

image

What you think should happen instead

All the tasks should be displayed correctly with their correct state.

How to reproduce

We don't know. On some days all the DagRuns show their correct tasks and states. In other cases a DAG that was displaying correctly starts to go wrong and we end up losing the tasks. And in other cases nothing is affected.

Operating System

Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.0.0
apache-airflow-providers-celery==3.1.0
apache-airflow-providers-cncf-kubernetes==6.1.0
apache-airflow-providers-common-sql==1.4.0
apache-airflow-providers-docker==3.6.0
apache-airflow-providers-elasticsearch==4.4.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-google==10.0.0
apache-airflow-providers-grpc==3.1.0
apache-airflow-providers-hashicorp==3.3.1
apache-airflow-providers-http==4.3.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-microsoft-azure==6.0.0
apache-airflow-providers-mysql==5.0.0
apache-airflow-providers-odbc==3.2.1
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-redis==3.1.0
apache-airflow-providers-sendgrid==3.1.0
apache-airflow-providers-sftp==4.2.4
apache-airflow-providers-slack==7.2.0
apache-airflow-providers-snowflake==4.0.5
apache-airflow-providers-sqlite==3.3.2
apache-airflow-providers-ssh==3.6.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

After upgrading from 2.5.1 to 2.6.0

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@PhoenixChamb PhoenixChamb added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels May 10, 2023
@boring-cyborg

boring-cyborg bot commented May 10, 2023

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@PhoenixChamb
Author

We have downgraded Airflow to 2.5.3 and it works correctly again.

@potiuk potiuk removed the needs-triage label for new issues that we didn't triage yet label May 12, 2023
@potiuk potiuk added this to the Airflow 2.6.1 milestone May 12, 2023
@bbovenzi
Contributor

bbovenzi commented May 12, 2023

Could you provide a few more details to help us figure out the root cause?

  • Does this happen with all of your dags or just specific ones?
  • Are there any console or network logs?
  • Are there any task instances at all in the return of the grid_data endpoint?
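To make the last question concrete, here is a minimal sketch of how a saved grid_data response could be checked for task instances. It assumes the payload has a top-level "groups" object whose nodes carry "children" and "instances" lists. That layout, the sample data, and the helper name count_instances are assumptions for illustration and should be compared with the actual response in the browser's network tab.

```python
from typing import Any


def count_instances(group: dict[str, Any]) -> int:
    """Recursively count task instances in a grid_data-style task group."""
    total = len(group.get("instances", []))
    for child in group.get("children") or []:
        total += count_instances(child)
    return total


# Hypothetical payload; the layout is assumed, not copied from a real response.
sample = {
    "dag_runs": [{"run_id": "scheduled__2023-05-09T00:00:00+00:00"}],
    "groups": {
        "id": None,
        "children": [
            {"id": "start", "instances": []},
            {
                "id": "load",
                "instances": [
                    {
                        "run_id": "scheduled__2023-05-09T00:00:00+00:00",
                        "state": "success",
                    }
                ],
            },
        ],
    },
}

print(count_instances(sample["groups"]))
```

A count of zero for a DAG whose runs have finished would suggest the instances are missing from the response itself rather than only from the rendering.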

@bbovenzi bbovenzi added area:webserver Webserver related Issues area:UI Related to UI/UX. For Frontend Developers. labels May 12, 2023
@eladkal eladkal added the affected_version:2.6 Issues Reported for 2.6 label May 12, 2023
@PhoenixChamb
Author

PhoenixChamb commented May 16, 2023

@bbovenzi

  • Only specific ones, not all the DAGs.
  • We have not found any errors indicating a failure in either the webserver or the scheduler.
  • Generally, once we press the Clear button with Past and Downstream selected and then do not confirm, these tasks appear again, without a status. In addition, usually the only tasks we can see are those of the last DagRun.

@eladkal eladkal removed this from the Airflow 2.6.2 milestone Jun 11, 2023
@vDMG
Contributor

vDMG commented Jul 26, 2023

Hello, we have the exact same issue on v2.6.3. It happened about 2 days after upgrading from 2.6.2 to 2.6.3.
For some DAGs, not all, the status of previous tasks is not displayed and it is impossible to clear a DAG run. Only marking them as success and then clearing them allows us to re-run the DAG runs whose tasks disappeared.
BTW, they also disappeared from the task_instance table in the metadata database.

EDIT: Actually, this is not true for all previous tasks, only recent ones. In my case, it affects all DAG runs from scheduled__2023-07-25T00:00:00+00:00 up to the moment everything disappeared from the UI (2023-07-26T14:30 UTC).
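For anyone who wants to check whether they are affected in the same way, the query below is a rough sketch of how to list DAG runs that have no rows left in task_instance. It runs against an in-memory SQLite database with a deliberately simplified two-table schema and made-up rows (dag_a is a hypothetical DAG id); the real metadata tables have more columns, so treat the column list as an assumption and adapt the SELECT to your own database.

```python
import sqlite3

# Simplified stand-in for the metadata database (assumed, reduced schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, state TEXT);
    CREATE TABLE task_instance (dag_id TEXT, run_id TEXT, task_id TEXT, state TEXT);
    INSERT INTO dag_run VALUES
      ('dag_a', 'scheduled__2023-07-24T00:00:00+00:00', 'success'),
      ('dag_a', 'scheduled__2023-07-25T00:00:00+00:00', 'running');
    INSERT INTO task_instance VALUES
      ('dag_a', 'scheduled__2023-07-24T00:00:00+00:00', 'start', 'success');
    """
)

# DAG runs for which the join finds no matching task instance row.
ORPHANED_RUNS = """
SELECT dr.dag_id, dr.run_id
FROM dag_run AS dr
LEFT JOIN task_instance AS ti
  ON ti.dag_id = dr.dag_id AND ti.run_id = dr.run_id
WHERE ti.task_id IS NULL
ORDER BY dr.dag_id, dr.run_id
"""


def runs_without_task_instances(connection: sqlite3.Connection) -> list:
    """Return (dag_id, run_id) pairs that have no task instances."""
    return connection.execute(ORPHANED_RUNS).fetchall()


print(runs_without_task_instances(conn))
```

The join uses both dag_id and run_id because a run_id such as scheduled__2023-07-25T00:00:00+00:00 can appear under more than one DAG.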

image

image

@vDMG
Contributor

vDMG commented Aug 3, 2023

Hi there,

We have been able to reproduce the incident and have determined that it is related to the feature that lets users select several DAG runs and delete them at once. When this feature is used, it deletes the selected DAG runs, but also DAG runs of other DAGs 💀 and their associated task instances.

So we need to investigate action_muldelete on this view and why it can delete other DAGs' runs.
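As an illustration only, here is one way a cross-DAG deletion like this could arise. This is a hypothesis, not the confirmed cause and not Airflow's code: scheduled run IDs are derived from the logical date, so two DAGs on the same schedule share the same run_id, and a delete filtered on run_id alone would hit both. The sketch uses an in-memory SQLite table with an assumed, simplified schema and hypothetical DAG ids dag_a and dag_b.

```python
import sqlite3

RUN_ID = "scheduled__2023-07-25T00:00:00+00:00"


def make_db() -> sqlite3.Connection:
    """Build a toy task_instance table where two DAGs share one run_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (dag_id TEXT, run_id TEXT, task_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?)",
        [("dag_a", RUN_ID, "start"), ("dag_b", RUN_ID, "start")],
    )
    return conn


def remaining_dags(conn: sqlite3.Connection) -> list:
    """Return the DAG ids that still have task instances."""
    rows = conn.execute(
        "SELECT DISTINCT dag_id FROM task_instance ORDER BY dag_id"
    )
    return [row[0] for row in rows]


# Filter on run_id only: intended for dag_a, but dag_b's rows match as well.
broad = make_db()
broad.execute("DELETE FROM task_instance WHERE run_id = ?", (RUN_ID,))
print(remaining_dags(broad))

# Filter on (dag_id, run_id): only dag_a's rows are removed.
scoped = make_db()
scoped.execute(
    "DELETE FROM task_instance WHERE dag_id = ? AND run_id = ?",
    ("dag_a", RUN_ID),
)
print(remaining_dags(scoped))
```

If the real action behaves like the first delete, that would match what we observed: task instances of unrelated DAGs vanish for exactly the run IDs that were selected.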

image

@potiuk
Member

potiuk commented Aug 3, 2023

From the Slack discussion: https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1691057005737649
This looks like a duplicate of #32684, which is scheduled to be fixed in 2.7.0.

Marking it as a duplicate and closing for now. We can always re-open if it turns out to be a red herring.

@potiuk potiuk added the duplicate Issue that is duplicated label Aug 3, 2023
@potiuk potiuk added this to the Airflow 2.7.0 milestone Aug 3, 2023
@potiuk potiuk closed this as completed Aug 3, 2023