DAGs deleted from zips aren't deactivated #30600

@luos-fc

Apache Airflow version

2.5.3

What happened

When a DAG is removed from a zip in the DAGs directory but the zip file itself remains, the DAG is not marked as inactive. It is still visible in the UI, and attempting to open it produces this error in the UI: DAG "mydag" seems to be missing from DagBag.

The DAG is removed from the serialized_dag table (SerializedDagModel), so the scheduler repeatedly logs this error: [2023-04-12T12:43:51.165+0000] {scheduler_job.py:1063} ERROR - DAG 'mydag' not found in serialized_dag table.

I have done some minor investigating and it appears that this piece of code may be the cause.

dag_filelocs provides the path to a specific python file within a zip, so SerializedDagModel.remove_deleted_dags is able to remove the missing DAG.

However, self._file_paths only contains the top-level zip name, so DagModel.deactivate_deleted_dags will only deactivate DAGs where the zip they are contained in is deleted, regardless of whether the DAG is still inside the zip.
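
The mismatch described above can be illustrated with a small, self-contained sketch. It does not import Airflow: `collapse_to_zip` is a hypothetical stand-in for `correct_maybe_zipped`, the two `stale_by_*` helpers are simplified models of the two cleanup calls rather than their real implementations, and the paths and the `bundle.zip` name are made up for illustration.

```python
def collapse_to_zip(fileloc):
    """Hypothetical stand-in: map a path inside a zip to the zip file itself."""
    marker = ".zip/"
    idx = fileloc.find(marker)
    return fileloc[: idx + len(".zip")] if idx != -1 else fileloc


def stale_by_exact_fileloc(known, alive_filelocs):
    """Model of the serialized-DAG cleanup: compare full in-zip file paths."""
    return {dag_id for dag_id, loc in known.items() if loc not in alive_filelocs}


def stale_by_zip_path(known, top_level_paths):
    """Model of the deactivation check: compare only the top-level zip path."""
    return {
        dag_id
        for dag_id, loc in known.items()
        if collapse_to_zip(loc) not in top_level_paths
    }


# DAGs the database already knows about (illustrative paths).
known = {
    "dag_a": "/opt/airflow/dags/bundle.zip/dag_a.py",
    "mydag": "/opt/airflow/dags/bundle.zip/mydag.py",
}

# After the zip is overwritten, only dag_a.py is left inside it.
dag_filelocs = ["/opt/airflow/dags/bundle.zip/dag_a.py"]
file_paths = ["/opt/airflow/dags/bundle.zip"]

removed_from_serialized = stale_by_exact_fileloc(known, dag_filelocs)
deactivated = stale_by_zip_path(known, file_paths)

print("removed from serialized table:", removed_from_serialized)
print("deactivated:", deactivated)
```

Under these assumptions the first check flags `mydag` while the second flags nothing, which matches the symptom: the serialized row disappears but the DAG stays active.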

I can see there are other methods that handle DAG deactivation, and I'm not sure how they all interact, but this one does seem to cause this specific issue.

What you think should happen instead

DAGs that are no longer in the DagBag should be marked as inactive.

How to reproduce

Running airflow locally with docker-compose:

  • Create a zip file containing 2 DAG py files in ./dags
  • Wait for the DAGs to be parsed by the scheduler and appear in the UI
  • Overwrite the existing DAG zip with a new zip containing only 1 of the original DAG py files
  • Wait for the scheduler loop to parse the new zip
  • Attempt to open the removed DAG in the UI; you will see an error
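
The zip-related steps above could be scripted roughly as follows. This is only a sketch: the DAG ids (`dag_a`, `mydag`), the `bundle.zip` name, and the minimal DAG body are assumptions, and the demo writes to a temporary directory rather than to ./dags so that it can be run safely anywhere.

```python
import os
import tempfile
import zipfile

# Minimal DAG source written into each zip member (illustrative only;
# this text is stored in the archive and is not executed by this script).
DAG_TEMPLATE = '''from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(dag_id="{dag_id}", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    EmptyOperator(task_id="noop")
'''


def write_dag_zip(zip_path, dag_ids):
    """Create (or overwrite) a zip holding one <dag_id>.py file per DAG id."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for dag_id in dag_ids:
            archive.writestr(f"{dag_id}.py", DAG_TEMPLATE.format(dag_id=dag_id))


def list_zip_members(zip_path):
    """Return the sorted member names of a zip file."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


with tempfile.TemporaryDirectory() as dags_dir:
    zip_path = os.path.join(dags_dir, "bundle.zip")

    # Step 1: a zip with two DAG files.
    write_dag_zip(zip_path, ["dag_a", "mydag"])
    first_names = list_zip_members(zip_path)

    # Step 3: overwrite it with a zip that keeps only one of them.
    write_dag_zip(zip_path, ["dag_a"])
    second_names = list_zip_members(zip_path)

print(first_names, second_names)
```

For an actual reproduction, the two `write_dag_zip` calls would target a path under ./dags, with a pause between them long enough for the scheduler to parse the first zip.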

Operating System

Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

If I replace the Docker image in the docker-compose file with an image built from this Dockerfile:

FROM apache/airflow:2.5.3
RUN sed -i '772s/self._file_paths/dag_filelocs/' /home/airflow/.local/lib/python3.7/site-packages/airflow/dag_processing/manager.py
RUN sed -i '3351s/correct_maybe_zipped(dag_model.fileloc)/dag_model.fileloc/' /home/airflow/.local/lib/python3.7/site-packages/airflow/models/dag.py

The DAG is then deactivated as expected.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels

area:core, kind:bug, needs-triage
