DAG disappears in airflow 3.0.6 #58404

@narenjngr

Apache Airflow version

Other Airflow 2/3 version (please specify below)

If "Other Airflow 2/3 version" selected, which one?

3.0.6

What happened?

After upgrading to Airflow 3, the system started experiencing random DAG disappearance.
The DAG processor is configured as follows:

  - name: AIRFLOW__DAG_PROCESSOR__BUNDLE_REFRESH_CHECK_INTERVAL
    value: "30"
  - name: AIRFLOW__DAG_PROCESSOR__REFRESH_INTERVAL
    value: "30"
  - name: AIRFLOW__CORE__STORE_SERIALIZED_DAGS
    value: "True"
  - name: AIRFLOW__DAG_PROCESSOR__DAG_FILE_PROCESSOR_TIMEOUT
    value: "600"
  - name: AIRFLOW__LOGGING__LOGGING_LEVEL
    value: "DEBUG"
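
For readers less familiar with the naming, Airflow reads configuration overrides from environment variables of the form AIRFLOW__{SECTION}__{KEY}. Below is a minimal sketch of how the variables above map to config sections and keys. The helper `split_airflow_env` is hypothetical and not part of Airflow; it only illustrates the naming convention and does not check whether a given option is still honored in Airflow 3.

```python
def split_airflow_env(name: str) -> tuple[str, str]:
    """Split an AIRFLOW__SECTION__KEY variable name into (section, key).

    Hypothetical helper for illustration only; not an Airflow API.
    """
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        raise ValueError(f"not an Airflow config variable: {name}")
    # The section ends at the first double underscore after the prefix.
    section, sep, key = name[len(prefix):].partition("__")
    if not sep or not section or not key:
        raise ValueError(f"expected AIRFLOW__SECTION__KEY, got: {name}")
    return section.lower(), key.lower()


# The environment variables from the issue, copied verbatim.
ENV = {
    "AIRFLOW__DAG_PROCESSOR__BUNDLE_REFRESH_CHECK_INTERVAL": "30",
    "AIRFLOW__DAG_PROCESSOR__REFRESH_INTERVAL": "30",
    "AIRFLOW__CORE__STORE_SERIALIZED_DAGS": "True",
    "AIRFLOW__DAG_PROCESSOR__DAG_FILE_PROCESSOR_TIMEOUT": "600",
    "AIRFLOW__LOGGING__LOGGING_LEVEL": "DEBUG",
}

# Map (section, key) -> value, e.g. ("dag_processor", "refresh_interval") -> "30".
settings = {split_airflow_env(name): value for name, value in ENV.items()}
```

Listing the settings this way makes it easier to check each (section, key) pair against the configuration reference for the Airflow version in use.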

The number of files found by airflow dag -folder also decreases over time.
Here is the dagProcessor section of my values.yaml file:

dagProcessor:
  enabled: true
  replicas: 1
  revisionHistoryLimit: ~
  command: ~
  args: ["bash", "-c", "exec airflow dag-processor"]


  strategy:
    rollingUpdate:
      maxSurge: "100%"
      maxUnavailable: "50%"

  livenessProbe:
    initialDelaySeconds: 120
    timeoutSeconds: 60
    failureThreshold: 10
    periodSeconds: 60
    command: ~

  serviceAccount:
    automountServiceAccountToken: true
    create: false
    name: "airflow"

    annotations: {}

  securityContext: {}

  securityContexts:
    pod: {}
    container: {}

  containerLifecycleHooks: {}

  resources:
    limits:
      cpu: 1
      memory: 2Gi
    requests:
      cpu: 500m
      memory: 500Mi

  terminationGracePeriodSeconds: 60

  safeToEvict: true

  extraContainers: []
  extraInitContainers: []
  extraVolumes: []
  extraVolumeMounts: []

  # Select certain nodes for airflow dag processor pods.
  nodeSelector: {}
  affinity: {}
  tolerations: []
  topologySpreadConstraints: []

  priorityClassName: ~

  annotations: {}

  podAnnotations: {}

  logGroomerSidecar:
    enabled: true
    command: ~
    args: ["bash", "/clean-logs"]
    retentionDays: 15
    frequencyMinutes: 15
    resources: {}
    securityContexts:
      container: {}

    env: []

  waitForMigrations:
    enabled: true
    env: []
    securityContexts:
      container: {}

  env: 
    - name: AIRFLOW__DAG_PROCESSOR__BUNDLE_REFRESH_CHECK_INTERVAL
      value: "30"
    - name: AIRFLOW__DAG_PROCESSOR__REFRESH_INTERVAL
      value: "30"
    - name: AIRFLOW__CORE__STORE_SERIALIZED_DAGS
      value: "True"
    - name: AIRFLOW__DAG_PROCESSOR__DAG_FILE_PROCESSOR_TIMEOUT
      value: "600"
    - name: AIRFLOW__LOGGING__LOGGING_LEVEL
      value: "DEBUG"

Points of interest in log:
The DAG processor logs suddenly started showing 0 DAGs instead of 1 for the DAG PROJECT_DATABRICKS_STOP_JOBS, as shown in the attached log file clip. The DAG is scheduled to run every weekday at 7 PM local time.

What you think should happen instead?

DAGs shouldn't disappear unless I delete these DAGs from the mounted volume.

How to reproduce

Use the same DAG processor config and observe for a few days. The issue is intermittent.
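
Since the issue is intermittent, it may help to record whether files actually vanish from the mounted volume or only from the DAG processor's view. The following is a rough sketch using only the Python standard library; the function names, the 30-second default interval (chosen to match the refresh interval above), and any folder path passed in are assumptions, not part of Airflow.

```python
import time
from pathlib import Path


def snapshot(dag_dir: str) -> set[str]:
    """Return the relative paths of all .py files currently under dag_dir."""
    root = Path(dag_dir)
    return {str(p.relative_to(root)) for p in root.rglob("*.py")}


def missing_since(previous: set[str], current: set[str]) -> list[str]:
    """Return files present in the previous snapshot but absent now, sorted."""
    return sorted(previous - current)


def watch(dag_dir: str, interval_s: float = 30.0, rounds: int = 3) -> None:
    """Take a snapshot every interval_s seconds and print files that disappeared."""
    previous = snapshot(dag_dir)
    for _ in range(rounds):
        time.sleep(interval_s)
        current = snapshot(dag_dir)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for path in missing_since(previous, current):
            print(f"{stamp} missing from volume: {path}")
        previous = current
```

Running `watch(...)` inside the DAG processor pod against the mounted DAG folder, with a larger `rounds` value, would show whether the file count on disk drops at the same time the processor reports fewer files. If the on-disk list stays stable, the volume mount is less likely to be the cause.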

Operating System

Linux

Versions of Apache Airflow Providers

eval_type_backport==0.2.2
apache-airflow-providers-databricks==6.4.0
apache-airflow-providers-mongo==4.2.1
apache-airflow-providers-git==0.0.2
apache-airflow-providers-standard==1.2.0
soda-core-spark-df==3.5.5
soda-core-spark[databricks]==3.5.5
soda-core-scientific==3.5.5
pymongo~=4.0.0
typing_extensions==4.13.2
paramiko<4
PyMuPDF~=1.26.5

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels

area:core, kind:bug, needs-triage
