Skip to content

DagFileProcessorManager - ERROR - new DAG tasks are delayed in airflow schedule #21166

@ckvtvm-arch

Description

@ckvtvm-arch

Apache Airflow version

2.2.3 (latest released)

What happened

new DAG tasks are delayed in scheduling. When a new task is added, it takes atleast 10 hours to schedule in airflow.

/usr/local/lib/python3.6/site-packages/airflow/configuration.py:361 DeprecationWarning: The hide_sensitive_variable_fields option in [admin] has been moved to the hide_sensitive_var_conn_fields option in [core] - the old setting has been used, but please update your config.
/usr/local/lib/python3.6/site-packages/airflow/configuration.py:361 DeprecationWarning: The default_queue option in [celery] has been moved to the default_queue option in [operators] - the old setting has been used, but please update your config.
/usr/local/lib/python3.6/site-packages/airflow/configuration.py:361 DeprecationWarning: The default_queue option in [celery] has been moved to the default_queue option in [operators] - the old setting has been used, but please update your config.
/usr/local/lib/python3.6/site-packages/airflow/configuration.py:357 DeprecationWarning: The dag_concurrency option in [core] has been renamed to max_active_tasks_per_dag - the old setting has been used, but please update your config.
/usr/local/lib/python3.6/site-packages/airflow/configuration.py:361 DeprecationWarning: The processor_poll_interval option in [scheduler] has been renamed to scheduler_idle_sleep_time - the old setting has been used, but please update your config.


____ |( )_______ / /________ __
____ /| |_ /__ / / __ / __ _ | /| / /
___ ___ | / _ / _ / _ / / // / |/ |/ /
// |// // // // _
/____/|__/
[�[34m2022-01-26 19:44:29,624�[0m] {�[34mscheduler_job.py:�[0m596} INFO�[0m - Starting the scheduler�[0m
[�[34m2022-01-26 19:44:29,625�[0m] {�[34mscheduler_job.py:�[0m601} INFO�[0m - Processing each file at most -1 times�[0m
[�[34m2022-01-26 19:44:29,743�[0m] {�[34mmanager.py:�[0m163} INFO�[0m - Launched DagFileProcessorManager with pid: 21�[0m
[�[34m2022-01-26 19:44:29,747�[0m] {�[34mscheduler_job.py:�[0m1114} INFO�[0m - Resetting orphaned tasks for active dag runs�[0m
[�[34m2022-01-26 19:44:29,749�[0m] {�[34msettings.py:�[0m52} INFO�[0m - Configured default timezone Timezone('America/New_York')�[0m
/usr/local/lib/python3.6/site-packages/airflow/configuration.py:361 DeprecationWarning: The hide_sensitive_variable_fields option in [admin] has been moved to the hide_sensitive_var_conn_fields option in [core] - the old setting has been used, but please update your config.
[�[34m2022-01-26 19:46:30,004�[0m] {�[34mmanager.py:�[0m334} ERROR�[0m - DagFileProcessorManager (PID=21) last sent a heartbeat 120.26 seconds ago! Restarting it�[0m
[�[34m2022-01-26 19:46:30,005�[0m] {�[34mprocess_utils.py:�[0m124} INFO�[0m - Sending Signals.SIGTERM to group 21. PIDs of all processes in the group: [21]�[0m
[�[34m2022-01-26 19:46:30,005�[0m] {�[34mprocess_utils.py:�[0m75} INFO�[0m - Sending the signal Signals.SIGTERM to group 21�[0m
[�[34m2022-01-26 19:46:30,137�[0m] {�[34mprocess_utils.py:�[0m70} INFO�[0m - Process psutil.Process(pid=21, status='terminated', exitcode=0, started='19:44:29') (21) terminated with exit code 0�[0m
[�[34m2022-01-26 19:46:30,141�[0m] {�[34mmanager.py:�[0m163} INFO�[0m - Launched DagFileProcessorManager with pid: 332�[0m
[�[34m2022-01-26 19:46:30,148�[0m] {�[34msettings.py:�[0m52} INFO�[0m - Configured default timezone Timezone('America/New_York')�[0m
/usr/local/lib/python3.6/site-packages/airflow/configuration.py:361 DeprecationWarning: The hide_sensitive_variable_fields option in [admin] has been moved to the hide_sensitive_var_conn_fields option in [core] - the old setting has been used, but please update your config.
[�[34m2022-01-26 19:48:31,121�[0m] {�[34mmanager.py:�[0m334} ERROR�[0m - DagFileProcessorManager (PID=332) last sent a heartbeat 120.98 seconds ago! Restarting it�[0m
[�[34m2022-01-26 19:48:31,122�[0m] {�[34mprocess_utils.py:�[0m124} INFO�[0m - Sending Signals.SIGTERM to group 332. PIDs of all processes in the group: [332]�[0m
[�[34m2022-01-26 19:48:31,122�[0m] {�[34mprocess_utils.py:�[0m75} INFO�[0m - Sending the signal Signals.SIGTERM to group 332�[0m
[�[34m2022-01-26 19:48:31,294�[0m] {�[34mprocess_utils.py:�[0m70} INFO�[0m - Process psutil.Process(pid=332, status='terminated', exitcode=0, started='19:46:29') (332) terminated with exit code 0�[0m
[�[34m2022-01-26 19:48:31,298�[0m] {�[34mmanager.py:�[0m163} INFO�[0m - Launched DagFileProcessorManager with pid: 479�[0m
[�[34m2022-01-26 19:48:31,305�[0m] {�[34msettings.py:�[0m52} INFO�[0m - Configured default timezone Timezone('America/New_York')�[0m
/usr/local/lib/python3.6/site-packages/airflow/configuration.py:361 DeprecationWarning: The hide_sensitive_variable_fields option in [admin] has been moved to the hide_sensitive_var_conn_fields option in [core] - the old setting has been used, but please update your config.
[�[34m2022-01-26 19:49:29,947�[0m] {�[34mscheduler_job.py:�[0m1114} INFO�[0m - Resetting orphaned tasks for active dag runs�[0m
[�[34m2022-01-26 19:50:32,119�[0m] {�[34mmanager.py:�[0m334} ERROR�[0m - DagFileProcessorManager (PID=479) last sent a heartbeat 120.82 seconds ago! Restarting it�[0m
[�[34m2022-01-26 19:50:32,119�[0m] {�[34mprocess_utils.py:�[0m124} INFO�[0m - Sending Signals.SIGTERM to group 479. PIDs of all processes in the group: [479]�[0m
[�[34m2022-01-26 19:50:32,119�[0m] {�[34mprocess_utils.py:�[0m75} INFO�[0m - Sending the signal Signals.SIGTERM to group 479�[0m
[�[34m2022-01-26 19:50:32,292�[0m] {�[34mprocess_utils.py:�[0m70} INFO�[0m - Process psutil.Process(pid=479, status='terminated', exitcode=0, started='19:48:31') (479) terminated with exit code 0�[0m
[�[34m2022-01-26 19:50:32,296�[0m] {�[34mmanager.py:�[0m163} INFO�[0m - Launched DagFileProcessorManager with pid: 752�[0m
[�[34m2022-01-26 19:50:32,302�[0m] {�[34msettings.py:�[0m52} INFO�[0m - Configured default timezone Timezone('America/New_York')�[0m
/usr/local/lib/python3.6/site-packages/airflow/configuration.py:361 DeprecationWarning: The hide_sensitive_variable_fields option in [admin] has been moved to the hide_sensitive_var_conn_fields option in [core] - the old setting has been used, but please update your config.
[�[34m2022-01-26 19:52:32,501�[0m] {�[34mmanager.py:�[0m334} ERROR�[0m - DagFileProcessorManager (PID=752) last sent a heartbeat 120.21 seconds ago! Restarting it�[0m
[�[34m2022-01-26 19:52:32,501�[0m] {�[34mprocess_utils.py:�[0m124} INFO�[0m - Sending Signals.SIGTERM to group 752. PIDs of all processes in the group: [752]�[0m
[�[34m2022-01-26 19:52:32,501�[0m] {�[34mprocess_utils.py:�[0m75} INFO�[0m - Sending the signal Signals.SIGTERM to group 752�[0m
[�[34m2022-01-26 19:52:32,634�[0m] {�[34mprocess_utils.py:�[0m70} INFO�[0m - Process psutil.Process(pid=752, status='terminated', exitcode=0, started='19:50:32') (752) terminated with exit code 0�[0m
[�[34m2022-01-26 19:52:32,638�[0m] {�[34mmanager.py:�[0m163} INFO�[0m - Launched DagFileProcessorManager with pid: 1027�[0m
[�[34m2022-01-26 19:52:32,644�[0m] {�[34msettings.py:�[0m52} INFO�[0m - Configured default timezone Timezone('America/New_York')�[0m
/usr/local/lib/python3.6/site-packages/airflow/configuration.py:361 DeprecationWarning: The hide_sensitive_variable_fields option in [admin] has been moved to the hide_sensitive_var_conn_fields option in [core] - the old setting has been used, but please update your config.
[�[34m2022-01-26 19:54:30,101�[0m] {�[34mscheduler_job.py:�[0m1114} INFO�[0m - Resetting orphaned tasks for active dag runs�[0m
[�[34m2022-01-26 19:54:33,349�[0m] {�[34mmanager.py:�[0m334} ERROR�[0m - DagFileProcessorManager (PID=1027) last sent a heartbeat 120.71 seconds ago! Restarting it�[0m
[�[34m2022-01-26 19:54:33,349�[0m] {�[34mprocess_utils.py:�[0m124} INFO�[0m - Sending Signals.SIGTERM to group 1027. PIDs of all processes in the group: [1027]�[0m
[�[34m2022-01-26 19:54:33,349�[0m] {�[34mprocess_utils.py:�[0m75} INFO�[0m - Sending the signal Signals.SIGTERM to group 1027�[0m
[�[34m2022-01-26 19:54:33,482�[0m] {�[34mprocess_utils.py:�[0m70} INFO�[0m - Process psutil.Process(pid=1027, status='terminated', exitcode=0, started='19:52:32') (1027) terminated with exit code 0�[0m

What you expected to happen

new DAG tasks is supposed to be visible in airflow UI within few minutes of deployment

How to reproduce

please add new DAG tasks to the existing airflow pods in any infra.

Operating System

Red hat Openshift Linux

Versions of Apache Airflow Providers

2.2.3

Deployment

Other Docker-based deployment

Deployment details

kind: Pod
apiVersion: v1
metadata:
generateName: airflow-schcl-12-
annotations:
k8s.v1.cni.cncf.io/network-status: |-
[{
"name": "",
"interface": "eth0",
"ips": [
"10.130.27.188"
],
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status: |-
[{
"name": "",
"interface": "eth0",
"ips": [
"10.130.27.188"
],
"default": true,
"dns": {}
}]
openshift.io/deployment-config.latest-version: '12'
openshift.io/deployment-config.name: airflow-schcl
openshift.io/deployment.name: airflow-schcl-12
openshift.io/scc: restricted
selfLink: /api/v1/namespaces/gbditenc-prod/pods/airflow-schcl-12-snf9j
resourceVersion: '2923029853'
name: airflow-schcl-12-snf9j
uid: f0d2dd43-9bbe-4d1e-8c42-e52d9df03b3a
creationTimestamp: '2022-01-26T19:44:15Z'
managedFields:
- manager: kube-controller-manager
operation: Update
apiVersion: v1
time: '2022-01-26T19:44:15Z'
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:annotations':
.: {}
'f:openshift.io/deployment-config.latest-version': {}
'f:openshift.io/deployment-config.name': {}
'f:openshift.io/deployment.name': {}
'f:generateName': {}
'f:labels':
.: {}
'f:app': {}
'f:deployment': {}
'f:deploymentconfig': {}
'f:tier': {}
'f:ownerReferences':
.: {}
'k:{"uid":"7e449a8b-6eb8-4e67-8891-c8e245de92ca"}':
.: {}
'f:apiVersion': {}
'f:blockOwnerDeletion': {}
'f:controller': {}
'f:kind': {}
'f:name': {}
'f:uid': {}
'f:spec':
'f:containers':
'k:{"name":"airflow-schcl"}':
'f:image': {}
'f:volumeMounts':
.: {}
'k:{"mountPath":"/usr/local/airflow/mnt"}':
.: {}
'f:mountPath': {}
'f:name': {}
'k:{"mountPath":"/usr/local/airflow/secret"}':
.: {}
'f:mountPath': {}
'f:name': {}
'f:terminationMessagePolicy': {}
.: {}
'f:resources':
.: {}
'f:limits':
.: {}
'f:cpu': {}
'f:memory': {}
'f:requests':
.: {}
'f:cpu': {}
'f:memory': {}
'f:args': {}
'f:env':
'k:{"name":"POSTGRES_HOST"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__SMTP__SMTP_PORT"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__LOGGING__BASE_LOG_FOLDER"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AUTH_TYPE"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"POSTGRES_USER"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"REDIS_PASSWORD"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__SMTP__SMTP_HOST"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"REDIS_PORT"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__SMTP__SMTP_MAIL_FROM"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__CORE__PARALLELISM"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"APP_DL_GROUP"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__CORE__DAG_CONCURRENCY"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"REQUESTS_CA_BUNDLE"}':
.: {}
'f:name': {}
'f:value': {}
.: {}
'k:{"name":"FERNET_KEY"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"REDIS_HOST"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__CORE__DAGS_FOLDER"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"SRC_ACCT"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"POSTGRES_PASSWORD"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"POSTGRES_DB"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"EXECUTOR"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"POSTGRES_PORT"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__SCHEDULER__PARSING_PROCESSES"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"AIRFLOW__CELERY__WORKER_CONCURRENCY"}':
.: {}
'f:name': {}
'f:value': {}
'k:{"name":"LOAD_EX"}':
.: {}
'f:name': {}
'f:value': {}
'f:terminationMessagePath': {}
'f:imagePullPolicy': {}
'f:name': {}
'f:dnsPolicy': {}
'f:enableServiceLinks': {}
'f:restartPolicy': {}
'f:schedulerName': {}
'f:securityContext':
.: {}
'f:supplementalGroups': {}
'f:terminationGracePeriodSeconds': {}
'f:volumes':
.: {}
'k:{"name":"airflow"}':
.: {}
'f:name': {}
'f:persistentVolumeClaim':
.: {}
'f:claimName': {}
'k:{"name":"secret-volume"}':
.: {}
'f:name': {}
'f:secret':
.: {}
'f:defaultMode': {}
'f:secretName': {}
- manager: multus
operation: Update
apiVersion: v1
time: '2022-01-26T19:44:17Z'
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:annotations':
'f:k8s.v1.cni.cncf.io/network-status': {}
'f:k8s.v1.cni.cncf.io/networks-status': {}
- manager: kubelet
operation: Update
apiVersion: v1
time: '2022-01-26T19:44:18Z'
fieldsType: FieldsV1
fieldsV1:
'f:status':
'f:conditions':
'k:{"type":"ContainersReady"}':
.: {}
'f:lastProbeTime': {}
'f:lastTransitionTime': {}
'f:status': {}
'f:type': {}
'k:{"type":"Initialized"}':
.: {}
'f:lastProbeTime': {}
'f:lastTransitionTime': {}
'f:status': {}
'f:type': {}
'k:{"type":"Ready"}':
.: {}
'f:lastProbeTime': {}
'f:lastTransitionTime': {}
'f:status': {}
'f:type': {}
'f:containerStatuses': {}
'f:hostIP': {}
'f:phase': {}
'f:podIP': {}
'f:podIPs':
.: {}
'k:{"ip":"10.130.27.188"}':
.: {}
'f:ip': {}
'f:startTime': {}
namespace: gbditenc-prod
ownerReferences:
- apiVersion: v1
kind: ReplicationController
name: airflow-schcl-12
uid: 7e449a8-6eb8-4e67-8891-c8e245de92ca
controller: true
blockOwnerDeletion: true
labels:
app: airflowcelery
deployment: airflow-schcl-12
deploymentconfig: airflow-schcl
tier: scheduler
spec:
restartPolicy: Always
serviceAccountName: default
imagePullSecrets:
- name: default-dockercfg-pg5dj
- name: quay-gbdit
priority: 0
schedulerName: default-scheduler
enableServiceLinks: true
terminationGracePeriodSeconds: 30
preemptionPolicy: PreemptLowerPriority
nodeName: va10plvocp055.ent-ocp4-prod-ric.antmdc.internal.das
securityContext:
seLinuxOptions:
level: 's0:c35,c30'
supplementalGroups:
- 65534
fsGroup: 1001250000
containers:
- resources:
limits:
cpu: '1'
memory: 2Gi
requests:
cpu: 600m
memory: 900Mi
terminationMessagePath: /dev/termination-log
name: airflow-schcl
env:
- name: AIRFLOW__CORE__DAGS_FOLDER
value: /usr/local/airflow/mnt/dags
- name: AIRFLOW__LOGGING__BASE_LOG_FOLDER
value: /usr/local/airflow/mnt/dags/logs
- name: AIRFLOW__SMTP__SMTP_HOST
value:
- name: AIRFLOW__SMTP__SMTP_PORT
value: '25'
- name: AIRFLOW__SMTP__SMTP_MAIL_FROM
value:-careflow@.com
- name: AUTH_TYPE
value: LDAP
- name: SRC_ACCT
value: SRC_GBDENC_CAREFLOW@us.ad.wellpoint.com
- name: APP_DL_GROUP
value:
- name: FERNET_KEY
value:=
- name: POSTGRES_USER
value: airflow_gbditenc_prod
- name: POSTGRES_PASSWORD
value:
- name: POSTGRES_DB
value: airflow_gbditenc_prod
- name: POSTGRES_HOST
value:
- name: POSTGRES_PORT
value: '15432'
- name: REDIS_HOST
value: redis
- name: REDIS_PORT
value: '6379'
- name: REDIS_PASSWORD
value: airflow
- name: LOAD_EX
value: 'n'
- name: EXECUTOR
value: Celery
- name: REQUESTS_CA_BUNDLE
value: /usr/local/airflow/secret/root.pem
- name: AIRFLOW__CORE__PARALLELISM
value: '100'
- name: AIRFLOW__CORE__DAG_CONCURRENCY
value: '50'
- name: AIRFLOW__CELERY__WORKER_CONCURRENCY
value: '50'
- name: AIRFLOW__SCHEDULER__PARSING_PROCESSES
value: '4'
- name: AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG
value: '50'
- name: AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT
value: '120'
- name: AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD
value: '240'
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
runAsUser: 1001250000
imagePullPolicy: Always
volumeMounts:
- name: airflow
mountPath: /usr/local/airflow/mnt
- name: secret-volume
mountPath: /usr/local/airflow/secret
- name: default-token-2q6wn
readOnly: true
mountPath: /var/run/secrets/kubernetes.io/serviceaccount
terminationMessagePolicy: File
image: >-
quay.apps.ent-ocp4-prod-ric.antmdc.internal.das/gbditenc-prod/-careflow@sha256:d58fb091e5045dbab325dc685935a453a2f2bba5f605a1b36e6d617ede3cbd3b
args:
- scheduler
serviceAccount: default
volumes:
- name: airflow
persistentVolumeClaim:
claimName: gbditenc-prod-pvc1
- name: secret-volume
secret:
secretName: airflow-secrets
defaultMode: 420
- name: default-token-2q6wn
secret:
secretName: default-token-2q6wn
defaultMode: 420
dnsPolicy: ClusterFirst
tolerations:
- key: node.kubernetes.io/not-ready
operator: Exists
effect: NoExecute
tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
operator: Exists
effect: NoExecute
tolerationSeconds: 300
- key: node.kubernetes.io/memory-pressure
operator: Exists
effect: NoSchedule
status:
phase: Running
conditions:
- type: Initialized
status: 'True'
lastProbeTime: null
lastTransitionTime: '2022-01-26T19:44:15Z'
- type: Ready
status: 'True'
lastProbeTime: null
lastTransitionTime: '2022-01-26T19:44:18Z'
- type: ContainersReady
status: 'True'
lastProbeTime: null
lastTransitionTime: '2022-01-26T19:44:18Z'
- type: PodScheduled
status: 'True'
lastProbeTime: null
lastTransitionTime: '2022-01-26T19:44:15Z'
hostIP: 30.230.45.57
podIP: 10.130.27.188
podIPs:
- ip: 10.130.27.188
startTime: '2022-01-26T19:44:15Z'
containerStatuses:
- restartCount: 0
started: true
ready: true
name: airflow-schcl
state:
running:
startedAt: '2022-01-26T19:44:18Z'
imageID: >-
quay.apps.ent-ocp4-prod-ric.antmdc.internal.das/gbditenc-prod/-careflow@sha256:d58fb091e5045dbab325dc685935a453a2f2bba5f605a1b36e6d617ede3cbd3b
image: >-
quay.apps.ent-ocp4-prod-ric.antmdc.internal.das/gbditenc-prod/-careflow@sha256:d58fb091e5045dbab325dc685935a453a2f2bba5f605a1b36e6d617ede3cbd3b
lastState: {}
containerID: 'cri-o://32e91f035274091eb443a80ca31944832eb7232a945b24181d89aea95da6d223'
qosClass: Burstable

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions