Description
Apache Airflow version
2.2.5 (latest released)
What happened
I am running Airflow via the suggested docker-compose.yml, which yields a scheduler that re-parses our DAG files every second.
We extend the official image with our own Dockerfile based on `apache/airflow:2.2.5` and run it via docker-compose. docker-compose.yml brings up Airflow properly and it is functional.
Here are some example lines from our scheduler log (note that I configured our DAG factory to create no DAGs to make the output easier to read; when it is set up properly, our DAGs are populated without error):
[2022-04-25 18:04:53,406] {processor.py:642} INFO - Processing file /opt/airflow/dags/databridge_v2_dag_factory.py for tasks to queue
[2022-04-25 18:04:53,406] {logging_mixin.py:109} INFO - [2022-04-25 18:04:53,406] {dagbag.py:500} INFO - Filling up the DagBag from /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:53,866] {processor.py:654} WARNING - No viable dags retrieved from /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:53,897] {processor.py:171} INFO - Processing /opt/airflow/dags/databridge_v2_dag_factory.py took 0.495 seconds
[2022-04-25 18:04:53,922] {processor.py:163} INFO - Started process (PID=217) to work on /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:53,923] {processor.py:642} INFO - Processing file /opt/airflow/dags/databridge_v2_dag_factory.py for tasks to queue
[2022-04-25 18:04:53,924] {logging_mixin.py:109} INFO - [2022-04-25 18:04:53,924] {dagbag.py:500} INFO - Filling up the DagBag from /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:54,800] {processor.py:654} WARNING - No viable dags retrieved from /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:54,896] {processor.py:171} INFO - Processing /opt/airflow/dags/databridge_v2_dag_factory.py took 0.977 seconds
[2022-04-25 18:04:54,926] {processor.py:163} INFO - Started process (PID=221) to work on /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:54,927] {processor.py:642} INFO - Processing file /opt/airflow/dags/databridge_v2_dag_factory.py for tasks to queue
[2022-04-25 18:04:54,927] {logging_mixin.py:109} INFO - [2022-04-25 18:04:54,927] {dagbag.py:500} INFO - Filling up the DagBag from /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:55,404] {processor.py:654} WARNING - No viable dags retrieved from /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:55,440] {processor.py:171} INFO - Processing /opt/airflow/dags/databridge_v2_dag_factory.py took 0.517 seconds
[2022-04-25 18:04:55,460] {processor.py:163} INFO - Started process (PID=225) to work on /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:55,461] {processor.py:642} INFO - Processing file /opt/airflow/dags/databridge_v2_dag_factory.py for tasks to queue
[2022-04-25 18:04:55,461] {logging_mixin.py:109} INFO - [2022-04-25 18:04:55,461] {dagbag.py:500} INFO - Filling up the DagBag from /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:56,001] {processor.py:654} WARNING - No viable dags retrieved from /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:56,226] {processor.py:171} INFO - Processing /opt/airflow/dags/databridge_v2_dag_factory.py took 0.770 seconds
[2022-04-25 18:04:56,294] {processor.py:163} INFO - Started process (PID=229) to work on /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:56,296] {processor.py:642} INFO - Processing file /opt/airflow/dags/databridge_v2_dag_factory.py for tasks to queue
[2022-04-25 18:04:56,296] {logging_mixin.py:109} INFO - [2022-04-25 18:04:56,296] {dagbag.py:500} INFO - Filling up the DagBag from /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:57,075] {processor.py:654} WARNING - No viable dags retrieved from /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:57,103] {processor.py:171} INFO - Processing /opt/airflow/dags/databridge_v2_dag_factory.py took 0.813 seconds
[2022-04-25 18:04:57,124] {processor.py:163} INFO - Started process (PID=233) to work on /opt/airflow/dags/databridge_v2_dag_factory.py
[2022-04-25 18:04:57,125] {processor.py:642} INFO - Processing file /opt/airflow/dags/databridge_v2_dag_factory.py for tasks to queue
[2022-04-25 18:04:57,126] {logging_mixin.py:109} INFO - [2022-04-25 18:04:57,126] {dagbag.py:500} INFO - Filling up the DagBag from /opt/airflow/dags/databridge_v2_dag_factory.py
Here is the [scheduler] section of our config; I tried raising several of these values in an effort to elicit different behavior:
[scheduler]
job_heartbeat_sec = 5
clean_tis_without_dagrun_interval = 15.0
scheduler_heartbeat_sec = 60
num_runs = 5
scheduler_idle_sleep_time = 20
min_file_process_interval = 60
dag_dir_list_interval = 150
print_stats_interval = 30
pool_metrics_interval = 5.0
scheduler_health_check_threshold = 30
orphaned_tasks_check_interval = 300.0
child_process_log_directory = /opt/airflow/logs/scheduler
scheduler_zombie_task_threshold = 300
catchup_by_default = True
max_tis_per_query = 512
use_row_level_locking = True
parsing_processes = 1
use_job_schedule = True
allow_trigger_in_future = False
I can confirm these values are set: they appear in the Configuration section of the UI, and running Python inside the container shows they are imported correctly, the same way the Airflow code reads them:
docker exec -it databridge-airflow-v2_airflow-webserver_1 /bin/bash
airflow@22cd52027e82:/opt/airflow$ python
>>> from airflow.configuration import conf
>>> conf.getint('scheduler', 'SCHEDULER_HEARTBEAT_SEC')
60
>>> conf.getint('scheduler','min_file_process_interval')
60
>>> conf.getint('scheduler','scheduler_idle_sleep_time')
20
What you think should happen instead
The Airflow scheduler should honor 'min_file_process_interval' and re-parse each DAG file at most once every 60 seconds, instead of every second.
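For illustration, here is a minimal sketch of the gating behavior I expected — this is my own simplification, not Airflow's actual implementation: a file should only become eligible for re-parsing once min_file_process_interval seconds have elapsed since its last parse finished.

```python
from datetime import datetime, timedelta

def should_reparse(last_finish_time, now, min_file_process_interval=60):
    """Return True if the DAG file is due for another parse pass."""
    if last_finish_time is None:
        # Never parsed before: parse it now
        return True
    return (now - last_finish_time) >= timedelta(seconds=min_file_process_interval)

last = datetime(2022, 4, 25, 18, 4, 53)
print(should_reparse(last, last + timedelta(seconds=5)))   # too soon -> False
print(should_reparse(last, last + timedelta(seconds=61)))  # interval elapsed -> True
```

With the behavior I am actually observing, the scheduler acts as if this check always returns True.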
How to reproduce
Spin up Airflow via the suggested docker-compose.yml, set min_file_process_interval = 60 in the [scheduler] config section, and watch the scheduler logs.
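The steps are roughly as follows (the docker-compose.yaml URL follows the pattern in the official Docker quick-start docs; adjust it for your version):

```shell
# Fetch the reference compose file for 2.2.5
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.5/docker-compose.yaml'

# Initialize the metadata database, then start the stack
docker-compose up airflow-init
docker-compose up -d

# Tail the scheduler log; with min_file_process_interval = 60 I would expect
# each DAG file to be parsed at most once per minute, not once per second
docker-compose logs -f airflow-scheduler
```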
Operating System
Debian GNU/Linux 10 (buster)
Versions of Apache Airflow Providers
airflow@22cd52027e82:/opt/airflow$ pip freeze | grep apache-airflow-providers
apache-airflow-providers-amazon==3.2.0
apache-airflow-providers-celery==2.1.3
apache-airflow-providers-cncf-kubernetes==3.0.0
apache-airflow-providers-docker==2.5.2
apache-airflow-providers-elasticsearch==2.2.0
apache-airflow-providers-ftp==2.1.2
apache-airflow-providers-google==6.7.0
apache-airflow-providers-grpc==2.0.4
apache-airflow-providers-hashicorp==2.1.4
apache-airflow-providers-http==2.1.2
apache-airflow-providers-imap==2.2.3
apache-airflow-providers-microsoft-azure==3.7.2
apache-airflow-providers-mysql==2.2.3
apache-airflow-providers-odbc==2.0.4
apache-airflow-providers-oracle==2.2.3
apache-airflow-providers-postgres==4.1.0
apache-airflow-providers-redis==2.0.4
apache-airflow-providers-sendgrid==2.0.4
apache-airflow-providers-sftp==2.5.2
apache-airflow-providers-slack==4.2.3
apache-airflow-providers-sqlite==2.1.3
apache-airflow-providers-ssh==2.4.3
Deployment
Docker-Compose
Deployment details
docker-compose version 1.29.2, build 5becea4c
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct