Scheduler throwing error when schedule is set to @hourly and catchup_by_default is set to False #37869
Apache Airflow version
2.8.2

If "Other Airflow 2 version" selected, which one?
No response

What happened?
I am using a very simple DAG with its schedule set to @hourly. The scheduler is not scheduling it and is logging an error.
Error in log:
If I set catchup_by_default to True, or change the schedule to an interval longer than an hour, it works fine.

What you think should happen instead?
No response

How to reproduce
N/A

Operating System
Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers
No response

Deployment
Other Docker-based deployment

Deployment details
Using a custom Docker image. Airflow configuration:
Anything else?
No response

Are you willing to submit PR?
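For context, here is a minimal stdlib sketch of what catchup=False implies for an @hourly schedule (my illustration, not the reporter's actual DAG): with catchup disabled, only the most recently completed data interval should be scheduled, which matches the 11:00–12:00 run visible in the log below.

```python
from datetime import datetime, timedelta, timezone

def latest_complete_hourly_interval(now: datetime):
    # With catchup disabled, the scheduler creates a run only for the most
    # recently completed data interval instead of backfilling every missed one.
    interval_end = now.replace(minute=0, second=0, microsecond=0)
    interval_start = interval_end - timedelta(hours=1)
    return interval_start, interval_end

# The timestamp matches the scheduler log in this thread (12:17 UTC).
now = datetime(2024, 3, 2, 12, 17, tzinfo=timezone.utc)
start, end = latest_complete_hourly_interval(now)
print(start.isoformat(), end.isoformat())
# → 2024-03-02T11:00:00+00:00 2024-03-02T12:00:00+00:00
```

This is only a model of the expected behavior; Airflow's actual timetable logic lives in its Timetable classes.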
Replies: 2 comments 2 replies
Unable to reproduce:

```
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2024-03-02T12:16:34.195+0000] {task_context_logger.py:63} INFO - Task context logging is enabled
[2024-03-02T12:16:34.196+0000] {executor_loader.py:115} INFO - Loaded executor: CeleryExecutor
[2024-03-02T12:16:34.367+0000] {scheduler_job_runner.py:808} INFO - Starting the scheduler
[2024-03-02T12:16:34.368+0000] {scheduler_job_runner.py:815} INFO - Processing each file at most -1 times
[2024-03-02T12:16:34.372+0000] {manager.py:169} INFO - Launched DagFileProcessorManager with pid: 32
[2024-03-02T12:16:34.373+0000] {scheduler_job_runner.py:1608} INFO - Adopting or resetting orphaned tasks for active dag runs
[2024-03-02T12:16:34.375+0000] {settings.py:60} INFO - Configured default timezone UTC
127.0.0.1 - - [02/Mar/2024 12:16:56] "GET /health HTTP/1.1" 200 -
[2024-03-02T12:17:22.183+0000] {dag.py:3834} INFO - Setting next_dagrun for my_dag_name to 2024-03-02 12:00:00+00:00, run_after=2024-03-02 13:00:00+00:00
[2024-03-02T12:17:23.260+0000] {dagrun.py:795} INFO - Marking run <DagRun my_dag_name @ 2024-03-02 11:00:00+00:00: scheduled__2024-03-02T11:00:00+00:00, state:running, queued_at: 2024-03-02 12:17:22.169354+00:00. externally triggered: False> successful
[2024-03-02T12:17:23.261+0000] {dagrun.py:846} INFO - DagRun Finished: dag_id=my_dag_name, execution_date=2024-03-02 11:00:00+00:00, run_id=scheduled__2024-03-02T11:00:00+00:00, run_start_date=2024-03-02 12:17:22.188603+00:00, run_end_date=2024-03-02 12:17:23.261163+00:00, run_duration=1.07256, state=success, external_trigger=False, run_type=scheduled, data_interval_start=2024-03-02 11:00:00+00:00, data_interval_end=2024-03-02 12:00:00+00:00, dag_hash=44b4fe2798b492ea8d535bd13c7385cf
[2024-03-02T12:17:23.266+0000] {dag.py:3834} INFO - Setting next_dagrun for my_dag_name to 2024-03-02 12:00:00+00:00, run_after=2024-03-02 13:00:00+00:00
127.0.0.1 - - [02/Mar/2024 12:17:26] "GET /health HTTP/1.1" 200 -
127.0.0.1 - - [02/Mar/2024 12:17:56] "GET /health HTTP/1.1" 200 -
```

Maybe you missed any other details?
Hi @Taragolis, I debugged this further. catchup_by_default doesn't seem to be the only factor: the default timezone combined with catchup_by_default seems to be the issue here. I am using the CET timezone, and with that set I was able to reproduce this. Here are a Dockerfile and docker-compose.yml to reproduce the issue.

Dockerfile:
docker-compose.yml
Well, that is more like troubleshooting rather than an actual problem in the Airflow codebase or its dependencies.

I've added just these variables when starting the official Apache Airflow image, and it works without any issues.

What is wrong in your case is hard to guess, but my assumption is the TZ environment variable. You could try to replace
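The timezone angle can be illustrated with a short stdlib sketch (my assumption about the mechanism, not a confirmed diagnosis; Europe/Paris stands in for CET): if the container's TZ is CET while Airflow's core.default_timezone stays UTC, the same instant reads as different hours, which can shift which hourly data interval appears complete.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# The same instant as seen in UTC vs. CET (Europe/Paris, UTC+1 in winter).
utc_now = datetime(2024, 3, 2, 12, 17, tzinfo=ZoneInfo("UTC"))
cet_now = utc_now.astimezone(ZoneInfo("Europe/Paris"))

# An hourly interval that looks complete in one zone may not in the other:
# truncating to the hour gives 12:00 in UTC but 13:00 in CET.
print(utc_now.hour, cet_now.hour)
# → 12 13
```

If the two clocks disagree like this, aligning TZ with Airflow's configured default timezone (or removing TZ entirely) is worth trying.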