Scheduler crashlooping when dag with task_concurrency is deleted #20099
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
How did you solve this? I am having the same issue, but I cannot trace which DAG was deleted.
We did know which dag was deleted. Adding the dag back, disabling it, and then deleting it again did the trick for us.
kaxil pushed a commit that referenced this issue on Jan 13, 2022:
When executing task instances, we do not check whether the dag is missing from the dagbag. This PR fixes that by ignoring task instances whose dag cannot be found in the serialized dag table. Closes: #20099
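The guard described in that commit message can be sketched as a None-check before the task lookup. This is a hypothetical simplification, not the actual PR code: the function name, the dict-based task instances, and the `serialized_dags` mapping are stand-ins for the scheduler internals.

```python
def filter_executable_task_instances(task_instances, serialized_dags):
    """Return only task instances whose DAG still exists in the
    serialized-DAG table; skip (rather than crash on) the rest.

    task_instances: list of {"dag_id": ..., "task_id": ...} dicts (stand-in).
    serialized_dags: dag_id -> set of task_ids (stand-in for the DB table).
    """
    queued = []
    for ti in task_instances:
        serialized_dag = serialized_dags.get(ti["dag_id"])
        if serialized_dag is None:
            # The DAG was deleted after its task instances were queued:
            # log and skip instead of calling a method on None.
            print(f"DAG {ti['dag_id']!r} not found in serialized_dag table, skipping")
            continue
        if ti["task_id"] in serialized_dag:
            queued.append(ti)
    return queued
```

With this filter in place, task instances belonging to a deleted DAG are dropped from scheduling instead of taking the whole scheduler loop down.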
Apache Airflow version
2.2.2 (latest released)
What happened
After deleting the dag, the scheduler starts crashlooping and cannot recover, so an issue with a single dag brings the whole environment down.
The stacktrace is as follows:
airflow-scheduler [2021-12-07 09:30:07,483] {kubernetes_executor.py:791} INFO - Shutting down Kubernetes executor
airflow-scheduler [2021-12-07 09:30:08,509] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 1472
airflow-scheduler [2021-12-07 09:30:08,681] {process_utils.py:66} INFO - Process psutil.Process(pid=1472, status='terminated', exitcode=0, started='09:28:37') (1472) terminated with exit
airflow-scheduler [2021-12-07 09:30:08,681] {scheduler_job.py:655} INFO - Exited execute loop
airflow-scheduler Traceback (most recent call last):
airflow-scheduler File "/home/airflow/.local/bin/airflow", line 8, in
airflow-scheduler sys.exit(main())
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/main.py", line 48, in main
airflow-scheduler args.func(args)
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 48, in command
airflow-scheduler return func(*args, **kwargs)
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/cli.py", line 92, in wrapper
airflow-scheduler return f(*args, **kwargs)
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
airflow-scheduler _run_scheduler_job(args=args)
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
airflow-scheduler job.run()
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/base_job.py", line 245, in run
airflow-scheduler self._execute()
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
airflow-scheduler self._run_scheduler_loop()
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
airflow-scheduler num_queued_tis = self._do_scheduling(session)
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 820, in _do_scheduling
airflow-scheduler num_queued_tis = self._critical_section_execute_task_instances(session=session)
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 483, in _critical_section_execute_task_instances
airflow-scheduler queued_tis = self._executable_task_instances_to_queued(max_tis, session=session)
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/session.py", line 67, in wrapper
airflow-scheduler return func(*args, **kwargs)
airflow-scheduler File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 366, in _executable_task_instances_to_queued
airflow-scheduler if serialized_dag.has_task(task_instance.task_id):
airflow-scheduler AttributeError: 'NoneType' object has no attribute 'has_task'
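The traceback shows that the serialized-DAG lookup returns None once the DAG's row is gone, and `has_task()` is then called on it unguarded. A standalone sketch of that failure mode, using a hypothetical `SerializedDag` class and a plain dict as stand-ins for the Airflow objects:

```python
class SerializedDag:
    """Stand-in for Airflow's serialized DAG object (hypothetical)."""

    def __init__(self, task_ids):
        self.task_ids = set(task_ids)

    def has_task(self, task_id):
        return task_id in self.task_ids


# Cache keyed by dag_id; deleting the DAG file removes its entry.
serialized_dags = {"testnielsdev": SerializedDag(["sample"])}

del serialized_dags["testnielsdev"]  # the DAG file was removed

serialized_dag = serialized_dags.get("testnielsdev")  # -> None
try:
    serialized_dag.has_task("sample")
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'has_task'
```

The unguarded call is exactly the pattern at `scheduler_job.py` line 366 in the traceback above.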
What you expected to happen
I expect the scheduler not to crash when a dag gets deleted. The biggest issue, however, is that the whole environment goes down. It would be acceptable for the scheduler to have issues with that dag (it is deleted, after all), but it should not affect all other dags in the environment.
How to reproduce
```python
from airflow import DAG
from datafy.operators import DatafyContainerOperatorV2
from datetime import datetime, timedelta

default_args = {
    "owner": "Datafy",
    "depends_on_past": False,
    "start_date": datetime(year=2021, month=12, day=1),
    "task_concurrency": 4,
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    "testnielsdev",
    default_args=default_args,
    max_active_runs=default_args["task_concurrency"] + 1,
    schedule_interval="0 1 * * *",
)

DatafyContainerOperatorV2(
    dag=dag,
    task_id="sample",
    cmds=["python"],
    arguments=["-m", "testnielsdev.sample", "--date", "{{ ds }}", "--env", "{{ macros.datafy.env() }}"],
    instance_type="mx_small",
    instance_life_cycle="spot",
)
```
When looking at the airflow code, the most important setting apart from the defaults is specifying task_concurrency.
1. I create a dag with the code above.
2. I enable the dag.
3. I delete it. When the file gets removed, the scheduler starts crashlooping.
Operating System
We use the default airflow docker image
Versions of Apache Airflow Providers
Not relevant
Deployment
Other Docker-based deployment
Deployment details
Not relevant
Anything else
It occurred at one of our customers, and I was quickly able to reproduce the issue.
Are you willing to submit PR?
Code of Conduct