
Scheduler dies if executor_config isn't passed a dict when using K8s executor #14182

Closed
fjmacagno opened this issue Feb 10, 2021 · 4 comments · Fixed by #14323
Labels: kind:bug, provider:cncf-kubernetes

Comments

@fjmacagno
Contributor

Apache Airflow version: 2.0.1

Kubernetes version (if you are using kubernetes) (use kubectl version): 1.15

Environment:

  • Cloud provider or hardware configuration: k8s on bare metal
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools: pip3
  • Others:

What happened:
Scheduler dies with

[2021-02-10 21:09:27,469] {scheduler_job.py:1298} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1280, in _execute
    self._run_scheduler_loop()
  File "/usr/local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1384, in _run_scheduler_loop
    self.executor.heartbeat()
  File "/usr/local/lib/python3.8/site-packages/airflow/executors/base_executor.py", line 158, in heartbeat
    self.trigger_tasks(open_slots)
  File "/usr/local/lib/python3.8/site-packages/airflow/executors/base_executor.py", line 188, in trigger_tasks
    self.execute_async(key=key, command=command, queue=None, executor_config=ti.executor_config)
  File "/usr/local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 493, in execute_async
    kube_executor_config = PodGenerator.from_obj(executor_config)
  File "/usr/local/lib/python3.8/site-packages/airflow/kubernetes/pod_generator.py", line 175, in from_obj
    k8s_legacy_object = obj.get("KubernetesExecutor", None)
AttributeError: 'V1Pod' object has no attribute 'get'
[2021-02-10 21:09:28,475] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 60
[2021-02-10 21:09:29,222] {process_utils.py:66} INFO - Process psutil.Process(pid=66, status='terminated', started='21:09:27') (66) terminated with exit code None
[2021-02-10 21:09:29,697] {process_utils.py:206} INFO - Waiting up to 5 seconds for processes to exit...
[2021-02-10 21:09:29,716] {process_utils.py:66} INFO - Process psutil.Process(pid=75, status='terminated', started='21:09:28') (75) terminated with exit code None
[2021-02-10 21:09:29,717] {process_utils.py:66} INFO - Process psutil.Process(pid=60, status='terminated', exitcode=0, started='21:09:27') (60) terminated with exit code 0
[2021-02-10 21:09:29,717] {scheduler_job.py:1301} INFO - Exited execute loop

What you expected to happen:
DAG loading fails, producing an error for just that DAG, instead of crashing the scheduler.

How to reproduce it:
Create a task like

    from airflow.operators.dummy import DummyOperator
    from kubernetes.client import models as k8s

    test = DummyOperator(
        task_id="new-pod-spec",
        executor_config=k8s.V1Pod(  # a bare V1Pod instead of a dict -- triggers the crash
            spec=k8s.V1PodSpec(
                containers=[
                    k8s.V1Container(
                        name="base",
                        image="myimage",
                        image_pull_policy="Always",
                    )
                ]
            )
        ),
    )

or

    test = DummyOperator(
        task_id="new-pod-spec",
        executor_config={
            "KubernetesExecutor": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",
                            image="myimage",
                            image_pull_policy="Always",
                        )
                    ]
                )
            )
        },
    )

Essentially, any value where the executor expects a dict but gets something else will trigger it; then run the scheduler using the Kubernetes executor.
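For contrast, a sketch of a config shape the executor does accept, assuming Airflow 2.0's "pod_override" key and reusing the imports from the snippets above (the image name is a placeholder):

    # Known-good shape on Airflow 2.0.x: executor_config is a dict, and the
    # V1Pod goes under the "pod_override" key.
    test_ok = DummyOperator(
        task_id="pod-override-spec",
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",
                            image="myimage",  # placeholder image
                            image_pull_policy="Always",
                        )
                    ]
                )
            )
        },
    )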

@fjmacagno added the kind:bug label Feb 10, 2021
@kaxil added the provider:cncf-kubernetes label Feb 10, 2021
@dimberman
Contributor

Hi @fjmacagno, so if I'm understanding the issue: rather than killing the entire scheduler when this happens, a better result would be for this to return a task failure with the error upon run?

@kaxil @jhtimmins @ephraimbuddy this would be a good starter k8sexecutor bug if you're interested. Glad to help you find the solution.

@fjmacagno
Contributor Author

Yeah, in general I want to avoid letting my users break Airflow for everyone, and since this is a misconfiguration of a DAG, I feel it should not load in the first place.

Maybe executors need a verifyExecutorConfig method? Something like the sketch below.
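A rough, purely hypothetical sketch of that idea — neither the method name nor the call site exists in Airflow; it only illustrates rejecting a bad executor_config at DAG-parse time instead of in the scheduler loop:

    # Hypothetical API sketch -- not part of Airflow.
    from airflow.exceptions import AirflowException
    from airflow.executors.base_executor import BaseExecutor

    class KubernetesExecutor(BaseExecutor):
        def verify_executor_config(self, executor_config):
            """Reject configs this executor cannot use, before the DAG is accepted."""
            # PodGenerator.from_obj() calls .get() on the config, so anything
            # that is not a dict (e.g. a bare V1Pod) would crash the scheduler.
            if executor_config is not None and not isinstance(executor_config, dict):
                raise AirflowException(
                    f"executor_config must be a dict, got {type(executor_config).__name__}"
                )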

@dimberman
Contributor

@fjmacagno that could work, or at a minimum we could wrap it in a try/except block.

This should be a very easy fix, so you or anyone mentioned should be able to take it on, and I'd be glad to help/review (I'm trying to spread k8sexecutor knowledge so we can start answering these tickets more quickly).
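A minimal sketch of the try/except variant, placed around the from_obj() call shown in the traceback above (illustrative only; not the actual change that went into #14323):

    # Illustrative sketch of KubernetesExecutor.execute_async; not the real patch.
    def execute_async(self, key, command, queue=None, executor_config=None):
        try:
            kube_executor_config = PodGenerator.from_obj(executor_config)
        except AttributeError:
            # A malformed executor_config (e.g. a bare V1Pod) should fail only
            # this task instead of killing the scheduler's heartbeat loop.
            self.log.error("Invalid executor_config for %s: %r", key, executor_config)
            self.fail(key)
            return
        # ... continue building and submitting the pod as before ...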

@kaxil
Member

kaxil commented Feb 19, 2021

#14323 should fix it

ashb pushed a commit that referenced this issue Mar 19, 2021
kaxil added a commit to astronomer/airflow that referenced this issue Apr 1, 2021