Timeouts in Airflow scheduler logs on AKS #10860

@KIRY4

Apache Airflow version:
1.10.12

Kubernetes version (if you are using kubernetes) (use kubectl version):
AKS 1.17.9

Environment:

  • Cloud provider or hardware configuration:
    Azure AKS
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
    I'm using AKS+KubernetesExecutor+KubernetesPodOperator+Helm stable Airflow chart

What happened:

My log is full of the following errors:
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/executors/kubernetes_executor.py", line 287, in run
self.worker_uuid, self.kube_config)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/executors/kubernetes_executor.py", line 323, in _run
for event in list_worker_pods():
File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 144, in stream
for line in iter_resp_lines(resp):
File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 46, in iter_resp_lines
for seg in resp.read_chunked(decode_content=False):
File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 792, in read_chunked
self._original_response.close()
File "/usr/local/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 441, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.158.0.1', port=443): Read timed out.
[2020-09-10 13:39:23,281] {kubernetes_executor.py:302} INFO - Event: and now my watch begins starting at resource_version: 0
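
The traceback above shows the pod watch generator raising `ReadTimeoutError` when the connection to the API server goes quiet, after which the executor starts a new watch. As an illustration only, here is a minimal sketch of a watch loop that reopens the stream after a timeout and logs a single warning instead of a full traceback. This is not Airflow's implementation: `watch_with_restart`, `open_stream`, `max_restarts`, and `backoff_seconds` are hypothetical names introduced for this example.

```python
import logging
import time
from typing import Any, Callable, Iterable, Iterator, Tuple, Type

log = logging.getLogger(__name__)


def watch_with_restart(
    open_stream: Callable[[], Iterable[Any]],
    timeout_errors: Tuple[Type[BaseException], ...],
    max_restarts: int = 3,
    backoff_seconds: float = 1.0,
) -> Iterator[Any]:
    """Yield events from open_stream(), reopening it after a timeout.

    open_stream is a hypothetical factory that returns a fresh event
    iterable on every call. max_restarts bounds the number of
    consecutive failed attempts before the error is re-raised.
    """
    restarts = 0
    while True:
        try:
            for event in open_stream():
                restarts = 0  # an event arrived, so the connection is healthy
                yield event
            return  # the stream ended normally
        except timeout_errors as exc:
            restarts += 1
            if restarts > max_restarts:
                raise
            # One warning line instead of a chained traceback.
            log.warning("Watch timed out (%s), restart %d of %d",
                        exc, restarts, max_restarts)
            time.sleep(backoff_seconds)
```

With the real Kubernetes Python client, `open_stream` could plausibly wrap `watch.Watch().stream(v1.list_namespaced_pod, namespace, timeout_seconds=...)`, and `timeout_errors` would be `(urllib3.exceptions.ReadTimeoutError,)`, the exception seen in the traceback. Note that a reopened watch may replay events unless the caller tracks the last seen resource version, which this sketch does not do.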

What you expected to happen:

A log that is free of these errors. I suspect they may be causing a heartbeat problem with the scheduler: usually after 5-10 DAG executions I get this error in the UI: "The scheduler does not appear to be running. Last heartbeat was received X minutes ago. The DAGs list may not update, and new tasks will not be scheduled." After that I can't schedule anything. I'm not sure that the errors in the logs and the heartbeat problem are related, but they could be.

How to reproduce it:
Schedule a fairly long-running DAG on the following setup: AKS + KubernetesExecutor + KubernetesPodOperator + Helm stable Airflow chart.


Anything else we need to know:

A helpful person on the Airflow Slack told me that he knows this issue and has a solution: https://github.com/maganaluis/k8s-api-python, in the timeout-fix branch.

How often does this problem occur? Once? Every time etc?
Every time. This error floods the log constantly.
