Timeouts in Airflow scheduler logs on AKS

**Apache Airflow version**:
1.10.12

**Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
AKS 1.17.9

**Environment**:

- **Cloud provider or hardware configuration**:
Azure AKS
- **OS** (e.g. from /etc/os-release):
- **Kernel** (e.g. `uname -a`):
- **Install tools**:
- **Others**:
I'm using AKS + KubernetesExecutor + KubernetesPodOperator + the stable Helm Airflow chart.

**What happened**:

**My log is full of the following errors:**
```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/executors/kubernetes_executor.py", line 287, in run
    self.worker_uuid, self.kube_config)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/executors/kubernetes_executor.py", line 323, in _run
    for event in list_worker_pods():
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 144, in stream
    for line in iter_resp_lines(resp):
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 46, in iter_resp_lines
    for seg in resp.read_chunked(decode_content=False):
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 792, in read_chunked
    self._original_response.close()
  File "/usr/local/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 441, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.158.0.1', port=443): Read timed out.
[2020-09-10 13:39:23,281] {kubernetes_executor.py:302} INFO - Event: and now my watch begins starting at resource_version: 0
```
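For context on the traceback: in the kubernetes Python client a watch is a long-lived HTTP response, and urllib3 raises `ReadTimeoutError` when no data arrives within the client-side read timeout (`_request_timeout`). A common pattern is to also pass the server-side `timeout_seconds`, set below the read timeout, so the API server closes an idle watch cleanly, and to resume from the last seen `resource_version` instead of starting again at 0. The sketch below illustrates that pattern under those assumptions; it is not Airflow's code, and `watch_kwargs`, `resilient_events`, `pod_stream_factory` and the 50 s / 60 s values are hypothetical.

```python
import logging
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple, Type

log = logging.getLogger(__name__)

# Hypothetical values: the API server ends the watch after 50 s, before the
# 60 s client-side read timeout can fire on an idle connection.
SERVER_TIMEOUT_S = 50
REQUEST_TIMEOUT_S = (10, 60)  # (connect, read) for _request_timeout


def watch_kwargs(label_selector: str, resource_version: Optional[str],
                 server_timeout: int = SERVER_TIMEOUT_S,
                 request_timeout: Tuple[int, int] = REQUEST_TIMEOUT_S) -> Dict:
    """Keyword arguments for CoreV1Api.list_namespaced_pod used as a watch."""
    if server_timeout >= request_timeout[1]:
        raise ValueError("timeout_seconds must be shorter than the read timeout")
    kwargs = {
        "label_selector": label_selector,
        "timeout_seconds": server_timeout,    # server closes the stream cleanly
        "_request_timeout": request_timeout,  # client-side urllib3 timeout
    }
    if resource_version:
        kwargs["resource_version"] = resource_version
    return kwargs


def resilient_events(open_stream: Callable[[Optional[str]], Iterable[Dict]],
                     retry_on: Tuple[Type[BaseException], ...],
                     max_streams: Optional[int] = None) -> Iterator[Dict]:
    """Yield watch events, reopening the stream from the last seen resourceVersion.

    A stream that ends normally (server-side timeout) or raises one of
    `retry_on` is reopened; max_streams=None means reopen forever.
    """
    resource_version: Optional[str] = None
    opened = 0
    while max_streams is None or opened < max_streams:
        opened += 1
        try:
            for event in open_stream(resource_version):
                # raw_object is the undecoded JSON dict attached to each event;
                # keep the previous value if an event carries no resourceVersion.
                metadata = event.get("raw_object", {}).get("metadata", {})
                resource_version = metadata.get("resourceVersion", resource_version)
                yield event
        except retry_on as exc:
            log.warning("watch interrupted (%s); resuming at resource_version=%s",
                        exc, resource_version)


def pod_stream_factory(namespace: str, label_selector: str):
    """open_stream callable backed by the kubernetes client (not exercised here).

    Assumes the `kubernetes` package is installed and the code runs in-cluster.
    """
    from kubernetes import client, config, watch  # lazy import

    config.load_incluster_config()
    v1 = client.CoreV1Api()

    def open_stream(resource_version: Optional[str]):
        return watch.Watch().stream(v1.list_namespaced_pod, namespace,
                                    **watch_kwargs(label_selector, resource_version))

    return open_stream
```

Against a real cluster you would call `resilient_events(pod_stream_factory(ns, selector), retry_on=(urllib3.exceptions.ReadTimeoutError,))`. A production watcher also has to handle the `ERROR` event with status 410 (Gone) that the API server sends when the stored `resource_version` is too old; this sketch does not.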

**What you expected to happen**:

I'd like these errors gone from the log. I also suspect they may be causing a scheduler heartbeat problem: usually after 5-10 DAG runs, the UI shows **The scheduler does not appear to be running. Last heartbeat was received X minutes ago. The DAGs list may not update, and new tasks will not be scheduled.** After that I can't schedule anything. I'm not sure the log errors and the heartbeat problem are related, but they could be.
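To check whether the heartbeat gaps line up with the timeouts, it may help to poll the scheduler heartbeat independently of the UI banner. This sketch assumes the webserver exposes a `/health` endpoint returning JSON with an ISO-8601 `scheduler.latest_scheduler_heartbeat` field (verify against your deployment), and that the banner threshold is `[scheduler] scheduler_health_check_threshold`, which as far as I know defaults to 30 s. `fetch_health`, `heartbeat_age` and `is_scheduler_stale` are hypothetical helpers; Python 3.7+ is needed for `datetime.fromisoformat`.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Assumed default of [scheduler] scheduler_health_check_threshold.
DEFAULT_THRESHOLD = timedelta(seconds=30)


def fetch_health(base_url: str, timeout: float = 10.0) -> Dict:
    """GET <base_url>/health and decode the JSON body (endpoint assumed to exist)."""
    url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def heartbeat_age(health: Dict, now: datetime) -> Optional[timedelta]:
    """Time since the last scheduler heartbeat, or None if none is reported."""
    raw = (health.get("scheduler") or {}).get("latest_scheduler_heartbeat")
    if not raw:
        return None
    latest = datetime.fromisoformat(raw)
    if latest.tzinfo is None:
        # Treat naive timestamps as UTC (assumption).
        latest = latest.replace(tzinfo=timezone.utc)
    return now - latest


def is_scheduler_stale(health: Dict, now: datetime,
                       threshold: timedelta = DEFAULT_THRESHOLD) -> bool:
    """True when no heartbeat is reported or the last one is older than threshold."""
    age = heartbeat_age(health, now)
    return age is None or age > threshold
```

Logging `is_scheduler_stale(fetch_health("http://localhost:8080"), datetime.now(timezone.utc))` every few seconds alongside the scheduler log would show whether the heartbeat stops around the `ReadTimeoutError` bursts or at unrelated times.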

**How to reproduce it**:
Schedule a moderately long-running DAG on the following setup: AKS + KubernetesExecutor + KubernetesPodOperator + the stable Helm Airflow chart.


**Anything else we need to know**:

Someone on the Airflow Slack told me they know this issue and have a fix: https://github.com/maganaluis/k8s-api-python (branch `timeout-fix`).
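Independent of that fork, one configuration-level thing worth trying is an explicit client-side timeout via `[kubernetes] kube_client_request_args`, assuming your 1.10.x build has this option (check the `[kubernetes]` section of `airflow.cfg`); it takes a JSON dictionary of keyword arguments forwarded to the kubernetes client calls. The sketch only renders the `AIRFLOW__{SECTION}__{KEY}` environment variable that Airflow reads as a config override. The helper names and the 50/50 values are illustrative, and whether this removes the log errors or only changes how often they appear is untested here.

```python
import json
from typing import Dict


def airflow_env_var(section: str, key: str) -> str:
    """Env var Airflow maps onto [section] key: AIRFLOW__SECTION__KEY."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


def kube_request_args_env(connect_s: int, read_s: int) -> Dict[str, str]:
    """Render kube_client_request_args with an explicit (connect, read) _request_timeout.

    The option name is assumed from the 1.10.x [kubernetes] config section.
    """
    if connect_s <= 0 or read_s <= 0:
        raise ValueError("timeouts must be positive")
    value = json.dumps({"_request_timeout": [connect_s, read_s]})
    return {airflow_env_var("kubernetes", "kube_client_request_args"): value}
```

`kube_request_args_env(50, 50)` returns `{"AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS": '{"_request_timeout": [50, 50]}'}`, which would be set as an environment variable on the scheduler pod.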

**How often does this problem occur?**
Every time: this error floods the log.
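To put numbers on how often the log is flooded, a small script can count the `ReadTimeoutError` tracebacks and measure the interval between the watcher's "and now my watch begins" restart lines. If restarts recur at a roughly fixed interval, that would point at idle watch connections hitting a read timeout rather than at random network failures; that interpretation is an assumption to verify. The regex is written against the single log line quoted in the traceback above, and the function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List

# Restart line logged by the executor's watcher, e.g.
# [2020-09-10 13:39:23,281] {kubernetes_executor.py:302} INFO - Event: and now my watch begins ...
RESTART_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] .*and now my watch begins"
)
# Final line of each traceback; matching only this line avoids double-counting
# the "raise ReadTimeoutError(...)" frame inside the same traceback.
TIMEOUT_PREFIX = "urllib3.exceptions.ReadTimeoutError"


def count_timeouts(lines: Iterable[str]) -> int:
    """Number of ReadTimeoutError tracebacks in the log."""
    return sum(1 for line in lines if line.startswith(TIMEOUT_PREFIX))


def restart_times(lines: Iterable[str]) -> List[datetime]:
    """Timestamps of the watcher restart lines, in log order."""
    times = []
    for line in lines:
        m = RESTART_RE.match(line)
        if m:
            times.append(datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f"))
    return times


def gaps_seconds(times: List[datetime]) -> List[float]:
    """Seconds between consecutive watch restarts."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

The input can be the scheduler output saved with `kubectl logs <scheduler-pod>` and split into lines.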



Timeouts in Airflow scheduler logs on AKS #10860
