Description
Apache Airflow version: 1.10.12
Kubernetes version (if you are using kubernetes) (use kubectl version): AKS 1.17.9
Environment:
- Cloud provider or hardware configuration: Azure AKS
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
I'm using AKS + KubernetesExecutor + KubernetesPodOperator + the Helm stable Airflow chart.
What happened:
My log is full of the following errors:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/executors/kubernetes_executor.py", line 287, in run
self.worker_uuid, self.kube_config)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/executors/kubernetes_executor.py", line 323, in _run
for event in list_worker_pods():
File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 144, in stream
for line in iter_resp_lines(resp):
File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 46, in iter_resp_lines
for seg in resp.read_chunked(decode_content=False):
File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 792, in read_chunked
self._original_response.close()
File "/usr/local/lib/python3.6/contextlib.py", line 99, in exit
self.gen.throw(type, value, traceback)
File "/home/airflow/.local/lib/python3.6/site-packages/urllib3/response.py", line 441, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.158.0.1', port=443): Read timed out.
[2020-09-10 13:39:23,281] {kubernetes_executor.py:302} INFO - Event: and now my watch begins starting at resource_version: 0
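For context, the failing call chain in the traceback reduces to the kubernetes Python client's watch stream. A minimal sketch of that pattern (simplified, not the actual executor code; the namespace is a placeholder):

```python
from kubernetes import client, config, watch

config.load_incluster_config()  # the executor runs inside the cluster
v1 = client.CoreV1Api()

w = watch.Watch()
# The watch blocks on a long-lived chunked HTTP response. If the connection
# sits idle past the read timeout (e.g. a load balancer silently drops it),
# urllib3 raises the ReadTimeoutError seen above, and the executor restarts
# the watch ("and now my watch begins") at resource_version 0.
for event in w.stream(v1.list_namespaced_pod,
                      namespace="airflow",  # placeholder namespace
                      resource_version="0"):
    print(event["type"], event["object"].metadata.name)
```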
What you expected to happen:
A clean log without these errors. I also suspect this could be causing a heartbeat problem with the scheduler: usually after 5-10 DAG executions I get an error in the UI: "The scheduler does not appear to be running. Last heartbeat was received X minutes ago. The DAGs list may not update, and new tasks will not be scheduled." After that I can't schedule anything. I'm not sure the errors in the log and the heartbeat problem are related, but they could be...
How to reproduce it:
Schedule a more-or-less long-running DAG on the following setup: AKS + KubernetesExecutor + KubernetesPodOperator + Helm stable Airflow chart.
Anything else we need to know:
Someone helpful on the Airflow Slack told me that he knows this issue and has a solution: https://github.com/maganaluis/k8s-api-python in the timeout-fix branch.
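I haven't verified what the timeout-fix branch changes, but the usual mitigation for this class of error with the kubernetes Python client is to bound the watch explicitly, so the API server ends it cleanly before a client-side read timeout can fire, and to treat a timeout as a cue to resume the watch rather than as a failure. A rough sketch under those assumptions (timeout values and namespace are illustrative):

```python
import urllib3
from kubernetes import client, config, watch

config.load_incluster_config()
v1 = client.CoreV1Api()

resource_version = "0"
while True:
    w = watch.Watch()
    try:
        # timeout_seconds: the API server ends the watch cleanly after N seconds.
        # _request_timeout: client-side socket timeout, kept slightly larger so
        # the server-side close normally wins. Both values are illustrative.
        for event in w.stream(v1.list_namespaced_pod,
                              namespace="airflow",  # placeholder namespace
                              timeout_seconds=60,
                              _request_timeout=70,
                              resource_version=resource_version):
            resource_version = event["object"].metadata.resource_version
            # ... handle the pod event ...
    except urllib3.exceptions.ReadTimeoutError:
        # Treat a dropped/idle connection as "resume the watch", not a crash.
        continue
```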
How often does this problem occur? Once? Every time, etc.?
This error floods the log every time...