
KubernetesPodOperator Fails on empty log line #21605

Closed

bhavaniravi opened this issue Feb 16, 2022 · 29 comments

Labels
area:providers, kind:bug

Comments

@bhavaniravi
Contributor

bhavaniravi commented Feb 16, 2022

Apache Airflow Provider(s)

cncf-kubernetes

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==1!2.1.0

Apache Airflow version

2.2.3 (latest released)

Operating System

Debian GNU/Linux 10 (buster)

Deployment

Astronomer

Deployment details

No response

What happened

Some KubernetesPodOperator tasks fail on an empty log line. From the following logs, you can see that the monitor_pod function

  1. continuously fails to parse a timestamp, logging Error parsing timestamp. Will continue execution but won't update timestamp
  2. is unable to fetch container logs: unable to retrieve container logs for docker://
  3. eventually fails on an empty log line: Exception: Log not in "{timestamp} {log}" format. Got:

What you expected to happen

An empty log line should be handled gracefully instead of failing the whole task.
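A minimal sketch of the graceful handling proposed here. This is a hypothetical helper, not the actual provider code (which lives in pod_launcher.py / pod_manager.py): instead of raising when a line lacks the `{timestamp} {log}` shape, it returns the raw line with no timestamp so a single bad line cannot fail the task.

```python
from datetime import datetime
from typing import Optional, Tuple


def parse_log_line(line: str) -> Tuple[Optional[datetime], str]:
    """Split a pod log line into (timestamp, message).

    Returns (None, line) instead of raising when the line is empty or
    has no parseable timestamp, so the caller can log a warning and
    keep streaming instead of killing the task.
    """
    timestamp_str, sep, message = line.partition(" ")
    if not sep:
        # Empty line, or no space at all: nothing to parse.
        return None, line
    try:
        # Kubernetes emits RFC 3339 timestamps with nanoseconds, e.g.
        # 2022-02-07T23:33:23.123456789Z; trim to microseconds so the
        # stdlib can parse it (timezone dropped for simplicity).
        timestamp = datetime.fromisoformat(timestamp_str.rstrip("Z")[:26])
    except ValueError:
        return None, line
    return timestamp, message
```

With this shape, the empty line from the traceback below simply yields `(None, "")` and execution continues.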

How to reproduce

I'm not sure what really causes this issue, but this Stack Overflow question may be related: Docker cleaning up the logs?

Anything else

Complete log stack trace:

[2022-02-07, 23:33:23 UTC] {pod_launcher.py:231} ERROR - Error parsing timestamp. Will continue execution but won't update timestamp
[2022-02-07, 23:33:23 UTC] {pod_launcher.py:176} INFO - rpc error: code = DeadlineExceeded desc = context deadline exceeded
[2022-02-07, 23:33:24 UTC] {pod_launcher.py:192} WARNING - Pod dbt-hourly-run-task.537231fc6daf403484e342380a5caf53 log read interrupted
[2022-02-07, 23:35:24 UTC] {pod_launcher.py:231} ERROR - Error parsing timestamp. Will continue execution but won't update timestamp
[2022-02-07, 23:35:24 UTC] {pod_launcher.py:176} INFO - unable to retrieve container logs for docker://c2cd33567efa4a95f555ccb6d7fda760b65bb9cf2d13087d2704e2e8b974e42a
[2022-02-07, 23:35:25 UTC] {pod_launcher.py:192} WARNING - Pod dbt-hourly-run-task.537231fc6daf403484e342380a5caf53 log read interrupted
[2022-02-07, 23:37:25 UTC] {pod_launcher.py:231} ERROR - Error parsing timestamp. Will continue execution but won't update timestamp
[2022-02-07, 23:37:25 UTC] {pod_launcher.py:176} INFO - unable to retrieve container logs for docker://c2cd33567efa4a95f555ccb6d7fda760b65bb9cf2d13087d2704e2e8b974e42a
[2022-02-07, 23:37:26 UTC] {pod_launcher.py:192} WARNING - Pod dbt-hourly-run-task.537231fc6daf403484e342380a5caf53 log read interrupted
[2022-02-07, 23:39:26 UTC] {pod_launcher.py:231} ERROR - Error parsing timestamp. Will continue execution but won't update timestamp
[2022-02-07, 23:39:26 UTC] {pod_launcher.py:176} INFO - unable to retrieve container logs for docker://c2cd33567efa4a95f555ccb6d7fda760b65bb9cf2d13087d2704e2e8b974e42a
[2022-02-07, 23:39:27 UTC] {pod_launcher.py:192} WARNING - Pod dbt-hourly-run-task.537231fc6daf403484e342380a5caf53 log read interrupted
[2022-02-07, 23:41:27 UTC] {pod_launcher.py:231} ERROR - Error parsing timestamp. Will continue execution but won't update timestamp
[2022-02-07, 23:41:27 UTC] {pod_launcher.py:176} INFO - unable to retrieve container logs for docker://c2cd33567efa4a95f555ccb6d7fda760b65bb9cf2d13087d2704e2e8b974e42a
[2022-02-07, 23:41:28 UTC] {pod_launcher.py:192} WARNING - Pod dbt-hourly-run-task.537231fc6daf403484e342380a5caf53 log read interrupted
[2022-02-07, 23:43:28 UTC] {pod_launcher.py:231} ERROR - Error parsing timestamp. Will continue execution but won't update timestamp
[2022-02-07, 23:43:28 UTC] {pod_launcher.py:176} INFO - unable to retrieve container logs for docker://c2cd33567efa4a95f555ccb6d7fda760b65bb9cf2d13087d2704e2e8b974e42a
[2022-02-07, 23:43:29 UTC] {pod_launcher.py:192} WARNING - Pod dbt-hourly-run-task.537231fc6daf403484e342380a5caf53 log read interrupted
[2022-02-07, 23:45:29 UTC] {pod_launcher.py:231} ERROR - Error parsing timestamp. Will continue execution but won't update timestamp
[2022-02-07, 23:45:29 UTC] {pod_launcher.py:176} INFO - unable to retrieve container logs for docker://c2cd33567efa4a95f555ccb6d7fda760b65bb9cf2d13087d2704e2e8b974e42a
[2022-02-07, 23:45:30 UTC] {pod_launcher.py:192} WARNING - Pod dbt-hourly-run-task.537231fc6daf403484e342380a5caf53 log read interrupted
[2022-02-07, 23:47:30 UTC] {pod_launcher.py:231} ERROR - Error parsing timestamp. Will continue execution but won't update timestamp
[2022-02-07, 23:47:30 UTC] {pod_launcher.py:176} INFO - unable to retrieve container logs for docker://c2cd33567efa4a95f555ccb6d7fda760b65bb9cf2d13087d2704e2e8b974e42a
[2022-02-07, 23:47:31 UTC] {pod_launcher.py:192} WARNING - Pod dbt-hourly-run-task.537231fc6daf403484e342380a5caf53 log read interrupted
[2022-02-07, 23:49:31 UTC] {pod_launcher.py:231} ERROR - Error parsing timestamp. Will continue execution but won't update timestamp
[2022-02-07, 23:49:31 UTC] {pod_launcher.py:176} INFO - unable to retrieve container logs for docker://c2cd33567efa4a95f555ccb6d7fda760b65bb9cf2d13087d2704e2e8b974e42a
[2022-02-07, 23:49:32 UTC] {pod_launcher.py:192} WARNING - Pod dbt-hourly-run-task.537231fc6daf403484e342380a5caf53 log read interrupted
[2022-02-07, 23:51:32 UTC] {pod_launcher.py:231} ERROR - Error parsing timestamp. Will continue execution but won't update timestamp
[2022-02-07, 23:51:32 UTC] {pod_launcher.py:176} INFO - unable to retrieve container logs for docker://c2cd33567efa4a95f555ccb6d7fda760b65bb9cf2d13087d2704e2e8b974e42a
[2022-02-07, 23:51:33 UTC] {pod_launcher.py:192} WARNING - Pod dbt-hourly-run-task.537231fc6daf403484e342380a5caf53 log read interrupted
[2022-02-07, 23:51:56 UTC] {taskinstance.py:1700} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1329, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1455, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1506, in _execute_task
    result = execute_callable(context=context)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 367, in execute
    final_state, remote_pod, result = self.create_new_pod_for_operator(labels, launcher)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 524, in create_new_pod_for_operator
    final_state, remote_pod, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/utils/pod_launcher.py", line 175, in monitor_pod
    timestamp, message = self.parse_log_line(line.decode('utf-8'))
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/utils/pod_launcher.py", line 225, in parse_log_line
    raise Exception(f'Log not in "{{timestamp}} {{log}}" format. Got: {line}')
Exception: Log not in "{timestamp} {log}" format. Got:
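For context, the failing check can be reproduced in isolation. The snippet below is a simplified stand-in for the provider's parse_log_line, not the actual source: it raises whenever the line has no space-separated timestamp prefix, which is exactly what an empty line triggers.

```python
# Simplified reproduction of the strict parsing that crashes the task.
# Hypothetical stand-in mirroring the behaviour in the traceback above;
# it is not the real provider code.
def strict_parse_log_line(line: str):
    split_at = line.find(" ")
    if split_at == -1:
        # An empty line has no space, so this branch is always taken
        # for "" and the whole task fails.
        raise Exception(f'Log not in "{{timestamp}} {{log}}" format. Got: {line}')
    return line[:split_at], line[split_at + 1:]


try:
    strict_parse_log_line("")
except Exception as exc:
    print(exc)  # Log not in "{timestamp} {log}" format. Got:
```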

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@bhavaniravi bhavaniravi added the area:providers and kind:bug labels Feb 16, 2022
@bhavaniravi
Contributor Author

I'm not sure of the root cause of the issue.
If the fix is as simple as handling an empty log line, I would be happy to raise a PR for it.

@raphaelauv
Contributor

Could you try updating to the latest version of the provider? https://pypi.org/project/apache-airflow-providers-cncf-kubernetes/

3.0.2

@bhavaniravi
Contributor Author

3.0.2 also has the same logic for parsing log lines, so it wouldn't be of much help:
https://github.com/apache/airflow/blob/providers-cncf-kubernetes/3.0.2/airflow/providers/cncf/kubernetes/utils/pod_manager.py#L254

@bhavaniravi
Contributor Author

bhavaniravi commented Feb 23, 2022

With reference to #15638 @dimberman, are you aware of a case where the log lines can be empty?

@potiuk
Member

potiuk commented Mar 7, 2022

3.0.2 Also has the same logic for parsing log lines, wouldn't be of much help https://github.com/apache/airflow/blob/providers-cncf-kubernetes/3.0.2/airflow/providers/cncf/kubernetes/utils/pod_manager.py#L254

It uses a different kubernetes library. Can you please check it?

@aljanson

Hello, I tried using the latest package version, 3.1.1, but I face the same issue as above. Any recommendations?

@potiuk
Member

potiuk commented Mar 27, 2022

@bhavaniravi I think the root cause of the problem is that you can't read the logs of the pod. This looks like a misconfiguration of your service account.

The root cause of the problem is not parsing, nor even the empty line, but the fact that you cannot read the logs:

unable to retrieve container logs for docker://c2cd33567efa4a95f555ccb6d7fda760b65bb9cf2d13087d2704e2e8b974e42a

Maybe you should look closer at your k8s logs and find the root cause of the problem there. Even if we add an extra step to "react" to empty lines, it will not solve the root cause, which is the inability to read the logs. Can any of those who have similar problems take a deeper look at their k8s logs and see if there are any other errors, for example ones indicating permission issues or other anomalies?

Looking at this as a "parsing" issue just masks the real problem you have in your deployment, I'm afraid.

@aljanson

aljanson commented Mar 27, 2022

@potiuk Just adding a little more context. I am using Airflow 2.x to spin up pods on an Azure Kubernetes cluster using the KubernetesPodOperator.

It's interesting that I don't face the above logging issues when running the pod on a provisioned node. However, the moment I try to execute the same on the newer virtual nodes, it starts hitting me with the logging error.

Furthermore, I've also noticed that if I disable the do_xcom_push argument, the job succeeds just fine, though the warning "Error parsing timestamp. Will continue execution but won't update timestamp" still appears throughout the logs.

On the documentation page for AKS virtual nodes, it does mention that init containers are not supported. I believe the way XCom works is by running a sidecar container, right? Just trying to make sense of any obvious limitations which I might be missing right off the bat :)

potiuk added a commit to potiuk/airflow that referenced this issue Mar 28, 2022
It seems that in some circumstances, the K8S client might return
empty logs even if the "timestamps" option is specified.

That should not happen in general, but apparently it does in
some cases, and it leads to the task being killed.

Rather than killing the task, we should log this as an error
(on top of trying to find out why it happens and preventing it,
also so we can gather more information and a diagnosis of when
it happens).

Related to: apache#21605
potiuk added a commit that referenced this issue Mar 28, 2022
…2566)

It seems that in some circumstances, the K8S client might return
empty logs even if the "timestamps" option is specified.

That should not happen in general, but apparently it does in
some cases, and it leads to the task being killed.

Rather than killing the task, we should log this as an error
(on top of trying to find out why it happens and preventing it,
also so we can gather more information and a diagnosis of when
it happens).

Related to: #21605
@potiuk
Member

potiuk commented Mar 28, 2022

While we do not know the root cause, #22566 should mitigate the crash. It will be released in the next provider version (which will only be installable on Airflow 2.3.0+).

@potiuk potiuk closed this as completed Mar 28, 2022
@aljanson

aljanson commented Mar 31, 2022

Thanks! However, what fix/workaround can we use in the interim to prevent the errors and the crashing until the above is released? (The 2.3.0 release seems far away.)

Is there a possibility of using a custom XCom backend (S3) with the pod operator instead of the default of writing to the sidecar container?

@potiuk
Member

potiuk commented May 8, 2022

Thanks! However, what fix/workaround can we use in the interim to prevent the errors and the crashing until the above is released? (The 2.3.0 release seems far away.)

2.3.0 is out. I do not think there were any workarounds.

@HemanthKumar8124

HemanthKumar8124 commented Apr 25, 2023

Hi team,

Even with MWAA (Airflow 2.4.3) and a KubernetesPodOperator task, we are seeing the issue below in the Airflow logs.

kubernetes version - 1.24

{{pod_manager.py:410}} ERROR - Error parsing timestamp (no timestamp in message ''). Will continue execution but won't update timestamp
{{logging_mixin.py:137}} WARNING - /usr/local/airflow/.local/lib/python3.10/site-packages/watchtower/__init__.py:349 WatchtowerWarning: Received empty message. Empty messages cannot be sent to CloudWatch Logs
{{logging_mixin.py:137}} WARNING - Traceback (most recent call last):
{{logging_mixin.py:137}} WARNING -   File "/usr/local/airflow/config/cloudwatch_logging.py", line 161, in emit
    self.sniff_errors(record)
{{logging_mixin.py:137}} WARNING -   File "/usr/local/airflow/config/cloudwatch_logging.py", line 211, in sniff_errors
    if pattern.search(record.message):
{{logging_mixin.py:137}} WARNING - AttributeError: 'LogRecord' object has no attribute 'message'

I don't see any issue with the pod logs when I check them manually with the kubectl client; each line contains a timestamp. But the warning above still appears in the Airflow logs.

@potiuk

@raphaelauv
Contributor

MWAA 2.4.3 uses apache-airflow-providers-cncf-kubernetes==4.4.0 by default.

@muscovitebob

I am hitting this on the very latest Cloud Composer, which uses apache-airflow-providers-cncf-kubernetes==6.0.0. The logs are ingested fine until an underlying library prints some unexpected warnings, which kills the task.

@Blind-Watchmaker

Blind-Watchmaker commented May 24, 2023

I'm also experiencing this problem while using dbt in Airflow:

Airflow v2.5.0
Kubernetes: 1.24.10 (Azure Kubernetes Service)
apache-airflow-providers-cncf-kubernetes 7.0.0

[2023-05-24, 14:02:49 UTC] {pod_manager.py:367} INFO - 2023-05-24T13:57:48.744628Z [info] 13:57:48  105 of 105 START sql table model tap_mongodb.mart_name_here .................. [RUN] cmd_type=command name=dbt-*** stdio=stderr

[2023-05-24, 14:07:49 UTC] {pod_manager.py:438} ERROR - Error parsing timestamp (no timestamp in message ''). Will continue execution but won't update timestamp

In this case, we're at a step in the execution process that can take a long time, and you can see that the error occurs exactly 5 minutes after the previous line was logged. There isn't an empty log line in the output that follows. My pipeline worked before, when I was dealing with a smaller dataset, because the build time of this step was small enough, but I think there might be some kind of KubernetesPodOperator-related timeout at play here.

@potiuk / anyone else: any ideas on Airflow variables I could play around with to test this hypothesis?

@potiuk
Member

potiuk commented May 24, 2023

Maybe @dstandish can help, and maybe try upgrading to the latest Airflow? There were some changes in how logs are pulled from K8S pods.

@dstandish
Contributor

@potiuk it looks like you tried to fix this in #22566

That, I believe, was released in cncf provider version 4.0

If users are still experiencing this, then maybe the except clause is too narrow?

@potiuk
Member

potiuk commented May 24, 2023

Ah right. Now I remember.

ERROR - Error parsing timestamp (no timestamp in message '') is still printed, but it does not crash. @Blind-Watchmaker: this seems to be just a log entry; you should not experience a crash. So all fine.

@dstandish
Contributor

Anyone who is experiencing failures with provider version >= 4.0, please share the traceback

@solomonshorser

We've started seeing Error parsing timestamp (no timestamp in message ''). Will continue execution but won't update timestamp in our logs recently (I think someone upgraded something just before it started appearing) and were initially concerned, but the jobs always succeed. I'm guessing this message can be ignored, but is there anything with more detail explaining why it is safe to ignore?

@applevladko

applevladko commented Jun 15, 2023

It seems we faced a similar issue on a later version:

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==6.1.0
kubernetes==23.6.0
kubernetes-asyncio==24.2.3

Apache Airflow version

2.6.1

Python version: 3.10

Operating System

Debian VERSION="11 (bullseye)"

It seems that after the empty-string timestamp error (which appears during a long query), log parsing stops and fails with a 404.
The issue is not permanent and only appears sometimes.

[2023-06-14, 15:16:36 UTC] {pod_manager.py:342} INFO - SELECT * FROM columns_to_select
[2023-06-14, 15:16:36 UTC] {pod_manager.py:342} INFO - -- /* {"app": "dbt", "dbt_version": "1.5.0", "profile_name": "athena", "target_name": "athena", "node_id": "model.data_models.stg_event_wallet_balance_changed"} */
[2023-06-14, 15:17:28 UTC] {pod_manager.py:410} ERROR - Error parsing timestamp (no timestamp in message ''). Will continue execution but won't update timestamp
[2023-06-14, 15:17:28 UTC] {pod_manager.py:342} INFO -
[2023-06-14, 15:17:31 UTC] {pod.py:905} ERROR - (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'fc6d1076-bb5e-4c1d-ad92-d568c53bfd3e', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'b6307941-9fe6-4061-9796-76ce2ad8cd8b', 'X-Kubernetes-Pf-Prioritylevel-Uid': '3dbd0f1c-d64d-450a-805b-741194c49f71', 'Date': 'Wed, 14 Jun 2023 15:17:31 GMT', 'Content-Length': '288'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"dbt-dm-data-vault--*pod name*-b9k0b0t1\" not found","reason":"NotFound","details":{"name":"dbt-dm-data-vault--*pod name*-b9k0b0t1","kind":"pods"},"code":404}
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 557, in execute_sync
    self.remote_pod = self.pod_manager.await_pod_completion(self.pod)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 394, in await_pod_completion
    remote_pod = self.read_pod(pod)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 490, in read_pod
    return self._client.read_namespaced_pod(pod.metadata.name, pod.metadata.namespace)
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 23483, in read_namespaced_pod
    return self.read_namespaced_pod_with_http_info(name, namespace, **kwargs)  # noqa: E501
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 23570, in read_namespaced_pod_with_http_info
    return self.api_client.call_api(
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 240, in GET
    return self.request("GET", url,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'caf17afd-96b2-4b53-b9ff-cef1ada8b553', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'b6307941-9fe6-4061-9796-76ce2ad8cd8b', 'X-Kubernetes-Pf-Prioritylevel-Uid': '3dbd0f1c-d64d-450a-805b-741194c49f71', 'Date': 'Wed, 14 Jun 2023 15:17:31 GMT', 'Content-Length': '288'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"dbt-dm-data-vault--*pod name*-b9k0b0t1\" not found","reason":"NotFound","details":{"name":"dbt-dm-data-vault--*pod name*-b9k0b0t1","kind":"pods"},"code":404}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 745, in patch_already_checked
    self.client.patch_namespaced_pod(
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 19662, in patch_namespaced_pod
    return self.patch_namespaced_pod_with_http_info(name, namespace, body, **kwargs)  # noqa: E501
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 19777, in patch_namespaced_pod_with_http_info
    return self.api_client.call_api(
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 407, in request
    return self.rest_client.PATCH(url,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 295, in PATCH
    return self.request("PATCH", url,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'fc6d1076-bb5e-4c1d-ad92-d568c53bfd3e', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'b6307941-9fe6-4061-9796-76ce2ad8cd8b', 'X-Kubernetes-Pf-Prioritylevel-Uid': '3dbd0f1c-d64d-450a-805b-741194c49f71', 'Date': 'Wed, 14 Jun 2023 15:17:31 GMT', 'Content-Length': '288'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"dbt-dm-data-vault--*pod name*-b9k0b0t1\" not found","reason":"NotFound","details":{"name":"dbt-dm-data-vault--*pod name*-b9k0b0t1","kind":"pods"},"code":404}

[2023-06-14, 15:17:31 UTC] {pod.py:721} INFO - Deleting pod: dbt-dm-data-vault--*pod name*-b9k0b0t1
[2023-06-14, 15:17:31 UTC] {taskinstance.py:1847} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 557, in execute_sync
    self.remote_pod = self.pod_manager.await_pod_completion(self.pod)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 394, in await_pod_completion
    remote_pod = self.read_pod(pod)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 490, in read_pod
    return self._client.read_namespaced_pod(pod.metadata.name, pod.metadata.namespace)
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 23483, in read_namespaced_pod
    return self.read_namespaced_pod_with_http_info(name, namespace, **kwargs)  # noqa: E501
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 23570, in read_namespaced_pod_with_http_info
    return self.api_client.call_api(
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 240, in GET
    return self.request("GET", url,
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'caf17afd-96b2-4b53-b9ff-cef1ada8b553', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'b6307941-9fe6-4061-9796-76ce2ad8cd8b', 'X-Kubernetes-Pf-Prioritylevel-Uid': '3dbd0f1c-d64d-450a-805b-741194c49f71', 'Date': 'Wed, 14 Jun 2023 15:17:31 GMT', 'Content-Length': '288'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"dbt-dm-data-vault--*pod name*-b9k0b0t1\" not found","reason":"NotFound","details":{"name":"dbt-dm-data-vault--*pod name*-b9k0b0t1","kind":"pods"},"code":404}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/airflow/dags/operators/dbt_operator.py", line 181, in execute
    dbt.execute(context=context)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 529, in execute
    return self.execute_sync(context)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 559, in execute_sync
    self.cleanup(
  File "/opt/airflow/dags/operators/k8s_pod.py", line 30, in cleanup
    status = next((x for x in statuses if x.name == self.base_container_name), None)
TypeError: 'NoneType' object is not iterable
[2023-06-14, 15:17:31 UTC] {taskinstance.py:1368} INFO - Marking task as UP_FOR_RETRY. dag_id=dbt_dm_data_vault, task_id=data_vault.stg_event_wallet_balance_changed.run, execution_date=20230613T205009, start_date=20230614T151608, end_date=20230614T151731
[2023-06-14, 15:17:31 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 9356814 for task data_vault.stg_event_wallet_balance_changed.run ('NoneType' object is not iterable; 12158)
[2023-06-14, 15:17:31 UTC] {local_task_job_runner.py:232} INFO - Task exited with return code 1
[2023-06-14, 15:17:31 UTC] {taskinstance.py:2674} INFO - 0 downstream tasks scheduled from follow-on schedule check

@y0zg

y0zg commented Jul 11, 2023

Hi there!
Is there any workaround for the above issue? Any chance this fix will be released soon?

@tanthml

tanthml commented Jul 13, 2023

I got the same issue when I updated to 2.6.2:
ERROR - Error parsing timestamp (no timestamp in message ''). Will continue execution but won't update timestamp

@y0zg

y0zg commented Jul 13, 2023

I solved the issue by adding persistence for logs

@ricoms

ricoms commented Jul 25, 2023

I found this issue using GCP Composer v2 with KubernetesPodOperator.

I solved the issue by adding persistence for logs

And how did you do that, @y0zg?

@romanzdk

Hi, why is this issue closed? I am on 2.6.1 with airflow-cncf-kubernetes 7.5.1 (the latest) and still getting this error.

@potiuk
Member

potiuk commented Sep 15, 2023

Hi, why is this issue closed? I am on 2.6.1 with airflow-cncf-kubernetes 7.5.1 (the latest) and still getting this error.

Because we think the original issue, reported against a version released 2 years ago, has been fixed. You @romanzdk @ricoms (and also, per the suggestion, @y0zg and @tanthml) might have a SIMILAR issue which might be completely different. The best way you can help someone diagnose and solve your issue is to open a new one where you describe what happens in your case and provide evidence from your version (ideally after upgrading to the latest version of Airflow, because what you see could have been fixed since).

This is an open-source project where you get software for free, and the people who solve other people's problems most often do so in their free time - weekends and nights. The best way to get help from those people is to make it easy for them to diagnose your issue. I know it is super easy to write "I have the same issue": you do not lose any time gathering evidence and writing an issue. But your issue might be completely different, especially since the original was raised for version 2.2, we are 2 years later at 2.7.1, and the k8s code has been rewritten 3 times since then. It might also depend on multiple factors, like the kubernetes version, the provider version, the type of kubernetes cluster you have, etc. Your comment did not bring anyone any closer to knowing all those details.

So, if you open a new issue @romanzdk and provide good evidence, you increase the chances (but only the chances) that someone will spend their free afternoon or weekend looking at your issue and maybe even fix it.

If all you have is the question of why the issue is closed, your chances of getting it solved do not increase even by a fraction of a percent.

So, if you really care about your problem being solved, I suggest you help those who try to help you: provide a good, reproducible issue with good evidence of what happens, ideally after looking at your system and correlating what happens there (maybe some pods were failing? maybe you can see some unusual behaviour or output from your pods, etc.). That will be a great help for those who spend their nights and weekends trying to help people who use the software completely for free.

@romanzdk

ok, @potiuk - created a new issue - #34388

@Abad07

Abad07 commented Sep 15, 2023

Hi, why is this issue closed? I am on 2.6.1 with airflow-cncf-kubernetes 7.5.1 (the latest) and still getting this error.

Same here
