
Adding more information to kubernetes executor logs #29929

Merged · 19 commits into apache:main · May 25, 2023

Conversation

@amoghrajesh (Contributor) commented Mar 5, 2023

Kubernetes executor logs do not contain much information that can be traced back to narrow down an issue. To make debugging easier, this PR adds the key annotations to some logger lines, which can be useful to debug issues more quickly.

The annotations contain crucial information about a k8s log line; for example, the structure can look like:

    {
        "dag_id": "dag",
        "task_id": "task",
        "run_id": "run_id",
        "try_number": "1",
        "execution_date": None,
    }
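
For illustration, a minimal sketch of rendering such a dict into a string that can be appended to a log line (the annotations_to_str name comes up later in this review; its implementation here is only an assumption, not the merged code):

    def annotations_to_str(annotations: dict) -> str:
        # Illustrative rendering only: flatten the key/value pairs into one
        # compact string, e.g. "dag_id=dag, task_id=task, run_id=run_id, try_number=1".
        return ", ".join(f"{key}={value}" for key, value in annotations.items())

    print(annotations_to_str({"dag_id": "dag", "task_id": "task", "run_id": "run_id", "try_number": "1"}))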

closes: #18329


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg bot added the provider:cncf-kubernetes and area:Scheduler labels on Mar 5, 2023
Resolved (outdated) review threads on:
tests/executors/test_kubernetes_executor.py
airflow/executors/kubernetes_executor.py
airflow/kubernetes/kubernetes_helper_functions.py
@amoghrajesh (Contributor Author)

@jedcunningham @hussein-awala @hterik I want to hear some opinions about this.
Does the following proposal work?

  1. Since most lines already have logs with the required annotations, it is better if we don't force more annotations and flood the logs for all users.
  2. We can have a feature flag that controls whether the annotations are printed for some of these k8s executor logs. This flag can be enabled by users who need this kind of information (a rough sketch follows below).
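
A rough sketch of how such a flag could be read, assuming the option name the PR eventually settles on ([kubernetes_executor] logs_task_metadata); the exact lookup and default are assumptions:

    from airflow.configuration import conf

    def get_logs_task_metadata() -> bool:
        # Assumed lookup for the proposed flag; default off so existing users
        # do not suddenly see extra annotations flooding their logs.
        return conf.getboolean("kubernetes_executor", "logs_task_metadata", fallback=False)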

@potiuk (Member) commented Apr 4, 2023

It seems that the "rebase" from the UI messed with the change (first time I see something like that). Can you please rebase your local change manually again, @amoghrajesh, and re-push it?

@potiuk mentioned this pull request on Apr 4, 2023
@@ -2671,6 +2671,13 @@ kubernetes_executor:
type: string
example: '{ "total": 3, "backoff_factor": 0.5 }'
default: ""
detailed_executor_logs:
description: |
Flag to have more details in kubernetes executor pod logs
Member:

Can this be more specific than “more details”? What details are they? Why/when would people not want those details?

Contributor Author:

@uranusjr thanks for the review. The logs in the Kubernetes executor do not contain a lot of the data needed for the best debugging experience.

Example:
When trying to trace the lifecycle of a task in the kubernetes executor, you currently must search first for the name of the pod created by the task, then search for the pod name in the logs. This means you need to be pretty familiar with the structure of the scheduler logs in order to search effectively for the lifecycle of a task that had a problem.

Some log statements, like "Attempting to finish pod", do have the annotations for the pod, which include dag name, task name, and run_id, but others do not. For instance, "Event: podname-a2f2c1ac706 had an event of type DELETED" has no such annotations.
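
To illustrate, an enriched version of that event line might look roughly like the sketch below (the variable names and exact wording are illustrative, not the merged code):

    import logging

    log = logging.getLogger("airflow.executors.kubernetes_executor.KubernetesJobWatcher")

    # Hypothetical values, for illustration only.
    pod_name = "podname-a2f2c1ac706"
    event_type = "DELETED"
    annotations_string = "dag_id=dag, task_id=task, run_id=run_id, try_number=1"

    # Tagging the event with the task annotations lets a DELETED event be tied
    # back to a specific dag_id/task_id/run_id/try_number without a second search.
    log.info("Event: %s had an event of type %s, annotations: %s", pod_name, event_type, annotations_string)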

Contributor Author:

I will rename the flag to something more meaningful, like traceable-executor-logs. What do you think, @uranusjr? Or do you have any alternative suggestions?

Member:

Sounds good to me

Member:

What about logs_task_metadata? If you look at the added information, it describes the task and not the executor. WDYT?

Contributor Author:

Great point, I agree with you @hussein-awala. I will make the changes for this.

@amoghrajesh amoghrajesh requested a review from uranusjr May 17, 2023 04:02
Comment on lines 240 to 245
if get_logs_task_metadata():
    self.log.info(
        "Event: Failed to start pod %s, annotations: %s", pod_name, annotations_string
    )
else:
    self.log.info("Event: Failed to start pod %s", pod_name)
Member:

I wonder, instead of doing this if-else everywhere, whether it's better to simply do something like

if get_logs_task_metadata():
    annotations_for_logging = annotations_to_str(annotations)
else:
    annotations_for_logging = "<omitted>"

and always add the annotations: %s part.

Member:

Or maybe do this if-else in annotations_to_str directly? (If we give this function a better name.)
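
That is roughly the shape the change ends up taking. A minimal sketch of such a helper, assuming the final name used in the tests further down and a plain (uncached) config lookup; the merged code may differ in details:

    from __future__ import annotations

    from airflow.configuration import conf

    def annotations_for_logging_task_metadata(annotation_set: dict[str, str]) -> dict[str, str] | str:
        """Return the task annotations for logging, or a placeholder when the flag is off."""
        if conf.getboolean("kubernetes_executor", "logs_task_metadata", fallback=False):
            return annotation_set
        return "<omitted>"

Each call site can then log unconditionally, e.g. self.log.info("Event: Failed to start pod %s, annotations: %s", pod_name, annotations_for_logging_task_metadata(annotations)).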

@amoghrajesh (Contributor Author) commented May 19, 2023:

IMO, leaving the "<omitted>" annotations in would not look good in the logs when we do not have any annotations or do not have the logs metadata flag enabled. What do you think, @uranusjr?

Member:

Logs should prioritise simplicity over making things pretty.

Contributor Author:

Sure, let me address this comment in that case.

@amoghrajesh amoghrajesh requested a review from uranusjr May 19, 2023 08:02
@amoghrajesh (Contributor Author)

@uranusjr I have addressed your review comments; could you do another round of review when you have some time?

@amoghrajesh amoghrajesh reopened this May 22, 2023
@amoghrajesh (Contributor Author) commented May 22, 2023

Closed and reopened the pull request to re-launch the entire test suite.

@amoghrajesh amoghrajesh reopened this May 22, 2023
Comment on lines 1169 to 1201
def test_annotations_for_logging_task_metadata(self):
    annotations_test = {
        "dag_id": "dag",
        "run_id": "run_id",
        "task_id": "task",
        "try_number": "1",
    }
    with mock.patch.dict(
        os.environ, {"AIRFLOW__KUBERNETES_EXECUTOR__LOGS_TASK_METADATA": "True"}, clear=True
    ):
        expected_annotations = {
            "dag_id": "dag",
            "run_id": "run_id",
            "task_id": "task",
            "try_number": "1",
        }
        annotations_actual = annotations_for_logging_task_metadata(annotations_test)
        assert annotations_actual == expected_annotations

def test_annotations_for_logging_task_metadata_fallback(self):
    annotations_test = {
        "dag_id": "dag",
        "run_id": "run_id",
        "task_id": "task",
        "try_number": "1",
    }
    with mock.patch.dict(
        os.environ, {"AIRFLOW__KUBERNETES_EXECUTOR__LOGS_TASK_METADATA": "False"}, clear=True
    ):
        expected_annotations = "<omitted>"
        annotations_actual = annotations_for_logging_task_metadata(annotations_test)
        assert annotations_actual == expected_annotations

Contributor Author:

These tests seem to work fine in my environment; I'm not sure how or why they are failing in the CI. Any hints, @hussein-awala?

Member:

Instead of using mock.patch, it's more reliable to use the conf_vars helper. You also want to patch both the true and false cases, because it's not deterministic in the CI which value the config is set to when the test is run.
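
A minimal sketch of what that could look like with conf_vars (the import paths and the module for the helper under test are assumptions based on the files touched in this PR and may differ in other versions):

    from tests.test_utils.config import conf_vars

    from airflow.kubernetes.kubernetes_helper_functions import annotations_for_logging_task_metadata

    ANNOTATIONS = {"dag_id": "dag", "task_id": "task", "run_id": "run_id", "try_number": "1"}

    @conf_vars({("kubernetes_executor", "logs_task_metadata"): "True"})
    def test_annotations_included_when_flag_enabled():
        assert annotations_for_logging_task_metadata(ANNOTATIONS) == ANNOTATIONS

    @conf_vars({("kubernetes_executor", "logs_task_metadata"): "False"})
    def test_annotations_omitted_when_flag_disabled():
        assert annotations_for_logging_task_metadata(ANNOTATIONS) == "<omitted>"

Note that if the config lookup behind the helper is cached, the cache also has to be cleared between tests, as discussed below.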

Contributor Author:

@uranusjr I didn't quite get what you meant by "You also want to patch both the true and false cases because it's not deterministic in the CI which the config is set to when the test is run."

I have patched both cases: True on line 1177 and False on line 1196.

@amoghrajesh (Contributor Author)

I tried using conf_vars instead, and the tests pass in my dev setup, but I'm not sure why they consistently fail here.

@uranusjr @hussein-awala any tips?

@uranusjr (Member) commented May 23, 2023

When a function is decorated with cache, it is only executed once, and later calls automatically reuse the previous result. This means that after the first test (whichever that is) runs, the other test receives the wrong, cached result. You need to clear the cache; search for cache_clear() in the tests directory for examples.
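
For example, something along these lines (a sketch; the module path for get_logs_task_metadata is assumed from the helper file touched in this PR):

    import pytest

    from airflow.kubernetes.kubernetes_helper_functions import get_logs_task_metadata

    @pytest.fixture(autouse=True)
    def clear_logs_task_metadata_cache():
        # Reset the cached config lookup before and after each test so every
        # case re-reads the patched [kubernetes_executor] logs_task_metadata
        # value instead of reusing whatever the first test cached.
        get_logs_task_metadata.cache_clear()
        yield
        get_logs_task_metadata.cache_clear()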

@amoghrajesh (Contributor Author)

Thank you for that pointer, @uranusjr! Great help.
Pushing a commit with that included.

@amoghrajesh (Contributor Author)

I was able to fix the tests here. @hussein-awala / @uranusjr, can you help with merging this PR?

Thanks!

@hussein-awala hussein-awala merged commit 64b0872 into apache:main May 25, 2023
@hussein-awala hussein-awala added this to the Airflow 2.7.0 milestone May 25, 2023
@hussein-awala hussein-awala added the type:new-feature Changelog: New Features label May 25, 2023
Labels: area:Scheduler, provider:cncf-kubernetes, type:new-feature
Development

Successfully merging this pull request may close these issues.

Add task name, DAG name, try_number, and run_id to all Kubernetes executor logs
6 participants