-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding more information to kubernetes executor logs #29929
Conversation
@jedcunningham @hussein-awala @hterik want to hear some opinions about this.
|
seems that "rebase" from UI - messed with the change (first time see smth like that) - can yoy please rebase your local change again manually @amoghrajesh and re-push it ? |
6b86e8f
to
b56c622
Compare
airflow/config_templates/config.yml
Outdated
@@ -2671,6 +2671,13 @@ kubernetes_executor: | |||
type: string | |||
example: '{ "total": 3, "backoff_factor": 0.5 }' | |||
default: "" | |||
detailed_executor_logs: | |||
description: | | |||
Flag to have more details in kubernetes executor pod logs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be more specific than “more details”? What details are they? Why/when would people not want those details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uranusjr thanks for the review. All the logs present under kubernetes executor do not contain a lot of data needed to have the best debugging experience.
Example:
When trying to trace the lifecycle of a task in the kubernetes executor, you currently must search first for the name of the pod created by the task, then search for the pod name in the logs. This means you need to be pretty familiar with the structure of the scheduler logs in order to search effectively for the lifecycle of a task that had a problem.
Some log statements like Attempting to finish pod do have the annotations for the pod, which include dag name, task name, and run_id, but others do not. For instance, Event: podname-a2f2c1ac706 had an event of type DELETED has no such annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will rename the flag to something more meaningful like: traceable-executor-logs
. What do you think @uranusjr?
Or any other alternative thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about logs_task_metadata
? if you look to the added information, they describes the task and not the executor. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point. I agree with you @hussein-awala.
I will make the changes for this
if get_logs_task_metadata(): | ||
self.log.info( | ||
"Event: Failed to start pod %s, annotations: %s", pod_name, annotations_string | ||
) | ||
else: | ||
self.log.info("Event: Failed to start pod %s", pod_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder instead of doing if-else everywhere, if it’s better to simply do something like
if get_logs_task_metadata():
annotations_for_logging = annotations_to_str(annotations)
else:
annotations_for_logging = "<omitted>"
and always add the annotations: %s
part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe do this if-else in annotations_to_st
directly? (If we give this function a better name.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo, leaving the omitted annotations would not look good in the logs when we do not have any annotations or the logs metadata flag enabled. What do you think? @uranusjr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logs should prioritise simplcity over making things pretty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me address this comment in that case.
@uranusjr I have addressed your review comments, can you help in another round of review when you have some time? |
Closed pull requests to re launch the entire test suite |
def test_annotations_for_logging_task_metadata(self): | ||
annotations_test = { | ||
"dag_id": "dag", | ||
"run_id": "run_id", | ||
"task_id": "task", | ||
"try_number": "1", | ||
} | ||
with mock.patch.dict( | ||
os.environ, {"AIRFLOW__KUBERNETES_EXECUTOR__LOGS_TASK_METADATA": "True"}, clear=True | ||
): | ||
expected_annotations = { | ||
"dag_id": "dag", | ||
"run_id": "run_id", | ||
"task_id": "task", | ||
"try_number": "1", | ||
} | ||
annotations_actual = annotations_for_logging_task_metadata(annotations_test) | ||
assert annotations_actual == expected_annotations | ||
|
||
def test_annotations_for_logging_task_metadata_fallback(self): | ||
annotations_test = { | ||
"dag_id": "dag", | ||
"run_id": "run_id", | ||
"task_id": "task", | ||
"try_number": "1", | ||
} | ||
with mock.patch.dict( | ||
os.environ, {"AIRFLOW__KUBERNETES_EXECUTOR__LOGS_TASK_METADATA": "False"}, clear=True | ||
): | ||
expected_annotations = "<omitted>" | ||
annotations_actual = annotations_for_logging_task_metadata(annotations_test) | ||
assert annotations_actual == expected_annotations | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests seem to work fine in my environment. Not sure how/why these are failing in the CI. Any hints @hussein-awala ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using mock.patch
, it’s more reliable to use the conf_vars
helper. You also want to patch both the true and false cases because it’s not deterministic in the CI which the config is set to when the test is run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uranusjr I didn't quite get what you meant by You also want to patch both the true and false cases because it’s not deterministic in the CI which the config is set to when the test is run.
I have patched both the test cases, True on line 1177 and False on line 1196
I tried using conf_vars instead and the tests pass for me in my dev setup but not sure why it consistently fails here. @uranusjr @hussein-awala any tips? |
When a function is decorated with |
Thank you for that pointer, @uranusjr! Great help. |
I was able to fix the tests here, @hussein-awala / @uranusjr can you help in merging this PR? Thanks! |
Kubernetes executor logs do not contain a lot of information that can be back tracked to narrow down the issue. To make debugging easier, this PR adds the key annotations to some logger lines which can be useful to debug issues quicker.
The annotations contain crucial information regarding a k8s log, for ex the structure can involve:
closes: #18329
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.