New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-47602][CORE][K8S][FOLLOWUP] Improve structure logging for isExecutorIdleTimedOut #45849
base: master
Are you sure you want to change the base?
Conversation
…ecutorIdleTimedOut
@@ -97,20 +97,23 @@ trait Logging { | |||
} | |||
|
|||
implicit class LogStringContext(val sc: StringContext) { | |||
def log(args: MDC*): MessageWithContext = { | |||
def log(args: Any*): MessageWithContext = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gengliangwang I'm not sure if there were discussions on the following case:
logInfo(log"MDC(LogKey.ABC, some_string) ${an_object} something")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the above case has already been discussed
#45813
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, great, let hold on this one for a while
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's make every variable MDC for now. We can decide which context keys to show in the context by default once we got all the log keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing a log4j configuration will be much easier than debating in logging entries of the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make every variable MDC for now
if so, this might not be a good case for the 1st round migration, IMO it's critical to inject a string variable(a large JSON-like pod spec) to the error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pan3793 ok, we can skip this one
@@ -31,7 +31,8 @@ object LogKey extends Enumeration { | |||
val MAX_SIZE = Value | |||
val MIN_SIZE = Value | |||
val REMOTE_ADDRESS = Value | |||
val POD_ID = Value | |||
val KUBERNETES_NAMESPACE = Value | |||
val KUBERNETES_POD_NAME = Value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some thoughts for the KUBERNETES_
prefix, POD_ID
should be identical but NAMESPACE
does not.
for CONTAINER_ID
, maybe we should change it to YARN_CONTAINER_ID
too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KUBERNETES
-> K8S
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I mentioned this in #45862
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for providing a Guideline
logError(log"Cannot get the creationTimestamp of the pod " + | ||
log"${MDC(LogKey.KUBERNETES_POD_NAME, state.pod.getMetadata.getName)} in namespace " + | ||
log"${MDC(LogKey.KUBERNETES_NAMESPACE, state.pod.getMetadata.getNamespace)}. " + | ||
log"Resource details: ${state.pod}", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will state.pod
include state.pod.getMetadata
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, state.pod
will output a large JSON-like string, it is not suitable for LogKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pod
public String toString() {
return "Pod(apiVersion=" + this.getApiVersion() + ", kind=" + this.getKind() + ", metadata=" + this.getMetadata() + ", spec=" + this.getSpec() + ", status=" + this.getStatus() + ", additionalProperties=" + this.getAdditionalProperties() + ")";
}
ObjectMeta
public String toString() {
return "ObjectMeta(annotations=" + this.getAnnotations() + ", creationTimestamp=" + this.getCreationTimestamp() + ", deletionGracePeriodSeconds=" + this.getDeletionGracePeriodSeconds() + ", deletionTimestamp=" + this.getDeletionTimestamp() + ", finalizers=" + this.getFinalizers() + ", generateName=" + this.getGenerateName() + ", generation=" + this.getGeneration() + ", labels=" + this.getLabels() + ", managedFields=" + this.getManagedFields() + ", name=" + this.getName() + ", namespace=" + this.getNamespace() + ", ownerReferences=" + this.getOwnerReferences() + ", resourceVersion=" + this.getResourceVersion() + ", selfLink=" + this.getSelfLink() + ", uid=" + this.getUid() + ", additionalProperties=" + this.getAdditionalProperties() + ")";
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gengliangwang
I'm not sure if the current modification violates the original intention of the log
, perhaps we just need a simple POD_ID
-> POD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes a simple POD_ID -> POD sounds good. Having the extra POD name nad namespace in the log for quick search is also fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the K8s docs and found there is a UID concept, but TBH, it's rare to use it ...
Each object in your cluster has a Name that is unique for that type of resource. Every Kubernetes object also has a UID that is unique across your whole cluster.
https://kubernetes.io/docs/concepts/overview/working-with-objects/names/
the most common cases to identify a pod are
kubectl get pod <pod-name> --namespace <namespace>
kubectl describe pod <pod-name> --namespace <namespace>
What changes were proposed in this pull request?
Improve structure logging for isExecutorIdleTimedOut
Why are the changes needed?
#45808 (comment)
Does this PR introduce any user-facing change?
Yes, it improves error log message.
How was this patch tested?
Pass GA
Was this patch authored or co-authored using generative AI tooling?
No