New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-22763][Core]SHS: Ignore unknown events and parse through the file #19953
Conversation
logWarning(s"Dropped incompatible Structured Streaming log: $currentLine") | ||
case _: ClassNotFoundException | _: UnrecognizedPropertyException => | ||
// Ignore unknown events or unrecognized properties, parse through the event log file. | ||
logWarning(s"Drop incompatible event log: $currentLine") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the case before, but isn't this going to spam the SHS log? Perhaps only log this once per log file being replayed, or per class, or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it configurable?
I think both behaviors make sense in different use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe log subsequent errors at debug level, instead of adding a new config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just afraid some events are not correctly parsed and then silently ignored.
What I am proposing here is not related to the log level. I think we can have something like, zero tolerance policy, event level, and file level with a whitelist of events we can ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're suggesting to restore the previous version that had a whitelist and extend it so that the behavior is configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can extend the existing whitelist mechanism and make it configurable too. At the same time, we can make the tolerance level configurable. Just want to make it more configurable. Also keep it strict during the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just log it once and saying unrecognized event found in file xxx
Test build #84779 has finished for PR 19953 at commit
|
@vanzin @gatorsmile @cloud-fan Thanks for the comments. |
LGTM |
LGTM pending Jenkins |
Test build #84832 has finished for PR 19953 at commit
|
Test build #84833 has finished for PR 19953 at commit
|
retest this please |
Test build #84844 has finished for PR 19953 at commit
|
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Test build #84853 has finished for PR 19953 at commit
|
retest this please |
Test build #84868 has finished for PR 19953 at commit
|
Thanks! Merged to master. |
What changes were proposed in this pull request?
While spark code changes, there are new events in event log: #19649
And we used to maintain a whitelist to avoid exceptions: #15663
Currently Spark history server will stop parsing on unknown events or unrecognized properties. We may still see part of the UI data.
For better compatibility, we can ignore unknown events and parse through the log file.
How was this patch tested?
Unit test