New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "[SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for StreamingQueryProgressWrapper" #39416
Conversation
…StreamingQueryProgressWrapper" This reverts commit 915e9c6.
I am running end-to-end tests from #39415 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a clean revert, right?
@dongjoon-hyun yes it is |
timestamp = process.getTimestamp, | ||
batchId = process.getBatchId, | ||
batchDuration = process.getBatchDuration, | ||
durationMs = process.getDurationMsMap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about change line 70 to
durationMs = new JHashMap(process.getDurationMsMap)
then UISeleniumWithRocksDBBackendSuite
can pass. What are we worried about? The change performance is poor? Or other problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 4 ju.map
in the whole object..and for the string fields some of them are nullable...
I feel that it is risky for this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LuciferYang If you are confident about this one. Feel free to add the serializer again after the tests are merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I am fine to revert this one first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are risks. I think it is better to use default JSON serializer for this class for Spark 3.4.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM
@dongjoon-hyun @LuciferYang Thanks for the review. |
…ocessSummary ### What changes were proposed in this pull request? After revisiting #39416 and #39623, I propose: * checking nullability of all string fields to avoid NPE * using `optional string` for the protobuf definition of all string fields. If the deserialized result is None, then set the string field as null. Take `AccumulableInfo` as an example, it can be null on created: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L744 The null value can make difference in the UI logic: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala#L791 This PR updates the developer guide and introduces utility methods for serializing/deserializing string fields. It also handles null string values in AccumulableInfo and ProcessSummary for setting an example. ### Why are the changes needed? * update developer guide for better handling of null string values * add utility methods for future development of string serialization/deserialization * Properly handles null string values in AccumulableInfo and ProcessSummary for setting an example ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Test null string input values in AccumulableInfo and ProcessSummary protobuf serializer. Closes #39666 from gengliangwang/fixAcc. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Gengliang Wang <gengliang@apache.org>
What changes were proposed in this pull request?
This reverts commit 915e9c6.
Why are the changes needed?
When running end-to-end tests, there are 5 NPE errors from string fields:
After fixing them, there is following error:
The deserialized map
StreamingQueryProgress.durationMs
needs to be mutable.Give the StreamingQueryProgressWrapper contains nullable fields and mutable map, I suggest using the default JSON serailizer for this class.
Does this PR introduce any user-facing change?
No
How was this patch tested?
GA tests