Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "[SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for StreamingQueryProgressWrapper" #39416

Closed
wants to merge 1 commit into from

Conversation

gengliangwang
Copy link
Member

What changes were proposed in this pull request?

This reverts commit 915e9c6.

Why are the changes needed?

When running end-to-end tests, there are 5 NPE errors from string fields:

  • SourceProgress.latestOffset
  • SourceProgress.endOffset
  • SourceProgress.startOffset
  • StreamingQueryData.name
  • StreamingQueryProgress.name

After fixing them, there is following error:

java.lang.UnsupportedOperationException
	at java.base/java.util.Collections$UnmodifiableMap.remove(Collections.java:1460)
	at org.apache.spark.sql.streaming.ui.StreamingQueryStatisticsPage.$anonfun$generateStatTable$27(StreamingQueryStatisticsPage.scala:401)

The deserialized map StreamingQueryProgress.durationMs needs to be mutable.

Give the StreamingQueryProgressWrapper contains nullable fields and mutable map, I suggest using the default JSON serailizer for this class.

Does this PR introduce any user-facing change?

No

How was this patch tested?

GA tests

…StreamingQueryProgressWrapper"

This reverts commit 915e9c6.
@gengliangwang
Copy link
Member Author

@gengliangwang
Copy link
Member Author

I am running end-to-end tests from #39415

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clean revert, right?

@gengliangwang
Copy link
Member Author

@dongjoon-hyun yes it is

timestamp = process.getTimestamp,
batchId = process.getBatchId,
batchDuration = process.getBatchDuration,
durationMs = process.getDurationMsMap,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about change line 70 to

durationMs = new JHashMap(process.getDurationMsMap)

then UISeleniumWithRocksDBBackendSuite can pass. What are we worried about? The change performance is poor? Or other problems?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 4 ju.map in the whole object..and for the string fields some of them are nullable...
I feel that it is risky for this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LuciferYang If you are confident about this one. Feel free to add the serializer again after the tests are merged

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I am fine to revert this one first

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are risks. I think it is better to use default JSON serializer for this class for Spark 3.4.0

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

@gengliangwang
Copy link
Member Author

@dongjoon-hyun @LuciferYang Thanks for the review.
Merging to master

gengliangwang added a commit that referenced this pull request Jan 20, 2023
…ocessSummary

### What changes were proposed in this pull request?

After revisiting #39416 and #39623, I propose:
* checking nullability of all string fields to avoid NPE
* using `optional string` for the protobuf definition of all string fields. If the deserialized result is None, then set the string field as null.

Take `AccumulableInfo` as an example, it can be null on created: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L744
The null value can make difference in the UI logic: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala#L791

This PR updates the developer guide and introduces utility methods for serializing/deserializing string fields. It also handles null string values in AccumulableInfo and ProcessSummary for setting an example.
### Why are the changes needed?

* update developer guide for better handling of null string values
* add utility methods for future development of string serialization/deserialization
* Properly handles null string values in AccumulableInfo and ProcessSummary for setting an example

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test null string input values in AccumulableInfo and ProcessSummary protobuf serializer.

Closes #39666 from gengliangwang/fixAcc.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants