[HUDI-5484] Avoid using GenericRecord in HoodieColumnStatMetadata #7573
Merged
alexeykudinkin merged 7 commits into apache:master on Jan 9, 2023
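Contributor
Author
@hudi-bot run azure
Contributor
Author
AVRO-2377 (Avro 1.9.2) changed the type of FIELD_RESERVED to Collections.unmodifiableSet. SPARK-27733 and SPARK-34778 (Spark 3.2.0) upgraded the Avro version from 1.8.2 to 1.10.2. As a result, Hudi may encounter UnsupportedOperationException when running on Spark 3.2.0 or later.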
cxzl25 commented Jan 3, 2023
}

@Test
public void testSerHoodieMetadataPayload() throws IOException {
Contributor
Author
mvn test -Punit-tests -pl hudi-common -am -B -DfailIfNoTests=false -Dtest=TestSerializationUtils -Pspark3.2
Contributor
@cxzl25 please update the issue with the description of the root-cause as well
}

@Test
public void testSerHoodieMetadataPayload() throws IOException {
Contributor
Let's move this test to hudi-spark module to make sure it's being run against every Spark version
.setColumnName((String) columnStatsRecord.get(COLUMN_STATS_FIELD_COLUMN_NAME))
.setMinValue(columnStatsRecord.get(COLUMN_STATS_FIELD_MIN_VALUE))
.setMaxValue(columnStatsRecord.get(COLUMN_STATS_FIELD_MAX_VALUE))
.setMinValue(wrapStatisticValue(unwrapStatisticValueWrapper(columnStatsRecord.get(COLUMN_STATS_FIELD_MIN_VALUE))))
Contributor
Let's add a comment explaining why we need to do that here
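To illustrate why the re-wrapping in the line above might be needed, here is a minimal, self-contained sketch. It assumes the stats wrapper can arrive either as a typed wrapper object or as a generic, map-like record (a stand-in for GenericRecord), and it normalizes both to the typed form by unwrapping to a plain Java value and wrapping again. RewrapSketch, LongWrapper, unwrap, wrap and normalize are hypothetical names used only for illustration; they are not Hudi's wrapStatisticValue or unwrapStatisticValueWrapper.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: all names here are hypothetical and do not mirror Hudi's classes.
class RewrapSketch {

    // Stand-in for a generated, strongly typed wrapper class.
    static final class LongWrapper {
        final long value;

        LongWrapper(long value) {
            this.value = value;
        }
    }

    // Accepts a typed wrapper or a generic map-like record (assumed to hold
    // its payload under the key "value") and returns the plain Java value.
    static Object unwrap(Object wrapper) {
        if (wrapper == null) {
            return null;
        }
        if (wrapper instanceof LongWrapper) {
            return ((LongWrapper) wrapper).value;
        }
        if (wrapper instanceof Map) {
            return ((Map<?, ?>) wrapper).get("value");
        }
        throw new IllegalArgumentException("Unsupported wrapper: " + wrapper.getClass());
    }

    // Wraps a plain Java value into the typed wrapper.
    static Object wrap(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Long) {
            return new LongWrapper((Long) value);
        }
        throw new IllegalArgumentException("Unsupported value: " + value.getClass());
    }

    // Unwrap then wrap again, so the result is always the typed form
    // regardless of how the input was represented.
    static Object normalize(Object wrapper) {
        return wrap(unwrap(wrapper));
    }

    public static void main(String[] args) {
        Map<String, Object> generic = new HashMap<>();
        generic.put("value", 42L);

        Object normalized = normalize(generic);
        System.out.println(normalized instanceof LongWrapper); // prints true
        System.out.println(((LongWrapper) normalized).value);  // prints 42
    }
}
```

In this sketch the generic representation never leaves normalize, so whatever is stored afterwards is the typed wrapper only.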
alexeykudinkin approved these changes Jan 9, 2023
Contributor
Thank you very much for fixing this @cxzl25!
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Jan 31, 2023
…apache#7573) Avoid using GenericRecord in ColumnStatMetadata. HoodieMetadataPayload is constructed using GenericRecord with reflection, and columnStatMetadata stores minValue and maxValue, both of which are GenericRecord types. Once a spill occurs, Kryo deserialization fails. Root cause: AVRO-2377 (Avro 1.9.2) changed the type of FIELD_RESERVED to Collections.unmodifiableSet. https://github.com/apache/avro/blame/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L483 SPARK-27733 and SPARK-34778 (Spark 3.2.0) upgraded the Avro version from 1.8.2 to 1.10.2. As a result, Hudi may encounter UnsupportedOperationException when running on Spark 3.2.0 or later.
nsivabalan pushed a commit to nsivabalan/hudi that referenced this pull request Mar 22, 2023
…apache#7573)
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
…apache#7573)
flashJd pushed a commit to flashJd/hudi that referenced this pull request May 5, 2023
…apache#7573) (cherry picked from commit 6727519)
Change Logs
Avoid using GenericRecord in ColumnStatMetadata.
HoodieMetadataPayload is constructed using GenericRecord with reflection, and columnStatMetadata stores minValue and maxValue, both of which are GenericRecord types. Once a spill occurs, Kryo deserialization fails.
Root cause
AVRO-2377 (Avro 1.9.2) changed the type of FIELD_RESERVED to Collections.unmodifiableSet.
https://github.com/apache/avro/blame/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L483
SPARK-27733 and SPARK-34778 (Spark 3.2.0) upgraded the Avro version from 1.8.2 to 1.10.2.
As a result, Hudi may encounter UnsupportedOperationException when running on Spark 3.2.0 or later.
Fail log
construct HoodieMetadataPayload
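The failure mode described under Root cause can be reproduced in miniature without Avro or Kryo. The sketch below assumes a deserializer that restores a collection by calling add() on an instance of the target type; that is an assumption about how a generic collection deserializer might behave, not a copy of Kryo's code. UnmodifiableSetDemo and canRepopulate are hypothetical names, and the sample strings are placeholders rather than the actual contents of FIELD_RESERVED.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: mimics a deserializer that rebuilds a collection via add().
class UnmodifiableSetDemo {

    // Hypothetical stand-in for a generic collection deserializer: it receives
    // an instance of the target type and repopulates it element by element.
    // Returns false if the target rejects modification.
    static boolean canRepopulate(Collection<String> target, Collection<String> elements) {
        try {
            for (String element : elements) {
                target.add(element);
            }
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> elements = new HashSet<>();
        Collections.addAll(elements, "sample-a", "sample-b");

        // A plain mutable set accepts add(), so repopulating it works.
        System.out.println(canRepopulate(new HashSet<>(), elements)); // prints true

        // A set produced by Collections.unmodifiableSet throws
        // UnsupportedOperationException on add().
        Set<String> readOnly = Collections.unmodifiableSet(new HashSet<>());
        System.out.println(canRepopulate(readOnly, elements)); // prints false
    }
}
```

This only shows the JDK behavior behind the exception type named above. Whether a given serializer hits it depends on how that serializer handles the unmodifiable wrapper.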
Impact
The bug causes write failures.