Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark: Remove ImmutableMap from SparkPartition for Kryo SerDe #3667

Merged

Conversation

kbendick
Copy link
Contributor

@kbendick kbendick commented Dec 4, 2021

This PR replaces #3597, as @dubeme has not responded about some minor changes and we need this patch prior to the upcoming 0.13.0 release.

I have added them as a co-author on this PR so that they still get credit for their contribution 😄

Issue #3586 shows situation where Kyro is unable to serialize the SparkPartition object (@rdblue points it out from the stack trace). This commit replaces the ImmutableMap with HashMap and adds a test for round trip Kryo serialization and Java serialization for SparkPartition objects.

I also tested this by running the add_files test suite with Kryo serialization enabled. The tests failed until this patch was applied. Unfortunately, I couldn't add a test there as the usual withSQLConf method did not pick up the change in spark.serializer.

Co-authored-by: Dubem Enyekwe dubem@dubemenyekwe.com

cc @rdblue @jackye1995 @dubeme

@github-actions github-actions bot added the spark label Dec 4, 2021
@kbendick kbendick added this to the Iceberg 0.13.0 Release milestone Dec 4, 2021
@kbendick kbendick force-pushed the kryo-serialization-on-spark-partition branch 2 times, most recently from 7308de4 to 396cfdb Compare December 4, 2021 02:15
@kbendick
Copy link
Contributor Author

kbendick commented Dec 4, 2021

Hmmm it seems like when I rebased, it might have removed the co-authorship from @dubeme.

Before we merge this in, can we please ensure that Co-authored-by: Dubem Enyekwe <dubem@dubemenyekwe.com> is included so they get credit?

I can update it on my end somehow if need be.

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is a one-line change, could you also fix the other versions?

@kbendick
Copy link
Contributor Author

kbendick commented Dec 5, 2021

Given this is a one-line change, could you also fix the other versions?

Sure thing. I almost did but I wasn’t sure where exactly we’re drawing the line. But this is small and if people need it, they need it for all versions (or can edit one or more out).

EDIT: I have covered and added basic tests for Spark 2.4, 3.0, and 3.1.

@kbendick kbendick changed the title Spark: Remove ImmutableMap from SparkPartition for Kryo SerDe (Spark 3.2) Spark: Remove ImmutableMap from SparkPartition for Kryo SerDe Dec 5, 2021
@kbendick kbendick force-pushed the kryo-serialization-on-spark-partition branch from 5d0a14b to 44af3f0 Compare December 5, 2021 04:18
Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, will merge on Monday if no one objects.

@rdblue rdblue merged commit 56839c8 into apache:master Dec 5, 2021
@rdblue
Copy link
Contributor

rdblue commented Dec 5, 2021

Thanks for getting this finished, @kbendick!

@dubeme
Copy link
Contributor

dubeme commented Dec 6, 2021

Thanks @kbendick for stepping in in my absence

ajantha-bhat pushed a commit to ajantha-bhat/iceberg that referenced this pull request Dec 6, 2021
Co-authored-by: Dubem Enyekwe <dubem@dubemenyekwe.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants