[SPARK-42137][CORE] Enable spark.kryo.unsafe by default #39679

Closed

@dongjoon-hyun (Member) commented Jan 20, 2023

What changes were proposed in this pull request?

This PR aims to enable spark.kryo.unsafe by default in Apache Spark 3.4. This configuration was introduced in Apache Spark 2.1.0 and has been in use since then.

val KRYO_USE_UNSAFE = ConfigBuilder("spark.kryo.unsafe")
  .version("2.1.0")
  .booleanConf
  .createWithDefault(false)

Note that this configuration is used only inside the KryoSerializer class, which is still not the default serializer. The new behavior takes effect only when the user sets KryoSerializer explicitly.
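For illustration, here is a minimal sketch of how a user might select KryoSerializer explicitly and, if desired, opt back out of the unsafe IO path. It assumes Spark is on the classpath; the object name and app name are hypothetical, and the explicit spark.kryo.unsafe line is optional.

```scala
import org.apache.spark.SparkConf

// Hypothetical example object; not part of this PR.
object KryoConfExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kryo-unsafe-example") // hypothetical app name
      // KryoSerializer must still be selected explicitly.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Optional: pin the value explicitly to opt out of unsafe-based IO.
      .set("spark.kryo.unsafe", "false")

    // Print the effective settings held by this SparkConf.
    println(conf.get("spark.serializer"))
    println(conf.get("spark.kryo.unsafe"))
  }
}
```

Dropping the spark.kryo.unsafe line leaves that setting at whatever default the running Spark version defines.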

Why are the changes needed?

To help users use KryoSerializer more easily: setting spark.serializer=org.apache.spark.serializer.KryoSerializer alone is enough, without also setting spark.kryo.unsafe=true.

Apache Spark already provides the benchmark results.

================================================================================================
Benchmark Kryo Unsafe vs safe Serialization
================================================================================================
OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Benchmark Kryo Unsafe vs safe Serialization:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
----------------------------------------------------------------------------------------------------------------------
basicTypes: Int with unsafe:true                        243           250          4        4.1        242.9      1.0X
basicTypes: Long with unsafe:true                       281           283          2        3.6        280.9      0.9X
basicTypes: Float with unsafe:true                      282           283          2        3.5        282.0      0.9X
basicTypes: Double with unsafe:true                     289           290          1        3.5        289.2      0.8X
Array: Int with unsafe:true                               3             3          0      343.7          2.9     83.5X
Array: Long with unsafe:true                              4             5          0      229.3          4.4     55.7X
Array: Float with unsafe:true                             3             3          0      343.5          2.9     83.5X
Array: Double with unsafe:true                            4             5          0      229.2          4.4     55.7X
Map of string->Double with unsafe:true                   36            37          0       27.7         36.1      6.7X
basicTypes: Int with unsafe:false                       306           309          4        3.3        306.0      0.8X
basicTypes: Long with unsafe:false                      323           325          1        3.1        323.3      0.8X
basicTypes: Float with unsafe:false                     299           300          1        3.3        299.1      0.8X
basicTypes: Double with unsafe:false                    313           315          1        3.2        313.4      0.8X
Array: Int with unsafe:false                             20            20          0       50.5         19.8     12.3X
Array: Long with unsafe:false                            29            30          0       34.1         29.4      8.3X
Array: Float with unsafe:false                            8             8          0      130.4          7.7     31.7X
Array: Double with unsafe:false                          13            13          0       75.0         13.3     18.2X
Map of string->Double with unsafe:false                  39            39          0       25.8         38.8      6.3X
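
The Relative column appears to be measured against the first row rather than pairing each unsafe:true case with its unsafe:false counterpart. As a rough reading aid, the sketch below computes the direct safe/unsafe ratio from the Per Row(ns) values in the table. The object and method names are hypothetical and not part of Spark's benchmark code.

```scala
// Hypothetical helper for reading the table above; not part of this PR.
object KryoUnsafeSpeedup {
  // (case name, Per Row(ns) with unsafe:true, Per Row(ns) with unsafe:false),
  // copied from the benchmark table.
  val perRowNs: Seq[(String, Double, Double)] = Seq(
    ("basicTypes: Int", 242.9, 306.0),
    ("basicTypes: Long", 280.9, 323.3),
    ("basicTypes: Float", 282.0, 299.1),
    ("basicTypes: Double", 289.2, 313.4),
    ("Array: Int", 2.9, 19.8),
    ("Array: Long", 4.4, 29.4),
    ("Array: Float", 2.9, 7.7),
    ("Array: Double", 4.4, 13.3),
    ("Map of string->Double", 36.1, 38.8)
  )

  // Speedup of unsafe over safe: safe time divided by unsafe time.
  def speedup(unsafeNs: Double, safeNs: Double): Double = safeNs / unsafeNs

  val speedups: Seq[(String, Double)] =
    perRowNs.map { case (name, unsafeNs, safeNs) => (name, speedup(unsafeNs, safeNs)) }

  def main(args: Array[String]): Unit =
    speedups.foreach { case (name, x) => println(f"$name%-24s $x%.2fx") }
}
```

On these numbers the gain is largest for primitive arrays (roughly 6.8x for Int arrays) and small for the map case (about 1.07x).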

Does this PR introduce any user-facing change?

No. The new default takes effect only when spark.serializer is set to KryoSerializer, and the default of spark.serializer is unchanged.

How was this patch tested?

Pass the CIs.

@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review January 20, 2023 22:22
@dongjoon-hyun (Member, Author) left a comment

Could you review this, @mridulm?

@mridulm (Contributor) left a comment

Looks good to me

@dongjoon-hyun (Member, Author)

Thank you so much, @mridulm!

@dongjoon-hyun (Member, Author) commented Jan 21, 2023

Merged to master for Apache Spark 3.4.0.
One PySpark pipeline seems to be slow, but that is unrelated to this PR, and it was verified here before.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-42137 branch January 21, 2023 01:36
@LuciferYang (Contributor)

late LGTM
