

@dongjoon-hyun (Member) commented Aug 7, 2025

What changes were proposed in this pull request?

This PR aims to use SparkStreamUtils.toString instead of Guava's CharStreams.toString.
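The diff itself is not shown here, so the following is only a minimal Scala sketch of the kind of call-site swap involved, under stated assumptions. `bytesToString` is a hypothetical byte-oriented helper; we assume, without confirmation from this PR, that SparkStreamUtils.toString works along these lines (read all bytes, decode once). `readerToString` is a hypothetical loop approximating the general shape of Guava's CharStreams.toString, which copies a Reader into a StringBuilder through a small CharBuffer. Neither name is a real Spark or Guava API, and UTF-8 content is assumed.

```scala
import java.io.{ByteArrayInputStream, InputStream, InputStreamReader, Reader}
import java.nio.CharBuffer
import java.nio.charset.StandardCharsets

// Hypothetical helpers for illustration only; not Spark's or Guava's actual code.
object StreamToStringSketch {

  // Byte-oriented approach (assumed): pull every byte in one call, then decode
  // the whole array to a String in a single pass. Assumes UTF-8 content.
  def bytesToString(in: InputStream): String =
    try new String(in.readAllBytes(), StandardCharsets.UTF_8)
    finally in.close()

  // Reader-based approach, roughly the shape of a CharStreams-style copy:
  // decode chunk by chunk into a reusable CharBuffer and append to a builder.
  def readerToString(r: Reader): String =
    try {
      val sb = new java.lang.StringBuilder
      val buf = CharBuffer.allocate(2048)
      while (r.read(buf) != -1) {
        buf.flip()     // switch the buffer from filling to draining
        sb.append(buf) // copy the decoded chars between position and limit
        buf.clear()    // reset the buffer for the next chunk
      }
      sb.toString
    } finally r.close()

  def main(args: Array[String]): Unit = {
    val payload = ("spark " * 100000).getBytes(StandardCharsets.UTF_8)
    // Before: wrap the stream in a Reader and copy through a char buffer.
    val before = readerToString(
      new InputStreamReader(new ByteArrayInputStream(payload), StandardCharsets.UTF_8))
    // After: hand the raw InputStream to a byte-oriented helper.
    val after = bytesToString(new ByteArrayInputStream(payload))
    println(before == after)
  }
}
```

For valid UTF-8 input both paths return the same text, so `main` prints `true`; the sketch only shows that the swap preserves the result, not which path is faster.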

Why are the changes needed?

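SparkStreamUtils.toString is faster than CharStreams.toString. In the following Scala REPL measurement, reading the same file (/tmp/1G.bin, 1,073,741,824 characters) takes 322 ms with SparkStreamUtils.toString versus 533 ms with CharStreams.toString.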

scala> spark.time(org.apache.spark.util.SparkStreamUtils.toString(new java.io.FileInputStream("/tmp/1G.bin")).length)
Time taken: 322 ms
val res0: Int = 1073741824

scala> spark.time(com.google.common.io.CharStreams.toString(new java.io.InputStreamReader(java.nio.file.Files.newInputStream(java.nio.file.Path.of("/tmp/1G.bin")))).length)
Time taken: 533 ms
val res1: Int = 1073741824
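The two timings above can be turned into a speedup and a throughput figure. This is plain arithmetic on the two reported numbers from a single run (each reading $2^{30} = 1{,}073{,}741{,}824$ characters), not an additional measurement, so it carries the same run-to-run noise as the original timings.

```latex
\begin{aligned}
\text{speedup} &= \frac{533\ \text{ms}}{322\ \text{ms}} \approx 1.66 \\
\text{time saved} &= \frac{533 - 322}{533} = \frac{211}{533} \approx 39.6\% \\
\text{throughput}_{\text{SparkStreamUtils}} &= \frac{2^{30}\ \text{chars}}{0.322\ \text{s}} \approx 3.11 \times 2^{30}\ \text{chars/s} \\
\text{throughput}_{\text{CharStreams}} &= \frac{2^{30}\ \text{chars}}{0.533\ \text{s}} \approx 1.88 \times 2^{30}\ \text{chars/s}
\end{aligned}
```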

Does this PR introduce any user-facing change?

No behavior change.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun (Member Author)

Could you review this test PR, @peter-toth?

@dongjoon-hyun (Member Author)

Thank you always, @peter-toth!

@dongjoon-hyun (Member Author)

Merged to master for Apache Spark 4.1.0.

@dongjoon-hyun deleted the SPARK-53179 branch August 7, 2025 20:01