
[SPARK-40912][CORE] Overhead of Exceptions in KryoDeserializationStream #38428

Closed
wants to merge 16 commits into from

Conversation

eejbyfeldt
Contributor

What changes were proposed in this pull request?

This PR avoids using exceptions in the implementation of KryoDeserializationStream.

Why are the changes needed?

Using an exception to signal end of stream is slow, especially for small streams. It is also problematic because the exception caught in KryoDeserializationStream could also be caused by corrupt data, which would simply be ignored in the current implementation.

Does this PR introduce any user-facing change?

Yes, some methods on KryoDeserializationStream no longer raise EOFException.

How was this patch tested?

Existing tests.

This PR only changes KryoDeserializationStream as a proof of concept. If this is the direction we want to go, we should probably change DeserializationStream instead so that the interface is consistent.
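
For illustration, a minimal sketch of the underlying idea, using plain java.io rather than Spark's or Kryo's actual classes; the PeekableStream name and the peek-based probe are assumptions for the example, not the PR's diff. End of stream is detected with an explicit check instead of attempting a read and translating the resulting exception into EOFException.

import java.io.{EOFException, InputStream, PushbackInputStream}

class PeekableStream(underlying: InputStream) {
  private val in = new PushbackInputStream(underlying)

  // Cheap EOF probe: read one byte and push it back if the stream is not empty.
  def hasNext: Boolean = {
    val b = in.read()
    if (b == -1) {
      false
    } else {
      in.unread(b)
      true
    }
  }

  // With hasNext checked up front, hitting EOF here is a genuine error rather
  // than the expected way of ending iteration.
  def readByteOrFail(): Int = {
    val b = in.read()
    if (b == -1) throw new EOFException("unexpected end of stream")
    b
  }
}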

@AmplabJenkins

Can one of the admins verify this patch?

Contributor
@mridulm mridulm left a comment

The change itself looks promising, thanks for working on it @eejbyfeldt !
Given DeserializationStream is a public API, I would want to be more conservative in changes to it. Let us see how the proposal develops.

final override def asKeyValueIterator: Iterator[(Any, Any)] = new NextIterator[(Any, Any)] {
  override protected def getNext() = {
    if (KryoDeserializationStream.this.hasNext) {
      (readKey[Any](), readValue[Any]())
Contributor

Given we are fixing this, should we not make assumptions that if the key is present, the value will be as well?

Contributor Author

You mean that if only a key exists we just ignore it, like the current implementation would?

Contributor
@mridulm mridulm Nov 2, 2022

Or potentially do something better.

  if (hasNext) {
    val key = readKey()
    if (hasNext) {
      return (key, readValue())
    }
  }

But given this is enough of a corner case, I would consider this change mostly a nit.

case e: KryoException
    if e.getMessage.toLowerCase(Locale.ROOT).contains("buffer underflow") =>
  throw new EOFException
}
Contributor

Preserve this even with the proposed change of checking EOF, to continue catching cases where EOF is encountered prematurely? This would mainly handle abnormal cases, instead of the common case.

Contributor Author

Sure, will add it back. I think that catching and ignoring the exceptions here should be revisited in some other change, as it seems to me like it could cause data loss if we just assume the exception here means EOF.
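
For illustration, a rough sketch of how the two mechanisms could coexist; the class name and the stubbed hasNext are placeholders, not the actual change. Iteration stops cleanly via hasNext, while a "buffer underflow" KryoException from a truncated payload is still surfaced as EOFException.

import java.io.EOFException
import java.util.Locale

import scala.reflect.ClassTag

import com.esotericsoftware.kryo.{Kryo, KryoException}
import com.esotericsoftware.kryo.io.Input

class DeserStreamSketch(kryo: Kryo, input: Input) {
  // Assumed to be backed by a check on the underlying input; stubbed here.
  def hasNext: Boolean = ???

  def readObject[T: ClassTag](): T = {
    try {
      kryo.readClassAndObject(input).asInstanceOf[T]
    } catch {
      // Abnormal case: the payload was truncated mid-object.
      case e: KryoException
          if e.getMessage.toLowerCase(Locale.ROOT).contains("buffer underflow") =>
        throw new EOFException
    }
  }
}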

Contributor

Agree. We should investigate that - but let us do it separately from this PR, since this change will be beneficial even without that.

@eejbyfeldt eejbyfeldt changed the title [SPARK-40912][CORE][WIP] Overhead of Exceptions in DeserializationStream [SPARK-40912][CORE][WIP] Overhead of Exceptions in KryoDeserializationStream Nov 1, 2022
@mridulm
Contributor

mridulm commented Nov 2, 2022

The PR as such looks reasonable to me - can we add a test to explicitly test for EOF behavior ?

+CC @JoshRosen who had worked on this in the distant past :-)
+CC @Ngone51

I want to make sure there are more eyes on this change.

@mridulm
Contributor

mridulm commented Jan 10, 2023

Want to see if we can make this for 3.4 - more eyes on it would be good.

+CC @Ngone51, @srowen, @dongjoon-hyun

@dongjoon-hyun
Member

Thank you for pinging me, @mridulm .
Also, cc @sunchao too.

Member
@dongjoon-hyun dongjoon-hyun left a comment

Could you rebase onto the master branch and run the microbenchmarks?

We run the benchmarks via GitHub Actions. Please see the "Running benchmarks in your forked repository" section of our developer guide, @eejbyfeldt.

@github-actions github-actions bot removed the DSTREAM label Jan 11, 2023
@eejbyfeldt eejbyfeldt changed the title [SPARK-40912][CORE][WIP] Overhead of Exceptions in KryoDeserializationStream [SPARK-40912][CORE]Overhead of Exceptions in KryoDeserializationStream Jan 11, 2023
@eejbyfeldt
Contributor Author

eejbyfeldt commented Jan 11, 2023

So I ran the benchmark:

================================================================================================
Benchmark Kryo Unsafe vs safe Serialization
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark Kryo Unsafe vs safe Serialization:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
basicTypes: Int with unsafe:true                       224            233           9          4.5         223.9       1.0X
basicTypes: Long with unsafe:true                      253            255           3          4.0         252.7       0.9X
basicTypes: Float with unsafe:true                     237            240           3          4.2         237.1       0.9X
basicTypes: Double with unsafe:true                    237            240           3          4.2         237.1       0.9X
Array: Int with unsafe:true                              4              5           0        235.5           4.2      52.7X
Array: Long with unsafe:true                             7              7           0        149.6           6.7      33.5X
Array: Float with unsafe:true                            4              4           0        247.3           4.0      55.4X
Array: Double with unsafe:true                           7              7           0        143.0           7.0      32.0X
Map of string->Double  with unsafe:true                 41             41           1         24.4          41.0       5.5X
basicTypes: Int with unsafe:false                      257            260           4          3.9         256.7       0.9X
basicTypes: Long with unsafe:false                     279            281           2          3.6         279.1       0.8X
basicTypes: Float with unsafe:false                    252            256           3          4.0         251.9       0.9X
basicTypes: Double with unsafe:false                   260            261           2          3.8         260.0       0.9X
Array: Int with unsafe:false                            24             25           0         41.3          24.2       9.2X
Array: Long with unsafe:false                           33             34           0         30.0          33.3       6.7X
Array: Float with unsafe:false                           9              9           0        109.3           9.2      24.5X
Array: Double with unsafe:false                         16             16           0         63.3          15.8      14.2X
Map of string->Double  with unsafe:false                42             43           1         23.7          42.2       5.3X

This seems to be within the stdev of what we have on the master branch, which is expected since this benchmark does not use the interface that uses the deserialization stream iterators. (And it used the same CPU.)

For KryoSerializerBenchmark the branch had:

================================================================================================
Benchmark KryoPool vs old"pool of 1" implementation
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark KryoPool vs old"pool of 1" implementation:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------
KryoPool:true                                                 8289          10867         NaN          0.0    16577450.0       1.0X
KryoPool:false                                               12592          15035         NaN          0.0    25184133.2       0.7X

This is slower than what we have on master:

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Benchmark KryoPool vs old"pool of 1" implementation:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------
KryoPool:true                                                 7098           8972         NaN          0.0    14196810.5       1.0X
KryoPool:false                                               10232          11945         744          0.0    20464754.5       0.7X

But it ran on a different (slower) CPU than master. I also ran the benchmark on current master:

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Benchmark KryoPool vs old"pool of 1" implementation:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------
KryoPool:true                                                 9646          13191         NaN          0.0    19292739.3       1.0X
KryoPool:false                                               14323          17933         433          0.0    28645212.3       0.7X

There it was even slower since it used an even slower CPU.

Is there some way to better control the CPU used? Or should I just run the benchmark a couple of times?

@eejbyfeldt
Contributor Author

I ran the master branch again and used an executor with the same CPU.

OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark KryoPool vs old"pool of 1" implementation:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------------
KryoPool:true                                                 9375          12171         NaN          0.0    18750400.9       1.0X
KryoPool:false                                               13849          16799         NaN          0.0    27697646.0       0.7X

Based on this it looks like the branch might be a bit faster. But I think it might also be in noise territory, and one would need a more specific benchmark that creates a lot of small streams for the difference to show up. I think it is only expected to be on the order of a percent better in the "worst case" where we are creating lots of small streams.

@dongjoon-hyun
Member

Thank you for sharing the results. Without a clear win, it's hard for us to accept this proposal because this is one of the crucial parts.

  • Could you add a benchmark for your specific cases (lots of small streams)?
  • If there is no regression in the existing benchmarks, your new benchmark can provide us with more explicit evidence of this PR's contribution and help us build a consensus on this direction.

@eejbyfeldt
Contributor Author

The PR as such looks reasonable to me - can we add a test to explicitly test for EOF behavior ?

@mridulm I added a spec for this in: 77e616a

Could you add a benchmark for your specific cases (lots of small streams)?

Added a benchmark that shows there is overhead in using asIterator.toArray compared to just reading the expected number of elements on current master, and that this overhead goes away in this branch. Added results from master with the benchmark added (this branch: https://github.com/eejbyfeldt/spark/tree/SPARK-40912-only-adding-benchmark) in 7580633 and then overwrote them with results from this branch in bc011c6
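
For illustration, a rough sketch of the two paths such a benchmark compares, written against Spark's DeserializationStream interface; the helper names are assumptions for the example, not the benchmark's actual code.

import org.apache.spark.serializer.DeserializationStream

object IteratorVsDirectSketch {
  // Drain the stream through the iterator interface (relies on EOF detection).
  def readViaIterator(in: DeserializationStream): Array[Any] =
    in.asIterator.toArray

  // Read a known number of elements directly; no EOF detection needed.
  def readDirectly(in: DeserializationStream, expectedCount: Int): Seq[Any] =
    (0 until expectedCount).map(_ => in.readObject[Any]())
}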

@eejbyfeldt eejbyfeldt requested review from mridulm and dongjoon-hyun and removed request for mridulm and dongjoon-hyun January 16, 2023 12:15
@srowen
Member

srowen commented May 5, 2023

Looks OK to me; @mridulm ?

Member
@dongjoon-hyun dongjoon-hyun left a comment

Could you run the benchmark once more? The generated files look wrong to me because the Java version is downgraded.

- OpenJDK 64-Bit Server VM 11.0.18+10 on Linux 5.15.0-1031-azure
- Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+ OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1031-azure
+ Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz

According to the document, 11.0.18+10 (default) is supposed to be there.

@mridulm
Contributor

mridulm commented May 6, 2023

Looks fine to me.
We can merge once @dongjoon-hyun's comment is addressed.

@eejbyfeldt
Contributor Author

My plan was to update the benchmarks, but I did not get around to uploading the results until today. The branch should now be updated with an up-to-date run.

@srowen srowen closed this in 4def99d May 10, 2023
@srowen
Member

srowen commented May 10, 2023

Merged to master

val name = "Benchmark of kryo asIterator on deserialization stream"
runBenchmark(name) {
  val benchmark = new Benchmark(name, N, 10, output = output)
  Seq(true, false).map(useIterator => run(useIterator, benchmark))
Contributor

nit: should use .foreach instead of .map

val elements = Array.fill[T](elementCount)(createElement)

benchmark.addCase(
  s"Colletion of $name with $elementCount elements, useIterator: $useIterator") { _ =>
Contributor

Typo: Colletion -> Collection

    useIterator: Boolean,
    ser: SerializerInstance): Int = {
  val serialized: Array[Byte] = {
    val baos = new ByteArrayOutputStream()
Contributor

The initial size of ByteArrayOutputStream is 32. Will the growth of the underlying byte[] and GC affect the test results? If so, is it possible to estimate a reasonable initial size?

Contributor Author

The GC will/might make the benchmark noisier, but it should not introduce any bias?

I guess choosing a bigger initial size would reduce the issue for some of the benchmark cases, as it would not need to resize, but I cannot see any simple way to estimate the total size in general. Maybe using a bigger initial size is better/good enough?
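
For illustration only, one simple option along those lines; the 1 MiB figure is an arbitrary assumption, not a value from the PR.

import java.io.ByteArrayOutputStream

// A generous initial capacity keeps the backing byte[] from repeatedly growing
// (and being garbage-collected) during the measured runs.
val baos = new ByteArrayOutputStream(1 << 20)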
