Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46205][CORE] Improve PersistenceEngine performance with KryoSerializer #44113

Closed
wants to merge 3 commits into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Dec 1, 2023

What changes were proposed in this pull request?

This PR aims to improve PersistenceEngine performance with KryoSerializer via introducing a new configuration, spark.deploy.recoverySerializer.

Why are the changes needed?

Allow users to choose a better serializer to get a better performance in their environment. Especially, KryoSerializer is about 3x faster than JavaSerializer with FileSystemPersistenceEngine.

================================================================================================
PersistenceEngineBenchmark
================================================================================================

OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Linux 5.15.0-1051-azure
AMD EPYC 7763 64-Core Processor
1000 Workers:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------
ZooKeeperPersistenceEngine with JavaSerializer            1202           1298         138          0.0     1201614.2       1.0X
ZooKeeperPersistenceEngine with KryoSerializer             951           1004          48          0.0      950559.0       1.3X
FileSystemPersistenceEngine with JavaSerializer            212            217           6          0.0      211623.2       5.7X
FileSystemPersistenceEngine with KryoSerializer             79             81           2          0.0       79132.5      15.2X
BlackHolePersistenceEngine                                   0              0           0         30.9          32.4   37109.8X

Does this PR introduce any user-facing change?

No. The default behavior is the same.

How was this patch tested?

Pass the CIs with the new added test cases.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Dec 1, 2023
@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Dec 1, 2023

I'm going to update the benchmark files. This is done.

if (out != null) {
out.close()
}
fileOut.close()
Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun
Copy link
Member Author

Could you review this when you have a chance, @yaooqinn , @beliefer , @LuciferYang ?

Copy link
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except some comments.

Comment on lines +338 to +339
val e = master.invokePrivate(_persistenceEngine()).asInstanceOf[FileSystemPersistenceEngine]
assert(e.serializer.isInstanceOf[KryoSerializer])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
val e = master.invokePrivate(_persistenceEngine()).asInstanceOf[FileSystemPersistenceEngine]
assert(e.serializer.isInstanceOf[KryoSerializer])
val persistenceEngine = master.invokePrivate(_persistenceEngine()).asInstanceOf[FileSystemPersistenceEngine]
assert(persistenceEngine.serializer.isInstanceOf[KryoSerializer])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you but this was intention in order to avoid Scalastyle failure. Line 338 has 99 char already.

Screenshot 2023-12-01 at 6 25 35 PM

@@ -60,15 +61,23 @@ object PersistenceEngineBenchmark extends BenchmarkBase {
runBenchmark("PersistenceEngineBenchmark") {
val benchmark = new Benchmark(s"$numWorkers Workers", numWorkers, output = output)

benchmark.addCase("ZooKeeperPersistenceEngine", numIters) { _ =>
benchmark.addCase("ZooKeeperPersistenceEngine with JavaSerializer", numIters) { _ =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

Seq(serializerJava, serializerKryo).foreach { serializer =>
  benchmark.addCase("ZooKeeperPersistenceEngine with ${serializer.getClass.getSimpleName}", numIters) { _ =>
    val engine = new ZooKeeperPersistenceEngine(conf, serializer)
    ...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but this is also intentional for the ordering like the following.

OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Linux 5.15.0-1051-azure
AMD EPYC 7763 64-Core Processor
1000 Workers:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------
ZooKeeperPersistenceEngine with JavaSerializer            1202           1298         138          0.0     1201614.2       1.0X
ZooKeeperPersistenceEngine with KryoSerializer             951           1004          48          0.0      950559.0       1.3X
FileSystemPersistenceEngine with JavaSerializer            212            217           6          0.0      211623.2       5.7X
FileSystemPersistenceEngine with KryoSerializer             79             81           2          0.0       79132.5      15.2X
BlackHolePersistenceEngine                                   0              0           0         30.9          32.4   37109.8X

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, for this one, it seems that I misunderstood your previous comment. Let me address it at the next PR (or a follow-up), @beliefer . Sorry for my mistake.

import org.apache.spark.util.Utils


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Sure.

@dongjoon-hyun
Copy link
Member Author

Thank you so much, @beliefer .

Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-46205 branch December 2, 2023 02:53
dongjoon-hyun added a commit that referenced this pull request Dec 2, 2023
### What changes were proposed in this pull request?

This is a follow-up of #44113 to address a comment about simplifying the benchmark.

### Why are the changes needed?

To simplify.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually.
```
$ build/sbt "core/Test/runMain org.apache.spark.deploy.master.PersistenceEngineBenchmark"
...
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.2
[info] Apple M1 Max
[info] 1000 Workers:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------
[info] ZooKeeperPersistenceEngine with JavaSerializer           12602          12847         253          0.0    12601717.5       1.0X
[info] ZooKeeperPersistenceEngine with KryoSerializer           12116          12130          13          0.0    12116373.8       1.0X
[info] FileSystemPersistenceEngine with JavaSerializer            429            435           6          0.0      429374.9      29.3X
[info] FileSystemPersistenceEngine with KryoSerializer            179            180           2          0.0      178795.7      70.5X
[info] BlackHolePersistenceEngine                                   0              0           0         46.6          21.5  587273.6X
[success] Total time: 126 s (02:06), completed Dec 1, 2023, 7:59:25 PM
```

### Was this patch authored or co-authored using generative AI tooling?

Closes #44118 from dongjoon-hyun/SPARK-46205-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
asl3 pushed a commit to asl3/spark that referenced this pull request Dec 5, 2023
…oSerializer`

### What changes were proposed in this pull request?

This PR aims to improve `PersistenceEngine` performance with `KryoSerializer` via introducing a new configuration, `spark.deploy.recoverySerializer`.

### Why are the changes needed?

Allow users to choose a better serializer to get a better performance in their environment. Especially, `KryoSerializer` is about **3x faster** than `JavaSerializer` with `FileSystemPersistenceEngine`.

```
================================================================================================
PersistenceEngineBenchmark
================================================================================================

OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Linux 5.15.0-1051-azure
AMD EPYC 7763 64-Core Processor
1000 Workers:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------
ZooKeeperPersistenceEngine with JavaSerializer            1202           1298         138          0.0     1201614.2       1.0X
ZooKeeperPersistenceEngine with KryoSerializer             951           1004          48          0.0      950559.0       1.3X
FileSystemPersistenceEngine with JavaSerializer            212            217           6          0.0      211623.2       5.7X
FileSystemPersistenceEngine with KryoSerializer             79             81           2          0.0       79132.5      15.2X
BlackHolePersistenceEngine                                   0              0           0         30.9          32.4   37109.8X
```

### Does this PR introduce _any_ user-facing change?

No. The default behavior is the same.

### How was this patch tested?

Pass the CIs with the new added test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44113 from dongjoon-hyun/SPARK-46205.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
asl3 pushed a commit to asl3/spark that referenced this pull request Dec 5, 2023
### What changes were proposed in this pull request?

This is a follow-up of apache#44113 to address a comment about simplifying the benchmark.

### Why are the changes needed?

To simplify.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually.
```
$ build/sbt "core/Test/runMain org.apache.spark.deploy.master.PersistenceEngineBenchmark"
...
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.2
[info] Apple M1 Max
[info] 1000 Workers:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------
[info] ZooKeeperPersistenceEngine with JavaSerializer           12602          12847         253          0.0    12601717.5       1.0X
[info] ZooKeeperPersistenceEngine with KryoSerializer           12116          12130          13          0.0    12116373.8       1.0X
[info] FileSystemPersistenceEngine with JavaSerializer            429            435           6          0.0      429374.9      29.3X
[info] FileSystemPersistenceEngine with KryoSerializer            179            180           2          0.0      178795.7      70.5X
[info] BlackHolePersistenceEngine                                   0              0           0         46.6          21.5  587273.6X
[success] Total time: 126 s (02:06), completed Dec 1, 2023, 7:59:25 PM
```

### Was this patch authored or co-authored using generative AI tooling?

Closes apache#44118 from dongjoon-hyun/SPARK-46205-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun added a commit that referenced this pull request Dec 7, 2023
### What changes were proposed in this pull request?

This PR aims to update `Spark Standalone` cluster recovery configurations.

### Why are the changes needed?

We need to document
- #44173
- #44129
- #44113

![Screenshot 2023-12-06 at 7 15 24 PM](https://github.com/apache/spark/assets/9700541/04f0be6f-cdfb-4d87-b1b5-c4bf131f460a)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44227 from dongjoon-hyun/SPARK-46299.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
…oSerializer`

### What changes were proposed in this pull request?

This PR aims to improve `PersistenceEngine` performance with `KryoSerializer` via introducing a new configuration, `spark.deploy.recoverySerializer`.

### Why are the changes needed?

Allow users to choose a better serializer to get a better performance in their environment. Especially, `KryoSerializer` is about **3x faster** than `JavaSerializer` with `FileSystemPersistenceEngine`.

```
================================================================================================
PersistenceEngineBenchmark
================================================================================================

OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Linux 5.15.0-1051-azure
AMD EPYC 7763 64-Core Processor
1000 Workers:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------
ZooKeeperPersistenceEngine with JavaSerializer            1202           1298         138          0.0     1201614.2       1.0X
ZooKeeperPersistenceEngine with KryoSerializer             951           1004          48          0.0      950559.0       1.3X
FileSystemPersistenceEngine with JavaSerializer            212            217           6          0.0      211623.2       5.7X
FileSystemPersistenceEngine with KryoSerializer             79             81           2          0.0       79132.5      15.2X
BlackHolePersistenceEngine                                   0              0           0         30.9          32.4   37109.8X
```

### Does this PR introduce _any_ user-facing change?

No. The default behavior is the same.

### How was this patch tested?

Pass the CIs with the new added test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44113 from dongjoon-hyun/SPARK-46205.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
### What changes were proposed in this pull request?

This is a follow-up of apache#44113 to address a comment about simplifying the benchmark.

### Why are the changes needed?

To simplify.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually.
```
$ build/sbt "core/Test/runMain org.apache.spark.deploy.master.PersistenceEngineBenchmark"
...
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.2
[info] Apple M1 Max
[info] 1000 Workers:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------
[info] ZooKeeperPersistenceEngine with JavaSerializer           12602          12847         253          0.0    12601717.5       1.0X
[info] ZooKeeperPersistenceEngine with KryoSerializer           12116          12130          13          0.0    12116373.8       1.0X
[info] FileSystemPersistenceEngine with JavaSerializer            429            435           6          0.0      429374.9      29.3X
[info] FileSystemPersistenceEngine with KryoSerializer            179            180           2          0.0      178795.7      70.5X
[info] BlackHolePersistenceEngine                                   0              0           0         46.6          21.5  587273.6X
[success] Total time: 126 s (02:06), completed Dec 1, 2023, 7:59:25 PM
```

### Was this patch authored or co-authored using generative AI tooling?

Closes apache#44118 from dongjoon-hyun/SPARK-46205-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
### What changes were proposed in this pull request?

This PR aims to update `Spark Standalone` cluster recovery configurations.

### Why are the changes needed?

We need to document
- apache#44173
- apache#44129
- apache#44113

![Screenshot 2023-12-06 at 7 15 24 PM](https://github.com/apache/spark/assets/9700541/04f0be6f-cdfb-4d87-b1b5-c4bf131f460a)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44227 from dongjoon-hyun/SPARK-46299.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
…oSerializer`

### What changes were proposed in this pull request?

This PR aims to improve `PersistenceEngine` performance with `KryoSerializer` via introducing a new configuration, `spark.deploy.recoverySerializer`.

### Why are the changes needed?

Allow users to choose a better serializer to get a better performance in their environment. Especially, `KryoSerializer` is about **3x faster** than `JavaSerializer` with `FileSystemPersistenceEngine`.

```
================================================================================================
PersistenceEngineBenchmark
================================================================================================

OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Linux 5.15.0-1051-azure
AMD EPYC 7763 64-Core Processor
1000 Workers:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------
ZooKeeperPersistenceEngine with JavaSerializer            1202           1298         138          0.0     1201614.2       1.0X
ZooKeeperPersistenceEngine with KryoSerializer             951           1004          48          0.0      950559.0       1.3X
FileSystemPersistenceEngine with JavaSerializer            212            217           6          0.0      211623.2       5.7X
FileSystemPersistenceEngine with KryoSerializer             79             81           2          0.0       79132.5      15.2X
BlackHolePersistenceEngine                                   0              0           0         30.9          32.4   37109.8X
```

### Does this PR introduce _any_ user-facing change?

No. The default behavior is the same.

### How was this patch tested?

Pass the CIs with the new added test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44113 from dongjoon-hyun/SPARK-46205.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
### What changes were proposed in this pull request?

This is a follow-up of apache#44113 to address a comment about simplifying the benchmark.

### Why are the changes needed?

To simplify.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually.
```
$ build/sbt "core/Test/runMain org.apache.spark.deploy.master.PersistenceEngineBenchmark"
...
[info] OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Mac OS X 14.2
[info] Apple M1 Max
[info] 1000 Workers:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------
[info] ZooKeeperPersistenceEngine with JavaSerializer           12602          12847         253          0.0    12601717.5       1.0X
[info] ZooKeeperPersistenceEngine with KryoSerializer           12116          12130          13          0.0    12116373.8       1.0X
[info] FileSystemPersistenceEngine with JavaSerializer            429            435           6          0.0      429374.9      29.3X
[info] FileSystemPersistenceEngine with KryoSerializer            179            180           2          0.0      178795.7      70.5X
[info] BlackHolePersistenceEngine                                   0              0           0         46.6          21.5  587273.6X
[success] Total time: 126 s (02:06), completed Dec 1, 2023, 7:59:25 PM
```

### Was this patch authored or co-authored using generative AI tooling?

Closes apache#44118 from dongjoon-hyun/SPARK-46205-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
### What changes were proposed in this pull request?

This PR aims to update `Spark Standalone` cluster recovery configurations.

### Why are the changes needed?

We need to document
- apache#44173
- apache#44129
- apache#44113

![Screenshot 2023-12-06 at 7 15 24 PM](https://github.com/apache/spark/assets/9700541/04f0be6f-cdfb-4d87-b1b5-c4bf131f460a)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44227 from dongjoon-hyun/SPARK-46299.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun added a commit that referenced this pull request Apr 16, 2024
### What changes were proposed in this pull request?

This is a logical revert of SPARK-46205
- #44113
- #44118

### Why are the changes needed?

The initial implementation didn't handle the class initialization logic properly.
Until we have a fix, I'd like to revert this from `master` branch.

### Does this PR introduce _any_ user-facing change?

No, this is not released yet.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46087 from dongjoon-hyun/SPARK-47875.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants