[SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-2 #32826

Closed
wants to merge 4 commits into from

Conversation

dchristle
Contributor

@dchristle dchristle commented Jun 8, 2021

What changes were proposed in this pull request?

This PR aims to upgrade zstd-jni to 1.5.0-2, which uses zstd version 1.5.0.

Why are the changes needed?

Major improvements to Zstd support are targeted for the upcoming 3.2.0 release of Spark. Zstd 1.5.0 introduces significant compression (+25% to 140%) and decompression (~15%) speed improvements; the benchmarks are described in more detail on the zstd releases page.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The build passes the build tests, but the benchmark tests seem flaky. I am unsure whether this change is responsible. The error is:

Running org.apache.spark.rdd.CoalescedRDDBenchmark:
21/06/08 18:53:10 ERROR SparkContext: Failed to add file:/home/runner/work/spark/spark/./core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar to Spark environment
java.lang.IllegalArgumentException: requirement failed: File spark-core_2.12-3.2.0-SNAPSHOT-tests.jar was already registered with a different path (old path = /home/runner/work/spark/spark/core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar, new path = /home/runner/work/spark/spark/./core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar

https://github.com/dchristle/spark/runs/2776123749?check_suite_focus=true
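
A side note on the message above: the old and new paths differ only by a `.` segment. The following is a minimal sketch in plain Python, not Spark code, and it makes no claim about how Spark itself compares registered paths. It only shows that the two strings copied from the log are unequal as written but identical once normalized:

```python
import posixpath

# Both paths are copied verbatim from the error message above.
old_path = (
    "/home/runner/work/spark/spark/core/target/scala-2.12/"
    "spark-core_2.12-3.2.0-SNAPSHOT-tests.jar"
)
new_path = (
    "/home/runner/work/spark/spark/./core/target/scala-2.12/"
    "spark-core_2.12-3.2.0-SNAPSHOT-tests.jar"
)

# A plain string comparison treats the two as different paths.
same_as_strings = old_path == new_path

# normpath collapses the redundant "." segment.
same_after_normalizing = posixpath.normpath(new_path) == old_path

print("equal as strings:       ", same_as_strings)
print("equal after normalizing:", same_after_normalizing)
```

If that is what triggers the failure, it would suggest the benchmark job adds the same jar twice under two spellings, which would be unrelated to the zstd-jni version. That is an assumption, not something verified here.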

cc: @dongjoon-hyun

@github-actions github-actions bot added the BUILD label Jun 8, 2021
@AmplabJenkins

Can one of the admins verify this patch?

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you. BTW, have you had a chance to test this with Apache Parquet/Avro/Kafka first, project by project? Historically, we have hit several incompatibility issues.

@dongjoon-hyun
Member

BTW, @dchristle . I quickly ran the micro benchmark on the master branch on my Mac, before and after. We also need to check the memory usage.

 OpenJDK 64-Bit Server VM 1.8.0_292-b09 on Mac OS X 10.16
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool            173            348         161          0.1       17267.7       1.0X
-Compression 10000 times at level 2 without buffer pool            616            669          57          0.0       61574.1       0.3X
-Compression 10000 times at level 3 without buffer pool           1302           1327          35          0.0      130234.4       0.1X
-Compression 10000 times at level 1 with buffer pool               177            192           9          0.1       17709.8       1.0X
-Compression 10000 times at level 2 with buffer pool               670            709          36          0.0       66965.9       0.3X
-Compression 10000 times at level 3 with buffer pool              1201           1209          11          0.0      120144.2       0.1X
+Compression 10000 times at level 1 without buffer pool            271            348          75          0.0       27106.5       1.0X
+Compression 10000 times at level 2 without buffer pool            655            720          59          0.0       65510.4       0.4X
+Compression 10000 times at level 3 without buffer pool            908            963          73          0.0       90777.5       0.3X
+Compression 10000 times at level 1 with buffer pool               181            194          11          0.1       18089.8       1.5X
+Compression 10000 times at level 2 with buffer pool               489            531          48          0.0       48917.8       0.6X
+Compression 10000 times at level 3 with buffer pool               859            948         108          0.0       85869.0       0.3X

 OpenJDK 64-Bit Server VM 1.8.0_292-b09 on Mac OS X 10.16
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool            422            441          12          0.0       42239.7       1.0X
-Decompression 10000 times from level 2 without buffer pool            433            460          19          0.0       43342.9       1.0X
-Decompression 10000 times from level 3 without buffer pool           1241           1306          93          0.0      124091.5       0.3X
-Decompression 10000 times from level 1 with buffer pool               373            387          12          0.0       37268.1       1.1X
-Decompression 10000 times from level 2 with buffer pool               383            387           3          0.0       38296.0       1.1X
-Decompression 10000 times from level 3 with buffer pool              1116           1168          73          0.0      111603.1       0.4X
+Decompression 10000 times from level 1 without buffer pool            464            487          21          0.0       46420.0       1.0X
+Decompression 10000 times from level 2 without buffer pool            436            452          11          0.0       43596.2       1.1X
+Decompression 10000 times from level 3 without buffer pool           1308           1345          53          0.0      130781.6       0.4X
+Decompression 10000 times from level 1 with buffer pool               379            392          11          0.0       37895.4       1.2X
+Decompression 10000 times from level 2 with buffer pool               380            396          12          0.0       38038.2       1.2X
+Decompression 10000 times from level 3 with buffer pool               909           1001         130          0.0       90861.3       0.5X

@dchristle
Contributor Author

dchristle commented Jun 9, 2021

@dongjoon-hyun That's a good suggestion. I have three PRs to update Kafka, Parquet, and ORC:

apache/kafka#10847

apache/parquet-java#914

apache/orc#715

They appear to pass their respective CIs. I have less familiarity with Avro's build chains/codebase, so I did not attempt to test it yet.

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 9, 2021

Thank you for your efforts. BTW, @dchristle . Please note that your ORC PR is not about ZSTD-JNI. It's about the native ZSTD library only. I commented on your ORC PR about the difference.

For the following, I saw Kafka failures.

They appear to pass their respective CIs.

Screen Shot 2021-06-08 at 11 16 21 PM

No worries. Apache Avro has a dependency bot, so I guess they will catch up soon. Let's wait and see their activity.

I have less familiarity with Avro's build chains/codebase, so I did not attempt to test it yet.

In addition, all libraries should be synced inside Apache Spark because Apache Spark is using everything.

@dchristle
Contributor Author

Thank you for your message. The Kafka PR failures seem to not be related to the Zstd change -- the tests appear to be bugged/flaky, as many other recent PRs are also failing. I looked over the Zstd release notes but did not catch any obvious big changes that could trigger an incompatibility. However, my plan is to wait till the Kafka PR can pass the CI, and shepherd the change through.

Regarding Spark: Is it necessary to have all dependencies upgraded to Zstd 1.5.x before merging if the Spark CI/dependency tests appear to pass? For instance, the move to 1.5.0-1 is scheduled for Kafka 3.0 (unless there is a back-port), but I imagine that release will be some time from now.

Regarding this PR: Do we have a good understanding of why the benchmark tests fail? I cannot tell if it is actually related to this code change.

Thank you for your guidance with this process.

@dchristle
Contributor Author

Yes, for ORC it's the native C library and not Java. I have a tangential question for you: Does it make sense to use aircompressor for ZSTD in ORC, rather than the zstd-jni? It does not seem to keep up with the latest zstd, and the implementation seems to lack support for many of the strategies employed at different compression levels, if I understand the code here https://github.com/airlift/aircompressor/blob/495bae80ac7487d2efa1bba437d04e8a2a42bb7b/src/main/java/io/airlift/compress/zstd/CompressionParameters.java#L143 correctly.

The reason I ask is because it is conceivable that zstd in the future makes an incompatible change that propagates to zstd-jni but not aircompressor.

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 10, 2021

It's just a historical fact. IMO, we need to replace it with zstd-jni.

Does it make sense to use aircompressor for ZSTD in ORC, rather than the zstd-jni?

Yes, aircompressor is behind and also has a ZSTD bug. That's the reason why the community (not only Apache ORC, but also Presto) complains about the new version of aircompressor.

BTW, please note that your PR was merged into Apache ORC 1.7, which has no release plan yet. The situation is the same for the other communities. Apache Kafka with ZSTD 1.5? Apache Avro with ZSTD 1.5? Apache Parquet with ZSTD 1.5? Apache Spark should embrace those Apache projects together because our customers are able to use them together in a single app.

@dchristle
Contributor Author

Yes. So it seems like, in order to get Spark on zstd 1.5.0, we need these other dependencies to have an actual release with this version of zstd.

So far, it seems like there are no build/CI failures due to zstd 1.5.0 relative to the 1.4.x branch on various projects (orc for cpp and parquet for zstd-jni). Combined with the fact that I haven't seen any incompatibility notifications in the release notes, my hunch is that upgrading to 1.5.0 is "safe".

But, I don't have anywhere close to your experience, and your idea we should only incorporate specific versions of parquet-mr, kafka, avro, and orc that explicitly support 1.5.x is likely the safest option for the community.

Given this, it seems the PRs I've pushed for 1.5.x alone aren't sufficient. Is a better plan to try to backport the 1.5.0 zstd-jni into minor bumps (rather than more major bumps with no near-term release plan) of Spark's dependencies? This way, there is some hope we could get 1.5.0 into 3.2.0 in August.

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 12, 2021

@dchristle . I'm a big supporter of ZStandard and have no doubt that we need to upgrade ZSTD-JNI in the future. Your PR will be a part of Apache Spark definitely. It's a matter of timing.

Here, I'm talking about what we need for Apache Spark. What we need is actual verification by testing, not a hunch. Neither of us wants to break Apache Spark 3.2.0, do we? As you can see in SPARK-34651, I synced multiple Apache projects for ZSTD-JNI 1.4.9-1 and Apache Spark 3.2.0 because there were incompatibility issues.

For your PR, we can proceed in this way. First of all, let's make sure that ZSTD-JNI 1.5 passes all UTs of Parquet/Kafka/Avro at least. Second, let's merge your Apache Spark PR first temporarily for wider Apache Spark community testing. If something broken is found during the Apache Spark 3.2.0 QA period, we can revert it during that period.

@dchristle
Contributor Author

dchristle commented Jun 12, 2021

Here, I'm saying that what we need for Apache Spark. What we need is the actual verification by testing, not a hunch. Both of us don't want to break Apache Spark 3.2.0, do we?

I think there is a misunderstanding. My proposal is to backport zstd-jni 1.5.0-1 into Kafka/Avro/Parquet so that upcoming minor releases of those projects pick up the change before we put it into Spark. I used the word "hunch" only to indicate that I expect the backporting process in those projects to include 1.5.0 to go smoothly -- based on some of the CI passing -- and not to suggest that we should not test this change to Spark.

If getting the change into upcoming dependency minor releases (1.10.x Avro, 2.8.x Kafka, 1.12.x or 1.13.x Parquet) truly does work with no issues, we can synchronize all dependencies in Spark's pom in this PR (and convert it to a WIP in the meantime) so we use 1.5.0-1 uniformly where possible before merging to Spark master branch for QA testing. What do you think about this plan?

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 15, 2021

Well, do you know that the feature freeze of Apache Spark 3.2.0 is July 1st for now? It seems that that's the root cause of the misunderstanding. For the rest of the plan, both of us know that we will eventually have ZStandard 1.5 as a community-blessed version across several Apache projects. There is no argument about that.

This way, there is some hope we could get 1.5.0 into 3.2.0 in August.

I think there is a misunderstanding. My proposal is to backport zstd-jni 1.5.0-1 into Kafka/Avro/Parquet so that upcoming minor releases of those projects pick up the change before we put it into Spark.

@dchristle dchristle changed the title [SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-1 [SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-2 Jun 16, 2021
@dchristle dchristle force-pushed the ZSTD150 branch 2 times, most recently from 1e585f0 to ab43055 Compare June 16, 2021 07:20
@dongjoon-hyun
Member

dongjoon-hyun commented Jun 16, 2021

Thank you for updating to track new ZSTD-JNI 1.5.0-2, @dchristle .

Member

@dongjoon-hyun dongjoon-hyun left a comment

@dongjoon-hyun
Member

And, here is the updated benchmark result.

~s:ZSTD150 ✗ $ git diff
diff --git a/core/benchmarks/ZStandardBenchmark-results.txt b/core/benchmarks/ZStandardBenchmark-results.txt
index fd39951717..b87d8971d6 100644
--- a/core/benchmarks/ZStandardBenchmark-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-results.txt
@@ -6,22 +6,22 @@ OpenJDK 64-Bit Server VM 1.8.0_292-b09 on Mac OS X 10.16
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool            241            314          75          0.0       24136.6       1.0X
-Compression 10000 times at level 2 without buffer pool            627            649          20          0.0       62740.8       0.4X
-Compression 10000 times at level 3 without buffer pool           1046           1064          27          0.0      104568.9       0.2X
-Compression 10000 times at level 1 with buffer pool               191            195           7          0.1       19062.9       1.3X
-Compression 10000 times at level 2 with buffer pool               513            609          71          0.0       51333.9       0.5X
-Compression 10000 times at level 3 with buffer pool               992           1033          58          0.0       99204.2       0.2X
+Compression 10000 times at level 1 without buffer pool            257            338          86          0.0       25688.3       1.0X
+Compression 10000 times at level 2 without buffer pool            556            598          28          0.0       55638.0       0.5X
+Compression 10000 times at level 3 without buffer pool            855            914          67          0.0       85481.5       0.3X
+Compression 10000 times at level 1 with buffer pool               109            111           1          0.1       10850.6       2.4X
+Compression 10000 times at level 2 with buffer pool               281            405         116          0.0       28125.6       0.9X
+Compression 10000 times at level 3 with buffer pool               520            718         180          0.0       52028.5       0.5X

 OpenJDK 64-Bit Server VM 1.8.0_292-b09 on Mac OS X 10.16
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool            419            421           2          0.0       41859.9       1.0X
-Decompression 10000 times from level 2 without buffer pool            415            421           5          0.0       41481.8       1.0X
-Decompression 10000 times from level 3 without buffer pool           1302           1347          63          0.0      130218.8       0.3X
-Decompression 10000 times from level 1 with buffer pool               368            370           2          0.0       36783.7       1.1X
-Decompression 10000 times from level 2 with buffer pool               367            371           3          0.0       36741.1       1.1X
-Decompression 10000 times from level 3 with buffer pool              1200           1245          63          0.0      120008.4       0.3X
+Decompression 10000 times from level 1 without buffer pool            411            416           3          0.0       41131.5       1.0X
+Decompression 10000 times from level 2 without buffer pool            417            421           4          0.0       41709.7       1.0X
+Decompression 10000 times from level 3 without buffer pool           1223           1263          58          0.0      122251.4       0.3X
+Decompression 10000 times from level 1 with buffer pool               364            366           2          0.0       36391.6       1.1X
+Decompression 10000 times from level 2 with buffer pool               364            369           3          0.0       36441.1       1.1X
+Decompression 10000 times from level 3 with buffer pool              1183           1198          21          0.0      118294.6       0.3X
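
The columns in these tables can be cross-checked against each other. The sketch below assumes two relationships that are inferred from the numbers rather than stated anywhere in this thread: that Per Row(ns) is the best time divided by the 10000 iterations, and that Relative is the first row's best time divided by each row's best time. The input values are the "+" compression rows above.

```python
# Best Time(ms) from the "+" compression rows above (JDK 8, macOS).
ITERATIONS = 10_000  # "Compression 10000 times" in each case name

best_ms = {
    "level 1 without buffer pool": 257,
    "level 2 without buffer pool": 556,
    "level 3 without buffer pool": 855,
    "level 1 with buffer pool": 109,
    "level 2 with buffer pool": 281,
    "level 3 with buffer pool": 520,
}

# Relative column as printed in the table, in the same row order.
reported_relative = [1.0, 0.5, 0.3, 2.4, 0.9, 0.5]


def per_row_ns(ms: float) -> float:
    """Assumed relationship: time per iteration, in nanoseconds."""
    return ms * 1_000_000 / ITERATIONS


# Assumed relationship: the first row of the table is the baseline.
baseline_ms = next(iter(best_ms.values()))
derived_relative = [round(baseline_ms / ms, 1) for ms in best_ms.values()]

for (case, ms), rel in zip(best_ms.items(), derived_relative):
    print(f"{case:28s} per-row ~ {per_row_ns(ms):9.1f} ns   relative ~ {rel}X")

print("matches reported Relative column:", derived_relative == reported_relative)
```

The derived per-row figure is only approximate because the table prints the best time rounded to whole milliseconds.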

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM (except one comment), @dchristle . Thanks.

As I mentioned before, we will merge this for Apache Spark 3.2.0. However, there is a chance of reverting it from 3.2.0 if regressions are found. Even in that case, we will try this for Apache Spark 3.3.0 based on your contributions and other Apache projects' releases.

Second, let's merge your Apache Spark PR first temporarily for the wider Apache Spark community testing. If something broken is found during Apache Spark 3.2.0 QA period, we can revert it during that period.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Ur, I'm checking the benchmark result as the final review stage. It seems that there is some regression in the JDK11 environment. I'll check the result on the Linux box too. Could you double-check the result of ZStandardBenchmark in your JDK11 environment, @dchristle ? You should generate the result twice (first with 1.4.9-1 and second with 1.5.0-2).

--- a/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
@@ -6,22 +6,22 @@ OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Mac OS X 11.4
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool           1523           1526           4          0.0      152315.0       1.0X
-Compression 10000 times at level 2 without buffer pool           1227           1229           2          0.0      122734.5       1.2X
-Compression 10000 times at level 3 without buffer pool           1548           1551           4          0.0      154821.8       1.0X
-Compression 10000 times at level 1 with buffer pool               782            793          13          0.0       78221.2       1.9X
-Compression 10000 times at level 2 with buffer pool              1127           1183          79          0.0      112668.4       1.4X
-Compression 10000 times at level 3 with buffer pool              1454           1469          21          0.0      145383.8       1.0X
+Compression 10000 times at level 1 without buffer pool           1451           1455           6          0.0      145071.2       1.0X
+Compression 10000 times at level 2 without buffer pool            447            517          53          0.0       44732.6       3.2X
+Compression 10000 times at level 3 without buffer pool           2287           2314          39          0.0      228662.8       0.6X
+Compression 10000 times at level 1 with buffer pool              1530           1534           6          0.0      153036.3       0.9X
+Compression 10000 times at level 2 with buffer pool              1894           1912          26          0.0      189350.2       0.8X
+Compression 10000 times at level 3 with buffer pool              2150           2218          96          0.0      215042.6       0.7X

 OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Mac OS X 11.4
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool           1458           1458           1          0.0      145788.4       1.0X
-Decompression 10000 times from level 2 without buffer pool           1460           1465           7          0.0      145988.5       1.0X
-Decompression 10000 times from level 3 without buffer pool           2223           2261          55          0.0      222258.0       0.7X
-Decompression 10000 times from level 1 with buffer pool              1397           1397           0          0.0      139660.8       1.0X
-Decompression 10000 times from level 2 with buffer pool              1391           1395           5          0.0      139148.7       1.0X
-Decompression 10000 times from level 3 with buffer pool              2249           2315          94          0.0      224883.8       0.6X
+Decompression 10000 times from level 1 without buffer pool           1571           1571           1          0.0      157078.0       1.0X
+Decompression 10000 times from level 2 without buffer pool           1581           1586           7          0.0      158062.5       1.0X
+Decompression 10000 times from level 3 without buffer pool           2439           2514         107          0.0      243850.6       0.6X
+Decompression 10000 times from level 1 with buffer pool              1378           1381           5          0.0      137771.0       1.1X
+Decompression 10000 times from level 2 with buffer pool              1391           1392           2          0.0      139109.7       1.1X
+Decompression 10000 times from level 3 with buffer pool              1940           2106         235          0.0      193981.0       0.8X
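
To separate likely regressions from run-to-run noise in the compression rows above, one crude heuristic is to compare the change in average time against the combined standard deviations. The sketch below is only an illustration: the factor `k = 2.0` and the `classify` helper are hypothetical choices, not part of Spark's benchmark tooling, and a single run per version is thin evidence either way.

```python
from dataclasses import dataclass


@dataclass
class Row:
    case: str
    avg_before: int  # Avg Time(ms) on the "-" line
    sd_before: int   # Stdev(ms) on the "-" line
    avg_after: int   # Avg Time(ms) on the "+" line
    sd_after: int    # Stdev(ms) on the "+" line


# Values copied from the JDK 11 / macOS compression rows above.
ROWS = [
    Row("level 1 without buffer pool", 1526, 4, 1455, 6),
    Row("level 2 without buffer pool", 1229, 2, 517, 53),
    Row("level 3 without buffer pool", 1551, 4, 2314, 39),
    Row("level 1 with buffer pool", 793, 13, 1534, 6),
    Row("level 2 with buffer pool", 1183, 79, 1912, 26),
    Row("level 3 with buffer pool", 1469, 21, 2218, 96),
]


def classify(row: Row, k: float = 2.0) -> str:
    """Hypothetical heuristic: a change counts only if it exceeds k * (sd_before + sd_after)."""
    delta = row.avg_after - row.avg_before
    noise = k * (row.sd_before + row.sd_after)
    if abs(delta) <= noise:
        return "within noise"
    return "slower" if delta > 0 else "faster"


verdicts = [classify(row) for row in ROWS]
for row, verdict in zip(ROWS, verdicts):
    print(f"{row.case:28s} {row.avg_before:5d} -> {row.avg_after:5d} ms  {verdict}")
```

By this measure, four of the six compression cases come out slower in this particular run, which is consistent with asking for a second measurement before drawing conclusions.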

Undo unintended whitespace changes.
@dongjoon-hyun
Member

dongjoon-hyun commented Jun 16, 2021

On the Linux box, it looks reasonable. In this case, we need to add the benchmark result to the PR.

Could you follow the instructions in the "Running benchmarks in your forked repository" section of https://spark.apache.org/developer-tools.html and add the result to this PR, please?

--- a/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
@@ -6,22 +6,22 @@ OpenJDK 64-Bit Server VM 11.0.10+9 on Linux 5.11.0-18-generic
 Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool            430            472          38          0.0       43018.6       1.0X
-Compression 10000 times at level 2 without buffer pool            216            218           1          0.0       21605.7       2.0X
-Compression 10000 times at level 3 without buffer pool            442            443           1          0.0       44182.3       1.0X
-Compression 10000 times at level 1 with buffer pool               233            235           1          0.0       23269.7       1.8X
-Compression 10000 times at level 2 with buffer pool               272            275           2          0.0       27203.8       1.6X
-Compression 10000 times at level 3 with buffer pool               386            390           6          0.0       38597.4       1.1X
+Compression 10000 times at level 1 without buffer pool            166            188          67          0.1       16637.9       1.0X
+Compression 10000 times at level 2 without buffer pool            208            209           0          0.0       20758.7       0.8X
+Compression 10000 times at level 3 without buffer pool            428            430           3          0.0       42785.1       0.4X
+Compression 10000 times at level 1 with buffer pool               229            230           1          0.0       22885.1       0.7X
+Compression 10000 times at level 2 with buffer pool               267            268           1          0.0       26675.6       0.6X
+Compression 10000 times at level 3 with buffer pool               372            376           3          0.0       37249.8       0.4X

 OpenJDK 64-Bit Server VM 11.0.10+9 on Linux 5.11.0-18-generic
 Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool            442            447           9          0.0       44178.4       1.0X
-Decompression 10000 times from level 2 without buffer pool            443            443           1          0.0       44254.7       1.0X
-Decompression 10000 times from level 3 without buffer pool            441            443           1          0.0       44120.0       1.0X
-Decompression 10000 times from level 1 with buffer pool               383            384           1          0.0       38295.0       1.2X
-Decompression 10000 times from level 2 with buffer pool               385            388           2          0.0       38461.7       1.1X
-Decompression 10000 times from level 3 with buffer pool               384            387           2          0.0       38360.3       1.2X
+Decompression 10000 times from level 1 without buffer pool            440            443           5          0.0       43967.4       1.0X
+Decompression 10000 times from level 2 without buffer pool            439            439           0          0.0       43869.5       1.0X
+Decompression 10000 times from level 3 without buffer pool            439            441           2          0.0       43875.2       1.0X
+Decompression 10000 times from level 1 with buffer pool               379            381           2          0.0       37903.4       1.2X
+Decompression 10000 times from level 2 with buffer pool               382            383           2          0.0       38178.6       1.2X
+Decompression 10000 times from level 3 with buffer pool               381            383           2          0.0       38059.7       1.2X
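
One caution when reading the compression rows above: the Relative column appears to be scaled to the first row of the same run (an inference from the numbers, not something stated in this thread), so it is not comparable across the "-" and "+" runs. The sketch below uses the Best Time(ms) and Relative values copied from those rows to show that every case got faster in absolute terms even though Relative dropped for five of them.

```python
cases = [
    "level 1 without buffer pool",
    "level 2 without buffer pool",
    "level 3 without buffer pool",
    "level 1 with buffer pool",
    "level 2 with buffer pool",
    "level 3 with buffer pool",
]

# Best Time(ms) and Relative from the "-" and "+" compression rows above.
before_ms = [430, 216, 442, 233, 272, 386]
after_ms = [166, 208, 428, 229, 267, 372]
relative_before = [1.0, 2.0, 1.0, 1.8, 1.6, 1.1]
relative_after = [1.0, 0.8, 0.4, 0.7, 0.6, 0.4]

# Cross-run speedup: values above 1.0 mean the "+" run was faster.
cross_run_speedup = [b / a for b, a in zip(before_ms, after_ms)]

for case, speedup, rb, ra in zip(
    cases, cross_run_speedup, relative_before, relative_after
):
    print(f"{case:28s} speedup {speedup:4.2f}x   Relative {rb}X -> {ra}X")
```

The drop in Relative comes from the baseline row itself improving from 430 ms to 166 ms. These are single runs, so the small differences in the other rows may well be noise.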

@dongjoon-hyun
Member

I updated the above result with a clean-build result, @dchristle .

@dongjoon-hyun
Member

BTW, if you get the benchmark result via GitHub Action, please share the GitHub Action links with us.

@dongjoon-hyun
Member

I ran the benchmark on your branches.

Here is the summary.

SPARK-PR-32826:SPARK-PR-32826 ✗ $ git diff
diff --git a/core/benchmarks/ZStandardBenchmark-jdk11-results.txt b/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
index 3895e7b..e93671d 100644
--- a/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
@@ -2,26 +2,26 @@
 Benchmark ZStandardCompressionCodec
 ================================================================================================

-OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-1033-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool            606            614           6          0.0       60645.3       1.0X
-Compression 10000 times at level 2 without buffer pool            686            693           7          0.0       68594.9       0.9X
-Compression 10000 times at level 3 without buffer pool            906            920          14          0.0       90642.7       0.7X
-Compression 10000 times at level 1 with buffer pool               389            403          20          0.0       38901.4       1.6X
-Compression 10000 times at level 2 with buffer pool               450            466          13          0.0       45032.0       1.3X
-Compression 10000 times at level 3 with buffer pool               680            682           2          0.0       68004.2       0.9X
+Compression 10000 times at level 1 without buffer pool            805           1103         500          0.0       80501.4       1.0X
+Compression 10000 times at level 2 without buffer pool            728            744          20          0.0       72819.9       1.1X
+Compression 10000 times at level 3 without buffer pool            987            995           7          0.0       98719.4       0.8X
+Compression 10000 times at level 1 with buffer pool               371            377           8          0.0       37092.3       2.2X
+Compression 10000 times at level 2 with buffer pool               465            473           6          0.0       46509.8       1.7X
+Compression 10000 times at level 3 with buffer pool               715            738          20          0.0       71500.2       1.1X

-OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-1033-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool           1209           1226          25          0.0      120862.8       1.0X
-Decompression 10000 times from level 2 without buffer pool           1191           1193           3          0.0      119064.9       1.0X
-Decompression 10000 times from level 3 without buffer pool           1188           1193           6          0.0      118843.3       1.0X
-Decompression 10000 times from level 1 with buffer pool               998           1004           9          0.0       99754.7       1.2X
-Decompression 10000 times from level 2 with buffer pool               990           1001          11          0.0       99043.8       1.2X
-Decompression 10000 times from level 3 with buffer pool               983            999          20          0.0       98269.5       1.2X
+Decompression 10000 times from level 1 without buffer pool            776            786          11          0.0       77649.5       1.0X
+Decompression 10000 times from level 2 without buffer pool            787            792           5          0.0       78686.6       1.0X
+Decompression 10000 times from level 3 without buffer pool            782            790           7          0.0       78195.4       1.0X
+Decompression 10000 times from level 1 with buffer pool               529            551          21          0.0       52914.0       1.5X
+Decompression 10000 times from level 2 with buffer pool               523            537          11          0.0       52266.2       1.5X
+Decompression 10000 times from level 3 with buffer pool               519            527          10          0.0       51932.3       1.5X


diff --git a/core/benchmarks/ZStandardBenchmark-results.txt b/core/benchmarks/ZStandardBenchmark-results.txt
index 6990c28..d1aa07a 100644
--- a/core/benchmarks/ZStandardBenchmark-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-results.txt
@@ -2,26 +2,26 @@
 Benchmark ZStandardCompressionCodec
 ================================================================================================

-OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-1033-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool            670            681           9          0.0       67011.0       1.0X
-Compression 10000 times at level 2 without buffer pool            569            571           2          0.0       56932.0       1.2X
-Compression 10000 times at level 3 without buffer pool            748            751           2          0.0       74813.8       0.9X
-Compression 10000 times at level 1 with buffer pool               336            337           1          0.0       33630.6       2.0X
-Compression 10000 times at level 2 with buffer pool               395            397           2          0.0       39472.6       1.7X
-Compression 10000 times at level 3 with buffer pool               563            567           4          0.0       56272.8       1.2X
+Compression 10000 times at level 1 without buffer pool            444            606         183          0.0       44440.9       1.0X
+Compression 10000 times at level 2 without buffer pool            514            527          10          0.0       51421.8       0.9X
+Compression 10000 times at level 3 without buffer pool            725            729           6          0.0       72531.4       0.6X
+Compression 10000 times at level 1 with buffer pool               229            235           6          0.0       22886.7       1.9X
+Compression 10000 times at level 2 with buffer pool               288            303          15          0.0       28802.3       1.5X
+Compression 10000 times at level 3 with buffer pool               493            521          26          0.0       49339.5       0.9X

-OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-1033-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool           1029           1031           3          0.0      102887.4       1.0X
-Decompression 10000 times from level 2 without buffer pool           1028           1031           4          0.0      102847.8       1.0X
-Decompression 10000 times from level 3 without buffer pool           1029           1029           0          0.0      102941.0       1.0X
-Decompression 10000 times from level 1 with buffer pool               798            799           0          0.0       79838.0       1.3X
-Decompression 10000 times from level 2 with buffer pool               799            799           0          0.0       79852.9       1.3X
-Decompression 10000 times from level 3 with buffer pool               796            798           2          0.0       79630.5       1.3X
+Decompression 10000 times from level 1 without buffer pool           1188           1192           6          0.0      118770.4       1.0X
+Decompression 10000 times from level 2 without buffer pool           1176           1199          33          0.0      117574.4       1.0X
+Decompression 10000 times from level 3 without buffer pool           1174           1175           1          0.0      117426.0       1.0X
+Decompression 10000 times from level 1 with buffer pool              1020           1046          36          0.0      102021.9       1.2X
+Decompression 10000 times from level 2 with buffer pool               996           1005          14          0.0       99561.0       1.2X
+Decompression 10000 times from level 3 with buffer pool              1021           1022           1          0.0      102050.9       1.2X
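When reading the tables above, it helps to remember that the `Relative` column appears to be computed within a single run: the first row's time divided by each row's time. It therefore does not compare the old and new zstd-jni versions. A minimal sketch of that arithmetic, using the `Per Row(ns)` values from the new JDK 11 compression rows (the function name `relative_column` and the dictionary layout are illustrative assumptions, not part of Spark's benchmark code):

```python
# Per Row(ns) values copied from the JDK 11 compression rows added in the diff above.
per_row_ns = {
    "level 1 without buffer pool": 80501.4,
    "level 2 without buffer pool": 72819.9,
    "level 3 without buffer pool": 98719.4,
    "level 1 with buffer pool": 37092.3,
    "level 2 with buffer pool": 46509.8,
    "level 3 with buffer pool": 71500.2,
}


def relative_column(per_row, baseline_case):
    """Return baseline time / case time for every case, rounded to one decimal.

    Hypothetical helper: it assumes the first row of a table is the baseline.
    """
    baseline = per_row[baseline_case]
    return {case: round(baseline / ns, 1) for case, ns in per_row.items()}


relative = relative_column(per_row_ns, "level 1 without buffer pool")
for case, ratio in relative.items():
    print(f"{case}: {ratio:.1f}X")
```

To compare the two library versions, the `Per Row(ns)` values of matching rows would have to be compared across the `-` and `+` lines instead, keeping in mind that the two runs report different CPUs.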

@dongjoon-hyun
Member

I made a PR to you, @dchristle . Please review and merge that if you think that's okay. And, let's finish this PR.

@dchristle
Contributor Author

> I made a PR to you, @dchristle . Please review and merge that if you think that's okay. And, let's finish this PR.

Excellent! I had triggered the full benchmark suite, so it took much longer than necessary. Thanks for updating this PR.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM (Pending CIs). Thank you, @dchristle .

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 17, 2021

BTW, I saw your result. You can use your result here too, @dchristle . I just wanted to help you to run the jobs.

@dchristle
Contributor Author

> BTW, I saw your result. You can use your result here too, @dchristle . I just wanted to help you to run the jobs.

Let's use your commit. I didn't know how to run only the ZStandardCodec benchmark, which is much faster. Good knowledge for the future :)

@dongjoon-hyun
Member

GitHub Actions passed. Thank you for your contribution and patience. We are being very careful because this is very important to Apache Spark 3.2.0, @dchristle .

Merged to master for Apache Spark 3.2.0.

@dongjoon-hyun
Member

I added you to the Apache Spark contributor group in JIRA and assigned SPARK-35670 to you.
Congratulations on your first JIRA, and welcome!

@dchristle
Contributor Author

> I added you to the Apache Spark contributor group in JIRA and assigned SPARK-35670 to you.
> Congratulations on your first JIRA, and welcome!

Thank you for your help with this. I'm very glad to contribute. I've already run some pre-production workflows using the zstd-jni-1.5.0-2 build and haven't observed any issues so far. I'll think about whether there's more testing I can do.

@mixermt

mixermt commented Feb 5, 2022

Hi @dchristle and @dongjoon-hyun,
I see great gains in compression and decompression speed when using RecyclingBufferPool. Are there any downsides to using it?

@dongjoon-hyun
Member

@mixermt . It's enabled by default since Apache Spark 3.2.0.

Although the ZSTD-JNI implementation has improved a lot recently, you will hit high memory usage if you turn off the configuration.
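For readers who want to set this explicitly, here is a minimal sketch of how the relevant settings could be rendered as `spark-submit` flags. It assumes the configuration referred to above is `spark.io.compression.zstd.bufferPool.enabled` (the thread does not name the key), and the helper `to_submit_args` is a hypothetical name used only for illustration:

```python
def to_submit_args(conf):
    """Render a dict of Spark settings as spark-submit --conf arguments.

    Hypothetical helper for illustration; booleans are lower-cased because
    Spark configuration values are written as "true" / "false".
    """
    args = []
    for key, value in sorted(conf.items()):
        if isinstance(value, bool):
            value = str(value).lower()
        args += ["--conf", f"{key}={value}"]
    return args


# Assumption: this is the setting that controls the ZSTD-JNI buffer pool.
zstd_conf = {
    "spark.io.compression.codec": "zstd",
    "spark.io.compression.zstd.bufferPool.enabled": True,
}

print(" ".join(to_submit_args(zstd_conf)))
```

Since the buffer pool is described above as enabled by default from Apache Spark 3.2.0, passing the flag explicitly should only matter on a deployment that overrides the default.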

@mixermt

mixermt commented Feb 6, 2022

Thanks @dongjoon-hyun for your quick reply 🙏
Another small question regarding benchmarks: have any benchmarks been run with different numbers of workers, as set by parquet.compression.codec.zstd.workers? Unfortunately, I can't find any.
In our production tests, we see significant degradation when using a non-default number of workers, although according to its description it should improve speed. We provide plenty of memory to executors.

@dongjoon-hyun
Member

@mixermt . It seems that you are in the wrong community. It would be better to ask the Apache Parquet community about Apache Parquet configuration issues rather than here. :)
