[SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-2 #32826

dchristle · 2021-06-08T20:10:14Z

What changes were proposed in this pull request?

This PR aims to upgrade zstd-jni to 1.5.0-2, which uses zstd version 1.5.0.

Why are the changes needed?

Major improvements to Zstd support are targeted for the upcoming 3.2.0 release of Spark. Zstd 1.5.0 introduces significant compression (+25% to 140%) and decompression (~15%) speed improvements in benchmarks described in more detail on the releases page:

https://github.com/facebook/zstd/releases/tag/v1.5.0

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The build passes the build tests, but the benchmark tests seem flaky, and I am unsure whether this change is responsible. The error is:

Running org.apache.spark.rdd.CoalescedRDDBenchmark:
21/06/08 18:53:10 ERROR SparkContext: Failed to add file:/home/runner/work/spark/spark/./core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar to Spark environment
java.lang.IllegalArgumentException: requirement failed: File spark-core_2.12-3.2.0-SNAPSHOT-tests.jar was already registered with a different path (old path = /home/runner/work/spark/spark/core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar, new path = /home/runner/work/spark/spark/./core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar

https://github.com/dchristle/spark/runs/2776123749?check_suite_focus=true
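A note on the error above: the two paths in the message differ only by a `./` segment. The sketch below only illustrates that the strings are unequal as written but equal once normalized. It does not reproduce Spark's own check, and the variable names are illustrative.

```python
import posixpath

# Paths copied from the error message above.
old_path = "/home/runner/work/spark/spark/core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar"
new_path = "/home/runner/work/spark/spark/./core/target/scala-2.12/spark-core_2.12-3.2.0-SNAPSHOT-tests.jar"

# Compared as raw strings, the "./" segment makes the paths differ.
same_as_strings = old_path == new_path

# normpath collapses the "./" segment, so the normalized paths match.
same_when_normalized = posixpath.normpath(old_path) == posixpath.normpath(new_path)

print(same_as_strings, same_when_normalized)
```

If that reading is right, the failure comes from how the benchmark job builds the jar path rather than from the zstd-jni version.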

cc: @dongjoon-hyun

AmplabJenkins · 2021-06-08T20:23:27Z

Can one of the admins verify this patch?

dongjoon-hyun

Thank you. BTW, have you had a chance to test this with Apache Parquet/Avro/Kafka first, project by project? Historically, we have hit several incompatibility issues.

pom.xml

dongjoon-hyun · 2021-06-08T22:35:46Z

BTW, @dchristle . I quickly ran the micro benchmark on the master branch on my Mac before and after. We need to check the memory usage, too.

 OpenJDK 64-Bit Server VM 1.8.0_292-b09 on Mac OS X 10.16
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool            173            348         161          0.1       17267.7       1.0X
-Compression 10000 times at level 2 without buffer pool            616            669          57          0.0       61574.1       0.3X
-Compression 10000 times at level 3 without buffer pool           1302           1327          35          0.0      130234.4       0.1X
-Compression 10000 times at level 1 with buffer pool               177            192           9          0.1       17709.8       1.0X
-Compression 10000 times at level 2 with buffer pool               670            709          36          0.0       66965.9       0.3X
-Compression 10000 times at level 3 with buffer pool              1201           1209          11          0.0      120144.2       0.1X
+Compression 10000 times at level 1 without buffer pool            271            348          75          0.0       27106.5       1.0X
+Compression 10000 times at level 2 without buffer pool            655            720          59          0.0       65510.4       0.4X
+Compression 10000 times at level 3 without buffer pool            908            963          73          0.0       90777.5       0.3X
+Compression 10000 times at level 1 with buffer pool               181            194          11          0.1       18089.8       1.5X
+Compression 10000 times at level 2 with buffer pool               489            531          48          0.0       48917.8       0.6X
+Compression 10000 times at level 3 with buffer pool               859            948         108          0.0       85869.0       0.3X

 OpenJDK 64-Bit Server VM 1.8.0_292-b09 on Mac OS X 10.16
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool            422            441          12          0.0       42239.7       1.0X
-Decompression 10000 times from level 2 without buffer pool            433            460          19          0.0       43342.9       1.0X
-Decompression 10000 times from level 3 without buffer pool           1241           1306          93          0.0      124091.5       0.3X
-Decompression 10000 times from level 1 with buffer pool               373            387          12          0.0       37268.1       1.1X
-Decompression 10000 times from level 2 with buffer pool               383            387           3          0.0       38296.0       1.1X
-Decompression 10000 times from level 3 with buffer pool              1116           1168          73          0.0      111603.1       0.4X
+Decompression 10000 times from level 1 without buffer pool            464            487          21          0.0       46420.0       1.0X
+Decompression 10000 times from level 2 without buffer pool            436            452          11          0.0       43596.2       1.1X
+Decompression 10000 times from level 3 without buffer pool           1308           1345          53          0.0      130781.6       0.4X
+Decompression 10000 times from level 1 with buffer pool               379            392          11          0.0       37895.4       1.2X
+Decompression 10000 times from level 2 with buffer pool               380            396          12          0.0       38038.2       1.2X
+Decompression 10000 times from level 3 with buffer pool               909           1001         130          0.0       90861.3       0.5X

dchristle · 2021-06-09T01:46:52Z

@dongjoon-hyun That's a good suggestion. I have three PRs to update Kafka, Parquet, and ORC:

apache/kafka#10847

apache/parquet-java#914

apache/orc#715

They appear to pass their respective CIs. I have less familiarity with Avro's build chains/codebase, so I did not attempt to test it yet.

dongjoon-hyun · 2021-06-09T06:14:51Z

Thank you for your efforts. BTW, @dchristle . Please note that your ORC PR is not about ZSTD-JNI. It's native ZSTD library only. I commented on your ORC PR about the difference.

For the following, I saw Kafka failures.

They appear to pass their respective CIs.

No worry~ For Apache Avro, they have a dependency bot. I guess they will catch up soon. Let's wait and see their activity.

I have less familiarity with Avro's build chains/codebase, so I did not attempt to test it yet.

In addition, all libraries should be synced inside Apache Spark because Apache Spark is using everything.

dchristle · 2021-06-09T22:36:25Z

Thank you for your message. The Kafka PR failures seem to not be related to the Zstd change -- the tests appear to be bugged/flaky, as many other recent PRs are also failing. I looked over the Zstd release notes but did not catch any obvious big changes that could trigger an incompatibility. However, my plan is to wait till the Kafka PR can pass the CI, and shepherd the change through.

Regarding Spark: Is it necessary to have all dependencies upgraded to Zstd 1.5.x before merging if the Spark CI/dependency tests appear to pass? For instance, the move to 1.5.0-1 is scheduled for Kafka 3.0 (unless there is a back-port), but I imagine that release will be some time from now.

Regarding this PR: Do we have a good understanding of why the benchmark tests fail? I cannot tell if it is actually related to this code change.

Thank you for your guidance with this process.

dchristle · 2021-06-10T22:15:42Z

Yes, for ORC it's the native C library and not Java. I have a tangential question for you: Does it make sense to use aircompressor for ZSTD in ORC, rather than the zstd-jni? aircompressor does not seem to keep up with the latest zstd, and its implementation seems to lack support for many of the strategies employed at different compression levels, if I understand the code here https://github.com/airlift/aircompressor/blob/495bae80ac7487d2efa1bba437d04e8a2a42bb7b/src/main/java/io/airlift/compress/zstd/CompressionParameters.java#L143 correctly.

The reason I ask is because it is conceivable that zstd in the future makes an incompatible change that propagates to zstd-jni but not aircompressor.

dongjoon-hyun · 2021-06-10T22:53:40Z

It's just a historical fact. IMO, we need to replace it with zstd-jni.

Does it make sense to use aircompressor for ZSTD in ORC, rather than the zstd-jni?

Yes, aircompressor is behind and also has a ZSTD bug. That's the reason why the community (not only Apache ORC, but also Presto) complains about the new version of aircompressor.

Got "Overflow detected" at presto-orc with zstd airlift/aircompressor#122

BTW, please note that your PR was merged into Apache ORC 1.7, which has no release plan yet. The situation is the same for the other communities. Apache Kafka with ZSTD 1.5? Apache Avro with ZSTD 1.5? Apache Parquet with ZSTD 1.5? Apache Spark should embrace those Apache projects together because our customers are able to use them together in a single app.

dchristle · 2021-06-11T05:04:36Z

Yes. So it seems like, in order to get Spark on zstd 1.5.0, we need these other dependencies to have an actual release with this version of zstd.

So far, it seems like there are no build/CI failures due to zstd 1.5.0 relative to the 1.4.x branch on various projects (orc for cpp and parquet for zstd-jni); combined with the fact that I haven't seen any incompatibility notifications in the release notes, my hunch is that upgrading to 1.5.0 is "safe".

But I don't have anywhere close to your experience, and your idea that we should only incorporate specific versions of parquet-mr, kafka, avro, and orc that explicitly support 1.5.x is likely the safest option for the community.

Given this, it seems the PRs I've pushed for 1.5.x alone aren't sufficient. Would a better plan be to backport the 1.5.0 zstd-jni into minor bumps (rather than more major bumps with no near-term release plan) of Spark's dependencies? This way, there is some hope we could get 1.5.0 into 3.2.0 in August.

dongjoon-hyun · 2021-06-12T14:44:17Z

@dchristle . I'm a big supporter of ZStandard and have no doubt that we need to upgrade ZSTD-JNI in the future. Your PR will be a part of Apache Spark definitely. It's a matter of timing.

https://databricks.com/session_na21/the-rise-of-zstandard-apache-spark-parquet-orc-avro

Here, I'm saying what we need for Apache Spark. What we need is the actual verification by testing, not a hunch. Both of us don't want to break Apache Spark 3.2.0, do we? As you can see in SPARK-34651, I synced multiple Apache projects for ZSTD-JNI 1.4.9-1 and Apache Spark 3.2.0 because there were incompatibility issues.

For your PR, we can proceed in this way. First of all, let's make sure that ZSTD-JNI 1.5 passes all UTs of Parquet/Kafka/Avro at least. Second, let's merge your Apache Spark PR first temporarily for the wider Apache Spark community testing. If something broken is found during Apache Spark 3.2.0 QA period, we can revert it during that period.

dchristle · 2021-06-12T20:21:48Z

Here, I'm saying what we need for Apache Spark. What we need is the actual verification by testing, not a hunch. Both of us don't want to break Apache Spark 3.2.0, do we?

I think there is a misunderstanding. My proposal is to backport zstd-jni 1.5.0-1 into Kafka/Avro/Parquet so that upcoming minor releases of those projects pick up the change before we put it into Spark. I used the word "hunch" only to indicate that I expect the backporting process in those projects to include 1.5.0 to go smoothly -- based on some of the CI passing -- and not to suggest that we should not test this change to Spark.

If getting the change into upcoming dependency minor releases (1.10.x Avro, 2.8.x Kafka, 1.12.x or 1.13.x Parquet) truly does work with no issues, we can synchronize all dependencies in Spark's pom in this PR (and convert it to a WIP in the meantime) so we use 1.5.0-1 uniformly where possible before merging to Spark master branch for QA testing. What do you think about this plan?

dongjoon-hyun · 2021-06-15T19:59:16Z

Well, do you know that the feature freeze of Apache Spark 3.2.0 is July 1st for now? It seems that that's the root cause of the misunderstanding. For the rest of the plan, both of us know that we will eventually have ZStandard 1.5 as a community-blessed version across several Apache projects. There is no argument about that.

This way, there is some hope we could get 1.5.0 into 3.2.0 in August.

I think there is a misunderstanding. My proposal is to backport zstd-jni 1.5.0-1 into Kafka/Avro/Parquet so that upcoming minor releases of those projects pick up the change before we put it into Spark.

dongjoon-hyun · 2021-06-16T16:28:57Z

Thank you for updating to track new ZSTD-JNI 1.5.0-2, @dchristle .

dongjoon-hyun

BTW, @dchristle . https://github.com/apache/spark/pull/32826/files#r647855915 is added again. :)

dongjoon-hyun · 2021-06-16T16:48:47Z

And, here is the updated benchmark result.

~s:ZSTD150 ✗ $ git diff
diff --git a/core/benchmarks/ZStandardBenchmark-results.txt b/core/benchmarks/ZStandardBenchmark-results.txt
index fd39951717..b87d8971d6 100644
--- a/core/benchmarks/ZStandardBenchmark-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-results.txt
@@ -6,22 +6,22 @@ OpenJDK 64-Bit Server VM 1.8.0_292-b09 on Mac OS X 10.16
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool            241            314          75          0.0       24136.6       1.0X
-Compression 10000 times at level 2 without buffer pool            627            649          20          0.0       62740.8       0.4X
-Compression 10000 times at level 3 without buffer pool           1046           1064          27          0.0      104568.9       0.2X
-Compression 10000 times at level 1 with buffer pool               191            195           7          0.1       19062.9       1.3X
-Compression 10000 times at level 2 with buffer pool               513            609          71          0.0       51333.9       0.5X
-Compression 10000 times at level 3 with buffer pool               992           1033          58          0.0       99204.2       0.2X
+Compression 10000 times at level 1 without buffer pool            257            338          86          0.0       25688.3       1.0X
+Compression 10000 times at level 2 without buffer pool            556            598          28          0.0       55638.0       0.5X
+Compression 10000 times at level 3 without buffer pool            855            914          67          0.0       85481.5       0.3X
+Compression 10000 times at level 1 with buffer pool               109            111           1          0.1       10850.6       2.4X
+Compression 10000 times at level 2 with buffer pool               281            405         116          0.0       28125.6       0.9X
+Compression 10000 times at level 3 with buffer pool               520            718         180          0.0       52028.5       0.5X

 OpenJDK 64-Bit Server VM 1.8.0_292-b09 on Mac OS X 10.16
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool            419            421           2          0.0       41859.9       1.0X
-Decompression 10000 times from level 2 without buffer pool            415            421           5          0.0       41481.8       1.0X
-Decompression 10000 times from level 3 without buffer pool           1302           1347          63          0.0      130218.8       0.3X
-Decompression 10000 times from level 1 with buffer pool               368            370           2          0.0       36783.7       1.1X
-Decompression 10000 times from level 2 with buffer pool               367            371           3          0.0       36741.1       1.1X
-Decompression 10000 times from level 3 with buffer pool              1200           1245          63          0.0      120008.4       0.3X
+Decompression 10000 times from level 1 without buffer pool            411            416           3          0.0       41131.5       1.0X
+Decompression 10000 times from level 2 without buffer pool            417            421           4          0.0       41709.7       1.0X
+Decompression 10000 times from level 3 without buffer pool           1223           1263          58          0.0      122251.4       0.3X
+Decompression 10000 times from level 1 with buffer pool               364            366           2          0.0       36391.6       1.1X
+Decompression 10000 times from level 2 with buffer pool               364            369           3          0.0       36441.1       1.1X
+Decompression 10000 times from level 3 with buffer pool              1183           1198          21          0.0      118294.6       0.3X
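For readers skimming the tables: the Relative column appears to be the first row's best time divided by each row's best time, so values above 1.0X are faster than the first row. A minimal sketch under that assumption, using the six post-upgrade compression Best Time(ms) values from the JDK8 diff above (the function name is illustrative):

```python
# Best Time(ms) of the six "+" compression rows in the JDK8 diff above.
best_ms = [257, 556, 855, 109, 281, 520]

def relative_column(times_ms):
    """Assumed formula: first row's best time / each row's best time."""
    baseline = times_ms[0]
    return [f"{baseline / t:.1f}X" for t in times_ms]

relative = relative_column(best_ms)
print(relative)
```

The values this produces can be compared against the Relative column of the same six rows. Because each table is normalized to its own first row, Relative values are not comparable between the "-" and "+" sides; the Best Time column is the one to compare across versions.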

dongjoon-hyun

+1, LGTM (except one comment), @dchristle . Thanks.

As I mentioned before, we will merge this for Apache Spark 3.2.0. However, there is a chance of reverting it from 3.2.0 due to regressions. Even in that case, we will try this for Apache Spark 3.3.0 based on your contributions and other Apache projects' releases.

Second, let's merge your Apache Spark PR first temporarily for the wider Apache Spark community testing. If something broken is found during Apache Spark 3.2.0 QA period, we can revert it during that period.

dongjoon-hyun

Ur, I'm checking the benchmark result as a final review stage. It seems that there is some regression in the JDK11 environment. I'll check the result on the Linux box too. Could you double-check the result of ZStandardBenchmark in your JDK11 environment, @dchristle ? You should generate the result twice (first with 1.4.9-1 and second with 1.5.0-2).

--- a/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
@@ -6,22 +6,22 @@ OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Mac OS X 11.4
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool           1523           1526           4          0.0      152315.0       1.0X
-Compression 10000 times at level 2 without buffer pool           1227           1229           2          0.0      122734.5       1.2X
-Compression 10000 times at level 3 without buffer pool           1548           1551           4          0.0      154821.8       1.0X
-Compression 10000 times at level 1 with buffer pool               782            793          13          0.0       78221.2       1.9X
-Compression 10000 times at level 2 with buffer pool              1127           1183          79          0.0      112668.4       1.4X
-Compression 10000 times at level 3 with buffer pool              1454           1469          21          0.0      145383.8       1.0X
+Compression 10000 times at level 1 without buffer pool           1451           1455           6          0.0      145071.2       1.0X
+Compression 10000 times at level 2 without buffer pool            447            517          53          0.0       44732.6       3.2X
+Compression 10000 times at level 3 without buffer pool           2287           2314          39          0.0      228662.8       0.6X
+Compression 10000 times at level 1 with buffer pool              1530           1534           6          0.0      153036.3       0.9X
+Compression 10000 times at level 2 with buffer pool              1894           1912          26          0.0      189350.2       0.8X
+Compression 10000 times at level 3 with buffer pool              2150           2218          96          0.0      215042.6       0.7X

 OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Mac OS X 11.4
 Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool           1458           1458           1          0.0      145788.4       1.0X
-Decompression 10000 times from level 2 without buffer pool           1460           1465           7          0.0      145988.5       1.0X
-Decompression 10000 times from level 3 without buffer pool           2223           2261          55          0.0      222258.0       0.7X
-Decompression 10000 times from level 1 with buffer pool              1397           1397           0          0.0      139660.8       1.0X
-Decompression 10000 times from level 2 with buffer pool              1391           1395           5          0.0      139148.7       1.0X
-Decompression 10000 times from level 3 with buffer pool              2249           2315          94          0.0      224883.8       0.6X
+Decompression 10000 times from level 1 without buffer pool           1571           1571           1          0.0      157078.0       1.0X
+Decompression 10000 times from level 2 without buffer pool           1581           1586           7          0.0      158062.5       1.0X
+Decompression 10000 times from level 3 without buffer pool           2439           2514         107          0.0      243850.6       0.6X
+Decompression 10000 times from level 1 with buffer pool              1378           1381           5          0.0      137771.0       1.1X
+Decompression 10000 times from level 2 with buffer pool              1391           1392           2          0.0      139109.7       1.1X
+Decompression 10000 times from level 3 with buffer pool              1940           2106         235          0.0      193981.0       0.8X
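To make the JDK11 macOS regression concrete, the sketch below compares the Best Time(ms) values of the compression rows above ("-" lines as before, "+" lines as after) and lists the cases that got slower. The 10% threshold and all names are assumptions chosen for illustration, not a project convention.

```python
# (case, best ms on "-" lines, best ms on "+" lines), copied from the
# JDK11 macOS compression table above.
rows = [
    ("level 1 without buffer pool", 1523, 1451),
    ("level 2 without buffer pool", 1227, 447),
    ("level 3 without buffer pool", 1548, 2287),
    ("level 1 with buffer pool", 782, 1530),
    ("level 2 with buffer pool", 1127, 1894),
    ("level 3 with buffer pool", 1454, 2150),
]

def regressions(rows, threshold=0.10):
    """Return (case, fractional slowdown) for rows slower than the threshold."""
    slower = []
    for case, before, after in rows:
        change = (after - before) / before
        if change > threshold:
            slower.append((case, change))
    return slower

for case, change in regressions(rows):
    print(f"{case}: {change:+.0%}")
```

A single run on a laptop is noisy, so a list like this is a prompt to re-run on another machine rather than proof of a regression.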

Undo unintended whitespace changes.

dongjoon-hyun · 2021-06-16T18:40:18Z

For the Linux box, it looks reasonable. In this case, we need to add the benchmark result into the PR.

Could you follow the instructions in the "Running benchmarks in your forked repository" section of https://spark.apache.org/developer-tools.html and add the result to this PR, please?

--- a/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
@@ -6,22 +6,22 @@ OpenJDK 64-Bit Server VM 11.0.10+9 on Linux 5.11.0-18-generic
 Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool            430            472          38          0.0       43018.6       1.0X
-Compression 10000 times at level 2 without buffer pool            216            218           1          0.0       21605.7       2.0X
-Compression 10000 times at level 3 without buffer pool            442            443           1          0.0       44182.3       1.0X
-Compression 10000 times at level 1 with buffer pool               233            235           1          0.0       23269.7       1.8X
-Compression 10000 times at level 2 with buffer pool               272            275           2          0.0       27203.8       1.6X
-Compression 10000 times at level 3 with buffer pool               386            390           6          0.0       38597.4       1.1X
+Compression 10000 times at level 1 without buffer pool            166            188          67          0.1       16637.9       1.0X
+Compression 10000 times at level 2 without buffer pool            208            209           0          0.0       20758.7       0.8X
+Compression 10000 times at level 3 without buffer pool            428            430           3          0.0       42785.1       0.4X
+Compression 10000 times at level 1 with buffer pool               229            230           1          0.0       22885.1       0.7X
+Compression 10000 times at level 2 with buffer pool               267            268           1          0.0       26675.6       0.6X
+Compression 10000 times at level 3 with buffer pool               372            376           3          0.0       37249.8       0.4X

 OpenJDK 64-Bit Server VM 11.0.10+9 on Linux 5.11.0-18-generic
 Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool            442            447           9          0.0       44178.4       1.0X
-Decompression 10000 times from level 2 without buffer pool            443            443           1          0.0       44254.7       1.0X
-Decompression 10000 times from level 3 without buffer pool            441            443           1          0.0       44120.0       1.0X
-Decompression 10000 times from level 1 with buffer pool               383            384           1          0.0       38295.0       1.2X
-Decompression 10000 times from level 2 with buffer pool               385            388           2          0.0       38461.7       1.1X
-Decompression 10000 times from level 3 with buffer pool               384            387           2          0.0       38360.3       1.2X
+Decompression 10000 times from level 1 without buffer pool            440            443           5          0.0       43967.4       1.0X
+Decompression 10000 times from level 2 without buffer pool            439            439           0          0.0       43869.5       1.0X
+Decompression 10000 times from level 3 without buffer pool            439            441           2          0.0       43875.2       1.0X
+Decompression 10000 times from level 1 with buffer pool               379            381           2          0.0       37903.4       1.2X
+Decompression 10000 times from level 2 with buffer pool               382            383           2          0.0       38178.6       1.2X
+Decompression 10000 times from level 3 with buffer pool               381            383           2          0.0       38059.7       1.2X

dongjoon-hyun · 2021-06-16T19:06:02Z

I updated the above result with a clean-build result, @dchristle .

dongjoon-hyun · 2021-06-17T06:12:38Z

BTW, if you get the benchmark result via GitHub Action, please share the GitHub Action links with us.

dongjoon-hyun · 2021-06-17T14:39:44Z

I ran the benchmark on your branches.

Here is the summary.

SPARK-PR-32826:SPARK-PR-32826 ✗ $ git diff
diff --git a/core/benchmarks/ZStandardBenchmark-jdk11-results.txt b/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
index 3895e7b..e93671d 100644
--- a/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-jdk11-results.txt
@@ -2,26 +2,26 @@
 Benchmark ZStandardCompressionCodec
 ================================================================================================

-OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-1033-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool            606            614           6          0.0       60645.3       1.0X
-Compression 10000 times at level 2 without buffer pool            686            693           7          0.0       68594.9       0.9X
-Compression 10000 times at level 3 without buffer pool            906            920          14          0.0       90642.7       0.7X
-Compression 10000 times at level 1 with buffer pool               389            403          20          0.0       38901.4       1.6X
-Compression 10000 times at level 2 with buffer pool               450            466          13          0.0       45032.0       1.3X
-Compression 10000 times at level 3 with buffer pool               680            682           2          0.0       68004.2       0.9X
+Compression 10000 times at level 1 without buffer pool            805           1103         500          0.0       80501.4       1.0X
+Compression 10000 times at level 2 without buffer pool            728            744          20          0.0       72819.9       1.1X
+Compression 10000 times at level 3 without buffer pool            987            995           7          0.0       98719.4       0.8X
+Compression 10000 times at level 1 with buffer pool               371            377           8          0.0       37092.3       2.2X
+Compression 10000 times at level 2 with buffer pool               465            473           6          0.0       46509.8       1.7X
+Compression 10000 times at level 3 with buffer pool               715            738          20          0.0       71500.2       1.1X

-OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-1033-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool           1209           1226          25          0.0      120862.8       1.0X
-Decompression 10000 times from level 2 without buffer pool           1191           1193           3          0.0      119064.9       1.0X
-Decompression 10000 times from level 3 without buffer pool           1188           1193           6          0.0      118843.3       1.0X
-Decompression 10000 times from level 1 with buffer pool               998           1004           9          0.0       99754.7       1.2X
-Decompression 10000 times from level 2 with buffer pool               990           1001          11          0.0       99043.8       1.2X
-Decompression 10000 times from level 3 with buffer pool               983            999          20          0.0       98269.5       1.2X
+Decompression 10000 times from level 1 without buffer pool            776            786          11          0.0       77649.5       1.0X
+Decompression 10000 times from level 2 without buffer pool            787            792           5          0.0       78686.6       1.0X
+Decompression 10000 times from level 3 without buffer pool            782            790           7          0.0       78195.4       1.0X
+Decompression 10000 times from level 1 with buffer pool               529            551          21          0.0       52914.0       1.5X
+Decompression 10000 times from level 2 with buffer pool               523            537          11          0.0       52266.2       1.5X
+Decompression 10000 times from level 3 with buffer pool               519            527          10          0.0       51932.3       1.5X
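
The Relative column in these tables appears to be the first row's time divided by each row's time, rounded to one decimal place. The sketch below assumes that reading and recomputes the column from the Per Row(ns) values of the JDK 11 "+" compression rows above. The function name `relative_column` is made up for illustration and is not part of Spark.

```python
# Per Row(ns) values copied from the JDK 11 "+" compression rows above.
per_row_ns = {
    "level 1 without buffer pool": 80501.4,
    "level 2 without buffer pool": 72819.9,
    "level 3 without buffer pool": 98719.4,
    "level 1 with buffer pool": 37092.3,
    "level 2 with buffer pool": 46509.8,
    "level 3 with buffer pool": 71500.2,
}


def relative_column(rows):
    """Return each row's speed relative to the first row, rounded to 0.1.

    Assumption: the first row is the baseline (1.0X) and a larger value
    means the row ran faster than the baseline.
    """
    baseline = next(iter(rows.values()))
    return {name: round(baseline / ns, 1) for name, ns in rows.items()}


relative = relative_column(per_row_ns)
for name, ratio in relative.items():
    print(f"{name:<30} {ratio:.1f}X")
```

Under this reading, the recomputed values match the Relative column printed in the "+" rows, so a "2.2X" entry means roughly 2.2 times the throughput of the level 1 run without a buffer pool in the same table.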


diff --git a/core/benchmarks/ZStandardBenchmark-results.txt b/core/benchmarks/ZStandardBenchmark-results.txt
index 6990c28..d1aa07a 100644
--- a/core/benchmarks/ZStandardBenchmark-results.txt
+++ b/core/benchmarks/ZStandardBenchmark-results.txt
@@ -2,26 +2,26 @@
 Benchmark ZStandardCompressionCodec
 ================================================================================================

-OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-1033-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 --------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool            670            681           9          0.0       67011.0       1.0X
-Compression 10000 times at level 2 without buffer pool            569            571           2          0.0       56932.0       1.2X
-Compression 10000 times at level 3 without buffer pool            748            751           2          0.0       74813.8       0.9X
-Compression 10000 times at level 1 with buffer pool               336            337           1          0.0       33630.6       2.0X
-Compression 10000 times at level 2 with buffer pool               395            397           2          0.0       39472.6       1.7X
-Compression 10000 times at level 3 with buffer pool               563            567           4          0.0       56272.8       1.2X
+Compression 10000 times at level 1 without buffer pool            444            606         183          0.0       44440.9       1.0X
+Compression 10000 times at level 2 without buffer pool            514            527          10          0.0       51421.8       0.9X
+Compression 10000 times at level 3 without buffer pool            725            729           6          0.0       72531.4       0.6X
+Compression 10000 times at level 1 with buffer pool               229            235           6          0.0       22886.7       1.9X
+Compression 10000 times at level 2 with buffer pool               288            303          15          0.0       28802.3       1.5X
+Compression 10000 times at level 3 with buffer pool               493            521          26          0.0       49339.5       0.9X

-OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-1033-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
 Benchmark ZStandardCompressionCodec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------------------------
-Decompression 10000 times from level 1 without buffer pool           1029           1031           3          0.0      102887.4       1.0X
-Decompression 10000 times from level 2 without buffer pool           1028           1031           4          0.0      102847.8       1.0X
-Decompression 10000 times from level 3 without buffer pool           1029           1029           0          0.0      102941.0       1.0X
-Decompression 10000 times from level 1 with buffer pool               798            799           0          0.0       79838.0       1.3X
-Decompression 10000 times from level 2 with buffer pool               799            799           0          0.0       79852.9       1.3X
-Decompression 10000 times from level 3 with buffer pool               796            798           2          0.0       79630.5       1.3X
+Decompression 10000 times from level 1 without buffer pool           1188           1192           6          0.0      118770.4       1.0X
+Decompression 10000 times from level 2 without buffer pool           1176           1199          33          0.0      117574.4       1.0X
+Decompression 10000 times from level 3 without buffer pool           1174           1175           1          0.0      117426.0       1.0X
+Decompression 10000 times from level 1 with buffer pool              1020           1046          36          0.0      102021.9       1.2X
+Decompression 10000 times from level 2 with buffer pool               996           1005          14          0.0       99561.0       1.2X
+Decompression 10000 times from level 3 with buffer pool              1021           1022           1          0.0      102050.9       1.2X
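
To read the JDK 8 diff above as before/after deltas, the following sketch computes the signed percentage change in Best Time for a few rows. Note that the "-" and "+" runs were recorded on different CPUs (Platinum 8272CL versus Platinum 8171M), so these numbers mix library and hardware effects and should not be read as a clean measurement of the zstd-jni upgrade. The row labels and the helper `percent_change` are illustrative names only.

```python
# (before_ms, after_ms) Best Time pairs taken from the JDK 8 "-" and "+" rows.
best_time_ms = {
    "compress L1 with pool": (336, 229),
    "compress L3 with pool": (563, 493),
    "decompress L1 with pool": (798, 1020),
    "decompress L3 with pool": (796, 1021),
}


def percent_change(before, after):
    """Signed change in run time; a negative value means the new run was faster."""
    return (after - before) / before * 100.0


changes = {
    name: percent_change(before, after)
    for name, (before, after) in best_time_ms.items()
}
for name, change in changes.items():
    print(f"{name:<26} {change:+.1f}%")
```

In this table the compression rows get faster while the decompression rows get slower, whereas the JDK 11 table earlier shows decompression getting faster, which is consistent with the runner hardware contributing a large share of the difference.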

dongjoon-hyun · 2021-06-17T14:43:44Z

I opened a PR against your branch, @dchristle. Please review and merge it if you think it's okay, and then let's finish this PR.

Add benchmark result dchristle/spark#1

dchristle · 2021-06-17T16:08:36Z

Excellent! I had triggered the full benchmark suite, so it took much longer than necessary. Thanks for updating this PR.

dongjoon-hyun

+1, LGTM (pending CIs). Thank you, @dchristle.

dongjoon-hyun · 2021-06-17T16:15:12Z

BTW, I saw your result. You can use your result here too, @dchristle . I just wanted to help you to run the jobs.

Add benchmark result dchristle/spark#1 (comment)

dchristle · 2021-06-17T16:16:59Z

Let's use your commit. I didn't know how to run only the ZStandard codec benchmark, which is much faster than running the full suite. Good knowledge for the future :)

dongjoon-hyun · 2021-06-17T18:05:42Z

GitHub Action passed. Thank you for your contribution and patience. We are very careful because this is very important to Apache Spark 3.2.0, @dchristle .

Merged to master for Apache Spark 3.2.0.

dongjoon-hyun · 2021-06-17T18:09:17Z

I added you to the Apache Spark contributor group in JIRA and assigned SPARK-35670 to you.
Congratulations on your first JIRA, and welcome!

dchristle · 2021-06-21T20:25:34Z

Thank you for your help with this. I'm very glad to contribute. I've already run some pre-production workflows using the zstd-jni-1.5.0-2 build and haven't observed any issues so far. I'll think about whether there's more testing I can do.

mixermt · 2022-02-05T13:33:14Z

Hi @dchristle and @dongjoon-hyun,
I see great gains in compression and decompression speed when using RecyclingBufferPool. Are there any downsides to using it?

dongjoon-hyun · 2022-02-05T22:59:09Z

@mixermt, it's enabled by default since Apache Spark 3.2.0.

spark/core/src/main/scala/org/apache/spark/internal/config/package.scala

Lines 1778 to 1783 in 74ebef2

    
    private[spark] val IO_COMPRESSION_ZSTD_BUFFERPOOL_ENABLED =
      ConfigBuilder("spark.io.compression.zstd.bufferPool.enabled")
        .doc("If true, enable buffer pool of ZSTD JNI library.")
        .version("3.2.0")
        .booleanConf
        .createWithDefault(true)

Although the ZSTD-JNI implementation has improved a lot recently, you will hit high memory usage if you turn off this configuration.
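
For readers who want to set this option explicitly rather than rely on the default, here is a minimal sketch that renders Spark settings as `spark-submit --conf key=value` arguments. The helper `spark_submit_conf_args` and the application file `my_app.py` are hypothetical names used only for illustration. The two configuration keys are real Spark settings, but the chosen values are just an example.

```python
def spark_submit_conf_args(conf):
    """Render a dict of Spark settings as repeated `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        # Booleans are written in lowercase, the way Spark docs show them.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        args += ["--conf", f"{key}={text}"]
    return args


conf = {
    # Use zstd for Spark's internal I/O compression (shuffle, spills, broadcasts).
    "spark.io.compression.codec": "zstd",
    # Already true by default since 3.2.0; shown only to make the choice explicit.
    "spark.io.compression.zstd.bufferPool.enabled": True,
}

# `my_app.py` is a hypothetical application file, not something from this PR.
command = ["spark-submit", *spark_submit_conf_args(conf), "my_app.py"]
print(" ".join(command))
```

Setting the buffer pool option to `false` in the same way would disable the pool, which, per the comment above, is expected to increase memory usage.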

mixermt · 2022-02-06T06:48:47Z

Thanks @dongjoon-hyun for your quick reply 🙏
Another small question regarding benchmarks: were any benchmarks run with different numbers of workers, as defined by parquet.compression.codec.zstd.workers? Unfortunately, I can't find any benchmarks covering this.
In our production tests, we see a significant degradation when using a non-default number of workers, although according to its description it should improve speed. We provide plenty of memory to the executors.

dongjoon-hyun · 2022-02-06T21:38:05Z

@mixermt, it seems that you are in the wrong community. You had better ask the Apache Parquet community about Apache Parquet configuration issues instead of asking here. :)

github-actions bot added the BUILD label Jun 8, 2021

dongjoon-hyun reviewed Jun 8, 2021

dchristle changed the title ~~[SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-1~~ [SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-2 Jun 16, 2021

github-actions bot added CORE DOCS INFRA KUBERNETES ML MLLIB PYTHON SQL STRUCTURED STREAMING WEB UI labels Jun 16, 2021

dchristle force-pushed the ZSTD150 branch 2 times, most recently from 1e585f0 to ab43055 Compare June 16, 2021 07:20

dongjoon-hyun reviewed Jun 16, 2021

dongjoon-hyun approved these changes Jun 16, 2021

dongjoon-hyun requested changes Jun 16, 2021

dchristle force-pushed the ZSTD150 branch from ab43055 to 40dd912 Compare June 16, 2021 18:25

[SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-2

a44b866

dchristle force-pushed the ZSTD150 branch from 74c36dd to a44b866 Compare June 16, 2021 18:28

Update pom.xml

2799e6a

Undo unintended whitespace changes.

benchmark

016641f

Merge pull request #1 from dongjoon-hyun/SPARK-PR-32826

4a567dd

Add benchmark result

dongjoon-hyun approved these changes Jun 17, 2021

dongjoon-hyun closed this in 7fcb127 Jun 17, 2021
