Test Spark 4.0.0-SNAPSHOT #1909

cxzl25 · 2024-04-24T08:42:45Z

What changes were proposed in this pull request?

Why are the changes needed?

How was this patch tested?

GA

Was this patch authored or co-authored using generative AI tooling?

No

cxzl25 · 2024-04-24T08:45:01Z

java/bench/spark/pom.xml

-                    <exclude>META-INF/DUMMY.DSA</exclude>
+                    <exclude>META-INF/*.SF</exclude>
+                    <exclude>META-INF/*.DSA</exclude>
+                    <exclude>META-INF/*.RSA</exclude>


[WARNING] eclipse-collections-11.1.0.jar, eclipse-collections-api-11.1.0.jar define 4 overlapping resources:
[WARNING]   - LICENSE-EDL-1.0.txt
[WARNING]   - LICENSE-EPL-1.0.txt
[WARNING]   - META-INF/ECLIPSE_.RSA
[WARNING]   - META-INF/ECLIPSE_.SF

Spark has since fixed this problem by upgrading the arrow-vector version, so no modification is needed here.

[SPARK-47981][BUILD] Upgrade Arrow to 16.0.0
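For illustration, here is a minimal sketch of which jar entries the three wildcard excludes in the diff would cover. `SignatureEntryFilter` is a hypothetical helper, not part of ORC or the Maven Shade plugin, and the regex only approximates the exclude patterns under the assumption that `*` does not cross a `/` boundary.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical helper, not part of ORC or Maven.
class SignatureEntryFilter {
    // Regex approximation of META-INF/*.SF, META-INF/*.DSA and META-INF/*.RSA.
    // Assumption: '*' matches within a single path segment only.
    private static final Pattern SIGNATURE_ENTRY =
        Pattern.compile("META-INF/[^/]*\\.(SF|DSA|RSA)");

    // Returns true if the jar entry name looks like a signature file.
    static boolean isSignatureEntry(String entryName) {
        return SIGNATURE_ENTRY.matcher(entryName).matches();
    }

    public static void main(String[] args) {
        // Entry names taken from the old exclude and the warning above.
        List<String> entries = List.of(
            "META-INF/DUMMY.DSA",
            "META-INF/ECLIPSE_.RSA",
            "META-INF/ECLIPSE_.SF",
            "LICENSE-EDL-1.0.txt");
        for (String entry : entries) {
            System.out.println(entry + " excluded=" + isSignatureEntry(entry));
        }
    }
}
```

Under these assumptions the old single exclude (`META-INF/DUMMY.DSA`) would not have covered `ECLIPSE_.RSA` or `ECLIPSE_.SF`, while the wildcard form does. The overlapping license files are not signature files and are not matched by either form.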

cxzl25 · 2024-04-24T08:45:26Z

java/bench/spark/src/java/org/apache/orc/bench/spark/SparkBenchmark.java

@@ -74,7 +74,7 @@
 @BenchmarkMode(Mode.AverageTime)
 @OutputTimeUnit(TimeUnit.MICROSECONDS)
 @AutoService(OrcBenchmark.class)
-@Fork(jvmArgsAppend = "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED")
+@Fork(jvmArgsAppend = {"--add-opens=java.base/sun.nio.ch=ALL-UNNAMED", "--add-opens=java.base/sun.util.calendar=ALL-UNNAMED"})


Caused by: java.lang.IllegalAccessException: symbolic reference class is not accessible: class sun.util.calendar.ZoneInfo, from interface org.apache.spark.sql.catalyst.util.SparkDateTimeUtils (unnamed module @2b71fc7e)
    at java.base/java.lang.invoke.MemberName.makeAccessException(MemberName.java:955)
    at java.base/java.lang.invoke.MethodHandles$Lookup.checkSymbolicClass(MethodHandles.java:3686)
    at java.base/java.lang.invoke.MethodHandles$Lookup.resolveOrFail(MethodHandles.java:3646)
    at java.base/java.lang.invoke.MethodHandles$Lookup.findVirtual(MethodHandles.java:2680)
    at org.apache.spark.sql.catalyst.util.SparkDateTimeUtils.$init$(SparkDateTimeUtils.scala:206)
    at org.apache.spark.sql.catalyst.util.DateTimeUtils$.<clinit>(DateTimeUtils.scala:41)
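As a rough diagnostic for the error above, the following sketch checks whether `java.base` currently opens a package to code on the classpath, and prints the matching `--add-opens` flag in the form used in the `@Fork` annotation. `AddOpensCheck` is a hypothetical helper, not part of ORC, Spark, or JMH. It assumes Java 9 or later and that the class is loaded from the classpath, that is, in the unnamed module.

```java
// Illustrative only: a hypothetical helper, not part of ORC, Spark, or JMH.
class AddOpensCheck {
    // Builds the JVM flag that opens a package to all unnamed modules.
    static String addOpensFlag(String module, String pkg) {
        return "--add-opens=" + module + "/" + pkg + "=ALL-UNNAMED";
    }

    // Returns true if java.base opens the package to this class's module.
    // Assumption: this class runs from the classpath (unnamed module).
    static boolean isOpenToCaller(String pkg) {
        Module javaBase = Object.class.getModule();
        Module caller = AddOpensCheck.class.getModule();
        return javaBase.isOpen(pkg, caller);
    }

    public static void main(String[] args) {
        // The two packages named in the @Fork annotation of the diff.
        for (String pkg : new String[] {"sun.nio.ch", "sun.util.calendar"}) {
            System.out.println(pkg + " open=" + isOpenToCaller(pkg)
                + " flag=" + addOpensFlag("java.base", pkg));
        }
    }
}
```

Running it once without extra JVM arguments and once with the printed flags should show whether the flags took effect in a given JVM. Note that JMH forks a separate JVM for each benchmark, which is presumably why the flags are passed through `jvmArgsAppend` rather than on the outer command line.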

java/bench/spark/src/java/org/apache/orc/bench/spark/SparkBenchmark.java

dongjoon-hyun

Thank you for testing this. 😄

I'd recommend creating a JIRA for the migration to Scala 2.13 of Apache Spark 3.5.1 first. :)

…mark

### What changes were proposed in this pull request?
This PR aims to migrate to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark.

### Why are the changes needed?
#1909 (review)

### How was this patch tested?
local test
```bash
java -jar spark/target/orc-benchmarks-spark-2.1.0-SNAPSHOT.jar spark data -format=parquet -compress zstd -data taxi
```
```
Benchmark                                  (compression)  (dataset)  (format)  Mode  Cnt          Score       Error  Units
SparkBenchmark.partialRead                          zstd       taxi   parquet  avgt    5      17211.731 ± 11836.315  us/op
SparkBenchmark.partialRead:bytesPerRecord           zstd       taxi   parquet  avgt    5          0.002                  #
SparkBenchmark.partialRead:ops                      zstd       taxi   parquet  avgt    5         10.000                  #
SparkBenchmark.partialRead:perRecord                zstd       taxi   parquet  avgt    5          0.001 ±     0.001  us/op
SparkBenchmark.partialRead:records                  zstd       taxi   parquet  avgt    5  113791180.000                  #
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #1912 from cxzl25/ORC-1704.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

…mark

### What changes were proposed in this pull request?
This PR aims to migrate to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark.

### Why are the changes needed?
#1909 (review)

### How was this patch tested?
local test
```bash
java -jar spark/target/orc-benchmarks-spark-2.1.0-SNAPSHOT.jar spark data -format=parquet -compress zstd -data taxi
```
```
Benchmark                                  (compression)  (dataset)  (format)  Mode  Cnt          Score       Error  Units
SparkBenchmark.partialRead                          zstd       taxi   parquet  avgt    5      17211.731 ± 11836.315  us/op
SparkBenchmark.partialRead:bytesPerRecord           zstd       taxi   parquet  avgt    5          0.002                  #
SparkBenchmark.partialRead:ops                      zstd       taxi   parquet  avgt    5         10.000                  #
SparkBenchmark.partialRead:perRecord                zstd       taxi   parquet  avgt    5          0.001 ±     0.001  us/op
SparkBenchmark.partialRead:records                  zstd       taxi   parquet  avgt    5  113791180.000                  #
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #1912 from cxzl25/ORC-1704.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit dc634cb)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

dongjoon-hyun · 2024-04-30T17:23:23Z

Hi, @cxzl25. Sorry for asking this, but could you rebase this PR once more?

dongjoon-hyun · 2024-05-01T04:55:08Z

Thank you!

java/bench/spark/src/java/org/apache/orc/bench/spark/SparkBenchmark.java

…nchmark runs on JDK17

### What changes were proposed in this pull request?
This PR aims to fix `sun.util.calendar` IllegalAccessException when SparkBenchmark runs on JDK17.

### Why are the changes needed?
#1909 (comment)

### How was this patch tested?
GA

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #1919 from cxzl25/ORC-1707.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

…nchmark runs on JDK17

### What changes were proposed in this pull request?
This PR aims to fix `sun.util.calendar` IllegalAccessException when SparkBenchmark runs on JDK17.

### Why are the changes needed?
#1909 (comment)

### How was this patch tested?
GA

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #1919 from cxzl25/ORC-1707.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 5bb2346)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

cxzl25 marked this pull request as draft April 24, 2024 08:42

github-actions bot added BUILD JAVA labels Apr 24, 2024

cxzl25 commented Apr 24, 2024

cxzl25 mentioned this pull request Apr 24, 2024

ORC-1700: Write parquet decimal type data in Benchmark using FIXED_LEN_BYTE_ARRAY type #1910

Open

dongjoon-hyun reviewed Apr 24, 2024

cxzl25 mentioned this pull request Apr 25, 2024

ORC-1704: Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark #1912

Closed

cxzl25 force-pushed the support_spark_4 branch from ad9e79a to 68f02f7 Compare April 25, 2024 04:22

cxzl25 force-pushed the support_spark_4 branch from 8afc779 to a988573 Compare May 1, 2024 02:12

dongjoon-hyun reviewed May 1, 2024

java/bench/spark/src/java/org/apache/orc/bench/spark/SparkBenchmark.java (outdated)

cxzl25 mentioned this pull request May 1, 2024

ORC-1707: Fix sun.util.calendar IllegalAccessException when SparkBenchmark runs on JDK17 #1919

Closed

cxzl25 force-pushed the support_spark_4 branch from eff9a57 to 590c8f3 Compare May 1, 2024 16:08

cxzl25 mentioned this pull request Jun 3, 2024

Bump spark.version from 3.5.1 to 4.0.0-preview1 in /java #1951

Closed

cxzl25 added 6 commits June 3, 2024 20:51

test spark 4.0.0-snapshot

575c920

sytle

5e20e54

META-INF

40f9f2f

trigger test

8de2c90

scala 2.13.14 [SPARK-48049]

428be68

4.0.0-preview1

7b3cdb0

cxzl25 force-pushed the support_spark_4 branch from d1d9f38 to 7b3cdb0 Compare June 3, 2024 13:04
