Spark 3.4: Initial support #7378
Conversation
Force-pushed from 723487b to 089dc4d.

Note to reviewers: Russell left comments directly on 723487b. I'll create a JIRA for the Spark issue so that we can reference it in currently ignored tests.
Force-pushed from 089dc4d to 59571fa.
Thanks for adding this @aokolnychyi! Should we do a performance benchmark at some point between 3.3 and 3.4?
@Fokko, absolutely! I remember @bryanck mentioned a benchmarking framework. Is there any chance we can use it for this? We also have some benchmarks internally; I'll ask @szehon-ho once the 3.4 work is complete.
Force-pushed from 59571fa to 1d8481e.
The TPC-DS benchmarking tool we currently use is EMR-specific, unfortunately.
I merged this PR as it is really hard to keep up with changes in master; I had started from scratch multiple times. We will need to cherry-pick #6480 back to 3.3, as it is only in 3.4 after this PR. I'll do that next; it is easier than doing the copy again.
LGTM as well. Thanks a ton for adding this, @aokolnychyi!
Just some minor comments on 723487b and the 3.4 directory.
```java
 * A benchmark that evaluates the performance of writing Parquet data with a flat schema using
 * Iceberg and Spark Parquet writers.
 *
 * <p>To run this benchmark for spark-3.3: <code>
```
[minor] Do we need to update this to 3.4 as well?
Good catch! I'll follow up.
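For reference, the 3.4 variant of the Javadoc invocation would presumably look like the sketch below. It assumes the Gradle task and property naming mirrors the existing 3.3 module; the exact regex and output path for a given benchmark class may differ.

```shell
# Hypothetical 3.4 equivalent of the spark-3.3 JMH invocation quoted above;
# module and property names are assumed to follow the 3.3 convention.
./gradlew -DsparkVersions=3.4 \
  :iceberg-spark:iceberg-spark-3.4_2.12:jmh \
  -PjmhIncludeRegex=SparkParquetWritersFlatDataBenchmark \
  -PjmhOutputPath=benchmark/spark-parquet-writers-flat-data-benchmark-result.txt
```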
@singhpk234, I replied to the comments on the change. Let me know if that makes sense; I'll follow up to fix the JMH Javadoc.
Makes sense to me. Thanks, @aokolnychyi!
Thanks for this, @aokolnychyi. Any ETA on when we can expect iceberg-spark-runtime-3.14_2.12:1.2.0 in Maven?
I'm also interested in spark-runtime for 3.4.0.
@mgorsk1 @vakarisbk, the plan is to get a public release out mid-May.
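Once a release ships, consuming the new runtime should look like the usual Spark runtime dependency. The coordinates below are an assumption based on the existing 3.3 artifact naming (`iceberg-spark-runtime-3.3_2.12`); the version is a placeholder until the release is out.

```groovy
// build.gradle sketch; artifact name assumed from the 3.3 naming pattern,
// version left as a placeholder pending the public release.
dependencies {
    implementation("org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:<release-version>")
}
```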
This PR adds initial support for Spark 3.4 and consists of 3 commits that must be preserved while merging.
The last change is the most important to review.
Note that this approach preserves the commit history only for 3.4 (our new default version). There are tricks to keep history both for 3.3 and 3.4 but they may cause issues while rebasing. That's why I followed the exact approach we used in 3.3.
It is worth mentioning that the DROP TABLE behavior in the Spark session catalog is broken, which is why some tests had to be adapted. We are exploring a Spark fix at the moment.
Resolves #7174.