
[GLUTEN-4424][CORE] Upgrade spark version to 3.5.1 in Gluten #4822

Merged: 2 commits merged on Mar 4, 2024


JkSelf (Contributor) commented Mar 1, 2024

What changes were proposed in this pull request?

This PR is a follow-up to PR #4425. Many thanks to @holdenk for her previous work.

The main changes are as follows:

  1. WholeStageTransformer: Spark 3.5 changed the parameters of the generateTreeString API in TreeNode, which prevents WholeStageTransformer from overriding generateTreeString directly. Instead, a GenerateTreeStringShim trait is defined in the shim layer to override generateTreeString.
  2. PartitionDirectory and PartitionedFileUtil.splitFiles: Spark 3.5 changed both the PartitionDirectory API and the PartitionedFileUtil.splitFiles API. To stay compatible with 3.5, the corresponding changes are made in the shim layer.
  3. ColumnarShuffleExchangeExec: Spark 3.5 added a new member, advisoryPartitionSize, to the ShuffleExchangeLike interface, and ColumnarShuffleExchangeExec is updated accordingly.
  4. GlutenQueryTest: Spark 3.5 introduced ExtendedAnalysisException to replace the previous AnalysisException. To keep the tests version-independent, ExtendedAnalysisException is also introduced in the shims for versions prior to 3.5.
  5. ColumnarBuildSideRelation and BroadcastUtils: Spark 3.5 modified the fromAttributes and toAttributes interfaces in StructType.
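The shim-trait approach in item 1 can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions: `TreeNodeV35`, its simplified three-parameter `generateTreeString`, and `WholeStageTransformerSketch` are hypothetical stand-ins, not Spark's or Gluten's real classes or signatures. The idea it shows is that the version-specific override lives in a trait compiled per Spark version, while the common class only implements version-neutral members such as `stageId`.

```scala
// Hypothetical, heavily simplified stand-in for the TreeNode of one Spark
// version. The real generateTreeString has many more parameters.
abstract class TreeNodeV35 {
  def nodeName: String

  // Hypothetical version-specific signature (extra `indent` parameter).
  def generateTreeString(depth: Int, builder: StringBuilder, indent: Int): Unit =
    builder.append(" " * (depth + indent)).append(nodeName).append('\n')
}

// Would live in the shim module for this Spark version: it overrides the
// version-specific signature and relies only on version-neutral members.
trait GenerateTreeStringShim extends TreeNodeV35 {
  def stageId: Int

  override def generateTreeString(depth: Int, builder: StringBuilder, indent: Int): Unit =
    builder.append(" " * (depth + indent)).append(s"*($stageId) $nodeName").append('\n')
}

// Common (version-independent) code: it never mentions the
// version-specific parameter list of generateTreeString.
class WholeStageTransformerSketch(val stageId: Int) extends GenerateTreeStringShim {
  def nodeName: String = "WholeStageTransformer"
}
```

For example, rendering stage 3 with depth 0 and indent 2 appends `"  *(3) WholeStageTransformer\n"` to the builder. With one such trait per shim module, only the trait has to change when a Spark release alters the signature.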

How was this patch tested?

Added TPC-DS/TPC-H tests.


github-actions bot commented Mar 1, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title to the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}



github-actions bot commented Mar 1, 2024

Run Gluten Clickhouse CI

ulysses-you (Contributor) commented:

The failed tests in ubuntu2004-test-spark32 are due to #4815; please rebase.

JkSelf changed the title from "[CORE] Upgrade spark version to 3.5.1 in Gluten" to "[GLUTEN-4424][CORE] Upgrade spark version to 3.5.1 in Gluten" on Mar 1, 2024

github-actions bot commented Mar 1, 2024

#4424


github-actions bot commented Mar 1, 2024

Run Gluten Clickhouse CI


github-actions bot commented Mar 1, 2024

Run Gluten Clickhouse CI

JkSelf (Contributor, Author) commented Mar 1, 2024

@holdenk @ulysses-you @PHILO-HE @zhouyuan Could you help review? Thanks.

ulysses-you (Contributor) left a comment

thank you @JkSelf for the work!


github-actions bot commented Mar 1, 2024

Run Gluten Clickhouse CI


github-actions bot commented Mar 1, 2024

Run Gluten Clickhouse CI

holdenk (Contributor) commented Mar 1, 2024

Awesome, thanks for taking on the upgrade :)
If it's still open on Thursday I'll take a look (otherwise I'm going to be riding motorcycles or on the beach hiding from computers)

P.S. my pronouns are she/her :)

Co-authored-by: Holden Karau <holden@pigscanfly.ca>
zhouyuan (Contributor) commented Mar 2, 2024

There's a new operator for partial windows that needs to be supported; I filed an issue to track it: #4836

zhouyuan (Contributor) commented Mar 2, 2024

/Benchmark Velox

GlutenPerfBot (Contributor) commented:

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| Query | log/native_4822_time.csv | log/native_master_03_01_2024_731c84c7b_time.csv | Difference (master - PR) | Percentage (master / PR) |
|---|---|---|---|---|
| q1 | 32.58 | 36.32 | 3.743 | 111.49% |
| q2 | 24.42 | 24.31 | -0.113 | 99.54% |
| q3 | 37.95 | 37.58 | -0.376 | 99.01% |
| q4 | 38.03 | 37.63 | -0.404 | 98.94% |
| q5 | 69.84 | 70.45 | 0.612 | 100.88% |
| q6 | 7.13 | 7.25 | 0.124 | 101.73% |
| q7 | 86.86 | 82.49 | -4.367 | 94.97% |
| q8 | 86.55 | 85.80 | -0.757 | 99.13% |
| q9 | 123.71 | 126.02 | 2.313 | 101.87% |
| q10 | 42.50 | 45.33 | 2.824 | 106.65% |
| q11 | 20.50 | 21.15 | 0.649 | 103.17% |
| q12 | 27.78 | 28.50 | 0.717 | 102.58% |
| q13 | 45.67 | 45.73 | 0.053 | 100.12% |
| q14 | 15.44 | 18.98 | 3.547 | 122.98% |
| q15 | 29.08 | 29.48 | 0.400 | 101.37% |
| q16 | 14.61 | 13.60 | -1.009 | 93.10% |
| q17 | 103.02 | 103.34 | 0.317 | 100.31% |
| q18 | 150.56 | 149.23 | -1.338 | 99.11% |
| q19 | 12.58 | 14.97 | 2.384 | 118.95% |
| q20 | 27.90 | 26.66 | -1.241 | 95.55% |
| q21 | 224.45 | 224.93 | 0.480 | 100.21% |
| q22 | 13.58 | 13.67 | 0.094 | 100.69% |
| total | 1234.74 | 1243.39 | 8.655 | 100.70% |
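The rows in these reports are consistent with difference = master time - PR time and percentage = master time / PR time * 100. This relationship is inferred from the numbers themselves, not from the bot's source, so treat the helper below as a sketch. Assuming lower times are better, a percentage above 100% means the PR build ran that query faster than master. Small deviations in the last digit are expected because the bot appears to work from unrounded timings.

```scala
// Sketch of how the report columns relate, inferred from the table rows.
// `pr` and `master` are the per-query times from the two CSV columns.
object PerfReportMath {
  // "difference" column: master minus PR (positive means the PR was faster).
  def difference(pr: Double, master: Double): Double = master - pr

  // "percentage" column: master time as a percentage of the PR time.
  def percentage(pr: Double, master: Double): Double = master / pr * 100.0
}
```

For q1, `PerfReportMath.percentage(32.58, 36.32)` gives roughly 111.48, matching the reported 111.49% up to rounding.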

zhouyuan (Contributor) commented Mar 4, 2024

CC: @zzcclp

ulysses-you (Contributor) left a comment

LGTM except for one comment.


github-actions bot commented Mar 4, 2024

Run Gluten Clickhouse CI

PHILO-HE (Contributor) left a comment

Some trivial comments. Could you check whether they make sense? Thanks!

```scala
assert(child.isInstanceOf[TransformSupport])

def stageId: Int = transformStageId

def substraitPlanJsonValue: String = substraitPlanJson
```
Contributor

Can we remove the new method substraitPlanJsonValue and just use the existing substraitPlanJson? We may just need to declare substraitPlanJson in the extending trait.
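The suggestion can be sketched with hypothetical, simplified types (`ShimWithForwarder`, `ShimDeclaringMember`, `TransformerBefore`, and `TransformerAfter` are illustrative names, not Gluten classes): if the shim trait declares the member the class already defines, that existing definition implements the abstract member directly and the forwarding method is no longer needed.

```scala
// Before: the trait requires a separately named member...
trait ShimWithForwarder {
  def substraitPlanJsonValue: String
}

class TransformerBefore(json: String) extends ShimWithForwarder {
  def substraitPlanJson: String = json
  // ...so the class needs an extra method that just forwards.
  def substraitPlanJsonValue: String = substraitPlanJson
}

// After: the trait declares the member the class already has.
trait ShimDeclaringMember {
  def substraitPlanJson: String
}

class TransformerAfter(json: String) extends ShimDeclaringMember {
  // The existing definition now satisfies the trait's abstract member.
  def substraitPlanJson: String = json
}
```

Both variants expose the same value; the second simply avoids a second name for it.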

Contributor Author

Sure, I will update it in a follow-up PR. Thanks @PHILO-HE

```scala
 * @since 1.3.0
 */
@Stable
class ExtendedAnalysisException protected[sql] (
```
Contributor

Can we keep just one copy of this new exception class? I assume the copies are identical. Maybe we can move it into a common place.

Contributor Author

Do you mean moving it into shim/common? This class is only needed in the 32/33/34 shims, not in 35, so moving it to shim/common may not make sense.

Contributor

Not a strong feeling on this; keeping the current change is OK with me. BTW, it would be better to also add some comments to these classes so people know why they were introduced. Thanks!
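A rough sketch of why a pre-3.5 copy of the exception helps, under simplified assumptions: `AnalysisExceptionStandIn` is a hypothetical placeholder for Spark's AnalysisException, `QueryTestSketch` is a hypothetical helper, and the class body below is not the real ExtendedAnalysisException signature. Because each older shim defines a class with the same name, version-independent test code can refer to one exception type on every supported Spark version; the doc comment is the kind of explanation the review asks for.

```scala
// Hypothetical placeholder for Spark's AnalysisException.
class AnalysisExceptionStandIn(message: String) extends Exception(message)

/**
 * Placeholder copy for the pre-3.5 shims (sketch only). Spark 3.5 ships its
 * own ExtendedAnalysisException; this class exists so that shared test code
 * can name the same exception type on every supported Spark version.
 */
class ExtendedAnalysisException(message: String)
  extends AnalysisExceptionStandIn(message)

// Hypothetical shared test helper: it compiles against every shim because
// the type name resolves on every version.
object QueryTestSketch {
  def analysisErrorMessage(body: => Unit): Option[String] =
    try { body; None }
    catch { case e: ExtendedAnalysisException => Some(e.getMessage) }
}
```

For instance, `QueryTestSketch.analysisErrorMessage(throw new ExtendedAnalysisException("bad column"))` returns `Some("bad column")`.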

JkSelf merged commit 6b0f346 into apache:main on Mar 4, 2024
20 checks passed
GlutenPerfBot (Contributor) commented:

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| Query | log/native_4822_time.csv | log/native_master_03_03_2024_6b5ee6971_time.csv | Difference (master - PR) | Percentage (master / PR) |
|---|---|---|---|---|
| q1 | 31.78 | 36.61 | 4.827 | 115.19% |
| q2 | 24.23 | 24.50 | 0.270 | 101.12% |
| q3 | 37.42 | 39.42 | 2.000 | 105.34% |
| q4 | 36.06 | 38.53 | 2.468 | 106.84% |
| q5 | 70.33 | 70.78 | 0.447 | 100.64% |
| q6 | 5.48 | 7.30 | 1.823 | 133.28% |
| q7 | 84.03 | 84.97 | 0.941 | 101.12% |
| q8 | 84.73 | 84.52 | -0.211 | 99.75% |
| q9 | 127.68 | 121.76 | -5.919 | 95.36% |
| q10 | 42.86 | 44.77 | 1.913 | 104.46% |
| q11 | 20.30 | 20.74 | 0.442 | 102.18% |
| q12 | 27.24 | 29.46 | 2.214 | 108.13% |
| q13 | 45.60 | 45.96 | 0.358 | 100.79% |
| q14 | 16.65 | 21.41 | 4.760 | 128.59% |
| q15 | 29.11 | 27.97 | -1.141 | 96.08% |
| q16 | 15.70 | 13.22 | -2.477 | 84.22% |
| q17 | 102.32 | 103.13 | 0.805 | 100.79% |
| q18 | 148.69 | 146.21 | -2.480 | 98.33% |
| q19 | 13.91 | 13.66 | -0.254 | 98.18% |
| q20 | 26.20 | 26.57 | 0.370 | 101.41% |
| q21 | 226.36 | 225.15 | -1.216 | 99.46% |
| q22 | 13.66 | 13.79 | 0.124 | 100.91% |
| total | 1230.35 | 1240.41 | 10.065 | 100.82% |

zemin-piao mentioned this pull request on Jun 24, 2024

6 participants