Add kyuubi-spark-extensions module #631
</parent>
<modelVersion>4.0.0</modelVersion>

<artifactId>kyuubi-sql-spark_3.1</artifactId>
The folder is named ...-3_1 but the artifactId is ..._3.1; is that by design?
Codecov Report
@@ Coverage Diff @@
## master #631 +/- ##
=======================================
Coverage 79.91% 79.91%
=======================================
Files 119 119
Lines 4636 4636
Branches 560 560
=======================================
Hits 3705 3705
Misses 620 620
Partials 311 311
cc @cloud-fan (if interested)
Can we add some real hammers in the doc to show the performance improvement with these extensions?
</parent>
<modelVersion>4.0.0</modelVersion>

<artifactId>kyuubi-extension-spark_3.1</artifactId>
<name>Kyuubi Project Dev Spark Extensions</name>
<packaging>jar</packaging>

<properties>
  <spark.version>3.1.1</spark.version>
  <scala.binary.version>2.12</scala.binary.version>
unnecessary
val INSERT_REPARTITION_BEFORE_WRITE =
  buildConf("spark.sql.optimizer.insertRepartitionBeforeWrite.enabled")
    .doc("Add repartition node at the top of plan. A approach of merging small files.")
    .version("0.0.1")
use 1.2.0
<outputDirectory>target/scala-2.12/classes</outputDirectory>
<testOutputDirectory>target/scala-2.12/test-classes</testOutputDirectory>
${scala.binary.version}
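A sketch of what this suggestion presumably amounts to: replace the hard-coded `2.12` in the two output directories with the Maven property. It assumes `scala.binary.version` is already defined in the parent POM, which the earlier "unnecessary" remark on the local definition suggests but the thread does not state outright.

```xml
<!-- Sketch only: assumes scala.binary.version is inherited from the parent POM -->
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
```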
<scope>test</scope>
</dependency>

<dependency>
unnecessary
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
unnecessary
LGTM
### _Why are the changes needed?_

The added sql module structure looks like:
```
kyuubi
  |
  - dev
    |
    - kyuubi-extension-spark_3.1
```

This PR mainly adds 3 features:
* merging small files automatically (including the dynamic partition insertion case)
* inserting a shuffle node before Join to make AQE `OptimizeSkewedJoin` work
* stage-level config isolation in AQE

Note that the sql rules depend on the Apache Spark interface, so we need to make the sql module version independent. Currently, this PR only supports Spark 3.1.1.

Due to the version issue, we currently need to check and deploy this extension manually.

### _How was this patch tested?_

Add new test.

Closes #631 from ulysses-you/add-sql-module.

Closes #631

2cf12f1 [ulysses-you] version
cfbf72c [ulysses-you] address comment
7740ca6 [ulysses-you] module name
0f723eb [ulysses-you] workflow
45c23d8 [ulysses-you] line
80378f5 [ulysses-you] assembly
95528aa [ulysses-you] move module
5fe5d87 [ulysses-you] license
6578440 [ulysses-you] init work

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>

(cherry picked from commit 43f40dc)
Signed-off-by: Kent Yao <yao@apache.org>
thanks, merged to master for v1.3.0 and branch 1.2 for v1.2.0
module version of master is not correct
nice catch, can you send a followup?
sure
…module version

Closes #633 from pan3793/minor.

Closes #633

108aca0 [Cheng Pan] [BUILD] Fix kyuubi-extension-spark_3.1 module version

Authored-by: Cheng Pan <379377944@qq.com>
Signed-off-by: Kent Yao <yao@apache.org>
…-spark_3.1

### _Why are the changes needed?_

Fix `-Pkyuubi-sql-spark_3.1` introduced by KYUUBI #631 in GH action

Closes #645 from pan3793/build.

Closes #645

d20c6d8 [Cheng Pan] suppress CI log for kyuubi-extension-spark_3.1
1499d5b [Cheng Pan] [BUILD] Fix CI profile kyuubi-extension-spark_3.1

Authored-by: Cheng Pan <379377944@qq.com>
Signed-off-by: Kent Yao <yao@apache.org>
…-spark_3.1

### _Why are the changes needed?_

Fix `-Pkyuubi-sql-spark_3.1` introduced by KYUUBI #631 in GH action

Closes #645 from pan3793/build.

Closes #645

d20c6d8 [Cheng Pan] suppress CI log for kyuubi-extension-spark_3.1
1499d5b [Cheng Pan] [BUILD] Fix CI profile kyuubi-extension-spark_3.1

Authored-by: Cheng Pan <379377944@qq.com>
Signed-off-by: Kent Yao <yao@apache.org>

(cherry picked from commit 10d72e0)
Signed-off-by: Kent Yao <yao@apache.org>
Hi @ulysses-you, is this feature production ready? I cannot find any documentation on how to use it on the Kyuubi website.
@yangrong688 yes, it's ready. But for now it is only available with Spark branch-3.1 (i.e. 3.1.1 and 3.1.2). Since Kyuubi 1.2.0 has not been released yet, you can download the latest rc version from https://github.com/NetEase/kyuubi/releases. As for the documentation, yes, it would be better to improve the docs for this feature. You can try this
OK, thanks. I will have a try.
please help with the doc @ulysses-you :)
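Why are the changes needed?
The added sql module structure looks like: `kyuubi/dev/kyuubi-extension-spark_3.1`
This PR mainly adds 3 features:
* merging small files automatically (including the dynamic partition insertion case)
* inserting a shuffle node before Join to make AQE `OptimizeSkewedJoin` work
* stage-level config isolation in AQE

Note that the sql rules depend on the Apache Spark interface, so we need to make the sql module version independent. Currently, this PR only supports Spark 3.1.1.
Due to the version issue, we currently need to check and deploy this extension manually.
How was this patch tested?
Add new test.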
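
As a rough illustration of the manual deployment mentioned in the description, the fragment below sketches how the extension might be switched on in `spark-defaults.conf` once the built jar has been copied onto Spark's classpath. The entry class name `org.apache.kyuubi.sql.KyuubiSparkSQLExtension` is an assumption (it does not appear anywhere in this thread) and should be checked against the jar you actually deploy; `spark.sql.extensions` and `spark.sql.adaptive.enabled` are standard Spark settings, and the `insertRepartitionBeforeWrite` key is the one quoted in the review.

```properties
# Sketch only; verify every name against your Kyuubi and Spark release.

# Assumed entry point of the extension (not stated in this PR thread).
spark.sql.extensions=org.apache.kyuubi.sql.KyuubiSparkSQLExtension

# Key quoted in the review: adds a repartition node before writes to merge small files.
spark.sql.optimizer.insertRepartitionBeforeWrite.enabled=true

# AQE has to be enabled for the skew-join and stage-level rules to matter.
spark.sql.adaptive.enabled=true
```

The same keys could be passed per job with `--conf` on `spark-submit` instead of being set globally, which may be the safer choice while the extension supports only a single Spark line.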