[SPARK-37445][BUILD] Rename the maven hadoop profile to hadoop-3 and hadoop-2 #34715

Closed
wants to merge 31 commits into from

Conversation

AngersZhuuuu
Contributor

@AngersZhuuuu AngersZhuuuu commented Nov 26, 2021

What changes were proposed in this pull request?

Rename the maven hadoop profile to hadoop-3 and hadoop-2

Why are the changes needed?

Upgrade hadoop profile

Does this PR introduce any user-facing change?

In this PR, we rename the Hadoop profiles to hadoop-3 and hadoop-2, and hadoop-3 is the default Hadoop profile.
Users can build against a lower Hadoop 3 version by passing -Dhadoop.version, for example:

./build/mvn -Pyarn -Dhadoop.version=3.2.1 -DskipTests clean package

To build against a Hadoop 2 version, add the hadoop-2 profile:

./build/mvn -Pyarn -Phadoop-2 -Dhadoop.version=2.8.3 -DskipTests clean package
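
The rule behind the two commands above can be sketched as a small helper. This is only an illustration, not part of the patch: `maven_build_args` is a hypothetical name, and the "version starts with 2." check is an assumption about how a caller might pick the profile.

```python
def maven_build_args(hadoop_version=None):
    """Build the ./build/mvn argument list under the renamed profiles (hypothetical helper)."""
    args = ["./build/mvn", "-Pyarn"]
    if hadoop_version is not None:
        # hadoop-3 is the default profile, so only Hadoop 2.x needs an explicit -Phadoop-2.
        # Assumption: a leading "2." is enough to recognise a Hadoop 2 version string.
        if hadoop_version.startswith("2."):
            args.append("-Phadoop-2")
        args.append(f"-Dhadoop.version={hadoop_version}")
    args += ["-DskipTests", "clean", "package"]
    return args
```

Joining the result with spaces for `"3.2.1"` or `"2.8.3"` reproduces the two command lines shown in the description.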

How was this patch tested?

Not needed.

@SparkQA

SparkQA commented Nov 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50113/

@AngersZhuuuu
Contributor Author

AngersZhuuuu commented Nov 26, 2021

@SparkQA

SparkQA commented Nov 26, 2021

Test build #145648 has finished for PR 34715 at commit 259dfc7.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50113/

@SparkQA

SparkQA commented Nov 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50119/

@SparkQA

SparkQA commented Nov 26, 2021

Test build #145653 has finished for PR 34715 at commit 7f9b1f0.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50175/

@SparkQA

SparkQA commented Nov 29, 2021

Test build #145705 has finished for PR 34715 at commit 758b267.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@sunchao sunchao left a comment

Looks pretty good @AngersZhuuuu ! will take another pass later.

python/pyspark/install.py Show resolved Hide resolved
@HyukjinKwon
Member

From a cursory look, seems fine to me too but I think it's best to defer to @sunchao

@mridulm
Contributor

mridulm commented Dec 1, 2021

+CC @xkrogen

python/pyspark/install.py Outdated Show resolved Hide resolved
resource-managers/kubernetes/integration-tests/README.md Outdated Show resolved Hide resolved
dev/run-tests-jenkins.py Show resolved Hide resolved
@@ -34,8 +34,8 @@ HADOOP_MODULE_PROFILES="-Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive \
-Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-cloud"
MVN="build/mvn"
HADOOP_HIVE_PROFILES=(
hadoop-2.7-hive-2.3
hadoop-3.2-hive-2.3
hadoop-2-hive-2.3
Member

Unrelated, but the hive-2.3 suffix seems redundant, since we don't support an alternative Hive profile right now.

docs/building-spark.md Outdated Show resolved Hide resolved
pom.xml Show resolved Hide resolved
python/docs/source/getting_started/install.rst Outdated Show resolved Hide resolved
python/pyspark/install.py Outdated Show resolved Hide resolved
@SparkQA

SparkQA commented Dec 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50401/

@SparkQA

SparkQA commented Dec 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50401/

@SparkQA

SparkQA commented Dec 5, 2021

Test build #145925 has finished for PR 34715 at commit c222d0e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50444/

@SparkQA

SparkQA commented Dec 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50444/

@SparkQA

SparkQA commented Dec 7, 2021

Test build #145969 has finished for PR 34715 at commit 07621be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50450/

@SparkQA

SparkQA commented Dec 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50450/

@SparkQA

SparkQA commented Dec 7, 2021

Test build #145975 has finished for PR 34715 at commit 7b39401.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@sunchao sunchao left a comment

LGTM, thanks @AngersZhuuuu !

Member

@HyukjinKwon HyukjinKwon left a comment

Looks pretty good. After this gets merged, let's also send an email to the dev mailing list about the renamed profiles, so that external build pipelines in forked repos can be updated too.

return spark_version, convert_old_hadoop_version(spark_version, hadoop_version), hive_version


def convert_old_hadoop_version(spark_version, hadoop_version):
Member

Thanks for fixing this together guys.
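
For readers without the diff at hand, here is a rough sketch of what a conversion like the one reviewed above might look like. It is not the code from `python/pyspark/install.py`. The function name `legacy_hadoop_name`, the `spark-X.Y.Z` input format, the name mapping, and the 3.2 cutoff are all assumptions made for illustration.

```python
import re

# Assumed mapping from new-style to old-style Hadoop names (illustrative only).
NEW_TO_OLD = {"hadoop3": "hadoop3.2", "hadoop2": "hadoop2.7"}


def legacy_hadoop_name(spark_version, hadoop_version):
    """Return the old-style Hadoop name for Spark releases that predate the rename."""
    # Assumption: spark_version looks like "spark-3.2.0".
    match = re.match(r"^spark-(\d+)\.(\d+)\.\d+$", spark_version)
    if match is None:
        raise ValueError(f"unexpected Spark version string: {spark_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Assumption: releases up to and including 3.2 still use the old names.
    if (major, minor) <= (3, 2):
        return NEW_TO_OLD.get(hadoop_version, hadoop_version)
    return hadoop_version
```

Under these assumptions, `legacy_hadoop_name("spark-3.2.0", "hadoop3")` gives `"hadoop3.2"`, while a later release keeps `"hadoop3"` unchanged.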

@AngersZhuuuu
Contributor Author

@HyukjinKwon This GA failure is caused by the same SQLMetricsSuite test that you mentioned in my PR:

SPARK-34399: Add job commit duration metrics for DataWritingCommand

@sunchao I think we can merge this if there are no more questions.
I will review the SPARK-34399 test (Add job commit duration metrics for DataWritingCommand) and try to make it stable.

@sunchao sunchao closed this in fdc276b Dec 8, 2021
@sunchao
Member

sunchao commented Dec 8, 2021

Thanks, merged!

@HyukjinKwon
Member

Awesome. @AngersZhuuuu, mind sending an email to the dev mailing list describing the diff and its impact? We would have to list, for example, the profile names, the released tar file names, the pip installation options, etc.

@HyukjinKwon
Member

For example, if people are using the hadoop-2.7 profile in their build, their build will now use Hadoop 3 by default, with a warning that the hadoop-2.7 profile does not exist.
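
One way an external build script could guard against that silent fallback is to rewrite the old profile names before invoking Maven. The sketch below is hypothetical (`migrate_profile_flags` is not part of Spark). It assumes the old names are `hadoop-2.7` and `hadoop-3.2`, and relies on Maven's `-P` option accepting a comma-separated list of profiles.

```python
# Old-to-new profile names, as discussed in this PR.
RENAMED_PROFILES = {"hadoop-2.7": "hadoop-2", "hadoop-3.2": "hadoop-3"}


def migrate_profile_flags(args):
    """Rewrite old Hadoop profile names inside Maven -P flags (hypothetical helper)."""
    migrated = []
    for arg in args:
        if arg.startswith("-P"):
            # A single -P flag may carry several comma-separated profiles.
            names = arg[2:].split(",")
            arg = "-P" + ",".join(RENAMED_PROFILES.get(name, name) for name in names)
        migrated.append(arg)
    return migrated
```

For instance, `["-Pyarn", "-Phadoop-2.7", "-DskipTests"]` becomes `["-Pyarn", "-Phadoop-2", "-DskipTests"]`, so the build keeps targeting Hadoop 2 instead of falling back to the default profile.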

@AngersZhuuuu
Contributor Author

Awesome. @AngersZhuuuu, mind sending an email to the dev mailing list describing the diff and its impact? We would have to list, for example, the profile names, the released tar file names, the pip installation options, etc.

Yeah, I will put together an email later.

HyukjinKwon pushed a commit that referenced this pull request Dec 8, 2021
…p3 for the scheduled GA job for branch-3.2

### What changes were proposed in this pull request?

This PR fixes an issue that the scheduled GA job for `branch-3.2` fails.
SPARK-37445 (#34715) renamed the profile name `hadoop3.2` to `hadoop3`, but it should be `hadoop3.2` for the scheduled build.
https://github.com/apache/spark/runs/4453894964?check_suite_focus=true

### Why are the changes needed?

To recover the job.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The scheduled job itself.

Closes #34835 from sarutak/followup-SPARK-37445.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>