[SPARK-41392][BUILD] Make maven build Spark master with Hadoop 3.4.0-SNAPSHOT successful #38974

Closed
wants to merge 3 commits into from

@LuciferYang LuciferYang commented Dec 8, 2022

What changes were proposed in this pull request?

This PR adds the bc-java (Bouncy Castle) artifacts as test dependencies of the sql module so that the Maven build of Spark master succeeds with Hadoop 3.4.0-SNAPSHOT.

Why are the changes needed?

So that the Maven build of Spark master succeeds with Hadoop 3.4.0-SNAPSHOT; without these test dependencies it fails with a ClassNotFoundException.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Pass GitHub Actions
  • Manual test:
build/mvn clean install -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -DskipTests -Dhadoop.version=3.4.0-SNAPSHOT -Psnapshots-and-staging

Before

The build failed with a ClassNotFoundException for org.bouncycastle.jce.provider.BouncyCastleProvider.

After

BUILD SUCCESS
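
The Before/After difference above comes down to whether org.bouncycastle.jce.provider.BouncyCastleProvider is visible on the classpath. A minimal sketch of a probe for that is shown here; the class name `BouncyCastleProbe` and the method `isOnClasspath` are hypothetical helpers for illustration and are not part of Spark, Hadoop, or this PR.

```java
// Hypothetical diagnostic helper (an assumption for illustration only);
// it is not part of the Spark or Hadoop code base.
public class BouncyCastleProbe {
    // The class named in the ClassNotFoundException reported above.
    public static final String PROVIDER_CLASS =
        "org.bouncycastle.jce.provider.BouncyCastleProvider";

    /**
     * Returns true if the named class can be located through this class's
     * class loader. The class is looked up without being initialized.
     */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, BouncyCastleProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(PROVIDER_CLASS + " on classpath: "
            + isOnClasspath(PROVIDER_CLASS));
    }
}
```

Running the class with the same classpath the failing module uses would be expected to report false before the test dependencies are added and true afterwards; the result depends entirely on the classpath it is given.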

LuciferYang commented Dec 8, 2022

https://github.com/LuciferYang/make-distribution.sh/blob/master/.github/workflows/blank.yml

# This is a basic workflow to help you get started with Actions

name: CI

# Controls when the workflow will run
on:
  # Triggers the workflow on push or pull request events but only for the "master" branch
  push:
    branches: [ "master" ]
  pull_request:
    branches: [ "master" ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        java: [ '1.8', '11', '17' ]
    name: Build Spark with JDK ${{ matrix.java }}

    steps:
      - uses: actions/checkout@master
      - name: Set up JDK ${{ matrix.java }}
        uses: actions/setup-java@v1
        with:
          java-version: ${{ matrix.java }}
          distribution: 'zulu'
      - name: Install Python 3.8
        uses: actions/setup-python@v4
        with:
          python-version: 3.8
          architecture: x64
      - name: Install Python packages (Python 3.8)
        run: |
          python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==4.21.6'
          python3.8 -m pip list
      - name: build
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          git clone https://github.com/apache/spark.git
          cd spark
          gh pr checkout 38974
          build/mvn clean install -DskipTests -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Dhadoop.version=3.4.0-SNAPSHOT -Psnapshots-and-staging

https://github.com/LuciferYang/make-distribution.sh/actions/runs/3645098350

c1539ee (with scala-maven-plugin 4.8.0) builds successfully with the GitHub Actions workflow above.

@LuciferYang

Run

build/mvn clean install -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -DskipTests -Dhadoop.version=3.4.0-SNAPSHOT -Psnapshots-and-staging

locally with

openjdk version "1.8.0_352"
OpenJDK Runtime Environment (Zulu 8.66.0.15-CA-macos-aarch64) (build 1.8.0_352-b08)
OpenJDK 64-Bit Server VM (Zulu 8.66.0.15-CA-macos-aarch64) (build 25.352-b08, mixed mode)
Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
Maven home: /Users/yangjie01/SourceCode/tools/maven
Java version: 1.8.0_352, vendor: Azul Systems, Inc., runtime: /Users/yangjie01/SourceCode/tools/zulu8u352/zulu-8.jdk/Contents/Home/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "mac os x", version: "11.4", arch: "aarch64", family: "mac"

SUCCESS

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.4.0-SNAPSHOT:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [  4.019 s]
[INFO] Spark Project Tags ................................. SUCCESS [  8.647 s]
[INFO] Spark Project Sketch ............................... SUCCESS [  9.546 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 10.114 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 14.079 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 11.954 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 12.983 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  7.930 s]
[INFO] Spark Project Core ................................. SUCCESS [02:03 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 26.217 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 30.525 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 49.190 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [03:59 min]
[INFO] Spark Project SQL .................................. SUCCESS [03:30 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:26 min]
[INFO] Spark Project Tools ................................ SUCCESS [  6.388 s]
[INFO] Spark Project Hive ................................. SUCCESS [01:14 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 17.813 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 35.463 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 36.535 s]
[INFO] Spark Project Mesos ................................ SUCCESS [ 26.025 s]
[INFO] Spark Project Kubernetes ........................... SUCCESS [ 54.553 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 35.123 s]
[INFO] Spark Ganglia Integration .......................... SUCCESS [  8.270 s]
[INFO] Spark Project Hadoop Cloud Integration ............. SUCCESS [ 53.754 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 14.055 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 15.740 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 23.491 s]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [ 31.237 s]
[INFO] Spark Kinesis Integration .......................... SUCCESS [ 23.730 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 57.467 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  6.690 s]
[INFO] Spark Avro ......................................... SUCCESS [ 29.417 s]
[INFO] Spark Project Connect .............................. SUCCESS [ 52.574 s]
[INFO] Spark Protobuf ..................................... SUCCESS [ 25.662 s]
[INFO] Spark Project Kinesis Assembly ..................... SUCCESS [  9.153 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  25:43 min
[INFO] Finished at: 2022-12-08T12:24:31+08:00
[INFO] ------------------------------------------------------------------------

maven-build-java8-hadoop34.log

@LuciferYang

After downgrading scala-maven-plugin to 4.7.2, the local Maven build still passes. I will update the PR and test the build on GitHub Actions.

[INFO] Reactor Summary for Spark Project Parent POM 3.4.0-SNAPSHOT:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [  3.641 s]
[INFO] Spark Project Tags ................................. SUCCESS [  6.959 s]
[INFO] Spark Project Sketch ............................... SUCCESS [  7.881 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  7.379 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 11.208 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  9.188 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 10.616 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  5.707 s]
[INFO] Spark Project Core ................................. SUCCESS [02:22 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 29.559 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 35.464 s]
[INFO] Spark Project Streaming ............................ SUCCESS [01:01 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [04:02 min]
[INFO] Spark Project SQL .................................. SUCCESS [03:49 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:45 min]
[INFO] Spark Project Tools ................................ SUCCESS [  6.600 s]
[INFO] Spark Project Hive ................................. SUCCESS [01:27 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 18.706 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 30.993 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 36.845 s]
[INFO] Spark Project Mesos ................................ SUCCESS [ 30.759 s]
[INFO] Spark Project Kubernetes ........................... SUCCESS [ 37.600 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 43.337 s]
[INFO] Spark Ganglia Integration .......................... SUCCESS [  7.996 s]
[INFO] Spark Project Hadoop Cloud Integration ............. SUCCESS [ 17.219 s]
[INFO] Spark Project Assembly ............................. SUCCESS [  5.834 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 17.551 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 24.742 s]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [ 38.248 s]
[INFO] Spark Kinesis Integration .......................... SUCCESS [ 27.528 s]
[INFO] Spark Project Examples ............................. SUCCESS [01:03 min]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  9.940 s]
[INFO] Spark Avro ......................................... SUCCESS [ 33.881 s]
[INFO] Spark Project Connect .............................. SUCCESS [ 54.508 s]
[INFO] Spark Protobuf ..................................... SUCCESS [ 32.005 s]
[INFO] Spark Project Kinesis Assembly ..................... SUCCESS [ 13.541 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  26:48 min
[INFO] Finished at: 2022-12-08T13:07:58+08:00
[INFO] ------------------------------------------------------------------------

maven-build-java8-hadoop34-plugin472.log

LuciferYang commented Dec 8, 2022

https://github.com/LuciferYang/make-distribution.sh/actions/runs/3645502265/jobs/6155706406

9706cf5 (with scala-maven-plugin 4.7.2): the test build with GitHub Actions succeeded.

@LuciferYang

This PR fixes the compile issue that @steveloughran reported on the dev mailing list, but should we wait until Spark actually upgrades to Hadoop 3.4? What do you think? @HyukjinKwon @dongjoon-hyun @srowen

@LuciferYang LuciferYang changed the title [DON'T MERGE] Test build with hadoop 3.4.0-SNAPSHOT [SPARK-41392][BUILD] Add test dependencies for sql module to make maven build successful with Hadoop 3.4.0-SNAPSHOT Dec 9, 2022
@LuciferYang LuciferYang changed the title [SPARK-41392][BUILD] Add test dependencies for sql module to make maven build successful with Hadoop 3.4.0-SNAPSHOT [SPARK-41392][BUILD] Add test dependencies to sql module to make maven build successful with Hadoop 3.4.0-SNAPSHOT Dec 9, 2022
@LuciferYang LuciferYang changed the title [SPARK-41392][BUILD] Add test dependencies to sql module to make maven build successful with Hadoop 3.4.0-SNAPSHOT [SPARK-41392][BUILD] Add bc-java test dependencies to sql module to make maven build successful with Hadoop 3.4.0-SNAPSHOT Dec 9, 2022
@LuciferYang LuciferYang changed the title [SPARK-41392][BUILD] Add bc-java test dependencies to sql module to make maven build successful with Hadoop 3.4.0-SNAPSHOT [SPARK-41392][BUILD] Make maven build Spark master with Hadoop 3.4.0-SNAPSHOT successful Dec 9, 2022

@dongjoon-hyun dongjoon-hyun left a comment

Since we don't want to depend on any SNAPSHOT dependencies, let's wait for the official Apache Hadoop 3.4.0, @LuciferYang .

@LuciferYang

Fine with me; closing this for now.

@LuciferYang LuciferYang closed this Dec 9, 2022
@steveloughran

yeah, not going to happen for a while; 3.3.5 RC0 coming soon though; just trying to wrap up an abfs prefetch bug

@steveloughran

time to revisit this; 3.4.0 is in RC phase

dongjoon-hyun pushed a commit that referenced this pull request Feb 29, 2024
…`sql/core` module for Hadoop 3.4.0

### What changes were proposed in this pull request?

Adds bouncy-castle jdk18 artifacts to test builds in spark-sql.

Based on #38974
* only applies the test import changes
* dependencies are those of #44359

### Why are the changes needed?

Forthcoming Hadoop 3.4.0 release doesn't export the bouncy-castle
JARs; maven builds fail.

### Does this PR introduce _any_ user-facing change?

No: test time dependency declarations only.

### How was this patch tested?

This was done through the release build/test project
https://github.com/apache/hadoop-release-support

1. Latest RC2 artifacts pulled from apache maven staging
2. Spark maven build triggered with the hadoop-version passed down.
3. The 3.3.6 release template worked with spark master (as it should!)
4. With this change the 3.4.0 RC build worked

Note: have not *yet* done a maven test run through this

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45317 from steveloughran/SPARK-41392-HADOOP-3.4.0.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
TakawaAkirayo pushed a commit to TakawaAkirayo/spark that referenced this pull request Mar 4, 2024
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Mar 5, 2024
jpcorreia99 pushed a commit to jpcorreia99/spark that referenced this pull request Mar 12, 2024