[SPARK-44898][BUILD] Upgrade gcs-connector to 2.2.17 #42588

Closed


@dongjoon-hyun (Member) commented on Aug 21, 2023

### What changes were proposed in this pull request?

This PR aims to upgrade gcs-connector to 2.2.17.

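### Why are the changes needed?

To pick up the latest auth updates:

- https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v2.2.17 (2023-08-15)
  - GoogleCloudDataproc/hadoop-connectors#1041

```xml
- <google.auth.version>1.12.1</google.auth.version>
+ <google.auth.version>1.14.0</google.auth.version>
- <google.cloud-storage.bom.version>2.23.0</google.cloud-storage.bom.version>
+ <google.cloud-storage.bom.version>2.25.0</google.cloud-storage.bom.version>
```

- https://github.com/googleapis/google-auth-library-java/releases/tag/v1.14.0 (2022-12-06)
  - googleapis/google-auth-library-java#1100
  - googleapis/google-auth-library-java#993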
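Each property in the diff above moves to a newer version. A minimal sketch for checking that direction from a shell, assuming GNU `sort -V` (version sort) is available; `version_gt` is a hypothetical helper, not part of Spark or gcs-connector:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) only if version $1 is strictly newer than $2.
# Assumes GNU coreutils `sort -V`, which orders dotted version strings numerically.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Old/new pairs copied from the property diff.
version_gt 1.14.0 1.12.1 && echo "google.auth.version: 1.12.1 -> 1.14.0 is an upgrade"
version_gt 2.25.0 2.23.0 && echo "google.cloud-storage.bom.version: 2.23.0 -> 2.25.0 is an upgrade"
```

Version sort matters here because a plain lexical comparison would rank `2.2.9` above `2.2.17`.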

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and manual tests.

**BUILD**
```
dev/make-distribution.sh -Phadoop-cloud
```
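After the build, a quick sanity check is to confirm that the distribution bundles a gcs-connector jar for the new version. A minimal sketch, assuming the distribution lands in `dist/` (as the TEST step's `cd dist` implies) and that the connector jar's file name contains `gcs-connector` followed by `-<version>` and then `-` or `.` (for example `gcs-connector-hadoop3-2.2.17-shaded.jar`); `has_gcs_connector` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical post-build check: does DIST_DIR/jars contain a gcs-connector jar
# for EXPECTED_VERSION? Assumes the jar name embeds "-<version>" followed by "-" or ".".
has_gcs_connector() {
  local dist_dir="$1" expected="$2" jar
  # The [-.] after the version keeps "2.2.1" from matching a "2.2.17" jar.
  for jar in "$dist_dir"/jars/*gcs-connector*-"$expected"[-.]*jar; do
    # Without nullglob, an unmatched pattern stays literal, so check the file exists.
    if [ -e "$jar" ]; then
      echo "found: ${jar##*/}"
      return 0
    fi
  done
  echo "no gcs-connector jar for version $expected in $dist_dir/jars" >&2
  return 1
}

# Usage from the Spark root, after dev/make-distribution.sh -Phadoop-cloud:
#   has_gcs_connector dist 2.2.17
```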

**TEST**
```
$ cd dist
$ export KEYFILE=~/.ssh/apache-spark-k8s-54ccbe6102d9.json
$ export EMAIL=$(jq -r '.client_email' < $KEYFILE)
$ export PRIVATE_KEY_ID=$(jq -r '.private_key_id' < $KEYFILE)
$ export PRIVATE_KEY="$(jq -r '.private_key' < $KEYFILE)"
$ bin/spark-shell \
    -c spark.hadoop.fs.gs.auth.service.account.email=$EMAIL \
    -c spark.hadoop.fs.gs.auth.service.account.private.key.id=$PRIVATE_KEY_ID \
    -c spark.hadoop.fs.gs.auth.service.account.private.key="$PRIVATE_KEY"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/21 12:17:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1692645442153).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.text("gs://apache-spark-bucket/README.md").count()
res0: Long = 124

scala> spark.read.orc("examples/src/main/resources/users.orc").write.mode("overwrite").orc("gs://apache-spark-bucket/users.orc")

scala> spark.read.orc("gs://apache-spark-bucket/users.orc").show()
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          NULL|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+
```
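The TEST step passes the three service-account settings as separate `-c` (`--conf`) flags, quoting the private key because the PEM value spans several lines. A minimal sketch of the same wiring collected into a bash array, so that each `key=value` stays exactly one argument; `gcs_auth_args` and `GCS_AUTH_ARGS` are hypothetical names, and the `${VAR:?}` expansions abort a non-interactive script if a variable is unset or empty:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the spark-shell arguments for GCS service-account auth
# from EMAIL, PRIVATE_KEY_ID and PRIVATE_KEY (exported with jq as in the TEST step).
# Quoting each element keeps the multi-line PEM key as a single argument.
gcs_auth_args() {
  GCS_AUTH_ARGS=(
    -c "spark.hadoop.fs.gs.auth.service.account.email=${EMAIL:?EMAIL is not set}"
    -c "spark.hadoop.fs.gs.auth.service.account.private.key.id=${PRIVATE_KEY_ID:?PRIVATE_KEY_ID is not set}"
    -c "spark.hadoop.fs.gs.auth.service.account.private.key=${PRIVATE_KEY:?PRIVATE_KEY is not set}"
  )
}

# Usage from the dist directory:
#   gcs_auth_args && bin/spark-shell "${GCS_AUTH_ARGS[@]}"
```

Expanding the array as `"${GCS_AUTH_ARGS[@]}"` preserves element boundaries, so there are always six arguments regardless of whitespace or newlines inside the values.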

### Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the BUILD label on Aug 21, 2023
@dongjoon-hyun (Member, Author) commented:

Could you review this when you have some time, @viirya?

@viirya (Member) left a comment:

Looks good to me. Pending CI

@dongjoon-hyun (Member, Author) commented:

Thank you so much!

@dongjoon-hyun (Member, Author) commented:

The unit test (UT) failures are unrelated to this PR. They will be fixed via #42589.

@dongjoon-hyun (Member, Author) commented:

The root cause of the failures was reverted in b9c4fa4.

In addition, the R CRAN check failure is unrelated; it is a 60-second timeout while fetching the CRAN package index:

```
* checking CRAN incoming feasibility ...Warning in read.dcf(con) :
  URL 'https://cran.r-project.org/src/contrib/PACKAGES.in': Timeout of 60 seconds was reached
Error in read.dcf(con) : cannot read from connection
Execution halted
```

@dongjoon-hyun (Member, Author) commented:

Merged to master~

valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
### What changes were proposed in this pull request?

This PR aims to upgrade gcs-connector to 2.2.17.

### Why are the changes needed?

To pick up the latest auth updates:

- https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v2.2.17 (2023-08-15)
  - GoogleCloudDataproc/hadoop-connectors#1041

```xml
- <google.auth.version>1.12.1</google.auth.version>
+ <google.auth.version>1.14.0</google.auth.version>
- <google.cloud-storage.bom.version>2.23.0</google.cloud-storage.bom.version>
+ <google.cloud-storage.bom.version>2.25.0</google.cloud-storage.bom.version>
```

- https://github.com/googleapis/google-auth-library-java/releases/tag/v1.14.0 (2022-12-06)
  - googleapis/google-auth-library-java#1100
  - googleapis/google-auth-library-java#993

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and manual tests.

**BUILD**
```
dev/make-distribution.sh -Phadoop-cloud
```

**TEST**
```
$ cd dist
$ export KEYFILE=~/.ssh/apache-spark-k8s-54ccbe6102d9.json
$ export EMAIL=$(jq -r '.client_email' < $KEYFILE)
$ export PRIVATE_KEY_ID=$(jq -r '.private_key_id' < $KEYFILE)
$ export PRIVATE_KEY="$(jq -r '.private_key' < $KEYFILE)"
$ bin/spark-shell \
    -c spark.hadoop.fs.gs.auth.service.account.email=$EMAIL \
    -c spark.hadoop.fs.gs.auth.service.account.private.key.id=$PRIVATE_KEY_ID \
    -c spark.hadoop.fs.gs.auth.service.account.private.key="$PRIVATE_KEY"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/21 12:17:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1692645442153).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.text("gs://apache-spark-bucket/README.md").count()
res0: Long = 124

scala> spark.read.orc("examples/src/main/resources/users.orc").write.mode("overwrite").orc("gs://apache-spark-bucket/users.orc")

scala> spark.read.orc("gs://apache-spark-bucket/users.orc").show()
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          NULL|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#42588 from dongjoon-hyun/SPARK-44898.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024