[SPARK-44898][BUILD] Upgrade `gcs-connector` to 2.2.17 #42588

dongjoon-hyun · 2023-08-21T18:48:50Z

What changes were proposed in this pull request?

This PR aims to upgrade gcs-connector to 2.2.17.

Why are the changes needed?

To pick up the latest auth updates:

https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v2.2.17 (2023-08-15)
- upgrade google-cloud-storage GoogleCloudDataproc/hadoop-connectors#1041

- <google.auth.version>1.12.1</google.auth.version>
+ <google.auth.version>1.14.0</google.auth.version>
- <google.cloud-storage.bom.version>2.23.0</google.cloud-storage.bom.version>
+ <google.cloud-storage.bom.version>2.25.0</google.cloud-storage.bom.version>

https://github.com/googleapis/google-auth-library-java/releases/tag/v1.14.0 (2022-12-06)
- fix: AwsCredentials should not call metadata server if security creds and region are retrievable through environment vars googleapis/google-auth-library-java#1100
- Fix: Not loosing the access token when calling UserCredentials#ToBuil… googleapis/google-auth-library-java#993

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs and manual tests.

BUILD

dev/make-distribution.sh -Phadoop-cloud
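
As a quick sanity check after the build, one could confirm that the distribution actually bundles the upgraded connector. The sketch below is not part of this PR: the helper name `find_gcs_connector` is hypothetical, and the `gcs-connector-*.jar` file-name pattern under `dist/jars` is an assumption about how the `hadoop-cloud` profile packages the jar.

```shell
# List gcs-connector jars bundled in a Spark distribution directory.
# Usage: find_gcs_connector [DIST_DIR]   (defaults to ./dist)
# Returns non-zero when no matching jar is found.
find_gcs_connector() {
  dist_dir="${1:-dist}"
  found=0
  # Assumption: the connector jar lives under jars/ and its name
  # starts with "gcs-connector-".
  for jar in "$dist_dir"/jars/gcs-connector-*.jar; do
    # When nothing matches, the glob stays literal; skip it.
    [ -e "$jar" ] || continue
    basename "$jar"
    found=1
  done
  [ "$found" -eq 1 ]
}
```

Run from the Spark source root as `find_gcs_connector dist`; if the upgrade took effect, the printed file name should contain `2.2.17`.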

TEST

$ cd dist
$ export KEYFILE=~/.ssh/apache-spark-k8s-54ccbe6102d9.json
$ export EMAIL=$(jq -r '.client_email' < $KEYFILE)
$ export PRIVATE_KEY_ID=$(jq -r '.private_key_id' < $KEYFILE)
$ export PRIVATE_KEY="$(jq -r '.private_key' < $KEYFILE)"
$ bin/spark-shell \
    -c spark.hadoop.fs.gs.auth.service.account.email=$EMAIL \
    -c spark.hadoop.fs.gs.auth.service.account.private.key.id=$PRIVATE_KEY_ID \
    -c spark.hadoop.fs.gs.auth.service.account.private.key="$PRIVATE_KEY"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/21 12:17:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1692645442153).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.text("gs://apache-spark-bucket/README.md").count()
res0: Long = 124

scala> spark.read.orc("examples/src/main/resources/users.orc").write.mode("overwrite").orc("gs://apache-spark-bucket/users.orc")

scala> spark.read.orc("gs://apache-spark-bucket/users.orc").show()
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          NULL|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+
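
When repeating the manual test above with a different key file, it may help to fail early if one of the extracted fields is missing, because `jq -r` prints the literal string `null` for an absent key and `spark-shell` would then start with an unusable credential. The helper below is a minimal sketch; `require_value` is a hypothetical name, not something this PR or Spark provides.

```shell
# Reject a value extracted from the key file if it is empty or the
# literal string "null" (what `jq -r` prints for a missing key).
# Usage: require_value NAME VALUE
require_value() {
  name="$1"
  value="$2"
  if [ -z "$value" ] || [ "$value" = "null" ]; then
    echo "missing value for $name" >&2
    return 1
  fi
}

# Example, assuming EMAIL, PRIVATE_KEY_ID and PRIVATE_KEY were exported
# as in the transcript above:
#   require_value EMAIL "$EMAIL" &&
#   require_value PRIVATE_KEY_ID "$PRIVATE_KEY_ID" &&
#   require_value PRIVATE_KEY "$PRIVATE_KEY" &&
#   bin/spark-shell ...
```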

Was this patch authored or co-authored using generative AI tooling?

No.

dongjoon-hyun · 2023-08-21T19:19:32Z

Could you review this when you have some time, @viirya?

viirya

Looks good to me. Pending CI.

dongjoon-hyun · 2023-08-21T19:23:24Z

Thank you so much!

dongjoon-hyun · 2023-08-21T20:55:26Z

The unit test failures are unrelated to this PR. They will be fixed via #42589.

dongjoon-hyun · 2023-08-21T21:19:21Z

The root cause of the failure was reverted via b9c4fa4.

In addition, the R CRAN failure is unrelated to this PR; it is a timeout while reaching CRAN:

* checking CRAN incoming feasibility ...Warning in read.dcf(con) :
  URL 'https://cran.r-project.org/src/contrib/PACKAGES.in': Timeout of 60 seconds was reached
Error in read.dcf(con) : cannot read from connection
Execution halted

dongjoon-hyun · 2023-08-21T22:27:45Z

Merged to master.

[SPARK-44898][BUILD] Upgrade gcs-connector to 2.2.17

e1fe4e9

Closes apache#42588 from dongjoon-hyun/SPARK-44898.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

github-actions bot added the BUILD label Aug 21, 2023

viirya approved these changes Aug 21, 2023

dongjoon-hyun closed this in 109e9cd Aug 21, 2023
