[SPARK-44723][BUILD] Upgrade gcs-connector to 2.2.16 #42401

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-44723
Conversation


@dongjoon-hyun dongjoon-hyun commented Aug 8, 2023

What changes were proposed in this pull request?

This PR aims to upgrade `gcs-connector` to 2.2.16.

Why are the changes needed?

- https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v2.2.16 (2023-06-30)
- https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v2.2.15 (2023-06-02)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs and do the manual tests.

**BUILD**
```
dev/make-distribution.sh -Phadoop-cloud
```

**TEST**
```
$ export KEYFILE=your-credential-file.json
$ export EMAIL=$(jq -r '.client_email' < $KEYFILE)
$ export PRIVATE_KEY_ID=$(jq -r '.private_key_id' < $KEYFILE)
$ export PRIVATE_KEY="$(jq -r '.private_key' < $KEYFILE)"
$ bin/spark-shell \
    -c spark.hadoop.fs.gs.auth.service.account.email=$EMAIL \
    -c spark.hadoop.fs.gs.auth.service.account.private.key.id=$PRIVATE_KEY_ID \
    -c spark.hadoop.fs.gs.auth.service.account.private.key="$PRIVATE_KEY"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/08 10:43:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1691516610108).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.text("gs://apache-spark-bucket/README.md").count()
23/08/08 10:43:46 WARN GhfsStorageStatistics: Detected potential high latency for operation op_get_file_status. latencyMs=823; previousMaxLatencyMs=0; operationCount=1; context=gs://apache-spark-bucket/README.md
res0: Long = 124

scala> spark.read.orc("examples/src/main/resources/users.orc").write.mode("overwrite").orc("gs://apache-spark-bucket/users.orc")
23/08/08 10:43:59 WARN GhfsStorageStatistics: Detected potential high latency for operation op_delete. latencyMs=549; previousMaxLatencyMs=0; operationCount=1; context=gs://apache-spark-bucket/users.orc
23/08/08 10:43:59 WARN GhfsStorageStatistics: Detected potential high latency for operation op_mkdirs. latencyMs=440; previousMaxLatencyMs=0; operationCount=1; context=gs://apache-spark-bucket/users.orc/_temporary/0
23/08/08 10:44:04 WARN GhfsStorageStatistics: Detected potential high latency for operation op_delete. latencyMs=631; previousMaxLatencyMs=549; operationCount=2; context=gs://apache-spark-bucket/users.orc/_temporary
23/08/08 10:44:05 WARN GhfsStorageStatistics: Detected potential high latency for operation stream_write_close_operations. latencyMs=572; previousMaxLatencyMs=393; operationCount=2; context=gs://apache-spark-bucket/users.orc/_SUCCESS

scala>

scala> spark.read.orc("gs://apache-spark-bucket/users.orc").show()
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          NULL|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+
```
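For illustration, the three credential values that the `jq` commands read from the service-account key file map one-to-one onto the Spark configuration keys passed via `-c`. A minimal Python sketch of that mapping, using a hypothetical placeholder key file (the `client_email`, `private_key_id`, and `private_key` values here are made up, not real credentials):

```python
import json

# Hypothetical service-account key content with placeholder values.
# Real key files are issued by GCP IAM and contain additional fields
# (project_id, token_uri, ...) that are not needed for these settings.
key_json = """{
  "type": "service_account",
  "client_email": "svc@example-project.iam.gserviceaccount.com",
  "private_key_id": "abc123",
  "private_key": "-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n"
}"""

creds = json.loads(key_json)

# The same three settings the spark-shell -c flags set above.
spark_conf = {
    "spark.hadoop.fs.gs.auth.service.account.email": creds["client_email"],
    "spark.hadoop.fs.gs.auth.service.account.private.key.id": creds["private_key_id"],
    "spark.hadoop.fs.gs.auth.service.account.private.key": creds["private_key"],
}

for key, value in spark_conf.items():
    # Print only the first line of each value to avoid dumping a multi-line key.
    print(f"{key}={value.splitlines()[0]}")
```

The same dictionary could be fed to a `SparkConf`/`SparkSession.builder.config(...)` call instead of shell flags; either way the connector reads the three `fs.gs.auth.service.account.*` properties.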

@github-actions github-actions bot added the BUILD label Aug 8, 2023
@HyukjinKwon
Member

Merged to master.

@dongjoon-hyun
Member Author

Thank you, @HyukjinKwon !

@dongjoon-hyun dongjoon-hyun deleted the SPARK-44723 branch August 9, 2023 01:05
valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
Closes apache#42401 from dongjoon-hyun/SPARK-44723.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
prabhjyotsingh pushed a commit to acceldata-io/spark3 that referenced this pull request Feb 8, 2025
(cherry picked from commit 3920a41)
prabhjyotsingh pushed a commit to acceldata-io/spark3 that referenced this pull request Feb 8, 2025
(cherry picked from commit 3920a41)
(cherry picked from commit b3b9a55)
prabhjyotsingh pushed a commit to acceldata-io/spark3 that referenced this pull request Feb 8, 2025
(cherry picked from commit 3920a41)
(cherry picked from commit b3b9a55)
shubhluck pushed a commit to acceldata-io/spark3 that referenced this pull request May 16, 2025
(cherry picked from commit 3920a41)
(cherry picked from commit b3b9a55)
senthh pushed a commit to acceldata-io/spark3 that referenced this pull request May 26, 2025
(cherry picked from commit 3920a41)
(cherry picked from commit b3b9a55)
shubhluck pushed a commit to acceldata-io/spark3 that referenced this pull request Sep 3, 2025
(cherry picked from commit 3920a41)
(cherry picked from commit b3b9a55)