[SPARK-44723][BUILD] Upgrade gcs-connector to 2.2.16 #42401

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-44723
Conversation


@dongjoon-hyun dongjoon-hyun commented Aug 8, 2023

What changes were proposed in this pull request?

This PR aims to upgrade `gcs-connector` to 2.2.16.

Why are the changes needed?

- https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v2.2.16 (2023-06-30)
- https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v2.2.15 (2023-06-02)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs and do the manual tests.

**BUILD**
```
dev/make-distribution.sh -Phadoop-cloud
```

**TEST**
```
$ export KEYFILE=your-credential-file.json
$ export EMAIL=$(jq -r '.client_email' < $KEYFILE)
$ export PRIVATE_KEY_ID=$(jq -r '.private_key_id' < $KEYFILE)
$ export PRIVATE_KEY="$(jq -r '.private_key' < $KEYFILE)"
$ bin/spark-shell \
    -c spark.hadoop.fs.gs.auth.service.account.email=$EMAIL \
    -c spark.hadoop.fs.gs.auth.service.account.private.key.id=$PRIVATE_KEY_ID \
    -c spark.hadoop.fs.gs.auth.service.account.private.key="$PRIVATE_KEY"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/08 10:43:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1691516610108).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.text("gs://apache-spark-bucket/README.md").count()
23/08/08 10:43:46 WARN GhfsStorageStatistics: Detected potential high latency for operation op_get_file_status. latencyMs=823; previousMaxLatencyMs=0; operationCount=1; context=gs://apache-spark-bucket/README.md
res0: Long = 124

scala> spark.read.orc("examples/src/main/resources/users.orc").write.mode("overwrite").orc("gs://apache-spark-bucket/users.orc")
23/08/08 10:43:59 WARN GhfsStorageStatistics: Detected potential high latency for operation op_delete. latencyMs=549; previousMaxLatencyMs=0; operationCount=1; context=gs://apache-spark-bucket/users.orc
23/08/08 10:43:59 WARN GhfsStorageStatistics: Detected potential high latency for operation op_mkdirs. latencyMs=440; previousMaxLatencyMs=0; operationCount=1; context=gs://apache-spark-bucket/users.orc/_temporary/0
23/08/08 10:44:04 WARN GhfsStorageStatistics: Detected potential high latency for operation op_delete. latencyMs=631; previousMaxLatencyMs=549; operationCount=2; context=gs://apache-spark-bucket/users.orc/_temporary
23/08/08 10:44:05 WARN GhfsStorageStatistics: Detected potential high latency for operation stream_write_close_operations. latencyMs=572; previousMaxLatencyMs=393; operationCount=2; context=gs://apache-spark-bucket/users.orc/_SUCCESS

scala>

scala> spark.read.orc("gs://apache-spark-bucket/users.orc").show()
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          NULL|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+
```
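For illustration, the three credential values that the `jq` commands read from the service-account key file map one-to-one onto the Spark configuration keys passed via `-c`. A minimal Python sketch of that mapping, using a hypothetical placeholder key file (the `client_email`, `private_key_id`, and `private_key` values here are made up, not real credentials):

```python
import json

# Hypothetical service-account key content with placeholder values.
# Real key files are issued by GCP IAM and contain additional fields
# (project_id, token_uri, ...) that are not needed for these settings.
key_json = """{
  "type": "service_account",
  "client_email": "svc@example-project.iam.gserviceaccount.com",
  "private_key_id": "abc123",
  "private_key": "-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n"
}"""

creds = json.loads(key_json)

# The same three settings the spark-shell -c flags set above.
spark_conf = {
    "spark.hadoop.fs.gs.auth.service.account.email": creds["client_email"],
    "spark.hadoop.fs.gs.auth.service.account.private.key.id": creds["private_key_id"],
    "spark.hadoop.fs.gs.auth.service.account.private.key": creds["private_key"],
}

for key, value in spark_conf.items():
    # Print only the first line of each value to avoid dumping a multi-line key.
    print(f"{key}={value.splitlines()[0]}")
```

The same dictionary could be fed to a `SparkConf`/`SparkSession.builder.config(...)` call instead of shell flags; either way the connector reads the three `fs.gs.auth.service.account.*` properties.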

@github-actions github-actions bot added the BUILD label Aug 8, 2023
@HyukjinKwon
Member

Merged to master.

@dongjoon-hyun
Member Author

Thank you, @HyukjinKwon !

@dongjoon-hyun dongjoon-hyun deleted the SPARK-44723 branch August 9, 2023 01:05
valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
Closes apache#42401 from dongjoon-hyun/SPARK-44723.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
prabhjyotsingh pushed a commit to acceldata-io/spark3 that referenced this pull request Feb 8, 2025
(cherry picked from commit 3920a41)
prabhjyotsingh pushed a commit to acceldata-io/spark3 that referenced this pull request Feb 8, 2025
(cherry picked from commit 3920a41)
(cherry picked from commit b3b9a55)
prabhjyotsingh pushed a commit to acceldata-io/spark3 that referenced this pull request Feb 8, 2025
(cherry picked from commit 3920a41)
(cherry picked from commit b3b9a55)
shubhluck pushed a commit to acceldata-io/spark3 that referenced this pull request May 16, 2025
(cherry picked from commit 3920a41)
(cherry picked from commit b3b9a55)
senthh pushed a commit to acceldata-io/spark3 that referenced this pull request May 26, 2025
(cherry picked from commit 3920a41)
(cherry picked from commit b3b9a55)
shubhluck pushed a commit to acceldata-io/spark3 that referenced this pull request Sep 3, 2025
(cherry picked from commit 3920a41)
(cherry picked from commit b3b9a55)