[SPARK-28981][K8S] Missing library for reading/writing Snappy-compressed files#25686

Closed
psschwei wants to merge 1 commit into apache:master from psschwei:develop

Conversation

psschwei commented Sep 5, 2019

What changes were proposed in this pull request?

Add the gcompat library to the Dockerfile for Spark on Kubernetes.

Why are the changes needed?

The current Dockerfile throws an error when trying to read or write Snappy-compressed files. As Snappy is one of Spark's default compression codecs, it should be supported out of the box.
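For context, a minimal sketch of this kind of change, assuming the Alpine-based image built from kubernetes/dockerfiles/spark/Dockerfile (the RUN line below is illustrative, not the literal diff in this PR):

# gcompat provides a glibc compatibility layer on musl-based Alpine,
# so snappy-java's bundled native library (built against glibc) can load.
RUN apk add --no-cache gcompat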

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Built the Spark container image and tested reading and writing Snappy-compressed files.
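For reference, a check along these lines in spark-shell would exercise the codec (the path is illustrative; Snappy is the default Parquet codec, so no extra configuration is needed):

scala> spark.conf.get("spark.sql.parquet.compression.codec")  // "snappy" by default
scala> spark.range(10).write.parquet("/tmp/snappy-test")      // fails to load the native snappy library on an image without glibc compatibility
scala> spark.read.parquet("/tmp/snappy-test").count           // returns 10 on a working image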

@felixcheung

Jenkins, ok to test

@felixcheung left a comment


looks reasonable to me


SparkQA commented Sep 5, 2019

Test build #110164 has finished for PR 25686 at commit 696a9cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


srowen commented Sep 5, 2019

See #25112, which suggests that the issue was resolved by the change in https://issues.apache.org/jira/browse/SPARK-26995. Are we sure here?

@dongjoon-hyun

Thank you for your first contribution, @psschwei. As @srowen mentioned, this is resolved in 2.4.4. Please download Apache Spark 2.4.4 and build the Docker image from it. I'll close this PR.

$ docker build -t spark:2.4.4 -f kubernetes/dockerfiles/spark/Dockerfile .
$ docker run --rm -it spark:2.4.4 /opt/spark/bin/spark-shell
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=/opt/spark/bin/spark-shell
+ case "$SPARK_K8S_CMD" in
+ echo 'Non-spark-on-k8s command provided, proceeding in pass-through mode...'
Non-spark-on-k8s command provided, proceeding in pass-through mode...
+ exec /sbin/tini -s -- /opt/spark/bin/spark-shell
19/09/05 17:39:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://454a817f8cee:4040
Spark context available as 'sc' (master = local[*], app id = local-1567705163260).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.range(10).write.parquet("/tmp/p")
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 96.54% for 7 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 84.47% for 8 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 75.08% for 9 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 67.58% for 10 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 61.43% for 11 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 67.58% for 10 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 75.08% for 9 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 84.47% for 8 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 96.54% for 7 writers

scala> spark.read.parquet("/tmp/p").count
res1: Long = 10

@felixcheung

oops

