[SPARK-28981][K8S] Missing library for reading/writing Snappy-compressed files#25686

Closed
psschwei wants to merge 1 commit into apache:master from psschwei:develop

Conversation

psschwei commented Sep 5, 2019

What changes were proposed in this pull request?

Add the gcompat library to the Dockerfile for Spark on Kubernetes.

Why are the changes needed?

The current Dockerfile throws an error when trying to read or write Snappy-compressed files. As Snappy is one of Spark's default compression codecs, it should be supported out of the box.
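For context, a minimal sketch of this kind of change, assuming the Alpine-based image built from kubernetes/dockerfiles/spark/Dockerfile (the RUN line below is illustrative, not the literal diff in this PR):

# gcompat provides a glibc compatibility layer on musl-based Alpine,
# so snappy-java's bundled native library (built against glibc) can load.
RUN apk add --no-cache gcompat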

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Built the Spark container image and tested reading and writing Snappy-compressed files.
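For reference, a check along these lines in spark-shell would exercise the codec (the path is illustrative; Snappy is the default Parquet codec, so no extra configuration is needed):

scala> spark.conf.get("spark.sql.parquet.compression.codec")  // "snappy" by default
scala> spark.range(10).write.parquet("/tmp/snappy-test")      // fails to load the native snappy library on an image without glibc compatibility
scala> spark.read.parquet("/tmp/snappy-test").count           // returns 10 on a working image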

@felixcheung

Jenkins, ok to test

@felixcheung left a comment


looks reasonable to me


SparkQA commented Sep 5, 2019

Test build #110164 has finished for PR 25686 at commit 696a9cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


srowen commented Sep 5, 2019

See #25112, which suggests that the issue was resolved by the change in https://issues.apache.org/jira/browse/SPARK-26995. Are we sure here?

@dongjoon-hyun

Thank you for your first contribution, @psschwei. As @srowen mentioned, this is resolved in 2.4.4. Please download Apache Spark 2.4.4 and build the Docker image from it. I'll close this PR.

$ docker build -t spark:2.4.4 -f kubernetes/dockerfiles/spark/Dockerfile .
$ docker run --rm -it spark:2.4.4 /opt/spark/bin/spark-shell
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=/opt/spark/bin/spark-shell
+ case "$SPARK_K8S_CMD" in
+ echo 'Non-spark-on-k8s command provided, proceeding in pass-through mode...'
Non-spark-on-k8s command provided, proceeding in pass-through mode...
+ exec /sbin/tini -s -- /opt/spark/bin/spark-shell
19/09/05 17:39:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://454a817f8cee:4040
Spark context available as 'sc' (master = local[*], app id = local-1567705163260).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.range(10).write.parquet("/tmp/p")
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 96.54% for 7 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 84.47% for 8 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 75.08% for 9 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 67.58% for 10 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 61.43% for 11 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 67.58% for 10 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 75.08% for 9 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 84.47% for 8 writers
19/09/05 17:39:38 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 96.54% for 7 writers

scala> spark.read.parquet("/tmp/p").count
res1: Long = 10

@felixcheung

oops

