[SPARK-25262][K8S] Document K8S local storage config
Adds documentation of how K8S uses local storage and how to configure it
for different environments.
rvesse committed Aug 29, 2018
1 parent 70338f1 commit 8762ac1
16 changes: 16 additions & 0 deletions docs/running-on-kubernetes.md
@@ -215,6 +215,22 @@ spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai

The configuration properties for mounting volumes into the executor pods use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. For a complete list of available options for each supported type of volumes, please refer to the [Spark Properties](#spark-properties) section below.
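For example, a PersistentVolumeClaim mount on the executors would be expressed with the `spark.kubernetes.executor.` prefix; the mount path and claim name below are illustrative placeholders:

```
spark.kubernetes.executor.volumes.persistentVolumeClaim.checkpointpvc.mount.path=/checkpoints
spark.kubernetes.executor.volumes.persistentVolumeClaim.checkpointpvc.options.claimName=check-claim
```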

## Local Storage

Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `SPARK_LOCAL_DIRS`. If no directories are explicitly specified then a default directory is created and configured appropriately.
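As a sketch, the scratch directories can be listed explicitly via the standard `spark.local.dir` property (the usual way of populating `SPARK_LOCAL_DIRS`), with each comma-separated entry expected to receive its own `emptyDir` mount. The API server address, image name, and paths are placeholders:

```
# Hypothetical submission with two scratch directories, each backed by an emptyDir.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.local.dir=/tmp/spark-local-1,/tmp/spark-local-2 \
  --class <main-class> \
  <application-jar>
```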

`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod.

### Using RAM for local storage

As `emptyDir` volumes use the node's backing storage for ephemeral storage, this default behaviour may not be appropriate for some compute environments. For example, if you have diskless nodes with remote storage mounted over a network, having lots of executors doing IO to this remote storage may actually degrade performance.

In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration, which will cause the `emptyDir` volumes to be configured as `tmpfs`, i.e. RAM-backed volumes. When configured like this, Spark's local storage usage will count towards your pod's memory usage, so you may wish to increase your memory requests via the normal `spark.driver.memory` and `spark.executor.memory` configuration properties.
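A sketch of such a submission, with the memory requests raised to account for the RAM-backed scratch space (the API server address, image name, and memory sizes are illustrative placeholders):

```
# tmpfs-backed local dirs count against pod memory, so request more of it.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.local.dirs.tmpfs=true \
  --conf spark.driver.memory=4g \
  --conf spark.executor.memory=8g \
  <application-jar>
```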

### Using arbitrary volumes for local storage

Alternatively, if you are using the pod template feature, you can provide in your specification a volume named `spark-local-dirs-N`, where N is a 1-based index into the entries of your `SPARK_LOCAL_DIRS` variable, and that volume will be used for local storage. This enables you to use a volume type that is appropriate to your compute environment.
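For instance, a hypothetical pod template could back the first `SPARK_LOCAL_DIRS` entry with a PersistentVolumeClaim rather than the default `emptyDir` (the claim name here is a placeholder):

```yaml
# Hypothetical pod template fragment: the volume named spark-local-dirs-1
# is used for the first directory listed in SPARK_LOCAL_DIRS.
apiVersion: v1
kind: Pod
spec:
  volumes:
    - name: spark-local-dirs-1
      persistentVolumeClaim:
        claimName: local-scratch-pvc   # placeholder claim name
```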

## Introspection and Debugging

These are the different ways in which you can investigate a running/completed Spark application, monitor progress, and
