This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

Migrate temporary storage for Spark jobs to EmptyDir #439

Closed
kimoonkim opened this issue Aug 16, 2017 · 1 comment

Comments

@kimoonkim
Member

Currently, Spark jobs store temporary files in directories inside the driver and executor pods. For instance, the work dirs for the driver and executors live in-pod, and the internal per-executor shuffle service also uses in-pod dirs.

These in-pod dirs sit on the Docker storage backend, which can be slow because of its copy-on-write overhead. Many storage backends implement block-level CoW, so each small write copies an entire block, and the overhead grows quickly when files are updated by many small writes. Docker recommends avoiding the storage backend for such workloads. From the Docker storage driver documentation:

Ideally, very little data is written to a container’s writable layer, and you use Docker volumes to write data.

We should use EmptyDir for temporary storage to avoid this overhead.
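A minimal sketch of what this could look like on the executor pod spec, assuming we point Spark's scratch space at the mount via `SPARK_LOCAL_DIRS` (the volume name, mount path, and image below are illustrative, not the project's actual change):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-executor-example   # illustrative name
spec:
  containers:
  - name: executor
    image: spark-executor:latest  # placeholder image
    env:
    # Point Spark's scratch space (work dirs, shuffle spill) at the mount
    - name: SPARK_LOCAL_DIRS
      value: /tmp/spark-local
    volumeMounts:
    - name: spark-local-dir
      mountPath: /tmp/spark-local
  volumes:
  # emptyDir is backed by the node's filesystem (or tmpfs), so writes
  # bypass the container's copy-on-write writable layer
  - name: spark-local-dir
    emptyDir: {}
```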

@ash211

ash211 commented Aug 16, 2017

This suggestion seems very reasonable as a potential performance improvement. Good suggestion, @kimoonkim!

Ideally we could compare before/after benchmarks of some Spark jobs to measure the difference. They would need to be shuffle-heavy to stress this write path.
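A crude micro-benchmark along these lines (the file names, counts, and chunk size are illustrative): time many small synced appends once in a directory on the container's writable layer and once under an emptyDir mount, and compare.

```python
import os
import tempfile
import time

def many_small_writes(dirname, n_files=50, appends=200, chunk=b"x" * 512):
    """Append small chunks to many files under dirname; return elapsed seconds.

    On a copy-on-write storage backend each synced append can copy a whole
    block, so this pattern amplifies the CoW overhead described above.
    """
    start = time.monotonic()
    for i in range(n_files):
        path = os.path.join(dirname, f"shuffle-{i}.tmp")
        with open(path, "ab") as f:
            for _ in range(appends):
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())
    return time.monotonic() - start

if __name__ == "__main__":
    # Run this twice: once with a dir on the container writable layer,
    # once with a dir on an emptyDir mount, and compare the timings.
    with tempfile.TemporaryDirectory() as d:
        print(f"{d}: {many_small_writes(d):.2f}s")
```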
