Only one executor doing the join/shuffle #643
Comments
It could happen when there is only one task in stage 2 or 3, which means that the only task would be scheduled to one executor.
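One common way a 200-partition shuffle collapses onto a single task is join-key skew: Spark's hash partitioner routes every row with the same key to the same shuffle partition, so if one key dominates, one task (and one executor) does almost all the work. A minimal sketch of that effect, assuming a simplified stand-in for Spark's `HashPartitioner`:

```python
# Sketch of how a hash-partitioned shuffle assigns rows to reduce tasks.
# If the join key is heavily skewed (here: every row has the same key),
# all shuffle partitions except one end up empty, so a single task --
# and therefore a single executor -- performs the whole join stage.
NUM_PARTITIONS = 200  # Spark's default spark.sql.shuffle.partitions

def shuffle_partition(key, num_partitions=NUM_PARTITIONS):
    # Simplified stand-in for Spark's HashPartitioner: hash(key) mod n.
    return hash(key) % num_partitions

skewed_rows = [("same_key", i) for i in range(10_000)]
used = {shuffle_partition(k) for k, _ in skewed_rows}
print(len(used))  # 1 -- only one non-empty partition, hence one busy task
```

If the real data has a near-unique join key, this explanation does not apply and the scheduling/setup is the more likely culprit.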
@runzhliu The problem, though, is that this behaviour was not seen on the Yarn cluster we were using previously. Before, all of the stages were spread evenly across all of the executors. Specifically, the stage that is being worked on by a single executor has 200 partitions, but for some reason it is processed sequentially by only one executor. This started showing up only when we started trying Spark on K8s... Perhaps there is something wrong with the way the setup is? Like I said, we have a Spark on K8s cluster, backed by CephFS, which is mounted as a RWX volume in every executor and the driver. Data is shared in that volume and made accessible to every participant that way.
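One thing worth ruling out is that only one executor is actually registered with the driver on the K8s side. A hedged sketch of the relevant submission flags (the master URL and resource values below are placeholders, not taken from this setup):

```shell
# Hypothetical spark-submit for Spark on Kubernetes. The key settings:
# spark.executor.instances must be > 1 for work to spread at all, and
# spark.sql.shuffle.partitions (default 200) governs the join/shuffle stage.
spark-submit \
  --master k8s://https://<k8s-api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.executor.instances=4 \
  --conf spark.sql.shuffle.partitions=200 \
  ...
```

The Executors tab of the Spark UI shows how many executors registered and how many tasks each one ran, which would distinguish "only one executor exists" from "many executors exist but one gets all the tasks".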
Hi @shockdm, you should probably try reading the data from the local filesystem (or another source) to see whether it happens again. Also, do you load the data from CephFS on the Yarn cluster as well? I don't think official Spark can do that directly, so do you have some private changes on Apache Spark?
@runzhliu to answer your questions:
We mount CephFS as a RWX volume in the Spark pods, so to the Spark pods it is treated as a local filesystem. Unless I am misunderstanding the question.
On the Yarn cluster we do not - we use HDFS. However, we have used HDFS with the K8s cluster previously as well, and got the same results as we get on CephFS. The CephFS volume is shared across multiple pods (driver and executors) as a shared local volume, akin to NFS. No other special handling is used.
Hi all,
I am relatively new to Spark and am having a problem with my workload running on Spark on K8s. The workflow is rather simple:
What I am seeing is that stages 1 and 4 are performed by all executors concurrently, while stages 2/3 seem to be performed by only a single executor, with the others standing idle. The setup I have uses CephFS as the shared filesystem across all executors and the driver, from where the data on disk is read. Am I doing something completely wrong? Has anyone else experienced this problem?
Thanks in advance.
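For reference, the shared-volume setup described above can be expressed with Spark's built-in Kubernetes volume options rather than custom pod specs. This is a sketch under the assumption that the CephFS volume is exposed through a PVC; the claim name, volume name, and mount path are placeholders:

```shell
# Hypothetical flags mounting the same RWX PVC into driver and executors
# (supported by Spark on Kubernetes since 2.4; names below are made up).
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.cephfs.options.claimName=cephfs-pvc \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.cephfs.mount.path=/data \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.cephfs.options.claimName=cephfs-pvc \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.cephfs.mount.path=/data
```

The mount mechanism itself should not affect task distribution; it mainly matters that every executor sees the same path, which the RWX volume already guarantees.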