
SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler #634

Closed
wants to merge 4 commits

Conversation

sryza
Contributor

@sryza sryza commented May 4, 2014

No description provided.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14647/

@mridulm
Contributor

mridulm commented May 4, 2014

It would be better to make this configurable (and increase it - not remove it).
Currently, we actually sleep for much longer in our jobs to ensure a minimum number of containers are available.
The sleep was not arbitrarily thrown in there :-)

@pwendell
Contributor

pwendell commented May 4, 2014

@sryza @mridulm what is this sleep actually for? We should really have in-code documentation for things like this.

If it's necessary to sleep like this, could we restructure it as a 1 second busy-wait on a particular condition? That way we wouldn't need to hard code any particular value.

@sryza
Contributor Author

sryza commented May 4, 2014

There's a little more detail on the JIRA, but, to summarize, the sleep is there to wait for executors to connect. What's confusing to me is that Spark standalone doesn't have a similar sleep.

@mridulm is the worry that only some executors would register and that tasks would then unnecessarily get scheduled on nodes where the data isn't local? Have you seen this negatively impact performance? If that's the case, I think the best thing would be to wait until some number of executors have registered.

@pwendell
Contributor

pwendell commented May 4, 2014

Ah I see - sorry I didn't read the JIRA.

So since we don't know a priori how many executors to expect, I don't think we can wait on any particular condition. Even in YARN mode, AFAIK the executor number is just a request to the RM and not a guarantee.

I'm still not 100% sure why this needs to exist. @mridulm is the main concern here just launching non-local tasks? If so, why not just set the spark.locality.wait threshold (that's the whole reason it exists)?

In standalone mode this may not really be necessary because the executors typically launch in under 3 seconds, which is the default locality wait.
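
For reference, the delay-scheduling threshold being discussed is the spark.locality.wait setting; a minimal sketch of raising it (the app name is a placeholder, and in this era of Spark the value is interpreted as milliseconds):

```scala
import org.apache.spark.SparkConf

// Give the scheduler more time to find a data-local slot before it
// falls back to launching the task on a non-local executor.
val conf = new SparkConf()
  .setAppName("locality-wait-example")  // placeholder name
  .set("spark.locality.wait", "10000")  // default is 3000 ms
```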

@pwendell
Contributor

pwendell commented May 4, 2014

@sryza @mridulm if we do decide we want to have a new type of waiting here, maybe we could add a spark.scheduler.initialWait setting or something to give executors time to show up before we try to schedule.

@sryza
Contributor Author

sryza commented May 5, 2014

Keep in mind this sleep is after we've already waited to receive a set of containers from the YARN ResourceManager and sent RPCs to the NodeManagers to start them. So we do know how many executors to expect.

@pwendell
Contributor

pwendell commented May 5, 2014

@sryza Ah I see, in that case, we could wait on the specific condition, but what if e.g. there is a straggler and one container takes a long time to launch (or never launches)?

@sryza
Contributor Author

sryza commented May 5, 2014

Yeah, I think we'd want some (probably configurable) max wait time.
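
A minimal sketch of that approach - poll until the expected number of executors has registered, or give up after a configurable maximum wait. All names here are hypothetical, not the actual scheduler API:

```scala
// Hypothetical helper: block until `registeredExecutors()` reaches the
// expected count, or until `maxWaitMs` has elapsed, whichever comes first.
def waitForExecutors(expected: Int,
                     registeredExecutors: () => Int,
                     maxWaitMs: Long = 30000L,
                     pollIntervalMs: Long = 100L): Unit = {
  val deadline = System.currentTimeMillis() + maxWaitMs
  while (registeredExecutors() < expected &&
         System.currentTimeMillis() < deadline) {
    Thread.sleep(pollIntervalMs)  // cheap poll instead of a fixed 3-second sleep
  }
}
```

This also bounds the straggler case: a container that never launches only costs the max wait, not an indefinite block.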

@pwendell
Contributor

pwendell commented May 5, 2014

Seems reasonable - still curious exactly what case @mridulm is dealing with. I guess it's just the case where the containers take a long time to launch (once allocated), much longer than the delay scheduling threshold?

@mridulm
Contributor

mridulm commented May 5, 2014

The scheduling and data locality come later.
This precedes all that.

To give an example: suppose we need 200 containers to run a job. As soon as we start, we might get only 2 or 5 nodes allocated; it takes some time for YARN to ramp up to a good number.

Now, if we start the actual job before sufficient containers have been allocated, we can get job failures since a) data can't be cached in memory and b) too much load lands on too few containers.

Task locality and scheduling come later, but they are also impacted.
Assuming the job does not fail, all data ends up on this small number of containers while the rest of the cluster is busy pulling data from them, which causes suboptimal performance (I have not seen job failures due to this - yet).

@mridulm
Contributor

mridulm commented May 5, 2014

Btw I don't recall exactly why the actual value of 3 seconds is there; I think earlier Spark used to crash if there were no containers available when the job was about to be launched - thankfully that is not the case anymore!

@tgravescs
Contributor

Note I had filed SPARK-1453 to make the wait configurable; see the discussions there. I'm fine with removing this once the other one is configurable, but I would hesitate to do it before that.
As was already mentioned, there are 2 different timeouts; this is the second one, hardcoded to 3 seconds. I'm not really sure what the difference is between this one and the ALLOCATOR_LOOP_WAIT_COUNT. I was originally thinking of removing this one and just making the other configurable, but I haven't investigated in depth yet.

I don't know the details of spark.locality.wait well enough to know if it works in all cases. I assume spark.locality.wait would work fine as long as you have an input from Hadoop or somewhere else that tells you the locations, but for RDDs without locations it wouldn't help to wait?
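
To illustrate the point about locations: an RDD read from HDFS exposes its block locations through preferredLocations, while a parallelized collection typically reports none, so delay scheduling has nothing to wait for. A small sketch, assuming an accessible HDFS path (the path and master URL are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("locality-demo").setMaster("local[2]"))  // placeholders

// HDFS-backed RDD: partitions carry the hostnames of their blocks.
val fromHdfs = sc.textFile("hdfs:///path/to/input")  // placeholder path
println(fromHdfs.preferredLocations(fromHdfs.partitions(0)))   // e.g. List(node1, node2)

// Parallelized collection: nothing backing it on storage, so no preferred locations.
val fromMemory = sc.parallelize(1 to 1000)
println(fromMemory.preferredLocations(fromMemory.partitions(0)))  // List()
```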

@sryza
Contributor Author

sryza commented May 5, 2014

I like the idea of making the first wait configurable and taking out the second. For the second, we could possibly still wait for the executors we've launched to register.

Regarding spark.locality.wait, yes, it needs the locations, but it doesn't need the preferredLocalityData to be explicitly passed in by the user the way it's needed for requesting where to put executors.

@tgravescs
Contributor

@sryza Feel free to take over that jira to make it configurable.

@tgravescs
Contributor

@sryza can we close this and just do SPARK-1453 and/or SPARK-1946?

@tgravescs
Contributor

I guess we could keep this around in case we do SPARK-1946.

@tgravescs
Contributor

@sryza I merged in SPARK-1946 - would you want to update this to handle both cluster mode and client mode? I think we can actually also remove the waitForInitialAllocations() call.

@SparkQA

SparkQA commented Jul 14, 2014

QA tests have started for PR 634. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16642/consoleFull

@sryza
Contributor Author

sryza commented Jul 14, 2014

Posted a patch that takes out the sleeps and waitForInitialAllocations

@SparkQA

SparkQA commented Jul 14, 2014

QA tests have started for PR 634. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16644/consoleFull

@SparkQA

SparkQA commented Jul 15, 2014

QA results for PR 634:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16642/consoleFull

@SparkQA

SparkQA commented Jul 15, 2014

QA results for PR 634:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16644/consoleFull

@@ -416,19 +407,8 @@ object ApplicationMaster extends Logging {
// This is to ensure that we have reasonable number of containers before we start
Contributor

we can remove this whole comment block now.

@tgravescs
Contributor

I just thought of something bad here. Now the default behavior is to wait for 0%, and we have removed the bit of sleep there, so if someone is just using the default they aren't very likely to get many executors before starting.

Originally I was thinking of keeping the default at 0 so behavior doesn't change, but personally I would rather see it higher so the out-of-the-box behavior is better. Thoughts on changing it for YARN? To say 80%?

@sryza
Contributor Author

sryza commented Jul 16, 2014

That's a good point. Changing it for YARN seems like the right thing, and 80% sounds reasonable to me.

Another thing is the wait time. Previously it was 6 seconds, but now spark.scheduler.maxRegisteredExecutorsWaitingTime defaults to 5 times that. 30 seconds seems a little excessive to me in general - at least for jobs without caching, after a couple of seconds the wait outweighs scheduling some non-local tasks. What do you think about decreasing this to 6 seconds in general? Or at least for YARN?
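
For reference, the two knobs being discussed, using the config names from this era of the patch (the exact names changed in later Spark versions, and the values below are only illustrative):

```scala
import org.apache.spark.SparkConf

// Wait until 80% of requested executors have registered before scheduling,
// but never block for more than 6 seconds (illustrative values).
val conf = new SparkConf()
  .set("spark.scheduler.minRegisteredExecutorsRatio", "0.8")
  .set("spark.scheduler.maxRegisteredExecutorsWaitingTime", "6000")  // milliseconds
```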

@tgravescs
Contributor

He had increased it to 30 seconds based on some experiments he did. I guess it depends on how well the scheduler recovers from starting with fewer executors. I know they have been doing a bit of work around this area, but I'm not sure of the real effect. I have definitely seen issues with jobs when they start too soon, so I think I would rather leave it at 30 for now unless we have more concrete data saying it's no longer an issue.

@sryza
Contributor Author

sryza commented Jul 20, 2014

Updated patch sets the min registered executors ratio to .8 for YARN

@SparkQA

SparkQA commented Jul 20, 2014

QA tests have started for PR 634. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16877/consoleFull

@SparkQA

SparkQA commented Jul 20, 2014

QA results for PR 634:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16877/consoleFull


override def postStartHook() {
  super.postStartHook()
Contributor

In yarn-client mode, super.postStartHook() should not be deleted, because it calls waitBackendReady when the configured ratio of executors has not yet registered.

Contributor

YarnClientClusterScheduler extends TaskSchedulerImpl, so it will just fall through to TaskSchedulerImpl.postStartHook() which calls waitBackendReady
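
A rough sketch of the hook chain being described, simplified rather than copied from the actual Spark source:

```scala
// Simplified approximation: the base scheduler's post-start hook blocks
// until the backend reports ready (configured ratio registered or max
// wait elapsed); the yarn-client scheduler inherits that behavior.
class TaskSchedulerImpl {
  protected def waitBackendReady(): Unit = {
    // poll the backend until it reports ready, or until the max wait expires
  }
  def postStartHook(): Unit = waitBackendReady()
}

class YarnClientClusterScheduler extends TaskSchedulerImpl
// No postStartHook override here, so yarn-client mode still waits.
```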

@tgravescs
Contributor

Looks good. +1, Thanks @sryza.

@asfgit asfgit closed this in f89cf65 Jul 21, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Author: Sandy Ryza <sandy@cloudera.com>

Closes apache#634 from sryza/sandy-spark-1707 and squashes the following commits:

2f6e358 [Sandy Ryza] Default min registered executors ratio to .8 for YARN
354c630 [Sandy Ryza] Remove outdated comments
c744ef3 [Sandy Ryza] Take out waitForInitialAllocations
2a4329b [Sandy Ryza] SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler
andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Jan 8, 2015
As reported in https://spark-project.atlassian.net/browse/SPARK-1055

"The used Spark version in the .../base/Dockerfile is stale on 0.8.1 and should be updated to 0.9.x to match the release."

Author: CodingCat <zhunansjtu@gmail.com>
Author: Nan Zhu <CodingCat@users.noreply.github.com>

Closes apache#634 from CodingCat/SPARK-1055 and squashes the following commits:

cb7330e [Nan Zhu] Update Dockerfile
adf8259 [CodingCat] fix the SCALA_VERSION and SPARK_VERSION in docker file

(cherry picked from commit 1aa4f8a)
Signed-off-by: Aaron Davidson <aaron@databricks.com>