Allow slaves to substantiate in parallel #2155

Closed
wants to merge 1 commit into from

morrone
Contributor

@morrone morrone commented Apr 21, 2016

BuildRequestDistributor's _maybeStartBuildsOnBuilder() method is
decorated with @defer.inlineCallbacks. This means that when it
uses "yield bldr.maybeStartBuild", the entire slave substantiation
process is serialized at that point.

Slave substantiation can take considerable time, especially with
something like EC2 latent slaves. Even when the cloud-init
(user-data) script does very little, it can easily take a couple
of minutes. If the user-data script does any significant
configuration of the OS, maybeStartBuild() can easily take many
minutes.

To enable parallel slave substantiation, we need to stop waiting on
bldr.maybeStartBuild() inline. Instead we move the error
handling code into a callback function that can be added to the
Deferred that is returned by bldr.maybeStartBuild().

At least for the EC2LatentBuildSlave class, everything else down
the chain is properly event based and/or threaded. This small
code change allows slaves to substantiate in parallel.
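
A minimal sketch of the difference described above. It uses Python's
asyncio as a stand-in for Twisted's Deferreds so that it runs with only
the standard library; it is not the Buildbot code. maybe_start_build(),
the builder names, and the 0.2-second delay are all hypothetical.

```python
import asyncio
import time

# Assumed value: stands in for the minutes an EC2 latent slave may need.
SUBSTANTIATION_DELAY = 0.2


async def maybe_start_build(builder):
    """Hypothetical stand-in for bldr.maybeStartBuild(): slow, may fail."""
    await asyncio.sleep(SUBSTANTIATION_DELAY)
    if builder == "broken":
        raise RuntimeError("substantiation failed")
    return builder


async def start_sequentially(builders):
    """Wait on each builder inline, as a `yield` under inlineCallbacks does."""
    errors = []
    for name in builders:
        try:
            await maybe_start_build(name)
        except RuntimeError as exc:
            errors.append((name, str(exc)))
    return errors


async def start_in_parallel(builders):
    """Start every builder without waiting; handle errors in a callback."""
    errors = []

    def on_done(name, task):
        # The error handling that used to follow the inline wait lives here.
        exc = task.exception()
        if exc is not None:
            errors.append((name, str(exc)))

    tasks = []
    for name in builders:
        task = asyncio.ensure_future(maybe_start_build(name))
        task.add_done_callback(lambda t, name=name: on_done(name, t))
        tasks.append(task)

    # Only this demo waits here, so that it can report timings and errors.
    # The loop above never blocks on an individual builder.
    await asyncio.wait(tasks)
    return errors


def timed(coro_fn, builders):
    """Run one strategy and return (elapsed_seconds, errors)."""
    start = time.monotonic()
    errors = asyncio.run(coro_fn(builders))
    return time.monotonic() - start, errors


if __name__ == "__main__":
    names = ["a", "b", "broken"]
    for strategy in (start_sequentially, start_in_parallel):
        elapsed, errs = timed(strategy, names)
        print(strategy.__name__, round(elapsed, 2), errs)
```

In the sequential version the total time grows with the number of
builders; in the callback version it stays close to the slowest single
substantiation, and the same errors are still collected.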

Signed-off-by: Christopher J. Morrone <morrone2@llnl.gov>

@mention-bot

By analyzing the blame information on this pull request, we identified @rutsky, @djmitche and @tardyp to be potential reviewers

@tomprince
Member

Something like this makes sense, but the issue is that the buildrequest will be claimed without a corresponding build while the worker starts up.

I am finishing up a branch that will implement a solution to this that doesn't suffer from that drawback.

@morrone
Contributor Author

morrone commented Apr 21, 2016

OK, thanks. I'm new to the code; can you explain why that is a problem? It seems like the builds are associated during worker startup just as they would be previously, only now multiple will be happening in parallel.

I look forward to seeing your branch.

@tomprince
Member

The code is here. I plan to submit it (and the dependencies) in several branches.

@aelsabbahy
Contributor

@tomprince Glad you're doing this; it should be a huge improvement for EC2 latent workers.

If I want to test your code, do you suggest I test the commit you linked, or is there a particular branch I should pull down?

@tomprince
Member
