New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize Git operations #518

Merged
merged 14 commits into from Feb 15, 2017

Conversation

Projects
None yet
3 participants
@BinaryMuse
Member

BinaryMuse commented Feb 15, 2017

This PR introduces a new AsyncQueue that has the ability to parallelize tasks. These parallelized tasks won't block other operations at the queue level, though of course JS semantics may still cause the underlying operations to block one another.

The queue also has the ability to add tasks that are non-parallelizable; currently, this is used to ensure that writes always execute alone, waiting for the queue to drain before they run, and running to completion before normal parallel queue semantics are resumed.

Additionally, a new segment in the timings view has been added called nexttick; it is colored yellow and shows the time after prepare has run but before the actual call to exec runs. In particular, this can help show time growing in a stepwise fashion waiting for the JS event loop to finish with the previous task invocation before the next one can start, indicating we can save some time by optimizing this or making it fully async:

github_package_timings_view


TODO:

  • Variable names
  • Convert exec args to an option object

@BinaryMuse BinaryMuse requested review from smashwilson and removed request for smashwilson Feb 15, 2017

@smashwilson

Overall summary:

mind blown

Show outdated Hide outdated lib/async-queue.js
}
execute() {
return this.fn.call(undefined).then(this.resolve, this.reject);

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

Any reason to clobber this with .call() rather than just calling it directly with this.fn().then(...)?

@smashwilson

smashwilson Feb 15, 2017

Member

Any reason to clobber this with .call() rather than just calling it directly with this.fn().then(...)?

This comment has been minimized.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

Yeah, I specifically don't want the Task to be the context in the function.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

Yeah, I specifically don't want the Task to be the context in the function.

this.commandQueue.shift();
constructor(options = {}) {
this.parallelism = options.parallelism || 1;
this.nonParallelizableOperation = false;

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

This PR is a great exercise in spelling different forms of "parallelizable" 😂

@smashwilson

smashwilson Feb 15, 2017

Member

This PR is a great exercise in spelling different forms of "parallelizable" 😂

This comment has been minimized.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

Or saying it out loud 😏 @kuychaco

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

Or saying it out loud 😏 @kuychaco

Show outdated Hide outdated lib/async-queue.js
constructor(options = {}) {
this.parallelism = options.parallelism || 1;
this.nonParallelizableOperation = false;
this.threadsInUse = 0;

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

threadsInUse feels like a misleading name to me because there aren't really "threads" in play. How about tasksInProgress?

@smashwilson

smashwilson Feb 15, 2017

Member

threadsInUse feels like a misleading name to me because there aren't really "threads" in play. How about tasksInProgress?

Show outdated Hide outdated lib/async-queue.js
try {
await task.execute();
} catch (err) {
// nothing

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

Will this swallow any important errors... ? I'm assuming that we expect most task failures to manifest as Promise rejections 🤔

@smashwilson

smashwilson Feb 15, 2017

Member

Will this swallow any important errors... ? I'm assuming that we expect most task failures to manifest as Promise rejections 🤔

This comment has been minimized.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

The Promise inside the Task — the one returned from getPromise — will be rejected, but we don't want the task failure itself to affect the control flow of the queue.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

The Promise inside the Task — the one returned from getPromise — will be rejected, but we don't want the task failure itself to affect the control flow of the queue.

Show outdated Hide outdated lib/git-shell-out-strategy.js
}
// Execute a command and read the output using the embedded Git environment
exec(args, stdin = null, useGitPromptServer = false) {
exec(args, stdin = null, useGitPromptServer = false, writeOperation = false) {

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

Yah, I can see why you want to turn this into an options map. 👍

@smashwilson

smashwilson Feb 15, 2017

Member

Yah, I can see why you want to turn this into an options map. 👍

Show outdated Hide outdated lib/git-shell-out-strategy.js
@@ -32,11 +32,11 @@ function withGpgScript(args) {
export default class GitShellOutStrategy {
constructor(workingDir) {
this.workingDir = workingDir;
this.commandQueue = new AsyncQueue();
this.commandQueue = new AsyncQueue({parallelism: 5});

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

Hmm, maybe inject as a package config option?

@smashwilson

smashwilson Feb 15, 2017

Member

Hmm, maybe inject as a package config option?

This comment has been minimized.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

Yeah, this is temporary. I plan on using the number of CPUs or other similar things to determine this number.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

Yeah, this is temporary. I plan on using the number of CPUs or other similar things to determine this number.

return stdout;
return new Promise(resolve => {
timingMarker.mark('nexttick');
setImmediate(() => {

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

Is there a specific reason to use setImmediate vs. setTimeout(..., 0) or process.nextTick for this? I can never remember what the implications of those are.

@smashwilson

smashwilson Feb 15, 2017

Member

Is there a specific reason to use setImmediate vs. setTimeout(..., 0) or process.nextTick for this? I can never remember what the implications of those are.

This comment has been minimized.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

I had used nextTick at first but switched to setImmediate. I'll need to look into it to see what the difference is.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

I had used nextTick at first but switched to setImmediate. I'll need to look into it to see what the difference is.

This comment has been minimized.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

Use setImmediate if you want to queue the function behind whatever I/O event callbacks that are already in the event queue. Use process.nextTick to effectively queue the function at the head of the event queue so that it executes immediately after the current function completes.

So in a case where you're trying to break up a long running, CPU-bound job using recursion, you would now want to use setImmediate rather than process.nextTick to queue the next iteration as otherwise any I/O event callbacks wouldn't get the chance to run between iterations.

I'm not sure it matters a ton here but I'm going to stick with setImmediate for now.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

Use setImmediate if you want to queue the function behind whatever I/O event callbacks that are already in the event queue. Use process.nextTick to effectively queue the function at the head of the event queue so that it executes immediately after the current function completes.

So in a case where you're trying to break up a long running, CPU-bound job using recursion, you would now want to use setImmediate rather than process.nextTick to queue the next iteration as otherwise any I/O event callbacks wouldn't get the chance to run between iterations.

I'm not sure it matters a ton here but I'm going to stick with setImmediate for now.

class Task {
constructor(fn, parallel = true) {
this.fn = fn;
this.parallel = parallel;

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

Thinking more about the terminology here: would it be clearer to invert this flag and have a Task be "serial" or "exclusive" rather than "parallel"? Or would that lead to a bunch of double negatives elsewhere.

@smashwilson

smashwilson Feb 15, 2017

Member

Thinking more about the terminology here: would it be clearer to invert this flag and have a Task be "serial" or "exclusive" rather than "parallel"? Or would that lead to a bunch of double negatives elsewhere.

This comment has been minimized.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

Haha yeah @kuychaco and I debated a bit on this. I think she was right. We'll make it better.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

Haha yeah @kuychaco and I debated a bit on this. I think she was right. We'll make it better.

Show outdated Hide outdated test/async-queue.test.js
}
}
describe.only('AsyncQueue', function() {

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

🔥 .only( 😉

@smashwilson

smashwilson Feb 15, 2017

Member

🔥 .only( 😉

This comment has been minimized.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

😡

@smashwilson

👍

mergeMessage: null,
aheadCount: null,
behindCount: null,
remoteName: await repository.getRemoteForBranch(this.branchName),
remoteName: await dataPromises.remoteName,

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

👍 Neat.

@smashwilson

smashwilson Feb 15, 2017

Member

👍 Neat.

@@ -8,7 +8,7 @@ chai.use(chaiAsPromised);
global.assert = chai.assert;
// Give tests that rely on filesystem event delivery lots of breathing room.
until.setDefaultTimeout(5000);
until.setDefaultTimeout(3000);

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

Hmm will this cause problems on AppVeyor?

@smashwilson

smashwilson Feb 15, 2017

Member

Hmm will this cause problems on AppVeyor?

This comment has been minimized.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

I hope not. But it has to be lower than the mocha timeout or else test-until can never timeout with an appropriate message because the whole test times out instead.

We can keep an eye on it and go from there.

@BinaryMuse

BinaryMuse Feb 15, 2017

Member

I hope not. But it has to be lower than the mocha timeout or else test-until can never timeout with an appropriate message because the whole test times out instead.

We can keep an eye on it and go from there.

This comment has been minimized.

@smashwilson

smashwilson Feb 15, 2017

Member

But it has to be lower than the mocha timeout or else test-until can never timeout with an appropriate message because the whole test times out instead.

Ohhh that's what's been happening. Good catch.

@smashwilson

smashwilson Feb 15, 2017

Member

But it has to be lower than the mocha timeout or else test-until can never timeout with an appropriate message because the whole test times out instead.

Ohhh that's what's been happening. Good catch.

@BinaryMuse BinaryMuse merged commit 55fc31f into master Feb 15, 2017

2 of 5 checks passed

continuous-integration/travis-ci/push The Travis CI build failed
Details
schema/pr GraphQL Schema out of date
Details
schema/push GraphQL Schema out of date
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@BinaryMuse BinaryMuse deleted the mkt-ku-parallelize-git-ops branch Feb 15, 2017

}
/**
* File Status and Diffs
*/
async getStatusesForChangedFiles() {
const output = await this.exec(['status', '--untracked-files=all', '-z']);
const output = await this.exec(['status', '--untracked-files=all', '-z'], {writeOperation: true});

This comment has been minimized.

@shiftkey

shiftkey Feb 16, 2017

Contributor

Lots of these operations seem to be writeOperation: true unnecessarily. Is there some context here I'm missing?

@shiftkey

shiftkey Feb 16, 2017

Contributor

Lots of these operations seem to be writeOperation: true unnecessarily. Is there some context here I'm missing?

This comment has been minimized.

@BinaryMuse

BinaryMuse Feb 16, 2017

Member

@shiftkey These are all status calls and can create index.locks (as far as I understand). We're converting some of them to nodegit calls in #524

@BinaryMuse

BinaryMuse Feb 16, 2017

Member

@shiftkey These are all status calls and can create index.locks (as far as I understand). We're converting some of them to nodegit calls in #524

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment