
Limited Process Pools #791

Merged · 14 commits into avajs:master · May 12, 2016

Conversation

@dcousineau (Contributor) commented Apr 26, 2016

Functionality to address #718.

Usage

Running ava --concurrency=5 (the flag was originally --pool-size=5) will switch behavior to the limited process pool code (otherwise the existing 'legacy' code always runs). If --serial is defined the limited pool code is not run. If --serial is specified with a concurrency size, the concurrency size is overridden and forced to 1.
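The flag interaction described above could be sketched as a tiny resolver. `resolveConcurrency` is a hypothetical helper name for illustration, not the PR's actual code:

```javascript
// Hypothetical sketch of how the CLI flags interact; not AVA's real API.
function resolveConcurrency(options) {
	if (options.serial) {
		// --serial wins: force the pool down to one process at a time
		return 1;
	}

	if (options.concurrency > 0) {
		// --concurrency=N enables the limited process pool code path
		return options.concurrency;
	}

	// No limit set: fall back to the legacy unlimited behavior
	return Infinity;
}
```

For example, `resolveConcurrency({serial: true, concurrency: 5})` yields 1, matching the override described above.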

Notes

I encapsulated existing functionality and left it default. We cannot do the exclusive/.only with the limited pools as it requires us to have loaded and understand the entire test suite to make the decision about whether there is exclusivity to respect or not.

As a beta feature using the process pool limit could come with the caveat that any features related to .only are disabled/ignored. For our use case the pool limiting is being used to keep memory usage under circleci's limits and ideally no .only tests make it into source control.

Easy Wins

The watcher worked right out of the box

Things that make me nervous

The error handling, particularly around Couldn't find any matching tests, feels a bit like stumbling in the dark: it works, but I'm not 100% confident I know why.

Tests

Testing was done initially by wrapping almost all api.js tests in a generator and running them twice: once against the old behavior and once against the new behavior.

As a result, 3 tests are currently failing. This is because there's no real way to test exclusivity/deal with .only. These tests cannot pass until the fate of .only in relation to process pools is decided.

Tests related to .only functionality are not run against the new behavior.

Personal Opinions

While ava could potentially statically analyze all the test files before entering the process pool to look for exclusivity, I'm concerned this is wasted effort that could be better spent deprecating .only and improving the filename globs.

Alternatively, .only could be scoped to apply only to the currently running file (and not to the whole test suite), which would eliminate any need for static analysis; if a developer wanted to run only a single file, they could simply use filename glob matching on a case-by-case basis.

@mention-bot

By analyzing the blame information on this pull request, we identified @sotojuan, @ariporad and @spudly as potential reviewers.

@dcousineau (Contributor Author)

I suggest using the ?w=1 trick to view the diff for the test file changes.

A link for the lazy: https://github.com/sindresorhus/ava/pull/791/files?w=1

@dcousineau (Contributor Author)

@jamestalmage ^

@@ -2,3 +2,4 @@ node_modules
.nyc_output
coverage
bench/.results
.idea
Member

Please add a newline at the end of the file.

Member

Don't add this. It should be in your own global gitignore, not here.

@novemberborn (Member)

If --serial is defined the limited pool code is not run.

It should use the pool but with a concurrency of 1. Else AVA will still fork for each test file, even though it runs them serially.

I wonder if, rather than duplicating the code, we can use infinite concurrency when the pool limit is not set, and still wait for all stats unless there is limited concurrency?

@dcousineau (Contributor Author)

@novemberborn I "forked" the code because the logic that initializes each test looking for a .only is incompatible with the structure of the pooled code. But I'll see what happens with unlimited concurrency, check whether the new code passes the 3 exclusive tests, and report back.

@dcousineau (Contributor Author) commented Apr 27, 2016

@novemberborn also, a concurrency of 1 makes more sense than me whining that serialMap doesn't allow concurrency :| I was so deep in the weeds I was thinking "I can't guarantee proper execution order", not realizing it doesn't matter as long as they run one at a time.

@novemberborn (Member)

I "forked" the code because of the logic that initializes each test looking for a .only being incompatible with the structure of the pooled code.

Yea that makes sense. I'm proposing adding a logic branch for when concurrency is restricted, which I realize may be worse than duplicating code.

@dcousineau (Contributor Author)

Yea that makes sense. I'm proposing adding a logic branch for when concurrency is restricted, which I realize may be worse than duplicating code.

If we either de-scope exclusive tests/.only to apply only to the tests in the current file, or eliminate the feature entirely, we can drop the old logic and stick with the concurrency pool logic.

The more I think about previous suggestions of static analysis for .only support, the more I get heartburn, but once that's implemented we can drop the old behavior as well.

@novemberborn (Member)

Yea maybe we can live with the duplication until we sort out .only, especially if pool limiting is marketed as beta. (Which is odd, because AVA is still major-0, but hey). Would be good to unblock users with lots of test files.

});
});

return new Promise(function (resolve) {
Member

Would be nice to reduce the nesting below here ⬇️

Contributor Author

I'll tinker and see what I can do

@sindresorhus (Member)

Should we have a better default than Infinity? From experience, a concurrency of 10 is sometimes more efficient than Infinity, since systems are bad at handling that much concurrency and get overloaded. We should maybe do some test runs and find a magic number as the default.

@dcousineau (Contributor Author)

@sindresorhus I'm more familiar with multi-processing in Python, and a quick check shows that Celery defaults to the number of CPUs.

@sindresorhus (Member)

@dcousineau Last time I did concurrency profiling, I found the following to be optimal:

Math.max((os.cpus().length || 1) * 2, 2)

From: https://github.com/sindresorhus/grunt-concurrent/blob/9618f67c01e15ca447db5a36bd03bff88e40ce9d/tasks/concurrent.js#L13

Renamed the CLI flag to `--concurrency, -c` as per @sindresorhus suggestion. Marked it as experimental instead of beta.

Updated the new concurrency pool runner to force pool size to 1 when `--serial` flag is set as per @novemberborn suggestion.

Reduced the nesting complexity of the concurrency pool code and broke up portions of the logic to ensure readability.
@dcousineau (Contributor Author)

@sindresorhus I'm cool with that. As a part of my most recent update Api.prototype._runLimitedPool now uses the concurrency pool size via a function argument so I can support serial execution. I haven't bolted it in yet but the code is ready for it.

I noticed the conflicts and will have a resolution pushed soon

@@ -224,3 +236,86 @@ Api.prototype._run = function (files, _options) {
return runStatus;
});
};

Api._blankResults = {
Member

I think this should be a getter that returns the object. It might be an edge case, but if the consumer modifies the object, later results will also be modified. I might be missing something though and this might be a non-issue, but thought I would bring it up.

Contributor Author

Everywhere that returns this object structure (but with real data) builds it itself. On error we need to return an empty object so runStatus.processResults(results) doesn't choke on undefined, so ideally no consumer ever gets the blank results structure. I only pulled it into its own variable because there are 3 failure cases that need to send it upstream.

Writing this I realized I can simplify down to 2, and will be pushing a change to simplify as well as changing it to a getter, Just In Case™.
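The getter change being discussed could look something like this sketch (illustrative only; the field names are assumptions, not AVA's actual result shape). A getter hands out a fresh object on every access, so one consumer mutating its copy cannot corrupt later results:

```javascript
// Illustrative sketch: a getter returning a fresh object on each access,
// so mutations by one consumer never leak into later results.
// The result fields below are assumed for the example, not AVA's real shape.
var Api = {};

Object.defineProperty(Api, '_blankResults', {
	get: function () {
		return {
			stats: {
				testCount: 0,
				passCount: 0,
				failCount: 0
			},
			tests: []
		};
	}
});
```

With a plain shared object, `Api._blankResults.stats.passCount = 99` would poison every later read; with the getter, the next access still sees zeros.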

' --watch, -w Re-run tests when tests and source files change',
' --source, -S Pattern to match source files so tests can be re-run (Can be repeated)',
' --timeout, -T Set global timeout',
' --concurrency, -c Maximum number of child processes (EXPERIMENTAL)',
Member

Use two spaces to separate the flag and description columns

@jamestalmage (Contributor)

Math.max((os.cpus().length || 1) * 2, 2)

Can be simplified to just:

(os.cpus().length || 1) * 2
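The two expressions are indeed equivalent: `(n || 1) * 2` is already at least 2 for any CPU count, so the outer `Math.max(..., 2)` is redundant. A quick check (pure arithmetic, no `os` module needed):

```javascript
// For any cpu count n >= 0, (n || 1) * 2 is already >= 2,
// so wrapping it in Math.max(..., 2) changes nothing.
function original(n) {
	return Math.max((n || 1) * 2, 2);
}

function simplified(n) {
	return (n || 1) * 2;
}

for (var n = 0; n <= 64; n++) {
	if (original(n) !== simplified(n)) {
		throw new Error('mismatch at ' + n);
	}
}
```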

' --watch, -w Re-run tests when tests and source files change',
' --source, -S Pattern to match source files so tests can be re-run (Can be repeated)',
' --timeout, -T Set global timeout',
' --concurrency, -c Maximum number of child processes (EXPERIMENTAL)',
Member

Maybe this?

Maximum number of test files running at the same time (EXPERIMENTAL)

Member

Can you also sync this output into the readme? Just copy-paste the output of node cli.js --help.

@@ -132,21 +132,24 @@ AVA comes with an intelligent watch mode. [Learn more in its recipe](docs/recipe
```console
$ ava --help

Futuristic test runner 🚀

Member

This is intentionally left out as it just duplicates our readme and repo description.

@sindresorhus (Member)

Tests are failing

@dcousineau (Contributor Author)

@sindresorhus @novemberborn Sorry about the delay, made the updates.

@dcousineau (Contributor Author)

I found an issue with the `using --match` tests, where test/fixture/match-no-match.js was actually failing by trying to import ava instead of ../../. I repaired the test and added a positive test case and an additional fixture.

The updated and new tests can be found here since the diff tab isn't showing them: https://github.com/dcousineau/ava/blob/process-pool/test/api.js#L920-L967

@dcousineau (Contributor Author) commented May 5, 2016

All tests pass for the first time.

@@ -116,6 +116,19 @@ Api.prototype._run = function (files, _options) {
self.precompiler = new CachingPrecompiler(cacheDir, self.options.babelConfig);
self.fileCount = files.length;

var overwatch;
if (this.options.concurrency > 0) {
Contributor

This should still use _runNoPool if the fileCount is less than the concurrency.

@dcousineau (Contributor Author), May 7, 2016

Normally I would agree, but with the current behavior, if the --concurrency flag is used, the code behind _runLimitedPool is always used. I vote we keep this as-is until we decide to drop _runNoPool entirely, so that we can deterministically say: if you use the concurrency flag, you will use the new process pool code.

If enough people object to my opinion, however, I will change it.

Contributor

But if I add a concurrency option to package.json, I want to be able to re-enable .only behavior by doing

ava [files ...]

where the number of files is less than the concurrency.

Contributor Author

I'm similarly worried about people using the --concurrency flag and seeing different behaviors depending on their filename glob patterns. It's much simpler to convey in documentation "if the concurrency flag/option is present, automatic .only behavior is not supported" than a list of rules.

Additionally, a user manually running ava with a deliberate subset of tests is likely doing so to run a few specific tests because they don't use .only internally (like my team). That cycles back to my argument for nixing .only entirely: with the --watch flag and good filename globbing, there is no real need for .only.

@jamestalmage (Contributor)

GH isn't letting me comment directly on test/api.js because the diff is too huge.

https://github.com/dcousineau/ava/blob/process-pool/test/api.js#L920

Does this test cover the scenario where we have more files than concurrency? We would need tests for:

  • No match anywhere
  • No match in the first batch, match in a latter batch
  • Match in the first batch, no match in a latter batch
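The scenarios above can be reasoned about with a simple chunk helper (a sketch only; the PR's pool is a sliding window rather than strict batches, but batching models the same failure modes):

```javascript
// Split files into batches of at most `concurrency` — a simplified model
// of limited pooling, useful for exercising the match scenarios above.
function batch(files, concurrency) {
	var batches = [];
	for (var i = 0; i < files.length; i += concurrency) {
		batches.push(files.slice(i, i + concurrency));
	}
	return batches;
}
```

With 5 files and a concurrency of 2 this yields three batches of sizes 2, 2 and 1; a `--match` pattern must be checked across all batches, not just the first, which is exactly what the proposed tests would guard.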

@jamestalmage (Contributor)

Similar here - https://github.com/dcousineau/ava/blob/process-pool/test/api.js#L769

Do we need a test that verifies the test counts are added up correctly across batched runs?

};
test.run(options)
.then(resolve)
.catch(reject);
Member

.then(resolve, reject). resolve won't throw, so we can save the extra promise allocation.

Contributor

Meh, I prefer the existing way. Two-argument then is considered an anti-pattern now.

@novemberborn (Member)

@sindresorhus @jamestalmage @vdemedes what shall we do about the ready callback? Best I can tell it's unused, so it should just be removed. With this PR it is emitted for every fork rather than once.

@dcousineau (Contributor Author)

@novemberborn looking over the older code, I see how self.emit('ready') was emitted only once, which I missed.

I can easily move it above the Promise.map, and the tests pass. I'll wait for a final word.

@dcousineau (Contributor Author)

@jamestalmage Dropping the concurrency pool to 1 and re-running the tests confirmed they still pass. I'm checking for matches after all test files have run, so it shouldn't be an issue, but I can circle back and come up with a few more tests.

@jamestalmage (Contributor)

Let's kill ready in a separate PR.

If you can "circle back" with a few more tests, that would be great. We want to guard against regressions.

@sindresorhus (Member)

LGTM

@novemberborn novemberborn merged commit 1c3c86c into avajs:master May 12, 2016

@sindresorhus (Member)

🎉 Woo! Thanks for your perseverance on this @dcousineau :)

@jamestalmage (Contributor)

Well done!

@sotojuan (Contributor)

Great work @dcousineau! By the way I sat in front of you at ManhattanJS yesterday :p

@dcousineau (Contributor Author) commented May 12, 2016

@sotojuan Ha! Next time grab me and say hi! (I'm at all the BoroJS meetups, obviously :P) Sidebar: have you considered reaching out for EmpireJS OSS Night?

@sotojuan (Contributor)

@dcousineau If I am still in NYC by then I'll go. Thanks!
