Introduce cross-process resource management for tasks #5859
Merged
Changes from 112 commits
118 commits
f48f3d9 Initial draft of changes. (benvillalobos)
bcccfea Modified mockengine to support ibuildengine7 (benvillalobos)
73b3466 Simpler null checks (benvillalobos)
a8df6ac Merge remote-tracking branch 'upstream/master' into 74-cpp-parallel (rainersigwald)
963a785 Clean up API surface (rainersigwald)
e1a30e6 Tasks type name (rainersigwald)
055116a simple project (rainersigwald)
10f6eed Simplify if-appdomain in NodeConfiguration (rainersigwald)
aa3f385 Checkpoint (rainersigwald)
57a5c93 Remove FEATURE_VARIOUS_EXCEPTIONS (rainersigwald)
9229122 Checkpoint: works on full only, doesn't properly block when all resou… (rainersigwald)
b919834 Make TaskHost.MarkAsInactive work on Core (rainersigwald)
01953aa Tweak test to show cross-process handling (rainersigwald)
1183e87 Introduce RequireCores (rainersigwald)
0437c32 WIP (rainersigwald)
1bf3017 Revert "WIP" (rainersigwald)
1ada7c2 Horrible pile of WIP hacks to debug hang (rainersigwald)
e943655 Revert "Horrible pile of WIP hacks to debug hang" (rainersigwald)
ca6697d Move requiring core to ExecuteInstantiatedTask (rainersigwald)
63a324b Merge remote-tracking branch 'upstream/master' into exp/resource-mana… (rainersigwald)
09618ec Release core when yielding (hopefully working around hang when many t… (rainersigwald)
5a7ad2b WIP: new semaphore name per session (by default) (rainersigwald)
141c57c Merge remote-tracking branch 'upstream/master' into exp/resource-mana… (rainersigwald)
007d9f3 Release a core when calling BuildProjectFiles (rainersigwald)
6b2be21 Switch expression for MockHost.GetComponent (rainersigwald)
5d49876 Resource manager in MockHost (rainersigwald)
0486730 Just don't do resource management on non-Windows (rainersigwald)
2f85924 Delete bogus tests (rainersigwald)
5e4b035 Doc for RequestCores (rainersigwald)
a9d37a2 Merge remote-tracking branch 'upstream/master' into exp/resource-mana… (rainersigwald)
43bdec0 Merge remote-tracking branch 'upstream/master' into exp/resource-mana… (rainersigwald)
479cdfc Add BlockingWaitForCore (rainersigwald)
9db29bb Treat resources as a separate pool; don't auto-acquire for tasks (rainersigwald)
e1c3bff fixup! Add BlockingWaitForCore (rainersigwald)
58fa355 Doc updates (rainersigwald)
a6a9d6d Merge remote-tracking branch 'upstream/master' into exp/resource-mana… (rainersigwald)
91fe07a Block for at least one core in RequestCores (rainersigwald)
c9b3c78 Merge remote-tracking branch 'upstream/master' into exp/resource-mana… (rainersigwald)
ba2db9a Remove BlockingWaitForCore() since it's now redundant (rainersigwald)
cc60df9 fixup! Block for at least one core in RequestCores (rainersigwald)
0ab0481 Move to IBuildEngine8 since 7 shipped already (rainersigwald)
f1e26c3 remove test-project.proj (rainersigwald)
dfced2f Remove SemaphoreCPUTask (rainersigwald)
ed6a404 ?? (rainersigwald)
4545858 sort usings in MockHost (rainersigwald)
3de11c0 fixup! Treat resources as a separate pool; don't auto-acquire for tasks (rainersigwald)
608bed7 fixup! Remove BlockingWaitForCore() since it's now redundant (rainersigwald)
7a3da6b fixup! Treat resources as a separate pool; don't auto-acquire for tasks (rainersigwald)
d87903f Remove RequireCores (rainersigwald)
9a83f35 Better non-Windows behavior (rainersigwald)
77344e2 fixup! Move to IBuildEngine8 since 7 shipped already (rainersigwald)
7fffd46 Generalize MockEngine semaphore (rainersigwald)
7d8b359 Return nullable int to indicate whether resource management is possible (rainersigwald)
82a25c4 fixup! Remove RequireCores (rainersigwald)
c6d1a8a Merge remote-tracking branch 'upstream/master' into HEAD (rainersigwald)
b8b52cc Nix whitespace-only changes in ProjectCollection (rainersigwald)
f3e347e fixup! fixup! Remove BlockingWaitForCore() since it's now redundant (rainersigwald)
ab384aa Task whitespace fixes (rainersigwald)
a9d8061 Update doc for non-Windows behavior (rainersigwald)
25ec10c Update doc with better example, caveats (rainersigwald)
b362600 Whitespace fixes in TaskBuilder (rainersigwald)
b830c49 Whitespace cleanup in TaskHost (rainersigwald)
bb2d947 Clarity in MockHost (rainersigwald)
59af842 fixup! fixup! fixup! Remove BlockingWaitForCore() since it's now redu… (rainersigwald)
cacc2ac Log resource requests/releases (rainersigwald)
3c0a8e7 Move clamp on release to service layer (rainersigwald)
ded86d5 Switch system off more gracefully on non-Windows (rainersigwald)
ad77314 Merge remote-tracking branch 'upstream/master' into exp/resource-mana… (rainersigwald)
0c0c0b4 Release nodes on reacquire (rainersigwald)
a16ca54 Merge remote-tracking branch 'upstream/master' into exp/resource-mana… (rainersigwald)
1a497d7 Merge remote-tracking branch 'upstream/master' into exp/resource-mana… (rainersigwald)
7a4e1e0 Implicit core for nonblocking 1 return (rainersigwald)
3d2575c Allow environment variable MSBUILDRESOURCEMANAGEROVERSUBSCRIPTION to … (rainersigwald)
c416bf5 Revert "Release nodes on reacquire" (rainersigwald)
444ffde WIP: acquire/release resources in the scheduler (rainersigwald)
761fb6a Update documentation/specs/resource-management.md (cdmihai)
2fb5789 Merge remote-tracking branch 'dotnet/master' into exp/resource-manage… (ladipro)
2ce2af8 Remove Semaphore-based logic (ladipro)
e3439e4 Plumbing to pass ResourceRequest to scheduler and ResourceResponse ba… (ladipro)
1653806 Add missing files: ResourceRequest.cs, ResourceResponse.cs (ladipro)
6d27a28 Plumbing fixes (ladipro)
df40170 Implement scheduling policy (ladipro)
3d03f86 Use 'implicit core' always, not only when scheduler returns 0 (ladipro)
ba264da Use different limits for scheduling and explicit core requests (ladipro)
d12bd75 Make RequestCores ignore the executing request count (ladipro)
1a6326e Subtract one from _coreLimit (ladipro)
0b1a487 Change return value of RequestCores to int (null is no longer used) (ladipro)
d38500b Do not assume that RequestCores calls come only from Executing requests (ladipro)
445ec33 Make RequestCores block and wait for at least one core (ladipro)
a98adee Revert "Use 'implicit core' always, not only when scheduler returns 0" (ladipro)
15121be Make the first RequestCores call non-blocking (via 'implicit' core) (ladipro)
02bce34 Make the implicit core the last one to release (LIFO) (ladipro)
b382aab Introduce MSBUILDNODECOREALLOCATIONWEIGHT (ladipro)
8dbe055 Merge remote-tracking branch 'dotnet/main' into exp/resource-management (ladipro)
14c5d3d Allow calling RequestCores/ReleaseCores after Yield (ladipro)
aad6f76 Update resource-management.md (ladipro)
3a554b8 Comments, renames, and tweaks (ladipro)
63d782e Remove the now unused ResourceManagerService (ladipro)
874ecf3 Comments, renames, and tweaks (ladipro)
8ed06ad Comments, renames, and tweaks (ladipro)
97e9f5d Revert string changes (ladipro)
531ee86 Tweaks in Scheduler.cs (ladipro)
27865a8 Refactor SchedulingData & SchedulableRequest, don't consider nodes wi… (ladipro)
8016cf9 Don't wait for ResourceResponse under a lock (ladipro)
d10923f Release all cores on reacquire (ladipro)
3ac6642 Renames and tweaks (ladipro)
059ac7b Add unit tests to TaskHost_Tests (ladipro)
1340f7c Fix bug in scheduler where cores were not granted during build rundown (ladipro)
f2381ad Rework RequestCores concurrency (ladipro)
605f6f2 Fix ArgumentOutOfRangeException in ReleaseAllCores (ladipro)
876fef6 Add ResourceManagement_Tests (ladipro)
f9f0185 Move state check and make tests Core-compatible (ladipro)
719425b PR feedback: Make instantiation of ResourceRequest more readable (ladipro)
2a6cabe PR feedback: Move the new API to IBuildEngine9 (ladipro)
8b53959 PR feedback: Add comments to RequestBuilder.RequestCores (ladipro)
74046a0 PR feedback: Document callers of IRequestBuilderCallback.RequestCores… (ladipro)
93db0c1 PR feedback: Comment new environment variables (ladipro)
de74af6 PR feedback: Add RequestCores/ReleaseCores logging (ladipro)
@@ -0,0 +1,41 @@

# Managing tools with their own parallelism in MSBuild

MSBuild supports building projects in parallel using multiple processes. Most users opt into `Environment.ProcessorCount` parallelism at the MSBuild layer.

In addition, tools sometimes support parallel execution. The Visual C++ compiler `cl.exe` supports `/MP[n]`, which parallelizes compilation at the translation-unit (file) level. If a number isn't specified, it defaults to `NUM_PROCS`.

When used in combination, `NUM_PROCS * NUM_PROCS` compiler processes can be launched, all of which would like to do file I/O and intense computation. This generally overwhelms the operating system's scheduler and causes thrashing and terrible build times.

As a result, the standard guidance is to use only one multiproc option: MSBuild's _or_ `cl.exe`'s. But that leaves the machine underloaded when things could be happening in parallel.

## Design

`IBuildEngine` will be extended to allow a task to indicate to MSBuild that it would like to consume more than one CPU core (`RequestCores`). These requests will be advisory only: a task can still do as much work as it desires with as many threads and processes as it desires.

A cooperating task would limit its own parallelism to the number of CPU cores MSBuild can reserve for the requesting task.

`RequestCores(int requestedCores)` will always return a positive value, possibly less than the parameter if that many cores are not available. If no cores are available at the moment, the call blocks until at least one becomes available. The first `RequestCores` call made by a task is guaranteed to be non-blocking, though, as at minimum it will return the "implicit" core allocated to the task itself. This leads to two conceptual ways of adopting the API. Either the task calls `RequestCores` once, passing the desired number of cores, and then limits its parallelism to whatever the call returns, or the task makes additional calls throughout its execution, perhaps as it discovers more work to do. In this second scenario the task must be OK with waiting for additional cores for a long time, or even forever if the sum of allocated cores has exceeded the limit defined by the policy.

All resources acquired by a task will be automatically returned when the task's `Execute()` method returns, and a task can optionally return a subset by calling `ReleaseCores`. Additionally, all resources will be returned when the task calls `Reacquire`, as this call is a signal to the scheduler that external tools have finished their work and the task can continue running. It does not matter when the resources were allocated (before or after calling `Yield`); they will all be released. Depending on the scheduling policy, freeing resources on `Reacquire` may prevent deadlocks.

The exact core reservation policy and its interaction with task execution scheduling is still TBD. The pool of resources explicitly allocated by tasks may be completely separate, i.e. MSBuild will not wait until a resource is freed before starting execution of new tasks. Or it may be partially or fully shared to prevent oversubscribing the machine. In general, `ReleaseCores` may cause a transition of a waiting task to a Ready state. And vice versa, completing a task or calling `Yield` may unblock a pending `RequestCores` call issued by a task.

## Example 1

In a 16-process build of a solution with 30 projects, 16 worker nodes are launched and begin executing work. Most block on dependencies on projects `A`, `B`, `C`, `D`, and `E`, so they have no running tasks holding resources.

Task `Work` is called in project `A` with 25 inputs. It would like to run as many as possible in parallel. It calls

```C#
int allowedParallelism = BuildEngine8.RequestCores(Inputs.Count); // Inputs.Count == 25
```

and gets up to `16`, the number of cores available to the build overall.

While `A` runs `Work`, projects `B` and `C` run another task `Work2` that also calls `RequestCores` with a high value. Since `Work` in `A` has reserved all cores, the calls in `B` and `C` may return only 1, indicating that the task should not be doing parallel work. Subsequent `RequestCores` calls may block, waiting on `Work` to release cores (or return).

When `Work` returns, MSBuild automatically returns all resources reserved by the task to the pool. At that time blocked `RequestCores` calls in `Work2` may unblock.

## Implementation

The `RequestCores` and `ReleaseCores` calls are marshaled back to the scheduler via newly introduced `INodePacket` implementations. The scheduler has a full view of the state of the system (the number of build requests running, waiting, yielding, and so on, plus the number of cores explicitly allocated by individual tasks using the new API), so it is free to implement an arbitrary core allocation policy. In the initial implementation the policy will be controlled by a couple of environment variables to make it easy to test different settings.
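
As a rough illustration of the first adoption pattern described under Design (a single `RequestCores` call, then capping parallelism at the returned value), the sketch below may help. It does not reference MSBuild: `ICoreBroker`, `FixedPoolBroker`, and `CooperatingWork` are hypothetical names invented for this example, and the toy broker never blocks, unlike the scheduler the spec describes. It simply always grants at least one core to mimic the "implicit" core.

```C#
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the two calls named in the spec (not an MSBuild type).
public interface ICoreBroker
{
    int RequestCores(int requestedCores);
    void ReleaseCores(int coresToRelease);
}

// Toy in-process pool, for illustration only. Assumption: instead of blocking
// when the pool is exhausted, it grants one core, standing in for the
// "implicit" core a task always has.
public sealed class FixedPoolBroker : ICoreBroker
{
    private readonly object _gate = new object();
    private readonly int _total;
    private int _allocated;

    public FixedPoolBroker(int totalCores)
    {
        if (totalCores < 1) throw new ArgumentOutOfRangeException(nameof(totalCores));
        _total = totalCores;
    }

    public int RequestCores(int requestedCores)
    {
        if (requestedCores < 1) throw new ArgumentOutOfRangeException(nameof(requestedCores));
        lock (_gate)
        {
            int free = Math.Max(0, _total - _allocated);
            // Always a positive value, possibly less than requested.
            int granted = Math.Max(1, Math.Min(requestedCores, free));
            _allocated += granted;
            return granted;
        }
    }

    public void ReleaseCores(int coresToRelease)
    {
        lock (_gate)
        {
            // Clamp so that over-releasing cannot drive the count negative.
            _allocated = Math.Max(0, _allocated - coresToRelease);
        }
    }
}

public static class CooperatingWork
{
    // Processes every input using at most as many workers as the broker grants,
    // and returns the granted count.
    public static int Run(ICoreBroker engine, IReadOnlyList<string> inputs, Action<string> processOne)
    {
        if (inputs.Count == 0) return 0;

        int allowed = engine.RequestCores(inputs.Count);
        try
        {
            var options = new ParallelOptions { MaxDegreeOfParallelism = allowed };
            Parallel.ForEach(inputs, options, processOne);
        }
        finally
        {
            // Explicit release is needed here only because the toy broker has
            // no notion of a task's lifetime; the spec returns cores
            // automatically when Execute() returns.
            engine.ReleaseCores(allowed);
        }
        return allowed;
    }
}
```

With a 16-core `FixedPoolBroker` and 25 inputs, `Run` is granted 16 workers, mirroring Example 1. A second requester arriving while those 16 are still held would be granted 1.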
This section is well-written, but it reads to me as a general statement of purpose --> example --> actually, we only care about the example. Although cl.exe was the motivating example, I imagine others will start using it later, so I'd focus on describing this generally rather than diving into a specific example in the first section. On a related note, are you planning to make documentation not in the `specs` folder? If so, this may be a moot point.