
Dedup target list in ProjectGraph.ExpandDefaultTargets to prevent graph explosion #13855

Open
dfederm wants to merge 2 commits into dotnet:main from dfederm:dfederm/msbuild-expanddefaulttargets-marker-dedup

dfederm commented May 23, 2026

Context

Microsoft.Build.Graph.ProjectGraph builds the static graph by walking project references breadth-first. For each edge it expands ProjectReferenceTargets (PRT) items to decide which targets to propagate, including two markers:

  • .default → the referenced project's DefaultTargets
  • .projectReferenceTargetsOrDefaultTargets → the entry-point PRT value if set, else .default

If marker expansion produces a target that also appears literally in the same Targets metadata — or if the materialized Targets metadata itself contains duplicates — the per-edge target list grows. The downstream ProjectInterpretation.TargetsToPropagate.FromProjectAndEntryTargets cross-products each entry-target against every matching PRT, so an N-duplicate entry list at one hop produces ~N² propagations at the next. Across BFS depth D this becomes ~N^D, burning gigabytes and minutes on graphs of only a few dozen nodes.
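The growth described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ExplosionSketch` and `ListSizeAtDepth` are hypothetical names, and the model assumes every entry target matches a single PRT item whose Targets metadata repeats the same target `duplicates` times. It does not call any MSBuild API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExplosionSketch
{
    // Hypothetical model: each entry target at a hop is propagated once per
    // duplicate in the matching Targets metadata, so the list is multiplied
    // by `duplicates` at every hop unless it is deduped first.
    public static long ListSizeAtDepth(int duplicates, int depth, bool dedup)
    {
        var targets = new List<string> { "Build" };
        for (int hop = 0; hop < depth; hop++)
        {
            var next = new List<string>();
            foreach (string entry in targets)
            {
                for (int i = 0; i < duplicates; i++)
                {
                    next.Add(entry);
                }
            }

            // With dedup, the list collapses back to its unique entries
            // before the next hop, so the multiplication never compounds.
            targets = dedup
                ? next.Distinct(StringComparer.OrdinalIgnoreCase).ToList()
                : next;
        }

        return targets.Count;
    }
}
```

In this model, three duplicates over five hops yield 3^5 = 243 entries without dedup and a single entry with it, which is the N^D versus bounded behavior the paragraph describes.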

The reproducer that surfaced this in a large internal codebase was two SDK targets files both prepending to the same property and both emitting <ProjectReferenceTargets Include="Build" Targets="$(...)"/>. The second emitter snapshotted a property value that already contained the marker, so the materialized Targets had the marker more than once and the explosion took off from hop 1. The authoring-side double-emission could also be fixed where it originates, but this PR is the engine-side guard rail so any future recurrence (or any literal Targets="Build;Build"-style authoring quirk) can't take the graph down.

Changes Made

The fix (src/Build/Graph/ProjectGraph.cs)

ExpandDefaultTargets now dedupes its output unconditionally, with a hybrid fast/slow shape:

  • Fast path for n ≤ 8 (the dominant BFS-hop size): inline O(n²) scan. If no marker is present and no duplicate is found, returns the input array unchanged with zero allocation.
  • Slow path (ExpandDefaultTargetsSlow): one HashSet<string> sized to count plus a lazily-allocated List<string> buffer. Single pass, expands markers in place, dedupes via the set. Used for n > 8 or when the fast scan flagged anything.

Dedup is OrdinalIgnoreCase, first-occurrence wins.
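A minimal sketch of the hybrid shape, not the code in this PR: `DedupSketch`, `Expand`, `IsClean`, and `ExpandSlow` are hypothetical names, only the `.default` marker is handled, the marker is assumed to be matched case-insensitively, and the list is allocated eagerly here rather than lazily.

```csharp
using System;
using System.Collections.Generic;

public static class DedupSketch
{
    private const string DefaultMarker = ".default";
    private const int FastPathLimit = 8;

    public static string[] Expand(string[] targets, string[] defaultTargets)
    {
        // Fast path: a short list with no marker and no duplicate is
        // returned as the same instance, so nothing is allocated.
        if (targets.Length <= FastPathLimit && IsClean(targets))
        {
            return targets;
        }

        return ExpandSlow(targets, defaultTargets);
    }

    // Inline O(n^2) scan; cheap for n <= 8.
    private static bool IsClean(string[] targets)
    {
        for (int i = 0; i < targets.Length; i++)
        {
            if (string.Equals(targets[i], DefaultMarker, StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }

            for (int j = 0; j < i; j++)
            {
                if (string.Equals(targets[i], targets[j], StringComparison.OrdinalIgnoreCase))
                {
                    return false;
                }
            }
        }

        return true;
    }

    // Single pass: expand the marker in place, keep the first occurrence of
    // each target, compare with OrdinalIgnoreCase.
    private static string[] ExpandSlow(string[] targets, string[] defaultTargets)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>(targets.Length);

        foreach (string target in targets)
        {
            if (string.Equals(target, DefaultMarker, StringComparison.OrdinalIgnoreCase))
            {
                foreach (string defaultTarget in defaultTargets)
                {
                    if (seen.Add(defaultTarget))
                    {
                        result.Add(defaultTarget);
                    }
                }
            }
            else if (seen.Add(target))
            {
                result.Add(target);
            }
        }

        return result.ToArray();
    }
}
```

In this sketch, `Expand(new[] { "Build", ".default", "build" }, new[] { "Build", "Pack" })` yields `Build` and `Pack` in that order, while a clean two-element input comes back as the same array instance.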

Why this is behavior-preserving (no ChangeWave)

GetTargetLists already collapses each per-node final target list to OrdinalIgnoreCase-unique entries via ImmutableHashSet + ImmutableList.AddRange, and ExpandDefaultTargets is only called from GetTargetLists. Adding inner dedup only changes BFS internal state (encounteredEdges set membership, per-edge requestedTargets list size) — never the public return value. No new public API, no new warnings or errors, no consumer-observable change.

Supporting refactor (justification below)

The dedup fix on its own is small. The PR also takes the BFS hot path off ImmutableList<string>, which is the dominant cost once the explosion is gone:

  • ProjectGraphBuildRequest.RequestedTargets: ImmutableList<string> → string[]. Equals/GetHashCode use .Length + indexer, no virtual dispatch or AVL traversal.
  • BFS working-set types throughout ProjectGraph.cs: string[] cascade replaces ImmutableList<string> / IReadOnlyList<string>. ImmutableList<string> is kept at the public GetTargetLists boundary and in targetLists[node], where AddRange actually derives each version from the prior.
  • ProjectInterpretation.TargetsToPropagate: signature widened to string[]; _outerBuildTargets+_allTargets (two ImmutableList<TargetSpecification>) collapsed to a single TargetSpecification[] _allTargets + int _outerBuildTargetCount — one allocation/copy per source list instead of three+two.
  • PRT-emission loop: Where(...).Select(...).ToArray() + AddRange → direct foreach over the SemiColonTokenizer struct from ExpressionShredder.SplitSemiColonSeparatedList, appending via a ref local. Drops the WhereSelectArrayIterator state machine, an intermediate TargetSpecification[], and tokenizer boxing.
  • GetApplicableTargetsForReference returns string[] directly, sized for the no-skip common case and shrunk with Array.Resize only when something is skipped. Drops LINQ state machine + LargeArrayBuilder doubling copies.
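The "size for the common case, resize only on skip" pattern from the last bullet can be sketched as follows. `FilterSketch.KeepApplicable` and the `skip` predicate are hypothetical stand-ins; the real skip condition inside GetApplicableTargetsForReference is not reproduced here.

```csharp
using System;

public static class FilterSketch
{
    // Allocates one array sized for the case where nothing is skipped and
    // shrinks it only if at least one candidate was dropped.
    public static string[] KeepApplicable(string[] candidates, Func<string, bool> skip)
    {
        var result = new string[candidates.Length];
        int count = 0;

        foreach (string candidate in candidates)
        {
            if (!skip(candidate))
            {
                result[count++] = candidate;
            }
        }

        if (count != result.Length)
        {
            // Array.Resize allocates a new array of `count` elements and
            // copies the kept entries into it.
            Array.Resize(ref result, count);
        }

        return result;
    }
}
```

When nothing is skipped, the single up-front allocation is the only one, which is the common case the bullet refers to.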

Why drop ImmutableList<T> here at all: the original draft used ImmutableList<string>.Builder for the dedup buffer. In review it became clear that AVL-tree ImmutableList<T> is materially more expensive per element than List<T>/string[], and the structural-sharing benefit doesn't apply on this path — every per-edge applicableTargets/expandedTargets is built fresh from raw ProjectReferenceTargets items, never as an .Add/.Remove derivative of a common ancestor, so the trees share zero internal nodes.
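To make the array-based equality concrete, here is a sketch of a key type that compares a string[] by length and indexer. `RequestKeySketch` and `NodeId` are hypothetical; the real ProjectGraphBuildRequest keys on a graph node, and the comparers chosen here (Ordinal for the id, OrdinalIgnoreCase for targets) are assumptions for illustration.

```csharp
using System;

public readonly struct RequestKeySketch : IEquatable<RequestKeySketch>
{
    public string NodeId { get; }
    public string[] RequestedTargets { get; }

    public RequestKeySketch(string nodeId, string[] requestedTargets)
    {
        NodeId = nodeId;
        RequestedTargets = requestedTargets;
    }

    public bool Equals(RequestKeySketch other)
    {
        if (!string.Equals(NodeId, other.NodeId, StringComparison.Ordinal))
        {
            return false;
        }

        string[] a = RequestedTargets;
        string[] b = other.RequestedTargets;
        if (a.Length != b.Length)
        {
            return false;
        }

        // Plain indexed loop over the arrays; no enumerator or tree walk.
        for (int i = 0; i < a.Length; i++)
        {
            if (!string.Equals(a[i], b[i], StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }
        }

        return true;
    }

    public override bool Equals(object obj) => obj is RequestKeySketch other && Equals(other);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = StringComparer.Ordinal.GetHashCode(NodeId);
            for (int i = 0; i < RequestedTargets.Length; i++)
            {
                hash = (hash * 31) + StringComparer.OrdinalIgnoreCase.GetHashCode(RequestedTargets[i]);
            }

            return hash;
        }
    }
}
```

A key like this can sit in a HashSet that tracks already-encountered edges, where the cost of Equals and GetHashCode is paid on every lookup.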

Testing

New file: src/Build.UnitTests/Graph/ProjectGraph_ExpandDefaultTargetsDedup_Tests.cs — 11 tests, covering:

  • Marker expansion producing a duplicate of an entry literal is deduped.
  • .projectReferenceTargetsOrDefaultTargets with literal duplicates is deduped.
  • Explicit non-marker duplicates (Build;Build;Build) are deduped.
  • Case-insensitive dedup, first occurrence wins.
  • Clean input returns the same instance (reference identity asserts the zero-allocation fast path).
  • Marker present, no duplicates produced.
  • Marker expands to empty DefaultTargets.
  • Every entry collapses to a singleton.
  • First-occurrence order preserved for mixed literal+marker+literal inputs.
  • End-to-end GetTargetLists smoke at depth 12 with the duplicate-marker shape — confirms result size stays bounded.
  • End-to-end GetTargetLists sanity at depth 6 for the common single-marker case.

All 11 pass on net10.0 and net48 (22/22). The rest of Microsoft.Build.Graph.UnitTests is unchanged by this PR; the 6 pre-existing failures (3 tests × 2 TFMs) are all [ActiveIssue("https://github.com/dotnet/msbuild/issues/4368")].

Performance evidence (why the refactor scope is justified)

End-to-end ProjectGraph.GetTargetLists(["Build"]) via BenchmarkDotNet on .NET 10.0.8, X64 RyuJIT AVX2, --job short. The graph is built once in [GlobalSetup] so the measured op is purely the BFS hot path. Each project carries <ProjectReferenceTargets Include="Build" Targets=".projectReferenceTargetsOrDefaultTargets;GetNativeManifest;_GetCopyToOutputDirectoryItemsFromThisProject"/> to mirror the realistic shape produced by Microsoft.Common.CurrentVersion.targets. V1 = upstream main at the PR base; the same benchmark DLL is rebuilt against each engine.

| Scenario | V1 Time / Alloc | This PR Time / Alloc | Time ratio | Alloc ratio |
|---|---|---|---|---|
| 15-node balanced binary tree | 11.75 µs / 19.45 KB | 5.77 µs / 14.34 KB | 0.49× | 0.74× |
| 50-node balanced binary tree | 25.81 µs / 68.91 KB | 19.31 µs / 50.78 KB | 0.75× | 0.74× |
| 200-node tree, fanout 3 | 90.33 µs / 262.14 KB | 72.92 µs / 199.77 KB | 0.81× | 0.76× |
| 100-node linear chain | 71.01 µs / 171.09 KB | 52.36 µs / 118.52 KB | 0.74× | 0.69× |
| Duplicate-marker PRT shape, 50 nodes (the bug) | 8,972 µs / 18,069 KB | 47.98 µs / 104.81 KB | 0.005× | 0.006× |

Every realistic shape is 19–51% faster and 24–31% lower allocation. The pathological shape — which is what would otherwise OOM/hang on a large graph — is 187× faster and 172× lower allocation at just 50 nodes; because the growth is geometric in BFS depth, the gap widens rapidly past that.
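For readers checking the summary against the benchmark figures, the headline numbers follow from simple ratios. The helper names below (`SpeedupSketch.Ratio`, `SpeedupSketch.PercentReduction`) are hypothetical and exist only for this illustration.

```csharp
using System;

public static class SpeedupSketch
{
    // How many times larger the baseline is than the candidate,
    // e.g. V1 time divided by this PR's time.
    public static double Ratio(double baseline, double candidate) => baseline / candidate;

    // Percentage by which the candidate is lower than the baseline.
    public static double PercentReduction(double baseline, double candidate) =>
        (1.0 - (candidate / baseline)) * 100.0;
}
```

With the reported figures, `Ratio(8972, 47.98)` rounds to 187 and `Ratio(18069, 104.81)` rounds to 172, while `PercentReduction(11.75, 5.77)` and `PercentReduction(90.33, 72.92)` round to 51 and 19, the two ends of the quoted time range.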

Benchmark source is kept out of this PR to keep the diff focused; happy to upstream it separately if maintainers want it in ref/ or documentation/.

Notes

  • No ChangeWaves.md entry — there is no observable behavior change at the GetTargetLists boundary, only reduced internal work.
  • No new public API. No new diagnostics. No Strings.resx changes.
  • File deltas: ProjectGraph.cs +137/-33, ProjectInterpretation.cs +78/-33, new test file +268/-0.
  • The authoring-side issue that first triggered this (two SDK targets files prepending to the same property) has already been fixed outside this repo. This PR is the engine-side guard rail: any future recurrence — or any plain literal duplicate in Targets metadata — is now bounded.

…ph explosion

ProjectGraph.ExpandDefaultTargets now unconditionally dedupes its output via
a hybrid fast/slow path. The downstream BFS cross-products entry-targets
against matching PRT items, so any N-duplicate entry list at one hop becomes
~N^2 propagations at the next; over BFS depth D this is N^D edges. Duplicates
arise from PRT marker double-emission and from explicit literal duplicates
in Targets metadata.

ExpandDefaultTargets has a zero-allocation fast path (inline O(n^2) scan for
n <= 8) returning the input unchanged when no marker or duplicate is found,
and a HashSet-backed slow path otherwise. Dedup is OrdinalIgnoreCase,
first-occurrence wins, matching the existing post-BFS dedup at the public
GetTargetLists boundary -- so no consumer-observable behavior changes and no
ChangeWave is needed.

BFS hot path moved off ImmutableList<string>/ImmutableList<TargetSpecification>:
ProjectGraphBuildRequest.RequestedTargets, ExpandDefaultTargets,
TargetsToPropagate.FromProjectAndEntryTargets, and
GetApplicableTargetsForReference all flow string[] end-to-end.
TargetsToPropagate collapses two ImmutableLists to one flat
TargetSpecification[] + outer-build count. LINQ removed from the per-edge
loop in FromProjectAndEntryTargets. ImmutableList<string> is retained only
at the public GetTargetLists boundary and in targetLists[node] where the
AddRange chain actually derives each version from the prior.

11 new tests (22 with TFMs) cover dedup behavior plus end-to-end GetTargetLists
smoke at depth 12 with the duplicate-marker shape and depth 6 with the common
single-marker shape.

End-to-end GetTargetLists(["Build"]) benchmark on realistic .NET-shape
graphs: 19-51% faster and 24-31% lower allocation across small/medium/large
trees and linear chains; the pathological duplicate-marker shape is 187x
faster and 172x lower allocation at 50 nodes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 23, 2026 00:18

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread src/Build/Graph/ProjectInterpretation.cs Outdated
Comment thread src/Build/Graph/ProjectGraph.cs
Comment thread src/Build/Graph/ProjectGraph.cs
Comment thread src/Build/Graph/ProjectInterpretation.cs
- Wrap unit tests in TestEnvironment.Create(_output) to match the in-file
  end-to-end test pattern and the sibling ProjectGraph_Tests convention,
  ensuring evaluation state from in-memory Project instances doesn't leak
  across tests via the global ProjectCollection (Copilot bot suggestion).

- Defer the empty-list allocation in FromProjectAndEntryTargets until the
  first target is actually appended, avoiding an empty List<TargetSpecification>
  allocation when a matched PRT item has empty Targets metadata (Copilot bot
  suggestion).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

5 participants