
Profiler-driven performance optimization: CPU profiling, hotspot fixes, and new benchmarks#7

Merged
MichaConrad merged 6 commits into main from copilot/add-cpu-profiling-to-benchmarks
Feb 22, 2026

Contributor

Copilot AI commented Feb 22, 2026

Adds CPU profiling to benchmarks, runs two full profile→fix iterations, and extends benchmark coverage to previously untested APIs. All 997 tests pass; CodeQL clean.

CPU Profiling Setup

  • Added [EventPipeProfiler(EventPipeProfile.CpuSampling)] to both ConstrainedSwedenBenchmarks and SmallDatasetBenchmarks — produces .nettrace/.speedscope.json alongside BDN results

Optimization Candidates A–E (initial pass)

  • A (Types.cs): Edge.GetHashCode — kept as HashCode.Combine(V1, V2) for collision-resistance in HashSet<Edge> / Dictionary<Edge,...>
  • B (Triangulation.cs): Lift polyL/polyR/outerTris/intersected out of InsertEdgeIteration — allocate once in InsertEdges, pass as parameters and Clear() per call
  • C (Triangulation.cs): InsertVertexInsideTriangle / InsertVertexOnEdge changed to accept a caller-owned Stack<int> (cleared at entry). New InsertVertex(int, int, Stack<int>) and InsertVertex(int, Stack<int>) overloads let all three bulk insertion loops (KDTreeBFS, Randomized, AsProvided) allocate one stack per batch and thread it through every vertex — eliminates O(N) per-vertex stack allocations
  • D (KdTree.cs): Cache T.One + T.One as private readonly T _two in both constructors; GetMid changed to instance method
  • E (Triangulation.cs): NthElement made generic over TComparer : struct, IComparer<int>; two readonly struct comparers replace the capturing lambdas in InsertVertices_KDTreeBFS — zero delegate allocations
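
To illustrate candidate E, here is a minimal sketch of a quickselect that is generic over a struct comparer. The names `ByX` and `Select` are hypothetical and the body is a textbook Hoare-style quickselect, not the code from Triangulation.cs. It only shows why a `TComparer : struct, IComparer<int>` constraint avoids delegate allocations: the comparer is passed by value and the JIT can specialize the method for each struct type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: orders vertex indices by their X coordinate.
// It is a plain struct holding the coordinate array, so no closure or delegate is allocated.
public readonly struct ByX : IComparer<int>
{
    private readonly double[] _xs;
    public ByX(double[] xs) => _xs = xs;
    public int Compare(int a, int b) => _xs[a].CompareTo(_xs[b]);
}

public static class Select
{
    // After the call, span[n] holds the element that would sit at position n
    // in sorted order; elements before it compare <= and elements after it >=.
    public static void NthElement<TComparer>(Span<int> span, int n, TComparer cmp)
        where TComparer : struct, IComparer<int>
    {
        int lo = 0, hi = span.Length - 1;
        while (lo < hi)
        {
            int pivot = span[lo + (hi - lo) / 2];
            int i = lo, j = hi;
            while (i <= j)
            {
                while (cmp.Compare(span[i], pivot) < 0) i++;
                while (cmp.Compare(span[j], pivot) > 0) j--;
                if (i <= j)
                {
                    (span[i], span[j]) = (span[j], span[i]);
                    i++;
                    j--;
                }
            }
            if (n <= j) hi = j;       // target lies in the left part
            else if (n >= i) lo = i;  // target lies in the right part
            else return;              // target sits between the parts, already in place
        }
    }
}
```

A call such as `Select.NthElement(indices.AsSpan(), indices.Length / 2, new ByX(xs))` would place the median-by-X index in the middle slot, which is the kind of split a KD-tree style ordering needs.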

Hotspot Fixes F–G (first profile round)

Profile showed List<Triangle>.AddWithResize at 4.9–11.6% and IsFlipNeeded at 9–20%.

  • F (Triangulation.cs): Pre-allocate _triangles, _vertices, _vertTris in InsertVertices on first insertion using Euler's formula (~2N triangles for N points):
    _triangles.EnsureCapacity(2 * n + 4);
    _vertices.EnsureCapacity(n + Indices.SuperTriangleVertexCount);
    _vertTris.EnsureCapacity(n + Indices.SuperTriangleVertexCount);
  • G (Triangulation.cs): Fast path in IsFlipNeeded: skip _fixedEdges.Contains entirely when _fixedEdges.Count == 0 (pure vertex-insertion path). Original branch structure and orientation predicates are preserved exactly.
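
The `2 * n + 4` capacity in F can be checked with a short counting argument. This is a sketch under two assumptions that are not stated in the PR: the triangulation is planar with V vertices, of which h lie on the convex hull, and the three super-triangle vertices form that hull while the N input points lie strictly inside it.

```latex
\begin{align*}
V - E + F &= 2, \qquad F = T + 1
  && \text{(Euler's formula; $T$ triangles plus the outer face)} \\
3T + h &= 2E
  && \text{(every edge borders two faces; the outer face has $h$ edges)} \\
V - \tfrac{3T + h}{2} + T + 1 &= 2
  \;\Longrightarrow\; T = 2V - 2 - h \\
V = N + 3,\; h = 3 &\;\Longrightarrow\; T = 2(N + 3) - 2 - 3 = 2N + 1 \le 2N + 4
\end{align*}
```

Under those assumptions the list never needs more than 2N + 1 slots for live triangles, so reserving `2 * n + 4` leaves a small margin and avoids the repeated `AddWithResize` calls seen in the profile.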

Hotspot Fixes H–I (second profile round, all APIs)

  • H (Triangulation.cs): _two = T.One + T.One cached as readonly field on Triangulation<T>; removes recomputation in AddSuperTriangle and ConformToEdgeIteration
  • I (Triangulation.cs): Thread a single reusable Stack<int> + List<Edge> from ConformToEdges through ConformToEdgeIteration → InsertVertex_FlipFixedEdges, which eliminates O(N) per-midpoint-insertion allocations in the conforming path
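
The buffer-threading pattern used in C and I can be sketched as follows. `BatchInsert`, `ProcessOne`, and `ProcessAll` are hypothetical names, and the loop body is placeholder work rather than the real flip logic. The point is only the ownership model: the batch loop allocates one `Stack<int>`, and each per-vertex call clears it on entry instead of allocating its own.

```csharp
using System.Collections.Generic;

public static class BatchInsert
{
    // Hypothetical stand-in for a per-vertex routine that needs scratch space.
    // The caller owns `scratch`; clearing it on entry keeps calls independent.
    private static int ProcessOne(int vertex, Stack<int> scratch)
    {
        scratch.Clear();
        scratch.Push(vertex);
        int visited = 0;
        while (scratch.Count > 0)
        {
            int v = scratch.Pop();
            visited++;
            // Placeholder work: positive even values schedule one follow-up item.
            if (v > 0 && v % 2 == 0) scratch.Push(v / 2);
        }
        return visited;
    }

    // One Stack<int> per batch instead of one per vertex.
    public static int ProcessAll(IReadOnlyList<int> vertices)
    {
        var scratch = new Stack<int>();
        int total = 0;
        for (int i = 0; i < vertices.Count; i++)
            total += ProcessOne(vertices[i], scratch);
        return total;
    }
}
```

Because `Stack<int>.Clear()` keeps the backing array, the stack grows to its high-water mark once and is then reused for every later vertex in the batch.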

New Benchmarks

Added to cover previously untested APIs:

| Class | New benchmarks |
| --- | --- |
| ConstrainedSwedenBenchmarks | Conforming–AsProvided, EraseSuperTriangle–Auto, EraseOuterTriangles–Auto |
| SmallDatasetBenchmarks | Small–ConformingAuto, Small–EraseSuperTriangle, Small–EraseOuterTrianglesAndHoles |

Benchmark Delta (ConstrainedSweden, ~2 600 verts / ~2 600 edges)

| Benchmark | Before | After | Δ Time | Δ Alloc |
| --- | --- | --- | --- | --- |
| Vertices only – Auto | 2.20 ms | 1.89 ms | −14% | 863 KB → 492 KB −43% |
| Vertices only – AsProvided | 4.12 ms | 3.66 ms | −11% | 995 KB → 625 KB −37% |
| Constrained – Auto | 4.94 ms | 4.71 ms | −5% | 1015 KB → 644 KB −37% |
| Constrained – AsProvided | 6.79 ms | 6.45 ms | −5% | 1147 KB → 777 KB −32% |
| Full pipeline – Auto | 14.32 ms | 13.84 ms | −3% | 1399 KB → 1029 KB −26% |


Copilot AI and others added 3 commits February 22, 2026 17:53
Co-authored-by: MichaCo <5837539+MichaCo@users.noreply.github.com>
Co-authored-by: MichaCo <5837539+MichaCo@users.noreply.github.com>
…rmToEdges; new benchmarks for all APIs

Co-authored-by: MichaCo <5837539+MichaCo@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add CPU profiling to benchmark project" to "Profiler-driven performance optimization: CPU profiling, hotspot fixes, and new benchmarks" Feb 22, 2026
Copilot AI requested a review from MichaConrad February 22, 2026 18:16
@MichaConrad MichaConrad requested review from Copilot and removed request for MichaConrad February 22, 2026 18:17

Copilot AI left a comment


Pull request overview

This PR adds profiler-enabled benchmarks and applies several targeted allocation/CPU optimizations to the constrained Delaunay triangulation implementation, with the goal of improving hot-path performance while expanding benchmark coverage for previously unmeasured APIs.

Changes:

  • Enable EventPipe CPU sampling profiling on benchmark suites and add new benchmark cases for conforming and finalization APIs.
  • Reduce allocations in constrained/conforming insertion paths (workspace reuse, stack/list reuse) and pre-size core collections on first insertion.
  • Micro-optimize hot paths (custom Edge.GetHashCode, cached two constant, delegate-free quickselect comparers, IsFlipNeeded fast paths).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/CDT.Core/Types.cs | Replaces Edge.GetHashCode implementation to reduce hashing overhead. |
| src/CDT.Core/Triangulation.cs | Multiple performance refactors: capacity preallocation, workspace reuse, delegate-free NthElement, cached constants, and IsFlipNeeded fast paths. |
| src/CDT.Core/KdTree.cs | Caches constant two and updates midpoint computation to use cached value. |
| benchmark/CDT.Benchmarks/Benchmarks.cs | Adds EventPipe profiler attributes and new benchmarks for conforming/finalization APIs. |


Michael Conrad and others added 2 commits February 22, 2026 19:28
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…tHashCode, fix per-vertex Stack allocation

Co-authored-by: MichaCo <5837539+MichaCo@users.noreply.github.com>
Collaborator

@MichaConrad MichaConrad left a comment


looks good 👍
performance and memory allocation actually improved quite a bit

Start:

| Method | Categories | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 'Conforming – Auto' | Conforming | 2.028 ms | 0.2340 ms | 0.0128 ms | 89.8438 | 89.8438 | 89.8438 | 1417.03 KB |
| 'Constrained – AsProvided' | Constrained | 3.021 ms | 0.1272 ms | 0.0070 ms | 89.8438 | 89.8438 | 89.8438 | 1220.82 KB |
| 'Constrained – Auto' | Constrained | 1.752 ms | 0.2863 ms | 0.0157 ms | 89.8438 | 89.8438 | 89.8438 | 1196.81 KB |
| 'Full pipeline – Auto' | FullPipeline | 2.325 ms | 0.1574 ms | 0.0086 ms | 179.6875 | 89.8438 | 89.8438 | 1581.55 KB |
| 'Vertices only – AsProvided' | VerticesOnly | 2.745 ms | 0.2556 ms | 0.0140 ms | 89.8438 | 89.8438 | 89.8438 | 1040.3 KB |
| 'Vertices only – Auto' | VerticesOnly | 1.701 ms | 0.5033 ms | 0.0276 ms | 89.8438 | 89.8438 | 89.8438 | 1016.29 KB |

Final (with some new benchmarks)

| Method | Categories | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 'Conforming – Auto' | Conforming | 1.589 ms | 0.1569 ms | 0.0086 ms | 117.1875 | 117.1875 | 117.1875 | 1023.33 KB |
| 'Conforming – AsProvided' | Conforming | 2.315 ms | 0.4291 ms | 0.0235 ms | 117.1875 | 117.1875 | 117.1875 | 1008.67 KB |
| 'Constrained – AsProvided' | Constrained | 2.196 ms | 0.1036 ms | 0.0057 ms | 39.0625 | 39.0625 | 39.0625 | 417.46 KB |
| 'Constrained – Auto' | Constrained | 1.339 ms | 0.0211 ms | 0.0012 ms | 39.0625 | 39.0625 | 39.0625 | 433.15 KB |
| 'EraseSuperTriangle – Auto' | Finalization | 1.585 ms | 0.0344 ms | 0.0019 ms | 39.0625 | 39.0625 | 39.0625 | 509.9 KB |
| 'EraseOuterTriangles – Auto' | Finalization | 1.740 ms | 0.1420 ms | 0.0078 ms | 39.0625 | 39.0625 | 39.0625 | 629.86 KB |
| 'Full pipeline – Auto' | FullPipeline | 1.944 ms | 0.0956 ms | 0.0052 ms | 78.1250 | 39.0625 | 39.0625 | 817.9 KB |
| 'Vertices only – AsProvided' | VerticesOnly | 2.075 ms | 0.0723 ms | 0.0040 ms | 39.0625 | 39.0625 | 39.0625 | 265.48 KB |
| 'Vertices only – Auto' | VerticesOnly | 1.164 ms | 0.0873 ms | 0.0048 ms | 39.0625 | 39.0625 | 39.0625 | 281.22 KB |

@MichaConrad MichaConrad marked this pull request as ready for review February 22, 2026 18:41
@github-actions
Summary
Generated on: 2/22/2026 - 6:43:27 PM
Coverage date: 2/22/2026 - 6:43:23 PM
Parser: Cobertura
Assemblies: 1
Classes: 17
Files: 7
Line coverage: 85% (1257 of 1478)
Covered lines: 1257
Uncovered lines: 221
Coverable lines: 1478
Total lines: 3694
Branch coverage: 79.9% (606 of 758)
Covered branches: 606
Total branches: 758
Tag: 46_22282827581

Coverage

CDT.Core - 85%
| Name | Line | Branch |
| --- | --- | --- |
| CDT.Core | 85% | 79.9% |
| CDT.Box2d`1 | 68.7% | 100% |
| CDT.CdtUtils | 83.1% | 72.8% |
| CDT.CovariantReadOnlyDictionary`3 | 18.1% | 50% |
| CDT.DictionaryExtensions | 100% | 100% |
| CDT.DuplicatesInfo | 100% | |
| CDT.DuplicateVertexException | 66.6% | |
| CDT.Edge | 75% | 66.6% |
| CDT.IntersectingConstraintsException | 0% | |
| CDT.KdTree`1 | 72.7% | 65% |
| CDT.LayerDepth | 0% | |
| CDT.Predicates.PredicatesAdaptive | 97.6% | 91.5% |
| CDT.Predicates.PredicatesExact | 33.7% | 100% |
| CDT.TopologyVerifier | 77.2% | 63.8% |
| CDT.Triangle | 66.6% | 71.4% |
| CDT.Triangulation`1 | 93.7% | 83.9% |
| CDT.TriangulationException | 100% | |
| CDT.V2d`1 | 44.4% | 25% |

@MichaConrad MichaConrad merged commit 9184854 into main Feb 22, 2026
4 checks passed
@MichaConrad MichaConrad deleted the copilot/add-cpu-profiling-to-benchmarks branch February 22, 2026 18:45