Profiler-driven performance optimization: CPU profiling, hotspot fixes, and new benchmarks #7
Merged
MichaConrad merged 6 commits into main on Feb 22, 2026
Co-authored-by: MichaCo <5837539+MichaCo@users.noreply.github.com>
…rmToEdges; new benchmarks for all APIs Co-authored-by: MichaCo <5837539+MichaCo@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add CPU profiling to benchmark project" to "Profiler-driven performance optimization: CPU profiling, hotspot fixes, and new benchmarks" on Feb 22, 2026
Pull request overview
This PR adds profiler-enabled benchmarks and applies several targeted allocation/CPU optimizations to the constrained Delaunay triangulation implementation, with the goal of improving hot-path performance while expanding benchmark coverage for previously unmeasured APIs.
Changes:
- Enable EventPipe CPU sampling profiling on benchmark suites and add new benchmark cases for conforming and finalization APIs.
- Reduce allocations in constrained/conforming insertion paths (workspace reuse, stack/list reuse) and pre-size core collections on first insertion.
- Micro-optimize hot paths (custom `Edge.GetHashCode`, cached `two` constant, delegate-free quickselect comparers, `IsFlipNeeded` fast paths).
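To illustrate the "delegate-free quickselect comparers" item, here is a minimal sketch of a quickselect that takes its comparer as a generic `struct` parameter instead of a lambda. The names `ByX` and `QuickSelect` are hypothetical and chosen for illustration; this is not the PR's actual `NthElement` code, only the general technique under the assumption that point indices are ordered by a coordinate array.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer (not from the PR): orders point indices by X coordinate.
// Being a struct, it is passed by value and needs no delegate or closure object.
public readonly struct ByX : IComparer<int>
{
    private readonly double[] _xs;
    public ByX(double[] xs) => _xs = xs;
    public int Compare(int a, int b) => _xs[a].CompareTo(_xs[b]);
}

public static class QuickSelect
{
    // Rearranges items so that items[n] is the element that a full sort
    // would place at position n; smaller elements end up before it and
    // larger ones after it (Hoare-style partitioning).
    public static void NthElement<TComparer>(int[] items, int n, TComparer cmp)
        where TComparer : struct, IComparer<int>
    {
        int lo = 0, hi = items.Length - 1;
        while (lo < hi)
        {
            int pivot = items[lo + (hi - lo) / 2];
            int i = lo, j = hi;
            while (i <= j)
            {
                while (cmp.Compare(items[i], pivot) < 0) i++;
                while (cmp.Compare(items[j], pivot) > 0) j--;
                if (i <= j)
                {
                    (items[i], items[j]) = (items[j], items[i]);
                    i++;
                    j--;
                }
            }
            if (n <= j) hi = j;        // target lies in the left part
            else if (n >= i) lo = i;   // target lies in the right part
            else break;                // target sits between j and i: done
        }
    }
}
```

Because `TComparer` is constrained to `struct`, the JIT compiles a specialized version per comparer type and can call `Compare` directly, so a call such as `QuickSelect.NthElement(indices, indices.Length / 2, new ByX(xs))` allocates nothing for the comparison logic.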
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/CDT.Core/Types.cs | Replaces Edge.GetHashCode implementation to reduce hashing overhead. |
| src/CDT.Core/Triangulation.cs | Multiple performance refactors: capacity preallocation, workspace reuse, delegate-free NthElement, cached constants, and IsFlipNeeded fast paths. |
| src/CDT.Core/KdTree.cs | Caches constant two and updates midpoint computation to use cached value. |
| benchmark/CDT.Benchmarks/Benchmarks.cs | Adds EventPipe profiler attributes and new benchmarks for conforming/finalization APIs. |
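The "capacity preallocation" entry can be pictured with a small sketch. It assumes the rough estimate of about 2N triangles for N points that follows from Euler's formula; the helper names `Presize`, `EstimateTriangleCount`, and `Reserve` are hypothetical and do not appear in the PR.

```csharp
using System.Collections.Generic;

public static class Presize
{
    // Rough estimate only: a planar triangulation of N points has on the
    // order of 2N triangles (Euler's formula), so 2N is used as the target.
    public static int EstimateTriangleCount(int vertexCount) => 2 * vertexCount;

    // Grows the list's backing array once, up front, so that later Add
    // calls do not trigger repeated resize-and-copy steps.
    public static void Reserve<T>(List<T> list, int extra)
    {
        int needed = list.Count + extra;
        if (list.Capacity < needed)
        {
            list.Capacity = needed;
        }
    }
}
```

Calling `Presize.Reserve(triangles, Presize.EstimateTriangleCount(n))` before a bulk insertion is one way to keep `List<T>` from growing its backing array repeatedly; setting `Capacity` never changes `Count`, so existing elements are unaffected.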
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…tHashCode, fix per-vertex Stack allocation Co-authored-by: MichaCo <5837539+MichaCo@users.noreply.github.com>
MichaConrad (Collaborator) approved these changes on Feb 22, 2026 and left a comment:
looks good 👍
performance and memory allocation actually improved quite a bit
Start:
| Method | Categories | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| 'Conforming – Auto' | Conforming | 2.028 ms | 0.2340 ms | 0.0128 ms | 89.8438 | 89.8438 | 89.8438 | 1417.03 KB |
| 'Constrained – AsProvided' | Constrained | 3.021 ms | 0.1272 ms | 0.0070 ms | 89.8438 | 89.8438 | 89.8438 | 1220.82 KB |
| 'Constrained – Auto' | Constrained | 1.752 ms | 0.2863 ms | 0.0157 ms | 89.8438 | 89.8438 | 89.8438 | 1196.81 KB |
| 'Full pipeline – Auto' | FullPipeline | 2.325 ms | 0.1574 ms | 0.0086 ms | 179.6875 | 89.8438 | 89.8438 | 1581.55 KB |
| 'Vertices only – AsProvided' | VerticesOnly | 2.745 ms | 0.2556 ms | 0.0140 ms | 89.8438 | 89.8438 | 89.8438 | 1040.3 KB |
| 'Vertices only – Auto' | VerticesOnly | 1.701 ms | 0.5033 ms | 0.0276 ms | 89.8438 | 89.8438 | 89.8438 | 1016.29 KB |
Final (with some new benchmarks)
| Method | Categories | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| 'Conforming – Auto' | Conforming | 1.589 ms | 0.1569 ms | 0.0086 ms | 117.1875 | 117.1875 | 117.1875 | 1023.33 KB |
| 'Conforming – AsProvided' | Conforming | 2.315 ms | 0.4291 ms | 0.0235 ms | 117.1875 | 117.1875 | 117.1875 | 1008.67 KB |
| 'Constrained – AsProvided' | Constrained | 2.196 ms | 0.1036 ms | 0.0057 ms | 39.0625 | 39.0625 | 39.0625 | 417.46 KB |
| 'Constrained – Auto' | Constrained | 1.339 ms | 0.0211 ms | 0.0012 ms | 39.0625 | 39.0625 | 39.0625 | 433.15 KB |
| 'EraseSuperTriangle – Auto' | Finalization | 1.585 ms | 0.0344 ms | 0.0019 ms | 39.0625 | 39.0625 | 39.0625 | 509.9 KB |
| 'EraseOuterTriangles – Auto' | Finalization | 1.740 ms | 0.1420 ms | 0.0078 ms | 39.0625 | 39.0625 | 39.0625 | 629.86 KB |
| 'Full pipeline – Auto' | FullPipeline | 1.944 ms | 0.0956 ms | 0.0052 ms | 78.1250 | 39.0625 | 39.0625 | 817.9 KB |
| 'Vertices only – AsProvided' | VerticesOnly | 2.075 ms | 0.0723 ms | 0.0040 ms | 39.0625 | 39.0625 | 39.0625 | 265.48 KB |
| 'Vertices only – Auto' | VerticesOnly | 1.164 ms | 0.0873 ms | 0.0048 ms | 39.0625 | 39.0625 | 39.0625 | 281.22 KB |
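To put a number on the improvement between the two tables, the relative reduction can be computed per row. The helper below is a hypothetical sketch (the name `BenchmarkDelta` is not part of the benchmark project); the input values in the comments are copied from the 'Constrained – Auto' rows above.

```csharp
using System;

public static class BenchmarkDelta
{
    // Percentage by which a metric dropped between two runs.
    // Positive means the "after" value is smaller (an improvement for
    // time and allocation metrics).
    public static double ReductionPercent(double before, double after)
        => (before - after) / before * 100.0;
}

// Example inputs from the 'Constrained – Auto' rows:
//   mean:      1.752 ms   -> 1.339 ms   (about 23.6% lower)
//   allocated: 1196.81 KB -> 433.15 KB  (about 63.8% lower)
```

The same calculation applies to any method that appears in both tables; rows that exist only in the final table have no baseline to compare against.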
Summary
Coverage: CDT.Core - 85%
Adds CPU profiling to benchmarks, runs two full profile→fix iterations, and extends benchmark coverage to previously untested APIs. All 997 tests pass; CodeQL clean.
CPU Profiling Setup
- Added `[EventPipeProfiler(EventPipeProfile.CpuSampling)]` to both `ConstrainedSwedenBenchmarks` and `SmallDatasetBenchmarks`; this produces `.nettrace` / `.speedscope.json` files alongside the BDN results.

Optimization Candidates A–E (initial pass)
- A (`Types.cs`): `Edge.GetHashCode` kept as `HashCode.Combine(V1, V2)` for collision resistance in `HashSet<Edge>` / `Dictionary<Edge,...>`.
- B (`Triangulation.cs`): Lift `polyL` / `polyR` / `outerTris` / `intersected` out of `InsertEdgeIteration`: allocate once in `InsertEdges`, pass as parameters, and `Clear()` per call.
- C (`Triangulation.cs`): `InsertVertexInsideTriangle` / `InsertVertexOnEdge` changed to accept a caller-owned `Stack<int>` (cleared at entry). New `InsertVertex(int, int, Stack<int>)` and `InsertVertex(int, Stack<int>)` overloads let all three bulk insertion loops (`KDTreeBFS`, `Randomized`, `AsProvided`) allocate one stack per batch and thread it through every vertex, eliminating O(N) per-vertex stack allocations.
- D (`KdTree.cs`): Cache `T.One + T.One` as `private readonly T _two` in both constructors; `GetMid` changed to an instance method.
- E (`Triangulation.cs`): `NthElement` made generic over `TComparer : struct, IComparer<int>`; two `readonly struct` comparers replace the capturing lambdas in `InsertVertices_KDTreeBFS`, so no delegates are allocated.

Hotspot Fixes F–G (first profile round)
Profile showed `List<Triangle>.AddWithResize` at 4.9–11.6% and `IsFlipNeeded` at 9–20%.
- F (`Triangulation.cs`): Pre-allocate `_triangles`, `_vertices`, `_vertTris` in `InsertVertices` on first insertion using Euler's formula (~2N triangles for N points).
- G (`Triangulation.cs`): Fast path in `IsFlipNeeded`: skip `_fixedEdges.Contains` entirely when `_fixedEdges.Count == 0` (pure vertex-insertion path). Original branch structure and orientation predicates are preserved exactly.

Hotspot Fixes H–I (second profile round, all APIs)
- H (`Triangulation.cs`): `_two = T.One + T.One` cached as a `readonly` field on `Triangulation<T>`; removes recomputation in `AddSuperTriangle` and `ConformToEdgeIteration`.
- I (`Triangulation.cs`): Thread a single reusable `Stack<int>` + `List<Edge>` from `ConformToEdges` through `ConformToEdgeIteration` → `InsertVertex_FlipFixedEdges`, eliminating O(N) per-midpoint-insertion allocations in the conforming path.

New Benchmarks
Added to cover previously untested APIs:
- `ConstrainedSwedenBenchmarks`: `Conforming–AsProvided`, `EraseSuperTriangle–Auto`, `EraseOuterTriangles–Auto`
- `SmallDatasetBenchmarks`: `Small–ConformingAuto`, `Small–EraseSuperTriangle`, `Small–EraseOuterTrianglesAndHoles`

Benchmark Delta (ConstrainedSweden, ~2 600 verts / ~2 600 edges)