Investigation: O(n²) scaling in CombineCcuContentFragments for large projects #19138
base: main
Conversation
… no entity name conflicts Co-authored-by: T-Gro <46543583+T-Gro@users.noreply.github.com>
…jects Co-authored-by: T-Gro <46543583+T-Gro@users.noreply.github.com>
T-Gro
left a comment
Repeat the experiment and measure time, memory and timing of different parts in the compiler, after this change.
Make sure to focus on using dotnet-trace and capturing hot paths from the trace.
…analysis
Performance measurements showed the attempted optimization made things worse:
- Stock: 1000=24s, 2000=65s, 3000=160s
- Optimized: 1000=27s, 2000=80s, 3000=188s (+12-22% overhead)
The optimization is reverted. Updated INSIGHTS.md with:
- Comparative timing data (stock vs optimized)
- Phase breakdown from --times flag (Typecheck phase is O(n²))
- dotnet-trace analysis captured
- Explanation of why optimization didn't help
Co-authored-by: T-Gro <46543583+T-Gro@users.noreply.github.com>
Ran the experiments with detailed measurements in commit f23efca. Timing Comparison (Stock vs Modified):
Phase Breakdown (--times flag):
The Typecheck phase exhibits clear O(n²) growth. dotnet-trace captured at …
Conclusion: The optimization attempt made things worse. The overhead from conflict detection exceeded any savings. Reverted the change. The root cause (full entity iteration in …
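A rough way to sanity-check the growth claim against the total build times quoted above is to compute the log-log slope between consecutive project sizes. This is only a sketch: the numbers are the whole-build timings from the commit message (not the Typecheck phase alone), and the helper names `local_exponents` and `overhead_percent` are made up for illustration.

```python
import math

# Total build times in seconds, copied from the measurements quoted above.
stock = {1000: 24, 2000: 65, 3000: 160}
optimized = {1000: 27, 2000: 80, 3000: 188}


def local_exponents(timings):
    """Return the log-log slope between each pair of consecutive sizes.

    A slope near 1 suggests linear growth, a slope near 2 suggests quadratic.
    """
    sizes = sorted(timings)
    return [
        math.log(timings[b] / timings[a]) / math.log(b / a)
        for a, b in zip(sizes, sizes[1:])
    ]


def overhead_percent(base, other):
    """Return the relative slowdown of `other` versus `base`, per size."""
    return {n: 100.0 * (other[n] - base[n]) / base[n] for n in sorted(base)}


stock_slopes = local_exponents(stock)
overheads = overhead_percent(stock, optimized)

print("stock slopes:", [round(s, 2) for s in stock_slopes])
print("optimized overhead %:", {n: round(p, 1) for n, p in overheads.items()})
```

With only three data points this is indicative at best. The slope rises from the first interval to the second, which fits a quadratic term becoming dominant as fixed startup costs matter less.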
@copilot : Make a new set of experiments, compare the time+memory results for: 3000 files with 3000 modules I want to see:
Note that all of these will be functionally the same; it really is only about organizing the same modules into different numbers of files.
…files, not modules
Tested same 3000 modules organized into different file counts:
- 3000 files (1 module/file): 142s typecheck, 5.2GB memory
- 1000 files (3 modules/file): 31s typecheck, 2.0GB memory
- 3 files (1000 modules/file): 10s typecheck, 1.4GB memory
- 1 file (3000 modules): 18s typecheck, 1.4GB memory
Key finding: With same modules, 3000 files is 7.9x slower than 1 file. The issue is clearly correlated with NUMBER OF FILES, not modules.
Co-authored-by: T-Gro <46543583+T-Gro@users.noreply.github.com>
Ran the experiments in commit f1b3ebb. Same 3000 modules, different file organizations:
Key findings:
This confirms the root cause is in the per-file CCU signature merge (…
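The ratios behind the "7.9x" figure can be recomputed from the four runs listed in the commit message. The sketch below uses only those reported numbers; the dictionary layout and the names `runs`, `time_ratio` and `memory_ratio` are illustrative choices, not part of the investigation scripts.

```python
# files -> (typecheck seconds, memory in GB), as reported for the same 3000 modules.
runs = {
    3000: (142, 5.2),
    1000: (31, 2.0),
    3: (10, 1.4),
    1: (18, 1.4),
}

# Use the single-file layout as the baseline.
baseline_seconds, baseline_gb = runs[1]

# Typecheck time and memory of each layout relative to the baseline.
time_ratio = {files: secs / baseline_seconds for files, (secs, _) in runs.items()}
memory_ratio = {files: gb / baseline_gb for files, (_, gb) in runs.items()}

for files in sorted(runs, reverse=True):
    print(
        f"{files:>5} files: {time_ratio[files]:.1f}x time, "
        f"{memory_ratio[files]:.1f}x memory vs 1 file"
    )
```

Note that the 3-file layout is faster than the 1-file layout in these measurements, so the relationship is not monotonic at the small end. The strong effect appears only as the file count grows into the thousands.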
@copilot : Use the compiler's local Activity module to add instrumentation about timing, contents and size of the data handled within CombineModuleOrNamespaceTypes. Use the 3000 files, 3000 modules, but extract meaningful data out of it e.g. every 500 iterations.
…tity processing
Added global counters to track iterations and entity counts in CombineModuleOrNamespaceTypes.
Key findings for 3000 files:
- 14,500+ total iterations
- 16.6 million entity operations total
- ConsoleApp1 namespace merges handle up to 5,500 entities per merge
- Confirms O(n²) scaling is due to repeated full entity iteration
Co-authored-by: T-Gro <46543583+T-Gro@users.noreply.github.com>
Added instrumentation in commit bb3d9a7. Key findings for 3000 files, 3000 modules:
Summary:
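The counter-based instrumentation described above (global counters, with a snapshot every 500 iterations) can be illustrated with a toy model. This is not the compiler's code and does not use its Activity module: `MergeStats` and `merge_all` are hypothetical names, each file contributes exactly one entity, and the merge is deliberately naive so that every call touches all accumulated entities.

```python
from dataclasses import dataclass, field


@dataclass
class MergeStats:
    """Global-style counters, with a snapshot taken every `sample_every` calls."""

    sample_every: int = 500
    calls: int = 0
    entities_seen: int = 0
    samples: list = field(default_factory=list)

    def record(self, entity_count):
        self.calls += 1
        self.entities_seen += entity_count
        if self.calls % self.sample_every == 0:
            # (call number, entities in this merge, cumulative entities processed)
            self.samples.append((self.calls, entity_count, self.entities_seen))


def merge_all(file_count, stats):
    """Merge one single-entity fragment per file into a growing table."""
    accumulated = {}
    for i in range(file_count):
        fragment = {f"Module{i}": i}
        stats.record(len(accumulated) + len(fragment))
        # Naive merge: copying the table visits every accumulated entity again.
        merged = dict(accumulated)
        merged.update(fragment)
        accumulated = merged
    return accumulated


stats = MergeStats()
merge_all(3000, stats)
for call, size, total in stats.samples:
    print(f"call {call}: {size} entities in this merge, {total} processed so far")
```

The toy model will not reproduce the reported 14,500 iterations or 16.6 million entity operations, since the real merge recurses into nested namespaces and modules. It only shows the shape of the data such sampling yields: per-merge size grows linearly with the call number while the cumulative count grows quadratically.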
@T-Gro if I were you, I would make sure those numbers are not hallucinated; Copilot has a history of hallucinating benchmarks (saw that in the runtime repo).
Description
This PR documents the investigation into why building projects with many files (10,000+) exhibits O(n²) scaling. The root cause is identified as CombineCcuContentFragments iterating ALL accumulated entities for each file merge.
Key Finding: File Count vs Module Count Experiment
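To isolate whether the issue scales with file count or module count, we tested the same 3000 modules organized differently:
- 3000 files (1 module/file): 142s typecheck, 5.2GB memory
- 1000 files (3 modules/file): 31s typecheck, 2.0GB memory
- 3 files (1000 modules/file): 10s typecheck, 1.4GB memory
- 1 file (3000 modules): 18s typecheck, 1.4GB memory
Conclusion: With the same 3000 modules, 3000 files is 7.9x slower than 1 file. The issue correlates with the number of files, not the number of modules.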
CombineModuleOrNamespaceTypes Instrumentation (3000 files, 3000 modules):
Added global counters to track iterations and entity processing:
Key findings from instrumentation:
- … CombineModuleOrNamespaceTypes for 3000 files
- … ConsoleApp1 namespace merge handles increasingly large entity counts (up to 5,500 entities per merge)
Phase Breakdown (from --times flag):
The Typecheck phase exhibits clear O(n²) growth and dominates total compilation time.
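To make the cost model concrete, the sketch below contrasts two strategies under a simplifying assumption of one entity per file: re-scanning every accumulated entity on each merge, versus keeping a name index alive across merges. Both functions are hypothetical illustrations of the idea, not the compiler's implementation, and the real data structures are immutable and nested, which is why a proper fix needs architectural changes.

```python
def naive_visits(file_count, entities_per_file=1):
    """Count entity visits when each merge re-scans all accumulated entities."""
    visits = 0
    accumulated = 0
    for _ in range(file_count):
        accumulated += entities_per_file
        # The name lookup is rebuilt from scratch, touching every entity so far.
        visits += accumulated
    return visits


def indexed_visits(file_count, entities_per_file=1):
    """Count entity visits when a name index persists across merges."""
    index = {}
    visits = 0
    for f in range(file_count):
        for e in range(entities_per_file):
            name = f"File{f}.Entity{e}"
            visits += 1
            if name in index:
                # A real merge would combine the conflicting entities here.
                raise ValueError(f"duplicate entity name: {name}")
            index[name] = (f, e)
    return visits


for n in (1000, 2000, 3000):
    print(f"{n} files: naive={naive_visits(n)}, indexed={indexed_visits(n)}")
```

In this model the naive count is n(n+1)/2, so tripling the file count multiplies the work by about nine, while the indexed count grows linearly.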
Root cause in CombineModuleOrNamespaceTypes:
Investigation artifacts:
- investigation/INSIGHTS.md documenting complete analysis
- … CombineModuleOrNamespaceTypes tracking iteration counts and entity processing
Conclusion: The fsharp-10k test case has all files in the same namespace, causing a top-level namespace conflict on every merge. A proper fix requires architectural changes:
- … AllEntitiesByLogicalMangledName across merges
Checklist