perf(codegen): size-optimize oversized IR units at -Os instead of -O0 #5674
Oversized modules (IR past PERRY_LL_O0_THRESHOLD_BYTES, 6 MB) were compiled at clang -O0 to dodge the #4880 pathology: a unit dominated by ONE enormous generated function (a multi-thousand-element data literal lowering to a single ~800k-line function) makes LLVM's -O1+ pipeline super-linear and effectively never finishes.

But -O0 is the wrong default for the *common* oversized case. A large minified CLI bundle is tens of thousands of ordinary functions, none individually huge, and -O0 emits register-pressure-blind, un-folded code: ~30-50% more __text than -Os for no benefit. That generated code is the overwhelming majority of such a binary's __text (measured: 253 MB of a 266 MB binary is codegen output, not the runtime/stdlib), so the opt level on these units dominates final binary size.

Pick the level by average IR bytes-per-function, which cleanly separates the two regimes (>100x gap): a pathological monolith is megabytes-per-function (the #4880 400k-element literal is ~9 MB/fn), a real bundle is ~20 KB/fn. Oversized units at or below PERRY_LL_SIZE_OPT_MAX_FN_BYTES (256 KB) size-optimize at -Os; denser units keep -O0. A giant-literal unit always has very few, very large functions and so never reaches the -Os pipeline.

Measured (whole real CLI-bundle IR unit, 15440 functions, ~20 KB/fn): -O0 __text 63 MB -> -Os __text 32 MB (-49%), clang 47s -> 93s.

End-to-end (synthetic 9000-function program): -Os and -O0 produce byte-identical program output; binary __text 50.7 MB -> 40.7 MB.

Small modules (<6 MB IR) are untouched (still -O3).

New escape hatches: PERRY_LL_SIZE_OPT=0/1 forces the old -O0 / always -Os, PERRY_LL_SIZE_OPT_MAX_FN_BYTES tunes the density cap.
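The selection rule in the commit message can be sketched as a small pure function. This is a minimal illustration, not the code in linker.rs: the `OptLevel` enum, the `choose_opt_level` name, the reading of "6 MB" and "256 KB" as binary units, and the handling of a unit with zero detected functions are all assumptions made for the example.

```rust
/// Optimization level passed to clang for one LLVM IR unit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OptLevel {
    O0,
    Os,
    O3,
}

// Defaults quoted in the commit message. Treating "6 MB" and "256 KB" as
// binary units (MiB/KiB) is an assumption of this sketch.
pub const O0_THRESHOLD_BYTES: usize = 6 * 1024 * 1024;
pub const SIZE_OPT_MAX_FN_BYTES: usize = 256 * 1024;

/// Hypothetical selector: pick an opt level from the unit's IR size in
/// bytes and its function count.
pub fn choose_opt_level(ir_bytes: usize, fn_count: usize) -> OptLevel {
    // Units at or under the oversized threshold keep the normal -O3 path.
    if ir_bytes <= O0_THRESHOLD_BYTES {
        return OptLevel::O3;
    }
    // Assumption: an oversized unit with no detectable functions is
    // treated like a monolith and stays at -O0.
    if fn_count == 0 {
        return OptLevel::O0;
    }
    // Average IR bytes per function is the density measure.
    let avg_fn_bytes = ir_bytes / fn_count;
    if avg_fn_bytes <= SIZE_OPT_MAX_FN_BYTES {
        OptLevel::Os // many ordinary functions: size-optimize
    } else {
        OptLevel::O0 // few huge functions: avoid the -O1+ pipeline
    }
}
```

With these assumed defaults, a unit shaped like the measured bundle (15440 functions at ~20 KB each) lands on -Os, while a single 9 MB function stays at -O0.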
Walkthrough
Adds function-density-aware optimization selection for oversized LLVM IR units.
Changes: Oversized LLVM IR optimization selection
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/perry-codegen/src/linker.rs`:
- Around line 123-125: The function counting logic in count_ll_functions only
matches occurrences preceded by a newline, so it misses a function when the IR
starts with define on the first line. Update count_ll_functions in linker.rs to
count a leading define as well as newline-prefixed ones, so the
bytes-per-function heuristic used for optimization level selection stays
accurate and does not incorrectly fall back to -O0 for borderline units.
📒 Files selected for processing (1)
crates/perry-codegen/src/linker.rs
fn count_ll_functions(ll_text: &str) -> usize {
    ll_text.match_indices("\ndefine ").count()
}
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Make function counting robust to a first-line define.
Line 124 misses a function if IR starts with define on line 1, which can skew bytes-per-function and select -O0 unnecessarily for borderline oversized units.
Suggested patch
 fn count_ll_functions(ll_text: &str) -> usize {
-    ll_text.match_indices("\ndefine ").count()
+    ll_text
+        .lines()
+        .filter(|line| line.starts_with("define "))
+        .count()
 }
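The difference between the two counting strategies shows up on IR text whose first line is a `define`. The following is a small side-by-side sketch; the function names `count_newline_prefixed` and `count_by_lines` and the `SAMPLE_IR` text are made up for illustration and do not appear in the PR.

```rust
/// Counter as originally written: only matches `define` right after a newline.
fn count_newline_prefixed(ll_text: &str) -> usize {
    ll_text.match_indices("\ndefine ").count()
}

/// Counter in the style of the suggested patch: any line starting with `define `.
fn count_by_lines(ll_text: &str) -> usize {
    ll_text
        .lines()
        .filter(|line| line.starts_with("define "))
        .count()
}

/// Made-up two-function IR text whose very first line is a `define`.
const SAMPLE_IR: &str =
    "define i32 @a() {\n  ret i32 0\n}\n\ndefine i32 @b() {\n  ret i32 1\n}\n";
```

On `SAMPLE_IR` the newline-prefixed counter sees only `@b`, while the line-based counter sees both functions. When anything precedes the first `define` (a comment, a target triple line), the two agree.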
Problem
A large minified CLI bundle compiles to a ~320 MB native binary. Attribution shows the bloat is perry's own generated code, not the Rust runtime/stdlib/ext libs: the __text breakdown is dominated by cli_ts.o (perry codegen output).

So ~95% of the binary's code is the generated .o. The runtime + stdlib are a small fraction; opt-level/panic/feature-gating levers on them can't move the needle. The lever is how perry compiles its generated IR.

Root cause
Oversized modules (IR past PERRY_LL_O0_THRESHOLD_BYTES, 6 MB) are compiled at clang -O0. That exists to dodge #4880: a unit dominated by ONE enormous generated function (a multi-thousand-element data literal → a single ~800k-line function) makes LLVM's -O1+ pipeline super-linear and effectively never finishes (verified: clang times out >10 min at any optimizing level on such a function).

But -O0 is the wrong default for the common oversized case. A large bundle is tens of thousands of ordinary functions, none individually huge, and -O0 emits register-pressure-blind, un-folded code: ~30-50% more __text than -Os for no benefit.

Fix
Pick the opt level by average IR bytes-per-function, which cleanly separates the two regimes (>100x gap):
- pathological monolith (megabytes per function): -O0
- real bundle (~20 KB/fn): -Os

Oversized units at or below PERRY_LL_SIZE_OPT_MAX_FN_BYTES (256 KB/fn) size-optimize at -Os; denser units keep -O0. A giant-literal unit always has very few, very large functions, so it never reaches the -Os pipeline. Modules under 6 MB IR are untouched (still -O3).

Measurements
- Real CLI-bundle IR unit (15440 functions, ~20 KB/fn): -O0 __text 63 MB -> -Os 32 MB (-49%); clang 47s -> 93s.
- End-to-end (synthetic 9000-function program): -Os and -O0 produce byte-identical program output; binary __text 50.7 MB -> 40.7 MB.
- -O3: output matches node --experimental-strip-types.
- -Oz matches -Os on size but is 2-3x slower, so -Os is the sweet spot.

Safety / escape hatches
- PERRY_LL_SIZE_OPT=0 -> force old -O0; =1 -> force -Os regardless of density.
- PERRY_LL_SIZE_OPT_MAX_FN_BYTES tunes the density cap.

Tests
New unit tests cover both branches. Full perry-codegen lib suite green (129 tests).
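The two escape hatches described under "Safety / escape hatches" could be read along the following lines. This is a hedged sketch only: the `SizeOptMode` enum, the helper names, and the fallback behavior for unset, malformed, or zero values are assumptions for illustration. The PR description only specifies what `=0`, `=1`, and the density cap mean.

```rust
/// How PERRY_LL_SIZE_OPT steers the choice for oversized units.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SizeOptMode {
    ForceO0, // PERRY_LL_SIZE_OPT=0: old behavior
    ForceOs, // PERRY_LL_SIZE_OPT=1: -Os regardless of density
    Auto,    // assumption: anything else falls back to the density heuristic
}

/// Default density cap from the PR description (256 KB per function,
/// read here as KiB, which is an assumption).
pub const DEFAULT_MAX_FN_BYTES: usize = 256 * 1024;

/// Interpret the raw PERRY_LL_SIZE_OPT value, if any.
pub fn parse_size_opt(raw: Option<&str>) -> SizeOptMode {
    match raw.map(str::trim) {
        Some("0") => SizeOptMode::ForceO0,
        Some("1") => SizeOptMode::ForceOs,
        _ => SizeOptMode::Auto,
    }
}

/// Interpret the raw PERRY_LL_SIZE_OPT_MAX_FN_BYTES value, if any.
/// Assumption: unparsable or zero values fall back to the default cap.
pub fn parse_max_fn_bytes(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_MAX_FN_BYTES)
}

/// Read both settings from the process environment.
pub fn size_opt_config_from_env() -> (SizeOptMode, usize) {
    let mode = std::env::var("PERRY_LL_SIZE_OPT").ok();
    let cap = std::env::var("PERRY_LL_SIZE_OPT_MAX_FN_BYTES").ok();
    (
        parse_size_opt(mode.as_deref()),
        parse_max_fn_bytes(cap.as_deref()),
    )
}
```

Keeping the parsing in pure functions that take `Option<&str>` lets unit tests cover each branch without mutating the process environment.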