perf(codegen): size-optimize oversized IR units at -Os instead of -O0 #5674

Merged
proggeramlug merged 1 commit into main from perf/codegen-size-opt-oversized-units
Jun 25, 2026

@proggeramlug proggeramlug commented Jun 25, 2026

Problem

A large minified CLI bundle compiles to a ~320 MB native binary. Attribution shows the bloat is perry's own generated code, not the Rust runtime/stdlib/ext libs:

| component | __text |
| --- | --- |
| final binary | 266 MB |
| cli_ts.o (perry codegen output) | 253 MB |

So ~95% of the binary's code is the generated .o. The runtime + stdlib are a small fraction; opt-level/panic/feature-gating levers on them can't move the needle. The lever is how perry compiles its generated IR.

Root cause

Oversized modules (IR past PERRY_LL_O0_THRESHOLD_BYTES, 6 MB) are compiled at clang -O0. That exists to dodge #4880: a unit dominated by ONE enormous generated function (a multi-thousand-element data literal → a single ~800k-line function) makes LLVM's -O1+ pipeline super-linear and effectively never finishes (verified: clang times out >10 min at any optimizing level on such a function).

But -O0 is the wrong default for the common oversized case. A large bundle is tens of thousands of ordinary functions, none individually huge, and -O0 emits register-pressure-blind, un-folded code — ~30-50% more __text than -Os for no benefit.

Fix

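Pick the opt level by average IR bytes-per-function, which cleanly separates the two regimes (>100x gap):

  • Pathological monolith: megabytes per function (the #4880 400k-element literal is ~9 MB/fn).
  • Real bundle: ~20 KB/fn.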

Oversized units at or below PERRY_LL_SIZE_OPT_MAX_FN_BYTES (256 KB/fn) size-optimize at -Os; denser units keep -O0. A giant-literal unit always has very few, very large functions, so it never reaches the -Os pipeline. Modules under 6 MB IR are untouched (still -O3).
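
The selection rule above can be sketched roughly as follows. This is a minimal illustration, not the code in linker.rs: the function name `choose_opt_flag` is hypothetical, the constants stand in for the defaults of PERRY_LL_O0_THRESHOLD_BYTES and PERRY_LL_SIZE_OPT_MAX_FN_BYTES, and reading "6 MB" and "256 KB" as binary units is an assumption. Treating a unit with zero counted functions as a monolith is also an assumption.

```rust
// Assumed defaults: the PR says "6 MB" and "256 KB"; binary units are a guess.
const O0_THRESHOLD_BYTES: usize = 6 * 1024 * 1024;
const SIZE_OPT_MAX_FN_BYTES: usize = 256 * 1024;

/// Hypothetical helper: picks the clang optimization flag for one IR unit
/// from its total size in bytes and the number of functions it defines.
fn choose_opt_flag(ll_bytes: usize, fn_count: usize) -> &'static str {
    // Units that are not oversized keep the normal pipeline.
    if ll_bytes <= O0_THRESHOLD_BYTES {
        return "-O3";
    }
    // Assumption: with no counted functions, stay on the conservative path.
    if fn_count == 0 {
        return "-O0";
    }
    // Average IR bytes per function separates bundles from monoliths.
    let avg_fn_bytes = ll_bytes / fn_count;
    if avg_fn_bytes <= SIZE_OPT_MAX_FN_BYTES {
        "-Os" // many ordinary functions: size-optimize
    } else {
        "-O0" // few enormous functions: avoid the #4880 blowup
    }
}
```

Under these assumptions, a unit shaped like the measured bundle (15440 functions at about 20 KB each) selects -Os, while a single 9 MB function selects -O0.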

Measurements

  • Whole real-bundle IR unit (15440 functions, ~20 KB/fn): -O0 __text 63 MB -> -Os 32 MB (-49%); clang 47s -> 93s.
  • 4226-function unit: 16 MB -> 11 MB (-31%).
  • End-to-end synthetic 9000-function program: -Os and -O0 produce byte-identical program output; binary __text 50.7 MB -> 40.7 MB.
  • Small program (<6 MB IR): unchanged, still -O3, output matches node --experimental-strip-types.

-Oz matches -Os on size but is 2-3x slower, so -Os is the sweet spot.

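Safety / escape hatches

PERRY_LL_SIZE_OPT=0 forces the old -O0 behavior for oversized units and PERRY_LL_SIZE_OPT=1 always uses -Os. PERRY_LL_SIZE_OPT_MAX_FN_BYTES tunes the density cap.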

Tests

New unit tests cover both branches. Full perry-codegen lib suite green (129 tests).

Summary by CodeRabbit

  • New Features

    • Large LLVM-IR builds now choose a smarter optimization level based on code density, improving the balance between build speed and output size.
    • Oversized inputs may now use size-focused optimization instead of always falling back to the least optimized path.
  • Bug Fixes

    • Improved handling of very large, single-function-style inputs so they still use the most conservative compile option when appropriate.
    • Updated optimization messages to better reflect the actual compile behavior.

Oversized modules (IR past PERRY_LL_O0_THRESHOLD_BYTES, 6 MB) were compiled
at clang -O0 to dodge the #4880 pathology: a unit dominated by ONE enormous
generated function (a multi-thousand-element data literal lowering to a single
~800k-line function) makes LLVM's -O1+ pipeline super-linear and effectively
never finishes.

But -O0 is the wrong default for the *common* oversized case. A large minified
CLI bundle is tens of thousands of ordinary functions, none individually huge,
and -O0 emits register-pressure-blind, un-folded code: ~30-50% more __text than
-Os for no benefit. That generated code is the overwhelming majority of such a
binary's __text (measured: 253 MB of a 266 MB binary is codegen output, not the
runtime/stdlib), so the opt level on these units dominates final binary size.

Pick the level by average IR bytes-per-function, which cleanly separates the two
regimes (>100x gap): a pathological monolith is megabytes-per-function (the
#4880 400k-element literal is ~9 MB/fn), a real bundle is ~20 KB/fn. Oversized
units at or below PERRY_LL_SIZE_OPT_MAX_FN_BYTES (256 KB) size-optimize at -Os;
denser units keep -O0. A giant-literal unit always has very few, very large
functions and so never reaches the -Os pipeline.

Measured (whole real CLI-bundle IR unit, 15440 functions, ~20 KB/fn):
  -O0 __text 63 MB -> -Os __text 32 MB  (-49%), clang 47s -> 93s.
End-to-end (synthetic 9000-function program): -Os and -O0 produce byte-identical
program output; binary __text 50.7 MB -> 40.7 MB. Small modules (<6 MB IR) are
untouched (still -O3). New escape hatches: PERRY_LL_SIZE_OPT=0 forces the old
-O0 and PERRY_LL_SIZE_OPT=1 always uses -Os; PERRY_LL_SIZE_OPT_MAX_FN_BYTES
tunes the density cap.
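
One way the two environment overrides named above could be read is sketched below. Only the variable names and the 0/1 meanings come from the commit message; the enum, the function names, and the choice to ignore unrecognized or non-positive values are illustrative assumptions, not perry's actual parsing.

```rust
/// Hypothetical representation of the PERRY_LL_SIZE_OPT override.
#[derive(Debug, PartialEq)]
enum SizeOptOverride {
    ForceO0, // PERRY_LL_SIZE_OPT=0: old behavior
    ForceOs, // PERRY_LL_SIZE_OPT=1: always size-optimize
    Auto,    // unset or unrecognized: use the density heuristic (assumption)
}

/// Maps the raw variable value to an override mode.
fn parse_size_opt(raw: Option<&str>) -> SizeOptOverride {
    match raw.map(str::trim) {
        Some("0") => SizeOptOverride::ForceO0,
        Some("1") => SizeOptOverride::ForceOs,
        _ => SizeOptOverride::Auto,
    }
}

/// Parses the density cap, falling back to `default` when the value is
/// missing, not a number, or zero (the fallback policy is an assumption).
fn parse_max_fn_bytes(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

/// Reads both overrides from the process environment.
fn overrides_from_env() -> (SizeOptOverride, usize) {
    let mode = std::env::var("PERRY_LL_SIZE_OPT").ok();
    let cap = std::env::var("PERRY_LL_SIZE_OPT_MAX_FN_BYTES").ok();
    (
        parse_size_opt(mode.as_deref()),
        parse_max_fn_bytes(cap.as_deref(), 256 * 1024),
    )
}
```

Keeping the parsing in pure functions that take `Option<&str>` makes both branches testable without mutating the process environment.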

coderabbitai Bot commented Jun 25, 2026

Walkthrough

Adds function-density-aware optimization selection for oversized LLVM IR units. compile_ll_to_object now counts define functions and passes that count into compile-plan building, which chooses -Os or -O0 for oversized inputs. Tests cover the new oversized behaviors and updated call sites.

Changes

Oversized LLVM IR optimization selection

| Layer / File(s) | Summary |
| --- | --- |
| Size cap and function counting (crates/perry-codegen/src/linker.rs) | Adds the new size-optimization settings, a function counter for LLVM IR text, and the extra ll_fn_count input on build_clang_compile_plan. |
| Oversized plan selection and wiring (crates/perry-codegen/src/linker.rs) | Uses byte size plus function density to choose -Os or -O0 for oversized units, and passes the counted functions from compile_ll_to_object into the plan builder. |
| Oversized optimization tests (crates/perry-codegen/src/linker.rs) | Updates compile-plan test calls for the new argument and replaces the old oversized-only -O0 assertion with dense and monolithic oversized cases. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • PerryTS/perry#5109: Modifies the same oversized LLVM-IR compile-plan path in crates/perry-codegen/src/linker.rs.
  • PerryTS/perry#5407: Uses compile_ll_to_object, the entry point whose compile-plan inputs are updated here.

Poem

I counted define calls by moonlit glow,
and watched the tiny flags decide the flow.
For dense little burrows, -Os I cheer,
for giant monoliths, -O0 is near.
🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: oversized IR units now size-optimize with -Os instead of -O0. |
| Description check | ✅ Passed | The description explains the problem, fix, measurements, safety toggles, and tests, though it does not follow the template headings exactly. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-codegen/src/linker.rs`:
- Around line 123-125: The function counting logic in count_ll_functions only
matches occurrences preceded by a newline, so it misses a function when the IR
starts with define on the first line. Update count_ll_functions in linker.rs to
count a leading define as well as newline-prefixed ones, so the
bytes-per-function heuristic used for optimization level selection stays
accurate and does not incorrectly fall back to -O0 for borderline units.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f4266e4f-7218-4c72-81a8-d329f33eb1bf

📥 Commits

Reviewing files that changed from the base of the PR and between a8f3342 and 5bf4317.

📒 Files selected for processing (1)
  • crates/perry-codegen/src/linker.rs

Comment on lines +123 to +125
fn count_ll_functions(ll_text: &str) -> usize {
    ll_text.match_indices("\ndefine ").count()
}

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Make function counting robust to a first-line define.

Line 124 misses a function if IR starts with define on line 1, which can skew bytes-per-function and select -O0 unnecessarily for borderline oversized units.

Suggested patch
 fn count_ll_functions(ll_text: &str) -> usize {
-    ll_text.match_indices("\ndefine ").count()
+    ll_text
+        .lines()
+        .filter(|line| line.starts_with("define "))
+        .count()
 }
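
The difference between the two counting strategies can be seen in a small comparison. This is only a sketch: the function names and the sample IR are made up for illustration, and whether perry's emitter ever places define on the very first line is not stated in this thread.

```rust
/// Original approach: counts only occurrences preceded by a newline.
fn count_newline_prefixed(ll_text: &str) -> usize {
    ll_text.match_indices("\ndefine ").count()
}

/// Suggested approach: counts every line that begins with "define ".
fn count_line_starts(ll_text: &str) -> usize {
    ll_text
        .lines()
        .filter(|line| line.starts_with("define "))
        .count()
}

/// Illustrative IR text whose first line is already a function definition.
const SAMPLE_IR: &str =
    "define i32 @a() {\n  ret i32 0\n}\n\ndefine i32 @b() {\n  ret i32 1\n}\n";
```

On `SAMPLE_IR` the newline-prefixed count is 1 and the line-based count is 2. Once any header line precedes the first definition, the two counts agree, so the discrepancy is at most one function per unit.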

@proggeramlug proggeramlug merged commit c9d156b into main Jun 25, 2026
15 checks passed
@proggeramlug proggeramlug deleted the perf/codegen-size-opt-oversized-units branch June 25, 2026 05:32