perf(codegen): size-optimize oversized IR units at -Os instead of -O0 by proggeramlug · Pull Request #5674 · PerryTS/perry

proggeramlug · 2026-06-25T04:49:48Z

Problem

A large minified CLI bundle compiles to a ~320 MB native binary. Attribution shows the bloat is perry's own generated code, not the Rust runtime/stdlib/ext libs:

Component	`__text` size
final binary	266 MB
`cli_ts.o` (perry codegen output)	253 MB

So ~95% of the binary's code is the generated .o. The runtime + stdlib are a small fraction; opt-level/panic/feature-gating levers on them can't move the needle. The lever is how perry compiles its generated IR.
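The share quoted above, worked out from the table. The remainder is roughly what the runtime, stdlib, and ext libs contribute to `__text` combined:

```math
\frac{253\ \text{MB}}{266\ \text{MB}} \approx 0.951, \qquad 266\ \text{MB} - 253\ \text{MB} = 13\ \text{MB}
```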

Root cause

Oversized modules (IR past PERRY_LL_O0_THRESHOLD_BYTES, 6 MB) are compiled at clang -O0. That exists to dodge #4880: a unit dominated by ONE enormous generated function (a multi-thousand-element data literal → a single ~800k-line function) makes LLVM's -O1+ pipeline super-linear and effectively never finishes (verified: clang times out >10 min at any optimizing level on such a function).

But -O0 is the wrong default for the common oversized case. A large bundle is tens of thousands of ordinary functions, none individually huge, and -O0 emits register-pressure-blind, un-folded code — ~30-50% more __text than -Os for no benefit.

Fix

Pick the opt level by average IR bytes-per-function, which cleanly separates the two regimes (>100x gap):

- pathological monolith: megabytes-per-function (the literal from #4880, "Compile-time blowup on wide object literals (2100 keys ≈ 2 min, 3000 keys > 7 min)", is ~9 MB/fn) -> keep -O0
- real bundle: ~20 KB/fn -> -Os
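The size of the gap between the two regimes, using the per-function figures above in decimal units (binary units give about 460x instead):

```math
\frac{9\ \text{MB/fn}}{20\ \text{KB/fn}} = \frac{9000\ \text{KB}}{20\ \text{KB}} = 450 \gg 100
```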

Oversized units at or below PERRY_LL_SIZE_OPT_MAX_FN_BYTES (256 KB/fn) size-optimize at -Os; denser units keep -O0. A giant-literal unit always has very few, very large functions, so it never reaches the -Os pipeline. Modules under 6 MB IR are untouched (still -O3).
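A minimal sketch of the selection rule described above, assuming a pure helper that takes the IR byte size, the `define` count, and the density cap. `OptLevel`, `choose_opt_level`, and the two constants are hypothetical names, not perry's actual code; the 6 MB / 256 KB figures are read here as binary units, and a unit with no `define` at all falls back to -O0 (both assumptions). The cap check multiplies instead of dividing so that "at or below the cap" is exact rather than subject to integer-division rounding.

```rust
/// Optimization level for one LLVM IR unit (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OptLevel {
    O3,
    Os,
    O0,
}

impl OptLevel {
    /// The clang flag corresponding to this level.
    pub fn flag(self) -> &'static str {
        match self {
            OptLevel::O3 => "-O3",
            OptLevel::Os => "-Os",
            OptLevel::O0 => "-O0",
        }
    }
}

/// Assumed stand-in for PERRY_LL_O0_THRESHOLD_BYTES (6 MB, binary units assumed).
pub const O0_THRESHOLD_BYTES: usize = 6 * 1024 * 1024;
/// Assumed default for PERRY_LL_SIZE_OPT_MAX_FN_BYTES (256 KB per function).
pub const DEFAULT_MAX_FN_BYTES: usize = 256 * 1024;

/// Choose an opt level from the IR size and the number of `define`d functions.
pub fn choose_opt_level(ll_bytes: usize, fn_count: usize, max_fn_bytes: usize) -> OptLevel {
    // Units at or under the oversize threshold keep the normal -O3 pipeline.
    if ll_bytes <= O0_THRESHOLD_BYTES {
        return OptLevel::O3;
    }
    // No function definitions means no density to measure: stay conservative.
    if fn_count == 0 {
        return OptLevel::O0;
    }
    // Same as ll_bytes / fn_count <= max_fn_bytes, without integer-division rounding.
    if ll_bytes <= max_fn_bytes.saturating_mul(fn_count) {
        OptLevel::Os
    } else {
        OptLevel::O0
    }
}
```

With these numbers a real bundle (~20 KB/fn) sits about 13x under the default cap and the #4880 monolith (~9 MB in one function) about 36x over it, so the cap has wide margins on both sides.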

Measurements

- Whole real-bundle IR unit (15440 functions, ~20 KB/fn): -O0 __text 63 MB -> -Os 32 MB (-49%); clang 47s -> 93s.
- 4226-function unit: 16 MB -> 11 MB (-31%).
- End-to-end synthetic 9000-function program: -Os and -O0 produce byte-identical program output; binary __text 50.7 MB -> 40.7 MB.
- Small program (<6 MB IR): unchanged, still -O3, output matches node --experimental-strip-types.
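Checking the quoted reductions. The end-to-end binary figure is not stated above and works out to roughly 20%, while clang time on the whole-bundle unit roughly doubles:

```math
1 - \frac{32}{63} \approx 0.492, \qquad 1 - \frac{11}{16} = 0.3125, \qquad 1 - \frac{40.7}{50.7} \approx 0.197, \qquad \frac{93\ \text{s}}{47\ \text{s}} \approx 1.98
```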

-Oz matches -Os on size but is 2-3x slower, so -Os is the sweet spot.

Safety / escape hatches

- PERRY_LL_SIZE_OPT=0 -> force old -O0; =1 -> force -Os regardless of density.
- PERRY_LL_SIZE_OPT_MAX_FN_BYTES tunes the density cap.
- Does not touch LTO, so there is no interaction with #5668 (fix(compile): auto-optimize archives-fresh fast-path drops non-tokio routed ext staticlibs) or #5669 (thin-LTO strips #[no_mangle] C symbols from prebuilt ext staticlib archives (node:zlib link fails)).
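A sketch of how the two escape hatches could be parsed. It assumes `PERRY_LL_SIZE_OPT` is compared against the literal strings "0" and "1" (unset or any other value means "use the density heuristic"), that both forces apply only to oversized units, and that the cap is a plain decimal byte count. `SizeOptOverride`, `parse_size_opt`, `parse_max_fn_bytes`, and `read_overrides` are hypothetical names. Parsing lives in pure functions so it can be tested without mutating the process environment.

```rust
use std::env;

/// How PERRY_LL_SIZE_OPT is interpreted in this sketch (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SizeOptOverride {
    /// Unset or unrecognized: use the bytes-per-function heuristic.
    Auto,
    /// PERRY_LL_SIZE_OPT=0: old behavior, -O0 for every oversized unit.
    ForceO0,
    /// PERRY_LL_SIZE_OPT=1: -Os for every oversized unit, regardless of density.
    ForceOs,
}

/// Parse the raw value of PERRY_LL_SIZE_OPT (`None` when the variable is unset).
pub fn parse_size_opt(raw: Option<&str>) -> SizeOptOverride {
    match raw.map(str::trim) {
        Some("0") => SizeOptOverride::ForceO0,
        Some("1") => SizeOptOverride::ForceOs,
        _ => SizeOptOverride::Auto,
    }
}

/// Parse PERRY_LL_SIZE_OPT_MAX_FN_BYTES as a byte count, falling back to `default`
/// when it is unset or not a valid unsigned integer.
pub fn parse_max_fn_bytes(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(default)
}

/// Read both variables from the process environment.
pub fn read_overrides(default_cap: usize) -> (SizeOptOverride, usize) {
    let size_opt = env::var("PERRY_LL_SIZE_OPT").ok();
    let cap = env::var("PERRY_LL_SIZE_OPT_MAX_FN_BYTES").ok();
    (
        parse_size_opt(size_opt.as_deref()),
        parse_max_fn_bytes(cap.as_deref(), default_cap),
    )
}
```

Falling back to the default on a malformed cap (rather than erroring) is a design choice of this sketch; a stricter build might prefer to warn.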

Tests

New unit tests cover both branches. Full perry-codegen lib suite green (129 tests).

Summary by CodeRabbit

New Features
- Large LLVM-IR builds now choose a smarter optimization level based on code density, improving the balance between build speed and output size.
- Oversized inputs may now use size-focused optimization instead of always falling back to the least optimized path.
Bug Fixes
- Improved handling of very large, single-function-style inputs so they still use the most conservative compile option when appropriate.
- Updated optimization messages to better reflect the actual compile behavior.

Oversized modules (IR past PERRY_LL_O0_THRESHOLD_BYTES, 6 MB) were compiled at clang -O0 to dodge the #4880 pathology: a unit dominated by ONE enormous generated function (a multi-thousand-element data literal lowering to a single ~800k-line function) makes LLVM's -O1+ pipeline super-linear and effectively never finishes.

But -O0 is the wrong default for the *common* oversized case. A large minified CLI bundle is tens of thousands of ordinary functions, none individually huge, and -O0 emits register-pressure-blind, un-folded code: ~30-50% more __text than -Os for no benefit. That generated code is the overwhelming majority of such a binary's __text (measured: 253 MB of the binary's 266 MB __text is codegen output, not the runtime/stdlib), so the opt level on these units dominates final binary size.

Pick the level by average IR bytes-per-function, which cleanly separates the two regimes (>100x gap): a pathological monolith is megabytes-per-function (the #4880 400k-element literal is ~9 MB/fn), a real bundle is ~20 KB/fn. Oversized units at or below PERRY_LL_SIZE_OPT_MAX_FN_BYTES (256 KB/fn) size-optimize at -Os; denser units keep -O0. A giant-literal unit always has very few, very large functions and so never reaches the -Os pipeline.

Measured (whole real CLI-bundle IR unit, 15440 functions, ~20 KB/fn): -O0 __text 63 MB -> -Os __text 32 MB (-49%), clang 47s -> 93s. End-to-end (synthetic 9000-function program): -Os and -O0 produce byte-identical program output; binary __text 50.7 MB -> 40.7 MB. Small modules (<6 MB IR) are untouched (still -O3).

New escape hatches: PERRY_LL_SIZE_OPT=0/1 forces the old -O0 / always -Os, and PERRY_LL_SIZE_OPT_MAX_FN_BYTES tunes the density cap.

coderabbitai · 2026-06-25T04:50:05Z

📝 Walkthrough


Adds function-density-aware optimization selection for oversized LLVM IR units. compile_ll_to_object now counts `define` lines (function definitions) and passes that count into compile-plan building (`build_clang_compile_plan`), which chooses -Os or -O0 for oversized inputs. Tests cover the new oversized behaviors and the updated call sites.

Changes

Oversized LLVM IR optimization selection

Layer / File(s)	Summary
Size cap and function counting (`crates/perry-codegen/src/linker.rs`)	Adds the new size-optimization settings, a function counter for LLVM IR text, and the extra `ll_fn_count` input on `build_clang_compile_plan`.
Oversized plan selection and wiring (`crates/perry-codegen/src/linker.rs`)	Uses byte size plus function density to choose `-Os` or `-O0` for oversized units, and passes the counted functions from `compile_ll_to_object` into the plan builder.
Oversized optimization tests (`crates/perry-codegen/src/linker.rs`)	Updates compile-plan test calls for the new argument and replaces the old oversized-only `-O0` assertion with dense and monolithic oversized cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

PerryTS/perry#5109: Modifies the same oversized LLVM-IR compile-plan path in crates/perry-codegen/src/linker.rs.
PerryTS/perry#5407: Uses compile_ll_to_object, the entry point whose compile-plan inputs are updated here.

Poem

I counted define calls by moonlit glow,
and watched the tiny flags decide the flow.
For dense little burrows, -Os I cheer,
for giant monoliths, -O0 is near.
🐇✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: oversized IR units now size-optimize with -Os instead of -O0.
Description check	✅ Passed	The description explains the problem, fix, measurements, safety toggles, and tests, though it does not follow the template headings exactly.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-codegen/src/linker.rs`:
- Around line 123-125: The function counting logic in count_ll_functions only
matches occurrences preceded by a newline, so it misses a function when the IR
starts with define on the first line. Update count_ll_functions in linker.rs to
count a leading define as well as newline-prefixed ones, so the
bytes-per-function heuristic used for optimization level selection stays
accurate and does not incorrectly fall back to -O0 for borderline units.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f4266e4f-7218-4c72-81a8-d329f33eb1bf

📥 Commits

Reviewing files that changed from the base of the PR and between a8f3342 and 5bf4317.

📒 Files selected for processing (1)

crates/perry-codegen/src/linker.rs

coderabbitai · 2026-06-25T04:52:57Z

+fn count_ll_functions(ll_text: &str) -> usize {
+    ll_text.match_indices("\ndefine ").count()
+}


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Make function counting robust to a first-line define.

Line 124 misses a function if the IR starts with `define` on line 1, which inflates bytes-per-function and can select -O0 unnecessarily for borderline oversized units.

Suggested patch

fn count_ll_functions(ll_text: &str) -> usize {
-    ll_text.match_indices("\ndefine ").count()
+    ll_text
+        .lines()
+        .filter(|line| line.starts_with("define "))
+        .count()
}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

fn count_ll_functions(ll_text: &str) -> usize {
    ll_text.match_indices("\ndefine ").count()
}

fn count_ll_functions(ll_text: &str) -> usize {
    ll_text
        .lines()
        .filter(|line| line.starts_with("define "))
        .count()
}
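A sketch comparing the counting variants on IR whose first line is a `define`. The first two mirror the code under review and the suggested change; the third is an alternative not proposed in the review, shown only as an illustration: it keeps the single substring scan and adds one when the text itself begins with `define `. The function names are hypothetical. None of them counts `declare` lines, which is consistent with counting only function bodies.

```rust
/// Version under review: requires a '\n' before `define`, so a definition
/// on the very first line is not counted.
pub fn count_ll_functions_original(ll_text: &str) -> usize {
    ll_text.match_indices("\ndefine ").count()
}

/// Line-based version from the suggested change.
pub fn count_ll_functions_lines(ll_text: &str) -> usize {
    ll_text
        .lines()
        .filter(|line| line.starts_with("define "))
        .count()
}

/// Alternative: one substring scan plus a check for a leading `define`.
pub fn count_ll_functions_scan(ll_text: &str) -> usize {
    usize::from(ll_text.starts_with("define ")) + ll_text.matches("\ndefine ").count()
}
```

For typical module text that opens with a `; ModuleID` comment all three agree; they differ only when the first line is itself a definition.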


coderabbitai Bot reviewed Jun 25, 2026


proggeramlug merged commit c9d156b into main Jun 25, 2026 (1 commit from perf/codegen-size-opt-oversized-units)
15 checks passed

proggeramlug deleted the perf/codegen-size-opt-oversized-units branch June 25, 2026 05:32

This was referenced Jun 25, 2026

perf(codegen): full-outline the generic property-GET diamond for oversized modules #5675

Merged

perf(codegen): full-outline the instance method-dispatch tower for oversized modules #5681

Merged
