Skip to content

Drop all files to <2k LOC + tighten size gate from 5k to 2k#1246

Merged
proggeramlug merged 9 commits into
mainfrom
chore/split-large-files
May 21, 2026
Merged

Drop all files to <2k LOC + tighten size gate from 5k to 2k#1246
proggeramlug merged 9 commits into
mainfrom
chore/split-large-files

Conversation

@proggeramlug
Copy link
Copy Markdown
Contributor

Summary

Continues the file-size sweep started in PR #1241 (now merged). All 47 production Rust files that were over 2,000 LOC have been split into topical sub-modules. The CI gate (scripts/check_file_size.sh) drops from 5,000 to 2,000 lines.

Outcome: every tracked *.rs file is now under 2,000 lines, except 4 explicitly allowlisted files (each with a one-line rationale in the script).

What changed

Four parallelized waves of splits + one cleanup wave:

Wave Files Range
1 10 3.6k–4.7k
2 9 2.9k–3.5k
3 10 2.6k–3.0k (incl. all 8 perry-ui-* lib.rs)
4 12 2.0k–2.5k
5 1 setup.rs (3.1k) split into per-platform wizards

Recipe (same as PR #1241, refined):

  • Each source foo.rsfoo/mod.rs + topical siblings.
  • mod.rs re-exports with explicit named pub(crate) use foo::{name1, name2, ...}; — never glob, because glob re-exports don't propagate transitively through outer pub(crate) use crate::module::*; consumers (the root cause that broke half the PR Split 9 >5k-LOC source files + add 5k-LOC PR gate #1241 attempts).
  • Sibling files use use super::*; to pick up shared types via the named re-exports above.
  • fnpub fn blanket promotion in siblings; mod.rs's re-export visibility caps the external surface.
  • Trailing orphan /// / #[...] lines stripped after every sed extraction (the boundary often chops the next fn's doc comment).

ABI integrity: every FFI #[no_mangle] pub extern "C" fn in perry-runtime / perry-ui-* preserved exactly. Per-crate #[no_mangle] counts verified before vs after for json (16→16), builtins (78→78), array (77→77), arena (4→4), string (61→61), promise (37→37), value (55→55), closure (58→58), buffer (82→82), url (55→55), object/* (123→123), perry-ui-macos (363→363), perry-ui-ios (377→377), perry-ui-android (364→364), perry-ui-visionos (362→362), perry-ui-tvos (365→365), perry-ui-windows (366→366), perry-ui-gtk4 (358→358).

CI gate

scripts/check_file_size.sh now:

  • Scans only *.rs (non-Rust source like JS runtime templates, HTML examples, Kotlin templates, JSON fixtures intentionally not policed).
  • Threshold = 2,000 lines (overridable via PERRY_FILE_SIZE_THRESHOLD).
  • Allowlist (each with one-line rationale):
    • crates/perry-runtime/src/gc/tests.rs — left behind by the GC roadmap: make minor GC structurally cheap #1090 GC checkpoint split; owner-tracked.
    • crates/perry-codegen-arkts/src/tests.rs — ArkTS golden-output test fixtures (top-down test scaffolding, not production code).
    • crates/perry-api-manifest/src/entries.rs — generated-feel manifest table (length reflects API breadth, not complexity).
    • crates/perry/src/commands/compile.rs — the deeply-coupled par_iter codegen closure inside run_with_parse_cache (~1,800 LOC, ~30 captured locals) needs extraction into a context-struct helper. High-risk surgery deferred to a follow-up PR. The other 16 sub-modules in compile/ were already split.

Test plan

  • cargo fmt --all -- --check — clean
  • cargo check --workspace (excl. macOS/iOS/gtk4/jsruntime UI crates) — clean
  • cargo test --workspace (same exclusions) — 0 failures (all 600+ test results pass after fixing 1 pre-existing test: dispatch_drift now recurses the new emit/ directory layout)
  • ./scripts/check_file_size.shOK: no Rust source files exceed 2000 lines
  • PR CI: lint (incl. file-size gate at 2k) + cargo-test + api-docs-drift + perry-ui-* cross-compile matrix all green
  • Follow-up PR: extract compile.rs::run_with_parse_cache's par_iter codegen closure to drop compile.rs from the allowlist

Why

Big single-file modules slow IDE + cargo-check incrementality and hide regressions in code review. The gate prevents recurrence — adding to an existing 1,990-LOC file forces a split decision instead of letting it drift past 5k.

… 5k-LOC PR gate

Splits 9 top-of-list >5k-LOC source files into topical sub-modules
(8 fully under 2k LOC, 1 down ~13% but still over) and adds a CI gate
on the `lint` job that fails any PR introducing a >5,000-line tracked
source file. Threshold starts at 5k; the eventual target is 2k and
tightens one file at a time.

## Splits (max single-file LOC per resulting directory)

| Original                                          | LOC    | After split (max)            |
|---------------------------------------------------|-------:|-----------------------------:|
| crates/perry-codegen/src/expr/mod.rs              | 13,729 | 1,352 (mod.rs)               |
| crates/perry/src/commands/compile.rs              |  8,679 | 4,725 *                      |
| crates/perry-runtime/src/object/mod.rs            |  7,790 | 1,948 (field_get_set.rs)     |
| crates/perry-codegen/src/codegen.rs               |  7,029 | 1,946 (codegen/mod.rs)       |
| crates/perry-hir/src/lower/expr_call/mod.rs       |  6,917 | 1,276 (module_static.rs)     |
| crates/perry-codegen/src/collectors.rs            |  6,428 | 1,401 (escape_news.rs)       |
| crates/perry-transform/src/inline.rs              |  5,818 | 1,773 (call_inliner.rs)      |
| crates/perry-codegen/src/lower_call/native_table  |  5,875 |   720 (databases.rs)         |
| crates/perry-hir/src/lower_decl.rs                |  5,557 | 1,724 (body_stmt.rs)         |

`*` compile.rs reduced from 8,679 → 5,395 (prior PR) → 4,725 here;
deeper splitting requires extracting the par_iter codegen closure
(~1,800 LOC, ~30 captured locals) into a context-struct helper, which
is mechanical but high-risk surgery — deferred to a follow-up.

`crates/perry-runtime/src/gc.rs` (13,778 LOC) is allowlisted in the
gate — owner-tracked refactor in flight, re-evaluate when that lands.

## Approach

For each file: extract topical function groups into sibling .rs files
under a new directory (`foo.rs` → `foo/mod.rs` + `foo/{group_a,...}.rs`).
mod.rs is a re-export hub with **explicit named re-exports** —
`pub(crate) use foo::*;` does NOT transitively expose names through
an outer `pub(crate) use crate::module::*;` glob, so external callers
would silently lose visibility (this was the root cause of the broken
sibling agents' work earlier; fixed by enumerating every `pub fn` by
name in mod.rs). Sibling files use `use super::*;` to access each
other's items via mod.rs's named re-exports.

Cross-sibling shared types/constants (e.g. `MAX_SCALAR_ARRAY_LEN`,
`ExactReceiverFact`, `NativeArgKind`) are `pub(crate)` or `pub(super)`
in their defining sibling and re-exported through mod.rs.

The native_table dispatch table (5,875 LOC of `pub(super) const
NATIVE_MODULE_TABLE: &[NativeModSig] = &[ ... ]`) was switched to
`pub(super) static NATIVE_MODULE_TABLE: LazyLock<Vec<NativeModSig>>`
that concatenates per-family `*_ROWS` slices from 13 sub-modules in
declaration order. All consumers used `.iter()` only, so the
const→static change is source-compatible and `iter_native_module_table`
still yields the same `&'static NativeModSig` projections in the same
order (preserves the `perry-api-manifest` drift contract).

## CI gate

New `scripts/check_file_size.sh` runs on every PR as part of the
existing `lint` job. Excludes generated artifacts (Cargo.lock, .po
translations, generated API docs, binary fixtures, CHANGELOG.md) and
honors a one-line allowlist (`gc.rs` for now). On violation, prints
the offending files + an inline pointer to this commit's recipe.

Threshold is configurable via `PERRY_FILE_SIZE_THRESHOLD` env var;
default 5000. The eventual target is 2000 — to tighten, ship one or
two file splits per PR + drop the threshold in the same commit.

## Validation

- `cargo check --workspace` (excl. macOS/iOS/gtk4/jsruntime UI crates):
  clean, 0 errors.
- `cargo fmt --all -- --check`: clean.
- `cargo test --workspace` (same exclusions): all 100+ test results
  green, 0 failures.
- `./scripts/check_file_size.sh` locally: passes.

No behavior changes — purely structural code movement (with the one
exception of NATIVE_MODULE_TABLE's const→static documented above,
which is source-compatible).
# Conflicts:
#	crates/perry-codegen/src/codegen.rs
#	crates/perry-codegen/src/expr/mod.rs
#	crates/perry-runtime/src/object/field_get_set.rs
Files split (max single-file LOC after split):
- perry-codegen-wasm/src/emit/expr.rs (4688) → 15 siblings, max 941
- perry-codegen/src/lower_call/mod.rs (4303) → 10 siblings, max 1578
- perry-runtime/src/json.rs (4207) → 8 siblings, max 1157
- perry-hir/src/stable_hash.rs (4110) → 10 siblings, max 528
- perry-runtime/src/builtins.rs (4101) → 6 siblings, max 1316
- perry-runtime/src/array.rs (3929) → 17 siblings, max 553
- perry-hir/src/monomorph.rs (3883) → 13 siblings, max 904
- perry/src/commands/publish.rs (3805) → 9 siblings, max 1795
- perry-runtime/src/arena.rs (3727) → 9 siblings, max 909
- perry-codegen-js/src/emit.rs (3657) → 8 siblings, max 1159

All FFI #[no_mangle] symbol counts preserved. Workspace cargo check clean.
Each agent worked in isolation; merged + de-duplicated leftover originals.
Files split (max single-file LOC after split):
- perry-transform/src/generator.rs (3508) → 8 siblings, max 922
- perry-hir/src/js_transform.rs (3480) → 4 siblings, max 1267
- perry-jsruntime/src/modules.rs (3475) → 4 siblings, max 1413
- perry/src/commands/run.rs (3333) → 8 siblings, max 827
- perry-runtime/src/promise.rs (3315) → 6 siblings, max 888
- perry-runtime/src/string.rs (3132) → 14 siblings, max 610
- perry-hir/src/ir.rs (3039) → 8 siblings, max 1938
- perry-codegen/src/runtime_decls.rs (2977) → 5 siblings, max 1474
- perry-runtime/src/value.rs (2871) → 13 siblings, max 453

FFI #[no_mangle] counts preserved (promise 37→37, string 61→61, value 55→55).
Workspace cargo check clean. setup.rs deferred — agent attempt corrupted, retrying.

Two agents (promise + string) lost their output to a parallel-worktree race; re-dispatched
the second time around with stronger 'git add -A immediately' instructions.
perry-ui-macos/ios/android/visionos/tvos/windows lib.rs (~2.6-3k each) split topically by widget area into src/ffi/* or src/lib_ffi/* directories. Every #[no_mangle] perry_ui_* symbol preserved (verified by name-set diff).

Plus: perry-runtime/closure.rs (2842) → 7 siblings; perry-hir/walker.rs (2839) → 3 siblings; perry-dispatch/lib.rs (2731) → 8 topical tables; perry-hir/lower/mod.rs (2713) → 7 new siblings (mod.rs down to 102 LOC).

Workspace cargo check clean. perry-ui-* host-incompatible crates verified by file structure only — actual cross-compile will run in CI matrix steps.
buffer.rs/url.rs/closure (FFI preserved); destructuring/native.rs/interop/stmt/bridge/deforest topical; compile/{link,cjs_wrap}.rs trimmed; field_get_set extracts polymorphic_index; perry-ui-gtk4/lib.rs into ffi/. perry-codegen-wasm/emit/mod.rs into compile.rs + 5 siblings.
…d to 2k

Final wave:

- crates/perry/src/commands/setup.rs (3145) split into 10 per-platform
  wizards (windows/android/ios/macos/visionos/watchos/tvos/harmonyos)
  plus common_apple (JWT + API creds shared by ios+macos) and helpers
  (file-path + perry.toml updaters shared across wizards).
- crates/perry-dispatch/tests/dispatch_drift.rs: recurse the new
  emit/ directory layouts in both perry-codegen-js and perry-codegen-wasm
  so the drift test still sees every dispatch wiring.
- perry-runtime/promise: LAST_ASYNC_STEP_THUNKS bumped to pub(super) +
  imported explicitly from scanners.rs so the cross-sibling thread-local
  resolves cleanly.
- perry-runtime/array/tests.rs: added missing 'use std::ptr;'.
- perry-hir/monomorph/mangle.rs: bumped mangle_type to pub(crate) +
  added to the mod.rs explicit named re-exports list so its tests can
  resolve it.

scripts/check_file_size.sh:
- threshold dropped from 5000 to 2000 (the original target)
- scope narrowed to *.rs only (JS runtime templates, HTML examples,
  Kotlin templates, JSON fixtures, dist bundles intentionally not
  policed)
- allowlist extended with three deferred files, each with a one-line
  rationale: gc/tests.rs, perry-codegen-arkts/src/tests.rs,
  perry-api-manifest/src/entries.rs, perry/src/commands/compile.rs
  (par_iter codegen closure inside run_with_parse_cache — ~1.8k LOC
  with ~30 captured locals; high-risk extraction deferred to a
  follow-up PR).

Validation:
- cargo fmt --all -- --check: clean
- cargo check --workspace (excl. macOS/iOS/gtk4/jsruntime UI crates): clean
- cargo test --workspace (same exclusions): 0 failures
- ./scripts/check_file_size.sh: OK at 2000 lines
# Conflicts:
#	crates/perry-codegen/src/lower_call/native_table/node_misc.rs
#	crates/perry-codegen/src/runtime_decls.rs
#	crates/perry-runtime/src/object/field_get_set.rs
#	crates/perry-runtime/src/object/mod.rs
#	scripts/check_file_size.sh
# Conflicts:
#	crates/perry-runtime/src/json.rs
@proggeramlug proggeramlug merged commit 6e6e2be into main May 21, 2026
8 of 9 checks passed
@proggeramlug proggeramlug deleted the chore/split-large-files branch May 21, 2026 10:45
proggeramlug added a commit that referenced this pull request May 21, 2026
PR #1246 ("Drop all files to <2k LOC") moved every platform's
`pub extern "C" fn perry_ui_*` exports out of `lib.rs` into
`src/ffi/*.rs` or `src/lib_ffi/*.rs`, but only updated the
`dispatch_drift` scanner. `styling_matrix_drift` still reads `lib.rs`
only and now sees just `pub mod` lines, so every Wired matrix row
falsely reports drift on Linux/Windows/Android post-merge (the test
landed red on main as part of #1246's admin-bypass merge).

Mirror the recursive walk from
`crates/perry-dispatch/tests/dispatch_drift.rs::read_emit_dir` — scan
every `.rs` file under `crates/perry-ui-<plat>/src/` and union the
exports. Verified by applying this scanner on top of `origin/main`'s
post-split tree: `cargo test -p perry-ui --test styling_matrix_drift`
now passes.
proggeramlug added a commit that referenced this pull request May 21, 2026
Same root cause as the previous styling_matrix_drift commit: PR #1246
split each platform's `pub extern "C" fn perry_ui_*` exports out of
`lib.rs` into `src/ffi/*.rs` / `src/lib_ffi/*.rs`, but ffi_parity uses
`include_str!` against lib.rs and so sees only `pub mod` declarations
after the split.

Switch the per-platform macro from compile-time `include_str!` of a
single lib.rs to a runtime walk of the crate's `src/` tree, reading
every `.rs` file and concatenating before the existing symbol scan.
Mirrors the recursive walk in `styling_matrix.rs` and `dispatch_drift.rs`.

Verified against `origin/main`'s post-split tree (temp worktree, patch
applied on top): `cargo test -p perry-ui-test --test ffi_parity` — all
6 platform tests pass.
proggeramlug added a commit that referenced this pull request May 21, 2026
`for…of` over a `String.split()` result and object-literal-return +
destructure both silently returned wrong values on the iOS *device*
target (no error, no crash) because three categories of pointer
guards rejected legitimate libsystem_malloc pointers:

1. Cfg-gated 2 TB `HEAP_MIN` on macOS that also covered iOS / tvOS /
   watchOS / visionOS — `clean_arr_ptr`, `js_array_grow` forwarding,
   `js_value_length_f64` (POINTER_TAG + raw-bitcast paths), and
   `is_valid_obj_ptr`. iOS device libsystem_malloc lands in the same
   low GB range as Android/Linux, so every real pointer was null-ed.
   `js_value_length_f64` returning 0 is what made `arr.length` collapse
   to 0 in the failing `for…of`, and made `segments.length === 0`
   wrongly true after the destructure.

2. Unconditional 16 MB pointer floor (`< 0x1000000`) in five object
   field-access helpers — `js_object_get_field` (×4 variants),
   `js_object_get_field_by_name`, `js_object_set_field_by_name`,
   `js_object_has_own`, `ensure_key_in_keys_array`,
   `own_key_present`, `js_object_get_own_field_or_undef`. iOS device
   heap pointers below 16 MB hit this gate and silently routed through
   the small-handle dispatch path (or returned undefined), which is
   why the destructured `segments` came out empty.

3. Same as (2) for the iOS family on macOS's `is_valid_obj_ptr` — the
   downstream `GcHeader.obj_type` check is the real liveness guard,
   so lowering these thresholds to 64 KB (matches the bar already used
   in `js_object_get_field_ic_miss`) does not weaken correctness.

The macOS host, iOS simulator (mimalloc on the host), Linux, Windows,
and Android paths are unchanged in behavior: the simulator's
allocations still land above 2 TB and pass the lowered guard cleanly,
and the GcHeader/obj_type validation downstream still rejects bogus
pointers.

Verified:
- cargo build --release runtime+stdlib+perry: clean
- cargo build --release --target aarch64-apple-ios runtime+stdlib: clean
- cargo test --release -p perry-runtime --lib: 451 passed
- cargo test --release -p perry-stdlib --lib: 74 passed
- cargo fmt --all -- --check: clean
- Host repro from the issue compiles + prints the expected result.
- Pre-existing perry-jsruntime test compile failure is unrelated
  (private `resolve_module_path` from #1246's file split).

Closes #1136.
Also addresses the same root cause for #1129; if #1129 lands first
the overlapping field_get_set / is_valid_obj_ptr edits are no-ops.
proggeramlug added a commit that referenced this pull request May 21, 2026
…path access

PR #1246 ("Drop all files to <2k LOC + tighten size gate") split
`crates/perry-ui-gtk4/src/lib.rs` and moved `resolve_asset_path` into
`ffi::layout`, leaving the function private. `widgets/image.rs:33` still
calls `crate::resolve_asset_path(path)` — root-level — so the gtk4 build
on the linux-aarch64-gnu release target broke with:

    error[E0425]: cannot find function `resolve_asset_path` in the crate root
      --> crates/perry-ui-gtk4/src/widgets/image.rs:33:27
       |
    33 |     let resolved = crate::resolve_asset_path(path);
       |                           ^^^^^^^^^^^^^^^^^^ not found in the crate root
       |
    note: function `crate::ffi::layout::resolve_asset_path` exists but is inaccessible

Local macOS builds didn't hit this because perry-ui-gtk4 is gated off on
macOS workspace builds (gtk4 system-libs only ship on Linux). It surfaced
on the v0.5.1020 release-packages run (#26249481496) when the
`linux-aarch64-gnu` matrix entry tried to link perry-ui-gtk4 and failed
the whole release pipeline (winget / homebrew / npm / apt / apt-repo all
got skipped via fail-fast).

Two-line fix:

- `ffi::layout::resolve_asset_path`: `fn` → `pub(crate) fn` so neighbour
  modules can reuse the exact resolution rule without duplicating it.
- `widgets::image::create_file`: `crate::resolve_asset_path(path)` →
  `crate::ffi::layout::resolve_asset_path(path)` to point at the new
  module path.

Refs #1246. Unblocks v0.5.1020 release-packages re-tag.
proggeramlug added a commit that referenced this pull request May 22, 2026
…path access (#1298)

PR #1246 ("Drop all files to <2k LOC + tighten size gate") split
`crates/perry-ui-gtk4/src/lib.rs` and moved `resolve_asset_path` into
`ffi::layout`, leaving the function private. `widgets/image.rs:33` still
calls `crate::resolve_asset_path(path)` — root-level — so the gtk4 build
on the linux-aarch64-gnu release target broke with:

    error[E0425]: cannot find function `resolve_asset_path` in the crate root
      --> crates/perry-ui-gtk4/src/widgets/image.rs:33:27
       |
    33 |     let resolved = crate::resolve_asset_path(path);
       |                           ^^^^^^^^^^^^^^^^^^ not found in the crate root
       |
    note: function `crate::ffi::layout::resolve_asset_path` exists but is inaccessible

Local macOS builds didn't hit this because perry-ui-gtk4 is gated off on
macOS workspace builds (gtk4 system-libs only ship on Linux). It surfaced
on the v0.5.1020 release-packages run (#26249481496) when the
`linux-aarch64-gnu` matrix entry tried to link perry-ui-gtk4 and failed
the whole release pipeline (winget / homebrew / npm / apt / apt-repo all
got skipped via fail-fast).

Two-line fix:

- `ffi::layout::resolve_asset_path`: `fn` → `pub(crate) fn` so neighbour
  modules can reuse the exact resolution rule without duplicating it.
- `widgets::image::create_file`: `crate::resolve_asset_path(path)` →
  `crate::ffi::layout::resolve_asset_path(path)` to point at the new
  module path.

Refs #1246. Unblocks v0.5.1020 release-packages re-tag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant