Skip to content

Split 9 >5k-LOC source files + add 5k-LOC PR gate#1241

Merged
proggeramlug merged 2 commits into
mainfrom
chore/split-large-files
May 21, 2026
Merged

Split 9 >5k-LOC source files + add 5k-LOC PR gate#1241
proggeramlug merged 2 commits into
mainfrom
chore/split-large-files

Conversation

@proggeramlug
Copy link
Copy Markdown
Contributor

Summary

  • Splits 9 large source files (5k-13k LOC each) into topical sub-modules. 8 are now fully under 2k LOC; compile.rs is down ~13% (5,395 → 4,725) but a deeper split needs extracting the par_iter codegen closure (~1,800 LOC, ~30 captured locals) as a context-struct helper — deferred to a follow-up.
  • Adds scripts/check_file_size.sh and wires it into the lint job so any PR introducing a file over 5,000 lines fails CI. Threshold is configurable; eventual target is 2k (tighten one file at a time).
  • crates/perry-runtime/src/gc.rs (13,778 LOC) is allowlisted in the script — separate refactor in flight by another contributor.

Resulting max single-file LOC per split

Original LOC After split (max single file)
crates/perry-codegen/src/expr/mod.rs 13,729 1,352
crates/perry/src/commands/compile.rs 8,679 4,725 *
crates/perry-runtime/src/object/mod.rs 7,790 1,948
crates/perry-codegen/src/codegen.rs 7,029 1,946
crates/perry-hir/src/lower/expr_call/mod.rs 6,917 1,276
crates/perry-codegen/src/collectors.rs 6,428 1,401
crates/perry-transform/src/inline.rs 5,818 1,773
crates/perry-codegen/src/lower_call/native_table 5,875 720
crates/perry-hir/src/lower_decl.rs 5,557 1,724

* further compile.rs reduction deferred — see body for rationale.

Recipe (applied uniformly)

  • Each file split into topical sibling .rs files under a new directory (foo.rsfoo/mod.rs + foo/<group>.rs).
  • mod.rs re-exports each public item by name: pub(crate) use foo::*; does NOT transitively expose names through an outer pub(crate) use crate::module::*; glob — siblings end up unable to see each other's items. Enumerate every pub fn explicitly in mod.rs.
  • native_table (5.8k-row dispatch table) switched from const &[NativeModSig] to static LazyLock<Vec<NativeModSig>> assembled from 13 sibling *_ROWS slices in declaration order. All consumers used .iter() only — source-compatible. Preserves the perry-api-manifest drift contract.

Test plan

  • cargo check --workspace (excluding macOS/iOS/gtk4/jsruntime UI crates) — clean
  • cargo fmt --all -- --check — clean
  • cargo test --workspace (same exclusions) — all green, 0 failures
  • ./scripts/check_file_size.sh locally — passes
  • PR CI: lint (incl. new size gate) + cargo-test + api-docs-drift all green
  • If green, eligible to merge; future PRs can tighten the threshold below 5k as more files get split

… 5k-LOC PR gate

Splits 9 top-of-list >5k-LOC source files into topical sub-modules
(8 fully under 2k LOC, 1 down ~13% but still over) and adds a CI gate
on the `lint` job that fails any PR introducing a >5,000-line tracked
source file. Threshold starts at 5k; the eventual target is 2k and
tightens one file at a time.

## Splits (max single-file LOC per resulting directory)

| Original                                          | LOC    | After split (max)            |
|---------------------------------------------------|-------:|-----------------------------:|
| crates/perry-codegen/src/expr/mod.rs              | 13,729 | 1,352 (mod.rs)               |
| crates/perry/src/commands/compile.rs              |  8,679 | 4,725 *                      |
| crates/perry-runtime/src/object/mod.rs            |  7,790 | 1,948 (field_get_set.rs)     |
| crates/perry-codegen/src/codegen.rs               |  7,029 | 1,946 (codegen/mod.rs)       |
| crates/perry-hir/src/lower/expr_call/mod.rs       |  6,917 | 1,276 (module_static.rs)     |
| crates/perry-codegen/src/collectors.rs            |  6,428 | 1,401 (escape_news.rs)       |
| crates/perry-transform/src/inline.rs              |  5,818 | 1,773 (call_inliner.rs)      |
| crates/perry-codegen/src/lower_call/native_table  |  5,875 |   720 (databases.rs)         |
| crates/perry-hir/src/lower_decl.rs                |  5,557 | 1,724 (body_stmt.rs)         |

`*` compile.rs reduced from 8,679 → 5,395 (prior PR) → 4,725 here;
deeper splitting requires extracting the par_iter codegen closure
(~1,800 LOC, ~30 captured locals) into a context-struct helper, which
is mechanical but high-risk surgery — deferred to a follow-up.

`crates/perry-runtime/src/gc.rs` (13,778 LOC) is allowlisted in the
gate — owner-tracked refactor in flight, re-evaluate when that lands.

## Approach

For each file: extract topical function groups into sibling .rs files
under a new directory (`foo.rs` → `foo/mod.rs` + `foo/{group_a,...}.rs`).
mod.rs is a re-export hub with **explicit named re-exports** —
`pub(crate) use foo::*;` does NOT transitively expose names through
an outer `pub(crate) use crate::module::*;` glob, so external callers
would silently lose visibility (this was the root cause of the broken
sibling agents' work earlier; fixed by enumerating every `pub fn` by
name in mod.rs). Sibling files use `use super::*;` to access each
other's items via mod.rs's named re-exports.

Cross-sibling shared types/constants (e.g. `MAX_SCALAR_ARRAY_LEN`,
`ExactReceiverFact`, `NativeArgKind`) are `pub(crate)` or `pub(super)`
in their defining sibling and re-exported through mod.rs.

The native_table dispatch table (5,875 LOC of `pub(super) const
NATIVE_MODULE_TABLE: &[NativeModSig] = &[ ... ]`) was switched to
`pub(super) static NATIVE_MODULE_TABLE: LazyLock<Vec<NativeModSig>>`
that concatenates per-family `*_ROWS` slices from 13 sub-modules in
declaration order. All consumers used `.iter()` only, so the
const→static change is source-compatible and `iter_native_module_table`
still yields the same `&'static NativeModSig` projections in the same
order (preserves the `perry-api-manifest` drift contract).

## CI gate

New `scripts/check_file_size.sh` runs on every PR as part of the
existing `lint` job. Excludes generated artifacts (Cargo.lock, .po
translations, generated API docs, binary fixtures, CHANGELOG.md) and
honors a one-line allowlist (`gc.rs` for now). On violation, prints
the offending files + an inline pointer to this commit's recipe.

Threshold is configurable via `PERRY_FILE_SIZE_THRESHOLD` env var;
default 5000. The eventual target is 2000 — to tighten, ship one or
two file splits per PR + drop the threshold in the same commit.

## Validation

- `cargo check --workspace` (excl. macOS/iOS/gtk4/jsruntime UI crates):
  clean, 0 errors.
- `cargo fmt --all -- --check`: clean.
- `cargo test --workspace` (same exclusions): all 100+ test results
  green, 0 failures.
- `./scripts/check_file_size.sh` locally: passes.

No behavior changes — purely structural code movement (with the one
exception of NATIVE_MODULE_TABLE's const→static documented above,
which is source-compatible).
# Conflicts:
#	crates/perry-codegen/src/codegen.rs
#	crates/perry-codegen/src/expr/mod.rs
#	crates/perry-runtime/src/object/field_get_set.rs
@proggeramlug proggeramlug merged commit 4d3a0b0 into main May 21, 2026
9 checks passed
@proggeramlug proggeramlug deleted the chore/split-large-files branch May 21, 2026 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant