Split 9 >5k-LOC source files + add 5k-LOC PR gate#1241
Merged
Conversation
… 5k-LOC PR gate
Splits 9 top-of-list >5k-LOC source files into topical sub-modules
(8 fully under 2k LOC, 1 down ~13% but still over) and adds a CI gate
on the `lint` job that fails any PR introducing a >5,000-line tracked
source file. Threshold starts at 5k; the eventual target is 2k and
tightens one file at a time.
## Splits (max single-file LOC per resulting directory)
| Original | LOC | After split (max) |
|---------------------------------------------------|-------:|-----------------------------:|
| crates/perry-codegen/src/expr/mod.rs | 13,729 | 1,352 (mod.rs) |
| crates/perry/src/commands/compile.rs | 8,679 | 4,725 * |
| crates/perry-runtime/src/object/mod.rs | 7,790 | 1,948 (field_get_set.rs) |
| crates/perry-codegen/src/codegen.rs | 7,029 | 1,946 (codegen/mod.rs) |
| crates/perry-hir/src/lower/expr_call/mod.rs | 6,917 | 1,276 (module_static.rs) |
| crates/perry-codegen/src/collectors.rs | 6,428 | 1,401 (escape_news.rs) |
| crates/perry-transform/src/inline.rs | 5,818 | 1,773 (call_inliner.rs) |
| crates/perry-codegen/src/lower_call/native_table | 5,875 | 720 (databases.rs) |
| crates/perry-hir/src/lower_decl.rs | 5,557 | 1,724 (body_stmt.rs) |
`*` compile.rs reduced from 8,679 → 5,395 (prior PR) → 4,725 here;
deeper splitting requires extracting the par_iter codegen closure
(~1,800 LOC, ~30 captured locals) into a context-struct helper, which
is mechanical but high-risk surgery — deferred to a follow-up.
`crates/perry-runtime/src/gc.rs` (13,778 LOC) is allowlisted in the
gate — owner-tracked refactor in flight, re-evaluate when that lands.
## Approach
For each file: extract topical function groups into sibling .rs files
under a new directory (`foo.rs` → `foo/mod.rs` + `foo/{group_a,...}.rs`).
mod.rs is a re-export hub with **explicit named re-exports** —
`pub(crate) use foo::*;` does NOT transitively expose names through
an outer `pub(crate) use crate::module::*;` glob, so external callers
would silently lose visibility (this was the root cause of the broken
sibling agents' work earlier; fixed by enumerating every `pub fn` by
name in mod.rs). Sibling files use `use super::*;` to access each
other's items via mod.rs's named re-exports.
Cross-sibling shared types/constants (e.g. `MAX_SCALAR_ARRAY_LEN`,
`ExactReceiverFact`, `NativeArgKind`) are `pub(crate)` or `pub(super)`
in their defining sibling and re-exported through mod.rs.
The native_table dispatch table (5,875 LOC of `pub(super) const
NATIVE_MODULE_TABLE: &[NativeModSig] = &[ ... ]`) was switched to
`pub(super) static NATIVE_MODULE_TABLE: LazyLock<Vec<NativeModSig>>`
that concatenates per-family `*_ROWS` slices from 13 sub-modules in
declaration order. All consumers used `.iter()` only, so the
const→static change is source-compatible and `iter_native_module_table`
still yields the same `&'static NativeModSig` projections in the same
order (preserves the `perry-api-manifest` drift contract).
## CI gate
New `scripts/check_file_size.sh` runs on every PR as part of the
existing `lint` job. Excludes generated artifacts (Cargo.lock, .po
translations, generated API docs, binary fixtures, CHANGELOG.md) and
honors a one-line allowlist (`gc.rs` for now). On violation, prints
the offending files + an inline pointer to this commit's recipe.
Threshold is configurable via `PERRY_FILE_SIZE_THRESHOLD` env var;
default 5000. The eventual target is 2000 — to tighten, ship one or
two file splits per PR + drop the threshold in the same commit.
## Validation
- `cargo check --workspace` (excl. macOS/iOS/gtk4/jsruntime UI crates):
clean, 0 errors.
- `cargo fmt --all -- --check`: clean.
- `cargo test --workspace` (same exclusions): all 100+ test results
green, 0 failures.
- `./scripts/check_file_size.sh` locally: passes.
No behavior changes — purely structural code movement (with the one
exception of NATIVE_MODULE_TABLE's const→static documented above,
which is source-compatible).
# Conflicts: # crates/perry-codegen/src/codegen.rs # crates/perry-codegen/src/expr/mod.rs # crates/perry-runtime/src/object/field_get_set.rs
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
compile.rsis down ~13% (5,395 → 4,725) but a deeper split needs extracting the par_iter codegen closure (~1,800 LOC, ~30 captured locals) as a context-struct helper — deferred to a follow-up.scripts/check_file_size.shand wires it into thelintjob so any PR introducing a file over 5,000 lines fails CI. Threshold is configurable; eventual target is 2k (tighten one file at a time).crates/perry-runtime/src/gc.rs(13,778 LOC) is allowlisted in the script — separate refactor in flight by another contributor.Resulting max single-file LOC per split
*further compile.rs reduction deferred — see body for rationale.Recipe (applied uniformly)
foo.rs→foo/mod.rs+foo/<group>.rs).mod.rsre-exports each public item by name:pub(crate) use foo::*;does NOT transitively expose names through an outerpub(crate) use crate::module::*;glob — siblings end up unable to see each other's items. Enumerate everypub fnexplicitly in mod.rs.native_table(5.8k-row dispatch table) switched fromconst &[NativeModSig]tostatic LazyLock<Vec<NativeModSig>>assembled from 13 sibling*_ROWSslices in declaration order. All consumers used.iter()only — source-compatible. Preserves theperry-api-manifestdrift contract.Test plan
cargo check --workspace(excluding macOS/iOS/gtk4/jsruntime UI crates) — cleancargo fmt --all -- --check— cleancargo test --workspace(same exclusions) — all green, 0 failures./scripts/check_file_size.shlocally — passes