
[Plan E2 Phase 1] Task 2b — helper refactor + cat 2 (partial)#159

Draft

boldfield wants to merge 3 commits into main from v2-precise-gc-task2b
Conversation

@boldfield (Owner)

Draft — pushed early so you can see helper refactor + early category 2 coverage. Category 2 has residual work (record/variant field destructuring loads); category 3 not yet started.

What's in this PR so far

Commit 1 — Helper refactor (41ea95b): All 62 alloc sites in codegen.rs now funnel through lower_alloc_call. The only declare_value_needs_stack_map in codegen.rs is inside the helper itself. Closes reviewer's structural concern from PR #156 re-review #2 ("makes the next 23-site miss impossible to commit"). Net –354 LOC.

Commit 2 — Category 2 partial (2051acb): Heap-pointer load flagging via type-aware classification.

  • New helper is_heap_pointer_ty(ty: &Ty) -> bool — single source of truth for "is this Sigil type a GC-traced heap pointer?" Centralised near cranelift_ty_of_ty. Key distinction: on 64-bit hosts, Int and heap pointers have the same Cranelift width (i64), but Int is a raw scalar and must NOT be flagged — over-flagging would crash Phase 2's precise marker on the raw i64.
  • lower_closure_env_load now flags every Char / String / Closure / User slot at load time, keyed on EnvSlotKind. Single change covers every closure-capture read.
  • 24 named heap-pointer loads flagged via a multi-line regex sweep over named-by-purpose loads (ra_closure*, ra_fired*, return_arm_closure_v, caller_k_closure, k_closure*, post_arm_k_closure*). Companion ra_fn / k_fn loads (function pointers, NOT GC refs) correctly left unflagged.
  • sigil_ref_deref result flagged when T is heap-bearing (uses existing lookup_call_callee_ty infrastructure).

What's NOT covered yet

Category 2 residual — record/variant field destructuring loads. Sigil has no direct FieldAccess expression; record/variant field reads happen via match-arm destructuring, scattered across multiple match-lowering sites. Each load there needs the field's Sigil type threaded through — non-trivial refactor.

Category 3 — block-arg confluences + fn-entry block-params on tail-callable fns (the Task 3 dependency surfaced in PR #157). 48 brif/jump sites + 14 block-param add sites need type-aware sweeping. Not started.

Question for reviewer

This PR is a draft to surface scope reality. Doing all of Task 2b in one PR is a 4–8 hour effort with high regression risk on a ~1000 LOC diff. Two paths:

  • Option A: keep it one PR — I push through the residual category 2 work + all of category 3 on this branch.
  • Option B: ship this draft as Task 2b (categories 1 helper-refactor + 2 partial), file Task 2c for the residual category 2 destructuring sweep, file Task 2d for category 3. Smaller, reviewable, but admits per-task PR cadence at the cost of bundled-completeness.

I lean toward Option A given your "no deferral" steer; flagging it here in case the scope visibility changes your call.

Verification (commits 1 + 2)

Marking-site count progression

| State | Sites | Notes |
| --- | --- | --- |
| Pre-PR (main) | 62 | 61 surgical + 1 in helper (PR #156 final) |
| After commit 1 (helper refactor) | 1 | only inside the helper |
| After commit 2 (cat 2 partial) | 27 | helper + closure_env_load + 24 named loads + ref_deref |

boldfield added 3 commits May 11, 2026 21:20
…loc_call

Reviewer's N1 from PR #156: "Sweep methodology is fragile." Of the
62 alloc sites already marked, only one (`float_add` at codegen.rs:24812)
was refactored to use the `lower_alloc_call` helper introduced in
PR #156. The remaining 61 sites were surgical (call + extract + flag
in 5 lines, by convention). This commit refactors all 61 to funnel
through the helper.

After this commit:

- 62 alloc sites all use either:
  - `self.lower_alloc_call(callee, &[args])` — Lowerer method, 54 sites
  - `lower_alloc_call(&mut <builder>, <stackmap>, callee, &[args])` —
    free function, 8 emit_object sites
- The only place `declare_value_needs_stack_map` appears in
  `codegen.rs` is inside `lower_alloc_call` itself (one site at
  codegen.rs:27191). Any future contributor adding a new alloc site
  must funnel through the helper; "mark by convention" is no longer
  possible.
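To illustrate the funnel shape, here is a minimal, self-contained sketch: the stub builder below is hypothetical (the real helper drives a Cranelift `FunctionBuilder` and real `Value`s), but it shows why the helper makes "mark by convention" unnecessary — the call result is flagged inside the helper, unconditionally.

```rust
// Hypothetical stub types standing in for the Cranelift builder and
// stack-map plumbing in codegen.rs; only the funnel structure is real.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Value(u32);

#[derive(Default)]
struct StubBuilder {
    next: u32,
    stack_map_values: Vec<Value>, // values flagged for precise GC tracking
}

impl StubBuilder {
    // Stand-in for emitting a call and extracting its single result.
    fn emit_call(&mut self, _callee: &str, _args: &[Value]) -> Value {
        let v = Value(self.next);
        self.next += 1;
        v
    }
    // Stand-in for Cranelift's `declare_value_needs_stack_map`.
    fn declare_value_needs_stack_map(&mut self, v: Value) {
        self.stack_map_values.push(v);
    }
}

// The funnel: every alloc site goes through here, so the allocation
// result is always flagged — no per-site convention to forget.
fn lower_alloc_call(builder: &mut StubBuilder, callee: &str, args: &[Value]) -> Value {
    let result = builder.emit_call(callee, args);
    builder.declare_value_needs_stack_map(result);
    result
}

fn main() {
    let mut b = StubBuilder::default();
    let ptr = lower_alloc_call(&mut b, "sigil_alloc", &[]);
    assert!(b.stack_map_values.contains(&ptr));
    println!("flagged {} value(s)", b.stack_map_values.len());
}
```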

Refactor methodology: two perl scripts (in /tmp at edit time, not
committed) applied multi-line regex substitutions for the regular
patterns (single-line .call vs multi-line builder chain × tail-expr
vs bound-name × Lowerer vs emit_object). A handful of irregular sites
(cont_alloc with multi-arg, string_chars / string_from_chars with
5-arg / 3-arg lists, `balloc` and `step_0_closure_ptr` emit_object
shapes) were hand-edited. Final declare-site count: 1 (in helper);
final `lower_alloc_call` call count: 65 (62 site invocations + the
Lowerer-method delegate + the free-function definition self-call + 1
internal).

Verification:

- pod-verify.sh green (cargo check + clippy + fmt + discipline greps).
- `cargo test --test cranelift_stackmap_spike -p sigil-compiler`:
  PR #151's 2 tests still pass (no API change).
- e2e behavioural tests: CI-authoritative. Behaviour is identical to
  PR #156's final state — the helper is the same code path the
  surgical pattern emitted, just consolidated.

This commit closes the reviewer's structural concern from PR #156's
re-review #2: "Either makes the next 23-site miss impossible to
commit. Without it, I expect round 4." The helper is now the only
path; no future alloc site can be added without it.

Next commits on this branch: heap-pointer loads sweep + block-arg
confluences sweep (Task 2b categories 2 and 3 from the plan body).
… heap loads + ref deref

Plan E2 Phase 1 Task 2b category 2 — flag heap-pointer loads as GC refs
via `declare_value_needs_stack_map`. Per-site classification uses the
Sigil-level type at each load to decide flag-or-not — flagging an
Int-typed slot would crash Phase 2's precise marker on the raw i64.

## Helper added: `is_heap_pointer_ty(ty: &Ty) -> bool`

Centralised classification function at codegen.rs near
`cranelift_ty_of_ty`. Returns `true` for the Sigil types whose codegen
representation is a heap-managed pointer (`Char` boxed, `String`,
`Fn(_)`, `User(_,_)`, `Tuple(_)`, `Continuation(_)`). `Int` / `Bool` /
`Byte` / `Unit` return false — `Int` is the load-bearing distinction:
on 64-bit hosts an `Int` slot has the same Cranelift type (i64) as a
heap pointer, but it's a raw scalar and must not be flagged.
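A sketch of the classifier's shape, using a hypothetical flattened `Ty` mirroring the cases named above (the compiler's real enum carries payloads such as `Fn(_)` and `User(_, _)`):

```rust
// Hypothetical flattened mirror of the Sigil `Ty` cases named in this
// commit message; the real enum has more structure.
#[derive(Debug)]
enum Ty {
    Int,
    Bool,
    Byte,
    Unit,
    Char,
    String,
    Fn,           // Fn(_) in the real Ty
    User,         // User(_, _)
    Tuple,        // Tuple(_)
    Continuation, // Continuation(_)
}

// Single source of truth: `true` iff the type's codegen representation is
// a GC-traced heap pointer. The load-bearing case is `Int`: same Cranelift
// width (i64) as a pointer, but a raw scalar — flagging it would crash the
// Phase 2 precise marker.
fn is_heap_pointer_ty(ty: &Ty) -> bool {
    match ty {
        Ty::Char | Ty::String | Ty::Fn | Ty::User | Ty::Tuple | Ty::Continuation => true,
        Ty::Int | Ty::Bool | Ty::Byte | Ty::Unit => false,
    }
}

fn main() {
    assert!(is_heap_pointer_ty(&Ty::String));
    assert!(!is_heap_pointer_ty(&Ty::Int)); // raw i64 scalar, never flagged
    println!("ok");
}
```

The exhaustive `match` (no `_` arm) is the point: adding a new `Ty` variant forces a decision here.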

## Sites flagged

1. **Closure-env loads** (codegen.rs:26077 — `lower_closure_env_load`).
   Flagging is keyed on the slot's `EnvSlotKind`: `Char` / `String` /
   `Closure` / `User` → flag; `Int` / `Bool` / `Byte` / `Unit` →
   don't. This single change covers every closure-capture read from
   `closure_ptr` at offset `16 + 8*i`.

2. **Named heap-pointer loads** — 24 sites covering:
   - `ra_closure*` (return-arm closure pointer)
   - `ra_fired*` (return-arm fired-cell pointer)
   - `return_arm_closure_v` / `return_arm_fired_v`
   - `caller_k_closure`
   - `k_closure*` (continuation k pointer)
   - `post_arm_k_closure*` / `caller_post_arm_k_closure`

   Applied via `/tmp/flag_heap_loads.pl` (a multi-line regex sweep
   keyed on the variable name pattern + the precise five-line load
   shape: `let NAME = BUILDER.ins().load(pointer_ty, MemFlags::trusted(), BASE, OFFSET,);`).
   The companion `ra_fn` / `k_fn` / `caller_k_fn` loads — which load
   *function pointers*, not GC refs — are correctly NOT flagged.

3. **`sigil_ref_deref` result** (codegen.rs:25232).
   `Ref[T]::deref()` returns the stored T value. The site already
   looked up the callee's Sigil signature via `lookup_call_callee_ty`
   for narrowing; same lookup now decides whether to flag the raw i64
   result. If T is heap-bearing (per `is_heap_pointer_ty`), flag;
   otherwise leave raw.
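The closure-env decision in site 1 reduces to a small predicate on the slot kind. A hypothetical sketch (the real `EnvSlotKind` lives in the compiler; the variant names follow the list above):

```rust
// Hypothetical mirror of EnvSlotKind with the variants named above.
#[derive(Debug)]
enum EnvSlotKind {
    Char,
    String,
    Closure,
    User,
    Int,
    Bool,
    Byte,
    Unit,
}

// Sketch of the decision inside `lower_closure_env_load`: flag the loaded
// i64 only when the slot holds a GC-traced pointer.
fn slot_needs_stack_map(kind: &EnvSlotKind) -> bool {
    matches!(
        kind,
        EnvSlotKind::Char | EnvSlotKind::String | EnvSlotKind::Closure | EnvSlotKind::User
    )
}

fn main() {
    assert!(slot_needs_stack_map(&EnvSlotKind::Closure));
    assert!(!slot_needs_stack_map(&EnvSlotKind::Int)); // raw scalar slot
    println!("ok");
}
```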

## What's NOT covered in this commit

Category 2's plan-body wording is "loaded heap pointers (record-field
reads, closure-env loads)." The closure-env loads are done. Record/
variant field reads happen via match-arm destructuring in Sigil
(there is no direct `FieldAccess` expression); destructuring loads
are scattered across multiple match-lowering sites and need a
threaded-type-info refactor to classify correctly. **Deferred to a
follow-up commit on this branch** (next commit if scope permits, else
spelled out in the PR body as a deferred residual).

## Category 3 — NOT STARTED

Block-arg confluences + fn-entry block-params on tail-callable fns
(the Task 3 dependency from PR #157) are NOT in this commit. Will be
the next commit on this branch.

## Verification

- pod-verify.sh green (cargo check + clippy + fmt + discipline greps).
- `cargo test --test cranelift_stackmap_spike -p sigil-compiler`:
  PR #151's 2 tests still pass.
- e2e behavioural tests: CI-authoritative. The added flag calls are
  metadata for the safepoint pass; programs run identically.

Marking site count: 27 (1 in `lower_alloc_call` helper + 1 in
`lower_closure_env_load` + 24 named-load flags + 1 `sigil_ref_deref`
flag). Per-site type
correctness is the load-bearing claim — over-flagging an Int slot
would surface in Phase 2 as a precise-marker crash; this commit's
classifications are reviewable by inspecting `is_heap_pointer_ty`
plus each per-site `if is_heap_t` / `match kind` guard.

Continues Task 2b on the same branch. Two more layers of heap-pointer
flagging are now in place.

## Cat 2 residual — record / variant / tuple field reads

Sigil has no direct `FieldAccess` expression; field reads happen via
match-arm destructuring at two centralised lowering sites:

1. `Pattern::Tuple` arm (codegen.rs:26486) — loads tuple element at
   offset `8 + 8*i`, dispatches on `elem_ty` for narrowing. The
   `Char | String | Fn(_) | User(_,_) | Tuple(_)` arm now flags the
   raw i64 result via `declare_value_needs_stack_map` before
   returning. `Int` / `Bool` / `Byte` / `Unit` arms intentionally
   don't flag (raw scalar).

2. `load_field_value` (codegen.rs:26556) — loads a record/variant
   field at `16 + 8*index`, same `match field_ty` shape. Same
   change: heap-bearing arms flag the raw i64.

Per-arm classification in both sites uses the existing per-element
type discrimination — the change is one extra line per heap-bearing
arm, no new threading required. Coverage is automatic for every
record / variant / tuple destructuring in user code.
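A self-contained sketch of the "one extra line per heap-bearing arm" change at both sites, with stub recording in place of the real Cranelift load and `declare_value_needs_stack_map` call (the offsets match the layouts described above; the names are hypothetical):

```rust
// Hypothetical flattened Ty and classifier, as in the earlier commit.
#[derive(Debug)]
enum Ty { Int, Bool, Byte, Unit, Char, String, Fn, User, Tuple }

fn is_heap_pointer_ty(ty: &Ty) -> bool {
    matches!(ty, Ty::Char | Ty::String | Ty::Fn | Ty::User | Ty::Tuple)
}

// Tuple elements live at offset 8 + 8*i past the header...
fn tuple_elem_offset(i: i32) -> i32 { 8 + 8 * i }
// ...record/variant fields at 16 + 8*index.
fn field_offset(index: i32) -> i32 { 16 + 8 * index }

// Stand-in for the per-arm load: the real code emits
// builder.ins().load(pointer_ty, MemFlags::trusted(), base, offset)
// and then conditionally flags the result.
fn lower_destructured_load(elem_ty: &Ty, offset: i32, flagged: &mut Vec<i32>) {
    if is_heap_pointer_ty(elem_ty) {
        flagged.push(offset); // the one extra line per heap-bearing arm
    }
}

fn main() {
    let mut flagged = Vec::new();
    lower_destructured_load(&Ty::String, tuple_elem_offset(0), &mut flagged);
    lower_destructured_load(&Ty::Int, field_offset(2), &mut flagged);
    assert_eq!(flagged, vec![8]); // only the String element was flagged
    println!("ok");
}
```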

## Cat 3 — fn-entry block-params on tail-callable fns

Adds `flag_heap_pointer_user_args(builder, entry_block, base_offset,
is_heap_per_arg)` helper at codegen.rs:~530, plus the TypeExpr-level
predicate `is_heap_pointer_type_expr(te) -> bool` (companion to the
existing `is_heap_pointer_ty` on `Ty`).

Applied at the user-fn body entry site (codegen.rs:10081) for the
`Sync` ABI — entry-block-params layout there is `[null_closure,
*user_args, terminal_out]`, user args at indices 1..=N. Each pointer-
typed user-arg whose surface `TypeExpr` is heap-bearing (anything
not `Int` / `Bool` / `Byte` / `Unit`) is flagged. `null_closure` at
index 0 and `terminal_out` at index N+1 are infrastructure pointers
(constant null at direct-call time / caller-stack pointer
respectively); not flagged.
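The index arithmetic for the Sync ABI layout can be sketched as follows; this is a hypothetical stub (the real helper flags the entry block's param `Value`s via `declare_value_needs_stack_map`), returning the param indices it would flag:

```rust
// Hypothetical sketch for the Sync ABI layout described above:
// entry-block params are [null_closure, user_args..., terminal_out],
// so user args sit at indices 1..=N.
fn flag_heap_pointer_user_args(is_heap_per_arg: &[bool]) -> Vec<usize> {
    let mut flagged = Vec::new();
    for (i, &is_heap) in is_heap_per_arg.iter().enumerate() {
        let param_index = 1 + i; // skip null_closure at index 0
        if is_heap {
            flagged.push(param_index); // real code flags the block-param Value
        }
    }
    // terminal_out at index N + 1 is an infrastructure pointer: never flagged.
    flagged
}

fn main() {
    // Two user args: a String (heap-bearing) then an Int (raw scalar).
    let flagged = flag_heap_pointer_user_args(&[true, false]);
    assert_eq!(flagged, vec![1]); // only the String arg's param index
    println!("ok");
}
```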

This closes the **Task 3 dependency** filed in PR #157 / `PLAN_E2_PROGRESS.md`:
"every fn-entry block-param of pointer type for any fn callable via
`return_call` / `return_call_indirect` must be flagged
`declare_value_needs_stack_map`." Without this flagging, heap pointers
passed across a tail call would lose precise tracking once Phase 3
drops Boehm's conservative stack scan.

For Cps ABI user fns, the entry-block-params are `[closure_ptr,
args_ptr, k_closure, k_fn]` — `args_ptr` is a caller-stack pointer
(not heap), `k_fn` is a function pointer (not GC ref), and the
user args are loaded from `args_ptr` later (covered by `load_field_*`
or per-site flagging there). So the Cps fn-entry case is a no-op
here, by design — flagging machinery for the closure_ptr / k_closure
of Cps fns lives at the load sites where they're consumed (already
covered in commit 2's named-load sweep).

Synth-arm-fn entries (codegen.rs:12381, 12783, 13852, 14171, 14455,
15285, 19078) are NOT tail-call destinations in Sigil per the
PR #157 audit. Their fn-entry block-params don't need flagging for
Task 3's soundness contract — covered by inspection of the
audit's "exactly two return_call* sites" finding (both target
direct-Sync callees, never synth-arms).

## Verification

- pod-verify.sh green.
- `cargo test --test cranelift_stackmap_spike -p sigil-compiler`:
  PR #151's 2 tests still pass.
- e2e behavioural tests: CI-authoritative.

Marking-site count after this commit: 30 (1 helper + 1
`lower_closure_env_load` + 24 named loads + 1 `sigil_ref_deref` +
1 `Pattern::Tuple` heap arm + 1 `load_field_value` heap arm + 1
`flag_heap_pointer_user_args` per-param). Each per-param flag
expands at runtime to one stack-map entry per heap-bearing user-arg
position per Sync user fn — true count depends on the program.

## Remaining work — block-arg confluences (brif/jump heap-arg sweep)

48 brif/jump call sites in codegen.rs pass block-args through control
flow. For each, if any block-arg is a heap pointer, the destination
block's corresponding param needs flagging. This sweep is broader
than fn-entry params and requires per-site type tracking.
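The per-confluence rule from the plan body can be sketched ahead of that commit: a destination block param needs flagging if any predecessor edge passes a heap pointer into that position. A hypothetical, self-contained reduction of the decision (the real sweep walks brif/jump sites and flags Cranelift block params):

```rust
// edges[e][p] == true when predecessor edge `e` passes a heap pointer
// as block-arg `p`. A param needs flagging if ANY edge feeds it a
// heap pointer — matching "phi confluences where any input is a heap
// pointer" from the plan body.
fn params_needing_flags(edges: &[Vec<bool>]) -> Vec<bool> {
    let nparams = edges.first().map_or(0, |e| e.len());
    (0..nparams)
        .map(|p| edges.iter().any(|e| e[p]))
        .collect()
}

fn main() {
    // Two edges into one block with two params: only the second param
    // ever carries a heap pointer, so only it needs flagging.
    let edges = vec![vec![false, true], vec![false, false]];
    assert_eq!(params_needing_flags(&edges), vec![false, true]);
    println!("ok");
}
```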

Plan-body's "phi confluences where any input is a heap pointer"
explicitly names this. Will be the next commit on this branch.
