[Task 50] Color inference: per-monomorph, SCC-aware, --dump-color#17
Conversation
Replaces the Plan A1 always-Native stub in compiler/src/color.rs with real per-monomorph analysis. Each top-level fn (each post-mono "monomorph") is classified Native or Cps: - Native: closed effect row of [] or [IO], no `perform` of a non-IO effect anywhere in the body or nested lambdas, and (after propagation) no transitive call into a Cps-color monomorph. - Cps: anything else. The propagation pass uses a Tarjan SCC over the monomorph call graph. Within an SCC, if any member is locally Cps the whole SCC is Cps; otherwise if any member calls into a downstream SCC that's Cps, the SCC is Cps; otherwise Native. This matches Plan B's "SCC-aware" guidance and avoids over-pessimizing one fn because of an unrelated cycle. Reasons are stable, machine-readable, and human-readable, covering: - `cps: open effect row` - `cps: row contains effect <E>` - `cps: performs <E>.<op>` - `cps: transitively calls <callee> which is cps` - `cps: in SCC with cps member <name>` - `native: pure row` / `native: row is ![IO]` `--dump-color` lands in cli.rs + main.rs + new pipeline::dump_color helper. The flag runs the front end through color inference and prints `<name> native|cps <reason>` per fn (one line per monomorph in program order) to stdout, no codegen. `-o` is silently accepted but ignored; `--print-runtime-stats` conflicts with `--dump-color` and emits a usage error. Tests: - 15 color::tests cover open-row Cps, non-IO-row Cps, perform-non-IO-with-IO-row Cps (via the body-walk fallback), caller-callee both Native, mutual recursion Native, mutual recursion with one Cps member tainting whole SCC, transitive Cps taint, unrelated cycle non-pessimization, and lambda body perform tainting parent. - 4 cli::tests pin --dump-color parsing, default and human formats, -o silent ignore, and the conflict with --print-runtime-stats. - 2 e2e tests exercise `sigil <input> --dump-color` end-to-end. Pod-verify green; full lib + e2e tests deferred to CI.
Code review — Task 50 color inferenceSolid implementation overall. Plan B color spec is met, SCC propagation is correct (Tarjan emits in reverse-topological order, you consume callee SCCs before caller SCCs), and the test surface is genuinely thorough. That said, there are several issues — one possible soundness/precision gap, one robustness concern, and a handful of UX/diagnostic nits — I'd like fixed before this lands. Bugs / correctness1. Bare-
|
Review verdict: request changesImplementation is fundamentally sound. Tarjan is textbook-correct, conservative-soundness holds, output is deterministic, 340 compiler-lib tests pass (319 → 340, +21 across the directed coverage), CI green on both hosts. But there are six issues — one structural risk and five coverage/wording gaps — that should land before merge per the project's "don't put it off" discipline. Must-fix before merge1. Convert
|
…de reasons Addresses both review comments on PR #17. Six structural changes plus five new tests; existing 32 unit tests + 2 e2e tests still pass. Bugs / correctness: 1. Edge insertion now drives off CheckedProgram::call_site_instantiations instead of the bare-Ident-name heuristic. Typecheck's env-precedence rules win over fn_schemes lookup at every Ident span, so a parameter or `let` binding shadowing a top-level fn name does NOT produce a spurious outgoing edge. The bare-Ident heuristic over-approximated here — the new code uses information typecheck already computed. Pinned by parameter_shadowing_top_level_fn_does_not_taint_caller, a real-front-end test that runs through typecheck. 2. tarjan_scc is now iterative with an explicit work stack — eliminates the recursive-frame stack-overflow risk on long monomorph chains. Reverse-topological emission order preserved; behavior on existing tests unchanged (verified by re-running all 15 prior color tests plus the 5 new ones). Diagnostic / API: 3. Per-node reason text is now the node's *own* most-proximate cause: intrinsic-CPS members keep their specific local reason; bridge members (with an outgoing edge to a CPS SCC in a different SCC) get `cps: transitively calls <callee> which is cps`; non-bridge members propagated through SCC membership get either `cps: in SCC with intrinsically-cps member <name>` or `cps: in SCC bridging to cps callee via <name>` depending on which peer caused the taint. Two-variant SCC-fallback wording lets a --dump-color reader follow the causal chain in adversarial review. 4. pipeline::dump_color returns Result<String, DumpColorError> with typed ReadFailed / FrontEndErrors(usize) variants instead of conflating both as `usize`. Caller still maps to exit code 1 today, but downstream tooling can branch on the variant. 5. main.rs now warns on stderr when `-o <path>` is supplied alongside --dump-color: "warning: `-o <path>` ignored under --dump-color (no executable produced)". The flag is still accepted (shell-history ergonomics) but the misuse is no longer silent. 6. unused_param_warning_silenced renamed to synth_fn_with_param_classifies_native — the old name didn't match the test body. New tests (in addition to the rename): - self_loop_single_node_scc_native + self_loop_single_node_scc_cps — pin Tarjan correctness on a single-node-with-self-loop SCC. - let_bound_fn_value_taints_parent_via_outgoing_edge — pins the load-bearing soundness claim that `let f = some_cps_fn` taints the parent even though f is never invoked. - cps_propagates_through_three_scc_chain — pins reverse-topological propagation across depth-N hops, not just single-hop. - scc_taint_via_transitive_only_branch — exercises the transitive-only SCC reason branch (no intrinsic member, only bridge), pins the new "in SCC bridging to cps callee via <name>" reason wording. - parameter_shadowing_top_level_fn_does_not_taint_caller — the Review #1 reproducer; real front-end run; pins precision. Pod-verify green. Full lib tests + new e2e behavior (the -o warning) deferred to CI.
|
Addressed both review comments in Bugs / correctness:
Diagnostic / API: Tests:
Existing test updated: Pod-verify green; CI running. |
Re-review verdict: mergeAll six directed must-fix items landed correctly. CI green on all 4 jobs (ubuntu-24.04 + macos-14, both regular and cold-checkout). 346 compiler-lib tests pass (340 → 346, +6). Per-item verification
What the agent did beyond the directives
Mechanical
One non-blocking observationThe iterative Tarjan's ContinuationMerge whenever. Plan B Stage 5 review checkpoint sits between Tasks 52 and 53; this PR closes Task 50 cleanly. Tasks 51 ( |
Re-review — fix-up commit
|
Review note — statusRe-review complete on
One non-blocking item before merge: No further blocking concerns from my side. Ready to merge once the shadowing-test path forward is decided. |
Addresses the residual non-blocking concern on PR #17: `parameter_shadowing_top_level_fn_does_not_taint_caller` was a tautology under Stage 5 (typecheck rejects non-IO effects, so `dangerous` ended up Native and the test passed under either the precise edge logic or the old name-only heuristic). Reviewer suggested three options; this commit applies option 3 (synthetic-test span uniqueness): - New `unique_span()` test helper: atomic-counter span generator that produces a fresh span per call. Used only by tests that need to disambiguate individual `Expr::Ident` occurrences via their span. The constant `span()` helper is unchanged for the rest of the synth tests. - New `synth_program_with_calls(items, calls)` test helper that accepts a caller-supplied `call_site_instantiations` map. The auto-built map produced by `build_synthetic_calls_map` is a name-only heuristic, not env-precedence-aware; tests modeling shadowing supply their own map manually. - Rewrote `parameter_shadowing_top_level_fn_does_not_taint_caller` as a synthetic program: `dangerous` is intrinsically CPS (Raise effect), `unrelated` is Native, `caller(dangerous: Int)` body has a param-ref `dangerous` Ident (unique span deliberately omitted from the calls map) AND a real call to `unrelated()` (Ident span recorded in the calls map). Asserts `caller` Native. This is now genuinely discriminating: under the old name-only heuristic, caller would acquire a spurious edge to `dangerous` (CPS) and falsely classify CPS; under the precise calls-map-driven edge logic, the shadow Ident produces no edge and caller stays Native. Also addresses the non-blocking nit on `bridge_callee_of`: replaced the misleading "skip when already-decided" hint with an honest comment explaining we always compute the map (the per-node reason loop needs it for both intrinsic-present and intrinsic-absent SCCs) and that pure leaf SCCs naturally hit the empty case without an explicit skip. Pod-verify green; existing 32+1 unit tests pass.
|
Picked option 3 from the residual-concern menu in Changes:
This is now genuinely discriminating: under the old name-only heuristic, caller would acquire a spurious edge to Also addressed the non-blocking nit on CI running. |
Re-review —
|
…ts (#18) * [Task 51] examples/generic_map.sigil — first user-authored generics e2e First user-authored generic syntax to flow through the full Sigil pipeline (lex → parse → resolve → typecheck → elaborate → monomorphize → color → codegen). PR #17's reviewer flagged this as the canonical reproducibility checkpoint for PR #16 (Task 49)'s `$$` mangling format: prior tests stop at the monomorph-IR level, and prior end-to-end examples declare no generic parameters. What the example exercises: - `type List[A] = | Nil | Cons(A, List[A])` — a self-recursive generic type. Monomorphization dedups via `type_seen` so each instantiation produces exactly one clone (`List$$Int`, `List$$String`). - `fn map[A](xs: List[A]) -> List[A] ![]` — structure-preserving traversal, semantically `map id`. Sigil v1 surface has no `TypeExpr::Fn`, so the design doc's canonical higher-order `map[A,B,e](xs, f) -> List[B]` cannot be expressed; this is the closest Plan B can express. - `fn length[A](xs: List[A]) -> Int ![]` — generic fold returning a concrete `Int`. Pairs with `map` to produce a verifiable scalar. - `main` instantiates each generic at `Int` and `String` in the same program, producing four monomorph clones (`map$$Int`, `map$$String`, `length$$Int`, `length$$String`) plus mangled ctor sites (`Nil$$Int`, `Cons$$Int`, `Nil$$String`, `Cons$$String`). - Pattern matching on generic ctors: scrutinee `Ty::User(_, args)` propagates per-sub-pattern field types so `Cons(h, t)` correctly types `h: A`, `t: List[A]` against the scrutinee's instantiated `A` (closes Task 49 round 3's regression reproducer). Two e2e tests: - `generic_map_example_prints_3_and_2` — full lex → parse → typecheck → elaborate → monomorphize → color → codegen → run pipeline. Asserts stdout exactly `"3\n2\n"`. Length values deliberately differ (3 vs 2) so a copy-paste error between the two list literals would surface as a length mismatch rather than pass silently. - `generic_map_dump_color_all_native` — verifies Task 50's per- monomorph color inference classifies all four monomorphs plus `main` as native (bodies have row `![]`, `main` has `![IO]`, no `perform` to a non-IO effect). Pins each expected mangled name independently so a mangling-format slip on any single one (e.g. `map$$String`) lands on a directed assertion rather than an opaque overall-string diff. Smoke + reproducibility scripts updated to include the new example so the per-host byte-stable-binary invariant covers the full generic pipeline. Plan A2's `examples/higher_order.sigil` doc comment notes that `TypeExpr::Fn` was deferred to Plan A3; A3 did not actually deliver it (deferred again to v2 per scope). The plan's "generic map" text therefore describes what Plan B's surface CAN express, not the design doc's canonical higher-order signature. * [Task 52] Validation prompts P16 + P17 — generic id, compose Adds the two remaining Stage-5-feasible entries to the prompt bank: - **P16 — generic identity at Int and String**. Fully expressible in Plan B's surface (no `TypeExpr::Fn` required). Asserts oracle output `"42\nsigil\n"`. Pins the discriminating contract that Algorithm W's fresh-var-per-call instantiation plus Task 49's reachability-bounded specialization produce exactly two monomorph clones (`id$$Int`, `id$$String`) — not one polymorphic body, not three from double-counted call sites. - **P17 — generic compose applied across types**. Requires `TypeExpr::Fn` surface syntax for the higher-order parameters, same as P09 / P10. P09/P10 deferred this to Plan A3; A3 did not deliver it. P17 follows the same deferral pattern: oracle is graded against "program compiles" until function types ship. P17's distinguishing feature versus P10 is `A != C` (the result type genuinely travels through composition rather than absorbing into the trivial endo-functor case). Plan B Task 61 will add P18 / P19 / P20 (Raise-based parser, State-threaded counter, multi-shot Choose). Bank is now 17/20; remaining three land in Stage 6. * [Task 51-52] PROGRESS: flip pending-ci entries → done; record Tasks 51 + 52 Standard PROGRESS hygiene at Stage 5 closeout. Per Plan A2 / Task 49 precedent, status flips for prior PRs go in the next task's PR. - Tasks 4.5.1-4.5.5 (Stage 4.5 scaffolding) and Plan A3 carryover items: flipped `done-pending-ci` → `done` with squash-merge hash `01e0e13` (PR #14). All four CI jobs were green at merge. - Task 47 (parser): `done-pending-ci` → `done` with `01e0e13` (also PR #14). - Task 48 (HM unifier): `done-pending-ci` → `done` with `70756de` (PR #15). - Task 50 (color inference): `done-pending-ci` → `done` with `82e0f97` (PR #17). Round 2 / round 3 fixup hashes preserved in the activity log; the squash-merge hash is the canonical post-merge anchor. - Task 49 (monomorphization): notes corrected from the round-1 `_`/`__` mangling separator to the final `$`/`$$` format pinned by round-2 fixup `994f083`. Status hash `[981ec93]` left as-is (set by prior session's `531bcfe` flip commit, recorded the branch-commit hash convention rather than the squash-merge `858b4c2`). - Task 51 entry: `done-pending-ci` with commit pointer `[HEAD]` describing the new `examples/generic_map.sigil`, two e2e tests, and smoke + reproducibility script updates. - Task 52 entry: `done-pending-ci` with commit pointer `[HEAD]` noting that P17 follows the same `TypeExpr::Fn`-deferred grading as P09 / P10 until first-class function types ship. Stage 5 review checkpoint remains pending — the next step before Stage 6 begins is Brian's review of row-unification, let- generalization, color decisions, and monomorphization naming determinism on adversarial inputs. * [Task 51 fix-up] typecheck: cross-arm body consistency uses unify_ty, not Eq CI failure on PR #18 surfaced an unprecedented Stage 5 typechecker bug: `check_match` compared arm body types via structural equality (`first != t`) instead of attempting unification. With pre-Plan-B programs (every Plan A1/A2/A3 example) arm body types are concrete primitives or already-resolved user types, so the equality check held coincidentally. With generic-fn-internal matches whose arms return a generic-typed value (e.g. `fn map[A](xs: List[A]) -> List[A] { match xs { Nil => Nil, Cons(h, t) => Cons(h, map(t)) } }`), each ctor resolution allocates a fresh `List[?N]` user instance with a distinct fresh-var id — `Nil` resolves to `List[?6]` and `Cons(...)` to `List[?5]` — and the equality check reports `?6 != ?5` even though the two trivially unify. Fix: snapshot `self.errors.len()`, call `unify_ty(first, t, &arm.span)`, and on failure truncate any internal E0044 "type mismatch" errors `unify_ty` itself pushed and emit E0065 with the arm-specific phrasing so the user-facing diagnostic surface stays unchanged. On success the unifier binds the fresh vars together (directly or transitively through the function's declared return type), and the match expression's overall type is whichever representative falls out of `deref(first)`. Two regression tests: - `generic_match_returning_generic_unifies_arms` — pins the exact reproducer from `examples/generic_map.sigil`'s `map` fn. Pre-fix this fails with E0065; post-fix the program typechecks clean. - `match_arm_type_unification_still_rejects_real_mismatch` — regression guard that the new unify-based path still emits E0065 on a genuine Int-vs-String arm-body mismatch (i.e., the fix doesn't accidentally silence legitimate type errors). The long-standing `match_arm_types_must_unify_is_e0065` covers the same shape; this new test pins the contract specifically inside a generic-fn surface so the discriminating path is tested. Plan B classification: bug fix in Task 48 surface (HM unification), discovered by Task 51's example. Not a deviation — the plan specified HM unification end-to-end, the implementation just had a coincidentally-passing structural-Eq check. No PLAN_B_DEVIATIONS entry; PROGRESS notes for Task 51 will be updated with this fix-up in the PR body. * [Task 51-52 review fixups] address PR #18 reviewer items Two review comments on PR #18: an initial code-review pass, then a revised verdict ("request changes") that supersedes the first on the contentious subst-rollback question. This commit addresses every non-superseded item. == From the revised verdict (Comment 2) == 1. **PROGRESS Task 49 SHA normalized.** `[981ec93]` (branch-tip hash) → `[858b4c2]` (squash-merge SHA on `main`). Matches the convention every other prior task entry uses; closes the bookkeeping inconsistency the reviewer flagged. 2. **3-arm generic match regression test.** New `three_arm_generic_match_propagates_subst_across_all_arms` pins that cross-arm unify propagates substitutions across MORE than two arms. Each `W(x)`-style ctor allocates a fresh `Wrap[?N]` user instance, so arm 1 → `Wrap[?A]`, arm 2 → `Wrap[?B]`, arm 3 → `Wrap[?C]`. The cross-arm check unifies sequentially: arm1↔arm2 binds `?A := ?B`; arm1↔arm3 must then unify the deref'd representative against `Wrap[?C]`. A naive 2-arm-only check would miss a propagation bug on the third arm. 3. **Subst-pollution-pinning regression-guard.** New `subst_pollution_from_partial_unify_surfaces_at_call_site` is the discriminating test against a future "fix" that adds subst snapshot/restore on `unify_ty` failure. The program: `foo[A, B]` has a match where arm 1 returns `p: Pair[A, B]` and arm 2 returns `Pair("x", 3): Pair[String, Int]`. Cross-arm unify SUCCEEDS by binding `A := String`, `B := Int` — the body itself typechecks clean, but the bindings persist in the global subst. `foo`'s scheme is now over-constrained; `caller`'s `foo(Pair(1, 2), 0)` instantiates with `A := Int, B := Int` and the over-constraint surfaces as E0044 (concrete mismatch) + E0132 (ambiguous polymorphism: scheme generalization sees A and B already bound). With rollback: arm 2's bindings discarded, foo stays generic, caller accepts `Pair[Int, Int]` cleanly with NO errors. Test pins the cascade by asserting BOTH E0044 AND E0132 appear, AND that the body's match itself has no E0065 (cross-arm unify succeeded; a body-level error would mean a regression in the opposite direction). 4. **Perf-floor instability flagged for Task 60 in PLAN_B_DEVIATIONS.md.** `fib_perf_example_prints_6765_under_50ms` and `tree_example_prints_32767_under_500ms` exceed wall-clock floors by ~4x in debug profile (~200ms each on aarch64-apple-darwin). Pre-existing on `main`, not Task 51's fault. Surfaced as a new VERIFICATION DEBT entry so Task 60 doesn't have to rediscover. Three resolution options enumerated; reviewer prefers "platform-aware tightening or release-only mode." == From the initial review (Comment 1, where not superseded) == 5. **Issue 2: selective error truncation on cross-arm unify failure.** Pre-fix `self.errors.truncate(pre_unify_errors)` removed every error `unify_ty` pushed, including E0126 (occurs check) and E0127 (row occurs) — which name a real soundness problem the generic E0065 wouldn't capture. Now drains errors past baseline, keeps non-E0044 codes (occurs-check kinds), drops E0044 (replaced by arm-specific E0065). User-facing diagnostic surface unchanged on the common path; correctness improved on the edge. 6. **Issue 3: P16 e2e test pins the prompt-bank claim.** New `p16_generic_id_at_int_and_string_oracle` runs P16's source through the full pipeline. Asserts (a) stdout exactly `"42\nsigil\n"` (P16 oracle), (b) `--dump-color` produces exactly 3 monomorph lines `{id$$Int, id$$String, main}` — a regression that double-counts call sites would surface as a 4th line; an unmonomorphized polymorphic body would surface as a bare `id native`. Makes the PR description's "exactly two clones" claim substantive instead of aspirational. 7. **Issue 4: P17 surface-syntax-pending pin.** New `p17_compose_source_rejects_until_typeexpr_fn_ships` writes the exact P17 prompt source and asserts the front end rejects it (specific error code is implementation detail and could shift between parser- and typecheck-level rejections; the test just asserts non-success). Once `TypeExpr::Fn` ships, the test should be inverted to assert success against the prompt's stdout oracle. 8. **Issue 6: unused `h` binding.** `Cons(h, t) => 1 + length(t)` in `length`'s body → `Cons(_, t) => 1 + length(t)`. `h` was never used; `_` matches Sigil idiom and avoids relying on the unused-binding policy. `map`'s body still uses `h`, which is fine because it's used in the result. == Skipped == - **Issue 1 (subst rollback):** superseded by Comment 2's verdict that current no-rollback behavior is HM-correct. Item 7 above pins that contract via the regression-guard test. - **Issue 5 (helper extraction):** minor style, single call site, not worth extracting per repo convention against speculative abstractions. - **Issue 7 (reproducibility note):** informational only; no action requested. == Test counts == 348 → 351 compiler lib tests (+3 new typecheck tests). E2E gains 2 new tests (P16 oracle + dump-color, P17 surface-pending). Pod-verify green; full test suite deferred to CI.
Plan B Stage 5 Task 50. Replaces the Plan A1 always-Native stub in
compiler/src/color.rswith real per-monomorph color analysis +SCC-aware propagation +
--dump-colorCLI flag.Landed in this PR
compiler/src/color.rs. Eachtop-level fn is classified Native or Cps:
![]or![IO], noperformof anon-IO effect anywhere in the body or nested lambdas, AND (after
propagation) no transitive call into a Cps-color monomorph.
member is locally Cps the whole SCC is Cps; otherwise if any member
calls into a downstream SCC that's Cps, the SCC is Cps; otherwise
Native. Matches Plan B's "SCC-aware" guidance — avoids
over-pessimizing one fn because of an unrelated cycle.
--dump-colorand adversarial review:cps: open effect rowcps: row contains effect+ Ecps: performs+ E.opcps: transitively calls+ callee +which is cpscps: in SCC with cps member+ namenative: pure row/native: row is ![IO]--dump-colorflag incli.rs+main.rs+ newpipeline::dump_colorhelper. Runs the front end through colorinference and prints
<name> native|cps <reason>per fn (one lineper monomorph in program order) to stdout. No codegen.
-oissilently accepted but ignored under
--dump-color;--print-runtime-statsconflicts and emits a usage error.Test plan
color::testsunit tests cover: open-row Cps, non-IO-rowCps, perform-non-IO-with-IO-row Cps via body walk, caller-callee
both Native, mutual recursion Native, mutual recursion with one
Cps member tainting whole SCC, transitive Cps taint, unrelated
cycle non-pessimization, lambda body perform tainting parent,
stable dump_color program-order output.
cli::testspin--dump-colorparsing (default + humanformats),
-osilent ignore, and the conflict with--print-runtime-stats.sigil <input> --dump-colorend-to-end:dump_color_hello_is_native_row_io,dump_color_multi_fn_pure_program.runtime lib tests, no-interior-pointers, discipline greps).
cold-checkout).
Conservative-soundness notes
Identreferences (not just calls in calleeposition) count as outgoing edges. This is needed because Plan A2's
closure model treats top-level fn names as values (e.g.,
let f = some_fn), andlower_call's direct-Ident branch resolves them totheir target later. Counting both keeps the call graph sound when
future passes propagate fn-as-value forms.
this stage — closure conversion runs after color in the pipeline.
Their bodies are walked for performs and outgoing calls; the parent
picks up any Cps-tainting from inside.
![IO | e]classify as Cps. We can't staticallyprove the row var is never instantiated with anything beyond IO, and
Plan B Stage 5 keeps row vars in the IR. Stage 6's effect runtime is
the only mechanism that could discharge them, so treating them as
Cps is the safe v1 choice.
Out of scope (deferred to later tasks)
don't exist yet at color time. When CC lands per-monomorph closures,
Task 55 (CPS transform) will need to revisit.
are erased at codegen-entry, not specialized).
--debug-countersinstrumentation flag from Task 56 (Stage 6).Plan B Task 50 acceptance
Plan B Task 50 description:
All five spec points are met:
monomorphize::monomorphize.![]or![IO]): yes,local_colorrejects anyrow var or non-IO effect name.
find_non_io_perform_in_*walks the entire body and lambda bodies for any non-IO perform.
member colors; cross-SCC propagation honors reverse-topological
order.
--dump-colorflag: yes, plumbed through cli.rs, main.rs, andpipeline::dump_color; one stable line per monomorph in programorder.