Skip to content

Add [force e] strict-normalization combinator (eigentrust pitfall #11)#9

Draft
kumavis wants to merge 1 commit into
mainfrom
claude/fix-eigentrust-pitfall-11-strict
Draft

Add [force e] strict-normalization combinator (eigentrust pitfall #11)#9
kumavis wants to merge 1 commit into
mainfrom
claude/fix-eigentrust-pitfall-11-strict

Conversation

@kumavis
Copy link
Copy Markdown
Contributor

@kumavis kumavis commented Apr 25, 2026

Fixes pitfall #11 from docs/tracking/2026-04-23_eigentrust_pitfalls.md.

Summary

Adds a [force e] AST node whose whnf reduces its argument to NF (not just WHNF). User wraps recursive-tail-call arguments in force to opt into strict evaluation:

;; before — O(k²) shape, deep iteration on Posit32 hangs
[eigentrust-iterate c p alpha eps budget [eigentrust-step c p alpha tnew]]

;; after — O(k) shape
[eigentrust-iterate c p alpha eps budget [force [eigentrust-step c p alpha tnew]]]

Type-and-usage-transparent: infer(force(e)) = infer(e), inferQ likewise. QTT linearity is preserved — a forced m1 value still consumes once.

Approach (a) — explicit force, lowest blast radius

Picked option (a) from the design choices in the prompt:

  • (a) (chosen) Add a [force e] user-visible combinator
  • (b) deferred Eagerly normalize specific forms in whnf itself — risks QTT linearity violations and changes the evaluation model
  • (c) deferred Type-driven strictness for tail-call argument positions when types are concrete — bigger architectural change

(b) and (c) are documented as deferred future work in the commit message.

Pipeline exhaustiveness

Per .claude/rules/pipeline.md § "New AST Node", touched 13 of 15 pipeline files:

  • Core 8: syntax.rkt, substitution.rkt, zonk.rkt, reduction.rkt, pretty-print.rkt, pnet-serialize.rkt, typing-core.rkt, qtt.rkt
  • Surface 5: surface-syntax.rkt, parser.rkt, tree-parser.rkt, elaborator.rkt, macros.rkt
  • Defensive: driver.rkt, effect-ordering.rkt

Skipped: unify.rkt (force is type-identity; falls through after one whnf step), foreign.rkt (not interop), typing-propagators.rkt (no new structural typing).

Test plan

  • New tests/test-force-strict.rkt — 13 cases: basic eval, QTT linearity preservation under force, no-op on already-WHNF, deep-iteration speedup demo
  • Regression: reduction (27), parser (82), syntax (20), elaborator (40) — all green

Perf demo numbers

Synthetic deep iteration (natrec-per-level chain). Real Posit32 case has a much larger gap because step is genuinely expensive per level.

k lazy strict speedup
12 196ms 44ms 4.5x
20 437ms 74ms 5.9x
28 797ms 89ms 9.0x

Lazy is super-linear (10x runtime for 3.5x more depth). Strict is roughly linear. Confirms the O(k²) vs O(k) shape from the pitfall report.

Surprise worth flagging

tools/check-parens.sh is hardcoded to a macOS Racket path (/Applications/Racket v9.0/bin/racket), so it fails on every file in a Linux env. Worked around by inlining the same with-module-reading-parameterization check against /usr/local/bin/racket. Out of scope for this PR but worth a follow-up.

Commits

  1. bbebe74 — primary fix (new AST node + 13-file pipeline updates + test file)
  2. 80fdca8 — CI fix (skip stale bench file)

https://claude.ai/code/session_01MbncYJnrvjzhbVWw4xGi5x


Generated by Claude Code

@kumavis kumavis force-pushed the claude/fix-eigentrust-pitfall-11-strict branch from 80fdca8 to a09ddc0 Compare April 26, 2026 01:03
kumavis added a commit that referenced this pull request Apr 26, 2026
The 2026-04-23 eigentrust pitfalls memo (forthcoming branch) enumerated 16
items hit during the EigenTrust implementation. Items #1-7 and #11-15 are
language/elaboration defects with their own PRs. Items #8, #9, #10, and #16
are observations rather than Prologos defects; no compiler change is needed
for them but the memo deserves a parallel disposition note so a future reader
does not double-count them as open work.

#8 (exact-Rat slow on deep iter): intrinsic to exact rational arithmetic;
benchmark-scope guidance, not a fix.

#9 (Posit32 literals work): positive observation; `~` literal prefix is
unambiguous unlike `0/1`. No action.

#10 (PVec preserves where List does not): subsumed by pitfall #3 fix. After
#3 lands, both literal forms preserve element type uniformly. Close as
duplicate.

#16 (column-stochastic vs row-stochastic): algorithm/spec clarification, not
a Prologos defect. The eigentrust implementation branch already takes
column-stochastic M directly and validates via col-stochastic?.

https://claude.ai/code/session_01MbncYJnrvjzhbVWw4xGi5x

Co-authored-by: kumavis <1474978+kumavis@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

@kumavis kumavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should be able to achieve this without the user-driven force. so we'll set this aside for now.

Pitfall #11 (lazy argument reduction makes deep iteration scale as
O(k^2) for non-fixed-point iterates) had no general fix because
reduction is call-by-name everywhere and the user has no way to mark
an argument position as "evaluate before passing." When every
intermediate value differs bit-for-bit (Posit32, non-fixed-point Rat)
the reducer redoes O(n) levels on every iteration, totalling ~O(k^2)
terms expanded across k rounds.

This commit picks approach (a) from the suggested-fix triage: a new
expression form [force e] / (force e) that, at whnf time, fully
normalizes its argument and returns the NF (with the force wrapper
elided). Type-and-usage-transparent: infer(force(e)) = infer(e),
inferQ(force(e)) = inferQ(e). Lowest blast radius — does not change
existing reduction semantics for any expression that does not
mention `force`. Users who hit pitfall #11 can wrap the recursive
tail-call argument:

  ;; Before (slow):
  [eigentrust-iterate c p alpha eps budget [eigentrust-step c p alpha t]]

  ;; After (fast):
  [eigentrust-iterate c p alpha eps budget
    [force [eigentrust-step c p alpha t]]]

Future work, deferred:
  (b) Make whnf eagerly normalize specific forms (numeric literals,
      list literals) so even unwrapped lazy chains collapse at WHNF.
      More general but architectural — needs care around QTT
      linearity (an expression marked m1 must not be evaluated twice
      by an "eager" optimizer that didn't see it was inside an
      eliminator).
  (c) Type-driven strictness for tail-call argument positions when
      the type is concrete. Hardest; requires call-site analysis
      and a way to mark "strict" in the function signature.

Pipeline coverage (per .claude/rules/pipeline.md "New AST Node",
user-facing surface syntax variant — 13 of the 15 listed files):

  Core (8):
    syntax.rkt           expr-force struct + provide + expr? entry
    substitution.rkt     shift, subst (recursive)
    zonk.rkt             zonk, zonk-at-depth, default-metas
    reduction.rkt        whnf-impl/match -> nf(arg); nf-whnf clause
    pretty-print.rkt     pp-expr "[force ...]" + uses-bvar0?
    pnet-serialize.rkt   reg1!
    typing-core.rkt      infer (delegates to argument)
    qtt.rkt              inferQ (delegates to argument)

  User-facing surface syntax (5):
    surface-syntax.rkt   surf-force struct + provide
    parser.rkt           (force e) keyword case
    tree-parser.rkt      "force" surf-force in builtin-unary-ops
    elaborator.rkt       surf-force -> expr-force
    macros.rkt           surf-force recursive case in expand-expression

  Driver / generic recursion (2 — defensive coverage):
    driver.rkt           contains-unsupported-qtt?, rewrite-spec
    effect-ordering.rkt  fv-expr

  Not touched (deliberate):
    unify.rkt            classify-whnf-problem already handles unknown
                         forms via fall-through; force is type-identity
                         so unification of [force e] with t reduces to
                         unification of e with t after one whnf step.
    foreign.rkt          force is not a runtime interop concept.
    typing-propagators.rkt  no new structural typing behavior; force
                            inherits whatever the argument has.

Tests (tests/test-force-strict.rkt, 13 tests):
  1. Basic semantics (5):  whnf/nf on ground values, beta-redex,
     suc chains.
  2. Type identity (2):    infer(force(e)) = infer(e).
  3. QTT identity (2):     inferQ(force(e)) preserves type and the
                           full usage vector — ESPECIALLY under a
                           linear (m1) binder (force does NOT
                           consume an extra unit).
  4. Idempotence (2):      force around already-WHNF / nested force.
  5. Performance demo (2): synthetic deep chain with natrec at each
                           level. Confirms lazy and strict produce
                           the same NF, and asserts strict <= lazy
                           + small slack at k=18.

Empirical numbers from a synthetic natrec-per-level chain on the
agent machine (the real eigentrust workload sees a much larger gap
because Posit32 step is genuinely expensive per level):

  k=12: lazy=196ms strict=44ms   ~4.5x
  k=16: lazy=297ms strict=55ms   ~5.4x
  k=20: lazy=437ms strict=74ms   ~5.9x
  k=24: lazy=555ms strict=83ms   ~6.7x
  k=28: lazy=797ms strict=89ms   ~9.0x

The lazy curve is super-linear (~10x runtime for ~3.5x more depth);
the strict curve is roughly linear in k. This is the O(k^2) vs O(k)
shape predicted by pitfall #11.

Verification:
  raco make racket/prologos/tests/test-force-strict.rkt    ok
  raco test racket/prologos/tests/test-force-strict.rkt    13 / 13
  raco test racket/prologos/tests/test-reduction.rkt       27 / 27
  raco test racket/prologos/tests/test-parser.rkt          82 / 82
  raco test racket/prologos/tests/test-syntax.rkt          20 / 20
  raco test racket/prologos/tests/test-elaborator.rkt      40 / 40

Co-authored-by: kumavis <1474978+kumavis@users.noreply.github.com>
@kumavis kumavis force-pushed the claude/fix-eigentrust-pitfall-11-strict branch from a09ddc0 to 43baede Compare April 26, 2026 03:04
kumavis added a commit that referenced this pull request Apr 26, 2026
The 2026-04-23 eigentrust pitfalls memo (forthcoming branch) enumerated 16
items hit during the EigenTrust implementation. Items #1-7 and #11-15 are
language/elaboration defects with their own PRs. Items #8, #9, #10, and #16
are observations rather than Prologos defects; no compiler change is needed
for them but the memo deserves a parallel disposition note so a future reader
does not double-count them as open work.

#8 (exact-Rat slow on deep iter): intrinsic to exact rational arithmetic;
benchmark-scope guidance, not a fix.

#9 (Posit32 literals work): positive observation; `~` literal prefix is
unambiguous unlike `0/1`. No action.

#10 (PVec preserves where List does not): subsumed by pitfall #3 fix. After
#3 lands, both literal forms preserve element type uniformly. Close as
duplicate.

#16 (column-stochastic vs row-stochastic): algorithm/spec clarification, not
a Prologos defect. The eigentrust implementation branch already takes
column-stochastic M directly and validates via col-stochastic?.

https://claude.ai/code/session_01MbncYJnrvjzhbVWw4xGi5x

Co-authored-by: kumavis <1474978+kumavis@users.noreply.github.com>
kumavis pushed a commit that referenced this pull request Apr 27, 2026
Per user review of #0-#10: many entries were either out-of-scope
(env limitations, not Prologos issues) or wrong (claims I never
actually tested). Re-tested every claim against a real Racket and
revised the doc.

Numbers are reserved per the user's instruction — entries marked
DELETED keep their slot so cross-refs don't drift.

Detail:

  #0  DELETED — out-of-scope (Racket toolchain not in sandbox).
                Environment limitation, not a Prologos issue.

  #1  REFRAMED — was "capability subtype + promise resolution
                composition." Re-titled to honestly reflect what
                this actually is: an OCapN-side Phase 0
                deferred-implementation note (eventual cross-vat
                receive isn't wired up yet). NOT a Prologos bug.

  #2  DELETED — false claim. Tested with a real Racket: WS-mode
                wildcard match `match | _ -> body` on user data
                types elaborates AND evaluates correctly when the
                function carries a proper `spec`. The
                `prologos::data::datum` comment I cited applies to
                a narrower polymorphic-context case, not a blanket
                wildcard ban as I asserted.
                Cleanup of behavior.prologos (~250 -> ~70 LOC)
                follows.

  #3  DELETED — false claim. Tested: `data Step step : [Nat -> Nat]`
                (with bracketed function type per the lseq-cell
                convention) accepts a function value, including
                closures with captured state. Open-world actor
                behaviour storage IS supported. The closed-enum
                BehaviorTag in our implementation was a needless
                workaround driven by this incorrect pitfall.
                Cleanup tracked separately.

  #4  KEPT, REFRAMED — real, narrowed claim. grammar.ebnf §6
                lines 1153/1187/1199 promise `Mu` (sexp) and `rec`
                (WS) for recursive sessions. Both elaborate to
                `Unknown session type: rec` / `Mu`. So pitfall #4
                is now: "rec/Mu in grammar but not in elaborator."
                CapTP's stream-level well-typedness is therefore
                the documented ceiling; per-exchange sub-protocols
                remain the workaround.

  #5  KEPT — `none`/`some` need explicit type args in some inference
            contexts. Real ergonomics tension, accurately
            documented.

  #6  DELETED — out-of-scope. WS-mode `let p := body` and sexp-mode
                `(let (p v) body)` are TWO surface forms by design
                (grammar.ebnf §7 line 1236). User-error, not a
                Prologos bug.

  #7  DELETED — was a quantitative restatement of #2. With #2
                recanted, #7 evaporates: behavior modules can be
                wildcard-collapsed, dropping ~180 LOC.

  #8  DELETED — false claim. Tested: `data Box1 box1 : [Sigma [_ <Nat>] Bool]`
                and `data Table table : Nat -> [List [Sigma [_ <Nat>] Bool]]`
                both elaborate cleanly. The named-struct
                ActorEntry/PromiseEntry workaround in vat.prologos
                was unnecessary; can be simplified back.

  #9  DELETED — user error. `def` for value bindings vs `defn` for
                functions is documented (grammar.ebnf §3
                lines 189-190, prologos-syntax rules). Mis-using
                `defn` for a 0-ary constant isn't a Prologos bug.

  #10 DELETED — out-of-scope. Network sandbox blocking external
                docs is an environment limitation.

#11-#20 were not in scope of this review and remain as-is for the
user to review next.
@kumavis kumavis marked this pull request as draft April 28, 2026 21:08
kumavis pushed a commit that referenced this pull request May 4, 2026
Original PIR text claimed runtime/core/ contains the BSP scheduler.
Wrong — actual runtime/core/ has only data structures (cell store +
profile counters + format buffer). The BSP scheduler stayed in each
kernel file (hybrid kernel's worklist + fire_against_snapshot +
merge_pending_writes + swap_worklists are inlined in
prologos-runtime-hybrid.zig).

Stage 3 design called for core/bsp.zig (~150 LOC) + core/worklist.zig
(~60 LOC) as Phase 1 deliverables. Actual Phase 1 extracted only
cells.zig + profile.zig + format.zig. The factoring scope shrank
silently — neither the implementing commit nor any subsequent commit
acknowledged the gap. Surfaced when the user asked "what's in the
hybrid core zig side?" during PIR review.

Corrections:
- §1 (What Was Built): explicit "data structures only" + cross-ref
  to wrong-assumption #9
- §2 (Stated Objectives): added "reality check on the design quote"
  flagging the drift
- §3 delivered table: Phase 1 status changed to "✅ partial"
- §4 timeline: Phase 1+3+4 line clarifies scheduler not extracted
- §5 deferred: new row for BSP scheduler factoring
- §8 D1 + anti-decision: caveat added; "BSP scheduler abstraction"
  claim corrected to "cell-store comptime-parameterization
  abstraction"
- §9 #2: factoring narrative softened to "data structures only"
- §11 #3: original kernel template + factoring made explicit at
  data-structure layer only
- §12 #2 + #8: kernel-LOC framing corrected; "factored core LOC was
  bigger than expected" reframed to "smaller than expected — and
  the gap is the BSP scheduler"
- §13 architecture: "consuming runtime/core/" framing softened to
  "consumes for cell store + profile + format helpers; the BSP
  scheduler is inlined"
- §14 #3, #7, #8: "future kernels instantiate the same scheduler"
  claim corrected to "instantiate CellStore + reuse profile
  counters; each kernel still writes its own scheduler"
- §15 technical debt: new row for "BSP scheduler not factored"
- §17 wrong assumptions: new #9 "Phase 1 will factor the BSP
  scheduler into core" — Wrong; codifies the silent scope shrinkage
- §18 #4: "factoring at second-instance" pattern reinforced with
  "complete vs minimum-viable shared surface" caveat
- §21 lessons: new entry for "phase-close should compare delivered
  scope against design plan"
- §24 open Q #4: kernel-PU consumption clarified — needs scheduler
  reuse in-place or triggers the extraction debt

Errata block added at the top of the PIR documenting which sections
were corrected and why.

https://claude.ai/code/session_01Tycs6BWKG58Wo99YVPg6DF
kumavis pushed a commit that referenced this pull request May 4, 2026
…spatch fix

Both PReduce-lite and Hybrid Runtime PIRs now carry an "Addendum
(2026-05-04, swappable-backend refactor)" block at the top + targeted
updates to §3 (delivered files), §15 (technical debt closure), and
§23 (key files).

PReduce-lite PIR:
- Top-of-doc addendum referencing the refactor design doc, the new
  three-file backend layer (preduce-core + backend-racket + backend-
  hybrid), and the functional-threading model rationale (SH-endpoint fit).
- §3 file table: added preduce-core.rkt (153 LOC) and preduce-
  backend-racket.rkt (~100 LOC); annotated preduce.rkt as "1509 →
  ~1480 (post-backend-refactor)" with a note on the b-* primitive
  rewrite.
- §23 key-files: replaced the two-file PReduce-lite engine entry
  with the post-refactor five-file layout (preduce-core, backend-
  racket, backend-hybrid, preduce, preduce-hybrid).

Hybrid Runtime PIR:
- Top-of-doc addendum (separate from the existing BSP-scheduler-in-
  core errata) covering: (a) the parallel-impl debt closure (407 →
  66 LOC for preduce-hybrid.rkt; ~6× larger AST coverage on the
  kernel — Phase 1-10b vs Phase 8b only); (b) the native-dispatch
  regression discovered + fixed (initial backend-hybrid wrapped
  int-arith as callback; #:native-op hint restored kernel tags 0-7);
  (c) post-fix benchmark numbers (W4 hybrid 2× faster than lite for
  user-ctor workloads; W5 31% faster wall + 14× faster kernel-side
  for int-arith).
- §1 What Was Built: pre-refactor vs post-refactor narrative on the
  preduce-hybrid.rkt collapse + the new preduce-backend-hybrid.rkt.
- §3 file table: preduce-hybrid.rkt 407 → 66 (−341 LOC); added
  preduce-core.rkt + preduce-backend-hybrid.rkt rows.
- §15 technical debt: TWO entries struck through (rendered as ~~old
  text~~) and replaced with "CLOSED 2026-05-04" notes:
  - "Two parallel reducers" debt (closed by the refactor proper)
  - "Native int-arith dispatch lost" regression (closed by the
    #:native-op hint mechanism)

Both addenda cross-reference [`2026-05-04_PREDUCE_BACKEND_REFACTOR_DESIGN.md`]
which carries the full design plan + 9-phase rollout tracker.

The structural lessons reinforced:
- "Factor at second-instance" pattern (PReduce-lite §18 #4) — the
  refactor extracted the factoring at the right time, with a third
  Racket-reducer consumer (future preduce-distributed?) on the
  horizon.
- "The parallel-impl debt was real" — the hybrid PIR's §15 entry
  warned about it; the refactor cashed in the warning.
- "Phase-close should compare delivered scope against design plan"
  (Hybrid §17 #9) — the same lesson surfaced AGAIN: the initial
  backend-hybrid Phase 4 commit didn't compare its native-dispatch
  scope against the pre-refactor preduce-hybrid's; the regression
  hid for two commits before the user's "is there a bug" question
  exposed it. Codified for the third time; pattern is now confirmed
  load-bearing.

https://claude.ai/code/session_01Tycs6BWKG58Wo99YVPg6DF
kumavis pushed a commit that referenced this pull request May 4, 2026
Per user review of #0-#10: many entries were either out-of-scope
(env limitations, not Prologos issues) or wrong (claims I never
actually tested). Re-tested every claim against a real Racket and
revised the doc.

Numbers are reserved per the user's instruction — entries marked
DELETED keep their slot so cross-refs don't drift.

Detail:

  #0  DELETED — out-of-scope (Racket toolchain not in sandbox).
                Environment limitation, not a Prologos issue.

  #1  REFRAMED — was "capability subtype + promise resolution
                composition." Re-titled to honestly reflect what
                this actually is: an OCapN-side Phase 0
                deferred-implementation note (eventual cross-vat
                receive isn't wired up yet). NOT a Prologos bug.

  #2  DELETED — false claim. Tested with a real Racket: WS-mode
                wildcard match `match | _ -> body` on user data
                types elaborates AND evaluates correctly when the
                function carries a proper `spec`. The
                `prologos::data::datum` comment I cited applies to
                a narrower polymorphic-context case, not a blanket
                wildcard ban as I asserted.
                Cleanup of behavior.prologos (~250 -> ~70 LOC)
                follows.

  #3  DELETED — false claim. Tested: `data Step step : [Nat -> Nat]`
                (with bracketed function type per the lseq-cell
                convention) accepts a function value, including
                closures with captured state. Open-world actor
                behaviour storage IS supported. The closed-enum
                BehaviorTag in our implementation was a needless
                workaround driven by this incorrect pitfall.
                Cleanup tracked separately.

  #4  KEPT, REFRAMED — real, narrowed claim. grammar.ebnf §6
                lines 1153/1187/1199 promise `Mu` (sexp) and `rec`
                (WS) for recursive sessions. Both elaborate to
                `Unknown session type: rec` / `Mu`. So pitfall #4
                is now: "rec/Mu in grammar but not in elaborator."
                CapTP's stream-level well-typedness is therefore
                the documented ceiling; per-exchange sub-protocols
                remain the workaround.

  #5  KEPT — `none`/`some` need explicit type args in some inference
            contexts. Real ergonomics tension, accurately
            documented.

  #6  DELETED — out-of-scope. WS-mode `let p := body` and sexp-mode
                `(let (p v) body)` are TWO surface forms by design
                (grammar.ebnf §7 line 1236). User-error, not a
                Prologos bug.

  #7  DELETED — was a quantitative restatement of #2. With #2
                recanted, #7 evaporates: behavior modules can be
                wildcard-collapsed, dropping ~180 LOC.

  #8  DELETED — false claim. Tested: `data Box1 box1 : [Sigma [_ <Nat>] Bool]`
                and `data Table table : Nat -> [List [Sigma [_ <Nat>] Bool]]`
                both elaborate cleanly. The named-struct
                ActorEntry/PromiseEntry workaround in vat.prologos
                was unnecessary; can be simplified back.

  #9  DELETED — user error. `def` for value bindings vs `defn` for
                functions is documented (grammar.ebnf §3
                lines 189-190, prologos-syntax rules). Mis-using
                `defn` for a 0-ary constant isn't a Prologos bug.

  #10 DELETED — out-of-scope. Network sandbox blocking external
                docs is an environment limitation.

#11-#20 were not in scope of this review and remain as-is for the
user to review next.
kumavis pushed a commit that referenced this pull request May 4, 2026
Mini-PIR covering the 5-hour 12-commit refactor session that
delivered preduce-core.rkt + backend-{racket,hybrid}.rkt + the
preduce-hybrid.rkt thin-wrapper collapse + 10 OCapN-on-kernel
programs + the native-dispatch regression-and-fix loop.

469 lines, 24 sections, follows the 16-question PIR template. Highlights:

§1 What Was Built — functional threading throughout (Option A),
  #:native-op symbolic hint for backend-specific native dispatch,
  pitfalls.md opened.

§4 Timeline — 12 commits from `0d80dfa` (backend interface skeleton)
  through `236e441` (programs 9 + 10), with explicit user-correction
  points marked at threading-model flip, parallel-impl debt
  surfacing, and native-ns regression discovery.

§7 Bugs Found — 3 fixed (multi-value capture, native-dispatch
  regression, FormatBuffer truncation) + 2 averted (Option B
  threading model, parallel-impl-not-deleted).

§10 What Went Wrong — leads with "external adversarial review
  caught all 3 architectural drift points; my internal validation
  gates missed all three." Pattern repeated across this PIR + Hybrid
  PIR §17 #9 + PReduce-lite consolidated PIR.

§16 What Would We Do Differently — six tactical lessons (threading
  against SH-endpoint fit, "deletion phase" is mandatory before
  declaring victory, perf regression net needed, pitfalls.md opened
  proactively, mechanical rewrites benefit from per-section commits,
  benchmark-before-refactor as baseline).

§17 Wrong Assumptions — 7 assumptions surfaced as wrong, including
  "side-effecting backend interface is simpler" (wrong for in-and-
  out-of-native fit), "after Phase 4 the refactor is done" (parallel
  impl still in tree), "the differential gate is sufficient" (caught
  zero perf regressions), "OCapN programs will surface a flood of
  bugs" (only 4 distinct pitfalls across 10 programs).

§20 Longitudinal — flags "external adversarial review catches what
  internal VAG misses" as a 3-instance pattern across recent PIRs
  (Hybrid PIR §17 #9, PReduce-lite consolidated PIR refactor
  errata, this refactor's 3 user-driven corrections). Recommends
  codification in workflow.md.

§21 Lessons Distilled — 10 entries, mostly Pending codification
  in DESIGN_METHODOLOGY.org / workflow.md / testing.md /
  propagator-design.md adjacent docs. Two are already Done by
  example (the #:native-op mechanism, the 'hybrid sentinel
  pattern, the stress-test-as-bug-finder pattern via pitfalls.md).

§24 Open Questions — 7 questions surfaced, leading with "when does
  Phase 7 (native migration on OCapN-shape callbacks) happen?" and
  "should the parallel-impl debt at the Zig layer get its own
  refactor?" (same shape as this one, one layer down).

Cross-references the design plan + the PReduce-lite + Hybrid PIRs +
the pitfalls doc + the OCapN NOTES.md.

https://claude.ai/code/session_01Tycs6BWKG58Wo99YVPg6DF
hierophantos added a commit that referenced this pull request May 24, 2026
…empirical falsification

Verification-only sub-phase closing the 2A group. Empirically validates that
the BSP outer-loop's value-tier orchestration (2A.a's process-retraction +
2A.b's process-resolution + S1 NAF + classify-inhabit) is observationally
equivalent to the pre-2A sequential orchestrator (`run-stratified-resolution-
pure`) for retraction-heavy workloads. Mini-design + mini-audit persisted at
D.3 §8.7.c (10 sub-subsections).

§8.7.4's "S(-1) runs POST-S0" timing concern (drift risk #9-bis from §8.7.5)
empirically falsified by 3 orchestration-parity tests:

  Pre-2A sequential:   cleanup → S0 → L2 → ...        (S(-1) BEFORE S0)
  Post-2A BSP loop:    S0 → S(-1) → restart-outer-loop → S0 fires on
                       cleaned state → ...             (S(-1) AFTER S0)

The 2A model adds one "extra round" of S0 firing on pre-cleanup state per
outer-loop iteration. §8.7.4's correctness argument: worldview-filtering
at `net-cell-read` should hide stale entries from propagators firing under
a retracted assumption bitmask; S(-1)'s subsequent cleanup is COMPACTION,
not correctness. Empirical confirmation: all 3 parity tests PASS — the
correctness argument holds in practice.

Tests added (test-elaboration-parity.rkt):

1. `orchestration-union-no-retraction`:
   "(ns t) (def x : <Int | String> := 42) x" → expected '42
   Int branch succeeds first; no retraction. Baseline.

2. `orchestration-union-with-retraction` (THE FALSIFICATION CASE):
   "(ns t) (def x : <Int | String> := \"hello\") x" → expected "hello"
   Int branch fails → `record-assumption-retraction` writes cell-13 →
   process-retraction handler fires POST-S0 → BSP restart-from-outer-loop
   → S0 fires on cleaned state with retracted bits worldview-filtered →
   String branch succeeds. Tests that worldview-filtering preserves
   correctness across the timing change.

3. `orchestration-union-flipped-with-retraction`:
   "(ns t) (def x : <String | Int> := 42) x" → expected '42
   Symmetric variant with union order flipped; verifies no left/right
   asymmetry in the retraction path.

`(ns t)` bootstrap pattern works in `run-ns-last` harness — D1 drift risk
verified clean (no fallback needed). The `(ns t)` form triggers prelude
auto-import + namespace setup; subsequent `def x : <type>` invokes
`check ctx value (expr-union T1 T2)` at typing-core.rkt:2385 which uses
`with-speculative-rollback` (the canonical retraction trigger).

Adversarial 3-column VAG passed all 4 questions (cleanest sub-phase in
2A group — empirical confirmation of a design-time correctness argument
is high-confidence methodology; adversarial column surfaced "are 3 cases
sufficient" question even on a verification-only sub-phase).

Validation:
- Targeted tests (test-elaboration-parity): 25 tests PASS in 4.5s (+3 from
  orchestration-parity axis)
- Probe (examples/2026-04-22-1A-iii-probe.prologos): 0 errors; 28 outputs
  semantically preserved
- Acceptance (examples/2026-04-17-ppn-track4c.prologos): 0 errors
- Full suite: 8234 tests / 116.3s / 0 failures (vs 8231/121.1s pre-2A.c;
  +3 tests; wall -4.8s within 118-127s variance band)

Scaffolding retirement targets:
- None NEW for 2A.c (verification-only sub-phase; no production code changes)
- All 2A.a/b scaffolding retirement plans inherited and unchanged
- 2B is the next sub-phase: retire `run-stratified-resolution-pure`
  orchestrator + `run-stratified-resolution!` dead legacy + driver init
  for `current-retracted-assumptions` parameter

Methodology lessons codified this session (3 reinforcements):
1. Empirical falsification testing for design-time correctness arguments is
   HIGH-CONFIDENCE methodology. 2nd data point (1st: BSP-LE Track 2 tropical-
   quantale falsification).
2. Adversarial 3-column framing at every gate — even verification-only
   sub-phases. 3rd data point.
3. Verification-before-retirement is a clean separation pattern. 2A.a/b/c
   verify the new mechanism; 2B retires the old. 2nd data point (1st:
   S2.b/c/d/e arc).

**2A group COMPLETE**: .0 (cells) → .a (process-retraction) → .b (process-
resolution) → .c (orchestration-parity verification). Ready for 2B.

Per addendum D.3 §8.7.c mini-design + §3 tracker.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants