bpowers · bpowers · May 16, 2026 · May 14, 2026 · May 14, 2026 · May 15, 2026
diff --git a/docs/README.md b/docs/README.md
@@ -27,11 +27,14 @@
   - [design-plans/2026-05-06-ltm-482-variable-level-loop-enumeration.md](design-plans/2026-05-06-ltm-482-variable-level-loop-enumeration.md) -- Tiered LTM loop enumeration: variable-level Johnson first, expand only the cross-element subgraph
   - [design-plans/2026-05-09-ltm-503-cross-element-agg.md](design-plans/2026-05-09-ltm-503-cross-element-agg.md) -- Cross-element LTM scoring: per-element arrayed-target partials, element-level cross-element loops, array reducers as aggregate nodes
   - [design-plans/2026-05-11-ltm-arrays-hardening.md](design-plans/2026-05-11-ltm-arrays-hardening.md) -- Arrayed/cross-element LTM hardening: unify the reference-site walkers behind one classification IR (#520), then layer eight fixes (#487, #511, #510, #514, #515, #483, #502, #492)
+  - [design-plans/2026-05-13-macros.md](design-plans/2026-05-13-macros.md) -- Vensim macro support: macros as a data-driven generalization of the stdlib module mechanism, persisted via a `MacroSpec` marker on `Model`; 7 implementation phases
 - [plans/](plans/README.md) -- Implementation plans (active and completed)
 - [test-plans/](test-plans/) -- Human verification plans for completed features
 - [implementation-plans/](implementation-plans/) -- Detailed phase-by-phase implementation plans
   - [implementation-plans/2026-04-05-server-rewrite/](implementation-plans/2026-04-05-server-rewrite/) -- 8-phase plan to build `simlin-serve` and refactor `simlin-mcp` into a shared core library
     - [test-requirements.md](implementation-plans/2026-04-05-server-rewrite/test-requirements.md) -- AC-to-test mapping for execution validation
+  - [implementation-plans/2026-05-13-macros/](implementation-plans/2026-05-13-macros/) -- 7-phase plan implementing Vensim `:MACRO:` support: datamodel/serialization foundation, MDL & XMILE import/export, compile-time expansion, multi-output materialization, and hero-corpus validation
+    - [test-requirements.md](implementation-plans/2026-05-13-macros/test-requirements.md) -- AC-to-test mapping for execution validation
 
 ## Security
 
@@ -40,6 +43,7 @@
 ## Domain Knowledge
 
 - [reference/xmile-v1.0.html](reference/xmile-v1.0.html) -- XMILE interchange format specification
+- [reference/vensim-macros.md](reference/vensim-macros.md) -- Vensim macros (`:MACRO:`): definition/call syntax, semantics (per-invocation stock state, locality, recursion), XMILE `<macro>` representation, xmutil's mapping, and implementation implications
 - [reference/ltm--loops-that-matter.md](reference/ltm--loops-that-matter.md) -- Loops That Matter technique: link scores, loop scores, algorithm reference
 - [array-design.md](array-design.md) -- Array/subscript design notes
 

diff --git a/docs/design-plans/2026-05-13-macros.md b/docs/design-plans/2026-05-13-macros.md
diff --git a/docs/implementation-plans/2026-05-13-macros/phase_01.md b/docs/implementation-plans/2026-05-13-macros/phase_01.md
diff --git a/docs/implementation-plans/2026-05-13-macros/phase_02.md b/docs/implementation-plans/2026-05-13-macros/phase_02.md
diff --git a/docs/implementation-plans/2026-05-13-macros/phase_03.md b/docs/implementation-plans/2026-05-13-macros/phase_03.md
diff --git a/docs/implementation-plans/2026-05-13-macros/phase_04.md b/docs/implementation-plans/2026-05-13-macros/phase_04.md
diff --git a/docs/implementation-plans/2026-05-13-macros/phase_05.md b/docs/implementation-plans/2026-05-13-macros/phase_05.md
diff --git a/docs/implementation-plans/2026-05-13-macros/phase_06.md b/docs/implementation-plans/2026-05-13-macros/phase_06.md
diff --git a/docs/implementation-plans/2026-05-13-macros/phase_07.md b/docs/implementation-plans/2026-05-13-macros/phase_07.md
diff --git a/docs/implementation-plans/2026-05-13-macros/test-requirements.md b/docs/implementation-plans/2026-05-13-macros/test-requirements.md
diff --git a/docs/reference/vensim-macros.md b/docs/reference/vensim-macros.md
diff --git a/docs/simlin-project.schema.json b/docs/simlin-project.schema.json
@@ -131,6 +131,16 @@
           "items": {
             "$ref": "#/$defs/ModelGroup"
           }
+        },
+        "macroSpec": {
+          "anyOf": [
+            {
+              "$ref": "#/$defs/MacroSpec"
+            },
+            {
+              "type": "null"
+            }
+          ]
         }
       },
       "required": [
@@ -1132,6 +1142,31 @@
         "name"
       ]
     },
+    "MacroSpec": {
+      "description": "Marks a model as a callable macro template and records its calling\nconvention. `Some` on `json::Model.macro_spec` (mirrored from\n[`datamodel::MacroSpec`]) means the model's variables are the macro body;\nthis names which body variables are the formal parameters and outputs.",
+      "type": "object",
+      "properties": {
+        "parameters": {
+          "type": "array",
+          "items": {
+            "type": "string"
+          }
+        },
+        "primaryOutput": {
+          "type": "string"
+        },
+        "additionalOutputs": {
+          "type": "array",
+          "items": {
+            "type": "string"
+          }
+        }
+      },
+      "required": [
+        "parameters",
+        "primaryOutput"
+      ]
+    },
     "Dimension": {
       "type": "object",
       "properties": {

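Reviewer sketch: the `MacroSpec` description says the spec "names which body variables are the formal parameters and outputs", which implies every listed name should resolve to a variable in the macro body. The Rust below is a minimal illustration of that check, not the engine's code. The struct is a hypothetical mirror of the JSON shape (the real `datamodel::MacroSpec` may differ), and `missing_body_variables` is an invented helper name.

```rust
use std::collections::HashSet;

/// Hypothetical mirror of the `MacroSpec` JSON shape; not the real datamodel type.
#[derive(Debug, Clone, PartialEq)]
pub struct MacroSpec {
    /// Formal parameter names (required in the schema).
    pub parameters: Vec<String>,
    /// Name of the primary output variable (required in the schema).
    pub primary_output: String,
    /// Extra outputs; optional in the schema, so empty when absent.
    pub additional_outputs: Vec<String>,
}

/// Returns, in declaration order, every name in `spec` that is not a body variable.
/// An empty result means the calling convention is consistent with the body.
pub fn missing_body_variables(spec: &MacroSpec, body_vars: &[&str]) -> Vec<String> {
    let known: HashSet<&str> = body_vars.iter().copied().collect();
    spec.parameters
        .iter()
        .chain(std::iter::once(&spec.primary_output))
        .chain(spec.additional_outputs.iter())
        .filter(|name| !known.contains(name.as_str()))
        .cloned()
        .collect()
}
```

Returning the missing names, rather than a boolean, lets a caller report exactly which parameter or output failed to resolve.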
diff --git a/docs/tech-debt.md b/docs/tech-debt.md
@@ -561,3 +561,39 @@ Known debt items consolidated from CLAUDE.md files and codebase analysis. Each e
 - **Suspected fix**: Snapshot the set with `[...children]` before iterating; remove the redundant post-loop `delete`.
 - **Owner**: unassigned
 - **Last reviewed**: 2026-05-08
+
+### 60. No CI feature matrix; a no-`file_io` build break propagated undetected
+
+- **Component**: build / CI (`scripts/pre-commit`, `.github/workflows/`, `src/simlin-engine/Cargo.toml`)
+- **Severity**: medium
+- **Description**: Neither the pre-commit hook nor CI exercises a Cargo **feature matrix** for `simlin-engine`, so the entire class of feature-gating bugs is invisible to the canonical gate. Concretely: during Phase 7c of the Vensim macro epic a subagent added `src/simlin-engine/tests/metasd_macros.rs` (which calls the `file_io`-gated `load_dat`/`load_csv`) WITHOUT the mirroring `[[test]] name = "metasd_macros" required-features = ["file_io"]` entry in `src/simlin-engine/Cargo.toml`. This broke `cargo build -p simlin-engine --all-targets` and `cargo test -p simlin-engine --no-run` whenever `file_io` is *not* enabled (`error[E0425]`, `load_dat`/`load_csv` configured out). The pre-commit "Running Rust checks" step passed throughout because the workspace build pulls `file_io` into scope via cross-crate feature unification (`simlin-cli` requests `simlin-engine/file_io`), so the break survived multiple commits and was caught only by a later manual minimal-feature build. The Cargo.toml fix landed in commit `7cba4ad2` (which itself documents the "fragile cross-crate feature-unification accident" root cause), but the GAP — no feature-matrix gate — remains. simlin-engine features: `file_io`, `schema`, `ai_info`, `debug-derive` (default = `schema, ai_info, debug-derive`), plus `xmutil`. Secondary (workflow, not a separate item): this propagated because multi-agent plan execution structurally trusts the inherited baseline — each subagent verified only its own delta and several built *with* `file_io` and concluded the base was clean; subagents should verify the exact configuration they will later claim green BEFORE starting and STOP if the inherited baseline is already broken. The deterministic fix is the CI matrix.
+- **Suspected fix**: Add a feature-matrix build/test step to the pre-commit hook and/or CI. At minimum, run `cargo build -p simlin-engine --all-targets --no-default-features` and the same command with `--all-features`. Optionally add a bounded `cargo hack --feature-powerset` run. Keep within the documented test-time budget ([docs/dev/rust.md#test-time-budgets](dev/rust.md#test-time-budgets)): a couple of representative configurations (`--no-default-features`, `--all-features`), rather than the full powerset, are the pragmatic choice given the 3-minute workspace cap.
+- **Owner**: unassigned
+- **Last reviewed**: 2026-05-15
+
+### 61. CLAUDE.md files decaying from orientation docs into append-only changelogs
+
+- **Component**: docs / process (`src/simlin-engine/CLAUDE.md`; the `project-claude-librarian` / `maintaining-project-context` skill)
+- **Severity**: low
+- **Description**: `src/simlin-engine/CLAUDE.md`'s "Last updated" header has degraded into a single ~2,500-word run-on paragraph spanning two unrelated epics (LTM arrays Phases 1-8 *and* macro Phases 1-7); the `db_ltm.rs` and `ltm_agg.rs` module bullets are multi-hundred-word single paragraphs that read like changelog archaeology rather than orientation. The file's stated purpose is current-state orientation ("maps where functionality lives"; "Keep this file up to date when adding, removing, or reorganizing modules"). Root cause: the `maintaining-project-context` / `project-claude-librarian` skill optimizes *per-delta* accuracy, not whole-file readability, so the aggregate readability degrades a little every epic and nothing ever compacts it. Related but distinct from #407 (which is about pruning verbatim code snippets from `docs/design-plans/`, not CLAUDE.md orientation decay).
+- **Suspected fix**: Introduce an **epic-boundary CLAUDE.md compaction pass** (a distinct step from the librarian's per-delta updates) that relocates historical/changelog narrative into the now-committed design/implementation-plan docs (`docs/design-plans/`, `docs/implementation-plans/`) and trims each CLAUDE.md back to current-state orientation. Could be wired into `finishing-a-development-branch` or a dedicated skill.
+- **Owner**: unassigned
+- **Last reviewed**: 2026-05-15
+
+### 62. Implementation plans back-load all real-corpus validation into the final phase
+
+- **Component**: process / workflow (the `writing-implementation-plans` / `execute-implementation-plan` skills; `docs/implementation-plans/`)
+- **Severity**: medium
+- **Description**: Implementation plans tend to defer all real-corpus numeric/structural validation to the last phase, which hides latent pre-existing blockers until most dependent work is already layered on top. Concretely, in the Vensim macro epic two latent defects — GH #554 (false `init->init` macro recursion) and a #554-class `DELAY N`->`delayn` variant — *blocked acceptance criteria* but only surfaced in Phase 7, the first phase to exercise the renamed-builtin / stdlib-module-collision path against real models (C-LEARN, metasd), after five phases of dependent work. A cheap corpus smoke run against the Phase 1-2 datamodel would have surfaced #554 far earlier and reduced rework risk. A second symptom: discovering a known *bug-class* of latent blockers mid-plan triggered serial stop-and-ask round-trips even though both #554-class decisions were the same class with the same answer.
+- **Suspected fix**: Future implementation plans should **interleave an early, cheap real-corpus smoke validation** (run against the earliest phase that produces a runnable artifact) rather than back-loading all corpus validation into the last phase; and plans should **pre-authorize fixing a known bug-class of latent blockers** discovered during corpus validation, to avoid serial stop-and-ask round-trips for what is provably one decision. This is guidance for the plan-authoring skills, not a code change.
+- **Owner**: unassigned
+- **Last reviewed**: 2026-05-15
+
+### 63. Implementation-plan directory committed only at branch finish, untracked during execution
+
+- **Component**: process / workflow (the `writing-implementation-plans` / `execute-implementation-plan` / `finishing-a-development-branch` skills; `docs/implementation-plans/`)
+- **Severity**: low
+- **Description**: For the Vensim macro epic, the design plan was committed at the branch's merge-base (`86cc7fcb`), but the 8-file `docs/implementation-plans/2026-05-13-macros/` directory (phase_01..07 + test-requirements.md) that drove 40+ commits stayed **untracked in git** for the entire execution and was only committed at the very end (`d44d32fa`, "doc: add Vensim macro support implementation plan") during the finishing step. Consequences: reviewers of intermediate commits had no in-repo plan to check the work against, and a lost or compacted session could have lost the plan entirely (it existed only in the working tree). Distinct from #407 (which is about pruning *stale* plan content post-merge, not about *when* the plan enters version control).
+- **Suspected fix**: Commit the implementation plan at the **start** of execution — the `starting-an-implementation-plan` / `executing-an-implementation-plan` step should commit `docs/implementation-plans/<plan>/` before the first task subagent runs — rather than rescuing it at the `finishing-a-development-branch` step. Optionally add a guard in the execute step that refuses to start if the plan directory is untracked.
+- **Owner**: unassigned
+- **Last reviewed**: 2026-05-15
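Reviewer sketch for item 60: the two representative configurations named in the suspected fix could be driven by a small Rust helper, for example from a hypothetical xtask-style binary. The function names and the constant below are assumptions for illustration. Only the `cargo build -p simlin-engine --all-targets` invocation and the two feature flags come from the item itself.

```rust
use std::process::Command;

/// Representative feature configurations (deliberately not the full powerset).
pub const FEATURE_CONFIGS: [&[&str]; 2] = [&["--no-default-features"], &["--all-features"]];

/// Builds the cargo argument list for each configuration in `FEATURE_CONFIGS`.
pub fn feature_matrix_args(package: &str) -> Vec<Vec<String>> {
    FEATURE_CONFIGS
        .iter()
        .map(|cfg| {
            let mut args = vec![
                "build".to_string(),
                "-p".to_string(),
                package.to_string(),
                "--all-targets".to_string(),
            ];
            args.extend(cfg.iter().map(|flag| flag.to_string()));
            args
        })
        .collect()
}

/// Runs each configuration in turn and stops at the first failing build.
pub fn run_feature_matrix(package: &str) -> Result<(), String> {
    for args in feature_matrix_args(package) {
        let status = Command::new("cargo")
            .args(&args)
            .status()
            .map_err(|e| format!("failed to spawn cargo: {e}"))?;
        if !status.success() {
            return Err(format!("cargo {} failed", args.join(" ")));
        }
    }
    Ok(())
}
```

Keeping argument construction separate from process spawning means the matrix can be unit-tested without invoking cargo, and a third configuration becomes a one-line change.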
diff --git a/docs/test-plans/2026-05-13-macros.md b/docs/test-plans/2026-05-13-macros.md
@@ -0,0 +1,97 @@
+# Vensim Macro Support — Human Test Plan
+
+Companion to the implementation plan at `docs/implementation-plans/2026-05-13-macros/` and the
+design at `docs/design-plans/2026-05-13-macros.md`. Generated after the macro epic (all 7 phases
+plus the user-approved #554 and delayn prerequisite bug fixes) completed implementation, per-phase
+code review, final AC-coverage review, and automated test-coverage validation.
+
+## Coverage validation result: PASS
+
+Each of the 37 acceptance criteria (`macros.AC1.1`–`macros.AC6.6`) maps to a genuine, non-vacuous automated
+test that was confirmed to exist at its expected location and to run green in the non-gated tiers.
+The classifiers used by the C-LEARN / metasd corpus assertions are themselves non-vacuity-pinned in
+both directions against real diagnostics.
+
+The following are **design-deferred prerequisites or tracked out-of-scope items**, NOT coverage
+gaps (the automated tests exist and run; only the *strength* of an external reference is deferred):
+
+- metasd **simulation tier** eligible-set is empty pending Vensim DSS reference outputs — a
+  documented "Test prerequisites" setup task (GH #561). The **expansion tier** (the AC6.4
+  "all 14 pass" half) is implemented and passing for all 17 macro-using metasd files.
+- AC6.3 gates against formula-derived `output.tab` (arithmetic documented in each fixture README);
+  a Vensim DSS `.vdf` would upgrade reference strength (GH #561).
+- C-LEARN full end-to-end simulation is blocked by non-macro issues (GH #559); AC6.2/AC6.3 macro
+  expansion + focused-model coverage is present and asserted.
+- `RANDOM NORMAL`/stochastic builtins (GH #560), `DELAY N` non-constant order (GH #562),
+  cross-compilation-unit classifier duplication (GH #563), the Phase-3-documented non-time `$`
+  escape limitation, and the pysimlin ruff-gate gap (GH #551) are acknowledged by the design as
+  out of scope and are tracked.
+
+Some macro corpus tests are deliberately `#[ignore]`d with a documented opt-in command (C-LEARN is
+~1.4 MB; heavy metasd models exceed the per-test time budget in `docs/dev/rust.md`). An
+`#[ignore]`d-but-present-and-runnable test still counts as automated coverage; AC1.7, AC3.3, and
+AC6.1 additionally have non-gated tests so the criterion is verified by the regular suite too.
+
+## Scope of manual testing
+
+Automated tests cover the engine/datamodel/serialization/round-trip logic exhaustively. This plan
+covers only what automated tests cannot: the **gated heavy/hero corpus opt-in runs**, the **actual
+diagram UI rendering** (Jest asserts pure `getAvailableModels` logic and `openEngineProject` with a faked engine, so only a
+human can confirm the real `<select>` DOM and the canvas render), and **in-tool cross-format export
+fidelity / cross-tool (Vensim DSS) numeric fidelity**.
+
+Model file paths are absolute; all commands run from `/home/bpowers/src/simlin`.
+
+### Prerequisites
+
+- `./scripts/dev-init.sh` has been run (idempotent).
+- Toolchain confirmed: `cargo --version`, `pnpm --version`, `uv --version` all succeed.
+- Baseline green (re-confirm before manual UI work):
+  - `cargo test -p simlin-engine --features file_io --test simulate --test mdl_roundtrip --test metasd_macros`
+    → all non-gated macro tests pass; C-LEARN/heavy models `#[ignore]`d.
+  - `cd src/diagram && npx jest module-details-utils editor-open-project` → green.
+
+## Phase A — Gated heavy-corpus opt-in runs (automated, human-initiated)
+
+Exercises the heavy/hero models excluded from CI by the per-test time budget. These ARE automated
+coverage; a human must opt in.
+
+| Step | Action | Expected |
+|------|--------|----------|
+| A1 | `cargo test -p simlin-engine --release --test simulate -- --ignored corpus_clearn_macros_import` | PASS. C-LEARN's 4 macros (`sample_until`/`sshape`/`ramp_from_to`/`init`) import with the exact parameter lists asserted; zero macro-attributable diagnostics. Runtime ~minutes. |
+| A2 | `cargo test -p simlin-engine --release --test simulate -- --ignored corpus_sstats_multi_output_materializes` | PASS. 2 SSTATS module instances, 22 binding auxes; only `*_data` GET-DIRECT diagnostics, no macro-specific ones. |
+| A3 | `cargo test -p simlin-engine --release --features file_io --test metasd_macros -- --ignored --nocapture metasd_expansion_tier_full` | PASS. All 17 macro-using metasd files expand. Watch stderr (`--nocapture` keeps the test harness from hiding it): every line prints `macro_attributable=0`; confirm **no** line has `macro_attributable>0`. |
+| A4 | `cargo test -p simlin-engine --release --test simulate -- --ignored simulates_clearn` | An ignored/failed result is acceptable — this is GH #559 (non-macro C-LEARN blockers, out of scope). Confirm any failure is **not** macro-attributable (no `recursive macro:`, no `DuplicateMacroName`, no macro-model compile error). A macro-attributable error here is a regression — escalate. |
+
+## Phase B — Diagram UI: macro models absent from the model-reference dropdown (AC6.5/AC6.6 in the real UI)
+
+| Step | Action | Expected |
+|------|--------|----------|
+| B1 | Start the local viewer (`pnpm --filter @simlin/serve dev` or the documented local app path); open the served URL. | App loads, no console errors. |
+| B2 | Import `/home/bpowers/src/simlin/test/test-models/tests/macro_multi_macros/test_macro_multi_macros.mdl` (two `:MACRO:` blocks → two macro-marked models). | Project opens; the `main` diagram renders (no crash / no blank canvas). `macro output` and `second macro output` appear as ordinary variables. |
+| B3 | Add or select a Module element, open `ModuleDetails`, open the model-reference selector (`data-testid="model-ref-select"`). | Dropdown lists ordinary/stdlib models only. **`expression_macro` and `second_macro` MUST NOT appear** as selectable reference targets. |
+| B4 | Open `/home/bpowers/src/simlin/test/test-models/tests/macro_multi_output/test_macro_multi_output.mdl`; inspect `main`. | Renders without crash. `total`, `the min`, `the max`, `spread` appear as ordinary auxes; the materialized module is present but `add3` is NOT in the model-reference dropdown. |
+| B5 | Navigate the breadcrumb / model list (hamburger → model navigation). | Macro-marked models are NOT navigable standalone entries; only `main` (and any ordinary submodels) are reachable. |
+
+## Phase C — Cross-tool / cross-format fidelity (visual judgment)
+
+| Step | Action | Expected |
+|------|--------|----------|
+| C1 | Open `/home/bpowers/src/simlin/test/test-models/tests/macro_stock/test_macro_stock.mdl`, run the simulation, plot `macro output`. | Series matches stock semantics: 1.1, 6.1, 11.1, … 51.1 over the 11 steps (matches `macro_stock/output.tab`). |
+| C2 | Open `/home/bpowers/src/simlin/test/test-models/tests/macro_expression/test_macro_expression.xmile`; export to XMILE via the UI download; export a second time; diff the two downloads. | Two consecutive exports are byte-identical; the `<macro>` element and `<uses_macros recursive_macros="false" option_filters="false"/>` header option are present; round-trip preserves the macro. |
+| C3 | Export `macro_multi_output.mdl` to XMILE via the UI; inspect the XML. Then export `macro_expression.mdl` to XMILE. | Multi-output XML contains `<simlin:additional-outputs names="minval,maxval"/>` and a `simlin:macro-invocation` extension. Single-output XML contains **no** `simlin:` macro extension (AC4.5 in the live tool). |
+| C4 | (Requires a licensed Vensim DSS — closes prerequisite GH #561 for these fixtures) Run the 3 `macro_clearn_*` focused fixtures in Vensim DSS, export `.vdf`, compare Simlin's `macro output` series against Vensim for `sample_until`, `sshape`, `ramp_from_to`. | Series match within tolerance. Upgrades AC6.3 from formula-derived to DSS-validated. |
+
+## Human-verification traceability
+
+| Criterion | Why manual | Steps |
+|-----------|------------|-------|
+| AC6.5 (real UI render) | Jest asserts `openEngineProject` with a faked engine; only a human confirms the diagram visually renders without corruption | B2, B4 |
+| AC6.6 (rendered dropdown) | Jest asserts pure `getAvailableModels`; the actual `<select>` DOM + navigation UX need a human eye | B3, B4, B5 |
+| AC4.2/AC4.5 (in-tool export fidelity) | Automated tests assert `to_xmile` string content; a human confirms the UI download path matches and is byte-stable | C2, C3 |
+| AC6.2/6.3/6.4 heavy corpus | `#[ignore]`d for the per-test time budget; require a human-initiated opt-in run | A1–A4 |
+| AC6.3 DSS-grade reference (GH #561) | Requires a licensed Vensim DSS to generate `.vdf` references — a documented prerequisite | C4 |
+
+If any Phase A run reports a macro-attributable diagnostic, or any Phase B step shows a macro-marked
+model as a selectable/navigable reference target or a crash on open, treat it as a regression and
+escalate before merge/release.
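Reviewer sketch: the step A3 check ("no line has `macro_attributable>0`") is easy to misread by eye across 17 files, so it could be mechanized on captured stderr. The Rust below is a minimal illustration. It assumes each relevant line contains a `macro_attributable=<integer>` token, since the plan does not specify the rest of the line format. Both function names are hypothetical.

```rust
/// Extracts the integer that follows `macro_attributable=` in a log line, if present.
pub fn macro_attributable_count(line: &str) -> Option<u64> {
    const KEY: &str = "macro_attributable=";
    let start = line.find(KEY)? + KEY.len();
    let digits: String = line[start..]
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// Returns every line whose `macro_attributable` count is greater than zero.
/// Lines without the token are ignored rather than treated as failures.
pub fn macro_regressions(stderr: &str) -> Vec<&str> {
    stderr
        .lines()
        .filter(|line| matches!(macro_attributable_count(line), Some(n) if n > 0))
        .collect()
}
```

An empty result from `macro_regressions` corresponds to the PASS condition for A3. Any returned line is the regression to escalate.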