ltm: per-element relative loop scores via subscripted IDs#472
Conversation
When `compile_ltm_equation_fragment` resolved a dependency that referred to another LTM synthetic variable (e.g. a loop_score's product equation referring to its constituent link_scores), the dep stub was hardcoded to `size: 1, ast: None` and forced the compiler to emit slot-0 reads for every output slot. As a result, A2A loop_score variables collapsed every slot to slot 0's value even though the underlying link_scores were correctly per-element distinct. Probed in `arrayed_population.stmx` (3 regions, distinct birth_rates): `link_score(births->population)` slot values were 0.0306 / 0.0202 / 0.0102, but `loop_score` reported 0.0306 in all three slots. The fix mirrors the existing dim-aware stub path used for explicit model A2A vars (`build_stub_variable`): look up the dep's dimensions in the salsa-cached `model_ltm_variables` result and synthesize an `Ast::ApplyToAll(canonical_dims, dummy_const)` with the right slot count. Scalar deps continue to fall through to the original 1-slot stub, so non-arrayed loops are unchanged. Two pre-existing tests had assertions that "every element must have non-zero values" and only passed because of the broadcast bug; relaxed to "at least one element has dynamics". In `arrayed_population.stmx`, LA's birth_rate equals the uniform death_rate (both 0.01), so LA's population is in exact equilibrium and its link_scores legitimately SAFEDIV to zero -- the broadcast bug had been hiding that all along. The new `test_a2a_loop_score_has_distinct_per_element_values` pins the contract going forward (heterogeneous birth_rate per region, asserts `loop_score` slots differ visibly). Tracked as tech-debt item #34; marked RESOLVED with the discovery context preserved.
The libsimlin scalar FFI for relative loop scores silently returned
slot 0 for arrayed (Apply-to-All) loops. This shipped per-element
introspection: callers can now pass `r1[Boston]` (named element),
`r1[2]` (1-based integer for indexed dims), or `r1[Boston, 2]` (2D
mixed) to get a specific element's series; bare `r1` on an arrayed
loop returns a signed argmax-abs aggregation across slots so a
single-line "this loop's importance" view shows the dominant
element's contribution at each step rather than always slot 0.
Engine surface:
- `ltm_post::LoopElementIndex` + `build_loop_element_index` map each
loop_id to its dim metadata (canonical names, sizes, named-vs-
indexed flag). `resolve(&[&str])` does row-major slot offset
arithmetic with internal canonicalize, accepts either named
elements or 1-based integers, and returns informative
`ResolveError` variants for FFI error formatting.
- Per-element streaming helpers (`compute_partition_denominator_for_element`,
`compute_rel_loop_score_for_element`) + signed argmax-abs aggregator
(`compute_rel_loop_score_argmax_abs`). Scalar loops broadcast
slot 0 to keep mixed-shape partitions consistent with
`compute_rel_loop_scores_per_element`.
Layout:
- `compute_metadata` now computes `FeedbackLoop::importance_series`
via per-element rel scores aggregated by signed argmax-abs. For
scalar loops this reduces to identity; for arrayed loops with
distinct per-element dynamics it differs from the prior slot-0
collapse (verified by `test_arrayed_loop_importance_matches_argmax_abs_aggregation`).
libsimlin:
- `SimState` snapshots `loop_element_index` at `simlin_sim_new`
time. Cache key changes from `Option<usize>` (partition) to
`(Option<usize>, usize)` (partition x element) so per-element
lookups amortize.
- `simlin_analyze_get_relative_loop_score` parses subscripted IDs
(new `parse_subscripted_loop_id` helper), resolves to a slot
via `LoopElementIndex.resolve`, and dispatches to either the
scalar helper, the per-element helper, or the argmax-abs
aggregator based on subscript presence and loop arity. Errors
map `ResolveError` variants to readable FFI messages
("loop 'r1' dimension 'region' has no element 'tokyo'", etc.).
- New `simlin_analyze_get_loop_element_count` accessor lets
callers detect arrayed loops without parsing trial subscripts.
pysimlin:
- `Sim.get_relative_loop_score` accepts an optional `element` kwarg
(string, int, or tuple); the helper formats `loop_id[e1, e2, ...]`
before passing to the FFI.
- New `Sim.get_loop_element_count(loop_id)` introspection.
Closes the FFI surface portion of issue #463. The visible per-element
behaviour required tech-debt #34 (A2A loop_score per-element fix) to
land first; with both in place, the layout argmax-abs path differs
from the slot-0 collapse for arrayed fixtures (the bite check in
`test_arrayed_loop_importance_matches_argmax_abs_aggregation` now
passes deterministically rather than by accident).
The Claude Code harness writes a per-session lock file at `.claude/scheduled_tasks.lock` to coordinate scheduled task firings. It's runtime state, not project content, and shouldn't appear in `git status` for anyone using the harness in this checkout.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf8f8bfbb3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let n_steps = series.len() / n; | ||
| (0..n_steps) |
There was a problem hiding this comment.
Index aggregated loop series with partition stride
compute_rel_loop_scores_per_element returns each loop’s data as step_count * max_slots_in_partition, but this code derives n_steps from the loop’s own slot count (n) and then indexes with that same stride. In mixed partitions (which are explicitly supported, e.g. scalar + arrayed loops), n < max_slots, so importance_series is read with the wrong layout and can become longer than the simulation horizon with misaligned values. This corrupts loop importance output for those models; the read should use the partition stride (or results.step_count) rather than per-loop n.
Useful? React with 👍 / 👎.
Code Review[P2] Layout argmax-abs aggregation strides by loop's own
|
Review of PR #472: ltm: per-element relative loop scores via subscripted IDs[P2]
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #472 +/- ##
==========================================
+ Coverage 81.90% 82.08% +0.18%
==========================================
Files 187 187
Lines 53170 53884 +714
==========================================
+ Hits 43550 44233 +683
- Misses 9620 9651 +31 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Codex review on PR #472 flagged that `try_detect_ltm_loops_incremental` in `layout/mod.rs` was deriving `n_steps` from a loop's own `n_slots` and indexing the helper output with that same stride. But `compute_rel_loop_scores_per_element` writes each loop's series at the partition's max-slot stride, not the loop's own n_slots. For mixed partitions (loops with different slot counts sharing a partition) this silently produced importance_series of length `step_count * stride` with misaligned values instead of length `step_count`. While investigating, an adjacent OOB-read in the helper came to light: when iterating `k in 0..max_slots` the inner read pulls `row[off + k]` for arrayed members, OOB-reading past the loop's own slot range when partitions mix arrayed loops of different dimensionalities. Two principled fixes, both centered on the loop's own n_slots being the canonical bound: (1) `compute_rel_loop_scores_per_element` now gates per-element reads and writes by `k < n_slots[i]` for arrayed loops. Scalar loops continue broadcasting slot 0 (existing semantic). Positions `n_slots..max_slots` for arrayed loops with `n < max_slots` are zero-filled rather than OOB-read; the partition denom only includes contributions from loops that actually have an element at that index. A new RED test (`per_element_helper_handles_arrayed_with_smaller_n`) plants a sentinel `999.0` at the OOB target so the bug fails loudly rather than passing on whatever uninitialised data the allocator returns. (2) New `aggregate_per_element_argmax_abs` helper in `ltm_post.rs` takes the per-element rel-score map, `n_slots_by_loop`, and `step_count` and produces one signed argmax-abs series per loop. Stride is recovered from `series.len() / step_count` so the layout no longer needs to track partition stride. Argmax-abs iterates the loop's own n_slots only (always <= stride by construction). `try_detect_ltm_loops_incremental` is rewired to call this helper instead of the inlined buggy aggregation. Five RED tests cover scalar-in-mixed-partition stride recovery, arrayed-in-mixed-stride iterating own slots only, NaN/Inf filtering, empty-series handling, and a pure-arrayed sanity check. The integration regression test `test_compute_metadata_importance_series_length_matches_step_count` provides forward-looking layout-level coverage. It currently does NOT trigger the mixed-stride path due to a separately-tracked engine quirk (new tech-debt #35: A2A loops get `partition = None` because `Loop::stocks` holds variable-level names while `stock_partition` keys on element-level names). Once that quirk is fixed, the same fixture should immediately start exercising mixed-stride partitions and this test will keep the layout aggregation honest -- the unit tests cover the algorithm directly with hand-crafted inputs that deliberately simulate mixed-stride partitions until then. Filed tech-debt #35 to track the partition-mismatch quirk.
[P3] FFI streaming helpers diverge from
|
Codex review on the previous commit pointed out that
`compute_rel_loop_scores_per_element` was changed to *skip*
arrayed loops at partition indices `k >= n_slots`, but the
streaming helpers (`effective_slot`,
`compute_partition_denominator_for_element`,
`compute_rel_loop_score_for_element`,
`compute_rel_loop_score_argmax_abs`) still *clamped* to the
loop's last slot. The two paths therefore disagreed for any
mixed-stride partition: the FFI bare-arrayed
`simlin_analyze_get_relative_loop_score` denom included a
clamped `|row[off + n - 1]|` contribution from a loop with
fewer slots than the partition max, while the layout's
`importance_series` (which goes through the full-sweep helper)
did not. The two helpers' docstrings claimed they mirrored
each other; that claim was stale.
Fix: `effective_slot` now returns `Option<usize>`, returning
`None` for arrayed loops when `element_index >= n_slots`.
Callers thread the `None` through:
- `compute_partition_denominator_for_element` skips
`None`-returning members (no contribution to the partition
sum).
- `compute_rel_loop_score_for_element` returns an
all-zeros series (the loop has no own element at this
partition index, matching the full-sweep zero-fill).
- `compute_rel_loop_score_argmax_abs` skips `None`
iterations defensively (its k iteration is always within
`0..n_slots` by construction so the branch is unreachable
in practice, but keeping the call documents the invariant
and guards future callers).
Updated docstrings on `effective_slot` and the streaming
denominator helper to describe the skip semantic and remove
the stale "matches partition-wide max-slots stride convention"
language.
TDD: two new RED tests pinned the divergence (planted a
sentinel value at the clamp target so the bug failed loudly
with 1004.0 = |A[2]| + sentinel and 999.0 = sentinel/denom);
both now pass. A new parity test
`streaming_helpers_match_full_sweep_in_mixed_stride_partition`
extends the existing same-shape parity check to mixed-stride
so any future drift is caught structurally rather than waiting
for a code reviewer to notice.
This divergence is currently latent in production due to
tech-debt #35 (A2A loops never share a partition with anything
because of a separate variable-vs-element-level stock-name
mismatch in `partition_for_loop`), but lands the fix now so
that resolving #35 doesn't immediately introduce silent FFI/
layout disagreement.
Code ReviewI reviewed the engine fix (LTM dep-stub in I did not find any bugs introduced by this PR. Notable points I checked:
Overall correctness verdictCorrect. The patch is free of blocking issues and existing tests are extended (not weakened) at every layer. The two pre-existing tests that were relaxed ( |
Summary
Adds per-element relative loop score access to the libsimlin FFI and pysimlin: callers can now pass
r1[Boston],r1[2], orr1[Boston, 2]to address a specific slot of an arrayed (Apply-to-All) loop. Barer1on an arrayed loop now returns a signed argmax-abs aggregation across slots so single-line "this loop's importance" plots show the dominant element's contribution rather than silently reading slot 0.Built on a separately-tracked engine bug fix (tech-debt #34): A2A
loop_scorevariables were broadcasting slot 0 across every slot becausecompile_ltm_equation_fragmentstubbed LTM-var-to-LTM-var dependencies as scalar, defeating per-element evaluation regardless of how the FFI consumer wired up subscripts. Without this fix, the per-element FFI would have shipped numerically identical values for every element.Closes #463.
Commits
engine: fix A2A loop_score scalar broadcast— root-cause fix incompile_ltm_equation_fragmentso A2A loop_scores actually emit per-element values. Resolves tech-debt accidentally quadratic:module_deps#34 (filed during this work after the bite check on the layout aggregation test surfaced that all per-element rel scores were identical).ltm: per-element relative loop scores via subscripted IDs— engine helpers (LoopElementIndex+ resolver, per-element streaming helpers, argmax-abs aggregator), libsimlin SimState snapshot + cache + dispatch, layoutcompute_metadatarewire, pysimlinelement=kwarg.build: ignore .claude/scheduled_tasks.lock— small.gitignorehousekeeping.ltm: fix per-element rel-score stride and helper OOB— addresses the codex-cli review on this PR (P1: layout was using the loop's ownn_slotsto derive stride andn_stepsfrom the helper's output, but the helper writes at the partition's max-slot stride; for mixed partitions this produced misalignedimportance_seriesof lengthstep_count * stride). Plus a latent OOB-read incompute_rel_loop_scores_per_elementitself surfaced during investigation: whenn_slots[i] < max_slotsfor an arrayed loop, the inner read OOB-read into the next variable's data. Both fixes centred on per-loopn_slotsbeing the canonical bound; newaggregate_per_element_argmax_abshelper extracts the layout aggregation as a unit-testable pure function. Filed tech-debt unit checking #35 to track an adjacent engine quirk (A2A loops getpartition = Nonedue to a variable-level vs element-level stock-name mismatch inpartition_for_loop).Test plan
cargo test --workspace— all suites green (3,161 engine unit + 33 layout integration + 47 LTM integration + 16 libsimlin analysis + ~200 libsimlin total + 340 pysimlin tests).cargo clippy --workspace --all-targets— clean.cargo fmt --check— clean.cd src/pysimlin && uv run pytest tests/— 340 passed, 11 skipped.LoopElementIndexbuilder & resolver (10), per-element streaming helpers (5), argmax-abs aggregator (5), subscript parser (13), libsimlin FFI dispatch (5), layout argmax-abs contract + bite check, pysimlin element kwarg + element_count + end-to-end arrayed access (3), helper OOB regression with sentinel-value fixture (1), aggregation helper unit tests including scalar-in-mixed-partition stride recovery (5), and an integration regression for layout importance_series length (1).Notes for reviewers
compiler/dimensions.rsalready documents. The resolver's offset arithmetic islinear = i_0 * s_1 + i_1for 2D and the equivalent fold for ND.aggregate_per_element_argmax_abshelper that recovers stride from the helper's actual output length; the helper itself is also hardened against an OOB-read that the codex review caused us to notice.module_deps#34 (resolved) and unit checking #35 (filed) indocs/tech-debt.mddocument the engine-side discoveries from this work for future reference.