Context
PR #599's R4 added a dense fast-path to RuntimeView::flat_offset (skipping the per-element sparse_lookup SmallVec when a view has no sparse mappings), reducing flat_offset from ~18.9% to ~8.6% of the C-LEARN run. The residual ~8.6% is the actual per-element offset arithmetic (offset + Σ idx[i] * strides[i]).
The BeginIter non-contiguous precompute walks indices sequentially (increment_indices) and calls flat_offset per element, recomputing the full sum each step even though consecutive offsets differ only by a stride add (plus a carry correction when an index wraps).
Idea
For sequential iteration, step the flat offset incrementally (carry-propagated stride add) instead of recomputing Σ idx*strides per element. Scope to the sequential-iteration paths (the BeginIter precompute; contiguous-but-transposed views); the vector-op call sites (VECTOR SELECT / ELM MAP / SORT ORDER) use arbitrary computed indices and are not stride-steppable.
Flagged as a follow-up by the R4 implementor. Bit-preserving.
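A minimal sketch of the idea, not the engine's code: `View`, `offsets_recompute`, and `offsets_incremental` are hypothetical names, and the `increment_indices` here is a simplified stand-in that assumes a row-major odometer (last axis fastest). The real RuntimeView has more fields and a sparse path that this ignores. The first function recomputes the full sum per element; the second carries a running offset, adding one stride per step and subtracting `strides[d] * dims[d]` whenever axis `d` wraps.

```rust
/// Hypothetical stand-in for a dense strided view (not the real RuntimeView).
struct View {
    offset: usize,
    dims: Vec<usize>,
    strides: Vec<isize>,
}

impl View {
    /// Full recompute: offset + sum(idx[i] * strides[i]).
    fn flat_offset(&self, idx: &[usize]) -> usize {
        let mut off = self.offset as isize;
        for (i, s) in idx.iter().zip(&self.strides) {
            off += *i as isize * *s;
        }
        off as usize
    }
}

/// Odometer-style increment, last axis fastest (assumed iteration order).
/// Returns false once the indices wrap back to all zeros.
fn increment_indices(idx: &mut [usize], dims: &[usize]) -> bool {
    for d in (0..dims.len()).rev() {
        idx[d] += 1;
        if idx[d] < dims[d] {
            return true;
        }
        idx[d] = 0;
    }
    false
}

/// Baseline: call flat_offset for every element.
fn offsets_recompute(v: &View) -> Vec<usize> {
    let total: usize = v.dims.iter().product();
    let mut out = Vec::with_capacity(total);
    if total == 0 {
        return out;
    }
    let mut idx = vec![0usize; v.dims.len()];
    loop {
        out.push(v.flat_offset(&idx));
        if !increment_indices(&mut idx, &v.dims) {
            break;
        }
    }
    out
}

/// Incremental: keep a running offset and update it alongside the indices.
fn offsets_incremental(v: &View) -> Vec<usize> {
    let total: usize = v.dims.iter().product();
    let mut out = Vec::with_capacity(total);
    if total == 0 {
        return out;
    }
    let mut idx = vec![0usize; v.dims.len()];
    let mut off = v.offset as isize;
    loop {
        out.push(off as usize);
        let mut advanced = false;
        for d in (0..v.dims.len()).rev() {
            idx[d] += 1;
            off += v.strides[d];
            if idx[d] < v.dims[d] {
                advanced = true;
                break;
            }
            // Axis d wrapped: undo its dims[d] steps, then carry into axis d-1.
            off -= v.strides[d] * v.dims[d] as isize;
            idx[d] = 0;
        }
        if !advanced {
            break;
        }
    }
    out
}
```

For a transposed view of a 2x3 row-major array (`dims = [3, 2]`, `strides = [1, 3]`, `offset = 0`), both functions yield `[0, 3, 1, 4, 2, 5]`. Because the arithmetic is all integer, the incremental path produces exactly the same offsets as the recompute path, which is what makes the change bit-preserving in this sketch.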
Expected impact
A fraction of the residual ~8.6% (the sequential-iteration portion), so a modest gain rather than a step change.
Refs
src/simlin-engine/src/bytecode.rs — RuntimeView::flat_offset, RuntimeView::offset_for_iter_index.
src/simlin-engine/src/vm.rs — the BeginIter flat-offset precompute and increment_indices.