Summary
Investigate whether la-stack can recover more vector-kernel performance in v0.4.3 without weakening the parse-don't-validate invariant model introduced in v0.4.2.
Current State
The v0.4.2 finite-proof changes correctly parse public Matrix and Vector storage into private proof-bearing wrappers before computation. This improved correctness, but benchmarks show some small fixed-size kernels, especially dot product, squared norm, and norm-like operations, remain slower than nalgebra/faer.
The LU regression was mostly recovered by moving non-finite factor checks out of cubic update loops while still validating completed factor storage before factors escape. Vector kernels may have a similar correctness-preserving optimization opportunity: keep public methods checked, but avoid unnecessary duplicate passes or copies once proof-bearing values exist.
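As a rough illustration of the check-hoisting idea described above, the sketch below runs an elimination loop with no per-element finiteness checks and validates the completed storage once before it is returned. `KernelError` and `eliminate_then_validate` are hypothetical names, the loop omits pivoting, and nothing here is taken from la-stack's actual LU code.

```rust
/// Hypothetical error type for this sketch; la-stack's real error enum may differ.
#[derive(Debug, PartialEq)]
pub enum KernelError {
    NonFinite,
}

/// Eliminates below the diagonal without checking finiteness inside the
/// cubic loop, then validates the finished storage in one O(D^2) pass.
/// Illustration only: no pivoting, factors are stored in place.
pub fn eliminate_then_validate<const D: usize>(
    mut a: [[f64; D]; D],
) -> Result<[[f64; D]; D], KernelError> {
    for k in 0..D {
        for i in (k + 1)..D {
            // A zero pivot yields an infinite or NaN factor here; it is
            // deliberately not checked until the final pass.
            let factor = a[i][k] / a[k][k];
            a[i][k] = factor;
            for j in (k + 1)..D {
                a[i][j] -= factor * a[k][j];
            }
        }
    }

    // Single validation pass before the storage escapes to the caller.
    if a.iter().flatten().all(|x| x.is_finite()) {
        Ok(a)
    } else {
        Err(KernelError::NonFinite)
    }
}
```

In this sketch a stored non-finite entry is never overwritten by a finite one (subtraction and division only turn infinities into infinities or NaN), so the deferred pass still catches it. Whether the same argument holds for a given vector kernel would need to be checked per kernel.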
Investigation Targets
- Measure dot product, squared norm, infinity norm, solve_from_lu, and lu_solve separately against current nalgebra/faer baselines.
- Separate the cost of the parse-boundary finiteness checks from arithmetic cost for small const-generic dimensions.
- Check whether Vector::dot and Vector::norm2_sq can combine parsing and computation more tightly without exposing invalid internal state.
- Add cold-path hints to vector-kernel overflow exits where that matches the existing Matrix/LU/LDLT error-path style.
- Compare public Vector::dot / Vector::norm2_sq against proof-bearing internal FiniteVector paths so the benchmark separates validation cost from arithmetic cost.
- Check whether matrix infinity-norm and symmetry paths have similar duplicate validation or avoidable pass costs before adding any helper abstraction.
- Preserve private proof-bearing wrappers and crate-private unchecked constructors; do not make non-finite storage observable or silently propagated.
- Consider additive internal-only fast paths over FiniteVector where proof already exists.
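Several of the targets above can be pictured together in one minimal sketch: a checked entry point that fuses input validation with the accumulation loop, a crate-private proof-bearing path that only checks the result, and cold error constructors. All names here (`VectorError`, `FiniteVector::parse`, `dot_checked`, `non_finite_input`, `overflow`) are stand-ins chosen for illustration; la-stack's real types and signatures may differ.

```rust
/// Hypothetical typed error for this sketch.
#[derive(Debug, PartialEq)]
pub enum VectorError {
    NonFiniteInput,
    Overflow,
}

// Error exits are marked cold so the hot loop stays compact.
#[cold]
#[inline(never)]
fn non_finite_input() -> VectorError {
    VectorError::NonFiniteInput
}

#[cold]
#[inline(never)]
fn overflow() -> VectorError {
    VectorError::Overflow
}

/// Stand-in for a private proof-bearing wrapper: every element is finite.
pub(crate) struct FiniteVector<const D: usize>([f64; D]);

impl<const D: usize> FiniteVector<D> {
    /// Parses raw storage once; the proof is carried by the type afterwards.
    pub(crate) fn parse(raw: [f64; D]) -> Result<Self, VectorError> {
        if raw.iter().all(|x| x.is_finite()) {
            Ok(Self(raw))
        } else {
            Err(non_finite_input())
        }
    }

    /// Proof path: inputs are known finite, so only the result is checked.
    pub(crate) fn dot(&self, other: &Self) -> Result<f64, VectorError> {
        let mut acc = 0.0;
        for i in 0..D {
            acc += self.0[i] * other.0[i];
        }
        if acc.is_finite() { Ok(acc) } else { Err(overflow()) }
    }
}

/// Checked entry point that validates and accumulates in a single pass,
/// without building an intermediate copy. Invalid input is still rejected
/// with a typed error before any result is returned.
pub fn dot_checked<const D: usize>(a: &[f64; D], b: &[f64; D]) -> Result<f64, VectorError> {
    let mut acc = 0.0;
    let mut inputs_finite = true;
    for i in 0..D {
        inputs_finite &= a[i].is_finite() & b[i].is_finite();
        acc += a[i] * b[i];
    }
    if !inputs_finite {
        return Err(non_finite_input());
    }
    if !acc.is_finite() {
        return Err(overflow());
    }
    Ok(acc)
}
```

Benchmarking something like `dot_checked` on raw arrays against `FiniteVector::dot` on pre-parsed values would give the validation-versus-arithmetic split the targets ask for. Whether the fused loop actually beats a separate parse pass at small `D` is exactly what the measurements would have to show.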
Stretch Targets
- Investigate whether det_sign_exact can avoid duplicate fast-filter work between det_direct() and det_errbound() for D <= 4.
- Only pursue determinant fast-filter sharing if Criterion shows it matters; split it into a separate issue if the formula consolidation becomes non-trivial.
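To make the duplicate-work concern concrete, here is a hedged sketch for the 2x2 case only: the two products are computed once and reused for both the determinant estimate and its error bound. `DetEstimate`, `det2_with_bound`, `det2_sign_filtered`, and the constant `ERR_COEFF_2X2` are hypothetical; the constant is an assumed conservative value that ignores underflow, not la-stack's actual bound, and the real `det_direct()` / `det_errbound()` formulas may be organized differently.

```rust
/// Hypothetical combined result: determinant estimate plus an absolute error bound.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DetEstimate {
    pub det: f64,
    pub errbound: f64,
}

/// Assumed conservative coefficient for this sketch: 3u with u = EPSILON / 2.
/// Underflow is not accounted for.
const ERR_COEFF_2X2: f64 = 1.5 * f64::EPSILON;

/// Computes the 2x2 determinant and its error bound from the same two products,
/// so neither product is evaluated twice.
pub fn det2_with_bound(m: [[f64; 2]; 2]) -> DetEstimate {
    let p = m[0][0] * m[1][1];
    let q = m[0][1] * m[1][0];
    DetEstimate {
        det: p - q,
        errbound: ERR_COEFF_2X2 * (p.abs() + q.abs()),
    }
}

/// Fast filter: returns the sign when the estimate clears the bound,
/// or `None` when an exact fallback would be needed.
pub fn det2_sign_filtered(m: [[f64; 2]; 2]) -> Option<i8> {
    let est = det2_with_bound(m);
    if est.det > est.errbound {
        Some(1)
    } else if est.det < -est.errbound {
        Some(-1)
    } else {
        None
    }
}
```

For D = 3 and D = 4 the shared subexpressions are larger, which is where consolidation could become non-trivial.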
Acceptance Criteria
Non-Goals
- Do not roll back parse-don't-validate correctness.
- Do not expose FiniteVector, FiniteMatrix, or other proof wrappers publicly without a separate API design decision.
- Do not trade typed error behavior for benchmark wins.
- Do not add generic_const_exprs-dependent APIs or dimension-specialized trait machinery for v0.4.3.