(fix+refac): Cell-local NaN + Tq carrier through deriv-zero paths by mgyoo86 · Pull Request #147 · ProjectTorreyPines/FastInterpolations.jl

mgyoo86 · 2026-05-27T21:43:17Z

Summary

Threads the query carrier (ForwardDiff.Dual, Measurement, etc.) through every deriv-zero return path across Constant / Linear / Quadratic / Cubic / Hermite-family kernels and unifies the cell-local "OOB cell = extrap's data" contract. Closes two gaps left by #146:

Carrier + NaN through deriv-zero short-circuits — deriv paths used to discard the kernel via fill!(out, zero(Tv)) (Constant ND), broadcast 0 * first(y) (Series), or type-only zero(Tz) (Hetero NoInterp mixed). All three patterns drop Tq carrier and any NaN in the queried cell.
FillExtrap × deriv semantics — master sourced deriv from y_bnd (boundary y) regardless of extrap type, so FillExtrap(NaN) deriv silently returned 0 instead of propagating NaN. The new contract treats fill_value as the OOB cell's data: derivative = 0 * fill_value (NaN propagates, finite × 0 = 0).

Size: 34 files, +1248 / -518. Single contract across all five method families.

Unified pattern

Cell-local deriv via `kernel × 0`

Every deriv-zero path routes through the kernel and multiplies by 0 after the kernel has threaded the carrier:

Site	Pattern
1D kernels	`_constant_kernel(op, ...)` / `_linear_kernel(op, ...)` etc. — single dispatch handles value/deriv with `* one(dL)` or `* one(α)` carrier thread
ND Constant batch	`_constant_nd_kernel(...) * 0` — kernel threads `* one(dL_d)` per axis, outer `* 0` zeros value while preserving carrier (IEEE: `NaN × 0 = NaN`, `Dual × 0 = Dual(0, 0)`)
Hetero mixed	`_eval_hetero_nd_cell(...) * 0` — Real-axis kernel runs, NoInterp-axis deriv applies `* 0` at end (no more `zero(Tz)` early return)

Series boundary helpers (_constant_extrap_boundary_value, _fill_constant_extrap_simd!) now thread * one(aq.xq) for Tq carrier via signature change — aq.xq is the common field across all four anchored-query types.

`_extrap_oob_data` dispatch — extrap-aware OOB data

Single helper centralizes the per-extrap "what data sits in the OOB cell" decision:

@inline _extrap_oob_data(::ClampExtrap, y_bnd) = y_bnd        # boundary y extends into OOB
@inline _extrap_oob_data(e::FillExtrap, _)     = e.fill_value # fill_value IS the OOB data

_eval_extrapolation collapses both EvalValue and DerivOp paths onto a parallel data → promote shape:

EvalValue → _promote_extrap_val(_extrap_oob_data(ext, y_bnd), xq)   # data + carrier
DerivOp   → _promote_extrap_zero(_extrap_oob_data(ext, y_bnd), xq)  # 0 × data + carrier

The shared scaffolding makes the deriv path's intent obvious — the helper fetches the cell's data; the 0 * happens inside _promote_extrap_zero. Same dispatch threads through ND _fill_extrap_result and Series boundary helpers. ClampExtrap behavior unchanged; FillExtrap deriv now sources from fill_value, consistently across 1D / ND / Series / Hetero mixed.

Behavior changes

Deriv preserves Tq carrier on every path: Dual query → Dual result for any DerivOp(n) ≥ 1, including n beyond the polynomial degree.
Cell-local NaN propagates through deriv-zero: Constant ND, Constant Series (in-domain + OOB), Hetero NoInterp (all-NoInterp + mixed Real+NoInterp) — NaN in the queried cell now reaches the result; NaN elsewhere stays hidden.
FillExtrap × deriv adopts cell-local "fill_value-as-data" semantics: FillExtrap(c) at OOB now treats c as the OOB cell's data → deriv = 0 * c. Finite c returns 0 (no change); NaN (or other non-finite) c propagates through IEEE multiplication. Applies uniformly to 1D (all 7 methods), ND (Constant/Linear/etc.), Series, and Hetero mixed paths. Master inconsistently sourced from y_bnd regardless of extrap type, silently absorbing NaN fill_value sentinels.
NoExtrap anchored-query right-edge bug: _constant_anchor_dispatch(NoExtrap) at aq.xq == last(itp.x) used zero(T) * one(aq.xq), dropping Tg carrier and stripping NaN. Now delegates to the shared cell-local _constant_eval_at_anchor path.
Linear ND _linear_weight(::EvalDeriv1) Tq gap (pre-existing, master): _linear_weight(::EvalDeriv1, α, inv_h, ::Val{B}) = ±inv_h returned Tg only — for a Dual query on EvalDeriv1 axes that were the only Tq source (e.g. (D1, D1) mixed partial, or (EvalValue, D1) with axis-2 Dual), the multilinear sum dropped the carrier and the batch path failed with TypeError. Threading via * one(α) mirrors the EvalDeriv2+ pattern; LLVM const-folds the 1.0 factor on plain Float queries.
Hetero mixed OOB FillExtrap × NoInterp deriv (pre-existing, master): The _eval_nointerp OOB short-circuit ran before the NoInterp-deriv check, so any FillExtrap(c) returned c even when a NoInterp axis carried non-zero deriv (mathematically must be 0). Now oob * 0 applies when NoInterp deriv is present — finite c → 0, NaN c propagates per the new contract.

Performance

All numbers M1 Pro, Julia 1.12.6. Branch vs master.

Value path (no regression):

Scenario	Master	Branch	Δ
Constant Series in-domain batch (50q × 1k series)	62.4 μs	61.6 μs	-1.3%
Constant Series OOB Clamp batch	43.7 μs	44.5 μs	+1.8%
Constant Series scalar in-domain	124 ns	123 ns	≈
Linear/Cubic Series scalar OOB Clamp	102 ns	102 ns	≈
Constant ND 2D persist value (Nq=10k)	107.6 μs	107.6 μs	≈

Deriv path (now equal to value path on branch):

Scenario	Branch deriv	Branch value	Master deriv
Constant Series scalar in-domain	125 ns	123 ns	126 ns
Constant Series scalar OOB Clamp	103 ns	100 ns	88 ns
Constant ND 2D persist (Nq=1k)	11.04 μs	10.67 μs	81 ns
Constant ND 2D persist (Nq=10k)	111.0 μs	107.6 μs	680 ns

The two highlighted ND rows are the only material slowdown. Master's fill!(out, zero(Tv)) shortcut was effectively a memset that bypassed all computation — fast precisely because it dropped carrier + NaN. The branch makes deriv pay the same cost as value (cell-local kernel + * 0); the cost is bounded by the value path, which is the structural ceiling.

Series deriv paths slow ~14-17% (broadcast 0 * first(y) → per-k cell-local 0 * y[idx, k] * one(aq.xq)) — same rationale; SIMD inner loop still hoists one(aq.xq) outside the loop body.

_extrap_oob_data dispatch is @inline with singleton dispatch — LLVM specializes per concrete extrap type and dead-branch-eliminates. Zero overhead on OOB cold paths; in-domain paths never reach the helper.

Adds `test/test_carrier_type_stability.jl` with four `@testitem` blocks that combine `@inferred` and `isa T` into single assertions, locking both type-stability and carrier-shape in one place. - Cat A: `@inferred` across deriv-aware paths (mostly broken pin for 1D scalar Float-drop; Phase 2 will close). - Cat B: Dual carrier preservation through sub-zero deriv paths (broken pin — 1D scalar Float-drop, Phase 2). - Cat C: NaN propagation through Dual partials on ND deriv-zero short-circuits — currently RED for in-place + persist-allocating paths (Constant ND). - Cat D: scalar / allocating / in-place / persistent path cross-equivalence (currently RED — bleeds from Cat C). Created as the RED phase for the carrier-aware deriv-zero helper refactor; the next commit drives Cat C/D to GREEN. Cat A/B broken pins are tracked for Phase 2.

Three ND Constant deriv-zero short-circuits previously diverged from the carrier-propagation contract the scalar / one-shot allocating paths already honor: - `interpolant_protocol.jl:249` filled with `zero(eltype(output))`, dropping both the data's NaN/Inf and the query's carrier shape. - `constant_nd_oneshot.jl:237` filled with `0 * first(data)`, preserving NaN value but losing the carrier-shape lift (Dual partials collapsed to 0). - `interpolant_protocol.jl:186` was hand-coded with the right pattern but duplicated across call sites. Adds `_deriv_zero_value(::Type{T_out}, sample_data, sample_query)` in `src/core/utils.jl` with `Real` / `Tuple` dispatch: zero(T_out) + 0 * sample_data * <prod-folded-q> Three layers split the work: - `zero(T_out)` pins the kernel's promotion lattice. `T_out` comes from `_output_eltype(itp, Tq)` (or the method-trait route for one-shot, no-`itp` callers), so it carries `Tg`-aware widening that the multiplicative body alone cannot reach. For the Constant lattice (`Tv * Tq`) this term is identity and LLVM removes it. - `sample_data` carries `Tv` and any NaN/Inf pattern; the leading `0 *` annihilates value while IEEE keeps the NaN bits live. - `sample_query` (the same scalar/tuple the caller already holds from `_extract_query_point`) supplies `Tq` and threads NaN into every hidden carrier slot (Dual partials, Measurement uncertainty, …) via the product rule. All three deriv-zero short-circuits route through the new helper. Closes 5 NaN-partials failures + 3 cross-path equivalence failures in `test/test_carrier_type_stability.jl` (Cat C + Cat D, ND Constant). The arithmetic-lattice methods (Linear/Cubic/Quadratic/Hermite) will inherit the helper without further changes when Phase 2 audits each method's own one-shot in-place branch.

The `_zero_ref` name was misleading — the trait does not return a "zero value" but rather a representative `Tv` element used as a sample in the deriv-zero short-circuit (`first(itp.data)` for Constant/Linear, `first(itp.nodal_derivs.partials)` for Cubic/Quadratic, similar for Hetero). Callers always multiply this by `0`, so the actual value is irrelevant — only its type and any NaN/Inf pattern matter. `_value_sample` better captures the trait's role: "a sample of the value type Tv". Pure rename across 5 definitions + 11 use sites in 7 files; no behavior change.

Aligns the trait name with the helper's argument name (`sample_data` in `_deriv_zero_value(::Type{T_out}, sample_data, sample_query)`), making call sites read `_deriv_zero_value(T_out, _sample_data(itp), first_q)` — same token on both sides of the assignment. Pure rename; no behavior change.

…y` → `sampled_data` / `sampled_query` Disambiguates the helper's parameter names from the `_sample_data(itp)` trait introduced in b50f56e. Past-participle form ("already sampled") distinguishes the *result* (helper argument) from the *extraction function* (trait), reducing visual noise at call sites. Pure rename; no behavior change.

…e(α)` The EvalDeriv1/2/3 and generic DerivOp{N} branches of `_linear_kernel` computed slope/zero results that ignored the query parameter `α`, dropping Dual carrier on the 1D scalar path. Multiplying by `one(α)` threads the query carrier (Dual partials, Measurement uncertainty, …) through the kernel output; for plain `Real` `α` LLVM const-folds the `1.0` factor away, so perf is unchanged. Promotes 5 broken pins in `test_carrier_type_stability.jl`: - Cat A 1D oneshot/persist scalar @inferred linear deriv=2 - Cat B 1D oneshot scalar Linear DerivOp(1) Dual return - Cat B 1D persist scalar Linear DerivOp(1) Dual return - Cat B 1D oneshot scalar Linear DerivOp(2) Dual return Perf-neutral (within ±2% noise across 1D scalar / batch / 2D / persistent paths, zero allocations preserved): 1D scalar baseline 5-7 ns → post-fix 5-7 ns 1D batch alloc baseline 42-57 ns → post-fix 41-58 ns 2D scalar baseline 14 ns → post-fix 14 ns Cubic/Quadratic/PCHIP/Cardinal/Akima still pending (same kernel pattern, to be applied in follow-up commits).

…(dL)` EvalDeriv3 and the generic DerivOp{N≥4} branches of `_cubic_kernel` ignored the query offset `dL`/`dR` (both underscored), dropping Dual carrier on the 1D scalar path. Naming `dL` in the signature and multiplying the result by `one(dL)` threads the query carrier through; EvalValue/EvalDeriv1/EvalDeriv2 already use `dL`/`dR` in their formulas so they were carrier-aware by construction. Promotes 5 broken pins in `test_carrier_type_stability.jl`: - Cat A 1D oneshot/persist scalar @inferred cubic deriv=4 - Cat B 1D oneshot scalar Cubic DerivOp(3) Dual return - Cat B 1D persist scalar Cubic DerivOp(3) Dual return - Cat B 1D oneshot scalar Cubic DerivOp(4) Dual return Perf-neutral (1D scalar 74-83 ns oneshot, 4-7 ns persistent, 0 allocs). PCHIP/Cardinal/Akima/Hermite still pending — same pattern.

… one(dL)` EvalDeriv2/3 and the generic DerivOp{N≥3} branches of `_quadratic_kernel` underscored the `dL` argument and returned values that ignored the query, dropping Dual carrier on the 1D scalar path. Naming `dL` and multiplying by `one(dL)` threads the query carrier through. EvalValue/EvalDeriv1 already use `dL` in their formulas so they were carrier-aware. Promotes 1 broken pin in `test_carrier_type_stability.jl`: - Cat B 1D oneshot scalar Quadratic DerivOp(2) Dual return PCHIP/Cardinal/Akima/Hermite still pending.

…ne(dL)` `_hermite_kernel_1d` is shared by all Hermite-family methods (PCHIP / Cardinal / Akima / Hermite). EvalDeriv3 ignored `dL` (third derivative is constant) and DerivOp{N≥4} had it underscored, dropping Dual carrier on the 1D scalar path. EvalValue/EvalDeriv1/EvalDeriv2 already thread `dL` through `t = dL * inv_h` so they were carrier-aware. Promotes 5 broken pins in `test_carrier_type_stability.jl`: - Cat A 1D oneshot scalar @inferred PCHIP/Cardinal/Akima deriv=4 - Cat B 1D oneshot scalar PCHIP DerivOp(3) Dual return - Cat B 1D oneshot scalar PCHIP DerivOp(4) Dual return Single kernel fix → all 4 Hermite-family methods carrier-aware on the 1D scalar deriv path. Regression sweep (test_pchip / test_cardinal / test_akima / test_hermite) all green; perf neutral.

…circuits Replace the Phase 1 protocol-level deriv-zero short-circuits (which used `first(data)` as a global sample, conflating NaN policy across the whole array) with a cell-local kernel-native mechanism. The kernel's own `zero(α)` / per-corner weight × 0 flow already produces a carrier-aware zero from the cell-local data corners — protocol short-circuits were both unnecessary and overly conservative. Constant ND (kernel only handled EvalValue): • Add multi-dispatch wrapper `_constant_nd_evaluate`: ::NTuple{N, EvalValue} → forward to `_constant_nd_kernel` ::NTuple{N, AbstractEvalOp} → `_constant_nd_kernel(...) * 0` Cell-local kernel result carries NaN/Inf via IEEE `NaN * 0 = NaN`; Tq carrier rides on the kernel's per-axis `* one(dL_d)`. • Thread `ops::NTuple{N, AbstractEvalOp}` through the oneshot stack (`_constant_interp_nd_oneshot{,_batch,_batch!}`, `_constant_nd_batch_dispatch{!,}`). Linear ND (kernel already handled deriv ≥ 2 via `_linear_weight(::EvalDeriv2+, …) = zero(α)`): • Drop the `_eval_at_cell` short-circuit that returned `0 * first(itp.data)` and dropped non-axis-1 Tq carrier on heterogeneous queries. • Kernel's existing per-corner `data[corner] * weights_product` (with one factor being `zero(α)`) produces the cell-local carrier-aware zero naturally. Removed dead helpers / traits: • `_deriv_zero_value`, `_deriv_zero_fill` + Constant/Linear/Hetero specializations • `_is_any_deriv` (Constant ND oneshot) • Protocol-level early returns in `_eval_nd_at_point` and the ND in-place batch loop in `interpolant_protocol.jl` Tests (`test/test_carrier_type_stability.jl`): • Cat A (@inferred type stability) + Cat B (sub-zero deriv carrier) unchanged. • Cat C/D restructured around *cell-local* NaN: a NaN at a cell corner the query reads must propagate; a NaN outside the query's cell must NOT. • Scope note: cell-local applies to kernel-local methods (Constant, Linear, Hermite-family). Cubic/Quadratic perform a global tridiagonal solve at build, so a single NaN oxidizes every coefficient — by-design global, not covered by this contract.

…er case) Mirror the `AbstractArray` overload's NaN-propagating pattern on the `Number` overload — Array case already does `0 .* val .+ …`, but Number case did `zero(xq) * zero(val)`, stripping `val`'s NaN (cell-local boundary data) at every OOB ClampExtrap/FillExtrap deriv query. Path: `_eval_extrapolation(::DerivOp, y_bnd, ::ClampExtrap, xq)` / `(::FillExtrap, xq)` → `_promote_extrap_zero(y_bnd, xq)`. `y_bnd` is the cell-local boundary value (`first(y)` for OOB_LEFT, `last(y)` for OOB_RIGHT) — its NaN carrier must propagate to match the in-domain kernel's `0 * y_left * one(dL)` behavior and the Array overload's existing `0 .* val .+ …` form. Affects all 7 methods (Linear / Constant / Cubic / Quadratic / Pchip / Cardinal / Akima) uniformly via the shared helper. Tests (`test/test_carrier_type_stability.jl` — new Cat E): - Helper-level Number vs Array consistency - 1D OOB ClampExtrap × deriv × 5 cell-local methods × {OOB_LEFT, OOB_RIGHT} - 1D OOB FillExtrap × deriv × 5 cell-local methods × {OOB_LEFT, OOB_RIGHT} - Cell-local invariant: NaN at the OTHER boundary stays hidden - Carrier preservation: Dual xq + NaN val → Dual{NaN, 0} (matches `_promote_extrap_val`) 22 RED tests, then GREEN after the 1-line fix. Full regression sweep clean (10458 lines, 96% coverage, all tests passed).

Cat A pins `@inferred` on in-domain × deriv (NoExtrap default) paths; Cat E exercises a new path — OOB × ClampExtrap/FillExtrap × deriv through `_eval_extrapolation` → `_promote_extrap_zero` — which Cat A doesn't cover. Add `@inferred` at the helper level for both overloads (Number, Array, Dual carrier) so the recent fix's type stability is locked against future regressions.

…l paths Phase 3 + Bundle A established the cell-local NaN policy at the ND kernel and OOB extrap helper layers; this commit closes the remaining Constant-family gaps where deriv-zero branches used `0 * first(y)` (non-cell-local) while their EvalValue siblings already used cell-local indices like `last(y)` / `y[aq.idxR]` / `y[n_pts, k]` / `y_point[k, idx]`. Sites (4 files, 11 lines): - `constant_oneshot.jl` (2): InBounds + WrapExtrap right-edge seam. `0 * first(y) * one(xi)` → `0 * last(y) * one(xi)` — cell-local right endpoint, mirrors EvalValue branch. - `constant_anchor.jl` (3): NoExtrap, AbstractExtrap default, ClampOrFill in-domain at `aq.xq == x_last`. `0 * first(y) * one(aq.xq)` → `@inbounds 0 * y[aq.idxR] * one(aq.xq)` — symmetric with EvalValue. - `constant_series_interp.jl` (3): scalar series right-boundary (`_eval_constant_series_point!`) + anchored right-boundary (`_eval_constant_series_with_extrap`) + anchored in-domain deriv (`_eval_constant_series_anchored`). Per-series loops use `y_point[k, n_pts]` / `y[n_pts, k]` / `y[aq.idxL, k]` instead of `first(y)` — keeps per-series NaN cell-local. - `series_utils.jl` (4): `_constant_extrap_boundary_value` (layout `[time, series]`) + `_fill_constant_extrap_simd!` (layout `[series, time]`, named `y_point`). Both deriv overloads now consume the previously-unused `side` / `n_pts` / `k` args via `_boundary_point_index` for per-series cell-local zero. Tests (`test_carrier_type_stability.jl` Cat F, 30 assertions): - 1D oneshot scalar — right-edge × {NoExtrap, ClampExtrap, WrapExtrap} - 1D persistent scalar — right-edge × {NoExtrap, ClampExtrap, FillExtrap} - Series batch in-domain — per-series cell-local at NaN-bearing cell - Series scalar right-boundary — per-series cell-local at right edge - Series scalar/batch OOB ClampExtrap — per-series cell-local via fills - Cell-local invariant: NaN at the OTHER side stays hidden (13 asserts) 17 RED before fix → 30 GREEN after. Targeted regression (test_carrier_ type_stability + 7 Constant/Series/extrap test files) all pass.

NoInterp axes are structurally lookup-only (no math, like Constant). When a non-zero DerivOp targets a NoInterp axis, the existing short- circuits returned `zero(Tz)` of the promoted type — dropping any cell- local NaN in the queried data slice. Mirror the Constant Phase 3 pattern by multiplying the cell-local data by `zero(Tz)`: the sliced data carries NaN, `zero(Tz)` enforces type promotion; together they yield a type- correct, NaN-aware deriv-zero. Sites fixed (2 of 3): - `_eval_nointerp` line ~282 (all-NoInterp persistent, `N_r == 0`): hoist `data_expr` (covers both `_HeteroPartials` and raw `Array` storage), wrap deriv guard with `$data_expr * zero($Tz_expr)`. - `_interp_nointerp_oneshot` line ~475 (all-GridIdx oneshot, `grids_r === ()`): `data_r[] * zero(_output_eltype(...))`. Scope exception: `_interp_nointerp_oneshot` line ~489 (mixed Real + NoInterp axes with deriv on NoInterp) still returns plain `zero(Tz)` — strict cell-local would require running the Real-axis interp first (potential doubled work for global-solve methods); deferred as a separate follow-up. Tests (`test_carrier_type_stability.jl` Cat G, 4 assertions): - All-NoInterp persistent — cell-local NaN propagates + out-of-cell stays hidden - All-GridIdx oneshot — cell-local NaN propagates + out-of-cell stays hidden 2 RED before fix → 4 GREEN after. Targeted regression (test_carrier_ type_stability + test_nointerp) all pass.

…ttern for perf Bundle B+C's per-series cell-local NaN propagation in Constant Series deriv-zero paths is reverted to master's broadcast `0 * first(y)` pattern. The cell-local indexed loads add ~1 ns per call on the batch series eval inner loop (called K × Q times — 50k iterations on a 1000-series × 50- query batch), measuring **+49% per-call cost** (21.6 → 32.2 μs) on the benchmark scenario. Reverted sites (4): - `series_utils.jl` — `_constant_extrap_boundary_value` (2 overloads) - `constant_series_interp.jl` — `_eval_constant_series_point!` right-boundary deriv (scalar in-domain edge) - `constant_series_interp.jl` — `_eval_constant_series_with_extrap` `aq.xq == x_max` deriv branch (anchor-clipped OOB) ← actual perf culprit - `constant_series_interp.jl` — `_eval_constant_series_anchored` in-domain deriv branch - `series_utils.jl` — `_fill_constant_extrap_simd!` (already reverted for SIMD broadcast — kept consistent with renamed `y_point` arg) The Hetero / 1D right-edge / `_promote_extrap_zero` / NoInterp cell-local changes (Bundles A, B 1D, D) are KEPT — those sites have ≤ noise perf impact (verified via master worktree benchmark, all < 5 ns/call). Tests (`test_carrier_type_stability.jl` Cat F): - 4 series cell-local assertions marked `@test_broken` with explanatory comments — future fix path noted (NaN-presence detection at construction). - Out-of-cell invariant assertions kept as `@test` (still hold under broadcast since `first(y)` is one fixed cell). Perf verification: master 21.6 μs → branch (post-revert) 21.6 μs on the canonical bench (1000-series × 50-query Series batch OOB ClampExtrap deriv). Other paths unchanged at noise level.

Comments + test coverage cleanup before merging. Comments: - `linear_nd_eval.jl` deriv-≥2 weight: correct stale description (Phase 3 removed the protocol short-circuit; kernel handles it cell-locally). - `interpolant_protocol.jl`: replace dead `_select_output_eltype` reference with `_arithmetic_kernel_shape` / `_constant_kernel_shape`. - Series perf-trade-off comments (4 sites): trim to one-line "why broadcast". - Cat F header + `@test_broken` testset comments: drop dev-history phrasing. Test coverage: - Cat A: add `quadratic_interp` `@inferred` for oneshot scalar + persistent scalar (DerivOp(3) — beyond polynomial degree). - Cat E: extend OOB ClampExtrap/FillExtrap × deriv NaN loop to include `cubic_interp` and `quadratic_interp` (both route through `_promote_extrap_zero` at OOB, so the Bundle A fix applies). - Cat D: add Linear ND scalar ↔ in-place batch cross-path check (verifies kernel-native `_linear_weight(::EvalDeriv2+) = zero(α)`).

…anchored` "Derivatives of constant (step) function are zero" restates what the `0 * first(y)` expression already shows. The perf rationale comment is covered at `_constant_extrap_boundary_value` (the first occurrence).

Wrap plain `@test EXPR isa T` checks in duck-typing tests with `@inferred` so they pin both result type AND inference stability — same pattern as Cat A in `test_carrier_type_stability.jl`. - `test_duck_tv_dual_tq.jl`: 84 conversions (SVector/Dual Tv/Tq carriers). - `test_duck_typing_comprehensive.jl`: 66 conversions (DuckFloat-style Tv). Two test groups in `test_duck_tv_dual_tq.jl` kept as plain `isa`: - ND `ForwardDiff.gradient/hessian/derivative` (lines 306-317): AD wrappers have `Any` internal inference; `@inferred` is over-strict for these. - "Dual x grid carrier flows through Tv × Tq" (lines 266-275): some `(fn, y, xq)` combos infer `Union{Float64, Dual}` even though the runtime returns the right type. Latent type instability surfaces here.

Constant 1D right-edge short-circuit (`xi == last(x)`) returned `y[idxR] * one(xi)` — only the query carrier `Tq`. The kernel path picks up the grid carrier `Tg` via `aq.dL` / `dL = xq - xL`. For Dual grid + Float query, the two paths returned `Float` and `Dual` respectively → inference `Union{Tv, Dual}` even though runtime returns the right type. Add `* one(eltype(x))` (oneshot) / `* one(x_last)` (anchor) to thread Tg through the short-circuit. For Float grid this is a no-op (multiply by 1.0); for Dual grid it lifts to Dual. Now both branches return the same type, inference is clean. Sites (5): - `constant_anchor.jl` 3 short-circuits (NoExtrap, AbstractExtrap default, ClampOrFill in-domain). - `constant_oneshot.jl` InBounds + WrapExtrap. Test: re-promote `test_duck_tv_dual_tq.jl` "Dual x grid carrier flows through Tv × Tq" testset to `@test (@inferred …) isa T` (was reverted to plain `isa` in the previous commit; this latent inference bug was the root cause).

…l_value `_constant_interp_nd_oneshot{,_batch!}` passed a hardcoded `EvalValue()` to `_try_fill_oob`, so OOB queries under `FillExtrap` always returned the fill_value — even for deriv queries that should return zero. Phase 3 (commit 364c3e4) removed a protocol-level `_is_any_deriv` short-circuit at the public entry, which previously intercepted deriv queries before they reached `_try_fill_oob`. The hardcoded `EvalValue()` was correct under that short-circuit but became a regression once removed. Persistent path (`interpolant_protocol.jl:131,173`) correctly passes `ops` — only the oneshot path was missing the threading. Fix: pass `ops` through `_try_fill_oob` at both scalar and batch oneshot sites. Test (`test_carrier_type_stability.jl` Cat E): 4 assertions covering scalar (mixed-axis deriv) + batch oneshot + persistent sanity check.

`_linear_weight(::EvalDeriv1, α, inv_h, ::Val{B}) = ±inv_h` returned `Tg` only — for a `Dual` query whose carrier source was an `EvalDeriv1` axis (mixed partial `(D1, D1)`, or `(EvalValue, D1)` with axis-2 Dual), the multilinear corner sum dropped Tq and the batch path failed with `TypeError` (buffer eltype Dual vs kernel return Float). Fix mirrors the `EvalDeriv2+` pattern: `* one(α)` threads Tq even when `α` does not appear in the weight expression. LLVM const-folds the `1.0` factor on plain Float queries (zero cost). For Dual queries it costs one extra Dual mul per corner — necessary for AD correctness. Pre-existing on master; this PR's carrier-propagation contract surfaces the gap. Test coverage extended in Cat B with an ND counterpart of the existing 1D "non-zero deriv" testset, including mixed-partial `(D1, D1)` on Linear / Cubic ND.

…)` + per-k cell-local OOB helpers Series deriv branches used broadcast `0 * first(y)` (single-element load, SIMD-friendly) which dropped both cell-local NaN and Tq carrier — only NaN at `y[1]` could leak, regardless of which series cell was queried, and `Dual` queries returned the underlying `Tv` without carrier. This commit routes the deriv path through the value path's exact mechanism on every Series eval site: * `_eval_constant_series_anchored` and the scalar in-domain right-edge branch in `_eval_constant_series_point!` collapse to a single `_constant_kernel(op, y_left, y_right, h, dL, side)` call — value and deriv share one code path. The 1D `_constant_kernel(::EvalDeriv*)` overloads already follow the unified `0 * y * one(dL)` pattern. * Shared OOB helpers `_constant_extrap_boundary_value` and `_fill_constant_extrap_simd!` (in `core/series_utils.jl`, used by Linear/Cubic/Quadratic/Constant Series alike) take an `aq` argument and thread `* one(aq.xq)` for Tq carrier. `aq.xq` is the only common field across the four anchored-query types, so dispatch stays duck-typed. Clamp/Fill deriv overloads merged onto `::_ClampOrFill` and source their zero from boundary `y[idx, k]` (not `e.fill_value`) so a NaN-bearing `fill_value` does NOT leak through deriv while a NaN at the boundary cell still does — preserves the existing OOB FillExtrap × deriv contract. Performance: per-k cell-local indexing inside the SIMD loop replaces a single broadcast value, but `one(aq.xq)` hoists out of the inner loop so the body is one load + one mul per `k`. Series scalar deriv paths are ~14–17% slower (102 ns vs 88 ns); batch deriv is bounded by the value-path cost (now ≈ value). Cat F's four `@test_broken` assertions (Series in-domain / right-edge / OOB ClampExtrap × deriv) flip to `@test`. Direct-call tests in `test/test_series_utils.jl` updated for the new signature; bonus regression pin for FillExtrap(NaN) × deriv = 0.

… pattern When a `HeteroInterpolantND` query has `Real` axes + at least one `NoInterp` axis (mixed case) AND a non-zero deriv on a NoInterp axis, the `@generated _eval_nointerp` body short-circuited with `return zero($Tz_expr)` — a type-only zero that strips both cell-local NaN at the queried slice and any Tq carrier from the Real-axis search. Same pattern as Constant ND's `_constant_nd_kernel(...) * 0` dispatch: remove the early-return, run the Real-axis kernel (`_eval_hetero_nd_cell` / `_collapse_dims`) normally, and wrap the result with `result * 0` when any NoInterp axis carries a non-zero deriv. `result` is already carrier-threaded by the Real-axis kernel (IEEE: `0 * Dual = Dual(0, 0)`, `0 * NaN = NaN`), so the late `* 0` preserves Tq + Tg + Tv while zeroing the value. Applied to three branches of the `@generated` body (`_HeteroPartials` / OnTheFly local / OnTheFly global) and to the `_interp_nointerp_oneshot` mixed case. The runtime cost is one extra multiplication on the deriv path — Real-axis search would have been required anyway for a strict cell-local contract, so the saving from the prior early-return was conceptually unavailable. Also drops a stale "Constant Phase 3" label in the all-NoInterp comment (the pattern it references is now the regular `_constant_nd_evaluate` dispatch). Cat G extended with Mixed Linear+NoInterp persistent + oneshot cell-local NaN tests.

Sweeps the small fixes surfaced by the pre-merge code-review pass: * Constant `NoExtrap` anchored-query right-edge bug (`_constant_anchor_dispatch(NoExtrap)` at `aq.xq == last(itp.x)`): the wrapper used `zero(T) * one(aq.xq)`, dropping Tg carrier and stripping NaN at the boundary cell. Now delegates to the shared `_constant_eval_at_anchor(::NoExtrap)` path which mirrors the cell- local pattern used by the other three sibling overloads. Pinned by a new "1D anchored-query — right-edge cell-local" subtest in Cat F. * `constant_oneshot.jl`: scalar `_constant_eval_at_point` right-edge short-circuits switched from `one(xi) * one(eltype(x))` to `one(Tq) * one(Tg)` — the function signature already binds both type parameters, so use them directly. Same shape semantically, more explicit at the call site. * Stale `_sample_data` trait comments (Linear / Cubic / Quadratic ND eval): "Zero-ref for fill-value derivative computation" → "Per-method sample of `Tv` for fill-value paths (e.g. `_try_fill_oob`)" to match the wording landed for Constant ND in the trait rename series. * Stale comment in `constant_anchor.jl` describing the right-edge short-circuit pattern: now mentions both Tq and Tg threading via `one(aq.xq) * one(x_last)`. * `_promote_extrap_zero` docstring (`core/utils.jl`): broadened from "derivative of constant" to the actual current contract (carrier-aware zero under flat extrapolation, all method families). Explains why the `0 * val` prefix is structurally required for NaN propagation. * Phase label cleanup in `constant_series_interp.jl` ("Phase E.2:") and in `test_carrier_type_stability.jl` Cat F / Cat G headers (stray "Phase 3" / "Constant Phase 3" references). Removes plan-era labels that age poorly once the plan lands. * `constant_oneshot.jl` scalar entry docstring: lists `EvalValue()` + any `DerivOp(n)` (was capped at `DerivOp(2)`). Public API supports arbitrary N already. * `constant_nd_oneshot.jl` batch docstring: missing `ops` arg added to both `_constant_interp_nd_oneshot_batch!` / `..._batch` signatures. * `_constant_nd_evaluate` (`constant_nd_eval.jl`): rationale comment added explaining why this path intentionally runs the kernel + `* 0` rather than falling back to a `fill!` shortcut — guards future reverts that might be tempted by the perf delta vs master.

Replaces the hand-rolled Linear/Cubic-only Cat B ND subtests with a loop over all six ND methods (Linear D1 / Cubic D3 / Quadratic D2 / PCHIP D3 / Cardinal D3 / Akima D3) and three path families (oneshot scalar / oneshot batch / persistent scalar+batch), each covering both the per-axis `(EvalValue, Dk)` pattern and the mixed-partial `(DerivOp(1), DerivOp(1))` pattern. Helpers `_check_oneshot_scalar` / `_check_oneshot_batch` / `_check_persistent` take `fn::F` so `@inferred` can specialize per concrete method; the for-loop body would otherwise see `fn` as a Union and collapse the inference check. Coverage matrix after this commit (Cat B ND, all `@inferred + isa`): | method | oneshot scalar | oneshot batch | persistent scalar+batch | |------------|:--------------:|:-------------:|:-----------------------:| | Linear | ✓ × 2 | ✓ × 2 | ✓ × 4 | | Cubic | ✓ × 2 | ✓ × 2 | ✓ × 4 | | Quadratic | ✓ × 2 | ✓ × 2 | ✓ × 4 | | PCHIP | ✓ × 2 | ✓ × 2 | ✓ × 4 | | Cardinal | ✓ × 2 | ✓ × 2 | ✓ × 4 | | Akima | ✓ × 2 | ✓ × 2 | ✓ × 4 | Cat B grows from 14 → 57 assertions. The batch path is the most load-bearing: Linear's prior `_linear_weight(::EvalDeriv1)` Tq gap showed up there first as `TypeError` (buffer eltype `Dual` vs kernel return `Float`), which scalar-only coverage missed.

Copilot

Pull request overview

This PR threads the query carrier type (e.g. ForwardDiff.Dual, Measurement) through every derivative-zero return path across all interpolation method families, and unifies the cell-local NaN propagation contract. It closes carrier-and-NaN gaps that survived #146: previously, deriv paths in Constant ND, Series, and Hetero NoInterp would short-circuit via fill!, 0 * first(y), or zero(Tz), dropping both Tq carrier and any NaN at the queried cell.

Changes:

Unifies "deriv-zero" handling so every kernel runs and is multiplied by 0 after the carrier is threaded (preserving IEEE NaN × 0 = NaN, Dual partials, etc.) — touches Constant/Linear/Quadratic/Cubic/Hermite kernels and ND/Series/Anchor dispatch.
Replaces _zero_ref with _sample_data and removes the _deriv_zero_fill trait shortcut; series helpers (_constant_extrap_boundary_value, _fill_constant_extrap_simd!) now take aq and source the zero from y[idx, k] (not fill_value), so OOB FillExtrap × deriv returns 0 rather than leaking fill_value.
Adds extensive type-stability (@inferred) and cell-local NaN propagation tests (Cat A–G) across every method × dimensionality × path combination.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
src/core/interpolant_protocol.jl	Removes `_deriv_zero_fill` shortcut; renames `_zero_ref` → `_sample_data`.
src/core/series_utils.jl	Adds `aq` parameter; deriv path now sources zero from cell-local `y[idx, k]`.
src/core/utils.jl	Fixes `_promote_extrap_zero(::Number, ::Number)` to propagate boundary NaN.
src/vector_calculus.jl	Switches `_zero_ref` → `_sample_data` at fill-OOB sites.
src/constant/constant_oneshot.jl	Right-edge short-circuits thread `one(Tq) * one(Tg)`; updates docstring.
src/constant/constant_anchor.jl	Right-edge & NoExtrap short-circuits use cell-local `y[aq.idxR]` and thread Tg/Tq carriers.
src/constant/constant_series_interp.jl	Series right-boundary now routes through `_constant_kernel` for op-uniform carrier/NaN.
src/constant/nd/constant_nd_eval.jl	New `_constant_nd_evaluate` two-method dispatch (EvalValue vs deriv via `* 0`).
src/constant/nd/constant_nd_oneshot.jl	Threads `ops` through oneshot/batch dispatch; removes pre-call deriv short-circuits.
src/linear/linear_kernels.jl	Deriv kernels thread `* one(α)` carrier.
src/linear/linear_series_interp.jl	Adds `aq` param to series fill helpers.
src/linear/nd/linear_nd_eval.jl	Removes `_deriv_zero_fill` trait; `_linear_weight(::EvalDeriv1)` now `* one(α)` (fixes pre-existing TypeError on `(D1,D1)` Dual).
src/cubic/cubic_kernels.jl	EvalDeriv3/N kernels thread `* one(dL)`.
src/cubic/cubic_series_interp.jl	Adds `aq` param to series fill helpers.
src/cubic/nd/cubic_nd_eval.jl	Renames `_zero_ref` → `_sample_data`.
src/cubic/nd/cubic_nd_math.jl	EvalDeriv3 + DerivOp{N} hermite kernels thread `* one(dL)`.
src/quadratic/quadratic_kernels.jl	EvalDeriv2/3/N kernels thread `* one(dL)`.
src/quadratic/quadratic_series_interp.jl	Adds `aq` param to series fill helper.
src/quadratic/nd/quadratic_nd_eval.jl	Renames `_zero_ref` → `_sample_data`.
src/hetero/hetero_eval.jl	Renames `_zero_ref` → `_sample_data`; removes `_deriv_zero_fill` override.
src/hetero/hetero_nointerp.jl	All-NoInterp / mixed paths multiply cell-local `data * zero(Tz)` instead of returning bare `zero(Tz)`.
test/test_series_utils.jl	Updates calls for new `aq` arg; adds FillExtrap × NaN tests.
test/test_duck_typing_comprehensive.jl	Wraps existing `isa` checks with `@inferred`.
test/test_duck_tv_dual_tq.jl	Wraps existing `isa` checks with `@inferred`; notes AD wrappers exception.
test/test_carrier_type_stability.jl	New comprehensive test file covering Cat A–G (type stability, carrier propagation, cell-local NaN).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

codecov · 2026-05-27T22:02:03Z

Codecov Report

❌ Patch coverage is 98.27586% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.46%. Comparing base (616b106) to head (15de65e).

Files with missing lines	Patch %	Lines
src/core/nd_utils.jl	66.66%	1 Missing ⚠️
src/core/series_utils.jl	90.90%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #147      +/-   ##
==========================================
+ Coverage   96.43%   96.46%   +0.03%     
==========================================
  Files         143      143              
  Lines       11978    11940      -38     
==========================================
- Hits        11551    11518      -33     
+ Misses        427      422       -5

Files with missing lines	Coverage Δ
src/constant/constant_anchor.jl	`98.55% <100.00%> (-0.05%)`	⬇️
src/constant/constant_oneshot.jl	`100.00% <100.00%> (ø)`
src/constant/constant_oneshot_series.jl	`100.00% <100.00%> (ø)`
src/constant/constant_series_interp.jl	`98.10% <100.00%> (+0.52%)`	⬆️
src/constant/nd/constant_nd_eval.jl	`100.00% <100.00%> (ø)`
src/constant/nd/constant_nd_oneshot.jl	`100.00% <100.00%> (+1.29%)`	⬆️
src/core/interpolant_protocol.jl	`100.00% <100.00%> (ø)`
src/core/utils.jl	`88.13% <100.00%> (+0.84%)`	⬆️
src/cubic/cubic_kernels.jl	`100.00% <100.00%> (ø)`
src/cubic/cubic_oneshot_series.jl	`100.00% <100.00%> (ø)`
... and 16 more

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

github-actions · 2026-05-27T22:02:48Z

FastInterpolations.jl Benchmarks

All benchmarks (50 total, click to expand)

Benchmark	Current: `15de65e`	Previous	Imm. Ratio	Grad. Ratio
`10_nd_construct/bicubic_2d`	`36989.0` ns	`38859.0` ns	`0.952`	`0.96`
`10_nd_construct/bilinear_2d`	`598.1` ns	`580.7` ns	`1.03`	`0.946`
`10_nd_construct/tricubic_3d`	`353489.0` ns	`320133.0` ns	`1.104`	`1.0`
`10_nd_construct/trilinear_3d`	`1706.2` ns	`1607.8` ns	`1.061`	`0.955`
`11_nd_eval/bicubic_2d_batch`	`1429.6` ns	`1300.0` ns	`1.1`	`0.904`
`11_nd_eval/bicubic_2d_scalar`	`16.3` ns	`12.8` ns	`1.274`	`1.0`
`11_nd_eval/bilinear_2d_scalar`	`7.5` ns	`6.6` ns	`1.138`	`1.019`
`11_nd_eval/tricubic_3d_batch`	`3230.0` ns	`2489.8` ns	`1.297`	`0.989`
`11_nd_eval/tricubic_3d_scalar`	`33.5` ns	`24.8` ns	`1.348`	`0.979`
`11_nd_eval/trilinear_3d_scalar`	`13.2` ns	`10.0` ns	`1.321`	`0.975`
`12_cubic_eval_gridquery/range_random`	`4225.9` ns	`3638.9` ns	`1.161`	`0.979`
`12_cubic_eval_gridquery/range_sorted`	`4219.5` ns	`3613.4` ns	`1.168`	`0.98`
`12_cubic_eval_gridquery/vec_random`	`9311.3` ns	`7787.9` ns	`1.196`	`0.985`
`12_cubic_eval_gridquery/vec_sorted`	`3201.2` ns	`2620.3` ns	`1.222`	`0.997`
`13_nd_oneshot_gridquery/bicubic_2d_rand_rand`	`65569.2` ns	`58729.5` ns	`1.116`	`0.976`
`13_nd_oneshot_gridquery/bicubic_2d_sort_rand`	`62263.0` ns	`55491.7` ns	`1.122`	`0.982`
`13_nd_oneshot_gridquery/bicubic_2d_sort_sort`	`58212.5` ns	`53708.0` ns	`1.084`	`0.98`
`13_nd_oneshot_gridquery/bilinear_2d_rand_rand`	`20179.6` ns	`13537.0` ns	`1.491`	`1.155`
`13_nd_oneshot_gridquery/bilinear_2d_sort_rand`	`9365.2` ns	`7454.0` ns	`1.256`	`0.919`
`13_nd_oneshot_gridquery/bilinear_2d_sort_sort`	`5684.2` ns	`3851.0` ns	`1.476`	`0.998`
`14_series_oneshot_batch/constant_inplace_vec_k8_q1000_rand`	`18714.9` ns	`15082.7` ns	`1.241`	`1.008`
`14_series_oneshot_batch/linear_inplace_vec_k8_q1000_rand`	`17861.2` ns	`14233.5` ns	`1.255`	`0.932`
`1_cubic_oneshot/q00001`	`548.6` ns	`447.9` ns	`1.225`	`1.15`
`1_cubic_oneshot/q10000`	`43548.1` ns	`37312.3` ns	`1.167`	`0.702`
`2_cubic_construct/g0100`	`1388.2` ns	`1198.0` ns	`1.159`	`1.02`
`2_cubic_construct/g1000`	`12678.7` ns	`10953.5` ns	`1.158`	`0.977`
`3_cubic_eval/q00001`	`20.8` ns	`19.9` ns	`1.045`	`1.002`
`3_cubic_eval/q00100`	`439.6` ns	`377.6` ns	`1.164`	`0.976`
`3_cubic_eval/q10000`	`42632.4` ns	`36592.1` ns	`1.165`	`0.98`
`4_linear_oneshot/q00001`	`24.7` ns	`22.9` ns	`1.079`	`0.95`
`4_linear_oneshot/q10000`	`18693.8` ns	`15620.5` ns	`1.197`	`0.99`
`5_linear_construct/g0100`	`35.2` ns	`30.0` ns	`1.17`	`0.903`
`5_linear_construct/g1000`	`234.1` ns	`203.8` ns	`1.149`	`0.84`
`6_linear_eval/q00001`	`10.1` ns	`7.8` ns	`1.294`	`0.975`
`6_linear_eval/q00100`	`194.8` ns	`162.8` ns	`1.196`	`0.984`
`6_linear_eval/q10000`	`18480.4` ns	`15400.2` ns	`1.2`	`0.993`
`7_cubic_range/scalar_query`	`8.3` ns	`7.1` ns	`1.169`	`0.979`
`7_cubic_vec/scalar_query`	`11.3` ns	`8.1` ns	`1.395`	`1.054`
`8_cubic_multi/construct_s001_q100`	`639.2` ns	`523.6` ns	`1.221`	`1.1`
`8_cubic_multi/construct_s010_q100`	`4412.2` ns	`3704.2` ns	`1.191`	`0.986`
`8_cubic_multi/construct_s100_q100`	`39625.6` ns	`33467.5` ns	`1.184`	`0.972`
`8_cubic_multi/eval_s001_q100`	`807.5` ns	`640.8` ns	`1.26`	`1.08`
`8_cubic_multi/eval_s010_q100`	`1782.3` ns	`1443.8` ns	`1.234`	`1.023`
`8_cubic_multi/eval_s010_q100_scalar_loop`	`2296.5` ns	`1945.3` ns	`1.181`	`0.985`
`8_cubic_multi/eval_s100_q100`	`11541.5` ns	`9357.1` ns	`1.233`	`1.006`
`8_cubic_multi/eval_s100_q100_scalar_loop`	`3378.3` ns	`3160.8` ns	`1.069`	`0.974`
`9_nd_oneshot/bicubic_2d`	`45117.1` ns	`43695.9` ns	`1.033`	`1.095`
`9_nd_oneshot/bilinear_2d`	`540.0` ns	`542.4` ns	`0.996`	`0.96`
`9_nd_oneshot/tricubic_3d`	`420902.1` ns	`361762.1` ns	`1.163`	`1.092`
`9_nd_oneshot/trilinear_3d`	`1034.9` ns	`837.2` ns	`1.236`	`0.975`

⚠️ Performance Regression Confirmed ⚠️

After re-running 42 flagged benchmark(s) 10 time(s), 40 regression(s) confirmed.

Benchmark	Current	Previous	Imm. Ratio	Grad. Ratio	Tier
`10_nd_construct/tricubic_3d`	`353489.0` ns	`320133.0` ns	`1.104`	`1.0`	immediate
`11_nd_eval/bicubic_2d_scalar`	`16.3` ns	`12.8` ns	`1.274`	`1.0`	immediate
`11_nd_eval/bilinear_2d_scalar`	`7.5` ns	`6.6` ns	`1.138`	`1.019`	immediate
`11_nd_eval/tricubic_3d_batch`	`3230.0` ns	`2489.8` ns	`1.297`	`0.989`	immediate
`11_nd_eval/tricubic_3d_scalar`	`33.5` ns	`24.8` ns	`1.348`	`0.979`	immediate
`11_nd_eval/trilinear_3d_scalar`	`13.2` ns	`10.0` ns	`1.321`	`0.975`	immediate
`12_cubic_eval_gridquery/range_random`	`4225.9` ns	`3638.9` ns	`1.161`	`0.979`	immediate
`12_cubic_eval_gridquery/range_sorted`	`4219.5` ns	`3613.4` ns	`1.168`	`0.98`	immediate
`12_cubic_eval_gridquery/vec_random`	`9311.3` ns	`7787.9` ns	`1.196`	`0.985`	immediate
`12_cubic_eval_gridquery/vec_sorted`	`3201.2` ns	`2620.3` ns	`1.222`	`0.997`	immediate
`13_nd_oneshot_gridquery/bicubic_2d_rand_rand`	`65569.2` ns	`58729.5` ns	`1.116`	`0.976`	immediate
`13_nd_oneshot_gridquery/bicubic_2d_sort_rand`	`62263.0` ns	`55491.7` ns	`1.122`	`0.982`	immediate
`13_nd_oneshot_gridquery/bilinear_2d_rand_rand`	`20179.6` ns	`13537.0` ns	`1.491`	`1.155`	both
`13_nd_oneshot_gridquery/bilinear_2d_sort_rand`	`9365.2` ns	`7454.0` ns	`1.256`	`0.919`	immediate
`13_nd_oneshot_gridquery/bilinear_2d_sort_sort`	`5684.2` ns	`3851.0` ns	`1.476`	`0.998`	immediate
`14_series_oneshot_batch/constant_inplace_vec_k8_q1000_rand`	`18714.9` ns	`15082.7` ns	`1.241`	`1.008`	immediate
`14_series_oneshot_batch/linear_inplace_vec_k8_q1000_rand`	`17861.2` ns	`14233.5` ns	`1.255`	`0.932`	immediate
`1_cubic_oneshot/q00001`	`548.6` ns	`447.9` ns	`1.225`	`1.15`	both
`1_cubic_oneshot/q10000`	`43548.1` ns	`37312.3` ns	`1.167`	`0.702`	immediate
`2_cubic_construct/g0100`	`1388.2` ns	`1198.0` ns	`1.159`	`1.02`	immediate
`2_cubic_construct/g1000`	`12678.7` ns	`10953.5` ns	`1.158`	`0.977`	immediate
`3_cubic_eval/q00100`	`439.6` ns	`377.6` ns	`1.164`	`0.976`	immediate
`3_cubic_eval/q10000`	`42632.4` ns	`36592.1` ns	`1.165`	`0.98`	immediate
`4_linear_oneshot/q10000`	`18693.8` ns	`15620.5` ns	`1.197`	`0.99`	immediate
`5_linear_construct/g0100`	`35.2` ns	`30.0` ns	`1.17`	`0.903`	immediate
`5_linear_construct/g1000`	`234.1` ns	`203.8` ns	`1.149`	`0.84`	immediate
`6_linear_eval/q00001`	`10.1` ns	`7.8` ns	`1.294`	`0.975`	immediate
`6_linear_eval/q00100`	`194.8` ns	`162.8` ns	`1.196`	`0.984`	immediate
`6_linear_eval/q10000`	`18480.4` ns	`15400.2` ns	`1.2`	`0.993`	immediate
`7_cubic_range/scalar_query`	`8.3` ns	`7.1` ns	`1.169`	`0.979`	immediate
`7_cubic_vec/scalar_query`	`11.3` ns	`8.1` ns	`1.395`	`1.054`	immediate
`8_cubic_multi/construct_s001_q100`	`639.2` ns	`523.6` ns	`1.221`	`1.1`	immediate
`8_cubic_multi/construct_s010_q100`	`4412.2` ns	`3704.2` ns	`1.191`	`0.986`	immediate
`8_cubic_multi/construct_s100_q100`	`39625.6` ns	`33467.5` ns	`1.184`	`0.972`	immediate
`8_cubic_multi/eval_s001_q100`	`807.5` ns	`640.8` ns	`1.26`	`1.08`	immediate
`8_cubic_multi/eval_s010_q100`	`1782.3` ns	`1443.8` ns	`1.234`	`1.023`	immediate
`8_cubic_multi/eval_s010_q100_scalar_loop`	`2296.5` ns	`1945.3` ns	`1.181`	`0.985`	immediate
`8_cubic_multi/eval_s100_q100`	`11541.5` ns	`9357.1` ns	`1.233`	`1.006`	immediate
`9_nd_oneshot/tricubic_3d`	`420902.1` ns	`361762.1` ns	`1.163`	`1.092`	immediate
`9_nd_oneshot/trilinear_3d`	`1034.9` ns	`837.2` ns	`1.236`	`0.975`	immediate

Thresholds: immediate > 1.1x (vs latest master), gradual > 1.1x (vs sliding window)

This comment was automatically generated by Benchmark workflow.

@generated

…ill_value Mirrors the Constant ND OOB FillExtrap contract pinned in Cat E: "deriv at OOB returns 0, not fill_value". The Hetero mixed Real + GridIdx/NoInterp path violated this on both oneshot and persistent: * Oneshot (`_interp_nointerp_oneshot`) — regression introduced by the `result * 0` change: a Real-axis OOB query under `FillExtrap(NaN)` used to early-return `zero(Tz)` (master), but the new flow lets `interp(...)` compute `fill_value = NaN` and then multiplies by `0`, yielding `NaN` (`NaN * 0 = NaN`). * Persistent (`_eval_nointerp` @generated) — pre-existing master bug: the OOB `_try_fill_oob` short-circuit runs BEFORE the NoInterp deriv check, so any `FillExtrap` (incl. `NaN`) leaks regardless of the NoInterp axis derivative. Fix in both paths: when any NoInterp axis carries a non-zero deriv AND the Real-axis query is OOB, return a slab-local zero sourced from the data slice's first element (`data_r[1] * zero(Tz)`) — finite cells yield 0, a NaN at the slab survives, and the user-supplied `fill_value` cannot reach the result. The in-domain Real-axis path is unchanged (still runs the kernel + `result * 0` for cell-local NaN). Cat E gets a new testset "Hetero mixed OOB FillExtrap × NoInterp deriv returns zero, not fill_value" — runs both `99.0` and `NaN` fill values through oneshot and persistent. Pre-existing structural gap: Cat E covered OOB contracts only for 1D + Constant ND, and Cat G covered Hetero NoInterp only in-domain; the Hetero × OOB intersection was unpinned. New testset closes it.

…t leak fill_value" This reverts commit 5f20143.

`outputs = [Vector{Tv_out}(undef, NQ) for _ in 1:K]` infers as `Vector` (non-concrete) on Julia 1.10 LTS when `Tv_out` is bound from a function return. The comprehension is lowered to a closure, and 1.10's inference can't propagate the locally-bound `Type{T}` from the outer scope into the closure body — even though `_output_eltype(...)` is fully concrete in isolation. Verified by isolating: an inline type literal in the comprehension infers correctly, a local-variable type does not, a type-as-argument barrier does. Extracts the pattern into `_alloc_series_batch_outputs(::Type{Tv}, K, NQ)` in series_utils.jl. All four Series oneshot batch entry points (Linear / Cubic / Quadratic / Constant) route through it. Effect on LTS: `linear_interp(x, ::Series, ::Vector{<:Dual})` etc. now infer concretely as `Vector{Vector{Tv_out}}` instead of `Vector`, matching 1.11+/1.12+. Removes downstream type-instability that previously propagated to callers on LTS only. Unblocks the 4 batched-Dual @inferred tests in test_duck_tv_dual_tq.jl "Series path — duck-Tv × duck-Tq carrier propagates" that were failing on the LTS CI job.

Provides a version-conditional `@inferred` for tests that hit Julia inference gaps present only on older releases (e.g. 1.10 LTS doesn't propagate a locally-bound `Type{T}` from a function return into a closure body, even when the function is concrete-inferred in isolation). `@maybe_inferred expr` expands to: - `Test.@inferred expr` on ≥1.12 - `expr` unchanged on older releases (runtime contract via the surrounding `isa` / `==` still applies; inference assertion skipped) Opt-in via `setup=[InferredCompat]` on `@testitem`. Use sparingly — prefer fixing the source (e.g. extract a `::Type{T}` barrier, see `_alloc_series_batch_outputs`) when the inference gap is in this package's own code.

Added in 5408d2c as a defensive utility for hypothetical future LTS-only inference gaps. Never referenced by any testitem — the source-level `_alloc_series_batch_outputs` barrier (a9b8458) fully covers the only gap we hit on LTS, and no other test was waiting on this. Removed per YAGNI; if a real callsite shows up later, the macro is 6 lines.

@generated

…ontract Unifies the OOB deriv contract across all paths so it matches the strict cell-local interpretation: the OOB cell's data IS whatever the extrap rule puts there. For `FillExtrap(c)`, that's `c` — so deriv = `0 * c` (NaN propagates, finite × 0 = 0). Master's contract sourced deriv from `y_bnd` regardless of extrap type, which inconsistently ignored the fill_value as a NaN sentinel for AD / OOB-detection use cases. Single-dispatch helper `_extrap_deriv_source(extrap, y_bnd)`: - `ClampExtrap` → `y_bnd` (unchanged — boundary y NaN propagates) - `FillExtrap` → `e.fill_value` (NaN fill_value propagates as data) Threaded through every OOB deriv site: * 1D `_eval_extrapolation(::DerivOp, ...)` collapsed to one Union method using the helper. ClampExtrap behavior unchanged; FillExtrap now sources from fill_value. * ND `_fill_extrap_result(::AbstractEvalOp, fill_val, _, qe)` switched from `zero_ref` to `fill_val` (signature stable, `zero_ref` retained but unused). Tuple-ops overload likewise. * Series `_constant_extrap_boundary_value` / `_fill_constant_extrap_simd!` Clamp/Fill deriv collapsed to a single Union overload using the helper. * Hetero mixed `_eval_nointerp` @generated body: OOB short-circuit now runs `oob * 0` when a NoInterp axis has non-zero deriv — `oob` is already `_promote_extrap_zero(fill_value, ...)` post-fix, so finite fill → 0 and NaN fill propagates. Closes pre-existing master bug where the OOB shortcut returned `fill_value` without applying NoInterp deriv's `* 0`. * Hetero mixed `_interp_nointerp_oneshot`: no code change — `result * 0` already does the right thing once `_eval_extrapolation` is fixed. Perf: every change site is on the OOB cold path. Helper is `@inline` with singleton dispatch (LLVM dead-branch elimination). In-domain paths remain bit-identical; ClampExtrap-only callers see no behavior change. Tests: * `Cat E` rewritten — pins the new contract end-to-end across 1D (all 7 methods), ND Constant/Linear, Hetero mixed. 89 assertions. Old "queried-boundary NaN propagates" tests for `FillExtrap` repurposed to "y_bnd NaN is IGNORED" (the new contract distinction). * `test_constextrap_fill.jl` updates: "Cubic with fill value", "Linear with fill value", and "ND derivatives under FillExtrap" flipped from `iszero` to `isnan` for `FillExtrap(NaN)` × deriv; finite-fill assertions retained as-is. Reverts the previous patch (5f20143) which sourced the OOB zero from the data slice's first element under the prior contract — the option-B flow sources from `fill_value` instead, so that patch's `data_r[1]` short-circuit is unnecessary and was reverted in 81f221c.

The helper returns "what data sits in the OOB cell" — a property of the extrap, not specific to derivative evaluation. The old name (`_deriv_source`) implied deriv-only scope, but the underlying concept is the same `y_bnd` vs `fill_value` dispatch that already drives the `EvalValue` extrap path. Generalizing the name allows future callers (e.g. value-path consolidation) to reuse the helper without semantic mismatch. Behavior unchanged. 4 call sites renamed (`core/utils.jl` definition + `_eval_extrapolation` use; `core/series_utils.jl` × 2).

Three test files still pinned the prior contract ("FillExtrap(NaN) × deriv returns 0, fill_value-as-sentinel"). Flipped to the new "fill_value-as-data" semantics — NaN fill propagates through deriv via `0 * NaN = NaN`, finite fill × 0 = 0 unchanged. * `test_derivatives.jl` FillExtrap block: `D(4)` on cubic/quadratic at OOB with NaN fill now expects `all(isnan, res)` per series; in-domain D(4) for cubic stays `== z2`. * `test_series_utils.jl` direct-call tests for `_constant_extrap_boundary_value` and `_fill_constant_extrap_simd!`: FillExtrap deriv with NaN fill flips to `isnan`; added explicit finite-fill (99.0) → 0 assertions. * `test_mixed_precision_extrap.jl` FillExtrap derivatives: Float32 NaN fill + Float64 query now expects `isnan` with Float64 promotion; added finite-fill (0.0f0) sibling case to keep the Tq promotion test alive.

…_oob_data` The two `::EvalValue` methods (ClampExtrap, FillExtrap) used the same `y_bnd` vs `fill_value` dispatch already centralized in `_extrap_oob_data` — but expressed it inline as separate overloads. Collapsing them onto the Union method that mirrors the `::DerivOp` shape makes the parallel structure explicit: EvalValue → `data + carrier` (value-promotion of OOB cell data) DerivOp → `0 * data + carrier` (deriv-promotion: 0 × OOB cell data) The shared scaffolding clarifies the deriv-path read: `_extrap_oob_data` fetches the cell's data and `_promote_extrap_zero` applies the `0 *` — the helper is asking "what data?", not "what derivative?". Behavior unchanged. ClampExtrap / FillExtrap value-path output bit-equal to before (the dispatch chain still resolves to the same source); deriv path also unchanged. Verified across `test_carrier_type_stability`, `test_constextrap_fill`, `test_constant`, `test_derivatives`.

mgyoo86 · 2026-05-28T00:06:02Z

The flagged performance regressions can be ignored

The previous master measurement was just an outlier, not a real perf change introduced by this PR.

A few things support that read:

Local M1 reproduction is clean. Same ci_benchmark.jl, Julia 1.12.6, branch vs master worktree: 0 / 50 benchmarks regressed >5%; entire set within ±5%. Max local regression: +2.08%.
This kind of outlier happens periodically on ubuntu-latest. Scanning the full gh-pages/bench/data.js (209 master commits), ~2.8% of runs land on an anomalously fast runner. When they do, the next few PRs see phantom regressions until the baseline shifts.
616b106b5 looks like one of those. Vs the preceding 30 master commits, it sits at percentile rank ≤10% on 39 / 49 benchmarks — bottom decile across most of the suite.

Identical code, 21% delta

616b106b5 ("bump version 0.4.11 + README") has zero source changes vs its predecessor c47519bbd (Project.toml version + README.md only). Two consecutive master runs of bit-identical code:

Median Δ: −21.35% ; 41 / 50 benchmarks ≥10% "improvement"
Most extreme: 11_nd_eval/tricubic_3d_scalar 35.9 → 24.8 ns (−30.7%)

By construction that's 100% runner variance, and the inflated-by-~21% baseline closely matches the PR's flagged ratios (1.13–1.34×).

The "derivative view with fill value" testset pinned the prior contract (`FillExtrap(NaN) × deriv → 0`, no NaN in the recipe output). Under the new "fill_value-as-data" contract the OOB tails of `deriv1(itp)` carry `0 * NaN = NaN`, which Plots draws as gaps (intended visual outcome). Flipped the assertion to `any(isnan, yq)` for the NaN-fill case and added a finite-fill (`FillExtrap(0.0)`) sibling that still asserts no NaN.

mgyoo86 added 26 commits May 21, 2026 14:14

Runic formatting

55200a7

mgyoo86 requested a review from Copilot May 27, 2026 21:43

Copilot started reviewing on behalf of mgyoo86 May 27, 2026 21:43 View session

Copilot AI reviewed May 27, 2026

View reviewed changes

Runic formatting

c16d696

mgyoo86 added 9 commits May 27, 2026 15:10

Revert "(fix): Hetero mixed OOB + FillExtrap × NoInterp deriv must no…

81f221c

…t leak fill_value" This reverts commit 5f20143.

mgyoo86 added 2 commits May 27, 2026 17:06

Runic formatting

12d9701

mgyoo86 merged commit d7f8048 into master May 28, 2026
14 checks passed

mgyoo86 deleted the fix/nan_propagation branch May 28, 2026 17:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(fix+refac): Cell-local NaN + Tq carrier through deriv-zero paths#147

(fix+refac): Cell-local NaN + Tq carrier through deriv-zero paths#147
mgyoo86 merged 38 commits into
masterfrom
fix/nan_propagation

mgyoo86 commented May 27, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

codecov Bot commented May 27, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 27, 2026 •

edited

Loading

Uh oh!

mgyoo86 commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mgyoo86 commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Unified pattern

Cell-local deriv via kernel × 0

_extrap_oob_data dispatch — extrap-aware OOB data

Behavior changes

Performance

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

codecov Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

FastInterpolations.jl Benchmarks

⚠️ Performance Regression Confirmed ⚠️

Uh oh!

mgyoo86 commented May 28, 2026

The flagged performance regressions can be ignored

Identical code, 21% delta

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mgyoo86 commented May 27, 2026 •

edited

Loading

Cell-local deriv via `kernel × 0`

`_extrap_oob_data` dispatch — extrap-aware OOB data

codecov Bot commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading