(fix+refac): Cell-local NaN + Tq carrier through deriv-zero paths #147
Adds `test/test_carrier_type_stability.jl` with four `@testitem` blocks
that combine `@inferred` and `isa T` into single assertions, locking
both type-stability and carrier-shape in one place.
- Cat A: `@inferred` across deriv-aware paths (mostly broken pin
for 1D scalar Float-drop; Phase 2 will close).
- Cat B: Dual carrier preservation through sub-zero deriv paths
(broken pin — 1D scalar Float-drop, Phase 2).
- Cat C: NaN propagation through Dual partials on ND deriv-zero
short-circuits — currently RED for in-place + persist-allocating
paths (Constant ND).
- Cat D: scalar / allocating / in-place / persistent path
cross-equivalence (currently RED — bleeds from Cat C).
Created as the RED phase for the carrier-aware deriv-zero helper
refactor; the next commit drives Cat C/D to GREEN. Cat A/B broken
pins are tracked for Phase 2.
Three ND Constant deriv-zero short-circuits previously diverged from the
carrier-propagation contract the scalar / one-shot allocating paths
already honor:
- `interpolant_protocol.jl:249` filled with `zero(eltype(output))`,
dropping both the data's NaN/Inf and the query's carrier shape.
- `constant_nd_oneshot.jl:237` filled with `0 * first(data)`,
preserving NaN value but losing the carrier-shape lift (Dual
partials collapsed to 0).
- `interpolant_protocol.jl:186` was hand-coded with the right pattern
but duplicated across call sites.
Adds `_deriv_zero_value(::Type{T_out}, sample_data, sample_query)` in
`src/core/utils.jl` with `Real` / `Tuple` dispatch:
zero(T_out) + 0 * sample_data * <prod-folded-q>
Three layers split the work:
- `zero(T_out)` pins the kernel's promotion lattice. `T_out` comes
from `_output_eltype(itp, Tq)` (or the method-trait route for
one-shot, no-`itp` callers), so it carries `Tg`-aware widening
that the multiplicative body alone cannot reach. For the Constant
lattice (`Tv * Tq`) this term is identity and LLVM removes it.
- `sample_data` carries `Tv` and any NaN/Inf pattern; the leading
`0 *` annihilates value while IEEE keeps the NaN bits live.
- `sample_query` (the same scalar/tuple the caller already holds
from `_extract_query_point`) supplies `Tq` and threads NaN into
every hidden carrier slot (Dual partials, Measurement uncertainty,
…) via the product rule.
All three deriv-zero short-circuits route through the new helper.
Closes 5 NaN-partials failures + 3 cross-path equivalence failures
in `test/test_carrier_type_stability.jl` (Cat C + Cat D, ND Constant).
The arithmetic-lattice methods (Linear/Cubic/Quadratic/Hermite) will
inherit the helper without further changes when Phase 2 audits each
method's own one-shot in-place branch.
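A minimal sketch of the three-layer expression described in this commit, under stated assumptions: `deriv_zero_value_sketch` is a hypothetical stand-in name, and folding the query in through `one(q)` is an assumption about what `<prod-folded-q>` means; the helper in `src/core/utils.jl` may differ in detail.

```julia
# Hypothetical stand-in for the helper described above; not the package's code.
# Scalar query: multiply by one(q) so the query's type joins the product.
deriv_zero_value_sketch(::Type{T}, sample_data, q::Real) where {T} =
    zero(T) + 0 * sample_data * one(q)

# Tuple query: prod-fold one(q_d) over the axes so every axis type participates.
deriv_zero_value_sketch(::Type{T}, sample_data, q::Tuple) where {T} =
    zero(T) + 0 * sample_data * prod(one, q)

# Finite data: a plain zero of the promoted type.
z_fin = deriv_zero_value_sketch(Float64, 2.5, (0.1, 0.2f0))
# NaN data: `0 * NaN` stays NaN under IEEE arithmetic.
z_nan = deriv_zero_value_sketch(Float64, NaN, 0.3)
# Wider query type: the result widens with it (BigFloat stands in for a carrier).
z_big = deriv_zero_value_sketch(Float64, 1.0f0, big"0.5")
```

With plain `Float64` inputs all three layers collapse to `0.0`; the layers only differ once the data holds a NaN or the query carries a wider type.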
The `_zero_ref` name was misleading — the trait does not return a "zero value" but rather a representative `Tv` element used as a sample in the deriv-zero short-circuit (`first(itp.data)` for Constant/Linear, `first(itp.nodal_derivs.partials)` for Cubic/Quadratic, similar for Hetero). Callers always multiply this by `0`, so the actual value is irrelevant — only its type and any NaN/Inf pattern matter. `_value_sample` better captures the trait's role: "a sample of the value type Tv". Pure rename across 5 definitions + 11 use sites in 7 files; no behavior change.
Aligns the trait name with the helper's argument name (`sample_data`
in `_deriv_zero_value(::Type{T_out}, sample_data, sample_query)`),
making call sites read `_deriv_zero_value(T_out, _sample_data(itp),
first_q)` — same token on both sides of the assignment. Pure rename;
no behavior change.
…y` → `sampled_data` / `sampled_query`
Disambiguates the helper's parameter names from the `_sample_data(itp)`
trait introduced in b50f56e. The past-participle form ("already sampled")
distinguishes the *result* (helper argument) from the *extraction
function* (trait), reducing visual noise at call sites. Pure rename; no
behavior change.
…e(α)`
The EvalDeriv1/2/3 and generic DerivOp{N} branches of `_linear_kernel`
computed slope/zero results that ignored the query parameter `α`,
dropping Dual carrier on the 1D scalar path. Multiplying by `one(α)`
threads the query carrier (Dual partials, Measurement uncertainty, …)
through the kernel output; for plain `Real` `α` LLVM const-folds the
`1.0` factor away, so perf is unchanged.
Promotes 5 broken pins in `test_carrier_type_stability.jl`:
- Cat A 1D oneshot/persist scalar @inferred linear deriv=2
- Cat B 1D oneshot scalar Linear DerivOp(1) Dual return
- Cat B 1D persist scalar Linear DerivOp(1) Dual return
- Cat B 1D oneshot scalar Linear DerivOp(2) Dual return
Perf-neutral (within ±2% noise across 1D scalar / batch / 2D / persistent
paths, zero allocations preserved):
1D scalar baseline 5-7 ns → post-fix 5-7 ns
1D batch alloc baseline 42-57 ns → post-fix 41-58 ns
2D scalar baseline 14 ns → post-fix 14 ns
Cubic/Quadratic/PCHIP/Cardinal/Akima still pending (same kernel pattern,
to be applied in follow-up commits).
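The `one(α)` trick can be illustrated with a toy carrier type. This is only a sketch: `MiniDual`, `slope_dropping`, and `slope_threading` are hypothetical names, and `MiniDual` stands in for a real carrier such as `ForwardDiff.Dual` without using that package.

```julia
# Toy dual number standing in for a query carrier (hypothetical, Base only).
struct MiniDual
    val::Float64
    der::Float64
end
Base.one(::MiniDual) = MiniDual(1.0, 0.0)
Base.:*(a::Float64, b::MiniDual) = MiniDual(a * b.val, a * b.der)

# Slope of a linear segment: the formula never reads α, so the query type is lost.
slope_dropping(yL, yR, inv_h, α) = (yR - yL) * inv_h
# Same slope times one(α): the query type joins the result.
slope_threading(yL, yR, inv_h, α) = (yR - yL) * inv_h * one(α)

α = MiniDual(0.25, 1.0)
dropped  = slope_dropping(1.0, 3.0, 2.0, α)     # plain Float64
threaded = slope_threading(1.0, 3.0, 2.0, α)    # MiniDual with zero partial
plain    = slope_threading(1.0, 3.0, 2.0, 0.25) # Float64 query: multiply by 1.0
```

For a plain `Float64` query the extra factor is a multiplication by `1.0`, which matches the commit's note that the change is perf-neutral on that path.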
…(dL)`
EvalDeriv3 and the generic DerivOp{N≥4} branches of `_cubic_kernel`
ignored the query offset `dL`/`dR` (both underscored), dropping Dual
carrier on the 1D scalar path. Naming `dL` in the signature and
multiplying the result by `one(dL)` threads the query carrier through;
EvalValue/EvalDeriv1/EvalDeriv2 already use `dL`/`dR` in their formulas
so they were carrier-aware by construction.
Promotes 5 broken pins in `test_carrier_type_stability.jl`:
- Cat A 1D oneshot/persist scalar @inferred cubic deriv=4
- Cat B 1D oneshot scalar Cubic DerivOp(3) Dual return
- Cat B 1D persist scalar Cubic DerivOp(3) Dual return
- Cat B 1D oneshot scalar Cubic DerivOp(4) Dual return
Perf-neutral (1D scalar 74-83 ns oneshot, 4-7 ns persistent, 0 allocs).
PCHIP/Cardinal/Akima/Hermite still pending — same pattern.
… one(dL)`
EvalDeriv2/3 and the generic DerivOp{N≥3} branches of `_quadratic_kernel`
underscored the `dL` argument and returned values that ignored the query,
dropping Dual carrier on the 1D scalar path. Naming `dL` and multiplying
by `one(dL)` threads the query carrier through. EvalValue/EvalDeriv1
already use `dL` in their formulas so they were carrier-aware.
Promotes 1 broken pin in `test_carrier_type_stability.jl`:
- Cat B 1D oneshot scalar Quadratic DerivOp(2) Dual return
PCHIP/Cardinal/Akima/Hermite still pending.
…ne(dL)`
`_hermite_kernel_1d` is shared by all Hermite-family methods (PCHIP /
Cardinal / Akima / Hermite). EvalDeriv3 ignored `dL` (third derivative
is constant) and DerivOp{N≥4} had it underscored, dropping Dual carrier
on the 1D scalar path. EvalValue/EvalDeriv1/EvalDeriv2 already thread
`dL` through `t = dL * inv_h` so they were carrier-aware.
Promotes 5 broken pins in `test_carrier_type_stability.jl`:
- Cat A 1D oneshot scalar @inferred PCHIP/Cardinal/Akima deriv=4
- Cat B 1D oneshot scalar PCHIP DerivOp(3) Dual return
- Cat B 1D oneshot scalar PCHIP DerivOp(4) Dual return
Single kernel fix → all 4 Hermite-family methods carrier-aware on the
1D scalar deriv path. Regression sweep (test_pchip / test_cardinal /
test_akima / test_hermite) all green; perf neutral.
…circuits
Replace the Phase 1 protocol-level deriv-zero short-circuits (which used
`first(data)` as a global sample, conflating NaN policy across the whole
array) with a cell-local kernel-native mechanism. The kernel's own
`zero(α)` / per-corner weight × 0 flow already produces a carrier-aware
zero from the cell-local data corners — protocol short-circuits were both
unnecessary and overly conservative.
Constant ND (kernel only handled EvalValue):
• Add multi-dispatch wrapper `_constant_nd_evaluate`:
::NTuple{N, EvalValue} → forward to `_constant_nd_kernel`
::NTuple{N, AbstractEvalOp} → `_constant_nd_kernel(...) * 0`
Cell-local kernel result carries NaN/Inf via IEEE `NaN * 0 = NaN`;
Tq carrier rides on the kernel's per-axis `* one(dL_d)`.
• Thread `ops::NTuple{N, AbstractEvalOp}` through the oneshot stack
(`_constant_interp_nd_oneshot{,_batch,_batch!}`, `_constant_nd_batch_dispatch{!,}`).
Linear ND (kernel already handled deriv ≥ 2 via `_linear_weight(::EvalDeriv2+, …) = zero(α)`):
• Drop the `_eval_at_cell` short-circuit that returned `0 * first(itp.data)` and
dropped non-axis-1 Tq carrier on heterogeneous queries.
• Kernel's existing per-corner `data[corner] * weights_product` (with one factor
being `zero(α)`) produces the cell-local carrier-aware zero naturally.
Removed dead helpers / traits:
• `_deriv_zero_value`, `_deriv_zero_fill` + Constant/Linear/Hetero specializations
• `_is_any_deriv` (Constant ND oneshot)
• Protocol-level early returns in `_eval_nd_at_point` and the ND in-place
batch loop in `interpolant_protocol.jl`
Tests (`test/test_carrier_type_stability.jl`):
• Cat A (@inferred type stability) + Cat B (sub-zero deriv carrier) unchanged.
• Cat C/D restructured around *cell-local* NaN: a NaN at a cell corner the
query reads must propagate; a NaN outside the query's cell must NOT.
• Scope note: cell-local applies to kernel-local methods (Constant, Linear,
Hermite-family). Cubic/Quadratic perform a global tridiagonal solve at
build, so a single NaN contaminates every coefficient: global by design,
and not covered by this contract.
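The two-method dispatch and the cell-local contract can be sketched as follows. The op types reuse the names from this commit, but their definitions here, along with `kernel_sketch` and `evaluate_sketch`, are simplified stand-ins rather than the package's code.

```julia
# Simplified stand-ins for the eval-op types named in the commit.
abstract type AbstractEvalOp end
struct EvalValue  <: AbstractEvalOp end
struct EvalDeriv1 <: AbstractEvalOp end

# Toy "Constant" kernel: just read the queried cell.
kernel_sketch(data, idx) = data[idx...]

# All-value ops forward to the kernel; any other op tuple multiplies by 0.
evaluate_sketch(data, idx, ::NTuple{N, EvalValue}) where {N} = kernel_sketch(data, idx)
evaluate_sketch(data, idx, ::NTuple{N, AbstractEvalOp}) where {N} = kernel_sketch(data, idx) * 0

data = [1.0 2.0; NaN 4.0]
value_hit  = evaluate_sketch(data, (1, 1), (EvalValue(), EvalValue()))
deriv_nan  = evaluate_sketch(data, (2, 1), (EvalValue(), EvalDeriv1()))  # reads the NaN cell
deriv_zero = evaluate_sketch(data, (1, 2), (EvalDeriv1(), EvalValue()))  # NaN is elsewhere
```

A derivative query that reads the NaN cell returns NaN, while one that reads a clean cell returns a plain zero, which is the behavior Cat C/D pin.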
…er case)
Mirror the `AbstractArray` overload's NaN-propagating pattern on the
`Number` overload — Array case already does `0 .* val .+ …`, but Number
case did `zero(xq) * zero(val)`, stripping `val`'s NaN (cell-local
boundary data) at every OOB ClampExtrap/FillExtrap deriv query.
Path: `_eval_extrapolation(::DerivOp, y_bnd, ::ClampExtrap, xq)` /
`(::FillExtrap, xq)` → `_promote_extrap_zero(y_bnd, xq)`. `y_bnd` is the
cell-local boundary value (`first(y)` for OOB_LEFT, `last(y)` for
OOB_RIGHT) — its NaN carrier must propagate to match the in-domain
kernel's `0 * y_left * one(dL)` behavior and the Array overload's
existing `0 .* val .+ …` form. Affects all 7 methods (Linear / Constant /
Cubic / Quadratic / Pchip / Cardinal / Akima) uniformly via the shared
helper.
Tests (`test/test_carrier_type_stability.jl` — new Cat E):
- Helper-level Number vs Array consistency
- 1D OOB ClampExtrap × deriv × 5 cell-local methods × {OOB_LEFT, OOB_RIGHT}
- 1D OOB FillExtrap × deriv × 5 cell-local methods × {OOB_LEFT, OOB_RIGHT}
- Cell-local invariant: NaN at the OTHER boundary stays hidden
- Carrier preservation: Dual xq + NaN val → Dual{NaN, 0} (matches `_promote_extrap_val`)
22 RED tests, then GREEN after the 1-line fix. Full regression sweep
clean (10458 lines, 96% coverage, all tests passed).
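The difference between the two `Number` forms is easy to reproduce in isolation. The function names below are hypothetical, and the "new" body is an assumption modeled on the in-domain `0 * y_left * one(dL)` pattern quoted above, not a copy of `_promote_extrap_zero`.

```julia
# Old shape: both factors are type-level zeros, so val's NaN never participates.
extrap_zero_old(val::Number, xq::Number) = zero(xq) * zero(val)
# Assumed new shape: val itself is multiplied by 0, so NaN/Inf survive.
extrap_zero_new(val::Number, xq::Number) = 0 * val * one(xq)

old_nan = extrap_zero_old(NaN, 1.5)   # boundary NaN is stripped
new_nan = extrap_zero_new(NaN, 1.5)   # boundary NaN propagates
new_fin = extrap_zero_new(2.0, 1.5)   # finite boundary: plain zero
new_f32 = extrap_zero_new(1.0f0, 1.5) # promotion still follows val and xq
```

Both forms promote to the same result type; only the treatment of a non-finite `val` differs.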
Cat A pins `@inferred` on in-domain × deriv (NoExtrap default) paths; Cat E exercises a new path — OOB × ClampExtrap/FillExtrap × deriv through `_eval_extrapolation` → `_promote_extrap_zero` — which Cat A doesn't cover. Add `@inferred` at the helper level for both overloads (Number, Array, Dual carrier) so the recent fix's type stability is locked against future regressions.
…l paths
Phase 3 + Bundle A established the cell-local NaN policy at the ND
kernel and OOB extrap helper layers; this commit closes the remaining
Constant-family gaps where deriv-zero branches used `0 * first(y)`
(non-cell-local) while their EvalValue siblings already used cell-local
indices like `last(y)` / `y[aq.idxR]` / `y[n_pts, k]` / `y_point[k, idx]`.
Sites (4 files, 11 lines):
- `constant_oneshot.jl` (2): InBounds + WrapExtrap right-edge seam.
`0 * first(y) * one(xi)` → `0 * last(y) * one(xi)` — cell-local right
endpoint, mirrors EvalValue branch.
- `constant_anchor.jl` (3): NoExtrap, AbstractExtrap default, ClampOrFill
in-domain at `aq.xq == x_last`. `0 * first(y) * one(aq.xq)` →
`@inbounds 0 * y[aq.idxR] * one(aq.xq)` — symmetric with EvalValue.
- `constant_series_interp.jl` (3): scalar series right-boundary
(`_eval_constant_series_point!`) + anchored right-boundary
(`_eval_constant_series_with_extrap`) + anchored in-domain deriv
(`_eval_constant_series_anchored`). Per-series loops use
`y_point[k, n_pts]` / `y[n_pts, k]` / `y[aq.idxL, k]` instead of
`first(y)` — keeps per-series NaN cell-local.
- `series_utils.jl` (4): `_constant_extrap_boundary_value` (layout
`[time, series]`) + `_fill_constant_extrap_simd!` (layout
`[series, time]`, named `y_point`). Both deriv overloads now consume
the previously-unused `side` / `n_pts` / `k` args via
`_boundary_point_index` for per-series cell-local zero.
Tests (`test_carrier_type_stability.jl` Cat F, 30 assertions):
- 1D oneshot scalar — right-edge × {NoExtrap, ClampExtrap, WrapExtrap}
- 1D persistent scalar — right-edge × {NoExtrap, ClampExtrap, FillExtrap}
- Series batch in-domain — per-series cell-local at NaN-bearing cell
- Series scalar right-boundary — per-series cell-local at right edge
- Series scalar/batch OOB ClampExtrap — per-series cell-local via fills
- Cell-local invariant: NaN at the OTHER side stays hidden (13 asserts)
17 RED before fix → 30 GREEN after. Targeted regression (test_carrier_
type_stability + 7 Constant/Series/extrap test files) all pass.
NoInterp axes are structurally lookup-only (no math, like Constant). When a
non-zero DerivOp targets a NoInterp axis, the existing short-circuits
returned `zero(Tz)` of the promoted type, dropping any cell-local NaN in
the queried data slice. Mirror the Constant Phase 3 pattern by multiplying
the cell-local data by `zero(Tz)`: the sliced data carries NaN, `zero(Tz)`
enforces type promotion; together they yield a type-correct, NaN-aware
deriv-zero.
Sites fixed (2 of 3):
- `_eval_nointerp` line ~282 (all-NoInterp persistent, `N_r == 0`): hoist
`data_expr` (covers both `_HeteroPartials` and raw `Array` storage), wrap
deriv guard with `$data_expr * zero($Tz_expr)`.
- `_interp_nointerp_oneshot` line ~475 (all-GridIdx oneshot,
`grids_r === ()`): `data_r[] * zero(_output_eltype(...))`.
Scope exception: `_interp_nointerp_oneshot` line ~489 (mixed Real +
NoInterp axes with deriv on NoInterp) still returns plain `zero(Tz)`.
Strict cell-local would require running the Real-axis interp first
(potential doubled work for global-solve methods); deferred as a separate
follow-up.
Tests (`test_carrier_type_stability.jl` Cat G, 4 assertions):
- All-NoInterp persistent: cell-local NaN propagates + out-of-cell stays hidden
- All-GridIdx oneshot: cell-local NaN propagates + out-of-cell stays hidden
2 RED before fix → 4 GREEN after. Targeted regression
(test_carrier_type_stability + test_nointerp) all pass.
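A small sketch of the `data * zero(Tz)` pattern, with hypothetical function names and `Float32` data promoted to a `Float64` output type chosen purely for illustration:

```julia
# Old shape: a type-only zero that never looks at the data.
bare_zero(data, idx, ::Type{Tz}) where {Tz} = zero(Tz)
# New shape: the queried cell times zero(Tz) gives promotion and NaN-awareness.
cell_local_zero(data, idx, ::Type{Tz}) where {Tz} = data[idx...] * zero(Tz)

data = Float32[1 NaN; 3 4]
at_nan   = cell_local_zero(data, (1, 2), Float64)  # queried cell holds NaN
off_nan  = cell_local_zero(data, (2, 1), Float64)  # NaN is in another cell
stripped = bare_zero(data, (1, 2), Float64)        # NaN at the cell is lost
```

Both forms return the promoted type; only the cell-local form reports the NaN at the queried slice.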
…ttern for perf
Bundle B+C's per-series cell-local NaN propagation in Constant Series
deriv-zero paths is reverted to master's broadcast `0 * first(y)` pattern.
The cell-local indexed loads add ~1 ns per call on the batch series eval
inner loop (called K × Q times: 50k iterations on a 1000-series × 50-query
batch), measuring **+49% per-call cost** (21.6 → 32.2 μs) on the
benchmark scenario.
Reverted sites (4):
- `series_utils.jl`: `_constant_extrap_boundary_value` (2 overloads)
- `constant_series_interp.jl`: `_eval_constant_series_point!`
right-boundary deriv (scalar in-domain edge)
- `constant_series_interp.jl`: `_eval_constant_series_with_extrap`
`aq.xq == x_max` deriv branch (anchor-clipped OOB) ← actual perf culprit
- `constant_series_interp.jl`: `_eval_constant_series_anchored` in-domain
deriv branch
- `series_utils.jl`: `_fill_constant_extrap_simd!` (already reverted for
SIMD broadcast; kept consistent with renamed `y_point` arg)
The Hetero / 1D right-edge / `_promote_extrap_zero` / NoInterp cell-local
changes (Bundles A, B 1D, D) are KEPT: those sites have ≤ noise perf
impact (verified via master worktree benchmark, all < 5 ns/call).
Tests (`test_carrier_type_stability.jl` Cat F):
- 4 series cell-local assertions marked `@test_broken` with explanatory
comments; future fix path noted (NaN-presence detection at construction).
- Out-of-cell invariant assertions kept as `@test` (still hold under
broadcast since `first(y)` is one fixed cell).
Perf verification: master 21.6 μs → branch (post-revert) 21.6 μs on the
canonical bench (1000-series × 50-query Series batch OOB ClampExtrap
deriv). Other paths unchanged at noise level.
Comments + test coverage cleanup before merging.
Comments:
- `linear_nd_eval.jl` deriv-≥2 weight: correct stale description (Phase 3
removed the protocol short-circuit; kernel handles it cell-locally).
- `interpolant_protocol.jl`: replace dead `_select_output_eltype` reference
with `_arithmetic_kernel_shape` / `_constant_kernel_shape`.
- Series perf-trade-off comments (4 sites): trim to one-line "why broadcast".
- Cat F header + `@test_broken` testset comments: drop dev-history phrasing.
Test coverage:
- Cat A: add `quadratic_interp` `@inferred` for oneshot scalar + persistent
scalar (DerivOp(3), beyond polynomial degree).
- Cat E: extend OOB ClampExtrap/FillExtrap × deriv NaN loop to include
`cubic_interp` and `quadratic_interp` (both route through
`_promote_extrap_zero` at OOB, so the Bundle A fix applies).
- Cat D: add Linear ND scalar ↔ in-place batch cross-path check (verifies
kernel-native `_linear_weight(::EvalDeriv2+) = zero(α)`).
…anchored`
"Derivatives of constant (step) function are zero" restates what the
`0 * first(y)` expression already shows. The perf rationale comment is
covered at `_constant_extrap_boundary_value` (the first occurrence).
Wrap plain `@test EXPR isa T` checks in duck-typing tests with `@inferred`
so they pin both result type AND inference stability — same pattern as
Cat A in `test_carrier_type_stability.jl`.
- `test_duck_tv_dual_tq.jl`: 84 conversions (SVector/Dual Tv/Tq carriers).
- `test_duck_typing_comprehensive.jl`: 66 conversions (DuckFloat-style Tv).
Two test groups in `test_duck_tv_dual_tq.jl` kept as plain `isa`:
- ND `ForwardDiff.gradient/hessian/derivative` (lines 306-317): AD wrappers
have `Any` internal inference; `@inferred` is over-strict for these.
- "Dual x grid carrier flows through Tv × Tq" (lines 266-275): some
`(fn, y, xq)` combos infer `Union{Float64, Dual}` even though the
runtime returns the right type. Latent type instability surfaces here.
Constant 1D right-edge short-circuit (`xi == last(x)`) returned
`y[idxR] * one(xi)` — only the query carrier `Tq`. The kernel path picks
up the grid carrier `Tg` via `aq.dL` / `dL = xq - xL`. For Dual grid +
Float query, the two paths returned `Float` and `Dual` respectively →
inference `Union{Tv, Dual}` even though runtime returns the right type.
Add `* one(eltype(x))` (oneshot) / `* one(x_last)` (anchor) to thread Tg
through the short-circuit. For Float grid this is a no-op (multiply by
1.0); for Dual grid it lifts to Dual. Now both branches return the
same type, inference is clean.
Sites (5):
- `constant_anchor.jl` 3 short-circuits (NoExtrap, AbstractExtrap default,
ClampOrFill in-domain).
- `constant_oneshot.jl` InBounds + WrapExtrap.
Test: re-promote `test_duck_tv_dual_tq.jl` "Dual x grid carrier flows
through Tv × Tq" testset to `@test (@inferred …) isa T` (was reverted to
plain `isa` in the previous commit; this latent inference bug was the
root cause).
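The branch divergence behind the `Union` inference can be reproduced with a toy lookup. This is a sketch under assumptions: `BigFloat` stands in for a Dual grid carrier `Tg`, and both function names are hypothetical.

```julia
# Old shape: the right-edge short-circuit sees only the query type Tq,
# while the regular branch also picks up the grid type Tg through dL.
function right_edge_old(x, y, xi)
    xi == last(x) && return last(y) * one(xi)
    i = searchsortedlast(x, xi)
    dL = xi - x[i]                      # mixes Tq and Tg
    return y[i] * one(dL)
end

# New shape: the short-circuit also multiplies by one(eltype(x)).
function right_edge_new(x, y, xi)
    xi == last(x) && return last(y) * one(xi) * one(eltype(x))
    i = searchsortedlast(x, xi)
    dL = xi - x[i]
    return y[i] * one(dL)
end

x = BigFloat[0, 1, 2]        # grid with a wider element type than the query
y = [10.0, 20.0, 30.0]
old_edge     = right_edge_old(x, y, 2.0)   # short-circuit branch
old_interior = right_edge_old(x, y, 0.5)   # regular branch
new_edge     = right_edge_new(x, y, 2.0)
```

In the old shape the two branches return different concrete types for the same argument types, which is the kind of divergence that shows up as a `Union` under `@inferred`; in the new shape both branches agree.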
…l_value
`_constant_interp_nd_oneshot{,_batch!}` passed a hardcoded `EvalValue()`
to `_try_fill_oob`, so OOB queries under `FillExtrap` always returned
the fill_value — even for deriv queries that should return zero.
Phase 3 (commit 364c3e4) removed a protocol-level `_is_any_deriv`
short-circuit at the public entry, which previously intercepted deriv
queries before they reached `_try_fill_oob`. The hardcoded `EvalValue()`
was correct under that short-circuit but became a regression once removed.
Persistent path (`interpolant_protocol.jl:131,173`) correctly passes
`ops` — only the oneshot path was missing the threading. Fix: pass `ops`
through `_try_fill_oob` at both scalar and batch oneshot sites.
Test (`test_carrier_type_stability.jl` Cat E): 4 assertions covering
scalar (mixed-axis deriv) + batch oneshot + persistent sanity check.
`_linear_weight(::EvalDeriv1, α, inv_h, ::Val{B}) = ±inv_h` returned `Tg`
only — for a `Dual` query whose carrier source was an `EvalDeriv1` axis
(mixed partial `(D1, D1)`, or `(EvalValue, D1)` with axis-2 Dual), the
multilinear corner sum dropped Tq and the batch path failed with
`TypeError` (buffer eltype Dual vs kernel return Float).
Fix mirrors the `EvalDeriv2+` pattern: `* one(α)` threads Tq even when
`α` does not appear in the weight expression. LLVM const-folds the `1.0`
factor on plain Float queries (zero cost). For Dual queries it costs one
extra Dual mul per corner — necessary for AD correctness.
Pre-existing on master; this PR's carrier-propagation contract surfaces
the gap. Test coverage extended in Cat B with an ND counterpart of the
existing 1D "non-zero deriv" testset, including mixed-partial
`(D1, D1)` on Linear / Cubic ND.
…)` + per-k cell-local OOB helpers
Series deriv branches used broadcast `0 * first(y)` (single-element load,
SIMD-friendly) which dropped both cell-local NaN and Tq carrier — only
NaN at `y[1]` could leak, regardless of which series cell was queried,
and `Dual` queries returned the underlying `Tv` without carrier.
This commit routes the deriv path through the value path's exact
mechanism on every Series eval site:
* `_eval_constant_series_anchored` and the scalar in-domain right-edge
branch in `_eval_constant_series_point!` collapse to a single
`_constant_kernel(op, y_left, y_right, h, dL, side)` call — value
and deriv share one code path. The 1D `_constant_kernel(::EvalDeriv*)`
overloads already follow the unified `0 * y * one(dL)` pattern.
* Shared OOB helpers `_constant_extrap_boundary_value` and
`_fill_constant_extrap_simd!` (in `core/series_utils.jl`, used by
Linear/Cubic/Quadratic/Constant Series alike) take an `aq` argument
and thread `* one(aq.xq)` for Tq carrier. `aq.xq` is the only common
field across the four anchored-query types, so dispatch stays
duck-typed. Clamp/Fill deriv overloads merged onto `::_ClampOrFill`
and source their zero from boundary `y[idx, k]` (not `e.fill_value`)
so a NaN-bearing `fill_value` does NOT leak through deriv while a
NaN at the boundary cell still does — preserves the existing OOB
FillExtrap × deriv contract.
Performance: per-k cell-local indexing inside the SIMD loop replaces a
single broadcast value, but `one(aq.xq)` hoists out of the inner loop
so the body is one load + one mul per `k`. Series scalar deriv paths
are ~14–17% slower (102 ns vs 88 ns); batch deriv is bounded by the
value-path cost (now ≈ value).
Cat F's four `@test_broken` assertions (Series in-domain / right-edge /
OOB ClampExtrap × deriv) flip to `@test`. Direct-call tests in
`test/test_series_utils.jl` updated for the new signature; bonus
regression pin for FillExtrap(NaN) × deriv = 0.
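The OOB contract described above (zero sourced from the boundary cell, never from `fill_value`) can be sketched with a toy helper. All names here are hypothetical, and the `[time, series]` layout is taken from the earlier description of `_constant_extrap_boundary_value`.

```julia
# Pick the boundary row for the side the query fell off.
boundary_row(side::Symbol, n_pts::Integer) = side === :left ? 1 : n_pts

# Deriv under flat (Clamp/Fill) extrapolation. fill_value is accepted but
# deliberately unused, so a NaN fill_value cannot leak into the derivative.
function oob_deriv_zero(fill_value, y::AbstractMatrix, k::Integer, side::Symbol, xq)
    idx = boundary_row(side, size(y, 1))
    return 0 * y[idx, k] * one(xq)
end

y = [1.0 NaN;
     2.0 5.0]   # 2 time points × 2 series

clean = oob_deriv_zero(NaN, y, 1, :left, -1.0)   # NaN fill_value, clean boundary
leaky = oob_deriv_zero(0.0, y, 2, :left, -1.0)   # NaN sits at series 2's left boundary
right = oob_deriv_zero(NaN, y, 2, :right, 9.0)   # other boundary of series 2 is clean
```

A NaN at the queried boundary cell propagates, while a NaN in `fill_value` or at the opposite boundary stays hidden.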
… pattern
When a `HeteroInterpolantND` query has `Real` axes + at least one
`NoInterp` axis (mixed case) AND a non-zero deriv on a NoInterp axis, the
`@generated _eval_nointerp` body short-circuited with
`return zero($Tz_expr)`, a type-only zero that strips both cell-local NaN
at the queried slice and any Tq carrier from the Real-axis search.
Same pattern as Constant ND's `_constant_nd_kernel(...) * 0` dispatch:
remove the early-return, run the Real-axis kernel (`_eval_hetero_nd_cell` /
`_collapse_dims`) normally, and wrap the result with `result * 0` when any
NoInterp axis carries a non-zero deriv. `result` is already
carrier-threaded by the Real-axis kernel (IEEE: `0 * Dual = Dual(0, 0)`,
`0 * NaN = NaN`), so the late `* 0` preserves Tq + Tg + Tv while zeroing
the value.
Applied to three branches of the `@generated` body (`_HeteroPartials` /
OnTheFly local / OnTheFly global) and to the `_interp_nointerp_oneshot`
mixed case. The runtime cost is one extra multiplication on the deriv
path. Real-axis search would have been required anyway for a strict
cell-local contract, so the saving from the prior early-return was
conceptually unavailable.
Also drops a stale "Constant Phase 3" label in the all-NoInterp comment
(the pattern it references is now the regular `_constant_nd_evaluate`
dispatch). Cat G extended with Mixed Linear+NoInterp persistent + oneshot
cell-local NaN tests.
Sweeps the small fixes surfaced by the pre-merge code-review pass:
* Constant `NoExtrap` anchored-query right-edge bug
(`_constant_anchor_dispatch(NoExtrap)` at `aq.xq == last(itp.x)`):
the wrapper used `zero(T) * one(aq.xq)`, dropping Tg carrier and
stripping NaN at the boundary cell. Now delegates to the shared
`_constant_eval_at_anchor(::NoExtrap)` path which mirrors the cell-
local pattern used by the other three sibling overloads. Pinned by a
new "1D anchored-query — right-edge cell-local" subtest in Cat F.
* `constant_oneshot.jl`: scalar `_constant_eval_at_point` right-edge
short-circuits switched from `one(xi) * one(eltype(x))` to
`one(Tq) * one(Tg)` — the function signature already binds both type
parameters, so use them directly. Same shape semantically, more
explicit at the call site.
* Stale `_sample_data` trait comments (Linear / Cubic / Quadratic ND
eval): "Zero-ref for fill-value derivative computation" → "Per-method
sample of `Tv` for fill-value paths (e.g. `_try_fill_oob`)" to match
the wording landed for Constant ND in the trait rename series.
* Stale comment in `constant_anchor.jl` describing the right-edge
short-circuit pattern: now mentions both Tq and Tg threading via
`one(aq.xq) * one(x_last)`.
* `_promote_extrap_zero` docstring (`core/utils.jl`): broadened from
"derivative of constant" to the actual current contract
(carrier-aware zero under flat extrapolation, all method families).
Explains why the `0 * val` prefix is structurally required for NaN
propagation.
* Phase label cleanup in `constant_series_interp.jl` ("Phase E.2:") and
in `test_carrier_type_stability.jl` Cat F / Cat G headers (stray
"Phase 3" / "Constant Phase 3" references). Removes plan-era labels
that age poorly once the plan lands.
* `constant_oneshot.jl` scalar entry docstring: lists `EvalValue()` +
any `DerivOp(n)` (was capped at `DerivOp(2)`). Public API supports
arbitrary N already.
* `constant_nd_oneshot.jl` batch docstring: missing `ops` arg added to
both `_constant_interp_nd_oneshot_batch!` / `..._batch` signatures.
* `_constant_nd_evaluate` (`constant_nd_eval.jl`): rationale comment
added explaining why this path intentionally runs the kernel + `* 0`
rather than falling back to a `fill!` shortcut — guards future
reverts that might be tempted by the perf delta vs master.
Replaces the hand-rolled Linear/Cubic-only Cat B ND subtests with a loop
over all six ND methods (Linear D1 / Cubic D3 / Quadratic D2 / PCHIP D3 /
Cardinal D3 / Akima D3) and three path families (oneshot scalar / oneshot
batch / persistent scalar+batch), each covering both the per-axis
`(EvalValue, Dk)` pattern and the mixed-partial `(DerivOp(1), DerivOp(1))`
pattern.
Helpers `_check_oneshot_scalar` / `_check_oneshot_batch` /
`_check_persistent` take `fn::F` so `@inferred` can specialize per
concrete method; the for-loop body would otherwise see `fn` as a Union and
collapse the inference check.
Coverage matrix after this commit (Cat B ND, all `@inferred + isa`):

| method | oneshot scalar | oneshot batch | persistent scalar+batch |
|------------|:--------------:|:-------------:|:-----------------------:|
| Linear | ✓ × 2 | ✓ × 2 | ✓ × 4 |
| Cubic | ✓ × 2 | ✓ × 2 | ✓ × 4 |
| Quadratic | ✓ × 2 | ✓ × 2 | ✓ × 4 |
| PCHIP | ✓ × 2 | ✓ × 2 | ✓ × 4 |
| Cardinal | ✓ × 2 | ✓ × 2 | ✓ × 4 |
| Akima | ✓ × 2 | ✓ × 2 | ✓ × 4 |

Cat B grows from 14 → 57 assertions. The batch path is the most
load-bearing: Linear's prior `_linear_weight(::EvalDeriv1)` Tq gap showed
up there first as `TypeError` (buffer eltype `Dual` vs kernel return
`Float`), which scalar-only coverage missed.
Pull request overview
This PR threads the query carrier type (e.g. `ForwardDiff.Dual`, `Measurement`) through every derivative-zero return path across all interpolation method families, and unifies the cell-local NaN propagation contract. It closes carrier-and-NaN gaps that survived #146: previously, deriv paths in Constant ND, Series, and Hetero NoInterp would short-circuit via `fill!`, `0 * first(y)`, or `zero(Tz)`, dropping both Tq carrier and any NaN at the queried cell.
Changes:
- Unifies "deriv-zero" handling so every kernel runs and is multiplied by `0` after the carrier is threaded (preserving IEEE `NaN × 0 = NaN`, Dual partials, etc.); touches Constant/Linear/Quadratic/Cubic/Hermite kernels and ND/Series/Anchor dispatch.
- Replaces `_zero_ref` with `_sample_data` and removes the `_deriv_zero_fill` trait shortcut; series helpers (`_constant_extrap_boundary_value`, `_fill_constant_extrap_simd!`) now take `aq` and source the zero from `y[idx, k]` (not `fill_value`), so OOB FillExtrap × deriv returns `0` rather than leaking `fill_value`.
- Adds extensive type-stability (`@inferred`) and cell-local NaN propagation tests (Cat A–G) across every method × dimensionality × path combination.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/core/interpolant_protocol.jl | Removes _deriv_zero_fill shortcut; renames _zero_ref → _sample_data. |
| src/core/series_utils.jl | Adds aq parameter; deriv path now sources zero from cell-local y[idx, k]. |
| src/core/utils.jl | Fixes _promote_extrap_zero(::Number, ::Number) to propagate boundary NaN. |
| src/vector_calculus.jl | Switches _zero_ref → _sample_data at fill-OOB sites. |
| src/constant/constant_oneshot.jl | Right-edge short-circuits thread one(Tq) * one(Tg); updates docstring. |
| src/constant/constant_anchor.jl | Right-edge & NoExtrap short-circuits use cell-local y[aq.idxR] and thread Tg/Tq carriers. |
| src/constant/constant_series_interp.jl | Series right-boundary now routes through _constant_kernel for op-uniform carrier/NaN. |
| src/constant/nd/constant_nd_eval.jl | New _constant_nd_evaluate two-method dispatch (EvalValue vs deriv via * 0). |
| src/constant/nd/constant_nd_oneshot.jl | Threads ops through oneshot/batch dispatch; removes pre-call deriv short-circuits. |
| src/linear/linear_kernels.jl | Deriv kernels thread * one(α) carrier. |
| src/linear/linear_series_interp.jl | Adds aq param to series fill helpers. |
| src/linear/nd/linear_nd_eval.jl | Removes _deriv_zero_fill trait; _linear_weight(::EvalDeriv1) now * one(α) (fixes pre-existing TypeError on (D1,D1) Dual). |
| src/cubic/cubic_kernels.jl | EvalDeriv3/N kernels thread * one(dL). |
| src/cubic/cubic_series_interp.jl | Adds aq param to series fill helpers. |
| src/cubic/nd/cubic_nd_eval.jl | Renames _zero_ref → _sample_data. |
| src/cubic/nd/cubic_nd_math.jl | EvalDeriv3 + DerivOp{N} hermite kernels thread * one(dL). |
| src/quadratic/quadratic_kernels.jl | EvalDeriv2/3/N kernels thread * one(dL). |
| src/quadratic/quadratic_series_interp.jl | Adds aq param to series fill helper. |
| src/quadratic/nd/quadratic_nd_eval.jl | Renames _zero_ref → _sample_data. |
| src/hetero/hetero_eval.jl | Renames _zero_ref → _sample_data; removes _deriv_zero_fill override. |
| src/hetero/hetero_nointerp.jl | All-NoInterp / mixed paths multiply cell-local data * zero(Tz) instead of returning bare zero(Tz). |
| test/test_series_utils.jl | Updates calls for new aq arg; adds FillExtrap × NaN tests. |
| test/test_duck_typing_comprehensive.jl | Wraps existing isa checks with @inferred. |
| test/test_duck_tv_dual_tq.jl | Wraps existing isa checks with @inferred; notes AD wrappers exception. |
| test/test_carrier_type_stability.jl | New comprehensive test file covering Cat A–G (type stability, carrier propagation, cell-local NaN). |
Codecov Report❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | master | #147 | +/- |
|---|---|---|---|
| Coverage | 96.43% | 96.46% | +0.03% |
| Files | 143 | 143 | |
| Lines | 11978 | 11940 | -38 |
| Hits | 11551 | 11518 | -33 |
| Misses | 427 | 422 | -5 |
FastInterpolations.jl BenchmarksAll benchmarks (50 total, click to expand)
|
| Benchmark | Current | Previous | Imm. Ratio | Grad. Ratio | Tier |
|---|---|---|---|---|---|
| 10_nd_construct/tricubic_3d | 353489.0 ns | 320133.0 ns | 1.104 | 1.0 | immediate |
| 11_nd_eval/bicubic_2d_scalar | 16.3 ns | 12.8 ns | 1.274 | 1.0 | immediate |
| 11_nd_eval/bilinear_2d_scalar | 7.5 ns | 6.6 ns | 1.138 | 1.019 | immediate |
| 11_nd_eval/tricubic_3d_batch | 3230.0 ns | 2489.8 ns | 1.297 | 0.989 | immediate |
| 11_nd_eval/tricubic_3d_scalar | 33.5 ns | 24.8 ns | 1.348 | 0.979 | immediate |
| 11_nd_eval/trilinear_3d_scalar | 13.2 ns | 10.0 ns | 1.321 | 0.975 | immediate |
| 12_cubic_eval_gridquery/range_random | 4225.9 ns | 3638.9 ns | 1.161 | 0.979 | immediate |
| 12_cubic_eval_gridquery/range_sorted | 4219.5 ns | 3613.4 ns | 1.168 | 0.98 | immediate |
| 12_cubic_eval_gridquery/vec_random | 9311.3 ns | 7787.9 ns | 1.196 | 0.985 | immediate |
| 12_cubic_eval_gridquery/vec_sorted | 3201.2 ns | 2620.3 ns | 1.222 | 0.997 | immediate |
| 13_nd_oneshot_gridquery/bicubic_2d_rand_rand | 65569.2 ns | 58729.5 ns | 1.116 | 0.976 | immediate |
| 13_nd_oneshot_gridquery/bicubic_2d_sort_rand | 62263.0 ns | 55491.7 ns | 1.122 | 0.982 | immediate |
| 13_nd_oneshot_gridquery/bilinear_2d_rand_rand | 20179.6 ns | 13537.0 ns | 1.491 | 1.155 | both |
| 13_nd_oneshot_gridquery/bilinear_2d_sort_rand | 9365.2 ns | 7454.0 ns | 1.256 | 0.919 | immediate |
| 13_nd_oneshot_gridquery/bilinear_2d_sort_sort | 5684.2 ns | 3851.0 ns | 1.476 | 0.998 | immediate |
| 14_series_oneshot_batch/constant_inplace_vec_k8_q1000_rand | 18714.9 ns | 15082.7 ns | 1.241 | 1.008 | immediate |
| 14_series_oneshot_batch/linear_inplace_vec_k8_q1000_rand | 17861.2 ns | 14233.5 ns | 1.255 | 0.932 | immediate |
| 1_cubic_oneshot/q00001 | 548.6 ns | 447.9 ns | 1.225 | 1.15 | both |
| 1_cubic_oneshot/q10000 | 43548.1 ns | 37312.3 ns | 1.167 | 0.702 | immediate |
| 2_cubic_construct/g0100 | 1388.2 ns | 1198.0 ns | 1.159 | 1.02 | immediate |
| 2_cubic_construct/g1000 | 12678.7 ns | 10953.5 ns | 1.158 | 0.977 | immediate |
| 3_cubic_eval/q00100 | 439.6 ns | 377.6 ns | 1.164 | 0.976 | immediate |
| 3_cubic_eval/q10000 | 42632.4 ns | 36592.1 ns | 1.165 | 0.98 | immediate |
| 4_linear_oneshot/q10000 | 18693.8 ns | 15620.5 ns | 1.197 | 0.99 | immediate |
| 5_linear_construct/g0100 | 35.2 ns | 30.0 ns | 1.17 | 0.903 | immediate |
| 5_linear_construct/g1000 | 234.1 ns | 203.8 ns | 1.149 | 0.84 | immediate |
| 6_linear_eval/q00001 | 10.1 ns | 7.8 ns | 1.294 | 0.975 | immediate |
| 6_linear_eval/q00100 | 194.8 ns | 162.8 ns | 1.196 | 0.984 | immediate |
| 6_linear_eval/q10000 | 18480.4 ns | 15400.2 ns | 1.2 | 0.993 | immediate |
| 7_cubic_range/scalar_query | 8.3 ns | 7.1 ns | 1.169 | 0.979 | immediate |
| 7_cubic_vec/scalar_query | 11.3 ns | 8.1 ns | 1.395 | 1.054 | immediate |
| 8_cubic_multi/construct_s001_q100 | 639.2 ns | 523.6 ns | 1.221 | 1.1 | immediate |
| 8_cubic_multi/construct_s010_q100 | 4412.2 ns | 3704.2 ns | 1.191 | 0.986 | immediate |
| 8_cubic_multi/construct_s100_q100 | 39625.6 ns | 33467.5 ns | 1.184 | 0.972 | immediate |
| 8_cubic_multi/eval_s001_q100 | 807.5 ns | 640.8 ns | 1.26 | 1.08 | immediate |
| 8_cubic_multi/eval_s010_q100 | 1782.3 ns | 1443.8 ns | 1.234 | 1.023 | immediate |
| 8_cubic_multi/eval_s010_q100_scalar_loop | 2296.5 ns | 1945.3 ns | 1.181 | 0.985 | immediate |
| 8_cubic_multi/eval_s100_q100 | 11541.5 ns | 9357.1 ns | 1.233 | 1.006 | immediate |
| 9_nd_oneshot/tricubic_3d | 420902.1 ns | 361762.1 ns | 1.163 | 1.092 | immediate |
| 9_nd_oneshot/trilinear_3d | 1034.9 ns | 837.2 ns | 1.236 | 0.975 | immediate |
Thresholds: immediate > 1.1x (vs latest master), gradual > 1.1x (vs sliding window)
This comment was automatically generated by Benchmark workflow.
…ill_value
Mirrors the Constant ND OOB FillExtrap contract pinned in Cat E:
"deriv at OOB returns 0, not fill_value". The Hetero mixed Real +
GridIdx/NoInterp path violated this on both oneshot and persistent:
* Oneshot (`_interp_nointerp_oneshot`) — regression introduced by the
`result * 0` change: a Real-axis OOB query under `FillExtrap(NaN)`
used to early-return `zero(Tz)` (master), but the new flow lets
`interp(...)` compute `fill_value = NaN` and then multiplies by `0`,
yielding `NaN` (`NaN * 0 = NaN`).
* Persistent (`_eval_nointerp` @generated) — pre-existing master bug:
the OOB `_try_fill_oob` short-circuit runs BEFORE the NoInterp
deriv check, so any `FillExtrap` (incl. `NaN`) leaks regardless of
the NoInterp axis derivative.
Fix in both paths: when any NoInterp axis carries a non-zero deriv AND
the Real-axis query is OOB, return a slab-local zero sourced from the
data slice's first element (`data_r[1] * zero(Tz)`) — finite cells
yield 0, a NaN at the slab survives, and the user-supplied `fill_value`
cannot reach the result. The in-domain Real-axis path is unchanged
(still runs the kernel + `result * 0` for cell-local NaN).
Cat E gets a new testset "Hetero mixed OOB FillExtrap × NoInterp deriv
returns zero, not fill_value" — runs both `99.0` and `NaN` fill values
through oneshot and persistent. Pre-existing structural gap: Cat E
covered OOB contracts only for 1D + Constant ND, and Cat G covered
Hetero NoInterp only in-domain; the Hetero × OOB intersection was
unpinned. New testset closes it.
…t leak fill_value" This reverts commit 5f20143.
`outputs = [Vector{Tv_out}(undef, NQ) for _ in 1:K]` infers as `Vector`
(non-concrete) on Julia 1.10 LTS when `Tv_out` is bound from a function
return. The comprehension is lowered to a closure, and 1.10's inference
can't propagate the locally-bound `Type{T}` from the outer scope into
the closure body — even though `_output_eltype(...)` is fully concrete
in isolation. Verified by isolating: an inline type literal in the
comprehension infers correctly, a local-variable type does not, a
type-as-argument barrier does.
Extracts the pattern into `_alloc_series_batch_outputs(::Type{Tv}, K, NQ)`
in series_utils.jl. All four Series oneshot batch entry points (Linear /
Cubic / Quadratic / Constant) route through it.
Effect on LTS: `linear_interp(x, ::Series, ::Vector{<:Dual})` etc. now
infer concretely as `Vector{Vector{Tv_out}}` instead of `Vector`,
matching 1.11+/1.12+. Removes downstream type-instability that
previously propagated to callers on LTS only.
Unblocks the 4 batched-Dual @inferred tests in test_duck_tv_dual_tq.jl
"Series path — duck-Tv × duck-Tq carrier propagates" that were failing
on the LTS CI job.
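The type-as-argument barrier described above can be sketched roughly as follows. This is a minimal illustration, not the package's `_alloc_series_batch_outputs`: the names `alloc_batch_outputs_sketch` and `output_eltype_sketch` are hypothetical, and the body is an assumption about what such a helper would contain.

```julia
# Hypothetical sketch of a `::Type{T}` function barrier; the name and body are
# illustrative, not the package's actual helper.
function alloc_batch_outputs_sketch(::Type{Tv}, K::Integer, NQ::Integer) where {Tv}
    # `Tv` is a static parameter here, so the comprehension's closure sees a
    # concrete element type instead of a captured local variable.
    return [Vector{Tv}(undef, NQ) for _ in 1:K]
end

# Stand-in for `_output_eltype`: the element type comes from a function return.
output_eltype_sketch(x, q) = promote_type(eltype(x), eltype(q))

Tv_out = output_eltype_sketch([1.0f0, 2.0f0], [0.5, 1.5])
outputs = alloc_batch_outputs_sketch(Tv_out, 3, 4)
```

Passing the type as an argument makes it part of the method's signature, which is why the barrier helps on releases where the closure-captured form does not infer.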
Provides a version-conditional `@inferred` for tests that hit Julia
inference gaps present only on older releases (e.g. 1.10 LTS doesn't
propagate a locally-bound `Type{T}` from a function return into a
closure body, even when the function is concrete-inferred in isolation).
`@maybe_inferred expr` expands to:
- `Test.@inferred expr` on ≥1.12
- `expr` unchanged on older releases (runtime contract via the
surrounding `isa` / `==` still applies; inference assertion skipped)
Opt-in via `setup=[InferredCompat]` on `@testitem`. Use sparingly —
prefer fixing the source (e.g. extract a `::Type{T}` barrier, see
`_alloc_series_batch_outputs`) when the inference gap is in this
package's own code.
Added in 5408d2c as a defensive utility for hypothetical future LTS-only inference gaps. Never referenced by any testitem — the source-level `_alloc_series_batch_outputs` barrier (a9b8458) fully covers the only gap we hit on LTS, and no other test was waiting on this. Removed per YAGNI; if a real callsite shows up later, the macro is 6 lines.
…ontract
Unifies the OOB deriv contract across all paths so it matches the strict
cell-local interpretation: the OOB cell's data IS whatever the extrap
rule puts there. For `FillExtrap(c)`, that's `c` — so deriv = `0 * c`
(NaN propagates, finite × 0 = 0). Master's contract sourced deriv from
`y_bnd` regardless of extrap type, which inconsistently ignored the
fill_value as a NaN sentinel for AD / OOB-detection use cases.
Single-dispatch helper `_extrap_deriv_source(extrap, y_bnd)`:
- `ClampExtrap` → `y_bnd` (unchanged — boundary y NaN propagates)
- `FillExtrap` → `e.fill_value` (NaN fill_value propagates as data)
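A minimal sketch of this dispatch, under stated assumptions: `ClampSketch`, `FillSketch`, `oob_source`, and `oob_deriv` are hypothetical stand-ins, not the package's types or its `_extrap_deriv_source` helper, and the carrier promotion step is left out.

```julia
# Hypothetical stand-ins for the package's extrapolation types.
struct ClampSketch end
struct FillSketch{T}
    fill_value::T
end

# "What data sits in the OOB cell?": one method per extrap type.
oob_source(::ClampSketch, y_bnd) = y_bnd
oob_source(e::FillSketch, y_bnd) = e.fill_value

# OOB derivative under the cell-local contract: 0 times the cell's data.
oob_deriv(extrap, y_bnd) = 0 * oob_source(extrap, y_bnd)

clamp_finite = oob_deriv(ClampSketch(), 3.0)       # boundary y is finite
clamp_nan    = oob_deriv(ClampSketch(), NaN)       # boundary y NaN propagates
fill_finite  = oob_deriv(FillSketch(99.0), NaN)    # y_bnd NaN is ignored
fill_nan     = oob_deriv(FillSketch(NaN), 3.0)     # NaN fill_value propagates
```

Under `FillSketch` the boundary value never enters the product, which is the distinction the rewritten tests pin.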
Threaded through every OOB deriv site:
* 1D `_eval_extrapolation(::DerivOp, ...)` collapsed to one Union method
using the helper. ClampExtrap behavior unchanged; FillExtrap now
sources from fill_value.
* ND `_fill_extrap_result(::AbstractEvalOp, fill_val, _, qe)` switched
from `zero_ref` to `fill_val` (signature stable, `zero_ref` retained
but unused). Tuple-ops overload likewise.
* Series `_constant_extrap_boundary_value` / `_fill_constant_extrap_simd!`
Clamp/Fill deriv collapsed to a single Union overload using the helper.
* Hetero mixed `_eval_nointerp` @generated body: OOB short-circuit now
runs `oob * 0` when a NoInterp axis has non-zero deriv — `oob` is
already `_promote_extrap_zero(fill_value, ...)` post-fix, so finite
fill → 0 and NaN fill propagates. Closes pre-existing master bug
where the OOB shortcut returned `fill_value` without applying
NoInterp deriv's `* 0`.
* Hetero mixed `_interp_nointerp_oneshot`: no code change — `result * 0`
already does the right thing once `_eval_extrapolation` is fixed.
Perf: every change site is on the OOB cold path. Helper is `@inline`
with singleton dispatch (LLVM dead-branch elimination). In-domain paths
remain bit-identical; ClampExtrap-only callers see no behavior change.
Tests:
* `Cat E` rewritten — pins the new contract end-to-end across 1D
(all 7 methods), ND Constant/Linear, Hetero mixed. 89 assertions.
Old "queried-boundary NaN propagates" tests for `FillExtrap` repurposed
to "y_bnd NaN is IGNORED" (the new contract distinction).
* `test_constextrap_fill.jl` updates: "Cubic with fill value",
"Linear with fill value", and "ND derivatives under FillExtrap"
flipped from `iszero` to `isnan` for `FillExtrap(NaN)` × deriv;
finite-fill assertions retained as-is.
Reverts the previous patch (5f20143) which sourced the OOB zero from
the data slice's first element under the prior contract — the option-B
flow sources from `fill_value` instead, so that patch's `data_r[1]`
short-circuit is unnecessary and was reverted in 81f221c.
The helper returns "what data sits in the OOB cell" — a property of the extrap, not specific to derivative evaluation. The old name (`_deriv_source`) implied deriv-only scope, but the underlying concept is the same `y_bnd` vs `fill_value` dispatch that already drives the `EvalValue` extrap path. Generalizing the name allows future callers (e.g. value-path consolidation) to reuse the helper without semantic mismatch. Behavior unchanged. 4 call sites renamed (`core/utils.jl` definition + `_eval_extrapolation` use; `core/series_utils.jl` × 2).
Three test files still pinned the prior contract ("FillExtrap(NaN) × deriv
returns 0, fill_value-as-sentinel"). Flipped to the new "fill_value-as-data"
semantics — NaN fill propagates through deriv via `0 * NaN = NaN`, finite
fill × 0 = 0 unchanged.
* `test_derivatives.jl` FillExtrap block: `D(4)` on cubic/quadratic at OOB
with NaN fill now expects `all(isnan, res)` per series; in-domain D(4)
for cubic stays `== z2`.
* `test_series_utils.jl` direct-call tests for `_constant_extrap_boundary_value`
and `_fill_constant_extrap_simd!`: FillExtrap deriv with NaN fill flips to
`isnan`; added explicit finite-fill (99.0) → 0 assertions.
* `test_mixed_precision_extrap.jl` FillExtrap derivatives: Float32 NaN fill
+ Float64 query now expects `isnan` with Float64 promotion; added
finite-fill (0.0f0) sibling case to keep the Tq promotion test alive.
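The arithmetic behind these flipped assertions can be checked with plain Base Julia. The helper name `oob_deriv_sketch` is hypothetical, and the `* one(xq)` factor is an assumption about how the query type is threaded in; the sketch only shows the IEEE and promotion behavior, not the package's code path.

```julia
# Hypothetical helper: OOB derivative as "0 times the cell's data", with the
# query type threaded in through `one(xq)`.
oob_deriv_sketch(fill_value, xq) = 0 * fill_value * one(xq)

nan_case    = oob_deriv_sketch(NaN32, 2.5)   # Float32 NaN fill, Float64 query
finite_case = oob_deriv_sketch(0.0f0, 2.5)   # Float32 finite fill, Float64 query
large_case  = oob_deriv_sketch(99.0, 2.5)    # Float64 finite fill, Float64 query
```

The NaN fill survives the multiplication by zero and is promoted to `Float64`, while both finite fills collapse to a `Float64` zero.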
…_oob_data`
The two `::EvalValue` methods (ClampExtrap, FillExtrap) used the same `y_bnd` vs `fill_value` dispatch already centralized in `_extrap_oob_data`, but expressed it inline as separate overloads. Collapsing them onto the Union method that mirrors the `::DerivOp` shape makes the parallel structure explicit:
- EvalValue → `data + carrier` (value-promotion of OOB cell data)
- DerivOp → `0 * data + carrier` (deriv-promotion: 0 × OOB cell data)
The shared scaffolding clarifies the deriv-path read: `_extrap_oob_data` fetches the cell's data and `_promote_extrap_zero` applies the `0 *`; the helper is asking "what data?", not "what derivative?".
Behavior unchanged. ClampExtrap / FillExtrap value-path output bit-equal to before (the dispatch chain still resolves to the same source); deriv path also unchanged. Verified across `test_carrier_type_stability`, `test_constextrap_fill`, `test_constant`, `test_derivatives`.
The flagged performance regressions can be ignored
A few things support that read:
Identical code, 21% delta
By construction that's 100% runner variance, and the inflated-by-~21% baseline closely matches the PR's flagged ratios (1.13–1.34×).
The "derivative view with fill value" testset pinned the prior contract (`FillExtrap(NaN) × deriv → 0`, no NaN in the recipe output). Under the new "fill_value-as-data" contract the OOB tails of `deriv1(itp)` carry `0 * NaN = NaN`, which Plots draws as gaps (intended visual outcome). Flipped the assertion to `any(isnan, yq)` for the NaN-fill case and added a finite-fill (`FillExtrap(0.0)`) sibling that still asserts no NaN.
Summary
Threads the query carrier (`ForwardDiff.Dual`, `Measurement`, etc.) through every deriv-zero return path across `Constant` / `Linear` / `Quadratic` / `Cubic` / `Hermite-family` kernels and unifies the cell-local "OOB cell = extrap's data" contract. Closes two gaps left by #146:
- `fill!(out, zero(Tv))` (Constant ND), broadcast `0 * first(y)` (Series), or type-only `zero(Tz)` (Hetero NoInterp mixed). All three patterns drop the `Tq` carrier and any NaN in the queried cell.
- `FillExtrap × deriv` semantics: master sourced deriv from `y_bnd` (boundary y) regardless of extrap type, so `FillExtrap(NaN)` deriv silently returned `0` instead of propagating NaN. The new contract treats `fill_value` as the OOB cell's data: derivative `= 0 * fill_value` (NaN propagates, finite × 0 = 0).
Size: 34 files, +1248 / -518. Single contract across all five method families.
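To illustrate why a type-only zero loses information that a "compute, then multiply by 0" zero keeps, here is a small self-contained sketch. `ToyDual` is a hypothetical stand-in for an AD carrier (it is not `ForwardDiff.Dual`), and both helper functions are illustrative rather than the package's kernels.

```julia
# Toy dual number standing in for an AD carrier (hypothetical; not ForwardDiff.Dual).
struct ToyDual
    value::Float64
    partial::Float64
end
Base.one(::ToyDual) = ToyDual(1.0, 0.0)
Base.:*(d::ToyDual, s::Real) = ToyDual(d.value * s, d.partial * s)
Base.:*(s::Real, d::ToyDual) = d * s

# Type-only zero: the result ignores both the cell's data and the query carrier.
deriv_zero_type_only(cell, xq) = zero(typeof(cell))

# Cell-local zero: thread the carrier through `one(xq)`, then multiply by 0.
deriv_zero_cell_local(cell, xq) = cell * one(xq) * 0

xq = ToyDual(0.3, 1.0)
plain   = deriv_zero_type_only(NaN, xq)     # Float64 zero: NaN and carrier both lost
finite  = deriv_zero_cell_local(2.5, xq)    # ToyDual with zero value and partial
nancell = deriv_zero_cell_local(NaN, xq)    # ToyDual whose value is NaN
```

The cell-local form costs a few multiplications, but the result type follows the query and a NaN in the queried cell stays visible.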
Unified pattern
Cell-local deriv via `kernel × 0`
Every deriv-zero path routes through the kernel and multiplies by `0` after the kernel has threaded the carrier:
- `_constant_kernel(op, ...)` / `_linear_kernel(op, ...)` etc.: single dispatch handles value/deriv with a `* one(dL)` or `* one(α)` carrier thread.
- `_constant_nd_kernel(...) * 0`: the kernel threads `* one(dL_d)` per axis; the outer `* 0` zeros the value while preserving the carrier (IEEE: `NaN × 0 = NaN`, `Dual × 0 = Dual(0, 0)`).
- `_eval_hetero_nd_cell(...) * 0`: the Real-axis kernel runs, and the NoInterp-axis deriv applies `* 0` at the end (no more `zero(Tz)` early return).
Series boundary helpers (`_constant_extrap_boundary_value`, `_fill_constant_extrap_simd!`) now thread `* one(aq.xq)` for the `Tq` carrier via a signature change; `aq.xq` is the common field across all four anchored-query types.
`_extrap_oob_data` dispatch: extrap-aware OOB data
Single helper centralizes the per-extrap "what data sits in the OOB cell" decision: `ClampExtrap` → `y_bnd`, `FillExtrap` → `fill_value`.
`_eval_extrapolation` collapses both `EvalValue` and `DerivOp` paths onto a parallel `data → promote` shape: `EvalValue` → `data + carrier`, `DerivOp` → `0 * data + carrier`.
The shared scaffolding makes the deriv path's intent obvious: the helper fetches the cell's data; the `0 *` happens inside `_promote_extrap_zero`. The same dispatch threads through ND `_fill_extrap_result` and the Series boundary helpers. ClampExtrap behavior is unchanged; FillExtrap deriv now sources from `fill_value`, consistently across 1D / ND / Series / Hetero mixed.
Behavior changes
- `Tq` carrier on every path: `Dual` query → `Dual` result for any `DerivOp(n)` ≥ 1, including `n` beyond the polynomial degree.
- `Constant ND`, `Constant Series` (in-domain + OOB), `Hetero NoInterp` (all-NoInterp + mixed Real+NoInterp): NaN in the queried cell now reaches the result; NaN elsewhere stays hidden.
- `FillExtrap × deriv` adopts cell-local "fill_value-as-data" semantics: `FillExtrap(c)` at OOB now treats `c` as the OOB cell's data → deriv = `0 * c`. Finite `c` returns `0` (no change); `NaN` (or other non-finite) `c` propagates through IEEE multiplication. Applies uniformly to 1D (all 7 methods), ND (Constant/Linear/etc.), Series, and Hetero mixed paths. Master inconsistently sourced from `y_bnd` regardless of extrap type, silently absorbing NaN fill_value sentinels.
- `NoExtrap` anchored-query right-edge bug: `_constant_anchor_dispatch(NoExtrap)` at `aq.xq == last(itp.x)` used `zero(T) * one(aq.xq)`, dropping the `Tg` carrier and stripping NaN. Now delegates to the shared cell-local `_constant_eval_at_anchor` path.
- `_linear_weight(::EvalDeriv1)` `Tq` gap (pre-existing, master): `_linear_weight(::EvalDeriv1, α, inv_h, ::Val{B}) = ±inv_h` returned `Tg` only. For a `Dual` query on `EvalDeriv1` axes that were the only Tq source (e.g. `(D1, D1)` mixed partial, or `(EvalValue, D1)` with axis-2 Dual), the multilinear sum dropped the carrier and the batch path failed with `TypeError`. Threading via `* one(α)` mirrors the `EvalDeriv2+` pattern; LLVM const-folds the `1.0` factor on plain Float queries.
- `_eval_nointerp` OOB short-circuit ran before the NoInterp-deriv check, so any `FillExtrap(c)` returned `c` even when a NoInterp axis carried non-zero deriv (mathematically must be 0). Now `oob * 0` applies when NoInterp deriv is present: finite `c` → 0, NaN `c` propagates per the new contract.
Performance
All numbers M1 Pro, Julia 1.12.6. Branch vs master.
Value path (no regression):
Deriv path (now equal to value path on branch):
The two highlighted ND rows are the only material slowdown. Master's `fill!(out, zero(Tv))` shortcut was effectively a memset that bypassed all computation, fast precisely because it dropped carrier + NaN. The branch makes deriv pay the same cost as value (cell-local kernel + `* 0`); the cost is bounded by the value path, which is the structural ceiling.
`Series` deriv paths slow ~14-17% (broadcast `0 * first(y)` → per-k cell-local `0 * y[idx, k] * one(aq.xq)`) for the same reason; the SIMD inner loop still hoists `one(aq.xq)` outside the loop body.
`_extrap_oob_data` dispatch is `@inline` with singleton dispatch: LLVM specializes per concrete extrap type and dead-branch-eliminates. Zero overhead on OOB cold paths; in-domain paths never reach the helper.