Skip to content

(fix+refac): Cell-local NaN + Tq carrier through deriv-zero paths#147

Merged
mgyoo86 merged 38 commits into
masterfrom
fix/nan_propagation
May 28, 2026
Merged

(fix+refac): Cell-local NaN + Tq carrier through deriv-zero paths#147
mgyoo86 merged 38 commits into
masterfrom
fix/nan_propagation

Conversation

@mgyoo86
Copy link
Copy Markdown
Member

@mgyoo86 mgyoo86 commented May 27, 2026

Summary

Threads the query carrier (ForwardDiff.Dual, Measurement, etc.) through every deriv-zero return path across Constant / Linear / Quadratic / Cubic / Hermite-family kernels and unifies the cell-local "OOB cell = extrap's data" contract. Closes two gaps left by #146:

  1. Carrier + NaN through deriv-zero short-circuits — deriv paths used to discard the kernel via fill!(out, zero(Tv)) (Constant ND), broadcast 0 * first(y) (Series), or type-only zero(Tz) (Hetero NoInterp mixed). All three patterns drop Tq carrier and any NaN in the queried cell.
  2. FillExtrap × deriv semantics — master sourced deriv from y_bnd (boundary y) regardless of extrap type, so FillExtrap(NaN) deriv silently returned 0 instead of propagating NaN. The new contract treats fill_value as the OOB cell's data: derivative = 0 * fill_value (NaN propagates, finite × 0 = 0).

Size: 34 files, +1248 / -518. Single contract across all five method families.

Unified pattern

Cell-local deriv via kernel × 0

Every deriv-zero path routes through the kernel and multiplies by 0 after the kernel has threaded the carrier:

Site Pattern
1D kernels _constant_kernel(op, ...) / _linear_kernel(op, ...) etc. — single dispatch handles value/deriv with * one(dL) or * one(α) carrier thread
ND Constant batch _constant_nd_kernel(...) * 0 — kernel threads * one(dL_d) per axis, outer * 0 zeros value while preserving carrier (IEEE: NaN × 0 = NaN, Dual × 0 = Dual(0, 0))
Hetero mixed _eval_hetero_nd_cell(...) * 0 — Real-axis kernel runs, NoInterp-axis deriv applies * 0 at end (no more zero(Tz) early return)

Series boundary helpers (_constant_extrap_boundary_value, _fill_constant_extrap_simd!) now thread * one(aq.xq) for Tq carrier via signature change — aq.xq is the common field across all four anchored-query types.

_extrap_oob_data dispatch — extrap-aware OOB data

Single helper centralizes the per-extrap "what data sits in the OOB cell" decision:

@inline _extrap_oob_data(::ClampExtrap, y_bnd) = y_bnd        # boundary y extends into OOB
@inline _extrap_oob_data(e::FillExtrap, _)     = e.fill_value # fill_value IS the OOB data

_eval_extrapolation collapses both EvalValue and DerivOp paths onto a parallel data → promote shape:

EvalValue  _promote_extrap_val(_extrap_oob_data(ext, y_bnd), xq)   # data + carrier
DerivOp    _promote_extrap_zero(_extrap_oob_data(ext, y_bnd), xq)  # 0 × data + carrier

The shared scaffolding makes the deriv path's intent obvious — the helper fetches the cell's data; the 0 * happens inside _promote_extrap_zero. Same dispatch threads through ND _fill_extrap_result and Series boundary helpers. ClampExtrap behavior unchanged; FillExtrap deriv now sources from fill_value, consistently across 1D / ND / Series / Hetero mixed.

Behavior changes

  • Deriv preserves Tq carrier on every path: Dual query → Dual result for any DerivOp(n) ≥ 1, including n beyond the polynomial degree.
  • Cell-local NaN propagates through deriv-zero: Constant ND, Constant Series (in-domain + OOB), Hetero NoInterp (all-NoInterp + mixed Real+NoInterp) — NaN in the queried cell now reaches the result; NaN elsewhere stays hidden.
  • FillExtrap × deriv adopts cell-local "fill_value-as-data" semantics: FillExtrap(c) at OOB now treats c as the OOB cell's data → deriv = 0 * c. Finite c returns 0 (no change); NaN (or other non-finite) c propagates through IEEE multiplication. Applies uniformly to 1D (all 7 methods), ND (Constant/Linear/etc.), Series, and Hetero mixed paths. Master inconsistently sourced from y_bnd regardless of extrap type, silently absorbing NaN fill_value sentinels.
  • NoExtrap anchored-query right-edge bug: _constant_anchor_dispatch(NoExtrap) at aq.xq == last(itp.x) used zero(T) * one(aq.xq), dropping Tg carrier and stripping NaN. Now delegates to the shared cell-local _constant_eval_at_anchor path.
  • Linear ND _linear_weight(::EvalDeriv1) Tq gap (pre-existing, master): _linear_weight(::EvalDeriv1, α, inv_h, ::Val{B}) = ±inv_h returned Tg only — for a Dual query on EvalDeriv1 axes that were the only Tq source (e.g. (D1, D1) mixed partial, or (EvalValue, D1) with axis-2 Dual), the multilinear sum dropped the carrier and the batch path failed with TypeError. Threading via * one(α) mirrors the EvalDeriv2+ pattern; LLVM const-folds the 1.0 factor on plain Float queries.
  • Hetero mixed OOB FillExtrap × NoInterp deriv (pre-existing, master): The _eval_nointerp OOB short-circuit ran before the NoInterp-deriv check, so any FillExtrap(c) returned c even when a NoInterp axis carried non-zero deriv (mathematically must be 0). Now oob * 0 applies when NoInterp deriv is present — finite c → 0, NaN c propagates per the new contract.

Performance

All numbers M1 Pro, Julia 1.12.6. Branch vs master.

Value path (no regression):

Scenario Master Branch Δ
Constant Series in-domain batch (50q × 1k series) 62.4 μs 61.6 μs -1.3%
Constant Series OOB Clamp batch 43.7 μs 44.5 μs +1.8%
Constant Series scalar in-domain 124 ns 123 ns
Linear/Cubic Series scalar OOB Clamp 102 ns 102 ns
Constant ND 2D persist value (Nq=10k) 107.6 μs 107.6 μs

Deriv path (now equal to value path on branch):

Scenario Branch deriv Branch value Master deriv
Constant Series scalar in-domain 125 ns 123 ns 126 ns
Constant Series scalar OOB Clamp 103 ns 100 ns 88 ns
Constant ND 2D persist (Nq=1k) 11.04 μs 10.67 μs 81 ns
Constant ND 2D persist (Nq=10k) 111.0 μs 107.6 μs 680 ns

The two highlighted ND rows are the only material slowdown. Master's fill!(out, zero(Tv)) shortcut was effectively a memset that bypassed all computation — fast precisely because it dropped carrier + NaN. The branch makes deriv pay the same cost as value (cell-local kernel + * 0); the cost is bounded by the value path, which is the structural ceiling.

Series deriv paths slow ~14-17% (broadcast 0 * first(y) → per-k cell-local 0 * y[idx, k] * one(aq.xq)) — same rationale; SIMD inner loop still hoists one(aq.xq) outside the loop body.

_extrap_oob_data dispatch is @inline with singleton dispatch — LLVM specializes per concrete extrap type and dead-branch-eliminates. Zero overhead on OOB cold paths; in-domain paths never reach the helper.

mgyoo86 added 26 commits May 21, 2026 14:14
Adds `test/test_carrier_type_stability.jl` with four `@testitem` blocks
that combine `@inferred` and `isa T` into single assertions, locking
both type-stability and carrier-shape in one place.

  - Cat A: `@inferred` across deriv-aware paths (mostly broken pin
    for 1D scalar Float-drop; Phase 2 will close).
  - Cat B: Dual carrier preservation through sub-zero deriv paths
    (broken pin — 1D scalar Float-drop, Phase 2).
  - Cat C: NaN propagation through Dual partials on ND deriv-zero
    short-circuits — currently RED for in-place + persist-allocating
    paths (Constant ND).
  - Cat D: scalar / allocating / in-place / persistent path
    cross-equivalence (currently RED — bleeds from Cat C).

Created as the RED phase for the carrier-aware deriv-zero helper
refactor; the next commit drives Cat C/D to GREEN. Cat A/B broken
pins are tracked for Phase 2.
Three ND Constant deriv-zero short-circuits previously diverged from the
carrier-propagation contract the scalar / one-shot allocating paths
already honor:

  - `interpolant_protocol.jl:249` filled with `zero(eltype(output))`,
    dropping both the data's NaN/Inf and the query's carrier shape.
  - `constant_nd_oneshot.jl:237` filled with `0 * first(data)`,
    preserving NaN value but losing the carrier-shape lift (Dual
    partials collapsed to 0).
  - `interpolant_protocol.jl:186` was hand-coded with the right pattern
    but duplicated across call sites.

Adds `_deriv_zero_value(::Type{T_out}, sample_data, sample_query)` in
`src/core/utils.jl` with `Real` / `Tuple` dispatch:

    zero(T_out) + 0 * sample_data * <prod-folded-q>

Three layers split the work:

  - `zero(T_out)` pins the kernel's promotion lattice. `T_out` comes
    from `_output_eltype(itp, Tq)` (or the method-trait route for
    one-shot, no-`itp` callers), so it carries `Tg`-aware widening
    that the multiplicative body alone cannot reach. For the Constant
    lattice (`Tv * Tq`) this term is identity and LLVM removes it.
  - `sample_data` carries `Tv` and any NaN/Inf pattern; the leading
    `0 *` annihilates value while IEEE keeps the NaN bits live.
  - `sample_query` (the same scalar/tuple the caller already holds
    from `_extract_query_point`) supplies `Tq` and threads NaN into
    every hidden carrier slot (Dual partials, Measurement uncertainty,
    …) via the product rule.

All three deriv-zero short-circuits route through the new helper.
Closes 5 NaN-partials failures + 3 cross-path equivalence failures
in `test/test_carrier_type_stability.jl` (Cat C + Cat D, ND Constant).

The arithmetic-lattice methods (Linear/Cubic/Quadratic/Hermite) will
inherit the helper without further changes when Phase 2 audits each
method's own one-shot in-place branch.
The `_zero_ref` name was misleading — the trait does not return a "zero
value" but rather a representative `Tv` element used as a sample in the
deriv-zero short-circuit (`first(itp.data)` for Constant/Linear,
`first(itp.nodal_derivs.partials)` for Cubic/Quadratic, similar for
Hetero). Callers always multiply this by `0`, so the actual value is
irrelevant — only its type and any NaN/Inf pattern matter.

`_value_sample` better captures the trait's role: "a sample of the
value type Tv". Pure rename across 5 definitions + 11 use sites in
7 files; no behavior change.
Aligns the trait name with the helper's argument name (`sample_data`
in `_deriv_zero_value(::Type{T_out}, sample_data, sample_query)`),
making call sites read `_deriv_zero_value(T_out, _sample_data(itp),
first_q)` — same token on both sides of the assignment. Pure rename;
no behavior change.
…y` → `sampled_data` / `sampled_query`

Disambiguates the helper's parameter names from the `_sample_data(itp)`
trait introduced in b50f56e. Past-participle form ("already sampled")
distinguishes the *result* (helper argument) from the *extraction
function* (trait), reducing visual noise at call sites.

Pure rename; no behavior change.
…e(α)`

The EvalDeriv1/2/3 and generic DerivOp{N} branches of `_linear_kernel`
computed slope/zero results that ignored the query parameter `α`,
dropping Dual carrier on the 1D scalar path. Multiplying by `one(α)`
threads the query carrier (Dual partials, Measurement uncertainty, …)
through the kernel output; for plain `Real` `α` LLVM const-folds the
`1.0` factor away, so perf is unchanged.

Promotes 5 broken pins in `test_carrier_type_stability.jl`:
  - Cat A 1D oneshot/persist scalar @inferred linear deriv=2
  - Cat B 1D oneshot scalar Linear DerivOp(1) Dual return
  - Cat B 1D persist scalar Linear DerivOp(1) Dual return
  - Cat B 1D oneshot scalar Linear DerivOp(2) Dual return

Perf-neutral (within ±2% noise across 1D scalar / batch / 2D / persistent
paths, zero allocations preserved):
  1D scalar       baseline 5-7 ns   →  post-fix 5-7 ns
  1D batch alloc  baseline 42-57 ns →  post-fix 41-58 ns
  2D scalar       baseline 14 ns    →  post-fix 14 ns

Cubic/Quadratic/PCHIP/Cardinal/Akima still pending (same kernel pattern,
to be applied in follow-up commits).
…(dL)`

EvalDeriv3 and the generic DerivOp{N≥4} branches of `_cubic_kernel`
ignored the query offset `dL`/`dR` (both underscored), dropping Dual
carrier on the 1D scalar path. Naming `dL` in the signature and
multiplying the result by `one(dL)` threads the query carrier through;
EvalValue/EvalDeriv1/EvalDeriv2 already use `dL`/`dR` in their formulas
so they were carrier-aware by construction.

Promotes 5 broken pins in `test_carrier_type_stability.jl`:
  - Cat A 1D oneshot/persist scalar @inferred cubic deriv=4
  - Cat B 1D oneshot scalar Cubic DerivOp(3) Dual return
  - Cat B 1D persist scalar Cubic DerivOp(3) Dual return
  - Cat B 1D oneshot scalar Cubic DerivOp(4) Dual return

Perf-neutral (1D scalar 74-83 ns oneshot, 4-7 ns persistent, 0 allocs).
PCHIP/Cardinal/Akima/Hermite still pending — same pattern.
… one(dL)`

EvalDeriv2/3 and the generic DerivOp{N≥3} branches of `_quadratic_kernel`
underscored the `dL` argument and returned values that ignored the query,
dropping Dual carrier on the 1D scalar path. Naming `dL` and multiplying
by `one(dL)` threads the query carrier through. EvalValue/EvalDeriv1
already use `dL` in their formulas so they were carrier-aware.

Promotes 1 broken pin in `test_carrier_type_stability.jl`:
  - Cat B 1D oneshot scalar Quadratic DerivOp(2) Dual return

PCHIP/Cardinal/Akima/Hermite still pending.
…ne(dL)`

`_hermite_kernel_1d` is shared by all Hermite-family methods (PCHIP /
Cardinal / Akima / Hermite). EvalDeriv3 ignored `dL` (third derivative
is constant) and DerivOp{N≥4} had it underscored, dropping Dual carrier
on the 1D scalar path. EvalValue/EvalDeriv1/EvalDeriv2 already thread
`dL` through `t = dL * inv_h` so they were carrier-aware.

Promotes 5 broken pins in `test_carrier_type_stability.jl`:
  - Cat A 1D oneshot scalar @inferred PCHIP/Cardinal/Akima deriv=4
  - Cat B 1D oneshot scalar PCHIP DerivOp(3) Dual return
  - Cat B 1D oneshot scalar PCHIP DerivOp(4) Dual return

Single kernel fix → all 4 Hermite-family methods carrier-aware on the
1D scalar deriv path. Regression sweep (test_pchip / test_cardinal /
test_akima / test_hermite) all green; perf neutral.
…circuits

Replace the Phase 1 protocol-level deriv-zero short-circuits (which used
`first(data)` as a global sample, conflating NaN policy across the whole
array) with a cell-local kernel-native mechanism. The kernel's own
`zero(α)` / per-corner weight × 0 flow already produces a carrier-aware
zero from the cell-local data corners — protocol short-circuits were both
unnecessary and overly conservative.

Constant ND (kernel only handled EvalValue):
  • Add multi-dispatch wrapper `_constant_nd_evaluate`:
      ::NTuple{N, EvalValue}     → forward to `_constant_nd_kernel`
      ::NTuple{N, AbstractEvalOp} → `_constant_nd_kernel(...) * 0`
    Cell-local kernel result carries NaN/Inf via IEEE `NaN * 0 = NaN`;
    Tq carrier rides on the kernel's per-axis `* one(dL_d)`.
  • Thread `ops::NTuple{N, AbstractEvalOp}` through the oneshot stack
    (`_constant_interp_nd_oneshot{,_batch,_batch!}`, `_constant_nd_batch_dispatch{!,}`).

Linear ND (kernel already handled deriv ≥ 2 via `_linear_weight(::EvalDeriv2+, …) = zero(α)`):
  • Drop the `_eval_at_cell` short-circuit that returned `0 * first(itp.data)` and
    dropped non-axis-1 Tq carrier on heterogeneous queries.
  • Kernel's existing per-corner `data[corner] * weights_product` (with one factor
    being `zero(α)`) produces the cell-local carrier-aware zero naturally.

Removed dead helpers / traits:
  • `_deriv_zero_value`, `_deriv_zero_fill` + Constant/Linear/Hetero specializations
  • `_is_any_deriv` (Constant ND oneshot)
  • Protocol-level early returns in `_eval_nd_at_point` and the ND in-place
    batch loop in `interpolant_protocol.jl`

Tests (`test/test_carrier_type_stability.jl`):
  • Cat A (@inferred type stability) + Cat B (sub-zero deriv carrier) unchanged.
  • Cat C/D restructured around *cell-local* NaN: a NaN at a cell corner the
    query reads must propagate; a NaN outside the query's cell must NOT.
  • Scope note: cell-local applies to kernel-local methods (Constant, Linear,
    Hermite-family). Cubic/Quadratic perform a global tridiagonal solve at
    build, so a single NaN oxidizes every coefficient — by-design global,
    not covered by this contract.
…er case)

Mirror the `AbstractArray` overload's NaN-propagating pattern on the
`Number` overload — Array case already does `0 .* val .+ …`, but Number
case did `zero(xq) * zero(val)`, stripping `val`'s NaN (cell-local
boundary data) at every OOB ClampExtrap/FillExtrap deriv query.

Path: `_eval_extrapolation(::DerivOp, y_bnd, ::ClampExtrap, xq)` /
`(::FillExtrap, xq)` → `_promote_extrap_zero(y_bnd, xq)`. `y_bnd` is the
cell-local boundary value (`first(y)` for OOB_LEFT, `last(y)` for
OOB_RIGHT) — its NaN carrier must propagate to match the in-domain
kernel's `0 * y_left * one(dL)` behavior and the Array overload's
existing `0 .* val .+ …` form. Affects all 7 methods (Linear / Constant /
Cubic / Quadratic / Pchip / Cardinal / Akima) uniformly via the shared
helper.

Tests (`test/test_carrier_type_stability.jl` — new Cat E):
- Helper-level Number vs Array consistency
- 1D OOB ClampExtrap × deriv × 5 cell-local methods × {OOB_LEFT, OOB_RIGHT}
- 1D OOB FillExtrap × deriv × 5 cell-local methods × {OOB_LEFT, OOB_RIGHT}
- Cell-local invariant: NaN at the OTHER boundary stays hidden
- Carrier preservation: Dual xq + NaN val → Dual{NaN, 0} (matches `_promote_extrap_val`)

22 RED tests, then GREEN after the 1-line fix. Full regression sweep
clean (10458 lines, 96% coverage, all tests passed).
Cat A pins `@inferred` on in-domain × deriv (NoExtrap default) paths;
Cat E exercises a new path — OOB × ClampExtrap/FillExtrap × deriv
through `_eval_extrapolation` → `_promote_extrap_zero` — which Cat A
doesn't cover. Add `@inferred` at the helper level for both overloads
(Number, Array, Dual carrier) so the recent fix's type stability is
locked against future regressions.
…l paths

Phase 3 + Bundle A established the cell-local NaN policy at the ND
kernel and OOB extrap helper layers; this commit closes the remaining
Constant-family gaps where deriv-zero branches used `0 * first(y)`
(non-cell-local) while their EvalValue siblings already used cell-local
indices like `last(y)` / `y[aq.idxR]` / `y[n_pts, k]` / `y_point[k, idx]`.

Sites (4 files, 11 lines):
- `constant_oneshot.jl` (2): InBounds + WrapExtrap right-edge seam.
  `0 * first(y) * one(xi)` → `0 * last(y) * one(xi)` — cell-local right
  endpoint, mirrors EvalValue branch.
- `constant_anchor.jl` (3): NoExtrap, AbstractExtrap default, ClampOrFill
  in-domain at `aq.xq == x_last`. `0 * first(y) * one(aq.xq)` →
  `@inbounds 0 * y[aq.idxR] * one(aq.xq)` — symmetric with EvalValue.
- `constant_series_interp.jl` (3): scalar series right-boundary
  (`_eval_constant_series_point!`) + anchored right-boundary
  (`_eval_constant_series_with_extrap`) + anchored in-domain deriv
  (`_eval_constant_series_anchored`). Per-series loops use
  `y_point[k, n_pts]` / `y[n_pts, k]` / `y[aq.idxL, k]` instead of
  `first(y)` — keeps per-series NaN cell-local.
- `series_utils.jl` (4): `_constant_extrap_boundary_value` (layout
  `[time, series]`) + `_fill_constant_extrap_simd!` (layout
  `[series, time]`, named `y_point`). Both deriv overloads now consume
  the previously-unused `side` / `n_pts` / `k` args via
  `_boundary_point_index` for per-series cell-local zero.

Tests (`test_carrier_type_stability.jl` Cat F, 30 assertions):
- 1D oneshot scalar — right-edge × {NoExtrap, ClampExtrap, WrapExtrap}
- 1D persistent scalar — right-edge × {NoExtrap, ClampExtrap, FillExtrap}
- Series batch in-domain — per-series cell-local at NaN-bearing cell
- Series scalar right-boundary — per-series cell-local at right edge
- Series scalar/batch OOB ClampExtrap — per-series cell-local via fills
- Cell-local invariant: NaN at the OTHER side stays hidden (13 asserts)

17 RED before fix → 30 GREEN after. Targeted regression (test_carrier_
type_stability + 7 Constant/Series/extrap test files) all pass.
NoInterp axes are structurally lookup-only (no math, like Constant).
When a non-zero DerivOp targets a NoInterp axis, the existing short-
circuits returned `zero(Tz)` of the promoted type — dropping any cell-
local NaN in the queried data slice. Mirror the Constant Phase 3 pattern
by multiplying the cell-local data by `zero(Tz)`: the sliced data carries
NaN, `zero(Tz)` enforces type promotion; together they yield a type-
correct, NaN-aware deriv-zero.

Sites fixed (2 of 3):
- `_eval_nointerp` line ~282 (all-NoInterp persistent, `N_r == 0`):
  hoist `data_expr` (covers both `_HeteroPartials` and raw `Array`
  storage), wrap deriv guard with `$data_expr * zero($Tz_expr)`.
- `_interp_nointerp_oneshot` line ~475 (all-GridIdx oneshot,
  `grids_r === ()`): `data_r[] * zero(_output_eltype(...))`.

Scope exception: `_interp_nointerp_oneshot` line ~489 (mixed Real +
NoInterp axes with deriv on NoInterp) still returns plain `zero(Tz)` —
strict cell-local would require running the Real-axis interp first
(potential doubled work for global-solve methods); deferred as a
separate follow-up.

Tests (`test_carrier_type_stability.jl` Cat G, 4 assertions):
- All-NoInterp persistent — cell-local NaN propagates + out-of-cell stays hidden
- All-GridIdx oneshot — cell-local NaN propagates + out-of-cell stays hidden

2 RED before fix → 4 GREEN after. Targeted regression (test_carrier_
type_stability + test_nointerp) all pass.
…ttern for perf

Bundle B+C's per-series cell-local NaN propagation in Constant Series
deriv-zero paths is reverted to master's broadcast `0 * first(y)` pattern.
The cell-local indexed loads add ~1 ns per call on the batch series eval
inner loop (called K × Q times — 50k iterations on a 1000-series × 50-
query batch), measuring **+49% per-call cost** (21.6 → 32.2 μs) on the
benchmark scenario.

Reverted sites (4):
- `series_utils.jl`     — `_constant_extrap_boundary_value` (2 overloads)
- `constant_series_interp.jl` — `_eval_constant_series_point!`
  right-boundary deriv (scalar in-domain edge)
- `constant_series_interp.jl` — `_eval_constant_series_with_extrap`
  `aq.xq == x_max` deriv branch (anchor-clipped OOB) ← actual perf culprit
- `constant_series_interp.jl` — `_eval_constant_series_anchored`
  in-domain deriv branch
- `series_utils.jl`     — `_fill_constant_extrap_simd!` (already reverted
  for SIMD broadcast — kept consistent with renamed `y_point` arg)

The Hetero / 1D right-edge / `_promote_extrap_zero` / NoInterp cell-local
changes (Bundles A, B 1D, D) are KEPT — those sites have ≤ noise perf
impact (verified via master worktree benchmark, all < 5 ns/call).

Tests (`test_carrier_type_stability.jl` Cat F):
- 4 series cell-local assertions marked `@test_broken` with explanatory
  comments — future fix path noted (NaN-presence detection at construction).
- Out-of-cell invariant assertions kept as `@test` (still hold under
  broadcast since `first(y)` is one fixed cell).

Perf verification: master 21.6 μs → branch (post-revert) 21.6 μs on the
canonical bench (1000-series × 50-query Series batch OOB ClampExtrap
deriv). Other paths unchanged at noise level.
Comments + test coverage cleanup before merging.

Comments:
- `linear_nd_eval.jl` deriv-≥2 weight: correct stale description (Phase 3
  removed the protocol short-circuit; kernel handles it cell-locally).
- `interpolant_protocol.jl`: replace dead `_select_output_eltype` reference
  with `_arithmetic_kernel_shape` / `_constant_kernel_shape`.
- Series perf-trade-off comments (4 sites): trim to one-line "why broadcast".
- Cat F header + `@test_broken` testset comments: drop dev-history phrasing.

Test coverage:
- Cat A: add `quadratic_interp` `@inferred` for oneshot scalar + persistent
  scalar (DerivOp(3) — beyond polynomial degree).
- Cat E: extend OOB ClampExtrap/FillExtrap × deriv NaN loop to include
  `cubic_interp` and `quadratic_interp` (both route through
  `_promote_extrap_zero` at OOB, so the Bundle A fix applies).
- Cat D: add Linear ND scalar ↔ in-place batch cross-path check
  (verifies kernel-native `_linear_weight(::EvalDeriv2+) = zero(α)`).
…anchored`

"Derivatives of constant (step) function are zero" restates what the
`0 * first(y)` expression already shows. The perf rationale comment is
covered at `_constant_extrap_boundary_value` (the first occurrence).
Wrap plain `@test EXPR isa T` checks in duck-typing tests with `@inferred`
so they pin both result type AND inference stability — same pattern as
Cat A in `test_carrier_type_stability.jl`.

- `test_duck_tv_dual_tq.jl`: 84 conversions (SVector/Dual Tv/Tq carriers).
- `test_duck_typing_comprehensive.jl`: 66 conversions (DuckFloat-style Tv).

Two test groups in `test_duck_tv_dual_tq.jl` kept as plain `isa`:
- ND `ForwardDiff.gradient/hessian/derivative` (lines 306-317): AD wrappers
  have `Any` internal inference; `@inferred` is over-strict for these.
- "Dual x grid carrier flows through Tv × Tq" (lines 266-275): some
  `(fn, y, xq)` combos infer `Union{Float64, Dual}` even though the
  runtime returns the right type. Latent type instability surfaces here.
Constant 1D right-edge short-circuit (`xi == last(x)`) returned
`y[idxR] * one(xi)` — only the query carrier `Tq`. The kernel path picks
up the grid carrier `Tg` via `aq.dL` / `dL = xq - xL`. For Dual grid +
Float query, the two paths returned `Float` and `Dual` respectively →
inference `Union{Tv, Dual}` even though runtime returns the right type.

Add `* one(eltype(x))` (oneshot) / `* one(x_last)` (anchor) to thread Tg
through the short-circuit. For Float grid this is a no-op (multiply by
1.0); for Dual grid it lifts to Dual. Now both branches return the
same type, inference is clean.

Sites (5):
- `constant_anchor.jl` 3 short-circuits (NoExtrap, AbstractExtrap default,
  ClampOrFill in-domain).
- `constant_oneshot.jl` InBounds + WrapExtrap.

Test: re-promote `test_duck_tv_dual_tq.jl` "Dual x grid carrier flows
through Tv × Tq" testset to `@test (@inferred …) isa T` (was reverted to
plain `isa` in the previous commit; this latent inference bug was the
root cause).
…l_value

`_constant_interp_nd_oneshot{,_batch!}` passed a hardcoded `EvalValue()`
to `_try_fill_oob`, so OOB queries under `FillExtrap` always returned
the fill_value — even for deriv queries that should return zero.

Phase 3 (commit 364c3e4) removed a protocol-level `_is_any_deriv`
short-circuit at the public entry, which previously intercepted deriv
queries before they reached `_try_fill_oob`. The hardcoded `EvalValue()`
was correct under that short-circuit but became a regression once removed.

Persistent path (`interpolant_protocol.jl:131,173`) correctly passes
`ops` — only the oneshot path was missing the threading. Fix: pass `ops`
through `_try_fill_oob` at both scalar and batch oneshot sites.

Test (`test_carrier_type_stability.jl` Cat E): 4 assertions covering
scalar (mixed-axis deriv) + batch oneshot + persistent sanity check.
`_linear_weight(::EvalDeriv1, α, inv_h, ::Val{B}) = ±inv_h` returned `Tg`
only — for a `Dual` query whose carrier source was an `EvalDeriv1` axis
(mixed partial `(D1, D1)`, or `(EvalValue, D1)` with axis-2 Dual), the
multilinear corner sum dropped Tq and the batch path failed with
`TypeError` (buffer eltype Dual vs kernel return Float).

Fix mirrors the `EvalDeriv2+` pattern: `* one(α)` threads Tq even when
`α` does not appear in the weight expression. LLVM const-folds the `1.0`
factor on plain Float queries (zero cost). For Dual queries it costs one
extra Dual mul per corner — necessary for AD correctness.

Pre-existing on master; this PR's carrier-propagation contract surfaces
the gap. Test coverage extended in Cat B with an ND counterpart of the
existing 1D "non-zero deriv" testset, including mixed-partial
`(D1, D1)` on Linear / Cubic ND.
…)` + per-k cell-local OOB helpers

Series deriv branches used broadcast `0 * first(y)` (single-element load,
SIMD-friendly) which dropped both cell-local NaN and Tq carrier — only
NaN at `y[1]` could leak, regardless of which series cell was queried,
and `Dual` queries returned the underlying `Tv` without carrier.

This commit routes the deriv path through the value path's exact
mechanism on every Series eval site:

  * `_eval_constant_series_anchored` and the scalar in-domain right-edge
    branch in `_eval_constant_series_point!` collapse to a single
    `_constant_kernel(op, y_left, y_right, h, dL, side)` call — value
    and deriv share one code path. The 1D `_constant_kernel(::EvalDeriv*)`
    overloads already follow the unified `0 * y * one(dL)` pattern.

  * Shared OOB helpers `_constant_extrap_boundary_value` and
    `_fill_constant_extrap_simd!` (in `core/series_utils.jl`, used by
    Linear/Cubic/Quadratic/Constant Series alike) take an `aq` argument
    and thread `* one(aq.xq)` for Tq carrier. `aq.xq` is the only common
    field across the four anchored-query types, so dispatch stays
    duck-typed. Clamp/Fill deriv overloads merged onto `::_ClampOrFill`
    and source their zero from boundary `y[idx, k]` (not `e.fill_value`)
    so a NaN-bearing `fill_value` does NOT leak through deriv while a
    NaN at the boundary cell still does — preserves the existing OOB
    FillExtrap × deriv contract.

Performance: per-k cell-local indexing inside the SIMD loop replaces a
single broadcast value, but `one(aq.xq)` hoists out of the inner loop
so the body is one load + one mul per `k`. Series scalar deriv paths
are ~14–17% slower (102 ns vs 88 ns); batch deriv is bounded by the
value-path cost (now ≈ value).

Cat F's four `@test_broken` assertions (Series in-domain / right-edge /
OOB ClampExtrap × deriv) flip to `@test`. Direct-call tests in
`test/test_series_utils.jl` updated for the new signature; bonus
regression pin for FillExtrap(NaN) × deriv = 0.
… pattern

When a `HeteroInterpolantND` query has `Real` axes + at least one
`NoInterp` axis (mixed case) AND a non-zero deriv on a NoInterp axis,
the `@generated _eval_nointerp` body short-circuited with
`return zero($Tz_expr)` — a type-only zero that strips both cell-local
NaN at the queried slice and any Tq carrier from the Real-axis search.

Same pattern as Constant ND's `_constant_nd_kernel(...) * 0` dispatch:
remove the early-return, run the Real-axis kernel
(`_eval_hetero_nd_cell` / `_collapse_dims`) normally, and wrap the
result with `result * 0` when any NoInterp axis carries a non-zero
deriv. `result` is already carrier-threaded by the Real-axis kernel
(IEEE: `0 * Dual = Dual(0, 0)`, `0 * NaN = NaN`), so the late `* 0`
preserves Tq + Tg + Tv while zeroing the value.

Applied to three branches of the `@generated` body
(`_HeteroPartials` / OnTheFly local / OnTheFly global) and to the
`_interp_nointerp_oneshot` mixed case. The runtime cost is one extra
multiplication on the deriv path — Real-axis search would have been
required anyway for a strict cell-local contract, so the saving from
the prior early-return was conceptually unavailable.

Also drops a stale "Constant Phase 3" label in the all-NoInterp
comment (the pattern it references is now the regular
`_constant_nd_evaluate` dispatch).

Cat G extended with Mixed Linear+NoInterp persistent + oneshot
cell-local NaN tests.
Sweeps the small fixes surfaced by the pre-merge code-review pass:

* Constant `NoExtrap` anchored-query right-edge bug
  (`_constant_anchor_dispatch(NoExtrap)` at `aq.xq == last(itp.x)`):
  the wrapper used `zero(T) * one(aq.xq)`, dropping Tg carrier and
  stripping NaN at the boundary cell. Now delegates to the shared
  `_constant_eval_at_anchor(::NoExtrap)` path which mirrors the cell-
  local pattern used by the other three sibling overloads. Pinned by a
  new "1D anchored-query — right-edge cell-local" subtest in Cat F.

* `constant_oneshot.jl`: scalar `_constant_eval_at_point` right-edge
  short-circuits switched from `one(xi) * one(eltype(x))` to
  `one(Tq) * one(Tg)` — the function signature already binds both type
  parameters, so use them directly. Same shape semantically, more
  explicit at the call site.

* Stale `_sample_data` trait comments (Linear / Cubic / Quadratic ND
  eval): "Zero-ref for fill-value derivative computation" → "Per-method
  sample of `Tv` for fill-value paths (e.g. `_try_fill_oob`)" to match
  the wording landed for Constant ND in the trait rename series.

* Stale comment in `constant_anchor.jl` describing the right-edge
  short-circuit pattern: now mentions both Tq and Tg threading via
  `one(aq.xq) * one(x_last)`.

* `_promote_extrap_zero` docstring (`core/utils.jl`): broadened from
  "derivative of constant" to the actual current contract
  (carrier-aware zero under flat extrapolation, all method families).
  Explains why the `0 * val` prefix is structurally required for NaN
  propagation.

* Phase label cleanup in `constant_series_interp.jl` ("Phase E.2:") and
  in `test_carrier_type_stability.jl` Cat F / Cat G headers (stray
  "Phase 3" / "Constant Phase 3" references). Removes plan-era labels
  that age poorly once the plan lands.

* `constant_oneshot.jl` scalar entry docstring: lists `EvalValue()` +
  any `DerivOp(n)` (was capped at `DerivOp(2)`). Public API supports
  arbitrary N already.

* `constant_nd_oneshot.jl` batch docstring: missing `ops` arg added to
  both `_constant_interp_nd_oneshot_batch!` / `..._batch` signatures.

* `_constant_nd_evaluate` (`constant_nd_eval.jl`): rationale comment
  added explaining why this path intentionally runs the kernel + `* 0`
  rather than falling back to a `fill!` shortcut — guards future
  reverts that might be tempted by the perf delta vs master.
Replaces the hand-rolled Linear/Cubic-only Cat B ND subtests with a
loop over all six ND methods (Linear D1 / Cubic D3 / Quadratic D2 /
PCHIP D3 / Cardinal D3 / Akima D3) and three path families (oneshot
scalar / oneshot batch / persistent scalar+batch), each covering both
the per-axis `(EvalValue, Dk)` pattern and the mixed-partial
`(DerivOp(1), DerivOp(1))` pattern.

Helpers `_check_oneshot_scalar` / `_check_oneshot_batch` /
`_check_persistent` take `fn::F` so `@inferred` can specialize per
concrete method; the for-loop body would otherwise see `fn` as a
Union and collapse the inference check.

Coverage matrix after this commit (Cat B ND, all `@inferred + isa`):

  | method     | oneshot scalar | oneshot batch | persistent scalar+batch |
  |------------|:--------------:|:-------------:|:-----------------------:|
  | Linear     |    ✓ × 2       |    ✓ × 2      |        ✓ × 4            |
  | Cubic      |    ✓ × 2       |    ✓ × 2      |        ✓ × 4            |
  | Quadratic  |    ✓ × 2       |    ✓ × 2      |        ✓ × 4            |
  | PCHIP      |    ✓ × 2       |    ✓ × 2      |        ✓ × 4            |
  | Cardinal   |    ✓ × 2       |    ✓ × 2      |        ✓ × 4            |
  | Akima      |    ✓ × 2       |    ✓ × 2      |        ✓ × 4            |

Cat B grows from 14 → 57 assertions. The batch path is the most
load-bearing: Linear's prior `_linear_weight(::EvalDeriv1)` Tq gap
showed up there first as `TypeError` (buffer eltype `Dual` vs kernel
return `Float`), which scalar-only coverage missed.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR threads the query carrier type (e.g. ForwardDiff.Dual, Measurement) through every derivative-zero return path across all interpolation method families, and unifies the cell-local NaN propagation contract. It closes carrier-and-NaN gaps that survived #146: previously, deriv paths in Constant ND, Series, and Hetero NoInterp would short-circuit via fill!, 0 * first(y), or zero(Tz), dropping both Tq carrier and any NaN at the queried cell.

Changes:

  • Unifies "deriv-zero" handling so every kernel runs and is multiplied by 0 after the carrier is threaded (preserving IEEE NaN × 0 = NaN, Dual partials, etc.) — touches Constant/Linear/Quadratic/Cubic/Hermite kernels and ND/Series/Anchor dispatch.
  • Replaces _zero_ref with _sample_data and removes the _deriv_zero_fill trait shortcut; series helpers (_constant_extrap_boundary_value, _fill_constant_extrap_simd!) now take aq and source the zero from y[idx, k] (not fill_value), so OOB FillExtrap × deriv returns 0 rather than leaking fill_value.
  • Adds extensive type-stability (@inferred) and cell-local NaN propagation tests (Cat A–G) across every method × dimensionality × path combination.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/core/interpolant_protocol.jl Removes _deriv_zero_fill shortcut; renames _zero_ref_sample_data.
src/core/series_utils.jl Adds aq parameter; deriv path now sources zero from cell-local y[idx, k].
src/core/utils.jl Fixes _promote_extrap_zero(::Number, ::Number) to propagate boundary NaN.
src/vector_calculus.jl Switches _zero_ref_sample_data at fill-OOB sites.
src/constant/constant_oneshot.jl Right-edge short-circuits thread one(Tq) * one(Tg); updates docstring.
src/constant/constant_anchor.jl Right-edge & NoExtrap short-circuits use cell-local y[aq.idxR] and thread Tg/Tq carriers.
src/constant/constant_series_interp.jl Series right-boundary now routes through _constant_kernel for op-uniform carrier/NaN.
src/constant/nd/constant_nd_eval.jl New _constant_nd_evaluate two-method dispatch (EvalValue vs deriv via * 0).
src/constant/nd/constant_nd_oneshot.jl Threads ops through oneshot/batch dispatch; removes pre-call deriv short-circuits.
src/linear/linear_kernels.jl Deriv kernels thread * one(α) carrier.
src/linear/linear_series_interp.jl Adds aq param to series fill helpers.
src/linear/nd/linear_nd_eval.jl Removes _deriv_zero_fill trait; _linear_weight(::EvalDeriv1) now * one(α) (fixes pre-existing TypeError on (D1,D1) Dual).
src/cubic/cubic_kernels.jl EvalDeriv3/N kernels thread * one(dL).
src/cubic/cubic_series_interp.jl Adds aq param to series fill helpers.
src/cubic/nd/cubic_nd_eval.jl Renames _zero_ref_sample_data.
src/cubic/nd/cubic_nd_math.jl EvalDeriv3 + DerivOp{N} hermite kernels thread * one(dL).
src/quadratic/quadratic_kernels.jl EvalDeriv2/3/N kernels thread * one(dL).
src/quadratic/quadratic_series_interp.jl Adds aq param to series fill helper.
src/quadratic/nd/quadratic_nd_eval.jl Renames _zero_ref_sample_data.
src/hetero/hetero_eval.jl Renames _zero_ref_sample_data; removes _deriv_zero_fill override.
src/hetero/hetero_nointerp.jl All-NoInterp / mixed paths multiply cell-local data * zero(Tz) instead of returning bare zero(Tz).
test/test_series_utils.jl Updates calls for new aq arg; adds FillExtrap × NaN tests.
test/test_duck_typing_comprehensive.jl Wraps existing isa checks with @inferred.
test/test_duck_tv_dual_tq.jl Wraps existing isa checks with @inferred; notes AD wrappers exception.
test/test_carrier_type_stability.jl New comprehensive test file covering Cat A–G (type stability, carrier propagation, cell-local NaN).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 98.27586% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.46%. Comparing base (616b106) to head (15de65e).

Files with missing lines Patch % Lines
src/core/nd_utils.jl 66.66% 1 Missing ⚠️
src/core/series_utils.jl 90.90% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #147      +/-   ##
==========================================
+ Coverage   96.43%   96.46%   +0.03%     
==========================================
  Files         143      143              
  Lines       11978    11940      -38     
==========================================
- Hits        11551    11518      -33     
+ Misses        427      422       -5     
Files with missing lines Coverage Δ
src/constant/constant_anchor.jl 98.55% <100.00%> (-0.05%) ⬇️
src/constant/constant_oneshot.jl 100.00% <100.00%> (ø)
src/constant/constant_oneshot_series.jl 100.00% <100.00%> (ø)
src/constant/constant_series_interp.jl 98.10% <100.00%> (+0.52%) ⬆️
src/constant/nd/constant_nd_eval.jl 100.00% <100.00%> (ø)
src/constant/nd/constant_nd_oneshot.jl 100.00% <100.00%> (+1.29%) ⬆️
src/core/interpolant_protocol.jl 100.00% <100.00%> (ø)
src/core/utils.jl 88.13% <100.00%> (+0.84%) ⬆️
src/cubic/cubic_kernels.jl 100.00% <100.00%> (ø)
src/cubic/cubic_oneshot_series.jl 100.00% <100.00%> (ø)
... and 16 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 27, 2026

FastInterpolations.jl Benchmarks

All benchmarks (50 total, click to expand)
Benchmark Current: 15de65e Previous Imm. Ratio Grad. Ratio
10_nd_construct/bicubic_2d 36989.0 ns 38859.0 ns 0.952 0.96
10_nd_construct/bilinear_2d 598.1 ns 580.7 ns 1.03 0.946
10_nd_construct/tricubic_3d 353489.0 ns 320133.0 ns 1.104 1.0
10_nd_construct/trilinear_3d 1706.2 ns 1607.8 ns 1.061 0.955
11_nd_eval/bicubic_2d_batch 1429.6 ns 1300.0 ns 1.1 0.904
11_nd_eval/bicubic_2d_scalar 16.3 ns 12.8 ns 1.274 1.0
11_nd_eval/bilinear_2d_scalar 7.5 ns 6.6 ns 1.138 1.019
11_nd_eval/tricubic_3d_batch 3230.0 ns 2489.8 ns 1.297 0.989
11_nd_eval/tricubic_3d_scalar 33.5 ns 24.8 ns 1.348 0.979
11_nd_eval/trilinear_3d_scalar 13.2 ns 10.0 ns 1.321 0.975
12_cubic_eval_gridquery/range_random 4225.9 ns 3638.9 ns 1.161 0.979
12_cubic_eval_gridquery/range_sorted 4219.5 ns 3613.4 ns 1.168 0.98
12_cubic_eval_gridquery/vec_random 9311.3 ns 7787.9 ns 1.196 0.985
12_cubic_eval_gridquery/vec_sorted 3201.2 ns 2620.3 ns 1.222 0.997
13_nd_oneshot_gridquery/bicubic_2d_rand_rand 65569.2 ns 58729.5 ns 1.116 0.976
13_nd_oneshot_gridquery/bicubic_2d_sort_rand 62263.0 ns 55491.7 ns 1.122 0.982
13_nd_oneshot_gridquery/bicubic_2d_sort_sort 58212.5 ns 53708.0 ns 1.084 0.98
13_nd_oneshot_gridquery/bilinear_2d_rand_rand 20179.6 ns 13537.0 ns 1.491 1.155
13_nd_oneshot_gridquery/bilinear_2d_sort_rand 9365.2 ns 7454.0 ns 1.256 0.919
13_nd_oneshot_gridquery/bilinear_2d_sort_sort 5684.2 ns 3851.0 ns 1.476 0.998
14_series_oneshot_batch/constant_inplace_vec_k8_q1000_rand 18714.9 ns 15082.7 ns 1.241 1.008
14_series_oneshot_batch/linear_inplace_vec_k8_q1000_rand 17861.2 ns 14233.5 ns 1.255 0.932
1_cubic_oneshot/q00001 548.6 ns 447.9 ns 1.225 1.15
1_cubic_oneshot/q10000 43548.1 ns 37312.3 ns 1.167 0.702
2_cubic_construct/g0100 1388.2 ns 1198.0 ns 1.159 1.02
2_cubic_construct/g1000 12678.7 ns 10953.5 ns 1.158 0.977
3_cubic_eval/q00001 20.8 ns 19.9 ns 1.045 1.002
3_cubic_eval/q00100 439.6 ns 377.6 ns 1.164 0.976
3_cubic_eval/q10000 42632.4 ns 36592.1 ns 1.165 0.98
4_linear_oneshot/q00001 24.7 ns 22.9 ns 1.079 0.95
4_linear_oneshot/q10000 18693.8 ns 15620.5 ns 1.197 0.99
5_linear_construct/g0100 35.2 ns 30.0 ns 1.17 0.903
5_linear_construct/g1000 234.1 ns 203.8 ns 1.149 0.84
6_linear_eval/q00001 10.1 ns 7.8 ns 1.294 0.975
6_linear_eval/q00100 194.8 ns 162.8 ns 1.196 0.984
6_linear_eval/q10000 18480.4 ns 15400.2 ns 1.2 0.993
7_cubic_range/scalar_query 8.3 ns 7.1 ns 1.169 0.979
7_cubic_vec/scalar_query 11.3 ns 8.1 ns 1.395 1.054
8_cubic_multi/construct_s001_q100 639.2 ns 523.6 ns 1.221 1.1
8_cubic_multi/construct_s010_q100 4412.2 ns 3704.2 ns 1.191 0.986
8_cubic_multi/construct_s100_q100 39625.6 ns 33467.5 ns 1.184 0.972
8_cubic_multi/eval_s001_q100 807.5 ns 640.8 ns 1.26 1.08
8_cubic_multi/eval_s010_q100 1782.3 ns 1443.8 ns 1.234 1.023
8_cubic_multi/eval_s010_q100_scalar_loop 2296.5 ns 1945.3 ns 1.181 0.985
8_cubic_multi/eval_s100_q100 11541.5 ns 9357.1 ns 1.233 1.006
8_cubic_multi/eval_s100_q100_scalar_loop 3378.3 ns 3160.8 ns 1.069 0.974
9_nd_oneshot/bicubic_2d 45117.1 ns 43695.9 ns 1.033 1.095
9_nd_oneshot/bilinear_2d 540.0 ns 542.4 ns 0.996 0.96
9_nd_oneshot/tricubic_3d 420902.1 ns 361762.1 ns 1.163 1.092
9_nd_oneshot/trilinear_3d 1034.9 ns 837.2 ns 1.236 0.975

⚠️ Performance Regression Confirmed ⚠️

After re-running 42 flagged benchmark(s) 10 time(s), 40 regression(s) confirmed.

Benchmark Current Previous Imm. Ratio Grad. Ratio Tier
10_nd_construct/tricubic_3d 353489.0 ns 320133.0 ns 1.104 1.0 immediate
11_nd_eval/bicubic_2d_scalar 16.3 ns 12.8 ns 1.274 1.0 immediate
11_nd_eval/bilinear_2d_scalar 7.5 ns 6.6 ns 1.138 1.019 immediate
11_nd_eval/tricubic_3d_batch 3230.0 ns 2489.8 ns 1.297 0.989 immediate
11_nd_eval/tricubic_3d_scalar 33.5 ns 24.8 ns 1.348 0.979 immediate
11_nd_eval/trilinear_3d_scalar 13.2 ns 10.0 ns 1.321 0.975 immediate
12_cubic_eval_gridquery/range_random 4225.9 ns 3638.9 ns 1.161 0.979 immediate
12_cubic_eval_gridquery/range_sorted 4219.5 ns 3613.4 ns 1.168 0.98 immediate
12_cubic_eval_gridquery/vec_random 9311.3 ns 7787.9 ns 1.196 0.985 immediate
12_cubic_eval_gridquery/vec_sorted 3201.2 ns 2620.3 ns 1.222 0.997 immediate
13_nd_oneshot_gridquery/bicubic_2d_rand_rand 65569.2 ns 58729.5 ns 1.116 0.976 immediate
13_nd_oneshot_gridquery/bicubic_2d_sort_rand 62263.0 ns 55491.7 ns 1.122 0.982 immediate
13_nd_oneshot_gridquery/bilinear_2d_rand_rand 20179.6 ns 13537.0 ns 1.491 1.155 both
13_nd_oneshot_gridquery/bilinear_2d_sort_rand 9365.2 ns 7454.0 ns 1.256 0.919 immediate
13_nd_oneshot_gridquery/bilinear_2d_sort_sort 5684.2 ns 3851.0 ns 1.476 0.998 immediate
14_series_oneshot_batch/constant_inplace_vec_k8_q1000_rand 18714.9 ns 15082.7 ns 1.241 1.008 immediate
14_series_oneshot_batch/linear_inplace_vec_k8_q1000_rand 17861.2 ns 14233.5 ns 1.255 0.932 immediate
1_cubic_oneshot/q00001 548.6 ns 447.9 ns 1.225 1.15 both
1_cubic_oneshot/q10000 43548.1 ns 37312.3 ns 1.167 0.702 immediate
2_cubic_construct/g0100 1388.2 ns 1198.0 ns 1.159 1.02 immediate
2_cubic_construct/g1000 12678.7 ns 10953.5 ns 1.158 0.977 immediate
3_cubic_eval/q00100 439.6 ns 377.6 ns 1.164 0.976 immediate
3_cubic_eval/q10000 42632.4 ns 36592.1 ns 1.165 0.98 immediate
4_linear_oneshot/q10000 18693.8 ns 15620.5 ns 1.197 0.99 immediate
5_linear_construct/g0100 35.2 ns 30.0 ns 1.17 0.903 immediate
5_linear_construct/g1000 234.1 ns 203.8 ns 1.149 0.84 immediate
6_linear_eval/q00001 10.1 ns 7.8 ns 1.294 0.975 immediate
6_linear_eval/q00100 194.8 ns 162.8 ns 1.196 0.984 immediate
6_linear_eval/q10000 18480.4 ns 15400.2 ns 1.2 0.993 immediate
7_cubic_range/scalar_query 8.3 ns 7.1 ns 1.169 0.979 immediate
7_cubic_vec/scalar_query 11.3 ns 8.1 ns 1.395 1.054 immediate
8_cubic_multi/construct_s001_q100 639.2 ns 523.6 ns 1.221 1.1 immediate
8_cubic_multi/construct_s010_q100 4412.2 ns 3704.2 ns 1.191 0.986 immediate
8_cubic_multi/construct_s100_q100 39625.6 ns 33467.5 ns 1.184 0.972 immediate
8_cubic_multi/eval_s001_q100 807.5 ns 640.8 ns 1.26 1.08 immediate
8_cubic_multi/eval_s010_q100 1782.3 ns 1443.8 ns 1.234 1.023 immediate
8_cubic_multi/eval_s010_q100_scalar_loop 2296.5 ns 1945.3 ns 1.181 0.985 immediate
8_cubic_multi/eval_s100_q100 11541.5 ns 9357.1 ns 1.233 1.006 immediate
9_nd_oneshot/tricubic_3d 420902.1 ns 361762.1 ns 1.163 1.092 immediate
9_nd_oneshot/trilinear_3d 1034.9 ns 837.2 ns 1.236 0.975 immediate

Thresholds: immediate > 1.1x (vs latest master), gradual > 1.1x (vs sliding window)

This comment was automatically generated by Benchmark workflow.

mgyoo86 added 9 commits May 27, 2026 15:10
…ill_value

Mirrors the Constant ND OOB FillExtrap contract pinned in Cat E:
"deriv at OOB returns 0, not fill_value". The Hetero mixed Real +
GridIdx/NoInterp path violated this on both oneshot and persistent:

  * Oneshot (`_interp_nointerp_oneshot`) — regression introduced by the
    `result * 0` change: a Real-axis OOB query under `FillExtrap(NaN)`
    used to early-return `zero(Tz)` (master), but the new flow lets
    `interp(...)` compute `fill_value = NaN` and then multiplies by `0`,
    yielding `NaN` (`NaN * 0 = NaN`).

  * Persistent (`_eval_nointerp` @generated) — pre-existing master bug:
    the OOB `_try_fill_oob` short-circuit runs BEFORE the NoInterp
    deriv check, so any `FillExtrap` (incl. `NaN`) leaks regardless of
    the NoInterp axis derivative.

Fix in both paths: when any NoInterp axis carries a non-zero deriv AND
the Real-axis query is OOB, return a slab-local zero sourced from the
data slice's first element (`data_r[1] * zero(Tz)`) — finite cells
yield 0, a NaN at the slab survives, and the user-supplied `fill_value`
cannot reach the result. The in-domain Real-axis path is unchanged
(still runs the kernel + `result * 0` for cell-local NaN).

Cat E gets a new testset "Hetero mixed OOB FillExtrap × NoInterp deriv
returns zero, not fill_value" — runs both `99.0` and `NaN` fill values
through oneshot and persistent. Pre-existing structural gap: Cat E
covered OOB contracts only for 1D + Constant ND, and Cat G covered
Hetero NoInterp only in-domain; the Hetero × OOB intersection was
unpinned. New testset closes it.
`outputs = [Vector{Tv_out}(undef, NQ) for _ in 1:K]` infers as `Vector`
(non-concrete) on Julia 1.10 LTS when `Tv_out` is bound from a function
return. The comprehension is lowered to a closure, and 1.10's inference
can't propagate the locally-bound `Type{T}` from the outer scope into
the closure body — even though `_output_eltype(...)` is fully concrete
in isolation. Verified by isolating: an inline type literal in the
comprehension infers correctly, a local-variable type does not, a
type-as-argument barrier does.

Extracts the pattern into `_alloc_series_batch_outputs(::Type{Tv}, K, NQ)`
in series_utils.jl. All four Series oneshot batch entry points (Linear /
Cubic / Quadratic / Constant) route through it.

Effect on LTS: `linear_interp(x, ::Series, ::Vector{<:Dual})` etc. now
infer concretely as `Vector{Vector{Tv_out}}` instead of `Vector`,
matching 1.11+/1.12+. Removes downstream type-instability that
previously propagated to callers on LTS only.

Unblocks the 4 batched-Dual @inferred tests in test_duck_tv_dual_tq.jl
"Series path — duck-Tv × duck-Tq carrier propagates" that were failing
on the LTS CI job.
Provides a version-conditional `@inferred` for tests that hit Julia
inference gaps present only on older releases (e.g. 1.10 LTS doesn't
propagate a locally-bound `Type{T}` from a function return into a
closure body, even when the function is concrete-inferred in isolation).

`@maybe_inferred expr` expands to:
  - `Test.@inferred expr` on ≥1.12
  - `expr` unchanged on older releases (runtime contract via the
    surrounding `isa` / `==` still applies; inference assertion skipped)

Opt-in via `setup=[InferredCompat]` on `@testitem`. Use sparingly —
prefer fixing the source (e.g. extract a `::Type{T}` barrier, see
`_alloc_series_batch_outputs`) when the inference gap is in this
package's own code.
Added in 5408d2c as a defensive utility for hypothetical future LTS-only
inference gaps. Never referenced by any testitem — the source-level
`_alloc_series_batch_outputs` barrier (a9b8458) fully covers the only
gap we hit on LTS, and no other test was waiting on this. Removed per
YAGNI; if a real callsite shows up later, the macro is 6 lines.
…ontract

Unifies the OOB deriv contract across all paths so it matches the strict
cell-local interpretation: the OOB cell's data IS whatever the extrap
rule puts there. For `FillExtrap(c)`, that's `c` — so deriv = `0 * c`
(NaN propagates, finite × 0 = 0). Master's contract sourced deriv from
`y_bnd` regardless of extrap type, which inconsistently ignored the
fill_value as a NaN sentinel for AD / OOB-detection use cases.

Single-dispatch helper `_extrap_deriv_source(extrap, y_bnd)`:
  - `ClampExtrap` → `y_bnd`        (unchanged — boundary y NaN propagates)
  - `FillExtrap`  → `e.fill_value` (NaN fill_value propagates as data)

Threaded through every OOB deriv site:
  * 1D `_eval_extrapolation(::DerivOp, ...)` collapsed to one Union method
    using the helper. ClampExtrap behavior unchanged; FillExtrap now
    sources from fill_value.
  * ND `_fill_extrap_result(::AbstractEvalOp, fill_val, _, qe)` switched
    from `zero_ref` to `fill_val` (signature stable, `zero_ref` retained
    but unused). Tuple-ops overload likewise.
  * Series `_constant_extrap_boundary_value` / `_fill_constant_extrap_simd!`
    Clamp/Fill deriv collapsed to a single Union overload using the helper.
  * Hetero mixed `_eval_nointerp` @generated body: OOB short-circuit now
    runs `oob * 0` when a NoInterp axis has non-zero deriv — `oob` is
    already `_promote_extrap_zero(fill_value, ...)` post-fix, so finite
    fill → 0 and NaN fill propagates. Closes pre-existing master bug
    where the OOB shortcut returned `fill_value` without applying
    NoInterp deriv's `* 0`.
  * Hetero mixed `_interp_nointerp_oneshot`: no code change — `result * 0`
    already does the right thing once `_eval_extrapolation` is fixed.

Perf: every change site is on the OOB cold path. Helper is `@inline`
with singleton dispatch (LLVM dead-branch elimination). In-domain paths
remain bit-identical; ClampExtrap-only callers see no behavior change.

Tests:
  * `Cat E` rewritten — pins the new contract end-to-end across 1D
    (all 7 methods), ND Constant/Linear, Hetero mixed. 89 assertions.
    Old "queried-boundary NaN propagates" tests for `FillExtrap` repurposed
    to "y_bnd NaN is IGNORED" (the new contract distinction).
  * `test_constextrap_fill.jl` updates: "Cubic with fill value",
    "Linear with fill value", and "ND derivatives under FillExtrap"
    flipped from `iszero` to `isnan` for `FillExtrap(NaN)` × deriv;
    finite-fill assertions retained as-is.

Reverts the previous patch (5f20143) which sourced the OOB zero from
the data slice's first element under the prior contract — the option-B
flow sources from `fill_value` instead, so that patch's `data_r[1]`
short-circuit is unnecessary and was reverted in 81f221c.
The helper returns "what data sits in the OOB cell" — a property of the
extrap, not specific to derivative evaluation. The old name
(`_deriv_source`) implied deriv-only scope, but the underlying concept
is the same `y_bnd` vs `fill_value` dispatch that already drives the
`EvalValue` extrap path. Generalizing the name allows future callers
(e.g. value-path consolidation) to reuse the helper without semantic
mismatch.

Behavior unchanged. 4 call sites renamed (`core/utils.jl` definition +
`_eval_extrapolation` use; `core/series_utils.jl` × 2).
Three test files still pinned the prior contract ("FillExtrap(NaN) × deriv
returns 0, fill_value-as-sentinel"). Flipped to the new "fill_value-as-data"
semantics — NaN fill propagates through deriv via `0 * NaN = NaN`, finite
fill × 0 = 0 unchanged.

* `test_derivatives.jl` FillExtrap block: `D(4)` on cubic/quadratic at OOB
  with NaN fill now expects `all(isnan, res)` per series; in-domain D(4)
  for cubic stays `== z2`.
* `test_series_utils.jl` direct-call tests for `_constant_extrap_boundary_value`
  and `_fill_constant_extrap_simd!`: FillExtrap deriv with NaN fill flips to
  `isnan`; added explicit finite-fill (99.0) → 0 assertions.
* `test_mixed_precision_extrap.jl` FillExtrap derivatives: Float32 NaN fill
  + Float64 query now expects `isnan` with Float64 promotion; added
  finite-fill (0.0f0) sibling case to keep the Tq promotion test alive.
…_oob_data`

The two `::EvalValue` methods (ClampExtrap, FillExtrap) used the same
`y_bnd` vs `fill_value` dispatch already centralized in `_extrap_oob_data`
— but expressed it inline as separate overloads. Collapsing them onto
the Union method that mirrors the `::DerivOp` shape makes the parallel
structure explicit:

  EvalValue → `data + carrier`         (value-promotion of OOB cell data)
  DerivOp   → `0 * data + carrier`     (deriv-promotion: 0 × OOB cell data)

The shared scaffolding clarifies the deriv-path read: `_extrap_oob_data`
fetches the cell's data and `_promote_extrap_zero` applies the `0 *` —
the helper is asking "what data?", not "what derivative?".

Behavior unchanged. ClampExtrap / FillExtrap value-path output bit-equal
to before (the dispatch chain still resolves to the same source); deriv
path also unchanged. Verified across `test_carrier_type_stability`,
`test_constextrap_fill`, `test_constant`, `test_derivatives`.
@mgyoo86
Copy link
Copy Markdown
Member Author

mgyoo86 commented May 28, 2026

The flagged performance regressions can be ignored

  • The previous master measurement was just an outlier, not a real perf change introduced by this PR.

A few things support that read:

  • Local M1 reproduction is clean. Same ci_benchmark.jl, Julia 1.12.6, branch vs master worktree: 0 / 50 benchmarks regressed >5%; entire set within ±5%. Max local regression: +2.08%.
  • This kind of outlier happens periodically on ubuntu-latest. Scanning the full gh-pages/bench/data.js (209 master commits), ~2.8% of runs land on an anomalously fast runner. When they do, the next few PRs see phantom regressions until the baseline shifts.
  • 616b106b5 looks like one of those. Vs the preceding 30 master commits, it sits at percentile rank ≤10% on 39 / 49 benchmarks — bottom decile across most of the suite.

Identical code, 21% delta

616b106b5 ("bump version 0.4.11 + README") has zero source changes vs its predecessor c47519bbd (Project.toml version + README.md only). Two consecutive master runs of bit-identical code:

  • Median Δ: −21.35% ; 41 / 50 benchmarks ≥10% "improvement"
  • Most extreme: 11_nd_eval/tricubic_3d_scalar 35.9 → 24.8 ns (−30.7%)

By construction that's 100% runner variance, and the inflated-by-~21% baseline closely matches the PR's flagged ratios (1.13–1.34×).

mgyoo86 added 2 commits May 27, 2026 17:06
The "derivative view with fill value" testset pinned the prior contract
(`FillExtrap(NaN) × deriv → 0`, no NaN in the recipe output). Under the
new "fill_value-as-data" contract the OOB tails of `deriv1(itp)` carry
`0 * NaN = NaN`, which Plots draws as gaps (intended visual outcome).

Flipped the assertion to `any(isnan, yq)` for the NaN-fill case and added
a finite-fill (`FillExtrap(0.0)`) sibling that still asserts no NaN.
@mgyoo86 mgyoo86 merged commit d7f8048 into master May 28, 2026
14 checks passed
@mgyoo86 mgyoo86 deleted the fix/nan_propagation branch May 28, 2026 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants