(feat): introduce _CachedRange unified Range normalizer (fixes #85) #87
…id/periodic
- `_to_float`: OrdinalRange → plain StepRangeLen{FT,FT,FT} (no range(), no TwicePrecision)
- `_to_float_adding_endpoint`: new helper for exclusive periodic extension (n→n+1)
- StepRangeLen{FT,FT,FT}: direct field copy (zero arithmetic, no first()/step())
- OrdinalRange: plain StepRangeLen extended by one
- Other AbstractRange (TwicePrecision, etc.): range() fallback — preserves structure
- `_convert_grid`: delegates to _to_float for AbstractRange type conversion
- `_extend_exclusive` (periodic.jl): uses _to_float_adding_endpoint instead of range()
- `_check_domain` (AbstractRange): 1-ULP slack via prevfloat/nextfloat to handle
plain StepRangeLen endpoint precision (muladd last() may be 1 ULP below exact)
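The 1-ULP slack described in the last bullet could look roughly like this. This is a minimal sketch under stated assumptions: `_check_domain_sketch` is a hypothetical name, not the package's actual `_check_domain` method.

```julia
# Sketch of a domain check with one ULP of slack at each endpoint.
# A plain-Float64 muladd for last(x) may land 1 ULP below the exact
# TwicePrecision value, so the accepted interval is widened slightly.
function _check_domain_sketch(x::AbstractRange, xq::Real)
    lo = prevfloat(Float64(first(x)))
    hi = nextfloat(Float64(last(x)))
    lo <= xq <= hi || throw(DomainError(xq, "query outside grid domain"))
    return nothing
end
```

The widening only affects the accept/reject decision; index computation would still use the unwidened endpoints.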
…alizer
- _to_float now converts ALL AbstractRange types to _CachedRange{FT},
not just TwicePrecision and OrdinalRange. After normalization, the grid
type space is exactly {_CachedRange{T}, Vector{T}}.
- Extract _CachedRange definition + _to_float conversions into dedicated
src/core/cached_range.jl (included between grid_spacing.jl and search.jl)
- Rename CachedRange → _CachedRange (internal implementation, not exported)
- Remove StepRangeLen/LinRange-specific dispatches from cubic_cache_pool.jl
(now unreachable — all callers pass normalized types)
- Add AbstractRange fallback dispatches in cache impl as safety net
- Replace all direct range()/StepRangeLen() construction in periodic
extension paths with _to_float_adding_endpoint (prevents Union return
types in ND periodic exclusive paths)
- Add early _to_float normalization in cubic series constructor
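The periodic-exclusive extension helper mentioned above (n → n+1 points) can be sketched as follows. The function name and the muladd-computed endpoint are assumptions based on the commit messages, not the package's actual implementation.

```julia
# Sketch: extend an n-point range by one trailing step for periodic
# exclusive grids, using plain-T arithmetic (no TwicePrecision).
function _to_float_adding_endpoint_sketch(x::AbstractRange, ::Type{FT}) where {FT <: AbstractFloat}
    lo = FT(first(x))
    h = FT(step(x))
    n = length(x) + 1              # n points → n + 1 points
    hi = muladd(FT(n - 1), h, lo)  # lo + (n-1)*h in plain T
    return range(lo, hi; length = n)
end
```

Returning a single concrete range type from this helper is what prevents the `Union` return types mentioned for the ND periodic exclusive paths.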
…oat entry points

- Add `x = _to_float(x, Tg)` at all public oneshot API methods with the `Tg<:AbstractFloat` constraint (linear, constant, quadratic, cubic). Previously these accepted raw StepRangeLen/LinRange without normalizing to _CachedRange, bypassing the _to_float normalizer.
- Add `x_targets = _to_float(x_targets, Tg)` for vector query entry points.
- Add `_search_direct(_CachedRange, ScalarSpacing, xq)` → 2-arg delegation. _CachedRange already has inv_h built in, so the spacing arg is redundant.
- Fixes 1D cubic (and ND) falling through to generic AbstractRange dispatch.
…omain checks

- Add domain_lo/domain_hi fields: equal to lo/hi in the exact path (ARM), widened by prevfloat/nextfloat on the x86_64 TwicePrecision fast path.
- x86_64 `@static` dispatch: bypass TwicePrecision arithmetic via plain-T muladd for lo/hi (~22 ns savings on Intel). lo/hi stay exact for index computation; domain_lo/domain_hi absorb ±1 ULP for _check_domain.
- 5-arg convenience constructor: domain_lo = lo, domain_hi = hi (zero overhead for non-TwicePrecision paths).
- Add _CachedRange-specific _check_domain dispatches in utils.jl using domain_lo/domain_hi instead of first(x)/last(x).
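The widened-domain layout and the 5-arg convenience constructor described in this commit could be sketched like this. Field names follow the commit message; the struct itself (`CachedRangeSketch`) is an assumption, not the package's `_CachedRange` definition.

```julia
# Sketch of a cached-range layout with separate domain-check bounds.
struct CachedRangeSketch{T <: AbstractFloat}
    lo::T; hi::T            # exact endpoints, used for index computation
    h::T; inv_h::T          # cached step and its reciprocal
    len::Int
    domain_lo::T            # == lo, or prevfloat(lo) on the x86_64 fast path
    domain_hi::T            # == hi, or nextfloat(hi) on the x86_64 fast path
end

# 5-arg convenience constructor: exact path, no widening, zero overhead.
CachedRangeSketch(lo::T, hi::T, h::T, inv_h::T, len::Int) where {T <: AbstractFloat} =
    CachedRangeSketch{T}(lo, hi, h, inv_h, len, lo, hi)
```

Keeping `lo`/`hi` separate from `domain_lo`/`domain_hi` means the ±1 ULP tolerance never leaks into index arithmetic.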
- Fix incorrect include-order comment in grid_spacing.jl (before → after)
- Add Base.size(::_CachedRange) for a complete AbstractArray interface
- Document domain_hi reasoning in _to_float_adding_endpoint and type-mismatch _to_float
- Add `@boundscheck checkbounds` to getindex (zero cost under `@inbounds`, safe for show/collect/broadcasting)
- Docstring: reflect that _CachedRange benefits all architectures (cached inv_h, unified dispatch), not just Intel TwicePrecision avoidance
…ience methods

- Add `_to_float(x, FT)` before _get_derivative_cache_impl / _get_periodic_cache_impl in ZeroCurvBC, ZeroSlopeBC, PeriodicBC, BCPair, and PointBC convenience methods.
- Fix include-order comment in grid_spacing.jl (before → after).
- Previously these passed a raw Range to the cache impl, relying on the AbstractRange fallback to normalize. Now consistent with the 3-arg methods.
FastInterpolations.jl Benchmarks

All benchmarks (42 total):

| Benchmark | Current | Previous | Imm. Ratio | Grad. Ratio | Tier |
|---|---|---|---|---|---|
| 5_linear_construct/g0100 | 36.1 ns | 33.4 ns | 1.081 | 3.418 | gradual |
| 5_linear_construct/g1000 | 235.0 ns | 259.6 ns | 0.905 | 23.012 | gradual |
Thresholds: immediate > 1.1x (vs latest master), gradual > 1.1x (vs sliding window)
This comment was automatically generated by Benchmark workflow.
Pull request overview
This PR introduces an internal _CachedRange grid type and normalizes AbstractRange inputs to it at API boundaries to avoid StepRangeLen/TwicePrecision overhead (notably on Intel x86_64), while keeping O(1) uniform-grid indexing and improving cache/search performance.
Changes:
- Add `src/core/cached_range.jl` defining `_CachedRange`, `_to_float` (Range → `_CachedRange`), and `_to_float_adding_endpoint` for periodic-exclusive extension.
- Normalize range grids via `_to_float` across series constructors, one-shot APIs, periodic extension paths, and cubic cache pooling.
- Update tests to assert `_CachedRange` preservation instead of `StepRangeLen`.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test/test_series_range_grid.jl | Updates series tests to expect _CachedRange for range grids. |
| test/test_linear.jl | Updates linear range-preservation test to expect _CachedRange{Float64}. |
| test/test_cubic_interpolant.jl | Updates cubic autocache test to expect _CachedRange in cache grid. |
| src/quadratic/quadratic_series_interp.jl | Normalizes copied grids via _to_float in series constructor. |
| src/quadratic/quadratic_oneshot.jl | Normalizes x / x_targets via _to_float in one-shot paths. |
| src/linear/linear_series_interp.jl | Normalizes copied grids via _to_float in series constructor. |
| src/linear/linear_oneshot.jl | Normalizes x / x_targets via _to_float in one-shot paths. |
| src/cubic/nd/cubic_nd_adjoint.jl | Uses _to_float_adding_endpoint for periodic-exclusive grid extension. |
| src/cubic/cubic_series_interp.jl | Normalizes x early via _to_float before downstream processing. |
| src/cubic/cubic_oneshot.jl | Normalizes x / x_query and uses _to_float_adding_endpoint for periodic extension. |
| src/cubic/cubic_cache_pool.jl | Normalizes grids for cache lookup/insertion and adds _CachedRange-keyed banks. |
| src/cubic/cubic_adjoint.jl | Uses _to_float_adding_endpoint for periodic-exclusive adjoint grid extension. |
| src/core/utils.jl | Moves range _to_float definitions out; adds _CachedRange domain-check specializations. |
| src/core/search.jl | Adds _search_direct specializations for _CachedRange using cached inv_h. |
| src/core/periodic.jl | Uses _to_float_adding_endpoint for Range periodic-exclusive extension (1D + ND). |
| src/core/nd_utils.jl | Normalizes ND range grids through _to_float (→ _CachedRange). |
| src/core/grid_spacing.jl | Notes _CachedRange integration point; defers specialization to cached_range.jl. |
| src/core/core.jl | Updates include order to load cached_range.jl before search/utils. |
| src/core/cached_range.jl | New file: defines _CachedRange, _to_float, _to_float_adding_endpoint, and _create_spacing specialization. |
| src/constant/constant_series_interp.jl | Normalizes copied grids via _to_float in series constructor. |
| src/constant/constant_oneshot.jl | Normalizes x / x_targets via _to_float in one-shot paths. |
Comments suppressed due to low confidence (1)
test/test_cubic_interpolant.jl:248
- The surrounding test comments still describe the old behavior ("Range is kept as Range" / "same objectid → hit"). With `_to_float` normalization, `itp.cache.x` is now a `_CachedRange` value (not the original `StepRangeLen`), and cache hits will be driven primarily by `isequal` rather than "same objectid". Updating these comments will prevent future confusion when interpreting the test's intent.
# Range-preserving cache: Range is kept as Range for O(1) index lookup!
# This provides significant performance benefit over Vector (binary search)
itp = cubic_interp(x_range, y; autocache = true)
@test itp.cache.x isa FastInterpolations._CachedRange # Range preserved for O(1) index calculation
# Verify correctness
@test itp(0.5) ≈ sin(2π * 0.5) atol = 0.01
# Verify cache hit works with same Range (Julia interns Ranges)
clear_cubic_cache!()
result1 = cubic_interp(x_range, y, 0.5; autocache = true) # First call: miss
result2 = cubic_interp(range(0.0, 1.0, 11), y, 0.5; autocache = true) # Same params → same objectid → hit!
@test result1 ≈ result2 # Same results from cache hit
- Add `@boundscheck` to getindex (was missing despite the prior commit message)
- Pin endpoints in getindex: i == 1 → lo, i == len → hi (prevents ULP drift)
- Delegate _to_float_adding_endpoint(AbstractRange) through _to_float to leverage the x86_64 TwicePrecision bypass on StepRangeLen
- Use the FI._CachedRange alias in test_series_range_grid.jl for consistency
- Update outdated comments in test_cubic_interpolant.jl
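The two getindex changes above (bounds check plus endpoint pinning) might look roughly like this on a minimal stand-in type. `CRSketch` and its fields are assumptions for illustration, not the package's `_CachedRange`.

```julia
# Minimal cached-range stand-in with endpoint-pinned, bounds-checked getindex.
struct CRSketch{T} <: AbstractVector{T}
    lo::T; hi::T; h::T; len::Int
end
Base.size(x::CRSketch) = (x.len,)

Base.@propagate_inbounds function Base.getindex(x::CRSketch{T}, i::Int) where {T}
    @boundscheck checkbounds(x, i)      # elided under a caller's @inbounds
    i == 1 && return x.lo               # pin first endpoint: no ULP drift
    i == x.len && return x.hi           # pin last endpoint
    return muladd(T(i - 1), x.h, x.lo)  # interior: lo + (i-1)*h
end
```

Pinning guarantees `x[1] == lo` and `x[end] == hi` exactly, even when the muladd for interior elements rounds by a ULP.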
- Fix muladd formula: `x.offset - 1` → `1 - x.offset` for computing lo (offset = 1 only works when ref == first, but Julia uses a midpoint ref)
- Update test_nd_comprehensive.jl: grid identity/equality tests now check first/last/length instead of === or == (grids normalized to _CachedRange)
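The corrected formula follows from how `StepRangeLen` stores its elements: `x[i] = ref + (i - offset)*step`, with `ref` at a midpoint index rather than the first element. A small sketch, touching Base internals (`ref`, `step`, `offset` fields) purely for illustration:

```julia
# Recover first(x) from a StepRangeLen via plain-Float64 muladd.
x = range(-1.0, 1.0; length = 21)           # zero-crossing range: offset ≠ 1
ref = Float64(x.ref)                        # TwicePrecision → plain Float64
h = Float64(x.step)
lo = muladd(Float64(1 - x.offset), h, ref)  # lo = ref + (1 - offset)*h, NOT (offset - 1)
```

With `offset - 1` the sign flips and `lo` lands at the wrong end of the range whenever `ref` is not the first element.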
…alization

_CachedRange uses muladd (plain Float64) vs StepRangeLen's TwicePrecision, causing 1-2 ULP differences in middle elements. Tests now use ≈ with tight tolerance (rtol = 8eps or default) instead of == or === for grid value comparisons.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## master #87 +/- ##
==========================================
- Coverage 95.87% 95.85% -0.02%
==========================================
Files 94 95 +1
Lines 7973 8034 +61
==========================================
+ Hits 7644 7701 +57
- Misses 329 333 +4
Tests Float64→Float32 downcast, Float32→Float64 upcast, and same-type pass-through. Verifies domain widening is dropped on type conversion and same-type returns identical object (===).
…allback

Direct call with a raw StepRangeLen verifies normalization to _CachedRange, a cache hit on the second call, and a separate bank for different BC types.
* (refac): remove dead LinRange dispatches and duplicate WrapExtrap loop

  After _CachedRange normalization (PR #87), several dispatch methods are unreachable or redundant:
  - _create_spacing(::LinRange): always normalized to _CachedRange before reaching _create_spacing; redundant with the AbstractRange fallback
  - _create_spacing_pooled(::LinRange): same reason
  - _linear_interp_loop!(::AbstractRange, ::WrapExtrap): identical logic to the AbstractVector version; _CachedRange dispatches to AbstractVector

* (perf): unify Range/Vector eval via 3-arg _get_h/_get_inv_h accessors

  Introduce 3-arg grid-based accessors that dispatch on grid type:
  - _get_h(x::_CachedRange, xR, xL) → x.h (cached, zero cost)
  - _get_h(x::AbstractVector, xR, xL) → xR - xL (computed)
  - _get_inv_h(x::_CachedRange, xR, xL) → x.inv_h (cached, avoids an fdiv)
  - _get_inv_h(x::AbstractVector, xR, xL) → inv(xR - xL)

  This eliminates the need for separate _CachedRange specializations:
  - _constant_eval_at_point: 2 methods → 1 (uses _get_h)
  - _linear_eval_at_point: 2 methods → 1 (uses _get_inv_h)
  - _eval_linear_series_point_extrap!: 2 methods → 1 (uses _get_inv_h)

* (refac): flatten linear eval layers — remove _linear_with_extrap

  Remove 2 pure delegation layers from the linear scalar eval path:
  - linear_interp (positional 6-arg) → pure delegation to _linear_with_extrap
  - _linear_with_extrap (4 methods) → pure delegation to _linear_eval_at_point

  Now _linear_eval_at_point dispatches directly on extrap type:
  - AbstractExtrap (NoExtrap/ExtendExtrap): direct search + kernel
  - _ClampOrFill: boundary check → extrap value or kernel
  - WrapExtrap: wrap query → search + kernel

  Call chain: 4 layers → 2 layers
  Before: linear_interp(kwargs) → linear_interp(pos) → _linear_with_extrap → _linear_eval_at_point → kernel
  After: linear_interp(kwargs) → _linear_eval_at_point(extrap dispatch) → kernel

* (refac): flatten quadratic eval layers — remove _quadratic_eval_with_extrap

  Merge _quadratic_eval_at_point, _quadratic_eval_with_extrap, and _quadratic_eval_core into a single _quadratic_eval_at_point with 3 extrap dispatch methods (same pattern as linear).

  Call chain: 3 layers → 1 layer
  Before: _quadratic_eval_at_point → _quadratic_eval_with_extrap → _quadratic_eval_core → kernel
  After: _quadratic_eval_at_point(extrap dispatch) → kernel

* (refac): rename + relocate _eval_extrapolation to core/utils.jl

  - Rename _constant_extrap_result → _eval_extrapolation (clearer intent, avoids confusion with ConstantInterpolant)
  - Move _eval_extrapolation + _promote_extrap_* from cubic_eval.jl to core/utils.jl (shared by all interpolation methods, not cubic-specific)
  - Simplify dispatch: DerivOp generic + EvalValue specific (was 5 methods with EvalDeriv1/2/3 each, now 4 methods via DerivOp parametric dispatch)
  - Replace @inbounds(y[1])/y[end] with first(y)/last(y) at OOB call sites

* test: add inference test for cubic_interp with exclusive endpoint

* (refac): flatten cubic eval layers + unify extrap helpers across all methods

  - Remove the _eval_cubic_with_extrap indirection: inline the kernel at each extrap dispatch site in _eval_cubic_at_point (matching the linear/quadratic pattern)
  - Add `@boundscheck _check_domain` to the cubic AbstractExtrap catch-all
  - Inline _linear_eval_constant_extrap at call sites, remove the dead function
  - Fix stale comment in quadratic_oneshot.jl (cubic_eval.jl → core/utils.jl)
  - Update the _eval_with_bc caller and test imports

* (refac): narrow 3-arg _get_h/_get_inv_h signatures from ::Any to ::Real

* (feat): introduce InBounds() extrap type for batch domain-check elision

  Vector _check_domain(::NoExtrap) now wraps validation in `@boundscheck` and returns InBounds(), so per-element scalar evals see a no-op check regardless of call depth. Fixes the cubic batch path where @inbounds didn't propagate through _eval_with_bc → _eval_cubic_at_point (2 levels deep).
  - New InBounds <: AbstractExtrap type (exported)
  - Vector loops: `extrap = _check_domain(x, xq, extrap)` captures the return
  - Remove redundant protocol-layer vector `@boundscheck` checks
  - Revert @propagate_inbounds on _eval_with_bc (no longer needed)
  - Remove debug artifacts (Main.@infiltrate, Main.@count_here)

* refac: optimize index calculation in _search_direct and related functions using muladd

* (docs): fix 5 stale comments referencing renamed/relocated functions

  - cubic_eval.jl: replace the orphaned _constant_extrap_result comment block with a cross-reference to core/utils.jl
  - nd_utils.jl: _promote_extrap_val/zero is in core/utils.jl, not cubic_eval.jl
  - test_mixed_precision_extrap.jl: _constant_extrap_result → _eval_extrapolation
  - test_derivatives.jl: _linear_eval_constant_extrap → _eval_extrapolation
  - type_promotion_rules.md: _constant_extrap_result → _eval_extrapolation

* (refac): flatten constant eval layers — remove _constant_eval_extrap

  - 3-dispatch pattern (AbstractExtrap, _ClampOrFill, WrapExtrap) matching linear/quadratic/cubic — no intermediate _constant_eval_extrap indirection
  - Eliminates the InBounds MethodError on the OOB path (the catch-all goes straight to search + kernel, no extrap-type-specific branch for OOB)
  - Remove the union-splitting warning from the AbstractExtrap docstring (all interpolants store concrete type params, union-splitting is irrelevant)

* (docs+test): add InBounds() to docs index and extrapolation guide, add unit tests

  - docs/src/api/types.md: add InBounds to the @docs block (fixes a Documenter error)
  - docs/src/extrapolation.md: add InBounds to the overview table, type hierarchy, and summary
  - docs/src/architecture/type_promotion_rules.md: fix fill_value promotion wording
  - test/test_inbounds_extrap.jl: scalar/vector/interpolant tests for all methods, NoExtrap→InBounds batch conversion, type stability (@inferred)

* runic formatting

* (fix): add ExtendExtrap dispatch for constant eval — fix OOB + LeftSide/RightSide

  The constant kernel is side-dependent (discrete left/right selection), so OOB queries with dL > h return the wrong side's value. Unlike the polynomial methods, where ExtendExtrap naturally extends via the kernel, constant must delegate to ClampExtrap (slope = 0 → extend = clamp). Regression from e811849.
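The 3-arg accessor pattern from the (perf) commit above can be sketched in a few lines. `CRS` is a minimal stand-in for the cached-range type; the accessor names follow the commit message but the definitions here are assumptions.

```julia
# Minimal cached-range stand-in holding the cached step and reciprocal.
struct CRS{T}
    h::T
    inv_h::T
end

# 3-arg grid-based accessors: dispatch on the grid type decides
# whether the spacing is read from cache or computed from neighbors.
_get_h(x::CRS, xR::Real, xL::Real) = x.h                  # cached, zero cost
_get_h(x::AbstractVector, xR::Real, xL::Real) = xR - xL   # computed
_get_inv_h(x::CRS, xR::Real, xL::Real) = x.inv_h          # cached, avoids an fdiv
_get_inv_h(x::AbstractVector, xR::Real, xL::Real) = inv(xR - xL)
```

A single eval method can then call `_get_inv_h(x, xR, xL)` and let dispatch pick the cheap path, instead of maintaining one method per grid type.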
Issue (#85)
Julia's `StepRangeLen` uses `TwicePrecision` internally. On ARM (Apple Silicon) this is nearly free, but on Intel x86 the extended-precision arithmetic costs ~9 ns per `first(x)`/`last(x)` call — making Range grids ~2× slower than Vector despite O(1) index lookup.

Solution
New internal type `_CachedRange{T} <: AbstractRange{T}` that caches `first`, `last`, `step`, and `inv(step)` as plain `T` fields. All Range inputs are normalized to `_CachedRange` via `_to_float` at every public API boundary. After normalization, only two grid types exist: `_CachedRange{T}` and `Vector{T}` — eliminating per-Range-type dispatch across the codebase.

Intel x86_64: TwicePrecision bypass
On x86_64 only (`@static if Sys.ARCH === :x86_64`), a specialized `_to_float` dispatch extracts endpoints via plain-T `muladd` instead of TwicePrecision arithmetic (~5 ns vs ~27 ns). This may introduce ±1 ULP rounding, so dedicated `domain_lo`/`domain_hi` fields (widened by `prevfloat`/`nextfloat`) ensure safe domain checks without affecting index-computation accuracy.

On ARM (Apple Silicon), TwicePrecision is natively fast — the exact generic path is used and `domain_lo == lo` exactly.

Benchmark (Intel x86_64, N=1000)
`_search_direct`

Known limitation

Oneshot scalar `linear_interp` with Range is only on par with Vector (~10.6 vs ~10.2 ns) due to a redundant `inv(h)` computation in the eval path. Tracked for follow-up optimization.
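For reference, the cached-`inv(step)` direct search that motivates `_CachedRange` can be sketched as follows. `search_direct_sketch` and the clamping convention are assumptions for illustration, not the package's `_search_direct`.

```julia
# O(1) uniform-grid bracketing: one subtraction, one multiply, one floor.
# Returns the left index i of the cell [x[i], x[i+1]] containing xq,
# clamped so extrapolating queries still get a valid boundary cell.
function search_direct_sketch(lo::T, inv_h::T, len::Int, xq::T) where {T <: AbstractFloat}
    i = 1 + floor(Int, (xq - lo) * inv_h)  # multiply by cached 1/h, no division
    return clamp(i, 1, len - 1)
end
```

Because `inv_h` is precomputed at construction, the hot path avoids a floating-point division per query; this is the search that the `_CachedRange`-keyed dispatch feeds directly.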