Add SearchStrategy dispatch + in-place batched searchsorted! by ChrisRackauckas-Claude · Pull Request #65 · SciML/FindFirstFunctions.jl

ChrisRackauckas-Claude · 2026-05-16T11:00:53Z

Summary

Reformulates the existing sorted-search infrastructure as explicit strategy types and adds an in-place batched API.

Strategies (all <: SearchStrategy):

LinearScan — walk ±1 from the hint
BracketGallop — bracketstrictlymontonic + bounded binary search (the algorithm already backing searchsorted*correlated)
BinaryBracket — plain searchsortedlast / searchsortedfirst, ignores any hint
Auto — heuristically picks among the above based on hint validity and length(v)

Dispatched API:

Base.searchsortedlast(strategy, v, x[, hint]; order = Base.Order.Forward)
Base.searchsortedfirst(strategy, v, x[, hint]; order = Base.Order.Forward)

Batched in-place API:

searchsortedlast!(idx_out, v, queries; strategy = Auto(), order = ...)
searchsortedfirst!(idx_out, v, queries; strategy = Auto(), order = ...)

For sorted queries, walks v in lockstep using the previous result as the hint for the next — total cost O(length(v) + length(queries)) under BracketGallop. For unsorted queries, falls back to per-element search with no hint regardless of strategy.

Why

Downstream callers like SciML/DataInterpolations.jl (#525-#528) have been hand-rolling sorted-batch lockstep walks with a manually-maintained idx and a linear while idx < n - 1 && tt[i] > t[idx + 1]; idx += 1 advance. That works well when the queries are dense relative to the knots, but regresses badly (sometimes by 100-1000×) when there are very few queries on a long vector — the linear walk is O(n) regardless of how many queries actually live in that gap.

The right home for the "which strategy to use" heuristic is here, not duplicated in every consumer. With this PR, searchsortedlast! does the right thing in every regime, and downstream code becomes a one-line replacement.

Backwards compatibility

All existing exports keep working — they're now thin wrappers:

searchsortedfirstcorrelated(v, x, guess) ≡ searchsortedfirst(BracketGallop(), v, x, guess)
searchsortedlastcorrelated(v, x, guess) ≡ searchsortedlast(BracketGallop(), v, x, guess)
searchsortedfirstvec(v, x) allocates and calls searchsortedfirst!
searchsortedlastvec(v, x) allocates and calls searchsortedlast!

The Guesser path, Range fast paths, Reverse ordering, and searchsortedfirstexp are untouched.

Version bumped to 1.9.0 (new feature, additive).

Auto heuristic

condition	strategy
no hint, or hint ∉ `axes(v)`	`BinaryBracket`
hint in range, `length(v) ≤ 32`	`LinearScan`
hint in range, `length(v) > 32`	`BracketGallop`

The 32-element threshold is chosen because at that length the binary-search bracket-setup overhead is comparable to a worst-case linear walk, and below it LinearScan saves the bracket call entirely.

What is in this PR

src/FindFirstFunctions.jl: strategy types + dispatched Base.searchsortedlast / Base.searchsortedfirst methods + searchsortedlast! / searchsortedfirst! + reformulated existing wrappers.
test/runtests.jl: two new safetestsets covering strategy equivalence across n ∈ {0, 1, 2, 8, 33, 257}, every hint position, Reverse order, the batched API on sorted/unsorted/empty inputs, DimensionMismatch, and sparse queries on a long vector to exercise the gallop hint path.
Project.toml: bump to 1.9.0.

Test plan

CI passes
Pkg.test() locally: 26814 / 26814 passing on Julia 1.10 (was 25481 before — +1333 new assertions across the two new testsets)
Runic-formatted

Please ignore until reviewed by @ChrisRackauckas. The follow-up plan is small PRs to DataInterpolations.jl swapping the hand-rolled walks in #525-#528's merged Akima and the in-flight #527/#528 for searchsortedlast! calls.

🤖 Generated with Claude Code

Reformulates the existing sorted-search infrastructure as explicit strategy types and adds an in-place batched API: - `SearchStrategy` abstract type with concrete subtypes `LinearScan`, `BracketGallop`, `BinaryBracket`, and `Auto`. - `Base.searchsortedlast(strategy, v, x[, hint])` and the matching `searchsortedfirst` dispatch on the strategy: * `LinearScan` — walk ±1 from the hint. * `BracketGallop` — `bracketstrictlymontonic` + bounded binary search. * `BinaryBracket` — plain `searchsortedlast` / `searchsortedfirst`. * `Auto` — picks based on hint validity and `length(v)`: no/invalid hint or `length(v) > 32` → `BracketGallop`, hint in range and `length(v) ≤ 32` → `LinearScan`, no hint → `BinaryBracket`. - `searchsortedlast!(idx_out, v, queries; strategy = Auto())` and `searchsortedfirst!(...)` — in-place batched variants. For sorted `queries` the previous result is used as a hint for the next, so the total cost is O(length(v) + length(queries)) under `BracketGallop`. Unsorted `queries` fall back to per-element search with no hint. `searchsortedfirstcorrelated` / `searchsortedlastcorrelated` and `searchsortedfirstvec` / `searchsortedlastvec` are kept as backwards- compatible wrappers over the new dispatched API, so no existing caller breaks. Version bumped to 1.9.0 since this is a backward-compatible feature addition. The intent is that downstream callers like SciML/DataInterpolations.jl that hand-roll a sorted-batch lockstep walk (`while idx < n - 1 && tt[i] > t[idx + 1]; idx += 1`) can instead call `searchsortedlast!` and get correct O(log gap) behavior at every n/m ratio, not just dense queries. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

…chmarks Expands the strategy hierarchy from the previous commit and tunes the `Auto` heuristic against a real benchmark sweep (`bench/strategies.jl`, results checked in at `bench/results.md`). New concrete strategies: - `ExpFromLeft`: galloping forward from a left-bound hint (`hint` must be ≤ the answer; otherwise falls back to a full search). Backs `searchsortedfirstexp`-style access, suitable for batched-sorted loops where the previous result is a natural lower bound. - `InterpolationSearch`: extrapolates the answer index from a linear interpolation between `v[1]` and `v[end]`, then refines with a bounded binary search. O(1) per query on uniformly-spaced data, O(log n) worst case. Numeric eltypes only; falls back to `BinaryBracket` for non-numeric `v`. Refined `Auto`: - Per-query path is unchanged in spirit but with a smaller threshold (16 elements) before falling back to `LinearScan`. - Batched path now picks based on the *actual* expected gap between consecutive results, not just `length(v) / length(queries)`. For numeric data this uses the span ratio `(queries[end] - queries[1]) / (v[end] - v[1])`, which correctly detects sorted-burst queries (all clustered inside one segment of `v`) and routes them to `LinearScan`. For non-numeric data, the length ratio is used unchanged. - `m == 1` and `m == 0` are handled by fast paths that skip the span/issorted heuristics entirely (saves ~20 ns of bookkeeping that used to dominate the wall time on single-element batches). - `BracketGallop` is dropped from `Auto`'s batched choices because `ExpFromLeft` strictly beats it on sorted forward queries (its bidirectional bracketing wastes effort). - `InterpolationSearch` is intentionally not part of the `Auto` choice: the benchmark shows it wins big on uniformly-spaced data but loses catastrophically on `two_scale` (clustered) data, and detecting which regime applies cheaply enough to amortize over small `m` isn't feasible without an O(n) `looks_linear` check. Users who know their data is roughly linear opt in explicitly. Benchmark sweep (`n × m × spacing × query-pattern × strategy`, 240 cells): | | within 20% of best | > 20% slower | |--------------------------------|--------------------|--------------| | Auto **before this commit** | 89 / 240 (37%) | 151 / 240 | | Auto **after this commit** | 214 / 240 (89%) | 10 / 240 | The remaining 10 cells are all sparse queries on uniform/jittered data where `InterpolationSearch` would dominate but isn't safe to pick automatically — well-documented trade-off. Tests: 26929 / 26929 passing on Julia 1.10 (was 26814 before — +115 new assertions covering ExpFromLeft, InterpolationSearch, and the batched-Auto heuristic on dense / sparse / dense-burst / m=1 / m=0 / non-numeric inputs). Also caught and fixed an `ExpFromLeft` correctness bug: when the hint was past the actual answer (`x < v[hint]`), the function returned an index ~`hint - 1` instead of the correct value (could be 0 if x was below all of v). Added an explicit `x < v[hint] → fall back to full search` check; covered by the new tests. `bench/strategies.jl` is reproducible via `julia --project=bench bench/strategies.jl` (30 samples per cell, ~3 minutes wall time); `bench/results.md` is the checked-in artifact from the latest run. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas-Claude · 2026-05-17T11:36:14Z

Pushed a second commit (163465e) that:

Adds two more strategies to round out coverage:
- ExpFromLeft — galloping forward from a left-bound hint, equivalent to searchsortedfirstexp but as a dispatchable strategy. Suited to batched-sorted loops where hint = previous_result.
- InterpolationSearch — linear-extrapolation guess then bounded binary search. O(1) per query on uniformly-spaced numeric data, O(log n) worst case, falls back to BinaryBracket for non-numeric v.
Reworks Auto based on a benchmark sweep (bench/strategies.jl, results checked in at bench/results.md). The matrix covers n × m × spacing × query-pattern × strategy — 5 spacings (uniform, log-spaced, jittered-near-uniform, random, two-scale-clustered) × 4 query patterns (sorted-uniform, sorted-dense-burst, sorted-near-start, unsorted) × 3 n × 4 m = 240 cells.

Before/after Auto verdicts on the same 240-cell matrix:

Auto state within 20% of best > 20% slower

original length(v) ≤ 32 heuristic 89 / 240 (37%) 151 / 240

bench-tuned (this commit) 214 / 240 (89%) 10 / 240

Key changes that drove this:
- Batched-Auto now picks based on the actual expected gap, using the span ratio (queries[end] - queries[1]) / (v[end] - v[1]) for numeric data. This correctly routes dense-burst queries (clustered in one segment of v) to LinearScan even when length(v) / length(queries) would suggest otherwise.
- BracketGallop is no longer in Auto's batched choice — ExpFromLeft strictly beats it for forward-moving sorted queries (its bidirectional bracketing wastes effort).
- m == 1 fast path skips the span/sortedness heuristic (saves ~20 ns/call).
- Strategy dispatch in Auto's sorted loop is now via concrete if/else branches so each leg is type-stable and gets fully specialized.
Surfaced and fixed a real correctness bug in ExpFromLeft: when the hint was past the answer (x < v[hint]), the function returned ~hint - 1 instead of the correct (often-zero) index. Now falls back to a full searchsortedlast/searchsortedfirst in that case. Covered by the new tests.
Why InterpolationSearch is intentionally not in Auto's choice: on uniform/jittered data it wins big (~2× on n=65536, m=10), but on two_scale (clustered) data it's catastrophically slow (4-13× worse than ExpFromLeft at n=65536, m=4096). Detecting which regime applies cheaply enough to amortize over small m would require an O(n) looks_linear check that's net-negative on the cases where it'd help most. Users who know their data is roughly linear can opt in with strategy = InterpolationSearch(). The remaining 10 Auto-worse-than-20% cells in the bench are exactly these cases.

Test count: 26814 → 26929 (+115). Full bench output is in bench/results.md.

Expands the strategy lineup, the bench coverage, and tightens the Auto heuristic again based on the broader data: - **`GuesserHint(::Guesser)` strategy** — wraps the existing `Guesser`-based correlated search as a dispatchable strategy so it can be passed to the new batched/per-query API alongside the others. - **Sampled-linearity probe** — `_sampled_looks_linear` reads 5 fixed elements of `v` (first, n/4, n/2, 3n/4, last) and tests whether the interior points sit within 5% of the straight line between the endpoints. ~12 ns regardless of `length(v)`. Used by Auto to unlock `InterpolationSearch` on uniform/jittered data, gated by: * `gap >= 64` (sparse enough that InterpolationSearch's per-query advantage matters) * `length(v) >= 1024` (the linearity probe + InterpSearch overhead is amortized) * `length(queries) >= 2` (m == 1 has its own fast path) - **`m == 1` fast path** — Auto's batched path now short-circuits for a single-element batch directly into `Base.searchsortedlast`/`first`, saving the issorted + span heuristic cost (~20 ns). - **Threshold fix** — gap check is now `>=` `_AUTO_INTERP_MIN_GAP` (was `>`), so the boundary case where `gap` equals the threshold qualifies for `InterpolationSearch`. - **`ExpFromLeft` correctness fix** — previously assumed the hint was a valid lower bound, which silently returned wrong indices when `x < v[hint]`. Now detects that case and falls back to a full `Base.searchsortedlast` / `searchsortedfirst`. Covered by new tests. - **`bench/strategies.jl` broadened** — five sub-sweep modes (`MODE=fast` (default), `full`, `spacing`, `ratio`, `pattern`, `extreme`), 10 knot spacings (added `power2`, `sqrt`, `plateau`, `bimodal`, `near_linear` on top of the original 5), 8 query patterns (added `sorted_arithmetic`, `sorted_geometric`, `sorted_bimodal`, `sorted_repeated` on top of the original 4). Bench verdict across all sub-sweeps (772 cells total): | sub-sweep | within 20% of best | |----------------------|--------------------| | spacing variety | 10 / 10 (100%) | | pattern variety | 8 / 8 (100%) | | super-dense (m ≫ n) | 270 / 270 (100%) | | fast sweep | 213 / 240 (89%) | | super-sparse (m ≪ n) | 156 / 180 (87%) | | n/m crossover | 40 / 64 (62%) | | **total** | **697 / 772 (90%)** | The remaining 6% are mostly `m ≤ 4` sparse queries on uniform data where the absolute per-call cost is ~100 ns and `InterpolationSearch` would still win by 20-40 ns — a documented opt-in trade-off, since lowering the linearity-probe threshold further would risk picking InterpolationSearch on borderline-non-linear data where it loses by 4-13×. Tests: 26929 / 26929 still passing. New artifacts in `bench/`: - `bench/strategies.jl` — the bench script - `bench/results.md` — fast sweep verdict (the default mode) - `bench/results_focused.md` — spacing/pattern/ratio/extreme breakdowns Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas-Claude · 2026-05-17T11:52:25Z

Pushed e26ac82 answering your three questions in turn:

1. Audit — does this maintain everything that was there before?

Yes. All these existing names keep working unchanged: bracketstrictlymontonic, searchsortedfirstexp, searchsortedfirstcorrelated(v, x, ::Integer), searchsortedlastcorrelated(v, x, ::Integer), the Guesser overloads of both, Range fast paths, searchsortedfirstvec/searchsortedlastvec, looks_linear, Guesser, findfirstequal, findfirstsortedequal. The new strategies wrap the existing algorithms rather than replacing them; legacy callers don't change.

2. More strategies to consider — added one

GuesserHint(::Guesser) — wraps the existing Guesser-based correlated search as a dispatchable strategy so it slots into the new batched/per-query API alongside LinearScan / BracketGallop / ExpFromLeft / InterpolationSearch / BinaryBracket / Auto. The Guesser already decides between linear-extrapolation lookup and using its cached previous index; GuesserHint simply re-exposes that under the strategy dispatch.

I also evaluated and rejected a couple of candidates:

InterpolationGallop would essentially duplicate InterpolationSearch (which already uses gallop refinement internally).
KAryBinarySearch / ParallelBinarySearch would be substantial implementations whose payoff vs. binary search is dominated by branch-prediction characteristics that vary across CPUs — out of scope.

3. Bench depth — vastly broadened

bench/strategies.jl now has five sub-sweep modes (MODE=fast default, full, spacing, ratio, pattern, extreme), with:

10 knot spacings (added power2, sqrt, plateau, bimodal, near_linear on top of uniform, log, jittered, random, two_scale)
8 query patterns (added sorted_arithmetic, sorted_geometric, sorted_bimodal, sorted_repeated on top of the original 4)
n up to 262 144, m up to 16 384, and a dedicated super-sparse / super-dense sub-sweep that takes the ratio to its extremes

Final Auto verdict across 772 cells: 697 within 20% of best (90%), 50 worse (6%), 25 effectively tied.

Per-sub-sweep:

sub-sweep	within 20%
spacing variety (10 cells, all 10 spacings)	10/10 (100%)
query patterns (8 cells, all 8 patterns)	8/8 (100%)
super-dense (270 cells, m ≫ n)	270/270 (100%)
fast sweep (240 cells)	213/240 (89%)
super-sparse (180 cells, m ≪ n)	156/180 (87%)
n/m crossover (64 cells, all n × m combos on uniform)	40/64 (62%)

What the heuristic now does

The Auto choice for searchsortedlast!(out, v, queries; strategy = Auto()):

m == 0 → return immediately
m == 1 → bypass everything, dispatch straight to Base.searchsortedlast (saves ~20 ns of bookkeeping)
!issorted(queries) → per-element Base.searchsortedlast with no hint
Compute gap = floor(n × (queries-span / v-span) / m). The span ratio (for numeric data) detects dense_burst queries clustered inside a single segment of v, which would have n/m ≫ 0 but actual_gap ≈ 0.
gap ≤ 4 → LinearScan
gap ≥ 64, length(v) ≥ 1024, m ≥ 2, and the sampled linearity probe (5 reads of v at first, n/4, n/2, 3n/4, last — total ~12 ns — checks the interior points are within 5% of the straight line) accepts → InterpolationSearch. The probe is what prevents Auto from picking InterpolationSearch on non-linear data, where it loses badly (~4-13× on plateau and two_scale spacings at large m).
Otherwise → ExpFromLeft

Caught and fixed a real bug

ExpFromLeft previously assumed hint ≤ answer. When x < v[hint] (i.e., the hint was past the answer), it returned ~hint - 1 — wrong, sometimes wildly so. Now detects that case and falls back to a full Base.searchsortedlast / searchsortedfirst. Covered by the new tests.

Artifacts in the PR

bench/strategies.jl — the bench harness, runnable via julia --project=bench bench/strategies.jl (or with one of the MODE=… env vars for the focused sweeps).
bench/results.md — captured fast-sweep results (240 cells).
bench/results_focused.md — captured spacing / pattern / ratio / extreme sub-sweep results (~500 cells).

Both are reproducible; they're the actual output from this commit run on Julia 1.10.

Tests: 26929 / 26929 passing locally. CI still waiting for the same first-time-contributor approval.

Two algorithmic refinements driven by the broader benchmark sweep: - **Skew detection in `_estimate_avg_gap`** — the span-based gap estimator now returns `(gap, skewed)` where `skewed` is true when the median query sits more than 20 % of the query span away from the linear midpoint. Gated on `m ≥ 10` so small-batch statistical noise doesn't trip it. For sorted_near_start / sorted_geometric / sorted_bimodal patterns (most queries clustered in part of `v`'s range), the unmodified span heuristic massively underestimated the effective gap and routed Auto to `LinearScan` when `ExpFromLeft` was best (1.8-3.1× slowdown on those cells before). - **`InterpolationSearch` gated on `!skewed`** — even on perfectly uniform `v`, when queries are clustered, `ExpFromLeft` from `prev_idx` beats `InterpolationSearch` because each consecutive query's true index is close to the previous one's. The skew flag now prevents Auto from gambling on InterpSearch in that regime. - **Tightened sampled-linearity probe** from 5 probes / 5 % tolerance to 9 probes (k·n/10 for k = 1..9) / 0.1 % tolerance. This reliably rejects sorted-random data — which still looks linear-ish to a loose probe at large n — without rejecting any genuinely uniform or jittered grid. Probe cost rises from ~12 ns to ~25 ns and now catches the case where Auto used to pick InterpSearch on random data and lose 50-150 %. - **`AbstractRange` short-circuit** — `_sampled_looks_linear(::AbstractRange)` returns true without sampling. Ranges are definitionally uniform. Bench expanded so the heuristic can be vetted in detail: - Five **tuning sub-sweeps** (`MODE=tune`) that locate the actual strategy crossover points empirically — LinearScan↔ExpFromLeft crosses at gap ≈ 5; ExpFromLeft↔InterpSearch crosses around gap 8 *on uniform data*; minimum useful n for InterpSearch is ~2 048; minimum useful m is ~2; InterpSearch's downside on non-linear data bottoms out at 1.5× (random) and tops out at 14× (plateau). - A **3 920-cell validation sweep** (`MODE=validate`) — every combination of 10 spacings × 8 query patterns × 7 `n` × 7 `m`. - Two more knot spacings (`near_linear`, `power2`) and pattern variety (`sorted_arithmetic`, `sorted_geometric`, `sorted_bimodal`, `sorted_repeated`) on top of the v9 set. Bench verdict over **4 692 cells across all sub-sweeps**: | sub-sweep | within 20 % | regression worst case | |--------------------------|-------------|-----------------------| | spacing (10) | 10 / 10 | none | | pattern (8) | 7 / 8 | 1.51× at m=512 | | super-dense (270) | 263 / 270 | 1.21× borderline | | fast (240) | 221 / 240 | 1.59× at m=10 | | super-sparse (180) | 147 / 180 | 2.00× at m=4 | | n/m crossover (64) | 51 / 64 | 1.62× at m=4096 | | **validate (3 920)** | **3 289 / 3 920 (84 %)** | **2.64× (1 cell)** | | **aggregate (4 692)** | **3 988 (85 %)** | **only 1 cell exceeds 2 ×** | Tests: 26 929 / 26 929 still passing. Artifacts: - `bench/strategies.jl` — bench harness with 6 modes - `bench/results.md` — fast sweep - `bench/results_focused.md` — spacing / pattern / ratio / extreme - `bench/results_validate.md` — the 3 920-cell sweep - `bench/results_tune.md` — crossover measurements Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas-Claude · 2026-05-17T12:15:41Z

Pushed 65ea6c0. Much more thorough vetting + two algorithmic refinements:

Algorithmic changes

1. Skew detection. Tuning sweeps surfaced a previously-hidden regression: on sorted_near_start / sorted_geometric / sorted_bimodal patterns (most queries clustered in part of v's range), the span-based gap estimator underestimated the effective gap, and Auto routed to LinearScan when ExpFromLeft was best (1.8 – 3.1× slowdown).

_estimate_avg_gap now returns (gap, skewed), where skewed = true when the median query sits more than 20 % of the query span away from the linear midpoint. Gated on m ≥ 10 so small-batch statistical noise doesn't trip it.

2. InterpolationSearch gated on !skewed. Even on perfectly uniform v, when queries are clustered, ExpFromLeft from prev_idx beats InterpolationSearch because each consecutive query's true index is close to the previous one's. The skew flag now prevents Auto from gambling on InterpSearch in that regime.

3. Tightened linearity probe. From 5 probes / 5 % tolerance to 9 probes (k·n/10 for k = 1..9) / 0.1 % tolerance. Reliably rejects sorted-random data — which still looked linear-ish to the looser probe at large n — without rejecting any genuinely uniform or jittered grid. Probe cost is ~25 ns (was ~12 ns). This was the fix for Auto being 50 – 150 % slower than ExpFromLeft on random sorted data after InterpolationSearch was added.

4. AbstractRange short-circuit. _sampled_looks_linear(::AbstractRange) returns true without sampling; ranges are definitionally uniform.

Bench changes

Five tuning sub-sweeps (MODE=tune) that empirically locate the strategy crossovers — LinearScan ↔ ExpFromLeft crosses at gap ≈ 5; ExpFromLeft ↔ InterpSearch crosses near gap 8 on uniform data; minimum useful n for InterpSearch is ~2048; minimum useful m is 2; InterpSearch's downside on non-linear data ranges from 1.5× (random) to 14× (plateau).
A 3 920-cell validation sweep (MODE=validate) — every combination of 10 spacings × 8 query patterns × 7 n × 7 m.
Two more knot spacings (near_linear, power2, sqrt) and pattern variety (sorted_arithmetic, sorted_geometric, sorted_bimodal, sorted_repeated).

Final verdicts

Across 4 692 cells in all sub-sweeps:

sub-sweep	within 20 %	worst case
spacing variety (10 cells)	10/10 (100 %)	none
pattern variety (8 cells)	7/8 (88 %)	1.51× at m=512
super-dense, m ≫ n (270 cells)	263/270 (97 %)	1.21× borderline
fast sweep (240 cells)	221/240 (92 %)	1.59× at m=10
super-sparse (180 cells)	147/180 (82 %)	2.00× at m=4
n/m crossover (64 cells)	51/64 (80 %)	1.62× at m=4096
validation (3 920 cells)	3 289/3 920 (84 %)	2.64× (1 cell)
aggregate (4 692 cells)	3 988 (85 %)	only 1 cell > 2×

The single cell exceeding 2× has a 180 ns absolute difference. The residual ~15 % outside-20 % cells are mostly m = 4 regimes where Auto's bookkeeping cost (issorted check, gap calc, linearity probe — totals ~30 – 50 ns) is non-negligible relative to the 100 – 300 ns total per-batch budget, and where direct InterpolationSearch would have saved another ~30 – 50 ns. Closing that gap would require type-specializing the dispatch further (e.g. via Val parameters), which I don't think is worth the added complexity.

I tried bumping _AUTO_INTERP_MIN_M from 2 to 10 to see if that helped (theory: skip InterpSearch's overhead on small batches). It was strictly worse (3 224 vs 3 289 cells within 20 %) — Auto's m = 2..9 wins on uniform sparse data are real and worth keeping.

Tests: 26 929 / 26 929 passing. CI awaiting maintainer approval as usual.

Three small CI fixes from the FFF SciML#65 run: - `bench/strategies.jl`: rename loop variable `strat` to `strategy` so the typo check stops flagging it as `start` / `strata`. - `Project.toml`: add `StableRNGs = "1"` to `[compat]` — Aqua's `test_deps_compat` requires every test-extras dep to have a compat bound. - `docs/src/index.md`: add `@docs` entries for the new public symbols (`searchsortedlast!`, `searchsortedfirst!`, plus the strategy types `SearchStrategy` / `LinearScan` / `BracketGallop` / `ExpFromLeft` / `InterpolationSearch` / `BinaryBracket` / `GuesserHint` / `Auto`) so Documenter's cross-reference resolver finds them and the `missing_docs` check passes. Tests: 26929 / 26929 still passing locally. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

The 1.9 release shipped the new strategy-dispatched API alongside the existing single-purpose names as thin wrappers. Cleaner long-term shape is a single canonical API, so 2.0 removes the legacy surface: - **Removed**: `searchsortedfirstcorrelated`, `searchsortedlastcorrelated` (both `Integer` and `Guesser` overloads). - **Removed**: `searchsortedfirstvec`, `searchsortedlastvec` (allocating batched variants — callers own their output buffer and use the in-place `searchsortedfirst!` / `searchsortedlast!`). - **Made internal**: `searchsortedfirstexp` and `bracketstrictlymontonic`. They're still in the module as implementation helpers for `ExpFromLeft` and `BracketGallop` respectively, but no longer documented or exported. - **Kept (documented public)**: `searchsortedfirst!` / `searchsortedlast!`, `SearchStrategy` and all concrete strategies (`LinearScan`, `BracketGallop`, `ExpFromLeft`, `InterpolationSearch`, `BinaryBracket`, `GuesserHint`, `Auto`), `Guesser`, `looks_linear`, `findfirstequal`, `findfirstsortedequal`. The hint provider stays because it's orthogonal to the dispatch; the equality-search functions stay because they're a different problem. Migration for downstream callers: | old | 2.0 | |--------------------------------------------------------------------|-----------------------------------------------------------------| | `searchsortedfirstcorrelated(v, x, guess::Integer)` | `searchsortedfirst(BracketGallop(), v, x, guess)` | | `searchsortedlastcorrelated(v, x, guess::Integer)` | `searchsortedlast(BracketGallop(), v, x, guess)` | | `searchsortedfirstcorrelated(v, x, ::Guesser)` | `searchsortedfirst(GuesserHint(g), v, x)` | | `searchsortedlastcorrelated(v, x, ::Guesser)` | `searchsortedlast(GuesserHint(g), v, x)` | | `searchsortedfirstvec(v, x)` | `searchsortedfirst!(buf, v, x)` (caller-owned `buf`) | | `searchsortedlastvec(v, x)` | `searchsortedlast!(buf, v, x)` | | `searchsortedfirstexp(v, x, lo, hi)` | `searchsortedfirst(ExpFromLeft(), v, x, lo)` | Tests rewritten to exercise the strategy API directly. Docs index updated to present the strategy API as canonical, with `Guesser` / `looks_linear` / equality-search as supporting helpers. `Project.toml` bumped to `2.0.0`. Downstream callers in SciML that import the removed names will need companion PRs — see SciML/DataInterpolations.jl#529 for the first one. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

Two changes bundled because they touch the same area of the codebase: 1. **`SIMDLinearScan` search strategy.** New `SearchStrategy` subtype that lowers the forward `LinearScan` walk to 8-wide SIMD chunks via custom LLVM IR. Specialised for `DenseVector{Int64}` and `DenseVector{Float64}` through two ordered-compare predicates each (`sgt`/`sge` for Int64, `ogt`/`oge` for Float64), generated from a shared template `_simd_scan_ir(t, pred)` modelled on the existing `FFE_IR` equality scan. Any other element type falls back to scalar `LinearScan` (dispatch is static, no runtime branch). Non-`Forward` orderings and the no-hint path also fall back. Backward walks stay scalar — the SIMD primitive is forward-only. `SIMDLinearScan` is opt-in: `Auto` does not pick it, because the regime where it beats `LinearScan` (long forward walks) overlaps with where `Auto` already prefers `BracketGallop` or `ExpFromLeft`. Pin it when you have a workload that wants a long forward scan and you know the eltype. Caveats documented in `strategies.md`: - `Int64` / `Float64` only; everything else uses scalar `LinearScan`. - NaN in a `Float64` vector compares false under `fcmp o*`, so NaNs are silently skipped by the SIMD scan. Sorted vectors with NaN aren't well-defined under any total order anyway. - Forward ordering only; reverse falls back to scalar. Tests (`test/runtests.jl`): - 10000 random fuzz comparisons against `Base.searchsortedlast` / `Base.searchsortedfirst` per eltype (Int64, Float64) under `StableRNG`. - Edge cases: empty / single-element / duplicates / out-of-range hints / x outside the vector range. - Fallback paths: Int32, Float32, String, no-hint, reverse order. Precompile workload extended to cover the new strategy on both specialised eltypes. 2. **Docs restructured into four topical pages** plus a slimmed landing page: - `interface.md`: public API surface, strategy contract, how to add a new strategy (with correctness-check pattern), note that `Auto` is not externally extensible. - `strategies.md`: catalog of every shipped strategy with a chooser table (best case / worst case / hint usage), per-strategy notes. `SIMDLinearScan` lives here with its caveats. The equality-search functions (`findfirstequal`, `findfirstsortedequal`) are linked from the bottom of the page, separated because their return semantics (`nothing` on miss) don't match the positional-search API. - `guessers.md`: `Guesser` + `GuesserHint` API, usage patterns (interpolation-segment lookup), and explicit list of when *not* to use a `Guesser`. - `auto.md`: full `Auto` decision tree for both per-query and batched callers, every crossover constant, justification per branch, and a self-contained benchmark script that reproduces the regime-grid comparison. `docs/make.jl` updated to register the new pages. `index.md` cut down to overview + per-section links. All 47088 tests pass (20000 new fuzz cases for `SIMDLinearScan` plus the existing 27013). Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

FFF 2.0 (SciML/FindFirstFunctions.jl#65) drops the legacy single-purpose names (`searchsortedfirstcorrelated`, `searchsortedlastcorrelated`, and the allocating `*vec` batched forms) in favour of a single strategy- dispatched API. This commit migrates DataInterpolations off the legacy surface. - `src/DataInterpolations.jl`: import `FindFirstFunctions` plus `Guesser` only; drop the legacy correlated-lookup name imports. - `src/interpolation_utils.jl`: split `get_idx` into two methods. The `iguess::Integer` overload routes through `searchsortedfirst(BracketGallop(), tvec, t, iguess)` / `searchsortedlast(...)`. The `iguess::Guesser` overload routes through `searchsortedfirst(GuesserHint(iguess), tvec, t)`. Both forms preserve the existing `lb`/`ub_shift`/`idx_shift`/`side` keyword contract. - `test/derivative_tests.jl`, `test/interpolation_tests.jl`: cached-index tests now use `searchsortedfirst(GuesserHint(A.iguesser), A.t, t)` and `searchsortedfirst(BracketGallop(), func.t, _t, func.iguesser(_t))`. - `src/interpolation_methods.jl`: trim explanatory/update prose comments that grew during the batch-evaluator work. Section headers and structural `else # Extension` markers retained; per-task narrative comments removed. - `Project.toml`: bump `FindFirstFunctions` compat to `2`. All 57879 DI tests pass locally on Julia 1.11: Core | 2539 pass, 5 broken (pre-existing) Methods | 42151 pass Extensions | 13178 pass Misc | 11 pass Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

Callers issuing many short batches against the same sorted vector currently pay Auto's per-call probes — `_sampled_looks_linear(v)` reads 11 elements each time, ~25 ns. Caching the probe once is the obvious save for interpolation-segment lookups and similar repeated-vector workloads. - `SearchProperties` is an isbits struct holding `(has_props, is_linear, has_nan)`. Default constructor returns the "no information" sentinel (`has_props = false`); `SearchProperties(v)` runs the probes and populates the fields. - `Auto` gains a `props::SearchProperties` field. `Auto()` keeps current behaviour (sentinel props → probe per call); `Auto(SearchProperties(v))` skips the probe in the batched path. Single concrete struct, not parametric — keeps call sites type-stable without specialization explosion. - The batched dispatch reads `s.props.is_linear` instead of probing when `has_props` is set. Other fields populated for forward compatibility but not consumed yet — `has_nan` is the obvious next consumer when `SIMDLinearScan` gets wired into Auto's decision tree. Trust-the-caller contract: stale or lying `SearchProperties` is correctness-preserving — `InterpolationSearch` from a bad guess falls through to `BracketGallop` (slow but O(log n)). The caller is responsible for re-running `SearchProperties(v)` if `v` mutates. Tests cover sentinel construction, populated construction on linear/ non-linear/NaN-containing/integer vectors, output equivalence between `Auto()` and `Auto(props)`, the `isbits` guarantee, and a deliberately- lying cache that still produces correct results. 47100 tests pass (was 47088; 12 new cases). Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

Single-file changelog covering the 1.x → 2.0 transition. Sections: - Removed names (six legacy functions) with a one-to-one migration table to the new strategy dispatch. - Made-internal helpers (`searchsortedfirstexp`, `bracketstrictlymontonic`) and how to reach the same algorithm via the strategy API. - The new strategy-dispatched search API (seven shipped strategies + `Auto`). - The new batched in-place API (`searchsortedfirst!` / `searchsortedlast!`). - The `Auto` heuristic, the crossover constants, and how they were empirically derived from the regime-grid sweep. - `SearchProperties` cache integration with `Auto`. - `SIMDLinearScan` with the eltype / NaN / order caveats. - Documentation restructure into five topical pages. - Internal: shared `_simd_scan_ir` template now backs both the equality scan and the positional-search compares. - Unchanged: equality search (`findfirstequal`, `findfirstsortedequal`) — deliberately separate API. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

Equality search now flows through the strategy framework via a new wrapper `findequal(strategy, v, x[, hint])` that returns an `Int` with the sentinel `firstindex(v) - 1` for "not found". Type-stable, composes with every existing strategy. The new `BisectThenSIMD <: SearchStrategy` short-circuits to `findfirstsortedequal`'s algorithm on `DenseVector{Int64}`. For positional dispatch (`searchsortedfirst`/`searchsortedlast`), `BisectThenSIMD` delegates to `BinaryBracket` — bisect-then-equality-scan can't answer "where would x insert?". The new tests exposed four pre-existing duplicate-handling bugs, all fixed in this commit: - `searchsortedfirst(BracketGallop(), v, x, hint)` returned the rightmost duplicate when the hint landed in a run of equals. Cause: `bracketstrictlymontonic` used `x < v[lo]` to choose direction, which is correct for `searchsortedlast` but wrong for `searchsortedfirst`. Fix: add `bracketstrictlymontonic_first` with the inverted polarity `v[lo] < x`, used only by the `searchsortedfirst` overload. - `searchsortedfirst(InterpolationSearch(), v, x)` propagated the same bug via `BracketGallop`. Fixed by the bracket-first fix. - `searchsortedfirst(ExpFromLeft(), v, x, hint)` exponential-searched forward when `v[hint] == x`, missing earlier duplicates. Fixed by falling back to a full search whenever `v[hint] >= x` (the exponential-forward path is only safe when `v[hint] < x`). - `findfirstsortedequal(var, vars)` bisected with `vars[mid] <= var` and advanced offset on equal hits, walking past the first duplicate. Fixed by tightening to strict `<` and adjusting the window-shrink rule so the "keep" branch retains the midpoint in the next window (`len = half + 1` instead of `len -= half`). The fast-path LLVM IR branch select is replaced by plain `ifelse`; Julia compiles to the same branchless `select` modulo the `!unpredictable` metadata. Tests: 84120 pass (was 47100). The new `findequal + BisectThenSIMD` testset runs 2500 random-trial parity checks against `Base` across all shipped strategies on Int64 and Float64 vectors (including frequent duplicates), the OffsetArray-style sentinel, reverse ordering, empty and single-element edge cases, and a check that `BisectThenSIMD` in positional dispatch correctly delegates to `BinaryBracket`. Docs: - `strategies.md` gains a "Equality search through the strategy framework" section with the `findequal` and `BisectThenSIMD` @docs entries. - The Equality-search appendix now points callers to `findequal` as the preferred sorted-equality API and explains why `findfirstequal` (unsorted) stays outside the strategy framework. - NEWS.md adds the new-functionality section and the four-way bug-fix section with concrete before/after behaviour. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

…ty page Six small cleanups grouped under one commit: 1. **Typo**: rename `bracketstrictlymontonic` to `bracketstrictlymonotonic` (and the `_first` companion). Internal helpers — no downstream impact. 2. **FFE_IR unification**: replace the ~60-line literal `FFE_IR` string with `_simd_scan_ir("i64", "eq")`, using the same SIMD-scan IR template that backs the four `>`/`>=` predicates for `SIMDLinearScan`. All five SIMD primitives now share a single IR source. The new `_simd_scan_ir` is now defined at the top of the module so `FFE_IR` can refer to it; the duplicate copy further down has been removed. 3. **Docstring refresh**: - `findfirstequal` now documents the unsorted-input contract, the Int64 SIMD specialization vs. the generic `findfirst(==(x), v)` fallback, and points at `findequal` / `findfirstsortedequal` for the sorted-equality alternatives. - `findfirstsortedequal` now points at `findequal(BisectThenSIMD(), …)` as the preferred sentinel-returning strategy-framework variant and explains its role as the dedicated `Union{Int64, Nothing}`-returning name. 4. **OffsetArray sentinel note**: `findequal` docstring spells out that the "not found" sentinel is `firstindex(v) - 1` (= `0` for 1-based vectors, adjusts for OffsetArrays), and instructs callers to test against `firstindex(v)` rather than the literal `0`. 5. **`SearchProperties(v; linear_tolerance = 1e-3)`** exposes the sampled-linearity probe's tolerance as a kwarg, symmetric to `Guesser`'s `looks_linear_threshold`. The internal `_sampled_looks_linear` is now `(v, tol::Float64)` with `tol` defaulting to `_AUTO_LINEAR_REL_TOLERANCE` (= 1e-3) for callers that don't pass one. Loosening to `1e-2` widens the regime where `Auto` picks `InterpolationSearch`; tightening to `1e-4` is more conservative. 6. **Exports**: 2.0 now exports the public API surface (`SearchStrategy`, every concrete strategy, `SearchProperties`, `Guesser`, `looks_linear`, `searchsortedfirst!`, `searchsortedlast!`, `findequal`, `findfirstequal`, `findfirstsortedequal`). Previously the package exported nothing, so callers had to fully qualify every reference (`FindFirstFunctions.LinearScan()`). `using FindFirstFunctions` is now sufficient to access the full API; previously qualified call sites continue to work. **New `equality.md` doc page**: the `findfirstequal` / `findfirstsortedequal` docs lived as an appendix on `strategies.md`, which was wrong because these functions don't match the strategy-dispatch contract (different return type, different question). Moved to their own page with a chooser table, the sentinel-vs.-Nothing discussion, and the shared SIMD-primitive notes. `strategies.md` keeps a one-paragraph stub pointing at the new page; `index.md` and `make.jl` updated to register and surface the new page in the nav. 84120 tests pass (unchanged — these are all internal renames or doc changes). Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

Comprehensive rebenchmark of `Auto`'s batched decision tree against the shipped strategies (`LinearScan`, `SIMDLinearScan`, `BracketGallop`, `ExpFromLeft`, `InterpolationSearch`) on a 1080-cell sweep covering 5 `v` patterns × 4 query patterns × 5 `n` sizes × 6 `m` sizes × 2 element types (`Int64`, `Float64`). Baseline (`Auto` before this commit): median 1.18x, p95 2.09x, max 2.78x This commit: median 1.04x, p95 1.38x, max 2.18x With `Auto(SearchProperties(v))`: median 1.03x, p95 1.38x, max 2.09x New decision-tree branches: - **`SIMDLinearScan` in the medium-gap regime.** When `v` is `DenseVector{Int64}` or `DenseVector{Float64}` (NaN-free for Float64), `Auto` now dispatches to `SIMDLinearScan` for `gap ∈ (4, _auto_simd_gap_max(v)]`. `_auto_simd_gap_max` is 64 for both eltypes — the eltype-specific accessor leaves room to lower the Int64 ceiling later without changing the Float64 path. For Float64 the NaN check consults `SearchProperties.has_nan` when a populated cache is attached; otherwise no-NaN is assumed (same trust contract Base uses for sortedness). The bench shows `SIMDLinearScan` winning 25% of cells overall, with median 1.94× speedup over `LinearScan` in the cells it wins. - **`BracketGallop` preferred over `ExpFromLeft` at large gaps.** New constant `_AUTO_GALLOP_GAP_MIN = 16`. At gap ≥ 16, `ExpFromLeft`'s 5 up-front linear probes are guaranteed to miss (the answer is much more than 5 elements past the previous-result hint), and `BracketGallop`'s doubling-from-`hint` walk is strictly faster. - **Tiered linearity probe for `InterpolationSearch`.** The strict `_AUTO_LINEAR_REL_TOLERANCE = 1e-3` gates the `_AUTO_INTERP_MIN_GAP ≤ gap < _AUTO_INTERP_LOOSE_GAP` range (8 to 256). For `gap ≥ _AUTO_INTERP_LOOSE_GAP` (256), the loose `_AUTO_LINEAR_LOOSE_TOLERANCE = 5e-2` applies — `InterpolationSearch`'s cache benefit compensates for a worse guess at large gaps, and the bounded binary-search refinement is still O(log n). This unlocks `InterpolationSearch` on approximately linear data like sorted random or jittered vectors at large `n`, while still rejecting genuinely nonlinear data (log-spaced, two-scale) where `InterpolationSearch` loses 2–3×. Bug fix in `_estimate_avg_gap`: the previous code fell back to `n ÷ m` whenever the skew flag was set, which caused `SIMDLinearScan` to be picked for tightly-clustered queries (`span_q ≈ 0`) where `LinearScan`'s scalar walk is 5× faster. The skew flag now serves its intended purpose as a binary `InterpolationSearch`-unsuitability signal, while the gap value is always the span-based estimate. Reproducibility: - `bench/auto_sweep.jl`: the full sweep. `julia --project=bench bench/auto_sweep.jl` writes `bench/results.csv` and prints a summary plus the worst 30 Auto-slack cells. - `bench/analyze.jl`: reads `results.csv` and prints `SIMDLinearScan` win-by-regime tables, the n/m gap-bucket distribution, and `Auto`'s misprediction histogram. Useful for tuning future constants. - `bench/.gitignore`: keeps `results.csv` and `Manifest.toml` out of version control. Docs: - `auto.md` rewrites the decision-tree section, replaces the crossover-constants table (now includes `_auto_simd_gap_max`, `_AUTO_GALLOP_GAP_MIN`, `_AUTO_INTERP_LOOSE_GAP`, `_AUTO_LINEAR_LOOSE_TOLERANCE`), updates the headline-results table with the 1080-cell sweep numbers, and links to the actual bench script. The inline script is kept as a minimum-viable standalone reproducer. - `NEWS.md` adds the retuning section with before/after numbers, the new branches, and the bug-fix description. All 84120 tests pass (no behavioural test changes — the retuning is internal to Auto's branch selection). Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

Investigated a `BitInterpolationSearch` strategy that would reinterpret positive `DenseVector{Float64}` as `DenseVector{UInt64}` and run linear extrapolation on the IEEE bit pattern. The hypothesis was that this would beat `InterpolationSearch` on log-spaced data, since IEEE bit patterns are approximately linear in array index for geometric sequences. Implemented, correctness-verified, and benchmarked across `m ∈ {4..4096}` on `n = 65536` log-spaced Float64 with sparse queries. Result: the strategy loses to every existing alternative at every `m` (65–88 ns/q vs 19–41 ns/q for `ExpFromLeft`, 16–199 ns/q for `SIMDLinearScan`). The per-query division and Float64-from-UInt64 conversion cost more than the saved bracket refinement; on non-log-spaced data the bit-pattern guess is also strictly worse than the float-domain guess because the denormal-to-normalized transition introduces a discontinuity. `Auto` already serves log-spaced data well via `SIMDLinearScan` at medium gaps and `BracketGallop` / `ExpFromLeft` at large gaps. No new dispatch path was justified. NEWS.md adds an "Investigated but rejected" section documenting the exploration so future contributors don't re-derive the same negative result. 84120 tests pass; no behavioural change. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

Reinstating BitInterpolationSearch after a more comprehensive bench than my earlier sweep — that earlier conclusion ("never wins") was based on a too-narrow regime grid (single v pattern, single q pattern). The new sweep at `bench/bitinterp_sweep.jl` covers 1404 cells across 9 v patterns (uniform, logspaced, logspaced_wide spanning 10⁻³ to 10¹⁵, geometric_dense, geometric_sparse, power2, sqrt, two_decade, jittered_log) × 4 q patterns (linear_uniform, log_uniform, dense_grid, log_grid) × 6 n sizes up to 2²⁰ × 7 m sizes from 4 to 16384. Results: - BitInterp wins outright in 59/1404 cells (4.2%) - Within 10% of per-cell best in 75 cells (5.3%) - Within 20% of per-cell best in 105 cells (7.5%) Wins concentrate in the regime BitInterp is designed for: log-spaced/geometric Float64 data at small m (= 4) and large n (≥ 16384). Margins range from 1.0× tie to 1.43× on `logspaced_wide log_grid n=2²⁰ m=4`. Other wide-spread wins: logspaced_wide log_grid n=2¹² m=4: 35.0 vs ExpFromLeft 47.5 (1.36×) logspaced_wide log_grid n=2¹⁸ m=4: 47.5 vs ExpFromLeft 62.5 (1.32×) logspaced_wide dense_grid n=2²⁰ m=4: 50.0 vs ExpFromLeft 65.0 (1.30×) `Auto` does NOT pick BitInterp. The wins are narrow (4% of cells in a bench specifically designed to probe BitInterp's regime), and adding the eligibility check + sampled-log-linearity probe to Auto's hot path would penalize the much larger set of non-log-spaced workloads. Users with known log-spaced workloads can pin `searchsortedlast!(out, v, queries; strategy = BitInterpolationSearch())` once and get the win at zero heuristic cost. Implementation: - `_bit_interp_guess_f64(v, x, lo, hi)` does per-element bitcast rather than reinterpret(UInt64, v) on the whole vector, avoiding the ReinterpretArray wrapper overhead. - Strict guard on positive + finite v[lo] and x; non-positive or non-finite Float64 bit patterns are not monotonic with float value under raw reinterpret, so the strategy falls back to BinaryBracket. - Non-Float64 dense eltypes fall back to plain InterpolationSearch (where the bit pattern equals the value). Tests pass (84120). Docs updated: - `strategies.md` chooser table adds BitInterp with eltype/regime caveats. - `strategies.md` per-strategy section documents the win regime, the sample wins table, and why Auto doesn't pick it. - `NEWS.md` documents the bench findings and the "opt-in" decision. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

…elocity CI Four 2.0 followup additions: 1. **`queries_sorted` kwarg** on `searchsortedfirst!` / `searchsortedlast!`. Caller-supplied override for the runtime `issorted(queries)` check. `nothing` (default) keeps current behaviour; `true` skips the O(m) probe and takes the sorted-loop path unconditionally; `false` skips the probe and takes the unsorted-loop path. Measured savings: ~6.6 μs on m=4096 batched call on the bench machine (~1.6 ns/q, ~6% headroom on top of Auto's hot path). Wrong-answer risk for `queries_sorted = true` on unsorted input is documented in the docstring; the inner sorted loop uses previous-result as a hint, which becomes invalid when queries jump backward. Tests cover sorted-true on every shipped strategy plus unsorted-false on unsorted data. 2. **`searchsortedrange(strategy, v, lo, hi[, hint]; order)`** returning a `UnitRange{Int}` of all indices `i` with `lo ≤ v[i] ≤ hi`. Implemented as `searchsortedfirst(strategy, v, lo) : searchsortedlast(strategy, v, hi)`, with the hinted form passing the left-bound result as the hint for the right-bound search (shared bracket discovery). Exported. Tests cover all shipped strategies on several `(lo, hi)` pairs including out-of-range cases, plus 200-trial random fuzz on Int64 vectors vs the Base composition. 3. **`is_log_linear::Bool` field on `SearchProperties`** + backing `_sampled_looks_log_linear(v, tol)` probe. The probe is the same 9-point sampled check as `_sampled_looks_linear` but applied to `log.(v)` — accepts geometric / log-spaced data, rejects linear / two-scale. Requires strictly positive finite values throughout the sample. `Auto` does NOT consume `is_log_linear` (the wins from `BitInterpolationSearch` are narrow per the bitinterp_sweep bench). The field is populated for callers who want to manually pin `BitInterpolationSearch` based on data shape; the cost is a single sampled-log probe at `SearchProperties(v)` construction time (~30 ns). 4. **AirspeedVelocity benchmark CI**. Adds: - `.github/workflows/Benchmark.yml` — runs on PRs touching `src/`, `bench/`, `Project.toml`, or the workflow itself. Uses `benchpkg` / `benchpkgtable` from AirspeedVelocity 0.6 to time both base and PR commits, post a markdown diff as a PR comment, and upload raw results as an artifact. - `benchmark/benchmarks.jl` — `SUITE = BenchmarkGroup()` with 40 entries across four groups: per-query micro-benchmarks across every shipped strategy, batched in-place benchmarks on a (3 × 3 × 3) grid of `(v_kind, q_kind, n, m)` including the Auto / Auto+sorted / Auto+props variants, `SearchProperties` construction cost on three eltypes, and equality-search comparison (`findequal(BinaryBracket)` vs `findequal(BisectThenSIMD)` vs `findfirstsortedequal`). 84626 tests pass (+506 from the new test blocks). Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

The package source had grown to a single 1900-line file mixing every layer: LLVM IR templates, equality-search primitives, strategy type definitions, the per-strategy dispatch methods, Auto's heuristic decision tree, the batched in-place API, the Guesser correlated-lookup helper, the findequal strategy-framework wrapper, and the PrecompileTools workload. Splitting into one file per logical layer makes the source navigable without changing any behaviour. src/ ├── FindFirstFunctions.jl 39 lines module + exports + includes ├── simd_ir.jl 135 IR template + IR constants + SIMD │ primitives (eq, sgt, sge, ogt, oge) ├── equality.jl 73 findfirstequal + findfirstsortedequal ├── strategies.jl 268 SearchStrategy abstract + concrete │ strategy types + SearchProperties + │ Auto (struct + docstrings) ├── search_properties.jl 120 _sampled_looks_linear / │ _sampled_looks_log_linear / _has_nan │ + populated SearchProperties ctor ├── dispatch.jl 583 Per-strategy searchsortedfirst/last │ methods + bracketstrictlymonotonic* │ + searchsortedfirstexp + _interp_guess │ + _bit_interp_guess_f64 + │ _simdscan_*_specialized ├── auto.jl 168 Auto crossover constants + per-query │ Auto + _estimate_avg_gap + │ _auto_simd_eligible / │ _auto_simd_gap_max / │ _auto_interp_eligible + _auto_pick ├── batched.jl 321 searchsortedfirst!/last! + │ sorted/unsorted inner loops + │ _take_sorted_path + generic │ _batched! + Auto-specialized │ _batched! + searchsortedrange ├── guesser.jl 109 looks_linear + Guesser + GuesserHint │ dispatch (relies on BracketGallop) ├── findequal.jl 91 findequal generic + BisectThenSIMD │ shortcut for DenseVector{Int64} └── precompile.jl 58 PrecompileTools @setup_workload Include order is significant — each file may depend on names from earlier files. The order is documented in `FindFirstFunctions.jl` and is: simd_ir → equality → strategies → search_properties → dispatch → auto → batched → guesser → findequal → precompile. The two non-obvious dependencies are: - `auto.jl` references `_sampled_looks_linear` from `search_properties.jl` (the `_auto_interp_eligible` loose tier). - `batched.jl` references `_estimate_avg_gap`, `_auto_simd_eligible`, `_auto_simd_gap_max`, `_auto_interp_eligible`, `_AUTO_*` constants from `auto.jl` (the Auto-specialized `_batched!` lives in `batched.jl` because it shares the sorted/unsorted-loop infrastructure with the generic `_batched!`). 84626 tests pass; no behavioural change. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

Refactors the hand-rolled sorted-batch lockstep walk in the merged Akima batched evaluator and adds new batch evaluators for `LinearInterpolation` and `CubicSpline` (formerly draft PR #527). The new helpers route per-query index lookups through `FindFirstFunctions.searchsortedlast!` so the algorithm adapts to the n/m ratio automatically. Why this matters: the previous merged Akima batched eval walked the knot index linearly via `while idx < n - 1 && tt[i] > t[idx + 1]; idx += 1`, which is O(gap) per query. For sparse queries on long `t` (e.g. `n = 65 536, m = 10`) that's O(n) of wasted walking — 36 µs per batch in practice on this CPU, dwarfing the 30 ns of useful work. The FFF strategy dispatch ([SciML/FindFirstFunctions.jl#65]) picks `LinearScan` for dense queries and `ExpFromLeft` / `InterpolationSearch` for sparse, so the per-query cost stays appropriate for every regime. Implementation: - `_sorted_batch_extrapolation_ok(A)` — shared predicate (None / Constant / Linear / Extension on each side). - `_find_last_interior_index(tt, …)` — O(log m) binary search to split `tt` into left-ext / interior / right-ext. - `_eval_interior_adaptive!(out, t, tt, i_first, i_last, n, eval_at)` — dispatches based on `32 * n_interior < n`: - dense → inline lockstep walk fused with the eval lambda - sparse → allocate an Int buffer, batched `FindFirstFunctions.searchsortedlast!`, then run the eval on the indexed positions. The 1/32 crossover was determined by the FFF bench sweep on uniform data; below that ratio the FFF batched call cleanly beats the linear walk; above it, the inline walk's tight 1-2-op-per-query inner loop wins. The Linear and CubicSpline call sites hand-inline the adaptive dispatch (rather than passing a callable through `_eval_interior_adaptive!`) because `get_parameters` inside the per-segment callable wasn't inlining through the closure boundary — costing ~70× on small-m sparse cases. Akima passes a `_AkimaEvaluator` struct (its evaluator is pure array indexing + `@evalpoly`) which inlines cleanly. Benchmarks (Julia 1.10, BenchmarkTools `minimum`): | type | n=65k,m=10 | n=65k,m=4096 | n=4k,m=4k | n=1M,m=10 | |--------|------------|--------------|-----------|-----------| | Akima | 36 µs → 706 ns (51×) | 106 µs → 104 µs | 34 µs → 33 µs | (unmeasured) → 822 ns | | Linear | n/a → 737 ns | n/a → 112 µs | n/a → 42 µs | n/a → 851 ns | | Cubic | n/a → 809 ns | n/a → 163 µs | n/a → 77 µs | n/a → 960 ns | Where master had no batched evaluator (Linear, Cubic), the comparison is to `map!(A, out, tt)` (the existing default), which uses the `Guesser`-based correlated search per-query. The new fast path is competitive at the sparse end (the existing per-query path is already fast there via the iguesser hint) and 2–7× faster on dense batches. Drops the previous `any(isnan, A.u)` check from `LinearInterpolation`'s fast-path gate (added defensively in the original sorted-batch code, but it's O(n) and was making the fast path slower than `map!` on sparse cases). Users with NaN in `u` continue to get NaN through normal arithmetic; the original per-point NaN special-case was for ForwardDiff propagation through `t/t` not for user-supplied NaN data. Project.toml: bumps FindFirstFunctions compat from `1.3` to `1.9`. This PR depends on SciML/FindFirstFunctions.jl#65 (introduces the `searchsortedlast!` API). Supersedes draft PRs #527 (Linear + CubicSpline batch evaluators) and partly #528 (the 5 remaining types not yet ported can follow the same pattern in a follow-up). Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas added 2 commits May 16, 2026 06:59

ChrisRackauckas-Claude mentioned this pull request May 17, 2026

Use FFF strategy API for sorted-batch evaluators (fixes n ≫ m regression) SciML/DataInterpolations.jl#529

Merged

4 tasks

ChrisRackauckas-Claude mentioned this pull request May 20, 2026

Higher-level batched segment-eval API (follow-up to #65) #66

Closed

ChrisRackauckas added 2 commits May 20, 2026 05:30

ChrisRackauckas added 9 commits May 20, 2026 06:31

ChrisRackauckas marked this pull request as ready for review May 20, 2026 14:39

ChrisRackauckas merged commit 8f78df2 into SciML:main May 20, 2026
15 of 21 checks passed

ChrisRackauckas-Claude mentioned this pull request May 20, 2026

Native reverse-order paths for InterpolationSearch, ExpFromLeft, SIMDLinearScan, BitInterpolationSearch #67

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add SearchStrategy dispatch + in-place batched searchsorted!#65

Add SearchStrategy dispatch + in-place batched searchsorted!#65
ChrisRackauckas merged 16 commits into
SciML:mainfrom
ChrisRackauckas-Claude:sorted-search-strategies

ChrisRackauckas-Claude commented May 16, 2026

Uh oh!

ChrisRackauckas-Claude commented May 17, 2026

Uh oh!

ChrisRackauckas-Claude commented May 17, 2026

Uh oh!

ChrisRackauckas-Claude commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ChrisRackauckas-Claude commented May 16, 2026

Summary

Why

Backwards compatibility

Auto heuristic

What is in this PR

Test plan

Uh oh!

ChrisRackauckas-Claude commented May 17, 2026

Uh oh!

ChrisRackauckas-Claude commented May 17, 2026

1. Audit — does this maintain everything that was there before?

2. More strategies to consider — added one

3. Bench depth — vastly broadened

What the heuristic now does

Caught and fixed a real bug

Artifacts in the PR

Uh oh!

ChrisRackauckas-Claude commented May 17, 2026

Algorithmic changes

Bench changes

Final verdicts

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants