Skip to content

Add SearchStrategy dispatch + in-place batched searchsorted!#65

Merged
ChrisRackauckas merged 16 commits into
SciML:mainfrom
ChrisRackauckas-Claude:sorted-search-strategies
May 20, 2026
Merged

Add SearchStrategy dispatch + in-place batched searchsorted!#65
ChrisRackauckas merged 16 commits into
SciML:mainfrom
ChrisRackauckas-Claude:sorted-search-strategies

Conversation

@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor

Summary

Reformulates the existing sorted-search infrastructure as explicit strategy types and adds an in-place batched API.

Strategies (all <: SearchStrategy):

  • LinearScan — walk ±1 from the hint
  • BracketGallopbracketstrictlymontonic + bounded binary search (the algorithm already backing searchsorted*correlated)
  • BinaryBracket — plain searchsortedlast / searchsortedfirst, ignores any hint
  • Auto — heuristically picks among the above based on hint validity and length(v)

Dispatched API:

Base.searchsortedlast(strategy, v, x[, hint]; order = Base.Order.Forward)
Base.searchsortedfirst(strategy, v, x[, hint]; order = Base.Order.Forward)

Batched in-place API:

searchsortedlast!(idx_out, v, queries; strategy = Auto(), order = ...)
searchsortedfirst!(idx_out, v, queries; strategy = Auto(), order = ...)

For sorted queries, walks v in lockstep using the previous result as the hint for the next — total cost O(length(v) + length(queries)) under BracketGallop. For unsorted queries, falls back to per-element search with no hint regardless of strategy.

Why

Downstream callers like SciML/DataInterpolations.jl (#525-#528) have been hand-rolling sorted-batch lockstep walks with a manually-maintained idx and a linear while idx < n - 1 && tt[i] > t[idx + 1]; idx += 1 advance. That works well when the queries are dense relative to the knots, but regresses badly (sometimes by 100-1000×) when there are very few queries on a long vector — the linear walk is O(n) regardless of how many queries actually live in that gap.

The right home for the "which strategy to use" heuristic is here, not duplicated in every consumer. With this PR, searchsortedlast! does the right thing in every regime, and downstream code becomes a one-line replacement.

Backwards compatibility

All existing exports keep working — they're now thin wrappers:

  • searchsortedfirstcorrelated(v, x, guess)searchsortedfirst(BracketGallop(), v, x, guess)
  • searchsortedlastcorrelated(v, x, guess)searchsortedlast(BracketGallop(), v, x, guess)
  • searchsortedfirstvec(v, x) allocates and calls searchsortedfirst!
  • searchsortedlastvec(v, x) allocates and calls searchsortedlast!

The Guesser path, Range fast paths, Reverse ordering, and searchsortedfirstexp are untouched.

Version bumped to 1.9.0 (new feature, additive).

Auto heuristic

condition strategy
no hint, or hint ∉ `axes(v)` `BinaryBracket`
hint in range, `length(v) ≤ 32` `LinearScan`
hint in range, `length(v) > 32` `BracketGallop`

The 32-element threshold is chosen because at that length the binary-search bracket-setup overhead is comparable to a worst-case linear walk, and below it LinearScan saves the bracket call entirely.

What is in this PR

  • src/FindFirstFunctions.jl: strategy types + dispatched Base.searchsortedlast / Base.searchsortedfirst methods + searchsortedlast! / searchsortedfirst! + reformulated existing wrappers.
  • test/runtests.jl: two new safetestsets covering strategy equivalence across n ∈ {0, 1, 2, 8, 33, 257}, every hint position, Reverse order, the batched API on sorted/unsorted/empty inputs, DimensionMismatch, and sparse queries on a long vector to exercise the gallop hint path.
  • Project.toml: bump to 1.9.0.

Test plan

  • CI passes
  • Pkg.test() locally: 26814 / 26814 passing on Julia 1.10 (was 25481 before — +1333 new assertions across the two new testsets)
  • Runic-formatted

Please ignore until reviewed by @ChrisRackauckas. The follow-up plan is small PRs to DataInterpolations.jl swapping the hand-rolled walks in #525-#528's merged Akima and the in-flight #527/#528 for searchsortedlast! calls.

🤖 Generated with Claude Code

Reformulates the existing sorted-search infrastructure as explicit strategy
types and adds an in-place batched API:

  - `SearchStrategy` abstract type with concrete subtypes
    `LinearScan`, `BracketGallop`, `BinaryBracket`, and `Auto`.
  - `Base.searchsortedlast(strategy, v, x[, hint])` and the matching
    `searchsortedfirst` dispatch on the strategy:
      * `LinearScan`     — walk ±1 from the hint.
      * `BracketGallop`  — `bracketstrictlymontonic` + bounded binary search.
      * `BinaryBracket`  — plain `searchsortedlast` / `searchsortedfirst`.
      * `Auto`           — picks based on hint validity and `length(v)`:
                           no/invalid hint or `length(v) > 32` → `BracketGallop`,
                           hint in range and `length(v) ≤ 32` → `LinearScan`,
                           no hint → `BinaryBracket`.
  - `searchsortedlast!(idx_out, v, queries; strategy = Auto())` and
    `searchsortedfirst!(...)` — in-place batched variants. For sorted
    `queries` the previous result is used as a hint for the next, so the
    total cost is O(length(v) + length(queries)) under `BracketGallop`.
    Unsorted `queries` fall back to per-element search with no hint.

`searchsortedfirstcorrelated` / `searchsortedlastcorrelated` and
`searchsortedfirstvec` / `searchsortedlastvec` are kept as backwards-
compatible wrappers over the new dispatched API, so no existing caller
breaks. Version bumped to 1.9.0 since this is a backward-compatible
feature addition.

The intent is that downstream callers like SciML/DataInterpolations.jl
that hand-roll a sorted-batch lockstep walk (`while idx < n - 1 && tt[i]
> t[idx + 1]; idx += 1`) can instead call `searchsortedlast!` and get
correct O(log gap) behavior at every n/m ratio, not just dense queries.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…chmarks

Expands the strategy hierarchy from the previous commit and tunes the
`Auto` heuristic against a real benchmark sweep (`bench/strategies.jl`,
results checked in at `bench/results.md`).

New concrete strategies:

  - `ExpFromLeft`: galloping forward from a left-bound hint
    (`hint` must be ≤ the answer; otherwise falls back to a full search).
    Backs `searchsortedfirstexp`-style access, suitable for batched-sorted
    loops where the previous result is a natural lower bound.
  - `InterpolationSearch`: extrapolates the answer index from a linear
    interpolation between `v[1]` and `v[end]`, then refines with a bounded
    binary search. O(1) per query on uniformly-spaced data, O(log n)
    worst case. Numeric eltypes only; falls back to `BinaryBracket` for
    non-numeric `v`.

Refined `Auto`:

  - Per-query path is unchanged in spirit but with a smaller threshold
    (16 elements) before falling back to `LinearScan`.
  - Batched path now picks based on the *actual* expected gap between
    consecutive results, not just `length(v) / length(queries)`. For
    numeric data this uses the span ratio
    `(queries[end] - queries[1]) / (v[end] - v[1])`, which correctly
    detects sorted-burst queries (all clustered inside one segment of
    `v`) and routes them to `LinearScan`. For non-numeric data, the
    length ratio is used unchanged.
  - `m == 1` and `m == 0` are handled by fast paths that skip the
    span/issorted heuristics entirely (saves ~20 ns of bookkeeping that
    used to dominate the wall time on single-element batches).
  - `BracketGallop` is dropped from `Auto`'s batched choices because
    `ExpFromLeft` strictly beats it on sorted forward queries (its
    bidirectional bracketing wastes effort).
  - `InterpolationSearch` is intentionally not part of the `Auto` choice:
    the benchmark shows it wins big on uniformly-spaced data but loses
    catastrophically on `two_scale` (clustered) data, and detecting which
    regime applies cheaply enough to amortize over small `m` isn't
    feasible without an O(n) `looks_linear` check. Users who know their
    data is roughly linear opt in explicitly.

Benchmark sweep (`n × m × spacing × query-pattern × strategy`, 240 cells):

|                                | within 20% of best | > 20% slower |
|--------------------------------|--------------------|--------------|
| Auto **before this commit**    |  89 / 240 (37%)    | 151 / 240    |
| Auto **after this commit**     | 214 / 240 (89%)    |  10 / 240    |

The remaining 10 cells are all sparse queries on uniform/jittered data
where `InterpolationSearch` would dominate but isn't safe to pick
automatically — well-documented trade-off.

Tests: 26929 / 26929 passing on Julia 1.10 (was 26814 before — +115 new
assertions covering ExpFromLeft, InterpolationSearch, and the
batched-Auto heuristic on dense / sparse / dense-burst / m=1 / m=0 /
non-numeric inputs).

Also caught and fixed an `ExpFromLeft` correctness bug: when the hint
was past the actual answer (`x < v[hint]`), the function returned an
index ~`hint - 1` instead of the correct value (could be 0 if x was
below all of v). Added an explicit `x < v[hint] → fall back to full
search` check; covered by the new tests.

`bench/strategies.jl` is reproducible via `julia --project=bench
bench/strategies.jl` (30 samples per cell, ~3 minutes wall time);
`bench/results.md` is the checked-in artifact from the latest run.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor Author

Pushed a second commit (163465e) that:

  1. Adds two more strategies to round out coverage:

    • ExpFromLeft — galloping forward from a left-bound hint, equivalent to searchsortedfirstexp but as a dispatchable strategy. Suited to batched-sorted loops where hint = previous_result.
    • InterpolationSearch — linear-extrapolation guess then bounded binary search. O(1) per query on uniformly-spaced numeric data, O(log n) worst case, falls back to BinaryBracket for non-numeric v.
  2. Reworks Auto based on a benchmark sweep (bench/strategies.jl, results checked in at bench/results.md). The matrix covers n × m × spacing × query-pattern × strategy — 5 spacings (uniform, log-spaced, jittered-near-uniform, random, two-scale-clustered) × 4 query patterns (sorted-uniform, sorted-dense-burst, sorted-near-start, unsorted) × 3 n × 4 m = 240 cells.

    Before/after Auto verdicts on the same 240-cell matrix:

    Auto state within 20% of best > 20% slower
    original length(v) ≤ 32 heuristic 89 / 240 (37%) 151 / 240
    bench-tuned (this commit) 214 / 240 (89%) 10 / 240

    Key changes that drove this:

    • Batched-Auto now picks based on the actual expected gap, using the span ratio (queries[end] - queries[1]) / (v[end] - v[1]) for numeric data. This correctly routes dense-burst queries (clustered in one segment of v) to LinearScan even when length(v) / length(queries) would suggest otherwise.
    • BracketGallop is no longer in Auto's batched choice — ExpFromLeft strictly beats it for forward-moving sorted queries (its bidirectional bracketing wastes effort).
    • m == 1 fast path skips the span/sortedness heuristic (saves ~20 ns/call).
    • Strategy dispatch in Auto's sorted loop is now via concrete if/else branches so each leg is type-stable and gets fully specialized.
  3. Surfaced and fixed a real correctness bug in ExpFromLeft: when the hint was past the answer (x < v[hint]), the function returned ~hint - 1 instead of the correct (often-zero) index. Now falls back to a full searchsortedlast/searchsortedfirst in that case. Covered by the new tests.

  4. Why InterpolationSearch is intentionally not in Auto's choice: on uniform/jittered data it wins big (~2× on n=65536, m=10), but on two_scale (clustered) data it's catastrophically slow (4-13× worse than ExpFromLeft at n=65536, m=4096). Detecting which regime applies cheaply enough to amortize over small m would require an O(n) looks_linear check that's net-negative on the cases where it'd help most. Users who know their data is roughly linear can opt in with strategy = InterpolationSearch(). The remaining 10 Auto-worse-than-20% cells in the bench are exactly these cases.

Test count: 26814 → 26929 (+115). Full bench output is in bench/results.md.

Expands the strategy lineup, the bench coverage, and tightens the Auto
heuristic again based on the broader data:

  - **`GuesserHint(::Guesser)` strategy** — wraps the existing
    `Guesser`-based correlated search as a dispatchable strategy so it can
    be passed to the new batched/per-query API alongside the others.

  - **Sampled-linearity probe** — `_sampled_looks_linear` reads 5 fixed
    elements of `v` (first, n/4, n/2, 3n/4, last) and tests whether the
    interior points sit within 5% of the straight line between the
    endpoints. ~12 ns regardless of `length(v)`. Used by Auto to unlock
    `InterpolationSearch` on uniform/jittered data, gated by:
      * `gap >= 64` (sparse enough that InterpolationSearch's
        per-query advantage matters)
      * `length(v) >= 1024` (the linearity probe + InterpSearch overhead
        is amortized)
      * `length(queries) >= 2` (m == 1 has its own fast path)

  - **`m == 1` fast path** — Auto's batched path now short-circuits for
    a single-element batch directly into `Base.searchsortedlast`/`first`,
    saving the issorted + span heuristic cost (~20 ns).

  - **Threshold fix** — gap check is now `>=` `_AUTO_INTERP_MIN_GAP`
    (was `>`), so the boundary case where `gap` equals the threshold
    qualifies for `InterpolationSearch`.

  - **`ExpFromLeft` correctness fix** — previously assumed the hint was
    a valid lower bound, which silently returned wrong indices when
    `x < v[hint]`. Now detects that case and falls back to a full
    `Base.searchsortedlast` / `searchsortedfirst`. Covered by new tests.

  - **`bench/strategies.jl` broadened** — five sub-sweep modes
    (`MODE=fast` (default), `full`, `spacing`, `ratio`, `pattern`,
    `extreme`), 10 knot spacings (added `power2`, `sqrt`, `plateau`,
    `bimodal`, `near_linear` on top of the original 5), 8 query
    patterns (added `sorted_arithmetic`, `sorted_geometric`,
    `sorted_bimodal`, `sorted_repeated` on top of the original 4).

Bench verdict across all sub-sweeps (772 cells total):

  | sub-sweep            | within 20% of best |
  |----------------------|--------------------|
  | spacing variety      | 10 / 10 (100%)     |
  | pattern variety      |  8 /  8 (100%)     |
  | super-dense (m ≫ n)  | 270 / 270 (100%)   |
  | fast sweep           | 213 / 240 (89%)    |
  | super-sparse (m ≪ n) | 156 / 180 (87%)    |
  | n/m crossover        |  40 /  64 (62%)    |
  | **total**            | **697 / 772 (90%)** |

The remaining 6% are mostly `m ≤ 4` sparse queries on uniform data where
the absolute per-call cost is ~100 ns and `InterpolationSearch` would
still win by 20-40 ns — a documented opt-in trade-off, since lowering
the linearity-probe threshold further would risk picking InterpolationSearch
on borderline-non-linear data where it loses by 4-13×.

Tests: 26929 / 26929 still passing. New artifacts in `bench/`:

  - `bench/strategies.jl` — the bench script
  - `bench/results.md` — fast sweep verdict (the default mode)
  - `bench/results_focused.md` — spacing/pattern/ratio/extreme breakdowns

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor Author

Pushed e26ac82 answering your three questions in turn:

1. Audit — does this maintain everything that was there before?

Yes. All these existing names keep working unchanged: bracketstrictlymontonic, searchsortedfirstexp, searchsortedfirstcorrelated(v, x, ::Integer), searchsortedlastcorrelated(v, x, ::Integer), the Guesser overloads of both, Range fast paths, searchsortedfirstvec/searchsortedlastvec, looks_linear, Guesser, findfirstequal, findfirstsortedequal. The new strategies wrap the existing algorithms rather than replacing them; legacy callers don't change.

2. More strategies to consider — added one

GuesserHint(::Guesser) — wraps the existing Guesser-based correlated search as a dispatchable strategy so it slots into the new batched/per-query API alongside LinearScan / BracketGallop / ExpFromLeft / InterpolationSearch / BinaryBracket / Auto. The Guesser already decides between linear-extrapolation lookup and using its cached previous index; GuesserHint simply re-exposes that under the strategy dispatch.

I also evaluated and rejected a couple of candidates:

  • InterpolationGallop would essentially duplicate InterpolationSearch (which already uses gallop refinement internally).
  • KAryBinarySearch / ParallelBinarySearch would be substantial implementations whose payoff vs. binary search is dominated by branch-prediction characteristics that vary across CPUs — out of scope.

3. Bench depth — vastly broadened

bench/strategies.jl now has five sub-sweep modes (MODE=fast default, full, spacing, ratio, pattern, extreme), with:

  • 10 knot spacings (added power2, sqrt, plateau, bimodal, near_linear on top of uniform, log, jittered, random, two_scale)
  • 8 query patterns (added sorted_arithmetic, sorted_geometric, sorted_bimodal, sorted_repeated on top of the original 4)
  • n up to 262 144, m up to 16 384, and a dedicated super-sparse / super-dense sub-sweep that takes the ratio to its extremes

Final Auto verdict across 772 cells: 697 within 20% of best (90%), 50 worse (6%), 25 effectively tied.

Per-sub-sweep:

sub-sweep within 20%
spacing variety (10 cells, all 10 spacings) 10/10 (100%)
query patterns (8 cells, all 8 patterns) 8/8 (100%)
super-dense (270 cells, m ≫ n) 270/270 (100%)
fast sweep (240 cells) 213/240 (89%)
super-sparse (180 cells, m ≪ n) 156/180 (87%)
n/m crossover (64 cells, all n × m combos on uniform) 40/64 (62%)

What the heuristic now does

The Auto choice for searchsortedlast!(out, v, queries; strategy = Auto()):

  1. m == 0 → return immediately
  2. m == 1 → bypass everything, dispatch straight to Base.searchsortedlast (saves ~20 ns of bookkeeping)
  3. !issorted(queries) → per-element Base.searchsortedlast with no hint
  4. Compute gap = floor(n × (queries-span / v-span) / m). The span ratio (for numeric data) detects dense_burst queries clustered inside a single segment of v, which would have n/m ≫ 0 but actual_gap ≈ 0.
  5. gap ≤ 4LinearScan
  6. gap ≥ 64, length(v) ≥ 1024, m ≥ 2, and the sampled linearity probe (5 reads of v at first, n/4, n/2, 3n/4, last — total ~12 ns — checks the interior points are within 5% of the straight line) accepts → InterpolationSearch. The probe is what prevents Auto from picking InterpolationSearch on non-linear data, where it loses badly (~4-13× on plateau and two_scale spacings at large m).
  7. Otherwise → ExpFromLeft

Caught and fixed a real bug

ExpFromLeft previously assumed hint ≤ answer. When x < v[hint] (i.e., the hint was past the answer), it returned ~hint - 1 — wrong, sometimes wildly so. Now detects that case and falls back to a full Base.searchsortedlast / searchsortedfirst. Covered by the new tests.

Artifacts in the PR

  • bench/strategies.jl — the bench harness, runnable via julia --project=bench bench/strategies.jl (or with one of the MODE=… env vars for the focused sweeps).
  • bench/results.md — captured fast-sweep results (240 cells).
  • bench/results_focused.md — captured spacing / pattern / ratio / extreme sub-sweep results (~500 cells).

Both are reproducible; they're the actual output from this commit run on Julia 1.10.

Tests: 26929 / 26929 passing locally. CI still waiting for the same first-time-contributor approval.

Two algorithmic refinements driven by the broader benchmark sweep:

  - **Skew detection in `_estimate_avg_gap`** — the span-based gap
    estimator now returns `(gap, skewed)` where `skewed` is true when
    the median query sits more than 20 % of the query span away from
    the linear midpoint. Gated on `m ≥ 10` so small-batch statistical
    noise doesn't trip it. For sorted_near_start / sorted_geometric /
    sorted_bimodal patterns (most queries clustered in part of `v`'s
    range), the unmodified span heuristic massively underestimated the
    effective gap and routed Auto to `LinearScan` when `ExpFromLeft`
    was best (1.8-3.1× slowdown on those cells before).

  - **`InterpolationSearch` gated on `!skewed`** — even on perfectly
    uniform `v`, when queries are clustered, `ExpFromLeft` from
    `prev_idx` beats `InterpolationSearch` because each consecutive
    query's true index is close to the previous one's. The skew flag
    now prevents Auto from gambling on InterpSearch in that regime.

  - **Tightened sampled-linearity probe** from 5 probes / 5 % tolerance
    to 9 probes (k·n/10 for k = 1..9) / 0.1 % tolerance. This reliably
    rejects sorted-random data — which still looks linear-ish to a
    loose probe at large n — without rejecting any genuinely uniform
    or jittered grid. Probe cost rises from ~12 ns to ~25 ns and now
    catches the case where Auto used to pick InterpSearch on random
    data and lose 50-150 %.

  - **`AbstractRange` short-circuit** — `_sampled_looks_linear(::AbstractRange)`
    returns true without sampling. Ranges are definitionally uniform.

Bench expanded so the heuristic can be vetted in detail:

  - Five **tuning sub-sweeps** (`MODE=tune`) that locate the actual
    strategy crossover points empirically — LinearScan↔ExpFromLeft
    crosses at gap ≈ 5; ExpFromLeft↔InterpSearch crosses around gap 8
    *on uniform data*; minimum useful n for InterpSearch is ~2 048;
    minimum useful m is ~2; InterpSearch's downside on non-linear data
    bottoms out at 1.5× (random) and tops out at 14× (plateau).
  - A **3 920-cell validation sweep** (`MODE=validate`) — every
    combination of 10 spacings × 8 query patterns × 7 `n` × 7 `m`.
  - Two more knot spacings (`near_linear`, `power2`) and pattern
    variety (`sorted_arithmetic`, `sorted_geometric`, `sorted_bimodal`,
    `sorted_repeated`) on top of the v9 set.

Bench verdict over **4 692 cells across all sub-sweeps**:

  | sub-sweep                | within 20 % | regression worst case |
  |--------------------------|-------------|-----------------------|
  | spacing (10)             | 10 / 10     | none                  |
  | pattern (8)              |  7 /  8     | 1.51× at m=512        |
  | super-dense (270)        | 263 / 270   | 1.21× borderline      |
  | fast (240)               | 221 / 240   | 1.59× at m=10         |
  | super-sparse (180)       | 147 / 180   | 2.00× at m=4          |
  | n/m crossover (64)       | 51 / 64     | 1.62× at m=4096       |
  | **validate (3 920)**     | **3 289 / 3 920 (84 %)** | **2.64× (1 cell)** |
  | **aggregate (4 692)**    | **3 988 (85 %)** | **only 1 cell exceeds 2 ×** |

Tests: 26 929 / 26 929 still passing. Artifacts:

  - `bench/strategies.jl` — bench harness with 6 modes
  - `bench/results.md` — fast sweep
  - `bench/results_focused.md` — spacing / pattern / ratio / extreme
  - `bench/results_validate.md` — the 3 920-cell sweep
  - `bench/results_tune.md` — crossover measurements

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor Author

Pushed 65ea6c0. Much more thorough vetting + two algorithmic refinements:

Algorithmic changes

1. Skew detection. Tuning sweeps surfaced a previously-hidden regression: on sorted_near_start / sorted_geometric / sorted_bimodal patterns (most queries clustered in part of v's range), the span-based gap estimator underestimated the effective gap, and Auto routed to LinearScan when ExpFromLeft was best (1.8 – 3.1× slowdown).

_estimate_avg_gap now returns (gap, skewed), where skewed = true when the median query sits more than 20 % of the query span away from the linear midpoint. Gated on m ≥ 10 so small-batch statistical noise doesn't trip it.

2. InterpolationSearch gated on !skewed. Even on perfectly uniform v, when queries are clustered, ExpFromLeft from prev_idx beats InterpolationSearch because each consecutive query's true index is close to the previous one's. The skew flag now prevents Auto from gambling on InterpSearch in that regime.

3. Tightened linearity probe. From 5 probes / 5 % tolerance to 9 probes (k·n/10 for k = 1..9) / 0.1 % tolerance. Reliably rejects sorted-random data — which still looked linear-ish to the looser probe at large n — without rejecting any genuinely uniform or jittered grid. Probe cost is ~25 ns (was ~12 ns). This was the fix for Auto being 50 – 150 % slower than ExpFromLeft on random sorted data after InterpolationSearch was added.

4. AbstractRange short-circuit. _sampled_looks_linear(::AbstractRange) returns true without sampling; ranges are definitionally uniform.

Bench changes

  • Five tuning sub-sweeps (MODE=tune) that empirically locate the strategy crossovers — LinearScan ↔ ExpFromLeft crosses at gap ≈ 5; ExpFromLeft ↔ InterpSearch crosses near gap 8 on uniform data; minimum useful n for InterpSearch is ~2048; minimum useful m is 2; InterpSearch's downside on non-linear data ranges from 1.5× (random) to 14× (plateau).
  • A 3 920-cell validation sweep (MODE=validate) — every combination of 10 spacings × 8 query patterns × 7 n × 7 m.
  • Two more knot spacings (near_linear, power2, sqrt) and pattern variety (sorted_arithmetic, sorted_geometric, sorted_bimodal, sorted_repeated).

Final verdicts

Across 4 692 cells in all sub-sweeps:

sub-sweep within 20 % worst case
spacing variety (10 cells) 10/10 (100 %) none
pattern variety (8 cells) 7/8 (88 %) 1.51× at m=512
super-dense, m ≫ n (270 cells) 263/270 (97 %) 1.21× borderline
fast sweep (240 cells) 221/240 (92 %) 1.59× at m=10
super-sparse (180 cells) 147/180 (82 %) 2.00× at m=4
n/m crossover (64 cells) 51/64 (80 %) 1.62× at m=4096
validation (3 920 cells) 3 289/3 920 (84 %) 2.64× (1 cell)
aggregate (4 692 cells) 3 988 (85 %) only 1 cell > 2×

The single cell exceeding 2× has a 180 ns absolute difference. The residual ~15 % outside-20 % cells are mostly m = 4 regimes where Auto's bookkeeping cost (issorted check, gap calc, linearity probe — totals ~30 – 50 ns) is non-negligible relative to the 100 – 300 ns total per-batch budget, and where direct InterpolationSearch would have saved another ~30 – 50 ns. Closing that gap would require type-specializing the dispatch further (e.g. via Val parameters), which I don't think is worth the added complexity.

I tried bumping _AUTO_INTERP_MIN_M from 2 to 10 to see if that helped (theory: skip InterpSearch's overhead on small batches). It was strictly worse (3 224 vs 3 289 cells within 20 %) — Auto's m = 2..9 wins on uniform sparse data are real and worth keeping.

Tests: 26 929 / 26 929 passing. CI awaiting maintainer approval as usual.

Three small CI fixes from the FFF SciML#65 run:

  - `bench/strategies.jl`: rename loop variable `strat` to `strategy` so
    the typo check stops flagging it as `start` / `strata`.
  - `Project.toml`: add `StableRNGs = "1"` to `[compat]` — Aqua's
    `test_deps_compat` requires every test-extras dep to have a compat
    bound.
  - `docs/src/index.md`: add `@docs` entries for the new public symbols
    (`searchsortedlast!`, `searchsortedfirst!`, plus the strategy types
    `SearchStrategy` / `LinearScan` / `BracketGallop` / `ExpFromLeft` /
    `InterpolationSearch` / `BinaryBracket` / `GuesserHint` / `Auto`) so
    Documenter's cross-reference resolver finds them and the
    `missing_docs` check passes.

Tests: 26929 / 26929 still passing locally.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The 1.9 release shipped the new strategy-dispatched API alongside the
existing single-purpose names as thin wrappers. Cleaner long-term shape
is a single canonical API, so 2.0 removes the legacy surface:

  - **Removed**: `searchsortedfirstcorrelated`, `searchsortedlastcorrelated`
    (both `Integer` and `Guesser` overloads).
  - **Removed**: `searchsortedfirstvec`, `searchsortedlastvec` (allocating
    batched variants — callers own their output buffer and use the in-place
    `searchsortedfirst!` / `searchsortedlast!`).
  - **Made internal**: `searchsortedfirstexp` and `bracketstrictlymontonic`.
    They're still in the module as implementation helpers for `ExpFromLeft`
    and `BracketGallop` respectively, but no longer documented or exported.

  - **Kept (documented public)**: `searchsortedfirst!` / `searchsortedlast!`,
    `SearchStrategy` and all concrete strategies (`LinearScan`,
    `BracketGallop`, `ExpFromLeft`, `InterpolationSearch`, `BinaryBracket`,
    `GuesserHint`, `Auto`), `Guesser`, `looks_linear`, `findfirstequal`,
    `findfirstsortedequal`. The hint provider stays because it's
    orthogonal to the dispatch; the equality-search functions stay because
    they're a different problem.

Migration for downstream callers:

  | old                                                                | 2.0                                                            |
  |--------------------------------------------------------------------|-----------------------------------------------------------------|
  | `searchsortedfirstcorrelated(v, x, guess::Integer)`                | `searchsortedfirst(BracketGallop(), v, x, guess)`               |
  | `searchsortedlastcorrelated(v, x, guess::Integer)`                 | `searchsortedlast(BracketGallop(), v, x, guess)`                |
  | `searchsortedfirstcorrelated(v, x, ::Guesser)`                     | `searchsortedfirst(GuesserHint(g), v, x)`                       |
  | `searchsortedlastcorrelated(v, x, ::Guesser)`                      | `searchsortedlast(GuesserHint(g), v, x)`                        |
  | `searchsortedfirstvec(v, x)`                                       | `searchsortedfirst!(buf, v, x)` (caller-owned `buf`)            |
  | `searchsortedlastvec(v, x)`                                        | `searchsortedlast!(buf, v, x)`                                  |
  | `searchsortedfirstexp(v, x, lo, hi)`                               | `searchsortedfirst(ExpFromLeft(), v, x, lo)`                    |

Tests rewritten to exercise the strategy API directly. Docs index updated to
present the strategy API as canonical, with `Guesser` / `looks_linear` /
equality-search as supporting helpers.

`Project.toml` bumped to `2.0.0`.

Downstream callers in SciML that import the removed names will need
companion PRs — see SciML/DataInterpolations.jl#529 for the first one.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Two changes bundled because they touch the same area of the codebase:

1. **`SIMDLinearScan` search strategy.** New `SearchStrategy` subtype that
   lowers the forward `LinearScan` walk to 8-wide SIMD chunks via custom
   LLVM IR. Specialised for `DenseVector{Int64}` and `DenseVector{Float64}`
   through two ordered-compare predicates each (`sgt`/`sge` for Int64,
   `ogt`/`oge` for Float64), generated from a shared template
   `_simd_scan_ir(t, pred)` modelled on the existing `FFE_IR` equality
   scan. Any other element type falls back to scalar `LinearScan` (dispatch
   is static, no runtime branch). Non-`Forward` orderings and the no-hint
   path also fall back. Backward walks stay scalar — the SIMD primitive is
   forward-only.

   `SIMDLinearScan` is opt-in: `Auto` does not pick it, because the regime
   where it beats `LinearScan` (long forward walks) overlaps with where
   `Auto` already prefers `BracketGallop` or `ExpFromLeft`. Pin it when you
   have a workload that wants a long forward scan and you know the eltype.

   Caveats documented in `strategies.md`:
   - `Int64` / `Float64` only; everything else uses scalar `LinearScan`.
   - NaN in a `Float64` vector compares false under `fcmp o*`, so NaNs are
     silently skipped by the SIMD scan. Sorted vectors with NaN aren't
     well-defined under any total order anyway.
   - Forward ordering only; reverse falls back to scalar.

   Tests (`test/runtests.jl`):
   - 10000 random fuzz comparisons against `Base.searchsortedlast` /
     `Base.searchsortedfirst` per eltype (Int64, Float64) under `StableRNG`.
   - Edge cases: empty / single-element / duplicates / out-of-range hints
     / x outside the vector range.
   - Fallback paths: Int32, Float32, String, no-hint, reverse order.

   Precompile workload extended to cover the new strategy on both
   specialised eltypes.

2. **Docs restructured into four topical pages** plus a slimmed landing
   page:
   - `interface.md`: public API surface, strategy contract, how to add a
     new strategy (with correctness-check pattern), note that `Auto` is
     not externally extensible.
   - `strategies.md`: catalog of every shipped strategy with a chooser
     table (best case / worst case / hint usage), per-strategy notes.
     `SIMDLinearScan` lives here with its caveats. The equality-search
     functions (`findfirstequal`, `findfirstsortedequal`) are linked
     from the bottom of the page, separated because their return
     semantics (`nothing` on miss) don't match the positional-search API.
   - `guessers.md`: `Guesser` + `GuesserHint` API, usage patterns
     (interpolation-segment lookup), and explicit list of when *not* to
     use a `Guesser`.
   - `auto.md`: full `Auto` decision tree for both per-query and batched
     callers, every crossover constant, justification per branch, and a
     self-contained benchmark script that reproduces the regime-grid
     comparison.

   `docs/make.jl` updated to register the new pages. `index.md` cut down
   to overview + per-section links.

All 47088 tests pass (20000 new fuzz cases for `SIMDLinearScan` plus the
existing 27013).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
ChrisRackauckas-Claude pushed a commit to ChrisRackauckas-Claude/DataInterpolations.jl that referenced this pull request May 20, 2026
FFF 2.0 (SciML/FindFirstFunctions.jl#65) drops the legacy single-purpose
names (`searchsortedfirstcorrelated`, `searchsortedlastcorrelated`, and
the allocating `*vec` batched forms) in favour of a single strategy-
dispatched API. This commit migrates DataInterpolations off the legacy
surface.

  - `src/DataInterpolations.jl`: import `FindFirstFunctions` plus `Guesser`
    only; drop the legacy correlated-lookup name imports.
  - `src/interpolation_utils.jl`: split `get_idx` into two methods. The
    `iguess::Integer` overload routes through
    `searchsortedfirst(BracketGallop(), tvec, t, iguess)` /
    `searchsortedlast(...)`. The `iguess::Guesser` overload routes through
    `searchsortedfirst(GuesserHint(iguess), tvec, t)`. Both forms preserve
    the existing `lb`/`ub_shift`/`idx_shift`/`side` keyword contract.
  - `test/derivative_tests.jl`, `test/interpolation_tests.jl`: cached-index
    tests now use `searchsortedfirst(GuesserHint(A.iguesser), A.t, t)` and
    `searchsortedfirst(BracketGallop(), func.t, _t, func.iguesser(_t))`.
  - `src/interpolation_methods.jl`: trim explanatory/update prose comments
    that grew during the batch-evaluator work. Section headers and
    structural `else  # Extension` markers retained; per-task narrative
    comments removed.
  - `Project.toml`: bump `FindFirstFunctions` compat to `2`.

All 57879 DI tests pass locally on Julia 1.11:

  Core       | 2539 pass, 5 broken (pre-existing)
  Methods    | 42151 pass
  Extensions | 13178 pass
  Misc       |    11 pass

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Callers issuing many short batches against the same sorted vector currently
pay Auto's per-call probes — `_sampled_looks_linear(v)` reads 11 elements
each time, ~25 ns. Caching the probe once is the obvious save for
interpolation-segment lookups and similar repeated-vector workloads.

  - `SearchProperties` is an isbits struct holding `(has_props, is_linear,
    has_nan)`. Default constructor returns the "no information" sentinel
    (`has_props = false`); `SearchProperties(v)` runs the probes and
    populates the fields.
  - `Auto` gains a `props::SearchProperties` field. `Auto()` keeps current
    behaviour (sentinel props → probe per call); `Auto(SearchProperties(v))`
    skips the probe in the batched path. Single concrete struct, not
    parametric — keeps call sites type-stable without specialization
    explosion.
  - The batched dispatch reads `s.props.is_linear` instead of probing when
    `has_props` is set. Other fields populated for forward compatibility
    but not consumed yet — `has_nan` is the obvious next consumer when
    `SIMDLinearScan` gets wired into Auto's decision tree.

Trust-the-caller contract: stale or lying `SearchProperties` is
correctness-preserving — `InterpolationSearch` from a bad guess falls
through to `BracketGallop` (slow but O(log n)). The caller is responsible
for re-running `SearchProperties(v)` if `v` mutates.

Tests cover sentinel construction, populated construction on linear/
non-linear/NaN-containing/integer vectors, output equivalence between
`Auto()` and `Auto(props)`, the `isbits` guarantee, and a deliberately-
lying cache that still produces correct results.

47100 tests pass (was 47088; 12 new cases).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Single-file changelog covering the 1.x → 2.0 transition. Sections:

  - Removed names (six legacy functions) with a one-to-one migration
    table to the new strategy dispatch.
  - Made-internal helpers (`searchsortedfirstexp`, `bracketstrictlymontonic`)
    and how to reach the same algorithm via the strategy API.
  - The new strategy-dispatched search API (seven shipped strategies +
    `Auto`).
  - The new batched in-place API (`searchsortedfirst!` / `searchsortedlast!`).
  - The `Auto` heuristic, the crossover constants, and how they were
    empirically derived from the regime-grid sweep.
  - `SearchProperties` cache integration with `Auto`.
  - `SIMDLinearScan` with the eltype / NaN / order caveats.
  - Documentation restructure into five topical pages.
  - Internal: shared `_simd_scan_ir` template now backs both the
    equality scan and the positional-search compares.
  - Unchanged: equality search (`findfirstequal`,
    `findfirstsortedequal`) — deliberately separate API.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Equality search now flows through the strategy framework via a new wrapper
`findequal(strategy, v, x[, hint])` that returns an `Int` with the
sentinel `firstindex(v) - 1` for "not found". Type-stable, composes with
every existing strategy. The new `BisectThenSIMD <: SearchStrategy`
short-circuits to `findfirstsortedequal`'s algorithm on `DenseVector{Int64}`.
For positional dispatch (`searchsortedfirst`/`searchsortedlast`),
`BisectThenSIMD` delegates to `BinaryBracket` — bisect-then-equality-scan
can't answer "where would x insert?".

The new tests exposed four pre-existing duplicate-handling bugs, all
fixed in this commit:

  - `searchsortedfirst(BracketGallop(), v, x, hint)` returned the
    rightmost duplicate when the hint landed in a run of equals. Cause:
    `bracketstrictlymontonic` used `x < v[lo]` to choose direction,
    which is correct for `searchsortedlast` but wrong for
    `searchsortedfirst`. Fix: add `bracketstrictlymontonic_first` with
    the inverted polarity `v[lo] < x`, used only by the
    `searchsortedfirst` overload.
  - `searchsortedfirst(InterpolationSearch(), v, x)` propagated the
    same bug via `BracketGallop`. Fixed by the bracket-first fix.
  - `searchsortedfirst(ExpFromLeft(), v, x, hint)` exponential-searched
    forward when `v[hint] == x`, missing earlier duplicates. Fixed by
    falling back to a full search whenever `v[hint] >= x` (the
    exponential-forward path is only safe when `v[hint] < x`).
  - `findfirstsortedequal(var, vars)` bisected with `vars[mid] <= var`
    and advanced offset on equal hits, walking past the first
    duplicate. Fixed by tightening to strict `<` and adjusting the
    window-shrink rule so the "keep" branch retains the midpoint in
    the next window (`len = half + 1` instead of `len -= half`). The
    fast-path LLVM IR branch select is replaced by plain `ifelse`;
    Julia compiles to the same branchless `select` modulo the
    `!unpredictable` metadata.

Tests: 84120 pass (was 47100). The new `findequal + BisectThenSIMD`
testset runs 2500 random-trial parity checks against `Base` across all
shipped strategies on Int64 and Float64 vectors (including frequent
duplicates), the OffsetArray-style sentinel, reverse ordering, empty
and single-element edge cases, and a check that `BisectThenSIMD` in
positional dispatch correctly delegates to `BinaryBracket`.

Docs:
  - `strategies.md` gains a "Equality search through the strategy
    framework" section with the `findequal` and `BisectThenSIMD`
    @docs entries.
  - The Equality-search appendix now points callers to `findequal` as
    the preferred sorted-equality API and explains why
    `findfirstequal` (unsorted) stays outside the strategy framework.
  - NEWS.md adds the new-functionality section and the four-way
    bug-fix section with concrete before/after behaviour.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…ty page

Six small cleanups grouped under one commit:

1. **Typo**: rename `bracketstrictlymontonic` to `bracketstrictlymonotonic`
   (and the `_first` companion). Internal helpers — no downstream impact.

2. **FFE_IR unification**: replace the ~60-line literal `FFE_IR` string
   with `_simd_scan_ir("i64", "eq")`, using the same SIMD-scan IR
   template that backs the four `>`/`>=` predicates for `SIMDLinearScan`.
   All five SIMD primitives now share a single IR source. The new
   `_simd_scan_ir` is now defined at the top of the module so `FFE_IR`
   can refer to it; the duplicate copy further down has been removed.

3. **Docstring refresh**:
   - `findfirstequal` now documents the unsorted-input contract, the
     Int64 SIMD specialization vs. the generic `findfirst(==(x), v)`
     fallback, and points at `findequal` / `findfirstsortedequal` for
     the sorted-equality alternatives.
   - `findfirstsortedequal` now points at `findequal(BisectThenSIMD(), …)`
     as the preferred sentinel-returning strategy-framework variant and
     explains its role as the dedicated `Union{Int64, Nothing}`-returning
     name.

4. **OffsetArray sentinel note**: `findequal` docstring spells out that
   the "not found" sentinel is `firstindex(v) - 1` (= `0` for 1-based
   vectors, adjusts for OffsetArrays), and instructs callers to test
   against `firstindex(v)` rather than the literal `0`.

5. **`SearchProperties(v; linear_tolerance = 1e-3)`** exposes the
   sampled-linearity probe's tolerance as a kwarg, symmetric to
   `Guesser`'s `looks_linear_threshold`. The internal
   `_sampled_looks_linear` is now `(v, tol::Float64)` with `tol`
   defaulting to `_AUTO_LINEAR_REL_TOLERANCE` (= 1e-3) for callers that
   don't pass one. Loosening to `1e-2` widens the regime where `Auto`
   picks `InterpolationSearch`; tightening to `1e-4` is more
   conservative.

6. **Exports**: 2.0 now exports the public API surface
   (`SearchStrategy`, every concrete strategy, `SearchProperties`,
   `Guesser`, `looks_linear`, `searchsortedfirst!`, `searchsortedlast!`,
   `findequal`, `findfirstequal`, `findfirstsortedequal`). Previously
   the package exported nothing, so callers had to fully qualify every
   reference (`FindFirstFunctions.LinearScan()`). `using FindFirstFunctions`
   is now sufficient to access the full API; previously qualified call
   sites continue to work.

**New `equality.md` doc page**: the `findfirstequal` /
`findfirstsortedequal` docs lived as an appendix on `strategies.md`,
which was wrong because these functions don't match the strategy-dispatch
contract (different return type, different question). Moved to their own
page with a chooser table, the sentinel-vs.-Nothing discussion, and the
shared SIMD-primitive notes. `strategies.md` keeps a one-paragraph stub
pointing at the new page; `index.md` and `make.jl` updated to register
and surface the new page in the nav.

84120 tests pass (unchanged — these are all internal renames or doc
changes).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Comprehensive rebenchmark of `Auto`'s batched decision tree against the
shipped strategies (`LinearScan`, `SIMDLinearScan`, `BracketGallop`,
`ExpFromLeft`, `InterpolationSearch`) on a 1080-cell sweep covering
5 `v` patterns × 4 query patterns × 5 `n` sizes × 6 `m` sizes × 2 element
types (`Int64`, `Float64`).

Baseline (`Auto` before this commit):  median 1.18x, p95 2.09x, max 2.78x
This commit:                            median 1.04x, p95 1.38x, max 2.18x
With `Auto(SearchProperties(v))`:       median 1.03x, p95 1.38x, max 2.09x

New decision-tree branches:

  - **`SIMDLinearScan` in the medium-gap regime.** When `v` is
    `DenseVector{Int64}` or `DenseVector{Float64}` (NaN-free for Float64),
    `Auto` now dispatches to `SIMDLinearScan` for `gap ∈ (4, _auto_simd_gap_max(v)]`.
    `_auto_simd_gap_max` is 64 for both eltypes — the eltype-specific
    accessor leaves room to lower the Int64 ceiling later without changing
    the Float64 path. For Float64 the NaN check consults
    `SearchProperties.has_nan` when a populated cache is attached;
    otherwise no-NaN is assumed (same trust contract Base uses for
    sortedness). The bench shows `SIMDLinearScan` winning 25% of cells
    overall, with median 1.94× speedup over `LinearScan` in the cells it
    wins.

  - **`BracketGallop` preferred over `ExpFromLeft` at large gaps.** New
    constant `_AUTO_GALLOP_GAP_MIN = 16`. At gap ≥ 16, `ExpFromLeft`'s 5
    up-front linear probes are guaranteed to miss (the answer is much
    more than 5 elements past the previous-result hint), and
    `BracketGallop`'s doubling-from-`hint` walk is strictly faster.

  - **Tiered linearity probe for `InterpolationSearch`.** The strict
    `_AUTO_LINEAR_REL_TOLERANCE = 1e-3` gates the
    `_AUTO_INTERP_MIN_GAP ≤ gap < _AUTO_INTERP_LOOSE_GAP` range (8 to 256).
    For `gap ≥ _AUTO_INTERP_LOOSE_GAP` (256), the loose
    `_AUTO_LINEAR_LOOSE_TOLERANCE = 5e-2` applies — `InterpolationSearch`'s
    cache benefit compensates for a worse guess at large gaps, and the
    bounded binary-search refinement is still O(log n). This unlocks
    `InterpolationSearch` on approximately linear data like sorted random
    or jittered vectors at large `n`, while still rejecting genuinely
    nonlinear data (log-spaced, two-scale) where `InterpolationSearch`
    loses 2–3×.

Bug fix in `_estimate_avg_gap`: the previous code fell back to `n ÷ m`
whenever the skew flag was set, which caused `SIMDLinearScan` to be
picked for tightly-clustered queries (`span_q ≈ 0`) where `LinearScan`'s
scalar walk is 5× faster. The skew flag now serves its intended purpose
as a binary `InterpolationSearch`-unsuitability signal, while the gap
value is always the span-based estimate.

Reproducibility:

  - `bench/auto_sweep.jl`: the full sweep. `julia --project=bench
    bench/auto_sweep.jl` writes `bench/results.csv` and prints a summary
    plus the worst 30 Auto-slack cells.
  - `bench/analyze.jl`: reads `results.csv` and prints `SIMDLinearScan`
    win-by-regime tables, the n/m gap-bucket distribution, and `Auto`'s
    misprediction histogram. Useful for tuning future constants.
  - `bench/.gitignore`: keeps `results.csv` and `Manifest.toml` out of
    version control.

Docs:

  - `auto.md` rewrites the decision-tree section, replaces the
    crossover-constants table (now includes `_auto_simd_gap_max`,
    `_AUTO_GALLOP_GAP_MIN`, `_AUTO_INTERP_LOOSE_GAP`,
    `_AUTO_LINEAR_LOOSE_TOLERANCE`), updates the headline-results table
    with the 1080-cell sweep numbers, and links to the actual bench
    script. The inline script is kept as a minimum-viable standalone
    reproducer.
  - `NEWS.md` adds the retuning section with before/after numbers, the
    new branches, and the bug-fix description.

All 84120 tests pass (no behavioural test changes — the retuning is
internal to Auto's branch selection).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Investigated a `BitInterpolationSearch` strategy that would reinterpret
positive `DenseVector{Float64}` as `DenseVector{UInt64}` and run linear
extrapolation on the IEEE bit pattern. The hypothesis was that this would
beat `InterpolationSearch` on log-spaced data, since IEEE bit patterns are
approximately linear in array index for geometric sequences.

Implemented, correctness-verified, and benchmarked across `m ∈ {4..4096}`
on `n = 65536` log-spaced Float64 with sparse queries. Result: the
strategy loses to every existing alternative at every `m` (65–88 ns/q vs
19–41 ns/q for `ExpFromLeft`, 16–199 ns/q for `SIMDLinearScan`). The
per-query division and Float64-from-UInt64 conversion cost more than the
saved bracket refinement; on non-log-spaced data the bit-pattern guess is
also strictly worse than the float-domain guess because the
denormal-to-normalized transition introduces a discontinuity.

`Auto` already serves log-spaced data well via `SIMDLinearScan` at medium
gaps and `BracketGallop` / `ExpFromLeft` at large gaps. No new dispatch
path was justified.

NEWS.md adds an "Investigated but rejected" section documenting the
exploration so future contributors don't re-derive the same negative
result.

84120 tests pass; no behavioural change.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Reinstating BitInterpolationSearch after a more comprehensive bench than
my earlier sweep — that earlier conclusion ("never wins") was based on a
too-narrow regime grid (single v pattern, single q pattern). The new
sweep at `bench/bitinterp_sweep.jl` covers 1404 cells across 9 v patterns
(uniform, logspaced, logspaced_wide spanning 10⁻³ to 10¹⁵,
geometric_dense, geometric_sparse, power2, sqrt, two_decade, jittered_log)
× 4 q patterns (linear_uniform, log_uniform, dense_grid, log_grid) × 6 n
sizes up to 2²⁰ × 7 m sizes from 4 to 16384.

Results:
  - BitInterp wins outright in 59/1404 cells (4.2%)
  - Within 10% of per-cell best in 75 cells (5.3%)
  - Within 20% of per-cell best in 105 cells (7.5%)

Wins concentrate in the regime BitInterp is designed for:
log-spaced/geometric Float64 data at small m (= 4) and large n (≥ 16384).
Margins range from 1.0× tie to 1.43× on
`logspaced_wide log_grid n=2²⁰ m=4`. Other wide-spread wins:

  logspaced_wide log_grid n=2¹²  m=4:  35.0 vs ExpFromLeft 47.5 (1.36×)
  logspaced_wide log_grid n=2¹⁸  m=4:  47.5 vs ExpFromLeft 62.5 (1.32×)
  logspaced_wide dense_grid n=2²⁰ m=4: 50.0 vs ExpFromLeft 65.0 (1.30×)

`Auto` does NOT pick BitInterp. The wins are narrow (4% of cells in a
bench specifically designed to probe BitInterp's regime), and adding the
eligibility check + sampled-log-linearity probe to Auto's hot path would
penalize the much larger set of non-log-spaced workloads. Users with
known log-spaced workloads can pin
`searchsortedlast!(out, v, queries; strategy = BitInterpolationSearch())`
once and get the win at zero heuristic cost.

Implementation:
  - `_bit_interp_guess_f64(v, x, lo, hi)` does per-element bitcast rather
    than reinterpret(UInt64, v) on the whole vector, avoiding the
    ReinterpretArray wrapper overhead.
  - Strict guard on positive + finite v[lo] and x; non-positive or
    non-finite Float64 bit patterns are not monotonic with float value
    under raw reinterpret, so the strategy falls back to BinaryBracket.
  - Non-Float64 dense eltypes fall back to plain InterpolationSearch
    (where the bit pattern equals the value).

Tests pass (84120). Docs updated:
  - `strategies.md` chooser table adds BitInterp with eltype/regime caveats.
  - `strategies.md` per-strategy section documents the win regime, the
    sample wins table, and why Auto doesn't pick it.
  - `NEWS.md` documents the bench findings and the "opt-in" decision.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…elocity CI

Four 2.0 followup additions:

1. **`queries_sorted` kwarg** on `searchsortedfirst!` / `searchsortedlast!`.
   Caller-supplied override for the runtime `issorted(queries)` check.
   `nothing` (default) keeps current behaviour; `true` skips the O(m)
   probe and takes the sorted-loop path unconditionally; `false` skips
   the probe and takes the unsorted-loop path. Measured savings:
   ~6.6 μs on m=4096 batched call on the bench machine (~1.6 ns/q,
   ~6% headroom on top of Auto's hot path).

   Wrong-answer risk for `queries_sorted = true` on unsorted input is
   documented in the docstring; the inner sorted loop uses
   previous-result as a hint, which becomes invalid when queries jump
   backward. Tests cover sorted-true on every shipped strategy plus
   unsorted-false on unsorted data.

2. **`searchsortedrange(strategy, v, lo, hi[, hint]; order)`** returning
   a `UnitRange{Int}` of all indices `i` with `lo ≤ v[i] ≤ hi`.
   Implemented as `searchsortedfirst(strategy, v, lo) : searchsortedlast(strategy, v, hi)`,
   with the hinted form passing the left-bound result as the hint for
   the right-bound search (shared bracket discovery). Exported.

   Tests cover all shipped strategies on several `(lo, hi)` pairs
   including out-of-range cases, plus 200-trial random fuzz on Int64
   vectors vs the Base composition.

3. **`is_log_linear::Bool` field on `SearchProperties`** + backing
   `_sampled_looks_log_linear(v, tol)` probe. The probe is the same
   9-point sampled check as `_sampled_looks_linear` but applied to
   `log.(v)` — accepts geometric / log-spaced data, rejects linear /
   two-scale. Requires strictly positive finite values throughout the
   sample.

   `Auto` does NOT consume `is_log_linear` (the wins from
   `BitInterpolationSearch` are narrow per the bitinterp_sweep bench).
   The field is populated for callers who want to manually pin
   `BitInterpolationSearch` based on data shape; the cost is a single
   sampled-log probe at `SearchProperties(v)` construction time
   (~30 ns).

4. **AirspeedVelocity benchmark CI**. Adds:
   - `.github/workflows/Benchmark.yml` — runs on PRs touching `src/`,
     `bench/`, `Project.toml`, or the workflow itself. Uses
     `benchpkg` / `benchpkgtable` from AirspeedVelocity 0.6 to time
     both base and PR commits, post a markdown diff as a PR comment,
     and upload raw results as an artifact.
   - `benchmark/benchmarks.jl` — `SUITE = BenchmarkGroup()` with 40
     entries across four groups: per-query micro-benchmarks across
     every shipped strategy, batched in-place benchmarks on a (3 × 3 × 3)
     grid of `(v_kind, q_kind, n, m)` including the Auto / Auto+sorted
     / Auto+props variants, `SearchProperties` construction cost on
     three eltypes, and equality-search comparison
     (`findequal(BinaryBracket)` vs `findequal(BisectThenSIMD)` vs
     `findfirstsortedequal`).

84626 tests pass (+506 from the new test blocks).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The package source had grown to a single 1900-line file mixing every layer:
LLVM IR templates, equality-search primitives, strategy type definitions,
the per-strategy dispatch methods, Auto's heuristic decision tree, the
batched in-place API, the Guesser correlated-lookup helper, the findequal
strategy-framework wrapper, and the PrecompileTools workload. Splitting
into one file per logical layer makes the source navigable without changing
any behaviour.

  src/
  ├── FindFirstFunctions.jl   39 lines   module + exports + includes
  ├── simd_ir.jl             135         IR template + IR constants + SIMD
  │                                       primitives (eq, sgt, sge, ogt, oge)
  ├── equality.jl             73         findfirstequal + findfirstsortedequal
  ├── strategies.jl          268         SearchStrategy abstract + concrete
  │                                       strategy types + SearchProperties +
  │                                       Auto (struct + docstrings)
  ├── search_properties.jl   120         _sampled_looks_linear /
  │                                       _sampled_looks_log_linear / _has_nan
  │                                       + populated SearchProperties ctor
  ├── dispatch.jl            583         Per-strategy searchsortedfirst/last
  │                                       methods + bracketstrictlymonotonic*
  │                                       + searchsortedfirstexp + _interp_guess
  │                                       + _bit_interp_guess_f64 +
  │                                       _simdscan_*_specialized
  ├── auto.jl                168         Auto crossover constants + per-query
  │                                       Auto + _estimate_avg_gap +
  │                                       _auto_simd_eligible /
  │                                       _auto_simd_gap_max /
  │                                       _auto_interp_eligible + _auto_pick
  ├── batched.jl             321         searchsortedfirst!/last! +
  │                                       sorted/unsorted inner loops +
  │                                       _take_sorted_path + generic
  │                                       _batched! + Auto-specialized
  │                                       _batched! + searchsortedrange
  ├── guesser.jl             109         looks_linear + Guesser + GuesserHint
  │                                       dispatch (relies on BracketGallop)
  ├── findequal.jl            91         findequal generic + BisectThenSIMD
  │                                       shortcut for DenseVector{Int64}
  └── precompile.jl           58         PrecompileTools @setup_workload

Include order is significant — each file may depend on names from earlier
files. The order is documented in `FindFirstFunctions.jl` and is:
simd_ir → equality → strategies → search_properties → dispatch → auto →
batched → guesser → findequal → precompile. The two non-obvious
dependencies are:

  - `auto.jl` references `_sampled_looks_linear` from `search_properties.jl`
    (the `_auto_interp_eligible` loose tier).
  - `batched.jl` references `_estimate_avg_gap`, `_auto_simd_eligible`,
    `_auto_simd_gap_max`, `_auto_interp_eligible`, `_AUTO_*` constants from
    `auto.jl` (the Auto-specialized `_batched!` lives in `batched.jl`
    because it shares the sorted/unsorted-loop infrastructure with the
    generic `_batched!`).

84626 tests pass; no behavioural change.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas ChrisRackauckas marked this pull request as ready for review May 20, 2026 14:39
@ChrisRackauckas ChrisRackauckas merged commit 8f78df2 into SciML:main May 20, 2026
15 of 21 checks passed
ChrisRackauckas added a commit to SciML/DataInterpolations.jl that referenced this pull request May 21, 2026
Refactors the hand-rolled sorted-batch lockstep walk in the merged Akima
batched evaluator and adds new batch evaluators for `LinearInterpolation`
and `CubicSpline` (formerly draft PR #527). The new helpers route per-query
index lookups through `FindFirstFunctions.searchsortedlast!` so the
algorithm adapts to the n/m ratio automatically.

Why this matters: the previous merged Akima batched eval walked the knot
index linearly via `while idx < n - 1 && tt[i] > t[idx + 1]; idx += 1`,
which is O(gap) per query. For sparse queries on long `t` (e.g. `n =
65 536, m = 10`) that's O(n) of wasted walking — 36 µs per batch in
practice on this CPU, dwarfing the 30 ns of useful work. The FFF
strategy dispatch ([SciML/FindFirstFunctions.jl#65]) picks `LinearScan`
for dense queries and `ExpFromLeft` / `InterpolationSearch` for sparse,
so the per-query cost stays appropriate for every regime.

Implementation:

  - `_sorted_batch_extrapolation_ok(A)` — shared predicate (None /
    Constant / Linear / Extension on each side).
  - `_find_last_interior_index(tt, …)` — O(log m) binary search to split
    `tt` into left-ext / interior / right-ext.
  - `_eval_interior_adaptive!(out, t, tt, i_first, i_last, n, eval_at)`
    — dispatches based on `32 * n_interior < n`:
      - dense → inline lockstep walk fused with the eval lambda
      - sparse → allocate an Int buffer, batched
        `FindFirstFunctions.searchsortedlast!`, then run the eval on the
        indexed positions.
    The 1/32 crossover was determined by the FFF bench sweep on uniform
    data; below that ratio the FFF batched call cleanly beats the linear
    walk; above it, the inline walk's tight 1-2-op-per-query inner loop
    wins.

The Linear and CubicSpline call sites hand-inline the adaptive dispatch
(rather than passing a callable through `_eval_interior_adaptive!`)
because `get_parameters` inside the per-segment callable wasn't
inlining through the closure boundary — costing ~70× on small-m sparse
cases. Akima passes a `_AkimaEvaluator` struct (its evaluator is pure
array indexing + `@evalpoly`) which inlines cleanly.

Benchmarks (Julia 1.10, BenchmarkTools `minimum`):

  | type   | n=65k,m=10 | n=65k,m=4096 | n=4k,m=4k | n=1M,m=10 |
  |--------|------------|--------------|-----------|-----------|
  | Akima  | 36 µs → 706 ns (51×) | 106 µs → 104 µs | 34 µs → 33 µs | (unmeasured) → 822 ns |
  | Linear | n/a → 737 ns         | n/a → 112 µs     | n/a → 42 µs   | n/a → 851 ns |
  | Cubic  | n/a → 809 ns         | n/a → 163 µs     | n/a → 77 µs   | n/a → 960 ns |

Where master had no batched evaluator (Linear, Cubic), the comparison is
to `map!(A, out, tt)` (the existing default), which uses the
`Guesser`-based correlated search per-query. The new fast path is
competitive at the sparse end (the existing per-query path is already
fast there via the iguesser hint) and 2–7× faster on dense batches.

Drops the previous `any(isnan, A.u)` check from `LinearInterpolation`'s
fast-path gate (added defensively in the original sorted-batch code,
but it's O(n) and was making the fast path slower than `map!` on
sparse cases). Users with NaN in `u` continue to get NaN through
normal arithmetic; the original per-point NaN special-case was for
ForwardDiff propagation through `t/t` not for user-supplied NaN data.

Project.toml: bumps FindFirstFunctions compat from `1.3` to `1.9`.

This PR depends on SciML/FindFirstFunctions.jl#65 (introduces the
`searchsortedlast!` API). Supersedes draft PRs #527 (Linear +
CubicSpline batch evaluators) and partly #528 (the 5 remaining types
not yet ported can follow the same pattern in a follow-up).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants