Chunk navigation capability #778

Merged: frankmcsherry merged 3 commits into master-next from chunk-navigation-capability on Jul 2, 2026
@frankmcsherry


Move the Cursor requirement from Chunk to a new trait that can be independently implemented, removing the requirement from chunk implementors.

frankmcsherry and others added 2 commits July 1, 2026 22:26
…ility

Chunk is now storage plus transducers plus Time/len/TARGET, with no
key/val opinions. Navigation moves to NavigableChunk: Chunk + Navigable,
which carries bounds(); VecChunk and ColChunk implement it.

ChunkBatch drops its cached boundary columns, and the straddle cursor
builds that resident index at construction instead, so Navigable for
ChunkBatch<C> gates on C: NavigableChunk. ChunkSpine<C> now forms,
merges, and settles for a cursor-less C; cursor-driven consumption is
what demands the capability. chunk_bench's scan and driver bounds
tighten to NavigableChunk accordingly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
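
To illustrate the split described in the commit above, here is a minimal sketch of how a storage-only trait and an opt-in navigation capability can be layered. The trait bodies, the `seek` method, the `u32` keys, and the `OpaqueChunk`, `SortedChunk`, `total_len`, and `lowest_key` names are all assumptions made for illustration; only the names `Chunk`, `Navigable`, `NavigableChunk`, `Time`, `len`, `TARGET`, and `bounds()` come from the commit message, and their real signatures may differ.

```rust
/// Storage-only trait with no key/val opinions (hypothetical, simplified shape).
trait Chunk {
    type Time;
    const TARGET: usize;
    fn len(&self) -> usize;
}

/// Cursor-style navigation (hypothetical, simplified shape).
trait Navigable {
    type Key: Ord;
    /// Index of the first record whose key is `>= key`.
    fn seek(&self, key: &Self::Key) -> usize;
}

/// The opt-in capability: a chunk that can also be navigated.
trait NavigableChunk: Chunk + Navigable {
    /// First and last key held, or `None` for an empty chunk.
    fn bounds(&self) -> Option<(&Self::Key, &Self::Key)>;
}

/// A cursor-less chunk: implements `Chunk` only.
struct OpaqueChunk {
    records: usize,
}

impl Chunk for OpaqueChunk {
    type Time = u64;
    const TARGET: usize = 1024;
    fn len(&self) -> usize {
        self.records
    }
}

/// A sorted chunk that opts in to navigation.
struct SortedChunk {
    keys: Vec<u32>,
}

impl Chunk for SortedChunk {
    type Time = u64;
    const TARGET: usize = 1024;
    fn len(&self) -> usize {
        self.keys.len()
    }
}

impl Navigable for SortedChunk {
    type Key = u32;
    fn seek(&self, key: &u32) -> usize {
        self.keys.partition_point(|k| k < key)
    }
}

impl NavigableChunk for SortedChunk {
    fn bounds(&self) -> Option<(&u32, &u32)> {
        Some((self.keys.first()?, self.keys.last()?))
    }
}

/// Maintenance-style work needs only `Chunk`.
fn total_len<C: Chunk>(chunks: &[C]) -> usize {
    chunks.iter().map(|c| c.len()).sum()
}

/// Cursor-driven consumption is what demands the capability.
fn lowest_key<C: NavigableChunk>(chunks: &[C]) -> Option<&C::Key> {
    chunks.iter().filter_map(|c| c.bounds()).map(|(lo, _)| lo).min()
}
```

In this sketch `total_len` accepts both chunk types, while `lowest_key` compiles only for `SortedChunk`, which mirrors the idea that forming and merging work for a cursor-less `C` and only cursor use requires `NavigableChunk`.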
… index

Building the boundary index at cursor construction cost O(chunks) before
the first seek — more than the walk itself for sparse lookups, and pure
waste for sequential passes, which visit every chunk anyway. Every
navigation question is answerable from the chunks' resident bounds()
directly: seeks binary-search them (log(chunks) reads) and boundary
spill checks read the two neighbouring chunks. Cursor construction and
ChunkBatch::new are now free of per-chunk work entirely.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
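
The index-free navigation described in the commit above can be sketched roughly as follows. This assumes each chunk exposes an inclusive `(lower, upper)` key bound, that chunks are ordered by key, and that a key may straddle two adjacent chunks; `find_chunk` and `spills_forward` are hypothetical helper names, and `u32` keys stand in for whatever key type the real chunks carry.

```rust
/// Index of the first chunk that may hold a key `>= target`,
/// or `bounds.len()` if every chunk ends before `target`.
/// Reads O(log(chunks)) bounds and needs no precomputed index.
fn find_chunk(bounds: &[(u32, u32)], target: u32) -> usize {
    bounds.partition_point(|&(_, hi)| hi < target)
}

/// Whether the last key of chunk `i` continues into chunk `i + 1`.
/// Only the two neighbouring chunks' bounds are read.
fn spills_forward(bounds: &[(u32, u32)], i: usize) -> bool {
    match (bounds.get(i), bounds.get(i + 1)) {
        (Some(&(_, hi)), Some(&(lo, _))) => hi == lo,
        _ => false,
    }
}
```

With bounds `[(1, 4), (4, 9), (12, 20)]`, a seek to key 4 lands in chunk 0, and the spill check reports that key 4 continues into chunk 1.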
@frankmcsherry frankmcsherry marked this pull request as ready for review July 2, 2026 15:52
`seek_key` binary-searched the chunks' resident bounds over the whole
`[0, chunks)` range on every call. Seeks within a cursor pass are
monotone forward, so start from the current key's first chunk
(`key_chunk`) and gallop, falling back to a full search only on a
backward seek. A monotone sweep now costs `O(log Δ)` bounds reads per
seek rather than `O(log chunks)`.

Isolated, the chunk-selection cost drops several-fold on large batches
(up to ~19x at ~1000 chunks) and regresses only by a negligible
constant on very sparse probe sets. No new state: the batch carries no
index, and the hint is the `key_chunk` the cursor already tracks.

Adds a test asserting `seek_key` lands at the first key `>= target` for
every (start, target) pair, forward and backward, so the hint never
changes the result.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
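
A hinted, galloping chunk selection along the lines of the commit above might look like the sketch below. It is not the PR's code: `seek_chunk` is a hypothetical name, `his[i]` is assumed to be chunk `i`'s upper key bound in non-decreasing order, and `hint` plays the role of the `key_chunk` position the cursor already tracks.

```rust
/// First chunk index whose upper bound is `>= target`, starting from `hint`.
/// Forward seeks gallop in doubling steps, so they read O(log Δ) bounds,
/// where Δ is the distance from `hint` to the answer.
fn seek_chunk(his: &[u32], hint: usize, target: u32) -> usize {
    let hint = hint.min(his.len());

    // Backward seek: a chunk before the hint already reaches `target`,
    // so fall back to a full binary search.
    if hint > 0 && his[hint - 1] >= target {
        return his.partition_point(|&hi| hi < target);
    }

    // Invariant: every chunk before `lo` has an upper bound `< target`.
    let mut lo = hint;
    let mut step = 1;
    while lo + step <= his.len() && his[lo + step - 1] < target {
        lo += step;
        step *= 2;
    }

    // The answer lies in `lo..=end`; finish with a bounded binary search.
    let end = (lo + step).min(his.len());
    lo + his[lo..end].partition_point(|&hi| hi < target)
}
```

The result is the same as a full `partition_point` over all bounds for any hint, so the hint affects only the cost and never the chunk selected.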
@frankmcsherry frankmcsherry merged commit 79911ec into master-next Jul 2, 2026
6 checks passed
@frankmcsherry frankmcsherry deleted the chunk-navigation-capability branch July 2, 2026 21:00
frankmcsherry added a commit to frankmcsherry/differential-dataflow that referenced this pull request Jul 3, 2026
TimelyDataflow#778 Chunk split)

A columnar differential-dataflow backend for DDIR whose native representation
is corgi Value columns and whose scalar logic is corgi `eval_graph`, parallel
to `backend::vec`.

Arrangement:
  - CorgiChunk : Chunk (NOT NavigableChunk) — corgi columns + Vec time/diff,
    ordered (key,val) by corgi structural order then time. merge/extract/advance/
    settle ported from the VecChunk reference; drives corgi's discrimination sort
    (sort_perm) + batched compare_idx, never per-pair. Rides TimelyDataflow#778's cursor-less
    Chunk path, so it gets the fueled/graded ChunkBatchMerger for free.
  - Tactics (Route B): cursor-less reduce (incremental key selection over the
    input delta); join via corgi `find` (find_ranges → equal-range merge-join,
    multi-record). as_collection reads columns directly.
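
For readers unfamiliar with the equal-range merge-join mentioned in the Tactics item, here is a generic sketch over two key-sorted slices. It does not use corgi's `find` or `find_ranges`, whose signatures are not shown here; `merge_join` is a hypothetical function that only illustrates the technique of locating each run of equal keys on both sides and emitting their cross product.

```rust
use std::cmp::Ordering;

/// Merge-join two key-sorted slices, emitting the cross product of
/// each pair of equal-key ranges (so multiple records per key are handled).
fn merge_join<'a, K: Ord, A, B>(
    left: &'a [(K, A)],
    right: &'a [(K, B)],
) -> Vec<(&'a K, &'a A, &'a B)> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        match left[i].0.cmp(&right[j].0) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                let key = &left[i].0;
                // Find the end of the equal-key run on each side.
                let i_end = i + left[i..].partition_point(|(k, _)| k == key);
                let j_end = j + right[j..].partition_point(|(k, _)| k == key);
                for (_, a) in &left[i..i_end] {
                    for (_, b) in &right[j..j_end] {
                        out.push((key, a, b));
                    }
                }
                i = i_end;
                j = j_end;
            }
        }
    }
    out
}
```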

differential-dataflow proper: widen the JoinTactic/ReduceTactic/Fresh/
join_with_tactic/reduce_with_tactic seam from pub(crate) to pub, so an
out-of-crate tactic can be implemented at all. (Candidate for upstreaming as
the public extension point the TimelyDataflow#773 tactics were designed to be.)

State: all 6 canonical programs match `vec`; 33 lib tests pass. reach is ~1.9×
slower than vec, while compute-bound linear is ~3× faster (columnar eval avoids the
row backend's ~22% Value::cmp pointer-chasing). The remaining reach gap is
entirely the reduce, still row-wise — the only operator not yet columnar.

Notes / follow-ups: corgi_arrange.rs (old CorgiBatch impl) is superseded by
corgi_chunk.rs and retained only as reference; scratch examples alongside the
corgi_perf/corgi_progs/corgi_prof harnesses. Depends on frankmcsherry/wip
corgi branch dd-arrange-api.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>