Chunk navigation capability #778
Merged
…ility

Chunk is now storage plus transducers plus Time/len/TARGET, with no key/val opinions. Navigation moves to NavigableChunk: Chunk + Navigable, which carries bounds(); VecChunk and ColChunk implement it.

ChunkBatch drops its cached boundary columns, and the straddle cursor builds that resident index at construction instead, so Navigable for ChunkBatch<C> gates on C: NavigableChunk. ChunkSpine<C> now forms, merges, and settles for a cursor-less C; cursor-driven consumption is what demands the capability.

chunk_bench's scan and driver bounds tighten to NavigableChunk accordingly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
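The capability split described in this commit can be pictured with a small sketch. Everything below is a simplified, hypothetical stand-in: the trait names echo the commit message, but the method signatures, the `OpaqueChunk` type, and the `total_len` and `key_range` helpers are assumptions made for illustration and are not the crate's actual API.

```rust
// Hypothetical, simplified stand-ins for the traits named in the commit
// message. The real differential-dataflow signatures are not reproduced here.

/// Storage-only capability: no opinions about keys or navigation.
trait Chunk {
    fn len(&self) -> usize;
}

/// Navigation capability layered on top of `Chunk`.
trait NavigableChunk: Chunk {
    type Key: Ord;
    /// Smallest and largest key in the chunk, or `None` if it is empty.
    fn bounds(&self) -> Option<(&Self::Key, &Self::Key)>;
}

/// A chunk that holds sorted keys and can therefore be navigated.
struct VecChunk {
    keys: Vec<u64>,
}

impl Chunk for VecChunk {
    fn len(&self) -> usize {
        self.keys.len()
    }
}

impl NavigableChunk for VecChunk {
    type Key = u64;
    fn bounds(&self) -> Option<(&u64, &u64)> {
        Some((self.keys.first()?, self.keys.last()?))
    }
}

/// A chunk with storage but no navigation (an illustrative cursor-less chunk).
struct OpaqueChunk {
    bytes: Vec<u8>,
}

impl Chunk for OpaqueChunk {
    fn len(&self) -> usize {
        self.bytes.len()
    }
}

struct ChunkBatch<C> {
    chunks: Vec<C>,
}

// Available for every chunk type, navigable or not.
impl<C: Chunk> ChunkBatch<C> {
    fn total_len(&self) -> usize {
        self.chunks.iter().map(|c| c.len()).sum()
    }
}

// Only available when the chunk type opts in to navigation.
impl<C: NavigableChunk> ChunkBatch<C> {
    fn key_range(&self) -> Option<(&C::Key, &C::Key)> {
        let lower = self.chunks.first()?.bounds()?.0;
        let upper = self.chunks.last()?.bounds()?.1;
        Some((lower, upper))
    }
}
```

In this sketch a `ChunkBatch<OpaqueChunk>` can still report `total_len()`, while calling `key_range()` on it fails to compile, which is the kind of gating the commit describes for `Navigable for ChunkBatch<C>`.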
… index

Building the boundary index at cursor construction cost O(chunks) before the first seek: more than the walk itself for sparse lookups, and pure waste for sequential passes, which visit every chunk anyway.

Every navigation question is answerable from the chunks' resident bounds() directly: seeks binary-search them (log(chunks) reads) and boundary spill checks read the two neighbouring chunks. Cursor construction and ChunkBatch::new are now free of per-chunk work entirely.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
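As a rough illustration of answering navigation questions from resident bounds alone, the sketch below models each chunk by a hypothetical `(lower, upper)` pair of `u64` keys. The function names `find_chunk` and `key_spills_forward` are invented for this example and do not come from the pull request.

```rust
/// Index of the first chunk whose upper bound is >= `target`, or
/// `bounds.len()` if every chunk ends before `target`.
/// Assumes chunks are non-empty and sorted, so upper bounds are non-decreasing.
fn find_chunk(bounds: &[(u64, u64)], target: u64) -> usize {
    // Binary search: O(log(chunks)) reads of the per-chunk bounds.
    bounds.partition_point(|&(_, upper)| upper < target)
}

/// Whether the last key of chunk `i` continues into chunk `i + 1`.
/// Only the two neighbouring chunks' bounds are read.
fn key_spills_forward(bounds: &[(u64, u64)], i: usize) -> bool {
    match bounds.get(i + 1) {
        Some(&(next_lower, _)) => bounds[i].1 == next_lower,
        None => false,
    }
}
```

Neither function needs an index built ahead of time, so in this model constructing a cursor does no per-chunk work.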
`seek_key` binary-searched the chunks' resident bounds over the whole `[0, chunks)` range on every call. Seeks within a cursor pass are monotone forward, so start from the current key's first chunk (`key_chunk`) and gallop, falling back to a full search only on a backward seek. A monotone sweep now costs `O(log Δ)` bounds reads per seek rather than `O(log chunks)`.

Isolated, the chunk-selection cost drops several-fold on large batches (up to ~19x at ~1000 chunks) and regresses only by a negligible constant on very sparse probe sets.

No new state: the batch carries no index, and the hint is the `key_chunk` the cursor already tracks.

Adds a test asserting `seek_key` lands at the first key `>= target` for every (start, target) pair, forward and backward, so the hint never changes the result.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
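A minimal sketch of a hinted, galloping chunk search in the spirit of this commit follows. It is not the pull request's `seek_key`: chunks are modelled only by a sorted slice of hypothetical `u64` upper bounds, and the function name `seek_chunk` and its signature are assumptions.

```rust
/// First index `i` with `uppers[i] >= target`, or `uppers.len()` if none.
/// `hint` is the chunk the caller is currently positioned on.
/// Assumes `uppers` is sorted in non-decreasing order.
fn seek_chunk(uppers: &[u64], hint: usize, target: u64) -> usize {
    let n = uppers.len();
    // Backward seek or out-of-range hint: fall back to a full binary search.
    if hint >= n || (hint > 0 && uppers[hint - 1] >= target) {
        return uppers.partition_point(|&u| u < target);
    }
    // Every chunk before `hint` ends before `target`; check the hint itself.
    if uppers[hint] >= target {
        return hint;
    }
    // Gallop forward with doubling steps, keeping `uppers[lo] < target`.
    let mut lo = hint;
    let mut step = 1;
    while lo + step < n && uppers[lo + step] < target {
        lo += step;
        step *= 2;
    }
    // The answer lies in (lo, hi]; finish with a binary search in that window.
    let hi = (lo + step).min(n);
    lo + 1 + uppers[lo + 1..hi].partition_point(|&u| u < target)
}
```

Galloping bounds the final window by roughly twice the distance travelled, so a forward seek that moves Δ chunks reads `O(log Δ)` bounds. The result matches a full binary search for any hint, which mirrors the property the commit's added test checks.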
frankmcsherry added a commit to frankmcsherry/differential-dataflow that referenced this pull request Jul 3, 2026
TimelyDataflow#778 Chunk split)

A columnar differential-dataflow backend for DDIR whose native representation is corgi Value columns and whose scalar logic is corgi `eval_graph`, parallel to `backend::vec`.

Arrangement:
- `CorgiChunk : Chunk` (NOT `NavigableChunk`): corgi columns + Vec time/diff, ordered (key,val) by corgi structural order then time. merge/extract/advance/settle ported from the VecChunk reference; drives corgi's discrimination sort (sort_perm) + batched compare_idx, never per-pair. Rides TimelyDataflow#778's cursor-less Chunk path, so it gets the fueled/graded ChunkBatchMerger for free.
- Tactics (Route B): cursor-less reduce (incremental key selection over the input delta); join via corgi `find` (find_ranges → equal-range merge-join, multi-record). as_collection reads columns directly.

differential-dataflow proper: widen the JoinTactic/ReduceTactic/Fresh/join_with_tactic/reduce_with_tactic seam from pub(crate) to pub, so an out-of-crate tactic can be implemented at all. (Candidate for upstreaming as the public extension point the TimelyDataflow#773 tactics were designed to be.)

State: all 6 canonical programs match `vec`; 33 lib tests pass. reach ~1.9× slower than vec, compute-bound linear ~3× FASTER (columnar eval avoids the row backend's ~22% Value::cmp pointer-chasing). The remaining reach gap is entirely the reduce, still row-wise, the only operator not yet columnar.

Notes / follow-ups: corgi_arrange.rs (old CorgiBatch impl) is superseded by corgi_chunk.rs and retained only as reference; scratch examples alongside the corgi_perf/corgi_progs/corgi_prof harnesses.

Depends on frankmcsherry/wip corgi branch dd-arrange-api.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Move the `Cursor` requirement from `Chunk` to a new trait that can be independently implemented, removing the requirement from chunk implementors.