Spine benchmark columnar performance improvements #732

Merged
frankmcsherry merged 5 commits into TimelyDataflow:master-next from frankmcsherry:spines-bakeoff
Apr 29, 2026

@frankmcsherry
Member

This PR re-adds examples/spines.rs (removed in spring cleaning) to compare the in-tree columnar representations with existing val/key idioms. Several scaling glitches were observed, many of them improved, although surely several more remain.

frankmcsherry and others added 5 commits April 26, 2026 14:50
Brings back the spines arrangement bake-off (deleted in TimelyDataflow#724 Spring
cleaning, when it still depended on RHH) with three modes: `key` (OrdKeySpine),
`val` (OrdValSpine with Val=()), and `col` (columnar ValSpine via the
columnar module added in TimelyDataflow#730). All three feed the same Vec-shaped
input collections through one driver loop; `col` repacks via a small
in-dataflow `unary` (`ToRecorded`) that builds `RecordedUpdates`
containers before `arrange_core`.

Bisecting against the example exposed a regression introduced in TimelyDataflow#725:
EditList::load now delegates to populate_key, which seeks the key, checks
it, and rewinds vals on every call. In the merge-join inner loop (join.rs
Ordering::Equal arm), the cursor is already positioned by the upstream
`match trace_key.cmp(&batch_key)` work, so the seek is redundant.
Repeated 1M times in the spines query phase, this added ~3s (+40%
queries time vs the pre-TimelyDataflow#725 baseline).

Restoring EditList::load to its pre-TimelyDataflow#725 division of labor — assume
the cursor is positioned, walk vals inline — recovers performance.
populate_key and replay_key keep the seek for callers that legitimately
need it (reduce, ValueHistory). The Option-based meet API from TimelyDataflow#725
stays.
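
The difference between the two load styles can be sketched with a toy merge
join. This is only an illustration under stated assumptions: `ToyCursor`,
`load_with_seek`, `load_positioned`, and `merge_join` are hypothetical names
invented here, not the differential-dataflow `Cursor` trait or the actual
`EditList` code. The sketch counts how many seeks happen inside the load path
when both cursors already sit on the matching key.

```rust
use std::cmp::Ordering;

// Hypothetical toy cursor over key-sorted (key, vals) pairs. It stands in for
// a trace cursor; it is NOT the differential-dataflow `Cursor` trait.
struct ToyCursor<'a> {
    data: &'a [(u64, Vec<u64>)],
    pos: usize,
    seeks: usize, // number of seek calls, so the two load styles can be compared
}

impl<'a> ToyCursor<'a> {
    fn new(data: &'a [(u64, Vec<u64>)]) -> Self {
        ToyCursor { data, pos: 0, seeks: 0 }
    }
    // Key at the current position, or None once the cursor is exhausted.
    fn key(&self) -> Option<u64> {
        self.data.get(self.pos).map(|(k, _)| *k)
    }
    // Move to the next key.
    fn step(&mut self) {
        self.pos += 1;
    }
    // Advance to the first key >= target.
    fn seek(&mut self, target: u64) {
        self.seeks += 1;
        while self.pos < self.data.len() && self.data[self.pos].0 < target {
            self.pos += 1;
        }
    }
    // Copy out the values at the current position (empty if exhausted).
    fn vals(&self) -> Vec<u64> {
        match self.data.get(self.pos) {
            Some((_, vals)) => vals.clone(),
            None => Vec::new(),
        }
    }
}

// Load that re-seeks and re-checks the key on every call.
fn load_with_seek(cursor: &mut ToyCursor, key: u64) -> Vec<u64> {
    cursor.seek(key);
    if cursor.key() == Some(key) { cursor.vals() } else { Vec::new() }
}

// Load that assumes the caller already positioned the cursor on the key.
fn load_positioned(cursor: &ToyCursor) -> Vec<u64> {
    cursor.vals()
}

// Merge join over two key-sorted inputs. Returns the number of matched value
// pairs and the number of seeks performed inside the load calls.
fn merge_join(
    trace: &[(u64, Vec<u64>)],
    batch: &[(u64, Vec<u64>)],
    reseek: bool,
) -> (usize, usize) {
    let mut t = ToyCursor::new(trace);
    let mut b = ToyCursor::new(batch);
    let mut pairs = 0;
    while let (Some(tk), Some(bk)) = (t.key(), b.key()) {
        match tk.cmp(&bk) {
            Ordering::Less => t.step(),
            Ordering::Greater => b.step(),
            Ordering::Equal => {
                // Both cursors already sit on the key here, so a seek adds no information.
                let tv = if reseek { load_with_seek(&mut t, tk) } else { load_positioned(&t) };
                let bv = if reseek { load_with_seek(&mut b, bk) } else { load_positioned(&b) };
                pairs += tv.len() * bv.len();
                t.step();
                b.step();
            }
        }
    }
    (pairs, t.seeks + b.seeks)
}
```

Both variants produce the same matches; the `reseek` variant additionally
pays two seeks per matching key, which is the kind of per-key overhead the
paragraph above describes.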

Measurements (1M keys, 1000 size, key mode):
- v0.23.0 baseline: 6.56s queries
- pre-TimelyDataflow#725 (f4e7550): 7.16s queries
- master HEAD before this commit: 10.12s queries
- this commit: 7.00s queries
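
As a rough check, the percentages implied by these timings can be recomputed
from the listed numbers. The helpers below are hypothetical names used only
for this arithmetic, and the per-call figure assumes the entire slowdown is
spread evenly over the 1M calls mentioned earlier, which the measurements do
not by themselves establish.

```rust
// Relative change from `before` to `after`, as a fraction (0.10 means +10%).
fn relative_change(before: f64, after: f64) -> f64 {
    (after - before) / before
}

// Average extra cost per call in microseconds, assuming `extra_secs` is
// spread evenly over `calls` calls (an assumption, not a measured value).
fn per_call_micros(extra_secs: f64, calls: u64) -> f64 {
    extra_secs * 1e6 / calls as f64
}

// Timings copied from the list above, in seconds of queries time.
const BASELINE_V0_23_0: f64 = 6.56;
const PRE_725: f64 = 7.16;
const HEAD_BEFORE: f64 = 10.12;
const THIS_COMMIT: f64 = 7.00;

// Recomputes the derived figures as
// (regression vs pre-#725, recovery vs HEAD, gap to v0.23.0, extra microseconds per call).
fn summarize() -> (f64, f64, f64, f64) {
    (
        relative_change(PRE_725, HEAD_BEFORE),
        relative_change(HEAD_BEFORE, THIS_COMMIT),
        relative_change(BASELINE_V0_23_0, THIS_COMMIT),
        per_call_micros(HEAD_BEFORE - PRE_725, 1_000_000),
    )
}
```

With these inputs the regression comes out at roughly +41% (10.12s vs 7.16s,
a 2.96s gap), consistent with the "~3s (+40%)" figure, and this commit is
roughly 31% faster than the HEAD it replaces while remaining about 7% slower
than the v0.23.0 baseline.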

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@frankmcsherry frankmcsherry changed the title Spines bakeoff Spine benchmark columnar performance improvements Apr 29, 2026
@frankmcsherry frankmcsherry merged commit 7bc8e81 into TimelyDataflow:master-next Apr 29, 2026
6 checks passed
@frankmcsherry frankmcsherry deleted the spines-bakeoff branch April 29, 2026 18:15