feat(parquet): batch RLE runs in level encoder via scan-ahead#9830
feat(parquet): batch RLE runs in level encoder via scan-ahead#9830alamb merged 2 commits intoapache:mainfrom
Conversation
Add `is_accumulating_rle()` and `extend_run()` methods to `RleEncoder` that allow callers to detect when the encoder is in RLE accumulation mode and bulk-extend runs without per-element overhead. Upgrade `put_with_observer()` in `LevelEncoder` to exploit this: after each `put()`, it checks whether the encoder entered accumulation mode. If so, it scans ahead for the rest of the run, calls `extend_run()` to batch it in O(1), and fires the observer once with the full run length. This turns the previous O(n) per-value encoding + observation into O(1) amortized per RLE run, which is a significant improvement for sparse columns where long runs of identical levels are common. Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
|
@alamb this one is short and sweet, and should yield some nice perf wins |
|
run benchmarks arrow_reader arrow_reader_clickbench |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing batch_rle_run (415b496) to 4fa8d2f (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing batch_rle_run (415b496) to 4fa8d2f (merge-base) diff File an issue against this benchmark runner |
alamb
left a comment
There was a problem hiding this comment.
Looks good to me -- thank you @HippoBaro
FYI @etseidl and @jhorstmann
| bit_packed_max_size.max(rle_max_size) | ||
| } | ||
|
|
||
| /// Returns `true` if the encoder is currently in RLE accumulation mode |
There was a problem hiding this comment.
I am always a bit confused about what part of the parquet module is (actually) part of its public API. For anyone else that is also confused:
RleEncoder is only reachable through the encodings module, and parquet/src/lib.rs:144 makes that module public only when the experimental feature is enabled. In the default build it is private, and even when exposed it is explicitly marked as having “no stability guarantees.” The docs.rs crate page for parquet 58.0.0 also omits enc odings from the normal public API surface, which is consistent with that setup: https://docs.rs/parquet/58.0.0/parquet/ and the feature list notes experimental as unstable: https://docs.rs/crate/parquet/latest
There was a problem hiding this comment.
The difference between public and public-experimental always trips me up!
|
I'll wait for benchmark confirmation before merging this one |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
etseidl
left a comment
There was a problem hiding this comment.
Looks like a win to me. 😄
|
@alamb you ran the benchmarks for the reader path, but this code relates to the write path 😅 |
|
run benchmark arrow_writer |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing batch_rle_run (415b496) to 4fa8d2f (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
That's more like it! :noice: |
Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
|
I did do a double check on some of the string regressions and they disappeared on my workstation. |
|
As I am still waiting on a few PRs before I cut the 58.2.0 release I figured I'll just merge this one in to make the release that much better |
Which issue does this PR close?
Rationale for this change
See #9731
What changes are included in this PR?
Add
is_accumulating_rle()andextend_run()methods toRleEncoderthat allow callers to detect when the encoder is in RLE accumulation mode and bulk-extend runs without per-element overhead.Upgrade
put_with_observer()inLevelEncoderto exploit this: after eachput(), it checks whether the encoder entered accumulation mode. If so, it scans ahead for the rest of the run, callsextend_run()to batch it in O(1), and fires the observer once with the full run length.This turns the previous O(n) per-value encoding + observation into O(1) amortized per RLE run, which is a significant improvement for sparse columns where long runs of identical levels are common.
Are these changes tested?
All tests passing + added coverage around RLE accumulation mode trigger.
Are there any user-facing changes?
None.