Speed up DeltaBitPackDecoder
#1281
Labels: enhancement (any new improvement worthy of an entry in the changelog), parquet (changes to the parquet crate), performance
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The current DeltaBitPackDecoder implementation decodes each value separately with a call to BitReader::get_unaligned. This prevents it from benefiting from the vectorized unpack32 that is used by RLEDecoder, and it makes the inner loop body complex. It also seems to do a fair amount of memory allocation on the hot path, which adds further overhead.
Describe the solution you'd like
The DeltaBitPackDecoder should be faster than RLEDecoder and theoretically should be even faster than PlainDecoder, since the algorithm is specifically designed to vectorize well. Most of the logic to achieve this already exists; it is just a matter of hooking it up.
One small detail is that unpack32 can only handle a bit width less than 32. This should be a rare degenerate case, as the format stores deltas rather than values, but the logic will need to fall back in such a case.
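To make the proposal a little more concrete, here is a minimal, self-contained sketch of the shape this could take. It is not the parquet crate's code: `unpack_batched`, `read_bits`, `decode_miniblock` and `UnpackPath` are hypothetical names, `unpack_batched` is only a scalar stand-in for a vectorized kernel such as unpack32, and the sketch assumes LSB-first bit packing and at most 32 values per miniblock. The point is the structure: unpack a whole miniblock of deltas into a stack buffer in one call, fall back to per-value reads only when the bit width is 32 or more, then rebuild the values with a running sum.

```rust
/// Which unpacking path handled a miniblock (present only to make the sketch testable).
#[derive(Debug, PartialEq)]
enum UnpackPath {
    Batched,
    PerValue,
}

/// Hypothetical stand-in for a batched kernel such as unpack32.
/// Unpacks `out.len()` LSB-first values of `bit_width` (< 32) bits each.
fn unpack_batched(packed: &[u8], bit_width: u32, out: &mut [u64]) {
    debug_assert!(bit_width < 32);
    let mask = (1u64 << bit_width) - 1;
    for (i, slot) in out.iter_mut().enumerate() {
        let bit_pos = i * bit_width as usize;
        // Load up to 8 bytes starting at the byte that holds the first bit.
        let mut word = 0u64;
        for (k, b) in packed[bit_pos / 8..].iter().take(8).enumerate() {
            word |= (*b as u64) << (8 * k);
        }
        *slot = (word >> (bit_pos % 8)) & mask;
    }
}

/// Slow per-value path: reads `bit_width` (<= 64) bits starting at `bit_pos`.
fn read_bits(packed: &[u8], bit_pos: usize, bit_width: u32) -> u64 {
    let mut value = 0u64;
    for i in 0..bit_width as usize {
        let pos = bit_pos + i;
        let bit = (packed[pos / 8] >> (pos % 8)) & 1;
        value |= (bit as u64) << i;
    }
    value
}

/// Decodes one miniblock: value[i] = value[i - 1] + min_delta + delta[i].
fn decode_miniblock(
    packed: &[u8],
    bit_width: u32,
    min_delta: i64,
    last_value: &mut i64,
    out: &mut [i64],
) -> UnpackPath {
    assert!(bit_width <= 64 && out.len() <= 32);
    // Stack scratch buffer, so nothing is allocated on the hot path.
    let mut scratch = [0u64; 32];
    let deltas = &mut scratch[..out.len()];

    let path = if bit_width < 32 {
        unpack_batched(packed, bit_width, deltas);
        UnpackPath::Batched
    } else {
        // Rare degenerate case: fall back to reading one value at a time.
        for (i, d) in deltas.iter_mut().enumerate() {
            *d = read_bits(packed, i * bit_width as usize, bit_width);
        }
        UnpackPath::PerValue
    };

    // Running sum over the unpacked deltas; wrapping arithmetic avoids overflow panics.
    for (dst, d) in out.iter_mut().zip(deltas.iter()) {
        *last_value = last_value.wrapping_add(min_delta).wrapping_add(*d as i64);
        *dst = *last_value;
    }
    path
}
```

With this split, the inner loop is a plain add over a fixed-size buffer, which a compiler has a much better chance of vectorizing than a loop that interleaves bit reads with arithmetic.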