Adding delta compression to Bitpacking compression #5491

Merged (37 commits), Dec 8, 2022

@samansmink (Contributor) commented Nov 25, 2022

This PR reworks our bitpacking compression implementation. The main improvements are:

  • Add support for FOR-Delta compression (compressing deltas instead of values with FOR)
  • Explicit modes for constant and constant delta compression.
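To illustrate the difference between FOR and FOR-Delta, here is a minimal C++ sketch (hypothetical helper names, not DuckDB's code) that computes the bit width each approach would need for a group of int64_t values. FOR packs each value as an offset from the group minimum; FOR-Delta applies the same step to the differences between consecutive values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bits needed to store v; 0 when v == 0.
inline int BitWidth(uint64_t v) {
    int width = 0;
    while (v != 0) {
        ++width;
        v >>= 1;
    }
    return width;
}

// FOR: store every value as an offset from the group minimum, so the
// bit width is determined by the largest offset (max - min).
inline int ForBitWidth(const std::vector<int64_t> &values) {
    assert(!values.empty());
    auto [mn, mx] = std::minmax_element(values.begin(), values.end());
    // Subtract in unsigned arithmetic so a wide range cannot overflow int64_t.
    return BitWidth(static_cast<uint64_t>(*mx) - static_cast<uint64_t>(*mn));
}

// Delta: replace each value by its difference from the previous one.
// The first value has to be stored separately to be able to decode.
// Assumes no difference overflows int64_t.
inline std::vector<int64_t> Deltas(const std::vector<int64_t> &values) {
    std::vector<int64_t> deltas;
    for (std::size_t i = 1; i < values.size(); ++i) {
        deltas.push_back(values[i] - values[i - 1]);
    }
    return deltas;
}
```

For {1000, 1010, 1020, 1030}, ForBitWidth gives 5 bits per value, while ForBitWidth(Deltas(...)) gives 0 because every delta is 10. That is the case an explicit constant delta mode can store without any packed payload.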

For each group of VECTOR_SIZE values, the implementation chooses the most suitable compression mode.
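Per-group mode selection could look roughly like the sketch below. The Mode names, the order of the checks, and the bit-width comparison are illustrative assumptions, not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mode names mirroring the modes listed in the PR description.
enum class Mode { CONSTANT, CONSTANT_DELTA, DELTA_FOR, FOR };

// Number of bits needed to store v; 0 when v == 0.
inline int BitWidth(uint64_t v) {
    int width = 0;
    for (; v != 0; v >>= 1) {
        ++width;
    }
    return width;
}

// Bits needed for offsets from the minimum (frame of reference).
inline int RangeWidth(int64_t mn, int64_t mx) {
    return BitWidth(static_cast<uint64_t>(mx) - static_cast<uint64_t>(mn));
}

// Picks a mode for one group of values (e.g. VECTOR_SIZE of them).
// Assumes no consecutive difference overflows int64_t.
inline Mode ChooseMode(const std::vector<int64_t> &group) {
    assert(!group.empty());
    auto [mn, mx] = std::minmax_element(group.begin(), group.end());
    if (*mn == *mx) {
        return Mode::CONSTANT;  // also covers single-value groups
    }
    // Here the group has at least two values; track the range of deltas.
    int64_t dmin = group[1] - group[0];
    int64_t dmax = dmin;
    for (std::size_t i = 2; i < group.size(); ++i) {
        const int64_t d = group[i] - group[i - 1];
        dmin = std::min(dmin, d);
        dmax = std::max(dmax, d);
    }
    if (dmin == dmax) {
        return Mode::CONSTANT_DELTA;
    }
    // Prefer delta-FOR only when the deltas need fewer bits than the values.
    return RangeWidth(dmin, dmax) < RangeWidth(*mn, *mx) ? Mode::DELTA_FOR : Mode::FOR;
}
```

A real implementation would more likely compare estimated byte sizes, since delta-FOR also has to store a starting value; comparing bit widths keeps the sketch short.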

In the process, I also improved the NumericLimits class slightly by making its Minimum and Maximum methods constexpr.
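The benefit of constexpr limits can be shown with a hypothetical NumericLimits template that simply delegates to std::numeric_limits (DuckDB's real class may be implemented differently). Once Minimum and Maximum are constexpr, checks built on them can be evaluated at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical sketch, not DuckDB's actual header: constexpr accessors can be
// evaluated at compile time and used in constant expressions.
template <class T>
struct NumericLimits {
    static constexpr T Minimum() { return std::numeric_limits<T>::lowest(); }
    static constexpr T Maximum() { return std::numeric_limits<T>::max(); }
};

static_assert(NumericLimits<int8_t>::Minimum() == -128, "int8_t minimum");
static_assert(NumericLimits<uint16_t>::Maximum() == 65535, "uint16_t maximum");

// Because the limits are constexpr, an overflow check built on them can be
// constexpr too. Neither bound expression below can itself overflow.
template <class T>
constexpr bool SubtractWouldOverflow(T a, T b) {
    static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                  "sketch covers signed integers only");
    return b < 0 ? a > NumericLimits<T>::Maximum() + b
                 : a < NumericLimits<T>::Minimum() + b;
}

static_assert(SubtractWouldOverflow<int32_t>(NumericLimits<int32_t>::Minimum(), 1),
              "INT32_MIN - 1 overflows");
static_assert(!SubtractWouldOverflow<int32_t>(10, 3), "10 - 3 fits");
```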

Compression ratio

TPC-DS SF1 gets about 9.4% smaller, from 317992960 to 288108544 bytes.
TPC-H SF1 gets about 3.1% smaller, from 264777728 to 256651264 bytes.
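The size reductions can be checked with a small helper (the name ReductionPercent is hypothetical) computing (before - after) / before as a percentage:

```cpp
#include <cassert>
#include <cstdint>

// Relative size reduction in percent: (before - after) / before * 100.
inline double ReductionPercent(uint64_t before_bytes, uint64_t after_bytes) {
    assert(before_bytes > 0 && after_bytes <= before_bytes);
    return 100.0 * static_cast<double>(before_bytes - after_bytes) /
           static_cast<double>(before_bytes);
}
```

ReductionPercent(317992960, 288108544) is about 9.4 and ReductionPercent(264777728, 256651264) is about 3.1.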

Compression ratio validation is in test/sql/storage/compression/bitpacking/bitpacking_compression_ratio.test_coverage. The maximum compression ratio is roughly 800x-900x for constant and constant delta compression, while delta_for and for compress up to roughly 58x-60x.
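Why FOR-style modes stay below 64x while constant modes go much higher can be seen with a simple illustrative model. It assumes 64-bit values and a group of 2048 values, and treats per-group metadata as a free parameter; none of this is DuckDB's actual on-disk layout.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative model only (not DuckDB's on-disk layout): compression ratio
// = raw bytes / (bit-packed payload + per-group metadata bytes).
inline double PackedRatio(std::size_t count, std::size_t value_bits,
                          std::size_t packed_bits, std::size_t metadata_bytes) {
    const double raw_bytes = static_cast<double>(count * value_bits) / 8.0;
    // Round the packed payload up to whole bytes.
    const std::size_t payload_bytes = (count * packed_bits + 7) / 8;
    const std::size_t stored_bytes = payload_bytes + metadata_bytes;
    assert(stored_bytes > 0);
    return raw_bytes / static_cast<double>(stored_bytes);
}
```

With one bit per 64-bit value, the payload alone caps the ratio at exactly 64x (PackedRatio(2048, 64, 1, 0)), so any metadata pushes for and delta_for below that, consistent with the ~58x-60x figure. Constant modes have no per-value payload (packed_bits = 0), so only metadata limits them.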

Performance

I ran some benchmarks on my M1 MacBook to validate the implementation.
First, I added benchmarks to benchmark/micro/compression that target the different modes:

| benchmark | uncompressed | current | this_pr | current vs uncompressed | this_pr vs uncompressed |
|---|---|---|---|---|---|
| benchmark/micro/compression/bitpacking/bitpacking_read_constant.benchmark | 0.09 | 0.10 | 0.11 | 7% | 12% |
| benchmark/micro/compression/bitpacking/bitpacking_read_constant_delta.benchmark | 0.10 | 0.10 | 0.11 | 6% | 9% |
| benchmark/micro/compression/bitpacking/bitpacking_read_dfor.benchmark | 0.10 | 0.10 | 0.11 | 2% | 14% |
| benchmark/micro/compression/bitpacking/bitpacking_read_for.benchmark | 0.10 | 0.10 | 0.10 | -0% | 3% |
| benchmark/micro/compression/bitpacking/bitpacking_store_constant.benchmark | 2.83 | 4.21 | 2.91 | 49% | 3% |
| benchmark/micro/compression/bitpacking/bitpacking_store_constant_delta.benchmark | 3.21 | 5.46 | 3.45 | 70% | 7% |
| benchmark/micro/compression/bitpacking/bitpacking_store_dfor.benchmark | 2.85 | 4.10 | 3.68 | 44% | 29% |
| benchmark/micro/compression/bitpacking/bitpacking_store_for.benchmark | 3.12 | 5.44 | 3.62 | 75% | 16% |
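The percentage columns are the slowdown relative to the uncompressed timing (timing units are not stated). A minimal helper reproducing them, with a hypothetical name:

```cpp
#include <cassert>
#include <cmath>

// Slowdown of a timing relative to the uncompressed baseline, in whole percent
// (negative means faster than the baseline).
inline long OverheadPercent(double timing, double baseline) {
    assert(baseline > 0.0);
    return std::lround((timing / baseline - 1.0) * 100.0);
}
```

OverheadPercent(4.21, 2.83) gives 49 and OverheadPercent(2.91, 2.83) gives 3, matching the bitpacking_store_constant row. The table was presumably computed from unrounded timings, so recomputing from the two-decimal values can be off by a point or more (bitpacking_store_for, current: 74 rather than 75). The same formula gives -1 for the TPC-H write benchmark.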

Then the TPC-H write benchmark:

| benchmark | current | this_pr | diff |
|---|---|---|---|
| benchmark/micro/compression/store_tpch_sf1.benchmark | 12.16 | 12.00 | -1% |

Read performance did not change much: bitpacking is already very fast, and adding delta decoding on top barely changes it. Write performance improved considerably (the store overhead versus uncompressed drops from 44%-75% to 3%-29%) because we previously used TrySubtractOperator inefficiently; it has now been changed to constexpr and is called less often.
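One way to call an overflow-checked subtraction less often when computing deltas is to check once per group instead of once per value. The sketch below is an assumption about the general technique, not what this PR does; TrySubtract is a hypothetical stand-in, not DuckDB's TrySubtractOperator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for an overflow-checked subtraction: stores a - b in
// result and returns true, or returns false if it would overflow int64_t.
inline bool TrySubtract(int64_t a, int64_t b, int64_t &result) {
    constexpr int64_t kMin = std::numeric_limits<int64_t>::min();
    constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
    if ((b < 0 && a > kMax + b) || (b > 0 && a < kMin + b)) {
        return false;
    }
    result = a - b;
    return true;
}

// Every consecutive delta group[i] - group[i - 1] lies in [min - max, max - min].
// If max - min fits in int64_t, so does its negation, hence every delta fits:
// one checked subtraction per group instead of one per value.
// The check is conservative: it may reject groups whose actual deltas would fit.
inline bool DeltasFitInInt64(const std::vector<int64_t> &group) {
    if (group.empty()) {
        return true;
    }
    auto [mn, mx] = std::minmax_element(group.begin(), group.end());
    int64_t range = 0;
    return TrySubtract(*mx, *mn, range);
}
```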

Future improvements

Some ideas are left for future improvements; I don't think they're very important:

  • emit constant vectors where possible
  • support delta compression with nulls
  • support delta encoding on unsigned types with values beyond the domain of their corresponding signed type
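The last item implies that delta encoding for unsigned types currently requires every value to fit in the corresponding signed type. Assuming deltas are computed in that signed type, an eligibility check might look like this (hypothetical function name):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical eligibility check: if deltas of an unsigned column are computed
// in the corresponding signed type, every value must fit in that signed type.
template <class U>
bool FitsInSignedDomain(const std::vector<U> &values) {
    static_assert(std::is_unsigned<U>::value, "expects an unsigned integer type");
    using S = std::make_signed_t<U>;
    const U signed_max = static_cast<U>(std::numeric_limits<S>::max());
    return std::all_of(values.begin(), values.end(),
                       [signed_max](U v) { return v <= signed_max; });
}
```

For uint8_t, values up to 127 pass and 128 fails; lifting this restriction is what the future improvement would address.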

@samansmink changed the title from "Bitpacking refactor" to "Adding delta compression to Bitpacking compression" on Nov 25, 2022
@Mytherin changed the base branch from master to feature on November 28, 2022 12:12
@Mytherin (Collaborator) left a comment

Thanks for the PR! Great work and great results! Some comments below:

  • src/main/settings/settings.cpp (outdated, resolved)
  • src/include/duckdb/common/windows.hpp (outdated, resolved)
  • src/include/duckdb/common/limits.hpp (outdated, resolved)
  • src/storage/compression/bitpacking.cpp (outdated, resolved)
  • src/storage/compression/bitpacking.cpp (outdated, resolved)
  • src/storage/compression/bitpacking.cpp (resolved)
@Mytherin merged commit 453ba4e into duckdb:feature on Dec 8, 2022
@Mytherin (Collaborator) commented Dec 8, 2022

Thanks!
