[Improvement] add zlib_crc32_fixed#60735
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
run buildall |
There was a problem hiding this comment.
Pull request overview
This PR introduces an optimized zlib_crc32_fixed function to improve CRC32 hash computation performance for fixed-size types. The implementation uses the Slicing-by-4 algorithm, which processes 4 bytes at a time using precomputed lookup tables, achieving a significant performance improvement from 269ms to 62ms for hash computation.
Changes:
- Adds
zlib_crc32_fixedtemplate function using Slicing-by-4 algorithm for 1, 2, 4, and 8-byte types, with fallback to zlib for larger types - Implements constexpr
CRC32SliceBy4Tablefor compile-time lookup table generation - Refactors column hashing in
ColumnVectorandColumnDecimalto use the optimized function - Provides comprehensive test coverage for the new function across various data types and edge cases
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| be/src/util/hash_util.hpp | Adds CRC32SliceBy4Table struct and zlib_crc32_fixed template function with Slicing-by-4 optimization; updates zlib_crc_hash_null to use the new function |
| be/src/vec/columns/column_vector.h | Declares _zlib_crc32_hash helper method |
| be/src/vec/columns/column_vector.cpp | Refactors update_crcs_with_value to use new _zlib_crc32_hash helper method; implements _zlib_crc32_hash with special handling for date/datetime types |
| be/src/vec/columns/column_decimal.cpp | Replaces direct zlib_crc_hash calls with zlib_crc32_fixed for non-DECIMALV2 types |
| be/test/util/crc32c_test.cpp | Adds comprehensive tests for zlib_crc32_fixed covering all data types, incremental hashing, and edge cases; includes commented-out tests for future crc32c_fixed improvements |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
run buildall |
|
run buildallrun buildall |
What problem does this PR solve?
zlib_crc32: SplitBlockHashComputeTime: 269.15ms
crc32c_fixed: SplitBlockHashComputeTime: 33.258ms
zlib_crc32_fixed: SplitBlockHashComputeTime: 61.941ms
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)