-
Notifications
You must be signed in to change notification settings - Fork 3.7k
add zlib_crc32_fixed #60650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
add zlib_crc32_fixed #60650
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a fixed-size, zlib-compatible CRC32 hashing fast path and wires it into vectorized column CRC updates to reduce hashing overhead for common primitive widths.
Changes:
- Add
HashUtil::zlib_crc32_fixed()implementing a slicing-by-4 CRC32 for 1/2/4/8-byte values (fallback to zlib for other sizes). - Refactor
ColumnVector::update_crcs_with_value()to route through a new_zlib_crc32_hash()helper and usezlib_crc32_fixed()for non-date/datetime types. - Adjust
memcpy_fixed()to usememcpyandstd::assume_alignedfor aligned copies.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| be/src/vec/common/memcpy_small.h | Switch fixed-size copy helper to memcpy (with optional alignment hints). |
| be/src/vec/columns/column_vector.h | Declare new _zlib_crc32_hash() helper used by CRC batch updates. |
| be/src/vec/columns/column_vector.cpp | Refactor CRC batch update logic to use _zlib_crc32_hash() and zlib_crc32_fixed(). |
| be/src/util/hash_util.hpp | Add zlib_crc32_fixed() slicing-by-4 implementation and use it for null hashing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
be/src/util/hash_util.hpp
Outdated
| template <typename T> | ||
| static uint32_t zlib_crc32_fixed(const T& value, uint32_t hash) { | ||
| // Slicing-by-4 table: t[0] is the standard byte-at-a-time table, | ||
| // t[1..3] are extended tables for parallel 4-byte processing. | ||
| struct CRC32SliceBy4Table { | ||
| uint32_t t[4][256] {}; |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zlib_crc32_fixed is a function template and defines static constexpr CRC32SliceBy4Table tbl inside the template. This will instantiate (and potentially emit) a separate 4x256 CRC table for every distinct T used across the codebase, increasing compile time and binary size. Consider moving the slice-by-4 table to a non-templated shared singleton (e.g., an inline/constinit static in HashUtil, or a file-scope table) and have the template reuse it.
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
fix fix
92e9548 to
5fbf3d4
Compare
|
run buildall |
|
run buildall |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)