@BiteTheDDDDt
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings February 10, 2026 10:19
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Contributor Author

run buildall

Copilot AI left a comment

Pull request overview

This PR introduces a fixed-size, zlib-compatible CRC32 hashing fast path and wires it into vectorized column CRC updates to reduce hashing overhead for common primitive widths.

Changes:

  • Add HashUtil::zlib_crc32_fixed() implementing a slicing-by-4 CRC32 for 1/2/4/8-byte values (fallback to zlib for other sizes).
  • Refactor ColumnVector::update_crcs_with_value() to route through a new _zlib_crc32_hash() helper and use zlib_crc32_fixed() for non-date/datetime types.
  • Adjust memcpy_fixed() to use memcpy and std::assume_aligned for aligned copies.
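
To make the slicing-by-4 idea concrete, here is a minimal, self-contained sketch of a zlib-compatible CRC32 over fixed-size values. It is not the PR's code: `Crc32Tables`, `make_crc32_tables`, `kCrc32Tables`, `crc32_bytes`, and `crc32_fixed` are hypothetical names, and the sketch does not link zlib, so the "fallback to zlib for other sizes" is replaced here by a plain byte-at-a-time tail loop. It assumes the reflected polynomial 0xEDB88320 and the pre/post inversion that zlib's `crc32()` applies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical names; an illustrative sketch, not the code from this PR.
struct Crc32Tables {
    uint32_t t[4][256];
};

// Build the reflected CRC-32 tables (polynomial 0xEDB88320, as used by zlib).
constexpr Crc32Tables make_crc32_tables() {
    Crc32Tables tb{};
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; ++bit) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        tb.t[0][i] = c;  // standard byte-at-a-time table
    }
    // t[k][i] is t[0][i] advanced through k further zero bytes.
    for (int k = 1; k < 4; ++k) {
        for (uint32_t i = 0; i < 256; ++i) {
            uint32_t prev = tb.t[k - 1][i];
            tb.t[k][i] = (prev >> 8) ^ tb.t[0][prev & 0xFFu];
        }
    }
    return tb;
}

constexpr Crc32Tables kCrc32Tables = make_crc32_tables();

// CRC32 with zlib semantics: invert on entry, invert on exit.
inline uint32_t crc32_bytes(uint32_t crc, const unsigned char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    crc = ~crc;
    // Slicing-by-4: fold four input bytes per step using four table lookups.
    while (n >= 4) {
        // Assemble the word byte by byte so the result does not depend on host endianness.
        uint32_t w = uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
                     (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
        crc ^= w;
        crc = kCrc32Tables.t[3][crc & 0xFFu] ^ kCrc32Tables.t[2][(crc >> 8) & 0xFFu] ^
              kCrc32Tables.t[1][(crc >> 16) & 0xFFu] ^ kCrc32Tables.t[0][crc >> 24];
        p += 4;
        n -= 4;
    }
    // Remaining 0..3 bytes: classic one-byte-at-a-time update.
    for (; n > 0; --n) {
        crc = kCrc32Tables.t[0][(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
    }
    return ~crc;
}

// Hash the object representation of a fixed-size value, continuing from `seed`.
template <typename T>
uint32_t crc32_fixed(const T& value, uint32_t seed) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    unsigned char buf[sizeof(T)];
    std::memcpy(buf, &value, sizeof(T));
    return crc32_bytes(seed, buf, sizeof(T));
}
```

Because `sizeof(T)` is a compile-time constant, a compiler can typically unroll both loops for 1/2/4/8-byte types, which is presumably where the benefit of a fixed-size path comes from. Note that hashing the object representation makes the result depend on host byte order for multi-byte types.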

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| be/src/vec/common/memcpy_small.h | Switch fixed-size copy helper to memcpy (with optional alignment hints). |
| be/src/vec/columns/column_vector.h | Declare new _zlib_crc32_hash() helper used by CRC batch updates. |
| be/src/vec/columns/column_vector.cpp | Refactor CRC batch update logic to use _zlib_crc32_hash() and zlib_crc32_fixed(). |
| be/src/util/hash_util.hpp | Add zlib_crc32_fixed() slicing-by-4 implementation and use it for null hashing. |
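
For readers unfamiliar with the alignment hint mentioned for memcpy_small.h, the following is a rough sketch of a fixed-size copy built on `memcpy` and `std::assume_aligned`. The name `copy_fixed` and its `Aligned` template parameter are assumptions made for illustration; the actual signature of `memcpy_fixed()` is not shown in this conversation. `std::assume_aligned` is a C++20 library feature, so the sketch guards it with the `__cpp_lib_assume_aligned` feature-test macro and degrades to a plain `memcpy` on older standards.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>  // std::assume_aligned (C++20)

// Hypothetical helper; not the PR's memcpy_fixed().
// Copies exactly sizeof(T) bytes. With Aligned = true the caller promises that
// both pointers are aligned to alignof(T); breaking that promise is undefined behavior.
template <typename T, bool Aligned = false>
inline void copy_fixed(void* dst, const void* src) {
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    if (Aligned) {
        // Debug-only check of the caller's promise.
        assert(reinterpret_cast<std::uintptr_t>(d) % alignof(T) == 0);
        assert(reinterpret_cast<std::uintptr_t>(s) % alignof(T) == 0);
#if defined(__cpp_lib_assume_aligned)
        // Pass the alignment guarantee on to the optimizer.
        d = std::assume_aligned<alignof(T)>(d);
        s = std::assume_aligned<alignof(T)>(s);
#endif
    }
    // A constant-size memcpy is usually lowered to plain load/store instructions.
    std::memcpy(d, s, sizeof(T));
}
```

A call such as `copy_fixed<uint64_t, true>(&dst, &src)` would then express "copy eight bytes, both sides 8-byte aligned", while the default form makes no alignment assumption.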

Comment on lines 55 to 60
template <typename T>
static uint32_t zlib_crc32_fixed(const T& value, uint32_t hash) {
    // Slicing-by-4 table: t[0] is the standard byte-at-a-time table,
    // t[1..3] are extended tables for parallel 4-byte processing.
    struct CRC32SliceBy4Table {
        uint32_t t[4][256] {};

Copilot AI Feb 10, 2026

zlib_crc32_fixed is a function template and defines static constexpr CRC32SliceBy4Table tbl inside the template. This will instantiate (and potentially emit) a separate 4x256 CRC table for every distinct T used across the codebase, increasing compile time and binary size. Consider moving the slice-by-4 table to a non-templated shared singleton (e.g., an inline/constinit static in HashUtil, or a file-scope table) and have the template reuse it.
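
One way the suggestion could look, as a sketch only: the table lives in a non-template accessor, and the templated code fetches it from there. All names below (`SliceBy4Table`, `build_table`, `shared_table`, `per_type_table`, `table_for`) are hypothetical and do not come from the PR; `per_type_table` is included only to contrast the pattern the comment warns about.

```cpp
#include <cstdint>

// Hypothetical names; a sketch of the review suggestion, not code from the PR.
struct SliceBy4Table {
    uint32_t t[4][256];
};

// Reflected CRC-32 tables (polynomial 0xEDB88320), built at compile time.
constexpr SliceBy4Table build_table() {
    SliceBy4Table tb{};
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; ++bit) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        tb.t[0][i] = c;
    }
    for (int k = 1; k < 4; ++k) {
        for (uint32_t i = 0; i < 256; ++i) {
            uint32_t prev = tb.t[k - 1][i];
            tb.t[k][i] = (prev >> 8) ^ tb.t[0][prev & 0xFFu];
        }
    }
    return tb;
}

// Non-template accessor: this function-local static exists once per program,
// no matter how many template instantiations call it.
inline const SliceBy4Table& shared_table() {
    static constexpr SliceBy4Table table = build_table();
    return table;
}

// For contrast: a static inside a function template is a separate object
// for every distinct T the template is instantiated with.
template <typename T>
const SliceBy4Table& per_type_table() {
    static constexpr SliceBy4Table table = build_table();
    return table;
}

// Templated code that reuses the shared table instead of defining its own.
template <typename T>
const SliceBy4Table& table_for() {
    return shared_table();
}
```

With C++17 or later, an `inline constexpr` variable at namespace or class scope would serve the same purpose. Whether the per-instantiation copies measurably affect binary size in practice depends on how many types are instantiated and on what the toolchain merges or discards.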

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (64/64) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.71% (19451/36904) |
| Line Coverage | 36.19% (181107/500370) |
| Region Coverage | 32.60% (140719/431703) |
| Branch Coverage | 33.62% (60950/181272) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (64/64) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.70% (25926/36159) |
| Line Coverage | 54.30% (271041/499131) |
| Region Coverage | 51.66% (225278/436083) |
| Branch Coverage | 53.16% (96734/181976) |

@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

run buildall

fix
@BiteTheDDDDt
Contributor Author

run buildall
