@BiteTheDDDDt BiteTheDDDDt commented Dec 26, 2025

What problem does this PR solve?

[Image]

This pull request refactors and optimizes the handling of null maps and key packing in hash join and hash table code, with a focus on improving SIMD (Single Instruction, Multiple Data) usage and simplifying null bitmap logic. The changes replace older byte-searching utilities with new, more efficient SIMD-based functions, update how null bitmaps are packed and processed, and streamline column null data replacement. Additionally, the logic for determining hash key types and handling fixed key serialization is improved for better correctness and performance.

Key improvements and changes:

SIMD utilities and null map handling

  • Introduced new SIMD-based functions contain_one and contain_zero in simd/bits.h, replacing the older contain_byte and related logic for checking the presence of ones or zeros in null maps, resulting in more efficient null detection.
  • Updated all usages of null map checks throughout the codebase to use the new contain_one and contain_zero functions, simplifying and unifying the logic for detecting nulls in columns and filters. [1] [2] [3] [4] [5] [6]
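
To illustrate the kind of check these helpers perform, here is a minimal portable sketch. The names contain_one and contain_zero come from the description above; the signatures and the word-at-a-time implementation are assumptions and not the code in simd/bits.h. The loops have no per-byte early exit, which is the shape compilers can usually auto-vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the helpers named in the PR description;
// the real signatures in simd/bits.h may differ.

// Returns true if any byte in data[0, size) is non-zero.
inline bool contain_one(const uint8_t* data, size_t size) {
    uint64_t acc = 0;
    size_t i = 0;
    // OR eight bytes at a time into an accumulator.
    for (; i + 8 <= size; i += 8) {
        uint64_t word;
        std::memcpy(&word, data + i, sizeof(word));
        acc |= word;
    }
    // Handle the remaining tail bytes.
    for (; i < size; ++i) acc |= data[i];
    return acc != 0;
}

// Returns true if any byte in data[0, size) is zero.
inline bool contain_zero(const uint8_t* data, size_t size) {
    constexpr uint64_t kLow = 0x0101010101010101ULL;
    constexpr uint64_t kHigh = 0x8080808080808080ULL;
    uint64_t hit = 0;
    size_t i = 0;
    for (; i + 8 <= size; i += 8) {
        uint64_t word;
        std::memcpy(&word, data + i, sizeof(word));
        // Classic bit trick: the expression is non-zero iff some byte of word is 0.
        hit |= (word - kLow) & ~word & kHigh;
    }
    // Handle the remaining tail bytes.
    for (; i < size; ++i) hit |= static_cast<uint64_t>(data[i] == 0);
    return hit != 0;
}
```

With a null map where 1 marks a null row, contain_one(map, n) answers "is any row null?" and contain_zero(map, n) answers "is any row non-null?".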

Hash key and null bitmap packing

  • Refactored the logic for packing null maps into hash keys in MethodKeysFixed, introducing new templates and helper functions for interleaved null map packing, and replacing the old bitmap size calculation with a simplified approach. This improves both performance and maintainability. [1] [2]
  • Updated the logic for initializing and inserting keys, ensuring correct handling of nulls and simplifying offset calculations for key data. [1] [2] [3]
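
The general idea of packing null maps into a fixed-width key can be sketched as follows. This is an illustration only: pack_null_maps, build_keys, and the key layout (one bitmap byte followed by the column values) are hypothetical and are not taken from MethodKeysFixed. The sketch assumes at most 8 key columns and null maps holding only 0 or 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: combine per-column null maps (one byte per row, 0 or 1)
// into one bitmap byte per row. Bit c is set when column c is null in that row.
// Assumes at most 8 columns; a nullptr entry marks a non-nullable column.
inline std::vector<uint8_t> pack_null_maps(const std::vector<const uint8_t*>& null_maps,
                                           size_t rows) {
    std::vector<uint8_t> packed(rows, 0);
    for (size_t c = 0; c < null_maps.size(); ++c) {
        const uint8_t* src = null_maps[c];
        if (src == nullptr) continue;
        for (size_t r = 0; r < rows; ++r) {
            packed[r] |= static_cast<uint8_t>(src[r] << c);
        }
    }
    return packed;
}

// Hypothetical layout for two nullable columns (int16 and int32) in a 64-bit key:
// byte 0 = null bitmap, bytes 1-2 = column a, bytes 3-6 = column b, byte 7 unused.
inline std::vector<uint64_t> build_keys(const std::vector<int16_t>& a, const uint8_t* a_nulls,
                                        const std::vector<int32_t>& b, const uint8_t* b_nulls) {
    const size_t rows = a.size();
    const std::vector<uint8_t> bitmap = pack_null_maps({a_nulls, b_nulls}, rows);
    std::vector<uint64_t> keys(rows, 0);
    for (size_t r = 0; r < rows; ++r) {
        auto* dst = reinterpret_cast<uint8_t*>(&keys[r]);
        dst[0] = bitmap[r];
        // Null slots stay zeroed so that rows with the same nulls and the same
        // non-null values produce identical key bytes.
        if (!(bitmap[r] & 1)) std::memcpy(dst + 1, &a[r], sizeof(int16_t));
        if (!(bitmap[r] & 2)) std::memcpy(dst + 3, &b[r], sizeof(int32_t));
    }
    return keys;
}
```

Because the bitmap is part of the key, a NULL in column a never compares equal to a stored zero in column a, and whatever bytes sit in a null slot of the source column do not affect the key.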

Column null data replacement

  • Simplified the replace_column_null_data methods for vector and decimal columns by removing unnecessary null count checks and optimizing the replacement logic. [1] [2]
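
As a rough illustration of null data replacement without a preliminary null count, the sketch below overwrites every null slot with a default value in a single unconditional pass. The free-function form, the std::vector storage, and the choice of T() as the replacement value are assumptions; in the PR this is a method on the column classes and its body is not shown in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical free-function version of replace_column_null_data.
// null_map holds one byte per row: 1 means the row is null.
template <typename T>
void replace_column_null_data(std::vector<T>& data, const uint8_t* null_map) {
    for (size_t i = 0; i < data.size(); ++i) {
        // Simple select per row: keep the value for non-null rows,
        // write a default-constructed T for null rows.
        data[i] = null_map[i] ? T() : data[i];
    }
}
```

Running the pass unconditionally trades a possible early return on all-non-null input for a loop with no data-dependent exit; whether that matches the PR's motivation is not stated in the description.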

Hash key type logic

  • Improved the logic for determining the hash key type in hash_key_type.h to handle cases where the number of data types exceeds the bit size, defaulting to serialized keys as needed. [1] [2]
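
One way to read this fallback is sketched below. Everything here is hypothetical: the enum, the function choose_key_type, and the assumption that "bit size" means the number of null bits available in a single bitmap byte. The actual rules in hash_key_type.h may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical key categories; the real enum in hash_key_type.h may differ.
enum class HashKeyType { fixed64, fixed128, fixed256, serialized };

// Assumption: one null bit per key column must fit in a single bitmap byte.
constexpr size_t kNullBitmapBits = 8;

// widths holds the byte width of each key column; 0 marks a variable-length column.
inline HashKeyType choose_key_type(const std::vector<size_t>& widths, bool any_nullable) {
    // More columns than available null bits: fall back to serialized keys.
    if (widths.size() > kNullBitmapBits) return HashKeyType::serialized;

    size_t total = any_nullable ? 1 : 0;  // reserve one byte for the null bitmap
    for (size_t w : widths) {
        if (w == 0) return HashKeyType::serialized;  // variable-length column
        total += w;
    }
    if (total <= 8) return HashKeyType::fixed64;
    if (total <= 16) return HashKeyType::fixed128;
    if (total <= 32) return HashKeyType::fixed256;
    return HashKeyType::serialized;
}
```

For example, under these assumptions a nullable (int32, int16) key needs 7 bytes and fits a 64-bit fixed key, while nine single-byte columns fall back to serialized keys even though their data would fit.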

Code cleanup and dependency updates

  • Removed unused functions and updated includes to ensure all SIMD utilities are properly imported where needed. [1] [2] [3]

These changes collectively improve performance, maintainability, and correctness in hash join operations, especially in handling nullable columns and SIMD optimizations.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 71.43% (15/21) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.40% (18952/35489)
Line Coverage 39.26% (175764/447675)
Region Coverage 33.84% (136023/402003)
Branch Coverage 34.76% (58725/168936)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (21/21) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.17% (25040/34694)
Line Coverage 58.89% (262946/446483)
Region Coverage 53.87% (218783/406168)
Branch Coverage 55.32% (93770/169500)

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.52% (1771/2227)
Line Coverage 64.80% (31299/48299)
Region Coverage 65.39% (15582/23831)
Branch Coverage 55.97% (8281/14796)

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 80.39% (41/51) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.38% (18951/35501)
Line Coverage 39.26% (175795/447732)
Region Coverage 33.82% (136029/402160)
Branch Coverage 34.76% (58731/168985)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (51/51) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.15% (25039/34706)
Line Coverage 58.88% (262929/446540)
Region Coverage 53.82% (218704/406325)
Branch Coverage 55.32% (93788/169549)

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 44.95% (49/109) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.37% (18952/35512)
Line Coverage 39.25% (175903/448149)
Region Coverage 33.83% (136166/402473)
Branch Coverage 34.76% (58802/169166)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (109/109) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.16% (25052/34717)
Line Coverage 58.90% (263276/446957)
Region Coverage 53.67% (218240/406636)
Branch Coverage 55.32% (93887/169728)

HappenLee previously approved these changes Jan 5, 2026
Contributor

@HappenLee HappenLee left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 5, 2026
@github-actions
Contributor

github-actions bot commented Jan 5, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Jan 5, 2026

PR approved by anyone and no changes requested.

@BiteTheDDDDt
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Jan 6, 2026
@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 46.79% (51/109) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.22% (18968/35640)
Line Coverage 39.23% (176078/448865)
Region Coverage 33.72% (136127/403713)
Branch Coverage 34.70% (58761/169340)

@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

run buildall

@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 56.72% (76/134) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.23% (18971/35643)
Line Coverage 39.19% (175788/448564)
Region Coverage 33.75% (136250/403717)
Branch Coverage 34.71% (58771/169340)

@BiteTheDDDDt
Contributor Author

run buildall

format

update

format

fix

update fix

fix
@BiteTheDDDDt
Contributor Author

run buildall

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 97.10% (134/138) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.21% (25163/34847)
Line Coverage 58.94% (263602/447235)
Region Coverage 53.62% (218505/407520)
Branch Coverage 55.29% (93870/169783)

Contributor

@HappenLee HappenLee left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 7, 2026
@github-actions
Contributor

github-actions bot commented Jan 7, 2026

PR approved by at least one committer and no changes requested.

Labels

approved Indicates a PR has been approved by one committer. reviewed
