[Exec](cache) condition cache digest error in runtime filter in and add debug log#58857
Merged
zhangstar333 merged 1 commit intoapache:masterfrom Dec 10, 2025
3aaccd2 to 936cf17
Contributor (Author): run buildall

TPC-H: Total hot run time: 36286 ms
TPC-DS: Total hot run time: 180605 ms
ClickBench: Total hot run time: 27.17 s
BiteTheDDDDt approved these changes on Dec 10, 2025

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

zhangstar333 approved these changes on Dec 10, 2025
nagisa-kunhah pushed a commit to nagisa-kunhah/doris that referenced this pull request on Dec 14, 2025
…dd debug log (apache#58857)

### What problem does this PR solve?

## Overview

This pull request (authored by HappenLee) focuses on **performance optimization** (replacing a sorting algorithm) and **observability** (additional debug logging) for Apache Doris, along with a critical fix to the digest calculation in predicate expressions. The changes span core data structure handling, segment iteration, and vectorized expression logic.

## Key Changes

### 1. Performance: Replace `std::sort` with `pdqsort` for Faster Set Sorting

- **File**: `be/src/exprs/hybrid_set.h`
- **Modifications**:
  - Added `#include <pdqsort.h>` to enable the pdqsort algorithm (a fast, adaptive quicksort variant optimized for real-world data).
  - Replaced `std::sort(elems.begin(), elems.end())` with `pdqsort(elems.begin(), elems.end())` in three set classes:
    - `HybridSet`: for generic element type sets.
    - `StringSet`: for string reference (`StringRef`) sets.
    - `StringValueSet`: for string value-based sets.
- **Purpose**: pdqsort outperforms `std::sort` in most practical scenarios (e.g., partially sorted data, duplicate values), reducing the time spent sorting elements during digest calculation for set-based operations.

### 2. Observability: Add Debug Logs for Condition Cache Operations

- **File**: `be/src/olap/rowset/segment_v2/segment_iterator.cpp`
- **Modifications**:
  - **Cache Hit Logging** (Lines 132-138): Added `VLOG_DEBUG` output when a condition cache hit occurs, including:
    - Query ID (from `_opts.runtime_state->query_id()`).
    - Segment ID (`_segment->id()`).
    - Cache digest (`_opts.condition_cache_digest`).
    - Rowset ID (`_opts.rowset_id.to_string()`).
  - **Cache Insert Logging** (Lines 2379-2383): Added `VLOG_DEBUG` output when inserting data into the condition cache, including the same fields as the hit log.
- **Purpose**: Improve debuggability for cache-related issues (e.g., false misses, incorrect cache entries) by linking cache events to specific queries and data segments.

### 3. Correctness: Fix Digest Calculation for `VDirectInPredicate`

- **File**: `be/src/vec/exprs/vdirect_in_predicate.h`
- **Modifications**:
  - Updated the `get_digest(uint64_t seed)` method to:
    1. First incorporate the digest of the predicate's child expression (`_children[0]->get_digest(seed)`).
    2. Only propagate the filter's digest (`_filter->get_digest(seed)`) if the child digest is non-zero; otherwise, return the original seed.
  - Replaced the previous implementation, which returned `_filter->get_digest(seed)` directly without including the child expression.
- **Purpose**: Ensure the digest identifies the full predicate logic (both the child expression and the filter), preventing digest collisions that could lead to incorrect cache lookups or data processing.

## Impact

- **Performance**: Faster sorting for set-based digest calculations may reduce latency in query operations involving `IN` predicates or set comparisons.
- **Debuggability**: Detailed cache logs enable quicker diagnosis of condition cache issues.
- **Correctness**: Fixes a potential source of incorrect digest values, improving the reliability of cache-dependent features (e.g., condition cache, query result caching).
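Key change 1 above sorts a set's elements as part of digest calculation. A minimal sketch of why a sort is useful there, assuming the goal is a digest that does not depend on hash-set iteration order: `mix` and `set_digest` are hypothetical names, not Doris APIs, and `std::sort` stands in for `pdqsort` so the example needs only the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical FNV-1a style mixer; Doris uses its own hash utilities.
inline uint64_t mix(uint64_t seed, const std::string& s) {
    uint64_t h = seed ^ 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Digest of a set: copy the elements out, sort them so the hash set's
// iteration order cannot influence the result, then fold them into the seed.
inline uint64_t set_digest(const std::unordered_set<std::string>& set, uint64_t seed) {
    std::vector<std::string> elems(set.begin(), set.end());
    // The PR replaces this call with pdqsort(elems.begin(), elems.end()).
    std::sort(elems.begin(), elems.end());
    for (const auto& e : elems) {
        seed = mix(seed, e);
    }
    return seed;
}

// Usage: the same members inserted in a different order give the same digest.
inline void check_order_independence() {
    std::unordered_set<std::string> a;
    a.insert("a");
    a.insert("b");
    a.insert("c");
    std::unordered_set<std::string> b;
    b.insert("c");
    b.insert("b");
    b.insert("a");
    assert(set_digest(a, 1) == set_digest(b, 1));
}
```

Because the sort runs on every digest computation, its cost scales with the size of the `IN` list, which is where a faster sort would matter.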
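The digest fix described for `VDirectInPredicate` can be sketched as follows. This is an illustration under stated assumptions, not the Doris source: `Expr`, `ColumnRef`, `NoDigestExpr`, `Filter`, and `DirectInPredicate` are hypothetical stand-ins, and the sketch assumes that a zero digest is a sentinel meaning "no usable digest", which is passed through unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical expression interface. Assumed convention: 0 means "no digest".
struct Expr {
    virtual ~Expr() = default;
    virtual uint64_t get_digest(uint64_t seed) const = 0;
};

// Hypothetical child expression that identifies a column.
struct ColumnRef : Expr {
    uint64_t column_id;
    explicit ColumnRef(uint64_t id) : column_id(id) {}
    uint64_t get_digest(uint64_t seed) const override {
        return (seed ^ column_id) * 1099511628211ULL;
    }
};

// Hypothetical child expression that cannot produce a digest.
struct NoDigestExpr : Expr {
    uint64_t get_digest(uint64_t) const override { return 0; }
};

// Hypothetical filter whose digest depends only on its contents and the seed.
struct Filter {
    uint64_t content_hash;
    uint64_t get_digest(uint64_t seed) const {
        return (seed ^ content_hash) * 1099511628211ULL;
    }
};

struct DirectInPredicate : Expr {
    std::unique_ptr<Expr> child;
    Filter filter;
    DirectInPredicate(std::unique_ptr<Expr> c, Filter f)
            : child(std::move(c)), filter(f) {}

    uint64_t get_digest(uint64_t seed) const override {
        // Step 1: fold in the child expression first.
        seed = child->get_digest(seed);
        // Step 2: fold in the filter only if the child produced a digest.
        if (seed != 0) {
            return filter.get_digest(seed);
        }
        return seed; // assumed sentinel: no digest available
    }
};
```

With a filter-only digest, the same filter applied to two different columns would produce identical digests; chaining through the child keeps them distinct.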
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)