[fix](be) Restrict varbinary predicate block to file scan#64019
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
There was a problem hiding this comment.
Summary: Requesting changes for a TopN runtime predicate regression in file scans. The PR goal is clear and the hook-based split between shared scan logic and file-scan limitations is focused, but the current target initialization path disables VARBINARY TopN filters entirely for file scans instead of only preventing reader/column-predicate pushdown.
Critical checkpoint conclusions:
- Goal/test: The change partially achieves the goal, but the file-scan VARBINARY TopN case is not covered and regresses runtime filter application. Only manual testing is reported; no targeted regression/BE test proves the affected path.
- Scope: The code is small and focused.
- Concurrency/lifecycle: No new threads, locks, global/static lifecycle, or ownership cycles were introduced. Existing RuntimePredicate locking is unchanged.
- Config/compatibility/protocol: No new config, storage format, or FE-BE protocol changes.
- Parallel paths: Column predicate normalization and TopN predicate initialization are parallel decisions; the new capability hook is applied to both, but
prepare()now conflatescannot push down as column predicatewithdo not initialize target, which is the issue below. - Tests/results: No new test or expected result is added. A focused external file-scan VARBINARY TopN/runtime-filter case would catch this.
- Observability/performance: No new observability required. The issue is performance/behavioral because a safe residual VTopN predicate is not installed.
- Transaction/data correctness: Not applicable; no transaction, persistence, or data write path changes.
User focus: No additional user-provided review focus was supplied.
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: The scan operator unconditionally skipped VARBINARY column predicate and TopN runtime predicate pushdown. The commit that introduced the guard was for external Parquet/file scan reader predicate limitations, so applying it in the shared scan path also blocked non-file scans. This change adds a scan-operator hook for column predicate pushdown capability, keeps the default permissive, and makes FileScanOperatorX reject VARBINARY column predicates.
### Release note
None
### Check List (For Author)
- Test: Manual test
- ./build.sh --be
- build-support/check-format.sh
- build-support/run-clang-tidy.sh (fails due existing jni-util.h static_assert analysis errors and pre-existing scan_operator.cpp complexity/style diagnostics)
- Behavior changed: Yes. VARBINARY predicate pushdown is only blocked by file scan now.
- Does this need documentation: No
11f26e0 to
830a82d
Compare
|
run buildall |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 29098 ms |
TPC-DS: Total hot run time: 169188 ms |
|
/review |
|
PR approved by at least one committer and no changes requested. |
There was a problem hiding this comment.
Review summary: I reviewed the actual GitHub PR diff for the four changed scan operator files and did not find additional blocking issues beyond the already-known inline thread. The current patch no longer uses continue for non-pushdown TopN targets and initializes the runtime predicate target with cid = -1, so the residual TopN predicate path is preserved for file-scan VARBINARY targets.
Critical checkpoint conclusions:
- Goal/test: The goal is to restrict the VARBINARY column predicate pushdown block to file scans while allowing other scan types. The code implements this via a virtual
can_push_down_column_predicate()hook and aFileScanOperatorXoverride. The PR reports BE build/format/manual checks; I did not run additional tests in this review. - Scope/clarity: The change is small and focused on scan predicate pushdown capability routing.
- Concurrency/lifecycle: No new shared mutable state, thread lifecycle, lock ordering, or dependency objects are introduced.
- Configuration/compatibility: No new config, storage format, function symbol, or FE/BE protocol compatibility change is introduced.
- Parallel paths: The generic scan predicate path and file-scan TopN path are both routed through the new capability hook; OLAP keeps its existing key-column TopN restriction.
- Conditional checks: The file-scan-specific VARBINARY check is localized in
FileScanOperatorX::can_push_down_column_predicate()and matches the stated external-reader limitation. - Tests: No new automated regression/unit test is added. The PR states manual BE build/style/tidy checks; this is acceptable for the small routing fix but leaves residual risk around external file-scan VARBINARY TopN coverage.
- Observability/transactions/persistence/data writes: Not applicable; this change affects scan predicate routing only.
- Performance: No meaningful extra hot-path overhead beyond one virtual capability check during predicate normalization/TopN setup.
User focus: No additional user-provided review focus was specified.
What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: The scan operator unconditionally skipped VARBINARY column predicate and TopN runtime predicate pushdown. The commit that introduced the guard was for external Parquet/file scan reader predicate limitations, so applying it in the shared scan path also blocked non-file scans. This change adds a scan-operator hook for column predicate pushdown capability, keeps the default permissive, and makes FileScanOperatorX reject VARBINARY column predicates.
Release note
None
Check List (For Author)