Skip to content

Conversation

@BiteTheDDDDt
Copy link
Contributor

…ilter (apache#59786)

This pull request refactors and simplifies the handling of late-arriving
runtime filters and conjunct context cloning in the scan pipeline. The
main changes include streamlining the interfaces for applying runtime
filters, improving thread safety with new locking mechanisms, and
removing redundant or duplicated code. The changes also clarify
ownership and lifecycle of conjunct expressions to prevent concurrent
access issues.

**Runtime filter and conjunct management improvements:**

* Refactored `try_append_late_arrival_runtime_filter` in
`RuntimeFilterConsumerHelper` to use references instead of pointers for
`arrived_rf_num` and conjuncts, and updated all call sites accordingly
for clearer and safer usage.
[[1]](diffhunk://#diff-ed4cbc22dcb1b5d27652827cea0f12373591a9b06e3c857f0dd0e32f58422026L99-R110)
[[2]](diffhunk://#diff-61d96ad649abf519ee21143b6a4d9ed0c1b820ba273281ca8650a1e047d10f15L47-R50)
[[3]](diffhunk://#diff-04a5f666c19cc90dfb8f222f1fadc43f08fa2d59f5ae61e2889e615e3cffc35dL101-R101)
[[4]](diffhunk://#diff-2c575051a29b06200611882d17cb4c0fea6df46659e8ea953c705aaca2c2b5a4L232-R245)
* Removed the redundant `clone_conjunct_ctxs` method from
`RuntimeFilterConsumerHelper` and moved conjunct cloning logic into
`ScanLocalStateBase`, introducing new methods
`update_late_arrival_runtime_filter` and `clone_conjunct_ctxs` for
better encapsulation.
[[1]](diffhunk://#diff-ac38473f151ccd99db4ed80dfbfa176f4f957ae2f5335ce8fbb4f5dbb0e932e9R76-R91)
[[2]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2R91-R94)
[[3]](diffhunk://#diff-ed4cbc22dcb1b5d27652827cea0f12373591a9b06e3c857f0dd0e32f58422026L123-R128)
[[4]](diffhunk://#diff-61d96ad649abf519ee21143b6a4d9ed0c1b820ba273281ca8650a1e047d10f15L47-R50)
[[5]](diffhunk://#diff-2c575051a29b06200611882d17cb4c0fea6df46659e8ea953c705aaca2c2b5a4L232-R245)

**Thread safety enhancements:**

* Added a `std::mutex _conjuncts_lock` to `ScanLocalStateBase` and used
it to guard access to `_conjuncts` for thread-safe updates and cloning.
[[1]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2R21)
[[2]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2R128)
[[3]](diffhunk://#diff-ac38473f151ccd99db4ed80dfbfa176f4f957ae2f5335ce8fbb4f5dbb0e932e9R76-R91)
[[4]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2R91-R94)

**Scanner and file scanner lifecycle fixes:**

* Changed handling of conjuncts in `FileScanner` and `Scanner` to avoid
destroying conjuncts that may still be in use, by explicitly moving them
to a `_stale_expr_ctxs` vector before clearing. Removed the
`_discard_conjuncts` method and replaced it with direct management.
[[1]](diffhunk://#diff-6afbc092caa913a8775be9772777060161de5608b9aae939202e963e8aaf4e8bL360-L366)
[[2]](diffhunk://#diff-2c575051a29b06200611882d17cb4c0fea6df46659e8ea953c705aaca2c2b5a4L232-R245)
[[3]](diffhunk://#diff-17432beb88ab0d918d9bbe41b3ba500e3f5102645896e89587a4dc43bcab2125L192-L198)

**Minor interface improvements:**

* Made `Scanner::is_open()` a `const` method for correctness.
Copilot AI review requested due to automatic review settings January 20, 2026 04:50
@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

This pull request refactors the codebase by removing the legacy `udf`
directory and its related files, consolidating user-defined function
(UDF) functionality into the new `vec/exprs/function_context`
implementation. It also updates all relevant includes and references
throughout the codebase to use the new location. Additionally, it
removes the unused `_stale_expr_ctxs` logic from several classes and
modules, simplifying resource management for expression contexts.

* Removed the `udf` subdirectory from the build system and deleted its
CMake configuration and source files, fully deprecating the legacy UDF
implementation. (`be/CMakeLists.txt`, `be/src/udf/CMakeLists.txt`)
[[1]](diffhunk://#diff-3507aac2aff9b5fe5f66d28967f3aa848491d4ced2466f6bf201ab3a97531837L832)
[[2]](diffhunk://#diff-dea6b1a2f72c10f4fc0cc4c48152c415c60541077fd035c6627bd0b5adec0f9cL1-L26)
* Renamed and migrated `udf/udf.h` and `udf/udf.cpp` to
`vec/exprs/function_context.h` and `vec/exprs/function_context.cpp`,
updating all code comments and references to reflect the new structure.
[[1]](diffhunk://#diff-5a085fa9fc24ba20e16e56569ed72532be93709b43e36e7149121ea7ae351fa8L17-R23)
[[2]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccL17-L19)

* Replaced all `#include "udf/udf.h"` directives with `#include
"vec/exprs/function_context.h"` in affected modules, ensuring
consistency and future maintainability.
[[1]](diffhunk://#diff-adf4fada3277f58a65b283770d1309969d0d0167bd4a5aae58b31807d4e7c4e5L21-R22)
[[2]](diffhunk://#diff-b14814962710a0ff9710b6141b3fc2c5a492fb3284254966cdfd1b0a28de0553L32-R33)
[[3]](diffhunk://#diff-ba47d784a52d375a969014eac3244b1d56c7302d09bd184bcac4a15117fe50bbL21-R23)
[[4]](diffhunk://#diff-42bfdbb7f380191848ab3078380ab28d1b93177c189f2a0fffdee0d69da13494L45)
[[5]](diffhunk://#diff-66e6f23e815b6c5f952dabc06cde8ec7a5d20144b500c523890127085fdd667cL42)
[[6]](diffhunk://#diff-25ded31ac19678798c2a499d2635fe2c68ed86e41d2d5f6e64cb7fc1811f5d66L26)
[[7]](diffhunk://#diff-3f379e2eb89e851f445dc67d09353a48756faa5020a04e59cf4d3508f2c06aa5L63)
[[8]](diffhunk://#diff-917cff4d9270b2fd219b5d32a0198cb5759ad4196897c4415663f371f2d79845L32)
[[9]](diffhunk://#diff-d571ee8d57e4a0536cfd97ab954c8607bcd3d155058c33fed3da8c4e4f5d1debL27-R27)
[[10]](diffhunk://#diff-49cc84a21912031949401756043d8fa3ba7511928f93751623333b686a057fd9L25-R25)

* Removed the `_stale_expr_ctxs` member and all related logic from
`ScanLocalState`, `ScanOperatorX`, and `Scanner` classes, as well as
their usage in `scan_operator.cpp` and `scanner.cpp`, streamlining
expression context lifecycle management.
[[1]](diffhunk://#diff-ac38473f151ccd99db4ed80dfbfa176f4f957ae2f5335ce8fbb4f5dbb0e932e9L186-L189)
[[2]](diffhunk://#diff-ac38473f151ccd99db4ed80dfbfa176f4f957ae2f5335ce8fbb4f5dbb0e932e9L298)
[[3]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2L311-L313)
[[4]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2L421-L423)
[[5]](diffhunk://#diff-2c575051a29b06200611882d17cb4c0fea6df46659e8ea953c705aaca2c2b5a4L241-L243)
[[6]](diffhunk://#diff-17432beb88ab0d918d9bbe41b3ba500e3f5102645896e89587a4dc43bcab2125L74)
[[7]](diffhunk://#diff-17432beb88ab0d918d9bbe41b3ba500e3f5102645896e89587a4dc43bcab2125L227-L230)

* Updated comments and documentation in the new `function_context.h` to
clarify UDF usage and semantics, and made minor improvements to code
style and naming.
[[1]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccL46-R61)
[[2]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccL123-R123)
[[3]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccL152-R154)
[[4]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccL203-R200)
[[5]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccR216)
* Added missing includes for `vec/common/arena.h` where necessary to
support string buffer and column operations.
[[1]](diffhunk://#diff-59ad32ff05a22e3c0750589ed971a2cab922a7d91d3ad526acace6f6dcad20b8R36)
[[2]](diffhunk://#diff-fc4e983dc58b2ef7fbced53547a021834db711969e048ee533cf7f1f827c46beR24)
[[3]](diffhunk://#diff-3f379e2eb89e851f445dc67d09353a48756faa5020a04e59cf4d3508f2c06aa5R86)

These changes collectively modernize and simplify the handling of
user-defined functions in the codebase, reducing technical debt and
improving maintainability.
@BiteTheDDDDt
Copy link
Contributor Author

run buildall

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a pick of two previous pull requests that refactor runtime filter handling and remove stale UDF-related code. The main changes include refactoring the try_append_late_arrival_runtime_filter interface to use references instead of pointers, removing redundant methods, adding thread safety with mutexes, and simplifying conjunct management logic in scanners.

Changes:

  • Refactored RuntimeFilterConsumerHelper::try_append_late_arrival_runtime_filter to use reference parameters instead of pointers and changed parameter order
  • Added mutex protection for _conjuncts access in ScanLocalStateBase with new methods update_late_arrival_runtime_filter and clone_conjunct_ctxs
  • Removed the _discard_conjuncts helper method and simplified conjunct lifecycle management in scanners
  • Made Scanner::is_open() a const method
  • Updated all call sites to use the new API signatures

Reviewed changes

Copilot reviewed 66 out of 66 changed files in this pull request and generated no comments.

Show a summary per file
File Description
be/test/runtime_filter/runtime_filter_consumer_helper_test.cpp Updated test to use new API signature with reference parameters and reordered arguments
be/src/vec/exec/scan/scanner.h Made is_open() const and removed _discard_conjuncts() helper method
be/src/vec/exec/scan/scanner.cpp Refactored runtime filter logic to use new ScanLocalStateBase methods for thread safety
be/src/vec/exec/scan/file_scanner.cpp Simplified conjunct handling by directly assigning vectors instead of cloning
be/src/runtime_filter/runtime_filter_consumer_helper.h Changed interface to use references and removed clone_conjunct_ctxs method
be/src/runtime_filter/runtime_filter_consumer_helper.cpp Updated implementation to match new interface and removed cloning method
be/src/pipeline/exec/scan_operator.h Added mutex and new methods for thread-safe runtime filter updates
be/src/pipeline/exec/scan_operator.cpp Implemented new thread-safe methods with proper locking
be/src/pipeline/exec/multi_cast_data_stream_source.cpp Updated to use new API signature with appropriate comment about single-threaded execution
Comments suppressed due to low confidence (1)

be/src/vec/exec/scan/scanner.cpp:210

  • The critical assignment _applied_rf_num = arrived_rf_num; is missing after cloning the conjuncts. This was present in the original code but was accidentally removed during refactoring. Without this update, the scanner will keep trying to append late-arrival runtime filters on every call even though they have already been applied, causing unnecessary work and incorrect behavior. The _applied_rf_num field needs to be updated to arrived_rf_num after successfully cloning the conjuncts to track which runtime filters have been applied.

Status Scanner::close(RuntimeState* state) {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants