-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Chore](pick) pick #59906 #59786 #60050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-4.0
Are you sure you want to change the base?
Conversation
…ilter (apache#59786) This pull request refactors and simplifies the handling of late-arriving runtime filters and conjunct context cloning in the scan pipeline. The main changes include streamlining the interfaces for applying runtime filters, improving thread safety with new locking mechanisms, and removing redundant or duplicated code. The changes also clarify ownership and lifecycle of conjunct expressions to prevent concurrent access issues. **Runtime filter and conjunct management improvements:** * Refactored `try_append_late_arrival_runtime_filter` in `RuntimeFilterConsumerHelper` to use references instead of pointers for `arrived_rf_num` and conjuncts, and updated all call sites accordingly for clearer and safer usage. [[1]](diffhunk://#diff-ed4cbc22dcb1b5d27652827cea0f12373591a9b06e3c857f0dd0e32f58422026L99-R110) [[2]](diffhunk://#diff-61d96ad649abf519ee21143b6a4d9ed0c1b820ba273281ca8650a1e047d10f15L47-R50) [[3]](diffhunk://#diff-04a5f666c19cc90dfb8f222f1fadc43f08fa2d59f5ae61e2889e615e3cffc35dL101-R101) [[4]](diffhunk://#diff-2c575051a29b06200611882d17cb4c0fea6df46659e8ea953c705aaca2c2b5a4L232-R245) * Removed the redundant `clone_conjunct_ctxs` method from `RuntimeFilterConsumerHelper` and moved conjunct cloning logic into `ScanLocalStateBase`, introducing new methods `update_late_arrival_runtime_filter` and `clone_conjunct_ctxs` for better encapsulation. [[1]](diffhunk://#diff-ac38473f151ccd99db4ed80dfbfa176f4f957ae2f5335ce8fbb4f5dbb0e932e9R76-R91) [[2]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2R91-R94) [[3]](diffhunk://#diff-ed4cbc22dcb1b5d27652827cea0f12373591a9b06e3c857f0dd0e32f58422026L123-R128) [[4]](diffhunk://#diff-61d96ad649abf519ee21143b6a4d9ed0c1b820ba273281ca8650a1e047d10f15L47-R50) [[5]](diffhunk://#diff-2c575051a29b06200611882d17cb4c0fea6df46659e8ea953c705aaca2c2b5a4L232-R245) **Thread safety enhancements:** * Added a `std::mutex _conjuncts_lock` to `ScanLocalStateBase` and used it to guard access to `_conjuncts` for thread-safe updates and cloning. [[1]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2R21) [[2]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2R128) [[3]](diffhunk://#diff-ac38473f151ccd99db4ed80dfbfa176f4f957ae2f5335ce8fbb4f5dbb0e932e9R76-R91) [[4]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2R91-R94) **Scanner and file scanner lifecycle fixes:** * Changed handling of conjuncts in `FileScanner` and `Scanner` to avoid destroying conjuncts that may still be in use, by explicitly moving them to a `_stale_expr_ctxs` vector before clearing. Removed the `_discard_conjuncts` method and replaced it with direct management. [[1]](diffhunk://#diff-6afbc092caa913a8775be9772777060161de5608b9aae939202e963e8aaf4e8bL360-L366) [[2]](diffhunk://#diff-2c575051a29b06200611882d17cb4c0fea6df46659e8ea953c705aaca2c2b5a4L232-R245) [[3]](diffhunk://#diff-17432beb88ab0d918d9bbe41b3ba500e3f5102645896e89587a4dc43bcab2125L192-L198) **Minor interface improvements:** * Made `Scanner::is_open()` a `const` method for correctness.
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
This pull request refactors the codebase by removing the legacy `udf` directory and its related files, consolidating user-defined function (UDF) functionality into the new `vec/exprs/function_context` implementation. It also updates all relevant includes and references throughout the codebase to use the new location. Additionally, it removes the unused `_stale_expr_ctxs` logic from several classes and modules, simplifying resource management for expression contexts. * Removed the `udf` subdirectory from the build system and deleted its CMake configuration and source files, fully deprecating the legacy UDF implementation. (`be/CMakeLists.txt`, `be/src/udf/CMakeLists.txt`) [[1]](diffhunk://#diff-3507aac2aff9b5fe5f66d28967f3aa848491d4ced2466f6bf201ab3a97531837L832) [[2]](diffhunk://#diff-dea6b1a2f72c10f4fc0cc4c48152c415c60541077fd035c6627bd0b5adec0f9cL1-L26) * Renamed and migrated `udf/udf.h` and `udf/udf.cpp` to `vec/exprs/function_context.h` and `vec/exprs/function_context.cpp`, updating all code comments and references to reflect the new structure. [[1]](diffhunk://#diff-5a085fa9fc24ba20e16e56569ed72532be93709b43e36e7149121ea7ae351fa8L17-R23) [[2]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccL17-L19) * Replaced all `#include "udf/udf.h"` directives with `#include "vec/exprs/function_context.h"` in affected modules, ensuring consistency and future maintainability. [[1]](diffhunk://#diff-adf4fada3277f58a65b283770d1309969d0d0167bd4a5aae58b31807d4e7c4e5L21-R22) [[2]](diffhunk://#diff-b14814962710a0ff9710b6141b3fc2c5a492fb3284254966cdfd1b0a28de0553L32-R33) [[3]](diffhunk://#diff-ba47d784a52d375a969014eac3244b1d56c7302d09bd184bcac4a15117fe50bbL21-R23) [[4]](diffhunk://#diff-42bfdbb7f380191848ab3078380ab28d1b93177c189f2a0fffdee0d69da13494L45) [[5]](diffhunk://#diff-66e6f23e815b6c5f952dabc06cde8ec7a5d20144b500c523890127085fdd667cL42) [[6]](diffhunk://#diff-25ded31ac19678798c2a499d2635fe2c68ed86e41d2d5f6e64cb7fc1811f5d66L26) [[7]](diffhunk://#diff-3f379e2eb89e851f445dc67d09353a48756faa5020a04e59cf4d3508f2c06aa5L63) [[8]](diffhunk://#diff-917cff4d9270b2fd219b5d32a0198cb5759ad4196897c4415663f371f2d79845L32) [[9]](diffhunk://#diff-d571ee8d57e4a0536cfd97ab954c8607bcd3d155058c33fed3da8c4e4f5d1debL27-R27) [[10]](diffhunk://#diff-49cc84a21912031949401756043d8fa3ba7511928f93751623333b686a057fd9L25-R25) * Removed the `_stale_expr_ctxs` member and all related logic from `ScanLocalState`, `ScanOperatorX`, and `Scanner` classes, as well as their usage in `scan_operator.cpp` and `scanner.cpp`, streamlining expression context lifecycle management. [[1]](diffhunk://#diff-ac38473f151ccd99db4ed80dfbfa176f4f957ae2f5335ce8fbb4f5dbb0e932e9L186-L189) [[2]](diffhunk://#diff-ac38473f151ccd99db4ed80dfbfa176f4f957ae2f5335ce8fbb4f5dbb0e932e9L298) [[3]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2L311-L313) [[4]](diffhunk://#diff-f5a0db9faf0b649227b0126cca493d8dce22c9d497116fd480d7d018ce1696d2L421-L423) [[5]](diffhunk://#diff-2c575051a29b06200611882d17cb4c0fea6df46659e8ea953c705aaca2c2b5a4L241-L243) [[6]](diffhunk://#diff-17432beb88ab0d918d9bbe41b3ba500e3f5102645896e89587a4dc43bcab2125L74) [[7]](diffhunk://#diff-17432beb88ab0d918d9bbe41b3ba500e3f5102645896e89587a4dc43bcab2125L227-L230) * Updated comments and documentation in the new `function_context.h` to clarify UDF usage and semantics, and made minor improvements to code style and naming. [[1]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccL46-R61) [[2]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccL123-R123) [[3]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccL152-R154) [[4]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccL203-R200) [[5]](diffhunk://#diff-18893ab85e5c3bb4a2d0b01e0d62434f3f4f86c47a5221d600d3a8254dbe59ccR216) * Added missing includes for `vec/common/arena.h` where necessary to support string buffer and column operations. [[1]](diffhunk://#diff-59ad32ff05a22e3c0750589ed971a2cab922a7d91d3ad526acace6f6dcad20b8R36) [[2]](diffhunk://#diff-fc4e983dc58b2ef7fbced53547a021834db711969e048ee533cf7f1f827c46beR24) [[3]](diffhunk://#diff-3f379e2eb89e851f445dc67d09353a48756faa5020a04e59cf4d3508f2c06aa5R86) These changes collectively modernize and simplify the handling of user-defined functions in the codebase, reducing technical debt and improving maintainability.
|
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR is a pick of two previous pull requests that refactor runtime filter handling and remove stale UDF-related code. The main changes include refactoring the try_append_late_arrival_runtime_filter interface to use references instead of pointers, removing redundant methods, adding thread safety with mutexes, and simplifying conjunct management logic in scanners.
Changes:
- Refactored
RuntimeFilterConsumerHelper::try_append_late_arrival_runtime_filterto use reference parameters instead of pointers and changed parameter order - Added mutex protection for
_conjunctsaccess inScanLocalStateBasewith new methodsupdate_late_arrival_runtime_filterandclone_conjunct_ctxs - Removed the
_discard_conjunctshelper method and simplified conjunct lifecycle management in scanners - Made
Scanner::is_open()a const method - Updated all call sites to use the new API signatures
Reviewed changes
Copilot reviewed 66 out of 66 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| be/test/runtime_filter/runtime_filter_consumer_helper_test.cpp | Updated test to use new API signature with reference parameters and reordered arguments |
| be/src/vec/exec/scan/scanner.h | Made is_open() const and removed _discard_conjuncts() helper method |
| be/src/vec/exec/scan/scanner.cpp | Refactored runtime filter logic to use new ScanLocalStateBase methods for thread safety |
| be/src/vec/exec/scan/file_scanner.cpp | Simplified conjunct handling by directly assigning vectors instead of cloning |
| be/src/runtime_filter/runtime_filter_consumer_helper.h | Changed interface to use references and removed clone_conjunct_ctxs method |
| be/src/runtime_filter/runtime_filter_consumer_helper.cpp | Updated implementation to match new interface and removed cloning method |
| be/src/pipeline/exec/scan_operator.h | Added mutex and new methods for thread-safe runtime filter updates |
| be/src/pipeline/exec/scan_operator.cpp | Implemented new thread-safe methods with proper locking |
| be/src/pipeline/exec/multi_cast_data_stream_source.cpp | Updated to use new API signature with appropriate comment about single-threaded execution |
Comments suppressed due to low confidence (1)
be/src/vec/exec/scan/scanner.cpp:210
- The critical assignment
_applied_rf_num = arrived_rf_num;is missing after cloning the conjuncts. This was present in the original code but was accidentally removed during refactoring. Without this update, the scanner will keep trying to append late-arrival runtime filters on every call even though they have already been applied, causing unnecessary work and incorrect behavior. The_applied_rf_numfield needs to be updated toarrived_rf_numafter successfully cloning the conjuncts to track which runtime filters have been applied.
Status Scanner::close(RuntimeState* state) {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
#59906
#59786