[refine](function) split monolithic function_string.h into domain-specific files #62262
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall

BE Regression && UT Coverage Report: Increment line coverage, Increment coverage report

/review
I found 1 issue that should be addressed before merge.
- Goal of the task: splitting `function_string.h` into smaller translation units does reduce include fan-out, but the refactor is not yet fully self-contained: one of the new files still relies on a declaration of `RE2` that is only available implicitly through the global PCH.
- Modification size/focus: apart from that, the change is focused on the string-function split and the related include cleanup.
- Concurrency: not applicable; I did not find new threading or locking changes in this PR.
- Special lifecycle/static initialization: no new lifecycle or static-init issues identified in the modified paths.
- Configuration: no config changes.
- Compatibility/incompatible changes: no FE/BE protocol or storage-format compatibility issue identified.
- Parallel code paths: registration/alias coverage for the split string functions looks consistent overall; I did not find a missing runtime registration in the reviewed paths.
- Special conditional checks: no new problematic conditional logic identified beyond the issue below.
- Test coverage: there are no new tests; for a refactor like this, the build itself is the key coverage. Because of the missing direct include, the split file currently compiles only with the existing PCH build setup.
- Observability: not applicable.
- Transaction/persistence: not applicable.
- Data writes/modifications: not applicable.
- FE-BE variable passing: not applicable.
- Performance: the intended compile-time improvement makes sense; I did not identify a runtime-performance regression.
- Other issues: see the inline comment.
Summary opinion: the refactor is directionally good, but the new `function_string_misc.cpp` translation unit should include its RE2 dependency directly before merging, so that the split stays self-contained and does not depend on the PCH to build.
Inline comment on `function_string_misc.cpp`, at the line `#include <crc32c/crc32c.h>`:
`FunctionAutoPartitionName::_auto_partition_type_of_range()` still instantiates `RE2` later in this file, but this new TU no longer includes `re2/re2.h` directly. After the split, it only builds because the BE target's global PCH happens to pull RE2 in transitively. That makes the refactor brittle: building this source without the PCH will fail with an unknown `RE2` type. Please add a direct `#include <re2/re2.h>` here so the new file is self-contained.
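The requested fix is a plain include-what-you-use change. The sketch below shows the same idea in a file that builds on its own: every library facility it uses has its header included directly, so it compiles whether or not a precompiled header is in play. To avoid an extra dependency it uses `std::regex` from `<regex>` as a stand-in for RE2 (in `function_string_misc.cpp` the corresponding line would be `#include <re2/re2.h>`). The helper `looks_like_range_unit` and its pattern are hypothetical and are not Doris's `_auto_partition_type_of_range()`.

```cpp
// Self-contained translation unit: each header that provides a used type is
// included directly instead of arriving transitively through a PCH.
// std::regex stands in for RE2 here; the real fix would be
// `#include <re2/re2.h>` in function_string_misc.cpp.
#include <regex>   // std::regex, std::regex_match
#include <string>  // std::string

// Hypothetical helper (not the Doris implementation): returns true if
// `unit` is one of a few illustrative keywords, ignoring case.
bool looks_like_range_unit(const std::string& unit) {
    // Compiled once on first call; regex_match requires the whole string to match.
    static const std::regex pattern("year|month|day|hour",
                                    std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(unit, pattern);
}
```

With this pattern, `looks_like_range_unit("Month")` returns `true` and `looks_like_range_unit("week")` returns `false`. Removing the `<regex>` include would make the file depend on whatever other header happens to provide it, which is exactly the fragility the review points out.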
PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.
…cific files (#62262) #62262 #56304

### What problem does this PR solve?

`function_string.h` was a 5393-line monolithic header containing ~40 function class implementations with heavy dependencies (`pugixml`, ICU, Boost.Locale, digest libraries, etc.). Every translation unit that included it pulled in all of those dependencies, which slowed compilation and caused unnecessary rebuilds.

This PR splits it into 9 domain-specific files. The 6 files with no external consumers become standalone `.cpp` files, each with its own registration function. The 3 files with external consumers remain headers (`.h`).

**Converted to .cpp (6 files):**
- `function_string_basic.cpp`: Strcmp, Substring, Left, Right, NullOrEmpty
- `function_string_mask.cpp`: Mask, MaskPartial
- `function_string_search.cpp`: LocatePos, SplitPart, SubstringIndex, SplitByString, CountSubString
- `function_string_digest.cpp`: SM3, MD5, SHA1, SHA2
- `function_string_url.cpp`: ExtractURLParameter, ParseUrl, UrlDecode, UrlEncode
- `function_string_misc.cpp`: AutoPartitionName, RandomBytes, ConvertTo, IntToChar, NgramSearch, Translate, XPathString, MakeSet, ExportSet, Crc32, UnicodeNormalize

**Kept as .h (3 files):**
- `function_string_concat.h`: used by `column_string_test.cpp`
- `function_string_format.h`: used by `function_money_format_test.cpp`
- `function_string_replace.h`: used by `function_reverse.h` and `function_sub_replace_test.cpp`

**Other changes:**
- `function_string.cpp`: removed the original header include; it now calls the 6 sub-registration functions via extern declarations
- Removed unnecessary `function_string.h` includes from `partition_transformers.h` and `function_split_by_regexp.cpp`
- Updated test files to include the specific sub-headers
- Fixed missing `block.h` include in `function_array_reverse.h`
- Fixed pre-existing missing `vexpr_context.h` include in `viceberg_merge_sink.cpp`

The original `function_string.h` is deleted.

### Release note

None

### Check List (For Author)

- Test
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason
- Behavior changed:
    - [ ] No.
    - [ ] Yes.
- Does this need documentation?
    - [ ] No.
    - [ ] Yes.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label

---------

Co-authored-by: linrrarity <linzhenqi@selectdb.com>
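The registration change in `function_string.cpp` can be sketched as follows. This is a minimal, self-contained illustration under assumptions: `FunctionRegistry`, `register_string_basic_functions`, `register_string_mask_functions` and `register_string_functions` are hypothetical names, not the actual Doris signatures, and the registered keys are simply the class names from the PR's file list. In the real change each sub-registration function lives in its own `.cpp` file; here they share one file so the sketch compiles on its own.

```cpp
#include <cstddef>
#include <initializer_list>
#include <set>
#include <string>

// Hypothetical stand-in for the real function factory: it only records names.
class FunctionRegistry {
public:
    void register_function(const std::string& name) { _names.insert(name); }
    bool contains(const std::string& name) const { return _names.count(name) > 0; }
    std::size_t size() const { return _names.size(); }

private:
    std::set<std::string> _names;
};

// ---- function_string.cpp (umbrella TU) --------------------------------
// Only declarations are visible here; no domain header is included.
// `extern` is implicit for function declarations and is written out only
// to mirror the "extern declarations" wording of the PR.
extern void register_string_basic_functions(FunctionRegistry& registry);
extern void register_string_mask_functions(FunctionRegistry& registry);
// The real change declares four more (search, digest, url, misc).

void register_string_functions(FunctionRegistry& registry) {
    register_string_basic_functions(registry);
    register_string_mask_functions(registry);
}

// ---- function_string_basic.cpp (its own TU in the real change) --------
void register_string_basic_functions(FunctionRegistry& registry) {
    for (const char* name : {"Strcmp", "Substring", "Left", "Right", "NullOrEmpty"}) {
        registry.register_function(name);
    }
}

// ---- function_string_mask.cpp (its own TU in the real change) ---------
void register_string_mask_functions(FunctionRegistry& registry) {
    for (const char* name : {"Mask", "MaskPartial"}) {
        registry.register_function(name);
    }
}
```

Because the umbrella file sees only declarations, editing one domain file (for example, the digest functions) recompiles that single translation unit instead of every file that used to include the monolithic header.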