[opt](nereids) optimize length(str_col) by only read offset sub column by englefly · Pull Request #62205 · apache/doris

englefly · 2026-04-08T05:23:45Z

What problem does this PR solve?

Optimizes the evaluation of length(str_col).
A string column is treated as a combination of an offset sub column and a chars sub column.
NestedColumnPruning prunes the string column so that the BE reads only the offset sub column, which saves the I/O of reading the chars sub column.
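To illustrate why the offset sub column alone is enough, here is a minimal sketch. It assumes a simplified layout in which `offsets[i]` stores the end position of row `i` inside one concatenated `chars` buffer. The class `OffsetLengthSketch`, the nested `StringColumn`, and the method `lengthsFromOffsets` are hypothetical names for illustration only, not the actual BE storage classes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OffsetLengthSketch {
    /** Hypothetical layout: offsets[i] is the end position of row i in chars. */
    public static final class StringColumn {
        public final int[] offsets;
        public final byte[] chars;

        public StringColumn(String... rows) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            offsets = new int[rows.length];
            for (int i = 0; i < rows.length; i++) {
                byte[] bytes = rows[i].getBytes(StandardCharsets.UTF_8);
                buf.write(bytes, 0, bytes.length);
                offsets[i] = buf.size(); // end offset of row i
            }
            chars = buf.toByteArray();
        }
    }

    /** Computes per-row byte lengths from the offsets alone; chars is never touched. */
    public static int[] lengthsFromOffsets(int[] offsets) {
        int[] lengths = new int[offsets.length];
        int previousEnd = 0;
        for (int i = 0; i < offsets.length; i++) {
            lengths[i] = offsets[i] - previousEnd;
            previousEnd = offsets[i];
        }
        return lengths;
    }

    public static void main(String[] args) {
        StringColumn col = new StringColumn("doris", "", "abc");
        // prints [5, 0, 3]
        System.out.println(Arrays.toString(lengthsFromOffsets(col.offsets)));
    }
}
```

Under this assumed layout, each length is the difference between two adjacent offsets, so a reader that skips the chars buffer can still answer length(str_col).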
Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

Thearas · 2026-04-08T05:23:51Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

englefly · 2026-04-08T05:56:24Z

run buildall

hello-stephen · 2026-04-08T07:12:09Z

FE UT Coverage Report

Increment line coverage 50.25% (102/203) 🎉
Increment coverage report
Complete coverage report

englefly · 2026-04-08T08:10:07Z

run buildall

englefly · 2026-04-08T10:35:03Z

run external

englefly · 2026-04-08T10:38:13Z

run external

github-actions · 2026-04-08T11:47:14Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-04-08T11:47:16Z

PR approved by anyone and no changes requested.

englefly · 2026-04-08T11:48:58Z

/review

github-actions

I found 2 issues that should be addressed before merging.

StringEmptyToLengthRule does not match the analyzed str_col = '' shape used in production, because type coercion wraps the empty-string literal in Cast(...). The new unit test explicitly bypasses coercion to make the rule pass, which means the optimization is not actually exercised on real rewritten expressions.
ExpressionUtils.extractUniformSlot() now infers slot = '' from any length(slot) = 0, but length() also accepts VARBINARY. That leaks a string literal into uniform-slot / constant-propagation state for binary columns and can mis-rewrite downstream expressions with the wrong typed constant.
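
The first finding can be illustrated with a small sketch. It uses a hypothetical, hand-rolled expression tree (`Slot`, `StringLit`, `Cast`, `EqualTo`, `Length`, `IntLit`) rather than the real Nereids classes, and the choice to look through casts only when the target type is string-like is an assumption made for this example. It is not necessarily how the PR resolved the issue.

```java
public class EmptyStringCompareSketch {
    public interface Expr {}

    public static final class Slot implements Expr {
        public final String name;
        public Slot(String name) { this.name = name; }
    }

    public static final class StringLit implements Expr {
        public final String value;
        public StringLit(String value) { this.value = value; }
    }

    public static final class IntLit implements Expr {
        public final int value;
        public IntLit(int value) { this.value = value; }
    }

    public static final class Cast implements Expr {
        public final Expr child;
        public final String targetType; // hypothetical type name, e.g. "VARCHAR"
        public Cast(Expr child, String targetType) {
            this.child = child;
            this.targetType = targetType;
        }
    }

    public static final class Length implements Expr {
        public final Expr child;
        public Length(Expr child) { this.child = child; }
    }

    public static final class EqualTo implements Expr {
        public final Expr left;
        public final Expr right;
        public EqualTo(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    /** Looks through string-typed casts before testing for the empty-string literal. */
    public static boolean isEmptyStringLiteral(Expr e) {
        while (e instanceof Cast) {
            Cast cast = (Cast) e;
            boolean stringLike = cast.targetType.equals("VARCHAR") || cast.targetType.equals("STRING");
            if (!stringLike) {
                return false; // a cast to a non-string type is not an empty string
            }
            e = cast.child;
        }
        return e instanceof StringLit && ((StringLit) e).value.isEmpty();
    }

    /** Rewrites slot = '' (possibly cast-wrapped) into length(slot) = 0; otherwise returns e unchanged. */
    public static Expr rewrite(Expr e) {
        if (e instanceof EqualTo) {
            EqualTo eq = (EqualTo) e;
            if (eq.left instanceof Slot && isEmptyStringLiteral(eq.right)) {
                return new EqualTo(new Length(eq.left), new IntLit(0));
            }
        }
        return e;
    }

    public static void main(String[] args) {
        Expr analyzed = new EqualTo(new Slot("str_col"), new Cast(new StringLit(""), "VARCHAR"));
        Expr rewritten = rewrite(analyzed);
        // prints true: the cast-wrapped shape is rewritten to a new expression
        System.out.println(rewritten != analyzed);
    }
}
```

A check that only accepts a bare `StringLit` on the right-hand side would return the input unchanged for the cast-wrapped shape, which matches the gap described in the finding.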

Critical checkpoint conclusions:

Goal / correctness: The intended optimization is only partially achieved; one main rewrite path does not fire on analyzed expressions, and one new inference path is semantically too broad. Existing tests do not prove end-to-end correctness for the production expression shape.
Change scope / focus: The PR is focused on Nereids expression and nested-column pruning, but it also changes uniform-slot inference, which introduces an unrelated semantic risk.
Concurrency: No new concurrency or locking concerns found in the touched FE code.
Lifecycle / static init: No special lifecycle or static initialization issues found.
Config changes: None.
Compatibility: No FE/BE protocol or storage-format compatibility issue was identified from the touched code paths.
Parallel code paths: The optimization was wired into several access-path collectors, but the main expression rewrite path does not cover the analyzed comparison form after coercion.
Special conditional checks: The new delete guard is understandable, but the Literal-only empty-string check is too narrow for analyzed trees.
Test coverage: Added tests cover handcrafted pre-coercion expressions and explain-based regressions, but they miss the analyzed/coerced expression form and the new length(varbinary_col)=0 uniform-inference case.
Observability: No additional observability appears necessary for this change.
Transaction / persistence: Not applicable.
Data writes / modifications: Not applicable.
FE/BE variable passing: No new transmitted fields beyond access-path contents; no incompatibility confirmed in the touched path.
Performance: The intended pruning should help when it fires, but the missed rewrite path means some targeted queries will get no benefit.
Other issues: None beyond the two findings above.
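
The second finding and the missing length(varbinary_col)=0 coverage can be sketched as a type guard on the inference. The enum `ColType` and the method `inferUniformValue` below are hypothetical and stand in for whatever type check the real `ExpressionUtils.extractUniformSlot()` would need. This is a sketch under that assumption, not the code in the PR.

```java
import java.util.Optional;

public class UniformSlotInferenceSketch {
    /** Hypothetical subset of column types accepted by length(). */
    public enum ColType { VARCHAR, STRING, VARBINARY }

    public static boolean isStringLike(ColType type) {
        return type == ColType.VARCHAR || type == ColType.STRING;
    }

    /**
     * Given a predicate length(slot) = comparedLength, returns the string constant the slot
     * must equal, or empty when no string constant can safely be inferred.
     */
    public static Optional<String> inferUniformValue(ColType slotType, long comparedLength) {
        if (comparedLength == 0 && isStringLike(slotType)) {
            return Optional.of(""); // length 0 on a string column implies slot = ''
        }
        // Binary columns (and non-zero lengths) get no string literal.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(inferUniformValue(ColType.VARCHAR, 0).isPresent());
        // prints false
        System.out.println(inferUniformValue(ColType.VARBINARY, 0).isPresent());
    }
}
```

A unit test along these lines for the binary case would cover the uniform-inference gap noted under test coverage.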

...c/test/java/org/apache/doris/nereids/rules/expression/rules/StringEmptyToLengthRuleTest.java

fe/fe-core/src/main/java/org/apache/doris/nereids/util/ExpressionUtils.java

englefly · 2026-04-08T14:58:07Z

run buildall

github-actions · 2026-04-09T02:14:10Z

PR approved by at least one committer and no changes requested.

test no mat

dd5017c

englefly force-pushed the len-str-v2 branch from dc4d73b to dd5017c April 8, 2026 05:56

fix

5bd5b01

englefly changed the title ~~Len str v2~~ [opt](nereids) optimize length(str_col) by only read offset sub column Apr 8, 2026

englefly marked this pull request as ready for review April 8, 2026 10:56

starocean999 previously approved these changes Apr 8, 2026

github-actions bot added the approved label Apr 8, 2026

github-actions bot added the reviewed label Apr 8, 2026

github-actions bot reviewed Apr 8, 2026

...c/test/java/org/apache/doris/nereids/rules/expression/rules/StringEmptyToLengthRuleTest.java (resolved)

fe/fe-core/src/main/java/org/apache/doris/nereids/util/ExpressionUtils.java (resolved)

review

493d0bd

englefly dismissed starocean999’s stale review via 493d0bd April 8, 2026 14:57

github-actions bot removed the approved label Apr 8, 2026

starocean999 approved these changes Apr 9, 2026

github-actions bot added the approved label Apr 9, 2026

feiniaofeiafei approved these changes Apr 9, 2026

englefly merged commit eb2567d into apache:master Apr 9, 2026
31 of 34 checks passed

5 participants
