
Temporary solution: Lazy mat pullup v1 #63695

Closed
englefly wants to merge 1 commit into apache:master from englefly:lazy-mat-pullup-v1

@englefly
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@englefly
Contributor Author

run buildall

Issue Number: None

Related PR: None

Problem Summary: In TopN(Project(scan)), the Project computes output-only expressions before TopN even when their aliases are used neither for ordering nor for filtering. This makes the inputs of nested expressions, such as struct_col.city, operative before TopN and prevents lazy materialization. Generalize the rule that pulls a Project under TopN up above it so that it covers safe non-aggregate cases, and make OperativeColumnDerive propagate the inputs of a Project alias only when the alias itself is already operative, so output-only aliases can be computed after TopN.
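
The pull-up described above can be sketched with a toy plan model in Java, the language of the Doris FE. Everything here is hypothetical: the Expr/Alias/Split types, the pullUp method, and the use of plain names in place of unique slot ids are assumptions for illustration, and the safety checks mentioned above are not modeled (every expression is assumed to be aggregate-free). Aliases referenced by the TopN ordering keys, and plain column pass-throughs, stay below TopN; every other alias moves above it, and only its input columns flow through TopN.

```java
import java.util.*;

// Hypothetical, simplified model of pulling output-only Project aliases above TopN.
// Expr, Col, Lit, Call, Alias and Split are illustrative types, not Doris Nereids classes;
// plain names stand in for the unique slot ids a real optimizer would use.
class TopNProjectPullUpSketch {
    interface Expr {}
    record Col(String name) implements Expr {}
    record Lit(String value) implements Expr {}
    record Call(String fn, List<Expr> args) implements Expr {}

    /** "expr AS name" inside a Project list. */
    record Alias(String name, Expr expr) {}

    /** The Project that stays under TopN and the Project placed above it. */
    record Split(List<Alias> lower, List<Alias> upper) {}

    static void collectColumns(Expr e, Set<String> out) {
        if (e instanceof Col c) {
            out.add(c.name());
        } else if (e instanceof Call call) {
            for (Expr arg : call.args()) {
                collectColumns(arg, out);
            }
        }
    }

    /**
     * Aliases used by the TopN ordering keys (and plain column pass-throughs) stay below TopN.
     * Every other alias is output-only: it moves above TopN, and the lower Project only
     * carries its input columns through TopN.
     */
    static Split pullUp(List<Alias> project, Set<String> orderKeys) {
        List<Alias> lower = new ArrayList<>();
        Set<String> lowerNames = new HashSet<>();
        List<Alias> upper = new ArrayList<>();
        for (Alias a : project) {
            boolean outputOnly = !orderKeys.contains(a.name()) && !(a.expr() instanceof Col);
            if (outputOnly) {
                upper.add(a);
                Set<String> inputs = new LinkedHashSet<>();
                collectColumns(a.expr(), inputs);
                for (String in : inputs) {
                    if (lowerNames.add(in)) {
                        lower.add(new Alias(in, new Col(in)));
                    }
                }
            } else {
                if (lowerNames.add(a.name())) {
                    lower.add(a);
                }
                // Above TopN the alias is only a reference to the value computed below.
                upper.add(new Alias(a.name(), new Col(a.name())));
            }
        }
        return new Split(lower, upper);
    }
}
```

Running pullUp on `[id, substring(struct_element(struct_col, 'city'), 1) AS city_prefix, score * 2 AS s2]` with order key `s2` keeps `id`, `struct_col` and `s2` in the lower Project and evaluates `city_prefix` above TopN. In this model the whole `struct_col` still crosses TopN, which is the case the later "Split nested project under topn" change targets.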

Improve TopN plans by delaying output-only Project expressions until after TopN, enabling lazy materialization for their input columns.
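
Why delaying these expressions frees their inputs for lazy materialization can be shown with a small hypothetical operative-column model. deriveOperative, deriveOperativeEagerly and lazyColumns are illustrative names, not the OperativeColumnDerive API, and column names are plain strings. The inputs of a Project alias become operative only when the alias itself is operative above the Project (for example, a TopN order key); scan columns that stay non-operative are candidates for reading after TopN.

```java
import java.util.*;

// Hypothetical model of operative-column derivation through a Project under TopN.
// Not the Doris OperativeColumnDerive API; aliases and columns are plain strings.
class OperativeColumnSketch {

    /**
     * projectInputs maps each Project alias to the scan columns its expression reads.
     * Only aliases already operative above the Project make their inputs operative;
     * names the Project does not produce pass through unchanged.
     */
    static Set<String> deriveOperative(Map<String, Set<String>> projectInputs, Set<String> operativeAbove) {
        Set<String> operative = new TreeSet<>();
        for (String name : operativeAbove) {
            Set<String> inputs = projectInputs.get(name);
            if (inputs == null) {
                operative.add(name);
            } else {
                operative.addAll(inputs);
            }
        }
        return operative;
    }

    /** Contrasting variant: the inputs of every alias are treated as operative. */
    static Set<String> deriveOperativeEagerly(Map<String, Set<String>> projectInputs, Set<String> operativeAbove) {
        Set<String> operative = new TreeSet<>(operativeAbove);
        operative.removeAll(projectInputs.keySet());
        for (Set<String> inputs : projectInputs.values()) {
            operative.addAll(inputs);
        }
        return operative;
    }

    /** Scan columns that are not operative can be read lazily after TopN. */
    static Set<String> lazyColumns(Set<String> scanColumns, Set<String> operative) {
        Set<String> lazy = new TreeSet<>(scanColumns);
        lazy.removeAll(operative);
        return lazy;
    }
}
```

With a TopN ordered by `s2 = score * 2`, only `score` is operative, so `id` and `struct_col.city` remain lazy candidates; the eager variant makes every alias input operative and leaves nothing to read lazily.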

- Test: Regression test / Build
    - ./build.sh --fe
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s topn_project_pullup_column_pruning -forceGenOut
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s topn_project_pullup_column_pruning
- Behavior changed: Yes (optimizer can place output-only Project expressions above TopN)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

[improvement](fe) Split nested project under topn

Issue Number: None

Related PR: None

Problem Summary: Pulling a whole expression such as substring(struct_element(struct_col, 'city'), 1) above TopN removes struct_element from the scan-side Project, so NestedColumnPruning can no longer see it. Split PreferPushDownProject subexpressions into a lower Project so that struct_element stays below TopN and can generate struct_col.city access paths, while the upper Project performs the remaining substring work after TopN. Aliases whose root expression is a PreferPushDownProject expression are kept below TopN, to avoid repeatedly pulling the generated lower Project back above TopN.
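
The split can be sketched with the same kind of toy model. In this hypothetical version, prefersPushDown stands in for the PreferPushDownProject property and recognizes only struct_element; the types, the split method and the `__pushdown_N` slot names are assumptions for illustration. Each outermost struct_element subexpression of an output-only alias moves into the lower Project, the upper Project keeps the surrounding work (here substring), and an alias whose root is struct_element stays below TopN in full.

```java
import java.util.*;

// Hypothetical model of splitting output-only aliases around TopN so that struct_element
// stays in a lower Project (visible to nested column pruning) while the rest of the
// expression is evaluated in an upper Project after TopN. Not Doris Nereids classes.
class SplitNestedProjectSketch {
    interface Expr {}
    record Col(String name) implements Expr {}
    record Lit(String value) implements Expr {}
    record Call(String fn, List<Expr> args) implements Expr {}
    record Alias(String name, Expr expr) {}
    record Split(List<Alias> lower, List<Alias> upper) {}

    /** Stand-in for the PreferPushDownProject property; here only struct_element has it. */
    static boolean prefersPushDown(Expr e) {
        return e instanceof Call c && c.fn().equals("struct_element");
    }

    static void collectColumns(Expr e, Set<String> out) {
        if (e instanceof Col c) {
            out.add(c.name());
        } else if (e instanceof Call call) {
            for (Expr arg : call.args()) {
                collectColumns(arg, out);
            }
        }
    }

    /** Replaces each outermost push-down-preferring subtree with a slot from the lower Project. */
    static Expr extract(Expr e, List<Alias> lower, int[] counter) {
        if (prefersPushDown(e)) {
            String slot = "__pushdown_" + counter[0]++;
            lower.add(new Alias(slot, e));
            return new Col(slot);
        }
        if (e instanceof Call c) {
            List<Expr> args = new ArrayList<>();
            for (Expr arg : c.args()) {
                args.add(extract(arg, lower, counter));
            }
            return new Call(c.fn(), args);
        }
        return e; // columns and literals are unchanged
    }

    /** Splits output-only aliases into a lower Project (below TopN) and an upper Project. */
    static Split split(List<Alias> outputOnly) {
        List<Alias> lower = new ArrayList<>();
        List<Alias> upper = new ArrayList<>();
        int[] counter = {0};
        for (Alias a : outputOnly) {
            if (prefersPushDown(a.expr())) {
                // Root-level struct_element alias: keep it entirely below TopN.
                lower.add(a);
                upper.add(new Alias(a.name(), new Col(a.name())));
            } else {
                upper.add(new Alias(a.name(), extract(a.expr(), lower, counter)));
            }
        }
        // Other columns the upper expressions still read must pass through the lower Project.
        Set<String> lowerNames = new HashSet<>();
        for (Alias a : lower) {
            lowerNames.add(a.name());
        }
        Set<String> needed = new LinkedHashSet<>();
        for (Alias a : upper) {
            collectColumns(a.expr(), needed);
        }
        for (String col : needed) {
            if (lowerNames.add(col)) {
                lower.add(new Alias(col, new Col(col)));
            }
        }
        return new Split(lower, upper);
    }
}
```

For `substring(struct_element(struct_col, 'city'), 1) AS city_prefix`, the lower Project computes `struct_element(struct_col, 'city') AS __pushdown_0` and the upper Project computes `substring(__pushdown_0, 1)`, so the scan side can still expose the `struct_col.city` access path.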

Improve TopN project pull-up plans so nested struct_element subexpressions can still participate in nested column pruning.

- Test: Regression test / Checkstyle / Fast compile
    - tools/fast-compile-fe.sh
    - cd fe && mvn checkstyle:check -pl fe-core -q
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s topn_project_pullup_column_pruning -forceGenOut
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s topn_project_pullup_column_pruning
- Behavior changed: Yes (TopN project pull-up can split nested subexpressions to preserve nested column pruning)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@englefly force-pushed the lazy-mat-pullup-v1 branch from 3f6d40f to 3ae1b31 on May 26, 2026 14:49
@englefly closed this on May 27, 2026

Labels: None yet
Projects: None yet


2 participants