[Enhancement] Restore missing optimizations in DP statistics estimation #67852

Merged
kangkaisen merged 1 commit into StarRocks:main from stephen-shelby:add_optimize
Jan 14, 2026

stephen-shelby commented Jan 13, 2026

Why I'm doing:

Recent changes on main unintentionally dropped some performance optimizations in the DP statistics estimation hot path. Since DP join reorder calls calculateStatistics() thousands of times, the regression is amplified. This PR restores those missing optimizations to bring performance back while keeping cost/statistics semantics unchanged.

What I'm doing:

  • Restore the guard for PredicateColumnsMgr.recordJoinPredicate(...) under skipPredicateColumnsCollectionScope() to avoid repeated predicate-columns recording during DP stats estimation.
  • Restore the rowcount-only fast path in correlated join estimation by using withOutputRowCount(...) instead of rebuilding Statistics via buildFrom(...).build().
  • Restore the optimized adjustStatisticsByRowCount() fast path:
    • update rowcount via withOutputRowCount first,
    • avoid stream/lambda overhead with loops and early breaks,
    • rebuild column stats only when NDV adjustment is actually needed.
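The first bullet above describes a scope-based guard. A minimal sketch of that pattern follows, assuming a thread-local flag behind the scope; `PredicateRecorderSketch`, `SkipScope`, `skipScope()`, `shouldSkip()`, and `estimateJoin()` are hypothetical stand-ins for illustration and are not the StarRocks classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in; not the StarRocks PredicateColumnsMgr.
final class PredicateRecorderSketch {
    // Per-thread nesting depth of open "skip" scopes (assumed mechanism).
    private static final ThreadLocal<Integer> SKIP_DEPTH = ThreadLocal.withInitial(() -> 0);

    // Predicates that were actually recorded.
    final List<String> recorded = new ArrayList<>();

    /** While an instance is open, recording is skipped on the current thread. */
    static final class SkipScope implements AutoCloseable {
        SkipScope() {
            SKIP_DEPTH.set(SKIP_DEPTH.get() + 1);
        }

        @Override
        public void close() {
            SKIP_DEPTH.set(SKIP_DEPTH.get() - 1);
        }
    }

    static SkipScope skipScope() {
        return new SkipScope();
    }

    static boolean shouldSkip() {
        return SKIP_DEPTH.get() > 0;
    }

    /** Stand-in for a join estimation step that also records its predicate. */
    void estimateJoin(String predicate) {
        if (!shouldSkip()) {
            // Side effect that is only useful once per query, not once per DP candidate.
            recorded.add(predicate);
        }
        // ... row-count estimation would run here regardless of the guard ...
    }
}
```

With this shape, a caller that evaluates many candidate join orders can wrap the loop in `try (PredicateRecorderSketch.SkipScope s = PredicateRecorderSketch.skipScope()) { ... }`, so the estimation logic still runs on every call while the recording side effect does not.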

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

Note

Restores missing performance optimizations in DP stats estimation hot path without altering cost/statistics semantics.

  • Guard PredicateColumnsMgr.recordJoinPredicate(...) with shouldSkipPredicateColumnsCollection() to avoid repeated recording during DP estimation
  • Use withOutputRowCount(...) in correlated inner-join estimation instead of rebuilding Statistics
  • Optimize StatisticsEstimateUtils.adjustStatisticsByRowCount(...):
    • update rowcount via withOutputRowCount first
    • early-return if table row count is inaccurate or any column stats are unknown
    • loop with early-break to decide NDV adjustment and rebuild only when needed
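The `adjustStatisticsByRowCount(...)` steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the StarRocks implementation: `StatsSketch` is a hypothetical simplified model, unknown column stats are modeled as `NaN`, and the adjustment rule (clamp NDV to the new row count) is assumed for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; not the StarRocks Statistics class.
final class StatsSketch {
    final double rowCount;
    final Map<String, Double> ndv; // column name -> distinct-value count; NaN means unknown

    StatsSketch(double rowCount, Map<String, Double> ndv) {
        this.rowCount = rowCount;
        this.ndv = ndv;
    }

    /** Row-count-only update: reuses the column map instead of copying it. */
    StatsSketch withOutputRowCount(double newRowCount) {
        return new StatsSketch(newRowCount, ndv);
    }

    static StatsSketch adjustByRowCount(StatsSketch in, double newRowCount,
                                        boolean tableRowCountAccurate) {
        // Step 1: cheap row-count update first.
        StatsSketch out = in.withOutputRowCount(newRowCount);

        // Step 2: early returns when adjusting NDV would not be trustworthy.
        if (!tableRowCountAccurate) {
            return out;
        }
        for (double d : in.ndv.values()) {
            if (Double.isNaN(d)) {
                return out; // at least one column has unknown stats
            }
        }

        // Step 3: plain loop with early break to decide whether any column needs work.
        boolean needsAdjust = false;
        for (double d : in.ndv.values()) {
            if (d > newRowCount) {
                needsAdjust = true;
                break;
            }
        }
        if (!needsAdjust) {
            return out; // fast path: no column stats rebuilt
        }

        // Step 4: rebuild column stats only when actually needed (assumed rule: clamp).
        Map<String, Double> clamped = new HashMap<>(in.ndv.size());
        for (Map.Entry<String, Double> e : in.ndv.entrySet()) {
            clamped.put(e.getKey(), Math.min(e.getValue(), newRowCount));
        }
        return new StatsSketch(newRowCount, clamped);
    }
}
```

In this sketch the common case (no column exceeds the new row count) allocates one small object and copies nothing, which is the property that matters when the method sits on a path invoked thousands of times.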

Written by Cursor Bugbot for commit 2659684.

@StarRocks-Reviewer

@cursor review

…mn collection and rowcount-only rebuilds

Signed-off-by: stephen <stephen5217@163.com>
@stephen-shelby changed the title from "[Enhancement] Restore missing optimizations in DP statistics estimation" to "[UT] Restore missing optimizations in DP statistics estimation" on Jan 13, 2026

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no bugs!

@stephen-shelby changed the title from "[UT] Restore missing optimizations in DP statistics estimation" to "[Enhancement] Restore missing optimizations in DP statistics estimation" on Jan 13, 2026

@github-actions
Contributor

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Contributor

[FE Incremental Coverage Report]

pass : 24 / 24 (100.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 com/starrocks/sql/optimizer/statistics/StatisticsEstimateUtils.java | 18 | 18 | 100.00% | [] |
| 🔵 com/starrocks/sql/optimizer/statistics/StatisticsCalculator.java | 6 | 6 | 100.00% | [] |

@github-actions
Contributor

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@alvin-celerdata
Contributor

@cursor review

@kangkaisen kangkaisen merged commit 8a322d3 into StarRocks:main Jan 14, 2026
92 of 97 checks passed
@github-actions
Contributor

@Mergifyio backport branch-4.1

@github-actions
Contributor

@Mergifyio backport branch-4.0

@mergify
Contributor

mergify bot commented Jan 14, 2026

backport branch-4.1

❌ No backport have been created

Details
  • Backport to branch branch-4.1 failed

GitHub error: Branch not found

@mergify
Contributor

mergify bot commented Jan 14, 2026

backport branch-4.0

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jan 14, 2026
wanpengfei-git pushed a commit that referenced this pull request Jan 14, 2026
…on (backport #67852) (#67921)

Co-authored-by: stephen <91597003+stephen-shelby@users.noreply.github.com>
farhad-celo pushed a commit to farhad-celo/starrocks that referenced this pull request Jan 20, 2026
…on (StarRocks#67852)

Signed-off-by: Farhad Shahmohammadi <f.shahmohammadi@celonis.com>

5 participants