
planner: improve row count estimation of IndexJoin's inner scan #12085

Merged
2 commits merged into pingcap:master from inlj_inner_scan on Sep 11, 2019


eurekaka (Contributor) commented Sep 9, 2019

What problem does this PR solve?

Currently, the row count estimate for IndexJoin's inner child is inaccurate for two reasons:

  • we assume that every outer row finds matches in the inner child, which often does not hold in practice;
  • when estimating the inner child's row count, we use Count / NDV. If the index is a composite index and the join key covers only a prefix of it, the estimate comes out smaller than the real count, because the NDV of the whole index is at least as large as the NDV of the prefix that is actually matched.
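To make the second point concrete, here is a minimal Go sketch that builds a toy composite index idx(a, b) in memory and compares Count / NDV computed over the whole index with Count / NDV computed over the prefix column a. The types and helpers (`indexEntry`, `buildIndex`, `fullKey`, `prefixKey`, `ndv`) are hypothetical illustrations, not TiDB's planner or statistics code, and NDV is counted exactly here rather than taken from histograms.

```go
package main

import "fmt"

// indexEntry is a hypothetical key of a composite index idx(a, b).
type indexEntry struct{ a, b int }

// buildIndex fills the index with nA distinct values of a,
// each paired with nB distinct values of b.
func buildIndex(nA, nB int) []indexEntry {
	entries := make([]indexEntry, 0, nA*nB)
	for a := 0; a < nA; a++ {
		for b := 0; b < nB; b++ {
			entries = append(entries, indexEntry{a, b})
		}
	}
	return entries
}

// fullKey identifies an entry by all index columns (a, b).
func fullKey(e indexEntry) string { return fmt.Sprintf("%d,%d", e.a, e.b) }

// prefixKey identifies an entry by the index prefix (a) only.
func prefixKey(e indexEntry) string { return fmt.Sprintf("%d", e.a) }

// ndv counts the distinct keys produced by key over the entries.
func ndv(entries []indexEntry, key func(indexEntry) string) int {
	seen := make(map[string]struct{})
	for _, e := range entries {
		seen[key(e)] = struct{}{}
	}
	return len(seen)
}

func main() {
	entries := buildIndex(10, 100) // 1000 rows
	count := float64(len(entries))

	// The join key covers only column a, the prefix of idx(a, b).
	// Dividing by the NDV of the whole index predicts 1 row per lookup...
	fmt.Printf("Count / NDV(a, b) = %.0f\n", count/float64(ndv(entries, fullKey)))
	// ...but each value of a actually matches 100 rows.
	fmt.Printf("Count / NDV(a) = %.0f\n", count/float64(ndv(entries, prefixKey)))
}
```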

What is changed and how it works?

Initially, I considered comparing the histograms of both child plans to compute an overlap ratio and using it for row count estimation, but found that handling mismatched data types was non-trivial, and that computing the overlap was hard for composite indexes.

Instead, I chose to reuse the estimated row count of the join result after evaluating the join's equal conditions, because leftCnt * rightCnt / max(leftNDV, rightNDV) already takes the overlap into account to some extent, and this approach gives more consistent row count estimates between the join operator and its children.
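The following Go sketch illustrates the idea under one stated assumption: that the expected number of inner rows per index lookup is obtained by spreading the join's equal-condition output count evenly across the outer rows. The function names `joinEqualCondOutCnt` and `avgInnerRowsPerOuterRow` are hypothetical, and this is not the actual TiDB implementation. The example also illustrates the first problem above: when the outer side has many more distinct join keys than the inner side, Count / NDV on the inner side implicitly assumes that every lookup finds matches.

```go
package main

import (
	"fmt"
	"math"
)

// joinEqualCondOutCnt estimates the number of joined rows after evaluating the
// join equal conditions, using the formula from the PR description:
// leftCnt * rightCnt / max(leftNDV, rightNDV).
func joinEqualCondOutCnt(leftCnt, rightCnt, leftNDV, rightNDV float64) float64 {
	return leftCnt * rightCnt / math.Max(leftNDV, rightNDV)
}

// avgInnerRowsPerOuterRow is a hypothetical helper that spreads the join
// output evenly over the outer rows to get the expected rows per index lookup.
func avgInnerRowsPerOuterRow(joinCnt, outerCnt float64) float64 {
	if outerCnt <= 0 {
		return 0
	}
	return joinCnt / outerCnt
}

func main() {
	// Outer: 1000 rows with 1000 distinct join keys.
	// Inner: 1000 rows with only 10 distinct join keys.
	outerCnt, innerCnt := 1000.0, 1000.0
	outerNDV, innerNDV := 1000.0, 10.0

	joinCnt := joinEqualCondOutCnt(outerCnt, innerCnt, outerNDV, innerNDV)
	fmt.Printf("join rows: %.0f\n", joinCnt)
	fmt.Printf("inner rows per lookup (from join estimate): %.0f\n",
		avgInnerRowsPerOuterRow(joinCnt, outerCnt))
	// Count / NDV on the inner side assumes every lookup finds matches.
	fmt.Printf("inner rows per lookup (Count / NDV): %.0f\n", innerCnt/innerNDV)
}
```

With these numbers the formula yields 1000 joined rows, i.e. about 1 inner row per lookup on average, whereas Count / NDV on the inner side predicts 100 rows for every one of the 1000 lookups.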

Check List

Tests

  • Unit test

Code changes

N/A

Side effects

  • Possible performance regression

Related changes

N/A

Release note

  • Write release note for bug-fix or new feature.

codecov bot commented Sep 9, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@440bb74).
The diff coverage is 100%.

@@             Coverage Diff             @@
##             master     #12085   +/-   ##
===========================================
  Coverage          ?   81.3048%           
===========================================
  Files             ?        452           
  Lines             ?      96886           
  Branches          ?          0           
===========================================
  Hits              ?      78773           
  Misses            ?      12460           
  Partials          ?       5653

eurekaka (Contributor, Author) commented Sep 9, 2019

/run-all-tests

AilinKid (Contributor) commented Sep 9, 2019

/run-all-tests

zyxbest (Contributor) commented Sep 9, 2019

/run-unit-test

winoros (Member) left a comment

lgtm

eurekaka (Contributor, Author) commented

/bench

sre-bot (Contributor) commented Sep 10, 2019

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: 5c18c5df97d935398ea1b44098a0db3171999466
+++ tidb: 7278ab07104a307f04e72e21d02f6528f8591013
tikv: ff82aa9eba331585aec1c6cdf9e1584512bccb34
pd: ce060a9aeb66d6bbb39159243b879740dffae041
================================================================================
test-1: < oltp_insert >
    * QPS : 21141.44 ± 1.5888% (std=240.80) delta: -0.08%
    * AvgMs : 12.10 ± 1.6193% (std=0.14) delta: 0.08%
    * PercentileMs99 : 42.92 ± 1.0903% (std=0.38) delta: 1.46%
            
test-2: < oltp_update_non_index >
    * QPS : 29436.80 ± 0.2322% (std=47.10) delta: -0.10%
    * AvgMs : 8.69 ± 0.2071% (std=0.01) delta: 0.09%
    * PercentileMs99 : 30.59 ± 1.0788% (std=0.27) delta: 1.09%
            
test-3: < oltp_read_write >
    * QPS : 37051.41 ± 0.3770% (std=80.95) delta: 0.32%
    * AvgMs : 138.73 ± 0.3792% (std=0.31) delta: -0.32%
    * PercentileMs99 : 257.95 ± 0.0000% (std=0.00) delta: 0.00%
            
test-4: < oltp_point_select >
    * QPS : 74734.42 ± 3.2705% (std=1572.71) delta: -0.30%
    * AvgMs : 3.43 ± 3.0940% (std=0.07) delta: 0.40%
    * PercentileMs99 : 7.43 ± 0.0000% (std=0.00) delta: 0.00%
            
test-5: < oltp_update_index >
    * QPS : 16880.32 ± 0.5051% (std=62.94) delta: 0.29%
    * AvgMs : 15.16 ± 0.5012% (std=0.06) delta: -0.29%
    * PercentileMs99 : 48.34 ± 0.0000% (std=0.00) delta: 0.00%
            

https://perf.pingcap.com

eurekaka added the status/LGT1 label and removed the status/WIP label Sep 10, 2019
alivxxx (Contributor) left a comment

LGTM
Please resolve the conflicts.

alivxxx added the status/LGT2 and status/can-merge labels and removed the status/LGT1 label Sep 11, 2019
sre-bot (Contributor) commented Sep 11, 2019

Your auto merge job has been accepted, waiting for 12009

sre-bot (Contributor) commented Sep 11, 2019

/run-all-tests

sre-bot merged commit f2adf1d into pingcap:master Sep 11, 2019
eurekaka deleted the inlj_inner_scan branch October 8, 2019 03:09
Labels: sig/planner, status/can-merge, status/LGT2, type/enhancement

6 participants