
opt: incorrect index chosen when using LIKE #31929

Closed
RaduBerinde opened this issue Oct 26, 2018 · 6 comments · Fixed by #31937
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Milestone

Comments

@RaduBerinde
Member

This case was encountered by a customer.

exec-ddl
CREATE TABLE abcde (
  a INT PRIMARY KEY,
  b INT,
  c STRING,
  d INT,
  e INT,
  INDEX bad(b, d),
  INDEX good(b, c, d)
)
----
TABLE abcde
 ├── a int not null
 ├── b int
 ├── c string
 ├── d int
 ├── e int
 ├── INDEX primary
 │    └── a int not null
 ├── INDEX bad
 │    ├── b int
 │    ├── d int
 │    └── a int not null
 └── INDEX good
      ├── b int
      ├── c string
      ├── d int
      └── a int not null

opt
SELECT * FROM abcde WHERE b = 1 AND c LIKE '+1-1000%'
----
select
 ├── columns: a:1(int!null) b:2(int!null) c:3(string) d:4(int) e:5(int)
 ├── key: (1)
 ├── fd: ()-->(2), (1)-->(3-5)
 ├── index-join abcde
 │    ├── columns: a:1(int!null) b:2(int) c:3(string) d:4(int) e:5(int)
 │    ├── key: (1)
 │    ├── fd: ()-->(2), (1)-->(3-5)
 │    └── scan abcde@bad
 │         ├── columns: a:1(int!null) b:2(int!null) d:4(int)
 │         ├── constraint: /2/4/1: [/1 - /1]
 │         ├── key: (1)
 │         └── fd: ()-->(2), (1)-->(4)
 └── filters [type=bool, outer=(3)]
      └── c LIKE '+1-1000%' [type=bool, outer=(3)]

The better plan is to use the other index, where we can constrain the second column as well. It seems to be a stats issue: both plans estimate 10 rows for the scan, even though one is strictly more constrained than the other:

New expression 1 of 2:
  select
   ├── columns: a:1(int!null) b:2(int!null) c:3(string) d:4(int) e:5(int)
   ├── stats: [rows=3.33333333, distinct(2)=1]
   ├── cost: 51.8
   ├── key: (1)
   ├── fd: ()-->(2), (1)-->(3-5)
   ├── prune: (1,4,5)
   ├── index-join abcde
   │    ├── columns: t.public.abcde.a:1(int!null) t.public.abcde.b:2(int) t.public.abcde.c:3(string) t.public.abcde.d:4(int) t.public.abcde.e:5(int)
   │    ├── stats: [rows=10]
   │    ├── cost: 51.7
   │    ├── key: (1)
   │    ├── fd: ()-->(2), (1)-->(3-5)
   │    └── scan abcde@bad
   │         ├── columns: t.public.abcde.a:1(int!null) t.public.abcde.b:2(int!null) t.public.abcde.d:4(int)
   │         ├── constraint: /2/4/1: [/1 - /1]
   │         ├── stats: [rows=10, distinct(2)=1]
   │         ├── cost: 10.6
   │         ├── key: (1)
   │         └── fd: ()-->(2), (1)-->(4)
   └── filters [type=bool, outer=(3)]
        └── like [type=bool, outer=(3)]
             ├── variable: t.public.abcde.c [type=string, outer=(3)]
             └── const: '+1-1000%' [type=string]

New expression 2 of 2:
  index-join abcde
   ├── columns: a:1(int!null) b:2(int!null) c:3(string) d:4(int) e:5(int)
   ├── stats: [rows=3.33333333, distinct(2)=1]
   ├── cost: 51.9
   ├── key: (1)
   ├── fd: ()-->(2), (1)-->(3-5)
   ├── prune: (1,4,5)
   └── scan abcde@good
        ├── columns: t.public.abcde.a:1(int!null) t.public.abcde.b:2(int!null) t.public.abcde.c:3(string!null) t.public.abcde.d:4(int)
        ├── constraint: /2/3/4/1: [/1/'+1-1000' - /1/'+1-1001')
        ├── stats: [rows=10, distinct(2)=1]
        ├── cost: 10.8
        ├── key: (1)
        └── fd: ()-->(2), (1)-->(3,4)
----

Note that this works fine on master, but not on release-2.1. We need to figure out which change fixed it, and whether it fixed the root cause or whether this particular case is fixed just by chance.

CC @rytaft

@RaduBerinde RaduBerinde added this to the 2.1.x milestone Oct 26, 2018
@RaduBerinde
Member Author

I confirmed that it was Bilal's change #30827 that fixed it:

New expression 1 of 2:
  select
   ├── columns: a:1(int!null) b:2(int!null) c:3(string) d:4(int) e:5(int)
   ├── stats: [rows=3.3, distinct(1)=3.3, null(1)=0, distinct(2)=1, null(2)=0]
   ├── cost: 51.282
   ├── key: (1)
   ├── fd: ()-->(2), (1)-->(3-5)
   ├── prune: (1,4,5)
   ├── index-join abcde
   │    ├── columns: t.public.abcde.a:1(int!null) t.public.abcde.b:2(int) t.public.abcde.c:3(string) t.public.abcde.d:4(int) t.public.abcde.e:5(int)
   │    ├── stats: [rows=9.9]
   │    ├── cost: 51.183
   │    ├── key: (1)
   │    ├── fd: ()-->(2), (1)-->(3-5)
   │    └── scan abcde@bad
   │         ├── columns: t.public.abcde.a:1(int!null) t.public.abcde.b:2(int!null) t.public.abcde.d:4(int)
   │         ├── constraint: /2/4/1: [/1 - /1]
   │         ├── stats: [rows=9.9, distinct(1)=9.9, null(1)=0, distinct(2)=1, null(2)=0]
   │         ├── cost: 10.494
   │         ├── key: (1)
   │         └── fd: ()-->(2), (1)-->(4)
   └── filters
        └── like [type=bool, outer=(3)]
             ├── variable: t.public.abcde.c [type=string]
             └── const: '+1-1000%' [type=string]

New expression 2 of 2:
  index-join abcde
   ├── columns: a:1(int!null) b:2(int!null) c:3(string) d:4(int) e:5(int)
   ├── stats: [rows=3.3, distinct(1)=3.3, null(1)=0, distinct(2)=1, null(2)=0]
   ├── cost: 50.86719
   ├── key: (1)
   ├── fd: ()-->(2), (1)-->(3-5)
   ├── prune: (1,4,5)
   └── scan abcde@good
        ├── columns: t.public.abcde.a:1(int!null) t.public.abcde.b:2(int!null) t.public.abcde.c:3(string!null) t.public.abcde.d:4(int)
        ├── constraint: /2/3/4/1: [/1/'+1-1000' - /1/'+1-1001')
        ├── stats: [rows=9.801, distinct(1)=9.801, null(1)=0, distinct(2)=1, null(2)=0, distinct(3)=9.5617925, null(3)=0]
        ├── cost: 10.58508
        ├── key: (1)
        └── fd: ()-->(2), (1)-->(3,4)

The row counts are still very close, so I doubt that this change fixed the root cause: the second index should have significantly fewer estimated rows.

@RaduBerinde
Member Author

CC @rytaft

@RaduBerinde
Member Author

Part of the problem here is that we don't generate a (non-index) constraint for LIKE. This makes the initial select estimate 3.3 rows (because of the 1/3 unknown-filter selectivity factor):

Source expression:
  select
   ├── columns: a:1(int!null) b:2(int!null) c:3(string) d:4(int) e:5(int)
   ├── stats: [rows=3.3, distinct(1)=3.3, null(1)=0, distinct(2)=1, null(2)=0]
   ├── cost: 1110
   ├── key: (1)
   ├── fd: ()-->(2), (1)-->(3-5)
   ├── prune: (1,4,5)
   ├── scan abcde
   │    ├── columns: t.public.abcde.a:1(int!null) t.public.abcde.b:2(int) t.public.abcde.c:3(string) t.public.abcde.d:4(int) t.public.abcde.e:5(int)
   │    ├── stats: [rows=1000, distinct(1)=1000, null(1)=0, distinct(2)=100, null(2)=10]
   │    ├── cost: 1100
   │    ├── key: (1)
   │    ├── fd: (1)-->(2-5)
   │    └── prune: (1-5)
   └── filters
        ├── eq [type=bool, outer=(2), constraints=(/2: [/1 - /1]; tight), fd=()-->(2)]
        │    ├── variable: t.public.abcde.b [type=int]
        │    └── const: 1 [type=int]
        └── like [type=bool, outer=(3)]
             ├── variable: t.public.abcde.c [type=string]
             └── const: '+1-1000%' [type=string]

The constrained scan then estimates 9.8 rows. I think any index constraint on a column should be credited with at least as much selectivity as the 1/3 unknown-filter factor, to keep these two estimates consistent.
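To see where the two numbers come from, here is the arithmetic as a Python sketch. This is an illustrative reconstruction from the stats shown in the plans above (1000 rows, distinct(2)=100, null(2)=10), not CockroachDB's actual estimation code.

```python
# Rough arithmetic behind the estimates above; illustrative only.
table_rows = 1000
null_b = 10
distinct_b = 100

# Equality on b: drop NULLs, then divide by the distinct count.
rows_b_eq_1 = (table_rows - null_b) / distinct_b   # 9.9 rows

# The unconstrained Select also applies the 1/3 default selectivity
# for the LIKE filter, which has no constraint.
select_estimate = rows_b_eq_1 * (1 / 3)            # ~3.3 rows

# The scan of abcde@bad only has the constraint on b, so it keeps 9.9 rows.
scan_bad_estimate = rows_b_eq_1

print(select_estimate, scan_bad_estimate)
```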

@RaduBerinde
Member Author

RaduBerinde commented Oct 26, 2018

Note: adding support for constraints on LIKE should fix this and is good to have. But I think this should have worked even without it, so there's another underlying problem here that I'd like to get to the bottom of.
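For reference, deriving a constraint from a prefix LIKE pattern is mechanical: '+1-1000%' matches exactly the half-open string range ['+1-1000', '+1-1001'). A minimal sketch of the idea, with a hypothetical helper name (not the actual constraint builder; real code also has to handle a prefix whose last character is already the maximum):

```python
def like_prefix_to_range(pattern: str):
    """Convert a pattern of the form '<prefix>%' into a half-open
    range [prefix, successor). Returns None for patterns that are
    not a simple prefix (wildcards before the trailing '%')."""
    if not pattern.endswith('%'):
        return None
    prefix = pattern[:-1]
    if '%' in prefix or '_' in prefix:
        return None
    if not prefix:
        return None  # bare '%' matches everything; no useful range
    # Successor string: bump the last character. A real implementation
    # must handle the case where it is already the maximum character.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return (prefix, upper)

print(like_prefix_to_range('+1-1000%'))  # ('+1-1000', '+1-1001')
```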

@rytaft
Collaborator

rytaft commented Oct 26, 2018

Taking a look now - thanks!

@RaduBerinde
Member Author

Actually, I just tried it: if we have a constraint for the string column, the disparity is even worse. The estimate for the initial select is ~1 row (vs. 3.3 with no constraint), while the constrained index scan still estimates ~9.8 rows.

Source expression:
  select
   ├── columns: a:1(int!null) b:2(int!null) c:3(string!null) d:4(int) e:5(int)
   ├── stats: [rows=1.089, distinct(1)=1.089, null(1)=0, distinct(2)=1, null(2)=0, distinct(3)=1.089, null(3)=0]
   ├── cost: 1110
   ├── key: (1)
   ├── fd: ()-->(2), (1)-->(3-5)
   ├── prune: (1,4,5)
   ├── scan abcde
   │    ├── columns: t.public.abcde.a:1(int!null) t.public.abcde.b:2(int) t.public.abcde.c:3(string) t.public.abcde.d:4(int) t.public.abcde.e:5(int)
   │    ├── stats: [rows=1000, distinct(1)=1000, null(1)=0, distinct(2)=100, null(2)=10, distinct(3)=100, null(3)=10]
   │    ├── cost: 1100
   │    ├── key: (1)
   │    ├── fd: (1)-->(2-5)
   │    └── prune: (1-5)
   └── filters
        ├── eq [type=bool, outer=(2), constraints=(/2: [/1 - /1]; tight), fd=()-->(2)]
        │    ├── variable: t.public.abcde.b [type=int]
        │    └── const: 1 [type=int]
        └── like [type=bool, outer=(3), constraints=(/3: [/'+1-1000' - /'+1-1001'); tight)]
             ├── variable: t.public.abcde.c [type=string]
             └── const: '+1-1000%' [type=string]

New expression 1 of 2:
  select
   ├── columns: a:1(int!null) b:2(int!null) c:3(string!null) d:4(int) e:5(int)
   ├── stats: [rows=1.089, distinct(1)=1.089, null(1)=0, distinct(2)=1, null(2)=0, distinct(3)=1.089, null(3)=0]
   ├── cost: 51.282
   ├── key: (1)
   ├── fd: ()-->(2), (1)-->(3-5)
   ├── prune: (1,4,5)
   ├── index-join abcde
   │    ├── columns: t.public.abcde.a:1(int!null) t.public.abcde.b:2(int) t.public.abcde.c:3(string) t.public.abcde.d:4(int) t.public.abcde.e:5(int)
   │    ├── stats: [rows=9.9]
   │    ├── cost: 51.183
   │    ├── key: (1)
   │    ├── fd: ()-->(2), (1)-->(3-5)
   │    └── scan abcde@bad
   │         ├── columns: t.public.abcde.a:1(int!null) t.public.abcde.b:2(int!null) t.public.abcde.d:4(int)
   │         ├── constraint: /2/4/1: [/1 - /1]
   │         ├── stats: [rows=9.9, distinct(1)=9.9, null(1)=0, distinct(2)=1, null(2)=0]
   │         ├── cost: 10.494
   │         ├── key: (1)
   │         └── fd: ()-->(2), (1)-->(4)
   └── filters
        └── like [type=bool, outer=(3), constraints=(/3: [/'+1-1000' - /'+1-1001'); tight)]
             ├── variable: t.public.abcde.c [type=string]
             └── const: '+1-1000%' [type=string]

New expression 2 of 2:
  index-join abcde
   ├── columns: a:1(int!null) b:2(int!null) c:3(string!null) d:4(int) e:5(int)
   ├── stats: [rows=1.089, distinct(1)=1.089, null(1)=0, distinct(2)=1, null(2)=0, distinct(3)=1.089, null(3)=0]
   ├── cost: 50.86719
   ├── key: (1)
   ├── fd: ()-->(2), (1)-->(3-5)
   ├── prune: (1,4,5)
   └── scan abcde@good
        ├── columns: t.public.abcde.a:1(int!null) t.public.abcde.b:2(int!null) t.public.abcde.c:3(string!null) t.public.abcde.d:4(int)
        ├── constraint: /2/3/4/1: [/1/'+1-1000' - /1/'+1-1001')
        ├── stats: [rows=9.801, distinct(1)=9.801, null(1)=0, distinct(2)=1, null(2)=0, distinct(3)=9.5617925, null(3)=0]
        ├── cost: 10.58508
        ├── key: (1)
        └── fd: ()-->(2), (1)-->(3,4)
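The 1.089 and 9.801 figures above can be reproduced with simple arithmetic, if we assume (speculatively) that the Select treats the tight range on c as two inequality conjuncts at the 1/3 default each, while the constrained scan of abcde@good credits the range on c with nothing beyond its non-NULL fraction. An illustrative Python sketch with those selectivities as assumptions, not the optimizer's actual code paths:

```python
# Assumed selectivities; illustrative reconstruction of the plan numbers.
rows_b_eq_1 = 990 * (1 / 100)   # 9.9 rows with b = 1 (NULLs removed)
non_null_c = 990 / 1000          # fraction of rows with non-NULL c

# Select: the range on c is counted as two 1/3-selectivity inequalities.
select_estimate = rows_b_eq_1 * (1 / 3) * (1 / 3) * non_null_c   # ~1.089

# Scan of abcde@good: the range on c only contributes the non-NULL fraction.
scan_good_estimate = rows_b_eq_1 * non_null_c                    # ~9.801

print(select_estimate, scan_good_estimate)
```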

rytaft added a commit to rytaft/cockroach that referenced this issue Oct 26, 2018
Unlike the constraints found in Select and Join filters, an index
constraint may represent multiple conjuncts. Therefore, the selectivity
estimate for a Scan should account for the selectivity of each
constrained column in the index constraint. This commit fixes the
selectivity estimation in the optimizer to properly account for
each constrained column in a Scan.

Fixes cockroachdb#31929

Release note (bug fix): In some cases the optimizer was choosing
the wrong index for a scan because of incorrect selectivity
estimation. This estimation error has been fixed.
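The fix described in this commit message amounts to multiplying in one selectivity factor per column the index constraint actually constrains, rather than treating the whole constraint as a single conjunct. A minimal sketch of that idea; the function and the numeric selectivities are illustrative assumptions, not the actual cockroach implementation:

```python
def scan_selectivity(per_column_selectivities):
    """Combine per-column selectivities of an index constraint by
    multiplying them, one factor per constrained column."""
    sel = 1.0
    for s in per_column_selectivities:
        sel *= s
    return sel

# abcde@bad: constraint /2/4/1: [/1 - /1] only constrains b.
sel_bad = scan_selectivity([1 / 100])

# abcde@good: constraint /2/3/4/1: [/1/'+1-1000' - /1/'+1-1001')
# constrains both b and the prefix range on c (1/3 assumed here),
# so it should come out markedly more selective.
sel_good = scan_selectivity([1 / 100, 1 / 3])
```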
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Oct 27, 2018
Adding constraint builder support for `LikeOp` and `SimilarToOp`. Also
improving the logic to still try to determine not-null constraints if
`buildSingleColumnConstraintConst` fails to come up with a constraint.

Related to cockroachdb#31929.

Release note: None
@petermattis petermattis added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Oct 28, 2018
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Oct 29, 2018
craig bot pushed a commit that referenced this issue Oct 29, 2018
31942: opt: (non-index) constraints for LIKE, SIMILAR TO r=RaduBerinde a=RaduBerinde

Adding constraint builder support for `LikeOp` and `SimilarToOp`. Also
improving the logic to still try to determine not-null constraints if
`buildSingleColumnConstraintConst` fails to come up with a constraint.

Related to #31929.

Release note: None

Co-authored-by: Andrew Couch <andrew@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
rytaft added a commit to rytaft/cockroach that referenced this issue Oct 30, 2018
craig bot pushed a commit that referenced this issue Oct 30, 2018
31937: opt: fix selectivity estimates for index constraints r=rytaft a=rytaft


Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@craig craig bot closed this as completed in #31937 Oct 30, 2018
rytaft added a commit to rytaft/cockroach that referenced this issue Oct 30, 2018
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Dec 12, 2018
Adding constraint builder support for `LikeOp` and `SimilarToOp`. Also
improving the logic to still try to determine not-null constraints if
`buildSingleColumnConstraintConst` fails to come up with a constraint.

Related to cockroachdb#31929.

Release note (performance improvement): Queries using LIKE get better
execution plans in some cases.