stats: fix histograms for collated strings
Currently, whenever we're encoding the upper bounds of the histogram buckets we always use the datum's key-encoding. However, some datums are composite, and we need to operate on their value encoding. This strategy happens to be ok for floats and decimals because their value part only stores some auxiliary information (like the number of trailing zeroes and the sign of zero), so even if we lose that auxiliary information, when decoding the upper bound we end up with a datum that has the same comparison properties. Unfortunately, this is not the case for collated strings where key-encoding and value-encoding can be vastly different. So the fact that we always use the key-encoding for collated strings makes it so that when we key-decode upper bounds, we end up with garbage datums. As a result, our histograms are meaningless for collated strings.

This problem is now fixed by using the value encoding for collated strings. Doing so for other types that might have composite encoding doesn't seem necessary at the moment (for floats and decimals things already work, and we currently don't collect histograms on JSONs or arrays).

In order to make this change backwards-compatible we gate it based on the new histogram version - if the histogram has the previous version, then it must have used key-encoding for collated strings, so we will keep on using key-decoding as well. Only once the cluster upgrades to 24.1 version do we start creating the histograms with the new version which will use value-encoding for collated strings.

Release note (bug fix): CockroachDB now correctly uses the histograms on columns of COLLATED STRING type. The bug has been present since pre-22.1 release.
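The version gating described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from this commit: the version constants, the tag byte, and every function name below are hypothetical, and `toyCollationKey` is a stand-in for a real collation key (which is produced by a collation library and is likewise not meant to be decoded back into the original string).

```go
package main

import (
	"errors"
	"fmt"
)

// histogramVersion is a hypothetical stand-in for the histogram version.
type histogramVersion int

const (
	// Hypothetical version numbers, chosen only for this sketch.
	histVersionKeyEncoded   histogramVersion = 1 // collated strings were key-encoded
	histVersionValueEncoded histogramVersion = 2 // collated strings are value-encoded
)

// valueTag marks a value-encoded upper bound in this toy format.
const valueTag byte = 0xFF

// toyCollationKey maps each ASCII letter to a case-insensitive weight.
// Like a real collation key, it sorts correctly but cannot be inverted.
func toyCollationKey(s string) []byte {
	key := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c >= 'a' && c <= 'z':
			key = append(key, c-'a'+1)
		case c >= 'A' && c <= 'Z':
			key = append(key, c-'A'+1)
		default:
			key = append(key, 0)
		}
	}
	return key
}

// encodeUpperBound picks the encoding based on the histogram version.
func encodeUpperBound(v histogramVersion, s string) []byte {
	if v >= histVersionValueEncoded {
		return append([]byte{valueTag}, s...)
	}
	return toyCollationKey(s)
}

// decodeUpperBound must use the decoder that matches the version the
// histogram was written with; otherwise old histograms become unreadable.
func decodeUpperBound(v histogramVersion, b []byte) (string, error) {
	if v >= histVersionValueEncoded {
		if len(b) == 0 || b[0] != valueTag {
			return "", errors.New("not a value-encoded upper bound")
		}
		return string(b[1:]), nil
	}
	// Key-decoding can only surface the key bytes, not the original string.
	return string(b), nil
}

func main() {
	for _, v := range []histogramVersion{histVersionKeyEncoded, histVersionValueEncoded} {
		got, err := decodeUpperBound(v, encodeUpperBound(v, "Hello"))
		fmt.Printf("version %d: %q (err=%v)\n", v, got, err)
	}
}
```

In this model the old version round-trips "Hello" into bytes that no longer resemble the original string, while the new version recovers it exactly. Keeping the old decoder for old histograms mirrors the backwards-compatibility argument in the commit message.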
1 parent 12ce879, commit 4d77dd6
Showing 19 changed files with 212 additions and 85 deletions.
@@ -0,0 +1,41 @@
# LogicTest: local

# Regression test for incorrectly using the key-encoding for collated strings
# as upper bounds of histogram buckets (#98400).
statement ok
CREATE TABLE t98400 (k INT PRIMARY KEY, s STRING COLLATE en_US_u_ks_level2);
INSERT INTO t98400 SELECT i, 'hello' FROM generate_series(1, 10) g(i);
INSERT INTO t98400 SELECT i, 'world' FROM generate_series(11, 12) g(i);
INSERT INTO t98400 SELECT 13, 'foo';

statement ok
ANALYZE t98400;

# We expect that the filter is estimated to match 10 rows.
query T
EXPLAIN (OPT, VERBOSE) SELECT * FROM t98400 WHERE s = 'hello' COLLATE en_US_u_ks_level2;
----
select
 ├── columns: k:1 s:2
 ├── stats: [rows=10, distinct(2)=1, null(2)=0]
 │   histogram(2)= 0 10
 │   <--- 'hello' COLLATE en_US_u_ks_level2
 ├── cost: 43.02
 ├── key: (1)
 ├── fd: ()-->(2)
 ├── distribution: test
 ├── prune: (1)
 ├── scan t98400
 │    ├── columns: k:1 s:2
 │    ├── stats: [rows=13, distinct(1)=13, null(1)=0, distinct(2)=3, null(2)=0]
 │    │   histogram(1)= 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
 │    │   <--- 1 --- 2 --- 3 --- 4 --- 5 --- 6 --- 7 --- 8 --- 9 --- 10 --- 11 --- 12 --- 13
 │    │   histogram(2)= 0 1 10 2
 │    │   <--- 'foo' COLLATE en_US_u_ks_level2 ---- 'world' COLLATE en_US_u_ks_level2
 │    ├── cost: 42.86
 │    ├── key: (1)
 │    ├── fd: (1)-->(2)
 │    ├── distribution: test
 │    └── prune: (1,2)
 └── filters
      └── s:2 = 'hello' COLLATE en_US_u_ks_level2 [outer=(2), constraints=(/2: [/'hello' COLLATE en_US_u_ks_level2 - /'hello' COLLATE en_US_u_ks_level2]; tight), fd=()-->(2)]
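The expected estimate of 10 rows can be reproduced by hand from the histogram on column `s`. The sketch below is a simplified model, not the optimizer's actual implementation. It assumes each pair of numbers in the `histogram(2)` line is the count of rows strictly inside the bucket's range followed by the count of rows equal to the upper bound, and it assumes one distinct value inside the second bucket's range (3 distinct values in total, 2 of which are upper bounds). The `bucket` type and `estimateEq` function are hypothetical names, and plain Go string comparison stands in for the `en_US_u_ks_level2` collation.

```go
package main

import "fmt"

// bucket is a simplified histogram bucket (hypothetical type for this sketch).
type bucket struct {
	upperBound    string  // inclusive upper bound of the bucket
	numEq         float64 // rows equal to upperBound
	numRange      float64 // rows strictly between the previous bound and upperBound
	distinctRange float64 // distinct values in that open range
}

// exampleHistogram models the histogram on column s from the test above:
// 'foo' appears once, 'world' twice, and 10 rows fall between them.
func exampleHistogram() []bucket {
	return []bucket{
		{upperBound: "foo", numEq: 1, numRange: 0, distinctRange: 0},
		{upperBound: "world", numEq: 2, numRange: 10, distinctRange: 1},
	}
}

// estimateEq estimates how many rows equal v, given buckets sorted by
// upper bound. Values inside a bucket's range are assumed to be spread
// evenly over the distinct values in that range.
func estimateEq(h []bucket, v string) float64 {
	for _, b := range h {
		if v == b.upperBound {
			return b.numEq
		}
		if v < b.upperBound {
			if b.distinctRange == 0 {
				return 0
			}
			return b.numRange / b.distinctRange
		}
	}
	// v is greater than every upper bound.
	return 0
}

func main() {
	fmt.Println(estimateEq(exampleHistogram(), "hello"))
}
```

The estimate is only meaningful if the decoded upper bounds compare correctly against the filter constant. With garbage upper bounds, 'hello' could land in the wrong bucket or in none at all, which is the failure this regression test guards against.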