release-22.2: xform: avoid locality-optimized scans which must always read remote rows #87866
Previously, for tables with old-style partitioning, which don't use the new multiregion abstractions, there were no guardrails in place to prevent 2 cases where locality-optimized scan must always read ranges in remote regions:

1. When a scan with no hard limit has a non-unique index constraint (could return more than one row per matched index key, not including the partitioning key column)
2. When the max cardinality of a constrained scan is less than the hard limit placed on the scan via a LIMIT clause

This was inadequate because locality-optimized scan is usually slower than distributed scans when reading from remote regions is required. If we can statically determine reading from remote regions is required, locality-optimized scan should not even be costed and considered by the optimizer.

Multiregion tables, such as REGIONAL BY ROW tables, don't encounter this issue because the internal `crdb_region` partitioning column is not part of the UNIQUE constraint in that case, for example:

```
CREATE TABLE regional_by_row_table (
  col1 int,
  col2 bool NOT NULL,
  UNIQUE INDEX idx(col1) -- crdb_region is implicitly the 1st idx col
) LOCALITY REGIONAL BY ROW;

SELECT * FROM regional_by_row_table WHERE col1 = 1;
```

In the above, we could use LOS and split this into a local scan:

```
SELECT * FROM regional_by_row_table WHERE crdb_region = 'us' AND col1 = 1;
```

... and remote scans:

```
SELECT * FROM regional_by_row_table WHERE crdb_region IN ('ca', 'ap') AND col1 = 1;
```

The max cardinality of the local scan is 1, and the max cardinality of the original scan is 1, so we know it's possible to fulfill the request solely with the local scan.

To address this, this patch avoids planning locality-optimized scan for the two cases listed at the top of the description. The first case is detected by the local scan of the UNION ALL having a lower max cardinality than the max cardinality including all constraint spans (for example, given a pair of columns (part_col, col1), if col1 is a unique key, then max_cardinality(col1) will equal max_cardinality(part_col, col1)). The second case is detected by a direct comparison of the hard limit with the max cardinality of the local scan.

Release note (bug fix): This patch fixes a misused query optimization involving tables with one or more PARTITION BY clauses and partition zone constraints which assign region locality to those partitions. In some cases the optimizer picks a `locality-optimized search` query plan which is not truly locality-optimized, and has higher latency than competing query plans which use distributed scan. Locality-optimized search is now avoided in cases which are known not to benefit from this optimization.

Release justification: Low risk fix for suboptimal locality-optimized scan
2cb2a7a to c9c5956
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)
Backport 1/1 commits from #87350 on behalf of @msirek.
/cc @cockroachdb/release
Previously, for tables with old-style partitioning, which don't use the
new multiregion abstractions, there were no guardrails in place to
prevent 2 cases where locality-optimized scan must always read ranges in
remote regions:
1. When a scan with no hard limit has a non-unique index constraint
   (could return more than one row per matched index key, not including
   the partitioning key column)
2. When the max cardinality of a constrained scan is less than the hard
   limit placed on the scan via a LIMIT clause
This was inadequate because locality-optimized scan is usually slower
than distributed scans when reading from remote regions is required. If
we can statically determine reading from remote regions is required,
locality-optimized scan should not even be costed and considered by the
optimizer. Multiregion tables, such as REGIONAL BY ROW tables, don't
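encounter this issue because the internal `crdb_region` partitioning
column is not part of the UNIQUE constraint in that case, for example:
CREATE TABLE regional_by_row_table (
  col1 int,
  col2 bool NOT NULL,
  UNIQUE INDEX idx(col1) -- crdb_region is implicitly the 1st idx col
) LOCALITY REGIONAL BY ROW;
SELECT * FROM regional_by_row_table WHERE col1 = 1;
In the above, we could use LOS and split this into a local scan:
SELECT * FROM regional_by_row_table WHERE crdb_region = 'us' AND col1 = 1;
... and remote scans:
SELECT * FROM regional_by_row_table WHERE crdb_region IN ('ca', 'ap') AND col1 = 1;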
The max cardinality of the local scan is 1, and the max cardinality of
the original scan is 1, so we know it's possible to fulfill the request
solely with the local scan.
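To make the cardinality argument concrete, here is a toy model in Python. It is only a sketch and not the optimizer's actual computation: the function name `max_cardinality`, its parameters, and the three-region list are assumptions chosen to mirror the example above. It contrasts a REGIONAL BY ROW table, where col1 is unique without the partitioning column, with a hypothetical old-style partitioned table whose uniqueness only holds on (part_col, col1).

```python
def max_cardinality(num_partitions: int, keys_matched: int,
                    unique_without_partition_col: bool) -> int:
    """Toy upper bound on the rows a scan constrained on col1 can return.

    num_partitions: how many partition values the scan's spans cover.
    keys_matched: how many distinct col1 values the constraint matches.
    unique_without_partition_col: True if col1 alone is a unique key.
    """
    if num_partitions < 1 or keys_matched < 0:
        raise ValueError("need at least one partition and a non-negative key count")
    if unique_without_partition_col:
        # Each col1 value exists at most once across all partitions.
        return keys_matched
    # Each partition may hold its own row for every matched col1 value.
    return num_partitions * keys_matched


regions = ["us", "ca", "ap"]  # assumed region list; "us" is the local region

# REGIONAL BY ROW: UNIQUE (col1), crdb_region is only an implicit prefix.
rbr_full = max_cardinality(len(regions), 1, True)
rbr_local = max_cardinality(1, 1, True)

# Hypothetical old-style partitioning: uniqueness only on (part_col, col1).
old_full = max_cardinality(len(regions), 1, False)
old_local = max_cardinality(1, 1, False)

print("REGIONAL BY ROW:", rbr_local, "local vs", rbr_full, "total")
print("old-style:", old_local, "local vs", old_full, "total")
```

In this model the REGIONAL BY ROW local scan can account for every possible row, while the old-style local scan cannot, so the remote branch can never be ruled out statically.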
To address this, this patch avoids planning locality-optimized scan for
the two cases listed at the top of the description. The first case is
detected by the local scan of the UNION ALL having a lower max
cardinality than the max cardinality including all constraint spans
(for example, given a pair of columns (part_col, col1), if col1 is a
unique key, then max_cardinality(col1) will equal
max_cardinality(part_col, col1)). The second case is detected by a
direct comparison of the hard limit with the max cardinality of the
local scan.
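The two checks described above can be sketched as a single predicate. This is a minimal illustration in Python, not the code from the patch: the function name `should_plan_locality_optimized_scan`, its parameters, and the exact comparison operators are assumptions derived from the wording of this description.

```python
from typing import Optional


def should_plan_locality_optimized_scan(local_max_card: int,
                                        full_max_card: int,
                                        hard_limit: Optional[int]) -> bool:
    """Return False when the local scan can never satisfy the query alone.

    local_max_card: max cardinality of the local branch of the UNION ALL.
    full_max_card: max cardinality over all constraint spans.
    hard_limit: value of the LIMIT clause, or None if there is no hard limit.
    """
    if hard_limit is None:
        # Case 1: no hard limit. If the local scan's max cardinality is lower
        # than that of the full scan, remote rows may always be needed.
        return local_max_card >= full_max_card
    # Case 2: a hard limit exists. If the local scan cannot produce enough
    # rows to reach the limit, the remote branch cannot be skipped.
    return local_max_card >= hard_limit


# Unique key without the partitioning column: local scan may suffice.
print(should_plan_locality_optimized_scan(1, 1, None))
# Non-unique constraint, no limit: remote regions must be read.
print(should_plan_locality_optimized_scan(1, 3, None))
# LIMIT 1 can be satisfied by the local scan.
print(should_plan_locality_optimized_scan(1, 3, 1))
# LIMIT 5 exceeds what the local scan can return.
print(should_plan_locality_optimized_scan(1, 3, 5))
```

When the predicate is False, the locality-optimized plan would not be generated at all in this sketch, which matches the stated goal that it "should not even be costed and considered by the optimizer".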
Release justification: Low risk fix for suboptimal locality-optimized scan
Release note (bug fix): Because of a misused query optimization involving tables with one or more PARTITION BY clauses and partition zone constraints which assign region locality to those partitions, in some cases the optimizer would pick a `locality-optimized search` query plan which is not truly locality-optimized and has higher latency than competing query plans which use distributed scan. Locality-optimized search is now avoided in cases which are known not to benefit from this optimization.