sql: loose index scan #24584
For future readers, a loose index scan skips over an index using multiple seeks. For example, take the following schema and table:
That
To see a benefit, the "loose index scan" would need to be implemented down at the level of the KV
I don't agree with @petermattis that this would need to be implemented at the KV layer to see a benefit. Imagine a table that has 3 columns … In this case, today, a query like
would need to scan 20,000,000 rows, which works out to a lot of round trips to the KV layer. If we implemented a loose index scan without KV support, we'd need only 5 round trips:
Assuming the table has
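A minimal Python sketch of the SQL-layer skip scan described above, under stated assumptions: the original schema and query were not preserved, so this uses a hypothetical index sorted on `(a, b)` with only a few distinct values of `a`, and a query filtering on `b` alone. A sorted list stands in for the KV store, and each `bisect_left` call models one seek (one round trip). This naive version spends two seeks per distinct `a` (one to the target key, one past the group); a real implementation could fold these together to reach one round trip per distinct value.

```python
from bisect import bisect_left


def skip_scan_eq(index, target_b):
    """Skip scan for rows with b == target_b over an index sorted on (a, b).

    `index` is a sorted list of (a, b, payload) tuples standing in for the
    index. Each bisect_left call models one seek into the KV store. Returns
    the matching entries and the number of seeks. Assumes numeric b, so that
    float("inf") sorts after every b value within a group.
    """
    matches = []
    seeks = 0
    pos = 0
    n = len(index)
    while pos < n:
        a = index[pos][0]
        # Seek to the first entry whose (a, b) prefix is >= (a, target_b).
        pos = bisect_left(index, (a, target_b), pos)
        seeks += 1
        # Read the run of entries whose prefix is exactly (a, target_b).
        while pos < n and index[pos][0] == a and index[pos][1] == target_b:
            matches.append(index[pos])
            pos += 1
        # Seek past all remaining entries for this value of a.
        pos = bisect_left(index, (a, float("inf")), pos)
        seeks += 1
    return matches, seeks


# Hypothetical table: 5 distinct values of a, 1,000 values of b each.
index = sorted((a, b, a * 1000 + b) for a in range(5) for b in range(1000))
rows, seeks = skip_scan_eq(index, 7)
```

Here the scan touches 5 matching entries using 10 seeks instead of reading all 5,000 entries; the seek count grows with the number of distinct `a` values, not with the row count.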
@jordanlewis The approach you describe here is interesting. With per-column histograms, or perhaps even just distinct counts, I think the optimizer could determine when this type of index scan is beneficial. Pushing support into the KV layer would still be desirable at some point, as I think this type of scan could be used in more situations (i.e., when the win isn't as clear-cut as in your scenario).
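To make the distinct-count idea concrete, here is a toy cost comparison. It is a hypothetical heuristic, not CockroachDB's cost model: the function name, the seek-cost constant, and the "two seeks per distinct prefix" estimate are all assumptions chosen for illustration. It shows how a distinct count on the leading index column could let the optimizer decide between a full scan and a skip scan.

```python
def prefer_skip_scan(row_count, distinct_prefixes, seek_cost_in_rows=100.0):
    """Hypothetical heuristic: pick a skip scan when its estimated cost is lower.

    row_count: estimated number of index entries (e.g. from table statistics).
    distinct_prefixes: estimated distinct count of the leading index column.
    seek_cost_in_rows: assumed cost of one seek, in units of rows scanned.
    Ignores the (small) cost of reading the matching rows themselves.
    """
    full_scan_cost = float(row_count)
    # About two seeks per distinct prefix: one to the target key, one past the group.
    skip_scan_cost = 2.0 * distinct_prefixes * seek_cost_in_rows
    return skip_scan_cost < full_scan_cost


print(prefer_skip_scan(20_000_000, 5))           # True
print(prefer_skip_scan(20_000_000, 10_000_000))  # False
```

With few distinct prefixes the skip scan wins by orders of magnitude; when almost every prefix is distinct, the seeks cost more than simply scanning, which is the "not as clear-cut" territory mentioned above.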
Linking to #38082
Oracle calls this an "index skip scan": https://gerardnico.com/db/oracle/index_scans#index_skip_scans
Requested in cockroachdb#24584. Release note (sql change): Added support for a table reader that performs a loose index scan over the underlying table. The index scan table reader uses information about the index being scanned to skip unnecessary rows while scanning the table, enabling optimizations for some types of queries.
Implemented in #38216.
This appears to have been closed prematurely. Fixing it needs #39668, which is stalled at the moment.
51178: sql/rowexec: delete indexSkipTableReader r=andreimatei a=andreimatei This processor is not used. It was implemented for #24584, but the PR to start planning it stalled after Ridwan left a year ago (#39668). This guy is marginally in my way, and, worse, it's dead code. Unless someone promises to finish that PR imminently :). Release note: None Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
A loose index scan can also be beneficial in a simpler case than the one described by @jordanlewis above: querying for distinct values.
If there are, for example, 1000 unique values of …
Example: Assume …
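A minimal sketch of the distinct-values case, assuming a hypothetical single-column index modeled as a sorted Python list. After emitting a value, the scan seeks directly to the first key greater than it (`bisect_right`) instead of reading the duplicates, so 1000 unique values cost about 1000 seeks no matter how many rows share each value.

```python
from bisect import bisect_right


def distinct_via_skip_scan(index):
    """Return the distinct values of a sorted index column plus the seek count.

    `index` is a sorted list of values standing in for an index on one column.
    Each bisect_right call models one seek to the first key greater than the
    value just emitted, so the work grows with the number of distinct values
    rather than the number of rows.
    """
    distinct = []
    seeks = 0
    pos = 0
    while pos < len(index):
        value = index[pos]
        distinct.append(value)
        # Jump past every remaining entry equal to `value`.
        pos = bisect_right(index, value, pos)
        seeks += 1
    return distinct, seeks


# Hypothetical data: 1,000 distinct values, each repeated 50 times (50,000 rows).
index = sorted(v for v in range(1000) for _ in range(50))
values, seeks = distinct_via_skip_scan(index)
```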
Postgres doesn't have loose index scans, but you can mimic a loose index scan using a recursive CTE: https://malisper.me/the-missing-postgres-scan-the-loose-index-scan
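A runnable sketch of the recursive-CTE workaround. It uses SQLite through Python's `sqlite3` module as a stand-in for Postgres, and the table `t`, its columns, and the index name are hypothetical. Each recursive step asks for the smallest `a` greater than the previous one with `ORDER BY a LIMIT 1`, which an index on `a` can answer with a seek; whether a given planner actually does so is an assumption worth checking with `EXPLAIN`. Both the seed and the step wrap that lookup in a scalar subquery, which keeps `ORDER BY`/`LIMIT` out of the compound `SELECT` itself, a placement SQLite does not accept.

```python
import sqlite3

# Emulates a loose index scan for SELECT DISTINCT a: each recursive step
# fetches the next larger value of a, then stops once the subquery yields NULL.
LOOSE_SCAN_SQL = """
WITH RECURSIVE vals(a) AS (
    SELECT (SELECT a FROM t ORDER BY a LIMIT 1)
    UNION ALL
    SELECT (SELECT a FROM t WHERE a > vals.a ORDER BY a LIMIT 1)
    FROM vals
    WHERE vals.a IS NOT NULL
)
SELECT a FROM vals WHERE a IS NOT NULL
"""


def distinct_a(conn):
    """Return the distinct values of t.a using the recursive CTE above."""
    return [row[0] for row in conn.execute(LOOSE_SCAN_SQL)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.execute("CREATE INDEX t_a ON t (a)")
# Hypothetical data: 10,000 rows but only 10 distinct values of a.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i % 10, i) for i in range(10_000)])
result = distinct_a(conn)
```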
Here's another type of query that would benefit from loose index scans. Consider the table and query:
Currently, we would scan all KVs in
MySQL supports loose index scans, and in Postgres they can be simulated using a recursive CTE.
See: https://dev.mysql.com/doc/refman/5.7/en/group-by-optimization.html
One use case:
Index over (a, timestamp DESC, c)
a and timestamp are not unique, but c is.
Normally a full table scan is used, but a loose index scan would be beneficial.
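The comment doesn't state the query, so this sketch assumes it is something like "the newest row for each `a`", the `GROUP BY` with `MAX()` pattern covered by the linked MySQL page. With an index on `(a, timestamp DESC, c)`, the first entry of each `a` group already holds the newest timestamp, so a loose index scan reads one entry per group and seeks to the next group. The index is modeled as a sorted list of `(a, -timestamp, c)` keys, with negation standing in for `DESC`; numeric `a` and `timestamp` are assumed.

```python
from bisect import bisect_left


def latest_per_group(index):
    """Return the newest (a, timestamp, c) entry for each distinct a.

    `index` is a sorted list of (a, -timestamp, c) keys, which models an index
    on (a, timestamp DESC, c): negating the timestamp makes the newest row sort
    first within each group. Assumes numeric a and timestamp values.
    """
    latest = []
    pos = 0
    while pos < len(index):
        a, neg_ts, c = index[pos]
        # The first entry of each group carries the largest timestamp.
        latest.append((a, -neg_ts, c))
        # Seek past the rest of this group: every key (a, x, c) sorts before (a, inf).
        pos = bisect_left(index, (a, float("inf")), pos)
    return latest


# Hypothetical data: 3 groups, 1,000 timestamps each, unique c per row.
rows = [(a, ts, a * 10_000 + ts) for a in range(3) for ts in range(1000)]
index = sorted((a, -ts, c) for a, ts, c in rows)
result = latest_per_group(index)
```

Only 3 entries are read (plus 3 seeks) out of 3,000, compared with a full table scan that reads every row.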
Importance: Nice-To-Have performance enhancement
Jira issue: CRDB-5753