Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recost LookupJoins to remove special casing for indexes where every column is used in a filter. #1889

Merged
merged 5 commits into from Jul 25, 2023

Conversation

nicktobey
Copy link
Contributor

@nicktobey nicktobey commented Jul 21, 2023

The only joins that are affected by this recosting are joins that use an index where every column in the index has a key expression, but we can't prove that lookups are constant.

We currently special case this, giving these a slightly elevated priority over joins that don't use every column in the index (and we can't prove that lookups are constant.)

So for example a lookup with one key expression on an index with one column ends up being preferred over say, a lookup with three key expressions on an index with four columns, even though it looks like in practice the latter is a more efficient join. Removing this special casing resulted in improved plans in specific client queries.

Most of the changed plans in this PR are workarounds for #1893, #1894, and #1895. A few are actually better plans than we were previously generating.

…nique. Rather than changes those tests, I made the index unique.
…al due to a bug, link to the bug.

(Linking to the bug is not possible in `query_plans.go` because the tests are generated.)
@nicktobey nicktobey marked this pull request as ready for review July 25, 2023 00:29
@@ -326,7 +326,7 @@ where
" │ │ └─ org1 (longtext)\n" +
" │ └─ TableAlias(dimension)\n" +
" │ └─ IndexedTableAccess(asset)\n" +
" │ ├─ index: [asset.orgId,asset.assetId]\n" +
" │ ├─ index: [asset.orgId,asset.name,asset.val]\n" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should be picking (orgId,name,assetId) here, bit concerned that I missed us changing it to (orgId,assetID)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's happening because of the low cardinality of the joined tables.

Basically, when costing non-unique lookups, we assume that each key expression cuts the table in half. If the table is small enough, and if the index has enough key expressions, this can result in a cost lower than the cost of a unique lookup.

As the size of the input tables increases, this effect disappears. I added just four extra rows to the test table and it now chooses the correct index.

Copy link
Contributor

@max-hoffman max-hoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only one I am concerned about is the customer query where we should be picking the unique index, not competing between non-unique.

At extremely low cardinalities, we may pick a non-unique index with more columns than a unique index with fewer columns. This behavior disappears at higher cardinalities, which is where we care more about the performance.
@nicktobey nicktobey merged commit d536668 into main Jul 25, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants