distsql: allow index joins (with hinting) across two tables #19038
Comments
Two items in your issue here:
I will keep in mind that this improvement has significant impact; this will help us prioritize. |
We need to reprioritize this as it has come up as a customer issue. |
This has come up from users on the forum as well: https://forum.cockroachlabs.com/t/subquery-evaluation-on-simple-table-structure/1275/11. |
See also #21301:
And @RaduBerinde says:
|
Anyone interested in picking this up? |
My hands are full, but I would really like to see this make it into the 2.0 release. |
assigning to paul for now since he's mucking around in joins. If it gets to be too hard we can assign someone else. |
A quick summary of a meeting with @knz and @andreimatei I had yesterday regarding hinting:

Differentiation between constraints and hints

Hint: A suggestion for the database that it is free to ignore.

We need to distinguish whether this syntax for forcing a lookup join is a hint or a constraint. It looks like a constraint.

Long term note: It might be nice to consider having a system where we can have hints that can be ignored, plus a flag which can force the hints, turning them into constraints. This would be useful for testing, as an easy way to force a certain execution path.

How Join Hinting would look in the syntax

In the end we decided on something like:

SELECT * FROM ABC JOIN@{STRATEGY=LOOKUP} DEFG ON A = D AND B = E;

Specific indexes can be forced for the join using the current forced index syntax. For example to specify the join should be on the

SELECT * FROM ABC JOIN@{STRATEGY=LOOKUP} DEFG@DEF ON A = D AND B = E;

The hint will always follow the

Benefits of using this syntax:
|
I believe Postgres doesn't have inline hints/constraints like this. Is there syntax from other systems that we should be adopting? I'm slightly anxious about deciding on this syntax in the next week or so. Perhaps in the short term we should have a session variable like |
The session variable is probably simpler to implement/use too. The question though is whether it would satisfy the use case(s). |
I'm in favor if the It appears as though it would satisfy at least the use case for the TPC-C query, the queries in the forum linked above, and the linked issue. If we are okay with the limitations in the short term, it seems like a lightweight way to introduce the functionality. Also it would give us more time to think through how we may want to introduce inline hinting if at all. |
Changes to planner:
- Added a CLUSTER SETTING to experimentally enable lookup join in planner.
- Add support for planning lookup joins when this flag is enabled. Main parts:
  - Correctly map the scanNodes to the joinReader.
  - Appropriately filter right side along with any `onExpr`s, such as @1 > @2 when joining two single column tables.

Changes to joinReader:
joinReader now supports filtering on the right columns through `onCond`, which is needed for lookupJoins. Additionally, in preparation for it being used as a join it embeds joinerBase. Additionally, now when performing a lookupJoin the spans are batched, which should result in decreased network traffic between nodes as a lookup is done for a batch of rows at a time.

Fixes cockroachdb#19038

Release note (performance): Experimentally enable some joins to perform a lookup join and increase join speed for cases where right side of join is much larger than the left.
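The batching and `onCond` filtering described in the commit message can be illustrated with a small sketch. This is not the joinReader code; `fetch_many`, `batched_lookup_join`, and the row layout are hypothetical names chosen for illustration, and a dictionary stands in for the right-hand table's index. The point is only that one request is issued per batch of left rows rather than one per row.

```python
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def batched_lookup_join(left_rows, fetch_many, key_of, on_cond, batch_size):
    """Fetch right rows one batch of keys at a time, then filter with on_cond."""
    out = []
    for chunk in batches(left_rows, batch_size):
        keys = {key_of(left) for left in chunk}
        found = fetch_many(keys)  # one simulated round trip per batch
        for left in chunk:
            right = found.get(key_of(left))
            if right is not None and on_cond(left, right):
                out.append((left, right))
    return out


# Hypothetical right-hand table, indexed by its key column "k".
right_table = {k: {"k": k, "v": k * 10} for k in range(100)}
round_trips = []  # records how many keys each simulated request carried


def fetch_many(keys):
    """Stand-in for a batched span lookup against the right-hand index."""
    round_trips.append(len(keys))
    return {k: right_table[k] for k in keys if k in right_table}


left = [{"k": k, "x": 25} for k in range(1, 6)]
result = batched_lookup_join(
    left,
    fetch_many,
    key_of=lambda row: row["k"],
    on_cond=lambda l, r: r["v"] > l["x"],  # extra ON filter beyond key equality
    batch_size=2,
)
```

With five left rows and a batch size of two, the sketch issues three requests instead of five, and the extra condition is applied after the lookup.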
use
to enable, and
to disable |
Great stuff paul!
On Thu, Feb 15, 2018 at 7:24 PM, Paul Bardea wrote:
Closed #19038 via #22674.
|
Consider the following TPC-C query:
This issue is mostly concerned with the JOIN between the two tables. We have some filters which we can push down to their respective tables before the JOIN. Now the `order_line` table happens to be far sparser than the `stock` table after all the filters are applied. The ideal way to execute this query is to do a full table scan on the `order_line` table, and then take the resulting tuples and do lookups in the `stock` table for those values (our primary index definition in the `stock` table is such that these lookups would be very efficient). Whether we use a merge join or a hash join, we really don't want to do the work associated with a full scan over the `stock` table, as the number of items is very high compared to the selectivity of the filters and the join.

This is exactly how an index join is executed when we need to look up values in the primary index based on matches in a secondary index, so we already have the algorithmic processors working and extensively tested; we just need to extend them to perform joins across different tables. I understand that recognizing when this would be a superior option requires extensive table statistics, but given that we want to start thinking about hinting explicit query execution strategies, this is something we should support with specialized syntax.
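As a rough illustration of the execution strategy described above, here is a minimal sketch of a lookup join: scan the small, already-filtered left side and do one point lookup per row into the right table's primary index. The row layout, the column names (`w_id`, `i_id`, `quantity`), and the `lookup_join` helper are assumptions made for the example, not the actual TPC-C schema or the distsql processors.

```python
# Hypothetical, simplified stand-in for the stock table's primary index,
# keyed by (w_id, i_id). The real table has many more columns.
stock_primary_index = {
    (w, i): {"w_id": w, "i_id": i, "quantity": (i * 7) % 100}
    for w in range(1, 3)
    for i in range(1, 1001)
}  # 2000 rows in total

# Left side after its filters were pushed down: only a few rows survive.
order_lines = [
    {"w_id": 1, "i_id": 10},
    {"w_id": 1, "i_id": 500},
    {"w_id": 2, "i_id": 999},
]


def lookup_join(left_rows, right_index, key_of):
    """For each left row, do one point lookup in the right table's index."""
    joined = []
    lookups = 0
    for left in left_rows:
        lookups += 1
        right = right_index.get(key_of(left))
        if right is not None:
            joined.append({**left, "quantity": right["quantity"]})
    return joined, lookups


joined, lookups = lookup_join(
    order_lines,
    stock_primary_index,
    key_of=lambda row: (row["w_id"], row["i_id"]),
)
```

In this toy setup the join touches three index entries, whereas a hash or merge join would first have to read all 2000 rows of the right-hand table.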
cc @RaduBerinde @jordanlewis @knz.