distsql: allow index joins (with hinting) across two tables #19038

rjnn · 2017-10-04T20:35:30Z

Consider the following TPC-C query:

SELECT COUNT(DISTINCT(s_i_id)) FROM order_line JOIN stock ON s_i_id=ol_i_id
WHERE ol_w_id = $1 AND ol_d_id = $2 AND ol_o_id BETWEEN $3 - 20 AND $3 - 1
AND s_w_id = $1 AND s_quantity < $4

This issue is mostly concerned with the JOIN between the two tables. We have some filters which we can push down to their respective tables before the JOIN. Now the order_line table happens to be way sparser than the stock table after all the filters are applied. The ideal way to execute this query is to do a full table scan on the order_line table, and then take the resulting tuples and do lookups in the stock table for those values (our primary index definition in the stock is such that these lookups would be very efficient). Basically, merge or hash join, you really don't want to do the work associated with a full scan over the stock table, as the number of items is very high compared to the selectivity of the filters and the join.

This is exactly how an index join is executed when we need to look up values in the primary index based on matches in a secondary index, so we already have the algorithmic processors working and extensively tested, we just need to extend it to perform joins across different tables. I understand that recognizing when this would be a superior option requires extensive table statistics, but given that we want to start thinking about hinting explicit query execution strategies, this is something we should support with specialized syntax.

cc @RaduBerinde @jordanlewis @knz.

The text was updated successfully, but these errors were encountered:

knz · 2017-10-04T20:41:00Z

Two items in your issue here:

the join algorithm is called a lookup join and Jordan was asking for this last week too. I have put it as an item in docs: meta-RFC on upcoming SQL changes #18977. It depends on a few other things to be done first.
the ability to hint in a query which join algorithm to use is also another item. This one doesn't have prerequisites.

I will keep in mind that this improvement has significant impact, this will help us prioritize.

rjnn · 2018-01-18T21:27:00Z

We need to reprioritize this as it has come up as a customer issue.

nvanbenschoten · 2018-01-19T05:03:59Z

This has come up from users on the forum as well: https://forum.cockroachlabs.com/t/subquery-evaluation-on-simple-table-structure/1275/11.

petermattis · 2018-01-19T13:51:43Z

Differentiation between constraints and hints

Hint: Suggestions for the database that it is free to ignore.
Constraint: Forced "hints". If non-sensical would throw error.

Need to distinguish whether or not this syntax of forcing a lookup join is a hint or constraint. It looks like a constraint.

Long term note: Might be nice to consider having a system where we can have hints that can be ignored and then a flag which can force the hints, turning them into constraints. This would be useful for testing to have an easy way to force a certain execution path.

How Join Hinting would look in the syntax

In the end we decided that something like:

SELECT * FROM ABC JOIN@{STRATEGY=LOOKUP} DEFG ON A = D AND B = E;

specific indexes can be forced for using the join using the current forced index syntax. For example to specify the join should be on the DEF index, you can write as below:

SELECT * FROM ABC JOIN@{STRATEGY=LOOKUP} DEFG@DEF ON A = D AND B = E;

The hint will always follow the JOIN word: e.g.: SELECT * FROM ABC NATURAL JOIN@{STRATEGY=LOOKUP} DEFG;.

Benefits of using this syntax:

Can specify multiple strategies (can be extended for hash/merge joiners in the future) which would be useful for testing.
Plays well with leveraging index constraints already implemented to specify which index to use in the joiner.
Familiar syntax alraedy used for forced indexes.

petermattis · 2018-01-31T18:59:48Z

I believe Postgres doesn't have inline hints/constraints like this. Is there syntax from other systems that we should be adopting? I'm slightly anxious about deciding on this syntax in the next week or so. Perhaps in the short term we should have a session variable like set experimental.force_lookup_join = true. This would clearly be more limited as the hint would apply to every join in a query. Mainly I want to raise an alternative and make sure it was considered.

knz · 2018-01-31T19:05:47Z

The session variable is probably simpler to implement/use too.
(nit: experimental_force_lookup_join as session variables cannot contain periods)

The question though is whether it would satisfy the use case(s).

pbardea · 2018-01-31T20:59:58Z

I'm in favor if the set experimental_force_lookup_join = true; syntax since it's less of a commitment and doesn't change the inline syntax for select queries.

It appears as though it would satisfy at least the use case for the TPC-C query, the queries in the forum linked above, and the linked issue. If we are okay with the limitations in the short term, it seems like a lightweight way to introduce the functionality. Also it would give us more time to think through how we may want to introduce inline hinting if at all.

@1

Changes to planner: - Added a CLUSTER SETTING to experimentally enable lookup join in planner. - Add support for planning lookup joins when this flag is enabled. Main parts: - Correctly map the scanNodes to the joinReader. - Appropriately filter right side along with any `onExpr`s, such as @1 > @2 when joining two single column tables. Changes to joinReader: joinReader now supports filtering on the right columns through `onCond`, which is needed for loookupJoins. Additionally, in preparation for it being used as a join it embeds joinerBase. Additionally, now when performing a lookupJoin the spans are batched which should result in decreased network traffic between nodes as all a lookup is done for a batch of rows at a time. Fixes cockroachdb#19038 Release note (performance): Experimentally enable some joins to perform a lookup join and increase join speed for cases where right side of join is much larger than the left.

@1

Changes to planner: - Added a CLUSTER SETTING to experimentally enable lookup join in planner. - Add support for planning lookup joins when this flag is enabled. Main parts: - Correctly map the scanNodes to the joinReader. - Appropriately filter right side along with any `onExpr`s, such as @1 > @2 when joining two single column tables. Changes to joinReader: joinReader now supports filtering on the right columns through `onCond`, which is needed for loookupJoins. Additionally, in preparation for it being used as a join it embeds joinerBase. Additionally, now when performing a lookupJoin the spans are batched which should result in decreased network traffic between nodes as all a lookup is done for a batch of rows at a time. Fixes cockroachdb#19038 Release note (performance): Experimentally enable some joins to perform a lookup join and increase join speed for cases where right side of join is much larger than the left.

@1

Changes to planner: - Added a CLUSTER SETTING to experimentally enable lookup join in planner. - Add support for planning lookup joins when this flag is enabled. Main parts: - Correctly map the scanNodes to the joinReader. - Appropriately filter right side along with any `onExpr`s, such as @1 > @2 when joining two single column tables. Changes to joinReader: joinReader now supports filtering on the right columns through `onCond`, which is needed for loookupJoins. Additionally, in preparation for it being used as a join it embeds joinerBase. Additionally, now when performing a lookupJoin the spans are batched which should result in decreased network traffic between nodes as all a lookup is done for a batch of rows at a time. Fixes cockroachdb#19038 Release note (performance improvement): Experimentally enable some joins to perform a lookup join and increase join speed for cases where right side of join is much larger than the left.

@1

Changes to planner: - Added a CLUSTER SETTING to experimentally enable lookup join in planner. - Add support for planning lookup joins when this flag is enabled. Main parts: - Correctly map the scanNodes to the joinReader. - Appropriately filter right side along with any `onExpr`s, such as @1 > @2 when joining two single column tables. Changes to joinReader: joinReader now supports filtering on the right columns through `onCond`, which is needed for loookupJoins. Additionally, in preparation for it being used as a join it embeds joinerBase. Additionally, now when performing a lookupJoin the spans are batched which should result in decreased network traffic between nodes as all a lookup is done for a batch of rows at a time. Fixes cockroachdb#19038 Release note (performance improvement): Experimentally enable some joins to perform a lookup join and increase join speed for cases where right side of join is much larger than the left.

@1

Changes to planner: - Added a CLUSTER SETTING to experimentally enable lookup join in planner. - Add support for planning lookup joins when this flag is enabled. Main parts: - Correctly map the scanNodes to the joinReader. - Appropriately filter right side along with any `onExpr`s, such as @1 > @2 when joining two single column tables. Changes to joinReader: joinReader now supports filtering on the right columns through `onCond`, which is needed for loookupJoins. Additionally, in preparation for it being used as a join it embeds joinerBase. Additionally, now when performing a lookupJoin the spans are batched which should result in decreased network traffic between nodes as all a lookup is done for a batch of rows at a time. Fixes cockroachdb#19038 Release note (performance improvement): Experimentally enable some joins to perform a lookup join and increase join speed for cases where right side of join is much larger than the left.

@1

Changes to planner: - Added a CLUSTER SETTING to experimentally enable lookup join in planner. - Add support for planning lookup joins when this flag is enabled. Main parts: - Correctly map the scanNodes to the joinReader. - Appropriately filter right side along with any `onExpr`s, such as @1 > @2 when joining two single column tables. Changes to joinReader: joinReader now supports filtering on the right columns through `onCond`, which is needed for loookupJoins. Additionally, in preparation for it being used as a join it embeds joinerBase. Additionally, now when performing a lookupJoin the spans are batched which should result in decreased network traffic between nodes as all a lookup is done for a batch of rows at a time. Fixes cockroachdb#19038 Release note (performance improvement): Experimentally enable some joins to perform a lookup join and increase join speed for cases where right side of join is much larger than the left.

vivekmenezes · 2018-02-16T13:56:53Z

use

 SET experimental_force_lookup_join = true;

to enable, and

 SET experimental_force_lookup_join = false;

to disable

vivekmenezes · 2018-02-16T13:57:38Z

Great stuff paul!

…

On Thu, Feb 15, 2018 at 7:24 PM Paul Bardea ***@***.***> wrote: Closed #19038 <#19038> via #22674 <#22674>. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#19038 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ALOpBMkTgg80nSd_0njFtWkMTBh0ma3nks5tVMrOgaJpZM4PuMwr> .

rjnn added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Oct 4, 2017

rjnn added this to To Do in (DEPRECATED) SQL execution via automation Jan 18, 2018

vivekmenezes assigned RaduBerinde Jan 19, 2018

vivekmenezes added this to the 2.0 milestone Jan 19, 2018

vivekmenezes assigned pbardea and unassigned RaduBerinde Jan 19, 2018

pbardea mentioned this issue Jan 30, 2018

distsql: extend joinReader to do lookupJoin #21801

Merged

pbardea mentioned this issue Feb 14, 2018

distsql: support lookup joins experimentally #22674

Merged

pbardea closed this as completed in #22674 Feb 16, 2018

(DEPRECATED) SQL execution automation moved this from Actionable Issues to Done Feb 16, 2018

vivekmenezes added the A-sql-optimizer SQL logical planning and optimizations. label Feb 20, 2018

vivekmenezes mentioned this issue Feb 20, 2018

optimizer: automatically use lookup join when beneficial #22852

Closed

lnhsingh mentioned this issue May 16, 2018

Lookup Joins cockroachdb/docs#3153

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

distsql: allow index joins (with hinting) across two tables #19038

distsql: allow index joins (with hinting) across two tables #19038

rjnn commented Oct 4, 2017

knz commented Oct 4, 2017 •

edited

rjnn commented Jan 18, 2018

nvanbenschoten commented Jan 19, 2018

petermattis commented Jan 19, 2018

vivekmenezes commented Jan 19, 2018

rjnn commented Jan 19, 2018

vivekmenezes commented Jan 19, 2018

pbardea commented Jan 31, 2018 •

edited

petermattis commented Jan 31, 2018

knz commented Jan 31, 2018

pbardea commented Jan 31, 2018

vivekmenezes commented Feb 16, 2018

vivekmenezes commented Feb 16, 2018 via email

distsql: allow index joins (with hinting) across two tables #19038

distsql: allow index joins (with hinting) across two tables #19038

Comments

rjnn commented Oct 4, 2017

knz commented Oct 4, 2017 • edited

rjnn commented Jan 18, 2018

nvanbenschoten commented Jan 19, 2018

petermattis commented Jan 19, 2018

vivekmenezes commented Jan 19, 2018

rjnn commented Jan 19, 2018

vivekmenezes commented Jan 19, 2018

pbardea commented Jan 31, 2018 • edited

Differentiation between constraints and hints

How Join Hinting would look in the syntax

petermattis commented Jan 31, 2018

knz commented Jan 31, 2018

pbardea commented Jan 31, 2018

vivekmenezes commented Feb 16, 2018

vivekmenezes commented Feb 16, 2018 via email

knz commented Oct 4, 2017 •

edited

pbardea commented Jan 31, 2018 •

edited