Commit 488dcb2

craig[bot] and yuzefovich committed
Merge #54808
54808: sql: introduce RIGHT SEMI and RIGHT ANTI joins r=yuzefovich a=yuzefovich

**sql: introduce RIGHT SEMI and RIGHT ANTI joins**

This commit introduces right semi and right anti joins, which give the optimizer the ability to control on which relation we're building the hash table (in the case of the hash join). This will allow us to bake the assumption that we're building the hash table on the right side into the row-execution engine (we already have this assumption in the vectorized hash join).

Additionally, this commit adds several assertions to different types of joiners about which join types they support, and fixes the `joinerBase` code to handle the join types that emit only columns from one side (previously, it just happened to work because the left side columns were always in the output).

Release note: None

**rowexec: move hash joiner unit tests**

This commit renames "joinerTestCase" to "hashJoinerTestCase" since those tests are only used by the hash joiner, and moves the corresponding code to hashjoiner_test.go. I believe that originally the intention was to generalize them for use in several joiner types, but that intention was never implemented, and since we're moving towards deprecating these processors, I don't think it ever will be.

Release note: None

**rowexec: support right semi and right anti hash joins**

This commit adds support for right semi and right anti hash joins, which turned out to be pretty easy to do. This paves the way for refactoring the hash joiner code to hard-code that we're always storing the right side.

Addresses: #54707.

Release note: None

**rowexec: clean up the hash joiner**

This commit cleans up the hash joiner by baking in the assumption that we're always building the hash table on the right side. The assumption allows us to remove the buffering logic (which attempted to find a shorter relation, until a certain threshold), and we will now rely on the optimizer to request an appropriate join type with the right relation having smaller cardinality.

Release note: None

**colexec: support right semi and right anti in merge joiner**

This commit adds support for right semi and right anti merge joins. This turned out to be relatively easy when following the examples of left semi and left anti (with the new right cases being simpler because of not having to worry about set-operation joins, which are intertwined with left semi and left anti joins). The testing is done by "mirroring" existing test cases for left semi and left anti joins.

Addresses: #54707.

Release note: None

**colexec: support right semi and right anti in the hash joiner**

Supporting right semi and right anti in the hash joiner required extending the idea of tracking whether build rows matched, which was already being done in the case of full/right outer joins. The notable difference is that right semi/anti joins don't emit any output during the probing phase, so the "collection" simply tracks the matched rows, whereas the output is fully emitted in the `hjEmittingRight` phase of the algorithm.

Note that because the external hash joiner falls back to the disk-backed merge joiner in some cases, it is important to point out that the vectorized merge joiner already supports all join types.

Addresses: #54707.

Release note: None

**rowexec: support right semi and right anti in the merge joiner**

Support for right semi and right anti in the merge joiner is added by tracking which rows have been matched on the right side. In the case of right semi, we remember that the right row is matched, emit it, and then make sure to ignore it (so as not to emit it a second time). In the case of right anti, all unmatched right rows will be emitted (similar to right/full outer joins).

This support was mostly added in order to test the correctness of the vectorized merge joiner (I don't expect the optimizer to ever plan right semi/anti merge joins).

Addresses: #54707.

Release note: None

**opt: plan right semi and right anti hash joins**

This commit utilizes the recently added right semi and right anti joins in the optimizer. Internally, those join types are represented as the same opt operator, with the only change being that the two input relations are switched if the left has smaller cardinality when execbuilding the hash join (with the same change to the costing).

Fixes: #54707.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
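The build/probe/emit split described for the vectorized hash joiner can be illustrated with a small standalone sketch. This is not the code from this commit: the `row` type and the `rightSemiOrAnti` function are hypothetical names, keys are plain integers (so SQL NULL handling is ignored), and everything is held in memory. It only shows the idea that probing records matches without emitting anything, and that all output comes from a later pass over the right (build) side.

```go
package main

import "fmt"

// row is a hypothetical two-column row: a join key and a payload.
type row struct {
	key int
	val string
}

// rightSemiOrAnti builds a hash table on the right input, probes it with the
// left input while only recording which right rows matched, and then emits
// right rows in a separate pass. With anti=false it returns the matched right
// rows (right semi); with anti=true it returns the unmatched ones (right anti).
func rightSemiOrAnti(left, right []row, anti bool) []row {
	// Build phase: map each key to the indices of the right rows with that key.
	table := make(map[int][]int, len(right))
	for i, r := range right {
		table[r.key] = append(table[r.key], i)
	}
	// Probe phase: nothing is emitted here, only the matched flags are set.
	matched := make([]bool, len(right))
	for _, l := range left {
		for _, i := range table[l.key] {
			matched[i] = true
		}
	}
	// Emit phase: walk the right side once and pick rows by their matched flag.
	var out []row
	for i, r := range right {
		if matched[i] != anti {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	left := []row{{1, "a"}, {1, "b"}, {3, "c"}}
	right := []row{{1, "x"}, {2, "y"}, {3, "z"}}
	fmt.Println("right semi:", rightSemiOrAnti(left, right, false))
	fmt.Println("right anti:", rightSemiOrAnti(left, right, true))
}
```

Because each right row is emitted at most once from the final pass, duplicate keys on the left (key 1 above) cannot produce duplicate output rows.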
2 parents 64c7dee + 6037326 commit 488dcb2

64 files changed

+77333
-5144
lines changed

Makefile

Lines changed: 2 additions & 0 deletions
@@ -846,7 +846,9 @@ EXECGEN_TARGETS = \
   pkg/sql/colexec/mergejoiner_leftanti.eg.go \
   pkg/sql/colexec/mergejoiner_leftouter.eg.go \
   pkg/sql/colexec/mergejoiner_leftsemi.eg.go \
+  pkg/sql/colexec/mergejoiner_rightanti.eg.go \
   pkg/sql/colexec/mergejoiner_rightouter.eg.go \
+  pkg/sql/colexec/mergejoiner_rightsemi.eg.go \
   pkg/sql/colexec/ordered_synchronizer.eg.go \
   pkg/sql/colexec/proj_const_left_ops.eg.go \
   pkg/sql/colexec/proj_const_right_ops.eg.go \

pkg/sql/apply_join.go

Lines changed: 2 additions & 0 deletions
@@ -86,6 +86,8 @@ func newApplyJoinNode(
 		return nil, errors.AssertionFailedf("unsupported right outer apply join: %d", log.Safe(joinType))
 	case descpb.ExceptAllJoin, descpb.IntersectAllJoin:
 		return nil, errors.AssertionFailedf("unsupported apply set op: %d", log.Safe(joinType))
+	case descpb.RightSemiJoin, descpb.RightAntiJoin:
+		return nil, errors.AssertionFailedf("unsupported right semi/anti apply join: %d", log.Safe(joinType))
 	}
 
 	return &applyJoinNode{

pkg/sql/catalog/descpb/join_type.go

Lines changed: 25 additions & 0 deletions
@@ -26,6 +26,8 @@ const (
 	LeftAntiJoin = JoinType_LEFT_ANTI
 	IntersectAllJoin = JoinType_INTERSECT_ALL
 	ExceptAllJoin = JoinType_EXCEPT_ALL
+	RightSemiJoin = JoinType_RIGHT_SEMI
+	RightAntiJoin = JoinType_RIGHT_ANTI
 )
 
 // JoinTypeFromAstString takes a join string as found in a SQL
@@ -54,6 +56,17 @@ func (j JoinType) IsSetOpJoin() bool {
 	return j == IntersectAllJoin || j == ExceptAllJoin
 }
 
+// ShouldIncludeLeftColsInOutput returns true if this join should include
+// the columns from the left side into the output.
+func (j JoinType) ShouldIncludeLeftColsInOutput() bool {
+	switch j {
+	case RightSemiJoin, RightAntiJoin:
+		return false
+	default:
+		return true
+	}
+}
+
 // ShouldIncludeRightColsInOutput returns true if this join should include
 // the columns from the right side into the output.
 func (j JoinType) ShouldIncludeRightColsInOutput() bool {
@@ -64,3 +77,15 @@ func (j JoinType) ShouldIncludeRightColsInOutput() bool {
 		return true
 	}
 }
+
+// IsEmptyOutputWhenRightIsEmpty returns whether this join type will always
+// produce an empty output when the right relation is empty.
+func (j JoinType) IsEmptyOutputWhenRightIsEmpty() bool {
+	switch j {
+	case InnerJoin, RightOuterJoin, LeftSemiJoin,
+		RightSemiJoin, IntersectAllJoin, RightAntiJoin:
+		return true
+	default:
+		return false
+	}
+}
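
The two predicates whose bodies appear in this diff can be tried in isolation. The sketch below is a standalone approximation, not the `descpb` package itself: `joinType`, its constants, and the method names `includesLeftCols` and `emptyWhenRightEmpty` are hypothetical stand-ins. Note why right anti is in the "empty when the right is empty" set: it only ever emits right rows, so with no right rows there is nothing to emit, whereas left anti emits every left row in that situation.

```go
package main

import "fmt"

// joinType is a hypothetical stand-in for descpb.JoinType.
type joinType int

const (
	innerJoin joinType = iota
	leftOuterJoin
	rightOuterJoin
	fullOuterJoin
	leftSemiJoin
	leftAntiJoin
	intersectAllJoin
	exceptAllJoin
	rightSemiJoin
	rightAntiJoin
)

// includesLeftCols mirrors ShouldIncludeLeftColsInOutput from the diff:
// only the right semi and right anti joins drop the left columns.
func (j joinType) includesLeftCols() bool {
	switch j {
	case rightSemiJoin, rightAntiJoin:
		return false
	default:
		return true
	}
}

// emptyWhenRightEmpty mirrors IsEmptyOutputWhenRightIsEmpty from the diff.
func (j joinType) emptyWhenRightEmpty() bool {
	switch j {
	case innerJoin, rightOuterJoin, leftSemiJoin,
		rightSemiJoin, intersectAllJoin, rightAntiJoin:
		return true
	default:
		return false
	}
}

func main() {
	names := map[joinType]string{
		leftSemiJoin:  "LeftSemi",
		leftAntiJoin:  "LeftAnti",
		rightSemiJoin: "RightSemi",
		rightAntiJoin: "RightAnti",
	}
	for _, j := range []joinType{leftSemiJoin, leftAntiJoin, rightSemiJoin, rightAntiJoin} {
		fmt.Printf("%-10s left cols in output: %-5t empty when right is empty: %t\n",
			names[j], j.includesLeftCols(), j.emptyWhenRightEmpty())
	}
}
```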

pkg/sql/catalog/descpb/join_type.pb.go

Lines changed: 31 additions & 17 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

pkg/sql/catalog/descpb/join_type.proto

Lines changed: 13 additions & 2 deletions
@@ -24,8 +24,8 @@ enum JoinType {
   // one row from the right side (as per equality columns and ON condition).
   LEFT_SEMI = 4;
 
-  // A left anti join is an "inverted" semi join: it returns the rows from the
-  // left side that don't match any columns on the right side (as per equality
+  // A left anti join is an "inverted" left semi join: it returns the rows from
+  // the left side that don't match any rows on the right side (as per equality
   // columns and ON condition).
   LEFT_ANTI = 5;
 
@@ -75,4 +75,15 @@ enum JoinType {
   // In practice, there is a one-to-one mapping between the left and right
   // columns (they are all equality columns).
   EXCEPT_ALL = 7;
+
+  // A right semi join returns the rows from the right side that match at least
+  // one row from the left side (as per equality columns and ON condition). It
+  // is a commuted version of the left semi join.
+  RIGHT_SEMI = 8;
+
+  // A right anti join is an "inverted" right semi join: it returns the rows
+  // from the right side that don't match any rows on the left side (as per
+  // equality columns and ON condition). It is a commuted version of the left
+  // anti join.
+  RIGHT_ANTI = 9;
 }
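
The "commuted version" wording in these comments can be made concrete with a naive reference implementation. The sketch below is an illustration under simplifying assumptions: rows are single integers, the only join condition is equality, and the function names are hypothetical. It defines the right variants purely by swapping the inputs of the left variants.

```go
package main

import "fmt"

// leftSemi returns the elements of left that equal at least one element of
// right, each emitted once per left element.
func leftSemi(left, right []int) []int {
	var out []int
	for _, l := range left {
		for _, r := range right {
			if l == r {
				out = append(out, l)
				break
			}
		}
	}
	return out
}

// leftAnti returns the elements of left that equal no element of right.
func leftAnti(left, right []int) []int {
	var out []int
	for _, l := range left {
		found := false
		for _, r := range right {
			if l == r {
				found = true
				break
			}
		}
		if !found {
			out = append(out, l)
		}
	}
	return out
}

// rightSemi and rightAnti are the commuted forms: same logic, inputs swapped.
func rightSemi(left, right []int) []int { return leftSemi(right, left) }
func rightAnti(left, right []int) []int { return leftAnti(right, left) }

func main() {
	left := []int{1, 1, 3}
	right := []int{1, 2, 3}
	fmt.Println("right semi:", rightSemi(left, right))
	fmt.Println("right anti:", rightAnti(left, right))
}
```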

pkg/sql/colexec/BUILD.bazel

Lines changed: 2 additions & 0 deletions
@@ -225,7 +225,9 @@ targets = [
     "mergejoiner_leftanti.eg.go",
     "mergejoiner_leftouter.eg.go",
     "mergejoiner_leftsemi.eg.go",
+    "mergejoiner_rightanti.eg.go",
     "mergejoiner_rightouter.eg.go",
+    "mergejoiner_rightsemi.eg.go",
     "ordered_synchronizer.eg.go",
     "proj_const_left_ops.eg.go",
     "proj_const_right_ops.eg.go",

pkg/sql/colexec/colbuilder/execplan.go

Lines changed: 13 additions & 14 deletions
@@ -227,8 +227,7 @@ func supportedNatively(spec *execinfrapb.ProcessorSpec) error {
 		return nil
 
 	case spec.Core.MergeJoiner != nil:
-		if !spec.Core.MergeJoiner.OnExpr.Empty() &&
-			spec.Core.MergeJoiner.Type != descpb.InnerJoin {
+		if !spec.Core.MergeJoiner.OnExpr.Empty() && spec.Core.MergeJoiner.Type != descpb.InnerJoin {
 			return errors.Errorf("can't plan non-inner merge join with ON expressions")
 		}
 		return nil
@@ -923,12 +922,12 @@ func NewColOperator(
 					args.TestingKnobs.SpillingCallbackFn,
 				)
 			}
-			result.ColumnTypes = make([]*types.T, len(leftTypes)+len(rightTypes))
-			copy(result.ColumnTypes, leftTypes)
-			if !core.HashJoiner.Type.ShouldIncludeRightColsInOutput() {
-				result.ColumnTypes = result.ColumnTypes[:len(leftTypes):len(leftTypes)]
-			} else {
-				copy(result.ColumnTypes[len(leftTypes):], rightTypes)
+			result.ColumnTypes = make([]*types.T, 0, len(leftTypes)+len(rightTypes))
+			if core.HashJoiner.Type.ShouldIncludeLeftColsInOutput() {
+				result.ColumnTypes = append(result.ColumnTypes, leftTypes...)
+			}
+			if core.HashJoiner.Type.ShouldIncludeRightColsInOutput() {
+				result.ColumnTypes = append(result.ColumnTypes, rightTypes...)
 			}
 
 			if !core.HashJoiner.OnExpr.Empty() && core.HashJoiner.Type == descpb.InnerJoin {
@@ -984,12 +983,12 @@ func NewColOperator(
 
 			result.Op = mj
 			result.ToClose = append(result.ToClose, mj.(colexecbase.Closer))
-			result.ColumnTypes = make([]*types.T, len(leftTypes)+len(rightTypes))
-			copy(result.ColumnTypes, leftTypes)
-			if !core.MergeJoiner.Type.ShouldIncludeRightColsInOutput() {
-				result.ColumnTypes = result.ColumnTypes[:len(leftTypes):len(leftTypes)]
-			} else {
-				copy(result.ColumnTypes[len(leftTypes):], rightTypes)
+			result.ColumnTypes = make([]*types.T, 0, len(leftTypes)+len(rightTypes))
+			if core.MergeJoiner.Type.ShouldIncludeLeftColsInOutput() {
+				result.ColumnTypes = append(result.ColumnTypes, leftTypes...)
+			}
+			if core.MergeJoiner.Type.ShouldIncludeRightColsInOutput() {
+				result.ColumnTypes = append(result.ColumnTypes, rightTypes...)
 			}
 
 			if onExpr != nil {
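
The change to how `result.ColumnTypes` is assembled is easier to see outside of the operator-building code. The old version always copied the left types first and then decided about the right types, which cannot express "right columns only". The sketch below shows the new append-based pattern in isolation; `outputTypes` is a hypothetical helper, type names are plain strings rather than `*types.T`, and the two booleans stand in for the `ShouldIncludeLeftColsInOutput` and `ShouldIncludeRightColsInOutput` results.

```go
package main

import "fmt"

// outputTypes assembles the output schema of a join from the schemas of its
// inputs. The slice starts empty with enough capacity for both sides, and
// each side is appended only if the join type emits its columns.
func outputTypes(left, right []string, includeLeft, includeRight bool) []string {
	out := make([]string, 0, len(left)+len(right))
	if includeLeft {
		out = append(out, left...)
	}
	if includeRight {
		out = append(out, right...)
	}
	return out
}

func main() {
	left := []string{"INT", "STRING"}
	right := []string{"BOOL"}
	fmt.Println("both sides (e.g. inner):", outputTypes(left, right, true, true))
	fmt.Println("left only (left semi/anti):", outputTypes(left, right, true, false))
	fmt.Println("right only (right semi/anti):", outputTypes(left, right, false, true))
}
```

Starting from length zero keeps the two decisions independent, so neither side needs to know whether the other was included.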

pkg/sql/colexec/execgen/cmd/execgen/mergejoiner_gen.go

Lines changed: 10 additions & 0 deletions
@@ -34,7 +34,9 @@ type joinTypeInfo struct {
 	IsLeftOuter bool
 	IsRightOuter bool
 	IsLeftSemi bool
+	IsRightSemi bool
 	IsLeftAnti bool
+	IsRightAnti bool
 	IsSetOp bool
 
 	String string
@@ -173,10 +175,18 @@ func init() {
 			IsLeftSemi: true,
 			String: "LeftSemi",
 		},
+		{
+			IsRightSemi: true,
+			String: "RightSemi",
+		},
 		{
 			IsLeftAnti: true,
 			String: "LeftAnti",
 		},
+		{
+			IsRightAnti: true,
+			String: "RightAnti",
+		},
 		{
 			IsLeftSemi: true,
 			IsSetOp: true,

pkg/sql/colexec/external_hash_joiner_test.go

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ func TestExternalHashJoiner(t *testing.T) {
 	// which the joiner spills to disk.
 	for _, spillForced := range []bool{false, true} {
 		flowCtx.Cfg.TestingKnobs.ForceDiskSpill = spillForced
-		for _, tcs := range [][]*joinTestCase{hjTestCases, mjTestCases} {
+		for _, tcs := range [][]*joinTestCase{getHJTestCases(), getMJTestCases()} {
 			for _, tc := range tcs {
 				delegateFDAcquisitions := rng.Float64() < 0.5
 				log.Infof(ctx, "spillForced=%t/%s/delegateFDAcquisitions=%t", spillForced, tc.description, delegateFDAcquisitions)

pkg/sql/colexec/hashjoiner.eg.go

Lines changed: 33 additions & 7 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
