[SPARK-28268][SQL] Rewrite non-correlated Semi/Anti join as Filter #25064

Closed
francis0407 wants to merge 1 commit into apache:master from francis0407:SPARK-28268

@francis0407
Contributor

What changes were proposed in this pull request?

When a semi/anti join has a non-correlated join condition, we can convert it to a Filter with a non-correlated Exists subquery. Because the Exists subquery is non-correlated, it can be evaluated with its own physical plan, which avoids the join.

This optimization mainly targets non-correlated subqueries (Exists/In). We currently rewrite Exists/InSubquery as a semi/anti/existential join whether or not it is correlated. These joins are mostly executed using BroadcastNestedLoopJoin, which is a poor choice here.

Here are some examples:
1.

SELECT t1a
FROM    t1  
SEMI JOIN t2
ON t2a > 10 OR t2b = 'a'

=>

SELECT t1a
FROM t1
WHERE EXISTS(SELECT 1 
             FROM t2 
             WHERE t2a > 10 OR t2b = 'a')

2.

SELECT t1a
FROM  t1
ANTI JOIN t2
ON t1b > 10 AND t2b = 'b'

=>

SELECT t1a
FROM t1
WHERE NOT(t1b > 10 
          AND EXISTS(SELECT 1
                     FROM  t2
                     WHERE t2b = 'b'))
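
To make the two rewrites above easier to check by hand, here is a small pure-Python model of left semi and left anti join semantics. It is only an illustration, not Spark code: the helper names `semi_join` and `anti_join` and the sample rows are made up, and the data contains no NULLs, so SQL's three-valued logic (which may need extra care in the anti join case) is deliberately out of scope.

```python
# Toy model of LEFT SEMI / LEFT ANTI join semantics on lists of dicts.
# Assumption: no NULL values, so plain two-valued boolean logic is enough.

def semi_join(left, right, cond):
    # Keep a left row if at least one right row satisfies the condition.
    return [l for l in left if any(cond(l, r) for r in right)]

def anti_join(left, right, cond):
    # Keep a left row if no right row satisfies the condition.
    return [l for l in left if not any(cond(l, r) for r in right)]

# Hypothetical sample data using the column names from the examples.
t1 = [{"t1a": 1, "t1b": 5}, {"t1a": 2, "t1b": 20}, {"t1a": 3, "t1b": 11}]
t2 = [{"t2a": 3, "t2b": "b"}, {"t2a": 12, "t2b": "c"}]

# Example 1: the condition refers only to t2, so the Exists result is
# computed once and reused as a constant filter on t1.
semi = semi_join(t1, t2, lambda l, r: r["t2a"] > 10 or r["t2b"] == "a")
exists_1 = any(r["t2a"] > 10 or r["t2b"] == "a" for r in t2)
semi_as_filter = [l for l in t1 if exists_1]

# Example 2: the t1-only conjunct stays in the filter, the t2-only
# conjunct moves into the Exists subquery.
anti = anti_join(t1, t2, lambda l, r: l["t1b"] > 10 and r["t2b"] == "b")
exists_2 = any(r["t2b"] == "b" for r in t2)
anti_as_filter = [l for l in t1 if not (l["t1b"] > 10 and exists_2)]
```

On this sample data the join form and the filter form return the same rows for both examples.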

This PR adds a new optimizer rule: ReplaceLeftSemiAntiJoinWithFilter.
This rule replaces non-correlated LeftSemi/LeftAnti Join with Filter. When the condition of a semi/anti join can be split by And into expressions where each expression only refers to attributes from one side, we can turn it into a Filter with a non-correlated Exists subquery.
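
The applicability check described above can be sketched as follows. This is a simplified model rather than the rule's actual Catalyst implementation: `Conjunct` and `split_by_side` are hypothetical names, and each conjunct is represented only by its text and the set of attributes it references, assuming the condition has already been split on And.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Tuple

class Conjunct(NamedTuple):
    # One And-separated piece of the join condition (hypothetical model).
    text: str
    refs: FrozenSet[str]  # attributes the expression refers to

def split_by_side(
    conjuncts: List[Conjunct],
    left_attrs: FrozenSet[str],
    right_attrs: FrozenSet[str],
) -> Optional[Tuple[List[Conjunct], List[Conjunct]]]:
    """Return (left_only, right_only), or None if the rewrite does not apply."""
    left_only, right_only = [], []
    for c in conjuncts:
        if c.refs <= left_attrs:
            left_only.append(c)       # stays in the Filter on the left side
        elif c.refs <= right_attrs:
            right_only.append(c)      # moves into the Exists subquery
        else:
            return None               # refers to both sides: correlated
    return left_only, right_only

T1 = frozenset({"t1a", "t1b"})
T2 = frozenset({"t2a", "t2b"})

# Example 2 from the description: one conjunct per side.
ok = split_by_side(
    [Conjunct("t1b > 10", frozenset({"t1b"})),
     Conjunct("t2b = 'b'", frozenset({"t2b"}))],
    T1, T2,
)

# A conjunct that touches both sides blocks the rewrite.
blocked = split_by_side([Conjunct("t1a = t2a", frozenset({"t1a", "t2a"}))], T1, T2)
```

Note that the Or in example 1 does not prevent the rewrite, because the whole disjunction refers only to t2 and therefore counts as a single right-side conjunct.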

In addition, this PR adds a physical plan for non-correlated Exists.
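
The description does not say how this physical plan works internally. As a rough illustration of why a non-correlated Exists can be cheaper than a join, the sketch below evaluates the subquery at most once and reuses the boolean for every outer row. The class `UncorrelatedExists` and the sample rows are hypothetical and are not taken from the patch.

```python
class UncorrelatedExists:
    """Hypothetical model: an Exists whose result does not depend on the outer row."""

    def __init__(self, rows, predicate):
        self._rows = rows
        self._predicate = predicate
        self._result = None
        self.evaluations = 0  # how many times the subquery was actually run

    def value(self):
        # Run the subquery on first use only; any() stops at the first match.
        if self._result is None:
            self.evaluations += 1
            self._result = any(self._predicate(r) for r in self._rows)
        return self._result

# Hypothetical sample data using the column names from the examples.
t1 = [{"t1a": 1, "t1b": 5}, {"t1a": 2, "t1b": 20}, {"t1a": 3, "t1b": 11}]
t2 = [{"t2a": 3, "t2b": "b"}, {"t2a": 12, "t2b": "c"}]

exists = UncorrelatedExists(t2, lambda r: r["t2b"] == "b")

# Filter form of the anti join example: the outer scan never re-reads t2.
kept = [l for l in t1 if not (l["t1b"] > 10 and exists.value())]
```

A nested loop join would compare every t1 row against t2 rows, whereas here t2 is scanned at most once regardless of how many rows t1 has.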

How was this patch tested?

Unit tests.

@AmplabJenkins

Can one of the admins verify this patch?

@github-actions

We're closing this PR because it hasn't been updated in a while.
This isn't a judgement on the merit of the PR in any way. It's just
a way of keeping the PR queue manageable.

If you'd like to revive this PR, please reopen it!

@github-actions github-actions bot added the Stale label Dec 27, 2019
@francis0407
Contributor Author

Closing because this PR is out of date.
