Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-25259][SQL] left/right join support push down during-join predicates #22250

Closed
wants to merge 1 commit into from
Closed

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented Aug 28, 2018

What changes were proposed in this pull request?

Prepare data:

create temporary view EMPLOYEE as select * from values
  ("000010", "HAAS", "A00"),
  ("000010", "THOMPSON", "B01"),
  ("000030", "KWAN", "C01"),
  ("000110", "LUCCHESSI", "A00"),
  ("000120", "O'CONNELL", "A))"),
  ("000130", "QUINTANA", "C01")
  as EMPLOYEE(EMPNO, LASTNAME, WORKDEPT);

create temporary view DEPARTMENT as select * from values
  ("A00", "SPIFFY COMPUTER SERVICE DIV.", "000010"),
  ("B01", "PLANNING", "000020"),
  ("C01", "INFORMATION CENTER", "000030"),
  ("D01", "DEVELOPMENT CENTER", null)
  as EMPLOYEE(DEPTNO, DEPTNAME, MGRNO);

create temporary view PROJECT as select * from values
  ("AD3100", "ADMIN SERVICES", "D01"),
  ("IF1000", "QUERY SERVICES", "C01"),
  ("IF2000", "USER EDUCATION", "E01"),
  ("MA2100", "WELD LINE AUDOMATION", "D01"),
  ("PL2100", "WELD LINE PLANNING", "01")
  as EMPLOYEE(PROJNO, PROJNAME, DEPTNO);

For the below SQL, we can push DEPTNO='E01' to right side to reduce data reading:

SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
FROM PROJECT P LEFT OUTER JOIN DEPARTMENT D
ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01';

Optimized SQL is equivalent to:

SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
FROM PROJECT P LEFT OUTER JOIN (SELECT * FROM DEPARTMENT WHERE DEPTNO='E01') D
ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01';

This pr enhancement PushPredicateThroughJoin to support this feature.

How was this patch tested?

unit tests

@SparkQA
Copy link

SparkQA commented Aug 28, 2018

Test build #95328 has finished for PR 22250 at commit f9b32d5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Aug 28, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Aug 28, 2018

Test build #95335 has finished for PR 22250 at commit f9b32d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Aug 29, 2018

cc @cloud-fan

val (leftJoinConditions, rightJoinConditions, commonJoinCondition) =
split(joinCondition.map(splitConjunctivePredicates).getOrElse(Nil), left, right)
val condition = joinCondition.map(splitConjunctivePredicates).getOrElse(Nil)
val additionalCondition = inferAdditionalConstraints(condition.toSet)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we can only do this in InferFiltersFromConstraints

@wangyum wangyum closed this Aug 29, 2018
@wangyum
Copy link
Member Author

wangyum commented Aug 29, 2018

Fixed by SPARK-21479.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants