Skip to content

[WIP][SPARK-28103][SQL] Fix constraints of a Union Logical Plan#24912

Closed
William1104 wants to merge 1 commit intoapache:masterfrom
William1104:feature/SPARK-28103
Closed

[WIP][SPARK-28103][SQL] Fix constraints of a Union Logical Plan#24912
William1104 wants to merge 1 commit intoapache:masterfrom
William1104:feature/SPARK-28103

Conversation

@William1104
Copy link
Contributor

@William1104 William1104 commented Jun 19, 2019

[ Work in progress]

What changes were proposed in this pull request?

Currently, the constraints of a union table could be turned empty if any subtable is turned into an empty local relation. The side effect is filter cannot be inferred correctly (by InferFiltersFromConstrains).

This PR updates how the constraints of a Union table is calculated. Basically, it skips a child if it is based on an empty local relation.

We may reproduce the issue with the following setup:

  1. Prepare two tables:
spark.sql("CREATE TABLE IF NOT EXISTS table1(id string, val string) USING PARQUET");
spark.sql("CREATE TABLE IF NOT EXISTS table2(id string, val string) USING PARQUET");
  1. Create a union view on table1.
spark.sql("""
     | CREATE VIEW partitioned_table_1 AS
     | SELECT * FROM table1 WHERE id = 'a'
     | UNION ALL
     | SELECT * FROM table1 WHERE id = 'b'
     | UNION ALL
     | SELECT * FROM table1 WHERE id = 'c'
     | UNION ALL
     | SELECT * FROM table1 WHERE id NOT IN ('a','b','c')
     | """.stripMargin)
  1. View the optimized plan of this SQL. The filter '[t2.id = 'a']' cannot be inferred. We can see that the constraints of the left table are empty.
scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id [t1.id] = t2.id [t2.id] AND t1.id [t1.id] = 'a'").queryExecution.optimizedPlan

res39: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Join Inner, (id#0 = id#4)
:- Union
:  :- Filter (isnotnull(id#0) && (id#0 = a))
:  :  +- Relation[id#0,val#1] parquet
:  :- LocalRelation <empty>, [id#0, val#1]
:  :- LocalRelation <empty>, [id#0, val#1]
:  +- Filter ((isnotnull(id#0) && NOT id#0 IN (a,b,c)) && (id#0 = a))
:     +- Relation[id#0,val#1] parquet
+- Filter isnotnull(id#4)
   +- Relation[id#4,val#5] parquet

scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id [t1.id] = t2.id [t2.id] AND t1.id [t1.id] = 'a'").queryExecution.optimizedPlan.children(0).constraints
res40: org.apache.spark.sql.catalyst.expressions.ExpressionSet = Set()

After applying the above fix, the constraints of the union table is not empty anymore and the filter should be inferred properly.

scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id = t2.id AND t1.id = 'a' ").queryExecution.optimizedPlan
res9: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Join Inner, (id#0 = id#16)
:- Union
:  :- Filter (isnotnull(id#0) AND (id#0 = a))
:  :  +- Relation[id#0,val#1] parquet
:  :- LocalRelation <empty>, [id#0, val#1]
:  :- LocalRelation <empty>, [id#0, val#1]
:  +- Filter ((isnotnull(id#0) AND NOT id#0 IN (a,b,c)) AND (id#0 = a))
:     +- Relation[id#0,val#1] parquet
+- Filter ((id#16 = a) AND isnotnull(id#16))
   +- Relation[id#16,val#17] parquet

scala> spark.sql("SELECT * FROM partitioned_table_1 t1, table2 t2 WHERE t1.id = t2.id AND t1.id = 'a' ").queryExecution.optimizedPlan.children(0).constraints
res10: org.apache.spark.sql.catalyst.expressions.ExpressionSet = Set(isnotnull(id#0), (id#0 = a))

scala>

How was this patch tested?

  • Existing unit test.
  • Package the system and the run the same SQL via spark-shell to make sure the plan is evaluated in an expected way.
  • (May need to create more test cases to cover this patch)

…tied when a subtable is converted into empty local relation
@William1104 William1104 changed the title [WIP] SPARK-28103 Fix constraints of a Union Logical Plan [WIP][SPARK-28103][SQL] Fix constraints of a Union Logical Plan Jun 19, 2019
@SparkQA
Copy link

SparkQA commented Jun 23, 2019

Test build #4802 has finished for PR 24912 at commit e4f60b8.

  • This patch fails to build.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@venkata91
Copy link
Contributor

@William1104 Is this still work in progress?

@William1104
Copy link
Contributor Author

@William1104 Is this still work in progress?

Hi @venkata91, I am sorry that I have to put this work on hold at this moment. I hope I will work on it again some time later.

@HyukjinKwon
Copy link
Member

@William1104, please reopen the PR when you start to work on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants