[SPARK-18464][SQL][followup] support old table which doesn't store schema in table properties #18907

cloud-fan · 2017-08-10T16:52:32Z

What changes were proposed in this pull request?

This is a follow-up to #15900 that fixes one more bug:
When the table schema is empty and needs to be inferred at runtime, we should not resolve parent plans before the schema has been inferred; otherwise the parent plans are resolved against an empty schema and may produce wrong results for queries like `select *`.

The fix: introduce `UnresolvedCatalogRelation` as a placeholder and replace it with `LogicalRelation` or `HiveTableRelation` during analysis. This guarantees that parent plans are not resolved until the schema has been inferred.
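The ordering argument can be made concrete with a small, self-contained Scala sketch. It does not use Spark: `UnresolvedTable`, `Relation`, `Project`, `expandStar`, `resolveTables` and `analyze` are hypothetical stand-ins, assuming a simplified analyzer that reruns its passes until the plan stops changing. The placeholder reports `resolved = false`, so the `*` in the parent projection cannot be expanded until the placeholder has been swapped for a relation carrying the inferred schema.

```scala
// Hypothetical model of analysis ordering; none of these types are Spark classes.
object PlaceholderSketch {
  final case class Schema(columns: Seq[String])

  sealed trait Plan {
    def resolved: Boolean
    def output: Seq[String]
  }

  // Placeholder for a catalog table whose schema is not known yet.
  final case class UnresolvedTable(name: String) extends Plan {
    val resolved = false
    val output: Seq[String] = Nil
  }

  // Concrete relation carrying the table's schema.
  final case class Relation(name: String, schema: Schema) extends Plan {
    val resolved = true
    val output: Seq[String] = schema.columns
  }

  // SELECT *: `columns` stays None until the star has been expanded.
  final case class Project(columns: Option[Seq[String]], child: Plan) extends Plan {
    def resolved: Boolean = columns.isDefined && child.resolved
    def output: Seq[String] = columns.getOrElse(Nil)
  }

  // Pass 1: expand `*` only once the child is resolved.
  def expandStar(p: Plan): Plan = p match {
    case Project(None, child) if child.resolved => Project(Some(child.output), child)
    case Project(cols, child)                   => Project(cols, expandStar(child))
    case other                                  => other
  }

  // Pass 2: replace the placeholder with a relation using the inferred schema.
  def resolveTables(p: Plan, infer: String => Schema): Plan = p match {
    case UnresolvedTable(n)   => Relation(n, infer(n))
    case Project(cols, child) => Project(cols, resolveTables(child, infer))
    case other                => other
  }

  // Run both passes until a fixed point is reached, like an analyzer batch.
  def analyze(p: Plan, infer: String => Schema): Plan = {
    val next = expandStar(resolveTables(p, infer))
    if (next == p) p else analyze(next, infer)
  }
}
```

In this model, expanding `*` directly over a `Relation` whose schema is still empty yields no columns at all, which is the kind of wrong `select *` result described above; the unresolved placeholder keeps the parent projection waiting instead.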

How was this patch tested?

regression test

cloud-fan · 2017-08-10T16:53:10Z

cc @gatorsmile @yhuai

gatorsmile · 2017-08-10T17:07:53Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

+  // If the table schema is empty, Spark will infer the schema at runtime, so we should mark this
+  // relation as unresolved and wait for it to be replaced by a relation with the actual schema
+  // before resolving parent plans.
+  override lazy val resolved: Boolean = tableMeta.schema.nonEmpty


We also support tables with zero columns. Will this affect that support?

Do we? It definitely will.

But why should we support 0-column tables? Does Hive support them?

https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala#L1894-L1923

We have a test case. I have also seen multiple related JIRAs. Users complain that tables with zero columns do not return the correct number of rows after EXCEPT and INTERSECT...

I think the schemas are generated by their programs; that is why they contain zero columns.

That's a temp view. I'm really wondering how useful a 0-column table is.

https://issues.apache.org/jira/browse/SPARK-20008 is another JIRA
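The tension in this exchange: if `resolved` is derived only from `tableMeta.schema.nonEmpty`, a table that genuinely has zero columns looks the same as one whose schema has not been inferred yet. The hypothetical sketch below (`TableMeta`, `Placeholder` and `Concrete` are illustrative names, not Spark classes) contrasts that check with the placeholder approach from the PR description, where the node type, not the schema, carries the "not resolved yet" signal.

```scala
// Hypothetical model; these are not Spark classes.
object ZeroColumnSketch {
  final case class TableMeta(name: String, schema: Seq[String])

  // Check from the reviewed diff: an empty schema is read as "not inferred yet".
  def resolvedBySchema(meta: TableMeta): Boolean = meta.schema.nonEmpty

  // Placeholder approach: the node type says whether inference is still pending.
  sealed trait Node { def resolved: Boolean }
  final case class Placeholder(meta: TableMeta) extends Node { val resolved = false }
  final case class Concrete(meta: TableMeta) extends Node { val resolved = true }
}
```

In this model, the schema-based check never treats a zero-column table as resolved, while a `Concrete` node with an empty schema is still resolved.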

SparkQA · 2017-08-10T19:42:04Z

Test build #80495 has finished for PR 18907 at commit 227d931.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-08-11T04:44:29Z

Test build #80520 has finished for PR 18907 at commit f9d3d17.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class CatalogRelation(tableMeta: CatalogTable) extends LeafNode
case class HiveTableRelation(

cloud-fan · 2017-08-11T04:46:03Z

retest this please

SparkQA · 2017-08-11T04:58:55Z

Test build #80521 has finished for PR 18907 at commit f9d3d17.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class CatalogRelation(tableMeta: CatalogTable) extends LeafNode
case class HiveTableRelation(

cloud-fan · 2017-08-11T12:20:20Z

retest this please

cloud-fan · 2017-08-11T15:43:33Z

retest this please

gatorsmile · 2017-08-11T16:18:46Z

restest this please

gatorsmile · 2017-08-11T16:18:51Z

test this please

gatorsmile · 2017-08-11T16:18:57Z

ok to test

hvanhovell · 2017-08-11T16:21:48Z

I think something is up with Jenkins. @shaneknapp could you take a look?

shaneknapp · 2017-08-11T16:36:04Z

Sometimes jobs don't like to trigger, and there's nothing in the logs as to exactly why. Since nothing was building, I decided to kick Jenkins and then retrigger this build.

shaneknapp · 2017-08-11T16:36:14Z

ok to test

shaneknapp · 2017-08-11T16:36:17Z

test this please

shaneknapp · 2017-08-11T16:39:43Z

thanks for the heads up @hvanhovell -- looks like there was some gunk in the pipes and now we've got ~10 pull request builds running. :)

SparkQA · 2017-08-11T17:04:31Z

Test build #80526 has finished for PR 18907 at commit 3820442.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class CatalogRelation(tableMeta: CatalogTable) extends LeafNode
case class HiveTableRelation(

hvanhovell · 2017-08-11T17:23:14Z

@shaneknapp thanks for quick response!

SparkQA · 2017-08-11T18:20:17Z

Test build #80543 has finished for PR 18907 at commit ec8b465.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds the following public classes (experimental):
trait CatalogRelation extends LeafNode
case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends CatalogRelation
case class HiveTableRelation(

gatorsmile · 2017-08-12T05:58:31Z

[error]  * Replaces {@link CatalogRelation} with {@link HiveTableRelation} if its table provider is hive.
[error]                    ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive/target/java/org/apache/spark/sql/hive/FindHiveTable.java:3: error: reference not found
[error]  * Replaces {@link CatalogRelation} with {@link HiveTableRelation} if its table provider is hive.

SparkQA · 2017-08-14T15:57:53Z

Test build #80630 has finished for PR 18907 at commit 8f4bc08.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
trait CatalogRelation extends LeafNode
case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends CatalogRelation
case class HiveTableRelation(

SparkQA · 2017-08-14T19:49:39Z

Test build #80640 has finished for PR 18907 at commit 9f2ab2c.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
trait CatalogRelation extends LeafNode
case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends CatalogRelation
case class HiveTableRelation(

gatorsmile · 2017-08-14T20:59:13Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala

@@ -210,12 +210,10 @@ case class DataSourceAnalysis(conf: SQLConf) extends Rule[LogicalPlan] with Cast
 * Replaces [[CatalogRelation]] with data source table if its table provider is not hive.


CatalogRelation -> UnresolvedCatalogRelation
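The two Scaladoc comments quoted in this thread describe complementary rules: one replaces the placeholder with a data source relation when the provider is not hive, the other with `HiveTableRelation` when it is. A minimal sketch of that dispatch follows. The types only mirror Spark's names and are simplified, hypothetical stand-ins; the `inferSchema` callback is also an assumption, illustrating where a table that stores no schema in its properties could get one before the concrete relation is built.

```scala
// Hypothetical model of the two replacement rules; not Spark's actual classes or rules.
object ProviderDispatchSketch {
  final case class CatalogTable(name: String, provider: String, schema: Seq[String])

  sealed trait Plan
  final case class UnresolvedCatalogRelation(meta: CatalogTable) extends Plan
  final case class LogicalRelation(meta: CatalogTable) extends Plan   // data source table
  final case class HiveTableRelation(meta: CatalogTable) extends Plan // Hive table

  private def isHive(meta: CatalogTable): Boolean = meta.provider.equalsIgnoreCase("hive")

  // Provider is not hive: build the data source relation, inferring the schema if none is stored.
  def findDataSourceTable(p: Plan, inferSchema: CatalogTable => Seq[String]): Plan = p match {
    case UnresolvedCatalogRelation(meta) if !isHive(meta) =>
      val schema = if (meta.schema.isEmpty) inferSchema(meta) else meta.schema
      LogicalRelation(meta.copy(schema = schema))
    case other => other
  }

  // Provider is hive: the placeholder becomes a Hive table relation.
  def findHiveTable(p: Plan): Plan = p match {
    case UnresolvedCatalogRelation(meta) if isHive(meta) => HiveTableRelation(meta)
    case other => other
  }
}
```

Each rule leaves plans it does not handle untouched, so in this sketch the two can run in either order.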

gatorsmile · 2017-08-14T21:04:34Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

+ * A placeholder for a table relation, which will be replaced by concrete relation like
+ * `LogicalRelation` or `HiveTableRelation`, during analysis.
+ */
+case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends CatalogRelation {


Is it possible to move `HiveTableRelation` to our core package? Then many rules become much clearer: one case for `LogicalRelation` and another for `HiveTableRelation`.

Alternatively, `UnresolvedCatalogRelation` could stop extending `CatalogRelation`. Then `CatalogRelation` can be a pure node representing Hive tables, and we can rename it to something easier to understand.
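The hierarchy question in this review comment comes down to pattern matching. In the hypothetical sketch below (simplified stand-ins, not Spark's classes), a match on a shared `CatalogRelation` parent also catches the placeholder, whereas making the placeholder a plain leaf node lets a match on `HiveTableRelation` mean exactly one thing. The latter mirrors the class list reported by the later test build, where `UnresolvedCatalogRelation` extends `LeafNode`.

```scala
// Hypothetical model of the two hierarchies discussed in review; not Spark code.
object HierarchySketch {
  trait LeafNode

  // Option A: the placeholder shares a parent trait with the Hive relation.
  object SharedParent {
    trait CatalogRelation extends LeafNode
    final case class UnresolvedCatalogRelation(table: String) extends CatalogRelation
    final case class HiveTableRelation(table: String) extends CatalogRelation

    // A case written "for Hive tables" against the parent trait also matches the placeholder.
    def isTreatedAsHiveTable(n: LeafNode): Boolean = n match {
      case _: CatalogRelation => true
      case _                  => false
    }
  }

  // Option B: the placeholder is a plain leaf node, so the Hive relation stands alone.
  object PlainLeaf {
    final case class UnresolvedCatalogRelation(table: String) extends LeafNode
    final case class HiveTableRelation(table: String) extends LeafNode

    def isTreatedAsHiveTable(n: LeafNode): Boolean = n match {
      case _: HiveTableRelation => true
      case _                    => false
    }
  }
}
```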

SparkQA · 2017-08-15T15:46:05Z

Test build #80682 has finished for PR 18907 at commit 0a18435.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends LeafNode
case class HiveTableRelation(

gatorsmile · 2017-08-15T16:04:15Z

LGTM

gatorsmile · 2017-08-15T16:06:00Z

Thanks! Merging to master.

Hit conflicts when trying to merge into the older release branches.

cloud-fan · 2017-08-15T23:34:40Z

I'll send new PRs for 2.2 and 2.1

…hema in table properties This is a follow-up of apache#15900 , to fix one more bug: When table schema is empty and need to be inferred at runtime, we should not resolve parent plans before the schema has been inferred, or the parent plans will be resolved against an empty schema and may get wrong result for something like `select *` The fix logic is: introduce `UnresolvedCatalogRelation` as a placeholder. Then we replace it with `LogicalRelation` or `HiveTableRelation` during analysis, so that it's guaranteed that we won't resolve parent plans until the schema has been inferred. regression test Author: Wenchen Fan <wenchen@databricks.com> Closes apache#18907 from cloud-fan/bug.

…hema in table properties backport #18907 to branch 2.2 Author: Wenchen Fan <wenchen@databricks.com> Closes #18963 from cloud-fan/backport.


gatorsmile reviewed Aug 10, 2017


cloud-fan force-pushed the bug branch from 227d931 to f9d3d17 on August 11, 2017 04:29

cloud-fan force-pushed the bug branch from f9d3d17 to 3820442 on August 11, 2017 06:48

cloud-fan force-pushed the bug branch 2 times, most recently from b190354 to ec8b465 on August 11, 2017 18:05

cloud-fan force-pushed the bug branch from ec8b465 to 8f4bc08 on August 14, 2017 14:21

cloud-fan force-pushed the bug branch from 8f4bc08 to 9f2ab2c on August 14, 2017 17:01

gatorsmile reviewed Aug 14, 2017


support old table which doesn't store schema in table properties

0a18435

cloud-fan force-pushed the bug branch from 9f2ab2c to 0a18435 on August 15, 2017 12:58

asfgit closed this in 14bdb25 Aug 15, 2017

cloud-fan mentioned this pull request Aug 16, 2017

[SPARK-18464][SQL][backport] support old table which doesn't store schema in table properties #18963

Closed

asfgit pushed a commit that referenced this pull request Aug 16, 2017

[SPARK-18464][SQL][BACKPORT] support old table which doesn't store sc…

2a96975

