Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18464][SQL][followup] support old table which doesn't store schema in table properties #18907

Closed
wants to merge 1 commit into from

Conversation

cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Aug 10, 2017

What changes were proposed in this pull request?

This is a follow-up of #15900 , to fix one more bug:
When table schema is empty and need to be inferred at runtime, we should not resolve parent plans before the schema has been inferred, or the parent plans will be resolved against an empty schema and may get wrong result for something like select *

The fix logic is: introduce UnresolvedCatalogRelation as a placeholder. Then we replace it with LogicalRelation or HiveTableRelation during analysis, so that it's guaranteed that we won't resolve parent plans until the schema has been inferred.

How was this patch tested?

regression test

@cloud-fan
Copy link
Contributor Author

cc @gatorsmile @yhuai

// If table schema is empty, Spark will infer the schema at runtime, so we should mark this
// relation as unresolved and wait it to be replaced by relation with actual schema, before
// resolving parent plans.
override lazy val resolved: Boolean = tableMeta.schema.nonEmpty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also support a table with zero column. Will this affect such a support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we? it definitely will.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but why shall we support 0 column tables? does hive support it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala#L1894-L1923

We have a test case. I also saw multiple related JIRAs. The users complain the table with zero column does not have a correct number of rows after EXCEPT and INTERSECT...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the schema is generated by their programs. That is why it contains zero column.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's temp view, I'm really wondering how useful a 0-column table is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SparkQA
Copy link

SparkQA commented Aug 10, 2017

Test build #80495 has finished for PR 18907 at commit 227d931.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 11, 2017

Test build #80520 has finished for PR 18907 at commit f9d3d17.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CatalogRelation(tableMeta: CatalogTable) extends LeafNode
  • case class HiveTableRelation(

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Aug 11, 2017

Test build #80521 has finished for PR 18907 at commit f9d3d17.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CatalogRelation(tableMeta: CatalogTable) extends LeafNode
  • case class HiveTableRelation(

@cloud-fan
Copy link
Contributor Author

retest this please

1 similar comment
@cloud-fan
Copy link
Contributor Author

retest this please

@gatorsmile
Copy link
Member

restest this please

@gatorsmile
Copy link
Member

test this please

@gatorsmile
Copy link
Member

ok to test

@hvanhovell
Copy link
Contributor

I think something is up jenkins. @shaneknapp could you take a look?

@shaneknapp
Copy link
Contributor

sometimes jobs don't like to trigger and there's nothing in the logs as to exactly why. since nothing was building, i decided to kick jenkins and then retrigger this build.

@shaneknapp
Copy link
Contributor

ok to test

@shaneknapp
Copy link
Contributor

test this please

@shaneknapp
Copy link
Contributor

thanks for the heads up @hvanhovell -- looks like there was some gunk in the pipes and now we've got ~10 pull request builds running. :)

@SparkQA
Copy link

SparkQA commented Aug 11, 2017

Test build #80526 has finished for PR 18907 at commit 3820442.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CatalogRelation(tableMeta: CatalogTable) extends LeafNode
  • case class HiveTableRelation(

@hvanhovell
Copy link
Contributor

@shaneknapp thanks for quick response!

@SparkQA
Copy link

SparkQA commented Aug 11, 2017

Test build #80543 has finished for PR 18907 at commit ec8b465.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait CatalogRelation extends LeafNode
  • case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends CatalogRelation
  • case class HiveTableRelation(

@gatorsmile
Copy link
Member

[error]  * Replaces {@link CatalogRelation} with {@link HiveTableRelation} if its table provider is hive.
[error]                    ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive/target/java/org/apache/spark/sql/hive/FindHiveTable.java:3: error: reference not found
[error]  * Replaces {@link CatalogRelation} with {@link HiveTableRelation} if its table provider is hive.
[error]     ```

@SparkQA
Copy link

SparkQA commented Aug 14, 2017

Test build #80630 has finished for PR 18907 at commit 8f4bc08.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait CatalogRelation extends LeafNode
  • case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends CatalogRelation
  • case class HiveTableRelation(

@SparkQA
Copy link

SparkQA commented Aug 14, 2017

Test build #80640 has finished for PR 18907 at commit 9f2ab2c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait CatalogRelation extends LeafNode
  • case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends CatalogRelation
  • case class HiveTableRelation(

@@ -210,12 +210,10 @@ case class DataSourceAnalysis(conf: SQLConf) extends Rule[LogicalPlan] with Cast
* Replaces [[CatalogRelation]] with data source table if its table provider is not hive.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CatalogRelation -> UnresolvedCatalogRelation

* A placeholder for a table relation, which will be replaced by concrete relation like
* `LogicalRelation` or `HiveTableRelation`, during analysis.
*/
case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends CatalogRelation {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that possible we move HiveTableRelation to our core package? Then, many rules become very clear. We have a case for LogicalRelation and another case for HiveTableRelation

Or another way is to not let UnresolvedCatalogRelation extend CatalogRelation? Then, CatalogRelation can be a pure node for representing HiveTableRelation. We can rename it to a more easy-to-understand name.

@SparkQA
Copy link

SparkQA commented Aug 15, 2017

Test build #80682 has finished for PR 18907 at commit 0a18435.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends LeafNode
  • case class HiveTableRelation(

@gatorsmile
Copy link
Member

LGTM

@gatorsmile
Copy link
Member

Thanks! Merging to master.

Hit conflicts when trying to merge to the previous versions.

@asfgit asfgit closed this in 14bdb25 Aug 15, 2017
@cloud-fan
Copy link
Contributor Author

I'll send new PRs for 2.2 and 2.1

cloud-fan added a commit to cloud-fan/spark that referenced this pull request Aug 16, 2017
…hema in table properties

This is a follow-up of apache#15900 , to fix one more bug:
When table schema is empty and need to be inferred at runtime, we should not resolve parent plans before the schema has been inferred, or the parent plans will be resolved against an empty schema and may get wrong result for something like `select *`

The fix logic is: introduce `UnresolvedCatalogRelation` as a placeholder. Then we replace it with `LogicalRelation` or `HiveTableRelation` during analysis, so that it's guaranteed that we won't resolve parent plans until the schema has been inferred.

regression test

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#18907 from cloud-fan/bug.
asfgit pushed a commit that referenced this pull request Aug 16, 2017
…hema in table properties

backport #18907 to branch 2.2

Author: Wenchen Fan <wenchen@databricks.com>

Closes #18963 from cloud-fan/backport.
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…hema in table properties

backport apache#18907 to branch 2.2

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#18963 from cloud-fan/backport.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants