[SPARK-18464][SQL][followup] support old table which doesn't store schema in table properties #18907
// If table schema is empty, Spark will infer the schema at runtime, so we should mark this
// relation as unresolved and wait for it to be replaced by a relation with the actual schema,
// before resolving parent plans.
override lazy val resolved: Boolean = tableMeta.schema.nonEmpty
We also support tables with zero columns. Will this change affect that support?
do we? it definitely will.
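To make the concern concrete, here is a minimal, self-contained sketch in plain Scala. It does not use Spark's actual classes: `TableMeta` and `Relation` are simplified stand-ins invented for illustration. It only assumes that resolution is gated on `schema.nonEmpty`, as in the line under review, and shows that a table whose schema is legitimately empty then looks the same as one whose schema has not been inferred yet.

```scala
object ZeroColumnSketch {
  // Hypothetical stand-in for CatalogTable: only the column names matter here.
  final case class TableMeta(name: String, schema: Seq[String])

  // Hypothetical stand-in for the relation under review.
  final case class Relation(tableMeta: TableMeta) {
    // Same condition as the reviewed line: an empty schema means "not resolved".
    lazy val resolved: Boolean = tableMeta.schema.nonEmpty
  }

  // An old table whose schema is missing and still has to be inferred.
  val legacyTable: Relation = Relation(TableMeta("old_tbl", Seq.empty))
  // A table that really has zero columns.
  val zeroColumnTable: Relation = Relation(TableMeta("empty_tbl", Seq.empty))
  // An ordinary table with a stored schema.
  val normalTable: Relation = Relation(TableMeta("t", Seq("id", "name")))
}
```

In this model both `legacyTable` and `zeroColumnTable` report `resolved == false`, so a zero-column table would stay unresolved unless something else replaces the relation.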
but why shall we support 0 column tables? does hive support it?
We have a test case. I also saw multiple related JIRAs. Users complain that a table with zero columns does not have the correct number of rows after `EXCEPT` and `INTERSECT` ...
I think the schema is generated by their programs. That is why it contains zero columns.
That's temp view, I'm really wondering how useful a 0-column table is
https://issues.apache.org/jira/browse/SPARK-20008 is another JIRA
Test build #80495 has finished for PR 18907 at commit
Test build #80520 has finished for PR 18907 at commit
retest this please
Test build #80521 has finished for PR 18907 at commit
retest this please
retest this please
retest this please
test this please
ok to test
I think something is up with Jenkins. @shaneknapp could you take a look?
sometimes jobs don't like to trigger and there's nothing in the logs as to exactly why. since nothing was building, i decided to kick jenkins and then retrigger this build.
ok to test
test this please
thanks for the heads up @hvanhovell -- looks like there was some gunk in the pipes and now we've got ~10 pull request builds running. :)
Test build #80526 has finished for PR 18907 at commit
@shaneknapp thanks for the quick response!
b190354 to ec8b465
Test build #80543 has finished for PR 18907 at commit
Test build #80630 has finished for PR 18907 at commit
Test build #80640 has finished for PR 18907 at commit
@@ -210,12 +210,10 @@ case class DataSourceAnalysis(conf: SQLConf) extends Rule[LogicalPlan] with Cast
* Replaces [[CatalogRelation]] with data source table if its table provider is not hive.
`CatalogRelation` -> `UnresolvedCatalogRelation`
* A placeholder for a table relation, which will be replaced by concrete relation like
* `LogicalRelation` or `HiveTableRelation`, during analysis.
*/
case class UnresolvedCatalogRelation(tableMeta: CatalogTable) extends CatalogRelation {
Is it possible to move `HiveTableRelation` to our core package? Then many rules become much clearer: we have one case for `LogicalRelation` and another case for `HiveTableRelation`.
Another way is to not let `UnresolvedCatalogRelation` extend `CatalogRelation`. Then `CatalogRelation` can be a pure node representing `HiveTableRelation`, and we can rename it to something easier to understand.
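As a rough illustration of this suggestion, the sketch below models three separate node types in plain Scala. Everything here is a simplified stand-in invented for this example (the `Plan` trait, the constructor fields, and the `describe` helper are assumptions, not Spark's real definitions). It only shows how rules could pattern match with one case per concrete relation once the placeholder no longer shares a parent type with the Hive relation.

```scala
object HierarchySketch {
  // Hypothetical, simplified plan nodes; not Spark's real class hierarchy.
  sealed trait Plan { def resolved: Boolean }

  // The placeholder stands alone and is never considered resolved.
  final case class UnresolvedCatalogRelation(table: String) extends Plan {
    val resolved = false
  }

  // Concrete relations are resolved regardless of how many columns they have.
  final case class LogicalRelation(table: String, columns: Seq[String]) extends Plan {
    val resolved = true
  }
  final case class HiveTableRelation(table: String, columns: Seq[String]) extends Plan {
    val resolved = true
  }

  // A rule written against this model needs exactly one case per node type.
  def describe(plan: Plan): String = plan match {
    case UnresolvedCatalogRelation(t) => s"placeholder for $t"
    case LogicalRelation(t, cols)     => s"data source table $t with ${cols.size} columns"
    case HiveTableRelation(t, cols)   => s"hive table $t with ${cols.size} columns"
  }
}
```

One consequence of this shape, in the sketch at least, is that resolution no longer depends on the schema being non-empty, so a concrete relation with zero columns can still be resolved.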
Test build #80682 has finished for PR 18907 at commit
LGTM
Thanks! Merging to master. Hit conflicts when trying to merge to the previous versions.
I'll send new PRs for 2.2 and 2.1
…hema in table properties
This is a follow-up of apache#15900, to fix one more bug:
When the table schema is empty and needs to be inferred at runtime, we should not resolve parent plans before the schema has been inferred; otherwise the parent plans will be resolved against an empty schema and may produce wrong results for something like `select *`.
The fix logic is: introduce `UnresolvedCatalogRelation` as a placeholder, then replace it with `LogicalRelation` or `HiveTableRelation` during analysis, so that it's guaranteed that we won't resolve parent plans until the schema has been inferred.
regression test
Author: Wenchen Fan <wenchen@databricks.com>
Closes apache#18907 from cloud-fan/bug.
…hema in table properties
backport apache#18907 to branch 2.2
Author: Wenchen Fan <wenchen@databricks.com>
Closes apache#18963 from cloud-fan/backport.
What changes were proposed in this pull request?
This is a follow-up of #15900, to fix one more bug:
When the table schema is empty and needs to be inferred at runtime, we should not resolve parent plans before the schema has been inferred; otherwise the parent plans will be resolved against an empty schema and may produce wrong results for something like `select *`.
The fix logic is: introduce `UnresolvedCatalogRelation` as a placeholder, then replace it with `LogicalRelation` or `HiveTableRelation` during analysis, so that it's guaranteed that we won't resolve parent plans until the schema has been inferred.
How was this patch tested?
regression test
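The fix logic in this description can be sketched with a small, self-contained model in plain Scala. This is not Spark's analyzer: `TableMeta`, `SelectStar`, the `analyze` function, the `inferSchema` callback, and the `provider == "hive"` check are all assumptions made for illustration. The sketch only demonstrates the ordering guarantee: the parent (`SelectStar`, standing in for `select *`) is expanded only after the placeholder has been replaced by a relation that carries the inferred schema.

```scala
object PlaceholderSketch {
  // Hypothetical stand-in for CatalogTable.
  final case class TableMeta(name: String, provider: String, schema: Seq[String])

  sealed trait Plan { def resolved: Boolean }

  // Placeholder: never resolved, so parents cannot be resolved on top of it.
  final case class UnresolvedCatalogRelation(meta: TableMeta) extends Plan {
    val resolved = false
  }
  final case class LogicalRelation(meta: TableMeta, output: Seq[String]) extends Plan {
    val resolved = true
  }
  final case class HiveTableRelation(meta: TableMeta, output: Seq[String]) extends Plan {
    val resolved = true
  }
  // Stand-in for `select *`: columns are filled in only once the child is concrete.
  final case class SelectStar(child: Plan, columns: Option[Seq[String]] = None) extends Plan {
    def resolved: Boolean = columns.isDefined && child.resolved
  }

  // inferSchema is an assumed callback that models runtime schema inference.
  def analyze(plan: Plan, inferSchema: TableMeta => Seq[String]): Plan = plan match {
    case UnresolvedCatalogRelation(meta) =>
      // Use the stored schema if present, otherwise infer it first.
      val schema = if (meta.schema.nonEmpty) meta.schema else inferSchema(meta)
      if (meta.provider == "hive") HiveTableRelation(meta, schema)
      else LogicalRelation(meta, schema)
    case SelectStar(child, None) =>
      // Replace the child first; expand the star only against a concrete relation.
      analyze(child, inferSchema) match {
        case r: LogicalRelation   => SelectStar(r, Some(r.output))
        case r: HiveTableRelation => SelectStar(r, Some(r.output))
        case other                => SelectStar(other, None)
      }
    case other => other
  }
}
```

For example, analyzing `SelectStar(UnresolvedCatalogRelation(meta))` for a table with an empty stored schema yields a `SelectStar` whose columns come from `inferSchema`, never from the empty schema.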