Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-19611][SQL][followup] set dataSchema correctly in HiveMetastoreCatalog.convertToLogicalRelation #19615

Closed
wants to merge 1 commit into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

We made a mistake in #16944 . In HiveMetastoreCatalog#inferIfNeeded we infer the data schema, merge with full schema, and return the new full schema. At caller side we treat the full schema as data schema and set it to HadoopFsRelation.

This doesn't cause any problem because both parquet and orc can work with a wrong data schema that has extra columns, but it's better to fix this mistake.

How was this patch tested?

N/A

@cloud-fan
Copy link
Contributor Author

cc @budde @gatorsmile

@SparkQA
Copy link

SparkQA commented Oct 31, 2017

Test build #83234 has finished for PR 19615 at commit 46f530f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

also cc @dongjoon-hyun @viirya

@viirya
Copy link
Member

viirya commented Oct 31, 2017

LGTM

@@ -164,13 +164,12 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
}
}

val (dataSchema, updatedTable) =
inferIfNeeded(relation, options, fileFormat, Option(fileIndex))
val updatedTable = inferIfNeeded(relation, options, fileFormat, Option(fileIndex))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we missed this. Actually the variable name is indicating it is data schema...

asfgit pushed a commit that referenced this pull request Oct 31, 2017
…eCatalog.convertToLogicalRelation

We made a mistake in #16944 . In `HiveMetastoreCatalog#inferIfNeeded` we infer the data schema, merge with full schema, and return the new full schema. At caller side we treat the full schema as data schema and set it to `HadoopFsRelation`.

This doesn't cause any problem because both parquet and orc can work with a wrong data schema that has extra columns, but it's better to fix this mistake.

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #19615 from cloud-fan/infer.

(cherry picked from commit 4d9ebf3)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Contributor Author

thanks, merging to master/2.2!

@asfgit asfgit closed this in 4d9ebf3 Oct 31, 2017
@dongjoon-hyun
Copy link
Member

Thank you so much, @cloud-fan !

MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…eCatalog.convertToLogicalRelation

We made a mistake in apache#16944 . In `HiveMetastoreCatalog#inferIfNeeded` we infer the data schema, merge with full schema, and return the new full schema. At caller side we treat the full schema as data schema and set it to `HadoopFsRelation`.

This doesn't cause any problem because both parquet and orc can work with a wrong data schema that has extra columns, but it's better to fix this mistake.

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#19615 from cloud-fan/infer.

(cherry picked from commit 4d9ebf3)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants