Conversation

@CodingCat
Contributor

What changes were proposed in this pull request?

The current code path ignores the value of spark.sql.hive.convertMetastoreParquet when building a data source table:

case UnresolvedCatalogRelation(tableMeta) if DDLUtils.isDatasourceTable(tableMeta) =>

As a result, even when I turn off spark.sql.hive.convertMetastoreParquet, Spark SQL still uses its own Parquet reader to access the table instead of delegating to the Hive SerDe.

This PR checks the value of the configuration when building a data source table.
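For context, a minimal sketch of the scenario described above (my own illustration, not code from this PR; the table name t is made up):

    // Disable the metastore Parquet conversion.
    spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")

    // A table written through the DataFrame API and registered in the metastore.
    spark.range(10).write.format("parquet").saveAsTable("t")

    // Expectation behind this PR: with the flag off, the scan should go through the Hive SerDe.
    // Observed behavior: the plan still contains Spark's native Parquet scan, because the
    // table is recorded as a data source table.
    spark.sql("SELECT * FROM t").explain()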

How was this patch tested?

Existing tests.

@CodingCat CodingCat changed the title [SQL][SPARK-24797] respect spark.sql.hive.convertMetastoreOrc/Parquet when build… [SPARK-24797] [SQL] respect spark.sql.hive.convertMetastoreOrc/Parquet when build… Jul 13, 2018
@CodingCat
Contributor Author

@felixcheung

@SparkQA

SparkQA commented Jul 13, 2018

Test build #92960 has finished for PR 21757 at commit a5d72cc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case i @ InsertIntoTable(UnresolvedCatalogRelation(tableMeta), _, _, _, _)
-     if DDLUtils.isDatasourceTable(tableMeta) =>
+     if DDLUtils.isDatasourceTable(tableMeta) &&
+       DDLUtils.convertSchema(tableMeta, sparkSession) =>
Member

I do not think this is the right fix. If the original table is a native data source table, we will always use our Parquet/ORC reader instead of the Hive SerDe.

Contributor Author

Do you mean that any table built through df.write.format("..") should be treated as a data source table, no matter whether we register it with the HMS or not?

Member

If you are using format("parquet") to create a new table, it will be a data source table. We always use the native reader/writer to read/write such a table.
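A small sketch of that distinction (again my own illustration; table names are made up, and it assumes DESCRIBE FORMATTED reports the table's provider):

    // Data source table: always read and written with Spark's native Parquet code path,
    // regardless of spark.sql.hive.convertMetastoreParquet.
    spark.range(10).write.format("parquet").saveAsTable("ds_table")

    // Hive SerDe table: spark.sql.hive.convertMetastoreParquet controls whether Spark converts
    // the scan to its native Parquet reader (true, the default) or delegates to the SerDe (false).
    spark.sql("CREATE TABLE hive_table (id BIGINT) STORED AS PARQUET")

    // The provider recorded in the metastore is what distinguishes the two.
    spark.sql("DESCRIBE FORMATTED ds_table").show(100, truncate = false)   // provider: parquet
    spark.sql("DESCRIBE FORMATTED hive_table").show(100, truncate = false) // provider: hive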

Contributor Author

@CodingCat CodingCat Jul 13, 2018

ok, thanks

@CodingCat CodingCat closed this Jul 13, 2018