
[SPARK-27238][SQL] Add fine-grained configurations to handle convertMetastoreParquet and convertMetastoreOrc #24174

Closed
10110346 wants to merge 1 commit into apache:master from 10110346:CONVERT_EXCLUDED_TABLES

Conversation


@10110346 (Contributor) commented Mar 22, 2019

What changes were proposed in this pull request?

In the same application, TableA and TableB are both Hive Parquet tables, but TableA cannot use the built-in Parquet reader and writer.
In this situation, spark.sql.hive.convertMetastoreParquet cannot handle this well because it applies to all tables at once, so this PR adds fine-grained configurations to exclude specific tables from the conversion.
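For illustration, a minimal sketch of how the proposed setting might be used, assuming an active Hive-enabled SparkSession named `spark` and a hypothetical table `default.tableA`; the configuration name comes from this PR's proposal and does not otherwise exist in Spark:

```scala
// Keep conversion enabled globally, but exclude the one incompatible table.
// "default.tableA" is a hypothetical table name used only for illustration.
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "true")
spark.conf.set("spark.sql.hive.convertMetastoreParquet.excludedTables", "default.tableA")
```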

How was this patch tested?

Add a unit test


SparkQA commented Mar 22, 2019

Test build #103800 has finished for PR 24174 at commit 4ea706b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@10110346 force-pushed the CONVERT_EXCLUDED_TABLES branch from 4ea706b to 3453a5e on Mar 22, 2019 at 06:01
@10110346 changed the title from [SPARK-27238][SQL] Add a fine-grained configuration to handle convertMetastoreParquet to [SPARK-27238][SQL] Add fine-grained configurations to handle convertMetastoreParquet and convertMetastoreOrc on Mar 22, 2019
@10110346 force-pushed the CONVERT_EXCLUDED_TABLES branch from 3453a5e to 944504a on Mar 22, 2019 at 06:57

SparkQA commented Mar 22, 2019

Test build #103805 has finished for PR 24174 at commit 3453a5e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Mar 22, 2019

Test build #103808 has finished for PR 24174 at commit 944504a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val CONVERT_METASTORE_PARQUET_EXCLUDED_TABLES =
  buildConf("spark.sql.hive.convertMetastoreParquet.excludedTables")
    .doc("A comma-separated list of Parquet table names, which do not use the built-in " +
      "Parquet reader and writer when \"spark.sql.hive.convertMetastoreParquet\" is true.")
Member (@HyukjinKwon):

Why don't we simply disable spark.sql.hive.convertMetastoreParquet in this case? This looks like it is going to make the code super complicated.

Contributor Author (@10110346):

Thanks @HyukjinKwon
Take another example: a single SQL statement reads many Hive Parquet tables, but TableA cannot use the built-in Parquet reader and writer. If we disable spark.sql.hive.convertMetastoreParquet, the other tables cannot use the built-in Parquet reader and writer either. Since the built-in Parquet reader and writer performs much better, this hurts the performance of the whole SQL statement.
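As a hedged sketch of that scenario, assuming hypothetical table names and an active Hive-enabled SparkSession `spark`: with only tableA excluded, tableA would fall back to the Hive SerDe reader while the rest of the query keeps the faster built-in Parquet reader.

```scala
// Hypothetical multi-table query; the table and column names are made up.
// Under the proposed setting, only tableA would be read through the Hive SerDe,
// while tableB and tableC still use Spark's built-in Parquet data source.
val result = spark.sql(
  """SELECT a.id, b.v, c.v
    |FROM tableA a
    |JOIN tableB b ON a.id = b.id
    |JOIN tableC c ON a.id = c.id""".stripMargin)
result.show()
```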

Member (@HyukjinKwon):

What I am not sure about is whether we really need this fine-grained control for this optimization. BTW, why wasn't TableA able to use the built-in Parquet reader and writer in your case?

Contributor Author (@10110346):

The data is incompatible, although our tests were not run on the latest version.
I think the reason the spark.sql.hive.convertMetastoreParquet configuration exists in the first place is that such incompatibility problems can occur.

Member (@HyukjinKwon):

I think we should fix either the data or make this optimization compatible. Spark has started to accumulate a huge bunch of configurations, and I think we had better avoid adding more.

Contributor Author (@10110346):

Ok, I will close this PR, thanks

@10110346 closed this Mar 26, 2019