
[SPARK-27238][SQL] Add fine-grained configurations to handle convertMetastoreParquet and convertMetastoreOrc #24174

Closed
10110346 wants to merge 1 commit into apache:master from 10110346:CONVERT_EXCLUDED_TABLES

Conversation


@10110346 (Contributor) commented Mar 22, 2019

What changes were proposed in this pull request?

In the same application, TableA and TableB are both Hive Parquet tables, but TableA cannot use the built-in Parquet reader and writer.
In this situation, spark.sql.hive.convertMetastoreParquet cannot handle this well because it applies to all tables at once, so this PR adds fine-grained configurations to exclude specific tables from the conversion.
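For illustration, a minimal sketch of how the proposed setting might be used, assuming an active Hive-enabled SparkSession named `spark` and a hypothetical table `default.tableA`; the configuration name comes from this PR's proposal and does not otherwise exist in Spark:

```scala
// Keep conversion enabled globally, but exclude the one incompatible table.
// "default.tableA" is a hypothetical table name used only for illustration.
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "true")
spark.conf.set("spark.sql.hive.convertMetastoreParquet.excludedTables", "default.tableA")
```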

How was this patch tested?

Add a unit test


SparkQA commented Mar 22, 2019

Test build #103800 has finished for PR 24174 at commit 4ea706b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@10110346 force-pushed the CONVERT_EXCLUDED_TABLES branch from 4ea706b to 3453a5e on Mar 22, 2019 at 06:01
@10110346 changed the title from [SPARK-27238][SQL] Add a fine-grained configuration to handle convertMetastoreParquet to [SPARK-27238][SQL] Add fine-grained configurations to handle convertMetastoreParquet and convertMetastoreOrc on Mar 22, 2019
@10110346 force-pushed the CONVERT_EXCLUDED_TABLES branch from 3453a5e to 944504a on Mar 22, 2019 at 06:57

SparkQA commented Mar 22, 2019

Test build #103805 has finished for PR 24174 at commit 3453a5e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Mar 22, 2019

Test build #103808 has finished for PR 24174 at commit 944504a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val CONVERT_METASTORE_PARQUET_EXCLUDED_TABLES =
  buildConf("spark.sql.hive.convertMetastoreParquet.excludedTables")
    .doc("A comma-separated list of Parquet table names, which do not use the built-in " +
      "Parquet reader and writer when \"spark.sql.hive.convertMetastoreParquet\" is true.")
Member (@HyukjinKwon):

Why don't we simply disable spark.sql.hive.convertMetastoreParquet in this case? This looks like it is going to make the code super complicated.

Contributor Author (@10110346):

Thanks @HyukjinKwon
Take another example: a single SQL statement reads many Hive Parquet tables, but TableA cannot use the built-in Parquet reader and writer. If we disable spark.sql.hive.convertMetastoreParquet, the other tables cannot use the built-in Parquet reader and writer either. Since the built-in Parquet reader and writer performs much better, this hurts the performance of the whole SQL statement.
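As a hedged sketch of that scenario, assuming hypothetical table names and an active Hive-enabled SparkSession `spark`: with only tableA excluded, tableA would fall back to the Hive SerDe reader while the rest of the query keeps the faster built-in Parquet reader.

```scala
// Hypothetical multi-table query; the table and column names are made up.
// Under the proposed setting, only tableA would be read through the Hive SerDe,
// while tableB and tableC still use Spark's built-in Parquet data source.
val result = spark.sql(
  """SELECT a.id, b.v, c.v
    |FROM tableA a
    |JOIN tableB b ON a.id = b.id
    |JOIN tableC c ON a.id = c.id""".stripMargin)
result.show()
```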

Member (@HyukjinKwon):

What I am not sure about is whether we really need this fine-grained control for this optimization. BTW, why wasn't TableA able to use the built-in Parquet reader and writer in your case?

Contributor Author (@10110346):

The data is incompatible, although our tests were not run on the latest version.
I think the reason the spark.sql.hive.convertMetastoreParquet configuration exists in the first place is that such incompatibility problems can occur.

Member (@HyukjinKwon):

I think we should fix either the data or make this optimization compatible. Spark has started to accumulate a huge bunch of configurations, and I think we had better avoid adding more.

Contributor Author (@10110346):

Ok, I will close this PR, thanks

@10110346 closed this Mar 26, 2019