[SPARK-27238][SQL] Add fine-grained configurations to handle convertMetastoreParquet and convertMetastoreOrc #24174
Conversation
Test build #103800 has finished for PR 24174 at commit
force-pushed from 4ea706b to 3453a5e
changed the title from "… convertMetastoreParquet" to "… convertMetastoreParquet and convertMetastoreOrc"
force-pushed from 3453a5e to 944504a
Test build #103805 has finished for PR 24174 at commit
Test build #103808 has finished for PR 24174 at commit
val CONVERT_METASTORE_PARQUET_EXCLUDED_TABLES =
  buildConf("spark.sql.hive.convertMetastoreParquet.excludedTables")
    .doc("A comma-separated list of Parquet table names, which do not use the built-in Parquet " +
      "reader and writer when \"spark.sql.hive.convertMetastoreParquet\" is true.")
Why don't we simply disable spark.sql.hive.convertMetastoreParquet in this case? It looks like it is going to make the code super complicated.
Thanks @HyukjinKwon
Take another example: a single SQL statement needs to read many Hive Parquet tables, but TableA can't use the built-in Parquet reader and writer. If we disable spark.sql.hive.convertMetastoreParquet, the other tables can't use the built-in Parquet reader and writer either. Since the built-in Parquet reader and writer perform much better, this hurts the performance of the whole SQL statement.
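To make that scenario concrete, here is a hedged usage sketch, assuming a spark-shell style SparkSession named spark; the config name comes from the diff above, while the database and table names are made up.

```scala
// Keep the global conversion enabled, but exclude only the incompatible table.
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "true")
// Proposed config from this PR; the value is assumed to be a comma-separated list.
spark.conf.set("spark.sql.hive.convertMetastoreParquet.excludedTables", "db.table_a")

// table_a would fall back to the Hive SerDe, while table_b still uses the
// faster built-in Parquet reader, so only the incompatible table pays the cost.
val df = spark.sql(
  "SELECT a.id, b.value FROM db.table_a a JOIN db.table_b b ON a.id = b.id")
```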
What I am not sure about is whether we really need this fine-grained control for this optimization. BTW, why wasn't TableA able to use the built-in Parquet reader and writer in your case?
The data is incompatible, although our tests were not run against the latest version.
I think the reason we have the spark.sql.hive.convertMetastoreParquet configuration in the first place is that the conversion can have compatibility problems.
I think we should either fix the data or make this optimization compatible. Spark has started to accumulate a huge number of configurations, and I think we had better avoid adding more.
Ok, I will close this PR, thanks.
What changes were proposed in this pull request?
In the same application, TableA and TableB are both Hive Parquet tables, but TableA can't use the built-in Parquet reader and writer.
In this situation, spark.sql.hive.convertMetastoreParquet can't control this well, so I think we can add a fine-grained configuration to handle this case.
How was this patch tested?
Add a unit test
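For reference, a minimal sketch of the kind of test this could be, assuming a Hive test suite that mixes in SQLTestUtils and TestHiveSingleton; the table names, config value, and assertions are illustrative rather than the PR's actual test.

```scala
import org.apache.spark.sql.catalyst.catalog.HiveTableRelation
import org.apache.spark.sql.execution.datasources.LogicalRelation

test("excludedTables disables conversion only for the listed tables") {
  withSQLConf(
      "spark.sql.hive.convertMetastoreParquet" -> "true",
      "spark.sql.hive.convertMetastoreParquet.excludedTables" -> "default.table_a") {
    withTable("table_a", "table_b") {
      sql("CREATE TABLE table_a (id INT) STORED AS PARQUET")
      sql("CREATE TABLE table_b (id INT) STORED AS PARQUET")
      // The excluded table should remain a HiveTableRelation (Hive SerDe path) ...
      assert(sql("SELECT * FROM table_a").queryExecution.optimizedPlan.collectFirst {
        case r: HiveTableRelation => r
      }.isDefined)
      // ... while the other table is converted to a data-source LogicalRelation.
      assert(sql("SELECT * FROM table_b").queryExecution.optimizedPlan.collectFirst {
        case r: LogicalRelation => r
      }.isDefined)
    }
  }
}
```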