
[SPARK-16825] [SQL] Replace hive.default.fileformat by spark.sql.default.fileformat #14430

Closed · wants to merge 5 commits

Conversation

@gatorsmile (Member) commented Jul 31, 2016

What changes were proposed in this pull request?

Currently, we use hive.default.fileformat to specify the default file format in the CREATE TABLE statement. Multiple issues exist:

  • This parameter value is not read from hive-site.xml. Thus, even if users change hive.default.fileformat in hive-site.xml, Spark ignores it. To change the value, users have to use a Spark interface (e.g., a SET command or the API).
  • This parameter is not documented.
  • This parameter value is not sent to the Hive metastore. It is used only by Spark internals when processing the CREATE TABLE statement.
  • This parameter is case sensitive.

Since this parameter is used only by Spark, it does not make sense for its name to start with hive. Instead, we can follow the other Spark SQL parameters and introduce a new, public Spark parameter. Thus, this PR replaces hive.default.fileformat with spark.sql.default.fileformat. It also makes the value case insensitive.
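As a rough sketch of the case-insensitive behavior this PR proposes (the set of accepted formats below is illustrative, not Spark's actual validation list, and the function name is hypothetical):

```python
# Hypothetical sketch of case-insensitive file-format resolution.
# The set of accepted formats is illustrative only.
VALID_FORMATS = {"textfile", "sequencefile", "rcfile", "orc", "parquet", "avro"}

def resolve_file_format(value: str) -> str:
    """Normalize a user-supplied format name to lowercase and validate it."""
    normalized = value.strip().lower()
    if normalized not in VALID_FORMATS:
        raise ValueError(f"Unsupported file format: {value}")
    return normalized
```

With logic like this, SET spark.sql.default.fileformat=Parquet and SET spark.sql.default.fileformat=parquet would resolve to the same format.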

How was this patch tested?

Added test cases.

// Explicitly set fs to local fs.
sql(s"set fs.default.name=file://$testTempDir/")
// Ask Hive to run jobs in-process as a single map and reduce task.
sql("set mapred.job.tracker=local")
gatorsmile (Member Author) commented:
These statements are dead code. If we really needed them, we would have to use the runSqlHive API of HiveClient. Since removing them does not cause any errors, I removed them.

@SparkQA commented Jul 31, 2016

Test build #63056 has finished for PR 14430 at commit 0dc7cda.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 31, 2016

Test build #63057 has finished for PR 14430 at commit c0817fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin (Contributor) commented Aug 1, 2016

We already have default sources. It is very confusing to have default format on top of that.

@gatorsmile (Member Author) commented Aug 1, 2016

@rxin That is definitely true! I had the same concern when changing the code.

Now we have two types of tables: data source tables and Hive tables. The default format for data source tables is parquet. The default format for Hive tables is textfile, if users do not specify one.

Maybe we can use the same configuration parameter, spark.sql.sources.default, but with different default values per table type? Let me try it and you can judge whether it is better.
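The idea can be sketched roughly as follows (the key name and the two defaults come from the comments above; the lookup logic is a hypothetical simplification, not Spark's implementation):

```python
# Hypothetical simplification of the proposal: one config key,
# with a different fallback default per table type.
DATA_SOURCE_DEFAULT = "parquet"   # default for data source tables
HIVE_DEFAULT = "textfile"         # default for Hive tables

def default_file_format(conf: dict, is_hive_table: bool) -> str:
    """Resolve the default format, falling back per table type."""
    fallback = HIVE_DEFAULT if is_hive_table else DATA_SOURCE_DEFAULT
    return conf.get("spark.sql.sources.default", fallback).lower()
```

A single key keeps the user-facing surface small, while the per-table-type fallback preserves the existing behavior when the key is unset.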

@SparkQA commented Aug 1, 2016

Test build #63067 has finished for PR 14430 at commit 7b64edc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xmubeta commented Sep 26, 2019

Please bear with me if my question is silly.

I have the same request: to set the default file format to parquet for spark-sql. I didn't see this PR merged into any version. Is that true? Does it mean I need to patch the code myself? Is there any other way to set the default file format besides using a SET command in the session?

Thank you.
