
[SPARK-16825] [SQL] Replace hive.default.fileformat by spark.sql.default.fileformat #14430

Closed · wants to merge 5 commits

Conversation

@gatorsmile (Member) commented Jul 31, 2016

What changes were proposed in this pull request?

Currently, we use hive.default.fileformat to specify the default file format in the CREATE TABLE statement. Multiple issues exist:

  • This parameter value is not read from hive-site.xml. Thus, even if users change hive.default.fileformat in hive-site.xml, Spark ignores it. To change the value, users have to use a Spark interface (e.g., a SET command or the API).
  • This parameter is not documented.
  • This parameter value is not sent to the Hive metastore. It is used only by Spark internals when processing the CREATE TABLE statement.
  • This parameter is case sensitive.

Since this parameter is used only by Spark, it does not make sense for its name to start with hive. Instead, we can follow the other Spark SQL parameters and introduce a new, public Spark parameter. Thus, this PR replaces hive.default.fileformat with spark.sql.default.fileformat. It also makes the value case insensitive.
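As a rough sketch of the case-insensitive behavior this PR proposes (the set of accepted formats below is illustrative, not Spark's actual validation list, and the function name is hypothetical):

```python
# Hypothetical sketch of case-insensitive file-format resolution.
# The set of accepted formats is illustrative only.
VALID_FORMATS = {"textfile", "sequencefile", "rcfile", "orc", "parquet", "avro"}

def resolve_file_format(value: str) -> str:
    """Normalize a user-supplied format name to lowercase and validate it."""
    normalized = value.strip().lower()
    if normalized not in VALID_FORMATS:
        raise ValueError(f"Unsupported file format: {value}")
    return normalized
```

With logic like this, SET spark.sql.default.fileformat=Parquet and SET spark.sql.default.fileformat=parquet would resolve to the same format.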

How was this patch tested?

Added test cases.

// Explicitly set fs to local fs.
sql(s"set fs.default.name=file://$testTempDir/")
// Ask Hive to run jobs in-process as a single map and reduce task.
sql("set mapred.job.tracker=local")
gatorsmile (Member Author) commented:
These statements are dead code. If we really needed them, we would have to use the runSqlHive API of HiveClient. Since removing them does not cause any errors, I removed them.

@SparkQA commented Jul 31, 2016

Test build #63056 has finished for PR 14430 at commit 0dc7cda.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 31, 2016

Test build #63057 has finished for PR 14430 at commit c0817fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin (Contributor) commented Aug 1, 2016

We already have default sources. It is very confusing to have default format on top of that.

@gatorsmile (Member Author) commented Aug 1, 2016

@rxin That is definitely true! I had the same concern when changing the code.

Now we have two types of tables: data source tables and Hive tables. The default format for data source tables is parquet. The default format for Hive tables is textfile, if users do not specify one.

Maybe we can use the same configuration parameter, spark.sql.sources.default, but with different default values per table type? Let me try it and you can judge whether it is better.
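The idea can be sketched roughly as follows (the key name and the two defaults come from the comments above; the lookup logic is a hypothetical simplification, not Spark's implementation):

```python
# Hypothetical simplification of the proposal: one config key,
# with a different fallback default per table type.
DATA_SOURCE_DEFAULT = "parquet"   # default for data source tables
HIVE_DEFAULT = "textfile"         # default for Hive tables

def default_file_format(conf: dict, is_hive_table: bool) -> str:
    """Resolve the default format, falling back per table type."""
    fallback = HIVE_DEFAULT if is_hive_table else DATA_SOURCE_DEFAULT
    return conf.get("spark.sql.sources.default", fallback).lower()
```

A single key keeps the user-facing surface small, while the per-table-type fallback preserves the existing behavior when the key is unset.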

@SparkQA commented Aug 1, 2016

Test build #63067 has finished for PR 14430 at commit 7b64edc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xmubeta commented Sep 26, 2019

Please bear with me if my question is silly.

I have the same request: to set the default file format to parquet for spark-sql. I didn't see this PR merged into any version. Is that true? Does it mean I need to patch the code myself? Is there any other way to set the default file format besides using a SET command in the session?

Thank you.
