[SPARK-16825] [SQL] Replace hive.default.fileformat by spark.sql.default.fileformat #14430
Conversation
// Explicitly set fs to local fs.
sql(s"set fs.default.name=file://$testTempDir/")
// Ask Hive to run jobs in-process as a single map and reduce task.
sql("set mapred.job.tracker=local")
These statements are dead code. If we really needed them, we would have to use the runSqlHive API of HiveClient. Since removing them does not cause errors, I removed them.
Test build #63056 has finished for PR 14430 at commit
Test build #63057 has finished for PR 14430 at commit
We already have default sources. It is very confusing to have a default format on top of that.
@rxin That is definitely true! I had the same concern when changing the code. Currently, we have two types of tables: data source tables and Hive tables, each with its own default format setting. Maybe we can use the same configuration parameter but with different default values?
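To illustrate the two table types discussed in this thread, here is a minimal Spark SQL sketch (table and column names are hypothetical; spark.sql.sources.default is the existing setting governing data source tables, separate from the Hive-table setting this PR targets):

```sql
-- Data source table: its format, if USING is omitted,
-- falls back to spark.sql.sources.default
CREATE TABLE t1 (id INT) USING parquet;

-- Hive table: its format, if STORED AS is omitted,
-- falls back to hive.default.fileformat (the parameter this PR replaces)
CREATE TABLE t2 (id INT) STORED AS textfile;
```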
Test build #63067 has finished for PR 14430 at commit
Please bear with me if my question is silly. I have the same need to set the default file format to Parquet for spark-sql. I didn't see this PR merged into any version; is that true? Does that mean I need to patch the code myself? Is there any other way to set the default file format besides using a SET command in the session? Thank you.
What changes were proposed in this pull request?
Currently, we use hive.default.fileformat to specify the default file format in CREATE TABLE statements. Multiple issues exist:
- Spark does not read this value from hive-site.xml. Thus, even if users change hive.default.fileformat in hive-site.xml, Spark will ignore it. To change the parameter value, users have to use a Spark interface (e.g., a SET command or API).
- Since this parameter is used only by Spark, it does not make sense for its name to start with hive. We might instead follow the other Spark SQL parameters and introduce a new public Spark parameter here.

Thus, this PR replaces hive.default.fileformat with spark.sql.default.fileformat. It also makes the value case insensitive.
How was this patch tested?
Added test cases.
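Assuming the PR is applied as described, usage might look like the following sketch (the table name is hypothetical, and this is an illustration of the proposed behavior, not a definitive reference):

```sql
-- Proposed parameter per this PR; the value is case-insensitive,
-- so PARQUET and parquet would be treated the same
SET spark.sql.default.fileformat=PARQUET;

-- With no explicit STORED AS clause, the Hive table would pick up
-- the default file format from the parameter set above
CREATE TABLE events (id INT, msg STRING);
```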