[SPARK-17203][SQL] data source options should always be case insensitive #14773

cloud-fan · 2016-08-23T15:23:30Z

What changes were proposed in this pull request?

We don't have a clear definition of when data source options should be case-sensitive and when they should not. Currently path is case-insensitive, numFeatures in LibSVMFileFormat is case-sensitive, maxFilesPerTrigger in FileStreamSource is case-insensitive, etc.

Instead of letting every option decide for itself whether it should be case-sensitive, I think it's better to make it clear that all data source options are case-insensitive.
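To illustrate the intended behavior, here is a minimal sketch of a case-insensitive options wrapper. The class name CaseInsensitiveOptions is hypothetical and is not the CaseInsensitiveMap that ships with Spark; the option values are made up for the example.

```scala
import java.util.Locale

// Hypothetical sketch, not Spark's CaseInsensitiveMap.
// Every key is lowercased once at construction time, and every lookup
// lowercases the requested key before consulting the normalized map.
final class CaseInsensitiveOptions(original: Map[String, String]) {
  private val normalized: Map[String, String] =
    original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def get(key: String): Option[String] =
    normalized.get(key.toLowerCase(Locale.ROOT))

  def contains(key: String): Boolean = get(key).isDefined
}
```

With this wrapper, an options map built from Map("numFeatures" -> "10") would answer lookups for "numfeatures", "NUMFEATURES", and "numFeatures" alike, so no individual data source needs to decide on its own casing rule.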

How was this patch tested?

existing tests.

cloud-fan · 2016-08-23T15:23:43Z

cc @yhuai @rxin @gatorsmile

SparkQA · 2016-08-23T16:50:09Z

Test build #64289 has finished for PR 14773 at commit 2e4368e.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-08-24T01:49:19Z

Maybe these options should just be case sensitive in general?

rxin · 2016-08-26T23:16:39Z

Actually I take that back. Just realized we explicitly documented that these options were case insensitive in the data source API.

frreiss · 2016-08-26T23:47:07Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala

@@ -65,7 +65,7 @@ case class CreateDataSourceTableCommand(

    var isExternal = true
    val optionsWithPath =
-      if (!new CaseInsensitiveMap(options).contains("path") && managedIfNoPath) {
+      if (!options.contains("path") && managedIfNoPath) {


You might want to create a constant for the string "path" as a data source param. It occurs in quite a few places.

cloud-fan · 2016-08-31T08:28:11Z

I'll get back to it after we unify the path option and the locationUri for data source and hive serde tables.

clockfly · 2016-09-02T05:28:51Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala

+      val duplicatedKeys = lowercaseKeys.groupBy(identity).collect {
+        case (x, ys) if ys.size > 1 => x
+      }
+      throw new AnalysisException(


Why is it an AnalysisException? This class is only supposed to be used on the driver?
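For context, the duplicate-key check in the hunk above can be sketched as standalone code. This is only an illustration under assumptions: the object name OptionKeyCheck, the Seq-of-pairs input type, and the error message are hypothetical, and the sketch throws IllegalArgumentException rather than AnalysisException so that it compiles without Spark on the classpath.

```scala
import java.util.Locale

// Hypothetical sketch of the check in the hunk; names and types are assumed.
object OptionKeyCheck {
  // Returns the lowercased keys that occur more than once after normalization.
  def duplicatedKeys(options: Seq[(String, String)]): Seq[String] = {
    val lowercaseKeys = options.map { case (k, _) => k.toLowerCase(Locale.ROOT) }
    lowercaseKeys.groupBy(identity).collect {
      case (x, ys) if ys.size > 1 => x
    }.toSeq.sorted
  }

  // Fails fast when two keys differ only by case.
  def validate(options: Seq[(String, String)]): Unit = {
    val dups = duplicatedKeys(options)
    if (dups.nonEmpty) {
      throw new IllegalArgumentException(
        s"Duplicated option keys (case-insensitive): ${dups.mkString(", ")}")
    }
  }
}
```

Passing both "path" and "PATH" would be rejected here, because once keys are compared case-insensitively there is no principled way to pick which value wins.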

data source options should always be case insensitive

2e4368e

frreiss reviewed Aug 26, 2016
clockfly reviewed Sep 2, 2016
cloud-fan closed this Nov 16, 2016

cloud-fan deleted the case branch December 14, 2016 12:33

gatorsmile mentioned this pull request Feb 3, 2017

[SPARK-19397] [SQL] Make option names of LIBSVM and TEXT case insensitive #16737

Closed
