
Conversation

@scwf (Contributor) commented Nov 26, 2014

Use lowercase for option keys to make them case-insensitive; then we use the lowercase key to look up values in the parameters map. So the following command works:

      create temporary table normal_parquet
      USING org.apache.spark.sql.parquet
      OPTIONS (
        PATH '/xxx/data'
      )
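To make the motivation concrete, here is a small plain-Scala illustration (hypothetical, not Spark source) of why the lookup fails without this change:

```scala
// The user wrote the option key in upper case, as in the OPTIONS clause above.
val rawOptions = Map("PATH" -> "/xxx/data")

// A case-sensitive lookup with the lowercase key misses the value:
assert(rawOptions.get("path") == None)

// Lowercasing the keys first, as this patch does, makes the lookup succeed:
val lowered = rawOptions.map { case (k, v) => (k.toLowerCase, v) }
assert(lowered.get("path") == Some("/xxx/data"))
```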

@SparkQA commented Nov 26, 2014

Test build #23877 has started for PR 3470 at commit e244e8d.

  • This patch merges cleanly.

@SparkQA commented Nov 26, 2014

Test build #23877 has finished for PR 3470 at commit e244e8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23877/

@marmbrus (Contributor) commented Dec 1, 2014

I think this is probably a reasonable semantic to have, given that keywords are not generally case sensitive in SQL, but we need to add some documentation to DataSource saying that this is the case. You might even go so far as to provide a case-insensitive map, to avoid confusion when developers try to do lookups with keywords that contain capital letters.

thoughts @liancheng @rxin?

Also, given the current implementation, I would just statically lowercase samplingRatio instead of using toLower.
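A minimal sketch of the case-insensitive map being suggested (hypothetical, written against Scala 2.12-style collections; the class that was eventually merged may differ): keys are lowercased once on construction and again on every lookup, so `PATH`, `Path`, and `path` all resolve to the same value.

```scala
class CaseInsensitiveMap(baseMap: Map[String, String])
    extends Map[String, String] {

  // Normalize all keys once up front.
  private val lowered = baseMap.map { case (k, v) => (k.toLowerCase, v) }

  // Lookups normalize the requested key the same way.
  override def get(key: String): Option[String] = lowered.get(key.toLowerCase)

  override def iterator: Iterator[(String, String)] = lowered.iterator

  override def +[B1 >: String](kv: (String, B1)): Map[String, B1] =
    lowered + (kv._1.toLowerCase -> kv._2)

  override def -(key: String): Map[String, String] =
    new CaseInsensitiveMap(lowered - key.toLowerCase)
}
```

Because `getOrElse` and `contains` are built on `get`, they inherit the case-insensitive behavior for free.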

@rxin (Contributor) commented Dec 2, 2014

How about moving the toLowerCase into the get function itself?

@scwf (Contributor, Author) commented Dec 2, 2014

Yes, we can implement a case-insensitive map and use toLowerCase in that map's get function.

@marmbrus (Contributor) commented Dec 2, 2014

Yeah, that's what I was thinking when I said case insensitive map.
On Dec 1, 2014 6:33 PM, "Reynold Xin" notifications@github.com wrote:

How about moving the toLowerCase into the get function itself?



@rxin (Contributor) commented Dec 2, 2014

That definitely seems like the better option to me. It makes the options universally lower case.

@liancheng (Contributor)

Actually, I'm not sure whether making option names case insensitive is a good idea. Semantically, these options are very similar to Hive table properties, which are case sensitive. This makes me think these options should have been case sensitive from the very beginning. For simple options that look like keywords, case insensitivity might make sense, but we will probably want to add dotted option names like partition.defaultName in the future. Another reason is that case insensitivity is always a source of bugs...

@scwf (Contributor, Author) commented Dec 2, 2014

Hi @liancheng, the difference between options in the data source API and Hive table properties is that some options look very much like keywords, and users want them to be case insensitive. After we make them case insensitive, users can write PATH for path, and dotted option names still work.

@SparkQA commented Dec 2, 2014

Test build #24048 has started for PR 3470 at commit 3c132ef.

  • This patch merges cleanly.

@SparkQA commented Dec 2, 2014

Test build #24048 has finished for PR 3470 at commit 3c132ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class CaseInsensitiveMap(_baseMap: Map[String, String]) extends Map[String, String]

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24048/

Contributor

I don't know if we actually want to change the signature here. This is a public API, and while 1.2 hasn't been officially released, some libraries have already been published against the original signature.

Contributor Author

Hmm, how about reverting to Map here and making case insensitivity configurable, so we can write it like this:

private[sql] class DefaultSource extends RelationProvider {
  /** Returns a new base relation with the given parameters. */
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    val caseSensitive = sqlContext.getConf("spark.sql.caseSensitive", "false").toBoolean
    val paras = if (caseSensitive) {
      parameters
    } else {
      new CaseInsensitiveMap(parameters)
    }
    val fileName = paras.getOrElse("path", sys.error("Option 'path' not specified"))
    val samplingRatio = paras.get("samplingRatio").map(_.toDouble).getOrElse(1.0)

    JSONRelation(fileName, samplingRatio)(sqlContext)
  }
}

Contributor

spark.sql.caseSensitive is about identifiers (i.e., attributes and table names). I'd say this is more analogous to keyword case insensitivity. I don't know of any database that doesn't treat SELECT and select the same, so I'm not sure that should be configurable.

You can still pass your CaseInsensitiveMap in and it will have the desired effect. Just don't change the function signature.

Contributor Author

1. About spark.sql.caseSensitive: my idea is that we do not need to make attributes and table names case sensitive, just as Hive has done. Making it configurable there makes things complex.
2. "You can still pass your CaseInsensitiveMap in and it will have the desired effect. Just don't change the function signature."
You mean like this?

class DefaultSource extends RelationProvider {
  /** Returns a new base relation with the given parameters. */
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    val path = new CaseInsensitiveMap(parameters)
      .getOrElse("path", sys.error("'path' must be specified for parquet tables."))

    ParquetRelation2(path)(sqlContext)
  }
}

Contributor

You mean like this?

No, I mean you should construct the case-insensitive map in the parser, just like now. The only change is to pass it into createRelation without changing the signature of the createRelation function.
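The shape being suggested can be sketched without Spark itself (all names here are illustrative stand-ins, and a plain key-lowercasing map stands in for CaseInsensitiveMap to keep the example self-contained): the parser normalizes the options once, and the data source's existing Map[String, String] signature is untouched.

```scala
// Stand-in for a data source compiled against the original signature;
// it performs an ordinary lowercase lookup and knows nothing about the wrapper.
def createRelation(parameters: Map[String, String]): String =
  parameters.getOrElse("path", sys.error("Option 'path' not specified"))

// The "parser" side: normalize the user's options once, then pass them
// through the unchanged Map[String, String] parameter.
val userOptions = Map("PATH" -> "/xxx/data")
val normalized: Map[String, String] =
  userOptions.map { case (k, v) => (k.toLowerCase, v) }

val path = createRelation(normalized)
```

Because the wrapper is itself a Map[String, String], libraries already published against the 1.2 signature keep working.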

@SparkQA commented Dec 12, 2014

Test build #24393 has started for PR 3470 at commit 8f4f585.

  • This patch merges cleanly.

@SparkQA commented Dec 12, 2014

Test build #24393 has finished for PR 3470 at commit 8f4f585.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class CaseInsensitiveMap(_baseMap: Map[String, String]) extends Map[String, String]

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24393/

@scwf (Contributor, Author) commented Dec 13, 2014

@marmbrus, updated, is this OK?

Contributor

Let's make both the class and the object protected. Also, we don't generally use the _underscore convention in Spark SQL.

@marmbrus (Contributor)

Minor style comments, otherwise LGTM.

@SparkQA commented Dec 16, 2014

Test build #24514 has started for PR 3470 at commit ae78509.

  • This patch merges cleanly.

@SparkQA commented Dec 17, 2014

Test build #24514 has finished for PR 3470 at commit ae78509.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24514/

@scwf (Contributor, Author) commented Dec 17, 2014

Updated

@asfgit asfgit closed this in 6069880 Dec 17, 2014
@scwf scwf deleted the ddl-ulcase branch January 7, 2015 09:47

6 participants