
Conversation

@scwf (Contributor) commented Nov 26, 2014

Use lowercase for option keys to make them case-insensitive; then we use the lowercase key to look up values in the parameters map. So the following command works:

      create temporary table normal_parquet
      USING org.apache.spark.sql.parquet
      OPTIONS (
        PATH '/xxx/data'
      )
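To make the motivation concrete, here is a small plain-Scala illustration (hypothetical, not Spark source) of why the lookup fails without this change:

```scala
// The user wrote the option key in upper case, as in the OPTIONS clause above.
val rawOptions = Map("PATH" -> "/xxx/data")

// A case-sensitive lookup with the lowercase key misses the value:
assert(rawOptions.get("path") == None)

// Lowercasing the keys first, as this patch does, makes the lookup succeed:
val lowered = rawOptions.map { case (k, v) => (k.toLowerCase, v) }
assert(lowered.get("path") == Some("/xxx/data"))
```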

@SparkQA commented Nov 26, 2014

Test build #23877 has started for PR 3470 at commit e244e8d.

  • This patch merges cleanly.

@SparkQA commented Nov 26, 2014

Test build #23877 has finished for PR 3470 at commit e244e8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23877/

@marmbrus (Contributor) commented Dec 1, 2014

I think this is probably a reasonable semantic to have, given that keywords are not generally case sensitive in SQL, but we need to add some documentation to DataSource saying that this is the case. You might even go so far as to provide a case-insensitive map, to avoid confusion when developers try to do lookups with keywords that contain capital letters.

thoughts @liancheng @rxin?

Also, given the current implementation, I would just statically lowercase samplingRatio instead of using toLower.
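A minimal sketch of the case-insensitive map being suggested (hypothetical, written against Scala 2.12-style collections; the class that was eventually merged may differ): keys are lowercased once on construction and again on every lookup, so `PATH`, `Path`, and `path` all resolve to the same value.

```scala
class CaseInsensitiveMap(baseMap: Map[String, String])
    extends Map[String, String] {

  // Normalize all keys once up front.
  private val lowered = baseMap.map { case (k, v) => (k.toLowerCase, v) }

  // Lookups normalize the requested key the same way.
  override def get(key: String): Option[String] = lowered.get(key.toLowerCase)

  override def iterator: Iterator[(String, String)] = lowered.iterator

  override def +[B1 >: String](kv: (String, B1)): Map[String, B1] =
    lowered + (kv._1.toLowerCase -> kv._2)

  override def -(key: String): Map[String, String] =
    new CaseInsensitiveMap(lowered - key.toLowerCase)
}
```

Because `getOrElse` and `contains` are built on `get`, they inherit the case-insensitive behavior for free.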

@rxin (Contributor) commented Dec 2, 2014

How about moving the toLowerCase into the get function itself?

@scwf (Contributor, Author) commented Dec 2, 2014

Yes, we can implement a case-insensitive map and use toLowerCase in that map's get function.

@marmbrus (Contributor) commented Dec 2, 2014

Yeah, that's what I was thinking when I said case insensitive map.
On Dec 1, 2014 6:33 PM, "Reynold Xin" notifications@github.com wrote:

How about moving the toLowerCase into the get function itself?



@rxin (Contributor) commented Dec 2, 2014

That definitely seems like the better option to me. It makes the options universally lower case.

@liancheng (Contributor)

Actually, I'm not sure whether making option names case insensitive is a good idea. Semantically, these options are very similar to Hive table properties, which are case sensitive. This makes me think these options should have been case sensitive from the very beginning. For simple options that look like keywords, case insensitivity might make sense, but we will probably want to add dotted option names like partition.defaultName in the future. Another reason is that case insensitivity is always a source of bugs...

@scwf (Contributor, Author) commented Dec 2, 2014

Hi @liancheng, the difference between options in the data source API and Hive table properties is that some options look very much like keywords, and users want them to be case insensitive. After we make them case insensitive, users can write PATH for path, and dotted option names still work.

@SparkQA commented Dec 2, 2014

Test build #24048 has started for PR 3470 at commit 3c132ef.

  • This patch merges cleanly.

@SparkQA commented Dec 2, 2014

Test build #24048 has finished for PR 3470 at commit 3c132ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class CaseInsensitiveMap(_baseMap: Map[String, String]) extends Map[String, String]

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24048/

Contributor

I don't know if we actually want to change the signature here. This is a public API, and while 1.2 hasn't been officially released, some libraries have already been published against the original signature.

Contributor Author

Hmm, how about reverting to Map here and making case insensitivity configurable, so we can write it like this:

private[sql] class DefaultSource extends RelationProvider {
  /** Returns a new base relation with the given parameters. */
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    val caseSensitive = sqlContext.getConf("spark.sql.caseSensitive", "false").toBoolean
    val paras = if (caseSensitive) {
      parameters
    } else {
      new CaseInsensitiveMap(parameters)
    }
    val fileName = paras.getOrElse("path", sys.error("Option 'path' not specified"))
    val samplingRatio = paras.get("samplingRatio").map(_.toDouble).getOrElse(1.0)

    JSONRelation(fileName, samplingRatio)(sqlContext)
  }
}

Contributor

spark.sql.caseSensitive is about identifiers (i.e., attributes and table names). I'd say this is more analogous to keyword case insensitivity. I don't know of any database that doesn't treat SELECT and select the same, so I'm not sure that should be configurable.

You can still pass your CaseInsensitiveMap in and it will have the desired effect. Just don't change the function signature.

Contributor Author

1. About spark.sql.caseSensitive: my idea is that we do not need to make attributes and table names case sensitive, just as Hive has done. Making it configurable there makes things complex.
2. "You can still pass your CaseInsensitiveMap in and it will have the desired effect. Just don't change the function signature."
You mean like this?

class DefaultSource extends RelationProvider {
  /** Returns a new base relation with the given parameters. */
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    val path = new CaseInsensitiveMap(parameters)
      .getOrElse("path", sys.error("'path' must be specified for parquet tables."))

    ParquetRelation2(path)(sqlContext)
  }
}

Contributor

You mean like this?

No, I mean you should construct the case-insensitive map in the parser, just like now. The only change is to pass it into createRelation without changing the signature of the createRelation function.
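The shape being suggested can be sketched without Spark itself (all names here are illustrative stand-ins, and a plain key-lowercasing map stands in for CaseInsensitiveMap to keep the example self-contained): the parser normalizes the options once, and the data source's existing Map[String, String] signature is untouched.

```scala
// Stand-in for a data source compiled against the original signature;
// it performs an ordinary lowercase lookup and knows nothing about the wrapper.
def createRelation(parameters: Map[String, String]): String =
  parameters.getOrElse("path", sys.error("Option 'path' not specified"))

// The "parser" side: normalize the user's options once, then pass them
// through the unchanged Map[String, String] parameter.
val userOptions = Map("PATH" -> "/xxx/data")
val normalized: Map[String, String] =
  userOptions.map { case (k, v) => (k.toLowerCase, v) }

val path = createRelation(normalized)
```

Because the wrapper is itself a Map[String, String], libraries already published against the 1.2 signature keep working.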

@SparkQA commented Dec 12, 2014

Test build #24393 has started for PR 3470 at commit 8f4f585.

  • This patch merges cleanly.

@SparkQA commented Dec 12, 2014

Test build #24393 has finished for PR 3470 at commit 8f4f585.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class CaseInsensitiveMap(_baseMap: Map[String, String]) extends Map[String, String]

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24393/

@scwf (Contributor, Author) commented Dec 13, 2014

@marmbrus, updated, is this OK?

Contributor

Let's make both the class and the object protected. Also, we don't generally use the _underscore convention in Spark SQL.

@marmbrus (Contributor)

Minor style comments, otherwise LGTM.

@SparkQA commented Dec 16, 2014

Test build #24514 has started for PR 3470 at commit ae78509.

  • This patch merges cleanly.

@SparkQA commented Dec 17, 2014

Test build #24514 has finished for PR 3470 at commit ae78509.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24514/

@scwf (Contributor, Author) commented Dec 17, 2014

Updated

@asfgit asfgit closed this in 6069880 Dec 17, 2014
@scwf scwf deleted the ddl-ulcase branch January 7, 2015 09:47

6 participants