[SPARK-17203][SQL] data source options should always be case insensitive #14773

Closed
wants to merge 1 commit into from

cloud-fan
Contributor

What changes were proposed in this pull request?

We don't have a clear definition of when data source options should be case-sensitive and when they should not. Currently path is case-insensitive, numFeatures in LibSVMFileFormat is case-sensitive, maxFilesPerTrigger in FileStreamSource is case-insensitive, etc.

Instead of letting every conf decide for itself whether it should be case-sensitive, I think it's better to make it clear that all data source options are case-insensitive.
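
To illustrate the idea, here is a minimal sketch of case-insensitive option lookup. The class `CaseInsensitiveOptions` and the sample keys and values are hypothetical and only for illustration; this is not Spark's actual `CaseInsensitiveMap` implementation, just one way the behavior described above could look.

```scala
import java.util.Locale

// Hypothetical helper for illustration only; not Spark's CaseInsensitiveMap.
final class CaseInsensitiveOptions(original: Map[String, String]) {
  // Normalize every key once so lookups do not depend on the caller's casing.
  // Keys that differ only by case collapse into a single entry here.
  private val normalized: Map[String, String] =
    original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  // Look up a value regardless of how the key is cased.
  def get(key: String): Option[String] =
    normalized.get(key.toLowerCase(Locale.ROOT))

  def contains(key: String): Boolean = get(key).isDefined
}

object CaseInsensitiveOptionsDemo {
  def main(args: Array[String]): Unit = {
    // Sample option names taken from the description; values are made up.
    val opts = new CaseInsensitiveOptions(Map("numFeatures" -> "10", "PATH" -> "/tmp/data"))
    println(opts.get("numfeatures"))
    println(opts.contains("path"))
  }
}
```

With a wrapper like this, each data source reads its options through one normalized view instead of deciding case sensitivity per option.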

How was this patch tested?

existing tests.

@cloud-fan
Contributor Author

cloud-fan commented Aug 23, 2016

cc @yhuai @rxin @gatorsmile

@SparkQA

SparkQA commented Aug 23, 2016

Test build #64289 has finished for PR 14773 at commit 2e4368e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Aug 24, 2016

Maybe these options should just be case sensitive in general?

@rxin
Contributor

rxin commented Aug 26, 2016

Actually I take that back. Just realized we explicitly documented that these options were case insensitive in the data source API.

@@ -65,7 +65,7 @@ case class CreateDataSourceTableCommand(

var isExternal = true
val optionsWithPath =
- if (!new CaseInsensitiveMap(options).contains("path") && managedIfNoPath) {
+ if (!options.contains("path") && managedIfNoPath) {
Contributor

@frreiss frreiss Aug 26, 2016

You might want to create a constant for the string "path" as a data source param. It occurs in quite a few places.

@cloud-fan
Contributor Author

I'll get back to it after we unify the path option and the locationUri for data source and Hive serde tables.

val duplicatedKeys = lowercaseKeys.groupBy(identity).collect {
case (x, ys) if ys.size > 1 => x
}
throw new AnalysisException(
Contributor

@clockfly clockfly Sep 2, 2016

Why is it an AnalysisException? Is this class only supposed to be used on the driver?
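
For readers following the thread, the duplicate-key check quoted in the diff above can be sketched as a standalone function. The object name `DuplicateOptionKeys` and its `find` method are hypothetical, and the sketch deliberately returns the conflicting keys instead of throwing, so it takes no position on which exception type is appropriate.

```scala
import java.util.Locale

// Hypothetical standalone version of the check; names are illustrative only.
object DuplicateOptionKeys {
  // Returns the lower-cased keys that appear more than once after normalization,
  // e.g. when both "path" and "Path" are supplied.
  def find(options: Map[String, String]): Set[String] = {
    // Convert to a Seq first so that equal lower-cased keys are not deduplicated.
    val lowercaseKeys = options.keys.toSeq.map(_.toLowerCase(Locale.ROOT))
    lowercaseKeys.groupBy(identity).collect {
      case (key, occurrences) if occurrences.size > 1 => key
    }.toSet
  }
}
```

A caller could report the returned keys in whatever error type suits the call site.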
