[SPARK-4618][SQL] Make foreign DDL commands options case-insensitive #3470
Conversation
Test build #23877 has started for PR 3470 at commit
Test build #23877 has finished for PR 3470 at commit
Test PASSed.
I think this is probably a reasonable semantic to have, given that keywords are not generally case sensitive in SQL, but I think we need to add some documentation. Thoughts, @liancheng @rxin? Also, given the current implementation, I would just statically lowercase samplingRatio instead of using toLower.
How about moving the toLowerCase into the get function itself?
Yes, we can implement a case-insensitive map, and in this map's
Yeah, that's what I was thinking when I said case insensitive map.
That definitely seems like the better option to me. It makes the options universally lower case.
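The case-insensitive map being discussed can be sketched roughly as follows (a sketch only, assuming the Scala 2.12-era immutable Map API; the class and member names are illustrative and not necessarily the exact code merged in this PR). Keys are lower-cased once at construction, and get lower-cases the lookup key:

```scala
// Sketch of a case-insensitive Map wrapper (illustrative, Scala 2.12 API):
// option keys are normalized to lower case once, and every lookup key is
// lower-cased as well, so "Path", "PATH", and "path" all resolve identically.
class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String] {

  private val baseMap: Map[String, String] =
    map.map { case (k, v) => (k.toLowerCase, v) }

  override def get(k: String): Option[String] = baseMap.get(k.toLowerCase)

  override def iterator: Iterator[(String, String)] = baseMap.iterator

  override def +[B1 >: String](kv: (String, B1)): Map[String, B1] = baseMap + kv

  override def -(key: String): Map[String, String] =
    new CaseInsensitiveMap(baseMap - key.toLowerCase)
}
```

With a wrapper like this, providers can look options up in any casing they prefer, while users remain free to write PATH, Path, or path.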
Actually, I'm not sure whether making option names case insensitive is a good idea. Semantically, these options are very similar to Hive table properties, which are case sensitive. This makes me think these options should have been case sensitive from the very beginning. For simple options that look like keywords, case insensitivity might make sense. But we probably want to add dotted option names like
Hi @liancheng, the difference between options in the data source API and Hive table properties is that some options look very much like keywords, and users want those to be case insensitive. After we make them case insensitive, users can write
Test build #24048 has started for PR 3470 at commit
Test build #24048 has finished for PR 3470 at commit
Test PASSed.
I don't know if we actually want to change the signature here. This is a public API, and while 1.2 hasn't been officially released, some libraries have already been published against the original signature.
Hmm, how about reverting to Map here and making case insensitivity configurable, so we can write like this:
```scala
private[sql] class DefaultSource extends RelationProvider {
  /** Returns a new base relation with the given parameters. */
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    val caseSensitive = sqlContext.getConf("spark.sql.caseSensitive", "false").toBoolean
    val paras =
      if (caseSensitive) {
        parameters
      } else {
        new CaseInsensitiveMap(parameters)
      }
    val fileName = paras.getOrElse("path", sys.error("Option 'path' not specified"))
    val samplingRatio = paras.get("samplingRatio").map(_.toDouble).getOrElse(1.0)
    JSONRelation(fileName, samplingRatio)(sqlContext)
  }
}
```
spark.sql.caseSensitive is about identifiers (i.e., attributes and table names). I'd say this is more analogous to keyword case insensitivity. I don't know any database that doesn't treat SELECT and select the same, so I'm not sure this should be configurable.
You can still pass your CaseInsensitiveMap in and it will have the desired effect. Just don't change the function signature.
1. About spark.sql.caseSensitive, my idea is that we do not need to make attributes and table names case sensitive, just as Hive has done. Making it configurable there makes things complex.
2. "You can still pass your CaseInsensitiveMap in and it will have the desired effect. Just don't change the function signature."
You mean like this?
```scala
class DefaultSource extends RelationProvider {
  /** Returns a new base relation with the given parameters. */
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    val path = new CaseInsensitiveMap(parameters)
      .getOrElse("path", sys.error("'path' must be specified for parquet tables."))
    ParquetRelation2(path)(sqlContext)
  }
}
```
You mean like this?
No, I mean you should construct the case insensitive map in the parser just like now. The only change is to pass it into createRelation without changing the signature of the createRelation function.
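A minimal sketch of that suggestion (hypothetical names, with the SQLContext parameter dropped and a simple eager key-lowering standing in for the CaseInsensitiveMap wrapper): the normalization happens once at the call site, and the provider's public signature stays a plain Map[String, String]:

```scala
// Illustrative sketch: the caller (the DDL parser in this discussion)
// normalizes option keys before invoking the provider, so the public
// createRelation signature does not change.
trait RelationProvider {
  def createRelation(parameters: Map[String, String]): String
}

class JsonSource extends RelationProvider {
  // Signature unchanged: parameters is an ordinary Map[String, String].
  override def createRelation(parameters: Map[String, String]): String =
    parameters.getOrElse("path", sys.error("Option 'path' not specified"))
}

object Demo extends App {
  // Stand-in for the case-insensitive wrapper: lower-case the keys eagerly.
  val userOptions = Map("PATH" -> "/data/events.json")
  val normalized = userOptions.map { case (k, v) => (k.toLowerCase, v) }
  println(new JsonSource().createRelation(normalized))
}
```

Because the normalized value is still typed as Map[String, String], libraries already compiled against the original signature keep working.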
Test build #24393 has started for PR 3470 at commit
Test build #24393 has finished for PR 3470 at commit
Test PASSed.
@marmbrus, updated, is this OK?
Let's make both the class and the object protected. Also, we don't generally use the _underscore convention in Spark SQL.
Minor style comments, otherwise LGTM.
Test build #24514 has started for PR 3470 at commit
Test build #24514 has finished for PR 3470 at commit
Test PASSed.
Updated
Using lowercase for the options key to make it case-insensitive means we should also use the lower-cased key to get the value from parameters, so the following cmd works
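A small illustration of that point (values here are assumed for the example, not taken from the PR): once keys are stored lower-cased, a mixed-case lookup misses unless the lookup key is lower-cased too:

```scala
// Keys stored lower case; the lookup key must be lower-cased as well.
val params = Map("samplingratio" -> "0.5")

// A mixed-case lookup silently misses...
assert(params.get("samplingRatio").isEmpty)

// ...while lower-casing the key first finds the value.
assert(params.get("samplingRatio".toLowerCase) == Some("0.5"))
```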