[SPARK-14402][SQL] initcap UDF doesn't match Hive/Oracle behavior in lowercasing rest of string #12175
dongjoon-hyun wants to merge 4 commits into apache:master from dongjoon-hyun:SPARK-14402
…tters in lowercase
Test build #54976 has finished for PR 12175 at commit
… stringExpression.
Hi, @srowen . I minimized the change on master.
override def nullSafeEval(string: Any): Any = {
- string.asInstanceOf[UTF8String].toTitleCase
+ string.asInstanceOf[UTF8String].toLowerCase.toTitleCase
}
override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
- defineCodeGen(ctx, ev, str => s"$str.toTitleCase()")
+ defineCodeGen(ctx, ev, str => s"$str.toLowerCase().toTitleCase()")
}
I think it's enough for
I think that's pretty reasonable as a minimally invasive fix. CC @marmbrus for visibility, as it's technically a behavior change.
Thank you, @srowen!
It does seem reasonable to match hive since that was probably the original intention. I've tagged the JIRA for inclusion in the release notes. A few comments:
Thank you, @marmbrus. I will update the Scaladoc and add a description annotation for InitCap.
Test build #55000 has finished for PR 12175 at commit
Test build #55006 has finished for PR 12175 at commit
Test build #55009 has finished for PR 12175 at commit
Thanks, merging to master.
What changes were proposed in this pull request?
Currently, SparkSQL initCap uses the toTitleCase function. However, the UTF8String.toTitleCase implementation changes only the first letter and just copies the other letters, e.g. sParK --> SParK. This is the correct implementation of toTitleCase.
This PR updates the implementation of initcap to use toLowerCase and toTitleCase.
How was this patch tested?
Pass the Jenkins tests (including new testcase).
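For illustration, the behavior change described above can be sketched on plain Scala strings. This is only a rough model, not the Spark code: InitCapSketch, titleCase, initcapBefore and initcapAfter are hypothetical names, and titleCase is an assumed stand-in for UTF8String.toTitleCase that treats a space as the only word separator.

```scala
import java.util.Locale

object InitCapSketch {
  // Hypothetical stand-in for UTF8String.toTitleCase (an assumption, not the
  // Spark implementation): uppercase the first character and any character
  // that follows a space, and copy every other character unchanged.
  def titleCase(s: String): String = {
    val sb = new StringBuilder(s.length)
    var startOfWord = true
    for (c <- s) {
      sb.append(if (startOfWord) c.toUpper else c)
      startOfWord = c == ' '
    }
    sb.toString
  }

  // Before the patch: title-case only, so existing capitals survive.
  def initcapBefore(s: String): String = titleCase(s)

  // After the patch: lowercase everything first, then title-case.
  def initcapAfter(s: String): String = titleCase(s.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    println(initcapBefore("sParK")) // SParK
    println(initcapAfter("sParK"))  // Spark
  }
}
```

Lowercasing before title-casing keeps the fix small: toTitleCase itself stays unchanged, and only initcap gains the extra step.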