New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-8248][SQL] string function: length #6724
Conversation
@rxin can you review this? Once this merged, the other string functions can take this as the example. |
Test build #34512 has finished for PR 6724 at commit
|
Test build #34513 has finished for PR 6724 at commit
|
expression[Lower]("lower"), | ||
expression[Substring]("substr"), | ||
expression[Substring]("substring"), | ||
expression[Length]("length") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you sort this alphabetically
For failed tests, it will be fixed by c8e7cd2 |
How can we support get the length of BinaryType, ArrayType and MapType? |
They are different functions right now. We can still support them even if they are named the same though. |
// string functions | ||
expression[Upper]("lcase"), | ||
expression[Lower]("lower"), | ||
expression[StringLength]("strlen"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't rename this one since we need hive compatibility here... only rename the data frame function.
Test build #34554 has finished for PR 6724 at commit
|
checkEvaluation(StringLength(regEx), 5, create_row("abdef")) | ||
checkEvaluation(StringLength(regEx), 0, create_row("")) | ||
checkEvaluation(StringLength(regEx), null, create_row(null)) | ||
checkEvaluation(StringLength(Literal.create(null, StringType)), null, create_row("abdef")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @davies pointed out, this probably failed in codegen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you pull in his fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I thought @davies will fix this. I will take look at this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea but his fix won't be merged for a while because it's part of a much broader change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can port some of fix from that big PR as a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do that. Take your big PR into smaller ones.
Test build #34555 has finished for PR 6724 at commit
|
Test build #34561 has finished for PR 6724 at commit
|
} | ||
""" | ||
child match { | ||
case Literal(null, _) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davies can you review this part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @davies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to fix this in Literal.genCode, or you have do fix it in many places, for example, BinaryExpression also should fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this change, it will throws exception
Code generation of strlen(null) failed:
[info]
[info] int primitive1 = -1;
[info] if (!true) {
[info] primitive1 = (null).length();
[info] }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will leave this for you @davies, and disable the null test temporally in the unit test.
@@ -1299,6 +1300,19 @@ object functions { | |||
*/ | |||
def toRadians(columnName: String): Column = toRadians(Column(columnName)) | |||
|
|||
/** | |||
* Length of a given string value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Computes the length of a string column.
Test build #34572 has finished for PR 6724 at commit
|
Test build #34584 has finished for PR 6724 at commit
|
retest this please |
Test build #34591 has finished for PR 6724 at commit
|
@@ -37,6 +37,7 @@ import org.apache.spark.util.Utils | |||
* @groupname normal_funcs Non-aggregate functions | |||
* @groupname math_funcs Math functions | |||
* @groupname window_funcs Window functions | |||
* @groupname string_funcs functions for DataFrames. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"functions for DataFrames " -> "String functions"
Test build #34645 has finished for PR 6724 at commit
|
@rxin, any more comment before merge? |
I will create another PR with bunch of string functions after this merged, it will be great if this goes first. |
Test build #34647 has finished for PR 6724 at commit
|
////////////////////////////////////////////////////////////////////////////////////////////// | ||
|
||
/** | ||
* Computes the length of a given string value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in your next pr, please add a period to the end of this.
OK I'm merging this one. |
Author: Cheng Hao <hao.cheng@intel.com> Closes apache#6724 from chenghao-intel/length and squashes the following commits: aaa3c31 [Cheng Hao] revert the additional change 97148a9 [Cheng Hao] remove the codegen testing temporally ae08003 [Cheng Hao] update the comments 1eb1fd1 [Cheng Hao] simplify the code as commented 3e92d32 [Cheng Hao] use the selectExpr in unit test intead of SQLQuery 3c729aa [Cheng Hao] fix bug for constant null value in codegen 3641f06 [Cheng Hao] keep the length() method for registered function 8e30171 [Cheng Hao] update the code as comment db604ae [Cheng Hao] Add code gen support 548d2ef [Cheng Hao] register the length() 09a0738 [Cheng Hao] add length support
No description provided.