[SPARK-8264][SQL]add substring_index function #7533
Test build #37824 has finished for PR 7533 at commit
Test build #37941 has finished for PR 7533 at commit
Test build #38006 has finished for PR 7533 at commit
override def dataType: DataType = StringType
override def inputTypes: Seq[DataType] = Seq(StringType, StringType, IntegerType)
override def nullable: Boolean = strExpr.nullable || delimExpr.nullable || countExpr.nullable
Need to override the `nullable` method, as the default value is `false`.
I guess it's abstract in `Expression`, which `Substring_index` inherits from?
Sorry, my bad, I actually meant the `foldable` method, not `nullable`.
@chenghao-intel I just refactored the code to fix the problem of invoking `numChars` too much, but I still think we should add `numChars` as a field to `UTF8String`, since it's quite a useful function.
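To illustrate the caching idea in the comment above, here is a minimal sketch. It assumes a hypothetical class `Utf8Text` standing in for `UTF8String` (it is not Spark's actual class), and it assumes the bytes are well-formed UTF-8. The character count is computed once by skipping continuation bytes and then kept in a field:

```java
// Hypothetical stand-in for UTF8String, used only to illustrate caching the count.
final class Utf8Text {
    private final byte[] bytes;
    // Cached number of code points; -1 means "not computed yet".
    private int numChars = -1;

    Utf8Text(byte[] bytes) {
        this.bytes = bytes;
    }

    // Counts code points by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx).
    // Assumption: the input is well-formed UTF-8.
    int numChars() {
        if (numChars < 0) {
            int count = 0;
            for (byte b : bytes) {
                if ((b & 0xC0) != 0x80) {
                    count++;
                }
            }
            numChars = count;
        }
        return numChars;
    }
}
```

Whether a cached field is worth the extra memory per string is a design trade-off this sketch does not settle.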
@rxin @chenghao-intel should I make these functions (`ordinalIndexOf`, `lastOrdinalIndexOf` and `subStringIndex`) public in `UTF8String`? I guess they would be useful, but no one uses them except the `substring_index` UDF.
Test build #38058 has finished for PR 7533 at commit
retest this please.
Test build #55 has finished for PR 7533 at commit
Test build #38063 has finished for PR 7533 at commit
* @group string_funcs
* @since 1.5.0
*/
def substring_index(str: String, delim: String, count: Int): Column =
Let's remove this version of the API. @rxin actually did some cleanup and removed the string (column name) versions of the API.
Test build #38186 has finished for PR 7533 at commit
* @return the n-th last index of the search String,
* <code>-1</code> if no match or <code>null</code> string input
*/
public static int lastOrdinalIndexOf(
We'd better not make it a static function. Like `indexOf`, it means we are locating the substring within the current string.
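As an illustration of the instance-method shape suggested here, the following sketch uses a hypothetical wrapper class `TextSketch` around `java.lang.String` (not `UTF8String`, and not the code in this PR). The receiver is the string being searched, mirroring how `indexOf` is called:

```java
// Hypothetical wrapper used only to show the instance-method form.
final class TextSketch {
    private final String value;

    TextSketch(String value) {
        this.value = value;
    }

    // Returns the index of the ordinal-th occurrence of search, counting from the
    // end of this string, or -1 if there are fewer occurrences.
    // Overlapping occurrences are counted separately in this sketch.
    int lastOrdinalIndexOf(String search, int ordinal) {
        if (search.isEmpty() || ordinal <= 0) {
            return -1;
        }
        int index = value.length();
        for (int found = 0; found < ordinal; found++) {
            // A negative start position makes lastIndexOf return -1.
            index = value.lastIndexOf(search, index - 1);
            if (index < 0) {
                return -1;
            }
        }
        return index;
    }
}
```

A call then reads `new TextSketch("www.apache.org").lastOrdinalIndexOf(".", 2)`, with no separate argument for the string being searched.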
}
bytePos--;
}
throw new RuntimeException("Invalid UTF8 string");
Give more verbose info, like `this.toString()`?
Test build #38294 has finished for PR 7533 at commit
Test build #38298 has finished for PR 7533 at commit
* right) is returned. substring_index performs a case-sensitive match when searching for delim.
*/
public UTF8String subStringIndex(UTF8String delim, int count) {
if (delim == null) {
We don't need to check for null here, as that's done on the expression side.
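For reference, the `substring_index` semantics can be sketched over plain `java.lang.String`. This is an illustration under stated assumptions, not the `UTF8String` code under review: the class `SubstringIndexSketch` is hypothetical, inputs are assumed non-null (matching the remark that null handling lives on the expression side), and an empty delimiter or a count of 0 is assumed to yield an empty string:

```java
// Hypothetical sketch of substring_index over java.lang.String.
final class SubstringIndexSketch {
    static String substringIndex(String str, String delim, int count) {
        // Assumption: an empty delimiter or a zero count yields the empty string.
        if (delim.isEmpty() || count == 0) {
            return "";
        }
        if (count > 0) {
            // Find the count-th delimiter from the left; keep everything before it.
            int idx = -1;
            for (int i = 0; i < count; i++) {
                idx = str.indexOf(delim, idx + 1);
                if (idx < 0) {
                    return str; // fewer than count delimiters: return the whole string
                }
            }
            return str.substring(0, idx);
        }
        // Find the |count|-th delimiter from the right; keep everything after it.
        long remaining = -(long) count; // long avoids overflow for Integer.MIN_VALUE
        int idx = str.length();
        for (long i = 0; i < remaining; i++) {
            idx = str.lastIndexOf(delim, idx - 1);
            if (idx < 0) {
                return str;
            }
        }
        return str.substring(idx + delim.length());
    }
}
```

For example, `substringIndex("www.apache.org", ".", 2)` yields `"www.apache"`, and a count of `-1` yields `"org"`. The match is case-sensitive because `indexOf` and `lastIndexOf` compare characters exactly.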
Test build #38317 has finished for PR 7533 at commit
@rxin it looks a bit better now. Could you take a look at this?
}
bytePos--;
}
throw new RuntimeException("Invalid UTF8 string: " + toString());
Nit: throw a more concrete exception, like `IllegalArgumentException` or `IllegalCharacterException`, etc.
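A rough sketch of what the nit could look like, assuming a hypothetical helper `Utf8Scan.codePointStart` that walks backward to the leading byte of a code point (this is not the method in the PR). It uses `IllegalArgumentException`, the first of the two names suggested, and includes the offending bytes in the message:

```java
// Hypothetical helper illustrating a more specific exception with a verbose message.
final class Utf8Scan {
    // Returns the offset of the leading byte of the code point that contains bytePos.
    static int codePointStart(byte[] bytes, int bytePos) {
        int pos = bytePos;
        while (pos >= 0) {
            // Any byte that is not a continuation byte (10xxxxxx) starts a code point.
            if ((bytes[pos] & 0xC0) != 0x80) {
                return pos;
            }
            pos--;
        }
        throw new IllegalArgumentException(
            "Invalid UTF-8 string: no leading byte at or before offset " + bytePos
                + " in " + java.util.Arrays.toString(bytes));
    }
}
```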
* right) is returned. substring_index performs a case-sensitive match when searching for delim.
*/
case class Substring_index(strExpr: Expression, delimExpr: Expression, countExpr: Expression)
  extends Expression with ImplicitCastInputTypes {
Could you use `TernaryExpression`?
@zhichao-li Do you mind if I take over this one?