[SPARK-8263][SQL] substr/substring should also support binary type #7641
Test build #38335 has finished for PR 7641 at commit
ISTM that So, how about including this fix in this pr? |
str.dataType match {
  case StringType => stringEval.asInstanceOf[UTF8String]
    .substringSQL(posEval.asInstanceOf[Int], lenEval.asInstanceOf[Int])
  case BinaryType => Substring.subStringBinarySQL(stringEval.asInstanceOf[Array[Byte]],
Substring#subStringBinarySQL is similar to the code in UTF8String.
So, how about reusing it?
UTF8String.fromBytes(stringEval.asInstanceOf[Array[Byte]]).substringSQL(...).getBytes
Substring#subStringBinarySQL was created on purpose to index by byte, whereas substringSQL operates on code point indices.
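To illustrate the distinction, here is a minimal sketch in plain Scala that does not use Spark's UTF8String at all. The object name ByteVsCodePoint and both helper names are hypothetical, chosen only for this example, and the helpers assume in-range arguments. It shows that slicing the same UTF-8 data by code point and by byte can give different results once a multi-byte character is involved.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helpers for illustration only; not part of Spark.
object ByteVsCodePoint {
  // Code-point-indexed slice: start and len count Unicode code points.
  // Assumes start and len are within the bounds of the string.
  def sliceByCodePoint(s: String, start: Int, len: Int): String = {
    val from = s.offsetByCodePoints(0, start)
    val to = s.offsetByCodePoints(from, len)
    s.substring(from, to)
  }

  // Byte-indexed slice: start and len count raw bytes.
  // Assumes 0 <= start and start + len <= bytes.length.
  def sliceByByte(bytes: Array[Byte], start: Int, len: Int): Array[Byte] =
    java.util.Arrays.copyOfRange(bytes, start, start + len)

  def main(args: Array[String]): Unit = {
    val text = "h\u00e9llo"           // U+00E9 takes two bytes in UTF-8
    val bytes = text.getBytes(UTF_8)  // six bytes in total

    // First two code points: 'h' and U+00E9, which is three bytes.
    val byCodePoint = sliceByCodePoint(text, 0, 2).getBytes(UTF_8)
    // First two bytes: 'h' and only the lead byte of U+00E9.
    val byByte = sliceByByte(bytes, 0, 2)

    println(s"code point slice: ${byCodePoint.length} bytes")
    println(s"byte slice: ${byByte.length} bytes")
  }
}
```

Under these assumptions, routing binary data through a code-point-based substring would change which bytes are returned, which appears to be the reason a separate byte-indexed helper was kept.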
Understood.
@@ -672,6 +672,38 @@ case class StringSplit(str: Expression, pattern: Expression)
override def prettyName: String = "split"
}

object Substring {
Mark as private[sql]? And add the scaladoc.
It's OK. Things in catalyst don't need to be private (the entire package is considered private).
We also need to update the scaladoc for the dataframe API for |
Test build #39168 has finished for PR 7641 at commit
Test build #39172 has finished for PR 7641 at commit
@@ -672,6 +672,38 @@ case class StringSplit(str: Expression, pattern: Expression)
override def prettyName: String = "split"
}

object Substring {

private def makeIndex(pos: Int, len: Int, inputLen: Int): (Int, Int) = {
It's better to inline this.
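As a rough sketch of what inlining the index computation could look like, the following standalone Scala helper computes the start and end offsets directly inside a byte-indexed substring function. The names BinarySubstringSketch and substringBytes are hypothetical, and the index rules (one-based positions, zero treated as the first byte, negative positions counted from the end, out-of-range results clamped) are assumptions based on common SQL substring conventions, not a copy of the code under review.

```scala
// Hypothetical sketch; names and index rules are assumptions, not Spark's code.
object BinarySubstringSketch {
  def substringBytes(bytes: Array[Byte], pos: Int, len: Int): Array[Byte] = {
    val n = bytes.length
    // Inlined index computation: positive pos is one-based, negative pos
    // counts back from the end, and zero refers to the first byte.
    val rawStart = if (pos > 0) pos - 1 else if (pos < 0) n + pos else 0
    // Long arithmetic avoids Int overflow when len is very large.
    val start = math.max(rawStart, 0).toLong
    val end = math.min(rawStart.toLong + len.toLong, n.toLong)
    if (start >= end) Array.emptyByteArray
    else java.util.Arrays.copyOfRange(bytes, start.toInt, end.toInt)
  }

  def main(args: Array[String]): Unit = {
    val data = Array[Byte](1, 2, 3, 4, 5)
    println(substringBytes(data, 2, 3).mkString(","))   // bytes 2..4
    println(substringBytes(data, -2, 5).mkString(","))  // last two bytes
    println(substringBytes(data, 10, 2).length)         // start past the end
  }
}
```

Keeping the computation in one place avoids allocating a tuple for the (start, end) pair on every call, which is one plausible motivation for the suggestion to inline it.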
LGTM, I will fix the conflict and merge it, thanks!