Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8263][SQL] substr/substring should also support binary type #7641

Closed
wants to merge 3 commits into from

Conversation

zhichao-li
Copy link
Contributor

No description provided.

@SparkQA
Copy link

SparkQA commented Jul 24, 2015

Test build #38335 has finished for PR 7641 at commit 4f68bfe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Jul 29, 2015

ISTM that Concat also needs to support binary types according to Hive.

So, how about including this fix in this pr?

str.dataType match {
case StringType => stringEval.asInstanceOf[UTF8String]
.substringSQL(posEval.asInstanceOf[Int], lenEval.asInstanceOf[Int])
case BinaryType => Substring.subStringBinarySQL(stringEval.asInstanceOf[Array[Byte]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Substring#subStringBinarySQL is similar to the codes of UTF8String.
So, how about reusing them?

UTF8String.fromBytes(stringEval.asInstanceOf[Array[Byte]).substringSQL(...).getBytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Substring#subStringBinarySQL is created on purpose for the index in byte, but the substringSQL is operate on the index of code point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood.

@zhichao-li
Copy link
Contributor Author

cc @rxin @davies @chenghao-intel

@@ -672,6 +672,38 @@ case class StringSplit(str: Expression, pattern: Expression)
override def prettyName: String = "split"
}

object Substring {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mark as private[sql]? And add the scaladoc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's ok - things in catalyst doesn't need to be private (the entire package is considered private)

@chenghao-intel
Copy link
Contributor

We also need to update the scaladoc for the dataframe API for substring, as it now supports the binary type.

@SparkQA
Copy link

SparkQA commented Jul 31, 2015

Test build #39168 has finished for PR 7641 at commit 99aa130.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 31, 2015

Test build #39172 has finished for PR 7641 at commit 01d795e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -672,6 +672,38 @@ case class StringSplit(str: Expression, pattern: Expression)
override def prettyName: String = "split"
}

object Substring {

private def makeIndex(pos: Int, len: Int, inputLen: Int): (Int, Int) = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to inline this.

@davies
Copy link
Contributor

davies commented Aug 1, 2015

LGTM, I will fix the conflict and merge it, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants