[SPARK-8248][SQL] string function: length #6724

Closed
chenghao-intel wants to merge 11 commits

chenghao-intel
Contributor

No description provided.

@chenghao-intel
Contributor Author

@rxin can you review this? Once this is merged, the other string functions can use it as an example.

@SparkQA

SparkQA commented Jun 9, 2015

Test build #34512 has finished for PR 6724 at commit 548d2ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SimpleFunctionRegistry extends FunctionRegistry
    • case class Rand(seed: Long) extends RDG(seed)
    • case class Randn(seed: Long) extends RDG(seed)
    • case class Length(child: Expression) extends UnaryExpression with ExpectsInputTypes
    • class StringKeyHashMap[T](normalizer: (String) => String)
    • logInfo(s"Using user defined output committer class $
    • logInfo(s"Using output committer class $

@SparkQA

SparkQA commented Jun 9, 2015

Test build #34513 has finished for PR 6724 at commit db604ae.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Length(child: Expression) extends UnaryExpression with ExpectsInputTypes

expression[Lower]("lower"),
expression[Substring]("substr"),
expression[Substring]("substring"),
expression[Length]("length")
Contributor

can you sort this alphabetically

@davies
Contributor

davies commented Jun 9, 2015

The failed tests will be fixed by c8e7cd2.

@davies
Contributor

davies commented Jun 9, 2015

How can we support getting the length of BinaryType, ArrayType, and MapType values?

@rxin
Contributor

rxin commented Jun 9, 2015

They are different functions right now. We can still support them even if they are named the same though.
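
As a rough illustration of the idea that one name can cover several input types, here is a minimal sketch in plain Scala. It is not the code from this patch: `LengthSketch` and `lengthOf` are hypothetical names, and ordinary Scala values stand in for Catalyst's StringType, BinaryType, ArrayType, and MapType.

```scala
// Hypothetical helper, not part of this patch: a single `length` entry point
// that dispatches on the runtime type of its input and propagates null.
object LengthSketch {
  def lengthOf(value: Any): Option[Int] = value match {
    case null           => None                // length(null) is null
    case s: String      => Some(s.length)      // stands in for StringType (UTF-16 code units)
    case b: Array[Byte] => Some(b.length)      // stands in for BinaryType
    case xs: Seq[_]     => Some(xs.size)       // stands in for ArrayType
    case m: Map[_, _]   => Some(m.size)        // stands in for MapType
    case other =>
      throw new IllegalArgumentException(
        s"length is not defined for ${other.getClass.getName}")
  }
}
```

For example, `LengthSketch.lengthOf("abdef")` yields `Some(5)` and `LengthSketch.lengthOf(null)` yields `None`, matching the expectations in the unit test of this PR.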

// string functions
expression[Upper]("lcase"),
expression[Lower]("lower"),
expression[StringLength]("strlen"),
Contributor

Don't rename this one, since we need Hive compatibility here. Only rename the DataFrame function.

@SparkQA

SparkQA commented Jun 10, 2015

Test build #34554 has finished for PR 6724 at commit 8e30171.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes

checkEvaluation(StringLength(regEx), 5, create_row("abdef"))
checkEvaluation(StringLength(regEx), 0, create_row(""))
checkEvaluation(StringLength(regEx), null, create_row(null))
checkEvaluation(StringLength(Literal.create(null, StringType)), null, create_row("abdef"))
Contributor Author

As @davies pointed out, this probably failed in codegen.

Contributor

can you pull in his fix?

Contributor Author

OK, I thought @davies would fix this. I will take a look at it.

Contributor

Yea but his fix won't be merged for a while because it's part of a much broader change.

Contributor

I can port some of the fixes from that big PR as a separate PR.

Contributor

Let's do that. Break your big PR into smaller ones.

@SparkQA

SparkQA commented Jun 10, 2015

Test build #34555 has finished for PR 6724 at commit 3641f06.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes

@SparkQA

SparkQA commented Jun 10, 2015

Test build #34561 has finished for PR 6724 at commit 3c729aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes

}
"""
child match {
case Literal(null, _) =>
Contributor

@davies can you review this part?

Contributor Author

ping @davies

Contributor

I'd like to fix this in Literal.genCode; otherwise you have to fix it in many places. For example, BinaryExpression would need the same fix.
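
One possible reading of this suggestion, sketched in plain Scala. The names `ExprCode`, `genStringLiteral`, and `genLength` are hypothetical, and the thread does not show what the eventual fix in Literal.genCode looks like. The assumption here is that a null literal is emitted as a typed cast, so any parent expression can wrap it in generated Java without a literal-specific branch.

```scala
// Hypothetical sketch, not Spark's code generator: the literal itself
// produces Java source that stays valid when a parent calls a method on it.
object TypedNullLiteralSketch {
  // Java source fragments for a child expression's null flag and value.
  final case class ExprCode(isNull: String, value: String)

  // Assumption: a null String literal becomes `((String) null)` rather than
  // a bare `null`. No string escaping is done; this is illustration only.
  def genStringLiteral(value: Option[String]): ExprCode = value match {
    case None    => ExprCode(isNull = "true", value = "((String) null)")
    case Some(s) => ExprCode(isNull = "false", value = "\"" + s + "\"")
  }

  // Parent generator with no special case for literals.
  def genLength(child: ExprCode): String =
    s"""int primitive = -1;
       |if (!${child.isNull}) {
       |  primitive = ${child.value}.length();
       |}""".stripMargin
}
```

With this shape, every unary or binary parent gets the same treatment for free, which is the trade-off being argued for here: one fix in the literal instead of a pattern match in each expression.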

Contributor Author

Without this change, it throws an exception:

Code generation of strlen(null) failed:
[info]   
[info]         int primitive1 = -1;
[info]         if (!true) {
[info]           primitive1 = (null).length();
[info]         }
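
The generated Java above presumably fails because a method call on the bare `null` literal does not compile, even inside a branch that never runs. Below is a minimal sketch of the per-expression workaround, assuming a simplified stand-in for the generator (`ChildCode` and `genLength` are hypothetical names, not Spark APIs): when the child is a constant null, emit no call on its value at all.

```scala
// Hypothetical sketch, not Spark's code generator: special-case a constant
// null child so the emitted Java never contains `(null).length()`.
object NullLiteralCodegenSketch {
  // Java source fragments for a child expression's null flag and value.
  final case class ChildCode(isNull: String, value: String)

  def genLength(child: ChildCode, result: String): String =
    if (child.isNull == "true") {
      // Constant null: the result is null and the child's value is never touched.
      s"int $result = -1;\nboolean ${result}IsNull = true;"
    } else {
      // General case: guard the call with the child's null flag.
      s"""int $result = -1;
         |boolean ${result}IsNull = ${child.isNull};
         |if (!${result}IsNull) {
         |  $result = (${child.value}).length();
         |}""".stripMargin
    }
}
```

This mirrors the shape of the `child match { case Literal(null, _) => ...` branch under review, at the cost of repeating the same check in every expression that dereferences its child.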

Contributor Author

OK, I will leave this for you, @davies, and temporarily disable the null test in the unit test.

@@ -1299,6 +1300,19 @@ object functions {
*/
def toRadians(columnName: String): Column = toRadians(Column(columnName))

/**
* Length of a given string value
Contributor

Computes the length of a string column.

@SparkQA

SparkQA commented Jun 10, 2015

Test build #34572 has finished for PR 6724 at commit 3e92d32.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes

@SparkQA

SparkQA commented Jun 10, 2015

Test build #34584 has finished for PR 6724 at commit 1eb1fd1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes

@chenghao-intel
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 10, 2015

Test build #34591 has finished for PR 6724 at commit 1eb1fd1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes

@@ -37,6 +37,7 @@ import org.apache.spark.util.Utils
* @groupname normal_funcs Non-aggregate functions
* @groupname math_funcs Math functions
* @groupname window_funcs Window functions
* @groupname string_funcs functions for DataFrames.
Contributor

"functions for DataFrames " -> "String functions"

@SparkQA

SparkQA commented Jun 11, 2015

Test build #34645 has finished for PR 6724 at commit ae08003.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes

@chenghao-intel
Contributor Author

@rxin, any more comments before merging?

@chenghao-intel
Contributor Author

I will create another PR with a bunch of string functions after this is merged, so it would be great if this one goes in first.

@SparkQA

SparkQA commented Jun 11, 2015

Test build #34647 has finished for PR 6724 at commit aaa3c31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes

//////////////////////////////////////////////////////////////////////////////////////////////

/**
* Computes the length of a given string value
Contributor

in your next pr, please add a period to the end of this.

@rxin
Contributor

rxin commented Jun 11, 2015

OK I'm merging this one.

@asfgit asfgit closed this in 9fe3adc Jun 11, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Author: Cheng Hao <hao.cheng@intel.com>

Closes apache#6724 from chenghao-intel/length and squashes the following commits:

aaa3c31 [Cheng Hao] revert the additional change
97148a9 [Cheng Hao] remove the codegen testing temporally
ae08003 [Cheng Hao] update the comments
1eb1fd1 [Cheng Hao] simplify the code as commented
3e92d32 [Cheng Hao] use the selectExpr in unit test intead of SQLQuery
3c729aa [Cheng Hao] fix bug for constant null value in codegen
3641f06 [Cheng Hao] keep the length() method for registered function
8e30171 [Cheng Hao] update the code as comment
db604ae [Cheng Hao] Add code gen support
548d2ef [Cheng Hao] register the length()
09a0738 [Cheng Hao] add length support
@chenghao-intel chenghao-intel deleted the length branch July 2, 2015 08:30
5 participants