[SPARK-8245][SQL] FormatNumber/Length Support for Expression #7034
case e if !e.childrenResolved => e

case e: ExpressionConstraint =>
  val newChildren = e.children.zip(e.constraint).map { case (expr, constraint) =>
Should we require `e.children.length == e.constraint.length`?
I thought about that too, but a mismatch would be caught during debugging, and in a release version we would never run into the case `e.children.length != e.constraint.length`, would we?
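For readers following the arity question above, here is a minimal sketch of why the check matters. It uses plain Scala collections and hypothetical helper names (`pairLenient`, `pairStrict`), not the PR's `ExpressionConstraint` code: `zip` silently drops unmatched elements, whereas a `require` fails fast.

```scala
// Hypothetical stand-in for pairing children with per-child constraints;
// the names below are illustrative and do not come from the PR.
object ZipArity {
  // Plain zip: silently drops unmatched elements from the longer side.
  def pairLenient[A, B](children: Seq[A], constraints: Seq[B]): Seq[(A, B)] =
    children.zip(constraints)

  // Guarded zip: throws IllegalArgumentException when the arities differ.
  def pairStrict[A, B](children: Seq[A], constraints: Seq[B]): Seq[(A, B)] = {
    require(
      children.length == constraints.length,
      s"expected ${children.length} constraints, got ${constraints.length}")
    children.zip(constraints)
  }
}
```

Whether the extra check is worth it depends on whether a mismatch can only come from a programming error in the expression definition, as the author argues.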
Thanks for doing it! We do need an abstraction to do type checking and supported-type cast together, the
Test build #35836 has finished for PR 7034 at commit
This seems too complicated. Why not just rename ExpectsInputTypes to AutoCastInputTypes, and then add another ExpectsInputTypes to do type checking without auto casting?
Thank you all for reviewing the code and for the suggestions. Recently we've been working on adding more expressions, but we keep being challenged by data type support and casting issues. It would be great if we could figure out how to solve that ASAP, either in this PR or in someone else's.
Test build #35913 has finished for PR 7034 at commit
retest this please
Test build #35933 has finished for PR 7034 at commit
 * Accept all of the data types, except the [[UserDefinedType]] for the child expression.
 */
case object AcceptAllTypeExceptUserDefinedType extends AcceptType {
  def accept(dt: DataType): Boolean = if (dt.isInstanceOf[UserDefinedType[_]]) false else true
!(dt.isInstanceOf[UserDefinedType[_]])
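The suggested simplification can be checked in isolation. The sketch below uses stand-in types (a toy `DataType` hierarchy defined here, not Spark's real classes) to show that the `if ... false else true` form and the negated form agree on every input.

```scala
object AcceptSketch {
  // Stand-ins for Spark's DataType hierarchy; these are NOT the real classes.
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  final case class UserDefinedType[T](name: String) extends DataType

  // Form used in the diff under review.
  def acceptVerbose(dt: DataType): Boolean =
    if (dt.isInstanceOf[UserDefinedType[_]]) false else true

  // Form suggested by the reviewer: negate the type test directly.
  def acceptConcise(dt: DataType): Boolean =
    !dt.isInstanceOf[UserDefinedType[_]]
}
```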
Jira: https://issues.apache.org/jira/browse/SPARK-8223 https://issues.apache.org/jira/browse/SPARK-8224

~~I am aware of #7174 and will update this pr, if it's merged.~~ Done

I don't know if #7034 can simplify this, but we can have a look at it if it gets merged.

rxin In the Jira ticket the function has no second argument. I added a `numBits` argument that allows specifying the number of bits. I guess this improves usability. I wanted to add `shiftleft(value)` as well, but the `selectExpr` dataframe tests crash if I have both. In order to do this, I added the following to functions.scala: `def shiftRight(e: Column): Column = ShiftRight(e.expr, lit(1).expr)`, but as I mentioned this doesn't pass tests like `df.selectExpr("shiftRight(a)", ...` (not enough arguments exception). If we need the bitwise shift in order to be Hive compatible, I suggest adding `shiftLeft` and something like `shiftLeftX`.

Author: Tarek Auel <tarek.auel@googlemail.com>

Closes #7178 from tarekauel/8223 and squashes the following commits:

8023bb5 [Tarek Auel] [SPARK-8223][SPARK-8224] fixed test
f3f64e6 [Tarek Auel] [SPARK-8223][SPARK-8224] Integer -> Int
f628706 [Tarek Auel] [SPARK-8223][SPARK-8224] removed toString; updated function description
3b56f2a [Tarek Auel] Merge remote-tracking branch 'origin/master' into 8223
5189690 [Tarek Auel] [SPARK-8223][SPARK-8224] minor fix and style fix
9434a28 [Tarek Auel] Merge remote-tracking branch 'origin/master' into 8223
44ee324 [Tarek Auel] [SPARK-8223][SPARK-8224] docu fix
ac7fe9d [Tarek Auel] [SPARK-8223][SPARK-8224] right and left bit shift
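As a rough illustration of the two-argument shift described in that commit message, the sketch below works on plain `Int` values rather than Spark `Column`s. It assumes a sign-preserving right shift (Scala's `>>`); the one-argument overload defaulting to a single bit is the convenience form the author says could not be added, shown here only for comparison.

```scala
object ShiftSketch {
  // Shift `value` left by `numBits` bits.
  def shiftLeft(value: Int, numBits: Int): Int = value << numBits

  // Shift `value` right by `numBits` bits, preserving the sign (arithmetic shift).
  def shiftRight(value: Int, numBits: Int): Int = value >> numBits

  // Hypothetical one-argument convenience form: shift right by a single bit.
  def shiftRight(value: Int): Int = shiftRight(value, 1)
}
```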
Can this be closed now that #5796 is merged?
It still has the expression
58e67ef to 73a7cc3
child.dataType match {
  case StringType => defineCodeGen(ctx, ev, c => s"($c).numChars()")
  case BinaryType => defineCodeGen(ctx, ev, c => s"($c).length")
  case NullType => defineCodeGen(ctx, ev, c => s"-1")
don't need to support NullType here
It will cause an exception in StringFunctionSuite, as we do not run the Analyzer at all there.
you can just remove that test case, can't you?
checkEvaluation(Length(Literal.create(null, NullType)), null, create_row(null))
oh, yes, we can do that now since you've handled the NullType in a single place.
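The behaviour discussed in this thread can be sketched outside Spark. The helper below is hypothetical and operates on plain JVM values: strings are measured in characters (code points, as an approximation of `numChars()`), binary values in bytes, and a null input yields no result instead of the `-1` sentinel from the diff.

```scala
object LengthSketch {
  // Returns None for a null input, mirroring the test expectation that
  // Length over a null literal evaluates to null.
  def length(value: Any): Option[Int] = value match {
    case null           => None
    // Count Unicode code points rather than UTF-16 code units.
    case s: String      => Some(s.codePointCount(0, s.length))
    // For binary data the length is the number of bytes.
    case b: Array[Byte] => Some(b.length)
    case other          =>
      throw new IllegalArgumentException(s"unsupported input: ${other.getClass.getName}")
  }
}
```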
 * and returns the result as a string. If D is 0, the result has no decimal point or
 * fractional part.
 */
case class FormatNumber(x: Expression, d: Expression)
override prettyName
note: this is done
Sorry, yes, it's done, but at the end of this class's code.
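For context on what `FormatNumber` is documented to return, here is a minimal standalone sketch built on `java.text.DecimalFormat`. It is not the PR's implementation: the grouping pattern, the fixed US symbols, and the function name `formatNumber` are assumptions made for illustration, and rounding follows `DecimalFormat`'s default mode.

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

object FormatNumberSketch {
  // Formats x with thousands separators and exactly d decimal places.
  // When d is 0 the pattern has no decimal point or fractional part.
  def formatNumber(x: Double, d: Int): String = {
    require(d >= 0, "d must be non-negative")
    val pattern = if (d == 0) "#,##0" else "#,##0." + ("0" * d)
    // Fixed US symbols keep the output independent of the JVM's default locale.
    val symbols = DecimalFormatSymbols.getInstance(Locale.US)
    new DecimalFormat(pattern, symbols).format(x)
  }
}
```

For example, `FormatNumberSketch.formatNumber(1234567.891, 0)` yields `1,234,568`.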
LGTM other than that.
Test build #37333 has finished for PR 7034 at commit
Test build #37338 has finished for PR 7034 at commit
@chenghao-intel you need to update Python to reflect the strlen -> length naming change.
Oh, yes, thanks for the reminder.
f282180 to 601bbf5
Test build #37436 has finished for PR 7034 at commit
LGTM
Test build #37443 has finished for PR 7034 at commit
Thanks - merging this.