-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-8218][SQL] Add binary log math function #6725
Conversation
Test build #34514 has finished for PR 6725 at commit
|
* @group math_funcs | ||
* @since 1.4.0 | ||
*/ | ||
def logarithm(l: Column, r: Column): Column = Logarithm(l.expr, r.expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's name the 1st argument as "base", and the second as "a".
and also only keep the following 2 versions:
log(base: Double, a: Column)
log(base: Double, a: String)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw I think it is OK to just name this log, and we should move them to where log is right now.
Test build #34588 has finished for PR 6725 at commit
|
Test build #34590 has finished for PR 6725 at commit
|
@@ -202,3 +202,27 @@ case class Pow(left: Expression, right: Expression) | |||
""" | |||
} | |||
} | |||
|
|||
object Logarithm { | |||
def apply(exprs: Seq[Expression]): Expression = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need this one do we? just need an apply with one argument, along with the original case class' constructor.
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/functions.scala sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
Test build #34681 has finished for PR 6725 at commit
|
Test build #34710 has finished for PR 6725 at commit
|
Test build #34712 has finished for PR 6725 at commit
|
def apply(child: Expression): Expression = new Log(child) | ||
} | ||
|
||
case class Logarithm(left: Expression, right: Expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be doing this throughout the file, but it seems pretty confusing to me to be using left
and right
instead of something like value
and base
. I'm not sure this is worth whatever code reuse we are getting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh - one thing is that left/right is coming from BinaryExpression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is what I meant saying it wasn't worth whatever code reuse we are getting. The other option would be to name the arguments and have def left = base
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inheriting from BinaryMathExpression
seems like a bad idea to me. It is forcing the arguments to have weird names and is resulting in dead code.
Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
Test build #34743 has finished for PR 6725 at commit
|
Test build #34745 has finished for PR 6725 at commit
|
* returns one output value. | ||
* @param name The short name of the function | ||
*/ | ||
abstract class AbstractBinaryMathExpression[T, U, V](name: String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rxin, do you think I need to revert this? It is now just used by Atan2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes let's not complicate the type hierarchy just for one thing
Test build #34860 has finished for PR 6725 at commit
|
Test build #34863 has finished for PR 6725 at commit
|
@@ -143,7 +143,8 @@ def _(): | |||
'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' + | |||
'polar coordinates (r, theta).', | |||
'hypot': 'Computes `sqrt(a^2^ + b^2^)` without intermediate overflow or underflow.', | |||
'pow': 'Returns the value of the first argument raised to the power of the second argument.' | |||
'pow': 'Returns the value of the first argument raised to the power of the second argument.', | |||
'log': 'Returns the first argument-based logarithm of the second argument', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we define this function directly instead of using this magic?
Test build #34862 timed out for PR 6725 at commit |
} | ||
|
||
override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = { | ||
defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.log($c2) / java.lang.Math.log($c1)") + s""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if left == EulerNumber, can we generate the code to just use java.lang.Math.log?
Test build #34867 timed out for PR 6725 at commit |
Test build #34881 has finished for PR 6725 at commit
|
Test build #34885 has finished for PR 6725 at commit
|
Test build #34888 has finished for PR 6725 at commit
|
@@ -404,6 +405,21 @@ def when(condition, value): | |||
|
|||
|
|||
@since(1.4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.5
Thanks - I'm going to merge this and fix the since version. |
Thanks. |
Some minor updates based on after merging apache#6725.
JIRA: https://issues.apache.org/jira/browse/SPARK-8218 Because there is already `log` unary function defined, the binary log function is called `logarithm` for now. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#6725 from viirya/expr_binary_log and squashes the following commits: bf96bd9 [Liang-Chi Hsieh] Compare log result in string. 102070d [Liang-Chi Hsieh] Round log result to better comparing in python test. fd01863 [Liang-Chi Hsieh] For comments. beed631 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log 6089d11 [Liang-Chi Hsieh] Remove unnecessary override. 8cf37b7 [Liang-Chi Hsieh] For comments. bc89597 [Liang-Chi Hsieh] For comments. db7dc38 [Liang-Chi Hsieh] Use ctor instead of companion object. 0634ef7 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log 1750034 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log 3d75bfc [Liang-Chi Hsieh] Fix scala style. 5b39c02 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log 23c54a3 [Liang-Chi Hsieh] Fix scala style. ebc9929 [Liang-Chi Hsieh] Let Logarithm accept one parameter too. 605574d [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log 21c3bfd [Liang-Chi Hsieh] Fix scala style. c6c187f [Liang-Chi Hsieh] For comments. c795342 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log f373bac [Liang-Chi Hsieh] Add binary log expression.
Some minor updates based on after merging apache#6725. Author: Reynold Xin <rxin@databricks.com> Closes apache#6871 from rxin/log and squashes the following commits: ab51542 [Reynold Xin] Use JVM log 76fc8de [Reynold Xin] Fixed arg. a7c1522 [Reynold Xin] [SPARK-8218][SQL] Binary log math function update.
This is a follow up of [SPARK-8283](https://issues.apache.org/jira/browse/SPARK-8283) ([PR-6828](#6828)), to support both `struct` and `named_struct` in Spark SQL. After [#6725](#6828), the semantic of [`CreateStruct`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala#L56) methods have changed a little and do not limited to cols of `NamedExpressions`, it will name non-NamedExpression fields following the hive convention, col1, col2 ... This PR would both loosen [`struct`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L723) to take children of `Expression` type and add `named_struct` support. Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #6874 from yijieshen/SPARK-8283 and squashes the following commits: 4cd3375 [Yijie Shen] change struct documentation d599d0b [Yijie Shen] rebase code 9a7039e [Yijie Shen] fix reviews and regenerate golden answers b487354 [Yijie Shen] replace assert using checkAnswer f07e114 [Yijie Shen] tiny fix 9613be9 [Yijie Shen] review fix 7fef712 [Yijie Shen] Fix checkInputTypes' implementation using foldable and nullable 60812a7 [Yijie Shen] Fix type check 828d694 [Yijie Shen] remove unnecessary resolved assertion inside dataType method fd3cd8e [Yijie Shen] remove type check from eval 7a71255 [Yijie Shen] tiny fix ccbbd86 [Yijie Shen] Fix reviews 47da332 [Yijie Shen] remove nameStruct API from DataFrame 917e680 [Yijie Shen] Fix reviews 4bd75ad [Yijie Shen] loosen struct method in functions.scala to take Expression children 0acb7be [Yijie Shen] Add CreateNamedStruct in both DataFrame function API and FunctionRegistery
JIRA: https://issues.apache.org/jira/browse/SPARK-8218
Because there is already
log
unary function defined, the binary log function is calledlogarithm
for now.