[SPARK-8218][SQL] Add binary log math function #6725
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -254,3 +254,54 @@ case class Pow(left: Expression, right: Expression) | |
""" | ||
} | ||
} | ||
|
||
object Logarithm { | ||
Review comment: Do we really need this? We should assume people will use the
Reply: Because we want to support the usage of
Reply: Take a look at #6806.
Reply: OK, it solves this problem. |
||
def apply(child: Expression): Expression = new Log(child) | ||
} | ||
|
||
case class Logarithm(left: Expression, right: Expression) | ||
Review comment: We seem to be doing this throughout the file, but it seems pretty confusing to me to be using
Reply: +1
Reply: Oh, one thing is that left/right is coming from BinaryExpression.
Reply: Yeah, that is what I meant when saying it wasn't worth whatever code reuse we are getting. The other option would be to name the arguments and have
Reply: Inheriting from |
||
extends BinaryMathExpression((c1, c2) => math.log(c2) / math.log(c1), "LOG") { | ||
Review comment: If we need to override the
Reply: I think it is due to supporting the case when the left is null, in which the base-10 logarithm is applied, to override
Reply: Okay, but this is confusing. Is it this lambda that is being used for evaluation or the
Reply: It should be OK because there is another binary math expression
Reply: I would argue that that function is also implemented in a confusing way. We should not shoehorn things into the class hierarchy if it's going to result in hard-to-follow code. I'd rather we have small amounts of code duplication.
Reply: OK. Sounds reasonable. I think I can refactor this part of the code a little. |
||
override def eval(input: Row): Any = { | ||
val evalE2 = right.eval(input) | ||
if (evalE2 == null) { | ||
null | ||
} else { | ||
val evalE1 = left.eval(input) | ||
var result: Double = 0.0 | ||
if (evalE1 == null) { | ||
result = math.log(evalE2.asInstanceOf[Double]) | ||
Review comment: Is it standard to assume base 10 if the exponent is null? |
||
} else { | ||
result = math.log(evalE2.asInstanceOf[Double]) / math.log(evalE1.asInstanceOf[Double]) | ||
} | ||
if (result.isNaN) null else result | ||
} | ||
} | ||
|
||
override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = { | ||
if (left.dataType != right.dataType) { | ||
// log.warn(s"${left.dataType} != ${right.dataType}") | ||
Review comment: Remove |
||
} | ||
|
||
val eval1 = left.gen(ctx) | ||
val eval2 = right.gen(ctx) | ||
val resultCode = | ||
s"java.lang.Math.log(${eval2.primitive}) / java.lang.Math.log(${eval1.primitive})" | ||
|
||
s""" | ||
${eval2.code} | ||
boolean ${ev.isNull} = ${eval2.isNull}; | ||
${ctx.javaType(dataType)} ${ev.primitive} = ${ctx.defaultValue(dataType)}; | ||
if (!${ev.isNull}) { | ||
${eval1.code} | ||
if (!${eval1.isNull}) { | ||
${ev.primitive} = ${resultCode}; | ||
} else { | ||
${ev.primitive} = java.lang.Math.log(${eval2.primitive}); | ||
} | ||
} | ||
if (Double.valueOf(${ev.primitive}).isNaN()) { | ||
${ev.isNull} = true; | ||
} | ||
""" | ||
} | ||
} |
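The null and NaN handling in `eval` above can be easier to follow outside of Catalyst. The following is a minimal standalone sketch, not Spark code: `LogSemantics` and `logWithBase` are hypothetical names, and `Option[Double]` is assumed as a stand-in for nullable expression results.

```scala
// Hypothetical standalone model of the null/NaN handling in Logarithm.eval above.
// Option[Double] stands in for nullable values; this is not part of Spark.
object LogSemantics {
  def logWithBase(base: Option[Double], value: Option[Double]): Option[Double] =
    value.flatMap { v =>
      val result = base match {
        // Both operands present: change-of-base formula, as in the diff.
        case Some(b) => math.log(v) / math.log(b)
        // Null base: the diff falls back to math.log, which is the natural logarithm.
        case None    => math.log(v)
      }
      // A NaN result is mapped to null (None here), mirroring the diff.
      if (result.isNaN) None else Some(result)
    }
}
```

In this model a null value always yields null, a null base yields the natural logarithm of the value, and inputs such as a negative value produce null because the intermediate result is NaN.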
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -204,4 +204,18 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper { | |
testBinary(Atan2, math.atan2) | ||
} | ||
|
||
test("binary log") { | ||
val f = (c1: Double, c2: Double) => math.log(c2) / math.log(c1) | ||
val domain = (1 to 20).map(v => (v * 0.1, v * 0.2)) | ||
|
||
domain.foreach { case (v1, v2) => | ||
checkEvaluation(Logarithm(Literal(v1), Literal(v2)), f(v1 + 0.0, v2 + 0.0), EmptyRow) | ||
Review comment: Indent is wrong. |
||
checkEvaluation(Logarithm(Literal(v2), Literal(v1)), f(v2 + 0.0, v1 + 0.0), EmptyRow) | ||
} | ||
// When base is null, Logarithm is the same as Log | ||
checkEvaluation(Logarithm(Literal.create(null, DoubleType), Literal(1.0)), | ||
Review comment: Same comment about wrapping. |
||
math.log(1.0), create_row(null)) | ||
checkEvaluation(Logarithm(Literal(1.0), Literal.create(null, DoubleType)), | ||
null, create_row(null)) | ||
} | ||
} |
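The reference function `f` in the test above relies on the change-of-base identity, log_b(x) = ln(x) / ln(b). As a rough sanity check of that identity on its own, here is a small standalone sketch; `ChangeOfBase`, `logBase`, and `agreesWithLog10` are illustrative names and the tolerance is an assumption, none of it taken from the PR.

```scala
// Hypothetical standalone check; these names are illustrative, not Spark APIs.
object ChangeOfBase {
  // Same shape as `f` in the test above: base first, value second.
  def logBase(base: Double, value: Double): Double =
    math.log(value) / math.log(base)

  // Compare against the dedicated base-10 routine within a small tolerance,
  // since the quotient of two natural logs is subject to rounding error.
  def agreesWithLog10(value: Double, tol: Double = 1e-12): Boolean =
    math.abs(logBase(10.0, value) - math.log10(value)) <= tol
}
```

Because the quotient is computed in floating point, comparisons against an expected value are safer with a tolerance than with exact equality.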
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1083,6 +1083,22 @@ object functions { | |
*/ | ||
def log(columnName: String): Column = log(Column(columnName)) | ||
|
||
/** | ||
* Returns the first argument-base logarithm of the second argument. | ||
* | ||
* @group math_funcs | ||
* @since 1.4.0 | ||
*/ | ||
def log(base: Double, a: Column): Column = Logarithm(lit(base).expr, a.expr) | ||
|
||
/** | ||
* Returns the first argument-base logarithm of the second argument. | ||
* | ||
* @group math_funcs | ||
* @since 1.4.0 | ||
*/ | ||
def log(base: Double, a: String): Column = log(base, Column(a)) | ||
Review comment: Can we rename a to columnName here? |
||
|
||
Review comment: Why don't we support specifying the
Reply: It was suggested by @rxin. I think it is reasonable because it is hard to find a use case that returns the logarithm of one column with another column as the base. Usually you want to compute the logarithm values for a column with the same base. |
||
/** | ||
* Computes the logarithm of the given value in base 10. | ||
* | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -110,6 +110,19 @@ class DataFrameFunctionsSuite extends QueryTest { | |
testData2.collect().toSeq.map(r => Row(~r.getInt(0)))) | ||
} | ||
|
||
test("log") { | ||
val df = Seq[(Integer, Integer)]((123, null)).toDF("a", "b") | ||
Review comment: Can you move this test into the math expression suite? |
||
checkAnswer( | ||
df.select(org.apache.spark.sql.functions.log("a"), | ||
Review comment: Nit: When wrapping, wrap all arguments. Also the indent below is wrong. |
||
org.apache.spark.sql.functions.log(2.0, "a"), | ||
org.apache.spark.sql.functions.log("b")), | ||
Row(math.log(123), math.log(123) / math.log(2), null)) | ||
|
||
checkAnswer( | ||
df.selectExpr("log(a)", "log(2.0, a)", "log(b)"), | ||
Row(math.log(123), math.log(123) / math.log(2), null)) | ||
} | ||
|
||
test("length") { | ||
checkAnswer( | ||
nullStrings.select(strlen($"s"), strlen("s")), | ||
|
Review comment: Can we define this function directly instead of using this magic?