Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8218][SQL] Add binary log math function #6725

Closed
wants to merge 19 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Jun 9, 2015

JIRA: https://issues.apache.org/jira/browse/SPARK-8218

Because there is already log unary function defined, the binary log function is called logarithm for now.

@SparkQA
Copy link

SparkQA commented Jun 9, 2015

Test build #34514 has finished for PR 6725 at commit f373bac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Logarithm(left: Expression, right: Expression)

* @group math_funcs
* @since 1.4.0
*/
def logarithm(l: Column, r: Column): Column = Logarithm(l.expr, r.expr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's name the 1st argument as "base", and the second as "a".

and also only keep the following 2 versions:

log(base: Double, a: Column)

log(base: Double, a: String)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I think it is OK to just name this log, and we should move them to where log is right now.

@SparkQA
Copy link

SparkQA commented Jun 10, 2015

Test build #34588 has finished for PR 6725 at commit c6c187f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Logarithm(left: Expression, right: Expression)

@SparkQA
Copy link

SparkQA commented Jun 10, 2015

Test build #34590 has finished for PR 6725 at commit 21c3bfd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Logarithm(left: Expression, right: Expression)

@@ -202,3 +202,27 @@ case class Pow(left: Expression, right: Expression)
"""
}
}

object Logarithm {
def apply(exprs: Seq[Expression]): Expression = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this one do we? just need an apply with one argument, along with the original case class' constructor.

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/functions.scala
	sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
@SparkQA
Copy link

SparkQA commented Jun 11, 2015

Test build #34681 has finished for PR 6725 at commit 605574d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Log2(child: Expression)
    • case class Logarithm(left: Expression, right: Expression)

@SparkQA
Copy link

SparkQA commented Jun 11, 2015

Test build #34710 has finished for PR 6725 at commit ebc9929.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Logarithm(left: Expression, right: Expression)

@SparkQA
Copy link

SparkQA commented Jun 11, 2015

Test build #34712 has finished for PR 6725 at commit 23c54a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Logarithm(left: Expression, right: Expression)

def apply(child: Expression): Expression = new Log(child)
}

case class Logarithm(left: Expression, right: Expression)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be doing this throughout the file, but it seems pretty confusing to me to be using left and right instead of something like value and base. I'm not sure this is worth whatever code reuse we are getting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - one thing is that left/right is coming from BinaryExpression

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is what I meant saying it wasn't worth whatever code reuse we are getting. The other option would be to name the arguments and have def left = base, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inheriting from BinaryMathExpression seems like a bad idea to me. It is forcing the arguments to have weird names and is resulting in dead code.

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
	sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
@SparkQA
Copy link

SparkQA commented Jun 12, 2015

Test build #34743 has finished for PR 6725 at commit 5b39c02.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Logarithm(left: Expression, right: Expression)

@SparkQA
Copy link

SparkQA commented Jun 12, 2015

Test build #34745 has finished for PR 6725 at commit 3d75bfc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Logarithm(left: Expression, right: Expression)

@viirya
Copy link
Member Author

viirya commented Jun 12, 2015

@rxin @marmbrus I think your comments are addressed now.

* returns one output value.
* @param name The short name of the function
*/
abstract class AbstractBinaryMathExpression[T, U, V](name: String)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rxin, do you think I need to revert this? It is now just used by Atan2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes let's not complicate the type hierarchy just for one thing

@SparkQA
Copy link

SparkQA commented Jun 14, 2015

Test build #34860 has finished for PR 6725 at commit 0634ef7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class JoinedRow extends InternalRow
    • class JoinedRow2 extends InternalRow
    • class JoinedRow3 extends InternalRow
    • class JoinedRow4 extends InternalRow
    • class JoinedRow5 extends InternalRow
    • class JoinedRow6 extends InternalRow
    • class BaseOrdering extends Ordering[InternalRow]
    • abstract class AbstractBinaryMathExpression[T, U, V](name: String)
    • case class Logarithm(left: Expression, right: Expression)

@SparkQA
Copy link

SparkQA commented Jun 14, 2015

Test build #34863 has finished for PR 6725 at commit bc89597.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class JoinedRow extends InternalRow
    • class JoinedRow2 extends InternalRow
    • class JoinedRow3 extends InternalRow
    • class JoinedRow4 extends InternalRow
    • class JoinedRow5 extends InternalRow
    • class JoinedRow6 extends InternalRow
    • class BaseOrdering extends Ordering[InternalRow]
    • abstract class AbstractBinaryMathExpression[T, U, V](name: String)
    • case class Logarithm(left: Expression, right: Expression)

@@ -143,7 +143,8 @@ def _():
'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
'polar coordinates (r, theta).',
'hypot': 'Computes `sqrt(a^2^ + b^2^)` without intermediate overflow or underflow.',
'pow': 'Returns the value of the first argument raised to the power of the second argument.'
'pow': 'Returns the value of the first argument raised to the power of the second argument.',
'log': 'Returns the first argument-based logarithm of the second argument',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we define this function directly instead of using this magic?

@SparkQA
Copy link

SparkQA commented Jun 14, 2015

Test build #34862 timed out for PR 6725 at commit db7dc38 after a configured wait of 175m.

}

override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.log($c2) / java.lang.Math.log($c1)") + s"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if left == EulerNumber, can we generate the code to just use java.lang.Math.log?

@SparkQA
Copy link

SparkQA commented Jun 14, 2015

Test build #34867 timed out for PR 6725 at commit 6089d11 after a configured wait of 175m.

@SparkQA
Copy link

SparkQA commented Jun 14, 2015

Test build #34881 has finished for PR 6725 at commit fd01863.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Logarithm(left: Expression, right: Expression)

@SparkQA
Copy link

SparkQA commented Jun 14, 2015

Test build #34885 has finished for PR 6725 at commit 102070d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Logarithm(left: Expression, right: Expression)

@SparkQA
Copy link

SparkQA commented Jun 14, 2015

Test build #34888 has finished for PR 6725 at commit bf96bd9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Logarithm(left: Expression, right: Expression)

@@ -404,6 +405,21 @@ def when(condition, value):


@since(1.4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.5

@rxin
Copy link
Contributor

rxin commented Jun 18, 2015

Thanks - I'm going to merge this and fix the since version.

@asfgit asfgit closed this in fee3438 Jun 18, 2015
@viirya
Copy link
Member Author

viirya commented Jun 18, 2015

Thanks.

rxin added a commit to rxin/spark that referenced this pull request Jun 18, 2015
Some minor updates based on after merging apache#6725.
asfgit pushed a commit that referenced this pull request Jun 19, 2015
Some minor updates based on after merging #6725.

Author: Reynold Xin <rxin@databricks.com>

Closes #6871 from rxin/log and squashes the following commits:

ab51542 [Reynold Xin] Use JVM log
76fc8de [Reynold Xin] Fixed arg.
a7c1522 [Reynold Xin] [SPARK-8218][SQL] Binary log math function update.
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
JIRA: https://issues.apache.org/jira/browse/SPARK-8218

Because there is already `log` unary function defined, the binary log function is called `logarithm` for now.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#6725 from viirya/expr_binary_log and squashes the following commits:

bf96bd9 [Liang-Chi Hsieh] Compare log result in string.
102070d [Liang-Chi Hsieh] Round log result to better comparing in python test.
fd01863 [Liang-Chi Hsieh] For comments.
beed631 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
6089d11 [Liang-Chi Hsieh] Remove unnecessary override.
8cf37b7 [Liang-Chi Hsieh] For comments.
bc89597 [Liang-Chi Hsieh] For comments.
db7dc38 [Liang-Chi Hsieh] Use ctor instead of companion object.
0634ef7 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
1750034 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
3d75bfc [Liang-Chi Hsieh] Fix scala style.
5b39c02 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
23c54a3 [Liang-Chi Hsieh] Fix scala style.
ebc9929 [Liang-Chi Hsieh] Let Logarithm accept one parameter too.
605574d [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
21c3bfd [Liang-Chi Hsieh] Fix scala style.
c6c187f [Liang-Chi Hsieh] For comments.
c795342 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into expr_binary_log
f373bac [Liang-Chi Hsieh] Add binary log expression.
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Some minor updates based on after merging apache#6725.

Author: Reynold Xin <rxin@databricks.com>

Closes apache#6871 from rxin/log and squashes the following commits:

ab51542 [Reynold Xin] Use JVM log
76fc8de [Reynold Xin] Fixed arg.
a7c1522 [Reynold Xin] [SPARK-8218][SQL] Binary log math function update.
asfgit pushed a commit that referenced this pull request Jul 2, 2015
This is a follow up of [SPARK-8283](https://issues.apache.org/jira/browse/SPARK-8283) ([PR-6828](#6828)), to support both `struct` and `named_struct` in Spark SQL.

After [#6725](#6828), the semantic of [`CreateStruct`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala#L56) methods have changed a little and do not limited to cols of `NamedExpressions`, it will name non-NamedExpression fields following the hive convention, col1, col2 ...

This PR would both loosen [`struct`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L723) to take children of `Expression` type and add `named_struct` support.

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #6874 from yijieshen/SPARK-8283 and squashes the following commits:

4cd3375 [Yijie Shen] change struct documentation
d599d0b [Yijie Shen] rebase code
9a7039e [Yijie Shen] fix reviews and regenerate golden answers
b487354 [Yijie Shen] replace assert using checkAnswer
f07e114 [Yijie Shen] tiny fix
9613be9 [Yijie Shen] review fix
7fef712 [Yijie Shen] Fix checkInputTypes' implementation using foldable and nullable
60812a7 [Yijie Shen] Fix type check
828d694 [Yijie Shen] remove unnecessary resolved assertion inside dataType method
fd3cd8e [Yijie Shen] remove type check from eval
7a71255 [Yijie Shen] tiny fix
ccbbd86 [Yijie Shen] Fix reviews
47da332 [Yijie Shen] remove nameStruct API from DataFrame
917e680 [Yijie Shen] Fix reviews
4bd75ad [Yijie Shen] loosen struct method in functions.scala to take Expression children
0acb7be [Yijie Shen] Add CreateNamedStruct in both DataFrame function API and FunctionRegistery
@viirya viirya deleted the expr_binary_log branch December 27, 2023 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants