[SPARK-24393][SQL] SQL builtin: isinf #21482
Changes from 2 commits
bcdaab2
9ab0eb2
069f9d9
f34bfdc
a6c3903
7e396f7
13b5aaa
d381f0c
432c61b
663fa47
f240fdf
6a4d46e
559900a
b727838
6bd6735
be23549
cb8f9d0
d60aa21
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -281,6 +281,8 @@ exportMethods("%<=>%", | |
"initcap", | ||
"input_file_name", | ||
"instr", | ||
"isInf", | ||
"isinf", | ||
add tests for these?
Added tests in my latest commit
I really don't understand why we have both. It usually has the ones matching to Scala side or R specific function. Otherwise, I don't think we should have both.
Do you know if there's any consistency behind the different capitalization schemes? There's If not, how about we just go with
I think we shouldn't add other variants unless there's a clear reason. It sounds like we are adding this for no reason. Yea, please.
I commented on this here #21482 (comment)
Ah, I got it now. I roughly remember we keep functions this_naming_style in functions[.py|.R|.scala], e.g.(SPARK-10621). |
||
"isNaN", | ||
"isNotNull", | ||
"isNull", | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -907,6 +907,30 @@ setMethod("initcap", | |
column(jc) | ||
}) | ||
|
||
#' @details | ||
#' \code{isinf}: Returns true if the column is Infinity. | ||
#' @rdname column_nonaggregate_functions | ||
#' @aliases isnan isnan,Column-method | ||
|
||
#' @note isinf since 2.4.0 | ||
missing |
||
setMethod("isinf", | ||
We don't need to have duplicate method definition like this. Maybe we can follow
Would it be alright if I omit
For |
||
signature(x = "Column"), | ||
function(x) { | ||
jc <- callJStatic("org.apache.spark.sql.functions", "isinf", x@jc) | ||
column(jc) | ||
}) | ||
|
||
#' @details | ||
#' \code{isInf}: Returns true if the column is Infinity. | ||
#' @rdname column_nonaggregate_functions | ||
#' @aliases isnan isnan,Column-method | ||
#' @note isinf since 2.4.0 | ||
setMethod("isInf", | ||
R has
I like the idea, but we might not have a way to extend it (sort of). It looks like S3 without a generic. |
||
signature(x = "Column"), | ||
function(x) { | ||
jc <- callJStatic("org.apache.spark.sql.functions", "isinf", x@jc) | ||
column(jc) | ||
}) | ||
|
||
#' @details | ||
#' \code{isnan}: Returns true if the column is NaN. | ||
#' @rdname column_nonaggregate_functions | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -695,6 +695,12 @@ setGeneric("getField", function(x, ...) { standardGeneric("getField") }) | |
#' @rdname columnfunctions | ||
setGeneric("getItem", function(x, ...) { standardGeneric("getItem") }) | ||
|
||
#' @rdname columnfunctions | ||
setGeneric("isInf", function(x) { standardGeneric("isInf") }) | ||
|
||
#' @rdname columnfunctions | ||
setGeneric("isinf", function(x) { standardGeneric("isinf") }) | ||
|
||
|
||
#' @rdname columnfunctions | ||
setGeneric("isNaN", function(x) { standardGeneric("isNaN") }) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -468,6 +468,18 @@ def input_file_name(): | |
return Column(sc._jvm.functions.input_file_name()) | ||
|
||
|
||
@since(2.4) | ||
def isinf(col): | ||
Shall we expose this to column.py too?
Do you want me to add the function to column.py?
@HyukjinKwon could you clarify, please?
Yes, please because I see it's exposed in Column.scala.
Added it in my latest commit |
||
"""An expression that returns true iff the column is NaN. | ||
ditto. is this the same with |
||
|
||
>>> df = spark.createDataFrame([(1.0, float('inf')), (float('inf'), 2.0)], ("a", "b")) | ||
>>> df.select(isinf("a").alias("r1"), isinf(df.a).alias("r2")).collect() | ||
[Row(r1=False, r2=False), Row(r1=True, r2=True)] | ||
""" | ||
sc = SparkContext._active_spark_context | ||
return Column(sc._jvm.functions.isinf(_to_java_column(col))) | ||
|
||
|
||
@since(1.6) | ||
def isnan(col): | ||
"""An expression that returns true iff the column is NaN. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child: | |
override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})" | ||
} | ||
|
||
/** | ||
* Evaluates to `true` iff it's Infinity. | ||
*/ | ||
@ExpressionDescription( | ||
usage = "_FUNC_(expr) - Returns True if expr evaluates to infinite else returns False ", | ||
True -> true, False -> false to be consistent |
||
examples = """ | ||
Examples: | ||
> SELECT _FUNC_(1/0); | ||
True | ||
Can you run the example and check the results? |
||
> SELECT _FUNC_(5); | ||
False | ||
""") | ||
case class IsInf(child: Expression) extends UnaryExpression | ||
with Predicate with ImplicitCastInputTypes { | ||
|
||
override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType)) | ||
|
||
override def nullable: Boolean = false | ||
|
||
override def eval(input: InternalRow): Boolean = { | ||
val value = child.eval(input) | ||
if (value == null) { | ||
false | ||
} else { | ||
child.dataType match { | ||
case DoubleType => value.asInstanceOf[Double].isInfinity | ||
case FloatType => value.asInstanceOf[Float].isInfinity | ||
} | ||
} | ||
} | ||
|
||
|
||
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { | ||
val eval = child.genCode(ctx) | ||
child.dataType match { | ||
case DoubleType | FloatType => | ||
Is this match block necessary since there is only one case pattern?
The function can only test for infinity values for datatypes Double and Float, and hence we need to match the child datatype with these types
I think we will only see double and float types here because of |
||
ev.copy(code = code""" | ||
${eval.code} | ||
${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)}; | ||
We can assign |
||
${ev.value} = !${eval.isNull} && Double.isInfinite(${eval.value});""", | ||
out of interest, why use
The non-codegen version uses the isInfinity method defined for Scala's Double and Float, whereas the codegen version uses Java's static method "isInfinite" defined for the classes Double and Float. |
||
isNull = FalseLiteral) | ||
} | ||
} | ||
} | ||
|
||
/** | ||
* Evaluates to `true` iff it's NaN. | ||
|
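The behaviour that `IsInf.eval` and the generated code in the hunk above both implement (a null input yields false, otherwise the value is tested for positive or negative infinity) can be sketched in plain Python. This is only an illustrative mirror of those semantics, not Spark code: `is_inf` is a hypothetical helper, `None` stands in for SQL NULL, and Python's single `float` type stands in for both `DoubleType` and `FloatType`.

```python
import math
import sys


def is_inf(value):
    """Hypothetical pure-Python mirror of the IsInf semantics.

    None models SQL NULL and maps to False, matching the
    `value == null` branch of eval and `nullable = false`.
    """
    return value is not None and math.isinf(value)


# Cases modelled on the IsInf test in NullExpressionsSuite.
# sys.float_info.max is the largest finite double, standing in
# for the Float.MaxValue case used in the Scala test.
cases = [
    (float("inf"), True),
    (float("-inf"), True),
    (None, False),
    (sys.float_info.max, False),
    (5.5, False),
    (float("nan"), False),
]

for value, expected in cases:
    assert is_inf(value) is expected
```

The short-circuiting `and` plays the role of `!${eval.isNull} && Double.isInfinite(...)` in the generated Java: the infinity check is never reached for a null input.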
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,6 +56,18 @@ class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { | |
assert(ex.contains("Null value appeared in non-nullable field")) | ||
} | ||
|
||
test("IsInf") { | ||
checkEvaluation(IsInf(Literal(Double.PositiveInfinity)), true) | ||
checkEvaluation(IsInf(Literal(Double.NegativeInfinity)), true) | ||
checkEvaluation(IsInf(Literal(Float.PositiveInfinity)), true) | ||
checkEvaluation(IsInf(Literal(Float.NegativeInfinity)), true) | ||
checkEvaluation(IsInf(Literal.create(null, DoubleType)), false) | ||
checkEvaluation(IsInf(Literal(Float.MaxValue)), false) | ||
checkEvaluation(IsInf(Literal(5.5f)), false) | ||
check NaN as well?
Added the checks in my later commits |
||
checkEvaluation(IsInf(Literal(Float.NaN)), expected = false) | ||
checkEvaluation(IsInf(Literal(Double.NaN)), expected = false) | ||
} | ||
|
||
test("IsNaN") { | ||
checkEvaluation(IsNaN(Literal(Double.NaN)), true) | ||
checkEvaluation(IsNaN(Literal(Float.NaN)), true) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -557,6 +557,14 @@ class Column(val expr: Expression) extends Logging { | |
(this >= lowerBound) && (this <= upperBound) | ||
} | ||
|
||
/** | ||
* True if the current expression is NaN. | ||
? is this the same with |
||
* | ||
* @group expr_ops | ||
* @since 2.4.0 | ||
*/ | ||
def isInf: Column = withExpr { IsInf(expr) } | ||
|
||
/** | ||
* True if the current expression is NaN. | ||
* | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1107,6 +1107,14 @@ object functions { | |
*/ | ||
def input_file_name(): Column = withExpr { InputFileName() } | ||
|
||
/** | ||
* Return true iff the column is Infinity. | ||
* | ||
* @group normal_funcs | ||
* @since 2.4.0 | ||
*/ | ||
def isinf(e: Column): Column = withExpr { IsInf(e.expr) } | ||
Mind if I ask to elaborate
I have followed what seemed to be the preexistent convention for function names in those particular files. For example in
I guess someone should elaborate if Column.isFoo vs function's isfoo is the right pattern we want to stay with... |
||
|
||
/** | ||
* Return true iff the column is NaN. | ||
* | ||
|
I really appreciate the attempt to include R, though a question, why do we have isInf and isinf? why not just isinf like python?
Yeah, we may have isInf or isinf here.
I have just followed what has been done for isnan, which also has isNan
the functions are case insensitive so i don't think we need both?
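
The last comment argues that one spelling is enough if name lookup ignores case. As a minimal sketch of that idea only: the `FunctionTable` class below is hypothetical and is not Spark's or SparkR's API. It lower-cases names on both registration and lookup, so `isinf`, `isInf`, and `ISINF` all resolve to the same entry. This models a name registry; ordinary R and Python identifiers remain case-sensitive, which is why the question of exporting both spellings comes up at all.

```python
import math


class FunctionTable:
    """Hypothetical case-insensitive name -> function registry."""

    def __init__(self):
        self._functions = {}

    def register(self, name, func):
        # Normalize once at registration time.
        self._functions[name.lower()] = func

    def lookup(self, name):
        # Normalize again at lookup so any capitalization matches.
        return self._functions[name.lower()]


table = FunctionTable()
table.register("isinf", lambda v: v is not None and math.isinf(v))

# A single registration serves every capitalization.
assert table.lookup("isInf") is table.lookup("isinf")
assert table.lookup("ISINF")(float("inf"))
```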