[SPARK-24393][SQL] SQL builtin: isinf #21482

Closed
Changes from 6 commits
2 changes: 2 additions & 0 deletions R/pkg/NAMESPACE
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
"initcap",
"input_file_name",
"instr",
"isInf",
"isinf",
Member:

I really appreciate the attempt to include R, though a question: why do we have both isInf and isinf?
Why not just isinf, like Python?

Member (@viirya, Jun 3, 2018):

Yeah, we can probably have just isInf or isinf here.

Author:

I have just followed what was done for isnan, which also has isNaN.

Contributor:

The functions are case-insensitive, so I don't think we need both?

Member:

add tests for these?

Author:

Added tests in my latest commit

Member:

I really don't understand why we have both. Usually we have the name matching the Scala side, or an R-specific function. Otherwise, I don't think we should have both.

Contributor:

Do you know if there's any consistency behind the different capitalization schemes? There are format_number, isnan, and isNotNull here, for example.

If not, how about we just go with isInf for now, and if other aliases are needed in the future they can be added and discussed then?

Member:

I think we shouldn't add other variants unless there's a clear reason. It sounds like we are adding this for no reason.

> how about we just go with isInf for now, and if other aliases are needed in the future they can be added and discussed then?

Yea, please.

Member:

I commented on this here: #21482 (comment)
It's because Column capitalizes it differently from functions.

Member:

Ah, I got it now. I roughly remember we keep functions in this_naming_style in functions[.py|.R|.scala], e.g. (SPARK-10621).

"isNaN",
"isNotNull",
"isNull",
2 changes: 1 addition & 1 deletion R/pkg/R/column.R
@@ -67,7 +67,7 @@ operators <- list(
# we can not override `&&` and `||`, so use `&` and `|` instead
"&" = "and", "|" = "or", "^" = "pow"
)
column_functions1 <- c("asc", "desc", "isNaN", "isNull", "isNotNull")
column_functions1 <- c("asc", "desc", "isInf", "isNaN", "isNull", "isNotNull")
column_functions2 <- c("like", "rlike", "getField", "getItem", "contains")

createOperator <- function(op) {
11 changes: 11 additions & 0 deletions R/pkg/R/functions.R
@@ -907,6 +907,17 @@ setMethod("initcap",
column(jc)
})

#' @details
#' \code{isinf}: Returns true if the column is Infinity.
#' @rdname column_nonaggregate_functions
#' @note isinf since 2.4.0
Member:

missing @aliases

setMethod("isinf",
Member:

We don't need a duplicate method definition like this. Maybe we can follow isNaN's approach if we really need to have both isinf and isInf.

Author:

Would it be alright if I omit isinf completely and only implement isInf?

Member:

For the isNaN case, functions.R only defines isnan; isNaN is defined in column.R. If we really want to have both isInf and isinf, like isNaN and isnan, it would be better to follow that approach instead of having two duplicate method definitions here.

signature(x = "Column"),
function(x) {
jc <- callJStatic("org.apache.spark.sql.functions", "isinf", x@jc)
column(jc)
})

#' @details
#' \code{isnan}: Returns true if the column is NaN.
#' @rdname column_nonaggregate_functions
6 changes: 6 additions & 0 deletions R/pkg/R/generics.R
@@ -695,6 +695,12 @@ setGeneric("getField", function(x, ...) { standardGeneric("getField") })
#' @rdname columnfunctions
setGeneric("getItem", function(x, ...) { standardGeneric("getItem") })

#' @rdname columnfunctions
setGeneric("isInf", function(x) { standardGeneric("isInf") })

#' @rdname columnfunctions
setGeneric("isinf", function(x) { standardGeneric("isinf") })
Member:

isnan (lower case) is not a column function, see https://github.com/NihalHarish/spark/blob/7e396f70f58ffd309e7f738751f3aa8cfe321ce7/R/pkg/R/generics.R#L1002
@rdname columnfunctions will cause it to go to the wrong doc page.


#' @rdname columnfunctions
setGeneric("isNaN", function(x) { standardGeneric("isNaN") })

12 changes: 12 additions & 0 deletions python/pyspark/sql/functions.py
@@ -468,6 +468,18 @@ def input_file_name():
return Column(sc._jvm.functions.input_file_name())


@since(2.4)
def isinf(col):
Member:

Shall we expose this to column.py too?

Author:

Do you want me to add the function to column.py?

Contributor:

@HyukjinKwon could you clarify, please?

Member:

Yes, please because I see it's exposed in Column.scala.

Author:

Added it in my latest commit

"""An expression that returns true iff the column is Inf.

>>> df = spark.createDataFrame([(1.0, float('inf')), (float('inf'), 2.0)], ("a", "b"))
>>> df.select(isinf("a").alias("r1"), isinf(df.a).alias("r2")).collect()
[Row(r1=False, r2=False), Row(r1=True, r2=True)]
"""
sc = SparkContext._active_spark_context
return Column(sc._jvm.functions.isinf(_to_java_column(col)))


@since(1.6)
def isnan(col):
"""An expression that returns true iff the column is NaN.
@@ -198,6 +198,7 @@ object FunctionRegistry {
expression[If]("if"),
expression[Inline]("inline"),
expressionGeneratorOuter[Inline]("inline_outer"),
expression[IsInf]("isinf"),
expression[IsNaN]("isnan"),
expression[IfNull]("ifnull"),
expression[IsNull]("isnull"),
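As an aside (not part of this diff), a minimal spark-shell check of the newly registered builtin might look like the sketch below; it feeds the function an explicitly cast infinite double rather than relying on division-by-zero behavior.

```scala
// Minimal sketch, assuming this patch is applied and `spark` is the
// spark-shell SparkSession. CAST('Infinity' AS DOUBLE) yields an infinite
// double, so pos_inf should be true and finite should be false.
spark.sql(
  "SELECT isinf(CAST('Infinity' AS DOUBLE)) AS pos_inf, " +
  "isinf(CAST(5 AS DOUBLE)) AS finite"
).show()
```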
@@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child:
override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})"
}

/**
* Evaluates to `true` iff it's Infinity.
*/
@ExpressionDescription(
usage = "_FUNC_(expr) - Returns true if expr evaluates to infinite else returns false ",
examples = """
Examples:
> SELECT _FUNC_(1/0);
true
> SELECT _FUNC_(5);
false
""")
case class IsInf(child: Expression) extends UnaryExpression
with Predicate with ImplicitCastInputTypes {

override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType))

override def nullable: Boolean = false

override def eval(input: InternalRow): Boolean = {
val value = child.eval(input)
if (value == null) {
false
} else {
child.dataType match {
case DoubleType => value.asInstanceOf[Double].isInfinity
case FloatType => value.asInstanceOf[Float].isInfinity
}
}
}


override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val eval = child.genCode(ctx)
child.dataType match {
case DoubleType | FloatType =>
Member (@viirya, Jun 3, 2018):

Is this match block necessary since there is only one case pattern?

Author:

The function can only test for infinity for the Double and Float data types, and hence we need to match the child's data type against these types.

Member:

I think we will only see double and float types here because of inputTypes.

ev.copy(code = code"""
${eval.code}
${CodeGenerator.javaType(dataType)} ${ev.value} =
  !${eval.isNull} && Double.isInfinite(${eval.value});""",
isNull = FalseLiteral)
}
}
}
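Picking up the review thread above: since ImplicitCastInputTypes with inputTypes = Seq(TypeCollection(DoubleType, FloatType)) already guarantees a double or float child, a minimal sketch (not part of this diff) of the simplified codegen without the single-case match could look like this:

```scala
// Sketch only: the dataType match is dropped because the input is already
// coerced to DoubleType or FloatType; Double.isInfinite also covers floats
// via widening in the generated Java.
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
  val eval = child.genCode(ctx)
  ev.copy(code = code"""
    ${eval.code}
    ${CodeGenerator.javaType(dataType)} ${ev.value} =
      !${eval.isNull} && Double.isInfinite(${eval.value});""",
    isNull = FalseLiteral)
}
```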

/**
* Evaluates to `true` iff it's NaN.
Expand Down
@@ -56,6 +56,18 @@ class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(ex.contains("Null value appeared in non-nullable field"))
}

test("IsInf") {
checkEvaluation(IsInf(Literal(Double.PositiveInfinity)), true)
checkEvaluation(IsInf(Literal(Double.NegativeInfinity)), true)
checkEvaluation(IsInf(Literal(Float.PositiveInfinity)), true)
checkEvaluation(IsInf(Literal(Float.NegativeInfinity)), true)
checkEvaluation(IsInf(Literal.create(null, DoubleType)), false)
checkEvaluation(IsInf(Literal(Float.MaxValue)), false)
checkEvaluation(IsInf(Literal(5.5f)), false)
Contributor:

check NaN as well?

Author:

Added the checks in my later commits

checkEvaluation(IsInf(Literal(Float.NaN)), expected = false)
checkEvaluation(IsInf(Literal(Double.NaN)), expected = false)
}

test("IsNaN") {
checkEvaluation(IsNaN(Literal(Double.NaN)), true)
checkEvaluation(IsNaN(Literal(Float.NaN)), true)
8 changes: 8 additions & 0 deletions sql/core/src/main/scala/org/apache/spark/sql/Column.scala
@@ -557,6 +557,14 @@ class Column(val expr: Expression) extends Logging {
(this >= lowerBound) && (this <= upperBound)
}

/**
* True if the current expression is Inf.
*
* @group expr_ops
* @since 2.4.0
*/
def isInf: Column = withExpr { IsInf(expr) }

/**
* True if the current expression is NaN.
*
8 changes: 8 additions & 0 deletions sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -1107,6 +1107,14 @@ object functions {
*/
def input_file_name(): Column = withExpr { InputFileName() }

/**
* Return true iff the column is Infinity.
*
* @group normal_funcs
* @since 2.4.0
*/
def isinf(e: Column): Column = withExpr { IsInf(e.expr) }
Member:

Mind if I ask you to elaborate on isinf vs isInf across the APIs?

Author:

I have followed what seemed to be the pre-existing convention for function names in those particular files. For example, in functions.scala all function names are lower case, but in Column.scala they are camel case.

Member:

I guess someone should clarify whether Column.isFoo vs the function-style isfoo is the right pattern we want to stay with...


/**
* Return true iff the column is NaN.
*
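To make the naming split discussed above concrete, a small hypothetical snippet (assuming this patch in a spark-shell session, where `spark` and its implicits are in scope) calling both Scala entry points:

```scala
import spark.implicits._  // already in scope in spark-shell
import org.apache.spark.sql.functions.{col, isinf}

val df = Seq(1.0, Double.PositiveInfinity, Double.NegativeInfinity).toDF("a")

// Lower-case function, matching the style of the other functions in functions.scala.
df.select(isinf(col("a"))).show()   // rows: false, true, true

// Camel-case Column method, matching isNaN/isNull/isNotNull on Column.
df.select(col("a").isInf).show()    // rows: false, true, true
```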