New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-38604][SQL] Keep ceil and floor with only a single argument the same as before #35913
Conversation
Yeah, +1 for @awdavidson's advice. The change itself looks making sense from a cursory look. cc @cloud-fan FYI. |
@@ -1783,7 +1783,9 @@ object functions { | |||
* @group math_funcs | |||
* @since 1.4.0 | |||
*/ | |||
def ceil(e: Column): Column = ceil(e, lit(0)) | |||
def ceil(e: Column): Column = withExpr { | |||
UnresolvedFunction(Seq("ceil"), Seq(e.expr), isDistinct = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, is it just a code cleanup or it does fix a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm, the problem you hit is: ceil(input)
and ceil(input, 0)
return the same result, but use different expressions which break some custom catalyst rules?
If that's the case, I'd suggest we also fix CeilFloorExpressionBuilderBase
and call buildWithOneParam
if the scale
is 0, to make ceil(input)
and ceil(input, 0)
exactly the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan I agree as the current change does not take into consideration that ceil(input)
and ceil(input, 0)
return different types although they mean the same thing (which could confuse end users).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can call ceil(input)
/floor(input)
if scale = 0
, but we need the test case to validate the modifications. @awdavidson @revans2 Could you please provide us the test case to be sure ? thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bug.
scala> spark.range(1).selectExpr("id", "ceil(id) as one_arg_sql", "ceil(id, 0) as two_arg_sql").select(col("*"), ceil(col("id")).alias("one_arg_func"), ceil(col("id"), lit(0)).alias("two_arg_func")).printSchema
root
|-- id: long (nullable = false)
|-- one_arg_sql: long (nullable = true)
|-- two_arg_sql: decimal(20,0) (nullable = true)
|-- one_arg_func: decimal(20,0) (nullable = true)
|-- two_arg_func: decimal(20,0) (nullable = true)
scala> spark.range(1).selectExpr("cast(id as double) as id").selectExpr("id", "ceil(id) as one_arg_sql", "ceil(id, 0) as two_arg_sql").select(col("*"), ceil(col("id")).alias("one_arg_func"), ceil(col("id"), lit(0)).alias("two_arg_func")).printSchema
root
|-- id: double (nullable = false)
|-- one_arg_sql: long (nullable = true)
|-- two_arg_sql: decimal(30,0) (nullable = true)
|-- one_arg_func: decimal(30,0) (nullable = true)
|-- two_arg_func: decimal(30,0) (nullable = true)
Without this patch the SQL and scala APIs produce different results and the scala API produces a result that is different from what was in Spark 3.2.
I documented this in the JIRA https://issues.apache.org/jira/browse/SPARK-38604
After this patch the single argument version behaves like it did in 3.2 and is also consistent with the SQL API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added test cases that explicitly check the result type.
From a consistency standpoint if the return type is going to depend on the scale, then the scale can only ever be a literal value. If we want to break backwards compatibility, then I would suggest that we also fix the overflow issue https://issues.apache.org/jira/browse/SPARK-28135 with double being round to a long. That technically also applies to a double being cast to a decimal type and then rounded.
This maybe not the right place to discuss it, but this PR reminds me of the new overload |
I'm fine if we're 100% sure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's take this fix first, after adding a test.
It's a bit arguable whether or not ceil(input)
and ceil(input, 0)
should be treated exactly the same. Seems fine to keep it as it is as there is no backward compatibility issues.
The test failure looks to be unrelated. It is in docker tests that appear to have nothing to do wit this change.
|
Merged to master and branch-3.3. |
…e same as before This is just the fix. I didn't add any tests yet. I am happy to do it, I just wasn't sure where the right place to put the tests would be. Once I have tests I will cherry-pick this back to the 3.3 branch and put up a PR for that too. I am also happy to update the comments because this is a bit confusing that there is no indication that things have changed. Closes #35913 from revans2/ceil_floor_no_arg_behavior. Authored-by: Robert (Bobby) Evans <bobby@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit 692e4b0) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
This PR proposes to make
ceil(column)
in Scala/Python/Java APIs to keep the previous behaviour by returningLongType
instead ofDecimalType
.Why are the changes needed?
We shouldn't break compatibility and should keep the old behaviour of
ceil(column)
that returnsLongType
.Does this PR introduce any user-facing change?
No to end users.
This fixes a return type. In old version,
ceil(column)
returnsLongType
.6242145 mistakenly missed this case for Scala/Python APIs so it returns
DecimalType
.This PR restores the previous behaviour back.
How was this patch tested?
Unit test was added