-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33391][SQL] element_at with CreateArray not respect one based index. #30296
Conversation
Change-Id: Ib0f29cdf1016d53c0ca121fa463c5a2a8fd6b960
@cloud-fan FYI. |
Kubernetes integration test starting |
@@ -1966,7 +1966,7 @@ case class ElementAt(left: Expression, right: Expression) | |||
} | |||
|
|||
override def nullable: Boolean = left.dataType match { | |||
case _: ArrayType => computeNullabilityFromArray(left, right) | |||
case _: ArrayType => computeNullabilityFromArray(left, right, isOneBasedIndex = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about computeNullabilityFromArray(left, Subtract(right, Literal(1)))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if it's a negative number like -1. -1 means the last one from right to left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_FUNC_(array, index) - Returns element of array at given (1-based) index. If index < 0,
accesses elements from the last to the first. Returns NULL if the index exceeds the length
of the array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Then isOneBasedIndex
is a misleading name. Maybe the parameter should be normalizeIndex: (Int, Int) => Int = _._2
, which takes the array length and the index, and return the normalized 0-based non-negative index.
Kubernetes integration test status success |
Test build #130776 has finished for PR 30296 at commit
|
Change-Id: I702764bf4eb48e361f138f1c18246495ab99e570
def specialNormalizeIndex: (Int, Int) => Int = { | ||
(arrayLength: Int, index: Int) => { | ||
if (index < 0) { | ||
arrayLength + index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can still be negative and fail, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calling nullable will not get exception or failed, if it's out of bounds, it's just returning a default true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, if the passing index is negative and arrayLength + index still < 0, it will still failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to cover the arrayLength + index still < 0 inside this specialNormalizeIndex ?
(arrayLength: Int, index: Int) => { | ||
if (index < 0) { | ||
arrayLength + index | ||
} else if (index == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not
if (index <= 0) {
arrayLength + index
} ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, ElementAt
fails at runtime if index == 0
, so the nullable doesn't really matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but if the passed in index is 0, it will change to -1 and call the following code. it will throw exception, but the old behavior is return a default true.
ar(intOrdinal).nullable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just try to follow the old behavior.
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #130784 has finished for PR 30296 at commit
|
Change-Id: Ib4985b7b4b63babcd7f6e2e3fdc02dc084745d72
Kubernetes integration test starting |
Kubernetes integration test status success |
Seq(Row(3)) | ||
) | ||
|
||
// 0 is not a valid index, return nullable = false since it throws exception. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we test 4, -4
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
@@ -1401,6 +1401,40 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession { | |||
assert(e3.message.contains(errorMsg3)) | |||
} | |||
|
|||
test("SPARK-33391: element_at with CreateArray") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems an overkill to have end-to-end test for it. How about we just add more tests in CollectionExpressionsSuite.correctly handles ElementAt nullability for arrays
to test negative and invalid indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Change-Id: Iba60b51d72c50c048591ac5924593030d354d2c0
assert(!ElementAt(array, Subtract(Literal(2), Literal(2))).nullable) | ||
assert(!ElementAt(array, Literal(1)).nullable) | ||
assert(ElementAt(array, Literal(2)).nullable) | ||
assert(!ElementAt(array, Subtract(Literal(2), Literal(1))).nullable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's test valid negative ordinals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Kubernetes integration test starting |
Change-Id: If21a6914cb01bbf4f0253f5d733d8acc3f2d1f00
Kubernetes integration test status success |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #130831 has finished for PR 30296 at commit
|
GA passed, merging to master/3.0! |
…index ### What changes were proposed in this pull request? element_at with CreateArray not respect one based index. repo step: ``` var df = spark.sql("select element_at(array(3, 2, 1), 0)") df.printSchema() df = spark.sql("select element_at(array(3, 2, 1), 1)") df.printSchema() df = spark.sql("select element_at(array(3, 2, 1), 2)") df.printSchema() df = spark.sql("select element_at(array(3, 2, 1), 3)") df.printSchema() root – element_at(array(3, 2, 1), 0): integer (nullable = false) root – element_at(array(3, 2, 1), 1): integer (nullable = false) root – element_at(array(3, 2, 1), 2): integer (nullable = false) root – element_at(array(3, 2, 1), 3): integer (nullable = true) correct answer should be 0 true which is outOfBounds return default true. 1 false 2 false 3 false ``` For expression eval, it respect the oneBasedIndex, but within checking the nullable, it calculates with zeroBasedIndex using `computeNullabilityFromArray`. ### Why are the changes needed? Correctness issue. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Added UT and existing UT. Closes #30296 from leanken/leanken-SPARK-33391. Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit e3a768d) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Test build #130817 has finished for PR 30296 at commit
|
Test build #130836 has finished for PR 30296 at commit
|
@leanken If possible, please also mention when we introduced the bug. This is a regression introduced in https://issues.apache.org/jira/browse/SPARK-26965. Thus, Spark 2.4 is safe. |
Hi @leanken ! scala> spark.sql("select element_at(array(3, 2, 1), 0)").printSchema()
root
|-- element_at(array(3, 2, 1), 0): integer (nullable = false) It returned |
Oh. the PR desc is outdated, thanks for the mentions. And as for the correct answer, it should return false. |
Oh, I got it. Thanks for the quick response, @leanken !! :D |
What changes were proposed in this pull request?
element_at with CreateArray not respect one based index.
repo step:
For expression eval, it respect the oneBasedIndex, but within checking the nullable, it calculates with zeroBasedIndex using
computeNullabilityFromArray
.Why are the changes needed?
Correctness issue.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added UT and existing UT.