[SPARK-33391][SQL] element_at with CreateArray does not respect one-based index. #30296

Closed
wants to merge 5 commits into from
Changes from 2 commits
@@ -1966,7 +1966,20 @@ case class ElementAt(left: Expression, right: Expression)
}

override def nullable: Boolean = left.dataType match {
case _: ArrayType => computeNullabilityFromArray(left, right)
case _: ArrayType =>
def specialNormalizeIndex: (Int, Int) => Int = {
(arrayLength: Int, index: Int) => {
if (index < 0) {
arrayLength + index
Contributor: this can still be negative and fail, right?

Contributor Author: Calling `nullable` will not throw or fail; if the index is out of bounds, it just returns the default `true`.

Contributor Author: Yes, if the passed index is negative and `arrayLength + index` is still < 0, it will still fail.

Contributor Author: Do we need to handle the case where `arrayLength + index` is still < 0 inside `specialNormalizeIndex`?

} else if (index == 0) {
Contributor: Why not:

if (index <= 0) {
  arrayLength + index
} ...

Contributor: Actually, ElementAt fails at runtime if index == 0, so the nullable doesn't really matter.

Contributor Author: But if the passed-in index is 0, it becomes -1 and reaches the following code:

ar(intOrdinal).nullable

That throws an exception, whereas the old behavior returned the default `true`.

Contributor Author: I am just trying to follow the old behavior.

// 0 is not a valid index: map it past the end so nullability defaults to true.
arrayLength
} else {
index - 1
}
}
}
computeNullabilityFromArray(left, right, normalizeIndex = specialNormalizeIndex)
case _: MapType => true
}
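
The index mapping debated in the threads above can be modeled without Spark. This is a minimal sketch, assuming a plain `Int` array length stands in for the `CreateArray` children; `ElementAtIndexSketch`, `suggestedNormalizeIndex` and `inBounds` are hypothetical names, not Spark APIs. It reproduces the branches of `specialNormalizeIndex`, the reviewer's `index <= 0` variant, and a bounds check that also enforces a lower bound.

```scala
// Hypothetical standalone model (no Spark dependency) of the index mapping
// this PR adds to ElementAt.nullable.
object ElementAtIndexSketch {
  /** Same branches as specialNormalizeIndex in the diff. */
  def specialNormalizeIndex(arrayLength: Int, index: Int): Int =
    if (index < 0) arrayLength + index // -1 -> last element
    else if (index == 0) arrayLength   // invalid: map past the end so nullability defaults to true
    else index - 1                     // 1 -> first element

  /** The reviewer's suggested shape: `index <= 0` folds the 0 case into `arrayLength + index`. */
  def suggestedNormalizeIndex(arrayLength: Int, index: Int): Int =
    if (index <= 0) arrayLength + index else index - 1

  /** A lookup is safe only if the normalized index lies in [0, arrayLength). */
  def inBounds(arrayLength: Int, index: Int): Boolean = {
    val i = specialNormalizeIndex(arrayLength, index)
    i >= 0 && i < arrayLength
  }

  def main(args: Array[String]): Unit =
    for (idx <- Seq(1, 3, -1, -3, 0, 4, -4))
      println(s"index $idx -> ${specialNormalizeIndex(3, idx)}, in bounds: ${inBounds(3, idx)}")
}
```

For a 3-element array, indices 0 and 4 both map to 3, and index -4 maps to -1. The diff's guard only checks `< ar.length`, so -1 would still reach the `ar(...)` lookup, which is the case raised in review; a `>= 0` check like the one in `inBounds` is one way to fall back to `true` instead. The two normalizers agree on every index tried here, because `arrayLength + 0` is just `arrayLength`.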

@@ -273,15 +273,19 @@ case class GetArrayItem(child: Expression, ordinal: Expression)
trait GetArrayItemUtil {

/** `Null` is returned for invalid ordinals. */
protected def computeNullabilityFromArray(child: Expression, ordinal: Expression): Boolean = {
protected def computeNullabilityFromArray(
child: Expression,
ordinal: Expression,
normalizeIndex: (Int, Int) => Int = (_: Int, index: Int) => index): Boolean = {

if (ordinal.foldable && !ordinal.nullable) {
val intOrdinal = ordinal.eval().asInstanceOf[Number].intValue()
child match {
case CreateArray(ar, _) if intOrdinal < ar.length =>
ar(intOrdinal).nullable
case CreateArray(ar, _) if normalizeIndex(ar.length, intOrdinal) < ar.length =>
ar(normalizeIndex(ar.length, intOrdinal)).nullable
case GetArrayStructFields(CreateArray(elements, _), field, _, _, _)
if intOrdinal < elements.length =>
elements(intOrdinal).nullable || field.nullable
if normalizeIndex(elements.length, intOrdinal) < elements.length =>
elements(normalizeIndex(elements.size, intOrdinal)).nullable || field.nullable
case _ =>
true
}
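
The refactored helper keeps GetArrayItem's zero-based behavior through the default `normalizeIndex` and lets ElementAt pass its own mapping. The sketch below models that contract without Spark, assuming each literal array element is reduced to its nullability flag; `NullabilitySketch`, `zeroBased`, `oneBased` and `nullabilityFromArray` are hypothetical names. Unlike the diff, it also rejects negative normalized indices.

```scala
// Hypothetical simplified model of GetArrayItemUtil.computeNullabilityFromArray:
// a literal array is represented only by the nullability of each element.
object NullabilitySketch {
  type Normalizer = (Int, Int) => Int

  // Default: GetArrayItem ordinals are already 0-based.
  val zeroBased: Normalizer = (_: Int, index: Int) => index

  // ElementAt: 1-based, negative counts from the end, 0 is invalid (mapped past the end).
  val oneBased: Normalizer = (len: Int, index: Int) =>
    if (index < 0) len + index else if (index == 0) len else index - 1

  /** Element nullability when the normalized ordinal is in range; otherwise conservatively true. */
  def nullabilityFromArray(
      elementNullability: Seq[Boolean],
      ordinal: Int,
      normalizeIndex: Normalizer = zeroBased): Boolean = {
    val i = normalizeIndex(elementNullability.length, ordinal)
    if (i >= 0 && i < elementNullability.length) elementNullability(i) else true
  }
}
```

With `Seq(false, true)` standing in for `array(a, b)` from the updated test (a non-nullable, b nullable), the one-based normalizer reports non-nullable for 1 and -2, nullable for 2 and -1, and falls back to nullable for 0 and -3.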

@@ -1122,9 +1122,9 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
val a = AttributeReference("a", IntegerType, nullable = false)()
val b = AttributeReference("b", IntegerType, nullable = true)()
val array = CreateArray(a :: b :: Nil)
assert(!ElementAt(array, Literal(0)).nullable)
assert(ElementAt(array, Literal(1)).nullable)
assert(!ElementAt(array, Subtract(Literal(2), Literal(2))).nullable)
assert(!ElementAt(array, Literal(1)).nullable)
assert(ElementAt(array, Literal(2)).nullable)
assert(!ElementAt(array, Subtract(Literal(2), Literal(1))).nullable)
Contributor: Let's test valid negative ordinals.

Contributor Author: Done.

assert(ElementAt(array, AttributeReference("ordinal", IntegerType)()).nullable)

// GetArrayStructFields case
@@ -1135,19 +1135,19 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
val inputArray1 = CreateArray(c :: Nil)
val inputArray1ContainsNull = c.nullable
val stArray1 = GetArrayStructFields(inputArray1, f1, 0, 2, inputArray1ContainsNull)
assert(!ElementAt(stArray1, Literal(0)).nullable)
assert(!ElementAt(stArray1, Literal(1)).nullable)
val stArray2 = GetArrayStructFields(inputArray1, f2, 1, 2, inputArray1ContainsNull)
assert(ElementAt(stArray2, Literal(0)).nullable)
assert(ElementAt(stArray2, Literal(1)).nullable)

val d = AttributeReference("d", structType, nullable = true)()
val inputArray2 = CreateArray(c :: d :: Nil)
val inputArray2ContainsNull = c.nullable || d.nullable
val stArray3 = GetArrayStructFields(inputArray2, f1, 0, 2, inputArray2ContainsNull)
assert(!ElementAt(stArray3, Literal(0)).nullable)
assert(ElementAt(stArray3, Literal(1)).nullable)
assert(!ElementAt(stArray3, Literal(1)).nullable)
assert(ElementAt(stArray3, Literal(2)).nullable)
val stArray4 = GetArrayStructFields(inputArray2, f2, 1, 2, inputArray2ContainsNull)
assert(ElementAt(stArray4, Literal(0)).nullable)
assert(ElementAt(stArray4, Literal(1)).nullable)
assert(ElementAt(stArray4, Literal(2)).nullable)
}
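
The GetArrayStructFields branch combines two sources of nullability: the selected struct element and the extracted field. A minimal model, assuming each struct element is reduced to a single nullability flag; `StructFieldNullabilitySketch` and `StructElem` are hypothetical names, and the index mapping copies the one-based rule from the diff plus a lower-bound check.

```scala
// Hypothetical model of the GetArrayStructFields case: extracting a field from
// array(s1, s2, ...) is nullable if the chosen struct or the field itself is nullable.
object StructFieldNullabilitySketch {
  // Each struct element only carries whether the struct itself may be null.
  final case class StructElem(nullable: Boolean)

  def nullability(elements: Seq[StructElem], fieldNullable: Boolean, index: Int): Boolean = {
    val len = elements.length
    // One-based / negative SQL index -> zero-based position (0 maps past the end).
    val i =
      if (index < 0) len + index
      else if (index == 0) len
      else index - 1
    // In range: combine struct and field nullability; otherwise default to true.
    if (i >= 0 && i < len) elements(i).nullable || fieldNullable else true
  }
}
```

Taking `c` as non-nullable, `d` as nullable, `f1` as non-nullable and `f2` as nullable (as the assertions in this hunk imply), the model reproduces them: only index 1 with `f1` is non-nullable, and `f2` makes every index nullable.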

test("Concat") {

@@ -1401,6 +1401,40 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
assert(e3.message.contains(errorMsg3))
}

test("SPARK-33391: element_at with CreateArray") {
Contributor: It seems like overkill to have an end-to-end test for this. How about we just add more tests to `CollectionExpressionsSuite` ("correctly handles ElementAt nullability for arrays") to cover negative and invalid indices?

Contributor Author: Sure.

Contributor Author: Updated.

// element_at should use a one-based index and support negative indices.
// Valid indices for array(1, 2, 3) are 1, 2, 3, -1, -2, -3.
var df = OneRowRelation().selectExpr("element_at(array(1, 2, 3), 1)")
assert(!df.schema.head.nullable)
checkAnswer(
df,
Seq(Row(1))
)

df = OneRowRelation().selectExpr("element_at(array(1, 2, 3), -1)")
assert(!df.schema.head.nullable)
checkAnswer(
df,
Seq(Row(3))
)

df = OneRowRelation().selectExpr("element_at(array(1, 2, 3), 3)")
assert(!df.schema.head.nullable)
checkAnswer(
df,
Seq(Row(3))
)

// 0 is not a valid index, so nullability falls back to the default, true.
df = OneRowRelation().selectExpr("element_at(array(1, 2, 3), 0)")
assert(df.schema.head.nullable)

val ex = intercept[ArrayIndexOutOfBoundsException] {
df.collect()
}
assert(ex.getMessage.contains("SQL array indices start at 1"))
}
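
The end-to-end test above expects `element_at(array(1, 2, 3), i)` to return 1 for `i = 1` and 3 for `i = -1` and `i = 3`, and to throw an `ArrayIndexOutOfBoundsException` mentioning "SQL array indices start at 1" for `i = 0`. A minimal model of those runtime semantics follows. It additionally assumes, without support from this PR's tests, that other out-of-range indices yield null, represented here as `None`; `ElementAtRuntimeSketch` is a hypothetical name.

```scala
// Hypothetical model of element_at's runtime lookup (not Spark's implementation).
object ElementAtRuntimeSketch {
  /** Some(element) for a valid index, None when out of range; throws on index 0. */
  def elementAt[T](array: IndexedSeq[T], index: Int): Option[T] = {
    if (index == 0) {
      throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1")
    }
    // Positive indices are one-based; negative indices count back from the end.
    val i = if (index > 0) index - 1 else array.length + index
    if (i >= 0 && i < array.length) Some(array(i)) else None
  }
}
```

This is also why `nullable` can report `true` for index 0 without harm: the expression never produces a value for that index.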

test("array_union functions") {
val df1 = Seq((Array(1, 2, 3), Array(4, 2))).toDF("a", "b")
val ans1 = Row(Seq(1, 2, 3, 4))