Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38030][SQL][3.2] Canonicalization should not remove nullability of AttributeReference dataType #35446

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,7 @@ package org.apache.spark.sql.catalyst.expressions
* return the same answer given any input (i.e. false negatives are possible).
*
* The following rules are applied:
* - Names and nullability hints for [[org.apache.spark.sql.types.DataType]]s are stripped.
* - Names for [[GetStructField]] are stripped.
* - Names for [[org.apache.spark.sql.types.DataType]]s and [[GetStructField]] are stripped.
* - TimeZoneId for [[Cast]] and [[AnsiCast]] are stripped if `needsTimeZone` is false.
* - Commutative and associative operations ([[Add]] and [[Multiply]]) have their children ordered
* by `hashCode`.
Expand All @@ -39,10 +38,10 @@ object Canonicalize {
expressionReorder(ignoreTimeZone(ignoreNamesTypes(e)))
}

/** Remove names and nullability from types, and names from `GetStructField`. */
/** Remove names from types and `GetStructField`. */
private[expressions] def ignoreNamesTypes(e: Expression): Expression = e match {
case a: AttributeReference =>
AttributeReference("none", a.dataType.asNullable)(exprId = a.exprId)
AttributeReference("none", a.dataType)(exprId = a.exprId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, I realized that we should fix line 42 too. Could you fix it master branch first as a follow-up of SPARK-38030, @shardulm94 ?

- Remove names and nullability from types
+ Remove names from types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! The canonicalization code has change substantiantially between 3.2 and master due to #34883. This comment no longer exists in the master branch so it should not be an issue. I will fix it in 3.2 and 3.1 PRs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thank you for checking, @shardulm94 .

case GetStructField(child, ordinal, Some(_)) => GetStructField(child, ordinal, None)
case _ => e
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.plans.logical.Range
import org.apache.spark.sql.types.{IntegerType, LongType, StructField, StructType}
import org.apache.spark.sql.types.{IntegerType, LongType, StringType, StructField, StructType}

class CanonicalizeSuite extends SparkFunSuite {

Expand Down Expand Up @@ -177,4 +177,17 @@ class CanonicalizeSuite extends SparkFunSuite {
assert(expr.semanticEquals(attr))
assert(attr.semanticEquals(expr))
}

test("SPARK-38030: Canonicalization should not remove nullability of AttributeReference" +
" dataType") {
val structType = StructType(Seq(StructField("name", StringType, nullable = false)))
val attr = AttributeReference("col", structType)()
// AttributeReference dataType should not be converted to nullable
assert(attr.canonicalized.dataType === structType)

val cast = Cast(attr, structType)
assert(cast.resolved)
// canonicalization should not converted resolved cast to unresolved
assert(cast.canonicalized.resolved)
}
}