[SPARK-34769][SQL]AnsiTypeCoercion: return closest convertible type among TypeCollection #31859

Closed

Changes from 5 commits
@@ -158,7 +158,10 @@ object AnsiTypeCoercion extends TypeCoercionBase {
      case _ if expectedType.acceptsType(inType) => Some(inType)

      // Cast null type (usually from null literals) into target types
-     case (NullType, target) => Some(target.defaultConcreteType)
+     // By default, the result type is `target.defaultConcreteType`. When the target type is
+     // `TypeCollection`, there is another branch below to find the "closest convertible data type".
+     case (NullType, target) if !target.isInstanceOf[TypeCollection] =>
+       Some(target.defaultConcreteType)

      // This type coercion system will allow implicit converting String type literals as other
      // primitive types, in case of breaking too many existing Spark SQL queries.
@@ -191,9 +194,35 @@ object AnsiTypeCoercion extends TypeCoercionBase {
      case (DateType, TimestampType) => Some(TimestampType)

      // When we reach here, input type is not acceptable for any types in this type collection,
-     // try to find the first one we can implicitly cast.
+     // so first find all the expected types we can implicitly cast to:
+     // 1. if there are no convertible data types, return None;
+     // 2. if there is only one convertible data type, cast the input to it;
+     // 3. otherwise, if there are multiple convertible data types, find the closest convertible
+     //    data type among them. If there is no such data type, return None.
      case (_, TypeCollection(types)) =>
-       types.flatMap(implicitCast(inType, _, isInputFoldable)).headOption
+       // Since Spark contains special objects like `NumericType` and `DecimalType`, which accept
+       // multiple types and are `AbstractDataType` instead of `DataType`, here we use the
+       // conversion results as their representations.
+       val convertibleTypes = types.flatMap(implicitCast(inType, _, isInputFoldable))
+       if (convertibleTypes.isEmpty) {
+         None
+       } else {
+         // Find the closest convertible data type, which can be implicitly cast to all the
+         // other convertible types.
+         val closestConvertibleType = convertibleTypes.find { dt =>
+           convertibleTypes.forall { target =>
+             implicitCast(dt, target, isInputFoldable = false).isDefined
+           }
+         }
+         // If the closest convertible type is FloatType and the convertible types contain
+         // DoubleType, simply return DoubleType as the closest convertible type to avoid
+         // potential precision loss when converting an integral type to FloatType.
+         if (closestConvertibleType.contains(FloatType) &&
+             convertibleTypes.contains(DoubleType)) {
+           Some(DoubleType)
+         } else {
+           closestConvertibleType
+         }
+       }

      // Implicit cast between array types.
      //
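The comments above describe the selection rule; here is a minimal standalone sketch of the same idea, not Spark's implementation: `canCast` is a simplified stand-in for `implicitCast` that models only ANSI numeric widening, and `ClosestTypeSketch` is a hypothetical name.

```scala
import org.apache.spark.sql.types._

object ClosestTypeSketch {
  // Simplified stand-in for implicitCast: ANSI-style numeric widening only.
  private val precedence = Seq(ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType)

  private def canCast(from: DataType, to: DataType): Boolean = {
    val (i, j) = (precedence.indexOf(from), precedence.indexOf(to))
    from == to || (i >= 0 && j >= 0 && i <= j)
  }

  def closest(in: DataType, targets: Seq[DataType]): Option[DataType] = {
    // Keep only the targets the input can be implicitly cast to.
    val convertible = targets.filter(t => canCast(in, t))
    // The closest type is the convertible one that casts to all the others.
    val found = convertible.find(dt => convertible.forall(t => canCast(dt, t)))
    // Prefer Double over Float to avoid precision loss on integral inputs.
    if (found.contains(FloatType) && convertible.contains(DoubleType)) Some(DoubleType)
    else found
  }
}

// ClosestTypeSketch.closest(IntegerType, Seq(BinaryType, FloatType, LongType))
// returns Some(LongType): FloatType and LongType are both convertible, and only
// LongType widens to every other convertible type (Long -> Float is allowed,
// Float -> Long is not). This mirrors the first new test case below.
```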
@@ -345,8 +345,6 @@ class AnsiTypeCoercionSuite extends AnalysisTest {
  }

  test("eligible implicit type cast - TypeCollection") {
-    shouldCast(NullType, TypeCollection(StringType, BinaryType), StringType)
-
    shouldCast(StringType, TypeCollection(StringType, BinaryType), StringType)
    shouldCast(BinaryType, TypeCollection(StringType, BinaryType), BinaryType)
    shouldCast(StringType, TypeCollection(BinaryType, StringType), StringType)
@@ -356,17 +354,10 @@
    shouldCast(BinaryType, TypeCollection(BinaryType, IntegerType), BinaryType)
    shouldCast(BinaryType, TypeCollection(IntegerType, BinaryType), BinaryType)

-    shouldNotCast(IntegerType, TypeCollection(StringType, BinaryType))
-    shouldNotCast(IntegerType, TypeCollection(BinaryType, StringType))
-
    shouldCast(DecimalType.SYSTEM_DEFAULT,
      TypeCollection(IntegerType, DecimalType), DecimalType.SYSTEM_DEFAULT)
    shouldCast(DecimalType(10, 2), TypeCollection(IntegerType, DecimalType), DecimalType(10, 2))
    shouldCast(DecimalType(10, 2), TypeCollection(DecimalType, IntegerType), DecimalType(10, 2))
-    shouldNotCast(IntegerType, TypeCollection(DecimalType(10, 2), StringType))
-
-    shouldNotCastStringInput(TypeCollection(NumericType, BinaryType))
-    shouldCastStringLiteral(TypeCollection(NumericType, BinaryType), DoubleType)

    shouldCast(
      ArrayType(StringType, false),
@@ -377,10 +368,26 @@
      ArrayType(StringType, true),
      TypeCollection(ArrayType(StringType), StringType),
      ArrayType(StringType, true))

+    // When there are multiple convertible types in the `TypeCollection`, use the closest
+    // convertible data type among the convertible types.

Contributor: nit: closet -> closest
+    shouldCast(IntegerType, TypeCollection(BinaryType, FloatType, LongType), LongType)
+    shouldCast(IntegerType, TypeCollection(BinaryType, LongType, NumericType), IntegerType)
Contributor: This doesn't go to the code path that picks the closest convertible type. It hits `case _ if expectedType.acceptsType(inType) => Some(inType)`.

Member Author: right
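The short-circuit the reviewer points out can be checked directly: `NumericType` accepts `IntegerType`, so the collection accepts the input as-is and the closest-type search never runs. A rough check (assumed; `TypeCollection` and `acceptsType` are `private[sql]`, so this only compiles inside Spark's own packages):

```scala
import org.apache.spark.sql.types._

// NumericType.acceptsType(IntegerType) is true, so the first rule
// `case _ if expectedType.acceptsType(inType) => Some(inType)` fires
// before the TypeCollection branch is ever reached.
TypeCollection(BinaryType, LongType, NumericType).acceptsType(IntegerType)  // true
```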

+    // If the result is FloatType and DoubleType is also among the convertible target types,
+    // use DoubleType instead of FloatType.
+    shouldCast(LongType, TypeCollection(FloatType, DoubleType, StringType), DoubleType)
  }

+  test("ineligible implicit type cast - TypeCollection") {
+    shouldNotCast(IntegerType, TypeCollection(StringType, BinaryType))
+    shouldNotCast(IntegerType, TypeCollection(BinaryType, StringType))
+    shouldNotCast(IntegerType, TypeCollection(DateType, TimestampType))
+    shouldNotCast(IntegerType, TypeCollection(DecimalType(10, 2), StringType))
+    shouldNotCastStringInput(TypeCollection(NumericType, BinaryType))
+    // When there are multiple convertible types in the `TypeCollection` but no closest
+    // convertible data type among them, the cast should fail:
+    shouldNotCastStringLiteral(TypeCollection(NumericType, BinaryType))
Contributor: This duplicates the previous case.

Member Author: This one is for a String literal, which can be converted to any of (NumericType, BinaryType). The previous one is for a String type column, which can't do the conversion.
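To make the literal/column distinction concrete, a hypothetical spark-shell session with `spark.sql.ansi.enabled=true` (assumed behavior based on the literal-only rule above, not output from this PR):

```scala
spark.conf.set("spark.sql.ansi.enabled", "true")

// A String *literal* may be implicitly cast to a numeric type:
spark.sql("SELECT abs('-1.5')").show()                          // literal is cast, yields 1.5
// A String *column* gets no such implicit cast, so analysis fails:
spark.sql("SELECT abs(s) FROM VALUES ('-1.5') AS t(s)").show()  // AnalysisException
```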

+    shouldNotCast(NullType, TypeCollection(IntegerType, StringType))
+  }

  test("tightest common bound for types") {
@@ -73,8 +73,8 @@ select left(null, -2), left("abcd", -2), left("abcd", 0), left("abcd", 'a')
-- !query schema
struct<>
-- !query output
-java.lang.NumberFormatException
-invalid input syntax for type numeric: a
+org.apache.spark.sql.AnalysisException
+cannot resolve 'substring(NULL, 1, -2)' due to data type mismatch: argument 1 requires (string or binary) type, however, 'NULL' is of null type.; line 1 pos 7

Contributor: Can we move `left(null, -2)` out to a new query? Otherwise we can't test the error message of `left("abcd", 'a')`.


-- !query
@@ -92,7 +92,7 @@ select right(null, -2), right("abcd", -2), right("abcd", 0), right("abcd", 'a')
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
-cannot resolve 'substring('abcd', (- CAST('a' AS DOUBLE)), 2147483647)' due to data type mismatch: argument 2 requires int type, however, '(- CAST('a' AS DOUBLE))' is of double type.; line 1 pos 61
+cannot resolve 'substring(NULL, (- -2), 2147483647)' due to data type mismatch: argument 1 requires (string or binary) type, however, 'NULL' is of null type.; line 1 pos 7

Contributor: ditto



-- !query
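The expected-output changes above follow directly from the `AnsiTypeCoercion` change: `NULL` no longer defaults to the first type of a `TypeCollection`, so `left(null, -2)` now fails analysis instead of passing it and failing later at runtime on `left("abcd", 'a')`. A hypothetical repro, consistent with the expected output above:

```scala
spark.conf.set("spark.sql.ansi.enabled", "true")

spark.sql("SELECT left(null, -2)")
// org.apache.spark.sql.AnalysisException: cannot resolve 'substring(NULL, 1, -2)'
// due to data type mismatch: argument 1 requires (string or binary) type,
// however, 'NULL' is of null type.
```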
@@ -130,7 +130,7 @@ select concat_ws(',',10,20,null,30)
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
-cannot resolve 'concat_ws(',', 10, 20, NULL, 30)' due to data type mismatch: argument 2 requires (array<string> or string) type, however, '10' is of int type. argument 3 requires (array<string> or string) type, however, '20' is of int type. argument 5 requires (array<string> or string) type, however, '30' is of int type.; line 1 pos 7
+cannot resolve 'concat_ws(',', 10, 20, NULL, 30)' due to data type mismatch: argument 2 requires (array<string> or string) type, however, '10' is of int type. argument 3 requires (array<string> or string) type, however, '20' is of int type. argument 4 requires (array<string> or string) type, however, 'NULL' is of null type. argument 5 requires (array<string> or string) type, however, '30' is of int type.; line 1 pos 7


-- !query
@@ -139,7 +139,7 @@ select concat_ws('',10,20,null,30)
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
-cannot resolve 'concat_ws('', 10, 20, NULL, 30)' due to data type mismatch: argument 2 requires (array<string> or string) type, however, '10' is of int type. argument 3 requires (array<string> or string) type, however, '20' is of int type. argument 5 requires (array<string> or string) type, however, '30' is of int type.; line 1 pos 7
+cannot resolve 'concat_ws('', 10, 20, NULL, 30)' due to data type mismatch: argument 2 requires (array<string> or string) type, however, '10' is of int type. argument 3 requires (array<string> or string) type, however, '20' is of int type. argument 4 requires (array<string> or string) type, however, 'NULL' is of null type. argument 5 requires (array<string> or string) type, however, '30' is of int type.; line 1 pos 7


-- !query
@@ -148,7 +148,7 @@ select concat_ws(NULL,10,20,null,30) is null
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
-cannot resolve 'concat_ws(CAST(NULL AS STRING), 10, 20, NULL, 30)' due to data type mismatch: argument 2 requires (array<string> or string) type, however, '10' is of int type. argument 3 requires (array<string> or string) type, however, '20' is of int type. argument 5 requires (array<string> or string) type, however, '30' is of int type.; line 1 pos 7
+cannot resolve 'concat_ws(CAST(NULL AS STRING), 10, 20, NULL, 30)' due to data type mismatch: argument 2 requires (array<string> or string) type, however, '10' is of int type. argument 3 requires (array<string> or string) type, however, '20' is of int type. argument 4 requires (array<string> or string) type, however, 'NULL' is of null type. argument 5 requires (array<string> or string) type, however, '30' is of int type.; line 1 pos 7


-- !query