[SPARK-39015][SQL] Remove the usage of toSQLValue(v) without an explicit type #36351
HyukjinKwon wants to merge 1 commit into apache:master
Conversation
cc @gengliangwang @MaxGekk FYI
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryErrorsBase.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/AnsiCastSuiteBase.scala
When I added two methods
If you need to pass an internal value somewhere, just use the first method. I didn't get why you need to remove the second one.
Actually, I think this is error-prone (e.g., see the reported and fixed case here).
Now you have to convert `String` to `UTF8String` (and maybe other values) everywhere, which is inconvenient.
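For context, the two overloads under discussion can be sketched roughly as follows. This is a simplified assumption of the shape of the methods in `QueryErrorsBase.scala`, not the exact implementation:

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.DataType

// Sketch only: the real methods live in QueryErrorsBase.scala and may differ.
// Overload 1: takes a Catalyst-internal value together with its DataType,
// so no type inference is needed -- safe for internal values.
def toSQLValue(v: Any, t: DataType): String = Literal.create(v, t).sql

// Overload 2: infers the type via Literal.apply, which only understands
// external Scala/Java types; an internal value such as UTF8String makes it
// throw UNSUPPORTED_FEATURE instead of rendering the literal.
def toSQLValue(v: Any): String = Literal(v).sql
```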
Can't you just use the correct function without removing the other one? And add comments to the functions. Removing the second function seems unrelated to the fix.
We actually don't need to convert. Both
This is another example of being error-prone.
I thought it's easier to have one function to use everywhere vs. having two. I can separate out the fix if you still think it's arguable.
It isn't needed if it is possible to point out the type in all cases and
Here, all external types are listed. If some are missing, it's a bug. They are converted to
There seem to be two cases missing
There's another case this PR fixes.
+1, LGTM. Merging to master.
@HyukjinKwon Could you backport the changes to branch-3.3, please?
sure
[SPARK-39015][SQL] Remove the usage of toSQLValue(v) without an explicit type

### What changes were proposed in this pull request?

This PR proposes to remove the usage of `toSQLValue(v)` without an explicit type.
`Literal(v)` is intended to be used from end-users so it cannot handle our internal types such as `UTF8String` and `ArrayBasedMapData`. Using this method can lead to unexpected error messages such as:
```
Caused by: org.apache.spark.SparkRuntimeException: [UNSUPPORTED_FEATURE] The feature is not supported: literal for 'hair' of class org.apache.spark.unsafe.types.UTF8String.
at org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:241)
at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:99)
at org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue(QueryErrorsBase.scala:45)
...
```
Since it is impossible to derive the corresponding data type from an internal value, as one internal type can map to multiple external types (e.g., `Long` for `Timestamp`, `TimestampNTZ`, and `LongType`), the removal approach was taken.
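The ambiguity can be illustrated with a minimal sketch (assuming a Spark shell with the Catalyst classes available): the same internal `Long` value renders as three different SQL literals depending on the declared type, so the reverse mapping from value to type is not well-defined.

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.{LongType, TimestampType, TimestampNTZType}

// One internal Long, three possible external meanings:
val micros = 1650000000000000L  // microseconds since the epoch
Literal(micros, LongType).sql          // a plain BIGINT literal
Literal(micros, TimestampType).sql     // a TIMESTAMP '...' literal
Literal(micros, TimestampNTZType).sql  // a TIMESTAMP_NTZ '...' literal
```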
### Why are the changes needed?

To provide the error messages as intended.
### Does this PR introduce _any_ user-facing change?

Yes.
```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DataTypes, StringType, StructType}

val arrayStructureData = Seq(
  Row(Map("hair" -> "black", "eye" -> "brown")),
  Row(Map("hair" -> "blond", "eye" -> "blue")),
  Row(Map()))
val mapType = DataTypes.createMapType(StringType, StringType)
val arrayStructureSchema = new StructType().add("properties", mapType)
val mapTypeDF = spark.createDataFrame(
  spark.sparkContext.parallelize(arrayStructureData), arrayStructureSchema)

spark.conf.set("spark.sql.ansi.enabled", true)
mapTypeDF.selectExpr("element_at(properties, 'hair')").show
```
Before:
```
Caused by: org.apache.spark.SparkRuntimeException: [UNSUPPORTED_FEATURE] The feature is not supported: literal for 'hair' of class org.apache.spark.unsafe.types.UTF8String.
at org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:241)
at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:99)
at org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue(QueryErrorsBase.scala:45)
...
```
After:
```
Caused by: org.apache.spark.SparkNoSuchElementException: [MAP_KEY_DOES_NOT_EXIST] Key 'hair' does not exist. To return NULL instead, use 'try_element_at'. If necessary set spark.sql.ansi.enabled to false to bypass this error.
== SQL(line 1, position 0) ==
element_at(properties, 'hair')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
### How was this patch tested?

A unit test was added. Otherwise, existing test cases should cover the change.
Closes apache#36351 from HyukjinKwon/SPARK-39015.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit e49147a)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
[SPARK-39015][SQL][3.3] Remove the usage of toSQLValue(v) without an explicit type

### What changes were proposed in this pull request?

This PR is a backport of #36351: it removes the usage of `toSQLValue(v)` without an explicit type.

### Why are the changes needed?

To provide the error messages as intended.

### Does this PR introduce _any_ user-facing change?

Yes; see #36351 for the before/after examples.

### How was this patch tested?

A unit test was added. Otherwise, existing test cases should cover the change.

Closes #36375 from HyukjinKwon/SPARK-39015-3.3.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>