[SPARK-39015][SQL][3.3] Remove the usage of toSQLValue(v) without an explicit type #36375

Closed
HyukjinKwon wants to merge 2 commits into apache:branch-3.3 from HyukjinKwon:SPARK-39015-3.3

@HyukjinKwon
Member

What changes were proposed in this pull request?

This PR is a backport of #36351

This PR proposes to remove the usage of toSQLValue(v) without an explicit type.

Literal(v) is intended for end users, so it cannot handle our internal types such as UTF8String and ArrayBasedMapData. Using this method can lead to unexpected error messages such as:

Caused by: org.apache.spark.SparkRuntimeException: [UNSUPPORTED_FEATURE] The feature is not supported: literal for 'hair' of class org.apache.spark.unsafe.types.UTF8String.
  at org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:241)
  at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:99)
  at org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue(QueryErrorsBase.scala:45)
  ...
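
The failure mode can be pictured with a small, self-contained sketch. It is not Spark code: `InternalString` and `literalOf` are hypothetical stand-ins for an internal type such as UTF8String and for a constructor that infers the literal from the runtime class alone.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for an internal string representation.
final case class InternalString(bytes: Array[Byte]) {
  override def toString: String = new String(bytes, StandardCharsets.UTF_8)
}

object LiteralSketch {
  // Renders a value by looking only at its runtime class.
  // Only user-facing (external) classes are recognized.
  def literalOf(v: Any): String = v match {
    case s: String => s"'$s'"
    case i: Int    => i.toString
    case l: Long   => s"${l}L"
    case other =>
      // An internal value reaches this branch and hides the intended error.
      throw new UnsupportedOperationException(
        s"literal for '$other' of class ${other.getClass.getName}")
  }
}
```

Passing an external `String` to `literalOf` works, while passing an `InternalString` throws, which mirrors how the intended error message was replaced by an unrelated one.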

Since it is impossible to derive the corresponding data type from an internal type, as one internal type can map to multiple external types (e.g., Long for Timestamp, TimestampNTZ, and LongType), the removal approach was taken.
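
The ambiguity can be sketched as follows. This is a simplified illustration under stated assumptions, not Spark's implementation: `SqlType`, its three cases, and `toSqlValue` are hypothetical names, and the rendered strings are only meant to show that the same `Long` needs an explicit type to be formatted correctly.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.temporal.ChronoUnit

object TypedRenderSketch {
  // Hypothetical, simplified stand-ins for the data types named above.
  sealed trait SqlType
  case object LongT extends SqlType
  case object TimestampT extends SqlType    // microseconds since the epoch
  case object TimestampNtzT extends SqlType // microseconds, no time zone

  // The caller must say which type the raw Long represents.
  def toSqlValue(v: Long, t: SqlType): String = t match {
    case LongT => s"${v}L"
    case TimestampT =>
      val instant = Instant.EPOCH.plus(v, ChronoUnit.MICROS)
      s"TIMESTAMP '$instant'"
    case TimestampNtzT =>
      val instant = Instant.EPOCH.plus(v, ChronoUnit.MICROS)
      s"TIMESTAMP_NTZ '${LocalDateTime.ofInstant(instant, ZoneOffset.UTC)}'"
  }
}
```

Calling `toSqlValue` with the same value and each of the three types yields three different strings, so a version without the type argument has no way to pick the right one.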

Why are the changes needed?

To provide the error messages as intended.

Does this PR introduce any user-facing change?

Yes.

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DataTypes, StringType, StructType}

// Three rows with a map column; the last row has no "hair" key.
val arrayStructureData = Seq(
  Row(Map("hair" -> "black", "eye" -> "brown")),
  Row(Map("hair" -> "blond", "eye" -> "blue")),
  Row(Map()))

val mapType = DataTypes.createMapType(StringType, StringType)

val arrayStructureSchema = new StructType().add("properties", mapType)

val mapTypeDF = spark.createDataFrame(
  spark.sparkContext.parallelize(arrayStructureData), arrayStructureSchema)

// With ANSI mode on, element_at fails on a missing map key instead of returning NULL.
spark.conf.set("spark.sql.ansi.enabled", true)
mapTypeDF.selectExpr("element_at(properties, 'hair')").show

Before:

Caused by: org.apache.spark.SparkRuntimeException: [UNSUPPORTED_FEATURE] The feature is not supported: literal for 'hair' of class org.apache.spark.unsafe.types.UTF8String.
  at org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:241)
  at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:99)
  at org.apache.spark.sql.errors.QueryErrorsBase.toSQLValue(QueryErrorsBase.scala:45)
  ...

After:

Caused by: org.apache.spark.SparkNoSuchElementException: [MAP_KEY_DOES_NOT_EXIST] Key 'hair' does not exist. To return NULL instead, use 'try_element_at'. If necessary set spark.sql.ansi.enabled to false to bypass this error.
== SQL(line 1, position 0) ==
element_at(properties, 'hair')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

How was this patch tested?

A unit test was added. Otherwise, existing test cases should cover the change.

Closes apache#36351 from HyukjinKwon/SPARK-39015.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit e49147a)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@HyukjinKwon HyukjinKwon changed the title [SPARK-39015][SQL] Remove the usage of toSQLValue(v) without an explicit type [SPARK-39015][SQL][3.3] Remove the usage of toSQLValue(v) without an explicit type Apr 27, 2022
@github-actions github-actions bot added the SQL label Apr 27, 2022
@MaxGekk
Member

MaxGekk commented Apr 27, 2022

+1, LGTM. Merging to 3.3.
Thank you, @HyukjinKwon.

MaxGekk pushed a commit that referenced this pull request Apr 27, 2022

Closes #36375 from HyukjinKwon/SPARK-39015-3.3.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@MaxGekk MaxGekk closed this Apr 27, 2022
@HyukjinKwon HyukjinKwon deleted the SPARK-39015-3.3 branch January 15, 2024 00:50