
[SPARK-40066][SQL] ANSI mode: always return null on invalid access to map column #37503

Closed

Conversation

gengliangwang
Member

What changes were proposed in this pull request?

Change the behavior of map column access under ANSI mode: always return null instead of throwing the MAP_KEY_DOES_NOT_EXIST error on an invalid access.
This PR also removes the internal configuration spark.sql.ansi.strictIndexOperator.

Why are the changes needed?

Since #30386, Spark always throws an error on invalid access to a map column under ANSI mode. However, there is no such behavior in the ANSI SQL standard, since the standard has no map type; the closest type, multiset, returns null when a non-existing element is accessed.
I also investigated PostgreSQL/Snowflake/BigQuery, and all of them return null when a map (JSON) key does not exist.
I suggest loosening the behavior here: when users hit the error, most of them will just switch to try_element_at() or simply turn off ANSI SQL mode.
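
For illustration, here is a minimal sketch (not part of this PR's diff) of the proposed behavior, assuming a local SparkSession with spark.sql.ansi.enabled set to true:

```scala
import org.apache.spark.sql.SparkSession

object MapAccessSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    spark.conf.set("spark.sql.ansi.enabled", "true")

    // Before this change, accessing a non-existing key under ANSI mode threw
    // MAP_KEY_DOES_NOT_EXIST; with this change the query returns NULL.
    spark.sql("SELECT map('a', 1)['b']").show()

    // try_element_at already returns NULL for a non-existing key.
    spark.sql("SELECT try_element_at(map('a', 1), 'b')").show()

    spark.stop()
  }
}
```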

Does this PR introduce any user-facing change?

Yes, see above

How was this patch tested?

Unit tests

@gengliangwang
Member Author

cc @leanken-zz as well

```
@@ -2943,15 +2943,6 @@ object SQLConf {
      .booleanConf
      .createWithDefault(false)

  val ANSI_STRICT_INDEX_OPERATOR = buildConf("spark.sql.ansi.strictIndexOperator")
```
Contributor

we should put it in removedSQLConfigs
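
For context, removed configs are registered in SQLConf.removedSQLConfigs so that setting them produces a helpful error. A hedged, illustrative sketch of what such an entry conveys (using a local mirror of the RemovedConfig shape; the version string and message in the actual PR may differ):

```scala
object RemovedConfSketch {
  // Illustrative mirror of the RemovedConfig shape used by SQLConf.removedSQLConfigs;
  // the real entry lives inside SQLConf.scala and its exact wording may differ.
  case class RemovedConfig(key: String, version: String, defaultValue: String, comment: String)

  val strictIndexOperatorRemoved: RemovedConfig = RemovedConfig(
    key = "spark.sql.ansi.strictIndexOperator",
    version = "3.4.0", // assumed target version
    defaultValue = "true",
    comment = "Invalid access to a map column under ANSI mode now always returns null.")
}
```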

Member Author

Sure, updated

@gengliangwang
Copy link
Member Author

@cloud-fan thanks for the review. Merging to master

HyukjinKwon added a commit that referenced this pull request Aug 17, 2022
…etting its dataType

### What changes were proposed in this pull request?

This PR is a followup of #37503 that adds a check if the `ElementAt` expression is resolved or not before getting its dataType.

### Why are the changes needed?

To make the tests pass with ANSI enabled. Currently it fails (https://github.com/apache/spark/runs/7870131749?check_suite_focus=true) as below:

```
[info] - map_filter *** FAILED *** (243 milliseconds)
[info]  org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object
[info]  at org.apache.spark.sql.catalyst.expressions.UnresolvedNamedLambdaVariable.dataType(higherOrderFunctions.scala:46)
[info]  at org.apache.spark.sql.catalyst.expressions.ElementAt.initQueryContext(collectionOperations.scala:2275)
[info]  at org.apache.spark.sql.catalyst.expressions.SupportQueryContext.$init$(Expression.scala:603)
[info]  at org.apache.spark.sql.catalyst.expressions.ElementAt.<init>(collectionOperations.scala:2105)
[info]  at org.apache.spark.sql.functions$.element_at(functions.scala:3958)
[info]  at org.apache.spark.sql.DataFrameFunctionsSuite.$anonfun$new$452(DataFrameFunctionsSuite.scala:2476)
[info]  at org.apache.spark.sql.functions$.createLambda(functions.scala:4029)
[info]  at org.apache.spark.sql.functions$.map_filter(functions.scala:4256)
[info]  at org.apache.spark.sql.DataFrameFunctionsSuite.$anonfun$new$451(DataFrameFunctionsSuite.scala:2476)
[info]  at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:133)
[info]  at org.apache.spark.sql.DataFrameFunctionsSuite.$anonfun$new$445(DataFrameFunctionsSuite.scala:2478)
[info]  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]  at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]  at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]  at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
[info]  at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:204)
[info]  at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
```

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually tested with ANSI mode enabled.

Closes #37548 from HyukjinKwon/SPARK-40066.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
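
As an aside, here is a self-contained, hypothetical sketch of the pattern the follow-up describes (guard any dataType access behind resolved), using illustrative names rather than the actual Catalyst classes:

```scala
// Hypothetical mini-model, not Catalyst code: dataType throws on unresolved
// expressions, so any hook that runs eagerly must check `resolved` first.
trait Expr {
  def resolved: Boolean
  def dataType: String
}

case class UnresolvedVar(name: String) extends Expr {
  val resolved = false
  def dataType: String =
    throw new UnsupportedOperationException(s"Invalid call to dataType on unresolved object $name")
}

case class MapValue() extends Expr {
  val resolved = true
  val dataType = "map"
}

object ResolvedGuardSketch extends App {
  // The guarded hook: only consult dataType once the expression is resolved.
  def initQueryContextLike(e: Expr): Option[String] =
    if (e.resolved && e.dataType == "map") Some("query context") else None

  assert(initQueryContextLike(UnresolvedVar("x")).isEmpty) // no exception thrown
  assert(initQueryContextLike(MapValue()).isDefined)
}
```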