Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-40308][SQL] Allow non-foldable delimiter arguments to str_to_map function #37763

Closed
wants to merge 3 commits into from

Conversation

bersprockets
Copy link
Contributor

@bersprockets bersprockets commented Sep 1, 2022

What changes were proposed in this pull request?

Remove the check for foldable delimiter arguments from StringToMap#checkInputDataTypes.

Except for checkInputDataTypes, StringToMap is already able to handle non-foldable delimiter arguments (no other changes are required).

Why are the changes needed?

Hive 2.3.9 allows non-foldable delimiter arguments, e.g.:

drop table if exists maptbl;
create table maptbl as select ',' as del1, ':' as del2, 'a:1,b:2,c:3' as str;
insert into table maptbl select '%' as del1, '-' as del2, 'a-1%b-2%c-3' as str;
select str, str_to_map(str, del1, del2) from maptbl;

This returns

+--------------+----------------------------+
|     str      |            _c1             |
+--------------+----------------------------+
| a:1,b:2,c:3  | {"a":"1","b":"2","c":"3"}  |
| a-1%b-2%c-3  | {"a":"1","b":"2","c":"3"}  |
+--------------+----------------------------+
2 rows selected (0.13 seconds)

However, Spark returns an error:

str_to_map's delimiters must be foldable.; line 1 pos 12;

The use-case is more likely to be something like this:

select
  str,
  str_to_map(str, ',', if(region = 0, ':', '#')) as m
from
  maptbl2;

Does this PR introduce any user-facing change?

Yes, users can now specify non-foldable delimiter arguments to str_to_map.

Literals are still accepted, so the change is backwardly compatible.

How was this patch tested?

New unit tests.

@github-actions github-actions bot added the SQL label Sep 1, 2022
@@ -606,6 +606,36 @@ class StringFunctionsSuite extends QueryTest with SharedSparkSession {
Seq(Row(Map("a" -> "1", "b" -> "2", "c" -> "3")))
)

checkAnswer(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a test of my change, but it was a missing test (the case where a literal pair delimiter is given but no key/value delimiter is given).

@MaxGekk
Copy link
Member

MaxGekk commented Sep 2, 2022

@cloud-fan I wonder why did you insist in #13990 (comment) that the delimiters must be foldable.

} else {
super.checkInputDataTypes()
}
super.checkInputDataTypes()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you invoke only the parent method, let's remove checkInputDataTypes() then.

@MaxGekk
Copy link
Member

MaxGekk commented Sep 3, 2022

+1, LGTM. Merging to master.
Thank you, @bersprockets and @HyukjinKwon for review.

@MaxGekk MaxGekk closed this in 1431975 Sep 3, 2022
@bersprockets bersprockets deleted the str_to_map_play branch November 1, 2022 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants