Skip to content

[SPARK-47352][SQL] Fix Upper, Lower, InitCap collation awareness#46104

Closed
mihailomilosevic2001 wants to merge 14 commits intoapache:masterfrom
mihailomilosevic2001:SPARK-47352
Closed

[SPARK-47352][SQL] Fix Upper, Lower, InitCap collation awareness#46104
mihailomilosevic2001 wants to merge 14 commits intoapache:masterfrom
mihailomilosevic2001:SPARK-47352

Conversation

@mihailomilosevic2001
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Add support for Locale aware expressions.

Why are the changes needed?

This is needed as some future collations might use different Locales then default.

Does this PR introduce any user-facing change?

Yes, we follow ICU implementations for collations that are non native.

How was this patch tested?

Tests for Upper, Lower and InitCap already exist.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Apr 17, 2024
# Conflicts:
#	common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
assertInitCap("ÄBĆΔE", "UTF8_BINARY_LCASE", "Äbćδe");
assertInitCap("aB世De", "UNICODE", "Ab世de");
assertInitCap("ÄBĆΔE", "UNICODE", "Äbćδe");
assertInitCap("aB世De", "UNICODE_CI", "Ab世De");
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan Should we just proceed with this ICU result? I am would expect Ab世de, but ICU seems to do conversion this way when we use UNICODE_CI.

public static UTF8String execBinary(final UTF8String v) {
return v.toUpperCase();
}
public static UTF8String execLowercase(final UTF8String v) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as execBinary?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged these two to execUTF8.

@cloud-fan
Copy link
Copy Markdown
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in b9f2270 Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants