[SPARK-47359][SQL] Support TRANSLATE function to work with collated strings #45820

Closed. miland-db wanted to merge 30 commits.

miland-db
Contributor

What changes were proposed in this pull request?

Extend built-in string functions to support non-binary, non-lowercase collation for: translate

Why are the changes needed?

Update collation support for built-in string functions in Spark.

Does this PR introduce any user-facing change?

Yes, users should now be able to use COLLATE within arguments for built-in string function TRANSLATE in Spark SQL queries, using non-binary collations such as UNICODE_CI.
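
To make the intended behaviour concrete, here is a rough, self-contained Python sketch of a case-insensitive translate. It is not the code in this PR and not Spark's API: the helper name `translate_ci` is hypothetical, and `str.casefold()` is only a stand-in for a real case-insensitive collation such as UNICODE_CI. The "first occurrence wins" and "delete when there is no replacement" rules are assumptions of the sketch.

```python
def translate_ci(source: str, matching: str, replace: str) -> str:
    """Translate `source`, comparing characters case-insensitively.

    Hypothetical helper for illustration only: casefold() approximates a
    case-insensitive collation; real collation-aware matching is more involved.
    """
    # Map each folded character of `matching` to its replacement.
    # A value of None means "delete this character" (no replacement given).
    mapping = {}
    for i, ch in enumerate(matching):
        key = ch.casefold()
        if key not in mapping:  # assumption: first occurrence wins
            mapping[key] = replace[i] if i < len(replace) else None

    out = []
    for ch in source:
        key = ch.casefold()
        if key in mapping:
            if mapping[key] is not None:
                out.append(mapping[key])
            # else: character is dropped
        else:
            out.append(ch)  # unmatched characters pass through unchanged
    return "".join(out)


if __name__ == "__main__":
    print(translate_ci("Translate", "Rnlt", "1234"))
```

In this sketch, `translate_ci("Translate", "Rnlt", "1234")` yields `41a2s3a4e`: both `T` and `t` match the `t` in the matching string, which a purely binary comparison would not do.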

How was this patch tested?

Unit tests for queries using StringTranslate (CollationStringExpressionsSuite.scala).

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Apr 2, 2024
@miland-db
Contributor Author

Adding Belgrade collation crew if anyone would like to review this: @uros-db @stefankandic @mihailom-db @dbatomic

@HyukjinKwon
Member

Let's follow https://github.com/databricks/scala-style-guide and remove unrelated changes, e.g., added newlines, which make cherry-picking, backporting, and reverting difficult.

@uros-db
Contributor

uros-db commented Apr 11, 2024

heads up: we’ve done some major code restructuring in #45978, so please sync these changes before moving on

@miland-db you’ll likely need to rewrite the code in this PR, so please follow the guidelines outlined in https://issues.apache.org/jira/browse/SPARK-47410

# Conflicts:
#	common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java
#	sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala
# Conflicts:
#	sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala
# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
# Conflicts:
#	common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
Contributor

@uros-db uros-db left a comment

just flagging this PR will likely need a fix for the ICU implementation

# Conflicts:
#	common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
# Conflicts:
#	common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
Contributor

@uros-db uros-db left a comment

lgtm

@miland-db miland-db requested a review from uros-db April 30, 2024 13:46
@miland-db
Contributor Author

@cloud-fan please review

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 0329479 Apr 30, 2024
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…trings

Closes apache#45820 from miland-db/miland-db/string-translate.

Authored-by: Milan Dankovic <milan.dankovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
6 participants