[SPARK-47359][SQL] Support TRANSLATE function to work with collated strings #45820
Adding Belgrade collation crew if anyone would like to review this: @uros-db @stefankandic @mihailom-db @dbatomic
Let's follow https://github.com/databricks/scala-style-guide and remove unrelated changes, e.g., added newlines, which make cherry-picking, backporting, and reverting difficult.
# Conflicts: # sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala
Heads up: we’ve done some major code restructuring in #45978, so please sync these changes before moving on. @miland-db, you’ll likely need to rewrite the code in this PR, so please follow the guidelines outlined in https://issues.apache.org/jira/browse/SPARK-47410.
# Conflicts: # common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java # sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala
# Conflicts: # sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala
# Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
# Conflicts: # common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
Just flagging: this PR will likely need a fix for the ICU implementation.
# Conflicts: # common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala
# Conflicts: # common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
lgtm
@cloud-fan please review
thanks, merging to master!
…trings
Closes apache#45820 from miland-db/miland-db/string-translate.
Authored-by: Milan Dankovic <milan.dankovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Extend built-in string functions to support non-binary, non-lowercase collation for:
translate
Why are the changes needed?
Update collation support for built-in string functions in Spark.
Does this PR introduce any user-facing change?
Yes, users should now be able to use COLLATE within the arguments of the built-in string function TRANSLATE in Spark SQL queries, using non-binary collations such as UNICODE_CI.
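To illustrate what case-insensitive matching means for TRANSLATE, here is a minimal, self-contained Java sketch. It is not the code from this PR and not Spark's API: the class `CaseInsensitiveTranslateSketch` and the helper `translateIgnoreCase` are hypothetical names, and simple per-code-point lowercasing is assumed as a stand-in for a real collation such as UNICODE_CI, which Spark would resolve through its own collation support. The sketch only shows the general idea: characters in the input are matched against the matching string while ignoring case, then replaced by the character at the same position in the replacement string, or dropped when the replacement string is shorter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; NOT Spark's implementation of TRANSLATE.
final class CaseInsensitiveTranslateSketch {

    // Marker meaning "this matched character has no replacement, so drop it".
    private static final int DELETE = -1;

    // Assumption: case-insensitivity is approximated by lowercasing each code point.
    static String translateIgnoreCase(String input, String matching, String replace) {
        int[] from = matching.codePoints().toArray();
        int[] to = replace.codePoints().toArray();

        // Map each lowercased matching code point to its replacement (or DELETE).
        Map<Integer, Integer> dict = new HashMap<>();
        for (int i = 0; i < from.length; i++) {
            int key = Character.toLowerCase(from[i]);
            int value = i < to.length ? to[i] : DELETE;
            dict.putIfAbsent(key, value); // the first occurrence of a key wins
        }

        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            Integer mapped = dict.get(Character.toLowerCase(cp));
            if (mapped == null) {
                out.appendCodePoint(cp);     // no match: keep the original character
            } else if (mapped != DELETE) {
                out.appendCodePoint(mapped); // match: emit the replacement
            }                                // match without replacement: drop it
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // 'T' and 't' both match the 't' entry, so both become '4'.
        System.out.println(translateIgnoreCase("Translate", "Rnlt", "1234")); // 41a2s3a4e
        // A shorter replacement string deletes the unmatched tail ('l' and 't').
        System.out.println(translateIgnoreCase("Translate", "Rnlt", "12"));   // 1a2sae
    }
}
```

Under a binary collation the uppercase `T` would not match the lowercase `t` in the matching string and would be left unchanged; the sketch above treats them as equal, which is the kind of behavior a case-insensitive collation is meant to provide.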
How was this patch tested?
Unit tests for queries using StringTranslate (CollationStringExpressionsSuite.scala).
Was this patch authored or co-authored using generative AI tooling?
No