[DRAFT][SQL] Eliminate unnecessary COLLATE expressions in query analysis #46421

Draft
wants to merge 2 commits into base: master


uros-db
Contributor

@uros-db uros-db commented May 7, 2024

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

@github-actions github-actions bot added the SQL label May 7, 2024
@uros-db uros-db changed the title [WIP][SQL] Eliminate unnecessary COLLATE expressions in query analysis [WIP][SPARK-48156][SQL] Eliminate unnecessary COLLATE expressions in query analysis May 7, 2024
*/
object EliminateCollates extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions {
    case Collate(child, collation) if child.dataType.sameType(StringType(collation)) =>
Contributor

This seems like a nice solution, but unfortunately we can't do it in this setting. If we blindly remove Collate expressions everywhere, we change the priority of the resulting StringType: its collation is no longer explicit, only implicit or default. That is incorrect, because the user specifically wrote COLLATE in their query.
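
To make the concern concrete, here is a minimal toy model in plain Scala. It does not use Spark's classes: `Column`, `Collate`, `isExplicit`, `eliminate` and `resolve` are all hypothetical stand-ins, and the resolution rule (explicit beats implicit, two implicit sides must agree) is an assumption made for illustration only.

```scala
// Toy model, not Spark's actual classes: every name below is hypothetical.
object CollatePriorityDemo {
  sealed trait Expr { def collation: String }
  // A column whose collation comes implicitly from its type.
  final case class Column(name: String, collation: String) extends Expr
  // An explicit COLLATE wrapper written by the user.
  final case class Collate(child: Expr, collation: String) extends Expr

  // In this model an expression is explicit only while the Collate node exists.
  def isExplicit(e: Expr): Boolean = e match {
    case _: Collate => true
    case _          => false
  }

  // Mirrors the rule under review: drop Collate when the child already has that collation.
  def eliminate(e: Expr): Expr = e match {
    case Collate(child, c) if child.collation == c => eliminate(child)
    case Collate(child, c)                         => Collate(eliminate(child), c)
    case other                                     => other
  }

  // Assumed resolution: an explicit side wins over an implicit one;
  // otherwise both sides must already agree.
  def resolve(l: Expr, r: Expr): Either[String, String] =
    (isExplicit(l), isExplicit(r)) match {
      case (true, false)                   => Right(l.collation)
      case (false, true)                   => Right(r.collation)
      case _ if l.collation == r.collation => Right(l.collation)
      case _ => Left(s"collation mismatch: ${l.collation} vs ${r.collation}")
    }
}
```

In this model, comparing `Collate(Column("a", "UNICODE"), "UNICODE")` with a column of a different implicit collation resolves to `UNICODE`, but after `eliminate` the same comparison becomes a mismatch, because the explicit marker disappeared together with the node.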

Contributor Author

Could we consider modifying flags in order to preserve priority? That is, remove the Collate expression as this rule does, but also flag the corresponding StringType as "explicit" in priority. That way we maintain the desired priority and also get rid of the extra COLLATE expression.
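
A rough sketch of that idea, again as a toy model and not Spark code: `StrType` and its `explicit` field are hypothetical stand-ins for StringType plus the proposed flag, and the rule only handles a `Collate` directly over a column to keep the example short.

```scala
// Toy model of the "explicit flag" proposal; all names here are hypothetical.
object ExplicitFlagSketch {
  // Stand-in for StringType with an assumed priority marker.
  final case class StrType(collation: String, explicit: Boolean = false)

  sealed trait Expr { def dataType: StrType }
  final case class Column(name: String, dataType: StrType) extends Expr
  final case class Collate(child: Expr, collation: String) extends Expr {
    // A Collate node always yields an explicitly collated string type.
    def dataType: StrType = StrType(collation, explicit = true)
  }

  // Remove a redundant Collate, but move its explicit priority onto the child's type.
  def eliminate(e: Expr): Expr = e match {
    case Collate(Column(name, t), c) if t.collation == c =>
      Column(name, t.copy(explicit = true))
    case other => other // non-redundant or nested cases are left untouched here
  }
}
```

With this shape, `eliminate(Collate(Column("a", StrType("UNICODE")), "UNICODE"))` yields a bare column whose type is still marked explicit, so a later priority check could read the flag instead of looking for the `Collate` node.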

@uros-db uros-db changed the title [WIP][SPARK-48156][SQL] Eliminate unnecessary COLLATE expressions in query analysis [DRAFT][SQL] Eliminate unnecessary COLLATE expressions in query analysis May 7, 2024
@uros-db uros-db marked this pull request as draft May 14, 2024 09:08
2 participants