Skip to content

Conversation

@ilicmarkodb
Copy link
Contributor

What changes were proposed in this pull request?

Fixed propagating default collation to literals, subqueries, etc., in CREATE VIEW ... DEFAULT COLLATION ... query.
The issue was that the saved string used to construct the view did not include the DEFAULT COLLATION ... clause, resulting in the view being created without collation information.

Why are the changes needed?

Bug fix.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tests added to DefaultCollationTestSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Mar 27, 2025
@ilicmarkodb ilicmarkodb force-pushed the fix_subquery_literals_in_views_with_default_collation branch from 4237ea6 to 6af22b3 Compare March 30, 2025 16:17
// and does not include the DEFAULT COLLATION part, resulting in a plan without collation.
val plan = if (metadata.collation.isDefined) {
val newType = StringType(metadata.collation.get)
ResolveDDLCommandStringTypes.transformPlan(parsedPlan, newType)
Copy link
Contributor

@cloud-fan cloud-fan Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work for unresolved parsed plan? I feel it's better to match the View node and transform its child to resolve collation in the rule ResolveDDLCommandStringTypes, which is similar to how we match DDL/DML commands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that created View doesn't pass through ResolveDDLCommandStringTypes. I tried to catch it with debugger.

Copy link
Contributor Author

@ilicmarkodb ilicmarkodb Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When resolving a View, we only resolve its child, meaning the View itself does not go through ResolveDDLCommandStringTypes.
Also, I think that this is clear way to resolve the problem because we will have correct plan from the moment of creating View.

@ilicmarkodb ilicmarkodb requested a review from cloud-fan March 31, 2025 21:55
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm except of minor comments

Comment on lines 417 to 420
| (c1)
| DEFAULT COLLATION sr_ai
| AS SELECT 'Ć' as c1 WHERE 'Ć' = 'C'
|""".stripMargin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should 2 spaces indentation, please, tune your IDE. Also see https://github.com/databricks/scala-style-guide?tab=readme-ov-file#indent

Comment on lines 429 to 431
s"""CREATE VIEW $testView DEFAULT COLLATION UTF8_LCASE
| as SELECT 'a' as c1
|""".stripMargin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: indentation

@MaxGekk
Copy link
Member

MaxGekk commented Apr 4, 2025

+1, LGTM. Merging to master/4.0.
Thank you, @ilicmarkodb and @cloud-fan @dejankrak-db @stefankandic for review.

@MaxGekk MaxGekk closed this in babb950 Apr 4, 2025
MaxGekk pushed a commit that referenced this pull request Apr 4, 2025
…llation

### What changes were proposed in this pull request?
Fixed propagating default collation to literals, subqueries, etc., in `CREATE VIEW ... DEFAULT COLLATION ...` query.
The issue was that the saved string used to construct the view did not include the `DEFAULT COLLATION` ... clause, resulting in the view being created without collation information.

### Why are the changes needed?
Bug fix.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tests added to `DefaultCollationTestSuite`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50436 from ilicmarkodb/fix_subquery_literals_in_views_with_default_collation.

Authored-by: ilicmarkodb <marko.ilic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit babb950)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
summaryzb pushed a commit to summaryzb/spark that referenced this pull request Apr 5, 2025
…llation

### What changes were proposed in this pull request?
Fixed propagating default collation to literals, subqueries, etc., in `CREATE VIEW ... DEFAULT COLLATION ...` query.
The issue was that the saved string used to construct the view did not include the `DEFAULT COLLATION` ... clause, resulting in the view being created without collation information.

### Why are the changes needed?
Bug fix.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tests added to `DefaultCollationTestSuite`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50436 from ilicmarkodb/fix_subquery_literals_in_views_with_default_collation.

Authored-by: ilicmarkodb <marko.ilic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
// Also, table DEFAULT COLLATION cannot be specified through CREATE TABLE AS SELECT command.
case _: V2CreateTablePlan | _: ReplaceTable | _: CreateView | _: AlterViewAs => true
// Check if view has default collation
case _ if AnalysisContext.get.collation.isDefined => true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this implementation is quite wrong, because now we match all the views from SELECT queries in ResolveDDLCommandStringTypes.

image
image
image

@cloud-fan can we reconsider this implementation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean we should only do it if the collation is not the default?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming is DDL specific (isDDLCommand, isCreateOrAlterPlan), but the rule actually changes plans of queries over collated views as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean nested views? every time we resolve a view we set this thread-local collation so it shouldn't be messed up.

We can fix the naming issues later but we do need to apply view default collation to view queries.


// Check if view has default collation
case _ if AnalysisContext.get.collation.isDefined =>
StringType(AnalysisContext.get.collation.get)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilicmarkodb can you please implement this in the single-pass Analyzer? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants