[SPARK-51646][SQL] Fix propagating collation in views with default collation #50436
                ...yst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDDLCommandStringTypes.scala
          
        
Force-pushed from 4237ea6 to 6af22b3.
  
// and does not include the DEFAULT COLLATION part, resulting in a plan without collation.
val plan = if (metadata.collation.isDefined) {
  val newType = StringType(metadata.collation.get)
  ResolveDDLCommandStringTypes.transformPlan(parsedPlan, newType)
Does this work for an unresolved parsed plan? I feel it's better to match the View node in the rule ResolveDDLCommandStringTypes and transform its child to resolve the collation there, similar to how we match DDL/DML commands.
I think the created View doesn't pass through ResolveDDLCommandStringTypes. I tried to catch it with a debugger.
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ViewResolution.scala
Line 24 in a73ca47
object ViewResolution {
When resolving a View, we only resolve its child, meaning the View itself does not go through ResolveDDLCommandStringTypes.
Also, I think this is a clear way to resolve the problem, because we will have the correct plan from the moment the View is created.
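To make the approach under discussion concrete, here is a minimal sketch of rewriting a parsed view plan so that string literals without an explicit collation pick up the view's default. It is a toy model under stated assumptions: `StrLit`, `Eq`, `Project`, `Filter`, `OneRow`, `ViewMetadata`, and `DefaultCollationSketch` are hypothetical stand-ins, not Spark's Catalyst classes, and the real `ResolveDDLCommandStringTypes.transformPlan` may behave differently.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
sealed trait Expr
// collation = None means "default string type, no explicit collation".
final case class StrLit(value: String, collation: Option[String] = None) extends Expr
final case class Eq(left: Expr, right: Expr) extends Expr

sealed trait Plan
final case class Project(exprs: List[Expr], child: Plan) extends Plan
final case class Filter(cond: Expr, child: Plan) extends Plan
case object OneRow extends Plan

// Hypothetical view metadata: the saved query text plus an optional default collation.
final case class ViewMetadata(viewText: String, collation: Option[String])

object DefaultCollationSketch {
  // Give every string literal without an explicit collation the view's default.
  def transformExpr(e: Expr, default: String): Expr = e match {
    case StrLit(v, None) => StrLit(v, Some(default))
    case lit: StrLit     => lit // an explicit collation is left untouched
    case Eq(l, r)        => Eq(transformExpr(l, default), transformExpr(r, default))
  }

  // Apply the rewrite to every expression in the plan, recursing into children.
  def transformPlan(p: Plan, default: String): Plan = p match {
    case Project(es, c)  => Project(es.map(transformExpr(_, default)), transformPlan(c, default))
    case Filter(cond, c) => Filter(transformExpr(cond, default), transformPlan(c, default))
    case OneRow          => OneRow
  }

  // Same shape as the snippet under review: rewrite only when the metadata carries a collation.
  def planForView(metadata: ViewMetadata, parsedPlan: Plan): Plan =
    metadata.collation.fold(parsedPlan)(c => transformPlan(parsedPlan, c))
}
```

In this sketch the rewrite happens once, at the point where the view's plan is built from its saved text, which is the design choice argued for above: the plan carries the collation from the start rather than relying on a later rule matching the View node.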
        
          
LGTM except for minor comments.
| (c1)
| DEFAULT COLLATION sr_ai
| AS SELECT 'Ć' as c1 WHERE 'Ć' = 'C'
|""".stripMargin)
This should use 2-space indentation; please tune your IDE. Also see https://github.com/databricks/scala-style-guide?tab=readme-ov-file#indent
s"""CREATE VIEW $testView DEFAULT COLLATION UTF8_LCASE
| as SELECT 'a' as c1
|""".stripMargin)
ditto: indentation
+1, LGTM. Merging to master/4.0.
    
…llation

### What changes were proposed in this pull request?
Fixed propagating default collation to literals, subqueries, etc., in `CREATE VIEW ... DEFAULT COLLATION ...` query. The issue was that the saved string used to construct the view did not include the `DEFAULT COLLATION` ... clause, resulting in the view being created without collation information.

### Why are the changes needed?
Bug fix.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tests added to `DefaultCollationTestSuite`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50436 from ilicmarkodb/fix_subquery_literals_in_views_with_default_collation.

Authored-by: ilicmarkodb <marko.ilic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit babb950)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
// Also, table DEFAULT COLLATION cannot be specified through CREATE TABLE AS SELECT command.
case _: V2CreateTablePlan | _: ReplaceTable | _: CreateView | _: AlterViewAs => true
// Check if view has default collation
case _ if AnalysisContext.get.collation.isDefined => true
I think this implementation is quite wrong, because now we match all the views from SELECT queries in ResolveDDLCommandStringTypes.
@cloud-fan can we reconsider this implementation?
Do you mean we should only do it if the collation is not the default?
The naming is DDL-specific (isDDLCommand, isCreateOrAlterPlan), but the rule actually changes the plans of queries over collated views as well.
Do you mean nested views? Every time we resolve a view we set this thread-local collation, so it shouldn't be messed up.
We can fix the naming issues later, but we do need to apply the view's default collation to view queries.
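The nested-view argument can be illustrated with a small sketch of a thread-local context that is set when a view is entered and restored when it is left. This is a hedged illustration only: `Ctx`, `CtxHolder`, `withViewCollation`, and `NestedViewSketch` are hypothetical names, not Spark's `AnalysisContext` API, and the sketch assumes each view installs the collation from its own metadata. The `UTF8_BINARY` fallback is used here simply as the value returned when no view collation is set.

```scala
// Hypothetical stand-in for a per-thread analysis context; not Spark's AnalysisContext.
final case class Ctx(collation: Option[String] = None)

object CtxHolder {
  private val current = new ThreadLocal[Ctx] {
    override def initialValue(): Ctx = Ctx()
  }

  def get: Ctx = current.get()

  // Run `body` with the given view's collation, then restore the previous context,
  // so an inner view cannot leak its collation into the outer one.
  def withViewCollation[A](collation: Option[String])(body: => A): A = {
    val saved = current.get()
    current.set(Ctx(collation))
    try body finally current.set(saved)
  }
}

object NestedViewSketch {
  // Collation chosen for default string types: the current view's collation if set,
  // otherwise a fallback (an assumption of this sketch).
  def stringCollation(): String = CtxHolder.get.collation.getOrElse("UTF8_BINARY")
}
```

For example, resolving an outer view with `UTF8_LCASE` that references an inner view with `sr_ai` would see `UTF8_LCASE`, then `sr_ai` inside the inner scope, then `UTF8_LCASE` again once the inner scope exits.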
// Check if view has default collation
case _ if AnalysisContext.get.collation.isDefined =>
  StringType(AnalysisContext.get.collation.get)
@ilicmarkodb can you please implement this in the single-pass Analyzer? Thanks!



### What changes were proposed in this pull request?
Fixed propagating default collation to literals, subqueries, etc., in `CREATE VIEW ... DEFAULT COLLATION ...` query.

The issue was that the saved string used to construct the view did not include the `DEFAULT COLLATION` ... clause, resulting in the view being created without collation information.

### Why are the changes needed?
Bug fix.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Tests added to `DefaultCollationTestSuite`.

### Was this patch authored or co-authored using generative AI tooling?
No.