[SPARK-37569][SQL] Don't mark nested view fields as nullable #34839
Great find @shardulm94! Nice to see that it's a simple fix. Appreciate the short history lesson as well.
ok to test
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala
Test build #146028 has finished for PR 34839 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
4b8e2ca to e34dbd6
e34dbd6 to be8b9df
Test build #146045 has finished for PR 34839 at commit
Seems okay. This change doesn't break other tests either.
Test build #146052 has finished for PR 34839 at commit
val sql = "SELECT id, named_struct('a', id) AS nested FROM RANGE(10)"
val df = spark.sql(sql)

spark.sql(s"CREATE VIEW v AS $sql")
We should call the createView method to create the view; otherwise we just test the permanent view 3 times...
Fixed it now
Test build #146075 has finished for PR 34839 at commit
thanks, merging to master!
What changes were proposed in this pull request?
When analyzing a view, we should not unnecessarily mark nested fields as nullable. If the columns projected by the view define themselves as non-nullable, their nullability should be preserved.
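As a rough illustration of what it means for nested fields to be marked nullable, the sketch below models a struct type with a few hypothetical classes (DType, LongT, Field, StructT and NullabilityModel are made up for this example and are not Spark's own types). It assumes that an asNullable-style conversion simply flips every nested field to nullable, which is a simplification of what Spark does for its full set of data types.

```scala
// Simplified stand-ins for Spark's data types; these are NOT Spark's classes.
sealed trait DType
case object LongT extends DType
final case class Field(name: String, dataType: DType, nullable: Boolean)
final case class StructT(fields: List[Field]) extends DType

object NullabilityModel {
  // Recursively mark every nested field as nullable. This models the effect
  // of an asNullable-style conversion on a struct type (assumption: only
  // structs and a single primitive type exist in this toy model).
  def asNullable(dt: DType): DType = dt match {
    case StructT(fields) =>
      StructT(fields.map(f => Field(f.name, asNullable(f.dataType), nullable = true)))
    case other => other
  }

  // Type of a `nested` column such as named_struct('a', id):
  // the nested field a is required (non-nullable).
  val nestedType: DType = StructT(List(Field("a", LongT, nullable = false)))

  def main(args: Array[String]): Unit = {
    // Casting to the converted type loses the "required" marker on a.
    println(s"cast target with asNullable:    ${asNullable(nestedType)}")
    // Casting to the type as stored keeps a non-nullable.
    println(s"cast target without asNullable: $nestedType")
  }
}
```

In this model, the column itself can stay non-nullable while the field inside its struct type still changes, because the nested field's nullability lives in the data type rather than on the column.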
Why are the changes needed?
Consider a view as follows with all fields non-nullable (required)
When trying to read this view, it incorrectly marks nested column a as nullable
However, we can see that the view schema has been correctly stored as non-nullable
This is caused by this line in Analyzer.scala. Going through the history of changes for this block of code, it seems like asNullable is a remnant of a time before we added checks to ensure that the from and to types of the cast were compatible. As nullability is already checked, it should be safe to add a cast without converting the target data type to nullable.
Does this PR introduce any user-facing change?
Yes. View analysis will preserve nullability of nested fields instead of marking all nested fields as nullable.
How was this patch tested?
Added unit test