Spark 3.4: Add support for reading Iceberg views #9422
Conversation
Thanks @nastra! Since this was a clean backport, I'll go ahead and merge.
// there's no explicit view defined for Spark, so it will fall back to the defined Trino view
assertThat(sql("SELECT * FROM %s", viewName))
    .hasSize(10)
    .containsExactlyInAnyOrderElementsOf(expected);
Are we sure we want to make this the default behavior? There are views that are equally parsable by both Trino and Spark but they mean different things. So just using a Trino view "as is" in Spark may be incorrect. Consider the case when the view accesses array elements, where array indexes start from 0 in Spark and 1 in Trino. FYI @rdblue.
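A minimal sketch of that pitfall, assuming a hypothetical table t and the same sql()/row() test helpers used in the snippet above:

// Trino arrays are 1-based while Spark arrays are 0-based, so the identical
// view text silently yields different rows in each engine.
sql("CREATE TABLE t AS SELECT array(10, 20, 30) AS arr");
// Suppose Trino defined the view as: CREATE VIEW v AS SELECT arr[1] AS e FROM t
// In Trino, v returns 10 (the first element); replaying the same SQL text
// in Spark returns the second element instead, with no error raised:
assertThat(sql("SELECT arr[1] FROM t")).containsExactly(row(20));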
I think we might want to consider having a strictness flag of some sort that by default would only allow reading/modifying views that were created by Spark. In a lenient mode it could then also fall back to reading views created by a different engine, as sketched below. @wmoustafa thoughts on that?
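A minimal sketch of what such a check could look like against Iceberg's View API; the property name spark.sql.iceberg.view-dialect-mode and the surrounding wiring are hypothetical, since no such flag exists yet:

// Strict mode: only resolve views that carry a Spark SQL representation.
// Lenient mode: fall back to any available dialect (e.g. a Trino view).
boolean strict = "strict".equalsIgnoreCase(
    spark.conf().get("spark.sql.iceberg.view-dialect-mode", "strict"));
boolean hasSparkSql = view.currentVersion().representations().stream()
    .filter(r -> r instanceof SQLViewRepresentation)
    .map(r -> ((SQLViewRepresentation) r).dialect())
    .anyMatch("spark"::equalsIgnoreCase);
if (strict && !hasSparkSql) {
  throw new UnsupportedOperationException(
      "View was not created by Spark; enable lenient mode to fall back to another dialect");
}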
It's not my preferred option, but yes, that is the least we could do, along with explicitly documenting caveats like array indexing, null handling, etc.
This backports #9340 to Spark 3.4.