Spark 3.4: Add support for reading Iceberg views #9422

Merged
merged 1 commit into apache:main from spark34-view-read-support on Jan 5, 2024

Conversation

Contributor

@nastra nastra commented Jan 5, 2024

This backports #9340 to Spark 3.4

@nastra nastra added this to In progress in View support Jan 5, 2024
@nastra nastra added this to the Iceberg 1.5.0 milestone Jan 5, 2024
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment


Thanks @nastra !

@amogh-jahagirdar
Contributor

Since this was a clean backport, I'll go ahead and merge.

@amogh-jahagirdar amogh-jahagirdar merged commit 2101ac2 into apache:main Jan 5, 2024
31 checks passed
@nastra nastra deleted the spark34-view-read-support branch January 5, 2024 16:23
@nastra nastra moved this from In progress to Done in View support Jan 16, 2024
geruh pushed a commit to geruh/iceberg that referenced this pull request Jan 26, 2024
adnanhemani pushed a commit to adnanhemani/iceberg that referenced this pull request Jan 30, 2024
Comment on lines +128 to +131
// there's no explicit view defined for spark, so it will fall back to the defined trino view
assertThat(sql("SELECT * FROM %s", viewName))
    .hasSize(10)
    .containsExactlyInAnyOrderElementsOf(expected);
Contributor


Are we sure we want to make this the default behavior? There are views that are equally parsable by both Trino and Spark but they mean different things. So just using a Trino view "as is" in Spark may be incorrect. Consider the case when the view accesses array elements, where array indexes start from 0 in Spark and 1 in Trino. FYI @rdblue.
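To make the array-index concern concrete, here is a minimal Java sketch that models only the subscript convention of each engine. The class and method names are hypothetical and this is not Iceberg, Spark, or Trino code; it just shows that the same view text, for example `SELECT tags[1] FROM t`, picks a different element depending on which engine evaluates it.

```java
import java.util.List;

// Hypothetical illustration only: models the subscript convention, nothing else.
final class ArrayIndexSemantics {

    // Spark SQL: the bracket subscript arr[i] is 0-based.
    static <T> T sparkSubscript(List<T> arr, int i) {
        return arr.get(i);
    }

    // Trino: the bracket subscript arr[i] is 1-based.
    static <T> T trinoSubscript(List<T> arr, int i) {
        return arr.get(i - 1);
    }

    public static void main(String[] args) {
        List<String> tags = List.of("a", "b", "c");

        // The same view text "SELECT tags[1] FROM t" under each convention:
        System.out.println("Spark reads tags[1] as: " + sparkSubscript(tags, 1)); // b
        System.out.println("Trino reads tags[1] as: " + trinoSubscript(tags, 1)); // a
    }
}
```

Both statements parse in either engine, so a fallback that reuses the Trino SQL in Spark would succeed while returning a different column value.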

Contributor Author


I think we might want to consider having a strictness flag of some sort that, by default, would only allow reading/modifying views that have been created by Spark. In a lenient mode, this could then also fall back to reading views that have been created by a different engine. @wmoustafa thoughts on that?
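One way such a flag could behave is sketched below in plain Java. Everything here is an assumption made for illustration: `SqlRepresentation`, `Mode`, and `resolve` are hypothetical names, not part of the Iceberg or Spark APIs, and picking the first stored representation in lenient mode is just one possible fallback rule.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a strict/lenient dialect lookup; not the Iceberg API.
final class ViewDialectResolver {

    // Stand-in for one stored SQL representation of a view.
    static final class SqlRepresentation {
        final String dialect;
        final String sql;

        SqlRepresentation(String dialect, String sql) {
            this.dialect = dialect;
            this.sql = sql;
        }
    }

    // Assumed flag: STRICT accepts only the engine's own dialect,
    // LENIENT may fall back to a representation from another engine.
    enum Mode { STRICT, LENIENT }

    static Optional<SqlRepresentation> resolve(
            List<SqlRepresentation> representations, String engineDialect, Mode mode) {
        // Prefer a representation written for the reading engine.
        for (SqlRepresentation rep : representations) {
            if (rep.dialect.equalsIgnoreCase(engineDialect)) {
                return Optional.of(rep);
            }
        }
        // Only lenient mode falls back to another engine's SQL (first one stored).
        if (mode == Mode.LENIENT && !representations.isEmpty()) {
            return Optional.of(representations.get(0));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<SqlRepresentation> reps =
                List.of(new SqlRepresentation("trino", "SELECT * FROM t"));

        // Strict: no Spark representation, so nothing is resolved.
        System.out.println(resolve(reps, "spark", Mode.STRICT).isPresent());
        // Lenient: falls back to the Trino representation.
        System.out.println(resolve(reps, "spark", Mode.LENIENT).map(r -> r.dialect).orElse("none"));
    }
}
```

In this sketch the strict default turns a missing Spark representation into an explicit "not resolvable" result rather than silently running another engine's SQL, which is the behavior change the comment above proposes.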

Contributor


I do not prefer it, but yes that is the least we could do, along with explicitly stating the caveats like array indexes, null handling, etc.

devangjhabakh pushed a commit to cdouglas/iceberg that referenced this pull request Apr 22, 2024
3 participants