Spark 3.4: Add support for reading Iceberg views #9422

nastra · 2024-01-05T12:29:04Z

This backports #9340 to Spark 3.4.

amogh-jahagirdar · 2024-01-05T16:21:06Z

Since this was a clean backport, I'll go ahead and merge.

wmoustafa · 2024-01-31T00:22:22Z

spark/v3.4/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java

+    // there's no explicit view defined for spark, so it will fall back to the defined trino view
+    assertThat(sql("SELECT * FROM %s", viewName))
+        .hasSize(10)
+        .containsExactlyInAnyOrderElementsOf(expected);


Are we sure we want to make this the default behavior? Some views parse equally well in both Trino and Spark but mean different things, so using a Trino view "as is" in Spark may produce incorrect results. Consider a view that accesses array elements: array indexes start from 0 in Spark and from 1 in Trino. FYI @rdblue.
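To make the array-index concern concrete, here is a minimal sketch in plain Java. It only models how the same subscript literal selects a different element depending on the index base. The class `SubscriptSemantics` and the method `elementAt` are hypothetical names made up for this illustration; nothing here calls Spark or Trino.

```java
import java.util.List;

// Illustrative only: models 0-based vs 1-based subscripts on the same data.
// SubscriptSemantics and elementAt are hypothetical names, not engine APIs.
final class SubscriptSemantics {

    /** Returns the element selected by `subscript` when indexes start at `base` (0 or 1). */
    static <T> T elementAt(List<T> values, int subscript, int base) {
        return values.get(subscript - base);
    }

    public static void main(String[] args) {
        List<String> arr = List.of("a", "b", "c");
        // The view text contains the same expression, arr[1], in both cases.
        System.out.println(elementAt(arr, 1, 0)); // 0-based reading selects "b"
        System.out.println(elementAt(arr, 1, 1)); // 1-based reading selects "a"
    }
}
```

The view text is identical in both cases, so neither engine would report an error; only the result differs.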

nastra · 2024-01-31

I think we might want to consider a strictness flag of some sort that, by default, would only allow reading/modifying views created by Spark. In a lenient mode, this could then also fall back to reading views created by a different engine. @wmoustafa, thoughts on that?

wmoustafa · 2024-01-31

I do not prefer it, but yes, that is the least we could do, along with explicitly stating caveats such as array indexes, null handling, etc.
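The strictness flag discussed in this thread could look roughly like the sketch below. It is an assumption-laden illustration, not the behavior merged in this PR: `ViewSqlResolver`, `Mode`, and `resolve` are hypothetical names, and the view's SQL representations are modeled as a plain map from dialect name to SQL text rather than through any Iceberg API.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of strict vs. lenient view resolution.
// None of these names exist in Iceberg or Spark.
final class ViewSqlResolver {

    /** STRICT: only the engine's own dialect. LENIENT: fall back to another engine's SQL. */
    enum Mode { STRICT, LENIENT }

    /**
     * Picks the SQL text to run for `engineDialect`.
     * Returns empty in STRICT mode when the view has no representation for that dialect.
     */
    static Optional<String> resolve(
            Map<String, String> sqlByDialect, String engineDialect, Mode mode) {
        String own = sqlByDialect.get(engineDialect);
        if (own != null) {
            return Optional.of(own);
        }
        if (mode == Mode.STRICT) {
            return Optional.empty();
        }
        // Lenient fallback: use whichever representation the map yields first.
        // The caller accepts the risk that the SQL means something else in this engine.
        return sqlByDialect.values().stream().findFirst();
    }

    public static void main(String[] args) {
        Map<String, String> trinoOnly = Map.of("trino", "SELECT * FROM t");
        System.out.println(resolve(trinoOnly, "spark", Mode.STRICT).isPresent());  // false
        System.out.println(resolve(trinoOnly, "spark", Mode.LENIENT).isPresent()); // true
    }
}
```

Defaulting to `STRICT` would make the cross-engine fallback an explicit opt-in, which is where caveats such as array indexing and null handling could be documented.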

github-actions bot added spark build labels Jan 5, 2024

nastra force-pushed the spark34-view-read-support branch from 0491871 to 7b92c36 Compare January 5, 2024 12:37

Spark 3.4: Add support for reading Iceberg views

45e2f84

nastra force-pushed the spark34-view-read-support branch from 7b92c36 to 45e2f84 Compare January 5, 2024 12:37

nastra added this to In progress in View support Jan 5, 2024

nastra added this to the Iceberg 1.5.0 milestone Jan 5, 2024

amogh-jahagirdar approved these changes Jan 5, 2024

amogh-jahagirdar merged commit 2101ac2 into apache:main Jan 5, 2024
31 checks passed

nastra deleted the spark34-view-read-support branch January 5, 2024 16:23

nastra moved this from In progress to Done in View support Jan 16, 2024

geruh pushed a commit to geruh/iceberg that referenced this pull request Jan 26, 2024

Spark 3.4: Add support for reading Iceberg views (apache#9422)

03ef2ff

adnanhemani pushed a commit to adnanhemani/iceberg that referenced this pull request Jan 30, 2024

Spark 3.4: Add support for reading Iceberg views (apache#9422)

a2321bd

wmoustafa reviewed Jan 31, 2024

devangjhabakh pushed a commit to cdouglas/iceberg that referenced this pull request Apr 22, 2024

Spark 3.4: Add support for reading Iceberg views (apache#9422)

59b8f93

nastra commented Jan 5, 2024

amogh-jahagirdar left a comment

amogh-jahagirdar commented Jan 5, 2024

wmoustafa Jan 31, 2024

nastra Jan 31, 2024

wmoustafa Jan 31, 2024
