Spark: Support dropping views #9421

nastra · 2024-01-05T11:44:48Z

This PR introduces DROP VIEW support (https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-drop-view.html) for Iceberg views. It requires a pre-substitution batch in IcebergSparkSqlExtensionsParser because, for V2 commands, ResolveSessionCatalog exits early (see https://github.com/apache/spark/blob/branch-3.5/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L224-L229).

nastra · 2024-01-05T11:54:54Z

....5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala

-      plan: LogicalPlan,
-      catalogAndNamespace: Seq[String]): LogicalPlan = plan transformExpressions {
+    plan: LogicalPlan,
+    catalogAndNamespace: Seq[String]): LogicalPlan = plan transformExpressions {


formatting was off here

rdblue · 2024-01-05T17:27:11Z

.../scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala

+      errorOnExceed = true,
+      maxIterationsSetting = SQLConf.ANALYZER_MAX_ITERATIONS.key)
+
+    override protected def batches: Seq[Batch] = Seq(Batch("pre-substitution", fixedPoint, V2ViewSubstitution))


Do we want to call this pre-substitution still? I originally used that because I thought we wanted substitution rules in it. But it turns out that we don't need this for view substitution, only for command hijacking. Maybe a "Hijack Commands" batch?

And it just occurred to me that we may not need an executor at all if we don't need to run to a fixed point. Can we just apply a command hijacking rule by itself instead?
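To make the fixed-point question concrete, here is a toy Scala model. The plan classes, `hijackViewCommands`, and `runToFixedPoint` are hypothetical simplifications, not Spark's Catalyst classes or its `RuleExecutor`. It assumes the hijack rule only ever rewrites `DropView` into `DropIcebergView`, so its own output is never rewritten again; under that assumption a fixed-point loop just spends one extra pass confirming that nothing changed.

```scala
// Toy model only: simplified stand-ins, not Spark's Catalyst classes or RuleExecutor.
object ApplyOnceSketch {
  sealed trait Plan
  final case class DropView(nameParts: List[String], ifExists: Boolean) extends Plan
  final case class DropIcebergView(nameParts: List[String], ifExists: Boolean) extends Plan

  type Rule = Plan => Plan

  // Hijack rule: rewrite DropView only when the supplied (assumed) check says a
  // v2 view catalog owns the name; every other plan is returned unchanged.
  def hijackViewCommands(ownedByViewCatalog: List[String] => Boolean): Rule = {
    case DropView(parts, ifExists) if ownedByViewCatalog(parts) => DropIcebergView(parts, ifExists)
    case other => other
  }

  // Miniature of a fixedPoint batch: re-apply all rules until the plan stops
  // changing or maxIterations is reached; also report how many passes ran.
  def runToFixedPoint(plan: Plan, rules: Seq[Rule], maxIterations: Int): (Plan, Int) = {
    var current = plan
    var iterations = 0
    var changed = true
    while (changed && iterations < maxIterations) {
      val next = rules.foldLeft(current)((p, rule) => rule(p))
      changed = next != current
      current = next
      iterations += 1
    }
    (current, iterations)
  }
}
```

In this model the first pass rewrites the plan and the second only confirms it is stable, so applying the rule a single time yields the same plan without an executor.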

rdblue

Overall, this looks about ready. The only major things are:

Do we need to convert all DropView commands to an Iceberg plan and convert back if the catalog isn't a v2 catalog?
Do we still need a rule batch or can we just apply the rule to convert DropView and other commands once? I thought we needed a batch for multiple view substitutions, which is no longer the case.

rdblue · 2024-01-16T00:44:44Z

...rk-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/HijackViewCommands.scala

+
+  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
+    case DropView(UnresolvedIdentifier(nameParts, allowTemp), ifExists)
+      if isViewCatalog(catalogManager.currentCatalog) && !isTempView(nameParts) =>


I don't think this condition is correct. The pattern using UnresolvedIdentifier will match any DROP VIEW plan and convert it to an Iceberg plan. The isViewCatalog check needs to test the catalog responsible for the view, not the current catalog. The current-catalog check makes the v1 drop view test work when USE spark_catalog was called, but it would have the wrong behavior if an Iceberg catalog were the current catalog.

I think the logic should use CatalogAndIdentifier from LookupCatalog like Spark does in ResolveCatalogs:

case d @ DropView(UnresolvedIdentifier(nameParts, allowTemp), ifExists)
    if !(allowTemp && isTempView(nameParts)) =>
  nameParts match {
    case CatalogAndIdentifier(catalog, ident) if isViewCatalog(catalog) =>
      DropIcebergView(ResolvedIdentifier(catalog, ident), ifExists)
    case _ =>
      d
  }

That way we know that the command is only replaced if a v2 ViewCatalog is responsible for it.
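The difference between checking the current catalog and checking the catalog that owns the identifier can be shown with a toy resolver. `Catalog`, `Catalogs`, and this `CatalogAndIdentifier` are hypothetical simplifications: the extractor only models the basic rule that a multi-part name whose first part is a registered catalog belongs to that catalog, while every other name resolves against the current catalog. Spark's real `LookupCatalog.CatalogAndIdentifier` handles more cases than this sketch.

```scala
// Toy model only: hypothetical types; Spark's LookupCatalog.CatalogAndIdentifier covers more cases.
object CatalogResolutionSketch {
  final case class Catalog(name: String, isViewCatalog: Boolean)

  final class Catalogs(registered: Seq[Catalog], val current: Catalog) {
    private val byName: Map[String, Catalog] = registered.map(c => c.name -> c).toMap

    // Simplified lookup: a registered catalog name as the first of several parts selects
    // that catalog; any other name is resolved against the current catalog.
    object CatalogAndIdentifier {
      def unapply(nameParts: Seq[String]): Option[(Catalog, Seq[String])] =
        if (nameParts.size > 1 && byName.contains(nameParts.head)) Some((byName(nameParts.head), nameParts.tail))
        else Some((current, nameParts))
    }

    // Original check: asks whether the *current* catalog is a view catalog.
    def hijackedByCurrentCatalog(nameParts: Seq[String]): Boolean = current.isViewCatalog

    // Suggested check: asks whether the catalog *responsible for this name* is a view catalog.
    def hijackedByOwningCatalog(nameParts: Seq[String]): Boolean = nameParts match {
      case CatalogAndIdentifier(catalog, _) => catalog.isViewCatalog
      case _ => false
    }
  }
}
```

With an Iceberg catalog set as current, `spark_catalog.default.v` is wrongly claimed by the current-catalog check but left alone by the owning-catalog check, which is the failure mode described above.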

You can also make this easier for the next time by creating a custom pattern with unapply:

object ResolvedView {
  def unapply(unresolved: UnresolvedIdentifier): Option[ResolvedIdentifier] = unresolved match {
    case UnresolvedIdentifier(nameParts, true) if isTempView(nameParts) =>
      None

    case UnresolvedIdentifier(CatalogAndIdentifier(catalog, ident), _) if isViewCatalog(catalog) =>
      Some(ResolvedIdentifier(catalog, ident))

    case _ =>
      None
  }
}

Then the rule is much simpler:

override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
  case DropView(ResolvedView(resolved), ifExists) =>
    DropIcebergView(resolved, ifExists)
}
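Below is a toy, self-contained version of the suggested extractor-based rule. The case classes mirror the names used in the review, but they are hypothetical stand-ins rather than Spark's Catalyst classes; the catalog lookup uses the same simplified assumption as before (a registered catalog name as the first part selects that catalog, otherwise the current catalog), and temp views are modeled as a plain set of names.

```scala
// Toy model only: hypothetical stand-ins named after the Catalyst nodes in the review.
object ResolvedViewSketch {
  final case class Catalog(name: String, isViewCatalog: Boolean)
  final case class UnresolvedIdentifier(nameParts: Seq[String], allowTemp: Boolean)
  final case class ResolvedIdentifier(catalog: Catalog, ident: Seq[String])

  sealed trait Plan
  final case class DropView(child: UnresolvedIdentifier, ifExists: Boolean) extends Plan
  final case class DropIcebergView(child: ResolvedIdentifier, ifExists: Boolean) extends Plan

  final class HijackViewCommands(
      catalogs: Map[String, Catalog],
      current: Catalog,
      tempViews: Set[Seq[String]]) {

    private def isTempView(nameParts: Seq[String]): Boolean = tempViews.contains(nameParts)

    // Simplified lookup: a registered catalog name as the first part selects that
    // catalog; any other name is resolved against the current catalog.
    private object CatalogAndIdentifier {
      def unapply(nameParts: Seq[String]): Option[(Catalog, Seq[String])] =
        if (nameParts.size > 1 && catalogs.contains(nameParts.head)) Some((catalogs(nameParts.head), nameParts.tail))
        else Some((current, nameParts))
    }

    // Custom pattern: matches only non-temp identifiers owned by a view catalog.
    private object ResolvedView {
      def unapply(unresolved: UnresolvedIdentifier): Option[ResolvedIdentifier] = unresolved match {
        case UnresolvedIdentifier(nameParts, true) if isTempView(nameParts) =>
          None
        case UnresolvedIdentifier(CatalogAndIdentifier(catalog, ident), _) if catalog.isViewCatalog =>
          Some(ResolvedIdentifier(catalog, ident))
        case _ =>
          None
      }
    }

    // With the extractor in place, the rule is a single case plus a pass-through.
    def apply(plan: Plan): Plan = plan match {
      case DropView(ResolvedView(resolved), ifExists) => DropIcebergView(resolved, ifExists)
      case other => other
    }
  }
}
```

Keeping the temp-view and catalog checks inside `ResolvedView` means any further hijacked view command could reuse the same pattern with a one-line case.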

I've opened nastra#138 with these changes, plus test updates that catch this case by switching back to the test catalog.

thanks for spotting and fixing this 💯

rdblue · 2024-01-16T00:47:11Z

.../main/scala/org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.scala

@@ -90,6 +93,9 @@ case class ExtendedDataSourceV2Strategy(spark: SparkSession) extends Strategy wi
    case OrderAwareCoalesce(numPartitions, coalescer, child) =>
      OrderAwareCoalesceExec(numPartitions, coalescer, planLater(child)) :: Nil

+    case DropIcebergView(ResolvedIdentifier(viewCatalog: ViewCatalog, ident), ifExists) =>


I'm not sure what happens if this isn't a ViewCatalog, but the new rewrite rule should ensure that it always is.
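A toy model of what the planner case and the exec might look like, assuming the exec calls the catalog's `dropView` (which reports whether a view was dropped) and only fails when nothing was dropped and `IF EXISTS` was not given. All types here are hypothetical simplifications of Spark's `ViewCatalog`/`Identifier` and of the PR's `DropV2ViewExec`, not their real definitions. It also illustrates the point behind the comment: the typed pattern `viewCatalog: ViewCatalog` simply does not match other catalogs, so in this toy they produce no physical plan; what Spark's planner itself does in that case is not modeled.

```scala
import scala.collection.mutable

// Toy model only: hypothetical simplifications, not Spark's ViewCatalog/Identifier
// or the PR's DropV2ViewExec.
object DropViewExecSketch {
  trait CatalogPlugin { def name: String }

  trait ViewCatalog extends CatalogPlugin {
    /** Returns true if a view was removed, false if no such view existed. */
    def dropView(ident: Seq[String]): Boolean
  }

  final class InMemoryViewCatalog(val name: String, initial: Set[Seq[String]]) extends ViewCatalog {
    private val views = mutable.Set.empty[Seq[String]]
    views ++= initial
    def dropView(ident: Seq[String]): Boolean = views.remove(ident)
    def exists(ident: Seq[String]): Boolean = views.contains(ident)
  }

  // A catalog that cannot hold views (stands in for a non-ViewCatalog plugin).
  final class PlainCatalog(val name: String) extends CatalogPlugin

  final case class ResolvedIdentifier(catalog: CatalogPlugin, ident: Seq[String])
  final case class DropIcebergView(child: ResolvedIdentifier, ifExists: Boolean)

  final class NoSuchViewException(ident: Seq[String])
    extends RuntimeException(s"View not found: ${ident.mkString(".")}")

  // Physical operator: drop the view; fail only if nothing was dropped and IF EXISTS was absent.
  final case class DropV2ViewExec(catalog: ViewCatalog, ident: Seq[String], ifExists: Boolean) {
    def run(): Unit = {
      val dropped = catalog.dropView(ident)
      if (!dropped && !ifExists) throw new NoSuchViewException(ident)
    }
  }

  // Planner case: the typed pattern only matches when the catalog is a ViewCatalog.
  def plan(logical: DropIcebergView): Option[DropV2ViewExec] = logical match {
    case DropIcebergView(ResolvedIdentifier(viewCatalog: ViewCatalog, ident), ifExists) =>
      Some(DropV2ViewExec(viewCatalog, ident, ifExists))
    case _ =>
      None
  }
}
```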

rdblue · 2024-01-16T00:47:38Z

I think once this includes the changes from nastra#138, I'm +1.

nastra · 2024-01-16T11:39:30Z

thanks for reviewing @rdblue. I'll go ahead and merge this, since everything should be addressed

github-actions bot added the spark label Jan 5, 2024

nastra force-pushed the spark-view-drop-support branch 2 times, most recently from 706bb5c to 17641a6 on January 5, 2024 11:52

nastra added this to the Iceberg 1.5.0 milestone Jan 5, 2024

nastra added this to In progress in View support Jan 5, 2024

nastra commented Jan 5, 2024

nastra requested a review from rdblue January 5, 2024 14:45

nastra mentioned this pull request Jan 5, 2024

Spark: Support creating views via SQL #9423

Merged

rdblue reviewed Jan 5, 2024

...extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DropV2ViewExec.scala (outdated)

rdblue reviewed Jan 5, 2024

spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java (outdated)

rdblue reviewed Jan 5, 2024

spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java

rdblue reviewed Jan 5, 2024

....5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala (outdated)

rdblue requested changes Jan 5, 2024

nastra force-pushed the spark-view-drop-support branch from 73424e3 to 77efc8d on January 8, 2024 10:05

nastra requested a review from rdblue January 8, 2024 10:06

rdblue reviewed Jan 16, 2024

...rk-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/HijackViewCommands.scala (outdated)

rdblue reviewed Jan 16, 2024

nastra mentioned this pull request Jan 16, 2024

Spark: Support renaming views #9343

Merged

Spark: Support dropping Views

a420f0b

nastra force-pushed the spark-view-drop-support branch from 307015a to a420f0b on January 16, 2024 10:43

nastra merged commit 2cda2b9 into apache:main Jan 16, 2024
31 checks passed

nastra deleted the spark-view-drop-support branch January 16, 2024 11:40

nastra moved this from In progress to Done in View support Jan 16, 2024

nastra mentioned this pull request Jan 18, 2024

Spark 3.4: Support dropping views #9508

Merged

geruh pushed a commit to geruh/iceberg that referenced this pull request Jan 26, 2024

Spark: Support dropping Views (apache#9421)

4c3b7d9

adnanhemani pushed a commit to adnanhemani/iceberg that referenced this pull request Jan 30, 2024

Spark: Support dropping Views (apache#9421)

f2b204e

devangjhabakh pushed a commit to cdouglas/iceberg that referenced this pull request Apr 22, 2024

Spark: Support dropping Views (apache#9421)

ec1e6b3
