Spark: Support dropping views #9421

Merged
merged 1 commit into from Jan 16, 2024

nastra
Contributor

@nastra nastra commented Jan 5, 2024

This PR introduces DROP VIEW support (https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-drop-view.html) for Iceberg views. It requires a pre-substitution batch in IcebergSparkSqlExtensionsParser because ResolveSessionCatalog exits early for V2 commands (see https://github.com/apache/spark/blob/branch-3.5/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L224-L229).

@github-actions github-actions bot added the spark label Jan 5, 2024
@nastra nastra force-pushed the spark-view-drop-support branch 2 times, most recently from 706bb5c to 17641a6 Compare January 5, 2024 11:52
@nastra nastra added this to the Iceberg 1.5.0 milestone Jan 5, 2024
@nastra nastra added this to In progress in View support Jan 5, 2024
    plan: LogicalPlan,
    catalogAndNamespace: Seq[String]): LogicalPlan = plan transformExpressions {
Contributor Author

formatting was off here

errorOnExceed = true,
maxIterationsSetting = SQLConf.ANALYZER_MAX_ITERATIONS.key)

override protected def batches: Seq[Batch] = Seq(Batch("pre-substitution", fixedPoint, V2ViewSubstitution))
Contributor

Do we want to call this pre-substitution still? I originally used that because I thought we wanted substitution rules in it. But it turns out that we don't need this for view substitution, only for command hijacking. Maybe a "Hijack Commands" batch?

Contributor

And it just occurred to me that we may not need an executor at all if we don't need to run to a fixed point. Can we just apply a command hijacking rule by itself instead?
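
The question above (single rule application versus a fixed-point batch) can be illustrated with a toy model. This is only a sketch under stated assumptions: `Plan`, `Rule`, `hijackDropView`, and `toFixedPoint` are hypothetical stand-ins, not Spark's `RuleExecutor` API. It shows that a rewrite whose output no longer matches its own pattern gives the same result after one pass as after running to a fixed point.

```scala
// Toy model only: none of these names come from Spark or Iceberg.
object RuleOnceSketch {
  type Plan = List[String]   // a "plan" is just a list of command names here
  type Rule = Plan => Plan

  // Hypothetical command-hijacking rule: swap the generic command for an
  // Iceberg-specific one. The output never matches the input pattern again.
  val hijackDropView: Rule = _.map {
    case "DropView" => "DropIcebergView"
    case other      => other
  }

  // In spirit, what a fixed-point batch does: re-apply the rule until the
  // plan stops changing or the iteration budget runs out.
  @annotation.tailrec
  def toFixedPoint(rule: Rule, plan: Plan, maxIterations: Int = 100): Plan = {
    val next = rule(plan)
    if (next == plan || maxIterations <= 1) next
    else toFixedPoint(rule, next, maxIterations - 1)
  }
}
```

Because the rule is idempotent, `hijackDropView(plan)` and `toFixedPoint(hijackDropView, plan)` agree for any plan in this model, which is the property that would make a plain rule application sufficient.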

Contributor

@rdblue rdblue left a comment

Overall, this looks about ready. The only major things are:

  1. Do we need to convert all DropView commands to an Iceberg plan and convert back if the catalog isn't a v2 catalog?
  2. Do we still need a rule batch or can we just apply the rule to convert DropView and other commands once? I thought we needed a batch for multiple view substitutions, which is no longer the case.


  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
    case DropView(UnresolvedIdentifier(nameParts, allowTemp), ifExists)
      if isViewCatalog(catalogManager.currentCatalog) && !isTempView(nameParts) =>
Contributor

I don't think this test is correct. The pattern using UnresolvedIdentifier will match any DROP VIEW plan and convert it to an Iceberg plan. The isViewCatalog check needs to test the catalog responsible for the view, not the current catalog. The current catalog check makes the v1 drop view test work when USE spark_catalog was called, but it would have the wrong behavior if an Iceberg catalog were the current catalog.

I think the logic should use CatalogAndIdentifier from LookupCatalog like Spark does in ResolveCatalogs:

  case d@DropView(UnresolvedIdentifier(nameParts, allowTemp), ifExists)
      if !isTempView(allowTemp, nameParts) =>
    nameParts match {
      case CatalogAndIdentifier(catalog, ident) if isViewCatalog(catalog) =>
        DropIcebergView(ResolvedIdentifier(catalog, ident), ifExists)

      case _ =>
        d
    }

That way we know that the command is only replaced if a v2 ViewCatalog is responsible for it.

You can also make this easier for the next time by creating a custom pattern with unapply:

  object ResolvedView {
    def unapply(unresolved: UnresolvedIdentifier): Option[ResolvedIdentifier] = unresolved match {
      case UnresolvedIdentifier(nameParts, true) if isTempView(nameParts) =>
        None

      case UnresolvedIdentifier(CatalogAndIdentifier(catalog, ident), _) if isViewCatalog(catalog) =>
        Some(ResolvedIdentifier(catalog, ident))

      case _ =>
        None
    }
  }

Then the rule is much simpler:

  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
    case DropView(ResolvedView(resolved), ifExists) =>
      DropIcebergView(resolved, ifExists)
  }

I've opened nastra#138 with these changes and test updates to catch this case by changing back to the test catalog.
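
To make the difference concrete without a Spark dependency, here is a minimal, self-contained sketch of the extractor approach described above. Everything in it is a simplified stand-in: the toy `Catalog` types, the `catalogs` and `tempViews` registries, and the `Rewrite` class are assumptions for illustration, and the toy `CatalogAndIdentifier` only approximates what Spark's `LookupCatalog` does (a leading name part that is a registered catalog wins, otherwise the current catalog is used).

```scala
// Toy stand-ins for Spark's catalog and plan types; names are illustrative only.
object ResolvedViewSketch {
  sealed trait Catalog { def name: String }
  final case class ViewCatalog(name: String) extends Catalog     // can own v2 views
  final case class SessionCatalog(name: String) extends Catalog  // v1-style catalog

  sealed trait Plan
  final case class UnresolvedIdentifier(nameParts: Seq[String], allowTemp: Boolean)
  final case class ResolvedIdentifier(catalog: Catalog, ident: Seq[String])
  final case class DropView(child: UnresolvedIdentifier, ifExists: Boolean) extends Plan
  final case class DropIcebergView(child: ResolvedIdentifier, ifExists: Boolean) extends Plan

  // Hypothetical environment: registered catalogs and known temp views.
  val catalogs: Map[String, Catalog] = Map(
    "iceberg" -> ViewCatalog("iceberg"),
    "spark_catalog" -> SessionCatalog("spark_catalog"))
  val tempViews: Set[Seq[String]] = Set(Seq("tmp_v"))

  final class Rewrite(currentCatalog: Catalog) {
    // Simplified analogue of CatalogAndIdentifier: an explicit catalog prefix
    // decides the owner; otherwise fall back to the current catalog.
    object CatalogAndIdentifier {
      def unapply(nameParts: Seq[String]): Option[(Catalog, Seq[String])] = nameParts match {
        case head +: tail if tail.nonEmpty && catalogs.contains(head) =>
          Some((catalogs(head), tail))
        case _ =>
          Some((currentCatalog, nameParts))
      }
    }

    // Matches only when the catalog responsible for the name is a ViewCatalog.
    object ResolvedView {
      def unapply(unresolved: UnresolvedIdentifier): Option[ResolvedIdentifier] = unresolved match {
        case UnresolvedIdentifier(nameParts, true) if tempViews.contains(nameParts) =>
          None
        case UnresolvedIdentifier(CatalogAndIdentifier(catalog: ViewCatalog, ident), _) =>
          Some(ResolvedIdentifier(catalog, ident))
        case _ =>
          None
      }
    }

    def apply(plan: Plan): Plan = plan match {
      case DropView(ResolvedView(resolved), ifExists) => DropIcebergView(resolved, ifExists)
      case other => other
    }
  }
}
```

In this model, with `iceberg` as the current catalog, a `DropView` on `spark_catalog.db.v` is left untouched because the catalog that owns the name is not a `ViewCatalog`, whereas a check against the current catalog alone would have rewritten it.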

Contributor Author

thanks for spotting and fixing this 💯

@@ -90,6 +93,9 @@ case class ExtendedDataSourceV2Strategy(spark: SparkSession) extends Strategy wi
    case OrderAwareCoalesce(numPartitions, coalescer, child) =>
      OrderAwareCoalesceExec(numPartitions, coalescer, planLater(child)) :: Nil

    case DropIcebergView(ResolvedIdentifier(viewCatalog: ViewCatalog, ident), ifExists) =>
Contributor

I'm not sure what happens if this isn't a ViewCatalog, but the new rewrite rule should ensure that it always is.
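
As far as Scala's pattern-matching semantics go, a typed pattern such as `viewCatalog: ViewCatalog` simply fails to match when the runtime value has a different type, so the case is skipped. The sketch below demonstrates only that language behavior with hypothetical toy types (`CatalogPlugin`, `ViewCatalog`, `Views`, `PlainCatalog`, and `plan` are all stand-ins); what Spark's planner does with a node that no strategy case matches is outside this sketch.

```scala
// Toy types only; they mirror the shape of the pattern, not Spark's classes.
object TypedPatternSketch {
  trait CatalogPlugin { def name: String }
  trait ViewCatalog extends CatalogPlugin { def dropView(ident: String): Boolean }

  object Views extends ViewCatalog {
    val name = "views"
    def dropView(ident: String): Boolean = true
  }
  object PlainCatalog extends CatalogPlugin { val name = "plain" }

  final case class ResolvedIdentifier(catalog: CatalogPlugin, ident: String)
  final case class DropIcebergView(child: ResolvedIdentifier, ifExists: Boolean)

  // Some(...) when the typed pattern accepts the catalog, None when it is skipped.
  def plan(cmd: DropIcebergView): Option[String] = cmd match {
    case DropIcebergView(ResolvedIdentifier(viewCatalog: ViewCatalog, ident), ifExists) =>
      Some(s"drop $ident in ${viewCatalog.name} (ifExists=$ifExists)")
    case _ =>
      None
  }
}
```

With these toy types, a command carrying `PlainCatalog` falls through to the default case, which is why the rewrite rule guaranteeing a `ViewCatalog` matters.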

@rdblue
Contributor

rdblue commented Jan 16, 2024

I think once this includes the changes from nastra#138, I'm +1.

@nastra
Contributor Author

nastra commented Jan 16, 2024

thanks for reviewing @rdblue. I'll go ahead and merge this, since everything should be addressed

@nastra nastra merged commit 2cda2b9 into apache:main Jan 16, 2024
31 checks passed
@nastra nastra deleted the spark-view-drop-support branch January 16, 2024 11:40
@nastra nastra moved this from In progress to Done in View support Jan 16, 2024
geruh pushed a commit to geruh/iceberg that referenced this pull request Jan 26, 2024
adnanhemani pushed a commit to adnanhemani/iceberg that referenced this pull request Jan 30, 2024
devangjhabakh pushed a commit to cdouglas/iceberg that referenced this pull request Apr 22, 2024