[Spark] Support dropping the CHECK constraints table feature #2987

Merged: 4 commits, Jun 13, 2024. Changes from 3 commits.
7 changes: 7 additions & 0 deletions spark/src/main/resources/error/delta-error-classes.json
@@ -170,6 +170,13 @@
    ],
    "sqlState" : "42703"
  },
  "DELTA_CANNOT_DROP_CHECK_CONSTRAINT_FEATURE" : {
    "message" : [
      "Cannot drop the CHECK constraints table feature.",
      "The following constraints must be dropped first: <constraints>."
    ],
    "sqlState" : "0AKDE"
  },
  "DELTA_CANNOT_EVALUATE_EXPRESSION" : {
    "message" : [
      "Cannot evaluate expression: <expression>"

Review comments on the new error class:

Contributor:
I was thinking that this is only a writer feature, which means it can safely be ignored by readers that do not understand it. In that sense, dropping the feature should be allowed even if there are constraints on the table; they would just be ignored. However, we do need to add a guard (if it is not already there) so that constraints are only checked when the feature is enabled/supported. That is for writers that understand the feature.

On the other hand, if we do want to enforce this behavior, it means the feature cannot be removed until the history is truncated. History validation is currently only done for reader+writer features, but there are plans to allow it for selected writer features.

Contributor:
To add to that, it is good to drop constraints before dropping the feature, but we should not check the history for constraints (i.e. actionUsesFeature).

Contributor Author:
Requiring users to drop all constraints first, before allowing the feature to be dropped, is intentional. Otherwise, some users might not realize that dropping the checkConstraints feature will drop all CHECK constraints on their table, or they may not realize that there are CHECK constraints defined on their table at all. In general it is a good principle that dropping (or adding) a feature should never affect the logical state of a table, only the physical state. For example: dropping the DV feature does not undelete rows, and dropping the CM feature does not revert column renames.

We do not require all constraints to be removed from the history (see the added test), only that the current state of the table does not contain any CHECK constraints. I have implemented actionUsesFeature, but AFAIK it is never called right now, as historyContainsFeature is only called for reader features (and this is a writer feature).
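To make the intended flow concrete, here is a minimal sketch of what the thread above discusses, with illustrative table and constraint names (not code from this PR; the TBLPROPERTIES setup mirrors the test added further down):

// Sketch only: a table with the checkConstraints feature supported and one constraint.
sql("CREATE TABLE events (id INT) USING DELTA " +
  "TBLPROPERTIES ('delta.feature.checkConstraints' = 'supported')")
sql("ALTER TABLE events ADD CONSTRAINT positive_id CHECK (id > 0)")

// While the constraint exists, dropping the feature fails with
// DELTA_CANNOT_DROP_CHECK_CONSTRAINT_FEATURE:
//   sql("ALTER TABLE events DROP FEATURE checkConstraints")

// Once the constraint is gone, the writer-only feature can be dropped right away;
// no history truncation is required, even though old versions still reference the constraint.
sql("ALTER TABLE events DROP CONSTRAINT positive_id")
sql("ALTER TABLE events DROP FEATURE checkConstraints")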
@@ -335,6 +335,13 @@ trait DeltaErrorsBase
      messageParameters = Array.empty)
  }

  def cannotDropCheckConstraintFeature(constraintNames: Seq[String]): AnalysisException = {
    new DeltaAnalysisException(
      errorClass = "DELTA_CANNOT_DROP_CHECK_CONSTRAINT_FEATURE",
      messageParameters = Array(constraintNames.map(formatColumn).mkString(", "))
    )
  }

  def incorrectLogStoreImplementationException(
      sparkConf: SparkConf,
      cause: Throwable): Throwable = {
@@ -24,6 +24,7 @@ import org.apache.spark.sql.delta.catalog.DeltaTableV2
import org.apache.spark.sql.delta.commands.{AlterTableSetPropertiesDeltaCommand, AlterTableUnsetPropertiesDeltaCommand, DeltaReorgTableCommand, DeltaReorgTableMode, DeltaReorgTableSpec}
import org.apache.spark.sql.delta.commands.columnmapping.RemoveColumnMappingCommand
import org.apache.spark.sql.delta.commands.optimize.OptimizeMetrics
import org.apache.spark.sql.delta.constraints.Constraints
import org.apache.spark.sql.delta.managedcommit.ManagedCommitUtils
import org.apache.spark.sql.delta.metering.DeltaLogging
import org.apache.spark.sql.delta.util.{Utils => DeltaUtils}
@@ -378,3 +379,12 @@ case class ColumnMappingPreDowngradeCommand(table: DeltaTableV2)
    true
  }
}

case class CheckConstraintsPreDowngradeTableFeatureCommand(table: DeltaTableV2)
  extends PreDowngradeTableFeatureCommand {
  override def removeFeatureTracesIfNeeded(): Boolean = {
    val checkConstraintNames = Constraints.getCheckConstraintNames(table.initialSnapshot.metadata)
    if (checkConstraintNames.isEmpty) return false
    throw DeltaErrors.cannotDropCheckConstraintFeature(checkConstraintNames)
  }
}

Contributor:
I hope that in the future we have a nice way of showing a warning message before executing commands like this, so the user can understand the consequences of the command before proceeding. For now we force them to drop the constraints one by one. Not the best CUJ.

Contributor Author:
Agreed that dropping the constraints one by one is not great. On the other hand, I do like asking the user to explicitly drop these constraints themselves. A warning can easily be missed or skipped. Perhaps we can add an ALTER TABLE ... DROP CONSTRAINTS syntax in the future that allows dropping multiple constraints at once?
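Until such a batch syntax exists, here is a hedged sketch of the one-by-one workaround (hypothetical helper code, not part of this PR; it assumes constraints surface as delta.constraints.<name> table properties, as the Constraints change below relies on):

// Hypothetical sketch: list the CHECK constraints via table properties,
// drop them one by one, then drop the feature.
val constraintNames = sql("SHOW TBLPROPERTIES my_table")
  .collect()
  .map(_.getString(0))
  .filter(_.startsWith("delta.constraints."))
  .map(_.stripPrefix("delta.constraints."))

constraintNames.foreach { name =>
  sql(s"ALTER TABLE my_table DROP CONSTRAINT $name")
}
sql("ALTER TABLE my_table DROP FEATURE checkConstraints")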
@@ -457,12 +457,24 @@ object InvariantsTableFeature

object CheckConstraintsTableFeature
  extends LegacyWriterFeature(name = "checkConstraints", minWriterVersion = 3)
    with FeatureAutomaticallyEnabledByMetadata {
    with FeatureAutomaticallyEnabledByMetadata
    with RemovableFeature {
  override def metadataRequiresFeatureToBeEnabled(
      metadata: Metadata,
      spark: SparkSession): Boolean = {
    Constraints.getCheckConstraints(metadata, spark).nonEmpty
  }

  override def preDowngradeCommand(table: DeltaTableV2): PreDowngradeTableFeatureCommand =
    CheckConstraintsPreDowngradeTableFeatureCommand(table)

  override def validateRemoval(snapshot: Snapshot): Boolean =
    Constraints.getCheckConstraintNames(snapshot.metadata).isEmpty

  override def actionUsesFeature(action: Action): Boolean = action match {
    case m: Metadata => Constraints.getCheckConstraintNames(m).nonEmpty
    case _ => false
  }
}

object ChangeDataFeedTableFeature
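For orientation, a rough sketch of what the new actionUsesFeature override reports (illustrative only; it constructs actions.Metadata directly with just a configuration map, and assumes the override is callable on the feature object):

import org.apache.spark.sql.delta.actions.Metadata

// A Metadata action whose configuration still carries a CHECK constraint
// is considered to use the feature; one without such a key is not.
val withConstraint = Metadata(configuration = Map("delta.constraints.c1" -> "a > 0"))
val withoutConstraint = Metadata()

CheckConstraintsTableFeature.actionUsesFeature(withConstraint)    // true
CheckConstraintsTableFeature.actionUsesFeature(withoutConstraint) // false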
@@ -49,6 +49,13 @@ object Constraints {
  /** A SQL expression to check for when writing out data. */
  case class Check(name: String, expression: Expression) extends Constraint

  def getCheckConstraintNames(metadata: Metadata): Seq[String] = {
    metadata.configuration.keys.collect {
      case key if key.toLowerCase(Locale.ROOT).startsWith("delta.constraints.") =>
        key.stripPrefix("delta.constraints.")
    }.toSeq
  }

Contributor:
nit: The brackets can be omitted.

Contributor Author:
I prefer to keep the brackets here, as this is a multi-line method. This helps avoid unhappy surprises in the future when modifying the code. The rest of this class also uses brackets for multi-line methods (even when not strictly needed).
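As a concrete illustration of what this helper operates on, a small sketch with made-up constraint expressions (the delta.constraints.<name> key convention is the one shown above):

// Illustrative configuration of a table carrying two CHECK constraints.
val configuration = Map(
  "delta.constraints.c1" -> "a > 0",
  "delta.constraints.c2" -> "b > 0",
  "delta.appendOnly" -> "false"
)

// Applying the same collect/stripPrefix logic as getCheckConstraintNames:
val names = configuration.keys.collect {
  case key if key.toLowerCase(java.util.Locale.ROOT).startsWith("delta.constraints.") =>
    key.stripPrefix("delta.constraints.")
}.toSeq
// names contains "c1" and "c2"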

  /**
   * Extract CHECK constraints from the table properties. Note that some CHECK constraints may also
   * come from schema metadata; these constraints were never released in a public API but are
@@ -459,4 +459,35 @@ class CheckConstraintsSuite extends QueryTest
      )
    }
  }

test("drop table feature") {
withTable("table") {
sql("CREATE TABLE table (a INT, b INT) USING DELTA " +
"TBLPROPERTIES ('delta.feature.checkConstraints' = 'supported')")
sql("ALTER TABLE table ADD CONSTRAINT c1 CHECK (a > 0)")
sql("ALTER TABLE table ADD CONSTRAINT c2 CHECK (b > 0)")

val error1 = intercept[AnalysisException] {
sql("ALTER TABLE table DROP FEATURE checkConstraints")
}
checkError(
error1,
errorClass = "DELTA_CANNOT_DROP_CHECK_CONSTRAINT_FEATURE",
parameters = Map("constraints" -> "`c1`, `c2`")
)

sql("ALTER TABLE table DROP CONSTRAINT c1")
val error2 = intercept[AnalysisException] {
sql("ALTER TABLE table DROP FEATURE checkConstraints")
}
checkError(
error2,
errorClass = "DELTA_CANNOT_DROP_CHECK_CONSTRAINT_FEATURE",
parameters = Map("constraints" -> "`c2`")
)

sql("ALTER TABLE table DROP CONSTRAINT c2")
sql("ALTER TABLE table DROP FEATURE checkConstraints")
    }
  }
}