
[SPARK-51441][SQL] Add DSv2 APIs for constraints #50253


Closed
wants to merge 6 commits from aokolnychyi/spark-51441 into master

Conversation

@aokolnychyi (Contributor) commented Mar 12, 2025:

What changes were proposed in this pull request?

This PR adds DSv2 APIs for constraints as per the SPIP doc (https://docs.google.com/document/d/1EHjB4W1LjiXxsK_G7067j9pPX0y15LUF1Z5DlUPoPIo/).

Why are the changes needed?

These changes are the first step for constraints support in Spark.

Does this PR introduce any user-facing change?

This PR adds new public interfaces that will be supported in the future.

How was this patch tested?

This PR comes with tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions bot added the SQL label Mar 12, 2025
* @return a CHECK constraint with the provided configuration
*/
static Check check(String name, String sql, ConstraintState state) {
return new Check(name, sql, null /* no predicate */, state);

@dongjoon-hyun (Member) commented Mar 13, 2025:

Shall we use Java Optional.empty() instead of null? IIRC, this null was not part of the SPIP, was it? Please correct me if there is a reason to have null; I might be missing a detail.

@aokolnychyi (Contributor, Author) replied:

Either the SQL string or the predicate must be present, so I don't think Optional is appropriate in this case. We have mixed use in the connector API: Column and ProcedureParameter use nulls, while some other classes use Optional.

* @return a CHECK constraint with the provided configuration
*/
static Check check(String name, Predicate predicate, ConstraintState state) {
return new Check(name, null /* no SQL */, predicate, state);

Reviewer (Member) commented:

ditto. Optional?

@dongjoon-hyun (Member) left a review comment:

Thank you, @aokolnychyi.

Comment on lines 31 to 32
* (not {@code NULL}). The search condition must be deterministic and cannot contain subqueries and
* certain functions like aggregates.

Reviewer commented:

Is it possible to document what "certain" would imply here? An exhaustive list of the conditional expressions we want to support would be beneficial.
For example: can we have a UDF my_udf in a CHECK condition, which is essentially a subquery?

@aokolnychyi (Contributor, Author) replied Mar 20, 2025:

It is up to us in the implementation to decide what to support. I feel this topic should be discussed a bit more. For now, I added UDFs to the list of unsupported examples. Let's refine as we go.
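
To make the restriction concrete, here is a minimal, hedged sketch of the kinds of conditions in question. The builder calls mirror the Constraint.check example quoted later in this review; the import path is an assumption, and the condition strings are purely illustrative, not taken from the PR.

```java
// Assumed package; adjust to wherever the constraints API actually lives.
import org.apache.spark.sql.connector.catalog.constraints.Constraint;

class CheckConditionExamples {
  void examples() {
    // Intended use case: a deterministic expression over table columns.
    Constraint.check("positive_id").predicateSql("id > 0").enforced(true).build();

    // Per the Javadoc under discussion, conditions like these are NOT supported:
    //   aggregates:  "sum(x) < 100"
    //   subqueries:  "x > (SELECT max(y) FROM t)"
    //   UDFs:        "my_udf(x) < 20"
  }
}
```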

* The SQL string and predicate must be equivalent.
*
* @param name the constraint name
* @param sql the SQL representation of the search condition (Spark SQL dialect)

@singhpk234 commented Mar 15, 2025:

sql the SQL representation of the search condition (Spark SQL dialect)

[doubt] How are we planning to handle cases where, say, a built-in function is introduced in a certain Spark version and is referenced in a CHECK constraint, but the enforcer is on a lower Spark version? Is capturing all the SQL under the Spark SQL dialect sufficient?

For example:

ALTER TABLE tbl ADD CONSTRAINT my_condition CHECK (func_spark_4(x) < 20);

Now the writer/enforcer is on Spark 3 and doesn't know func_spark_4.

@aokolnychyi (Contributor, Author) replied Mar 20, 2025:

That would mean the older version of Spark would not be able to parse/bind the expression, and the write would fail. I think that's acceptable behavior and is better than causing a silent constraint violation.

*
* @since 4.1.0
*/
public class Check extends BaseConstraint {

Reviewer (Member) commented:

Shall we override the enforced()/rely()/validationStatus() methods or variables? It would help developers understand them.
Otherwise, the default values will only exist in the docs and Spark internal code.

@aokolnychyi (Contributor, Author) replied:

I am not sure it is a good idea. Those methods already exist in the parent interface; they will show up in the Javadoc, and IDEs will suggest them as well. I am worried about all the code duplication this would cause.

@aokolnychyi (Contributor, Author) added:

As a compromise, we can add more Javadoc at the top of this class. What do you think?

Reviewer (Member) replied:

Adding more Javadoc seems OK.
If we change the default values in these classes, it can make the internal implementation simpler too. For example, we wouldn't need the default ConstraintCharacteristic for each internally parsed constraint: https://github.com/gengliangwang/spark/pull/13/files#diff-03507453aabc732a7e3efadc81cd840e436a392b1c62d5aa50647266bc3a9199R38

@aokolnychyi (Contributor, Author) replied:

Added more Javadoc.

If we change the default values in these classes, it can make the internal implementation simpler too. For example, we wouldn't need the default ConstraintCharacteristic for each internally parsed constraint.

Let's re-evaluate this in the implementation PR. I think we can either remove the common builder for constraints or handle this in the parser.

@gengliangwang (Member) commented:

I will merge this one next Monday if there are no more comments

*
* @since 4.1.0
*/
@Evolving

Reviewer (Contributor) commented:

QQ: why is this one @Evolving and not the others? I'm not sure about the convention in Spark.

@aokolnychyi (Contributor, Author) replied:

I wasn't sure about annotating classes that implement this interface, so I added it to be safe. It should be everywhere now.

* @param name the constraint name
* @return a CHECK constraint builder
*/
static Check.Builder check(String name) {

Reviewer (Contributor) commented:

How about createCheckBuilder?

@aokolnychyi (Contributor, Author) replied Mar 31, 2025:

I personally prefer shorter names whenever the usage/context is obvious enough.

Constraint.check("con1").predicateSql("id > 0").enforced(true).build();

This reads well to me and matches what we did for ProcedureParameter.

Reviewer (Member) replied:

+1 with @aokolnychyi, check is neat.
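
For readers skimming the thread, here is a slightly expanded, hedged version of the builder example above. The import paths are assumptions, and the exact builder surface may differ from what finally merged; treat it as a sketch of the intended usage, not the final API.

```java
// Assumed package; adjust to the actual location of the constraints API.
import org.apache.spark.sql.connector.catalog.constraints.Check;
import org.apache.spark.sql.connector.catalog.constraints.Constraint;

public class CheckBuilderExample {
  public static void main(String[] args) {
    // Short factory name on Constraint, then fluent builder calls,
    // as in the example quoted in the thread above.
    Check check = Constraint.check("con1")
        .predicateSql("id > 0")
        .enforced(true)
        .build();

    // Constraints describe themselves as DDL-like text (see the toDDL
    // discussion further down in this review).
    System.out.println(check);
  }
}
```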

* @param columns columns that comprise the unique key
* @return a UNIQUE constraint builder
*/
static Unique.Builder unique(String name, NamedReference[] columns) {

Reviewer (Contributor) commented:

createUniqueBuilder?

@aokolnychyi (Contributor, Author) replied:

Same as in CHECK.

* @param columns columns that comprise the primary key
* @return a PRIMARY KEY constraint builder
*/
static PrimaryKey.Builder primaryKey(String name, NamedReference[] columns) {

Reviewer (Contributor) commented:

createPrimaryKeyBuilder

@aokolnychyi (Contributor, Author) replied:

Same as in CHECK.

* @param refColumns the referenced columns in the referenced table
* @return a FOREIGN KEY constraint builder
*/
static ForeignKey.Builder foreignKey(

Reviewer (Contributor) commented:

createForeignKeyBuilder

@aokolnychyi (Contributor, Author) replied:

Same as in CHECK.

return toDDL();
}

protected String toDDL(NamedReference[] columns) {

Reviewer (Contributor) commented:

How about joinColumns?

@aokolnychyi (Contributor, Author) replied:

I'm sure that would be more descriptive, but I feel toDDL makes sense as it formats columns to be used in DDL.
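
For illustration only, a hedged sketch of what such a column-joining helper could look like. The class wrapper, method body, and use of fieldNames() are assumptions made for the sketch, not the PR's actual code.

```java
import java.util.StringJoiner;
import org.apache.spark.sql.connector.expressions.NamedReference;

abstract class ColumnListSketch {
  // Renders the referenced columns as a comma-separated list for DDL output,
  // e.g. columns (a, b) -> "a, b", so it can be embedded in "PRIMARY KEY (a, b)".
  protected String toDDL(NamedReference[] columns) {
    StringJoiner joiner = new StringJoiner(", ");
    for (NamedReference column : columns) {
      joiner.add(String.join(".", column.fieldNames()));
    }
    return joiner.toString();
  }
}
```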

Predicate predicate,
boolean enforced,
ValidationStatus validationStatus,
boolean rely) {

Reviewer (Contributor) commented:

Shall we put the parameters of the base class first, and then the subclass's parameters?

@aokolnychyi (Contributor, Author) replied:

It feels more natural to me to follow the order of importance/definition in SQL.

CONSTRAINT name CHECK (predicate) [NOT] ENFORCED [NO]RELY

boolean enforced,
ValidationStatus validationStatus,
boolean rely) {
super(name, enforced, validationStatus, rely);

Reviewer (Contributor) commented:

parameter order.

@aokolnychyi (Contributor, Author) replied:

Same opinion as in CHECK.

boolean enforced,
ValidationStatus validationStatus,
boolean rely) {
super(name, enforced, validationStatus, rely);

Reviewer (Contributor) commented:

ditto.

@aokolnychyi (Contributor, Author) replied:

Same opinion as in CHECK.

boolean enforced,
ValidationStatus validationStatus,
boolean rely) {
super(name, enforced, validationStatus, rely);

Reviewer (Contributor) commented:

ditto.

@aokolnychyi (Contributor, Author) replied:

Same opinion as in CHECK.

@gengliangwang (Member) commented:

Hi @beliefer, @dongjoon-hyun, @singhpk234 — just checking if you have any further comments on this PR. Thanks!

@gengliangwang (Member) commented:

Thanks, merging to master

@aokolnychyi (Contributor, Author) commented.

senthh pushed a commit to senthh/spark-1 that referenced this pull request Apr 2, 2025

Closes apache#50253 from aokolnychyi/spark-51441.

Authored-by: Anton Okolnychyi <aokolnychyi@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>