[SPARK-51441][SQL] Add DSv2 APIs for constraints #50253
Conversation
 * @return a CHECK constraint with the provided configuration
 */
static Check check(String name, String sql, ConstraintState state) {
  return new Check(name, sql, null /* no predicate */, state);
Shall we use Java `Optional.empty()` instead of `null`? IIRC, this `null` was not part of the SPIP, was it? Please correct me if there is a reason to have `null`; I might be missing a detail.
Either SQL or predicate must be present, so I don't think `Optional` is appropriate in this case. We have mixed use in the connector API: `Column` and `ProcedureParameter` use nulls while some other classes use `Optional`.
 * @return a CHECK constraint with the provided configuration
 */
static Check check(String name, Predicate predicate, ConstraintState state) {
  return new Check(name, null /* no SQL */, predicate, state);
Ditto. `Optional`?
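For what it's worth, a minimal consumer-side sketch of the invariant being described (accessor names follow the builder example given later in this thread and should be treated as assumptions): exactly one representation is set, so callers branch on null rather than unwrapping an Optional.

// Hedged sketch: exactly one of the two representations is non-null on a Check.
Check check = Constraint.check("positive_id").predicateSql("id > 0").build();
if (check.predicateSql() != null) {
  // bind/plan the Spark SQL text
} else if (check.predicate() != null) {
  // fall back to the connector Predicate representation
}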
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/constraints/ForeignKey.java
Thank you, @aokolnychyi .
 * (not {@code NULL}). The search condition must be deterministic and cannot contain subqueries and
 * certain functions like aggregates.
Is it possible to document what `certain` implies here? An exhaustive list of the conditional expressions we want to support would be beneficial. For example, can we have a UDF `my_udf` in a CHECK condition, which is essentially a subquery?
It is up to us in the implementation to decide what to support. I feel this topic should be discussed a bit more. For now, I added UDFs to the list of unsupported examples. Let's refine as we go.
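To make the restriction concrete, a hedged illustration (hypothetical column and constraint names, not from the PR) of what the documented rule allows versus excludes:

// Deterministic, no subqueries, aggregates, or UDFs: fits the documented rule.
Constraint ok = Constraint.check("valid_price")
    .predicateSql("price > 0 AND quantity <= 100")
    .build();

// Conditions of the kind the Javadoc calls out as unsupported (illustrative only):
//   "price > (SELECT avg(price) FROM products)"  -- subquery
//   "sum(price) < 1000"                           -- aggregate
//   "my_udf(price) > 0"                           -- user-defined function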
 * The SQL string and predicate must be equivalent.
 *
 * @param name the constraint name
 * @param sql the SQL representation of the search condition (Spark SQL dialect)
sql the SQL representation of the search condition (Spark SQL dialect)

[doubt] How do we plan to handle the case where a built-in function is introduced in a certain Spark version and referenced in a CHECK constraint, but the enforcer runs a lower Spark version? Is capturing all SQL under the Spark SQL dialect sufficient?
For example:
ALTER TABLE tbl ADD CONSTRAINT my_condition CHECK (func_spark_4(x) < 20);
Now the writer/enforcer is on Spark 3 and doesn't know func_spark_4.
That would mean the older version of Spark would not be able to parse/bind the expression and the write would fail. I think that's acceptable behavior and better than causing a silent constraint violation.
Force-pushed from a4f06bf to 11cdf15.
 *
 * @since 4.1.0
 */
public class Check extends BaseConstraint {
Shall we override the `enforced()`/`rely()`/`validationStatus()` methods or variables? It will help developers understand them. Otherwise, the default values will only exist in the docs and in Spark internal code.
I am not sure it is a good idea. Those methods exist in the parent interface. They will show up in the Javadoc and IDEs will suggest them as well. I am worried about the code duplication this would cause.
As a compromise, we can add more Javadoc at the top of this class. What do you think?
Adding more Javadoc seems OK.
If we change the default values in these classes, it can make the internal implementation simpler too. For example, we wouldn't need the default `ConstraintCharacteristic` for each internal parsed constraint: https://github.com/gengliangwang/spark/pull/13/files#diff-03507453aabc732a7e3efadc81cd840e436a392b1c62d5aa50647266bc3a9199R38
Added more Javadoc.
If we change the default value in these classes, it can make the internal implementation simpler too. For example, we don't need to have the default ConstraintCharacteristic for each internal parsed constraints
Let's re-evaluate this in the implementation PR. I think we can either remove the common builder for constraints or handle this in the parser.
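For readers following the thread, a hedged sketch of the point being settled: a constraint built without explicit characteristics falls back to whatever defaults the Constraint Javadoc documents, and under the approach kept here the subclasses do not re-declare those defaults.

// No enforced()/rely()/validationStatus() calls, so the defaults documented on
// the Constraint interface apply (see the merged Javadoc for the exact values).
Check withDefaults = Constraint.check("positive_id")
    .predicateSql("id > 0")
    .build();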
Force-pushed from 939933c to 9fec4e2.
I will merge this one next Monday if there are no more comments.
 *
 * @since 4.1.0
 */
@Evolving
qq: why is this `@Evolving` and not others? I'm not sure about the convention in Spark.
I wasn't sure about annotating classes that implement this interface. I added it to be safe, however. It should be everywhere now.
 * @param name the constraint name
 * @return a CHECK constraint builder
 */
static Check.Builder check(String name) {
How about `createCheckBuilder`?
I personally prefer shorter names whenever the usage/context is obvious enough.
Constraint.check("con1").predicateSql("id > 0").enforced(true).build();
This reads well to me and matches what we did for `ProcedureParameter`.
+1 with @aokolnychyi, `check` is neat.
 * @param columns columns that comprise the unique key
 * @return a UNIQUE constraint builder
 */
static Unique.Builder unique(String name, NamedReference[] columns) {
`createUniqueBuilder`?
Same as in CHECK.
 * @param columns columns that comprise the primary key
 * @return a PRIMARY KEY constraint builder
 */
static PrimaryKey.Builder primaryKey(String name, NamedReference[] columns) {
`createPrimaryKeyBuilder`?
Same as in CHECK.
 * @param refColumns the referenced columns in the referenced table
 * @return a FOREIGN KEY constraint builder
 */
static ForeignKey.Builder foreignKey(
`createForeignKeyBuilder`?
Same as in CHECK.
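To summarize the naming discussion, a hedged usage sketch of the four factories as they appear in this PR: the check example mirrors the one the author gives above, while the foreignKey arguments (referenced table identifier and columns) are inferred from the Javadoc fragments in the diff and should be treated as assumptions.

// Requires: org.apache.spark.sql.connector.catalog.Identifier,
//           org.apache.spark.sql.connector.catalog.constraints.Constraint,
//           org.apache.spark.sql.connector.expressions.Expressions / NamedReference
NamedReference[] idCol = new NamedReference[] { Expressions.column("id") };

Constraint check = Constraint.check("positive_id")
    .predicateSql("id > 0")
    .enforced(true)
    .build();

Constraint unique = Constraint.unique("unique_id", idCol).build();
Constraint pk = Constraint.primaryKey("pk_id", idCol).build();

// FOREIGN KEY: argument shapes assumed, not verbatim from the diff.
Constraint fk = Constraint.foreignKey(
        "fk_customer",
        new NamedReference[] { Expressions.column("customer_id") },
        Identifier.of(new String[] { "db" }, "customers"),
        new NamedReference[] { Expressions.column("id") })
    .build();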
  return toDDL();
}

protected String toDDL(NamedReference[] columns) {
How about `joinColumns`?
I am sure it would be more descriptive. I feel `toDDL` makes sense as it formats columns to be used in DDL.
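For context, a hedged sketch (not the PR's actual implementation) of what a column-formatting helper like this typically does: join the column references into the comma-separated list embedded in DDL output such as PRIMARY KEY (a, b.c).

// Illustrative only; assumes NamedReference#toString yields the dotted column name.
protected String toDDL(NamedReference[] columns) {
  return java.util.Arrays.stream(columns)
      .map(NamedReference::toString)
      .collect(java.util.stream.Collectors.joining(", "));
}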
    Predicate predicate,
    boolean enforced,
    ValidationStatus validationStatus,
    boolean rely) {
Shall we put the base class's parameters first, and then the subclass's parameters?
It feels more natural to me to follow the order of importance/definition in SQL.
CONSTRAINT name CHECK (predicate) [NOT] ENFORCED [NO]RELY
    boolean enforced,
    ValidationStatus validationStatus,
    boolean rely) {
  super(name, enforced, validationStatus, rely);
parameter order.
Same opinion as in CHECK.
    boolean enforced,
    ValidationStatus validationStatus,
    boolean rely) {
  super(name, enforced, validationStatus, rely);
ditto.
Same opinion as in CHECK.
    boolean enforced,
    ValidationStatus validationStatus,
    boolean rely) {
  super(name, enforced, validationStatus, rely);
ditto.
Same opinion as in CHECK.
Hi @beliefer, @dongjoon-hyun, @singhpk234, just checking if you have any further comments on this PR. Thanks!
Thanks, merging to master.
Thank you, @gengliangwang @beliefer @dongjoon-hyun @singhpk234 @szehon-ho!
Closes apache#50253 from aokolnychyi/spark-51441.
Authored-by: Anton Okolnychyi <aokolnychyi@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
What changes were proposed in this pull request?
This PR adds DSv2 APIs for constraints as per the SPIP doc (https://docs.google.com/document/d/1EHjB4W1LjiXxsK_G7067j9pPX0y15LUF1Z5DlUPoPIo/).
Why are the changes needed?
These changes are the first step for constraints support in Spark.
Does this PR introduce any user-facing change?
This PR adds new public interfaces that will be supported in the future.
How was this patch tested?
This PR comes with tests.
Was this patch authored or co-authored using generative AI tooling?
No.