[SPARK-51441][SQL] Add DSv2 APIs for constraints #50253
base: master
Conversation
```java
import org.apache.spark.sql.connector.expressions.NamedReference;

import java.util.StringJoiner;
```
Could you place this before the `org.apache.spark` import statements?
Done.
```java
 * @return a CHECK constraint with the provided configuration
 */
static Check check(String name, String sql, ConstraintState state) {
  return new Check(name, sql, null /* no predicate */, state);
```
Shall we use Java `Optional.empty()` instead of `null`? IIRC, this `null` was not part of the SPIP, wasn't it? Please correct me if there is a reason to have `null`. I might be missing a detail.
Either SQL or predicate must be present, so I don't think `Optional` is appropriate in this case. We have mixed use in the connector API: `Column` and `ProcedureParameter` use nulls while some other classes use `Optional`.
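As a rough illustration of that design choice, here is a minimal sketch (not the PR's actual implementation; the class and field names are made up) of how a constructor can guard the "at least one representation is present" invariant with plain nulls instead of `Optional`:

```java
import org.apache.spark.sql.connector.expressions.filter.Predicate;

// Sketch only: shows why nullable fields plus a constructor guard are enough
// when exactly one of the two representations may be absent, never both.
final class CheckSketch {
  private final String name;
  private final String sql;           // null when only the predicate form is given
  private final Predicate predicate;  // null when only the SQL form is given

  CheckSketch(String name, String sql, Predicate predicate) {
    if (sql == null && predicate == null) {
      throw new IllegalArgumentException("Either SQL or predicate must be provided");
    }
    this.name = name;
    this.sql = sql;
    this.predicate = predicate;
  }
}
```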
```java
 * @return a CHECK constraint with the provided configuration
 */
static Check check(String name, Predicate predicate, ConstraintState state) {
  return new Check(name, null /* no SQL */, predicate, state);
```
Ditto. `Optional`?
```java
import org.apache.spark.sql.connector.expressions.NamedReference;

import java.util.Arrays;
import java.util.Objects;
```
Ditto: `import` ordering.
Fixed.
Thank you, @aokolnychyi.
```java
 * (not {@code NULL}). The search condition must be deterministic and cannot contain subqueries and
 * certain functions like aggregates.
```
Is it possible to document what "certain" implies here? An exhaustive list of the conditional expressions we want to support would be beneficial.
For example: can we have a UDF `my_udf` in a CHECK condition, which is essentially a subquery?
It is up to us in the implementation to decide what to support. I feel this topic should be discussed a bit more. For now, I added UDFs to the list of unsupported examples. Let's refine as we go.
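For context, a condition that satisfies the documented restrictions (deterministic, no subqueries, no aggregates, no UDFs) is a plain comparison such as `x > 0`. Below is a hedged sketch of building such a condition with the existing DSv2 expression helpers; the enclosing `Check` API is what this PR proposes, and exact helper availability may vary by Spark version:

```java
import org.apache.spark.sql.connector.expressions.Expression;
import org.apache.spark.sql.connector.expressions.Expressions;
import org.apache.spark.sql.connector.expressions.filter.Predicate;

class CheckConditionSketch {
  // Builds the deterministic, subquery-free condition "x > 0" as a connector
  // Predicate. Rejecting UDFs, subqueries, or aggregates would happen during
  // analysis, not inside this object.
  static Predicate positiveX() {
    return new Predicate(
        ">",
        new Expression[] {Expressions.column("x"), Expressions.literal(0)});
  }
}
```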
```java
 * The SQL string and predicate must be equivalent.
 *
 * @param name the constraint name
 * @param sql the SQL representation of the search condition (Spark SQL dialect)
```
> sql the SQL representation of the search condition (Spark SQL dialect)

[doubt] How are we planning to handle cases where, say, a built-in function is introduced in a certain Spark version and is referenced in a CHECK constraint, but the enforcer runs on a lower Spark version? Is capturing all the SQL under the Spark SQL dialect sufficient?
For example:
`ALTER TABLE tbl ADD CONSTRAINT my_condition CHECK (func_spark_4(x) < 20);`
Now the writer/enforcer is on Spark 3 and doesn't know `func_spark_4`.
That would mean the older version of Spark would not be able to parse/bind the expression, and the write will fail. I think that's acceptable behavior and is better than causing a silent constraint violation.
Force-pushed from a4f06bf to 11cdf15.
What changes were proposed in this pull request?
This PR adds DSv2 APIs for constraints as per the SPIP doc.
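As a rough, hedged illustration: a DDL statement like `ALTER TABLE t ADD CONSTRAINT positive_x CHECK (x > 0)` might map onto the SQL-string factory shown in the diff roughly as below. The package name, the class hosting the factory, and the shape of `ConstraintState` are not visible in this excerpt and are assumptions, not taken from the PR:

```java
// Sketch under assumptions: the package and the host class of the factory are
// guesses based on the diff; ConstraintState is treated as an opaque parameter.
import org.apache.spark.sql.connector.catalog.constraints.Check;
import org.apache.spark.sql.connector.catalog.constraints.ConstraintState;

class ConstraintTranslationSketch {
  // Translates "ALTER TABLE t ADD CONSTRAINT positive_x CHECK (x > 0)" into a
  // connector-level CHECK constraint using the SQL-string factory.
  static Check positiveX(ConstraintState state) {
    return Check.check("positive_x", "x > 0", state);
  }
}
```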
Why are the changes needed?
These changes are the first step for constraints support in Spark.
Does this PR introduce any user-facing change?
This PR adds new public interfaces that will be supported in the future.
How was this patch tested?
This PR comes with tests.
Was this patch authored or co-authored using generative AI tooling?
No.