-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32793][SQL] Add raise_error function, adds error message parameter to assert_true #29947
Conversation
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
…2793 Signed-off-by: Karen Feng <karen.feng@databricks.com>
Test build #129422 has finished for PR 29947 at commit
|
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Test build #129423 has finished for PR 29947 at commit
|
Kubernetes integration test starting |
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test status success |
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Kubernetes integration test starting |
Kubernetes integration test status failure |
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
Outdated
Show resolved
Hide resolved
Test build #129426 has finished for PR 29947 at commit
|
Test build #129427 has finished for PR 29947 at commit
|
sql/core/src/test/resources/sql-tests/inputs/sql-compatibility-functions.sql
Outdated
Show resolved
Hide resolved
Test build #129465 has finished for PR 29947 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Kubernetes integration test starting |
Kubernetes integration test status success |
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #129526 has finished for PR 29947 at commit
|
Test build #129530 has finished for PR 29947 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
Outdated
Show resolved
Hide resolved
LGTM, too. |
Merged to master. |
Kubernetes integration test starting |
Kubernetes integration test status success |
@@ -137,6 +137,8 @@ def sha1(col: ColumnOrName) -> Column: ... | |||
def sha2(col: ColumnOrName, numBits: int) -> Column: ... | |||
def hash(*cols: ColumnOrName) -> Column: ... | |||
def xxhash64(*cols: ColumnOrName) -> Column: ... | |||
def assert_true(col: ColumnOrName, errMsg: Union[Column, str] = ...): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small notes (sorry for being late):
-
I think we should annotate return type for
assert_true
- it will type check because of implicitAny
, but I think it is better to avoid such casesdef assert_true(col: ColumnOrName, errMsg: Union[Column, str] = ...) -> Column: ...
-
For
def raise_error
I'd useNoReturn
:from typing import NoReturn def raise_error(errMsg: Union[Column, str]) -> NoReturn: ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karenfeng Could you fix them above in followup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- For
def raise_error
I'd useNoReturn
:
This might indicate intention here, though technically speaking it's still a Column
, so
def raise_error(errMsg: Union[Column, str]) -> Column: ...
is still correct (and literal one). Do you have any thoughts about it @HyukjinKwon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think doing Column
is fine.
jc <- if (is.null(errMsg)) { | ||
callJStatic("org.apache.spark.sql.functions", "assert_true", x@jc) | ||
} else { | ||
if (is.character(errMsg) && length(errMsg) == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we throw an exception if length(errMsg) != 1
? Just in case user does something like this?
> assert_true(column("foo"), c("foo", "bar"))
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
trying to get slot "jc" from an object of a basic class ("character") with no slots
i.e.
...
} else {
if (is.character(errMsg) {
stopifnot(length(errMsg) == 1)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, more checks should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice we make this check anyway, so it is only a question if we do something about it.
@zero323 it's my bad that I rushed. Can you make a quick followup if you're available? I think @karenfeng lives in US timezone and probably is sleeping :-) |
@HyukjinKwon I am at work right now, with only my phone, but I'll open a PR once I am back home, unless it is resolved by then. |
Sure, thanks! |
…d SparkR ### What changes were proposed in this pull request? - Annotated return types of `assert_true` and `raise_error` as discussed [here](#29947 (review)). - Add `assert_true` and `raise_error` to SparkR NAMESPACE. - Validating message vector size in SparkR as discussed [here](#29947 (review)). ### Why are the changes needed? As discussed in review for #29947. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Existing tests. - Validation of annotations using MyPy Closes #29978 from zero323/SPARK-32793-FOLLOW-UP. Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Adds a SQL function
raise_error
which underlies the refactoredassert_true
function.assert_true
now also (optionally) accepts a custom error message field.raise_error
is exposed in SQL, Python, Scala, and R.assert_true
was previously only exposed in SQL; it is now also exposed in Python, Scala, and R.Why are the changes needed?
Improves usability of
assert_true
by clarifying error messaging, and adds the useful helper functionraise_error
.Does this PR introduce any user-facing change?
Yes:
raise_error
function to the SQL, Python, Scala, and R APIs.assert_true
function to the SQL, Python and R APIs.How was this patch tested?
Adds unit tests in SQL, Python, Scala, and R for
assert_true
andraise_error
.