[SPARK-32793][SQL] Add raise_error function, adds error message parameter to assert_true #29947

Closed
wants to merge 25 commits into from

karenfeng
Contributor

What changes were proposed in this pull request?

Adds a SQL function raise_error which underlies the refactored assert_true function. assert_true now also (optionally) accepts a custom error message field.
raise_error is exposed in SQL, Python, Scala, and R.
assert_true was previously only exposed in SQL; it is now also exposed in Python, Scala, and R.
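
To illustrate the relationship described above, here is a minimal plain-Python model (not Spark code): raise_error always fails with the supplied message, and assert_true delegates to it whenever the condition does not hold. The AssertionFailed class, the placeholder default message, and the treatment of a missing (None) condition as a failure are assumptions of this sketch.

```python
from typing import Optional


class AssertionFailed(RuntimeError):
    """Hypothetical stand-in for the runtime error surfaced to the user."""


def raise_error(err_msg: str) -> None:
    # Model of raise_error: always fails with the caller-supplied message.
    raise AssertionFailed(err_msg)


def assert_true(condition: Optional[bool], err_msg: Optional[str] = None) -> None:
    # A true condition yields None, standing in for SQL NULL.
    if condition is True:
        return None
    # Anything else is delegated to raise_error. The default text below is a
    # placeholder chosen for this sketch, not Spark's actual message.
    return raise_error(err_msg if err_msg is not None else "assertion failed")
```

In this model, assert_true(True) returns None, while assert_true(False, "boom") raises AssertionFailed carrying the custom message.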

Why are the changes needed?

Improves usability of assert_true by clarifying error messaging, and adds the useful helper function raise_error.

Does this PR introduce any user-facing change?

Yes:

  • Adds raise_error function to the SQL, Python, Scala, and R APIs.
  • Adds assert_true function to the Python, Scala, and R APIs (it was already exposed in SQL).

How was this patch tested?

Adds unit tests in SQL, Python, Scala, and R for assert_true and raise_error.

Signed-off-by: Karen Feng <karen.feng@databricks.com>
…2793

Signed-off-by: Karen Feng <karen.feng@databricks.com>
@karenfeng karenfeng changed the title [WIP][SPARK-32793][SQL] Add raise_error function, adds error message parameter to assert_true [SPARK-32793][SQL] Add raise_error function, adds error message parameter to assert_true Oct 5, 2020

SparkQA commented Oct 5, 2020

Test build #129422 has finished for PR 29947 at commit df1fc36.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class DisableUnnecessaryBucketedScan(conf: SQLConf) extends Rule[SparkPlan]
  • abstract class JdbcConnectionProvider

SparkQA commented Oct 5, 2020

Test build #129423 has finished for PR 29947 at commit 0ff60f4.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Oct 5, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34029/


SparkQA commented Oct 5, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34030/


SparkQA commented Oct 5, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34029/


SparkQA commented Oct 5, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34030/

SparkQA commented Oct 5, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34033/


SparkQA commented Oct 6, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34033/


SparkQA commented Oct 6, 2020

Test build #129426 has finished for PR 29947 at commit 109af99.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Oct 6, 2020

Test build #129427 has finished for PR 29947 at commit c683d78.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Oct 7, 2020

Test build #129465 has finished for PR 29947 at commit 52e16ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 7, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34131/


SparkQA commented Oct 7, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34131/

SparkQA commented Oct 7, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34135/


SparkQA commented Oct 7, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34135/


SparkQA commented Oct 7, 2020

Test build #129526 has finished for PR 29947 at commit e5ad9e0.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Oct 8, 2020

Test build #129530 has finished for PR 29947 at commit e921f66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@HyukjinKwon HyukjinKwon left a comment

Otherwise LGTM

@maropu
Member

maropu commented Oct 8, 2020

LGTM, too.

@HyukjinKwon
Member

Merged to master.


SparkQA commented Oct 8, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34142/


SparkQA commented Oct 8, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34142/

@@ -137,6 +137,8 @@ def sha1(col: ColumnOrName) -> Column: ...
def sha2(col: ColumnOrName, numBits: int) -> Column: ...
def hash(*cols: ColumnOrName) -> Column: ...
def xxhash64(*cols: ColumnOrName) -> Column: ...
def assert_true(col: ColumnOrName, errMsg: Union[Column, str] = ...): ...
Member

Two small notes (sorry for being late):

  • I think we should annotate the return type of assert_true. As written it type checks only because of the implicit Any, and I think it is better to avoid such cases:

    def assert_true(col: ColumnOrName, errMsg: Union[Column, str] = ...) -> Column: ...
  • For def raise_error I'd use NoReturn:

    from typing import NoReturn
    
    def raise_error(errMsg: Union[Column, str]) -> NoReturn: ...

Member

@karenfeng Could you fix them above in followup?

Member

NoReturn might indicate intention here, though technically speaking the result is still a Column, so

def raise_error(errMsg: Union[Column, str]) -> Column: ...

is still correct (and the more literal annotation). Do you have any thoughts about it, @HyukjinKwon?

Member

I think doing Column is fine.
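
A minimal sketch of why -> Column is the literal annotation, assuming a hypothetical stand-in Column class and a hypothetical evaluate helper (neither is PySpark code): calling raise_error only builds an expression and returns it, and the failure surfaces later, when that expression is evaluated. With NoReturn, a type checker such as mypy would instead treat any code following the call as unreachable.

```python
from typing import Union, get_type_hints


class Column:
    """Hypothetical stand-in for a lazy, unevaluated column expression."""

    def __init__(self, expr: str) -> None:
        self.expr = expr


def raise_error(errMsg: Union[Column, str]) -> Column:
    # Building the expression does not fail; it only describes a future failure.
    text = errMsg.expr if isinstance(errMsg, Column) else repr(errMsg)
    return Column(f"raise_error({text})")


def evaluate(col: Column) -> None:
    # Hypothetical evaluator: the error surfaces only when the expression runs.
    if col.expr.startswith("raise_error("):
        raise RuntimeError(col.expr)


# The call returns normally, so the annotated return type is observable.
pending = raise_error("bad row")
return_hint = get_type_hints(raise_error)["return"]
```

Here pending is an ordinary Column instance and return_hint is the Column class; only evaluate(pending) raises.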

@zero323
Member

zero323 commented Oct 8, 2020

It seems the R functions haven't been included in NAMESPACE. Unless that is intentional, they should be added around here

"ascii",

and here

"radians",

for assert_true and raise_error respectively.

jc <- if (is.null(errMsg)) {
  callJStatic("org.apache.spark.sql.functions", "assert_true", x@jc)
} else {
  if (is.character(errMsg) && length(errMsg) == 1) {
Member

@zero323 zero323 Oct 8, 2020

Shouldn't we throw an exception if length(errMsg) != 1, just in case a user does something like this?

> assert_true(column("foo"), c("foo", "bar"))
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  trying to get slot "jc" from an object of a basic class ("character") with no slots

i.e.

           ...
            } else {
              if (is.character(errMsg)) {
                stopifnot(length(errMsg) == 1)
           ...

Member

Yeah, more checks should be fine.

Member

In practice we make this check anyway, so the only question is whether we do something about it.
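
The fail-fast idea from this thread, sketched in Python rather than R as a rough analog: validate the message argument at the API boundary so that a vector of strings produces a clear error instead of an obscure failure further down. The Column class and the normalize_err_msg helper are hypothetical names introduced only for this sketch.

```python
class Column:
    """Hypothetical stand-in for a column expression."""

    def __init__(self, expr: str) -> None:
        self.expr = expr


def normalize_err_msg(err_msg: object) -> Column:
    # An existing Column is passed through untouched.
    if isinstance(err_msg, Column):
        return err_msg
    # A single string is wrapped as a literal expression.
    if isinstance(err_msg, str):
        return Column(repr(err_msg))
    # Anything else, including a list of strings, is rejected up front.
    raise TypeError(
        f"errMsg must be a single string or a Column, got {type(err_msg).__name__}"
    )
```

With this check, normalize_err_msg(["foo", "bar"]) fails immediately with a TypeError naming the offending type.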

@HyukjinKwon
Member

@zero323 my bad, I rushed this. Can you make a quick followup if you're available? I think @karenfeng lives in a US timezone and is probably sleeping :-)

@zero323
Member

zero323 commented Oct 8, 2020

@HyukjinKwon I am at work right now, with only my phone, but I'll open a PR once I am back home, unless it is resolved by then.

@HyukjinKwon
Member

Sure, thanks!

HyukjinKwon pushed a commit that referenced this pull request Oct 9, 2020
…d SparkR

### What changes were proposed in this pull request?

- Annotate return types of `assert_true` and `raise_error` as discussed [here](#29947 (review)).
- Add `assert_true` and `raise_error` to SparkR NAMESPACE.
- Validate message vector size in SparkR as discussed [here](#29947 (review)).

### Why are the changes needed?

As discussed in review for #29947.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Existing tests.
- Validation of annotations using MyPy

Closes #29978 from zero323/SPARK-32793-FOLLOW-UP.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
6 participants