Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46036][SQL] Removing error-class from raise_error function #44004

Closed
wants to merge 7 commits into from

Conversation

dbatomic
Copy link
Contributor

What changes were proposed in this pull request?

  • Reverting raise_error to single param variant that is enforced through ExpressionBuilder interface (vs. reflection based lookup).
  • Keeping internal RaiseError class untouched with ability to throw custom exception class. The only way to call it as of this change is from internal code. Open question is whether we want to clean up this code as well, given that currently no one is using it and this code seems too complex if we don't need to expose RaiseError in surface layer. Looking for feedback on this one.

Why are the changes needed?

This PR is a follow up of raise_error improvement PR. As described in jira ticket, raise_error with custom error-class parameter was deemed to be too powerful with concern that user might throw internal system errors which shouldn't be exposed to external layers.

With this change, from external perspective, we revert to the old behaviour, where raise_error only accepts single string argument which will represent message of USER_RAISED_EXCEPTION exception object.

Does this PR introduce any user-facing change?

Yes, we introduce change in raise_error signature. With this change it is no longer allowed to call:

SELECT raise_error('VIEW_NOT_FOUND', Map('relationName', '`v`'))

Instead, only allowed version will be:

SELECT raise_error('custom error message'))

Which will result in USER_RAISED_EXCEPTION being thrown with provided message.

How was this patch tested?

The change updates existing sql tests under misc-functions.sql and adds some new checks.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot removed the DOCS label Nov 24, 2023

SELECT raise_error(NULL, NULL);
-- TODO: Need feedback on proper behaviour here:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for feedback here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implicit cast does not support converting complex type to string type. Actually we can skip these 3 tests as implicit cast is orthogonal to the raise_error function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @group misc_funcs
* @since 4.0.0
*/
def raise_error(c: Column, e: Column): Column = Column.fn("raise_error", c, e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this API added no long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that I understand the comment. Can you please elaborate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since = "3.1.0",
group = "misc_funcs")
// scalastyle:on line.size.limit
object RaiseErrorExpressionBuilder extends ExpressionBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RaiseError works good, why you create the builder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my offline discussion with @cloud-fan we tried to come with a way to achieve following:

  1. Leave RaiseError class intact so internally we can throw errors of custom class.
  2. Externally allow only raise_error('message') API.

My understanding is that expression[RaiseError]("raise_error") ends up doing reflection that would expose a function for every constructor signature in RaiseError, meaning that we will expose both def this(str: Expression) and def this(errorClass: Expression, errorParms: Expression) from RaiseError.

Instead of doing this, we take ExpressionBuilder path that will expose only single argument version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. It hears good to me. We can use this builder to wrap the visibility.

// scalastyle:on line.size.limit
object RaiseErrorExpressionBuilder extends ExpressionBuilder {
override def build(funcName: String, expressions: Seq[Expression]): Expression = {
// for some reason pattern matching doesn't work here...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think explicit length check is better than pattern match here. We don't need this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

@cloud-fan
Copy link
Contributor

cc @srielau

* @group misc_funcs
* @since 4.0.0
*/
def raise_error(c: Column, e: Column): Column = Column.fn("raise_error", c, e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove this API from connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan This PR will forbid to expose the errorParms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@beliefer Yes, idea is to forbid errorParams exposure, given that we will always throw USER_RAISED_EXCEPTION with single arg (string representing the message). @cloud-fan, please confirm if I got things right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbatomic Thank you for the explanation. Please wait the confirm.

Copy link
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except some comments.

since = "3.1.0",
group = "misc_funcs")
// scalastyle:on line.size.limit
object RaiseErrorExpressionBuilder extends ExpressionBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. It hears good to me. We can use this builder to wrap the visibility.

* @group misc_funcs
* @since 4.0.0
*/
def raise_error(c: Column, e: Column): Column = Column.fn("raise_error", c, e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan This PR will forbid to expose the errorParms?

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in fa4096e Nov 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants