Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-40967][SQL] Migrate failAnalysis() onto error classes #38436

Closed

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Oct 29, 2022

What changes were proposed in this pull request?

In the PR, I propose to migrate failAnalysis() errors onto temporary error classes with the prefix _LEGACY_ERROR_TEMP_23xx. The error message will not include the error classes, so, in this way we will preserve the existing behaviour.

Why are the changes needed?

The migration on temporary error classes allows to gather statistics about errors and detect most popular error classes. After that we could prioritise the work on migration.

The new error class name prefix _LEGACY_ERROR_TEMP_ proposed here kind of marks the error as developer-facing, not user-facing. Developers can still get the error class programmatically via the SparkThrowable interface, so that they can build error infra with it. End users won't see the error class in the message. This allows us to do the error migration very quickly, and we can refine the error classes and mark them as user-facing later (naming them properly, adding tests, etc.).

Does this PR introduce any user-facing change?

No. The error messages should be almost the same by default.

How was this patch tested?

By running the affected test suites:

$ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"

@MaxGekk MaxGekk changed the title [WIP][SQL] Migrate failAnalysis() onto error classes [SPARK-40967][SQL] Migrate failAnalysis() onto error classes Oct 30, 2022
@MaxGekk MaxGekk marked this pull request as ready for review October 30, 2022 09:35
@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 30, 2022

@srielau @cloud-fan @itholic @panbingkun @LuciferYang Please, review this PR.

@@ -196,7 +198,8 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
hof.dataTypeMismatch(hof, checkRes)
case TypeCheckResult.TypeCheckFailure(message) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it still match TypeCheckResult.TypeCheckFailure(message)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we finish all tasks of converting the type check failures to error classes, we will remove TypeCheckResult.TypeCheckFailure(), and as a consequence of that, we can remove all match cases, and unneeded error classes like this.

So far, there are open tasks, I wouldn't do that:

since there is a risk to miss something.

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

Copy link
Contributor

@itholic itholic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 31, 2022

Merging to master. Thank you, @LuciferYang @panbingkun @itholic @cloud-fan for review.

@MaxGekk MaxGekk closed this in 3c967f0 Oct 31, 2022
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
### What changes were proposed in this pull request?
In the PR, I propose to migrate `failAnalysis()` errors onto temporary error classes with the prefix `_LEGACY_ERROR_TEMP_23xx`. The error message will not include the error classes, so, in this way we will preserve the existing behaviour.

### Why are the changes needed?
The migration on temporary error classes allows to gather statistics about errors and detect most popular error classes. After that we could prioritise the work on migration.

The new error class name prefix `_LEGACY_ERROR_TEMP_` proposed here kind of marks the error as developer-facing, not user-facing. Developers can still get the error class programmatically via the `SparkThrowable` interface, so that they can build error infra with it. End users won't see the error class in the message. This allows us to do the error migration very quickly, and we can refine the error classes and mark them as user-facing later (naming them properly, adding tests, etc.).

### Does this PR introduce _any_ user-facing change?
No. The error messages should be almost the same by default.

### How was this patch tested?
By running the affected test suites:
```
$ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
```

Closes apache#38436 from MaxGekk/legacy-error-class-failAnalysis.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants