[SPARK-40473][SQL] Migrate parsing errors onto error classes#37916
[SPARK-40473][SQL] Migrate parsing errors onto error classes#37916MaxGekk wants to merge 13 commits intoapache:masterfrom
Conversation
srielau
left a comment
There was a problem hiding this comment.
This is very cool!
I have some general questions, observations:
- You seem to assume < 1000 of these. But just this one PR consumes close to a hundred slots"
- How do we keep the numbering straight for the next n PRs? I assume you used some tooling to whip this up so fast. So this was 0 => 1. How do we get from n => n + 1?
|
|
||
| /**/ | ||
| { | ||
| "errorClass" : "_LEGACY_ERROR_TEMP_055" |
There was a problem hiding this comment.
Yes. We don't provide context in many places (I mean QueryContext not line, pos). This is one of them.
There was a problem hiding this comment.
Just in case, users will see the same error message by default (not JSON).
Some time ago, I have counted the total number of exceptions to be ported onto error classes around 800. I don't see any problems to start using 4 digits after 999.
We have 3 types of errors, actually: parsing, compilation and execution. We could migrate them 1-by-1 using sequential numbers. Otherwise the PR will conflict a lot.
I think it will be faster to manually port the exceptions without any tooling. |
| } catch { | ||
| case e: NumberFormatException => | ||
| throw new ParseException(e.getMessage, ctx) | ||
| throw new ParseException( |
There was a problem hiding this comment.
We should also mention it in readme.md or somewhere: we don't have to instantiate the exception instance in a central file if it's only used once. This can also simplify the error migration work.
|
Can we update |
| 4. Check if the exception type already extends `SparkThrowable`. | ||
| If true, skip to step 6. | ||
| 5. Mix `SparkThrowable` into the exception. | ||
| 5. Mix `SparkThrowable` into the exception, and place it to Query.*Errors at least when it is invoked more than once. |
There was a problem hiding this comment.
I think we can change item 6 instead
Throw the exception with the error class and message parameters. If the same exception is thrown in several places, create a util function in a central place such as `QueryCompilationErrors.scala` to instantiate the exception.
|
@dongjoon-hyun @viirya @HyukjinKwon The new error class name prefix |
|
|
||
| Error classes are a succinct, human-readable representation of the error category. | ||
|
|
||
| An uncategorized errors can be assigned to a legacy error class with the prefix _LEGACY_ERROR_TEMP_ and an unused sequential number, for instance _LEGACY_ERROR_TEMP_053. |
There was a problem hiding this comment.
nit: Maybe we should using \ in front of _LEGACY_ERROR_TEMP_ as an escape character ??
Otherwise, maybe it should look something like: LEGACY_ERROR_TEMP in the documents, which is missing leading and following underscore.
There was a problem hiding this comment.
I will mark it as code. Thank you.
|
@gatorsmile @cloud-fan @HyukjinKwon @gengliangwang @dongjoon-hyun @viirya I would like to merge this if there are no objections from your side. |
| throw new ParseException( | ||
| errorClass = "_LEGACY_ERROR_TEMP_0061", | ||
| messageParameters = Map("msg" -> e.getMessage), | ||
| ctx) |
There was a problem hiding this comment.
What should we do for re-throws? @MaxGekk @cloud-fan
There was a problem hiding this comment.
Usually, we check either the exception belongs to an error class or not. If so, just re-throw it as is otherwise assign an error class. But in this PR, I just want to assign error classes and don't want to change existing behaviour.
|
Merging to master. Thank you, @cloud-fan @itholic @entong @srielau for review. |
### What changes were proposed in this pull request? In the PR, I propose to use `checkError()` to check the error class, message parameters and the query context. After the PR #37916, all parsing exceptions should have an error class. ### Why are the changes needed? 1. Checking of the error classes plus the query context improves the test coverage. 2. Eliminating the dependency of error messages. So, tech editors will be able to modify `error-classes.json` w/o fixing the test suite. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the modified test suite: ``` $ build/sbt "test:testOnly *ErrorParserSuite" ``` Closes #38204 from MaxGekk/intercept-parsing-error-class. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
In the PR, I propose to migrate all parsing errors onto temporary error classes with the prefix
_LEGACY_ERROR_TEMP_. The error message will not include the error classes, so, in this way we will preserve the existing behaviour.Why are the changes needed?
The migration on temporary error classes allows to gather statistics about errors and detect most popular error classes.
Does this PR introduce any user-facing change?
No. The error messages should be almost the same by default.
How was this patch tested?
By running the modified test suites: