[SPARK-56169][SQL] Fix ClassCastException in error reporting when GetStructField child type is changed by plan transformation #54970
Closed
ilicmarkodb wants to merge 1 commit into apache:master
…StructField child type is changed by plan transformation

### What changes were proposed in this pull request?

SPARK-53470 added `ExpectsInputTypes` to `GetStructField` so that `checkInputDataTypes()` catches the case where a plan transformation changes the child's type from `StructType` to something else. This can happen when an analyzer rule inserts a projection that changes a column's output type after `GetStructField` was already created referencing that column.

However, when `CheckAnalysis` detects this mismatch, the error formatting path (`toPrettySQL` -> `usePrettyExpression`) accesses `GetStructField.dataType`, which calls `childSchema` -> `child.dataType.asInstanceOf[StructType]`, throwing a raw `ClassCastException` before the proper `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` error can be reported.

This PR fixes two things:

1. `usePrettyExpression` checks `child.dataType` before accessing `childSchema`, falling back to a safe representation when the child is not a `StructType`
2. `childSchema` uses pattern matching instead of an unsafe cast, throwing a clear `SparkException.internalError` instead of `ClassCastException`

### Why are the changes needed?

Without this fix, users see a raw `ClassCastException: StringType$ cannot be cast to StructType` instead of the proper `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` error that `checkInputDataTypes()` was trying to report.

### Does this PR introduce _any_ user-facing change?

Yes - users now get a proper `AnalysisException` with error class `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` instead of a raw `ClassCastException`.

### How was this patch tested?

TODO: add unit test

### Was this patch authored or co-authored using generative AI tooling?

Yes

Co-authored-by: Isaac
Contributor
thanks, merging to master!
What changes were proposed in this pull request?
SPARK-53470 added `ExpectsInputTypes` to `GetStructField` so that `checkInputDataTypes()` catches the case where a plan transformation changes the child's type from `StructType` to something else. This can happen when an analyzer rule inserts a projection that changes a column's output type after `GetStructField` was already created referencing that column.

However, when `CheckAnalysis` detects this mismatch, the error formatting path (`toPrettySQL` -> `usePrettyExpression`) accesses `GetStructField.dataType`, which calls `childSchema` -> `child.dataType.asInstanceOf[StructType]`, throwing a raw `ClassCastException` before the proper `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` error can be reported.

This PR fixes two things:

1. `usePrettyExpression` checks `child.dataType` before accessing `childSchema`, falling back to a safe representation when the child is not a `StructType`
2. `childSchema` uses pattern matching instead of an unsafe cast, throwing a clear `SparkException.internalError` instead of `ClassCastException`

Why are the changes needed?
Without this fix, users see a raw `ClassCastException: StringType$ cannot be cast to StructType` instead of the proper `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` error that `checkInputDataTypes()` was trying to report.

Does this PR introduce any user-facing change?
Yes - users now get a proper `AnalysisException` with error class `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` instead of a raw `ClassCastException`.

How was this patch tested?
New tests.
Was this patch authored or co-authored using generative AI tooling?
Yes
Co-authored-by: Claude
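
For readers who want to see the shape of the two fixes in isolation, here is a minimal, self-contained sketch. It does not use Spark's real classes: `DataType`, `StructType`, `Attribute`, `GetStructField`, `SketchInternalError`, and `PrettySketch.pretty` are simplified, hypothetical stand-ins, and the `[ordinal]` fallback rendering is an assumption, since the description only says "a safe representation".

```scala
// Simplified stand-ins for Catalyst types; these are NOT Spark's real classes.
sealed trait DataType
case object StringType extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Vector[StructField]) extends DataType

// Hypothetical stand-in for the error produced by SparkException.internalError.
final class SketchInternalError(msg: String) extends RuntimeException(msg)

sealed trait Expression { def dataType: DataType }
final case class Attribute(name: String, dataType: DataType) extends Expression

final case class GetStructField(child: Expression, ordinal: Int) extends Expression {
  // Fix 2 (sketch): pattern match instead of child.dataType.asInstanceOf[StructType],
  // so a non-struct child yields a descriptive internal error, not ClassCastException.
  lazy val childSchema: StructType = child.dataType match {
    case s: StructType => s
    case other =>
      throw new SketchInternalError(
        s"GetStructField expects a StructType child, but got $other")
  }

  def dataType: DataType = childSchema.fields(ordinal).dataType
}

object PrettySketch {
  // Fix 1 (sketch): inspect child.dataType before touching childSchema, so that
  // formatting an already-invalid expression cannot throw.
  def pretty(e: Expression): String = e match {
    case g @ GetStructField(child, ordinal) =>
      child.dataType match {
        case _: StructType =>
          s"${pretty(child)}.${g.childSchema.fields(ordinal).name}"
        case _ =>
          // Assumed fallback form: render the ordinal instead of a field name.
          s"${pretty(child)}[$ordinal]"
      }
    case Attribute(name, _) => name
  }
}
```

With this shape, the formatting step succeeds even for an expression whose child is no longer a struct, which is what allows the intended type-mismatch error to be reported instead of being pre-empted by an exception from the formatter.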