Skip to content

[SPARK-47262][SQL] Assign names to error conditions for parquet conversions#47802

Closed
vladanvasi-db wants to merge 9 commits intoapache:masterfrom
vladanvasi-db:vladanvasi-db/parquet-conversion-error-classes-fix
Closed

[SPARK-47262][SQL] Assign names to error conditions for parquet conversions#47802
vladanvasi-db wants to merge 9 commits intoapache:masterfrom
vladanvasi-db:vladanvasi-db/parquet-conversion-error-classes-fix

Conversation

@vladanvasi-db
Copy link
Contributor

@vladanvasi-db vladanvasi-db commented Aug 19, 2024

What changes were proposed in this pull request?

In the PR, I propose to rename the legacy error classes _LEGACY_ERROR_TEMP_2238, _LEGACY_ERROR_TEMP_2239 and _LEGACY_ERROR_TEMP_2240 to PARQUET_CONVERSION_FAILURE.WITHOUT_DECIMAL_METADATA, PARQUET_CONVERSION_FAILURE.DECIMAL and PARQUET_CONVERSION_FAILURE.

Why are the changes needed?

Proper name improves user experience w/ Spark SQL.

Does this PR introduce any user-facing change?

No

How was this patch tested?

No testing was needed.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Aug 19, 2024
@vladanvasi-db vladanvasi-db changed the title [SPARK-47262] Renamed legacy classes for Parquet conversion errors [SPARK-47262] Changed Parquet conversion errors to throw SparkException.internal Aug 20, 2024
@vladanvasi-db vladanvasi-db force-pushed the vladanvasi-db/parquet-conversion-error-classes-fix branch from 9b86702 to 267c135 Compare August 20, 2024 13:26
@HyukjinKwon HyukjinKwon changed the title [SPARK-47262] Changed Parquet conversion errors to throw SparkException.internal [SPARK-47262][SQL] Changed Parquet conversion errors to throw SparkException.internal Aug 21, 2024
errorClass = "_LEGACY_ERROR_TEMP_2238",
errorClass = "PARQUET_CONVERSION_FAILURE_WITHOUT_DECIMAL_METADATA",
messageParameters = Map(
"typeName" -> t.typeName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use toSQLType to quote it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"typeName" -> t.typeName,
"typeName" -> toSQLType(t),

…f parquet conversion and modified parquet type in exception message to use sql format
errorClass = "_LEGACY_ERROR_TEMP_2238",
errorClass = "PARQUET_CONVERSION_FAILURE_WITHOUT_DECIMAL_METADATA",
messageParameters = Map(
"typeName" -> t.typeName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"typeName" -> t.typeName,
"typeName" -> toSQLType(t),

errorClass = "PARQUET_CONVERSION_FAILURE.DECIMAL",
messageParameters = Map(
"t" -> t.json,
"t" -> t.sql,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"t" -> t.sql,
"t" -> toSQLType(t),

errorClass = "PARQUET_CONVERSION_FAILURE",
messageParameters = Map(
"t" -> t.json,
"dataType" -> t.sql,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"dataType" -> t.sql,
"t" -> toSQLType(t),

"subClass": {
"WITHOUT_DECIMAL_METADATA" : {
"message" : [
"Unable to create a Parquet converter for <typeName> whose Parquet type is <parquetType> without decimal metadata. Please read this column/field as Spark BINARY type."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't need to repeat parent class error message otherwise users will see duplicates.

…ion error subclasses' messages, so they do not repeat
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladanvasi-db Please, update PR's description according to your recent changes.

"subClass": {
"WITHOUT_DECIMAL_METADATA" : {
"message" : [
" Please read this column/field as Spark BINARY type."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gap is not needed because we merge parent's and chid's messages via a space.

Suggested change
" Please read this column/field as Spark BINARY type."
"Please read this column/field as Spark BINARY type."

},
"DECIMAL" : {
"message" : [
" Parquet DECIMAL type can only be backed by INT32, INT64, FIXED_LEN_BYTE_ARRAY, or BINARY."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
" Parquet DECIMAL type can only be backed by INT32, INT64, FIXED_LEN_BYTE_ARRAY, or BINARY."
"Parquet DECIMAL type can only be backed by INT32, INT64, FIXED_LEN_BYTE_ARRAY, or BINARY."

@vladanvasi-db vladanvasi-db force-pushed the vladanvasi-db/parquet-conversion-error-classes-fix branch from da1ca27 to 9c6e30e Compare August 21, 2024 12:20
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Waiting for CI.

@vladanvasi-db vladanvasi-db changed the title [SPARK-47262][SQL] Changed Parquet conversion errors to throw SparkException.internal [SPARK-47262][SQL] Renamed parquet conversion error legacy classes Aug 21, 2024
@vladanvasi-db vladanvasi-db changed the title [SPARK-47262][SQL] Renamed parquet conversion error legacy classes [SPARK-47262][SQL] Renamed parquet conversion error legacy classes to PARQUET_CONVERSION_FAILURE_WITHOUT_DECIMAL_METADATA, PARQUET_DECIMAL_CONVERSION_FAILURE and PARQUET_CONVERSION_FAILURE Aug 21, 2024
@vladanvasi-db vladanvasi-db changed the title [SPARK-47262][SQL] Renamed parquet conversion error legacy classes to PARQUET_CONVERSION_FAILURE_WITHOUT_DECIMAL_METADATA, PARQUET_DECIMAL_CONVERSION_FAILURE and PARQUET_CONVERSION_FAILURE [SPARK-47262][SQL] Assign names to error conditions for parquet conversions Aug 21, 2024
@vladanvasi-db vladanvasi-db changed the title [SPARK-47262][SQL] Assign names to error conditions for parquet conversions [SPARK-47262][SQL] Assign names PARQUET_CONVERSION_FAILURE.WITHOUT_DECIMAL_METADATA, PARQUET_CONVERSION_FAILURE.DECIMAL and PARQUET_CONVERSION_FAILURE to error conditions for parquet conversions Aug 21, 2024
@MaxGekk MaxGekk changed the title [SPARK-47262][SQL] Assign names PARQUET_CONVERSION_FAILURE.WITHOUT_DECIMAL_METADATA, PARQUET_CONVERSION_FAILURE.DECIMAL and PARQUET_CONVERSION_FAILURE to error conditions for parquet conversions [SPARK-47262][SQL] Assign names to error conditions for parquet conversions Aug 21, 2024
@MaxGekk
Copy link
Member

MaxGekk commented Aug 22, 2024

+1, LGTM. Merging to master.
Thank you, @vladanvasi-db.

@MaxGekk MaxGekk closed this in 3e8cd99 Aug 22, 2024
@MaxGekk
Copy link
Member

MaxGekk commented Aug 22, 2024

@vladanvasi-db Congratulations with your first contribution to Apache Spark!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants