[SPARK-56045][SQL] Add flag for ignoring Parquet UNKNOWN type annotation and revert to old behavior #54870

Closed
ZiyaZa wants to merge 2 commits into apache:master from ZiyaZa:unknown-type-flag

@ZiyaZa
Contributor

@ZiyaZa ZiyaZa commented Mar 17, 2026

What changes were proposed in this pull request?

This PR introduces a new flag, spark.sql.parquet.reader.respectUnknownTypeAnnotation.enabled, that controls how the Parquet reader behaves when it reads an externally written file with the UNKNOWN logical type annotation:

  • (Default) When false, we infer the Spark type based on the physical type used in the Parquet file, as we did before Spark 4.1.
  • When true, we use NullType as the Spark type.
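
The two branches above can be sketched as a small decision function. This is a simplified, self-contained model and not Spark's actual converter code: the names PhysicalType, SparkType, fromPhysical, and resolveType are hypothetical, only a few physical types are modeled, and all other logical annotations are ignored.

```scala
// Simplified model of the flag's effect; NOT Spark's real implementation.
// Every name in this object is hypothetical and exists only for illustration.
object UnknownAnnotationSketch {
  // A small subset of Parquet physical types.
  sealed trait PhysicalType
  case object Int32 extends PhysicalType
  case object Int64 extends PhysicalType
  case object Bool extends PhysicalType
  case object Float64 extends PhysicalType

  // Stand-ins for the corresponding Spark SQL types.
  sealed trait SparkType
  case object IntegerType extends SparkType
  case object LongType extends SparkType
  case object BooleanType extends SparkType
  case object DoubleType extends SparkType
  case object NullType extends SparkType

  // Inference from the physical type alone (the pre-4.1 behavior).
  def fromPhysical(physical: PhysicalType): SparkType = physical match {
    case Int32   => IntegerType
    case Int64   => LongType
    case Bool    => BooleanType
    case Float64 => DoubleType
  }

  // respectUnknown models the value of
  // spark.sql.parquet.reader.respectUnknownTypeAnnotation.enabled.
  def resolveType(
      physical: PhysicalType,
      hasUnknownAnnotation: Boolean,
      respectUnknown: Boolean): SparkType =
    if (hasUnknownAnnotation && respectUnknown) NullType
    else fromPhysical(physical)

  def main(args: Array[String]): Unit = {
    // Flag off (default): the UNKNOWN annotation is ignored.
    println(resolveType(Int32, hasUnknownAnnotation = true, respectUnknown = false)) // prints IntegerType
    // Flag on: the UNKNOWN annotation wins.
    println(resolveType(Int32, hasUnknownAnnotation = true, respectUnknown = true)) // prints NullType
  }
}
```

In this model, a column without the UNKNOWN annotation resolves the same way regardless of the flag value.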

Why are the changes needed?

To fix the regression introduced by #52922, as we have been reading files differently since then.

Does this PR introduce any user-facing change?

Yes. With the default flag value, when we read a Parquet file written by an external engine that uses the UNKNOWN annotation:

  • Before, we inferred NullType
  • Now, we'll infer a type based on the physical type (e.g. IntegerType)
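
A user who relies on the NullType inference could opt back in by enabling the flag. The following is a minimal sketch, assuming the flag can be set on a running session and that the Spark build in use includes this change; the file path, application name, and object name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; requires Spark on the classpath.
object RespectUnknownAnnotationExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("respect-unknown-annotation") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // Flag name taken from the PR description. Setting it to true asks the
    // reader to honor the UNKNOWN annotation and infer NullType.
    // Assumption: the flag is settable at runtime on the session.
    spark.conf.set(
      "spark.sql.parquet.reader.respectUnknownTypeAnnotation.enabled", "true")

    // Hypothetical path to a Parquet file written by an external engine.
    val df = spark.read.parquet("/path/to/external-file.parquet")
    df.printSchema()

    spark.stop()
  }
}
```

Leaving the flag unset keeps the default described above, where the column type follows the physical type.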

How was this patch tested?

Added tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@cloud-fan
Contributor

LGTM if CI is green, please create a new JIRA ticket as the original commit is already released.

@ZiyaZa ZiyaZa changed the title from "[SPARK-54220][SQL][FOLLOWUP] Add flag for ignoring Parquet UNKNOWN type annotation and revert to old behavior" to "[SPARK-56045][SQL] Add flag for ignoring Parquet UNKNOWN type annotation and revert to old behavior" Mar 18, 2026
@ZiyaZa
Contributor Author

ZiyaZa commented Mar 18, 2026

> LGTM if CI is green, please create a new JIRA ticket as the original commit is already released.

CI is green, linked the new ticket in the title.

@cloud-fan cloud-fan closed this in 50514c5 Mar 18, 2026
cloud-fan pushed a commit that referenced this pull request Mar 18, 2026
…ion and revert to old behavior

Closes #54870 from ZiyaZa/unknown-type-flag.

Authored-by: Ziya Mukhtarov <ziya5muxtarov@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 50514c5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Contributor

thanks, merging to master/4.1!

"inference and infers NullType. When disabled, ignores the UNKNOWN annotation " +
"and uses the physical type instead.")
.version("4.1.2")
.withBindingPolicy(ConfigBindingPolicy.SESSION)
Member

@dongjoon-hyun dongjoon-hyun Mar 18, 2026

Hi, @ZiyaZa and @cloud-fan.

This broke branch-4.1. Let me revert this from branch-4.1 only for now.

[error] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:1627:8: value withBindingPolicy is not a member of org.apache.spark.internal.config.ConfigBuilder
[error] possible cause: maybe a semicolon is missing before `value withBindingPolicy`?
[error]       .withBindingPolicy(ConfigBindingPolicy.SESSION)
[error]        ^
[error] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:1627:26: not found: value ConfigBindingPolicy
[error]       .withBindingPolicy(ConfigBindingPolicy.SESSION)
[error]                          ^
[error] two errors found

Contributor Author

Hi, it seems withBindingPolicy doesn't exist in 4.1. So deleting that line should solve it. I can create a PR deleting that line, or re-apply this PR as a whole after your revert. Either way works.

@dongjoon-hyun
Member

Yes, it's already reverted. Please make a new backporting PR to branch-4.1 now to make sure that CI passes, @ZiyaZa.

@dongjoon-hyun
Member

BTW, thank you for the fix, @ZiyaZa.

@ZiyaZa
Contributor Author

ZiyaZa commented Mar 18, 2026

Created a new PR here: #54885

Thanks for letting me know.
