
[SPARK-48649][SQL] Add "ignoreInvalidPartitionPaths" and "spark.sql.files.ignoreInvalidPartitionPaths" configs to allow ignoring invalid partition paths #47006

Closed
sadikovi wants to merge 1 commit into apache:master from sadikovi:SPARK-48649


@sadikovi
Contributor

What changes were proposed in this pull request?

This PR adds a new data source option ignoreInvalidPartitionPaths and a SQL session configuration flag spark.sql.files.ignoreInvalidPartitionPaths to control whether invalid partition paths (base paths) are skipped.

When the config is enabled, it allows skipping invalid paths such as:

table/
  invalid/...
  part=1/...
  part=2/... 
  part=3/...

In this case, the table/invalid path will be ignored.
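To make the skipping behaviour concrete, here is a minimal Python sketch under stated assumptions. It is not Spark's partition discovery code, which also infers partition value types and handles nested partition columns. The function keep_partition_dirs and the "column=value" regular expression are hypothetical, and modelling the disabled case as an error is an assumption made only for this illustration.

```python
import re

# Hypothetical pattern (not Spark's parser): a directory name counts as a
# partition directory only if it has the form "column=value".
PARTITION_DIR = re.compile(r"^[^=/]+=[^/]*$")


def keep_partition_dirs(names, ignore_invalid):
    """Return the directory names this toy partition discovery keeps.

    A name that does not parse as "column=value" raises ValueError when
    ignore_invalid is False (an assumption for illustration) and is
    silently skipped when ignore_invalid is True.
    """
    kept = []
    for name in names:
        if PARTITION_DIR.match(name):
            kept.append(name)
        elif not ignore_invalid:
            raise ValueError(f"cannot parse partition path: {name}")
        # Otherwise the invalid directory is skipped.
    return kept


# Top-level directories of the table/ layout shown above.
print(keep_partition_dirs(["invalid", "part=1", "part=2", "part=3"], ignore_invalid=True))
# ['part=1', 'part=2', 'part=3']
```

With ignore_invalid=False the same call raises on "invalid", which is the toy stand-in for the default (disabled) behaviour.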

The data source option takes precedence over the SQL config, so with the following code:

spark.conf.set("spark.sql.files.ignoreInvalidPartitionPaths", "false")

spark.read.format("parquet").option("ignoreInvalidPartitionPaths", "true").load(...)

the query would ignore invalid partition paths, i.e. the option overrides the SQL config and the feature is enabled.

The config is disabled by default.
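The precedence rule and the default can be sketched as a small resolution function. This is a hypothetical Python model, not Spark's actual code: resolve_ignore_invalid is an illustrative name, session_conf stands in for the session configuration, and option keys are matched case-insensitively because Spark treats data source option names that way.

```python
from typing import Mapping

SQL_CONF_KEY = "spark.sql.files.ignoreInvalidPartitionPaths"
OPTION_KEY = "ignoreInvalidPartitionPaths"


def resolve_ignore_invalid(options: Mapping[str, str],
                           session_conf: Mapping[str, str]) -> bool:
    """Hypothetical model of how the effective flag is chosen."""
    # Data source option names are matched case-insensitively.
    lowered = {k.lower(): v for k, v in options.items()}
    if OPTION_KEY.lower() in lowered:
        # A per-read option, when present, takes precedence.
        raw = lowered[OPTION_KEY.lower()]
    else:
        # Otherwise fall back to the session config, which defaults to disabled.
        raw = session_conf.get(SQL_CONF_KEY, "false")
    # Simplification: anything other than "true" (any case) counts as disabled.
    return raw.strip().lower() == "true"


# Mirrors the example above: the SQL config is "false", but the option wins.
print(resolve_ignore_invalid({"ignoreInvalidPartitionPaths": "true"},
                             {SQL_CONF_KEY: "false"}))  # True
```

Keeping the lookup in one place makes the order explicit: option first, then session config, then the built-in default.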

Why are the changes needed?

It allows ignoring invalid partition paths, i.e. paths that cannot be parsed as partitions, when reading a table.

Does this PR introduce any user-facing change?

No. The added configs are disabled by default, so the behaviour is exactly the same as before.

How was this patch tested?

I added a unit test for this.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jun 18, 2024
@sadikovi sadikovi changed the title [SPARK-48649] Add "ignoreInvalidPartitionPaths" and "spark.sql.files.ignoreInvalidPartitionPaths" configs to allow ignoring invalid partition paths [SPARK-48649][SQL] Add "ignoreInvalidPartitionPaths" and "spark.sql.files.ignoreInvalidPartitionPaths" configs to allow ignoring invalid partition paths Jun 18, 2024
@sadikovi
Contributor Author

cc @cloud-fan @dongjoon-hyun @gengliangwang for review. Thank you.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 5e28e95 Jun 19, 2024
@sadikovi
Contributor Author

Thanks @cloud-fan. I forgot to ask: do we need any documentation updates or a migration/release note for this?

@cloud-fan
Contributor

This is not a breaking change (it's a new feature), so a migration guide is not needed.
