
[SPARK-45908][Python] Add support for writing empty DataFrames to parquet with partitions#43798

Closed
ti1uan wants to merge 4 commits into apache:master from ti1uan:SPARK-45908

Conversation


@ti1uan ti1uan commented Nov 14, 2023

What changes were proposed in this pull request?

This change introduces new functionality in the parquet method of the DataFrameWriter class to handle writing empty DataFrames to Parquet files, particularly when partitioning is used. Previously, writing an empty DataFrame with partitions specified did not create any output in the target directory, which could break subsequent jobs expecting files with the defined schema. Now, the parquet method checks whether the DataFrame is empty and whether partitions are specified; if both conditions are true, the private method _write_empty_partition is called to handle the empty write operation.
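The proposed guard can be sketched as follows. Note that the classes below are simplified stand-ins written for illustration only, not the real PySpark API, and the body of _write_empty_partition here is a hypothetical placeholder:

```python
# Minimal sketch of the proposed guard in DataFrameWriter.parquet.
# FakeDataFrame and FakeDataFrameWriter are illustration-only stubs,
# not the actual PySpark classes.

class FakeDataFrame:
    def __init__(self, rows, schema):
        self.rows = rows
        self.schema = schema

    def isEmpty(self):
        return len(self.rows) == 0


class FakeDataFrameWriter:
    def __init__(self, df):
        self._df = df
        self.wrote_empty_placeholder = False

    def parquet(self, path, partitionBy=None):
        # Proposed behavior: when the DataFrame is empty and partition
        # columns are specified, still materialize output at `path`
        # instead of producing nothing at all.
        if self._df.isEmpty() and partitionBy:
            self._write_empty_partition(path, partitionBy)
            return
        # ... normal (non-empty) write path would go here ...

    def _write_empty_partition(self, path, partitionBy):
        # Placeholder: a real implementation would emit a schema-only
        # Parquet file under `path` so downstream jobs can read the schema.
        self.wrote_empty_placeholder = True


writer = FakeDataFrameWriter(FakeDataFrame([], schema=["id", "dt"]))
writer.parquet("/tmp/out", partitionBy=["dt"])
print(writer.wrote_empty_placeholder)  # True
```

The key point is that the guard only changes behavior for the empty-plus-partitioned case; all other writes follow the existing code path unchanged.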

Why are the changes needed?

This change addresses the issue reported in SPARK-45908 regarding the handling of empty DataFrames with partitions in PySpark's Parquet writing functionality.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually tested.

Was this patch authored or co-authored using generative AI tooling?

No

@ti1uan ti1uan changed the title [Spark 45908][Python] [Spark 45908][Python] Add support for writing empty DataFrames to parquet with partitions Nov 14, 2023
@ti1uan ti1uan changed the title [Spark 45908][Python] Add support for writing empty DataFrames to parquet with partitions [Spark-45908][Python] Add support for writing empty DataFrames to parquet with partitions Nov 14, 2023
@HyukjinKwon HyukjinKwon changed the title [Spark-45908][Python] Add support for writing empty DataFrames to parquet with partitions [SPARK-45908][Python] Add support for writing empty DataFrames to parquet with partitions Nov 15, 2023
Member commented:

Does this work with Scala API too?

Author commented:

Thanks for your review. No, this change only applies to PySpark; the Scala API is unchanged. Do you think we should support it in the Scala API as well?

Member commented:

The change has to be made for API parity.

Author commented:

I will convert this PR to a draft and work on the changes for the Scala API.

Author commented:

Hi @HyukjinKwon, I'm under the impression that PySpark relies on the underlying Scala API for Parquet operations. If that's correct, would updating the Scala API alone be sufficient to introduce this behavior to both PySpark and the Scala API?
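For context on why a Scala-side fix could surface in PySpark automatically: PySpark's DataFrameWriter methods are generally thin wrappers that delegate to a JVM-side writer via Py4J. The snippet below is a simplified illustration of that delegation pattern; JvmWriterStub and PyWriter are stand-ins written for this sketch, and the attribute name `_jwrite` mirrors but does not reproduce PySpark's internals:

```python
# Simplified illustration of the Python-to-JVM delegation pattern.
# JvmWriterStub stands in for the Py4J proxy object that the real
# PySpark DataFrameWriter would hold.

class JvmWriterStub:
    def __init__(self):
        self.calls = []

    def parquet(self, path):
        # In real PySpark, this call crosses into the Scala
        # DataFrameWriter, which performs the actual write.
        self.calls.append(("parquet", path))


class PyWriter:
    def __init__(self, jwrite):
        self._jwrite = jwrite

    def parquet(self, path):
        # The Python layer is a thin wrapper: a behavior change made
        # on the Scala side would surface here with no Python change.
        self._jwrite.parquet(path)


jvm = JvmWriterStub()
PyWriter(jvm).parquet("/tmp/out")
print(jvm.calls)  # one recorded delegated call
```

Under this pattern, fixing the empty-DataFrame behavior in the Scala DataFrameWriter would indeed cover both APIs at once, which is consistent with the parity concern raised above.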

@ti1uan ti1uan marked this pull request as draft November 15, 2023 06:34
@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 24, 2024
@github-actions github-actions bot closed this Feb 25, 2024
