[SPARK-45908][Python] Add support for writing empty DataFrames to parquet with partitions#43798
ti1uan wants to merge 4 commits into apache:master
python/pyspark/sql/readwriter.py
Does this work with Scala API too?
Thanks for your review. No, this change applies only to PySpark; the Scala API is unchanged. Do you think we should support it in the Scala API as well?
The change has to be made for API parity.
I will convert this PR to a draft and work on the changes for the Scala API.
Hi @HyukjinKwon, I'm under the impression that PySpark relies on the underlying Scala API for Parquet operations. If that's correct, would updating the Scala API alone be sufficient to introduce this behavior to both PySpark and Scala API?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This change introduces new functionality in the parquet method of the DataFrameWriter class to handle writing empty DataFrames to Parquet files, particularly when partitioning is used. Previously, writing an empty DataFrame with partitions specified did not create any output in the target directory, which could cause problems in subsequent jobs that expect files with the defined schema. Now, the parquet method checks whether the DataFrame is empty and whether partitions are specified. If both conditions are true, the private method _write_empty_partition is called to handle the empty DataFrame write operation.
Why are the changes needed?
This change addresses the issue reported in SPARK-45908 regarding the handling of empty DataFrames with partitions in PySpark's Parquet writing functionality.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manually tested.
Was this patch authored or co-authored using generative AI tooling?
No