[SPARK-56060][PS] Handle pandas 3 null string conversion in describe() for empty timestamp frames #54893
Closed
ueshin wants to merge 1 commit into apache:master from
HyukjinKwon approved these changes Mar 18, 2026
zhengruifeng approved these changes Mar 19, 2026
Merged to master.
terana pushed a commit to terana/spark that referenced this pull request Mar 23, 2026
…) for empty timestamp frames

Closes apache#54893 from ueshin/issues/SPARK-56060/describe.

Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request?

This PR updates pandas-on-Spark `DataFrame.describe()` and the related `test_describe_empty` expectations for empty timestamp-containing frames to handle the pandas 3 `astype(str)` behavior change on null values.

In pandas 2, empty timestamp stats were string-converted as `"None"` in the relevant `describe()` path. In pandas 3, `astype(str)` preserves those empty stats as missing values instead. This patch updates the pandas-on-Spark result construction and the corresponding test expectations to follow that behavior consistently.

### Why are the changes needed?

`pyspark.pandas.tests.computation.test_describe FrameDescribeTests.test_describe_empty` fails with pandas 3 because pandas changed how `astype(str)` handles null values in empty timestamp `describe()` results.

Without this change, pandas-on-Spark and the pandas-based expectation disagree for empty timestamp-only and mixed timestamp frames.
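The underlying pandas behavior change can be observed without Spark. The sketch below assumes only that pandas is installed with default options (in particular, pandas 2.x without `future.infer_string` enabled). `expected_null_str`, `null_after_astype_str`, and `matches_expected` are hypothetical helpers written for illustration; they are not part of pyspark or pandas and are not the code changed in this PR. A `None` in an object-dtype Series stands in for a null statistic.

```python
import pandas as pd


def pandas_major_version() -> int:
    # e.g. "2.2.3" -> 2, "3.0.0" -> 3
    return int(pd.__version__.split(".")[0])


def expected_null_str():
    """Hypothetical helper: what astype(str) should produce for a null.

    Assumes default pandas options: pandas 2 materializes None as the
    string "None", while pandas 3 keeps it as a missing value
    (represented here by returning None).
    """
    return None if pandas_major_version() >= 3 else "None"


def null_after_astype_str():
    # An object-dtype Series holding a null, standing in for a null stat.
    converted = pd.Series([None], dtype=object).astype(str)
    return converted.iloc[0]


def matches_expected(actual, expected) -> bool:
    # A "missing" expectation matches any pandas missing marker (None/NaN).
    if expected is None:
        return bool(pd.isna(actual))
    return actual == expected


actual = null_after_astype_str()
print(repr(actual), matches_expected(actual, expected_null_str()))
```

Under default options the check should hold for either pandas major version, which mirrors the idea of a version-aware test expectation rather than hard-coding `"None"`.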
### Does this PR introduce _any_ user-facing change?

Yes. For pandas-on-Spark `DataFrame.describe()` on empty timestamp-containing frames, null timestamp stats now follow the pandas 3 string-conversion behavior instead of always being materialized as `"None"`.

### How was this patch tested?
Ran the related `pyspark.pandas.tests.computation.test_describe` tests in both pandas 2 and pandas 3 Python environments.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex (GPT-5)