@zhengruifeng
Contributor

What changes were proposed in this pull request?

Fix nullable mismatch in ps.read_excel

Why are the changes needed?

To re-enable the ps.read_excel tests.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Updated the unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@zhengruifeng zhengruifeng changed the title [SPARK-40353][PYTHON] Fix nullable mismatch in ps.read_excel [SPARK-40353][PS] Fix nullable mismatch in ps.read_excel Mar 19, 2025
pd.read_excel(open(path1, "rb"), index_col=0),
)
self.assert_eq(
ps.read_excel(open(path1, "rb"), index_col=0, squeeze=True),
Contributor Author

squeeze was removed in pandas 2.0, so we need to drop it from this test.
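For callers that relied on squeeze=True, the replacement pandas recommends is to call `DataFrame.squeeze("columns")` on the result. The sketch below is illustrative only: it uses a small in-memory DataFrame as a stand-in for what `pd.read_excel(..., index_col=0)` would return, so it runs without an Excel file or engine; the column and index names are made up.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, index_col=0): one index column and one data column.
pdf = pd.DataFrame({"value": [10, 20, 30]}, index=pd.Index(["a", "b", "c"], name="key"))

# pandas < 2.0: pd.read_excel(..., squeeze=True) would have returned a Series here.
# pandas >= 2.0: squeeze the single data column explicitly instead.
ser = pdf.squeeze("columns")

print(type(ser).__name__)  # Series
print(ser.name)            # value
```

If the frame has more than one data column, `squeeze("columns")` returns it unchanged, so the call is safe to apply unconditionally. pandas-on-Spark's `DataFrame` also provides a `squeeze` method.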

self.assert_eq(psdfs["Sheet_name_1"], pdfs1["Sheet_name_1"])
self.assert_eq(psdfs["Sheet_name_2"], pdfs1["Sheet_name_2"])

psdfs = ps.read_excel(tmp, sheet_name=sheet_name, index_col=0, squeeze=True)
Contributor Author

Such tests no longer make sense, since squeeze=True is not allowed anymore.


psdf = cast(DataFrame, from_pandas(pdf))
return_schema = force_decimal_precision_scale(
as_nullable_spark_type(psdf._internal.spark_frame.drop(*HIDDEN_COLUMNS).schema)
Contributor Author

as_nullable_spark_type converts all fields (both index and data) to nullable=True, while InternalFrame asserts that the index fields have nullable=False.
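The mismatch can be illustrated without Spark. This sketch models schema fields with a hypothetical `Field` dataclass (not a Spark type) and contrasts two approaches. The first marks every field nullable, which is the behavior described above for as_nullable_spark_type. The second marks only the data fields nullable, so that index fields keep nullable=False as InternalFrame expects. The helper names are made up for illustration; this is not the change made in this PR.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a Spark StructField: only a name and a nullability flag.
    name: str
    nullable: bool


def all_nullable(fields: Sequence[Field]) -> List[Field]:
    # Mirrors the described as_nullable_spark_type behavior: every field becomes nullable.
    return [replace(f, nullable=True) for f in fields]


def data_nullable(fields: Sequence[Field], index_names: Sequence[str]) -> List[Field]:
    # Only data fields become nullable; index fields keep their original flag.
    index_set = set(index_names)
    return [f if f.name in index_set else replace(f, nullable=True) for f in fields]


def index_not_nullable(fields: Sequence[Field], index_names: Sequence[str]) -> bool:
    # Hypothetical version of the InternalFrame expectation: index fields are non-nullable.
    by_name = {f.name: f for f in fields}
    return all(not by_name[n].nullable for n in index_names)


schema = [Field("idx", False), Field("a", False), Field("b", True)]
index_names = ["idx"]

print(index_not_nullable(all_nullable(schema), index_names))   # False
print(index_not_nullable(data_nullable(schema, index_names), index_names))  # True
```

Relaxing only the data fields keeps the returned schema tolerant of nulls in the data while still satisfying the non-nullable index invariant.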

@zhengruifeng zhengruifeng changed the title [SPARK-40353][PS] Fix nullable mismatch in ps.read_excel [SPARK-40353][PS][CONNECT] Fix index nullable mismatch in ps.read_excel Mar 19, 2025
@HyukjinKwon
Member

Merged to master.

@zhengruifeng zhengruifeng deleted the ps_read_excel branch March 24, 2025 02:40