[SPARK-40590][TEST] Fix ps.read_parquet when pandas_metadata is True
### What changes were proposed in this pull request?
This PR fixes the `ps.read_parquet` test, because `pd.to_parquet` is broken when the index is a `MultiIndex` as of pandas 1.5.0.
The test relies on `pd.to_parquet`, so it fails with pandas 1.5.0 as shown below (the `MultiIndex` is not respected):
```text
DataFrame shape mismatch
[left]: (20, 5)
[right]: (20, 4)
Left:
i32 i64 f bhello index
0 0 0 0.0 yo 0.617492
1 1 1 1.0 people 0.823826
2 2 2 2.0 people 0.443275
3 0 3 3.0 hello 0.639776
4 1 4 4.0 yo 0.393410
5 2 0 5.0 yo 0.898860
6 0 1 6.0 people 0.725236
7 1 2 7.0 yo 0.933009
8 2 3 8.0 yo 0.663381
9 0 4 9.0 hello 0.471077
10 1 0 10.0 hello 0.562182
11 2 1 11.0 people 0.734902
12 0 2 12.0 yo 0.956519
13 1 3 13.0 hello 0.860517
14 2 4 14.0 people 0.012749
15 0 0 15.0 people 0.561815
16 1 1 16.0 people 0.389130
17 2 2 17.0 hello 0.930301
18 0 3 18.0 hello 0.835025
19 1 4 19.0 yo 0.212191
i32 int32
i64 int64
f float64
bhello object
index float64
dtype: object
Right:
i32 i64 f bhello
index
0 0.617492 0 0 0.0 yo
1 0.823826 1 1 1.0 people
2 0.443275 2 2 2.0 people
3 0.639776 0 3 3.0 hello
4 0.393410 1 4 4.0 yo
5 0.898860 2 0 5.0 yo
6 0.725236 0 1 6.0 people
7 0.933009 1 2 7.0 yo
8 0.663381 2 3 8.0 yo
9 0.471077 0 4 9.0 hello
10 0.562182 1 0 10.0 hello
11 0.734902 2 1 11.0 people
12 0.956519 0 2 12.0 yo
13 0.860517 1 3 13.0 hello
14 0.012749 2 4 14.0 people
15 0.561815 0 0 15.0 people
16 0.389130 1 1 16.0 people
17 0.930301 2 2 17.0 hello
18 0.835025 0 3 18.0 hello
19 0.212191 1 4 19.0 yo
i32 int32
i64 int64
f float64
bhello object
dtype: object
```
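To make the mismatch above concrete, here is a minimal pandas-only sketch. It does not reproduce the pandas 1.5.0 Parquet behavior itself; it only shows how the same data yields the two shapes in the failure output, depending on whether `index` ends up as a regular column or as the DataFrame index. The data is a hypothetical stand-in that reuses the column names from the output, and the names `expected`, `actual`, and `restored` are illustrative, not taken from the Spark test.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; only the column names come from the failure output.
rng = np.random.default_rng(0)
expected = pd.DataFrame(
    {
        "i32": np.arange(20, dtype=np.int32) % 3,
        "i64": np.arange(20, dtype=np.int64) % 5,
        "f": np.arange(20.0),
        "bhello": rng.choice(["hello", "yo", "people"], size=20),
        "index": rng.random(20),
    }
)

# Same data, but with `index` promoted from a column to the DataFrame index.
actual = expected.set_index("index")

print(expected.shape)  # (20, 5): `index` counted as a column
print(actual.shape)    # (20, 4): `index` no longer counted as a column

# Moving the index back into a column makes the two frames comparable again.
restored = actual.reset_index()[expected.columns]
pd.testing.assert_frame_equal(restored, expected)
```

A comparison helper such as `assert_frame_equal` checks shape before values, which is why the failure is reported as a shape mismatch even though the cell values on both sides are identical.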
### Why are the changes needed?
All tests should pass with pandas 1.5.0.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually tested with pandas 1.5.0.
Closes #38055 from itholic/SPARK-40590.
Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Haejoon Lee <44108233+itholic@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>