Fix `test_merge_by_index_patterns` for `pandas` 2.0 #9930

j-bennet · 2023-02-08T02:48:05Z

Fixes the following failures in CI with pandas 2.0:

dask/dataframe/tests/test_shuffle.py::test_set_index_overlap_2: AssertionError: DataFrame.columns are different
dask/dataframe/tests/test_multi.py::test_merge_by_index_patterns[tasks-inner]: AssertionError: DataFrame.index are different
dask/dataframe/tests/test_multi.py::test_join_by_index_patterns[disk-inner]: AssertionError: DataFrame.index are different
dask/dataframe/tests/test_multi.py::test_join_by_index_patterns[tasks-inner]: AssertionError: DataFrame.index are different
dask/dataframe/tests/test_multi.py::test_merge_by_multiple_columns[disk-inner]: AssertionError: DataFrame.index are different
dask/dataframe/tests/test_multi.py::test_merge_by_multiple_columns[tasks-inner]: AssertionError: DataFrame.index are different

The failure is related to the following change in 2.0:

https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#empty-dataframes-series-will-now-default-to-have-a-rangeindex

In this test, we already apply a workaround that sets the dtype on the returned pandas DataFrames, because pandas doesn't set it on empty DataFrames. With 2.0, it looks like we need to apply it in more places. Not sure why this changed.
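For illustration, here is a minimal sketch of this kind of workaround. It is not the exact code in the PR: `fix_empty_index` is a hypothetical helper name, and the frames are made up. The idea is that when the expected pandas result is empty, its index is cast to the dtype of the inputs before comparing, since the index of an empty result may not carry that dtype.

```python
import pandas as pd


def fix_empty_index(df: pd.DataFrame, dtype) -> pd.DataFrame:
    """Hypothetical helper: cast the index of an empty frame to ``dtype``.

    Non-empty frames are returned unchanged.
    """
    if len(df) == 0:
        out = df.copy()
        out.index = out.index.astype(dtype)
        return out
    return df


# Two frames whose index labels do not overlap (made-up data).
left = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
right = pd.DataFrame({"b": [3, 4]}, index=["p", "q"])

# The inner merge on the index is therefore empty.
expected = pd.merge(left, right, left_index=True, right_index=True, how="inner")

# Assumption: the empty result's index may differ in class or dtype from
# the inputs, so align it explicitly before comparing against dask's output.
expected = fix_empty_index(expected, left.index.dtype)
print(type(expected.index).__name__, expected.index.dtype)
```

The cast only touches empty frames, so comparisons for non-empty results are unaffected.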

Tests added / passed
Passes `pre-commit run --all-files`

jrbourbeau

Thanks for your work here @j-bennet. It looks like there are other similar failures (e.g. dask/dataframe/tests/test_multi.py::test_merge_by_multiple_columns, dask/dataframe/tests/test_shuffle.py::test_set_index_overlap_2, etc.) that are still happening. Would you like to handle those here, or in a follow-up?

j-bennet · 2023-02-13T22:00:51Z

> It looks like there are other similar failures (e.g. dask/dataframe/tests/test_multi.py::test_merge_by_multiple_columns, dask/dataframe/tests/test_shuffle.py::test_set_index_overlap_2, etc.) that are still happening. Would you like to handle those here, or in a follow-up?

Good catch, yes, they look like the same error. I'll include them in the PR.

j-bennet · 2023-02-14T02:23:25Z

I didn't find a good way to fix dask/dataframe/tests/test_shuffle.py::test_set_index_overlap_2, so I xfailed it for now.

j-bennet · 2023-02-14T07:38:07Z

The remaining test failures here are flaky tests:

FAILED dask/tests/test_distributed.py::test_blockwise_dataframe_io[False-True-hdf] - AssertionError: DataFrame are different
FAILED dask/dataframe/tests/test_groupby.py::test_dataframe_aggregations_multilevel[cov-disk-1-<lambda>1] - FutureWarning: In a future version, the Index constructor will not infer numeric dtypes when passed object-dtype sequences (matching Series behavior)

jrbourbeau

Thanks @j-bennet. Overall this looks reasonable. Are these changes more of a workaround, so that we should follow up to make the index match, or is this a good long-term solution?

j-bennet · 2023-02-14T20:02:04Z

> Are these changes more of a workaround, so that we should follow up to make the index match, or is this a good long-term solution?

It's a workaround. I created an issue to fix this properly:

Properly fix index class and dtype mismatch in empty DataFrame or Series with pandas 2.0 #9957

jrbourbeau · 2023-02-14T20:28:49Z

Great, thanks @j-bennet

j-bennet requested a review from jrbourbeau February 8, 2023 02:48

github-actions bot added the dataframe label Feb 8, 2023

j-bennet changed the title ~~Fix test_merge_by_index_patterns for pandas 2.0 compatibility~~ Fix test_merge_by_index_patterns for pandas 2.0 compatibility Feb 8, 2023

j-bennet force-pushed the j-bennet/9736-fix-merge-df branch from a68c688 to ec128b9 Compare February 13, 2023 17:57

jrbourbeau added the upstream label Feb 13, 2023

jrbourbeau reviewed Feb 13, 2023

j-bennet requested a review from jrbourbeau February 13, 2023 22:34

j-bennet added 4 commits February 13, 2023 21:01

Fix merge dataframe tests for 2.0.

8e19df2

test-upstream

aae9449

Fix other tests with the same error.

c11aa27

Xfail the test.

a8374fd

j-bennet force-pushed the j-bennet/9736-fix-merge-df branch from f5dae85 to a8374fd Compare February 14, 2023 05:01

Merged main to branch.

18560db

jrbourbeau reviewed Feb 14, 2023

j-bennet mentioned this pull request Feb 14, 2023

Properly fix index class and dtype mismatch in empty DataFrame or Series with pandas 2.0 #9957

Open

jrbourbeau changed the title ~~Fix test_merge_by_index_patterns for pandas 2.0 compatibility~~ Fix test_merge_by_index_patterns for pandas 2.0 Feb 14, 2023

jrbourbeau merged commit 0890b96 into dask:main Feb 14, 2023

j-bennet deleted the j-bennet/9736-fix-merge-df branch February 14, 2023 20:30

jrbourbeau mentioned this pull request Feb 14, 2023

Un-xfail test_set_index_overlap_2 for pandas 2.0 #9959

Merged
