
Fix test_merge_by_index_patterns for pandas 2.0 #9930

Merged: 5 commits merged into dask:main from j-bennet/9736-fix-merge-df on Feb 14, 2023


j-bennet
Contributor

@j-bennet commented Feb 8, 2023

Fixes the following failures in CI with pandas 2.0:

dask/dataframe/tests/test_shuffle.py::test_set_index_overlap_2: AssertionError: DataFrame.columns are different
dask/dataframe/tests/test_multi.py::test_merge_by_index_patterns[tasks-inner]: AssertionError: DataFrame.index are different
dask/dataframe/tests/test_multi.py::test_join_by_index_patterns[disk-inner]: AssertionError: DataFrame.index are different
dask/dataframe/tests/test_multi.py::test_join_by_index_patterns[tasks-inner]: AssertionError: DataFrame.index are different
dask/dataframe/tests/test_multi.py::test_merge_by_multiple_columns[disk-inner]: AssertionError: DataFrame.index are different
dask/dataframe/tests/test_multi.py::test_merge_by_multiple_columns[tasks-inner]: AssertionError: DataFrame.index are different

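The failures are related to the following change in pandas 2.0 (empty DataFrames and Series created without explicit axes now default to an empty RangeIndex instead of an object-dtype Index):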

https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#empty-dataframes-series-will-now-default-to-have-a-rangeindex
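A quick way to see the change described in that whatsnew entry is to build an empty DataFrame without passing any axes and inspect its index. This is a minimal sketch that assumes only pandas is installed; `PANDAS_MAJOR` is a naive version parse introduced here for illustration, not a dask helper.

```python
import pandas as pd

# Naive major-version parse, e.g. "2.0.3" -> 2 (illustrative only).
PANDAS_MAJOR = int(pd.__version__.split(".")[0])

# An empty frame built without explicit axes (index=None, columns=None).
empty = pd.DataFrame()

# pandas >= 2.0: an empty RangeIndex (int64)
# pandas <  2.0: an empty Index with object dtype
print(type(empty.index).__name__, empty.index.dtype)
```

Comparing such an empty result against one whose index carries a different dtype is what makes index-checking assertions like the ones above report `DataFrame.index are different`.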

In this test, we already apply a workaround that sets the dtype on the returned pandas DataFrames, because pandas doesn't set it on empty DataFrames. With 2.0, it looks like the workaround is needed in more places. I'm not sure why this changed.
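The kind of workaround described above can be sketched as a small wrapper around `pd.merge` that, when the result is empty, casts the result's index to the left input's index dtype before it is compared with dask's output. The helper name `merge_keep_index_dtype` and the sample frames are hypothetical; this illustrates the idea and is not the code in this PR.

```python
import pandas as pd


def merge_keep_index_dtype(left, right, **kwargs):
    """Hypothetical helper: pd.merge, but keep left's index dtype on empty results.

    For empty results, pandas may return an index whose dtype differs from the
    inputs' (e.g. an empty integer index), which trips index-dtype checks in
    assert_eq-style comparisons.
    """
    out = pd.merge(left, right, **kwargs)
    if len(out) == 0:
        # Cast the empty index so dtype comparisons don't fail spuriously.
        out.index = out.index.astype(left.index.dtype)
    return out


left = pd.DataFrame(
    {"k": [1, 2], "x": [10, 20]}, index=pd.Index(["a", "b"], dtype=object)
)
right = pd.DataFrame({"k": [3, 4], "y": [30, 40]})

# No keys in common, so the inner merge produces zero rows.
result = merge_keep_index_dtype(left, right, on="k", how="inner")
print(len(result), result.index.dtype)
```

Casting only the empty case keeps non-empty results untouched, which matches the description of the workaround as something pandas needs only for empty DataFrames.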

  • Tests added / passed
  • Passes pre-commit run --all-files

@j-bennet changed the title to Fix test_merge_by_index_patterns for pandas 2.0 compatibility Feb 8, 2023
Member

@jrbourbeau left a comment


Thanks for your work here @j-bennet. It looks like there are other similar failures (e.g. dask/dataframe/tests/test_multi.py::test_merge_by_multiple_columns, dask/dataframe/tests/test_shuffle.py::test_set_index_overlap_2, etc.) that are still happening. Would you like to handle those here, or in a follow-up?

@j-bennet
Contributor Author

j-bennet commented Feb 13, 2023

> It looks like there are other similar failures (e.g. dask/dataframe/tests/test_multi.py::test_merge_by_multiple_columns, dask/dataframe/tests/test_shuffle.py::test_set_index_overlap_2, etc.) that are still happening. Would you like to handle those here, or in a follow-up?

Good catch; yes, they look like the same error. I'll include them in this PR.

@j-bennet
Contributor Author

I didn't find a good way to fix dask/dataframe/tests/test_shuffle.py::test_set_index_overlap_2, so I've marked it as xfail for now.
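One common way to express a version-conditional xfail in pytest is a `pytest.mark.xfail` marker whose condition is a pandas version check. The test below is a hypothetical stand-in (it asserts the pre-2.0 empty-index behavior), not dask's actual `test_set_index_overlap_2`, and the `PANDAS_GE_200` flag is computed naively here rather than imported from dask.

```python
import pandas as pd
import pytest

# Naive version flag: True when running under pandas 2.x or later.
PANDAS_GE_200 = int(pd.__version__.split(".")[0]) >= 2


@pytest.mark.xfail(
    PANDAS_GE_200,
    reason="empty DataFrames default to a RangeIndex in pandas 2.0",
)
def test_empty_frame_has_object_index():
    # Passes on pandas < 2.0; expected to fail (reported as xfail) on pandas >= 2.0.
    assert pd.DataFrame().index.dtype == object
```

Running `pytest -rx` lists xfailed tests with their reasons, which keeps the pending fix visible in CI output.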

@j-bennet
Contributor Author

The remaining test failures here are from flaky tests:

FAILED dask/tests/test_distributed.py::test_blockwise_dataframe_io[False-True-hdf] - AssertionError: DataFrame are different
FAILED dask/dataframe/tests/test_groupby.py::test_dataframe_aggregations_multilevel[cov-disk-1-<lambda>1] - FutureWarning: In a future version, the Index constructor will not infer numeric dtypes when passed object-dtype sequences (matching Series behavior)

Member

@jrbourbeau left a comment


Thanks @j-bennet. Overall this looks reasonable. Are these changes more of a workaround, where we should follow up to make the index match, or is this a good long-term solution?

@j-bennet
Contributor Author

> Are these changes more of a workaround, where we should follow up to make the index match, or is this a good long-term solution?

It's a workaround. I created an issue to fix this properly:

@jrbourbeau
Member

Great, thanks @j-bennet

@jrbourbeau changed the title from "Fix test_merge_by_index_patterns for pandas 2.0 compatibility" to "Fix test_merge_by_index_patterns for pandas 2.0" Feb 14, 2023
@jrbourbeau merged commit 0890b96 into dask:main Feb 14, 2023
@j-bennet deleted the j-bennet/9736-fix-merge-df branch February 14, 2023 20:30