Add default types_mapper to from_pyarrow_table_dispatch for pandas #10446

Merged


rjzamora
Member

dask.dataframe.dispatch.from_pyarrow_table_dispatch is currently used by distributed's p2p-shuffle algorithm. To ensure that pyarrow string types are not lost during the pandas-arrow-pandas round trip, distributed passes a special types_mapper argument through to pa.Table.to_pandas. However, types_mapper is not backend-agnostic, so the cudf implementation of from_pyarrow_table_dispatch must ignore it.

This PR moves the types_mapper logic into the pandas implementation of from_pyarrow_table_dispatch.

@rjzamora added the dataframe and enhancement (Improve existing functionality or make things work better) labels Aug 16, 2023
@rjzamora self-assigned this Aug 16, 2023
@github-actions bot added the dispatch (Related to `Dispatch` extension objects) label Aug 16, 2023
@rjzamora
Member Author

cc @hendrikmakait - Note that my primary goal here is to allow us to remove the types_mapper argument from this line of distributed, because it currently results in warnings like this for the cudf backend:

UserWarning: Ignoring the following arguments to `from_pyarrow_table_dispatch`: ['types_mapper']
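To make the source of that warning concrete, here is a toy model using only the standard library. Everything in it is hypothetical: the registry, the backend names "pandas_like" and "cudf_like", and the function from_table are stand-ins, not dask's Dispatch class or the cudf implementation. It only shows why a caller that always passes types_mapper triggers a warning on a backend that has no use for it, and why moving the default into the backend that needs it avoids that.

```python
import warnings

# Hypothetical registry keyed by backend name (dask's real dispatch keys on types).
_registry = {}


def register(backend):
    def decorator(func):
        _registry[backend] = func
        return func
    return decorator


def from_table(backend, table, **kwargs):
    # Backend-agnostic entry point: forwards kwargs to the registered function.
    return _registry[backend](table, **kwargs)


@register("pandas_like")
def _pandas_like(table, types_mapper=None, **kwargs):
    # The backend-specific default lives here, not at the call site.
    if types_mapper is None:
        types_mapper = str.upper
    return [types_mapper(value) for value in table]


@register("cudf_like")
def _cudf_like(table, **kwargs):
    # This backend cannot use extra arguments, so it warns and ignores them.
    if kwargs:
        warnings.warn(
            f"Ignoring the following arguments to `from_table`: {sorted(kwargs)}",
            UserWarning,
        )
    return list(table)


def count_warnings(**kwargs):
    # Helper: call the cudf-like backend and count the warnings it emits.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        from_table("cudf_like", ["a", "b"], **kwargs)
    return len(caught)


warned_with_mapper = count_warnings(types_mapper=str.upper)
warned_without_mapper = count_warnings()
pandas_result = from_table("pandas_like", ["a", "b"])
```

Once the caller stops passing types_mapper, the cudf-like backend stays silent while the pandas-like backend still applies its own default.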

Member

@hendrikmakait hendrikmakait left a comment

Thanks, @rjzamora!

@hendrikmakait hendrikmakait merged commit e7cec43 into dask:main Aug 17, 2023
25 checks passed
@rjzamora rjzamora deleted the from_pyarrow_table_dispatch-types_mapper branch August 17, 2023 13:25
Labels
dataframe dispatch Related to `Dispatch` extension objects enhancement Improve existing functionality or make things work better

2 participants