Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial support for converting pandas extension dtypes to arrays #10018

Merged
merged 2 commits into from Mar 7, 2023

Conversation

jrbourbeau
Copy link
Member

Currently when one calls .values on a Dask Series / Index that holds extension dtypes, an error along the lines of TypeError: Cannot interpret 'Float32Dtype()' as a data type is raised because numpy doesn't know how to handle pandas extension dtypes (xref #9401).

This PR proposes we add initial support for this case by just converting extension dtypes we encounter to objects before handing off to numpy. There are other approaches we could take, for example something like #7375, that would give better parity with pandas behavior, but casting to object feels like reasonable progress over the status quo of throwing an error.

cc @rjzamora @j-bennet for thoughts

@@ -371,7 +372,12 @@ def size(x):


def values(df):
return df.values
values = df.values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better than throwing an exception. I'd add a warning here. Something like this:

warnings.warn('Dask doesn't have support for X dtype yet. Values were converted from X to object. See issue XXX for more information', UserWarning)

Users might notice the conversion and think it's a bug, that would let them know that there's no need to file an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @j-bennet that casting to object is better than an error, and that we should probably warn the user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed that's the most friendly thing to do. I pushed a commit yesterday which adds a warning about converting to object dtype in this case

Copy link
Contributor

@j-bennet j-bennet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jrbourbeau
Copy link
Member Author

Thanks for reviewing @j-bennet @rjzamora

@jrbourbeau jrbourbeau merged commit 89db50e into dask:main Mar 7, 2023
@jrbourbeau jrbourbeau deleted the extension-dtypes-values branch March 7, 2023 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants