
Fix .loc of dataframe with nullable boolean dtype #8368

Merged


@m-rossi (Contributor) commented Nov 9, 2021

I use the nullable boolean data type with Dask. It worked fine until the recent release 2021.11.0. Here is an example:

import dask.dataframe as dd
import pandas as pd

s1 = pd.Series([0, 1, 2])
s2 = pd.Series([True, False, pd.NA], dtype="boolean")

ddf1 = dd.from_pandas(s1, npartitions=1)
ddf2 = dd.from_pandas(s2, npartitions=1)

While this indexing

ddf1[ddf2]

works fine, this indexing

ddf1.loc[ddf2]

fails with the following error

KeyError: 'Cannot index with non-boolean dask Series. Try passing computed values instead (e.g. ``ddf.loc[iindexer.compute()]``)'
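For comparison, here is a pandas-only sketch of the same two indexing forms, with no Dask involved. In recent pandas releases a nullable boolean mask is accepted by both `[]` and `.loc`, and its `NA` entries are treated as `False`, so only the first row is kept:

```python
import pandas as pd

s1 = pd.Series([0, 1, 2])
s2 = pd.Series([True, False, pd.NA], dtype="boolean")

# Both forms filter by the mask; the NA entry is treated as False.
print(s1[s2].tolist())      # -> [0]
print(s1.loc[s2].tolist())  # -> [0]
```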

This PR adds pd.BooleanDtype() to the list of dtypes accepted for a boolean dask Series indexer and adds a test that fails without the code change.
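A minimal sketch of the kind of dtype check involved, assuming the indexer's dtype was previously compared against NumPy's `bool` only. The helpers `accepts_numpy_bool_only` and `accepts_bool_or_nullable_bool` are hypothetical illustrations, not Dask's actual code:

```python
import numpy as np
import pandas as pd


def accepts_numpy_bool_only(dtype) -> bool:
    # Hypothetical "before" check: only NumPy's bool dtype counts as boolean.
    return isinstance(dtype, np.dtype) and dtype.kind == "b"


def accepts_bool_or_nullable_bool(dtype) -> bool:
    # Hypothetical "after" check: also accept pandas' nullable BooleanDtype.
    return accepts_numpy_bool_only(dtype) or isinstance(dtype, pd.BooleanDtype)


mask = pd.Series([True, False, pd.NA], dtype="boolean")
print(accepts_numpy_bool_only(mask.dtype))        # False: the nullable mask is rejected
print(accepts_bool_or_nullable_bool(mask.dtype))  # True
```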

  • Tests added / passed
  • Passes pre-commit run --all-files

@GPUtester (Collaborator):

Can one of the admins verify this patch?

@quasiben (Member) commented Nov 9, 2021:

add to allowlist

@jsignell (Member) left a comment:

Thanks for pushing this change so quickly. It looks like this doesn't quite work with older versions of pandas. I think you can use is_bool_dtype like we do in

if is_bool_dtype(data._meta):
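`pandas.api.types.is_bool_dtype` recognizes both NumPy `bool` and the nullable `"boolean"` extension dtype, and it accepts pandas objects as well as dtypes. A minimal sketch of that check, assuming empty pandas Series stand in for the `_meta` of the dask Series being checked:

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

# Empty Series used as stand-ins for a dask Series' `_meta` (illustrative assumption).
numpy_bool_meta = pd.Series([], dtype=bool)
nullable_bool_meta = pd.Series([], dtype="boolean")
int_meta = pd.Series([], dtype="int64")

for meta in (numpy_bool_meta, nullable_bool_meta, int_meta):
    print(meta.dtype, is_bool_dtype(meta))
# bool True
# boolean True
# int64 False
```

Relying on pandas' own predicate avoids maintaining a hand-written list of boolean dtypes.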

dask/dataframe/indexing.py (outdated, resolved)
@jsignell (Member) left a comment:

Thanks for sticking with this @m-rossi!

@jsignell merged commit fccddfb into dask:main on Nov 11, 2021
@m-rossi deleted the fix-loc-for-pandas-nullable-boolean-data-type branch on November 11, 2021 17:10

4 participants