
No support for some Pandas Extension Dtypes #399

Open
Duncan-Hunter opened this issue Jun 19, 2024 · 1 comment
Labels: bug (Something isn't working)

@Duncan-Hunter

Describe the bug
Pandas has extension dtypes, such as the nullable integer dtype Int64. When you fit a Univariate calculator, or presumably anything else that checks dtypes using _split_features_by_type, columns with these dtypes are silently dropped because Int64 is not in this list:

```python
[
    'int_',
    'int8',
    'int16',
    'int32',
    'int64',
    'uint8',
    'uint16',
    'uint32',
    'uint64',
    'float_',
    'float16',
    'float32',
    'float64',
]
```
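To see why the membership test fails, here is a short sketch comparing a pandas Int64 column with a plain NumPy-backed int64 column (assuming a pandas version that ships the nullable Int64 dtype). The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# A pandas nullable-integer column and a plain NumPy-backed integer column
ext_dtype = pd.Series([1, 2, 3], dtype="Int64").dtype
np_dtype = pd.Series([1, 2, 3], dtype="int64").dtype

# The names differ only in capitalisation, so string comparison treats them as different
print(ext_dtype.name, np_dtype.name)
print(ext_dtype == "int64", np_dtype == "int64")

# Both dtypes nevertheless report the same NumPy scalar type
print(ext_dtype.type is np.int64, np_dtype.type is np.int64)
```

Because the extension dtype's .type is the NumPy scalar type, casting with dtype.type recovers a dtype whose name appears in the list.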

To Reproduce
Using an environment with nannyml=0.10.7

```python
import numpy as np
import pandas as pd


num_dtypes = [
    'int_',
    'int8',
    'int16',
    'int32',
    'int64',
    'uint8',
    'uint16',
    'uint32',
    'uint64',
    'float_',
    'float16',
    'float32',
    'float64',
]

test = pd.Series([1, 2, 3, 4, 5], dtype='Int64')

print("In num_dtypes: ", test.dtype in num_dtypes)
print("in ['Int64']: ", test.dtype in ['Int64'])
print("dtype: ", test.dtype)

# Cast to the underlying NumPy scalar type (np.int64 for Int64)
test = test.astype(test.dtype.type)

print("new dtype: ", test.dtype)
print("In num_dtypes: ", test.dtype in num_dtypes)
```

Output:

```
In num_dtypes:  False
in ['Int64']:  True
dtype:  Int64
new dtype:  int64
In num_dtypes:  True
```

Expected behavior
There should be support for these dtypes, and columns shouldn't be dropped without the user knowing.
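A minimal sketch of the "don't drop columns silently" part, assuming the feature split produces lists of continuous and categorical column names. warn_on_dropped_features is a hypothetical helper for illustration, not part of nannyml.

```python
import warnings


def warn_on_dropped_features(requested, continuous, categorical):
    """Hypothetical helper: warn about requested features that neither split kept."""
    kept = set(continuous) | set(categorical)
    dropped = [name for name in requested if name not in kept]
    if dropped:
        # Surface the problem to the user instead of silently ignoring the columns
        warnings.warn(
            f"Columns {dropped} have unsupported dtypes and were excluded.",
            UserWarning,
            stacklevel=2,
        )
    return dropped
```

For example, warn_on_dropped_features(["a", "b"], ["a"], []) returns ["b"] and emits one UserWarning.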

Additional context
I'm going to work around the issue by converting my columns to their underlying NumPy types using pd.Series.dtype.type. For a proper fix, though, I think you should use np.issubdtype(dtype.type, np.number).
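The workaround can be sketched for a whole DataFrame as follows. to_numpy_dtypes is a hypothetical helper name. The NA handling is an added assumption: pd.NA cannot be stored in a NumPy integer array, and casting an Int64 column that contains missing values straight to np.int64 raises an error, so such columns fall back to float64 with NaN.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def to_numpy_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast numeric pandas extension columns to NumPy dtypes."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if not isinstance(dtype, ExtensionDtype):
            continue  # already a plain NumPy dtype
        scalar_type = dtype.type
        if not (isinstance(scalar_type, type) and issubclass(scalar_type, np.number)):
            continue  # e.g. category, string or boolean: leave untouched
        if out[col].isna().any():
            # pd.NA has no NumPy integer equivalent, so use float64 with NaN instead
            out[col] = out[col].astype("float64")
        else:
            out[col] = out[col].astype(scalar_type)
    return out
```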

Duncan-Hunter added the bug (Something isn't working) and triage (Needs to be assessed) labels on Jun 19, 2024
@nnansters · Contributor

Hey @Duncan-Hunter,

Good catch, and good suggestion. I'll look into the np.issubdtype function for a cleaner solution.
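The suggested np.issubdtype(dtype.type, np.number) check can be sketched as below. is_numeric is a hypothetical helper, not nannyml's API. The guard before np.issubdtype is an added assumption: some extension dtypes report a non-NumPy scalar type (StringDtype reports str, CategoricalDtype its own class), so only NumPy scalar types are handed to np.issubdtype.

```python
import numpy as np
import pandas as pd


def is_numeric(dtype) -> bool:
    """Hypothetical helper: True for numeric NumPy and pandas extension dtypes."""
    scalar_type = dtype.type
    # Only NumPy scalar types (subclasses of np.generic) go to np.issubdtype
    if not (isinstance(scalar_type, type) and issubclass(scalar_type, np.generic)):
        return False
    return np.issubdtype(scalar_type, np.number)


for values, name in [([1, 2], "Int64"), ([1.5, 2.5], "Float64"), ([1, 2], "int64"),
                     (["a", "b"], "category"), (["a", "b"], "string")]:
    print(name, is_numeric(pd.Series(values, dtype=name).dtype))
```

One design point to weigh: np.timedelta64 subclasses np.signedinteger, so this check also classifies timedelta columns as numeric, and booleans are not counted as numeric. pandas' pd.api.types.is_numeric_dtype also accepts extension dtypes but makes the opposite choices (it excludes timedeltas and counts booleans as numeric).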

In the worst case, we can always add the extension dtypes to the list above.
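The list-based fallback could look like the sketch below. NUMERIC_DTYPE_NAMES and is_listed_numeric are hypothetical names. The sketch compares dtype.name strings rather than dtype objects, and it omits the 'int_' and 'float_' aliases because a dtype's name is always the sized form (for example 'int64').

```python
import pandas as pd

# NumPy dtype names plus the capitalised names of pandas' nullable extension dtypes
NUMERIC_DTYPE_NAMES = [
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
    "Float32", "Float64",
]


def is_listed_numeric(dtype) -> bool:
    """Hypothetical name-based check against the extended list."""
    return dtype.name in NUMERIC_DTYPE_NAMES
```

The trade-off against the np.issubdtype approach is maintenance: the list has to be updated by hand whenever pandas or NumPy add a numeric dtype.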

nnansters removed the triage (Needs to be assessed) label on Jun 19, 2024