No support for some Pandas Extension Dtypes #399

Duncan-Hunter · 2024-06-19T10:34:33Z

Describe the bug
Pandas has extension dtypes. When you fit a Univariate calculator, or presumably anything else that checks dtypes using _split_features_by_type, columns with these dtypes are dropped because Int64 is not in this list:

[
    'int_',
    'int8',
    'int16',
    'int32',
    'int64',
    'uint8',
    'uint16',
    'uint32',
    'uint64',
    'float_',
    'float16',
    'float32',
    'float64',
]

To Reproduce
Using an environment with nannyml=0.10.7

import numpy as np
import pandas as pd


num_dtypes = [
    'int_',
    'int8',
    'int16',
    'int32',
    'int64',
    'uint8',
    'uint16',
    'uint32',
    'uint64',
    'float_',
    'float16',
    'float32',
    'float64',
    ]

test = pd.Series([1, 2, 3, 4, 5], dtype='Int64')

print("In num_dtypes: ", test.dtype in num_dtypes)
print("in ['Int64']: ", test.dtype in ['Int64'])
print("dtype: ", test.dtype)

test = test.astype(test.dtype.type)

print("new dtype: ", test.dtype)
print("In num_dtypes: ", test.dtype in num_dtypes)

In num_dtypes:  False
in ['Int64']:  True
dtype:  Int64
new dtype:  int64
In num_dtypes:  True

Expected behavior
There should be support for these dtypes, and columns shouldn't be dropped without the user knowing.

Additional context
I'm going to work around the issue by converting my datatypes to underlying numpy types using pd.Series.dtype.type. But for a fix, I think you should use np.issubdtype(dtype.type, np.number).
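A minimal sketch of the suggested check, assuming a standalone helper rather than NannyML's actual _split_features_by_type (whose signature is not shown here). The function names `is_numeric_dtype_via_numpy` and `split_numeric_columns` are hypothetical. The idea is that `dtype.type` resolves to the same NumPy scalar type for both `int64` and `Int64`, so `np.issubdtype` treats them alike:

```python
import numpy as np
import pandas as pd


def is_numeric_dtype_via_numpy(dtype) -> bool:
    """Hypothetical helper: True if the dtype's scalar type is a NumPy number.

    dtype.type is np.int64 for both the NumPy 'int64' dtype and the
    pandas 'Int64' extension dtype, so both pass the same check.
    """
    return bool(np.issubdtype(dtype.type, np.number))


def split_numeric_columns(df: pd.DataFrame):
    """Hypothetical stand-in for a dtype-based column split."""
    numeric = [c for c in df.columns if is_numeric_dtype_via_numpy(df[c].dtype)]
    other = [c for c in df.columns if c not in numeric]
    return numeric, other


df = pd.DataFrame(
    {
        "a": pd.Series([1, 2, 3], dtype="Int64"),      # pandas extension dtype
        "b": pd.Series([1.0, 2.0, 3.0], dtype="float32"),  # plain NumPy dtype
        "c": pd.Series(["x", "y", "z"]),               # non-numeric
    }
)

numeric_cols, other_cols = split_numeric_columns(df)
print(numeric_cols, other_cols)
```

Note that boolean columns are not counted as numeric by this check, since `np.bool_` is not a subtype of `np.number`.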


nnansters · 2024-06-19T10:41:31Z

Hey @Duncan-Hunter,

Good catch and good suggestion. I'll look into the np.issubdtype function for a cleaner solution.

Worst case scenario we can always add the extension dtypes to the list above.
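For illustration, the fallback of extending the list could look roughly like the sketch below. It is an assumption, not NannyML's code: the names `SUPPORTED_NUMERIC` and `is_supported_numeric` are hypothetical, and the check compares `str(dtype)` against the names instead of comparing dtype objects to strings. The `int_` and `float_` aliases are left out because `str(dtype)` never produces them.

```python
import pandas as pd

# NumPy dtype names from the list in the issue (aliases omitted).
NUMPY_NUMERIC = [
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
]

# pandas nullable integer and float extension dtype names.
PANDAS_NULLABLE_NUMERIC = [
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
    "Float32", "Float64",
]

SUPPORTED_NUMERIC = set(NUMPY_NUMERIC + PANDAS_NULLABLE_NUMERIC)


def is_supported_numeric(dtype) -> bool:
    """Hypothetical helper: membership test on the dtype's string name."""
    return str(dtype) in SUPPORTED_NUMERIC


print(is_supported_numeric(pd.Series([1, 2], dtype="Int64").dtype))
print(is_supported_numeric(pd.Series(["x", "y"]).dtype))
```

The trade-off is that an explicit list has to be updated whenever a new numeric dtype appears, which is why a subtype check is the cleaner option.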

Duncan-Hunter added the bug (Something isn't working) and triage (Needs to be assessed) labels Jun 19, 2024

Duncan-Hunter assigned nnansters Jun 19, 2024

nnansters removed the triage (Needs to be assessed) label Jun 19, 2024
