Categorical support broken with recent pandas #597

Closed
dhirschfeld opened this issue Nov 5, 2017 · 1 comment

Comments

@dhirschfeld
Contributor

`test_discover` fails for me:

```python
import pandas as pd
import pandas.util.testing as tm
import numpy as np
import dask.dataframe as dd
from datashape import var, Record, int64, float64, Categorical
from datashape.util.testing import assert_dshape_equal

from odo import convert, discover


def test_discover():
    df = pd.DataFrame({'x': list('a'*5 + 'b'*5 + 'c'*5),
                       'y': np.arange(15, dtype=np.int64),
                       'z': list(map(float, range(15)))},
                      columns=['x', 'y', 'z'])
    df.x = df.x.astype('category')
    ddf = dd.from_pandas(df, npartitions=2)
    assert_dshape_equal(discover(ddf),
                        var * Record([('x', Categorical(['a', 'b', 'c'])),
                                      ('y', int64), ('z', float64)]))
    assert_dshape_equal(discover(ddf.x), var * Categorical(['a', 'b', 'c']))
```

```
Traceback (most recent call last):
  File "<ipython-input-2-75fca8249e6d>", line 7, in <module>
    assert_dshape_equal(discover(ddf),
  File "C:\Miniconda3\lib\site-packages\multipledispatch\dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "C:\Miniconda3\lib\site-packages\odo\backends\dask.py", line 27, in discover_dask_dataframe
    return var * discover(df.head()).measure
  File "C:\Miniconda3\lib\site-packages\multipledispatch\dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "C:\Miniconda3\lib\site-packages\odo\backends\pandas.py", line 39, in discover_dataframe
    for k in df.columns])
  File "C:\Miniconda3\lib\site-packages\odo\backends\pandas.py", line 39, in <listcomp>
    for k in df.columns])
  File "C:\Miniconda3\lib\site-packages\odo\backends\pandas.py", line 31, in dshape_from_pandas
    dshape = datashape.CType.from_numpy_dtype(col.dtype)
  File "C:\Miniconda3\lib\site-packages\datashape\coretypes.py", line 781, in from_numpy_dtype
    if np.issubdtype(dt, np.datetime64):
  File "C:\Miniconda3\lib\site-packages\numpy\core\numerictypes.py", line 755, in issubdtype
    return issubclass(dtype(arg1).type, arg2)
TypeError: data type not understood
```

@dhirschfeld
Contributor Author

In `dshape_from_pandas` there is an explicit check for categorical columns:

```python
def dshape_from_pandas(col):
    if isinstance(col.dtype, categorical):
        return Categorical(col.cat.categories.tolist())
    elif col.dtype.kind == 'M':
        tz = getattr(col.dtype, 'tz', None)
        if tz is not None:
            # Pandas stores this as a pytz.tzinfo, but DataShape wants a
            # string.
            tz = str(tz)
        return Option(DateTime(tz=tz))
    dshape = datashape.CType.from_numpy_dtype(col.dtype)
    dshape = string if dshape == object_ else dshape
    return Option(dshape) if dshape in possibly_missing else dshape
```
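To see why a misfiring categorical check surfaces as `TypeError: data type not understood` instead of a wrong dshape, here is a heavily simplified, hypothetical model of the dispatch in `dshape_from_pandas`. Every name in it (`CategoricalDtypeLike`, `KNOWN_NUMPY_DTYPES`, `from_numpy_dtype_like`, `dshape_from_dtype`) is invented for illustration; they only mimic the roles of the pandas categorical dtype class and of the NumPy-only fallback that fails in the traceback. If the `isinstance` branch does not match, the categorical dtype falls through to a converter that does not know it.

```python
# Hypothetical, simplified model of the dispatch in dshape_from_pandas.
# None of these names exist in odo, datashape, or pandas.


class CategoricalDtypeLike:
    """Stand-in for the pandas categorical dtype class."""


# Stand-in for the plain NumPy dtypes the fallback converter understands.
KNOWN_NUMPY_DTYPES = {'int64', 'float64', 'object'}


def from_numpy_dtype_like(dtype):
    """Mimic a converter that only understands plain NumPy dtypes."""
    if dtype not in KNOWN_NUMPY_DTYPES:
        # Same message NumPy raised in the traceback above.
        raise TypeError('data type not understood')
    return dtype


def dshape_from_dtype(dtype, categorical):
    """Return 'categorical' if the isinstance check matches, else fall through."""
    if isinstance(dtype, categorical):
        return 'categorical'
    # A misfiring check sends categorical dtypes down this path.
    return from_numpy_dtype_like(dtype)


cat = CategoricalDtypeLike()
print(dshape_from_dtype(cat, CategoricalDtypeLike))  # categorical

try:
    # `property` is what type(pd.Categorical.dtype) returned in pandas 0.21.0.
    dshape_from_dtype(cat, property)
except TypeError as exc:
    print(exc)  # data type not understood
```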

But this check fails for me because of how `categorical` is defined:

```python
categorical = type(pd.Categorical.dtype)
```

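...which binds `categorical` to the `property` type, because `pd.Categorical.dtype` looked up on the class is a property object rather than a dtype. A column's dtype is never an instance of `property`, so the `isinstance` check never matches and categorical columns fall through to `datashape.CType.from_numpy_dtype`, which raises the `TypeError`: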

```
In [5]: pd.Categorical.dtype
Out[5]: <property at 0x4843818>
In [6]: type(pd.Categorical.dtype)
Out[6]: property
```

This is with pandas 0.21.0 from conda-forge.
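The same descriptor behaviour can be reproduced in plain Python. In this sketch, `CategoricalLike` and `CategoricalDtypeLike` are hypothetical stand-ins (not pandas classes) for a `Categorical` whose `dtype` is a property and for its dtype class. Looking `dtype` up on the class yields the property descriptor itself, while looking it up on an instance runs the getter and returns the dtype. With real pandas, the instance-based equivalent would be something like `type(pd.Categorical([]).dtype)`; that is one possible fix, not necessarily what the commit referencing this issue does.

```python
# Hypothetical stand-ins for illustration only: CategoricalDtypeLike plays
# the role of the pandas categorical dtype class, and CategoricalLike
# mimics a Categorical whose `dtype` is a property (as in pandas 0.21.0).
class CategoricalDtypeLike:
    pass


class CategoricalLike:
    def __init__(self, values):
        self.values = list(values)
        self._dtype = CategoricalDtypeLike()

    @property
    def dtype(self):
        return self._dtype


# Class-level lookup returns the property descriptor itself, mirroring
# `categorical = type(pd.Categorical.dtype)`.
from_class = type(CategoricalLike.dtype)

# Instance-level lookup runs the property getter and returns the dtype.
from_instance = type(CategoricalLike([]).dtype)

col_dtype = CategoricalLike(['a', 'b', 'c']).dtype
print(from_class)                            # <class 'property'>
print(isinstance(col_dtype, from_class))     # False: the check never matches
print(isinstance(col_dtype, from_instance))  # True
```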

dhirschfeld pushed a commit to dhirschfeld/odo that referenced this issue Nov 6, 2017
dhirschfeld changed the title from "Categorical support broken?" to "Categorical support broken with recent pandas" on Nov 6, 2017