clear_known_categories overrides categories.dtype for non-object dtypes #5756
Hrm, yes, I can see how that would cause an issue. Currently it looks like we use the presence of the special value
Alternatively, we might make the concatenation process more relaxed in the case where we have a mix of known and unknown categoricals (maybe we always convert everything to unknown in that case). That might be easier to implement. I don't have a strong background here. cc @TomAugspurger, who I suspect has more background here than I do.
FYI in pandas no exception is raised in
The only thing I can think of is to use an empty index of the correct sub-type.
In [11]: pd.Categorical(pd.DatetimeIndex([])).categories
Out[11]: DatetimeIndex([], dtype='datetime64[ns]', freq=None)
Not sure if that will fix everything / break other things, but that's where I would start.
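To make the suggestion above concrete, here is a minimal pandas-only sketch of "an empty index of the correct sub-type". The helper names `empty_categories_like` and `clear_categories_keep_dtype` are hypothetical and exist only for this illustration; they are not part of dask or pandas, and whether this approach fits dask's notion of unknown categories is exactly the open question in this thread.

```python
import numpy as np
import pandas as pd


def empty_categories_like(categories: pd.Index) -> pd.Index:
    """Hypothetical helper: an empty index with the same type and dtype."""
    # Slicing an index keeps its sub-type (e.g. DatetimeIndex) and dtype.
    return categories[:0]


def clear_categories_keep_dtype(cat: pd.Categorical) -> pd.Categorical:
    """Hypothetical helper: an empty categorical whose (empty) categories
    keep the dtype of the original categories."""
    return pd.Categorical.from_codes(
        np.array([], dtype="int8"),  # no values, so no codes
        categories=empty_categories_like(cat.categories),
        ordered=cat.ordered,
    )


dates = pd.Categorical(pd.to_datetime(["2020-01-01", "2020-01-02"]))
cleared = clear_categories_keep_dtype(dates)

# The categories are gone, but their dtype is still a datetime dtype.
print(type(cleared.categories).__name__, cleared.categories.dtype)
```

One consequence of this design is that an empty categorical and a categorical with cleared categories look the same, so something else would have to distinguish the two cases.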
Are empty categoricals rare enough that overlapping emptiness with unknown-ness is ok?
On Thu, Jan 2, 2020 at 4:08 AM Tom Augspurger wrote:
The only thing I can think of is to use an empty index of the correct sub-type.
In [11]: pd.Categorical(pd.DatetimeIndex([])).categories
Out[11]: DatetimeIndex([], dtype='datetime64[ns]', freq=None)
Not sure if that will fix everything / break other things, but that's where I would start.
I would need to look a bit more closely at how we use the actual value
At
dask/dask/dataframe/utils.py
Line 262 in c69d10f
categories.dtype == object
).
Minimal example:
This can cause an exception at https://github.com/dask/dask/blob/master/dask/dataframe/methods.py#L460 when using dd.concat if there is a categorical column with non-object categories (in our case, datetimes) and, during the concatenation, an empty pandas DataFrame is concatenated with a non-empty one.
Original stack trace when calling dd.concat:
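The dtype change named in the title can be illustrated with pandas alone. The sketch below is not the reporter's original example, which did not survive on this page. It assumes that the "unknown" marker is a single string placed in the categories; the name `UNKNOWN_SENTINEL` and its value are placeholders for this illustration, not values read from the dask source.

```python
import pandas as pd

# Placeholder marker for "categories unknown" (an assumption for this sketch).
UNKNOWN_SENTINEL = "__UNKNOWN_CATEGORIES__"

# A categorical column whose categories are datetimes, not objects.
s = pd.Series(pd.to_datetime(["2020-01-01", "2020-01-02"])).astype("category")
original_dtype = s.cat.categories.dtype

# Take the empty slice and replace its categories with the string marker.
empty = s.iloc[:0]
marked = empty.cat.set_categories([UNKNOWN_SENTINEL])
marked_dtype = marked.cat.categories.dtype

# The categories no longer have a datetime dtype after the replacement.
print(original_dtype, "->", marked_dtype)
```

Once the empty piece carries string categories and the non-empty piece carries datetime categories, any code that tries to combine the two sets of categories has to reconcile mismatched dtypes, which is the situation the report describes for dd.concat.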