Column assignment fails with Float64Dtype type #7156

Closed
nils-braun opened this issue Feb 2, 2021 · 2 comments · Fixed by #7173

@nils-braun
Contributor

nils-braun commented Feb 2, 2021

What happened:
When using pandas' new Float64 nullable type (with pandas >= 1.2), column assignment fails with

TypeError: Cannot interpret 'Float64Dtype()' as a data type

Minimal Complete Verifiable Example:

Fails with at least pandas 1.2.0 (earlier versions do not have the new extension type).

import pandas as pd
import dask.dataframe as dd

# Example data. The only important part is the Float64 dtype, the new pandas extension type.
df = dd.from_pandas(pd.DataFrame({"a": [1.1]}, dtype="Float64"), npartitions=1)

df.assign(new_col=df["a"])
# TypeError: Cannot interpret 'Float64Dtype()' as a data type

Full stacktrace

Traceback (most recent call last):
  File "bla.py", line 7, in <module>
    df.assign(new_col=df["a"])
  File "/home/nils/anaconda3/envs/dask-sql/lib/python3.8/site-packages/dask/dataframe/core.py", line 4004, in assign
    df2 = self._meta_nonempty.assign(**_extract_meta(kwargs, nonempty=True))
  File "/home/nils/anaconda3/envs/dask-sql/lib/python3.8/site-packages/dask/dataframe/core.py", line 360, in _meta_nonempty
    return meta_nonempty(self._meta)
  File "/home/nils/anaconda3/envs/dask-sql/lib/python3.8/site-packages/dask/utils.py", line 509, in __call__
    return meth(arg, *args, **kwargs)
  File "/home/nils/anaconda3/envs/dask-sql/lib/python3.8/site-packages/dask/dataframe/utils.py", line 391, in meta_nonempty_dataframe
    dt_s_dict[dt] = _nonempty_series(x.iloc[:, i], idx=idx)
  File "/home/nils/anaconda3/envs/dask-sql/lib/python3.8/site-packages/dask/dataframe/utils.py", line 578, in _nonempty_series
    data = np.array([entry, entry], dtype=dtype)
TypeError: Cannot interpret 'Float64Dtype()' as a data type
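
The last frame of the traceback can be isolated without dask. The following is a minimal sketch, assuming pandas >= 1.2 and a NumPy version that rejects pandas extension dtypes as in the traceback above. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

dtype = pd.Float64Dtype()  # nullable float dtype, requires pandas >= 1.2

# Mirror the failing call from _nonempty_series: NumPy is handed a pandas
# extension dtype that it does not know how to interpret.
numpy_error = None
try:
    np.array([1.0, 1.0], dtype=dtype)
except TypeError as exc:
    numpy_error = exc

# pandas itself can build the same two-element array for this dtype.
ext = pd.array([1.0, 1.0], dtype=dtype)
print(type(numpy_error).__name__, ext.dtype)
```

This suggests the problem is the NumPy array construction rather than `assign` itself.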

As a fix, I guess one wants to introduce a function similar to https://github.com/dask/dask/blob/master/dask/dataframe/utils.py#L43 (but for floats) and use it in https://github.com/dask/dask/blob/master/dask/dataframe/utils.py#L540. I am happy to open a PR to fix this if this solution is ok.
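
A rough sketch of what such a helper could look like, using only pandas and NumPy. The names `is_float_na_dtype` and `nonempty_float_series` are hypothetical and chosen for illustration. This is not the code from the linked dask source or from the eventual fix.

```python
import numpy as np
import pandas as pd


def is_float_na_dtype(t):
    """Return True if t is, or carries, a pandas nullable float dtype.

    Hypothetical helper: accepts a dtype object or anything with a
    .dtype attribute (e.g. a Series). Dtype strings are not handled.
    """
    dtype = getattr(t, "dtype", t)
    # Float32Dtype / Float64Dtype only exist in pandas >= 1.2, so look
    # them up defensively and return False early on older versions.
    types = tuple(
        getattr(pd, name)
        for name in ("Float32Dtype", "Float64Dtype")
        if hasattr(pd, name)
    )
    return bool(types) and isinstance(dtype, types)


def nonempty_float_series(s):
    """Build a two-row placeholder Series with the dtype and name of s."""
    dtype = s.dtype
    if is_float_na_dtype(dtype):
        # Use the pandas constructor for extension dtypes.
        data = pd.array([1.0, pd.NA], dtype=dtype)
    else:
        # Plain NumPy float dtypes keep the existing code path.
        data = np.array([1.0, 1.0], dtype=dtype)
    return pd.Series(data, name=s.name)


s = pd.Series([1.1], dtype="Float64", name="a")
placeholder = nonempty_float_series(s)
print(placeholder.dtype, len(placeholder))
```

The point of the sketch is only the branching: extension dtypes go through `pd.array`, everything else stays on the NumPy path.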

Environment:

  • Dask version: 2021.1.1
  • Pandas version: 1.2.0
  • Python version:
  • Operating System: ubuntu
  • Install method (conda, pip, source): conda
@jsignell
Member

jsignell commented Feb 4, 2021

I think the solution you propose is very reasonable, and if you are willing to open a PR, even better! Just as a heads-up, there is a release planned for Friday (tomorrow), so it'd be great to get this in.

@nils-braun
Contributor Author

@jsignell PR is open. I hope it was still in time yesterday evening (European time).

jsignell pushed a commit that referenced this issue Feb 5, 2021
* Added support for Float64, solving #7156

* Early exit on older pandas versions