regression metrics raise exception for dask.dataframe.core.Series #756

Open
jameslamb opened this issue Nov 17, 2020 · 3 comments

@jameslamb
Member

What happened:

I tried to pass columns from a Dask DataFrame into regression metrics like mean_squared_error(), and this raised errors like

AttributeError: 'Scalar' object has no attribute 'mean'

What you expected to happen:

I expected that I'd be able to pass a column from a Dask DataFrame (which has type dask.dataframe.core.Series) into any of the metrics functions.

Minimal Complete Verifiable Example:

import dask
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster

cluster = LocalCluster()
client = Client(cluster)
cluster

ddf = dask.datasets.timeseries()

from dask_ml.metrics import mean_squared_error

mean_squared_error(
    y_true=ddf["y"],
    y_pred=ddf["y"]
)

Anything else we need to know?:

I looked around and couldn't find documentation that would lead me to think this wouldn't work, or other issues that seemed related.

Environment:

  • Dask version (output of pip freeze | grep -E "dask|distributed")
    • dask==2.30.0
      dask-cloudprovider==0.4.1
      dask-glm==0.2.0
      dask-ml==1.7.0
      distributed==2.30.1
  • Python version: 3.8.3.final.0
  • Operating System: macOS 10.14.6
  • Install method (conda, pip, source): pip

Thanks for your time and consideration

@TomAugspurger
Member

TomAugspurger commented Nov 17, 2020 via email

@jameslamb
Member Author

Is it fair to say that supporting dd.Series inputs is something that I should expect to work in these metrics functions?

Looking at it again, I see that the type hint on these functions is ArrayLike, which is defined as

ArrayLike = TypeVar("ArrayLike", Array, np.ndarray) 

If it's expected to work with Series, or something you'd welcome, I'd be happy to submit a PR to add that for metrics.

@TomAugspurger
Member

TomAugspurger commented Nov 19, 2020 via email
