detect_problem_type can't handle DataColumns #1955

Closed
freddyaboulton opened this issue Mar 10, 2021 · 1 comment · Fixed by #2181
Labels
bug Issues tracking problems with existing features.

Comments

@freddyaboulton
Contributor

Repro

from evalml.problem_types import detect_problem_type
from evalml.demos import load_breast_cancer

X, y = load_breast_cancer()

detect_problem_type(y)
-----------------------------------------------------------------
TypeError                       Traceback (most recent call last)
<ipython-input-41-6bcb844deb7e> in <module>
----> 1 detect_problem_type(y)

~/sources/evalml/evalml/problem_types/utils.py in detect_problem_type(y)
     41     """
     42     y = pd.Series(y).dropna()
---> 43     num_classes = y.nunique()
     44     if num_classes < 2:
     45         raise ValueError("Less than 2 classes detected! Target unusable for modeling")

~/miniconda3/envs/evalml/lib/python3.8/site-packages/pandas/core/base.py in nunique(self, dropna)
   1301         4
   1302         """
-> 1303         uniqs = self.unique()
   1304         n = len(uniqs)
   1305         if dropna and isna(uniqs).any():

~/miniconda3/envs/evalml/lib/python3.8/site-packages/pandas/core/series.py in unique(self)
   1879         Categories (3, object): ['a' < 'b' < 'c']
   1880         """
-> 1881         result = super().unique()
   1882         return result
   1883 

~/miniconda3/envs/evalml/lib/python3.8/site-packages/pandas/core/base.py in unique(self)
   1263                     result = np.asarray(result)
   1264         else:
-> 1265             result = unique1d(values)
   1266 
   1267         return result

~/miniconda3/envs/evalml/lib/python3.8/site-packages/pandas/core/algorithms.py in unique(values)
    397 
    398     table = htable(len(values))
--> 399     uniques = table.unique(values)
    400     uniques = _reconstruct_data(uniques, original.dtype, original)
    401     return uniques

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.unique()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()

TypeError: unhashable type: 'DataColumn'

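I think the problem is that calling pd.Series on a DataColumn doesn't produce the type of series we're expecting. Judging by the traceback, we end up with an object-dtype Series that holds the DataColumn itself rather than its values, and since DataColumn is unhashable, y.nunique() fails.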
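To illustrate the suspected failure mode without depending on woodwork, here is a minimal sketch. `FakeDataColumn` is a hypothetical stand-in, not the real `DataColumn`: it is assumed to be non-iterable and to define `__eq__` without `__hash__`, which is enough to reproduce the same kind of `TypeError` on current pandas. `count_classes` and the `to_series()` unwrap step are also assumptions for illustration, not the fix that landed in #2181.

```python
import pandas as pd


class FakeDataColumn:
    """Hypothetical stand-in for a DataColumn (assumption, not woodwork's class).

    It wraps a Series, is not iterable, and is unhashable because it
    defines __eq__ without defining __hash__.
    """

    def __init__(self, series):
        self._series = series

    def __eq__(self, other):
        return isinstance(other, FakeDataColumn) and self._series.equals(other._series)

    def to_series(self):
        # Hypothetical accessor returning the underlying pandas Series.
        return self._series


def count_classes(y):
    # Unwrap the column first so pandas sees the values, not the wrapper.
    if isinstance(y, FakeDataColumn):
        y = y.to_series()
    return pd.Series(y).dropna().nunique()


target = FakeDataColumn(pd.Series([0, 1, 1, 0, None]))

# pandas does not treat the wrapper as list-like, so it becomes a single
# object-dtype element instead of a column of values.
wrapped = pd.Series(target)
print(len(wrapped), wrapped.dtype)

try:
    wrapped.dropna().nunique()
except TypeError as err:
    # Hashing the wrapper object fails inside pandas' unique().
    print("TypeError:", err)

# Unwrapping before building the Series avoids the error.
print(count_classes(target))
```

If the real `DataColumn` behaves like this stand-in, converting it to a plain Series before calling `dropna()` and `nunique()` would sidestep the error.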

@freddyaboulton freddyaboulton added the bug Issues tracking problems with existing features. label Mar 10, 2021
@dsherry
Contributor

dsherry commented Mar 11, 2021

@tyler3991 says we're close on the new woodwork accessor API, so I'm tempted to punt on this until we've added that support. Once we've updated, there won't even be a DataColumn anymore, so I think it'll solve this problem implicitly. That's tracked by #1965.
