To reproduce: https://www.kaggle.com/itay94/notebookf8c78e84d7
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
/tmp/ipykernel_34/2864323761.py in <module>
1 from deepchecks.checks import LabelAmbiguity
2
----> 3 LabelAmbiguity().run(ds_train)

/opt/conda/lib/python3.7/site-packages/deepchecks/base/check.py in wrapped(*args, **kwargs)
275 @wraps(func)
276 def wrapped(*args, **kwargs):
--> 277 result = func(*args, **kwargs)
278 if not isinstance(result, CheckResult):
279 raise DeepchecksValueError(f'Check {class_instance.name()} expected to return CheckResult bot got: '

/opt/conda/lib/python3.7/site-packages/deepchecks/checks/integrity/label_ambiguity.py in run(self, dataset, model)
64
65 group_unique_data = dataset.data.groupby(dataset.features, dropna=False)
---> 66 group_unique_labels = group_unique_data.nunique()[label_col]
67
68 num_ambiguous = 0

/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py in nunique(self, dropna)
1803 obj = self._obj_with_exclusions
1804 results = self._apply_to_column_groupbys(
-> 1805 lambda sgb: sgb.nunique(dropna), obj=obj
1806 )
1807

/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py in _apply_to_column_groupbys(self, func, obj)
1709 columns = obj.columns
1710 results = [
-> 1711 func(col_groupby) for _, col_groupby in self._iterate_column_groupbys(obj)
1712 ]
1713

/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py in <listcomp>(.0)
1709 columns = obj.columns
1710 results = [
-> 1711 func(col_groupby) for _, col_groupby in self._iterate_column_groupbys(obj)
1712 ]
1713

/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py in <lambda>(sgb)
1803 obj = self._obj_with_exclusions
1804 results = self._apply_to_column_groupbys(
-> 1805 lambda sgb: sgb.nunique(dropna), obj=obj
1806 )
1807

/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/generic.py in nunique(self, dropna)
671
672 result = self.obj._constructor(res, index=ri, name=self.obj.name)
--> 673 return self._reindex_output(result, fill_value=0)
674
675 @doc(Series.describe)

/opt/conda/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in _reindex_output(self, output, fill_value)
3163 levels_list = [ping.group_index for ping in groupings]
3164 index, _ = MultiIndex.from_product(
-> 3165 levels_list, names=self.grouper.names
3166 ).sortlevel()
3167

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/multi.py in from_product(cls, iterables, sortorder, names)
618
619 # codes are all ndarrays, so cartesian_product is lossless
--> 620 codes = cartesian_product(codes)
621 return cls(levels, codes, sortorder=sortorder, names=names)
622

/opt/conda/lib/python3.7/site-packages/pandas/core/reshape/util.py in cartesian_product(X)
52 b = np.zeros_like(cumprodX)
53
---> 54 return [tile_compat(np.repeat(x, b[i]), np.product(a[i])) for i, x in enumerate(X)]
55
56

/opt/conda/lib/python3.7/site-packages/pandas/core/reshape/util.py in <listcomp>(.0)
52 b = np.zeros_like(cumprodX)
53
---> 54 return [tile_compat(np.repeat(x, b[i]), np.product(a[i])) for i, x in enumerate(X)]
55
56

<__array_function__ internals> in repeat(*args, **kwargs)

/opt/conda/lib/python3.7/site-packages/numpy/core/fromnumeric.py in repeat(a, repeats, axis)
477
478 """
--> 479 return _wrapfunc(a, 'repeat', repeats, axis=axis)
480
481

/opt/conda/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
56
57 try:
---> 58 return bound(*args, **kwds)
59 except TypeError:
60 # A TypeError occurs if the object does have such a method in its

MemoryError: Unable to allocate 45.3 PiB for an array with shape (25499357367644160,) and data type int16
This is caused by a pandas bug that affects `groupby` on categorical features with many unique values. Please see:
pandas-dev/pandas#45128
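
The traceback ends in `MultiIndex.from_product`, which suggests the result index is being expanded to every combination of category levels rather than only the combinations present in the data. A minimal sketch of that effect on a toy frame follows. The column names `f1`, `f2`, and `label` are made up for illustration, and passing `observed=True` is shown only as one possible workaround, not necessarily the fix adopted in deepchecks or pandas.

```python
import pandas as pd

# Toy data (hypothetical): two categorical features with 3 levels each,
# but only 3 distinct (f1, f2) combinations actually occur.
df = pd.DataFrame({
    "f1": pd.Categorical(["a", "b", "c", "a"]),
    "f2": pd.Categorical(["x", "y", "z", "x"]),
    "label": [0, 1, 1, 1],
})

# observed=False: the result is indexed by the Cartesian product of all
# category levels (3 * 3 = 9 rows), even for combinations never seen.
full = df.groupby(["f1", "f2"], observed=False)["label"].nunique()

# observed=True: only combinations that appear in the data are kept.
seen = df.groupby(["f1", "f2"], observed=True)["label"].nunique()

# Groups whose identical features map to more than one label.
n_ambiguous = int((seen > 1).sum())

print(len(full), len(seen), n_ambiguous)
```

With many high-cardinality categorical columns, the product of level counts grows multiplicatively, which is consistent with the 25499357367644160-element allocation reported above.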
ItayGabbay