BUG internal indexing tools trigger error with pandas < 2.0.0 #28931
Comments
Hello @lorentzenchr and @ogrisel

Digging into this, it seems that this is actually not a bug in [...]. Curiously, for pandas<2, the problem only happens when [...]. So I think we can close the issue, and I will try to fix the corresponding problem in PR #28375 via a [...]. When we add polars support for [...]

import warnings
import pandas as pd
import numpy as np
# Can't install scikit learn under Python 3.8 and with pandas 1.5, so I am
# simply copy pasting code from devel version
def _safe_assign(X, values, *, row_indexer=None, column_indexer=None):
    """Safe assignment to a numpy array, sparse matrix, or pandas dataframe.

    Parameters
    ----------
    X : {ndarray, sparse-matrix, dataframe}
        Array to be modified. It is expected to be 2-dimensional.
    values : ndarray
        The values to be assigned to `X`.
    row_indexer : array-like, dtype={int, bool}, default=None
        A 1-dimensional array to select the rows of interest. If `None`, all
        rows are selected.
    column_indexer : array-like, dtype={int, bool}, default=None
        A 1-dimensional array to select the columns of interest. If `None`, all
        columns are selected.
    """
    row_indexer = slice(None, None, None) if row_indexer is None else row_indexer
    column_indexer = (
        slice(None, None, None) if column_indexer is None else column_indexer
    )
    if hasattr(X, "iloc"):  # pandas dataframe
        with warnings.catch_warnings():
            # pandas >= 1.5 raises a warning when using iloc to set values in a column
            # that does not have the same type as the column being set. It happens
            # for instance when setting a categorical column with a string.
            # In the future the behavior won't change and the warning should disappear.
            # TODO(1.3): check if the warning is still raised or remove the filter.
            warnings.simplefilter("ignore", FutureWarning)
            X.iloc[row_indexer, column_indexer] = values
    else:  # numpy array or sparse matrix
        X[row_indexer, column_indexer] = values

X = pd.DataFrame({"x": [1, 2] * 2, "y": ["a", "a"] * 2})
# Works if values is a ndarray (as per docu of _safe_assign())
values_np = np.array([1] * 4)
#_safe_assign(X, values=values_np, column_indexer=[0]) # works
# Fails if values is pd.DataFrame
values_pd = pd.DataFrame({"x": values_np}, index=[0, 1] * 2)
_safe_assign(X, values=values_pd, column_indexer=[0])
X
# Error
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 52
     48 #_safe_assign(X, values=values_np, column_indexer=[0]) # works
     49
     50 # Fails if values is pd.DataFrame
     51 values_pd = pd.DataFrame({"x": values_np}, index=[0, 1] * 2)
---> 52 _safe_assign(X, values=values_pd, column_indexer=[0])
     53 X

Cell In[17], line 39
     32 with warnings.catch_warnings():
     33     # pandas >= 1.5 raises a warning when using iloc to set values in a column
     34     # that does not have the same type as the column being set. It happens
     35     # for instance when setting a categorical column with a string.
     36     # In the future the behavior won't change and the warning should disappear.
     37     # TODO(1.3): check if the warning is still raised or remove the filter.
     38     warnings.simplefilter("ignore", FutureWarning)
---> 39     X.iloc[row_indexer, column_indexer] = values
     40 else: # numpy array or sparse matrix
     41     X[row_indexer, column_indexer] = values

File c:\Users\Michael\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexing.py:670, in _LocationIndexer.__setitem__(self, key, value)
    667 self._has_valid_setitem_indexer(key)
    669 iloc = self if self.name == "iloc" else self.obj.iloc
--> 670 iloc._setitem_with_indexer(indexer, value)

File c:\Users\Michael\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexing.py:1711, in _iLocIndexer._setitem_with_indexer(self, indexer, value)
   1709 if item in value:
   1710     sub_indexer[info_axis] = item
-> 1711     v = self._align_series(
   1712         tuple(sub_indexer), value[item], multiindex_indexer
   1713     )
   1714 else:
   1715     v = np.nan

File c:\Users\Michael\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexing.py:1935, in _iLocIndexer._align_series(self, indexer, ser, multiindex_indexer)
   1932     if ser.index.equals(new_ix) or not len(new_ix):
   1933         return ser._values.copy()
-> 1935     return ser.reindex(new_ix)._values
   1937 # 2 dims
   1938 elif single_aligner:
   1939
   1940     # reindex along index

File c:\Users\Michael\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\series.py:4399, in Series.reindex(self, index, **kwargs)
   4391 @doc(
   4392     NDFrame.reindex,
   4393     klass=_shared_doc_kwargs["klass"],
   (...)
   4397 )
   4398 def reindex(self, index=None, **kwargs):
-> 4399     return super().reindex(index=index, **kwargs)

File c:\Users\Michael\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\generic.py:4452, in NDFrame.reindex(self, *args, **kwargs)
   4449     return self._reindex_multi(axes, copy, fill_value)
   4451 # perform the reindex on the axes
-> 4452 return self._reindex_axes(
   4453     axes, level, limit, tolerance, method, fill_value, copy
   4454 ).__finalize__(self, method="reindex")

File c:\Users\Michael\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\generic.py:4472, in NDFrame._reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4467 new_index, indexer = ax.reindex(
   4468     labels, level=level, limit=limit, tolerance=tolerance, method=method
   4469 )
   4471 axis = self._get_axis_number(a)
-> 4472 obj = obj._reindex_with_indexers(
   4473     {axis: [new_index, indexer]},
   4474     fill_value=fill_value,
   4475     copy=copy,
   4476     allow_dups=False,
   4477 )
   4479 return obj

File c:\Users\Michael\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\generic.py:4515, in NDFrame._reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   4512 indexer = ensure_int64(indexer)
   4514 # TODO: speed up on homogeneous DataFrame objects
-> 4515 new_data = new_data.reindex_indexer(
   4516     index,
   4517     indexer,
   4518     axis=baxis,
   4519     fill_value=fill_value,
   4520     allow_dups=allow_dups,
   4521     copy=copy,
   4522 )
   4523 # If we've made a copy once, no need to make another one
   4524 copy = False

File c:\Users\Michael\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\internals\managers.py:1243, in BlockManager.reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   1241 # some axes don't allow reindexing with dups
   1242 if not allow_dups:
-> 1243     self.axes[axis]._can_reindex(indexer)
   1245 if axis >= self.ndim:
   1246     raise IndexError("Requested axis not found in manager")

File c:\Users\Michael\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexes\base.py:3283, in Index._can_reindex(self, indexer)
   3281 # trying to reindex on an axis with duplicates
   3282 if not self.is_unique and len(indexer):
-> 3283     raise ValueError("cannot reindex from a duplicate axis")

ValueError: cannot reindex from a duplicate axis
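
The last frames of the traceback show the assignment failing inside `Series.reindex`, because the `values` frame carries a duplicated index (`[0, 1] * 2`). As a minimal sketch of that step in isolation (the series and target index below are illustrative, not taken from scikit-learn), reindexing such a series onto a different index raises the same kind of error. The exact message wording depends on the pandas version.

```python
import pandas as pd

# A series whose index contains duplicate labels, like values_pd["x"] above.
ser = pd.Series([1, 1, 1, 1], index=[0, 1, 0, 1])

try:
    # Aligning to a different index requires unique labels on the source axis.
    ser.reindex([0, 1, 2, 3])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```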
I have the feeling that we should not close this straight away as discussed in #28375 (comment).
#28375 triggers errors for pandas < 2.0.0, even though it only uses scikit-learn internal functionality.
As documented in https://scikit-learn.org/dev/install.html, the minimum supported pandas version is 1.1.3.
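
One possible way to sidestep the index alignment, shown here only as a sketch and not as the fix adopted in #28375, is to convert `values` to a plain ndarray before assigning, which matches the documented `values : ndarray` contract of `_safe_assign`. The helper name `assign_ignoring_index` is hypothetical and exists only for this illustration.

```python
import numpy as np
import pandas as pd


def assign_ignoring_index(X, values, column_indexer):
    """Hypothetical helper: assign positionally, dropping any pandas index."""
    # np.asarray discards the index of a DataFrame/Series, so pandas has
    # nothing to align on and assigns purely by position.
    values = np.asarray(values)
    X.iloc[:, column_indexer] = values


X = pd.DataFrame({"x": [1, 2] * 2, "y": ["a", "a"] * 2})
# Same shape of input as in the report: a frame with a duplicated index.
values_pd = pd.DataFrame({"x": np.array([1] * 4)}, index=[0, 1] * 2)

assign_ignoring_index(X, values_pd, column_indexer=[0])
print(X)
```

The trade-off is that any intended label-based alignment is lost: rows are matched strictly by position.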