First off, love love love Hypothesis! I know storing non-scalars in a dataframe isn't exactly the intended use case, and it's not something we're well placed to fix ourselves right this moment. I also understand that probably means this isn't high priority.
Environment:
attrs 21.2.0 Classes Without Boilerplate
hypothesis 6.24.1 A library for property-based testing
more-itertools 8.10.0 More routines for operating on iterables, beyond it...
numpy 1.21.3 NumPy is the fundamental package for array computin...
packaging 21.2 Core utilities for Python packages
pandas 1.3.4 Powerful data structures for data analysis, time se...
pluggy 0.13.1 plugin and hook calling mechanisms for python
py 1.10.0 library with cross-python path, ini-parsing, io, co...
pyparsing 2.4.7 Python parsing module
pytest 5.4.3 pytest: simple powerful testing with Python
python-dateutil 2.8.2 Extensions to the standard Python datetime module
pytz 2021.3 World timezone definitions, modern and historical
six 1.16.0 Python 2 and 3 compatibility utilities
sortedcontainers 2.4.0 Sorted Containers -- Sorted List, Sorted Dict, Sort...
wcwidth 0.2.5 Measures the displayed width of unicode strings in ...
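Code (the original repro.py is not quoted in full, so this is a reconstruction from the traceback below; the empty test body is an assumption):

from hypothesis import given, strategies as st
from hypothesis.extra.pandas import column, data_frames


@given(data_frames(columns=[column(elements=st.sets(st.text(), min_size=1))]))
def test_dataframe_with_set(df):
    pass  # drawing the dataframe is enough to trigger the failure

output.txt: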
============================= test session starts ==============================
platform linux -- Python 3.8.11, pytest-5.4.3, py-1.10.0, pluggy-0.13.1
rootdir: /home/andrea/omnistream/bugs/hypothesis-dataframes-sets
plugins: hypothesis-6.24.1
collected 1 item
repro.py F [100%]
=================================== FAILURES ===================================
___________________________ test_dataframe_with_set ____________________________
self = 0 0.0
dtype: float64, key = 0, value = {'', '0'}
def __setitem__(self, key, value) -> None:
key = com.apply_if_callable(key, self)
cacher_needs_updating = self._check_is_chained_assignment_possible()
if key is Ellipsis:
key = slice(None)
try:
> self._set_with_engine(key, value)
.venv/lib/python3.8/site-packages/pandas/core/series.py:1062:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = 0 0.0
dtype: float64, key = 0, value = {'', '0'}
def _set_with_engine(self, key, value) -> None:
# fails with AttributeError for IntervalIndex
loc = self.index._engine.get_loc(key)
# error: Argument 1 to "validate_numeric_casting" has incompatible type
# "Union[dtype, ExtensionDtype]"; expected "dtype"
validate_numeric_casting(self.dtype, value) # type: ignore[arg-type]
> self._values[loc] = value
E TypeError: float() argument must be a string or a number, not 'set'
.venv/lib/python3.8/site-packages/pandas/core/series.py:1099: TypeError
During handling of the above exception, another exception occurred:
@given(data_frames(columns=[column(elements=st.sets(st.text(), min_size=1))]))
> def test_dataframe_with_set(df):
repro.py:6:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv/lib/python3.8/site-packages/pandas/core/series.py:1088: in __setitem__
self._set_with(key, value)
.venv/lib/python3.8/site-packages/pandas/core/series.py:1123: in _set_with
self._set_labels(key, value)
.venv/lib/python3.8/site-packages/pandas/core/series.py:1135: in _set_labels
self._set_values(indexer, value)
.venv/lib/python3.8/site-packages/pandas/core/series.py:1141: in _set_values
self._mgr = self._mgr.setitem(indexer=key, value=value)
.venv/lib/python3.8/site-packages/pandas/core/internals/managers.py:355: in setitem
return self.apply("setitem", indexer=indexer, value=value)
.venv/lib/python3.8/site-packages/pandas/core/internals/managers.py:327: in apply
applied = getattr(b, f)(**kwargs)
.venv/lib/python3.8/site-packages/pandas/core/internals/blocks.py:927: in setitem
return self.coerce_to_target_dtype(value).setitem(indexer, value)
.venv/lib/python3.8/site-packages/pandas/core/internals/blocks.py:943: in setitem
check_setitem_lengths(indexer, value, values)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
indexer = array([0]), value = {'', '0'}, values = array([0.0], dtype=object)
def check_setitem_lengths(indexer, value, values) -> bool:
"""
Validate that value and indexer are the same length.
An special-case is allowed for when the indexer is a boolean array
and the number of true values equals the length of ``value``. In
this case, no exception is raised.
Parameters
----------
indexer : sequence
Key for the setitem.
value : array-like
Value for the setitem.
values : array-like
Values being set into.
Returns
-------
bool
Whether this is an empty listlike setting which is a no-op.
Raises
------
ValueError
When the indexer is an ndarray or list and the lengths don't match.
"""
no_op = False
if isinstance(indexer, (np.ndarray, list)):
# We can ignore other listlikes because they are either
# a) not necessarily 1-D indexers, e.g. tuple
# b) boolean indexers e.g. BoolArray
if is_list_like(value):
if len(indexer) != len(value) and values.ndim == 1:
# boolean with truth values == len of the value is ok too
if isinstance(indexer, list):
indexer = np.array(indexer)
if not (
isinstance(indexer, np.ndarray)
and indexer.dtype == np.bool_
and len(indexer[indexer]) == len(value)
):
> raise ValueError(
"cannot set using a list-like indexer "
"with a different length than the value"
)
E ValueError: cannot set using a list-like indexer with a different length than the value
.venv/lib/python3.8/site-packages/pandas/core/indexers.py:176: ValueError
=========================== short test summary info ============================
FAILED repro.py::test_dataframe_with_set - ValueError: cannot set using a lis...
============================== 1 failed in 0.52s ===============================
I'm glad you like Hypothesis, and thanks for an excellent reproducible bug report 🥰
This looks like a real bug to me, and a use-case that I'd like to support if that's practical - even if dataframes-of-sets are pretty unusual, Pandas is happy to represent them and so Hypothesis ought to generate them.
As you note though, without any resources to do so this is going to be a low-priority issue - I'm happy to help external contributors, but over the next few months I'm more likely to focus my OSS time on filter rewrites, the 3.6 EOL, and better support for external randomness.
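For anyone picking this up, the underlying pandas behaviour can be seen without Hypothesis. This is a hedged sketch against pandas 1.3.4 (the version in the traceback above); newer pandas versions may behave differently. Constructing a frame of sets up front works, but assigning a set element-wise into a numeric column, which is roughly what the generated draw ends up doing here, raises:

import pandas as pd

# Building a DataFrame of sets up front is fine: pandas stores them in an
# object-dtype column.
ok = pd.DataFrame({"col": [{"a"}, {"b", "c"}]})
print(ok.dtypes)  # col    object

# Assigning a set into a default float64 Series fails: pandas first tries
# to coerce the set to a float (TypeError), then retries treating it as a
# list-like value and hits the length check (the ValueError in the traceback).
s = pd.Series([0.0])
try:
    s[0] = {"", "0"}
except (TypeError, ValueError) as err:
    print(type(err).__name__, err)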