`hypothesis.extra.numpy` only generates strings of length at most one #2229
Comments
It's also worth looking out for trouble with Python 2 versus Python 3 here.
True! I've only tested this on Python 3. Though given that we're in the dying days of Python 2 support, if it presents much trouble we may just want to wait until January to fix this...
Related to #2085... I'd probably just deprecate all usage of unsized string dtypes, have `from_dtype` treat unsized as size one, and be done with it. Not sure how that's interacting with `DO_NOT_ESCALATE` though.
Alternatively we could add special handling for string arrays, to fill them differently, but I'd rather not.
It wouldn't be super hard to do. We could generate string arrays as object arrays, then convert to the right dtype at the end of generation.
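A rough sketch of that object-array idea, using NumPy only. The helper name `strings_to_array` is hypothetical and is not part of Hypothesis; it only illustrates that converting from an object array lets NumPy pick a width wide enough for the longest element.

```python
import numpy as np


def strings_to_array(values):
    """Hypothetical helper: fill an object array, then convert to a unicode dtype."""
    # Object arrays hold references to Python str objects, so nothing is truncated here.
    obj = np.empty(len(values), dtype=object)
    obj[:] = values
    # Casting object -> unsized 'U' makes NumPy size the dtype from the data.
    return obj.astype("U")


result = strings_to_array(["", "a", "hello"])
print(result.dtype, result.tolist())  # the dtype has a width of 5, no truncation
```

Whether this fits into the existing array-filling logic is a separate question; the sketch only shows that the final conversion preserves full-length strings.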
For reasons I have not fully determined, if you run the following:
You get the following error:
The confusion is not that this code fails with `HYPOTHESIS_DO_NOT_ESCALATE` set, but that it doesn't fail without it set, because our code for this is all wrong.

The reason for this is that 'U' is something of a lie of a dtype. Consider the following code:
The 'U' dtype is actually a family of dtypes, each of bounded width. When you create an array of unicode objects there's an implicit fixed size limit on every element. As we create our arrays using `np.zeros`, all unicode we generate is implicitly truncated to elements of size one.

The same issue presumably exists with byte strings.
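The truncation can be reproduced with NumPy alone. This is a minimal sketch, not the snippet from the original report, and the variable names are illustrative:

```python
import numpy as np

# Inferring the dtype from data picks a width that fits the longest element.
inferred = np.array(["a", "hello"])
print(inferred.dtype)  # a unicode dtype of width 5

# np.zeros with the unsized 'U' dtype allocates the narrowest member: width 1.
zeros_u = np.zeros(3, dtype="U")
zeros_u[0] = "hello"  # silently truncated on assignment
print(zeros_u.dtype, repr(zeros_u[0]))

# Byte strings behave the same way with the unsized 'S' dtype.
zeros_s = np.zeros(3, dtype="S")
zeros_s[0] = b"hello"
print(zeros_s.dtype, repr(zeros_s[0]))
```

In both cases only the first character survives, because the array's element width was fixed at one when it was allocated.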
You can see this more directly by the fact that the following test passes but emits a pile of deprecation warnings: