Numpy arrays strategy generates infinities with elements=floats(allow_infinity=False) #1385

Closed
ashgillman opened this issue Jul 4, 2018 · 4 comments
Labels
enhancement it's not broken, but we want it to be better legibility make errors helpful and Hypothesis grokable

Comments

@ashgillman

ashgillman commented Jul 4, 2018

I have tried to explicitly ask hypothesis.extra.numpy.arrays not to produce any infinities by providing a strategy for both fill and elements. However, this doesn't seem to be working.

MWE:

In [62]: import hypothesis.strategies as st                            
                                                                        
In [63]: from hypothesis.extra.numpy import arrays as st_arrays        
                                                                         
In [64]: non_inf = st.floats(allow_nan=False, allow_infinity=False)
                                                                   
In [65]: dim = st.integers(1, 10)                                  
                                      
In [66]: arr = st_arrays(shape=st.tuples(dim, dim), fill=non_inf, elements=non_inf, dtype='float32')

In [71]: arr.example()                                             
Out[71]:                              
array([[3.3333334e-01,           inf,           inf,           inf,    
        1.6367880e+16,           inf,           inf,           inf],
       [          inf,           inf,           inf, 4.9060428e+16,    
                  inf,           inf,           inf,           inf]],
      dtype=float32)  

In [72]: arr                                                                                                                                                          
Out[72]: arrays(dtype='float32', shape=tuples(integers(min_value=1, max_value=10), integers(min_value=1, max_value=10)), elements=floats(allow_nan=False, allow_infinity=False), fill=floats(allow_nan=False, allow_infinity=False))

Am I doing something wrong? Thanks for any help

@DRMacIver
Member

Huh.

I think what's happening here is that the floats strategy generates doubles (because float in Python is actually a double because everything is terrible and naming things is hard), and when Hypothesis generates a double value that's too large to fit in a 32-bit float the conversion overflows it to infinity. That's far from ideal though.
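
A minimal sketch of the overflow described above, using plain NumPy rather than Hypothesis. The sample values are arbitrary assumptions chosen for illustration: all of them are finite as 64-bit doubles, but only some fit in a 32-bit float.

```python
import numpy as np

# Finite doubles; the last two exceed the float32 range (about 3.4e38).
f64_values = np.array([1.0, 3.4e38, 1e39, 1.6e300], dtype=np.float64)
assert np.isfinite(f64_values).all()

# Narrowing to float32 overflows the out-of-range values to infinity.
# Newer NumPy versions warn on this cast, so the warning is silenced here.
with np.errstate(over="ignore"):
    f32_values = f64_values.astype(np.float32)

overflowed = np.isinf(f32_values)
print(f32_values)
print(overflowed)
```

A filter applied to the double-precision elements cannot catch this, because the values only become infinite after they are stored in the narrower array.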

I'm not sure what the right thing to do here is. I don't think this behaviour can/will change, but maybe this is a use case for adding a specific strategy for 32-bit floats that could be used instead?

@Zac-HD Zac-HD added bug something is clearly wrong here and removed bug something is clearly wrong here labels Jul 4, 2018
@Zac-HD
Member

Zac-HD commented Jul 4, 2018

Strictly speaking, this is not a bug - it's just what happens when you put large values of dtype f64 in an array of dtype f32. @ashgillman, you can avoid this by tweaking the elements strategy:

f32_max = (2. - 2**-23) * 2**127
finite_f32 = st.floats(-f32_max, f32_max, allow_nan=False, allow_infinity=False)
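
As a sanity check on the bound used above, the following standard-library sketch compares the formula with the bit pattern of the largest finite float32. The variable names `from_bits` and `round_trip` are illustrative only.

```python
import struct

# float32 has 23 explicit mantissa bits and a largest exponent of 127,
# so its largest finite value is (2 - 2**-23) * 2**127.
f32_max = (2.0 - 2**-23) * 2**127

# 0x7f7fffff is the bit pattern of the largest finite float32 (big-endian here).
(from_bits,) = struct.unpack(">f", bytes.fromhex("7f7fffff"))

# The bound survives a round trip through 32-bit storage unchanged.
(round_trip,) = struct.unpack("f", struct.pack("f", f32_max))

print(f32_max == from_bits, round_trip == f32_max)
```

Because the bound is exactly representable as a float32, any double drawn from [-f32_max, f32_max] rounds to a finite float32.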

Nonetheless, I think Hypothesis can do better. Proposals:

  • Do this in hypothesis.extra.numpy.from_dtype(np.dtype('float32')) - currently it has a massive bias towards infinities for narrow float types.
  • Check for overflow when inserting elements in a Numpy array, and IMO error if we detect it. Precision loss must be silently allowed though, or floats will be unusable.
  • Consider accepting **kwargs to from_dtype and passing them to the underlying strategy so that it can be used to generate e.g. finite floats only. Validate min and max value arguments, or maybe disallow them entirely.
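
The overflow check in the second proposal could look roughly like the sketch below. This is not Hypothesis's implementation; `cast_checking_overflow` is a hypothetical helper written only to illustrate the idea of rejecting finite values that become non-finite while still allowing precision loss.

```python
import numpy as np


def cast_checking_overflow(values, dtype):
    """Hypothetical helper: cast to dtype, raising if a finite value overflows."""
    source = np.asarray(values, dtype=np.float64)
    with np.errstate(over="ignore"):
        result = source.astype(dtype)
    # Overflow means: finite before the cast, non-finite after it.
    # Inputs that were already inf or nan pass through untouched.
    overflowed = np.isfinite(source) & ~np.isfinite(result)
    if overflowed.any():
        bad = source[overflowed]
        raise OverflowError(
            f"{bad.size} value(s) do not fit in {np.dtype(dtype)}, e.g. {bad[0]!r}"
        )
    return result


# Precision loss is allowed: 0.1 is silently rounded to the nearest float32.
rounded = cast_checking_overflow([0.1, 1.5], np.float32)
print(rounded)
```

Calling `cast_checking_overflow([1e39], np.float32)` raises `OverflowError` under this sketch, since 1e39 is finite as a double but not as a float32.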

Providing a native float32 strategy would be a real pain for Hypothesis in Python - on the order of duplicating hypothesis.internal.conjecture.floats 😭 - so it's fortunate that we don't need to, because it's possible to set the bounds of the floats() strategy and we don't mind discarding the lower mantissa bits.

@Zac-HD Zac-HD added enhancement it's not broken, but we want it to be better legibility make errors helpful and Hypothesis grokable labels Jul 4, 2018
@Zac-HD Zac-HD changed the title Numpy arrays strategy generates NaNs Numpy arrays strategy generates infinities with elements=floats(allow_infinity=False) Jul 4, 2018
@DRMacIver
Member

> Providing a native float32 strategy would be a real pain for Hypothesis in Python - on the order of duplicating hypothesis.internal.conjecture.floats 😭 - so it's fortunate that we don't need to, because it's possible to set the bounds of the floats() strategy and we don't mind discarding the lower mantissa bits.

The solution I mostly had in mind was to implement the native 32-bit floats as something not too far off floats(**kwargs).map(np.float32).filter(conforms_to_args). I definitely wasn't suggesting we add full-blown, from-scratch support for them!
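
One way the map-and-filter idea might be sketched is shown below. The helper name `float32s`, its default bounds, and the body of `conforms_to_args` are assumptions made for illustration; none of this is an existing Hypothesis API.

```python
import numpy as np
import hypothesis.strategies as st

F32_MAX = float(np.finfo(np.float32).max)


def float32s(min_value=-F32_MAX, max_value=F32_MAX):
    """Hypothetical helper: finite np.float32 values in [min_value, max_value]."""

    def conforms_to_args(x):
        # Rounding to float32 can move a value just outside the requested
        # bounds, so re-check them after conversion. Comparing as a Python
        # float avoids the bounds themselves being rounded to float32.
        value = float(x)
        return bool(np.isfinite(value)) and min_value <= value <= max_value

    return (
        st.floats(
            min_value=min_value,
            max_value=max_value,
            allow_nan=False,
            allow_infinity=False,
        )
        .map(np.float32)
        .filter(conforms_to_args)
    )
```

Under these assumptions, `float32s()` could be passed as the `elements` argument of `hypothesis.extra.numpy.arrays` with `dtype='float32'`. The filter only rejects values that rounding pushed past a bound, so it should rarely discard anything.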

@ashgillman
Author

Thanks @Zac-HD, your solution fixes it. (I had previously tried to filter on np.ininfinite but had no luck.)

I hadn't realised that coercing floats to lower precision can result in inf!

3 participants