st.integers(min_value=0, max_value=1e100) almost only returns large numbers #1387

Closed
ichumuh opened this issue Jul 6, 2018 · 3 comments

ichumuh commented Jul 6, 2018

hypothesis.__version__
Out[70]: '3.66.0'
a = [st.integers(min_value=0, max_value=1e100).example() for x in range(999)]
len([x for x in a if x >= 1e97])
Out[72]: 997
len([x for x in a if x < 1e97])
Out[73]: 2
len([x for x in a if x < 1e94])
Out[74]: 0

The same happens with floats(). As a workaround I'm currently using filter, which gives a better distribution:

a = [st.integers().filter(lambda x: x > 0 and x < 1e100).example() for x in range(999)]
len([x for x in a if x >= 1e97])
Out[75]: 0
len([x for x in a if x < 1e97])
Out[76]: 999

I'm on Ubuntu 16.04 using python 2.7.
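For scale, the counts above are close to what a uniform draw over the whole range would produce. The sketch below is plain standard-library arithmetic, not Hypothesis code, and it assumes a uniform draw over the integers 0..10**100. That is an assumption about the sampler, not something the session above proves.

```python
from fractions import Fraction

UPPER = 10**100      # inclusive upper bound used in the report
THRESHOLD = 10**97   # cut-off used in the report
DRAWS = 999          # number of .example() calls in the report

# Under a uniform draw over the integers 0..UPPER (an assumption),
# the chance of landing strictly below THRESHOLD is THRESHOLD / (UPPER + 1).
p_small = Fraction(THRESHOLD, UPPER + 1)

# Expected number of values below THRESHOLD among DRAWS independent draws.
expected_small = DRAWS * p_small

print(float(p_small))         # roughly 0.001
print(float(expected_small))  # roughly 1, compared with 2 observed
```

Under that assumption, seeing only a couple of values below 1e97 in 999 draws is expected rather than surprising.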

Zac-HD added the "enhancement" label (it's not broken, but we want it to be better) on Jul 6, 2018
Zac-HD (Member) commented Jul 6, 2018

First and most generally, .example() gives a highly biased distribution - it's designed to show illustrative rather than representative examples.

That said, considering this alongside #909 and #1212 makes me think that we may well have some issues with data variety from such strategies (though note again that it's not specified what will be generated, just that some things could be generated and that we try to find bugs).

ichumuh (Author) commented Jul 6, 2018

Thanks for your reply. I was just trying to make the example as simple as possible. I first noticed the issue when a test case with @given(st.floats(min_value=-1e100, max_value=1e100)) didn't find a bug in one of my functions that only occurred with numbers relatively close to 0, whereas the filter alternative did.

Zac-HD (Member) commented Oct 2, 2018

I've just been looking at integers() for #1616, and the root cause is that we delegate from BoundedIntStrategy directly down to cu.integer_range, which is basically a uniform distribution. Instead, we probably want to check if the range is large (e.g. >16-bit) and, if so, use something more like the pseudo-geometric approach from WideRangeIntStrategy.
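To make the difference concrete, here is a rough standard-library sketch comparing a uniform draw with a size-biased draw that picks a bit length first. The helper names draw_uniform, draw_size_biased and count_small are hypothetical and exist only for illustration. This is not how BoundedIntStrategy or WideRangeIntStrategy are implemented, just one simple way to favour small magnitudes.

```python
import random

UPPER = 10**100      # inclusive upper bound from the report
THRESHOLD = 10**97   # cut-off from the report


def draw_uniform(rng, upper):
    """Uniform draw over the integers 0..upper inclusive."""
    return rng.randint(0, upper)


def draw_size_biased(rng, upper):
    """Pick a bit length uniformly, then a value with at most that many bits.

    Hypothetical helper for illustration; not Hypothesis's implementation.
    """
    bits = rng.randint(0, upper.bit_length())
    while True:
        # getrandbits(0) is not accepted on older Pythons, so special-case it.
        value = rng.getrandbits(bits) if bits else 0
        if value <= upper:  # redraw when the value overshoots the bound
            return value


def count_small(draw, n=999, seed=0):
    """Count how many of n draws fall strictly below THRESHOLD."""
    rng = random.Random(seed)
    return sum(draw(rng, UPPER) < THRESHOLD for _ in range(n))


uniform_small = count_small(draw_uniform)
biased_small = count_small(draw_size_biased)
print(uniform_small, biased_small)
```

With the uniform draw only a handful of the 999 values should fall below 1e97, while the size-biased draw puts the large majority there, because every bit length up to 322 bits stays under that threshold.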
