# Hypothesis

> [Hypothesis](http://hypothesis.works/) is a Python library for creating unit tests which are simpler to write and more powerful when run, finding edge cases in your code you wouldn’t have thought to look for. It is stable, powerful and easy to add to any existing test suite.

### Classical Test
1. Set up some data.
1. Perform some operations on the data.
1. Assert something about the result, typically by comparing it with an expected value.

### Hypothesis Test
1. For all data matching some specification.
1. Perform some operations on the data.
1. Assert something about the result, namely a general property.
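
The difference between the two recipes can be sketched without any library. The snippet below is only an illustration: `classical_test`, `property_test` and the hand-rolled random data generator are hypothetical helpers, not part of Hypothesis, which generates data far more cleverly.

```python
import random


def classical_test():
    # 1. Set up some data, 2. operate on it, 3. compare with a known value.
    data = [3, 1, 2]
    assert sorted(data) == [1, 2, 3]


def property_test(trials=200, seed=0):
    # "For all data matching a specification": here, short lists of small integers.
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
        result = sorted(data)
        # General properties instead of one hard-coded expected value.
        assert len(result) == len(data)
        assert all(a <= b for a, b in zip(result, result[1:]))
    return trials


classical_test()
print(property_test(), "random cases checked")
```

The classical test pins down one input and one output; the property test states what must hold for every input the specification allows.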


### Hypothesis

- performs a **strategic random search** in the input space


- checks if the asserted **properties** hold true


- if not, tries to shrink the failure to a minimal example


- saves that example and replays it on later runs to check whether the problem has been fixed

Hypothesis does **Property Based Testing** and generalizes a normal unit test.
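
To make the search and shrinking steps concrete, here is a toy sketch in plain Python. It is not how Hypothesis works internally: `find_counterexample`, `shrink` and the deliberately false property `no_negatives` are made-up names for illustration, and the step of saving the example is left out.

```python
import random


def find_counterexample(prop, trials=1000, seed=0):
    """Random search for a list of integers that violates `prop`."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 8))]
        if not prop(data):
            return data
    return None


def shrink(data, prop):
    """Greedily drop elements for as long as the property keeps failing."""
    changed = True
    while changed:
        changed = False
        for i in range(len(data)):
            candidate = data[:i] + data[i + 1:]
            if not prop(candidate):
                data, changed = candidate, True
                break
    return data


def no_negatives(xs):
    # A deliberately false property: "no list contains a negative number".
    return all(x >= 0 for x in xs)


failing = find_counterexample(no_negatives)
minimal = shrink(failing, no_negatives)
print(failing, "->", minimal)
```

The shrunken list contains a single negative number, which is much easier to reason about than the first random failure.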

### Example

Compute the mean

```python
def mean(x):
    return sum(x) / len(x)
```

What could be a **property** of this function?

We can expect that
```python
min(x) <= mean(x) <= max(x)
```

We need a **data specification**, i.e. a *strategy*.

```python
from hypothesis.strategies import lists, floats
lists(elements=floats())
```

In [1]:
%%writefile hypothesis/intro_mean_0.py
from hypothesis import given
from hypothesis.strategies import lists, floats


def mean(x):
    return sum(x) / len(x)

@given(x=lists(elements=floats()))
def test_mean(x):
    assert min(x) <= mean(x) <= max(x)


if __name__ == '__main__':
    import pytest
    pytest.main([__file__])

Writing hypothesis/intro_mean_0.py


In [2]:
!python hypothesis/intro_mean_0.py

platform linux -- Python 3.6.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /home/claus/repo/PAULST, inifile:
plugins: mock-1.6.0, cov-2.3.1, hypothesis-3.33.0
collected 1 item

hypothesis/intro_mean_0.py F

__________________________________ test_mean ___________________________________

    @given(x=lists(elements=floats()))
>   def test_mean(x):

hypothesis/intro_mean_0.py:9:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../anaconda3/envs/scipytst/lib/python3.6/site-packages/hypothesis/executors.py:58: in default_new_style_executor
    return function(data)
../../anaconda3/envs/scipytst/lib/python3.6/site-packages/hypothesis/core.py:136: in run
    return test(*args, **kwargs)
hypothesis/intro_mean_0.py:9: in test_mean
    def test_mean(x):
../../anaconda3/en

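The traceback above is cut off, so the failing input is not visible. Judging from the fix applied in the next cell, the likely culprit is the empty list. A quick check, independent of Hypothesis, shows two ways in which an empty list breaks the test:

```python
def mean(x):
    return sum(x) / len(x)


x = []

try:
    min(x)
except ValueError as err:
    print("min:", err)   # min() cannot handle an empty sequence

try:
    mean(x)
except ZeroDivisionError as err:
    print("mean:", err)  # len(x) == 0, so the division fails
```
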
In [3]:
%%writefile hypothesis/intro_mean_1.py

from hypothesis import given, assume  # <-- import assume
from hypothesis.strategies import lists, floats

def mean(x):
    return sum(x) / len(x)

@given(x=lists(elements=floats()))
def test_mean(x):
    assume(x)  # <-- assume that list is not empty
    assert min(x) <= mean(x) <= max(x)


if __name__ == '__main__':
    import pytest
    pytest.main([__file__])

Writing hypothesis/intro_mean_1.py


In [4]:
!python hypothesis/intro_mean_1.py

platform linux -- Python 3.6.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /home/claus/repo/PAULST, inifile:
plugins: mock-1.6.0, cov-2.3.1, hypothesis-3.33.0
collected 1 item

hypothesis/intro_mean_1.py F

__________________________________ test_mean ___________________________________

    @given(x=lists(elements=floats()))
>   def test_mean(x):

hypothesis/intro_mean_1.py:9:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../anaconda3/envs/scipytst/lib/python3.6/site-packages/hypothesis/executors.py:58: in default_new_style_executor
    return function(data)
../../anaconda3/envs/scipytst/lib/python3.6/site-packages/hypothesis/core.py:136: in run
    return test(*args, **kwargs)
hypothesis/intro_mean_1.py:9: in test_mean
    def test_mean(x):
../../anaconda3/en

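Again the output is truncated. The next cell disables `nan`, which suggests that a list containing `nan` was the failing input. The following plain-Python check shows why such a list can never satisfy the property:

```python
import math


def mean(x):
    return sum(x) / len(x)


x = [float("nan")]
print(mean(x))                                     # nan
print(min(x) <= mean(x) <= max(x))                 # False: comparisons with nan are always False
print(math.isnan(mean([1.0, float("nan"), 2.0])))  # True: nan propagates through sum
```
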
In [5]:
%%writefile hypothesis/intro_mean_2.py

from hypothesis import given, assume  # <-- import assume
from hypothesis.strategies import lists, floats

def mean(x):
    return sum(x) / len(x)

@given(x=lists(elements=floats(allow_nan=False)))  # <-- do not allow nan
def test_mean(x):
    assume(x)  # <-- assume that list is not empty
    assert min(x) <= mean(x) <= max(x)


if __name__ == '__main__':
    import pytest
    pytest.main([__file__])

Writing hypothesis/intro_mean_2.py


In [6]:
!python hypothesis/intro_mean_2.py

platform linux -- Python 3.6.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /home/claus/repo/PAULST, inifile:
plugins: mock-1.6.0, cov-2.3.1, hypothesis-3.33.0
collected 1 item

hypothesis/intro_mean_2.py F

__________________________________ test_mean ___________________________________

    @given(x=lists(elements=floats(allow_nan=False)))  # <-- do not allow nan
>   def test_mean(x):

hypothesis/intro_mean_2.py:9:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../anaconda3/envs/scipytst/lib/python3.6/site-packages/hypothesis/executors.py:58: in default_new_style_executor
    return function(data)
../../anaconda3/envs/scipytst/lib/python3.6/site-packages/hypothesis/core.py:136: in run
    return test(*args, **kwargs)
hypothesis/intro_mean_2.py:9: in test_mean
    def test_m

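With `nan` excluded, infinite values remain, and the next cell excludes them as well. One plausible failing input (an assumption, since the traceback is truncated) is a list holding both infinities, whose sum is itself `nan`:

```python
import math


def mean(x):
    return sum(x) / len(x)


x = [float("inf"), float("-inf")]
print(sum(x))                       # nan, because inf + (-inf) is undefined
print(min(x) <= mean(x) <= max(x))  # False
```
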
In [7]:
%%writefile hypothesis/intro_mean_3.py

from hypothesis import given, assume  # <-- import assume
from hypothesis.strategies import lists, floats

def mean(x):
    return sum(x) / len(x)

# <-- do not allow nan, do not allow infinity
@given(x=lists(elements=floats(allow_nan=False, allow_infinity=False)))
def test_mean(x):
    assume(x)  # <-- assume that list is not empty
    assert min(x) <= mean(x) <= max(x)


if __name__ == '__main__':
    import pytest
    pytest.main([__file__])

Writing hypothesis/intro_mean_3.py


In [8]:
!python hypothesis/intro_mean_3.py

platform linux -- Python 3.6.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /home/claus/repo/PAULST, inifile:
plugins: mock-1.6.0, cov-2.3.1, hypothesis-3.33.0
collected 1 item

hypothesis/intro_mean_3.py .

None
  Module already imported so can not be re-written: hypothesis



Can you imagine any additional problems?

In [9]:
# How about this?
0.1 + 0.1 + 0.1

0.30000000000000004
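
Rounding matters for the property itself. In this small check, which uses only the standard library, the computed mean of three identical values ends up slightly above their maximum, so the strict bound fails while a bound with a small tolerance holds:

```python
import math


def mean(x):
    return sum(x) / len(x)


x = [0.1, 0.1, 0.1]
print(mean(x) > max(x))                           # True: rounding pushes the mean above max(x)
print(math.isclose(mean(x), 0.1))                 # True
print(min(x) - 1e-8 <= mean(x) <= max(x) + 1e-8)  # True
```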

In [10]:
%%writefile hypothesis/intro_mean_4.py

from hypothesis import given, assume, example
from hypothesis.strategies import lists, floats

def mean(x):
    return sum(x) / len(x)

@example([0.1, 0.1, 0.1])  # <-- add special case
@given(x=lists(elements=floats(allow_nan=False, allow_infinity=False)))
def test_mean(x):
    assume(x)
    assert min(x) - 1e-8 <= mean(x) <= max(x) + 1e-8


if __name__ == '__main__':
    import pytest
    pytest.main([__file__])

Writing hypothesis/intro_mean_4.py


In [11]:
!python hypothesis/intro_mean_4.py

platform linux -- Python 3.6.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /home/claus/repo/PAULST, inifile:
plugins: mock-1.6.0, cov-2.3.1, hypothesis-3.33.0
collected 1 item

hypothesis/intro_mean_4.py .

None
  Module already imported so can not be re-written: hypothesis
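
The run above passed, but one more potential problem is worth knowing about: finite floats can overflow when summed. Whether Hypothesis hits such a case in a given run may depend on its version and on the random search, so the snippet below constructs it by hand. Bounding the elements with `min_value` and `max_value` is one possible way to rule it out.

```python
import sys


def mean(x):
    return sum(x) / len(x)


big = sys.float_info.max         # largest finite double
x = [big, big]
print(mean(x))                   # inf: the sum overflows before the division
print(mean(x) <= max(x) + 1e-8)  # False
```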



#### Summary

Hypothesis
- found edge cases (empty lists, `nan`, infinity) that a hand-written test would likely have missed
- accepts explicit example cases via `@example`

Challenges
- getting your data specification right
- identifying properties

Notes
- Properties are very problem-specific and may not be obvious
- Testing properties is in general not sufficient: the median satisfies the same bounds as the mean (cf. `mean` vs. `median`)
- In the example above, we restricted the search space
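
The `mean` vs. `median` remark can be checked directly. In this small sketch, `check_bounds` is a hypothetical helper wrapping the property from the example; both functions from the standard `statistics` module pass it even though they compute different things:

```python
import statistics


def check_bounds(f, x):
    # The property used throughout the example.
    return min(x) <= f(x) <= max(x)


x = [1.0, 2.0, 10.0]
print(check_bounds(statistics.mean, x))    # True
print(check_bounds(statistics.median, x))  # True, although the median differs from the mean
print(statistics.mean(x), statistics.median(x))
```

A property test therefore narrows down the set of acceptable implementations but does not necessarily pin down a single one.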

#### Generate Data

[Hypothesis for the Scientific Stack](http://hypothesis.readthedocs.io/en/latest/numpy.html):
- numpy
- pandas

In [12]:
def examples(strategy, n=3):
    for _ in range(n):
        print(strategy.example(), '\n')

In [15]:
from hypothesis.strategies import lists, floats

strategy = lists(elements=floats())
examples(strategy)

[1e-05, 4.339153192424458e+16, -6.285711018828626e+16, -1.5, nan, 6.338751142470446e-108] 

[] 

[-1.1977964691766333e+186, 1.8906949861730064e+16, -3.402823466e+38, 3273952135995461.0, nan, 3.402823466e+38, nan, -2.9528053461762548e+16, -3.703384598018958e+16, -1.845729331567465e+142, -4.343837701191257e+16, 6.0659438183327416e+16, -7.0618361161914216e+16, 1.192092896e-07, -1.2087613502964555e+234, 2.2250738585072014e-308, -3.4089697788891177e+218, 2.8344539885747253e+189] 



In [16]:
strategy = lists(elements=floats(min_value=0, max_value=1), 
                 min_size=2, 
                 max_size=2)
examples(strategy)

[0.9633653394907686, 0.07723148456860775] 

[0.9331416954308236, 0.4142750850720617] 

[0.05793894695600567, 0.42810248775534954] 



# Exercises

Start with [`arrays`](http://hypothesis.readthedocs.io/en/latest/numpy.html#hypothesis.extra.numpy.arrays) to implement the following strategies:

- create integer arrays of shape (2, 3)
- as above but with bounded values
- create float arrays of shape (len, 2) where len is between 2 and 5
- as above but with first column values <= second column values
- as above but with strict inequality (<)

Note: [`filter`](http://hypothesis.readthedocs.io/en/latest/data.html#filtering) and [`map`](http://hypothesis.readthedocs.io/en/latest/data.html#mapping) let you adapt a strategy.
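
As a warm-up that does not give away the array exercises, here is a minimal sketch of both methods on a plain integer strategy. The name `even_squares` and the chosen bounds are arbitrary; `map` transforms every generated value, while `filter` discards values that fail a predicate.

```python
from hypothesis import given
from hypothesis.strategies import integers

# Square each integer, then keep only the even results.
even_squares = (
    integers(min_value=-100, max_value=100)
    .map(lambda n: n * n)
    .filter(lambda n: n % 2 == 0)
)


@given(even_squares)
def test_even_squares(n):
    assert n >= 0
    assert n % 2 == 0


test_even_squares()  # calling the decorated function runs the property check
```

Where possible, prefer `map` over `filter`: a filter that rejects most values makes data generation slow.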

---

Use [`composite`](https://hypothesis.readthedocs.io/en/latest/data.html#composite-strategies) to create:
- an array X of shape (5, 2) and
- an array y of shape (5,) where y is a random linear combination of the columns of X, i.e. y = X @ c for a random coefficient vector c of shape (2,).


# Solutions

In [13]:
import numpy as np
from hypothesis.extra.numpy import arrays
from hypothesis.strategies import composite, just
from hypothesis.strategies import (tuples, lists, 
                                   booleans, integers, floats)

In [12]:

# np.int was removed in NumPy 1.24; use an explicit dtype instead
arrays(dtype=np.int64, elements=integers(min_value=0, max_value=3), shape=(2, 3)).example()

array([[1, 3, 3],
       [1, 0, 2]])

In [18]:
arrays(dtype=np.float64,  # np.float was removed in NumPy 1.24
       elements=floats(min_value=0, max_value=1), 
       shape=tuples(integers(min_value=2, max_value=5), 
                    just(2))).example()

array([[ 0.86894353,  0.87988412],
       [ 0.91949955,  0.48971159],
       [ 0.22638774,  0.32773948],
       [ 0.22638774,  0.22638774],
       [ 0.91302933,  0.22638774]])

In [54]:
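# np.sort sorts along the last axis, i.e. within each row,
# so the first column ends up <= the second column
arrays(dtype=np.float64,  # np.float was removed in NumPy 1.24
       elements=floats(min_value=0, max_value=1), 
       shape=tuples(integers(min_value=2, max_value=5), 
                    just(2))).map(lambda x: np.sort(x)).example()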

array([[ 0.29211861,  0.29211861],
       [ 0.5440692 ,  0.83423388],
       [ 0.29211861,  0.29211861]])

In [55]:
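(arrays(dtype=np.float64,  # np.float was removed in NumPy 1.24
        elements=floats(min_value=0, max_value=1),
        shape=tuples(integers(min_value=2, max_value=5),
                     just(2)))
 .map(lambda x: np.sort(x))                    # first column <= second column
 .filter(lambda x: np.all(x[:, 0] < x[:, 1]))  # keep only strict inequality
 .example())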

array([[ 0.21068308,  0.45112424],
       [ 0.06522418,  0.5403578 ]])

In [58]:
@composite
def create_linear_dependency(draw):
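    # np.int was removed in NumPy 1.24; use an explicit dtype instead
    X = draw(arrays(dtype=np.int64, elements=integers(0, 7), shape=(5, 2)))
    c = draw(arrays(dtype=np.int64, elements=integers(0, 7), shape=(2,)))
    return X, X @ c  # y = X @ c is a linear combination of the columns of X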

create_linear_dependency().example()

# Summary

- hypothesis allows one to explicitly state assumptions


- hypothesis can be helpful in identifying implicit assumptions


- hypothesis test examples may be a bit too pathological in some situations


- encoding/decoding problems or situations where an inverse function exists can be tackled very nicely
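
As a minimal sketch of such a round-trip property, the test below uses the standard `json` module as the encoder/decoder pair. The choice of JSON and of lists of integers is just an assumption for illustration; any pair of inverse functions fits the same pattern.

```python
import json

from hypothesis import given
from hypothesis.strategies import integers, lists


@given(lists(integers()))
def test_json_roundtrip(xs):
    # Decoding must invert encoding for every generated list.
    assert json.loads(json.dumps(xs)) == xs


test_json_roundtrip()  # calling the decorated function runs the property check
```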