Property-based Testing
======================

Property-based testing is a method of testing in which tests assert that our software satisfies a set of properties for many different inputs.
In example-based testing, the behavior of our software is verified for specific examples, given as concrete inputs. Property-based testing generalizes this by testing our software against any valid input. It also shifts the focus to more general properties of our software: behaviors that always hold, even when the input changes. In example-based testing, we are usually concerned with the more concrete behavior that applies to the example at hand.

## Why?

Property-based testing verifies that our properties hold for many inputs, instead of a set of specific examples. This allows our tests to catch bugs that would otherwise go undetected.

Property-based testing also makes us focus on what is really relevant to our test. We need to think about what our input looks like in more general terms. For instance, if one of our input parameters is an integer, we must decide whether our software accepts all integer values or only those in a specific range. We are also forced to think about which properties our software has. Both the shape of the input and the properties are codified in our tests, so property-based testing forces us to make our assumptions explicit.

## How?

The original property-based testing library is [QuickCheck](https://hackage.haskell.org/package/QuickCheck), written for Haskell, but there are tools for property-based testing for [many languages](https://en.wikipedia.org/wiki/QuickCheck).
A property-based testing library usually consists of two basic parts. One is a component capable of generating a large set of data to be used as input for the tests. The other is an API that allows us to formulate the properties that we want to test.
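
To make the two parts concrete, here is a minimal sketch that uses only the Python standard library. It is not how QuickCheck or Hypothesis are implemented; the names `random_int_lists` and `find_counterexample` are made up for this illustration. The first function plays the role of the generator, and the second checks a property against every generated input.

```python
import random

def random_int_lists(rng, count=200, max_len=20):
    """Generator part (hypothetical helper): yield random lists of integers."""
    for _ in range(count):
        yield [rng.randint(-1000, 1000) for _ in range(rng.randint(0, max_len))]

def find_counterexample(prop, inputs):
    """Property part (hypothetical helper): return the first input that
    falsifies `prop`, or None if every input satisfies it."""
    for value in inputs:
        if not prop(value):
            return value
    return None

def reverse_twice_is_identity(xs):
    # A property that holds for every list.
    return list(reversed(list(reversed(xs)))) == xs

def sorted_is_identity(xs):
    # A deliberately false property: most lists are not already sorted.
    return sorted(xs) == xs

rng = random.Random(0)
print(find_counterexample(reverse_twice_is_identity, random_int_lists(rng)))  # None
print(find_counterexample(sorted_is_identity, random_int_lists(rng)))  # some unsorted list
```

Real libraries add a lot on top of this idea, such as smarter input generation and reporting of simplified failing inputs.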

## Example: Hypothesis

We will use [Hypothesis](https://hypothesis.works/) in Python as our property-based testing library. Let's install it:

In [11]:
!pip install hypothesis # Install hypothesis


Now we need the code that we want to test. The Python standard library [defines a string method called `center`](https://docs.python.org/3/library/stdtypes.html#str.center):

```
str.center(width[, fillchar])
Return centered in a string of length width. Padding is done using the specified fillchar (default is an ASCII space). The original string is returned if width is less than or equal to len(s).
```
Our example will be a function that tries to implement the `center` string method, following the above specification:

In [28]:
# String center function that we want to test

def center(s, width, fillchar=' '):
  if len(s) >= width:
    return s
  margin = width - len(s)
  left_margin = margin // 2
  right_margin = margin - left_margin
  return (fillchar * left_margin) + s + (fillchar * right_margin)

center('abc', 5)

' abc '

This is based on [CPython's implementation of `center`](https://github.com/python/cpython/blob/98ce7b107e6611d04dc35a4f5b02ea215ef122cf/Objects/stringlib/transmogrify.h#L195). CPython also has [tests for `center`](https://github.com/python/cpython/blob/a81849b0315277bb3937271174aaaa5059c0b445/Lib/test/string_tests.py#L854-L860), but they are example-based. We can adapt those tests to use our implementation of `center`:

In [39]:
import unittest

class CenterExampleTest(unittest.TestCase):
  def test_center(self):
    self.assertEqual(center('abc', 10), '   abc    ')
    self.assertEqual(center('abc', 6), ' abc  ')
    self.assertEqual(center('abc', 3), 'abc')
    self.assertEqual(center('abc', 2), 'abc')
    self.assertEqual(center('abc', 10, '*'), '***abc****')
    with self.assertRaises(TypeError):
      center('abc')

# We must call unittest like this to run it in a notebook
unittest.main(argv=['first-arg-is-ignored', 'CenterExampleTest'], exit=False)

.
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


<unittest.main.TestProgram at 0x7fe97ae9b240>

But now we want to improve our tests and write them using property-based testing. First, we need to think about which properties must hold for our function. Abstracting from the examples in the test above, we might want to consider two scenarios:
1. When the length of the input string is greater than or equal to the input width, `center` returns the original string.
2. When the length of the input string is smaller than the width, we expect the returned string to have length equal to the input width. This is a simple property that is easy to check and might still find bugs. We may be tempted to define a property that captures the fact that a string is 'centered' in another, but this property's implementation could in turn have bugs and need tests, so we need to strike a balance here.
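
The trade-off in the second scenario can be illustrated with a small sketch. Both helpers below are hypothetical and exist only for this comparison: `has_requested_length` is the simple length property, and `naive_is_centered` is one plausible first attempt at a 'centered' property. The naive check already gets an edge case wrong when the input string itself consists of the fill character.

```python
def has_requested_length(result, width):
    """Simple property: the padded result is exactly `width` characters long."""
    return len(result) == width

def naive_is_centered(result, s):
    """Naive 'centered' check (hypothetical): locate `s` inside `result`
    and require the left and right margins to differ by at most one."""
    left = result.find(s)
    right = len(result) - left - len(s)
    return left >= 0 and abs(left - right) <= 1

print(has_requested_length(' abc  ', 6))    # True
print(naive_is_centered(' abc  ', 'abc'))   # True
# ' ' centered in width 3 is '   ', but find() matches the first space,
# so the check sees margins of 0 and 2 and wrongly reports False.
print(naive_is_centered('   ', ' '))        # False
```

The length check has no such corner cases, which is why it is a reasonable place to start.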

Let's write our property-based tests using Hypothesis for those two scenarios:

In [63]:
import unittest

from hypothesis import given, assume, settings, HealthCheck
from hypothesis.strategies import text, integers

class CenterPropertyTest(unittest.TestCase):
  @settings(suppress_health_check=HealthCheck.all())
  @given(
      s=text(max_size=100),
      width=integers(max_value=100)
  )
  def test_center_length_less_than_width(self, s, width):
    assume(len(s) < width) 
    self.assertEqual(len(center(s, width)), width)

  @given(
      s=text(),
      width=integers(max_value=100)
  )
  def test_center_length_greater_or_equal_to_width(self, s, width):
    assume(len(s) >= width) 
    self.assertEqual(center(s, width), s)
    

unittest.main(argv=['first-arg-is-ignored', 'CenterPropertyTest'], exit=False)

..
----------------------------------------------------------------------
Ran 2 tests in 1.049s

OK


<unittest.main.TestProgram at 0x7fe97af194e0>

Looking at the tests, we can see that we use the `given` decorator to describe what our parameters look like so that Hypothesis can generate valid inputs for our tests. We have a parameter called `s`, which is a string. In fact, it could be any string, so instead of testing `center` for a single string like `'abc'`, Hypothesis is going to run our tests for many different strings.

To tell Hypothesis that `s` could be any string, we need to use what the library calls [strategies](https://hypothesis.readthedocs.io/en/latest/data.html). A strategy is a description of values our parameter can have. We are using the `text` strategy to describe a string.

The same thing happens for `width`: in theory, it could be any integer, so we use the `integers` strategy. However, to avoid timeouts when running our tests, we put an upper bound on `width` with `max_value` (100 in the tests above, 1000 in the cells that follow). The bound is needed because Hypothesis looks for special values that could be edge cases for our code, so it will try, for example, the empty string and very large integers, and centering a string to a very large width means building a very large string.

Because we want to distinguish two different scenarios, we also use the [`assume` function](https://hypothesis.readthedocs.io/en/latest/details.html?#making-assumptions), which tells Hypothesis to discard any generated input that does not satisfy the given condition.
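
Filtering with `assume` throws away every input that fails the condition. As an alternative sketch (not the approach used in this notebook), the inputs can be constructed so that the condition always holds: instead of generating `width` directly, we generate a positive `padding` and derive `width` from it. The parameter name `padding` and the test name are assumptions made for this example, and `center` is repeated so that the block runs on its own.

```python
from hypothesis import given
from hypothesis import strategies as st

def center(s, width, fillchar=' '):
    # Same logic as the implementation under test shown earlier.
    if len(s) >= width:
        return s
    margin = width - len(s)
    left_margin = margin // 2
    right_margin = margin - left_margin
    return (fillchar * left_margin) + s + (fillchar * right_margin)

@given(
    s=st.text(max_size=100),
    padding=st.integers(min_value=1, max_value=100),  # hypothetical parameter
)
def test_padded_result_has_requested_width(s, padding):
    # width is always greater than len(s), so no generated input is discarded.
    width = len(s) + padding
    assert len(center(s, width)) == width

# Calling the decorated function runs the property against many inputs.
test_padded_result_has_requested_width()
```

The cost of this design is that the test no longer reads exactly like the scenario description, so which style is clearer depends on the property.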

We can now add back the example-based tests. They complement our property-based tests by covering cases the properties do not, such as a missing argument or an explicitly provided `fillchar`.



In [42]:
import unittest

from hypothesis import given, assume, settings, HealthCheck
from hypothesis.strategies import text, integers

class CenterTest(unittest.TestCase):
  def test_center_missing_arg(self):
    with self.assertRaises(TypeError):
      center('abc')

  def test_center_examples(self):
    self.assertEqual(center('abc', 10), '   abc    ')
    self.assertEqual(center('abc', 6), ' abc  ')
    self.assertEqual(center('abc', 10, '*'), '***abc****')

  @settings(suppress_health_check=HealthCheck.all())
  @given(
      s=text(),
      width=integers(max_value=1000)
  )
  def test_center_length_less_than_width(self, s, width):
    assume(len(s) < width) 
    self.assertEqual(len(center(s, width)), width)

  @given(
      s=text(),
      width=integers(max_value=1000)
  )
  def test_center_length_greater_or_equal_to_width(self, s, width):
    assume(len(s) >= width) 
    self.assertEqual(center(s, width), s)
    

unittest.main(argv=['first-arg-is-ignored', 'CenterTest'], exit=False)

....
----------------------------------------------------------------------
Ran 4 tests in 0.885s

OK


<unittest.main.TestProgram at 0x7fe97b25a3c8>

To see how property-based testing could help us find bugs, let's try testing an implementation of `center` that has a bug:

In [29]:
# Center implementation with a bug

def center_with_bug(s, width, fillchar=' '):
  if len(s) >= width:
    return s
  margin = width - len(s)
  left_margin = margin // 2
  right_margin = margin // 2
  return (fillchar * left_margin) + s + (fillchar * right_margin)

center_with_bug('abc', 5)

' abc '

The bug here is that we compute both the left and the right margin as `margin // 2`. However, `margin` can be an odd integer, and in that case `margin // 2` rounds down, so the two margins add up to `margin - 1` and the resulting string is one character shorter than `width`.
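
The arithmetic can be checked directly. The helper `padded_lengths` below is a hypothetical function written for this illustration; it compares the length produced by the buggy margin computation with the length produced by the correct one.

```python
def padded_lengths(s, width):
    """Return (buggy_length, correct_length) for a string shorter than width."""
    margin = width - len(s)
    buggy = len(s) + margin // 2 + margin // 2
    correct = len(s) + margin // 2 + (margin - margin // 2)
    return buggy, correct

for width in range(4, 9):
    print(width, padded_lengths('abc', width))
# 4 (3, 4)
# 5 (5, 5)
# 6 (5, 6)
# 7 (7, 7)
# 8 (7, 8)
```

The two lengths agree whenever the margin is even and differ by one whenever it is odd.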

This looks like something that is easy to detect, but if we go back to our example-based tests, we might easily have used an example string (e.g. `"abcd"`) that leaves us testing only even margins:

In [44]:
import unittest

class CenterWithBugExampleTest(unittest.TestCase):
  def test_center_with_bug(self):
    self.assertEqual(center_with_bug('abcd', 10), '   abcd   ')
    self.assertEqual(center_with_bug('abcd', 6), ' abcd ')
    self.assertEqual(center_with_bug('abcd', 3), 'abcd')
    self.assertEqual(center_with_bug('abcd', 2), 'abcd')
    self.assertEqual(center_with_bug('abcd', 10, '*'), '***abcd***')
    with self.assertRaises(TypeError):
      center_with_bug('abcd')

unittest.main(argv=['first-arg-is-ignored', 'CenterWithBugExampleTest'], exit=False)

.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK


<unittest.main.TestProgram at 0x7fe97be3fe10>

However, if we try running our property-based tests, we will find the bug:

In [45]:
import unittest

from hypothesis import given, assume, settings, HealthCheck
from hypothesis.strategies import text, integers

class CenterWithBugPropertyTest(unittest.TestCase):
  @settings(suppress_health_check=HealthCheck.all())
  @given(
      s=text(max_size=1000),
      width=integers(max_value=1000)
  )
  def test_center_with_bug_length_less_than_width(self, s, width):
    assume(len(s) < width) 
    self.assertEqual(len(center_with_bug(s, width)), width)

  @given(
      s=text(),
      width=integers(max_value=1000)
  )
  def test_center_with_bug_length_greater_or_equal_to_width(self, s, width):
    assume(len(s) >= width) 
    self.assertEqual(center_with_bug(s, width), s)
    

unittest.main(argv=['first-arg-is-ignored', 'CenterWithBugPropertyTest'], exit=False)

.F

Falsifying example: test_center_with_bug_length_less_than_width(
    self=<__main__.CenterWithBugPropertyTest testMethod=test_center_with_bug_length_less_than_width>,
    s='',
    width=1,
)



FAIL: test_center_with_bug_length_less_than_width (__main__.CenterWithBugPropertyTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-45-b3065f59db69>", line 8, in test_center_with_bug_length_less_than_width
    @given(
  File "/usr/local/lib/python3.6/dist-packages/hypothesis/core.py", line 1141, in wrapped_test
    raise the_error_hypothesis_found
  File "<ipython-input-45-b3065f59db69>", line 14, in test_center_with_bug_length_less_than_width
    self.assertEqual(len(center_with_bug(s, width)), width)
AssertionError: 0 != 1

----------------------------------------------------------------------
Ran 2 tests in 0.389s

FAILED (failures=1)


<unittest.main.TestProgram at 0x7fe97b360b70>

Hypothesis found an example that falsifies our assertion that `len(center_with_bug(s, width))` equals `width`. The falsifying example is `s=""` and `width=1`.
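
Once a falsifying example is known, it can be pinned so that it is always checked in addition to the generated inputs. The sketch below uses Hypothesis's `example` decorator for that. As an assumption for this illustration, the test derives `width` from a hypothetical `padding` parameter instead of using `assume`, so the reported case `s=''`, `width=1` becomes `s=''`, `padding=1`. The test runs against the fixed implementation, which is repeated here so that the block is self-contained.

```python
from hypothesis import example, given
from hypothesis import strategies as st

def center(s, width, fillchar=' '):
    # Fixed implementation: the right margin takes whatever the left one leaves.
    if len(s) >= width:
        return s
    margin = width - len(s)
    left_margin = margin // 2
    right_margin = margin - left_margin
    return (fillchar * left_margin) + s + (fillchar * right_margin)

@given(
    s=st.text(max_size=100),
    padding=st.integers(min_value=1, max_value=100),  # hypothetical parameter
)
@example(s='', padding=1)  # the falsifying example reported above (width=1)
def test_length_with_pinned_example(s, padding):
    width = len(s) + padding
    assert len(center(s, width)) == width

test_length_with_pinned_example()
```

If the buggy margin computation were reintroduced, the pinned example would fail regardless of which inputs Hypothesis happens to generate.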

A nice feature of Hypothesis and other property-based testing libraries is that, once the library finds a falsifying example, it tries to shrink the example to be the simplest (smallest) example that still breaks our tests. This is important because, once our test fails, we don't want to have to debug our code using a huge and complex example.
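
The idea behind shrinking can be sketched with a toy greedy loop in plain Python. This is only an illustration under simplifying assumptions and is not how Hypothesis shrinks internally; the helpers `fails` and `shrink` are hypothetical. Starting from a large failing input, the loop keeps moving to a slightly simpler input as long as the property still fails.

```python
def center_with_bug(s, width, fillchar=' '):
    # Buggy implementation from above: both margins are rounded down.
    if len(s) >= width:
        return s
    margin = width - len(s)
    return (fillchar * (margin // 2)) + s + (fillchar * (margin // 2))

def fails(s, width):
    """True when the length property is violated for this input."""
    return len(s) < width and len(center_with_bug(s, width)) != width

def shrink(s, width):
    """Greedily replace a failing input with a simpler one that still fails."""
    while True:
        candidates = [
            (s[:-1], width),      # drop the last character
            (s, width - 1),       # use a smaller width
            (s[:-1], width - 1),  # do both, which keeps the margin's parity
            (s, width - 2),       # smaller width with the same parity
        ]
        for candidate in candidates:
            if candidate != (s, width) and fails(*candidate):
                s, width = candidate
                break
        else:
            # No simpler candidate fails, so this input is locally minimal.
            return s, width

print(shrink('hello world', 40))  # ('', 1)
```

For this particular bug the toy loop ends at the same minimal input that Hypothesis reported, `s=''` and `width=1`.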

This may seem like an artificial example (because it is), but it shows how easy it is to miss certain scenarios using only example-based testing, and how property-based testing can help us find more bugs.

## References

* [David R. MacIver, "What is Property Based Testing?"](https://hypothesis.works/articles/what-is-property-based-testing/)
* [David R. MacIver, "In praise of property-based testing"](https://increment.com/testing/in-praise-of-property-based-testing/)
* [Jessica Kerr, "Property-based testing: what is it?"](https://jessitron.com/2013/04/25/property-based-testing-what-is-it/)
* [Hypothesis documentation](https://hypothesis.readthedocs.io/en/latest/)
* [Koen Claessen and John Hughes, "QuickCheck: a lightweight tool for random testing of Haskell programs", ACM SIGPLAN Notices 46.4 (2011): 53-64](https://www.cs.tufts.edu/~nr/cs257/archive/john-hughes/quick.pdf)