# Test-driven development

An assertion checks that something is true at a particular point in the program. The next step is to check the overall behavior of a piece of code, i.e., to make sure that it produces the right output when it’s given a particular input. For example, suppose we need to find where two or more time series overlap. The range of each time series is represented as a pair of numbers: the times at which the interval started and ended. The output is the largest range that they all include:

<img src = "range_overlap.png">

Most novice programmers would solve this problem like this:

* Write a function range_overlap.
* Call it interactively on two or three different inputs.
* If it produces the wrong answer, fix the function and re-run that test.

This clearly works — after all, thousands of scientists are doing it right now — but there’s a better way:

* Write a short function for each test.
* Write a range_overlap function that should pass those tests.
* If range_overlap produces any wrong answers, fix it and re-run the test functions.

Writing the tests before writing the function they exercise is called test-driven development (TDD). Its advocates believe it produces better code faster because:

* If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors.
* Writing tests helps programmers figure out what the function is actually supposed to do.

Here are three tests for range_overlap, each written as a single assertion:

In [1]:
assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)

NameError: name 'range_overlap' is not defined

The error is actually reassuring: we haven’t written range_overlap yet, so if the tests passed, it would be a sign that someone else had and that we were accidentally using their function.

And as a bonus of writing these tests, we’ve implicitly defined what our input and output look like: we expect a list of pairs as input, and produce a single pair as output.

Something important is missing, though. We don’t have any tests for the case where the ranges don’t overlap at all:

In [None]:
# What's missing?

# Ranges that don't overlap
assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) is None

# Multiple identical ranges
assert range_overlap([ (0.0, 1.0), (0.0, 1.0) ]) == (0.0, 1.0)

# No common overlap between multiple values

# Ranges that touch but don't overlap
assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) is None

In [13]:
# Here's a first attempt at our function:

def range_overlap(ranges):
    """Return common overlap among a set of (left, right) ranges."""
    max_left = 0.0
    min_right = 1.0
    
    for (left, right) in ranges:
        max_left = max(max_left, left)
        min_right = min(min_right, right)
    return (max_left, min_right)

In [16]:
# Write a single function to run all the tests

def test_range_overlap():
    assert range_overlap([ (0.0, 1.0) ]) == (0.0, 1.0)
    assert range_overlap([ (2.0, 3.0), (2.0, 4.0) ]) == (2.0, 3.0)
    assert range_overlap([ (0.0, 1.0), (0.0, 2.0), (-1.0, 1.0) ]) == (0.0, 1.0)
    assert range_overlap([ (0.0, 1.0), (5.0, 6.0) ]) is None
    assert range_overlap([ (0.0, 1.0), (0.0, 1.0) ]) == (0.0, 1.0)
    assert range_overlap([ (0.0, 1.0), (1.0, 2.0) ]) is None

In [17]:
# Test the function by running the test function
test_range_overlap()

AssertionError: 
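A bare AssertionError stops at the first failing assertion and does not say which inputs caused it. One way to see every failing case at once is to store the inputs and expected outputs as data and compare them in a loop. This is a minimal sketch: the `cases` and `failures` names are assumptions made for illustration, and the function is the lesson's first attempt copied unchanged.

```python
def range_overlap(ranges):
    """First attempt from this lesson, copied unchanged (known to be incomplete)."""
    max_left = 0.0
    min_right = 1.0
    for (left, right) in ranges:
        max_left = max(max_left, left)
        min_right = min(min_right, right)
    return (max_left, min_right)


# Each case pairs an input with the output the tests expect.
# The 'cases' and 'failures' names are hypothetical, chosen for this sketch.
cases = [
    ([(0.0, 1.0)], (0.0, 1.0)),
    ([(2.0, 3.0), (2.0, 4.0)], (2.0, 3.0)),
    ([(0.0, 1.0), (0.0, 2.0), (-1.0, 1.0)], (0.0, 1.0)),
    ([(0.0, 1.0), (5.0, 6.0)], None),
    ([(0.0, 1.0), (0.0, 1.0)], (0.0, 1.0)),
    ([(0.0, 1.0), (1.0, 2.0)], None),
]

# Collect every mismatch instead of stopping at the first one.
failures = []
for ranges, expected in cases:
    actual = range_overlap(ranges)
    if actual != expected:
        failures.append((ranges, expected, actual))

for ranges, expected, actual in failures:
    print("input:", ranges, "expected:", expected, "got:", actual)
```

With the first-attempt function, three of the six cases are reported: the second, fourth, and sixth.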

In [9]:
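# The test function stops at its second assertion: min_right starts at 1.0, so we get (2.0, 1.0) instead of (2.0, 3.0).
# We always set max_left and min_right to 0.0 and 1.0 respectively, regardless of the input values.
# This violates another important rule of programming: always initialise from data.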

# Is there a better way of doing this?
# Can we initialise the values from the data itself?
# Where should the starting values come from and how do I get them?
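The rule "always initialise from data" can be illustrated on a simpler, unrelated problem: finding the largest value in a list. The two functions below are hypothetical examples written for this sketch. They are deliberately not the fix for range_overlap, which is left as the exercise.

```python
def largest_from_constant(values):
    """Hypothetical example: starts from a hard-coded 0.0."""
    largest = 0.0
    for value in values:
        largest = max(largest, value)
    return largest


def largest_from_data(values):
    """Hypothetical example: starts from the first value in the data."""
    largest = values[0]
    for value in values[1:]:
        largest = max(largest, value)
    return largest


readings = [-3.0, -1.5, -2.0]
print(largest_from_constant(readings))  # 0.0, a value that is not in the data
print(largest_from_data(readings))      # -1.5
```

The hard-coded starting value only gives the right answer when the data happens to contain something at least as large as it. Starting from the data removes that hidden assumption.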


In [10]:
# Homework:
# Fix the function so that all the tests pass.