First read [link to prog resources]

Developing code in a TDD (test-driven development) style starts with three steps:
- write the function skeleton
- write the test
- check that the test fails

In [1]:
import numpy as np
import pandas as pd


def one_hot(data):
    pass


def test_one_hot():
    data = ['left', 'left', 'right', 'straight']
    
    encoded = one_hot(data)
    
    #  check we have the correct number of rows
    assert encoded.shape[0] == len(data)
    
    #  check columns are in the correct order (alphabetical)
    assert (encoded.columns == ['left', 'right', 'straight']).all()
    
    #  check exactly one category is set in each row
    assert (encoded.sum(axis=1) == 1).all()
    
    #  numpy testing is very handy for comparing arrays
    np.testing.assert_array_equal(encoded.loc[:, 'left'], [1, 1, 0, 0])
    np.testing.assert_array_equal(encoded.loc[:, 'right'], [0, 0, 1, 0])
    np.testing.assert_array_equal(encoded.loc[:, 'straight'], [0, 0, 0, 1])
    
test_one_hot()

AttributeError: 'NoneType' object has no attribute 'shape'

Note how, before writing any functional code, we are already thinking about:
- what order the columns should be in (alphabetical)
- what one_hot should return (a pandas DataFrame)

Also note how we (as the reader) can understand the intention of the function from the test alone: the test is executable documentation.

Let's write our function:

In [2]:
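def one_hot(data):
    #  one column per unique category, in alphabetical order
    columns = sorted(set(data))

    #  start from all zeros, one row per sample
    values = np.zeros((len(data), len(columns)))

    #  set a 1 in the column matching each row's category
    for row, d in enumerate(data):
        col = columns.index(d)
        values[row, col] = 1

    return pd.DataFrame(values, columns=columns)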

test_one_hot()

Now let's see if our function generalizes:

In [3]:
one_hot(['cat', 'dog', 'fish', 'fish', 'dog'])

|   | cat | dog | fish |
|---|-----|-----|------|
| 0 | 1.0 | 0.0 | 0.0 |
| 1 | 0.0 | 1.0 | 0.0 |
| 2 | 0.0 | 0.0 | 1.0 |
| 3 | 0.0 | 0.0 | 1.0 |
| 4 | 0.0 | 1.0 | 0.0 |
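
One way to check the result on new data without working out the expected values by hand is to compare against an independent implementation. The sketch below uses pandas' built-in `pd.get_dummies` as that reference; this comparison is an addition for illustration, not part of the original notebook, and `one_hot` is repeated only so the block runs on its own.

```python
import numpy as np
import pandas as pd


def one_hot(data):
    #  same implementation as the cell above, repeated so this block is self-contained
    columns = sorted(set(data))
    values = np.zeros((len(data), len(columns)))
    for row, d in enumerate(data):
        values[row, columns.index(d)] = 1
    return pd.DataFrame(values, columns=columns)


data = ['cat', 'dog', 'fish', 'fish', 'dog']

ours = one_hot(data)

#  dtype=float asks pandas for the same dtype that np.zeros produces
reference = pd.get_dummies(pd.Series(data), dtype=float)

#  same column labels in the same order, and the same values
assert list(ours.columns) == list(reference.columns)
np.testing.assert_array_equal(ours.to_numpy(), reference.to_numpy())
```

If both assertions pass silently, the hand-written function agrees with the library on this input. It is still worth keeping the explicit expected values in `test_one_hot`, since they document the intended behaviour directly.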


## Exercise

Now follow the same TDD style to write a test and a function that normalize a 2D array using min-max scaling:

$$ y = \frac{x - x_{min}}{x_{max} - x_{min}} $$
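
As a worked instance of the formula, assume $x_{min}$ and $x_{max}$ are taken over the whole array. The exercise does not say whether they are global or per column, so this is only one possible reading:

$$ x = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix}, \quad x_{min} = 1, \quad x_{max} = 5 $$

$$ y = \frac{1}{5 - 1} \begin{bmatrix} 1 - 1 & 2 - 1 \\ 3 - 1 & 5 - 1 \end{bmatrix} = \begin{bmatrix} 0 & 0.25 \\ 0.5 & 1 \end{bmatrix} $$

Small hand-computed cases like this make convenient expected values when writing the test first.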