# Unit Testing

While we will not cover Python's [unittest module](https://docs.python.org/3/library/unittest.html), we want to introduce a simple way to test your code.

Unit testing is important because it is the only way you can be sure that your code is doing what you think it is doing.

Remember, just because there are no errors does not mean your code is correct.
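
As a minimal illustration of that point, the sketch below uses two hypothetical helpers, `mean_of_first_n_buggy` and `mean_of_first_n`, which are not part of this notebook. Both run without raising an error, but only one matches a value worked out by hand.

```python
def mean_of_first_n_buggy(values, n):
    """Intended to average the first n values, but the slice stops one short."""
    first = values[:n - 1]  # bug: drops the n-th value
    return sum(first) / len(first)


def mean_of_first_n(values, n):
    """Average the first n values."""
    first = values[:n]
    return sum(first) / len(first)


data = [3, 4, 5, 6, 7, 8, 9]

# Both calls run without any error ...
buggy = mean_of_first_n_buggy(data, 5)
fixed = mean_of_first_n(data, 5)

# ... but only one matches the hand calculation: (3 + 4 + 5 + 6 + 7) / 5 = 5
expected = 5.0
print(buggy == expected)  # False
print(fixed == expected)  # True
assert fixed == expected
```

The buggy version silently averages four values instead of five, which is exactly the kind of mistake that only a check against a known answer will reveal.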

In [1]:
import numpy as np
import pandas as pd

pd.set_option('display.max_columns', 30)

In [2]:
# NHANES 2015-2016 data
df = pd.read_csv('csv/nhanes_2015_2016.csv')
df.index = range(1, df.shape[0]+1)  # change index
df.head()

Unnamed: 0,SEQN,ALQ101,ALQ110,ALQ130,SMQ020,RIAGENDR,RIDAGEYR,RIDRETH1,DMDCITZN,DMDEDUC2,DMDMARTL,DMDHHSIZ,WTINT2YR,SDMVPSU,SDMVSTRA,INDFMPIR,BPXSY1,BPXDI1,BPXSY2,BPXDI2,BMXWT,BMXHT,BMXBMI,BMXLEG,BMXARML,BMXARMC,BMXWAIST,HIQ210
1,83732,1.0,,1.0,1,1,62,3,1.0,5.0,1.0,2,134671.37,1,125,4.39,128.0,70.0,124.0,64.0,94.8,184.5,27.8,43.3,43.6,35.9,101.1,2.0
2,83733,1.0,,6.0,1,1,53,3,2.0,3.0,3.0,1,24328.56,1,125,1.32,146.0,88.0,140.0,88.0,90.4,171.4,30.8,38.0,40.0,33.2,107.9,
3,83734,1.0,,,1,1,78,3,1.0,3.0,1.0,2,12400.01,1,131,1.51,138.0,46.0,132.0,44.0,83.4,170.1,28.8,35.6,37.0,31.0,116.5,2.0
4,83735,2.0,1.0,1.0,2,2,56,3,1.0,5.0,6.0,1,102718.0,1,131,5.0,132.0,72.0,134.0,68.0,109.8,160.9,42.4,38.5,37.7,38.3,110.1,2.0
5,83736,2.0,1.0,1.0,2,2,42,4,1.0,4.0,3.0,5,17627.67,2,126,1.23,100.0,70.0,114.0,54.0,55.2,164.9,20.3,37.4,36.0,27.2,80.4,2.0


## Goal

We want to find the mean of "BPXSY1" over the first 100 rows where "RIDAGEYR" > 60.

In [3]:
# One correct way of doing this: filter on age, then take the first 100 rows
# by position (column position 16 is 'BPXSY1'):
print(df[df['RIDAGEYR'] > 60].iloc[range(0, 100), 16].mean())

# Another way to reference the 'BPXSY1' variable:
print(df[df['RIDAGEYR'] > 60].iloc[range(0, 100), df.columns.get_loc('BPXSY1')].mean())

136.29166666666666
136.29166666666666
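
A hard-coded column position such as 16 is easy to get wrong, so it can be worth checking that the position really points at the intended column. The sketch below does this on a small, made-up frame; only the column names are borrowed from NHANES, and the values and the names `small`, `pos`, `by_position`, and `by_name` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical three-column frame standing in for the full data set
small = pd.DataFrame({
    'SEQN': [1, 2, 3],
    'RIDAGEYR': [62, 53, 78],
    'BPXSY1': [128.0, 146.0, 138.0],
})

# Look up the integer position of the column by name
pos = small.columns.get_loc('BPXSY1')

# Check that the position really points at the column we mean
assert small.columns[pos] == 'BPXSY1'

# Selecting by position and by name should then give identical values
by_position = small.iloc[:, pos]
by_name = small['BPXSY1']
assert by_position.equals(by_name)
```

On the real data, the analogous check would be that `df.columns[16]` equals `'BPXSY1'`.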


In [4]:
# Test our code on only ten rows so we can easily check the result by hand
test = pd.DataFrame(data=np.column_stack((np.repeat([3, 1], 5), range(3, 13))),
                    index=range(1, 11),
                    columns=['col1', 'col2'])
test

Unnamed: 0,col1,col2
1,3,3
2,3,4
3,3,5
4,3,6
5,3,7
6,1,8
7,1,9
8,1,10
9,1,11
10,1,12


In [5]:
# Using the .iloc[] indexer, we correctly choose the first 5 rows by position, regardless of their row labels
test[test['col1'] > 2].iloc[range(0, 5), 1].mean()

5.0
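
One simple way to turn this manual check into a repeatable test is to compare the result against a hand-computed value with `assert`. The following is a minimal sketch under that assumption, not a replacement for the `unittest` module; the names `expected` and `result` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the ten-row test frame from above
test = pd.DataFrame(data=np.column_stack((np.repeat([3, 1], 5), range(3, 13))),
                    index=range(1, 11),
                    columns=['col1', 'col2'])

# Worked out by hand: rows with col1 > 2 have col2 = 3, 4, 5, 6, 7
expected = (3 + 4 + 5 + 6 + 7) / 5

# The code under test: filter, then take the first 5 rows by position
result = test[test['col1'] > 2].iloc[range(0, 5), 1].mean()

# Raises AssertionError if the code and the hand calculation disagree
assert result == expected, f"expected {expected}, got {result}"
```

If the assertion passes silently, the code agrees with the hand calculation on this small case, which gives some confidence before running it on the full data set.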