# Hypothesis testing for Computer Scientists

Assume we roll a 6-sided die 24 times and it comes up 6 on 7 of those rolls.  We want to know whether the die is fair.

If the die is fair, we would expect a 6 on 1/6 of the rolls.  With 24 rolls, that is 4 sixes.  So we saw more sixes than a fair die produces on average, but is the excess large enough to believe strongly that the die is unfair?

This is typically handled by a hypothesis test.  We first assume the die is fair (the null hypothesis).  We then figure out what distribution the number of sixes from a fair die would follow (binomial).  Then, we figure out which parameters the binomial distribution takes (n=24, p=1/6).  Next, we calculate the probability of seeing 7 or more sixes, which is (1 - binomial_cdf(24, 1/6, 6)).
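
As a rough sketch of that last step, the tail probability can be summed directly with the standard library.  The helper name `binomial_tail` is made up for this example (it is not a library function); it adds up the binomial probabilities for 7, 8, ..., 24 sixes, which equals one minus the CDF at 6.

```python
from math import comb


def binomial_tail(n, p, k):
    """Return P(X >= k) for X ~ Binomial(n, p) by summing the exact terms.

    Hypothetical helper for illustration only.
    """
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Probability that a fair die shows 7 or more sixes in 24 rolls
p_value = binomial_tail(24, 1/6, 7)
print('p = ', p_value)
```

This exact value is what the simulations in the rest of the notebook approximate.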

What do we do if we don't remember how to run a hypothesis test from our stats class?  Think about the big picture of what we are trying to answer.  One way to think of probability is to ask "If we run this scenario lots and lots of times, what percentage of the time will this event occur?"  So, we run the scenario (rolling a fair 6-sided die 24 times) many times and see what percentage of the runs give a result at least as extreme as the one we observed (7 or more 6s).  Typically, if such a result occurs more than 5% of the time, we decide we do not have enough evidence to say that the process that actually occurred was different from the assumed process (that is, that the die was, in fact, unfair).

Credit:  There is a great presentation of this idea and more available at
https://speakerdeck.com/jakevdp/statistics-for-hackers .  The presentation was given by Jake VanderPlas at PyCon 2016 and is titled "Statistics for Hackers".

In [None]:
import matplotlib.pyplot as plt
import numpy as np


plt.figure()
all_sixes = []

First, we set up code to simulate our scenario.
The next cell simulates rolling a 6-sided die 24 times and counts the number of 6s.
If you run the cell several times, it will keep track of the rolls so far and start to build up a histogram of results.

In [None]:
n_rolls_per_trial = 24

rolls = np.random.randint(1, 7, size=n_rolls_per_trial)  # Generate 24 "dice rolls"
num_sixes = np.sum(rolls == 6)  # count the number of 6s
all_sixes.append(num_sixes)  # keep a running list of how many 6s we have seen each time

# This is all just plotting code to make the histogram look nice
bins = np.arange(0, 12 + 1.5) - 0.5
plt.hist(all_sixes, bins, color='c', edgecolor='k', stacked=True)
plt.gca().set_xticks(bins + 0.5);
plt.gca().set_yticks(np.arange(0, 21, 2));
plt.ylim(0, 20)
plt.xlabel('# sixes');
plt.ylabel('# occurrences');

This next cell generates a histogram by repeating the previous code `n_trials` times.
This should give you an idea of the overall shape of the probability distribution under the null hypothesis,
i.e., what we would expect our scenario to look like if the die were fair.

In [None]:
n_trials = 10000  # a single "trial" is rolling a die 24 times
n_rolls_per_trial = 24

all_sixes = []
for i in range(n_trials):
    rolls = np.random.randint(1, 7, size=n_rolls_per_trial)
    num_sixes = np.sum(rolls == 6)
    all_sixes.append(num_sixes)

bins = np.arange(0, max(all_sixes) + 1.5) - 0.5
plt.figure()
plt.hist(all_sixes, bins, color='c', edgecolor='k',
         weights=np.ones_like(all_sixes)/len(all_sixes))
plt.gca().set_xticks(bins + 0.5);
plt.axvline(7, color='k', linestyle='dashed', linewidth=1);
plt.xlabel('# sixes');
plt.ylabel('probability');

In [None]:
# To calculate an (approximate) p-value, all we do is count the trials
# with 7 or more 6s and divide by the total number of trials
print('p = ', np.sum(np.array(all_sixes) >= 7)/n_trials)

In [None]:
'''
This cell has the same code, but without all the plotting.
If you actually wanted to run the hypothesis test,
it would be this simple.
'''
n_trials = 10000
n_rolls_per_trial = 24

count = 0
for i in range(n_trials):
    rolls = np.random.randint(1, 7, size=n_rolls_per_trial)
    num_sixes = np.sum(rolls == 6)
    if num_sixes >= 7:
        count += 1

print('p = ', count/n_trials)
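
The same test can also be written without the explicit Python loop.  The sketch below is one possible vectorized variant, not a replacement for the cell above: it assumes NumPy's `Generator` interface (`np.random.default_rng`) rather than `np.random.randint`, and the seed `0` is an arbitrary choice made only so the run is repeatable.

```python
import numpy as np

n_trials = 10000
n_rolls_per_trial = 24

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

# One row per trial, one column per roll; values are 1..6 (7 is excluded)
rolls = rng.integers(1, 7, size=(n_trials, n_rolls_per_trial))

# Count the 6s in each row, then take the fraction of trials with 7 or more
sixes_per_trial = np.sum(rolls == 6, axis=1)
p_estimate = np.mean(sixes_per_trial >= 7)

print('p = ', p_estimate)
```

Generating all the rolls at once trades a little memory for speed; the logic is otherwise identical to the loop.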

How general is this method?  As long as you can simulate the process you want to test, you can use the same idea.  The next snippet tests whether a coin that comes up heads 23 times in 30 flips is likely to be fair.  You'll notice how similar it looks to the previous code.

In [None]:
n_trials = 10000
n_flips_per_trial = 30

count = 0
for i in range(n_trials):
    heads = np.random.randint(2, size=n_flips_per_trial)
    num_heads = np.sum(heads == 1)
    if num_heads >= 23:
        count += 1

print('p = ', count/n_trials)
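
Since the die and coin snippets differ only in how one trial is simulated and in the observed count, the shared part can be pulled into a function.  The following is a minimal sketch of that idea; `simulated_p_value`, `count_heads`, and `count_sixes` are names invented for this example, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability


def simulated_p_value(simulate_once, observed, n_trials=10000):
    """Fraction of simulated trials whose result is >= the observed value.

    simulate_once: function with no arguments that runs one trial under
    the null hypothesis and returns a number (hypothetical interface).
    """
    count = 0
    for _ in range(n_trials):
        if simulate_once() >= observed:
            count += 1
    return count / n_trials


def count_heads():
    # One trial: flip a fair coin 30 times and count the heads (1s)
    return np.sum(rng.integers(2, size=30) == 1)


def count_sixes():
    # One trial: roll a fair die 24 times and count the 6s
    return np.sum(rng.integers(1, 7, size=24) == 6)


p_coin = simulated_p_value(count_heads, observed=23)
p_die = simulated_p_value(count_sixes, observed=7)
print('coin: p = ', p_coin)
print('die:  p = ', p_die)
```

To test a different scenario, only a new `simulate_once` function and the observed value need to change.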