## Part 5 - Binomial Test ##

If we need to decide whether a count of binary outcomes shows a significant variation from the *expected* outcome, we can use a Binomial Test.  

A Binomial Test compares a categorical dataset to some expectation, for example:
+ Comparing the actual number of heads from 1000 flips of a possibly weighted coin to the expected number of heads
+ Comparing the actual percentage of respondents who gave a certain survey response to the expected percentage
  
The null hypothesis is that the observed behavior does not differ from the expected behavior, so any gap between them is due to chance. If we get a p-value of less than 0.05, we can reject that hypothesis and conclude that the difference between the observation and the expectation is statistically significant.

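The SciPy function for Binomial Testing is `binom_test`. Note that newer SciPy releases replace it with `binomtest` (added in SciPy 1.7), and `binom_test` was removed in SciPy 1.12.  This requires 3 inputs:  
+ Number of observed successes / positive outcomes  
+ Total number of trials / events  
+ Expected probability of a success / positive outcome  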
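
Because the function name depends on the installed SciPy version, a small wrapper can hide the difference. The sketch below is one possible approach, not part of the original notebook: `binomial_pvalue` is a hypothetical helper name, and it assumes that `binomtest` (which returns a result object with a `pvalue` attribute) is preferred whenever it is available, falling back to `binom_test` on older installations.

```python
# Pick whichever binomial test function the installed SciPy provides.
try:
    from scipy.stats import binomtest  # available in newer SciPy releases

    def binomial_pvalue(successes, trials, expected_prob):
        """Two-sided binomial test p-value (hypothetical helper)."""
        result = binomtest(successes, n=trials, p=expected_prob,
                           alternative="two-sided")
        return result.pvalue
except ImportError:
    from scipy.stats import binom_test  # older SciPy only

    def binomial_pvalue(successes, trials, expected_prob):
        """Two-sided binomial test p-value (hypothetical helper)."""
        return binom_test(successes, n=trials, p=expected_prob)


# The three inputs, in order: observed successes, number of trials,
# expected probability of success.
print(binomial_pvalue(7, 10, 0.5))
```

The wrapper takes the same three inputs listed above, so the rest of the examples could call it in place of `binom_test` without other changes.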
  
  


#### Example 1 - Is the coin weighted? ####
Throw a coin 1000 times. If the coin is fair, we expect to get a head 50% of the time, i.e. 500 positive outcomes.   

Should we suspect that the coin is weighted if we get 650 heads?

In [8]:
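from scipy.stats import binom_test

# p-value for exactly the expected 500 heads out of 1000 throws
pval_exactly_50pct = binom_test(500, n=1000, p=0.5)
# p-value for 650 heads out of 1000 throws
pval_650_heads = binom_test(650, n=1000, p=0.5)
print(pval_exactly_50pct)
print(pval_650_heads)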

1.0
1.6156310386976815e-21


...so that is a resounding YES: a p-value of about 1.6e-21 is far below 0.05, so we reject the null hypothesis for 650 heads out of 1000. We think there is something wrong with the coin!

#### Example 1, part 2 - reduce the number of coin throws ####
If we reduce the number of coin throws to 10 but still get a similar percentage of positive outcomes (7 heads, i.e. 70%), what is the effect?

In [9]:
pval = binom_test(7, n=10, p=0.5)

print(pval)

0.3437499999999999


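...so even if we get 7 heads out of 10 throws, the p-value of about 0.34 is well above 0.05. With so few throws we cannot reject the null hypothesis, and there is nothing to be suspicious about.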
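
To see where such a p-value comes from, it can be recomputed by hand with only the standard library. This is a minimal sketch, not SciPy's actual implementation: `binom_pmf` and `two_sided_pvalue` are hypothetical names, and it assumes the two-sided p-value is the total probability of every outcome that is no more likely than the observed one (with a small tolerance for floating-point ties). Because it multiplies large binomial coefficients by small floats, it is only suitable for modest numbers of trials such as those used here.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def two_sided_pvalue(k, n, p=0.5):
    """Total probability of all outcomes no more likely than k successes."""
    # Small relative tolerance so that exact ties still count despite rounding.
    threshold = binom_pmf(k, n, p) * (1 + 1e-7)
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p)
        if prob <= threshold:
            total += prob
    return min(1.0, total)


print(two_sided_pvalue(7, 10))  # 0.34375

# A similar share of heads becomes far less plausible as throws increase.
for heads, throws in [(7, 10), (65, 100), (650, 1000)]:
    print(throws, two_sided_pvalue(heads, throws))
```

For 7 heads in 10 throws, the outcomes counted are 0 to 3 and 7 to 10 heads, giving 352/1024 = 0.34375, which agrees with the SciPy result above up to floating-point rounding.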

#### Example 2 - Customer Purchases after visiting Web-Site ####
This is taken from Codecademy:

Imagine that we are analyzing the percentage of customers who make a purchase after visiting a website. We have a set of 1000 customers from this month, 58 of whom made a purchase. Over the past year, the number of visitors per every 1000 who make a purchase hovers consistently at around 72. Thus, our marketing department has set our target number of purchases per 1000 visits to be 72. We would like to know if this month's number, 58, is a significant difference from that target or a result of natural fluctuations.

In [14]:
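# Target purchase rate: 72 purchases per 1000 visits
n = 1000
expected_pct = 72 / 1000  # a probability (0.072), not a percentage
actual = 58  # purchases observed this month

pval = binom_test(actual, n=n, p=expected_pct)
print(round(pval, 5))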

0.09813


The p-value is greater than 0.05, so a deviation from the target at least this large could plausibly occur by chance. We fail to reject the null hypothesis and assume there is no significant difference.