# Review

Congratulations! You now know how to run a binomial hypothesis test, either with a SciPy function or by simulating it yourself. Being able to simulate the test will serve you well as a data scientist: when a pre-written function returns a surprising result, you can investigate what is going on. You also have a conceptual understanding of how a binomial test works and what question it aims to answer. To summarize, here are some of the things you’ve learned about hypothesis tests in general:

- All hypothesis tests start with a null hypothesis and an alternative hypothesis
- Outcomes of a hypothesis test that might be reported include:
    - confidence intervals
    - p-values

- A hypothesis test can be simulated by:
    - taking repeated random samples where the null hypothesis is assumed to be true
    - using those simulated samples to generate a null distribution
    - comparing an observed sample statistic to that null distribution
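
The three simulation steps above, together with the two reported outcomes, can be sketched in a few lines. This is a minimal illustration rather than the lesson's solution: the inputs (41 successes out of 500, with a null probability of 0.1) are made-up values, and the sketch draws success counts directly with NumPy's `Generator.binomial` instead of sampling individual `'y'`/`'n'` visitors. It also assumes a recent SciPy release, where `binomtest` performs the exact test.

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical example values (assumptions, not from the lesson's data)
n = 500                   # sample size
null_p = 0.1              # probability of success under the null hypothesis
observed_successes = 41   # observed sample statistic

rng = np.random.default_rng(seed=0)

# Steps 1 and 2: take 10,000 random samples assuming the null is true;
# each entry is the number of successes in one simulated sample
null_distribution = rng.binomial(n=n, p=null_p, size=10_000)

# Step 3: compare the observed statistic to the null distribution
# (lower-tail p-value: share of simulated counts at or below the observed one)
simulated_p_value = np.mean(null_distribution <= observed_successes)

# Exact lower-tail p-value from SciPy, for comparison
exact_result = binomtest(observed_successes, n, null_p, alternative='less')

# A two-sided 95% confidence interval for the true success probability
ci = binomtest(observed_successes, n, null_p).proportion_ci(confidence_level=0.95)

print("simulated p-value:", simulated_p_value)
print("exact p-value:    ", exact_result.pvalue)
print("95% CI:           ", (ci.low, ci.high))
```

The simulated and exact p-values should be close but not identical, because the simulated one depends on the random draws.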


## Instructions

As a final exercise, the solution code for the previous exercise is available in `script.py`. As a challenge, rewrite the simulation-based binomial test function so that it takes an input named `alternative_hypothesis`, which can equal `'less'`, `'not_equal'`, or `'greater'`. Then change the function so that it performs the matching one- or two-sided test for the alternative hypothesis provided. Solution code is available in `solution.py` if you want to check your work.
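
While working on the challenge, it may help to have exact p-values to compare against. The sketch below assumes a recent SciPy, where `binomtest` replaces the older `binom_test` and names the two-sided option `'two-sided'` rather than `'not_equal'`. The inputs (45 purchases out of 500 visitors, null rate 0.1) and the `scipy_alternative` mapping are illustrative choices, not part of the exercise files.

```python
from scipy.stats import binomtest

# Hypothetical inputs: 45 purchases out of 500 visitors, null purchase rate of 10%
observed_successes, n, null_p = 45, 500, 0.1

# Map the exercise's option names to SciPy's names for the same alternatives
scipy_alternative = {
    'less': 'less',
    'not_equal': 'two-sided',
    'greater': 'greater',
}

# Exact p-value for each alternative hypothesis
reference_p_values = {
    name: binomtest(observed_successes, n, null_p, alternative=alt).pvalue
    for name, alt in scipy_alternative.items()
}

for name, p_value in reference_p_values.items():
    print(f"{name}: {p_value:.4f}")
```

A simulation-based function should return values near these, differing only by random simulation noise in the one-sided cases and possibly a little more in the two-sided case, where SciPy defines "as extreme" by outcome probability rather than by distance from the expected count.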


In [None]:
import numpy as np
#binomtest replaces binom_test, which was removed in SciPy 1.12
from scipy.stats import binomtest

def simulation_binomial_test(observed_successes, n, p):
  #initialize null_outcomes
  null_outcomes = []
  
  #generate the simulated null distribution
  for i in range(10000):
    simulated_monthly_visitors = np.random.choice(['y', 'n'], size=n, p=[p, 1-p])
    num_purchased = np.sum(simulated_monthly_visitors == 'y')
    null_outcomes.append(num_purchased)

  #calculate a lower-tail (one-sided) p-value
  null_outcomes = np.array(null_outcomes)
  p_value = np.sum(null_outcomes <= observed_successes)/len(null_outcomes) 
  
  #return the p-value
  return p_value

### Solution

In [None]:
import numpy as np
#binomtest replaces binom_test, which was removed in SciPy 1.12
from scipy.stats import binomtest

def simulation_binomial_test(observed_successes, n, p, alternative_hypothesis):
  #initialize null_outcomes
  null_outcomes = []
  
  #generate the simulated null distribution
  for i in range(10000):
    simulated_monthly_visitors = np.random.choice(['y', 'n'], size=n, p=[p, 1-p])
    num_purchased = np.sum(simulated_monthly_visitors == 'y')
    null_outcomes.append(num_purchased)

  null_outcomes = np.array(null_outcomes)

  #calculate the p-value for the requested alternative hypothesis
  if alternative_hypothesis == 'less':
    #lower tail: simulated counts at or below the observed count
    p_value = np.sum(null_outcomes <= observed_successes)/len(null_outcomes)
  elif alternative_hypothesis == 'greater':
    #upper tail: simulated counts at or above the observed count
    p_value = np.sum(null_outcomes >= observed_successes)/len(null_outcomes)
  elif alternative_hypothesis == 'not_equal':
    #both tails: simulated counts at least as far from the expected count (p*n) as the observed count
    difference = np.abs(p*n - observed_successes)
    upper = p*n + difference
    lower = p*n - difference
    p_value = np.sum((null_outcomes >= upper) | (null_outcomes <= lower))/len(null_outcomes)
  else:
    raise ValueError("alternative_hypothesis must be 'less', 'not_equal', or 'greater'")
  
  #return the p-value
  return p_value

#Test your function:
print('lower tail one-sided test:')
p_value1 = simulation_binomial_test(45, 500, .1, alternative_hypothesis = 'less')
print("simulation p-value: ", p_value1)

p_value2 = binomtest(45, 500, .1, alternative = 'less').pvalue
print("binomtest p-value: ", p_value2)

print('upper tail one-sided test:')
p_value1 = simulation_binomial_test(53, 500, .1, alternative_hypothesis = 'greater')
print("simulation p-value: ", p_value1)

p_value2 = binomtest(53, 500, .1, alternative = 'greater').pvalue
print("binomtest p-value: ", p_value2)

print('two-sided test:')
p_value1 = simulation_binomial_test(42, 500, .1, alternative_hypothesis = 'not_equal')
print("simulation p-value: ", p_value1)

#binomtest performs a two-sided test by default
p_value2 = binomtest(42, 500, .1).pvalue
print("binomtest p-value: ", p_value2)