In [None]:
# autoreload
%load_ext autoreload
%autoreload 2

# add src to path
import sys

# path relative to your notebook
sys.path.insert(0, '../src')

# Hypothesis Testing: Binomial Testing


Notes (Hook):

Anyone here put milk in your tea? Is this a popular thing to do in
Canada? My family usually doesn't put milk in our tea, but there's a
pretty famous scenario from statistics that goes something like this.


## Scenario

It's the 1920s and Ron and Muriel are coworkers. Muriel swears that when
she drinks tea and milk together, she can taste whether the milk was
poured in first or second (and she prefers the milk go in first in her
tea).

Ron is unconvinced (he thinks it shouldn't matter at all what order the
tea and milk enter the cup).

What kind of proof could satisfy Ron's skepticism?


## Objectives

- Recognize situations where the binomial distribution is applicable
- Design a hypothesis test using a binomial distribution
- Formulate null and alternative hypotheses
- Define the significance level ($ \alpha $) and its role in hypothesis
  testing
- Observe how the number of trials and the alpha interact


Notes:

For the next 20 minutes or so, we'll be looking at hypothesis testing,
specifically setting up a question so that we can put the binomial
distribution to work for us.

We'll also be leveraging `scipy.stats` to help us out with many of the
calculations.

We'll also be taking a few liberties with the historic situation. At the
end, I'll leave some links to some lovely articles that describe more
fully the original scenario.


## Choose a distribution that describes the situation


Notes:

We need a model that can help us quantify our judgement about the
situation.

You could make the assumption that each cup is like a Bernoulli trial.

If we assume that Muriel does no better than chance, then Muriel
correctly identifying a single teacup is kind of like flipping a single
fair coin and guessing correctly. She could have just made a single lucky
guess.

Which distribution would describe multiple Bernoulli trials?

The binomial distribution.
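
To make that concrete, here is a small sketch that lists the probability of
each possible number of correct guesses using `scipy.stats.binom`. The 8
cups and the 0.5 guessing probability are assumptions chosen for
illustration at this point in the talk.

```python
from scipy.stats import binom

n_cups = 8      # assumed number of cups (illustrative)
p_guess = 0.5   # chance of a correct guess if Muriel cannot tell the difference

guessing = binom(n=n_cups, p=p_guess)

# probability of exactly k correct guesses, for k = 0..n_cups
pmf_by_k = {k: guessing.pmf(k) for k in range(n_cups + 1)}
for k, prob in pmf_by_k.items():
    print(f"{k} correct: {prob:.4f}")
```

The probabilities sum to 1, and the extremes (0 or 8 correct) are the least
likely outcomes for someone who is only guessing.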




## Forming Hypothesis Statements

Are there two mutually exclusive possible outcomes in the tea debate?

What could they be?


In [None]:
# two mutually-exclusive possibilities
# muriel is guessing 
# muriel can do better than guessing

notes:

The first question we should probably ask is: does this scenario lend
itself to a hypothesis test? It does if we can name two mutually exclusive
possibilities:

- That Muriel can actually tell the difference.

- That if things were randomized and Muriel had to rely on taste alone,
  Muriel would do no better than chance.


### The Null Hypothesis

This is the skeptic's position. Your test will assume that this position
is correct and seek evidence to refute it.

When written in math terms, the null contains the equals sign, whether
$=$, $\ge$, or $\le$.

How can we phrase the skeptic's position (that Muriel does no better than
chance) in a mathematical way?


$H_0: p \le 0.5$


### The Alternate Hypothesis

This is the position where something new is discovered. This
statement and the null hypothesis will be mutually exclusive.

How can we phrase this in a formal way?


$H_a: p > 0.5$


## Significance level ($\alpha$)

How much proof do we need to reject the null in favor of the alternate?


Notes:

Just how much proof do we need before we decide that Muriel is doing
better than chance? What if a result this good happened by chance only 5
percent of the time?

Setting the threshold there means accepting a 5% chance of rejecting the
null on evidence that was really just luck. This 5% is often used, but
depending on the industry and application, you may see 1% or even 0.01%.


## Running the test with `scipy.stats`


- [scipy.stats.binomtest — SciPy v1.11.4
  Manual](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binomtest.html)


In [1]:
from scipy.stats import binomtest

null_prob = 0.5
trials = 8
accurate_predictions = 8
alpha = 0.05


muriel_result = binomtest(
    k=accurate_predictions,
    n=trials,
    p=null_prob,
    alternative="greater")  # alternate: p should be greater than chance (0.5)

print(muriel_result)

# test statistic is equal to the estimated proportion
# print(accurate_predictions/trials)

BinomTestResult(k=8, n=8, alternative='greater', statistic=1.0, pvalue=0.00390625)
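
The result still has to be compared against the significance level. A
minimal sketch of that decision step, re-creating the same test so the
snippet runs on its own (the name `reject_null` is introduced here for
illustration):

```python
from scipy.stats import binomtest

alpha = 0.05
result = binomtest(k=8, n=8, p=0.5, alternative="greater")

# reject the null only when the p-value falls below the significance level
reject_null = bool(result.pvalue < alpha)
print(f"p-value = {result.pvalue}, reject H_0: {reject_null}")
```

With 8 correct out of 8, the p-value is well under 0.05, so the skeptic's
position is rejected at this significance level.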


## How many cups must Muriel predict accurately?


Notes:

You should have some intuition that the more cups Muriel predicts accurately, the more evidence we gather that she's not just doing it by chance. So the more trials, the better.

But it should raise a few more questions. In the real world there are limitations to how many trials or samples we can gather. Perhaps there's only so much milk (or tea), or time to drink it, or Muriel can only stomach so much tea without becoming queasy.

There's also the question of what happens if Muriel misses one.

Intuitively, missing one out of 50 trials wouldn't be as big a deal as missing one out of only a handful of trials.

How many cups must Muriel predict in order to bring enough evidence that
we would believe she's doing better than chance?

Let's look at a brute-force method to examine the situation more closely.


In [2]:
# custom function
from src.binom_demo import crit_value

crit_value(trials)

For 8 trials
accurate	p_value
0 		 1.0
1 		 0.99609375
2 		 0.96484375
3 		 0.85546875
4 		 0.63671875
5 		 0.36328125
6 		 0.14453125
7 		 0.03515625
Critical value: 7
	Muriel must accurately identify at least 7 cups
	in order to demonstrate significance.
8 		 0.00390625
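
The source of `crit_value` isn't shown here. As a rough idea of what such a
brute-force search might look like, the sketch below defines a hypothetical
helper, `critical_value_sketch`, which is not the function imported above.
It assumes a one-sided "greater" test and returns the smallest number of
correct cups whose p-value falls below alpha.

```python
from scipy.stats import binomtest


def critical_value_sketch(n_trials, null_prob=0.5, alpha=0.05):
    """Hypothetical helper: smallest count of correct cups whose one-sided
    p-value is below alpha, or None if even a perfect score falls short."""
    for k in range(n_trials + 1):
        p_value = binomtest(k=k, n=n_trials, p=null_prob,
                            alternative="greater").pvalue
        # p-values shrink as k grows, so the first hit is the smallest k
        if p_value < alpha:
            return k
    return None


# how the critical value moves as the number of trials changes
for n in (4, 5, 8, 10, 20):
    print(n, critical_value_sketch(n))
```

With the defaults, four cups is too few: even a perfect score has a p-value
of 0.0625, so the sketch returns `None`. Passing a smaller `alpha` raises
the number of cups Muriel has to get right.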


## A Brief Aside

You can also use `scipy.stats.binom` to do this job.

Documentation

- [scipy.stats.binom — SciPy v1.11.4
  Manual](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html)


In [3]:
import numpy as np
from scipy.stats import binom

# print(f"trials = {trials}")
# print(f"accurate_predictions = {accurate_predictions}")
# print(f"alpha = {alpha}")
# print(f"null_prob = {null_prob}\n")

guessing_dist = binom(n=trials, p=null_prob)

# the probability mass of randomly guessing this many or more
print(guessing_dist.pmf(accurate_predictions)
      + guessing_dist.sf(accurate_predictions))

# muriel_result = binomtest(
#     k=accurate_predictions,
#     n=trials,
#     p=null_prob,
#     alternative="greater")

print(muriel_result)

0.00390625
BinomTestResult(k=8, n=8, alternative='greater', statistic=1.0, pvalue=0.00390625)
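
The same tail probability can be written with a single call. Since `sf(x)`
is $P(X > x)$, shifting the argument down by one gives $P(X \ge k)$. A
short standalone sketch, with the values repeated so it runs on its own:

```python
from scipy.stats import binom

guessing = binom(n=8, p=0.5)
k = 8

# P(X >= k) written two ways
two_terms = guessing.pmf(k) + guessing.sf(k)
one_term = guessing.sf(k - 1)  # sf(k - 1) is P(X > k - 1), i.e. P(X >= k)
print(two_terms, one_term)
```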


## Thinking through scenarios


### Scenario 1

Three siblings decide to randomly select who has to do a chore around the
house each day. After 5 days, the eldest sibling still hasn't been
selected for a chore. The other two siblings are starting to get
suspicious. What does a hypothesis test say about the situation?


In [5]:
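sibling_result = binomtest(
    k=0,   # the eldest sibling was never selected
    n=10,  # the 10-day follow-up; use n=5 for the original 5-day question
    p=1/3,
    alternative='less')

print(sibling_result)
# H_0: the eldest sibling is chosen with p >= 1/3
# H_a: the eldest sibling is chosen with p < 1/3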

BinomTestResult(k=0, n=10, alternative='less', statistic=0.0, pvalue=0.01734152991583262)


notes:

- k=0 (no turns)
- n=5 (5 days)
- p=1/3 (expected probability of being chosen)
- alternative="less" (the alternative is that the true probability is
  less than 1/3)

The alternative is that the eldest sibling has less of a chance of getting chosen than the siblings claim.

What if they still haven't had a turn after 10 days?
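
One way to answer both versions of the question is to run the test for 5
days and for 10 days side by side. This is a small sketch; the names
`p_chosen` and `p_values` are introduced here for illustration.

```python
from scipy.stats import binomtest

p_chosen = 1 / 3  # probability of being picked on any day if selection is fair
p_values = {}

for days in (5, 10):
    result = binomtest(k=0, n=days, p=p_chosen, alternative="less")
    p_values[days] = result.pvalue
    print(f"{days} days without a chore: p-value = {result.pvalue:.4f}")
```

With zero selections the p-value is simply $(2/3)^n$: roughly 0.13 after 5
days and roughly 0.017 after 10. At $\alpha = 0.05$, only the 10-day streak
is strong enough evidence to reject the null.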


### Scenario 2

A company claims their new advertising strategy has increased the
click-through rate on their website to 10%. You collect a sample of 150
website visits and find that 13 of them resulted in a click.

You want to test whether the click-through rate is different from the
claimed 10%.


In [None]:
click_through = None

click_through

notes:

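- null: the click-through probability is 10%
- alternate: the probability is higher or lower than 10%

```python
click_through = binomtest(
    k=13,   # clicks observed in the sample
    n=150,  # website visits sampled
    p=.1,   # claimed click-through rate
    alternative='two-sided')

print(click_through)

```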


## Conclusion

We have:

- Recognized situations where the binomial distribution is applicable
- Designed a hypothesis test using a binomial distribution
- Formulated null and alternative hypotheses
- Defined the significance level ($ \alpha $) and its role in hypothesis
  testing
- Observed how the number of trials and the alpha interact


## Looking Forwards

- Look at situations where the hypothesis test calls for other
  distributions
- Review the central limit theorem
  - specifically how the CLT enables us to use the normal distribution
    for many sample-based hypothesis tests
- Review `scipy.stats.norm`
- Calculate power
- Calculate sample size


## For further reading

- [Ronald Fisher, a Bad Cup of Tea, and the Birth of Modern Statistics |
  Science History
  Institute](https://sciencehistory.org/stories/magazine/ronald-fisher-a-bad-cup-of-tea-and-the-birth-of-modern-statistics/)

- [Tea for three: Of infusions and inferences and milk in first - Senn -
  2012 - Significance - Wiley Online
  Library](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2012.00620.x)
