# STA130 Tutorial 5: Hypothesis Testing

|<img src="https://pictures.abebooks.com/inventory/md/md31377899338.jpg" alt="Scientific Revolutions" style="width: 300px; height: 450px;"/>|<img src="https://i.ytimg.com/vi/Yn8cCDtVd5w/maxresdefault.jpg" alt="Kuhn Cycle" style="width: 800px; height: 450px;"/>|
|-|-|
| | |

# Today's Agenda (5 minutes)

- Tutorial Activity: Stella McStat's "Wheel of Destiny"

    - **Parameters (of populations)** versus **Statistics (of samples)**
    - **Independent** Samples, **Sampling Distributions**, **p-values**, etc.

- Lecture/Example: Hypothesis test walk-through
    - Code Examples to facilitate Terminology Explanation and Discussion
    - Evidence against the Null Hypothesis
    - Alternative forms of **Sampling Distributions**, "as or more extreme...", etc.


- Tutorial Assignment: Mini-Homework Hypothesis Testing Practice


## The Wheel of Destiny (1 minute) [Click "Down" next]

Stella McStat had been running a small-time gambling operation on campus for several months during her first year at U of T. It was disrupted during COVID, but now that courses seem to be reliably back to in-person formats, Stella is getting things back up and running. **Stella wants to determine if her game is "fair"** (even if somewhat illegal).

| <img src="stella2.png" style="height: 450px;"/> |  <img src="fair.png" style="height: 450px;"/> |
|-|-|
| | |

<sub><sup>Adapted from Lawton, L. (2009) An Exercise for Illustrating the Logic of Hypothesis Testing, Journal of Stat. Education, 17(2).</sup></sub>


## The Wheel of Destiny (1 minute) [Click "Down" next]

For each spin of the wheel, two gamblers take part. For a toonie each (\\$2 Canadian), Stella sells one a red ticket and one a black ticket  (i.e., total \\$4). Then Stella spins the Wheel of Destiny. The person who holds the colour on which the spinner stops gets \\$3.50 (Stella keeps \\$0.50 per spin for running the game and providing snacks).
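
The money flow for one spin can be checked with a few lines of arithmetic. This is only a sketch: the names `TICKET_PRICE`, `PAYOUT`, and `expected_net_winnings` are illustrative and not part of the original exercise.

```python
# Per-spin money flow for the Wheel of Destiny (amounts in Canadian dollars).
TICKET_PRICE = 2.00  # each of the two gamblers pays a toonie
PAYOUT = 3.50        # paid to whoever holds the winning colour

def expected_net_winnings(p_win):
    """Expected net result of one spin for a gambler whose colour wins with probability p_win."""
    return p_win * PAYOUT - TICKET_PRICE

# Stella collects two tickets and pays one prize, whichever colour comes up.
stella_take = 2 * TICKET_PRICE - PAYOUT
fair_red = expected_net_winnings(0.5)    # red ticket holder on a fair wheel
fair_black = expected_net_winnings(0.5)  # black ticket holder on a fair wheel

print('Stella keeps', stella_take, 'per spin')
print('Expected net winnings on a fair wheel:', fair_red, '(red) and', fair_black, '(black)')
```

Calling `expected_net_winnings` with a value other than 0.5 shows how an unfair wheel would shift money between the two gamblers while leaving Stella's share unchanged.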

| <img src="stella2.png" style="height: 450px;"/> |  <img src="fair.png" style="height: 450px;"/> |
|-|-|
| | |

<sub><sup>Adapted from Lawton, L. (2009) An Exercise for Illustrating the Logic of Hypothesis Testing, Journal of Stat. Education, 17(2).</sup></sub>


## The Wheel of Destiny (1 minute) [Click "Down" next]


Stella just bought a new spinner, the critical piece of equipment for this game. She's heard some mixed reviews about the manufacturer she has purchased from. Before she begins using this spinner, she wants to make sure that it is, in fact, fair: she wants both colours to come up equally often. Because of the set-up of the game, Stella has no incentive to cheat and wants the game to be as fair as possible.

| <img src="stella2.png" style="height: 450px;"/> |  <img src="fair.png" style="height: 450px;"/> |
|-|-|
| | |

<sub><sup>Adapted from Lawton, L. (2009) An Exercise for Illustrating the Logic of Hypothesis Testing, Journal of Stat. Education, 17(2).</sup></sub>


## The Wheel of Destiny (1 minute) 


Everything she can examine about the wheel seems fine; there is the same number of sectors of each colour and they each have the same area. BUT! Stella has a great idea and decides to come to YOU, her statistical guru, and ask you to verify that the new spinner is fit to use.

| <img src="stella2.png" style="height: 450px;"/> |  <img src="fair.png" style="height: 450px;"/> |
|-|-|
| | |

<sub><sup>Adapted from Lawton, L. (2009) An Exercise for Illustrating the Logic of Hypothesis Testing, Journal of Stat. Education, 17(2).</sup></sub>


## Tutorial Activity  (15 minutes) [Click "down" next]<br>Submit answers to the following questions/prompts 

- *Write and submit as a group of 3 or 4 students (with all your names) the following:* 
- How can we examine the wheel for fairness?
    - What's a **null hypothesis** here? 
        - Hint: the alternative hypothesis is just "$H_1: H_0 \text{ is False}$"
- What's "data" here? 
    - What's a **sample** here?
        - Hint: the **population** would be every spin result ever 
    - What's the difference between a **parameter** and a **statistic**?
    - What's the difference between a **dependent** and **independent** sample?
- How could we go about conducting a simulation-based hypothesis test here?
    - What is the definition of a **p-value**?
    - *Describe the process of using simulation to create a p-value for this problem*

## Answers Review (10 minutes)

- How can we examine the wheel for fairness? **Use a Hypothesis Test:**

$$
\begin{align}
H_0 &{}: {} p_r = p_b = 0.5 \quad\text{probability of a spin coming up red or black is equal}\\
H_1 &{}: {} p_r \neq p_b \quad\text{probability of a spin coming up red isn't the same as black (so } p_r \neq 0.5\text{)}
\end{align}$$

- We could perform actual spins and calculate the proportion of landing red
    - **How many spins would we perform**? 
    - The observed proportions of red would be our **statistic**
    - The actual true chance of a red spin would be our **parameter**
    - If each spin result doesn't affect the others, the spins are **independent**
    
    
- How would/could we conduct a simulation-based hypothesis test here?
    - p-value: [TYPE OUT DEFINITION WITH HELP FROM CLASS]
    - *Describe/Discuss the simulation process for creating a p-value!*
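
One way to approach the "how many spins" question is to look at how much the proportion of red typically varies from sample to sample. For independent spins with red probability $p$, the standard deviation of the sample proportion is $\sqrt{p(1-p)/n}$. The sketch below tabulates this for a few sample sizes; the helper name `standard_error` and the chosen sample sizes are illustrative assumptions.

```python
import numpy as np

def standard_error(p, n):
    """Standard deviation of the sample proportion for n independent spins with red probability p."""
    return np.sqrt(p * (1 - p) / n)

p_null = 0.5  # parameter value assumed under the null hypothesis
for spins in [10, 100, 1000, 10000]:
    print(spins, 'spins: the proportion of red typically varies by about',
          round(standard_error(p_null, spins), 4), 'around', p_null)
```

Each tenfold increase in spins shrinks the typical variation by a factor of about 3.16, so more spins make it easier to tell a slightly unfair wheel from a fair one, at the cost of more spinning.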

## Code I: Parameters, Samples, Statistics (10 minutes)
- Discuss how line 6 specifies the **parameter** of a **population** while line 8 calculates a **statistic** of a **sample** (which lines 9 and 10 then report)
    - Try out some different **populations** (i.e., different **parameter** values)
    - **Differentiate this exercise from specifying a *null hypothesis***
- Discuss/Explore what changing the seed in line 4 does
- Discuss/Explore question on line 3
- Discuss `replace=True` on line 7 and **independence**

In [2]:
# Simulating an observed test statistic
import numpy as np
spins = 100  # LINE 3: chose 100 spins, pros/cons of doing more/less?
np.random.seed(56) # LINE 4: Experiment with changing this
# What if the spinner wasn't actually fair and slightly favored red (say 51:49)...
spin_results = np.random.choice(['Red', 'Black'], p=[0.51, 0.49], # LINE 6: Truth we don't know
                                size=spins, replace=True)         # LINE 7: not a null hypothesis
observed_test_stat = (spin_results == 'Red').sum() / spins
print('Our test statistic is ' + str(observed_test_stat) + ', meaning we spun ' +      # LINE 9
      str(int(observed_test_stat * spins)) + ' reds out of ' + str(spins) + ' spins.') # LINE 10

Our test statistic is 0.54, meaning we spun 54 reds out of 100 spins.
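
To explore what changing the seed does, the same sampling step can be repeated for several seeds. This is a sketch under stated assumptions: the helper `observed_proportion_red` is hypothetical, and it uses `np.random.default_rng` rather than `np.random.seed`, so its values will not reproduce the 0.54 shown above.

```python
import numpy as np

def observed_proportion_red(seed, spins=100, p_red=0.51):
    """Proportion of red in one sample of spins drawn with the given seed."""
    rng = np.random.default_rng(seed)
    spin_results = rng.choice(['Red', 'Black'], p=[p_red, 1 - p_red],
                              size=spins, replace=True)
    return (spin_results == 'Red').mean()

# Same population (same parameter), different seeds: the statistic changes, the parameter does not.
stats_by_seed = {seed: observed_proportion_red(seed) for seed in range(5)}
for seed, stat in stats_by_seed.items():
    print('seed', seed, '-> proportion of red =', stat)
```

Re-running with the same seed returns the same statistic, which is why fixing the seed makes the slide reproducible.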


## Things to Consider: Sampling Distributions (5 minutes)

Simulate the sampling distribution of the proportion of "Red" spins in a sample under a 50/50 assumption for the null hypothesis. 
> Using `np.random.choice` makes no distributional assumption; it only uses the assumption that the parameter $p=0.5$ (which implies that the chance of getting "Red" is the same as "Black").<br>**Using no distributional assumptions (like normality) and just parameter assumptions (strangely?) makes this <u>Nonparametric</u>**

- What's the difference between `p=[0.51, 0.49]` on the previous slide<br>as opposed to `p=[0.5, 0.5]` on the next slide?
<!-- Previously we imagined the spinner was not actually fair; but, this is different than a null hypothesis assumption that the spinner is fair, which is the assumption we use to generate our sampling distribution on the next slide --> 
- Are the *simulated spins* on the next slide **independent**? <br>Are the *simulated draws from the sampling distribution of $\hat p$* **independent**? <!-- Yes, each simulated spin and sample does not affect the other simulated spins/samples --> 
    - ["Double Click" for Answers]

# Code II: Sampling Distributions (10 minutes)

In [4]:
import pandas as pd; import plotly.express as px; import numpy as np
num_samples,sample_size = 1000,100  # What are each of these? Do the choices here change results? 
simulated_sample_proportions = []
np.random.seed(seed=1)
for i in range(num_samples):
    sample = np.random.choice(['Red', 'Black'], size=sample_size, p=[0.5, 0.5], replace=True)
    simulated_sample_proportions += [(sample == 'Red').sum() / sample_size]
fig = px.histogram(pd.DataFrame({'Proportion of spins that are Red': simulated_sample_proportions}), x='Proportion of spins that are Red', 
             title=str(num_samples)+' draws from the Sampling Distribution of the "Proportion of Red" (for a sample of size n='+str(sample_size)+' spins)'); fig.show()
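
The loop above can also be written without an explicit `for` loop. The sketch below simulates all spins at once and compares the spread of the simulated proportions with the value $\sqrt{0.5 \cdot 0.5 / n}$ expected under the null hypothesis. It assumes NumPy's `default_rng` generator and encodes "Red" as `True`, so its draws differ from the seeded ones in the cell above.

```python
import numpy as np

rng = np.random.default_rng(1)
num_samples, sample_size = 1000, 100

# Each row is one simulated sample of spins; True means the spin came up red (50/50 under H0).
spins_are_red = rng.random((num_samples, sample_size)) < 0.5
simulated_props = spins_are_red.mean(axis=1)  # one proportion of red per sample

simulated_sd = simulated_props.std(ddof=1)
theoretical_sd = np.sqrt(0.5 * 0.5 / sample_size)
print('Spread of simulated proportions:', round(simulated_sd, 4))
print('Spread expected under the null hypothesis:', round(theoretical_sd, 4))
```

Changing `sample_size` changes the width of the sampling distribution, while changing `num_samples` only changes how precisely the simulation traces out that distribution.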

### Code III: p-value and evidence against the null hypothesis (10 minutes)

In [41]:
#observed_test_stat=0.54 # Are we just "choosing" this value? Or where would it come from?
num_as_or_more_extreme = \
  (abs(np.array(simulated_sample_proportions) - 0.5) >= abs(observed_test_stat - 0.5)).sum()
p_value = num_as_or_more_extreme / num_samples
print(str(num_as_or_more_extreme)+' out of ' +str(num_samples)+' simulated statistics were "as or more extreme" (relative to the Null Hypothesis\nassumed p=0.5 parameter value) than the observed test statistic of ' + str(observed_test_stat) + ' giving us a p-value of ' + str(p_value))

459 out of 1000 simulated statistics were "as or more extreme" (relative to the Null Hypothesis
assumed p=0.5 parameter value) than the observed test statistic of 0.54 giving us a p-value of 0.459
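
In symbols, the calculation above can be summarized as follows. The notation is an assumption made for this summary: $\hat p_{\text{obs}}$ is the observed test statistic, $\hat p^{(i)}$ is the $i$-th simulated proportion, and $N$ is `num_samples`.

$$
\begin{align}
\text{p-value} &{}= \Pr\left(|\hat p - 0.5| \geq |\hat p_{\text{obs}} - 0.5| \;\middle|\; H_0\right)\\
&{}\approx \frac{\#\left\{i : |\hat p^{(i)} - 0.5| \geq |\hat p_{\text{obs}} - 0.5|\right\}}{N} = \frac{459}{1000} = 0.459
\end{align}$$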


| | |
|-|-|
|<img src="https://www.jcpcarchives.org/userfiles/values-of-p-Inference.jpg" /> | [WRITE OUT CONCLUSIONS AS A CLASS FOR DIFFERENT SCENARIOS]<br>num_samples,sample_size = 1000,100; observed_test_stat=0.54<br>num_samples,sample_size = 10000,1000; observed_test_stat=0.54<br>num_samples,sample_size = 10000,1000; observed_test_stat=0.51|

- Change the code on the previous slide to re-simulate the Sampling Distribution
- See if the p-value looks approximately correct given where the observed test statistic is relative to the simulated sampling distribution of the null hypothesis
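
For trying several scenarios quickly, the simulation and p-value steps can be wrapped in one function. This is a sketch, not the tutorial's own code: the function name `simulated_p_value` is hypothetical, it draws the number of reds directly with `rng.binomial` as a shortcut for counting reds among independent spins, and it adds a tiny tolerance so that exact ties are not lost to floating-point round-off.

```python
import numpy as np

def simulated_p_value(observed_test_stat, sample_size, num_samples, p_null=0.5, seed=1):
    """Two-sided simulation-based p-value for a proportion under H0: p = p_null."""
    rng = np.random.default_rng(seed)
    # Number of reds in `sample_size` independent spins, repeated `num_samples` times
    red_counts = rng.binomial(n=sample_size, p=p_null, size=num_samples)
    simulated_stats = red_counts / sample_size
    # Tolerance keeps exact ties (e.g. 0.46 versus 0.54) counted as "as extreme"
    as_or_more_extreme = (np.abs(simulated_stats - p_null)
                          >= np.abs(observed_test_stat - p_null) - 1e-12)
    return as_or_more_extreme.mean()

# (num_samples, sample_size, observed_test_stat) for the three scenarios listed above
scenarios = [(1000, 100, 0.54), (10000, 1000, 0.54), (10000, 1000, 0.51)]
p_values = [simulated_p_value(obs, n, reps) for reps, n, obs in scenarios]
for (reps, n, obs), p in zip(scenarios, p_values):
    print('num_samples =', reps, '| sample_size =', n,
          '| observed_test_stat =', obs, '| p-value =', p)
```

Because the seed and generator differ from the earlier cells, the first scenario will not reproduce 0.459 exactly, but it should land in the same neighbourhood.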

<!-- Since our p-value is more than 0.10, we have no evidence against the null hypothesis that the probability of spinning red is 0.5 -->


#### p-values: Different Sampling Distributions (10 minutes) [Click "down" for the next few slides]

- Compare and contrast the figures on the next three slides<br>[and don't worry about the code so much]:
    - Simulated (**nonparametric**) Sampling Distribution
    - Theoretical **nonparametric** Binomial Distribution 
        - Only assumption used is the $p=0.5$ parameter value assumption of the null hypothesis (and spin **independence**<br>so each spin result has no impact on future spin results)
        - [**This "No Distributional Assumption" is called *nonparametric***...]
    - Approximate **parametric** t-test (assumes Normally Distributed data)
        - Assuming the data is Normally Distributed is what makes this **parametric** (and since that assumption is wrong for red/black spin data, the result is only approximate)
        - The $p=0.5$ parameter value and **independence** assumptions are not what make it **parametric**

*You'll examine these different Sampling Distributions again in the Tutorial Assignment!*


In [81]:
import pandas as pd; import plotly.express as px; import numpy as np
num_samples,sample_size = 1000,100; simulated_sample_proportions = []; np.random.seed(seed=1)
for i in range(num_samples):
    sample = np.random.choice(['Red', 'Black'], size=sample_size, p=[0.5, 0.5], replace=True)
    simulated_sample_proportions += [(sample == 'Red').sum() / sample_size]
observed_test_stat=0.54; num_as_or_more_extreme = \
  (abs(np.array(simulated_sample_proportions) - 0.5) >= abs(observed_test_stat - 0.5)).sum()
p_value = num_as_or_more_extreme / num_samples
print(str(num_as_or_more_extreme)+' out of ' +str(num_samples)+' simulated statistics were "as or more extreme" (relative to the Null Hypothesis\nassumed p=0.5 parameter value) than the observed test statistic of ' + str(observed_test_stat) + ' giving us a p-value of ' + str(p_value))
fig = px.histogram(pd.DataFrame({'Proportion of spins that are Red': simulated_sample_proportions}), #nbins=100,
                   title=str(num_samples)+' draws from the Sampling Distribution of the "Proportion of Red" (for a sample of size n='+str(sample_size)+' spins)', x='Proportion of spins that are Red')
fig.add_vline(x=observed_test_stat, line_dash="dash", line_color="black", annotation_text="observed_test_stat ("+str(observed_test_stat)+")"); fig.show()

459 out of 1000 simulated statistics were "as or more extreme" (relative to the Null Hypothesis
assumed p=0.5 parameter value) than the observed test statistic of 0.54 giving us a p-value of 0.459


In [88]:
# Theoretically Exact (Binomial) Sampling Distribution
from scipy import stats; sample_size=100; x=np.arange(0,sample_size+1); prob=stats.binom(n=sample_size, p=0.5).pmf(x) # setup for binomial distribution (x covers 0 to 100 reds inclusive)
fig = px.bar(pd.DataFrame({'$\\hat p$':x/sample_size,'probability':prob}), x='$\\hat p$', y='probability',
                title='$\\text{Theoretically Exact (Binomial) Sampling Distribution of } \\hat p_r \\text{ assuming "}H_0 \\text{: 50-50 chance of red"}$')
fig.add_vline(x=observed_test_stat, line_dash="dash", line_color="black",
              annotation_text="observed_test_stat ("+str(observed_test_stat)+")")
# Theoretically Exact (Binomial) Sampling Distribution p-value calculation
p_value = (1-stats.binom(n=100, p=0.5).cdf(54-1))*2  # "two sided" p-value
# This calculates "as or more extreme" as the sum of all the probabilities (bin heights) that are located at 
# 54/100, 55/100, 56/100, ..., up to 100/100 (and then multiplies this sum by two since this distribution is symmetric)
print('Our theoretical p-value calculation based on the theoretical binomial distribution is ' + str(np.round(p_value, 5))); fig.show()
# uncomment `nbins=100` and set `num_samples=100000` in the previous histogram to see that
# the simulation estimates this Theoretically Exact (Binomial) Sampling Distribution...

Our theoretical p-value calculation based on the theoretical binomial distribution is 0.48412
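
As a cross-check, the same two-sided p-value can be obtained from SciPy's built-in exact binomial test. The sketch below assumes a SciPy version that provides `scipy.stats.binomtest` (1.7 or later) and compares its result with the manual tail calculation used above.

```python
from scipy import stats

observed_reds, sample_size = 54, 100

# Built-in exact binomial test of H0: p = 0.5 against a two-sided alternative
result = stats.binomtest(k=observed_reds, n=sample_size, p=0.5, alternative='two-sided')

# Manual version: P(X >= 54) doubled, using the survival function sf(k) = P(X > k)
manual_p_value = 2 * stats.binom(n=sample_size, p=0.5).sf(observed_reds - 1)

print('binomtest p-value:', round(result.pvalue, 5))
print('manual p-value:   ', round(manual_p_value, 5))
```

The two agree here because the null distribution is symmetric when $p=0.5$; for other null values, doubling one tail and the rule `binomtest` uses need not coincide.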


In [98]:
# Continuous Approximation to the Theoretical (Binomial) Sampling Distribution
np.random.seed(56); spins = 100; x = np.linspace(0,1,200); spin_results = np.random.choice(['Red', 'Black'], p=[0.51, 0.49], size=spins, replace=True)
n = len(spin_results); spin_results_numeric = np.where(spin_results == 'Red', 1, 0) # convert red spins to True/1 and black spins to False/0
dens = stats.t(loc=0.5, df=n-1, scale=np.std(spin_results_numeric, ddof=1)/n**0.5) # Another possible approximation could be based on `dens=stats.norm(loc=0.5, scale=0.5/100**0.5)`
# p-value calculations: "as or more extreme as" the area under the curve from 54/100 to 100/100 (and multiply this sum by two since this distribution is symmetric)
p_value = (1-dens.cdf(54/100))*2 # Area under the curve from 54/100 to 100/100 for the continuous approximation to the binomial distribution
p_value2 = (1-dens.cdf((54-0.5)/100))*2 # The '-0.5' is an adjustment to the discontinuous ("probability bars") binomial distribution when approximating it as the continuous ("smooth line") normal distribution 
print('Approximate-theoretical parametric p-value calculation based on a continuous approximation of\nthe theoretical binomial distribution is ' 
      + str(np.round(p_value,5)) + ' ("continuity corrected" adjusted to ' + str(np.round(p_value2,5))+").")
print(stats.ttest_1samp(spin_results_numeric, .5)) # scipy has built in ttest function
fig = px.line(pd.DataFrame({'x':x, 'density':dens.pdf(x)}), x='x', y='density', title='$$\\text{Continuous Approximation to the Theoretical (Binomial) Sampling Distribution (under } H_0: p=0.5\\text{)}$$'); fig.add_vline(x=observed_test_stat, line_dash="dash", line_color="black", annotation_text="observed_test_stat ("+str(observed_test_stat)+")"); fig.show()

Approximate-theoretical parametric p-value calculation based on a continuous approximation of
the theoretical binomial distribution is 0.42646 ("continuity corrected" adjusted to 0.48636).
TtestResult(statistic=0.7985494095046912, pvalue=0.4264632540527483, df=99)
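
For comparison, a Normal approximation can be built from the null hypothesis alone, using the standard error $\sqrt{0.5 \cdot 0.5 / 100} = 0.05$ instead of the sample standard deviation used by the t-test. This is a sketch of that alternative, not the calculation performed above; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

sample_size, observed_reds, p_null = 100, 54, 0.5

# Standard error of the sample proportion if the null hypothesis is true
standard_error = np.sqrt(p_null * (1 - p_null) / sample_size)
null_normal = stats.norm(loc=p_null, scale=standard_error)

# Two-sided p-values: upper-tail area doubled, without and with the continuity correction
p_value_normal = 2 * null_normal.sf(observed_reds / sample_size)
p_value_normal_cc = 2 * null_normal.sf((observed_reds - 0.5) / sample_size)

print('Normal approximation p-value:', round(p_value_normal, 5))
print('With continuity correction:  ', round(p_value_normal_cc, 5))
```

The continuity-corrected value is the one to compare against the exact binomial p-value, since it accounts for the binomial probability sitting in discrete bars rather than under a smooth curve.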


# Tutorial Assignment (Begin in time remaining...)
More hypothesis testing, sampling and p-values with Fisher's Tea Experiment.
- Include your code and written solutions in a `.ipynb` notebook. A starter file is provided on MarkUs.
    - Annotate your notebook with comments where it's helpful for understanding what you're doing.

### Notes on approaching the writing questions
- Use full sentences
- Grammar is *not* the main focus of the assessment, but it is important that you communicate in a clear and professional manner (without slang or emojis) 
