# (Statistical) Hypothesis Testing

Specifying 

```python
mu,std = 8,1
normal_population = stats.norm(loc=mu, scale=std)
```

or, mathematically

$$\huge \mathcal{N}(\mu, \sigma) = \mathcal{N}(8, 1)$$

implies the **null hypothesis**

$\huge
\require{enclose}
\begin{align}
H_0 &: {} \mu = 8 \quad\text{(null hypothesis)}\\
 &\color{white}{:} {} \sigma = 1\\
 &\color{white}{:} {} \text{and }  \text{the population is normally distributed}\\
 &\color{white}{:} {} (\text{and our sample size is }\enclose{horizontalstrike}{325} 321 \text{ [because NAs]})\\\\
 H_1 &:{} H_0 \text{ is } \text{False} \quad\text{(alternative hypothesis)}\\\\
\end{align}$

This is a **parametric** null hypothesis(!) because it states that the population from which the sample was drawn is normally distributed.




In [10]:
from scipy import stats
mu,std = 8,1
normal_population = stats.norm(loc=mu, scale=std)

In [2]:
import pandas as pd
amazonbooks = pd.read_csv("amazonbooks.csv", encoding="ISO-8859-1")
amazonbooks

Unnamed: 0,Title,Author,List Price,Amazon Price,Hard_or_Paper,NumPages,Publisher,Pub year,ISBN-10,Height,Width,Thick,Weight_oz
0,"1,001 Facts that Will Scare the S#*t Out of Yo...",Cary McNeal,12.95,5.18,P,304.0,Adams Media,2010.0,1605506249,7.8,5.5,0.8,11.2
1,21: Bringing Down the House - Movie Tie-In: Th...,Ben Mezrich,15.00,10.20,P,273.0,Free Press,2008.0,1416564195,8.4,5.5,0.7,7.2
2,100 Best-Loved Poems (Dover Thrift Editions),Smith,1.50,1.50,P,96.0,Dover Publications,1995.0,486285537,8.3,5.2,0.3,4.0
3,1421: The Year China Discovered America,Gavin Menzies,15.99,10.87,P,672.0,Harper Perennial,2008.0,61564893,8.8,6.0,1.6,28.8
4,1493: Uncovering the New World Columbus Created,Charles C. Mann,30.50,16.77,P,720.0,Knopf,2011.0,307265722,8.0,5.2,1.4,22.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...
320,Where the Sidewalk Ends,Shel Silverstein,18.99,12.24,H,192.0,HarperCollins,2004.0,60572345,9.3,6.6,1.1,24.0
321,White Privilege,Paula S. Rothenberg,27.55,27.55,P,160.0,Worth Publishers,2011.0,1429233443,9.1,6.1,0.7,8.0
322,Why I wore lipstick,Geralyn Lucas,12.95,5.18,P,224.0,St Martin's Griffin,2005.0,031233446X,8.0,5.4,0.7,6.4
323,"Worlds Together, Worlds Apart: A History of th...",Robert Tignor,97.50,97.50,P,480.0,W. W. Norton & Company,2010.0,393934942,10.7,8.9,0.9,14.4


In [7]:
# but we have missing data!!
amazonbooks.isnull().sum()

Title            0
Author           1
List Price       1
Amazon Price     0
Hard_or_Paper    0
NumPages         2
Publisher        1
Pub year         1
ISBN-10          0
Height           4
Width            5
Thick            1
Weight_oz        9
dtype: int64

In [9]:
amazonbooks.Height.notnull().sum()

321

In [4]:
import plotly.express as px
fig = px.histogram(amazonbooks, x='Height', nbins=15)
fig.show()
# our actual observed sample

In [5]:
# sample mean x-bar and sample standard deviation s
amazonbooks.Height.mean(),amazonbooks.Height.std()

(8.163239875389408, 0.918739337573495)

In [15]:
amazonbooks.Height.dropna()

0       7.8
1       8.4
2       8.3
3       8.8
4       8.0
       ... 
320     9.3
321     9.1
322     8.0
323    10.7
324     7.8
Name: Height, Length: 321, dtype: float64

# Simulate the sampling distribution of (the statistic) $\bar x$
- under the null hypothesis 
- (for a sample of size 321)

A **statistic** is any function of a sample of data (such as an average).

The sampling distribution of $\bar x$ (for a given sample size under consideration) is the distribution of the averages of samples (of that sample size).
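
As a rough sketch of this definition, the simulation below draws many samples of size 321 from the hypothesized population and keeps the average of each one. It uses NumPy's random generator rather than the `normal_population` object, and the seed, the number of repetitions, and the names `rng`, `samples`, and `sample_means` are illustrative assumptions rather than part of the notebook. The spread of the averages should come out close to $\sigma/\sqrt{n}$, which is much narrower than the spread of any single sample.

```python
import numpy as np

mu, std, n = 8, 1, 321   # null hypothesis parameters and sample size
reps = 5000              # number of simulated samples (illustrative choice)

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# each row is one simulated sample of size n drawn under the null hypothesis
samples = rng.normal(loc=mu, scale=std, size=(reps, n))

# the statistic (here the average) is a function of each sample
sample_means = samples.mean(axis=1)

print(sample_means.mean())        # close to mu
print(sample_means.std(ddof=1))   # close to std / sqrt(n)
print(std / np.sqrt(n))           # theoretical standard error of x-bar
```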

In [54]:
# random variable samples
n = 321
x = normal_population.rvs(size=n)
x.mean()

8.040202928195672

In [56]:
simulated_means = []
reps = 10000
for i in range(reps):
    # the loop body must be indented; otherwise Python raises
    # `IndentationError: expected an indented block after 'for' statement on line 2`
    x = normal_population.rvs(size=n)  # one simulated sample under the null
    simulated_means.append(x.mean())   # record its sample mean

In [59]:
# example of one sample simulated under the null hypothesis
fig = px.histogram(pd.DataFrame({'Simulated Sample': x}), x='Simulated Sample', nbins=15)
fig.show()


In [60]:
# (simulated) sampling distribution of x-bar under the null hypothesis
fig = px.histogram(pd.DataFrame({'Simulated Means': simulated_means}), x='Simulated Means', nbins=15)
fig.show()

In [63]:
import plotly.figure_factory as ff
group_labels = ['Simulated Means (Sampling Distribution)','One Example Simulated Sample']
fig = ff.create_distplot([simulated_means, x], group_labels, bin_size=.05)
fig.show()

In [82]:
import plotly.graph_objects as go
import numpy as np
support = np.linspace(4,12,100)

fig = go.Figure()
fig.add_trace(go.Histogram(x=simulated_means, histnorm='probability density', 
                           name='Sampling Distribution of x-bar'))#, nbinsx=15
fig.add_trace(go.Scatter(x=support, y=normal_population.pdf(support), mode='lines', name='Null Population'))
fig.show()

# Notes about which figures show which distributions
- the red curve is the (null) hypothesized population distribution
- the histogram is the simulated sampling distribution of x-bar
- the orange distribution up above is an example of a sample (of size 321) simulated under the assumption of the null hypothesis
    - blue up above (in the same plot) is again the simulated sampling distribution of x-bar
- next up above that is again the simulated sampling distribution of x-bar
- next up above that is again an example of a sample (of size 321) simulated under the assumption of the null hypothesis
- and finally, the very first plot toward the top of the notebook is the distribution of the sample we observed

# The different distributions under consideration
- observed sample
- hypothesized population
    - simulated samples from the population
- sampling distribution of x-bar

# Quantifying evidence against the null
- we observed a (sample mean) test statistic for our data of `8.16...`
- we simulated the sampling distribution of sample averages under the assumption of the null hypothesis
- let's see how many simulated x-bars (from the simulated sampling distribution of x-bar) are **as or more unusual** than `8.16...`

In [83]:
# observed test statistic
amazonbooks.Height.mean()

8.163239875389408

In [89]:
abs(amazonbooks.Height.mean()-mu)

0.16323987538940798

In [92]:
# `simulated_means` is a list
# so this breaks
abs(simulated_means-mu)

TypeError: unsupported operand type(s) for -: 'list' and 'int'

In [97]:
# coercion will make True=1 and False=0
(abs(np.array(simulated_means)-mu) >= abs(amazonbooks.Height.mean()-mu)).sum()

34

In [99]:
# quantifies the proportion of simulated means (under the null hypothesis assumption)
# that are "as or more extreme" than our observed test statistic
34/reps

0.0034
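
The counting steps above can be wrapped in a small reusable function. This is only a sketch: the function name `simulated_p_value`, the seed, and the use of NumPy's generator in place of `normal_population.rvs` are assumptions made for illustration, so the printed value will be near, but not identical to, the `0.0034` obtained above.

```python
import numpy as np

def simulated_p_value(observed_stat, null_value, simulated_stats):
    """Proportion of simulated statistics at least as far from the
    null value as the observed statistic (a two-sided comparison)."""
    simulated_stats = np.asarray(simulated_stats)
    observed_distance = abs(observed_stat - null_value)
    # booleans average to the proportion of True values
    as_or_more_extreme = np.abs(simulated_stats - null_value) >= observed_distance
    return as_or_more_extreme.mean()

mu, std, n, reps = 8, 1, 321, 10000
rng = np.random.default_rng(1)  # seed chosen only for reproducibility

# simulated sampling distribution of x-bar under the null hypothesis
simulated_means = rng.normal(loc=mu, scale=std, size=(reps, n)).mean(axis=1)

# observed sample mean of Height, as reported above
p_value = simulated_p_value(8.163239875389408, mu, simulated_means)
print(p_value)
```

Because the p-value is itself estimated from a finite number of simulated samples, it changes slightly from run to run; increasing `reps` reduces that variation.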

# p-value 

- **The probability of a test statistic being as or more extreme than the observed test statistic if the null hypothesis were true**

- We computed this p-value using simulation, based on the parametric assumption about the population specified by the null hypothesis


In [84]:
# (simulated) sampling distribution of x-bar under the null hypothesis
fig = px.histogram(pd.DataFrame({'Simulated Means': simulated_means}), x='Simulated Means', nbins=15)
fig.show()
# Qualitatively: 8.16... is quite rare
# Quantitatively: the p-value is 0.0034 (that is, 0.34%, well under 1%)

For the null hypothesis

$\huge
\require{enclose}
\begin{align}
H_0 &: {} \mu = 8 \quad\text{(null hypothesis)}\\
 &\color{white}{:} {} \sigma = 1\\
 &\color{white}{:} {} \text{and }  \text{the population is normally distributed}\\
 &\color{white}{:} {} (\text{and our sample size is }\enclose{horizontalstrike}{325} 321 \text{ [because NAs]})\\
 H_1 &:{} H_0 \text{ is } \text{False} \quad\text{(alternative hypothesis)}\\
\end{align}$

we observed a (simulated) p-value of 0.0034 (for the observed sample mean of `8.16...` relative to the simulated sampling distribution of x-bar under the null hypothesis). This is "Strong evidence against the null hypothesis".

![](https://www.jcpcarchives.org/userfiles/values-of-p-Inference.jpg)
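
The wording "Strong evidence" comes from a lookup table like the one pictured above. A minimal sketch of such a lookup follows. The cutoffs (0.1, 0.05, 0.01, 0.001), the labels other than "Strong", and the function name `evidence_against_null` are assumptions based on one commonly used convention, and they may not match the pictured table exactly.

```python
def evidence_against_null(p_value):
    """Translate a p-value into a qualitative evidence statement.

    The cutoffs below are an assumed convention, not a universal rule.
    """
    if p_value > 0.1:
        strength = "No"
    elif p_value > 0.05:
        strength = "Weak"
    elif p_value > 0.01:
        strength = "Moderate"
    elif p_value > 0.001:
        strength = "Strong"
    else:
        strength = "Very strong"
    return f"{strength} evidence against the null hypothesis"

print(evidence_against_null(0.0034))
```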

# Comments
- We don't "prove" that a null hypothesis is false or true or, correspondingly, that an alternative hypothesis is false or true
- We provide a qualitative statement of the amount of evidence we have against the null.
- We then say we reject the null based on the evidence at hand
    - We are Gambling! We might be wrong, but if we have to make a choice, our best guess is to reject the null if the evidence against it is strong
    - Otherwise, we "fail to reject the null hypothesis"
    - Do not "accept" the null hypothesis or the alternative hypothesis: we never say we proved things one way or another, and we don't have to say one is right and the other is wrong

# $\alpha$-significance levels
- part of formal statistical hypothesis testing
- this is often how statistical evidence is presented in journal publications
- it's quite problematic, as you would learn in more advanced stats courses; so, we'll leave this topic to a later course

# Types of p-values (ways to calculate p-values)

- We saw simulated p-values above 
    - based on a parametric assumption of the null hypothesis
- This parametric null hypothesis assumption also admits a theoretical (rather than a simulated) p-value calculation
    - This is the one-sample t-test
- We can also derive a theoretical p-value, **but one which does not require the parametric normality assumption**
    - This would be called a **nonparametric test**, and in this case specifically this is the (nonparametric) **sign test**
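
Before turning to the t-test, it may help to see the direct theoretical counterpart of the simulation, in which $\sigma = 1$ is taken as known exactly as the null hypothesis states. This calculation is usually called a one-sample z-test; that name and the variable names below are additions for illustration and do not appear in the notebook. Under the null hypothesis, $\bar x$ is normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, so the p-value can be read off the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n = 8, 1, 321           # null hypothesis values and sample size
observed_mean = 8.163239875389408    # observed sample mean of Height

# standard deviation of x-bar under the null hypothesis
standard_error = sigma / sqrt(n)

# how many standard errors the observed mean lies from the null mean
z = (observed_mean - mu_0) / standard_error

# two-sided: as or more extreme in either direction
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(z, p_value)
```

The result should land close to the simulated p-value of `0.0034`, since both rest on the same assumptions. The t-test differs by estimating $\sigma$ from the sample rather than fixing it at 1.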

In [100]:
stats.ttest_1samp(amazonbooks.Height.dropna(), mu)

TtestResult(statistic=3.183365159892701, pvalue=0.0015988011855155308, df=320)
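
The reported statistic and p-value can be reproduced by hand from the summary statistics computed earlier. The sketch below assumes the standard one-sample t-statistic formula $t = (\bar x - \mu_0)/(s/\sqrt{n})$ with $n-1$ degrees of freedom; the variable names are illustrative.

```python
from math import sqrt
from scipy import stats

n = 321                      # number of non-missing heights
x_bar = 8.163239875389408    # sample mean reported earlier
s = 0.918739337573495        # sample standard deviation reported earlier
mu_0 = 8                     # mean under the null hypothesis

# observed mean measured in estimated standard errors
t_stat = (x_bar - mu_0) / (s / sqrt(n))

# two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value = 2 * stats.t(df=n - 1).sf(abs(t_stat))
print(t_stat, p_value)
```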

For the null hypothesis

$\huge
\require{enclose}
\begin{align}
H_0 &: {} \mu = 8 \quad\text{(null hypothesis)}\\
 &\color{white}{:} {} \enclose{horizontalstrike}{\sigma = 1}\; \sigma \text{ estimated as sample standard deviation $\hat \sigma\; (s)$} \\
 &\color{white}{:} {} \text{and }  \text{the population is normally distributed}\\
 &\color{white}{:} {} (\text{and our sample size is }\enclose{horizontalstrike}{325} 321 \text{ [because NAs]})\\
 H_1 &:{} H_0 \text{ is } \text{False} \quad\text{(alternative hypothesis)}\\
\end{align}$

we observed a (theoretical) t-test-based p-value of `0.001598...` (for the observed sample mean of `8.16...` relative to the theoretical sampling distribution of x-bar under the assumptions of the null hypothesis). This is "Strong evidence against the null hypothesis".

In [122]:
# sign test -- it's nonparametric -- no normality/distributional assumptions made!
# count the heights above 8, with each height exactly equal to 8 counted as one half
heights = amazonbooks.Height.dropna()
((heights > 8).sum() + (heights >= 8).sum()) / 2  # out of 321

189.0

In [123]:
189/321

0.5887850467289719

In [107]:
n

321

In [124]:
# sign test p-value: 2 * P(X >= 189) for X ~ Binomial(n=321, p=0.5)
(1-stats.binom(n=n,p=0.5).cdf(189-1))*2

0.0017314192593371747
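
The same p-value can be checked without `scipy` by summing binomial probabilities directly. This is a sketch using only the standard library; the count of 189 is taken from the cell above, and the variable names are illustrative.

```python
from math import comb

n = 321          # number of non-missing heights
observed = 189   # heights above 8 (ties counted as one half), from the cell above

# Under the null hypothesis each height is above the median with probability 0.5,
# so every one of the 2**n above/below patterns is equally likely.
# Count the patterns with at least `observed` values above the median.
upper_tail_count = sum(comb(n, k) for k in range(observed, n + 1))

# two-sided p-value: double the upper-tail probability (the distribution is symmetric)
p_value = 2 * upper_tail_count / 2**n
print(p_value)
```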

For the null hypothesis

$\huge
\require{enclose}
\begin{align}
H_0 &: {} Median = 8 \quad\text{(null hypothesis)}\\
 &\color{white}{:} {} (\text{and our sample size is }\enclose{horizontalstrike}{325} 321 \text{ [because NAs]})\\
 H_1 &:{} H_0 \text{ is } \text{False} \quad\text{(alternative hypothesis)}\\
\end{align}$

we observed a (theoretical, but also nonparametric) sign-test-based p-value of `0.0017314...`. The calculation assumes only that each value in the sample has a 50% chance of being greater than the null hypothesized median of 8. This implies that the number of values above 8 follows a binomial distribution, which serves as the sampling distribution relative to which the p-value is theoretically calculated. This is "Strong evidence against the null hypothesis".

# More examples of creating simulated p-values would be good
- Tests about which way an election will go -- do we think it's equally likely that each candidate will win?
- Gambling based on a coin -- do we think the coin is actually fair or biased?
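
As one possible sketch of the coin example, the simulation below tests the null hypothesis that a coin is fair. The data are entirely hypothetical (60 heads in 100 flips is made up for illustration), as are the seed and the variable names. The approach mirrors the one used for the book heights: simulate the statistic under the null hypothesis, then find the proportion of simulated values that are as or more extreme than the observed one.

```python
import numpy as np

n_flips = 100        # hypothetical number of coin flips
observed_heads = 60  # hypothetical observed count (not real data)
reps = 10000         # number of simulated experiments

rng = np.random.default_rng(2)  # seed chosen only for reproducibility

# number of heads in each simulated experiment if the coin is fair
simulated_heads = rng.binomial(n=n_flips, p=0.5, size=reps)

# "as or more extreme" means at least as far from the expected count of 50
expected_heads = n_flips * 0.5
observed_distance = abs(observed_heads - expected_heads)
as_or_more_extreme = np.abs(simulated_heads - expected_heads) >= observed_distance

p_value = as_or_more_extreme.mean()
print(p_value)
```

The election question could be treated the same way, with each polled voter playing the role of a coin flip under the null hypothesis that both candidates are equally favored.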