# Lecture 07:  (Probability II) Hypothesis Testing

This lecture considers [statistical hypothesis testing](https://en.wikipedia.org/wiki/Statistical_hypothesis_testing). Statistical hypothesis testing tests a [null hypothesis](https://en.wikipedia.org/wiki/Null_hypothesis). A null hypothesis ($H_0$) is a prediction of an experimental result, and it is always a statement of **equality**:

<center>$H_0: \ x = c$</center>

where $x$ is a random variable, and $c$ is a constant value.

For many null hypotheses, $c = 0$ (no effect, or no difference).  This is a convenient way to remember the name "null hypothesis".

Alternative hypotheses ($H_1$) are statements of **inequality**.  Example alternative hypotheses are:

<center>$H_1: \ x > c$</center>

<center>$H_1: \ x < c$</center>

<center>$H_1: \ x \ne c$</center>

The most important difference between null and alternative hypotheses is that <font color="red">only null hypotheses are testable.</font> This is why "*hypothesis testing*" is sometimes called "*null hypothesis testing*".

The final result of a hypothesis test is usually a probability value, or "p value".

The null hypothesis is rejected when $p < \alpha$, where $\alpha$ is the pre-specified [Type I error rate](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors).

By convention, $\alpha$=0.05.

This null hypothesis rejection is sometimes called "statistical significance". The probabilistic meaning of this will be discussed in the next lecture.
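
The rejection rule above can be written as a one-line comparison. The following is only a minimal sketch; the function name `reject_null` is hypothetical (it is not part of any library), and the two p values are arbitrary illustrative numbers.

```python
def reject_null(p, alpha=0.05):
    """Return True when the null hypothesis is rejected (p < alpha)."""
    return p < alpha

print(reject_null(0.013))   # p is below alpha
print(reject_null(0.175))   # p is above alpha
```

Note that the comparison is strict, so a p value exactly equal to $\alpha$ does not lead to rejection under this rule.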



**Goals**:
* To learn the most common hypothesis testing procedures, and how they are related.
* To learn how to conduct these tests in Python.

In the next lecture, we will consider how hypothesis testing and probability are related. In particular, we will show that hypothesis testing is very closely related to the content of the previous lecture (Probability I: Random variables).

___

## Types of hypothesis tests

The following five types of hypothesis tests are considered in this notebook:

* One-sample t test
* Paired t test
* Two-sample t test
* Regression
* One-way ANOVA

These tests have different names, but they are all very closely related. All tests are consequences of the mathematics of the [Normal distribution](https://en.wikipedia.org/wiki/Normal_distribution). Understanding why is mathematically difficult, but it is conceptually simple. Nevertheless, to understand the conceptual connection between these tests and the Normal distribution, it is easiest to conduct the tests first, and understand their basics. This lecture thus focusses on the tests themselves. The next lecture will consider more deeply how the tests are conceptually connected to the Normal distribution.

For now, it is sufficient to understand that these five tests are just special names for specific cases of a single [independent variable](https://en.wikipedia.org/wiki/Dependent_and_independent_variables) (IV) and a single [dependent variable](https://en.wikipedia.org/wiki/Dependent_and_independent_variables) (DV). These cases are summarized in the following table:

| IV type        | Number of IV values | Type of DV | Hypothesis test name  |
| :------------- |:-------------:| -----:| -----:|
| Categorical   | 1 | Scalar | One-sample t test |
| Categorical   | 1 | Paired difference (scalar) | Paired t test |
| Categorical   | 2 | Scalar | Two-sample t test |
| **Continuous**    | $n$ | Scalar | Regression |
| Categorical   | $g$ | Scalar | One-way ANOVA |
| Categorical   | $g$ | Paired difference (scalar) | One-way repeated-measures ANOVA |

where:

* $n$ = sample size
* $g$ = number of groups

Note that this notebook does not consider repeated-measures ANOVA;  this test is possible in Python, but it requires a separate Python package called [statsmodels](https://www.statsmodels.org/stable/index.html).  If you are interested in trying one-way repeated-measures ANOVA, please read [this blog post](http://www.pybloggers.com/2018/10/repeated-measures-anova-in-python-using-statsmodels/).
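
As a quick reference, the five tests considered in this notebook can be paired with the `scipy.stats` functions used in the sections that follow. This is only an illustrative sketch; the dictionary `tests` is a hypothetical lookup table and is not part of SciPy.

```python
from scipy import stats

# Hypothetical lookup table (not part of SciPy): test name -> scipy.stats function
tests = {
    "One-sample t test": stats.ttest_1samp,
    "Paired t test":     stats.ttest_rel,
    "Two-sample t test": stats.ttest_ind,
    "Regression":        stats.linregress,
    "One-way ANOVA":     stats.f_oneway,
}

for name, func in tests.items():
    print(f"{name:20s} scipy.stats.{func.__name__}")
```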


___

## One-sample t test


The one-sample t statistic is:

<table border="1">
  <tr>
    <td></td>
    <td></td>
    <td>$t = \frac{  \overline{y}   - \mu  }{  s / \sqrt{n}  }$</td>
    <td></td>
    <td>(Equation 1)</td>
  </tr>
</table>



where $\overline{y}$ is the sample mean, $\mu$ is the hypothesized mean, $s$ is the sample standard deviation and $n$ is the sample size.

A one-sample t test tests the following null hypothesis:

<center>$H_0: \ \overline{y} = \mu$</center>




### Example:

This example is a weight loss study from [Real Statistics Using Excel](http://www.real-statistics.com/students-t-distribution/one-sample-t-test/).

The data and results are:



<img alt="RealStatsOneSampleTTest" width=500 src="https://i1.wp.com/www.real-statistics.com/wp-content/uploads/2012/11/one-sample-t-test-1.png"/>


### Python:

This test can be conducted in Python using **scipy.stats.ttest_1samp**  like this:

In [1]:
import numpy as np
from scipy import stats

y       = np.array([23, 15, -5, 7, 1, -10, 12, -8, 20, 8, -2, -5])  # data
mu      = 0    # hypothesized mean
results = stats.ttest_1samp(y, mu)

print(results)

Ttest_1sampResult(statistic=1.4492553137533357, pvalue=0.17516945558857122)
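
The t value reported above can be reproduced directly from Equation 1. The following is a minimal sketch, assuming the standard result that the two-tailed p value comes from a t distribution with $n - 1$ degrees of freedom; the names `t_manual` and `p_manual` are used only for illustration.

```python
import numpy as np
from scipy import stats

y  = np.array([23, 15, -5, 7, 1, -10, 12, -8, 20, 8, -2, -5])  # data from the example
mu = 0                       # hypothesized mean

n  = y.size                  # sample size
s  = y.std(ddof=1)           # sample standard deviation (n - 1 in the denominator)

t_manual = (y.mean() - mu) / (s / np.sqrt(n))         # Equation 1
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)    # two-tailed p value

print(t_manual)
print(p_manual)
```

The argument `ddof=1` matters here: NumPy's default (`ddof=0`) divides by $n$ rather than $n - 1$, which would give a slightly different t value.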


The t value and p value can be retrieved like this:

In [2]:
t = results.statistic
p = results.pvalue

print(t)
print(p)

1.4492553137533357
0.17516945558857122


Or like this:

In [3]:
t,p = results

print(t)
print(p)

1.4492553137533357
0.17516945558857122


Note that the p value from **stats.ttest_1samp** does not match the p value from Real Statistics. This is because **stats.ttest_1samp** uses [two-tailed inference](https://en.wikipedia.org/wiki/One-_and_two-tailed_tests) and the Real Statistics example uses one-tailed inference.

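To see agreement between the results, simply divide the **stats.ttest_1samp** p value result by two. (This shortcut works here because the observed t value is positive, which is the direction of the one-tailed alternative hypothesis.)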

In [4]:
print( p / 2 )

0.08758472779428561


The p value is larger than $\alpha$=0.05, so $H_0$ is not rejected.
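
As an alternative to dividing by two, recent versions of SciPy accept an `alternative` keyword argument in **stats.ttest_1samp**, so the one-tailed p value can be requested directly. This is a sketch that assumes a SciPy version new enough to support that argument (older releases will raise an error), and it assumes the one-tailed alternative of interest is "mean greater than `mu`".

```python
import numpy as np
from scipy import stats

y  = np.array([23, 15, -5, 7, 1, -10, 12, -8, 20, 8, -2, -5])  # data from the example
mu = 0    # hypothesized mean

two_tailed = stats.ttest_1samp(y, mu)                          # default: two-tailed
one_tailed = stats.ttest_1samp(y, mu, alternative='greater')   # H1: mean > mu

print(two_tailed.pvalue / 2)
print(one_tailed.pvalue)
```

The two printed values should agree, because the observed t value is positive.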

___

## Paired t test

The paired t statistic is:


<table border="1">
  <tr>
    <td></td>
    <td></td>
    <td>$t = \frac{  \overline{d}  }{  s / \sqrt{n}  }$</td>
    <td></td>
    <td>(Equation 2)</td>
  </tr>
</table>



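where:

<center>$d_i = (y_1)_i - (y_2)_i$</center>

and $\overline{d}$ is the mean of the paired differences, $s$ is the standard deviation of the differences, and $n$ is the number of pairs.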


A paired t-test tests the following null hypothesis:

<center>$H_0: \ \overline{d} = 0$</center>



### Example:

This example is from [WebStat at the University of New England](https://webstat.une.edu.au/unit_materials/c6_common_statistical_tests/example_paired_sample_t.html).

The data and results are:



<img alt="WebStatData" width=250 src="https://webstat.une.edu.au/unit_materials/c6_common_statistical_tests/image67.gif">

<img alt="WebStatResults" width=500 src="https://webstat.une.edu.au/unit_materials/c6_common_statistical_tests/image71.gif">







### Python:

This test can be conducted in Python using **scipy.stats.ttest_rel** like this:

In [5]:
y_pre  = np.array( [3, 0, 6, 7, 4, 3, 2, 1, 4] )
y_post = np.array( [5, 1, 5, 7, 10, 9, 7, 11, 8] )

t,p    = stats.ttest_rel(y_pre, y_post)

print(t)
print(p)

-3.1428571428571423
0.013745824394788489


The p value is smaller than $\alpha$=0.05, so $H_0$ is rejected.
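
Comparing Equation 2 with Equation 1 suggests that a paired t test is the same as a one-sample t test on the paired differences, with a hypothesized mean of zero. The following sketch checks this numerically using the data above; the variable names are chosen only for illustration.

```python
import numpy as np
from scipy import stats

y_pre  = np.array([3, 0, 6, 7, 4, 3, 2, 1, 4])
y_post = np.array([5, 1, 5, 7, 10, 9, 7, 11, 8])

d      = y_pre - y_post                 # paired differences

paired = stats.ttest_rel(y_pre, y_post) # paired t test
onesmp = stats.ttest_1samp(d, 0)        # one-sample t test on the differences

print(paired.statistic, paired.pvalue)
print(onesmp.statistic, onesmp.pvalue)
```

Both lines should print the same t value and p value.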

___

## Two-sample t test

The two-sample t statistic is:


<table border="1">
  <tr>
    <td></td>
    <td></td>
    <td>$t = \frac{  \overline{y}_1   - \overline{y}_2  }{  s_p  \sqrt{ \frac{1}{n_1} + \frac{1}{n_2}  }  }$</td>
    <td></td>
    <td>(Equation 3)</td>
  </tr>
</table>


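where $s_p$ is the pooled standard deviation:

<center>$s_p = \sqrt{   \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}   }$</center>

and $\overline{y}_k$, $s_k$ and $n_k$ are the mean, standard deviation and size of sample $k$.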


A two-sample t-test tests the following null hypothesis:

<center>$H_0: \ \overline{y}_1 - \overline{y}_2 = 0$</center>


### Example:

This example is from [qimacros.com](https://www.qimacros.com/hypothesis-testing/two-sample-t-test/).

The data and results are:



<img alt="WikiHowData" width=150 src="https://www.qimacros.com/hypothesis-testing/t-test-two-sample-data.jpg"/>

<img alt="WikiHowData" width=700 src="https://www.qimacros.com/hypothesis-testing/t-test-two-sample-results.jpg"/>



     

### Python:

This test can be conducted in Python using **scipy.stats.ttest_ind** like this:

In [6]:
beginning = np.array( [3067, 2730, 2840, 2913, 2789] )
end       = np.array( [3200, 2777, 2623, 3044, 2834] )

t,p    = stats.ttest_ind(beginning, end)

print(t)
print(p)

-0.2372742730908139
0.8184074100386953


The p value is not less than $\alpha$=0.05, so $H_0$ is not rejected.
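
The result above can be checked against Equation 3. This is a minimal sketch, assuming the standard result that the two-tailed p value comes from a t distribution with $n_1 + n_2 - 2$ degrees of freedom. It relies on the default behaviour of **stats.ttest_ind**, which assumes equal variances and therefore uses the pooled standard deviation.

```python
import numpy as np
from scipy import stats

beginning = np.array([3067, 2730, 2840, 2913, 2789])
end       = np.array([3200, 2777, 2623, 3044, 2834])

n1, n2 = beginning.size, end.size
s1, s2 = beginning.std(ddof=1), end.std(ddof=1)    # sample standard deviations

# pooled standard deviation
sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

t_manual = (beginning.mean() - end.mean()) / (sp * np.sqrt(1 / n1 + 1 / n2))  # Equation 3
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)                      # two-tailed p value

print(t_manual)
print(p_manual)
```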

___

## Regression

Regression analysis tests the null hypothesis:

<center>$H_0: \ m = 0$</center>


where $m$ is the slope of the regression line.  (Search for "slope" in the Lecture 5 notes for a reminder of what "slope" means).

A t statistic also exists for regression, but it is slightly more complicated than the t statistics described above, so it is not described here.




### Example:

This example is from [Real Statistics Using Excel](http://www.real-statistics.com/regression/hypothesis-testing-significance-regression-line-slope/).  (This example was also considered in Assignment 05 for this class.)

The data and results are:



<img alt="RealStatsOneSampleTTest" width=500 src="https://i0.wp.com/www.real-statistics.com/wp-content/uploads/2012/12/slope-regression-t-test.jpg?w=473"/>
     

### Python:

This test can be conducted in Python using **stats.linregress** like this:

In [7]:
cig      = np.array([5, 23, 25, 48, 17, 8, 4, 26, 11, 19, 14, 35, 29, 4, 23])
life     = np.array([80, 78, 60, 53, 85, 84, 73, 79, 81, 75, 68, 72, 58, 92, 65])

results  = stats.linregress(cig, life)
p        = results.pvalue
print(results)
print()
print(p)

LinregressResult(slope=-0.6282004052311659, intercept=85.72042119481794, rvalue=-0.713430174386581, pvalue=0.0028223429900712275, stderr=0.17112895461639727)

0.0028223429900712275


The p value is less than $\alpha$=0.05, so $H_0$ is rejected.
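
Although the regression t statistic is not derived here, it can be recovered from the **stats.linregress** output. The following is a sketch under the assumption (consistent with the SciPy documentation) that the reported p value comes from a two-tailed t test with $n - 2$ degrees of freedom, where the t value is the slope divided by its standard error.

```python
import numpy as np
from scipy import stats

cig     = np.array([5, 23, 25, 48, 17, 8, 4, 26, 11, 19, 14, 35, 29, 4, 23])
life    = np.array([80, 78, 60, 53, 85, 84, 73, 79, 81, 75, 68, 72, 58, 92, 65])

results = stats.linregress(cig, life)

n = cig.size                                 # sample size
t = results.slope / results.stderr           # t statistic for the slope
p = 2 * stats.t.sf(abs(t), df=n - 2)         # two-tailed p value

print(t)
print(p)
print(results.pvalue)
```

The last two printed values should agree.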

___

## One-way ANOVA 

One-way ANOVA analysis tests the null hypothesis:

<center>$H_0: \ \overline{y}_i - \overline{y}_j = 0$</center>

where $i$ and $j$ represent any two different groups. This null hypothesis therefore states that there is *no difference between any of the group means*.

Instead of the $t$ statistic, ANOVA uses the $F$ statistic. Similar to regression's $t$ statistic, the $F$ statistic is more difficult to calculate than the $t$ statistics described above, so $F$ statistic calculation is not described here.
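
The $F$ and $t$ statistics are nevertheless closely connected. When there are only two groups, one-way ANOVA and the (equal-variance) two-sample t test give the same p value, and $F = t^2$. The following sketch illustrates this standard result using the data from the two-sample t test example above.

```python
import numpy as np
from scipy import stats

beginning = np.array([3067, 2730, 2840, 2913, 2789])
end       = np.array([3200, 2777, 2623, 3044, 2834])

ttest = stats.ttest_ind(beginning, end)   # two-sample t test (pooled variance)
anova = stats.f_oneway(beginning, end)    # one-way ANOVA with two groups

print(ttest.statistic**2, anova.statistic)   # t squared and F
print(ttest.pvalue, anova.pvalue)            # p values
```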


### Example:

This example is from [StackOverflow](https://stackoverflow.com/questions/8320603/how-to-do-one-way-anova-in-r-with-unequal-sample-sizes).

The data are:

<center>

```R
site1 <- c(34,25,27,31,26,34,21)
site2 <- c(33,35,31,31,42,33)
site3 <- c(17,30,30,26,32,28,26,29)
site4 <- c(28,33,31,27,32,33,40)
```

</center>


The results are:


<center>
    
```R
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value  Pr(>F)  
Site       3 212.35  70.782  3.4971 0.03098 *
Residuals 24 485.76  20.240                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
```

</center>


### Python:

This test can be conducted in Python using **scipy.stats.f_oneway** like this:


In [8]:
site1   = np.array([34,25,27,31,26,34,21])
site2   = np.array([33,35,31,31,42,33])
site3   = np.array([17,30,30,26,32,28,26,29])
site4   = np.array([28,33,31,27,32,33,40])

results = stats.f_oneway(site1, site2, site3, site4)

print(results)

F_onewayResult(statistic=3.4971081266542487, pvalue=0.03097911104360909)


The p value is less than $\alpha$=0.05, so $H_0$ is rejected.
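
The entries of the R table above (degrees of freedom, sums of squares and F value) can also be reproduced with NumPy. This is a minimal sketch of the usual between-group and within-group sums of squares; the variable names (`ss_between`, `ss_within`, and so on) are chosen only for illustration.

```python
import numpy as np
from scipy import stats

site1  = np.array([34, 25, 27, 31, 26, 34, 21])
site2  = np.array([33, 35, 31, 31, 42, 33])
site3  = np.array([17, 30, 30, 26, 32, 28, 26, 29])
site4  = np.array([28, 33, 31, 27, 32, 33, 40])
groups = [site1, site2, site3, site4]

grand_mean = np.concatenate(groups).mean()     # mean of all observations
n_total    = sum(g.size for g in groups)       # total number of observations

# between-group and within-group sums of squares
ss_between = sum(g.size * (g.mean() - grand_mean)**2 for g in groups)
ss_within  = sum(((g - g.mean())**2).sum() for g in groups)

df_between = len(groups) - 1                   # "Site" degrees of freedom
df_within  = n_total - len(groups)             # "Residuals" degrees of freedom

F = (ss_between / df_between) / (ss_within / df_within)
p = stats.f.sf(F, df_between, df_within)       # upper-tail probability of F

print(df_between, ss_between)
print(df_within, ss_within)
print(F, p)
```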
