# [CPSC 222](https://github.com/GonzagaCPSC222) Intro to Data Science
[Gonzaga University](https://www.gonzaga.edu/)

[Gina Sprint](http://cs.gonzaga.edu/faculty/sprint/)

# Hypothesis Testing
What are our learning objectives for this lesson?
* Apply one sample and two sample hypothesis tests
* Utilize a t-distribution

Content used in this lesson is based upon information in the following sources:
* Dr. Mirjeta Beqiri's Stats notes

## Introduction to Hypothesis Testing
Hypothesis testing involves using information about a sample to draw conclusions regarding the value of a population parameter.
* Is this estimate accurate?
* If there is a difference between the sample mean and the hypothesized mean, is it:
    * An error, or
    * A true difference

Hypothesis: A statement made about the value of a population parameter. In this class, we are only going to cover hypotheses related to the population mean, $\mu$.

Hypothesis Testing: A procedure, based on sample evidence and probability theory, used to determine whether the hypothesis is a:
* Reasonable statement - accepted
* Unreasonable statement - rejected

## 5 Step Hypothesis Testing Approach
1. State null and alternative hypothesis
2. Select the level of significance
3. Select the appropriate test statistic
4. Formulate the decision rule
5. Make a decision


### 1. State Null and Alternative Hypothesis
Null Hypothesis ($H_0$): A tentative **assumption** made about a population parameter, set up so that it can be tested and then either accepted or rejected.
* $H_0$ always contains the equal sign ($=$, $\geq$, or $\leq$)
    * E.g. $H_0: \mu = 20$
    * E.g. $H_0: \mu \geq 20$
    * E.g. $H_0: \mu \leq 20$
* We can never prove that $H_0$ is true; based on the sample evidence, we can only reject it or fail to reject it ("accept" it)

Alternative Hypothesis ($H_1$): A statement that will be accepted **if** the Null Hypothesis is false (or rejected)

To help formulate your null and alternative hypotheses, ask: is there a direction given?
* One-tailed test: when the alternative hypothesis ($H_1$) states a direction
    * Left tail: e.g. $H_1: \mu < 20$
    * Right tail: e.g. $H_1: \mu > 20$
* Two-tailed test: when the alternative hypothesis ($H_1$) does not state a direction
    * E.g. $H_1: \mu \neq 20$

### 2. Select the Level of Significance
If there is a difference between the sample mean and population mean, is this difference statistically significant?

Level of significance ($\alpha$): The probability of rejecting the null hypothesis when it is actually true
* Decide on the level of significance before collecting the data
* Most commonly used: 0.05 (5%) and 0.01 (1%)

### 3. Select the appropriate test statistic
Test statistic: A value determined from the sample information to help you decide whether to accept or reject $H_0$. There are many test statistics based on different distributions, for example:
* z-test (uses z-score statistic from a Z distribution)
    * Use when you have one or two samples
    * Use when you have the population standard deviation
    * Use when your sample size is "large" (> 30 observations)
* t-test (uses t-statistic computed from a t distribution)
    * Use when you have one or two samples
    * Use when you do not have the population standard deviation
    * Use when your sample size is "small" (<= 30 observations)
* ANOVA (uses F-statistic computed from an F distribution)
    * Use when you have three or more samples
* And many others!

<img src="https://blog.minitab.com/hubfs/Imported_Blog_Media/distribution_field_guide_1.gif" width = "400"/>
(image from https://blog.minitab.com/hubfs/Imported_Blog_Media/distribution_field_guide_1.gif)

In this class, we will work with one and two samples. How do you know whether to use the z-test or t-test then? Here is a helpful graphic:

<img src="https://www.statisticshowto.com/wp-content/uploads/2013/08/t-score-vs.-z-score.png" width="300"/>
(image from https://www.statisticshowto.com/probability-and-statistics/hypothesis-testing/t-score-vs-z-score/)

We will primarily use the t-test. This is because we will work with only one or two samples, we assume our data is normally distributed (i.e. it follows a bell-shaped curve with the majority of the values centered around the mean), and we assume we do not have access to the population standard deviation. Here are the relevant test statistics we will use with the t-test:
* **One-sample** test of mean, using the t-test statistic: $t=\frac{\overline{x} - \mu}{s / \sqrt{n}}$, where $\overline{x}$ is the sample mean, $\mu$ is the *hypothesized* population mean, $s$ is the sample standard deviation, and $n$ is the number of observations in the sample (e.g. the sample size)  
    * Degrees of freedom: $df = n - 1$
    * Test the sample mean against a hypothesized constant value for the population mean
* **Two-sample** test of **independent means**, using the t-test statistic: $t=\frac{\overline{X_1} - \overline{X_2}}{\sqrt{s_p^2(\frac{1}{n_1}+\frac{1}{n_2})}}$, where $\overline{X_1}$ is the sample mean for sample #1, $\overline{X_2}$ is the sample mean for sample #2, $s_p^2$ is the pooled variance ($s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$), $n_1$ is the number of observations in sample #1, and $n_2$ is the number of observations in sample #2
    * Degrees of freedom: $df = n_1 + n_2 - 2$
    * Observations in each sample are from two different groups
* **Two-sample** test of **dependent means**, using the t-test statistic: $t=\frac{\overline{d} - \mu_d}{s_{\overline{d}}}$, where $\overline{d}$ is the mean difference $\overline{d} = \frac{\sum{d}}{n}$, $\mu_d$ is the hypothesized mean difference (0 for our purposes), and $s_{\overline{d}}$ is the standard error of the mean difference $s_{\overline{d}} = \frac{s_d}{\sqrt{n}} = \frac{\sqrt{\frac{\sum{(d - \overline{d})^2}}{n - 1}}}{\sqrt{n}}$
    * Degrees of freedom: $df = n - 1$, $n$ is the number of paired observations
    * Observations in each sample are from the same group (e.g. one group that was sampled twice, such as the same set of students tested a week apart)
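
The three test statistics above can be computed directly from their formulas. The following is a minimal sketch using only Python's standard library; the function names and all sample values are made up for illustration and do not come from the course materials.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """One-sample t: (x_bar - mu) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu) / (s / math.sqrt(n)), n - 1


def independent_t(sample1, sample2):
    """Two-sample t for independent means using the pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # s_1^2
    var2 = statistics.variance(sample2)  # s_2^2
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    return diff / math.sqrt(pooled * (1 / n1 + 1 / n2)), n1 + n2 - 2


def dependent_t(first, second, mu_d=0):
    """Two-sample t for dependent (paired) means, df = number of pairs - 1."""
    diffs = [b - a for a, b in zip(first, second)]  # d = second - first
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # s_d / sqrt(n)
    return (d_bar - mu_d) / se, n - 1


# Made-up data for illustration only
t1, df1 = one_sample_t([21, 19, 23, 22, 20, 24, 18, 25], mu=20)
t2, df2 = independent_t([10, 12, 14, 16, 18], [8, 9, 10, 11, 12])
t3, df3 = dependent_t([70, 65, 80, 75, 90], [74, 68, 79, 80, 94])

print(f"one-sample:  t = {t1:.3f}, df = {df1}")  # t = 1.732, df = 7
print(f"independent: t = {t2:.3f}, df = {df2}")  # t = 2.530, df = 8
print(f"dependent:   t = {t3:.3f}, df = {df3}")  # t = 2.860, df = 4
```

If SciPy is available, `scipy.stats.ttest_1samp`, `scipy.stats.ttest_ind` (with its default `equal_var=True`), and `scipy.stats.ttest_rel` should give the same statistics, which makes them a handy cross-check.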

### 4. Formulate the decision rule
Decision rule: a statement of the conditions under which $H_0$ is rejected and the conditions under which $H_0$ is not rejected  
Critical value: the dividing point between the region where $H_0$ is rejected and the region where $H_0$ is not rejected

Once you have determined whether you have a one- or two-tailed test, your degrees of freedom (df), and the level of significance, look up the critical t-value using the t-table:
![](https://media.cheggcdn.com/media/cb1/s675x1024/cb100490-5be7-4807-8972-c0d984f9e9fc/php3o1s6N.png)
(image from https://media.cheggcdn.com/media/cb1/s675x1024/cb100490-5be7-4807-8972-c0d984f9e9fc/php3o1s6N.png)
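
As an alternative to reading a printed table, the critical value can be computed. The sketch below assumes SciPy is installed and uses `scipy.stats.t.ppf`, the inverse CDF of the t distribution. The helper name `critical_t` and its `tail` argument are hypothetical, chosen only for this illustration.

```python
from scipy import stats


def critical_t(alpha, df, tail="two"):
    """Return the critical t-value for a significance level and df.

    tail is "left", "right", or "two" (a hypothetical argument for this sketch).
    """
    if tail == "two":
        # alpha is split evenly between both tails; reject H0 if |t| > value
        return stats.t.ppf(1 - alpha / 2, df)
    if tail == "right":
        # reject H0 if t > value
        return stats.t.ppf(1 - alpha, df)
    if tail == "left":
        # reject H0 if t < value (the value is negative)
        return stats.t.ppf(alpha, df)
    raise ValueError("tail must be 'left', 'right', or 'two'")


two_tailed = critical_t(0.05, df=10)                  # about 2.228
right_tailed = critical_t(0.05, df=10, tail="right")  # about 1.812
left_tailed = critical_t(0.05, df=10, tail="left")    # about -1.812
print(two_tailed, right_tailed, left_tailed)
```

These values should match the df = 10 row of a standard t-table.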

### 5. Make a decision
* Select a sample
* Compute the test statistic value
* Compare the computed value to the critical value (from step 4)
* Make a decision whether or not to reject the null hypothesis
* Compute the p-value
* State the conclusion

p-value: The probability of observing a sample value as extreme as, or more extreme than, the value observed, given that the null hypothesis is true
* If p-value < level of significance ($\alpha$), then Reject $H_0$
* If p-value > level of significance ($\alpha$), then Do Not Reject $H_0$

<img src="https://i.ytimg.com/vi/DlwOTOydeyk/maxresdefault.jpg" width="500"/>
(image from https://i.ytimg.com/vi/DlwOTOydeyk/maxresdefault.jpg)

Once you have computed your test statistic value, find what p-value (range) is associated with it in the table (given the df and one vs two-tailed).
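
A table only gives a p-value range, but an exact p-value can be computed from the test statistic and df. The sketch below assumes SciPy is installed and uses `scipy.stats.t.sf` (upper-tail probability) and `scipy.stats.t.cdf` (lower-tail probability). The helper name `p_value`, its `tail` argument, and the example numbers are made up for illustration.

```python
from scipy import stats


def p_value(t_stat, df, tail="two"):
    """p-value for a computed t statistic; tail is "left", "right", or "two"."""
    if tail == "two":
        # probability of a value at least this extreme in either direction
        return 2 * stats.t.sf(abs(t_stat), df)
    if tail == "right":
        return stats.t.sf(t_stat, df)
    if tail == "left":
        return stats.t.cdf(t_stat, df)
    raise ValueError("tail must be 'left', 'right', or 'two'")


alpha = 0.05            # level of significance, chosen before looking at data
t_stat, df = 2.5, 10    # hypothetical computed test statistic and df

p = p_value(t_stat, df, tail="two")
decision = "Reject H0" if p < alpha else "Do Not Reject H0"
print(f"p-value = {p:.4f} -> {decision}")
```

For this example the t-table places the two-tailed p-value between 0.02 and 0.05 (2.5 lies between the critical values 2.228 and 2.764 for df = 10), so the computed value should fall in that range.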

## Types of Errors
* Type I error (AKA $\alpha$ error)
    * Reject the null hypothesis when it is actually true
* Type II error (AKA $\beta$ error)
    * Fail to reject the null hypothesis when it is false

<img src="https://s3.amazonaws.com/libapps/accounts/73970/images/hypothesis_testing.png" width="400"/>
(image from https://s3.amazonaws.com/libapps/accounts/73970/images/hypothesis_testing.png)
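
To see why a Type I error is also called an $\alpha$ error, a small simulation can help. The sketch below uses only Python's standard library and made-up settings (a hypothesized mean of 20, a standard deviation of 3, samples of size 10). It repeatedly draws samples from a population where $H_0$ is actually true and counts how often a two-tailed one-sample t-test at $\alpha = 0.05$ rejects $H_0$ anyway.

```python
import math
import random
import statistics

random.seed(222)  # fixed seed so the run is repeatable

MU_0 = 20           # hypothesized mean; H0 is true in this simulation
SIGMA = 3           # population standard deviation (made up)
N = 10              # sample size, so df = 9
T_CRITICAL = 2.262  # two-tailed critical value for alpha = 0.05, df = 9 (t-table)
TRIALS = 5000

rejections = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    t = (x_bar - MU_0) / (s / math.sqrt(N))
    if abs(t) > T_CRITICAL:
        rejections += 1  # H0 is true here, so every rejection is a Type I error

type_i_rate = rejections / TRIALS
print(f"Type I error rate: {type_i_rate:.3f}")
```

The printed rate should land close to 0.05, since $\alpha$ is the probability of rejecting $H_0$ when it is true.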

Example: We have a shipment of 4,000 units. If more than 6% of the units are defective, the shipment will be rejected. Let's take a sample of 50 units.
* $H_0$: Defective units are less than or equal to 6%
* $H_1$: Defective units are greater than 6%

Let's assume that the sample indicated that:
* 8 units out of 50 were defective
* (8/50) * 100 = 16%

Since 16% > 6%, the decision would be Reject $H_0$. But what if, by chance, the sample contained the only 8 defective units in the entire shipment of 4,000?
* (8/4,000) * 100 = 0.2%
* 0.2% < 6%

Based on the full shipment, the correct decision would have been Do Not Reject $H_0$. We rejected $H_0$ when it was actually true, which is a Type I error.

Now let's assume instead that the sample indicated that only 2 units were defective.
* (2/50) * 100 = 4%

Since 4% < 6%, the decision would be Do Not Reject $H_0$. But suppose that 320 of the 4,000 units were actually defective and, by chance, only 2 of them were included in the sample.
* (320/4,000) * 100 = 8%
* 8% > 6%

Based on the full shipment, the correct decision would have been Reject $H_0$. We did not reject $H_0$ when it was actually false, which is a Type II error.
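
The two scenarios in this example can be expressed as a short sketch. It mirrors the simplified decision rule used in the example (compare the sample's defect percentage directly with the 6% threshold) rather than a full t-test, and the function and variable names are hypothetical.

```python
THRESHOLD = 6.0  # maximum percent defective allowed under H0


def classify(rejected_h0, h0_is_true):
    """Label the outcome of a decision given the (normally unknown) truth."""
    if rejected_h0 and h0_is_true:
        return "Type I error"
    if not rejected_h0 and not h0_is_true:
        return "Type II error"
    return "Correct decision"


def scenario(sample_defects, sample_size, shipment_defects, shipment_size):
    sample_pct = sample_defects / sample_size * 100
    shipment_pct = shipment_defects / shipment_size * 100
    rejected_h0 = sample_pct > THRESHOLD    # decision made from the sample
    h0_is_true = shipment_pct <= THRESHOLD  # truth about the whole shipment
    return sample_pct, shipment_pct, classify(rejected_h0, h0_is_true)


# Scenario 1: 8 of 50 sampled units defective, but only 8 of 4,000 overall
first = scenario(8, 50, 8, 4000)
# Scenario 2: 2 of 50 sampled units defective, but 320 of 4,000 overall
second = scenario(2, 50, 320, 4000)

print(first[2])   # Type I error
print(second[2])  # Type II error
```

In practice the true shipment percentage is unknown, which is exactly why these errors cannot be ruled out and are instead controlled through the level of significance.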