# Hypothesis Testing

## One Sample T Test

The one-sample t-test is a statistical procedure that compares the mean of a sample to a known population mean. In SciPy it is available as `stats.ttest_1samp()`.

$$t = \dfrac{\bar x - \mu}{\dfrac{s}{\sqrt n}} $$

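Here $\bar x$ is the sample mean, $\mu$ is the population mean, $s$ is the sample standard deviation and $n$ is the sample size. The test is used when the population standard deviation is unknown; as a rule of thumb, this usually means a sample size that is **less than or equal to 30**.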

### Degrees of Freedom

It is the number of values in the final calculation of a statistic that are free to vary. For the one-sample t-test it is calculated as $n - 1$.


<hr>

**EXAMPLE:**
Measure the resting systolic blood pressure of 20 first-year resident female doctors and compare it to the general population mean of 120 mmHg.

<u>Solution</u>

**Null Hypothesis:** There is no significant difference between the blood pressures of the resident female doctors and the general population.

**Alternate:** There is a statistically significant difference between the blood pressure of the resident female doctors and the general population.

In [1]:
from scipy import stats

In [18]:
female_doctor_bps = [128, 127, 118, 115, 144, 142, 133, 140, 132, 131, 
                     111, 132, 149, 122, 139, 119, 136, 129, 126, 128]

# one-sample t-test against the population mean of 120 mmHg
stats.ttest_1samp(female_doctor_bps, 120)

Ttest_1sampResult(statistic=4.512403659336718, pvalue=0.00023838063630967753)

The p-value (about 0.00024) is less than 0.05. Hence, we reject the null hypothesis. *There is a statistically significant difference between the resting systolic blood pressure of the resident female doctors and the general population.*
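
As a cross-check, the t statistic can be reproduced directly from the one-sample formula using only the Python standard library. This is a minimal sketch; the variable names (`population_mean`, `standard_error`, and so on) are chosen here for illustration and are not part of the original notebook.

```python
import math
import statistics

female_doctor_bps = [128, 127, 118, 115, 144, 142, 133, 140, 132, 131,
                     111, 132, 149, 122, 139, 119, 136, 129, 126, 128]
population_mean = 120  # general population mean in mmHg

n = len(female_doctor_bps)
sample_mean = statistics.mean(female_doctor_bps)
sample_sd = statistics.stdev(female_doctor_bps)  # sample SD (n - 1 denominator)

# t = (x_bar - mu) / (s / sqrt(n))
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - population_mean) / standard_error
dof = n - 1  # degrees of freedom

print(f"t = {t_stat:.4f}, degrees of freedom = {dof}")
```

The printed statistic should agree with the value reported by `stats.ttest_1samp()` (about 4.51), with 19 degrees of freedom.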

<hr>

## Two Sample T Test

The two-sample (independent) t-test is a statistical procedure that compares the means of two separate, independent samples. In SciPy it is available as `stats.ttest_ind()`.

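$$t = \dfrac{\bar x_1 - \bar x_2}{\sqrt{s_p^2(\frac{1}{n_1} + \frac{1}{n_2})}}$$

where $s_p^2$ is the pooled variance of the two samples:

$$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

**Degrees of freedom:** $n_1 + n_2 - 2$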
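
The pooled-variance calculation can be sketched by hand as follows. The function name `pooled_t_statistic` and the two small samples are hypothetical, made up for illustration only; they are not the blood pressure data used in the example that follows. The sketch assumes equal variances in the two groups, which is also the default assumption of `stats.ttest_ind()`.

```python
import math
import statistics


def pooled_t_statistic(sample_1, sample_2):
    """Return (t, degrees of freedom) for an equal-variance two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    var1 = statistics.variance(sample_1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(sample_2)

    dof = n1 + n2 - 2
    # pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / dof

    mean_diff = statistics.mean(sample_1) - statistics.mean(sample_2)
    t_stat = mean_diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_stat, dof


# hypothetical toy samples, for illustration only
group_a = [118, 120, 122]
group_b = [121, 123, 125]

t_stat, dof = pooled_t_statistic(group_a, group_b)
print(f"t = {t_stat:.4f}, degrees of freedom = {dof}")
```

The sign of the statistic follows the order of the arguments: it is negative when the first sample has the smaller mean.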

<hr>

**EXAMPLE:**
Compare the blood pressure of male consultant doctors with that of the junior resident female doctors.

<u>Solution</u>

**Null Hypothesis:** There is no significant difference between the blood pressure of male consultant doctors and junior resident female doctors.

**Alternate:** There is a statistically significant difference between the blood pressure of the male consultant doctors and junior resident female doctors.

In [6]:
import pandas as pd

# read data
bps = pd.read_csv('../data/bp.csv')

In [11]:
bps.head()

|   | female_bps | male_bps |
|---|------------|----------|
| 0 | 128 | 118 |
| 1 | 127 | 115 |
| 2 | 118 | 112 |
| 3 | 115 | 120 |
| 4 | 144 | 124 |


In [17]:
# two-sample t-test: female vs. male blood pressures
stats.ttest_ind(bps['female_bps'], bps['male_bps'])

Ttest_indResult(statistic=3.5143256412718564, pvalue=0.0011571376404026158)

The p-value (about 0.0012) is less than 0.05. Hence, we reject the null hypothesis. *There is a statistically significant difference between the blood pressure of the male consultant doctors and the junior resident female doctors.*

## Paired Sample T Test

The paired-sample t-test is a statistical procedure that compares the means of two related samples, typically measurements taken on the same subjects before and after a treatment. In SciPy it is available as `stats.ttest_rel()`.

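$$ \large t = \dfrac{\bar d}{\frac{s_d}{\sqrt{n}}}$$

where $\bar d$ is the mean of the paired differences, $s_d$ is the standard deviation of those differences and $n$ is the number of pairs.

Degrees of freedom: $n - 1$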
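
The paired statistic is simply a one-sample t statistic computed on the per-subject differences. A minimal sketch, using made-up sleep durations (the lists `before` and `after` are hypothetical and are not the contents of `sleep_duration.csv`):

```python
import math
import statistics

# hypothetical hours of sleep for the same four patients, for illustration only
before = [5, 6, 7, 8]
after = [6, 8, 8, 10]

# per-patient differences (first sample minus second sample)
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample SD of the differences

# t = d_bar / (s_d / sqrt(n))
t_stat = mean_diff / (sd_diff / math.sqrt(n))
dof = n - 1

print(f"t = {t_stat:.4f}, degrees of freedom = {dof}")
```

Because the differences are taken as first sample minus second sample, a negative statistic here means the values in the second sample (after) are larger on average.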

<hr>

**EXAMPLE:**
Measure and compare the amount of sleep patients get before and after taking a soporific drug to help them sleep.

<u>Solution</u>

**Null Hypothesis:** The drug has no effect on the sleep duration of the patients. 

**Alternate:** The drug has an effect on the sleep duration of the patients.

In [19]:
sleep_duration = pd.read_csv('../data/sleep_duration.csv')
control, treatment = sleep_duration.iloc[:, 0], sleep_duration.iloc[:, 1]

In [20]:
# paired sample t test
stats.ttest_rel(control,treatment)

Ttest_relResult(statistic=-3.9698390753392734, pvalue=0.003255434487402806)

**N.B:** The p-value is less than 0.05. Therefore, we reject the null hypothesis. There is a statistically significant difference in sleep duration after taking the soporific drug.

## One Sample Z Test

A z-test is a statistical test to determine whether a population mean differs from a specified value, or whether two population means differ, when the population variances are known.

**T-Test Vs Z-Test**

In a t-test, the population standard deviation is unknown and the sample size is typically small (less than or equal to 30).

In a z-test, the population standard deviation is known and the sample size is typically large (greater than 30).

**The One Sample z-test** is used to test whether the mean of a population is greater than, less than, or not equal to a specific value.

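$$ z = \dfrac{\bar x - \mu}{\dfrac{\sigma}{\sqrt n}} $$

where $\bar x$ is the sample mean, $\mu$ is the hypothesised population mean, $\sigma$ is the known population standard deviation and $n$ is the sample size.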
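
A one-sample z-test can be sketched with the standard library alone, dividing the difference in means by the standard error $\sigma / \sqrt n$ and reading the p-value from the standard normal distribution. The function name `one_sample_z_test` and the input numbers below are hypothetical, chosen only to illustrate the calculation.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, population_mean, population_sd, n):
    """Return (z, two-sided p-value) for a one-sample z-test."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    # two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# hypothetical inputs: sample of 36 with mean 105, population mean 100, known SD 15
z, p_value = one_sample_z_test(sample_mean=105, population_mean=100,
                               population_sd=15, n=36)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these made-up inputs the standard error is 2.5, so z is 2.0 and the two-sided p-value is just under 0.05. For a one-sided alternative (greater than or less than), the p-value would instead be taken from a single tail.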