In [1]:
import numpy as np
import pandas as pd 
import scipy.stats as st

# Hypothesis Testing
- Hypothesis testing helps answer questions such as
    + Do males default more than females?
    + Do self-driving cars crash more than normal cars?
    + Does drug X help prevent/treat disease Y?

- During data exploration
    + We may discover interesting patterns hidden in the data
    + Hypothesis testing lets us check whether these patterns appeared in the data
        + by luck 
        + or because of some real phenomenon

#### Null and Alternate hypothesis
- The aim of the hypothesis test is to determine if the null hypothesis can be rejected or not
- The **null hypothesis** $H_0$ = a statement that assumes
    + Nothing interesting is going on
    + No relationship is present between two variables
    + No difference between a sample and a population

- Example: if we suspect that males default more than females
    + Null hypothesis = Males do not default more than females
    + If there is little or no evidence against the null hypothesis, we fail to reject the null hypothesis
    + Otherwise, we reject the null hypothesis in favor of the **alternate hypothesis** $H_1$
        + States that something interesting is going on
        + There is a relationship between two variables
        + The sample is different from the population

- The null hypothesis is assumed true 
    + Statistical evidence is required to reject it in favor of the alternative hypothesis

#### Z-test vs t-test

| -                     | **t-test**                                                                                                                        | **Z-test**                                                                                                                                        |
|:-------------------   |:--------------------------------------------------------------------------------------------------------------------------------- |:------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Meaning**           | The t-test checks whether <br>the means of two sets of data <br>differ from one another <br>when the population variance is unknown.  | The Z-test checks whether <br>the means of two datasets <br>differ from each other <br>when the population variance is known.  |
| **Based on**          | Student-t distribution                                                                                                            | Normal distribution                                                                                                                               |
| **Population std**    | Unknown                                                                                                                           | Known                                                                                                                                             |
| **Sample Size**       | Small                                                                                                                             | Large                                                                                                                                             |

# Z-test
- Z-test = a statistical technique to test the null hypothesis against the alternate hypothesis. Used when
    + The sample data is (approximately) normally distributed
    + The sample size is greater than 30
        + According to the Central Limit Theorem, the sample mean is approximately normally distributed once the sample size exceeds about 30
    + The population std is known
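
The Central Limit Theorem condition above can be checked with a quick simulation. This is a minimal sketch on synthetic data: the exponential population, the seed, and the names `n`, `n_trials`, and `sample_means` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

n = 40             # size of each sample
n_trials = 10_000  # number of repeated samples

# Skewed (clearly non-normal) synthetic population: exponential, mean 1, std 1
samples = rng.exponential(scale=1.0, size=(n_trials, n))
sample_means = samples.mean(axis=1)

# CLT prediction: sample means center on 1 with std 1 / sqrt(n)
print('Mean of sample means:', sample_means.mean())
print('Std of sample means: ', sample_means.std(ddof=1))
print('CLT prediction (std):', 1.0 / np.sqrt(n))
```

Even though each individual observation is far from normal, the spread of the sample means should land close to $\sigma/\sqrt{n}$, which is the denominator of the Z-score.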

#### Z-Score
- $Z = \frac{\bar{x}-\mu}{\sigma/ \sqrt{n}}$
    + $\bar{x}$: mean of sample
    + $\mu$: mean of population
    + $\sigma$: std of population
    + $n$: sample size
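
The formula translates directly into a few lines of Python. The helper name `z_score` and the input numbers are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import math

def z_score(sample_mean, pop_mean, pop_std, n):
    """Z statistic for a sample mean when the population std is known."""
    standard_error = pop_std / math.sqrt(n)  # sigma / sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical inputs: standard error = 10 / sqrt(100) = 1, so Z = (52 - 50) / 1
z = z_score(sample_mean=52, pop_mean=50, pop_std=10, n=100)
print(z)  # 2.0
```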

#### p-value
- The p-value quantifies how rare our results are
    + It tells us how often we’d see results at least as extreme as those of our experiment if the null hypothesis were true
    + We can use p-values to reach conclusions in significance testing

<img src="./assets/2.png" width="500"/>

#### Threshold $\alpha$
+ We compare the p-value to a significance level ($\alpha$) to make conclusions about our hypotheses
    + If $\text{p-value} < \alpha$: the result would rarely occur by chance alone if the null hypothesis were true
        + The result is statistically significant
        + We can reject the null hypothesis in favor of the alternative hypothesis
    + If $\text{p-value} \geq \alpha$
        + Fail to reject the null hypothesis
        + This doesn’t mean we accept the null hypothesis though!

+ How to choose $\alpha$?
    + Choice of $\alpha$ depends on the situation
    + 0.05 = the most widely used value across scientific disciplines (Confidence level = 95%)
    - $p<0.05$ means
        + there is less than a 5% chance of seeing results at least as extreme as ours, in a world where the null hypothesis is true
    - $p<0.05$ **does not** mean
        + there’s less than a 5% chance that our experimental results are due to random chance
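
The decision rule above fits in a tiny helper. This is only a sketch; the function name `decide` and its return strings are hypothetical, not part of any library.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level alpha."""
    if not 0 < alpha < 1:
        raise ValueError('alpha must be between 0 and 1')
    # Strict inequality: a p-value exactly equal to alpha does not reject H0
    return 'reject H0' if p_value < alpha else 'fail to reject H0'

print(decide(0.035))              # reject H0
print(decide(0.39))               # fail to reject H0
print(decide(0.035, alpha=0.01))  # fail to reject H0
```

The last call shows that the same p-value can lead to a different conclusion under a stricter threshold, which is why $\alpha$ should be chosen before looking at the data.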

<p align="center">
    <img src="./assets/3.png" width="220"/>
    <br/>
    <i><a href=https://xkcd.com//>Credit</a></i>
</p>

#### Z-score, p-value, Confidence level in normal distribution

<p align="center">
    <img src="./assets/4.png" width="680"/>
    <br/>
    <i><a href=https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/what-is-a-z-score-what-is-a-p-value.htm>Credit</a></i>
</p>

| **z-score** ($\sigma$) 	| **p-value** (Probability) 	| **Confidence level** 	| $\alpha$ 	|
|:----------------------:	|:-------------------------:	|:--------------------:	|:--------:	|
|   < -1.65 or > +1.65   	|           < 0.10          	|          90%         	|    0.1   	|
|   < -1.96 or > +1.96   	|           < 0.05          	|          95%         	|   0.05   	|
|   < -2.58 or > +2.58   	|           < 0.01          	|          99%         	|   0.01  	|
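
The z-score thresholds in the table are two-tailed critical values of the standard normal distribution. As a rough check, they can be recovered with `st.norm.ppf`; the dictionary name `critical_values` is just an illustrative choice.

```python
import scipy.stats as st

critical_values = {}
for alpha in (0.10, 0.05, 0.01):
    # Two-tailed test: put alpha / 2 in each tail
    critical_values[alpha] = st.norm.ppf(1 - alpha / 2)
    print(f'alpha = {alpha:.2f}  confidence = {1 - alpha:.0%}  '
          f'|z| > {critical_values[alpha]:.3f}')
```

The printed values (about 1.645, 1.960 and 2.576) match the table after rounding to two decimals.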


#### Example 1
- **Problem**: A company claims that it has a high hiring bar, so its employees have an above-average IQ
    + A random sample of 40 of its employees has a mean IQ score of 115
    + The population mean IQ is 100, with a standard deviation of 15
    + Is this sufficient evidence to support the company’s claim?
- **Solution**
    + Hypotheses
        + H0 - the company's employees have the population mean IQ: $\mu = 100$
        + H1 - the company's employees have an above-average mean IQ: $\mu > 100$
    + Choose threshold $\alpha = 0.05$
    + Calc z-score
        + $Z = \frac{\bar{x}-\mu}{\sigma/ \sqrt{n}} = \frac{115-100}{15/\sqrt{40}} = 6.32$
    + Calc p-value from z-score

In [2]:
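# z-score from the calculation above
z_scores = 6.32
# One-tailed p-value: P(Z >= z) under the standard normal distribution
p_values = st.norm.sf(abs(z_scores))
p_values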

1.3078165132642286e-10

- Conclude
    + p < 0.05 => we can reject the null hypothesis
    + The 40 employees have an unusually high mean IQ score compared with random samples of the same size from the entire population
- **p = extremely small** meaning
    + If the company's employees have the population mean IQ (H0 is true), there is an extremely small chance that a **random** sample of 40 employees will have a mean IQ of 115 or higher

#### Example 2
- **Problem**: A study tests the impact of smoking on the duration of pregnancy. 
    + A random sample of 40 women who smoked has a mean duration of pregnancy of 260 days
    + The population mean pregnancy length is 266 days, with std = 21 days
    + Is this sufficient evidence that smoking shortens the duration of pregnancy?
- **Solution**
    + State Hypotheses
        + H0 - smoking has no effect on the duration of pregnancy, so women who smoke have the population mean of 266 days: $\mu = 266$
        + H1 - smoking shortens the duration of pregnancy: $\mu < 266$
    + State the threshold: $\alpha = 0.05$
    + Calc z-score
        + $Z = \frac{\bar{x}-\mu}{\sigma/ \sqrt{n}} = \frac{260-266}{21/\sqrt{40}} = -1.81$
    + Calc p-value from z-score

In [3]:
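# z-score from the calculation above
z_scores = -1.81
# Lower-tail p-value P(Z <= z); by symmetry this equals sf(|z|)
p_values = st.norm.sf(abs(z_scores))
p_values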

0.03514789358403879

- Conclude
    + p < 0.05 => we can reject the null hypothesis
    + The 40 women who smoked have an unusually short mean duration of pregnancy compared with random samples of the same size from the entire population
- **p = 0.035** meaning
    + If smoking has no effect on the duration of pregnancy (H0 is true), there is a 3.5% chance that a **random** sample of 40 smokers will have a mean pregnancy duration of 260 days or less
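
Both worked examples follow the same recipe, so they can be wrapped in one helper. The sketch below is an illustration, not the notebook's method: the function name `z_test` and its `alternative` argument are hypothetical, and it works from summary statistics only.

```python
import math
import scipy.stats as st

def z_test(sample_mean, pop_mean, pop_std, n, alternative='two-sided'):
    """One-sample Z-test from summary statistics; returns (z, p_value)."""
    z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))
    if alternative == 'greater':    # H1: mu > pop_mean
        p = st.norm.sf(z)
    elif alternative == 'less':     # H1: mu < pop_mean
        p = st.norm.cdf(z)
    elif alternative == 'two-sided':  # H1: mu != pop_mean
        p = 2 * st.norm.sf(abs(z))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p

# Example 1: IQ of 40 employees, H1: mu > 100
z_iq, p_iq = z_test(115, 100, 15, 40, alternative='greater')
# Example 2: pregnancy duration of 40 smokers, H1: mu < 266
z_preg, p_preg = z_test(260, 266, 21, 40, alternative='less')
# Same data as Example 2, but with a two-sided alternative
_, p_preg_two_sided = z_test(260, 266, 21, 40)

print(z_iq, p_iq)
print(z_preg, p_preg)
print(p_preg_two_sided)
```

Because the helper uses unrounded z-scores, its p-values differ slightly from the cells above. A two-sided test on Example 2 gives a p-value of about 0.07, which is above 0.05, so the choice of a one-tailed alternative matters for the conclusion and should be made before seeing the data.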

# t-test
- The t-test is used when
    + The test statistic follows a Student's t distribution
    + The sample size is small
    + The std of the population is unknown
- The testing process
    + State both hypotheses: null and alternate
    + Choose the significance level $\alpha$
        + The probability threshold that determines whether we reject the null hypothesis
        + If confidence level = 95%, $\alpha = 1- cl = 0.05$
    + Choose a test and compute the test statistic
    + Decide whether to reject or fail to reject the null hypothesis
        + If **p-value** < $\alpha$ $\to$ reject the null hypothesis in favor of the alternative
            + The p-value is the probability, assuming the null hypothesis is true, of seeing a result as extreme as or more extreme than the one observed
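
To make the process concrete, here is a minimal sketch that computes a one-sample t statistic and its two-sided p-value by hand and compares them with `st.ttest_1samp`. The data is synthetic, and the names `sample`, `mu0`, `t_manual`, and `p_manual` are illustrative assumptions.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(42)  # arbitrary seed
sample = rng.normal(loc=31.0, scale=10.0, size=25)  # synthetic small sample
mu0 = 32.0     # hypothetical population mean under H0
alpha = 0.05

n = sample.size
# Like the Z-score, but with the sample std (ddof=1) instead of the population std
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * st.t.sf(abs(t_manual), df=n - 1)

result = st.ttest_1samp(sample, popmean=mu0)
print('manual:', t_manual, p_manual)
print('scipy: ', result.statistic, result.pvalue)
print('reject H0' if p_manual < alpha else 'fail to reject H0')
```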


- Usually, three types of tests are carried out:
    + One-Sample t-Test
    + Two-Sample t-Test
    + Paired t-Test

## One Sample t-Test
- One sample t-test checks if a sample mean differs from the population mean or not

#### Example
- **Problem**:
    - Student Alcohol Consumption: Take a sample from the data
    - Check if the sample mean differs from the population mean
- **Solution**:
    - Hypotheses
        + Null hypothesis $H_0$: $\mu_s = \mu$ (the sample comes from a population whose mean $\mu_s$ equals $\mu$; any gap between $\bar{x}$ and $\mu$ is due to chance)
        + Alternate hypothesis $H_1$: $\mu_s \neq \mu$
    - Threshold $\alpha = 0.05$

In [4]:
df = pd.read_csv('./data/student-mat.csv')
df['grade'] = df['G1'] + df['G2'] + df['G3']

# population
pop = df['grade']
pop_mean = pop.mean()

# sample
sample = pop.sample(
    n=100,
    random_state = 6)
sample_mean = sample.mean()

print('Population Mean', pop_mean)
print('Sample Mean', sample_mean)

Population Mean 32.037974683544306
Sample Mean 31.1


- `ttest_1samp` returns
    - test statistic: tells how far the sample mean deviates from the population mean, in units of the standard error
    - p-value

In [5]:
# Test
result = st.ttest_1samp(
    a=sample,
    popmean=pop_mean)
print(result)

Ttest_1sampResult(statistic=-0.860205141708286, pvalue=0.3917543405621652)


- Conclude
    - p-value = 0.39 > $\alpha = 0.05$
    - We cannot reject the null hypothesis $H_0$: the difference between the two means is not statistically significant and is consistent with chance
- If we construct a 95% confidence interval around the sample mean, the population mean would be captured in it
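
The link between the two-sided test and the confidence interval can be sketched as follows. The data here is a synthetic stand-in (the CSV is not loaded), and `mu0`, `ci_low`, and `ci_high` are hypothetical names used only for this illustration.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(6)  # arbitrary seed
sample = rng.normal(loc=31.0, scale=11.0, size=100)  # synthetic stand-in sample
mu0 = 32.0    # hypothetical population mean
alpha = 0.05

n = sample.size
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(n)        # standard error of the mean
t_crit = st.t.ppf(1 - alpha / 2, df=n - 1)   # two-sided critical value

# 95% confidence interval around the sample mean
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem
p_value = st.ttest_1samp(sample, popmean=mu0).pvalue

print('95% CI:', (ci_low, ci_high))
print('mu0 inside CI:', ci_low <= mu0 <= ci_high)
print('fail to reject H0:', p_value >= alpha)
```

For a two-sided test at level $\alpha$, the last two printed lines always agree: the hypothesized mean falls inside the $(1-\alpha)$ confidence interval exactly when the test fails to reject $H_0$.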

## Two-sample t-test
- Two-sample t-test: checks whether means of two independent samples differ from each other

#### Example
- **Problem**: 
    + Student Alcohol Consumption: Split the students into 2 groups by alcohol consumption
    + Check whether the mean grades of the two groups differ from each other
- **Solution**
    - Hypotheses
        + Null hypothesis $H_0$: $\mu_1 = \mu_2$ (the two groups have the same population mean; any gap between $\bar{x}_1$ and $\bar{x}_2$ is due to chance)
        + Alternate hypothesis $H_1$: $\mu_1 \neq \mu_2$
    - Threshold $\alpha = 0.05$

In [6]:
df = pd.read_csv('./data/student-mat.csv')
df['grade'] = df['G1'] + df['G2'] + df['G3']
df['alc'] = df['Walc'] + df['Dalc']

# 2 independent samples: grades of students with alc > 5 vs. alc <= 5
sample_1 = df[df['alc']>5]['grade']
sample_2 = df[df['alc']<=5]['grade']

sample_1_mean = sample_1.mean()
sample_2_mean = sample_2.mean()

print('Sample 1 Mean:',sample_1_mean)
print('Sample 2 Mean:',sample_2_mean)

Sample 1 Mean: 29.5
Sample 2 Mean: 32.62305295950156


In [7]:
# Test
result = st.ttest_ind(
    a=sample_1,
    b=sample_2)
print(result)

Ttest_indResult(statistic=-2.1943004704717235, pvalue=0.028798495846983758)


- Conclude
    + p-value = 0.029 < $\alpha=0.05$
    + We can reject the null hypothesis
    + The difference between the means of the 2 samples is statistically significant
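
By default, `st.ttest_ind` assumes that both groups have the same variance. When the groups differ a lot in size or spread, Welch's t-test (`equal_var=False`) is a common alternative. The sketch below uses synthetic groups; their sizes, means, and spreads are assumptions for illustration and are not taken from the dataset.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)  # arbitrary seed
# Synthetic groups with unequal sizes and unequal spread
group_1 = rng.normal(loc=29.5, scale=9.0, size=70)
group_2 = rng.normal(loc=32.5, scale=11.0, size=320)

# Student's t-test: pools the two variances (equal_var=True is the default)
pooled = st.ttest_ind(a=group_1, b=group_2)
# Welch's t-test: keeps a separate variance for each group
welch = st.ttest_ind(a=group_1, b=group_2, equal_var=False)

print('Pooled:', pooled.statistic, pooled.pvalue)
print('Welch: ', welch.statistic, welch.pvalue)
```

If the two variants give clearly different p-values, the equal-variance assumption deserves a closer look before drawing a conclusion.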

## Paired t-test
- Paired t-test checks whether the mean of the same sample differs between two different times

#### Example
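- **Problem**: 
    + Student Alcohol Consumption: Take the first-period grade (G1) and the final grade (G3) of the same students
    + Check whether the mean grade differs for the same sample at the two times
- **Solution**
    - Hypotheses
        + Null hypothesis $H_0$: $\mu_1 = \mu_2$ (the mean grade is the same at both times; any gap between $\bar{x}_1$ and $\bar{x}_2$ is due to chance)
        + Alternate hypothesis $H_1$: $\mu_1 \neq \mu_2$
    - Threshold $\alpha = 0.05$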

In [8]:
# 2 paired samples: grades of the same students at two times (G1 and G3)
sample_1 = df['G1']
sample_2 = df['G3']

sample_1_mean = sample_1.mean()
sample_2_mean = sample_2.mean()

print('Sample 1 Mean:', sample_1_mean)
print('Sample 2 Mean:', sample_2_mean)

Sample 1 Mean: 10.90886075949367
Sample 2 Mean: 10.415189873417722


In [9]:
# Test
result = st.ttest_rel(
    a=sample_1,
    b=sample_2)
print(result)

Ttest_relResult(statistic=3.5517031247185855, pvalue=0.00042906738658041643)


- Conclude
    + p-value = 0.0004 is much smaller than $\alpha=0.05$ => we reject the null hypothesis
    + The drop in the mean grade from sample 1 (G1) to sample 2 (G3) is statistically significant.
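
A paired t-test is equivalent to a one-sample t-test on the per-subject differences, tested against a mean of 0. The sketch below checks this on synthetic paired data; the arrays `before` and `after` are hypothetical stand-ins, not the G1 and G3 columns.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(3)  # arbitrary seed
# Synthetic paired measurements for the same 200 subjects
before = rng.normal(loc=11.0, scale=3.0, size=200)
after = before + rng.normal(loc=-0.5, scale=2.0, size=200)

# Paired t-test on the two measurements
paired = st.ttest_rel(a=before, b=after)
# One-sample t-test on the differences against a mean of 0
one_sample = st.ttest_1samp(before - after, popmean=0.0)

print('Paired:    ', paired.statistic, paired.pvalue)
print('One-sample:', one_sample.statistic, one_sample.pvalue)
```

This is why pairing matters: the test looks only at how each subject changed, so variation between subjects does not inflate the standard error.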