# Hypothesis Testing 
This document reflects my personal understanding. If you spot a mistake, please let me know.

**This document may be updated at any time.**

## Definition

**Wiki**: 

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. 

**Personal understanding**: 

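Hypothesis testing is a method for checking whether the observed data are consistent with a stated hypothesis (the null hypothesis, H0).

In a hypothesis test, the calculated p-value is compared against the significance level α. If **p < α**, then **H0 is rejected**; otherwise, we fail to reject H0 (which does not prove that H0 is true).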




#### Method

1. State the null hypothesis H0 (and the alternative hypothesis H1)
1. Define the significance level α
1. Calculate the p-value
1. Compare p with α: reject H0 **if** p < α, **else** fail to reject H0
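
The four steps above can be sketched as one small function. This is only an illustration under assumptions: the helper name `two_sample_test`, the synthetic samples, and the choice of a two-sample t-test via `scipy.stats.ttest_ind` are made up for this example and are not part of the tutor's material.

```python
import numpy as np
import scipy.stats as sps


def two_sample_test(sample_a, sample_b, alpha=0.05):
    """Hypothetical helper. H0: both populations have the same mean."""
    # Step 1 is implicit in the choice of test: H0 says the means are equal.
    # Step 2: the significance level alpha is supplied by the caller.
    # Step 3: compute the test statistic and the p-value.
    test_stat, p_value = sps.ttest_ind(sample_a, sample_b)
    # Step 4: reject H0 only if p < alpha; otherwise fail to reject it.
    reject_h0 = bool(p_value < alpha)
    return test_stat, p_value, reject_h0


# Synthetic data, invented for illustration (not the toddler data set).
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=16.0, scale=2.0, size=200)
sample_b = rng.normal(loc=18.0, scale=2.0, size=200)

test_stat, p_value, reject_h0 = two_sample_test(sample_a, sample_b)
print("p-value =", p_value)
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Returning the decision as a boolean, rather than labelling H0 "true" or "false", keeps the wording honest: a large p-value only means the data give no significant evidence against H0.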




## Python 

This example is based on the tutor's material.

In [2]:
# import relevant packages
import numpy as np
import scipy.stats as sps
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline


# Get the data on the weights of 4-year-olds:
data = pd.read_csv('https://github.com/huanfachen/QM_2021/raw/main/data/toddler_data.csv')[['sample_1','sample_2']]

# Look at the first few rows:
data.head()

#### Step 1
State the null hypothesis H0 and the alternative hypothesis H1:

H0: Mean of population 1 = Mean of population 2

H1: Mean of population 1 ≠ Mean of population 2

#### Step 2

Set significance level:

In [3]:
# Set significance level:

alpha = 0.05

#### Step 3

Calculate the p-value

In [4]:
# Store each sample separately:
data1 = data['sample_1']
data2 = data['sample_2']

# The built-in scipy.stats function ttest_ind computes the test statistic and the p-value for us.
# We just need to decide whether we can assume that the samples are drawn...
# ... from populations with the same standard deviation or not.
# (Rule of thumb: this is fine provided neither sample standard deviation is more than double the other.)

std_ratio = data1.std()/data2.std()

print("std_ratio =", std_ratio)

if 0.5 < std_ratio < 2:
    print("Can assume equal population standard deviations.")
    equal_stds = True
else:
    print("Cannot assume equal population standard deviations.")
    equal_stds = False


# Calculate the test statistic and the p-value:
# sps.ttest_ind returns two values: the test statistic and the p-value.
test_stat, p_value = sps.ttest_ind(data1, data2, equal_var=equal_stds)
print("p-value =", p_value)

std_ratio = 0.9720359813764003
Can assume equal population standard deviations.
p-value = 0.04479005662769824
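
To make the call to `sps.ttest_ind` with `equal_var=True` less of a black box, the pooled-variance t statistic can be worked out by hand and compared with SciPy's result. This is a sketch under assumptions: the samples `x1` and `x2` are synthetic stand-ins (not the toddler data), and all variable names are hypothetical.

```python
import numpy as np
import scipy.stats as sps

# Synthetic stand-in samples, invented for illustration.
rng = np.random.default_rng(42)
x1 = rng.normal(loc=16.0, scale=2.0, size=50)
x2 = rng.normal(loc=16.5, scale=2.0, size=60)

n1, n2 = len(x1), len(x2)

# Sample variances with ddof=1 (the same convention pandas .std() uses).
var1, var2 = x1.var(ddof=1), x2.var(ddof=1)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# t statistic: difference in sample means divided by its standard error.
t_manual = (x1.mean() - x2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
dof = n1 + n2 - 2
p_manual = 2 * sps.t.sf(abs(t_manual), dof)

# Compare with SciPy's built-in result.
t_scipy, p_scipy = sps.ttest_ind(x1, x2, equal_var=True)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

The two lines printed should agree up to floating-point rounding. When `equal_var=False`, SciPy instead uses Welch's t-test, which does not pool the variances.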


#### Step 4

Make the comparison.

In [5]:
# Reach a conclusion:

if p_value < alpha:
    print("p-value < significance threshold.")
    print("Reject H0. Accept H1.")
    print("Conclude that samples are drawn from populations with different means.")
else:
    print("p-value >= significance threshold.")
    print("No significant evidence to reject H0.")
    print("Cannot conclude that the population means differ.")

p-value < significance threshold.
Reject H0. Accept H1.
Conclude that samples are drawn from populations with different means.
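
The conclusion depends on the significance level chosen in Step 2. The short sketch below reuses the p-value reported above (rounded) and compares it against a few thresholds; the alternative values 0.10 and 0.01 are assumptions added for illustration and do not come from the tutor's material.

```python
# p-value reported by the t-test above, rounded to four decimal places.
p_value = 0.0448

# Apply the same decision rule at several significance levels.
decisions = {}
for alpha in (0.10, 0.05, 0.01):
    decisions[alpha] = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decisions[alpha]}")
```

With this p-value, H0 is rejected at α = 0.10 and α = 0.05 but not at α = 0.01, which is why α should be fixed before looking at the data.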
