Source: https://medium.com/analytics-vidhya/probability-distributions-and-hypothesis-tests-using-python-2ee25cb3a90f

In [12]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from math import sqrt

**Hypothesis testing** uses sample data to decide whether to reject or fail to reject a null hypothesis. First, define a null hypothesis and an alternative hypothesis. Then choose the test best suited to checking the null hypothesis (t-test, $\chi^2$ test, z-test, etc.).

### <font color = 'orange'>**t-test (one-sample)** <font color = 'black'>is used when: 
* population standard deviation is unknown

**Problem:**
Using the given data set: the teacher has high expectations of the students and expects the average score to be 72. Assume the scores follow a normal distribution. Check whether the expectation is correct (alpha = 0.05).

$H_{0}$ - The population mean is 72.

$H_{a}$ - The population mean is not equal to 72.

alpha = 0.05

In [13]:
#creating a dataset from scratch into pandas
data = {'Name':['Tom', 'nick', 'krish', 'jack', 'sam'],
        'Score':[66, 71, 49, 58, 61]}
# Creating DataFrame
grade_df = pd.DataFrame(data)
grade_df

Unnamed: 0,Name,Score
0,Tom,66
1,nick,71
2,krish,49
3,jack,58
4,sam,61


In [14]:
stats.ttest_1samp(grade_df.Score, 72)


Ttest_1sampResult(statistic=-2.9504297943220106, pvalue=0.0419514282794582)

**Findings:**
The negative t-statistic means the sample mean (61) is below the hypothesized population mean of 72. If the population mean really were 72, a sample mean at least this far from 72 would occur only about 4.2% of the time.

* t = -2.95
* p-value = 0.042 (4.2%)

Because the p-value is below alpha = 0.05, we reject the null hypothesis that the average score is 72.
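
As a cross-check, the statistic returned by `stats.ttest_1samp` can be reproduced by hand from $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$. The sketch below is illustrative only; the variable names (`scores`, `mu_0`, `t_manual`, `p_manual`) are assumptions introduced here, and the scores are copied from the DataFrame above.

```python
import numpy as np
from scipy import stats

# Same scores as grade_df.Score, copied here so the cell is self-contained
scores = np.array([66, 71, 49, 58, 61])
mu_0 = 72  # hypothesized population mean under H0

n = scores.size
sample_mean = scores.mean()
sample_std = scores.std(ddof=1)          # sample standard deviation (n - 1 in the denominator)
standard_error = sample_std / np.sqrt(n)

# t-statistic and two-sided p-value from the t distribution with n - 1 degrees of freedom
t_manual = (sample_mean - mu_0) / standard_error
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print(t_manual, p_manual)
```

The two printed values should agree with the `statistic` and `pvalue` fields shown in the output above.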


### <font color = 'orange'>**Chi-square goodness of fit test** <font color = 'black'>is used when: 
* we want to compare the observed distribution of the data with an expected distribution
* we need to check for a statistically significant difference between the observed and theoretical distributions

**Problem:**
110 visual artists were surveyed to find out their zodiac sign. The results were: Aries (31), Cancer (19), Libra (19), Pisces (23), Capricorn (18). The manager believes that 30% are Aries, 15% are Cancer, 20% are Libra, 25% are Pisces, and 10% are Capricorn. Test the hypothesis that zodiac signs are distributed across visual artists according to the manager's assumption.

$H_{0}$ = The manager's assumptions are correct.

$H_{a}$ = The manager's assumptions are not correct.
    
α = 0.05

In [18]:
# Create list of observed frequency and list of expected frequency
n_obs = [31, 19, 19, 23, 18]
n_exp = [110 * 0.3, 110 * 0.15, 110 * 0.2, 110 * 0.25, 110 * .1 ]

s, p = stats.chisquare(n_obs, n_exp)

s, p

(6.1, 0.1918036437841208)

In [21]:
# Reject H0 only when the p-value is below alpha
if p < 0.05:
    print("We reject the null hypothesis and conclude that the manager's assumption is false")
else:
    print('We fail to reject the null hypothesis')

We fail to reject the null hypothesis
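
The statistic reported by `stats.chisquare` can also be computed directly from $\chi^2 = \sum (O_i - E_i)^2 / E_i$ with $k - 1$ degrees of freedom. The following is a minimal sketch under that formula; names such as `proportions`, `chi2_manual`, and `critical_value` are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

# Observed counts and the manager's assumed proportions (from the problem statement)
n_obs = np.array([31, 19, 19, 23, 18])
proportions = np.array([0.30, 0.15, 0.20, 0.25, 0.10])
n_exp = n_obs.sum() * proportions        # expected counts under H0

# Chi-square statistic: sum of squared deviations scaled by the expected counts
chi2_manual = ((n_obs - n_exp) ** 2 / n_exp).sum()
dof = len(n_obs) - 1                     # k - 1 degrees of freedom

# p-value from the upper tail, and the critical value at alpha = 0.05
p_manual = stats.chi2.sf(chi2_manual, dof)
critical_value = stats.chi2.ppf(0.95, dof)

print(chi2_manual, p_manual, critical_value)
```

Comparing `chi2_manual` with `critical_value` leads to the same decision as comparing the p-value with alpha: the null hypothesis is rejected only when the statistic exceeds the critical value.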


### <font color = 'orange'>**z-test** <font color = 'black'>is used when:
* the population variance is known
* the population is normally distributed, or the sample size is large (n > 30)
* we want to test a claim about the population mean
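
Under the conditions listed above, a one-sample z-test can be sketched by hand with `stats.norm`, using $z = (\bar{x} - \mu_0) / (\sigma / \sqrt{n})$. This is an illustrative sketch, not part of the original notebook: the helper `one_sample_z_test` is hypothetical, and the population standard deviation `sigma = 10` is an assumed value chosen only to make the example runnable.

```python
import numpy as np
from scipy import stats

def one_sample_z_test(sample, mu_0, sigma):
    """Two-sided one-sample z-test with a known population standard deviation.

    Hypothetical helper for illustration; returns (z statistic, p-value).
    """
    sample = np.asarray(sample, dtype=float)
    standard_error = sigma / np.sqrt(sample.size)
    z = (sample.mean() - mu_0) / standard_error
    p_value = 2 * stats.norm.sf(abs(z))  # area in both tails of the standard normal
    return z, p_value

# Scores reused from the t-test example; sigma = 10 is an ASSUMED population value
scores = [66, 71, 49, 58, 61]
z, p_value = one_sample_z_test(scores, mu_0=72, sigma=10)

print(z, p_value)
```

The only differences from the t-test are that the known `sigma` replaces the sample standard deviation and the p-value comes from the standard normal distribution rather than the t distribution.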

![Screen%20Shot%202021-02-17%20at%2011.36.38%20AM.png](attachment:Screen%20Shot%202021-02-17%20at%2011.36.38%20AM.png)