## Learning Objectives

By the end of this lesson, you will be able to:

1- Describe the null hypothesis and when and how to apply it

2- Define the metrics needed to statistically reject, or fail to reject, the null hypothesis

3- Understand how and when to use a z-test or a t-test

4- Determine whether or not two groups are significantly different

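## What is the null hypothesis?

Hypothesis testing is the formal method of reaching conclusions about a population from sample data, for example after a change has been applied to that population. The null hypothesis is the default assumption that the change has no effect; we keep it unless the sample data provide strong enough evidence against it.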

## Null hypothesis examples


1- Rain dance: in some cultures, people dance together to bring rain during periods of drought. The null hypothesis is that the dance has no effect on rainfall.

<img src="Images/raindance.jpg" height="200px">

2- We want to test if a drug has an impact on the brain 

3- Does a food boost your IQ?

4- A/B testing


## Activity: Review of Z distribution, CDF and SF

Generate 10000 samples of a random variable that follows a Normal distribution with a pre-defined mean and standard deviation.

**Hint:**

    import numpy as np

    mean = 60
    sigma = 10
    X = np.random.normal(mean, sigma, 10000)

Write a function to show that $Z = \frac{X - mean}{sigma}$ is standard Normal.

**Hint:** If you subtract the mean from each element of the generated array and divide by the standard deviation (sigma), the new array follows a Normal distribution with mean 0 and standard deviation 1.

**Hint:** Plot the histogram of Z to show that Z is standard Normal.
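One possible numerical check, offered as a sketch rather than the expected solution: standardize the samples, then compare the empirical mean, standard deviation, and CDF of `Z` with the standard Normal values from `scipy.stats.norm`. The helper name `standardize`, the seed, and the evaluation points are assumptions made for this illustration.

```python
import numpy as np
from scipy.stats import norm

def standardize(x, mean, sigma):
    """Return z-scores: subtract the mean, then divide by sigma."""
    return (np.asarray(x) - mean) / sigma

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
mean, sigma = 60, 10
X = rng.normal(mean, sigma, 10000)
Z = standardize(X, mean, sigma)

print("mean of Z:", Z.mean())  # should be close to 0
print("std of Z: ", Z.std())   # should be close to 1

# Compare the empirical CDF of Z with the standard Normal CDF.
# SF is the survival function, SF(z) = 1 - CDF(z).
for z0 in (-1.0, 0.0, 1.96):
    empirical_cdf = np.mean(Z <= z0)
    print(z0, empirical_cdf, norm.cdf(z0), norm.sf(z0))
```

If the empirical values track `norm.cdf` closely, that is consistent with `Z` being standard Normal; a histogram of `Z` gives the same picture visually.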

## Null hypothesis example: drug effect on rats

[Slides for example](https://docs.google.com/presentation/d/1BQibGlrpX71JU0jBU0C7zJJr6S_4WQeFzO7PBmnxf8g/edit?usp=sharing)

A neurologist is testing the effect of a drug on response time by injecting 100 rats with a unit dose of the drug, subjecting each to a neurological stimulus, and recording its response time. The neurologist knows that the mean response time for rats not injected with the drug is 1.2 seconds. The mean response time of the 100 injected rats is 1.05 seconds, with a population standard deviation of 0.5 seconds. Do you think the drug has an effect on response time?

$H_o :$ Drug has no effect ==> $\mu_x = 1.2$ even with drug 

$H_a :$ Drug has effect ==> $\mu_x \neq 1.2$ when the drug is given

## The steps to reject or fail to reject the null hypothesis

1- The population mean is known, $\mu$

2- The sample mean is known, $\bar{x}$

3- Define a significance level, $\alpha$

4- If $N > 30$ or $\sigma$ is known, calculate the z-score, $z_{score} = \frac{\bar{x} - \mu}{\sigma/\sqrt{N}}$

5- Calculate the two-tailed p-value, $p_{value} = 2SF(|z_{score}|)$, where $SF$ is the survival function ($1 - CDF$) of the standard Normal distribution

6- Decision: if $p_{value} < \alpha$, reject the null hypothesis; otherwise, fail to reject it (often loosely called "accepting" the null hypothesis)
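As a rough worked example, steps 4 to 6 can be applied to the rat numbers from the previous section ($\mu = 1.2$, $\bar{x} = 1.05$, $N = 100$, $\sigma = 0.5$). The significance level $\alpha = 0.05$ is an assumption made here, since the example does not fix one.

$$z_{score} = \frac{1.05 - 1.2}{0.5/\sqrt{100}} = \frac{-0.15}{0.05} = -3$$

$$p_{value} = 2SF(|-3|) \approx 2 \times 0.00135 = 0.0027$$

Since $0.0027 < 0.05$, the null hypothesis would be rejected under this assumed $\alpha$.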

## Activity: z-test

Write a function that takes the population mean, the sample mean, the significance level, the sample size, and the population standard deviation as input arguments.

This function should then decide whether to reject or fail to reject the null hypothesis for the drug effect on rats

Recall: $\mu = 1.2$, $\bar{x} = 1.05$, $N = 100$ and $\sigma = 0.5$

    import numpy as np
    from scipy.stats import norm

    def accept_or_reject_null_hypothesis(mu, sample_mean, significant_level, N, sigma):
        # Standard deviation of the sampling distribution of the mean (standard error)
        sample_std = sigma / np.sqrt(N)

        # z-score of the sample mean relative to the population mean (mu)
        z = (sample_mean - mu) / sample_std

        # Two-tailed p-value from the z-score
        p_value = 2 * norm.cdf(-np.abs(z))

        # Decide whether to reject the null hypothesis
        if p_value < significant_level:
            print('reject null hypothesis')
        else:
            print('fail to reject null hypothesis')

    # Rat example; the significance level of 0.05 is an assumed value
    accept_or_reject_null_hypothesis(mu=1.2, sample_mean=1.05, significant_level=0.05, N=100, sigma=0.5)

## Activity: t-test

The average British man is 175.3 cm tall. A survey recorded the heights of 10 British men who drank a special type of milk for 2 years.

Calculate the t-score with the formula $t = \frac{\bar{x} - \mu}{S/\sqrt{N}}$, where $S$ is the sample standard deviation, and compare it with the result of [`stats.ttest_1samp`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_1samp.html), using `x` as your input:

`x = [177.3, 182.7, 169.6, 176.3, 180.3, 179.4, 178.5, 177.2, 181.8, 176.5]`

We want to know whether the mean of the sample is different from the population mean.

    from scipy import stats
    import numpy as np

    x = [177.3, 182.7, 169.6, 176.3, 180.3, 179.4, 178.5, 177.2, 181.8, 176.5]
    mu = 175.3
    sample_mean = np.array(x).mean()

    # Standard error of the mean: sample standard deviation (ddof=1) over sqrt(N)
    N = len(x)  # number of data samples
    S = np.array(x).std(ddof=1)
    den = S / np.sqrt(N)

    # t-statistic from the formula
    t = (sample_mean - mu) / den
    print("t-statistic: ", t)

    # scipy's one-sample t-test returns the same t-statistic and the two-sided p-value
    t, p = stats.ttest_1samp(x, mu)
    print("t = ", t, ", p = ", p)

Output:

    t-statistic:  2.295568968083183
    t =  2.295568968083183 , p =  0.04734137339747034


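At a significance level of $\alpha = 0.05$, the p-value (about 0.047) is below $\alpha$, so we reject the null hypothesis and conclude that the milk has an effect on height.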

## Activity: z-test or t-test

Write a function that determines whether to use the z-test or the t-test in order to reject or fail to reject the null hypothesis

**Hint:** Remember the [slides](https://docs.google.com/presentation/d/1BQibGlrpX71JU0jBU0C7zJJr6S_4WQeFzO7PBmnxf8g/edit?usp=sharing) for the rats example.

    import numpy as np
    from scipy import stats

    def z_t_null_hypothesis(data_sample, mu, sigma, significant_level):
        N = len(data_sample)
        # If sigma (population standard deviation) is known, use the z-test.
        # Pass sigma=None when it is unknown.
        if sigma is not None:
            z_score = (np.mean(data_sample) - mu) / (sigma / np.sqrt(N))
            p = 2 * stats.norm.sf(abs(z_score))
        # If sigma is unknown but N > 30, use the z-test with the sample standard deviation
        elif N > 30:
            S = np.std(data_sample, ddof=1)
            z_score = (np.mean(data_sample) - mu) / (S / np.sqrt(N))
            p = 2 * stats.norm.sf(abs(z_score))
        # Otherwise, use the t-test
        else:
            t, p = stats.ttest_1samp(data_sample, mu)

        if p < significant_level:
            print('reject null hypothesis')
        else:
            print('fail to reject null hypothesis')

## What is a one-tail or two-tail calculation for a p-value?

If the alternative hypothesis says the mean of the sample is different from the mean of the population, we should compute a two-tailed p-value. If it says the mean of the sample is specifically greater than (or specifically lower than) the mean of the population, we should compute a one-tailed p-value.
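The difference can be sketched with `scipy.stats.norm`, reusing the z-score of the rat example ($z = -3$). The variable names are illustrative assumptions.

```python
from scipy.stats import norm

# z-score from the rat example: (1.05 - 1.2) / (0.5 / sqrt(100))
z = -3.0

# Two-tailed: H_a says the mean is different from 1.2 (either direction)
p_two_tail = 2 * norm.sf(abs(z))

# One-tailed (lower): H_a says the mean is lower than 1.2
p_lower_tail = norm.cdf(z)

# One-tailed (upper): H_a says the mean is greater than 1.2
p_upper_tail = norm.sf(z)

print("two-tailed:", p_two_tail)
print("lower tail:", p_lower_tail)
print("upper tail:", p_upper_tail)
```

Because the Normal distribution is symmetric, the two-tailed p-value is exactly twice the smaller one-tailed value. The upper-tail value is close to 1 here because the observed mean lies below the population mean.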

## Possible errors that can happen when accepting or rejecting the null hypothesis

**Type I error:** We reject the null hypothesis when it is true

$\alpha$ = P(rejecting $H_o$ $|$ $H_o$ is true)

**Type II error:** We fail to reject the null hypothesis when it is false

$\beta$ = P(failing to reject $H_o$ $|$ $H_o$ is false)

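In the drug example, a Type I error means concluding that the drug has an effect on the brain when it actually has none.

A Type II error means concluding that the drug has no effect on the brain when it actually has one.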
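Both error rates can be estimated by simulation. The sketch below repeats the rat experiment many times with a two-tailed z-test. The significance level of 0.05, the number of repetitions, the seed, and the "true" drug mean of 1.05 used for the Type II case are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
mu, sigma, N = 1.2, 0.5, 100     # values from the rat example
alpha = 0.05                     # assumed significance level
n_experiments = 5000             # assumed number of simulated experiments

def two_tailed_p_values(true_mean):
    """Simulate experiments with the given true mean; test each against mu."""
    samples = rng.normal(true_mean, sigma, size=(n_experiments, N))
    z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(N))
    return 2 * norm.sf(np.abs(z))

# Type I: H_o is true (true mean equals mu) but we reject it
type_1_rate = np.mean(two_tailed_p_values(true_mean=mu) < alpha)

# Type II: H_o is false (hypothetical true mean 1.05) but we fail to reject it
type_2_rate = np.mean(two_tailed_p_values(true_mean=1.05) >= alpha)

print("estimated Type I rate (alpha):", type_1_rate)
print("estimated Type II rate (beta):", type_2_rate)
```

The estimated Type I rate should land near the chosen $\alpha$, while the Type II rate depends on how far the true mean is from $\mu$, on $\sigma$, and on $N$.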

## Other examples for statistical test 

Please read the "Unpaired t-test" part of [this article](http://iaingallagher.tumblr.com/post/50980987285/t-tests-in-python).
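For a rough idea of what an unpaired (two-sample) t-test looks like in code, here is a minimal sketch using `scipy.stats.ttest_ind`. It is not taken from the linked article, and the two groups are made-up numbers used only for illustration.

```python
import numpy as np
from scipy import stats

# Made-up measurements for two independent groups (illustration only)
group_a = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.3])
group_b = np.array([6.9, 7.2, 6.5, 7.0, 7.4, 6.8])

# Pooled-variance t-statistic computed by hand
n_a, n_b = len(group_a), len(group_b)
pooled_var = ((n_a - 1) * group_a.var(ddof=1)
              + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)
t_manual = (group_a.mean() - group_b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Same test with scipy; equal_var=True is the classic Student's t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)

print("manual t:", t_manual)
print("scipy t: ", t_stat, ", p = ", p_value)
```

Here the null hypothesis is that both groups share the same mean. If the two groups cannot be assumed to have equal variances, passing `equal_var=False` runs Welch's t-test instead.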

## Homework

https://docs.google.com/document/d/1ITryiXU_VoyBvtZY4deehk4PmlieSlF7rSNc8sBU3Sw/edit