# Hypothesis Testing

## Review of notation

* ### $\bar{x}$ - Sample mean
* ### $\mu$ - Population mean

* ### $s$ - Sample standard deviation
* ### $\sigma$ - Population standard deviation

* ### $s^2$ - Sample variance
* ### $\sigma^2$ - Population variance

* ### $n$ - Sample size

* ### $\alpha$ - Significance level, probability of rejecting the null hypothesis when it is true
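* ### $Z_{\alpha/2}$ - Critical Z-score for a two-tailed test at significance level $\alpha$ (leaves $\alpha/2$ in each tail)
* ### $t_{\alpha/2}$ - Critical t-value for a two-tailed test at significance level $\alpha$, for the relevant degrees of freedom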

* ### $\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ - Calculating confidence interval for a proportion
* ### $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$ - Calculating confidence interval under the Student's t-distribution
* ### $\bar{x} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}}$ - Calculating confidence interval under the Normal distribution
* ### $\bar{d} \pm t_{\alpha/2} \frac{s_d}{\sqrt{n}}$ - Calculating confidence interval for the mean of paired differences under the t-distribution ($n$ = number of pairs)
* ### $\bar{X} - \bar{Y} \pm t_{\alpha/2} \sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{n}}$ - Calculating confidence interval for a difference in means of two independent samples, each of size $n$
* ### $\hat{p_1} - \hat{p_2} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p_1}(1-\hat{p_1})}{n_1} + \frac{\hat{p_2}(1-\hat{p_2})}{n_2}}$ - Calculating confidence interval for a difference in two proportions



# Conceptual Review

## Quick note on when you would use a one-tailed test or a two-tailed test
* #####  A one-tailed test fits when the alternative hypothesis is directional, for example that an area has experienced growth over the last couple of years.
* #####  Here, we're not testing whether there is growth *or* decay; we only want statistically significant results for growth.
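To make the difference concrete, here is a minimal sketch comparing critical values at $\alpha = 0.05$, assuming a standard normal test statistic. A one-tailed test puts all of $\alpha$ in one tail, while a two-tailed test splits it into $\alpha/2$ per tail, so the one-tailed cutoff is closer to 0.

```python
import scipy.stats as sc

alpha = 0.05

# One-tailed (e.g. testing only for growth): all of alpha sits in the upper tail
z_one_tailed = sc.norm.ppf(1 - alpha)

# Two-tailed (growth or decay): alpha is split evenly, alpha/2 in each tail
z_two_tailed = sc.norm.ppf(1 - alpha / 2)

print(round(z_one_tailed, 3), round(z_two_tailed, 3))
```

The one-tailed cutoff comes out near 1.645, versus the familiar 1.96 for two tails.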

## Area in the tails to get our 95% confidence interval for a two-tailed test


<img src="z of 196.png">

<img src="z of negative 196.png">

<img src="95 confidence.png">
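The shaded areas in the figures can also be checked numerically. A short sketch, assuming the standard normal distribution: each tail beyond $\pm 1.96$ holds about 2.5%, leaving about 95% in the middle.

```python
import scipy.stats as sc

z = 1.96
lower_tail = sc.norm.cdf(-z)                # area to the left of -1.96
upper_tail = 1 - sc.norm.cdf(z)             # area to the right of +1.96
middle = sc.norm.cdf(z) - sc.norm.cdf(-z)   # area between -1.96 and +1.96

print(f"lower tail: {lower_tail:.4f}, upper tail: {upper_tail:.4f}, middle: {middle:.4f}")
```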

# Finding p-values and multipliers

In [None]:
import scipy.stats as sc

### Get the Z-Score for alpha = 0.05, two-tailed

sc.norm.ppf(.975)

In [None]:
### Get the t-value for alpha = 0.05, two-tailed, with 12 degrees of freedom

sc.t.ppf(.975, 12)

In [None]:
### Find the tail area (alpha) for a given Z-score, one-tailed

1- sc.norm.cdf(1.96)

In [None]:
### Find the tail area (alpha) for a given Z-score, two-tailed

2*(1 - sc.norm.cdf(1.96))

In [None]:
### Same two-tailed area, adding the lower and upper tails explicitly
sc.norm.cdf(-1.96) + (1 - sc.norm.cdf(1.96))

In [None]:
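### Find the two-tailed area (alpha) for a given t-value and degrees of freedom
### For a one-sample or paired t-test, degrees of freedom are n - 1
### With 100,000 degrees of freedom the t-distribution is almost normal, so this nearly matches the Z result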

2*(1 - sc.t.cdf(1.96, 100000))

In [None]:
### Find the lower-tail (one-tailed) area for a given t-value and degrees of freedom

sc.t.cdf(-1.96, 100000)

In [None]:
### Find the upper-tail (one-tailed) area for t = 2.624 with 14 degrees of freedom

1 - sc.t.cdf(2.624, 14)
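`1 - cdf(x)` works, but SciPy distributions also provide a survival function, `sf(x)`, which returns the same upper-tail area directly and stays accurate for very small tail probabilities. A sketch of some of the cells above rewritten with it:

```python
import scipy.stats as sc

# Upper-tail (one-tailed) area for z = 1.96, same as 1 - sc.norm.cdf(1.96)
p_one = sc.norm.sf(1.96)

# Two-tailed area: double the upper tail (the normal is symmetric)
p_two = 2 * sc.norm.sf(abs(1.96))

# Upper-tail area for t = 2.624 with 14 degrees of freedom
p_t = sc.t.sf(2.624, 14)

print(p_one, p_two, p_t)

# Far in the tail, 1 - cdf collapses to 0.0 in double precision, while sf does not
print(1 - sc.norm.cdf(10), sc.norm.sf(10))
```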

## Practice 1

##### After running a website test for a week,
##### you notice that 20% of customers in the test cell are converting (buying a product).
##### The sample standard deviation is 5, and the test cell has had 10,000 site visits.
##### Create a 95% confidence interval for the conversion rate, rounding your output to 3 decimal places.
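One possible solution sketch, using the proportion interval from the notation section with $\hat{p} = 0.20$ and $n = 10{,}000$ from the prompt. It assumes the conversion rate is the quantity of interest, so the standard error comes from $\hat{p}$ alone and the stated standard deviation of 5 is not used.

```python
import math
import scipy.stats as sc

# Assumption: interval for the conversion proportion; the stated SD of 5 is not used
p_hat = 0.20              # observed conversion rate in the test cell
n = 10_000                # site visits in the test cell
z = sc.norm.ppf(0.975)    # two-tailed multiplier for 95% confidence

# Standard error of a sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)

lower = round(p_hat - z * se, 3)
upper = round(p_hat + z * se, 3)
print((lower, upper))
```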

## Practice 2
##### Build a function that takes a list of numbers as input
##### and returns the variance of those numbers.
##### NumPy statistics functions reference:  https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.statistics.html

In [None]:
import numpy as np


In [None]:
var_list = [3.3,4.3,5.4,3.2,3.4,3.4,5.1,2.3]
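A minimal sketch of one solution. It assumes the *sample* variance $s^2$ (dividing by $n - 1$) is wanted, matching the notation section; the `ddof` keyword of `np.var` switches to the population variance $\sigma^2$ when set to 0. The function name `variance` is illustrative.

```python
import numpy as np

def variance(values, ddof=1):
    """Return the variance of a list of numbers.

    ddof=1 gives the sample variance s^2 (divide by n - 1);
    ddof=0 gives the population variance sigma^2 (divide by n).
    """
    return np.var(values, ddof=ddof)

var_list = [3.3, 4.3, 5.4, 3.2, 3.4, 3.4, 5.1, 2.3]
print(variance(var_list))          # sample variance
print(variance(var_list, ddof=0))  # population variance
```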


## Practice 3
##### Build a function that takes two inputs: a list of numbers and the multiplier for a two-tailed test (a Z-score such as 1.96, or a t-value)
##### Return the lower and upper bounds of a 95% confidence interval as a tuple
##### Due to the small sample size in our list (n = 8), a t-value with n - 1 degrees of freedom is the better multiplier here.

In [None]:
def confidence_interval(values, multiplier):
    # TODO: return (lower, upper) for mean ± multiplier * s / sqrt(n)
    pass

In [None]:
print(confidence_interval(var_list, 1.96))
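One way to fill in the function, sketched under the assumption that the interval is $\bar{x} \pm m \frac{s}{\sqrt{n}}$ with $m$ the multiplier passed in. The parameter names `values` and `multiplier` are illustrative. Because the list has only 8 values, the sketch also shows the t multiplier with $n - 1 = 7$ degrees of freedom, which gives a wider interval than 1.96.

```python
import math
import numpy as np
import scipy.stats as sc

def confidence_interval(values, multiplier):
    """Return (lower, upper) for mean ± multiplier * s / sqrt(n)."""
    n = len(values)
    mean = np.mean(values)
    s = np.std(values, ddof=1)              # sample standard deviation
    margin = multiplier * s / math.sqrt(n)  # multiplier times the standard error
    return (mean - margin, mean + margin)

var_list = [3.3, 4.3, 5.4, 3.2, 3.4, 3.4, 5.1, 2.3]

# Normal multiplier, as in the original call
z_ci = confidence_interval(var_list, 1.96)
print(z_ci)

# t multiplier with n - 1 degrees of freedom (more appropriate for n = 8)
t_mult = sc.t.ppf(0.975, len(var_list) - 1)
t_ci = confidence_interval(var_list, t_mult)
print(t_ci)
```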

## Practice 4
##### Build a function that takes three inputs: two lists of numbers and the t-value (multiplier) for a two-tailed confidence interval
##### Return the lower and upper bounds of the confidence interval for the mean pair-wise difference as a tuple

In [None]:
def paired_t_test(list1, list2, tval):
    return ''

In [None]:
test1 = [4.3,5.6,3,7,4,5,6,4,5,3,6,4,3,1,7]
test2 = [1,4,2,5,3,4,5,2,3,1,4,3,4,2,3]
tval = 1.8

paired_t_test(test1, test2, tval)
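A sketch of one solution, assuming the paired interval $\bar{d} \pm t_{\alpha/2} \frac{s_d}{\sqrt{n}}$ from the notation section, where the differences are taken element-wise as `list1[i] - list2[i]`. The length check is an added safeguard, not part of the original prompt.

```python
import math
import numpy as np

def paired_t_test(list1, list2, tval):
    """Return (lower, upper) of the CI for the mean paired difference."""
    if len(list1) != len(list2):
        raise ValueError("paired samples must have the same length")
    d = np.array(list1) - np.array(list2)   # pair-wise differences
    n = len(d)
    d_bar = d.mean()
    s_d = d.std(ddof=1)                     # sample SD of the differences
    margin = tval * s_d / math.sqrt(n)
    return (d_bar - margin, d_bar + margin)

test1 = [4.3, 5.6, 3, 7, 4, 5, 6, 4, 5, 3, 6, 4, 3, 1, 7]
test2 = [1, 4, 2, 5, 3, 4, 5, 2, 3, 1, 4, 3, 4, 2, 3]
print(paired_t_test(test1, test2, 1.8))
```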



##### What was the alpha level in the above test?
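Working backward from the multiplier: with 15 pairs the differences have $n - 1 = 14$ degrees of freedom, and a two-tailed multiplier of 1.8 leaves the following probability in the two tails combined. A sketch using the same `scipy.stats` t-distribution:

```python
import scipy.stats as sc

tval = 1.8
dof = 15 - 1   # 15 pairs, so n - 1 = 14 degrees of freedom

# Two-tailed alpha: probability beyond +tval plus beyond -tval
alpha = 2 * sc.t.sf(tval, dof)
confidence = 1 - alpha
print(round(alpha, 3), round(confidence, 3))
```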

## Practice 5
##### Given the table below with summary statistics from two samples
##### Determine whether there is a difference between the two means at 95% confidence, using a two-sample t interval for independent samples

<img src="Webinar table.png">

## $\bar{X} - \bar{Y} \pm t_{\alpha/2} \sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{n}}$


In [None]:
mean1 = 45.6
mean2 = 48.9
var1 = 3.2**2
var2 = 4.2**2
n = 10000
dof = n - 1  # conservative degrees of freedom; with n this large, t is essentially normal

In [None]:
## Xbar - Ybar


## t multiplier


## standard error of the difference



In [None]:
### create the interval
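A possible completion of the three cells above. It follows the formula for two independent samples and, as an assumption carried over from the setup cell, uses $n - 1$ degrees of freedom; with $n = 10{,}000$ the t multiplier is essentially 1.96, so that choice barely matters.

```python
import math
import scipy.stats as sc

mean1, mean2 = 45.6, 48.9
var1, var2 = 3.2**2, 4.2**2
n = 10000
dof = n - 1   # assumption carried over from the setup cell

diff = mean1 - mean2                  # Xbar - Ybar
t_mult = sc.t.ppf(0.975, dof)         # two-tailed 95% t multiplier
se = math.sqrt(var1 / n + var2 / n)   # standard error of the difference

lower, upper = diff - t_mult * se, diff + t_mult * se
print((round(lower, 3), round(upper, 3)))

# If 0 falls outside the interval, the means differ at the 5% level
print("difference detected" if not (lower <= 0 <= upper) else "no difference detected")
```

Here the whole interval lies below 0, so the sketch concludes that the two means differ.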


## Practice 6
##### The marketing team is testing a change to the UI of their website
##### In the test group 546 customers converted out of 3,300 visitors to the site
##### In the control group 450 customers converted out of 3,030 visitors to the site
##### Determine whether the two proportions are different at the 95% confidence level (alpha = 0.05)

## $\hat{p_1} - \hat{p_2} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p_1}(1-\hat{p_1})}{n_1} + \frac{\hat{p_2}(1-\hat{p_2})}{n_2}}$





In [None]:
### compute bounds
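A sketch of one solution using the difference-in-proportions interval above, with the test group as sample 1 and the control group as sample 2 (that ordering is an assumption; swapping it only flips the sign of the interval).

```python
import math
import scipy.stats as sc

x1, n1 = 546, 3300   # test group: conversions, visitors
x2, n2 = 450, 3030   # control group: conversions, visitors

p1, p2 = x1 / n1, x2 / n2
z = sc.norm.ppf(0.975)   # two-tailed 95% multiplier

# Standard error of the difference in two independent proportions
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
diff = p1 - p2
lower, upper = diff - z * se, diff + z * se
print((round(lower, 4), round(upper, 4)))
print("different" if not (lower <= 0 <= upper) else "not significantly different")
```

With these counts the interval narrowly contains 0, so this sketch would not call the two proportions different at the 95% level.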



# Blog Articles
https://datamovesme.com/2018/07/02/learning-with-a-b-testing/

https://datamovesme.com/2018/10/01/setting-your-hypothesis-test-up-for-success/