## Hypothesis Testing For Difference of Proportions

Hypothesis testing for two proportions compares a categorical response variable between two populations or treatments. The test asks whether the observed difference in sample proportions provides evidence of a difference in the population proportions.

### Criteria For Hypothesis Testing For Difference Of Proportions

1. Two independent samples
2. At least five successes and at least five failures in each sample
<br>
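
The second criterion can be checked with a few lines of code. The sketch below is only an illustration: the function name, the `minimum` parameter, and the sample counts are hypothetical and are not part of the notebook's `resources` module.

```python
def meets_success_failure_condition(successes, n, minimum=5):
    """Return True when a sample has at least `minimum` successes and failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


# Hypothetical samples given as (number of successes, sample size)
samples = {"sample 1": (40, 100), "sample 2": (24, 80)}

for name, (successes, n) in samples.items():
    ok = meets_success_failure_condition(successes, n)
    print(f"{name}: condition met = {ok}")
```

Both samples must pass the check before the normal approximation behind the z test is reasonable.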

### Types Of Tests For Difference of Proportions

#### Two Tail Hypothesis Test For Difference of Proportions

* $H_0: p_1 - p_2 = 0 \rightarrow p_1 = p_2$
  * The population proportions are equal

<br>

* $H_a: p_1 - p_2 \ne 0 \rightarrow p_1 \ne p_2$
  * The population proportions are different
<br><br>

#### Upper Tail Hypothesis Test For Difference of Proportions

* $H_0: p_1 - p_2 \le 0 \rightarrow p_1 \le p_2$
  * Population 1 proportion is less than or equal to population 2 proportion

<br>

* $H_a: p_1 - p_2 > 0 \rightarrow p_1 > p_2$
  * Population 1 proportion is greater than population 2 proportion
<br><br>

#### Lower Tail Hypothesis Test For Difference of Proportions

* $H_0: p_1 - p_2 \ge 0 \rightarrow p_1 \ge p_2$
  * Population 1 proportion is greater than or equal to population 2 proportion

<br>

* $H_a: p_1 - p_2 < 0 \rightarrow p_1 < p_2$
  * Population 1 proportion is less than population 2 proportion
<br><br>

<span style = "color:skyblue;font-weight:bold">
📓 NOTE
</span><br>

<span style = "color:gainsboro;size:103%">
The hypothesized difference is often 0, but it can be any value. For example, the null hypothesis might state that the difference is 0.20 rather than 0.
</span><br><br>

Upper and lower tail tests are equivalent when you switch the population labels and reverse the inequality direction.<br><br>
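
The three test types differ only in which tail of the standard normal distribution is used for the p-value. The following is a minimal sketch using Python's standard library; the function name `p_value` and the tail labels `"two"`, `"upper"`, and `"lower"` are assumptions chosen for illustration.

```python
from statistics import NormalDist


def p_value(z, tail):
    """p-value of a z statistic for a two, upper, or lower tail test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "upper":
        return 1 - std_normal.cdf(z)
    if tail == "lower":
        return std_normal.cdf(z)
    raise ValueError(f"unknown tail: {tail!r}")


# Switching the population labels flips the sign of z, so an upper tail
# test on z gives the same p-value as a lower tail test on -z.
z = 1.5
print(p_value(z, "upper"), p_value(-z, "lower"))
```

The last two lines illustrate the equivalence described above: relabeling the populations negates the test statistic and swaps the tail.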

### Formula

<b>Test Statistic Formula for Difference of Proportions</b><br>

$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{{\color{dodgerblue}{\hat{p}}} \cdot (1 - {\color{dodgerblue}{\hat{p}}}) \cdot \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$<br><br>

${\color{dodgerblue}{\hat{p} = \dfrac{(\hat{p}_1 \cdot n_1) + (\hat{p}_2 \cdot n_2)}{n_1 + n_2}}}$<br><br>

<b>Pooled Sample Proportion ($\hat{p}$) Formula for Difference of Proportions</b><br>
The <span style = "color:dodgerblue;font-weight:bold">bottom formula</span> is the combined (pooled) sample proportion: each sample proportion is multiplied by its sample size, the two products are added, and the sum is divided by the total of the two sample sizes.<br><br>
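
The two formulas translate directly into code. This is a sketch under stated assumptions: the function names and the `hypothesized_diff` parameter are illustrative and are not the API of the notebook's `resources` module, and the input values are placeholders.

```python
from math import sqrt


def pooled_proportion(p1_hat, n1, p2_hat, n2):
    """Combined sample proportion, weighting each proportion by its sample size."""
    return (p1_hat * n1 + p2_hat * n2) / (n1 + n2)


def z_statistic(p1_hat, n1, p2_hat, n2, hypothesized_diff=0.0):
    """z test statistic for a difference of proportions using the pooled proportion."""
    p_hat = pooled_proportion(p1_hat, n1, p2_hat, n2)
    standard_error = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return ((p1_hat - p2_hat) - hypothesized_diff) / standard_error


# Illustrative inputs: sample proportions and sample sizes for two groups
p_hat = pooled_proportion(0.40, 100, 0.30, 80)
z = z_statistic(0.40, 100, 0.30, 80)
print(f"pooled proportion: {p_hat:.4f}")
print(f"z statistic: {z:.4f}")
```

Keeping the pooled proportion in its own function mirrors the way the formulas are presented: the test statistic depends on $\hat{p}$, which is computed first.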

### Example
We will reuse the example from the confidence interval for a difference of proportions.<br><br>

Say we are interested in the proportions of passenger trucks vs cars in California versus Colorado. After researching and obtaining data, we find: <br><br>

| Parameter <br> statistic | Population 1 (CA) | Population 2 (CO) |
| :----------------------: | :---------------: | :--------------: |
|            N             |      unknown      |     unknown      |
|            n             |        100        |        80        |
|        $\hat{p}$         |       0.40        |       0.30       |

<br><br>
We test at a 95% confidence level, that is, a significance level of $\alpha = 0.05$.
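
<br><br>
As a hand check, the table's values can be plugged into the formulas from the Formula section. This assumes a two-tailed test with a hypothesized difference of 0, $n_1 = 100$ for CA, and $n_2 = 80$ for CO. It is a calculation from the formula alone, not output from the notebook's helper functions.<br><br>

$\hat{p} = \dfrac{(0.40 \cdot 100) + (0.30 \cdot 80)}{100 + 80} = \dfrac{64}{180} \approx 0.3556$<br><br>

$z = \dfrac{(0.40 - 0.30) - 0}{\sqrt{0.3556 \cdot (1 - 0.3556) \cdot \left(\dfrac{1}{100} + \dfrac{1}{80}\right)}} \approx \dfrac{0.10}{0.0718} \approx 1.39$<br><br>

For a two-tailed test at the 5% significance level, the critical values are $\pm 1.96$. Since $|1.39| < 1.96$, this hand calculation would not reject $H_0$.<br><br>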


In [1]:
import sys
sys.path.insert(0, "..")
from resources import datum 
from bokeh.plotting import figure, show, output_notebook, curdoc

output_notebook(hide_banner=True)
curdoc().theme = 'light_minimal'

data = datum.Data()

CL = 95
tail = "two"
null_hypo = "There is NO difference in CA and CO truck proportions, p1 - p2 = 0"
alt_hypo = "There is a difference in CA and CO truck proportions, p1 - p2 != 0"
p1 = 0.40
p2 = 0.30
n1 = 80
n2 = 100

# The null hypothesis sets the difference to zero, so (p1 - p2) in the
# test statistic formula is replaced with 0
p_diff = 0


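if data.check_proportion_distro_for_normality(p1 = p1, p2 = p2, n1 = n1, n2 = n2):
    phat = data.get_phat_for_proportion(p1 = p1, p2 = p2, n1 = n1, n2 = n2)
    test_statistic = data.get_test_statistic_for_diff_of_props(p_diff=p_diff, phat=phat, p1=p1, p2=p2, n1=n1, n2=n2)
    # Print inside the branch: phat and test_statistic only exist when the check passes
    print(f'phat: {phat: .4f}\ntest statistic: {test_statistic: .4f}')
else:
    print('Normality condition not met; the z test is not appropriate for these samples')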


phat:  0.3444
test statistic:  1.3310


In [2]:

import numpy as np 
import sys
sys.path.insert(0, '../..')
from resources import test, datum 

test = test.Test()
data = datum.Data()

def create_test_params():    
    test_type = "prop"
    CL = 0.95
    mu = 0
    sigma1 = 1
    sigma2 = 0
    tail = 'lower'   
    N = 0                                      
    n1 = 100
    n2 = 80
    xbar1 = 0 # the mean of the expected value
    xbar2 = 0
    p_hat = 0
    p1 = 0.30
    p2 = 0.40
    observed_value = 0 # the historical mean
    successes = 20    
    p = 0.25
    var1 = 0
    s1 = 1
    s2 = 0
    null_hypo = 'p1 - p2 is greater than or equal to 0'
    alt_hypo = 'p1 - p2 is less than 0'
    
    # no need to check mu and sigma values for test type proportion 
    mean = mu
    std_dev = sigma1
    #std_dev = sigma1 if sigma1 > 0 else s1
    # mean = mu if mu > 0 else xbar1
    
    # for a proportion test I will use a standard normal distr
    x_data = list(np.linspace(-3, 3, n1))
    y_data = data.get_normal_dist(x_arr= x_data, mu = mean, sigma = std_dev)
    hypothesize_diff = 30
    statement = True 
    visual = True 


    
    margin_of_error_for_calculating_minimum_n = 2
    
    test_params = {}
    test_params['cl'] = CL
    test_params['type'] = test_type 
    test_params['mu'] = mu
    test_params['tail'] = tail
    test_params['N'] = N
    test_params['sigma1'] = sigma1
    test_params['sigma2'] = sigma2
    test_params['n1'] = n1
    test_params['n2'] = n2
    test_params['xbar1'] = xbar1
    test_params['xbar2'] = xbar2
    test_params['p_hat'] = p_hat
    test_params['p'] = p
    test_params['p1'] = p1
    test_params['p2'] = p2
    test_params['observed_value'] = observed_value
    test_params['successes'] = successes
    test_params['var1'] = var1
    test_params['s1'] = s1
    test_params['s2'] = s2
    test_params['hypothesize_diff'] = hypothesize_diff
    test_params['x_data'] = x_data
    test_params['y_data'] = y_data
    test_params['null_hypo'] = null_hypo
    test_params['alt_hypo'] = alt_hypo
    test_params['mar_err_min_n'] =   margin_of_error_for_calculating_minimum_n
    test_params["statement"] = statement
    test_params["visual"] = visual
    return test_params

# check out the data before testing 
# params = create_test_params()
# data.check_normality(params["y_data"])

test.make_hypothesis_test(info = create_test_params())


