# MATH 3350 Course Notes - Module S4 (Part II)

## Hypothesis Testing for Two Proportions

Recall the steps to conducting a hypothesis test:  
1. Identify a population parameter and state null and alternative hypotheses about the parameter
2. Create a model consistent with the NULL HYPOTHESIS
3. Use the model to determine a p-value (the probability that results at least as extreme as those we observed would occur by random chance IF the null hypothesis were true)
4. Based on the p-value, decide whether to reject the null hypothesis in favor of the alternative
5. Draw a conclusion in the context of the scenario given  

In the notes below, we will focus on how to accomplish STEPS 1-3 above using _R_.  

**_Remember that to complete a hypothesis test, you should proceed to steps 4 and 5 after the p-value is found._**  

### Creating the Model of the Null Hypothesis
Recall our two 'families' of options for steps 2-3 (creating the model and finding the p-value):
1. Use simulation/randomization to create an empirical model and find a p-value.
2. Use a theoretical distribution to find a p-value. (There is sometimes more than one suitable theoretical distribution.)


### Example 1.  Dolphin Therapy
Our hypothesis is about the "true" proportion of individuals who would improve if they received dolphin therapy ($p_1$), compared to the "true" proportion of those who would improve if dolphins were not introduced into their treatment ($p_2$). (**Note:** these parameters, $p_1$ and $p_2$, represent these true proportions across all possible patients, not just the 30 patients in the study.)  Regardless of which method above we choose to generate our p-value, our hypotheses are as follows.
<center>
$H_{0}: p_1 = p_2$  
</center>
<center>
$H_{a}: p_1 > p_2$
</center>

Notice that another way to write these hypotheses is to focus on the **_difference_** between the proportions:  
<center>
$H_{0}: p_1 - p_2 = 0$  
</center>
<center>
$H_{a}: p_1 - p_2 > 0$
</center>

#### Our Sample 
In the experiment, 10 of the 15 participants in the ('dolphin') treatment group improved, and 3 of the 15 participants in the control group improved. This gives us the following sample statistics:
<center>
    $\widehat{p}_1=\frac{10}{15} \approx 0.667$
</center>
<center>
    $\widehat{p}_2=\frac{3}{15}=0.2$
</center>

And the following _difference in sample proportions:_  

<center>
    $\widehat{p}_1 - \widehat{p}_2 \approx 0.667-0.2=0.467$
</center>

The hypothesis test is intended to help us decide if this difference is _statistically significant_.
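
The sample statistics above can be reproduced in _R_ with a few lines. This is only a sketch of the arithmetic; the variable names (`improved_dolphin`, `improved_control`, `n_per_group`, `obs_diff`) are chosen here for illustration.

```r
# Observed counts from the study, as stated in the text above
improved_dolphin <- 10
improved_control <- 3
n_per_group <- 15

# Sample proportion of improved participants in each group
p_hat_dolphin <- improved_dolphin / n_per_group
p_hat_control <- improved_control / n_per_group

# Observed difference in sample proportions
obs_diff <- p_hat_dolphin - p_hat_control

round(c(p_hat_dolphin, p_hat_control, obs_diff), 3)
```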

#### Method 1 - Empirical p-value through simulation/randomization
If the null hypothesis is true, each individual's improvement had nothing to do with their treatment group, so the only explanation for the difference in results between the two groups is random chance. In other words, the individuals who improved just happened to be assigned to the 'dolphin' group.  

Below, we show a repeated simulation of assigning the individuals randomly to treatment groups, assuming that their outcome is NOT related to their treatment group. This is our way of modeling $H_0$.

In [None]:
#Create array of all outcomes in study (all 30 participants), regardless of treatment group
#Y=Participant who improved; N=Participant who did not improve
num_improved <- 13
num_not_improved <- 17

participants <- c(rep('Y', num_improved),rep('N', num_not_improved))
participants

In [None]:
#Mimic a treatment group of size 15 being randomly selected from these 30 participants
tgroup <- sample(participants,15,replace=FALSE)
tgroup

#How many individuals who improved were randomly placed in this group (with dolphins)?
count1 <- sum(tgroup=='Y')
p_hat1 <- count1/15

#The remaining participants were in the other (control) group
count2 <- num_improved - count1
p_hat2 <- count2/15

diff <- p_hat1 - p_hat2

cat("Dolphin Group: ", count1," improved - sample proportion = ", p_hat1)
cat("\n")  #new line
cat("Control Group: ", count2," improved - sample proportion = ", p_hat2)
cat("\n")  
cat("Difference in sample proportions: ",diff)


In [None]:
#Repeat random sampling from the participants many times 
num_trials <- 10000

#This vector will hold the difference in proportions for each randomized assignment
#(pre-allocated so it does not have to grow inside the loop)
differences <- numeric(num_trials)

#Create a model of the differences in proportions we would expect from a 
#                  random group assignment IF THE NULL HYPOTHESIS IS TRUE
for (i in seq_len(num_trials)){
    tgroup <- sample(participants,15,replace=FALSE)
    count1 <- sum(tgroup=='Y')
    count2 <- num_improved - count1
    differences[i] <- (count1/15 - count2/15)
}

#Visualize our model
hist(differences, main="Difference in Proportion of Improved Participants (Null Model)", breaks=8)

In [None]:
#Compute p-value from above empirical model
sample_diff <- (10/15 - 3/15)

cat("Finding differences of ", sample_diff, "or greater...\n")

emp_p <- sum(differences>=sample_diff)/num_trials
cat("Empirical p-value:", emp_p)
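
As an optional cross-check on the simulation: when 15 of the 30 participants are chosen at random, the number of improved participants in that group follows a hypergeometric distribution, so the randomization p-value can also be computed exactly. The sketch below uses base R's `phyper`; the names `group_size`, `observed_count`, and `exact_p` are introduced here for illustration only. A difference of 0.467 or more corresponds to 10 or more improved participants landing in the dolphin group.

```r
# Counts from the study
num_improved <- 13
num_not_improved <- 17
group_size <- 15        # size of the randomly assigned dolphin group
observed_count <- 10    # improved participants observed in the dolphin group

# P(X >= 10) = P(X > 9) for X ~ Hypergeometric(13 improved, 17 not improved, draw 15)
exact_p <- phyper(observed_count - 1, num_improved, num_not_improved,
                  group_size, lower.tail = FALSE)

cat("Exact randomization p-value:", exact_p, "\n")
```

The empirical p-value from the simulation should land close to this value, varying slightly from run to run.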

#### Method 2 - Theoretical Distribution

#### Normal Distribution (z-Test for 2 Proportions)

Under some circumstances, it would be appropriate to use a z-Test (approximation with a Normal distribution). Again, there are conditions that should be met for this approach to be viable:

1. $n_1\widehat{p}_1 \geq 10$
2. $n_1(1-\widehat{p}_1) \geq 10$
3. $n_2\widehat{p}_2 \geq 10$
4. $n_2(1-\widehat{p}_2) \geq 10$

In other words, there should be at least 10 'successes' and 10 'failures' in **_each_** sample group.  The Dolphin Therapy study does not meet these conditions: the dolphin group has only 5 'failures', and the control group has only 3 'successes'.  
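
The success/failure check can be sketched in _R_ as follows. The vector names (`successes`, `n`, `failures`, `conditions`) are illustrative choices, not part of any built-in function.

```r
# Number of 'successes' (improved) and group sizes from the dolphin study
successes <- c(Dolphins = 10, NoDolphins = 3)
n <- c(Dolphins = 15, NoDolphins = 15)
failures <- n - successes

# Each of the four counts must be at least 10
conditions <- rbind(successes, failures) >= 10
conditions

# TRUE only if every condition holds
all(conditions)
```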

#### Chi-Square Distribution
The $\chi^{2}$ family of distributions can be used to test the _independence_ of two variables. Consider our data, summarized in a _contingency table_ as follows:  

| Treatment | Improved  | Not Improved |
|-----------|-----------|--------------|
| Dolphins | 10 | 5 |
| No Dolphins | 3 | 12 |

The two variables are **_Treatment_** and **_Improvement_**, represented by counts in the rows and columns, respectively.  

The hypotheses are stated differently in a $\chi^{2}$ Test of Independence:  


$H_0:$ _Improvement_ and _Treatment_ are INDEPENDENT  
$H_a:$ _Improvement_ and _Treatment_ are NOT INDEPENDENT

##### Expected Counts 
If the null hypothesis is accurate, we would _expect_ the proportion of improved/not improved to be the same in both groups.  The $\chi^{2}$ Test uses the OVERALL proportions in the sample data to compute "expected counts" that would ideally occur if the 2 variables were truly independent (i.e., we would expect the same proportion of individuals to improve, regardless of group).  

Here is an example of an expected count calculation:  

$\frac{13}{30} \approx 0.433$ is the overall proportion of participants who improved.  

Therefore, if $H_0$ is true, we would expect $\sim 43.3$% of the individuals in EACH group to improve; because there are 15 participants in the Dolphin group, this would be an _expected count_ of $\frac{13}{30}(15) = 6.5$.  

Here is a table of all **expected counts** for this scenario:  

| Treatment | Improved  | Not Improved |
|-----------|-----------|--------------|
| Dolphins | 6.5 | 8.5 |
| No Dolphins | 6.5 | 8.5 |
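
One way to compute all of the expected counts at once is (row total)(column total)/(grand total) for each cell. The sketch below does this with base R's `outer`, `rowSums`, and `colSums`; the names `observed` and `expected` are illustrative.

```r
# Observed counts from the contingency table
observed <- matrix(c(10, 5, 3, 12), ncol = 2, byrow = TRUE,
                   dimnames = list(c("Dolphins", "No Dolphins"),
                                   c("Improved", "Not Improved")))

# Expected count for each cell = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected
```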

##### The $\chi^{2}$ Statistic 

The $\chi^{2}$ statistic is a measure of how much the _observed_ counts differ from the _expected_ counts. It is computed across all cells in the table as:  

<center>
$\chi^{2} = \sum \frac{(observed-expected)^{2}}{expected}$
</center>

For our example, 

<center>
$\chi^{2} = \frac{(10-6.5)^{2}}{6.5} + \frac{(3-6.5)^{2}}{6.5} + \frac{(5-8.5)^{2}}{8.5} + \frac{(12-8.5)^{2}}{8.5} \approx 6.652$
</center>
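
The same sum can be carried out in _R_, which is a convenient check on the hand calculation. This is a minimal sketch; `contributions` and `chi_sq` are names chosen here for illustration.

```r
# Observed and expected counts (rows: Dolphins, No Dolphins)
observed <- matrix(c(10, 5, 3, 12), ncol = 2, byrow = TRUE)
expected <- matrix(c(6.5, 8.5, 6.5, 8.5), ncol = 2, byrow = TRUE)

# Contribution of each cell: (observed - expected)^2 / expected
contributions <- (observed - expected)^2 / expected
contributions

# The chi-square statistic is the sum over all four cells
chi_sq <- sum(contributions)
round(chi_sq, 3)
```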

The number of data rows and columns in the contingency table defines the **degrees of freedom** of the $\chi^{2}$ distribution we will use to find our p-value.  

<center>
$df=(rows - 1)(columns - 1)$
</center>

Without any labels, our DATA table has 2 rows and 2 columns:

| | | 
|-----------|-----------|
| 10 | 5 |
| 3 | 12 |

Therefore we need the $\chi^{2}$ distribution with $df = 1$.  Remember that this is the distribution of $\chi^{2}$ statistics we would expect from samples if the null hypothesis were true.  Below we visualize this distribution so we can see where our $\chi^{2}$ statistic falls. 

In [None]:
#Visualize chi-square distribution with df=1

xvalues <- seq(0,15,0.1)     
yvalues <- dchisq(xvalues, df=1)

plot (xvalues,yvalues, main="PDF of Chi-Square Distribution, df=1", 
      xlab="Chi-Square Statistic", ylab="Density", type="l")

#Add a line to show where the chi-square value of our sample is
abline(v=6.652, lty=2, col="red")


##### Interpreting the $\chi^{2}$ Statistic
The larger the $\chi^{2}$ statistic, the greater the discrepancy between observed and expected counts. Our p-value should represent the probability of a $\chi^{2}$ value _at least as extreme_ as the one produced by our sample. Therefore, we want the area under the curve to the _right_ of our $\chi^{2}$ of 6.652.

In [None]:
#Find our pvalue from the above distribution (right tail)
theory_p <- pchisq(6.652,df=1,lower.tail=FALSE)
cat("Theoretical p-value (chi-square): ", theory_p)

#### Method 2 - Theoretical Distribution (Option B)
R's `chisq.test` function will perform the same $\chi^{2}$ test directly, without computing the statistic by hand.  This is done below.

In [None]:
#Chi-square hypothesis test

#Create matrix of data set
data_tbl <- matrix(c(10,5,3,12), ncol=2, byrow=TRUE)
colnames(data_tbl) <- c('Improved', 'Not Improved')
rownames(data_tbl) <- c('Dolphins', 'No Dolphins')

#View matrix
data_tbl

#Perform test on matrix (without correction factor to match test we did by hand)
cs_test <- chisq.test(data_tbl,correct=FALSE)

#Results of test are now stored in variable called cs_test

#View expected counts
cat("Expected Counts:")
cs_test$expected

#View test result
cs_test
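
The object returned by `chisq.test` is a list, so individual pieces of the result can be pulled out by name when only one value is needed. The sketch below rebuilds the same matrix so that it runs on its own.

```r
# Same data and test as above
data_tbl <- matrix(c(10, 5, 3, 12), ncol = 2, byrow = TRUE)
cs_test <- chisq.test(data_tbl, correct = FALSE)

# Chi-square statistic
cs_test$statistic

# Degrees of freedom
cs_test$parameter

# p-value (right-tail area)
cs_test$p.value
```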

### Example 2.  Cardiovascular Disease

A major drug company conducted a study in which participants were adults over 55 who did not have any known cardiovascular disease, but who had at least one risk factor (smoking, family history, etc.).  Each patient was randomly placed on a regimen for several months taking either a placebo or a drug designed to reduce cholesterol and blood pressure. The researchers then recorded whether each patient had a major cardiovascular "event" during the period of the study. (Events include non-fatal heart attack or stroke, or any cardiovascular-related death.)  The table below shows a summary of the study results.

| Treatment | Major Event | NO Major Event | Total |
|-----------|-----------|--------------|-------|
| Drug | 113 | 3067 | 3180 |
| Placebo | 157 | 3011 | 3168 |

Do these data provide evidence that the drug has some effectiveness in preventing major cardiovascular events?

#### Questions 

1. If _p_ represents the proportion of patients in a group who experienced a major cardiovascular event, what are the values of $\widehat {p}_{drug}$ and $\widehat {p}_{placebo}$?
2. What is the _difference in proportion_ between the 2 groups?
3. Do these data meet the criteria (described earlier) for a **z-test for 2 proportions**?
4. Use the cells below to conduct a $\chi^{2}$ test of independence (use the previous dolphin therapy example as a model). 


##### Hypotheses

Complete the null and alternative hypotheses for this test:

$H_0:$ 

$H_a:$ 

Conduct the test below.

In [None]:
#Chi-square hypothesis test

#Create matrix of data set


#View matrix


#Perform test on matrix (without correction factor)


#View expected counts

#View test result


##### Interpretation

Interpret the results of the above test.

### The 2-Proportion Z-Test

We can also use a 2-proportion z-test for this scenario.

#### Hypotheses

This is the format for hypotheses of a 2-proportion z-test, where ${p}_{dif}$ represents the true difference between proportions (not the difference between sample proportions).

##### Null Hypothesis
$H_0: {p}_{dif} = 0$

##### Possible Alternative Hypotheses
One of the following will be the appropriate hypothesis. For a 1-tailed test, the direction of the alternative hypothesis depends on how you computed your difference in proportions.  **Decide which alternative hypothesis is appropriate for your data.**

$H_a: {p}_{dif} \ne 0$

**OR**

$H_a: {p}_{dif} \gt 0$

**OR**

$H_a: {p}_{dif} \lt 0$

#### Computing the z-statistic

To conduct a Z-test for 2 proportions, the _**z**_ statistic is computed with the following formula

<center>
$z = \frac{\widehat{p}_1 - \widehat{p}_2}{\sqrt{\widehat{p}(1 - \widehat{p}) \left ( \frac{1}{n_1} + \frac{1}{n_2} \right )}}$
</center>

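Note that $\widehat{p}$ without a subscript represents the pooled sample proportion for both groups combined: $\widehat{p} = \frac{x_1 + x_2}{n_1 + n_2}$, where $x_1$ and $x_2$ are the numbers of 'successes' in each group.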
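
The formula can be sketched as a small helper function. Note that `two_prop_z` is a hypothetical name defined here, not a built-in _R_ function. It is demonstrated on the dolphin counts purely to illustrate the arithmetic, since that study does not meet the conditions for a z-test.

```r
# Hypothetical helper (not part of base R): pooled 2-proportion z statistic
# x1, x2 = number of 'successes' in each group; n1, n2 = group sizes
two_prop_z <- function(x1, n1, x2, n2) {
  p_hat1 <- x1 / n1
  p_hat2 <- x2 / n2
  p_pooled <- (x1 + x2) / (n1 + n2)   # proportion for both groups combined
  se <- sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
  (p_hat1 - p_hat2) / se
}

# Illustration only: dolphin study counts (z-test conditions NOT met)
z_dolphin <- two_prop_z(10, 15, 3, 15)
z_dolphin

# Squaring the pooled z statistic gives the uncorrected chi-square statistic
z_dolphin^2
```

The squared value agrees with the $\chi^{2}$ statistic computed by hand in Example 1, which is one way to see that the two tests are closely related for a 2-by-2 table.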

Compute a _**z**_ statistic for the difference in proportions that you found earlier in Question #2.  Use the code cell below to find the appropriate p-value.  You will need to pick ONE of the models, depending on which test is appropriate for this scenario.

In [None]:
# Models for finding a p-value for a z-statistic
# HINT: Your test should MATCH your alternative hypothesis

#Update this line with the z-statistic you have computed
z_stat <- 1

# One-sided test: Left tail
pnorm(z_stat)

# One-sided test: Right tail
pnorm(z_stat, lower.tail=FALSE)

# Two-sided test (positive z)
2 * pnorm(z_stat, lower.tail=FALSE)

# Two-sided test (negative z)
2 * pnorm(z_stat, lower.tail=TRUE)

### Alternate Version of 2-Proportion Test in _R_

The `prop.test` command in R can also be used to perform a significance test for 2 proportions.  The test is shown below for the cardiovascular example.  

_How does the p-value compare between the $\chi^2$ test, the *z*-test (both above), and the `prop.test` result below?_

In [None]:
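#prop.test(vector of X's, vector of n's, alternative="two.sided", "less", or "greater")
#Here alternative="less" tests whether the first proportion (Drug) is less than the second (Placebo)
#Note: prop.test applies a continuity correction by default (correct=TRUE)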

prop.test(c(113,157), c(3180,3168), alternative="less")