# MATH 3350 Course Notes - Module S5 Supplement

## Using R to create simulations

Recall our two 'families' of options for creating a model of the null hypothesis for a statistical test:
1. Use simulation/randomization to create an empirical model and find a p-value.
2. Use a theoretical distribution to find a p-value. 

The notes below show examples of R code for **_simulating_** outcomes that would occur if the null hypothesis is true.


### Example 2. Guinea Pig Tooth Growth (Two Means)

The full ToothGrowth data set includes data for two different treatments: Orange Juice (OJ) and a Vitamin C supplement (VC). We want to know whether the sample data provide convincing evidence of a **difference** between the true mean odontoblast lengths of guinea pigs receiving these two treatments.

Hypotheses are as follows.
<center>
$H_{0}: \mu_J = \mu_V$  
</center>
<center>
$H_{a}: \mu_J \neq \mu_V$
</center>

First we gather our sample data:

In [None]:
OJ_sampleData <- ToothGrowth[ToothGrowth$supp=='OJ',]$len
VC_sampleData <- ToothGrowth[ToothGrowth$supp=='VC',]$len

summary(OJ_sampleData)
summary(VC_sampleData)

sample_meanDiff <- mean(OJ_sampleData) - mean(VC_sampleData)
sample_meanDiff
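
The randomization in Method 1 splits the 60 measurements into two groups of 30, so it is worth confirming that the real treatment groups have those sizes. The check below is a minimal sketch; the names `group_sizes` and `group_sds` are illustrative and are not used elsewhere in these notes.

```r
# ToothGrowth ships with base R (datasets package), so nothing needs to be loaded.
# Count how many guinea pigs received each supplement.
group_sizes <- table(ToothGrowth$supp)
group_sizes

# Spread of tooth length within each supplement group
group_sds <- tapply(ToothGrowth$len, ToothGrowth$supp, sd)
group_sds
```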

#### Method 1 - Empirical p-value through simulation/randomization
This randomization is similar to the one we performed for the 2-proportion hypothesis test.
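
One way to picture a single randomization is to keep every tooth length where it is and shuffle the treatment labels instead. The sketch below assumes this label-shuffling view, which should be equivalent to shuffling the lengths and splitting them 30/30; the names `shuffled_supp`, `shuffled_means`, and `shuffled_diff` are illustrative only.

```r
# One randomization: permute the 60 treatment labels, leave the lengths in place.
shuffled_supp <- sample(ToothGrowth$supp)

# Mean length within each shuffled group (results are named "OJ" and "VC")
shuffled_means <- tapply(ToothGrowth$len, shuffled_supp, mean)

# Difference in means for this one shuffle
shuffled_diff <- unname(shuffled_means["OJ"] - shuffled_means["VC"])
shuffled_diff
```

Because the labels are only reordered, each shuffled group still contains 30 guinea pigs.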

In [None]:
#Mimic randomly assigning the 60 guinea pigs in the study to 2 treatment groups of size 30
random_groups <- sample(ToothGrowth$len,60,replace=FALSE)
OJ_group <- random_groups[1:30]
VC_group <- random_groups[31:60]

#Find mean cell length for each group and then find difference between the means
xbar_OJ <- mean(OJ_group)
xbar_VC <- mean(VC_group)

diff <- xbar_OJ - xbar_VC

cat("OJ Group mean: ", xbar_OJ)
cat("\n")  #new line
cat("Vitamin C Group mean: ", xbar_VC)
cat("\n")  
cat("Difference in sample means: ",diff)


In [None]:
#Repeat random sampling process many times 
num_trials <- 10000

#This vector will hold the difference in means for each randomized assignment
#(preallocated so it does not have to grow on every pass through the loop)
differences <- numeric(num_trials)

#Create a model of the mean differences we would expect for a
#random group assignment IF THE NULL HYPOTHESIS IS TRUE
for (i in 1:num_trials){
    random_groups <- sample(ToothGrowth$len,60,replace=FALSE)
    OJ_group <- random_groups[1:30]
    VC_group <- random_groups[31:60]
    xbar_OJ <- mean(OJ_group)
    xbar_VC <- mean(VC_group)
    differences[i] <- xbar_OJ - xbar_VC
}

#Visualize our model
hist(differences, main="Differences in Mean Odontoblast Length by Treatment (Null Model)")
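
To judge how unusual the sample result is under the null model, it can help to mark the observed difference on the histogram. The sketch below is self-contained and rebuilds a smaller null model (2000 shuffles, an arbitrary choice to keep it quick); `obs_diff` and `null_diffs` are illustrative names, and the seed is arbitrary.

```r
set.seed(3350)   # arbitrary seed, used only so the sketch is repeatable

len <- ToothGrowth$len
obs_diff <- mean(len[ToothGrowth$supp == "OJ"]) - mean(len[ToothGrowth$supp == "VC"])

# Null model: shuffle the lengths, split 30/30, record the difference in means
null_diffs <- replicate(2000, {
    shuffled <- sample(len)
    mean(shuffled[1:30]) - mean(shuffled[31:60])
})

hist(null_diffs, main = "Null Model with Observed Difference Marked")

# Dashed lines at +/- the observed difference (both tails, since Ha is two-sided)
abline(v = c(-obs_diff, obs_diff), col = "red", lty = 2)
```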

In [None]:
#Compute p-value from above empirical model

cat("Finding mean differences at least as extreme as ", sample_meanDiff, "...\n")

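#Two-sided test: compare absolute values on BOTH sides so this also works when the sample difference is negative
emp_p <- sum(abs(differences) >= abs(sample_meanDiff))/num_trials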
cat("Empirical p-value:", emp_p)
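
Because the groups are shuffled at random, the empirical p-value changes a little every time the simulation is rerun. The sketch below shows one way to make a run repeatable with `set.seed()` and to get a rough sense of that run-to-run variation, assuming the usual standard error for a proportion applies to the simulated p-value. The seed and the names `obs_diff`, `null_diffs`, and `mc_se` are illustrative choices, not part of the original notes.

```r
set.seed(3350)   # arbitrary seed; any fixed value makes the run repeatable
num_trials <- 10000

len <- ToothGrowth$len
obs_diff <- mean(len[ToothGrowth$supp == "OJ"]) - mean(len[ToothGrowth$supp == "VC"])

# Null model: shuffle the lengths, split 30/30, record the difference in means
null_diffs <- replicate(num_trials, {
    shuffled <- sample(len)
    mean(shuffled[1:30]) - mean(shuffled[31:60])
})

# Two-sided empirical p-value: share of shuffles at least as far from 0 as the sample
emp_p <- mean(abs(null_diffs) >= abs(obs_diff))

# Rough standard error of that simulated proportion
mc_se <- sqrt(emp_p * (1 - emp_p) / num_trials)

cat("Empirical p-value:", emp_p, "\n")
cat("Approximate simulation standard error:", mc_se, "\n")
```

Increasing `num_trials` shrinks the simulation standard error, at the cost of a longer run.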