# MATH 3350 Course Notes - Module S5 (Part I)

## Hypothesis Testing for Means

Recall the steps for conducting a hypothesis test:  
1. Identify a population parameter and state null and alternative hypotheses about the parameter
2. Create a model consistent with the NULL HYPOTHESIS
3. Use the model to determine a p-value (the probability that results at least as extreme as those we observed would occur by random chance IF the null hypothesis were true)
4. Based on the p-value, decide whether to reject the null hypothesis in favor of the alternative
5. Draw a conclusion in the context of the scenario given  

In the notes below, we will focus on how to accomplish STEPS 1-3 above using _R_.  

**_Remember that to complete a hypothesis test, you should proceed to steps 4 and 5 after the p-value is found._**  

### Creating the Model of the Null Hypothesis
Recall our two 'families' of options for steps 2-3 (creating the model and finding the p-value):
1. Use simulation/randomization to create an empirical model and find a p-value.
2. Use a theoretical distribution to find a p-value. (There is sometimes more than one suitable theoretical distribution.)


### Example 1.  Guinea Pig Tooth Growth (Single Mean)

Recall the 'ToothGrowth' data set available in R. The data set gives the length of cells responsible for tooth growth ("odontoblasts") in guinea pigs. We will work with a subset of the data to focus only on those guinea pigs who were given orange juice. We will assume that the average length of these cells among guinea pigs in general is 18.5 picometers (pm). Scientists treating guinea pigs with orange juice hypothesized that it would result in longer odontoblasts. Our hypothesis is about $\mu$, the "true" mean odontoblast length of guinea pigs whose diet is supplemented with orange juice. (Remember, the parameter $\mu$ represents the true mean across all guinea pigs who are given orange juice, not just those in the study.)  

Hypotheses are as follows.
<center>
$H_{0}: \mu = 18.5$  
</center>
<center>
$H_{a}: \mu > 18.5$
</center>

Below we restrict the data set to only those guinea pigs receiving orange juice.

In [None]:
#Have a look at format of data set
head(ToothGrowth,3)

#Reduce data set to contain only records with 'OJ' supplement
OJ_only <- ToothGrowth[ToothGrowth$supp=='OJ',]
head(OJ_only,3)

#Get statistics for odontoblast length in this sample 
summary(OJ_only$len)



#### Our Sample 
We can see that the sample mean is $\overline{x}=20.66$ 

The sample mean is greater than the overall population mean, but our hypothesis tests should help us decide if the difference we are seeing is _statistically significant_.

#### Creating a Null Model
To create a model of the null hypothesis, we need to model a _sampling distribution_ of sample means in which the sample size matches our sample ($n = 30$) and the mean equals the value stated in the null hypothesis. The model's standard error should also be as close as possible to $\frac{\sigma}{\sqrt{n}}$, where $\sigma^{2}$ is the true population variance. We don't know the true population variance, so we must estimate it with the sample variance of our data; that is, we use $s$ in place of $\sigma$.
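
As a minimal sketch of that estimate, the code below computes $\frac{s}{\sqrt{n}}$ for the orange juice sample. The variable names (`oj_len`, `n_oj`, `se_hat`) are illustrative and are not used elsewhere in these notes.

```r
# Estimated standard error of the sample mean: s / sqrt(n)
# (oj_len, n_oj, and se_hat are illustrative names)
oj_len <- ToothGrowth$len[ToothGrowth$supp == "OJ"]  # OJ-only odontoblast lengths
n_oj   <- length(oj_len)                             # sample size
se_hat <- sd(oj_len) / sqrt(n_oj)                    # s stands in for the unknown sigma

cat("Sample size:", n_oj, "\n")
cat("Estimated standard error:", se_hat, "\n")
```

This value is the denominator of the t statistic used in the theoretical approach.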

#### Method 1 - Empirical p-value through simulation/randomization
The process for simulating randomized samples for the null model described above is implemented in some statistical programs (such as StatKey), but we won't attempt to use R to re-create that process here. 

#### Method 2 - Theoretical Distribution
This scenario is a candidate for the **1-sample t-test**. The _t_ distribution (also called "_Student's t_") is a family of distributions. The shape of a given _t_ distribution is governed by _degrees of freedom_, which is typically related to the size of the sample from which the _t_ statistic is calculated.  

Below are the plots of a few _t_ distributions, along with the Standard Normal distribution for comparison.  The plots demonstrate that as the degrees of freedom increase, the _t_ distribution gets closer to a Normal distribution.  

In [None]:
#Create plot of 3 t distributions and Standard Normal distribution
xvalues <- seq(-4,4,0.1)     # Generate x-values
z <- dnorm(xvalues)          # Standard Normal y coordinates
t1 <- dt(xvalues, df=1)      # t Distribution y coordinates for df=1, 3, and 10
t3 <- dt(xvalues, df=3)
t10 <- dt(xvalues, df=10)

plot (xvalues,z, main="t Distributions and Standard Normal Curve", ylab="Density", xlab="Statistic (t or z)", 
      type="l", lwd=4, col="red")
lines(xvalues,t1, lty=2, lwd=4, col="grey")
lines(xvalues,t3, lty=3, lwd=3, col="blue")
lines(xvalues,t10, lty=4, lwd=3, col="darkgreen")

legend("topleft", lty = c(2,3,4,1), legend = c('t, df=1','t, df=3', 't, df=10', 'Std Normal'), 
       lwd=c(4,3,3,4), col = c('grey','blue','darkgreen','red') )
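
The same convergence can be seen numerically. The sketch below compares the upper 2.5% critical value of several _t_ distributions with that of the Standard Normal; the particular degrees of freedom chosen are only an illustration.

```r
# Compare upper 2.5% critical values: t distributions vs. Standard Normal
# (the df values below are arbitrary illustrative choices)
df_values <- c(1, 3, 10, 30, 100)
t_crit <- qt(0.975, df = df_values)   # t critical value for each df
z_crit <- qnorm(0.975)                # Standard Normal critical value

data.frame(df = df_values,
           t_critical = round(t_crit, 3),
           z_critical = round(z_crit, 3))
```

As the degrees of freedom grow, the t critical values shrink toward the Normal critical value, which matches what the plot shows in the tails.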

##### Conditions for the t-Test

Below are the conditions for the t-test:
1. Sampling distribution being modeled should be a **Normal distribution**. (Note: this means either the underlying population distribution is close to normal **_or_** the sample size is large enough to compensate*.) _Regardless of underlying distribution shape, we consider this condition met when sample size $n \geq 40$._
2. All observations in the sample should be independent.  

*Due to the Central Limit Theorem, sample sizes of 40 and greater have sampling distributions sufficiently close to normal for the t-test to work well. Our sample size is 30, so we should examine the sample to check for any extreme outliers. A boxplot is sufficient for sample sizes of 15 or more. For even smaller sample sizes ($n < 15$), a histogram would be appropriate to ensure there are no major departures from normality. We'll check our sample below with a boxplot.

In [None]:
boxplot(OJ_only$len, horizontal=TRUE)

Our sample of size 30 has no outliers and the skew is not terribly strong, so we can be confident that the normality condition is satisfied.
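
The visual check can be backed up numerically. As a rough sketch, `boxplot.stats` returns the points that a boxplot would draw beyond its whiskers (using the default 1.5 × IQR rule); the variable names here are illustrative.

```r
# List any points a boxplot would flag as outliers (default 1.5 * IQR rule)
# (oj_len and outlier_values are illustrative names)
oj_len <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
outlier_values <- boxplot.stats(oj_len)$out

cat("Number of points beyond the whiskers:", length(outlier_values), "\n")
```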

#### Conducting 1-Sample t-test: Method 1 - Calculate t-statistic and Find Tail Area

The t statistic is computed as follows:  

<center>
$t = \frac {\overline{x} - \mu}{\frac{s}{\sqrt{n}}} $
</center>

where $\mu$ refers to the mean specified in the _null hypothesis_.  

The code below performs this calculation.

In [None]:
#Calculate t statistic

null_mu <- 18.5
xbar <- mean(OJ_only$len)
s <- sd(OJ_only$len)
n <- length(OJ_only$len)

t <- (xbar - null_mu)/(s/sqrt(n))
t

Now that we have a t statistic, the p-value is found using the right tail of the appropriate t distribution.  The t distribution is governed by _degrees of freedom_.  For a 1-sample t-test, $df = n-1$.  In this example, $df=29$. We find the p-value below.

In [None]:
#Determine right-tailed p-value for this sample.
#Note we are using variables t and n from previous calculation

pval <- pt(t, df=n-1, lower.tail = FALSE)
cat("p-value: ", pval)

#### Conducting 1-Sample t-test: Method 2 
R has a t.test function that will perform the t-test.  This is shown below.

In [None]:
#Use R's t.test function
# t.test(x, mu = 0, alternative = "two.sided")
# x = vector of sample values
# mu = null hypothesis mean (default is zero)
# alternative = direction of alternative hypothesis (default is 2-tailed)

t.test(OJ_only$len, mu = 18.5, alternative = "greater")
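
The object returned by `t.test` can also be stored, so that the statistic, degrees of freedom, and p-value are available for step 4. The sketch below assumes a significance level of $\alpha = 0.05$, which these notes do not specify; substitute whatever level your analysis calls for.

```r
# Store the t.test result and pull out its pieces
alpha <- 0.05   # ASSUMED significance level; not specified in these notes

oj_len <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
result <- t.test(oj_len, mu = 18.5, alternative = "greater")

cat("t statistic:", result$statistic, "\n")
cat("Degrees of freedom:", result$parameter, "\n")
cat("p-value:", result$p.value, "\n")
cat("Reject H0 at alpha =", alpha, "?", result$p.value < alpha, "\n")
```

The stored statistic and p-value should agree with the values calculated by hand in Method 1.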

### Example 2. Guinea Pig Tooth Growth (Two Means)

The full ToothGrowth data set includes data for two different treatments: Orange Juice (OJ) and a Vitamin C supplement (VC). We want to know if there is a significant **difference** between the true mean odontoblast length of guinea pigs receiving these two treatments.

Hypotheses are as follows.
<center>
$H_{0}: \mu_J = \mu_V$  
</center>
<center>
$H_{a}: \mu_J \neq \mu_V$
</center>

First we gather our sample data:

In [None]:
OJ_sampleData <- ToothGrowth[ToothGrowth$supp=='OJ',]$len
VC_sampleData <- ToothGrowth[ToothGrowth$supp=='VC',]$len

summary(OJ_sampleData)
summary(VC_sampleData)

sample_meanDiff <- mean(OJ_sampleData) - mean(VC_sampleData)
sample_meanDiff

#### Method 1 - Empirical p-value through simulation/randomization
This randomization is similar to the one we performed for the 2-proportion hypothesis test.

In [None]:
#Mimic 2 treatment groups of size 30 being randomly selected from the 60 guinea pigs in the study
random_groups <- sample(ToothGrowth$len,60,replace=FALSE)
OJ_group <- random_groups[1:30]
VC_group <- random_groups[31:60]

#Find mean cell length for each group and then find difference between the means
xbar_OJ <- mean(OJ_group)
xbar_VC <- mean(VC_group)

diff <- xbar_OJ - xbar_VC

cat("OJ Group mean: ", xbar_OJ)
cat("\n")  #new line
cat("Vitamin C Group mean: ", xbar_VC)
cat("\n")  
cat("Difference in sample means: ",diff)


In [None]:
#Repeat random sampling process many times 
num_trials <- 10000

#This vector will hold the difference in means for each randomized assignment
differences <- numeric(num_trials)   #pre-allocated to avoid growing the vector inside the loop

#Create a model of the mean differences we would expect for a 
#                  random group assignment IF THE NULL HYPOTHESIS IS TRUE
for (i in 1:num_trials){
    random_groups <- sample(ToothGrowth$len,60,replace=FALSE)
    OJ_group <- random_groups[1:30]
    VC_group <- random_groups[31:60]
    xbar_OJ <- mean(OJ_group)
    xbar_VC <- mean(VC_group)
    differences[i] <- xbar_OJ - xbar_VC
}

#Visualize our model
hist(differences, main="Differences in Mean Odontoblast Length by Treatment (Null Model)")

In [None]:
#Compute p-value from above empirical model

cat("Finding mean differences at least as extreme as ", sample_meanDiff, "...\n")

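#Compare absolute values on both sides so the test is 2-tailed even if the observed difference is negative
emp_p <- sum(abs(differences) >= abs(sample_meanDiff))/num_trials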
cat("Empirical p-value:", emp_p)
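
The same randomization can be written more compactly. The sketch below wraps one random group assignment in a helper function and repeats it with `replicate`. The helper `perm_diff`, the other variable names, and the seed value are illustrative choices, not part of the original notes.

```r
# Compact randomization test using replicate()
# (perm_diff, observed, null_diffs are illustrative names; the seed is arbitrary)
set.seed(3350)   # fix the random number stream so results are reproducible

# One random assignment: shuffle all values, split into two groups,
# and return the difference in group means
perm_diff <- function(values, n_first) {
    shuffled <- sample(values)
    mean(shuffled[1:n_first]) - mean(shuffled[-(1:n_first)])
}

observed <- mean(ToothGrowth$len[ToothGrowth$supp == "OJ"]) -
            mean(ToothGrowth$len[ToothGrowth$supp == "VC"])

null_diffs <- replicate(10000, perm_diff(ToothGrowth$len, 30))

# 2-tailed empirical p-value: proportion of shuffles at least as extreme
emp_p_two_sided <- mean(abs(null_diffs) >= abs(observed))
cat("Empirical p-value:", emp_p_two_sided, "\n")
```

Because the group assignments are random, the empirical p-value will vary slightly from run to run unless a seed is set.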

#### Method 2 - Theoretical Distribution

#### 2-Sample t-test

The conditions for a t-test still include the two we reviewed previously, plus a third condition:  

1. Sampling distribution being modeled should be a Normal distribution. 
2. All observations in the sample should be independent.  
3. Variance of sub-populations should be equal or very close to equal (_or the test should account for unequal variance_).

##### Calculating the t Statistic
There are 2 versions of the 2-sample t-statistic :  
* Student's t-statistic: Assumes both populations have same variance and uses "pooled" variance ($s^{2}$) of both samples combined
* Welch t-statistic: Computes t statistic with separate variances for each sample, $s_1^{2}$ and $s_2^{2}$

Using $n_1$ and $n_2$ for the sample sizes of the two sample groups and $s_1$ and $s_2$ for the standard deviations of the two samples, the two t-statistics described above are computed as follows.  

**Student's t-Statistic ("Classical" t-test):**  

<center>
$t=\frac{\overline{x}_1 - \overline{x}_2}{\sqrt{\frac{s^{2}}{n_1} + \frac{s^{2}}{n_2}}}$
</center>

where pooled variance $s^{2}$ is defined as  

<center>
$s^{2}=\frac{(n_1-1)s_1^{2}+(n_2-1)s_2^{2}}{n_1 + n_2 - 2}$
</center>

The degrees of freedom for the classic t-test (pooled) are: 
<center>
$df = n_1 + n_2 -2$
</center>
<br>
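
As a sketch of the pooled calculation, the code below computes the pooled variance, Student's t statistic, and the degrees of freedom from the formulas above, then compares them with R's `t.test` using `var.equal = TRUE`. Variable names are illustrative.

```r
# Student's (pooled) 2-sample t statistic computed from the formulas
# (oj, vc, pooled_var, student_t, student_df are illustrative names)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n1 <- length(oj)
n2 <- length(vc)

# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / (n1 + n2 - 2)

student_t  <- (mean(oj) - mean(vc)) / sqrt(pooled_var / n1 + pooled_var / n2)
student_df <- n1 + n2 - 2

cat("Pooled t statistic:", student_t, "\n")
cat("Degrees of freedom:", student_df, "\n")

# Cross-check against R's built-in pooled t-test
check <- t.test(oj, vc, var.equal = TRUE)
cat("t.test statistic:  ", check$statistic, "\n")
```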

**Welch t-Statistic:**  

<center>
$t=\frac{\overline{x}_1 - \overline{x}_2}{\sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}}}$
</center>
 

The degrees of freedom for the Welch t-test (not pooled) are: 

<center>
$df = \frac{\left ( \frac{s_1^{2}}{n_1}+\frac{s_2^{2}}{n_2} \right )^{2}}{\frac{(s_1^{2}/n_1)^{2}}{n_1-1} + \frac{(s_2^{2}/n_2)^{2}}{n_2-1}}$
</center>
<br>

To decide which t-test to use, we can check the spread of each sub-group using the boxplots below.

In [None]:
#Examine distribution of sub-populations
boxplot(ToothGrowth$len ~ ToothGrowth$supp, horizontal=TRUE)

The boxplots suggest the two populations may not have the same variance.  If ever in doubt when planning a t-test, it is best to err on the side of caution and assume variances are NOT equal (choose unpooled option).
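
To put numbers on what the boxplots show, the sketch below computes the sample standard deviation of each group and the ratio of the larger sample variance to the smaller. This is only a descriptive comparison, not a formal test of equal variances; variable names are illustrative.

```r
# Sample standard deviation of odontoblast length for each supplement group
# (group_sd and var_ratio are illustrative names)
group_sd <- tapply(ToothGrowth$len, ToothGrowth$supp, sd)
group_sd

# Ratio of larger sample variance to smaller (1 would mean identical spread)
var_ratio <- max(group_sd)^2 / min(group_sd)^2
cat("Variance ratio (larger/smaller):", var_ratio, "\n")
```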

In [None]:
#Perform Welch's t-test
xbar1 <- mean(OJ_sampleData)
xbar2 <- mean(VC_sampleData)
s1 <- sd(OJ_sampleData)
s2 <- sd(VC_sampleData)
n1 <- length(OJ_sampleData)
n2 <- length(VC_sampleData)

welch_t <- (xbar1 - xbar2)/sqrt(s1^2/n1 + s2^2/n2)
cat("t statistic: ", welch_t)

In [None]:
#Find degrees of freedom
welch_df <- ((s1^2/n1 + s2^2/n2)^2) / ( ((s1^2/n1)^2)/(n1-1) + ((s2^2/n2)^2)/(n2-1) )
cat("Degrees of freedom: ", welch_df)

In [None]:
#Visualize where our statistic falls on this t distribution

xvalues <- seq(-4,4,0.1)     # Generate x-values
tvalues <- dt(xvalues, df=welch_df)

plot (xvalues,tvalues, main="t Distribution with df = 55.3", ylab="Density", xlab="t Statistic", type="l", lwd=4)
abline(v=welch_t, col="red", lty=2)

In [None]:
#Find 2-tailed p-value with above statistic and degrees of freedom
#NOTES: 
#   1) positive t is in upper tail
#   2) the upper tail p-value must be doubled for 2-tailed test

p_val <- pt(welch_t, df=welch_df, lower.tail=FALSE) * 2
cat("p value: ", p_val)

#### t-Test Option B (packaged t-test in R)  

The t.test function in R will perform Student's t-test if var.equal is set to TRUE, and it will perform Welch's t-test if var.equal is set to FALSE (the default).  

We will use R's straightforward t.test below: 

In [None]:
#2-Sample t-Test
#t.test(x, y, alternative = "two.sided", var.equal = FALSE)

t.test(OJ_sampleData, VC_sampleData, alternative = "two.sided", var.equal = FALSE)

In [None]:
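#Equivalent call using the formula interface: response ~ grouping variable
#(Welch's test and a 2-sided alternative are the defaults)
t.test(len ~ supp, data=ToothGrowth)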
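
As a quick consistency check, the sketch below stores the results of both calling styles and compares them with the Welch statistic and degrees of freedom computed directly from the formulas. All variable names are illustrative.

```r
# Confirm that the two t.test interfaces and the manual formulas agree
# (oj, vc, by_vectors, by_formula, manual_t, manual_df are illustrative names)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

by_vectors <- t.test(oj, vc, alternative = "two.sided", var.equal = FALSE)
by_formula <- t.test(len ~ supp, data = ToothGrowth)

# Manual Welch statistic and degrees of freedom
v1 <- var(oj) / length(oj)
v2 <- var(vc) / length(vc)
manual_t  <- (mean(oj) - mean(vc)) / sqrt(v1 + v2)
manual_df <- (v1 + v2)^2 / (v1^2 / (length(oj) - 1) + v2^2 / (length(vc) - 1))

cat("t  (vectors, formula, manual):",
    by_vectors$statistic, by_formula$statistic, manual_t, "\n")
cat("df (vectors, formula, manual):",
    by_vectors$parameter, by_formula$parameter, manual_df, "\n")
```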

### Practice Exercise

We will investigate features of the different species of iris represented in the `iris` data set.

In [None]:
head(iris)
summary(iris$Species)

First let's examine the Petal.Length variable across the entire data set.

In [None]:
summary(iris$Petal.Length)

#### Examine the data

1. How do the values of Petal.Length in the first 6 rows compare with the data summary shown?
2. What clue(s) do you see in the first 6 rows of data that might explain any differences you see?
3. In the cell below, create a boxplot of Petal.Length **_by Species_**. 

#### Follow up

Does your plot give you reason to believe that one of the Species has different petal lengths than the others?  

Notice that the t-test is only able to compare **two** groups.  Because we have 3 different species, the following command will not work.  (Try it and read the error message that results.)

In [None]:
t.test(iris$Petal.Length ~ iris$Species)

##### A Simple Work-Around

We need a variable that only has 2 values. We can easily create one as shown below.  It is a logical variable that we can set to TRUE when species is setosa, and FALSE for all others.


In [None]:
iris$Setosa <- (iris$Species=="setosa")
head(iris)
summary(iris$Setosa)

Notice that the new variable has only 2 values, so it is suitable to use as the 'group' for a 2-sample test.
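
A cross-tabulation is one way to confirm how the species map onto the two groups. The sketch below works on a copy (`iris_copy`, an illustrative name) so that the original `iris` data frame is left untouched.

```r
# Cross-tabulate Species against the new logical grouping variable
# (iris_copy is an illustrative name; the built-in iris data is not modified)
iris_copy <- iris
iris_copy$Setosa <- (iris_copy$Species == "setosa")

table(Species = iris_copy$Species, Setosa = iris_copy$Setosa)
```

Note that the two groups are not the same size, since the FALSE group combines two species.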


#### Statistical Test

In the code cell below, you will conduct a t-test to determine whether the average petal length of setosa is **significantly different from** the average petal length of other iris species.  Before you conduct the test, you should fill in the answers to the following. (Double-click this cell and type your answers below.)

1. What are the **_parameters_** we are testing for?  Give symbols and state what they represent.



2. Based on the question above, will this be a one-sided or a two-sided test?



3. Based on the boxplot we created earlier, should we assume equal variance is TRUE or FALSE?



4. State the null and alternative hypotheses.

    $H_0:$

    $H_a:$

In [None]:
# Conduct your t-test here


#### Interpret your results

(Double-click and type your response in this cell.)