# MATH 3350 Course Notes - Module S5 (Part II)

## Matched Pairs versus Independent Samples


First, we examine how comparing two independent samples differs from a matched pairs scenario.

### Example 1. Base Running Strategies  
Studies about baseball players' running strategies (e.g., Hollander and Wolfe, 1999) investigated whether taking a wide angle around first base was faster than taking a narrow angle.  A study had 22 runners run from home to second base once using each strategy (with randomized order and a rest between runs).  Each runner was timed between two fixed points as they traveled around first base.  

The data set contains each runner's time in seconds on both the 'narrow' and the 'wide' path. We preview the data and summarize each path's times below.

In [None]:
#Runners' data for narrow and wide base-running path

runners <- read.csv("BaseRunning.csv")
head(runners)

summary(runners$narrow)
summary(runners$wide)


#### Independent Samples?
Suppose we treat the narrow times and the wide times as two independent samples (they are NOT independent!).  Then we might run a 2-sample t-test.  

The hypotheses are as follows.
<center>
$H_{0}: \mu_N = \mu_W$  
</center>
<center>
$H_{a}: \mu_N > \mu_W$
</center>

(Notice that a _larger_ mean indicates a _slower_ average time.  The hypothesis is that the times to run the narrow path are longer, on average.)  

A t-test is shown below for these hypotheses.

In [None]:
#Treat base running data as 2 independent samples:
#Independent Samples t-test (NOT ACTUALLY APPROPRIATE FOR THESE DATA)

t.test(runners$narrow,runners$wide, alternative="greater")

The test above has a non-significant result: treated as independent samples, the data do not provide evidence of a difference between the two mean running times.  

#### Dependent Samples 
The narrow and wide path data are not independent.  One data point in each sample is related to a data point in the other sample because the two data points are times from the same runner. This scenario is known as **Matched Pairs**.    

Instead of examining the times, we are really interested in _each runner's **time difference** between the two paths_.  

Now our hypotheses are as follows:  

<center>
$H_{0}: \mu_\delta = 0$  
</center>
<center>
$H_{a}: \mu_\delta > 0$
</center>

where $\delta = \text{Narrow} - \text{Wide}$ is computed for each runner.  

Instead of hypothesizing about a difference between two mean running times, we are hypothesizing about a single mean difference in running times.  This turns the test back into a "1 sample" test, where the sample contains the difference in run time for each runner.  The matched pairs test is performed below.

In [None]:
#Treat data as matched pairs (dependent samples)

diffs <- runners$narrow - runners$wide
t.test(diffs, alternative = "greater")

In [None]:
#A second method for matched pairs in R
t.test(runners$narrow,runners$wide, paired=TRUE, alternative="greater")

This time, the test result is highly significant.  We have strong evidence that on average, runners clock faster times when they take the wide route.  
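To make the "1 sample" idea concrete, here is a minimal sketch that computes the matched pairs statistic by hand and compares it with `t.test`. It uses simulated times, not the values in BaseRunning.csv; the names `speed`, `n_runners`, `t_manual`, and `p_manual`, and all of the simulation settings, are assumptions made for illustration.

```r
# Hypothetical data: simulated times (seconds) for 22 runners.
# These are NOT the BaseRunning.csv values.
set.seed(1)
n_runners <- 22
speed  <- rnorm(n_runners, mean = 5.5, sd = 0.25)          # each runner's baseline time
narrow <- speed + rnorm(n_runners, mean = 0.05, sd = 0.05)  # assumed small penalty on narrow path
wide   <- speed + rnorm(n_runners, mean = 0.00, sd = 0.05)

d <- narrow - wide                                  # one difference per runner

# One-sample t statistic on the differences: mean(d) / (sd(d) / sqrt(n))
t_manual <- mean(d) / (sd(d) / sqrt(n_runners))
p_manual <- pt(t_manual, df = n_runners - 1, lower.tail = FALSE)  # upper-tail p-value

paired <- t.test(narrow, wide, paired = TRUE, alternative = "greater")
cat("Manual t:", t_manual, "  t.test t:", unname(paired$statistic), "\n")
cat("Manual p:", p_manual, "  t.test p:", paired$p.value, "\n")
```

The two statistics and the two p-values should agree, because the paired test is a one-sample t-test on the differences with $n - 1$ degrees of freedom.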

#### Why did the results change so much?

The matched pairs design is powerful because it reduces variability that results from the runners' individual speeds by focusing only on the differences. Look at the standard deviations of the run times, compared to the standard deviation of the time differences by route (narrow vs wide). 

In [None]:
#Examine standard deviations of runners' times versus standard deviations of differences.
sd(runners$narrow)
sd(runners$wide)
sd(diffs)

Most of the variability in the running times can be attributed to the differences _in runners' speeds_, not the path they ran.  By removing the runner-to-runner variability, the paired test compares the mean difference against a much smaller standard error, so the effect of the path is easier to detect without all the "noise" in the original data.  
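The reduction in variability follows from the identity $Var(N - W) = Var(N) + Var(W) - 2\,Cov(N, W)$: when the two times from the same runner are strongly positively correlated, the covariance term cancels most of the variance. The sketch below checks this on simulated data. The values are hypothetical and are not taken from the base running study.

```r
# Hypothetical simulated times; the numbers are illustrative only
set.seed(2)
speed  <- rnorm(22, mean = 5.5, sd = 0.25)    # runner-to-runner variation (assumed)
narrow <- speed + rnorm(22, mean = 0.05, sd = 0.05)
wide   <- speed + rnorm(22, mean = 0.00, sd = 0.05)
d <- narrow - wide

# Var(N - W) = Var(N) + Var(W) - 2 * Cov(N, W)
lhs <- var(d)
rhs <- var(narrow) + var(wide) - 2 * cov(narrow, wide)

cat("var(d):", lhs, "  from identity:", rhs, "\n")
cat("cor(narrow, wide):", cor(narrow, wide), "\n")
cat("sd(narrow):", sd(narrow), " sd(wide):", sd(wide), " sd(d):", sd(d), "\n")
```

In this simulation the shared `speed` term makes the two samples highly correlated, so the standard deviation of the differences is much smaller than the standard deviation of either sample.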

#### Matched Pairs Designs
The ability to control for individual differences makes the Matched Pairs design very powerful for certain types of studies.  Here are some examples:  
* Pre-test and post-test designs: take a "before" and "after" measurement from each individual to determine the effect of some treatment
* Taking data from each individual under two different conditions (e.g., attempting a task with music playing and without music playing)
* Comparing skin reactions by applying different creams on each participant's left and right arm
* Comparing some aspect of each participant's dominant and non-dominant hand (strength, speed, coordination, etc.)

## Comparing Means of More than 2 Groups

We will use the R dataset 'ChickWeight' which provides data from a study where newborn chicks were fed one of four different diets; their weights (in grams) were measured throughout the study, including at the end of the study. Below we see the format of the data set. 

In [None]:
head(ChickWeight)

The Time variable tells the chick's age in days at the time a measurement was taken.  We are only interested in the chick weights at the end of the 3 week period, so we will create a smaller data set containing only those records where Time is 21 (days).

In [None]:
GrownChicks <- ChickWeight[ChickWeight$Time==21,]
head(GrownChicks)

We are interested in whether mean chick weight (in general, not just in this sample) varies with the diet the chicks are given. The box plots below provide exploratory analysis; the red dot-dash vertical line shows the mean weight across the chicks in all 4 samples combined (the "Grand Mean").

In [None]:
boxplot(GrownChicks$weight ~ GrownChicks$Diet, horizontal=TRUE, xlab="Chick Weight", ylab="Diet")
abline(v=mean(GrownChicks$weight), lty=4, lwd=3, col="red")
legend("topleft", lty=4, lwd=3, col="red", legend = "Grand Mean")

The plots give us reason to suspect that the differences in chick weights based on diet may be statistically significant (they reflect more than just random variation).  Because there are 4 groups, a single 2-sample t-test will not suffice.  
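One way to see why running a separate t-test on every pair of diets is unattractive: the number of tests grows quickly, and so does the chance of at least one false positive. The sketch below is a rough illustration that assumes a per-test significance level of 0.05 and treats the tests as independent, which is only an approximation.

```r
k <- 4                          # number of diet groups
alpha <- 0.05                   # assumed per-test significance level
n_pairs <- choose(k, 2)         # number of distinct pairwise comparisons

# Chance of at least one false positive if all nulls are true
# (assumes the tests are independent, which is an approximation)
familywise <- 1 - (1 - alpha)^n_pairs

cat("Pairwise comparisons:", n_pairs, "\n")
cat("Approximate chance of at least one false positive:", familywise, "\n")
```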

### ANOVA: ANalysis Of VAriance 

The one-way ANOVA statistical test is used to detect whether ANY statistically significant difference exists among the means of multiple groups.  The hypotheses for this test are formatted as follows:  

<center>
$H_{0}: \mu_1 = \mu_2 = ... = \mu_k$  
</center>
<center>
$H_{a}: $ At least one $\mu_i$ is different
</center>

The test statistic for this test is calculated using the following **RATIO**:  

<center>
$\frac{MSG}{MSE}$
</center>

where $MSG$ represents the mean square variation **_between_** groups, and $MSE$ represents the mean square error **_within_** all groups combined.  

The distribution of possible values for this ratio is the **_F_** distribution (named for statistician Ronald Fisher).  The **_F_** distribution is defined using 2 values: a _numerator_ degrees of freedom (**df1**) and a _denominator_ degrees of freedom (**df2**).  These are defined as follows:  

<center>
$df1 = k - 1$  
</center>
<center>
$df2 = N - k$
</center>

where $k$ represents the number of **_groups_** and $N$ represents the _total combined sample size_.  
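As a quick illustration of how the two degrees of freedom pick out a particular **_F_** distribution, the sketch below computes them for illustrative values $k = 4$ and $N = 45$ and looks up the cutoff that leaves 5% of the area in the upper tail. The significance level and the name `F_crit` are assumptions for this example.

```r
k <- 4        # number of groups (illustrative)
N <- 45       # total combined sample size (illustrative)
df1 <- k - 1  # numerator degrees of freedom
df2 <- N - k  # denominator degrees of freedom

alpha <- 0.05                                   # assumed significance level
F_crit <- qf(1 - alpha, df1 = df1, df2 = df2)   # upper-tail critical value

cat("df1 =", df1, " df2 =", df2, "\n")
cat("Critical F value at alpha = 0.05:", F_crit, "\n")
```

An observed ratio larger than `F_crit` would fall in the upper 5% of this distribution.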

The $MSG$ and $MSE$ are calculated by dividing the associated _Sum of Squares_ (**SS**) by the corresponding degrees of freedom:  


<center>
$MSG = \frac{SSG}{df1}$  
</center>
<center>
$MSE = \frac{SSE}{df2}$
</center>

The Sum of Squares values are computed as follows.

##### TOTAL Sum of Squares (SST)

<center>
$SST = \sum_{j=1}^{N}{(x_j - \overline{X}_G)^{2}} = SSG + SSE$
</center> 

##### Sum of Squares BETWEEN Groups (SSG)
<center>
$SSG = \sum_{i=1}^{k}{n_i(\overline{x}_i - \overline{X}_G)^{2}}$  
</center>

##### Sum  of Squares WITHIN Groups (SSE)
<center>
$SSE = SST - SSG$
</center>  

where $n_i$ and $\overline{x}_i$ represent the sample size and mean of each _subgroup_, and $\overline{X}_G$ represents the "_grand mean_" of the combined data set.  
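The within-group sum of squares can also be computed directly, by summing each observation's squared deviation from its own group mean. The sketch below uses a small made-up data set (the vectors `x` and `g` are hypothetical) to check that the direct calculation agrees with $SST - SSG$.

```r
# Toy data (hypothetical): 9 observations in 3 groups of unequal size
x <- c(4, 6, 8,  10, 12,  1, 2, 3, 6)
g <- factor(c("a", "a", "a",  "b", "b",  "c", "c", "c", "c"))

grand <- mean(x)                    # grand mean
n_i   <- tapply(x, g, length)       # subgroup sample sizes
xbar  <- tapply(x, g, mean)         # subgroup means

SST <- sum((x - grand)^2)                 # total sum of squares
SSG <- sum(n_i * (xbar - grand)^2)        # between-group sum of squares
SSE_direct <- sum((x - ave(x, g))^2)      # ave() returns each value's group mean

cat("SST:", SST, "\n")
cat("SSG + SSE (direct):", SSG + SSE_direct, "\n")
cat("SSE as SST - SSG:", SST - SSG, "  SSE direct:", SSE_direct, "\n")
```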


We show below how these values are computed for our sample. 

In [None]:
#Start with overall statistics (all samples combined) and degrees of freedom
k <- 4  #number of subgroups
df1 <- k-1

grandMean <- mean(GrownChicks$weight)
groupSD <- sd(GrownChicks$weight)
groupN <- length(GrownChicks$weight)
df2 <- groupN - k

cat("Sample size N =", groupN, "with k =", k, "subgroups\n")
cat("Degrees of Freedom: ", df1, "and", df2, "\n")
cat("Whole group: mean =", grandMean, "sd =", groupSD, "N =",groupN, "\n")

In [None]:
#Get mean, standard deviation, and sample size of all subgroups
m <- c()   #Vector of means
s <- c()   #Vector of standard deviations
n <- c()   #Vector of sample sizes

m[1] <- mean(GrownChicks$weight[GrownChicks$Diet == 1])
m[2] <- mean(GrownChicks$weight[GrownChicks$Diet == 2])
m[3] <- mean(GrownChicks$weight[GrownChicks$Diet == 3])
m[4] <- mean(GrownChicks$weight[GrownChicks$Diet == 4])

s[1] <- sd(GrownChicks$weight[GrownChicks$Diet == 1])
s[2] <- sd(GrownChicks$weight[GrownChicks$Diet == 2])
s[3] <- sd(GrownChicks$weight[GrownChicks$Diet == 3])
s[4] <- sd(GrownChicks$weight[GrownChicks$Diet == 4])

n[1] <- length(GrownChicks$weight[GrownChicks$Diet == 1])
n[2] <- length(GrownChicks$weight[GrownChicks$Diet == 2])
n[3] <- length(GrownChicks$weight[GrownChicks$Diet == 3])
n[4] <- length(GrownChicks$weight[GrownChicks$Diet == 4])

cat("Subgroup means:", m, "\n")
cat("Subgroup std deviations:",s, "\n")
cat("Subgroup sample sizes:",n, "\n")


#### Conditions for ANOVA
The conditions for ANOVA are essentially the same as those for the 2-sample t-test, but we have different rules of thumb for checking these conditions:  

##### 1. Normality: Sampling distributions should be close to normal. 
* Avoid conducting the ANOVA procedure when overall sample size is less than 20  

##### 2. Independence: All observations in the sample should be independent.  

##### 3. Homogeneity of Variance: Variance of sub-populations should be equal or close to equal.
* Ensure that the ratio of largest to smallest subgroup standard deviation is LESS than 2:1
* Ensure that the ratio of largest to smallest subgroup sample size is LESS than 2:1  

From the data above, we can see that: 
* Overall sample size is 45
* Ratio of largest to smallest standard deviation = $78.1:43.3 \approx 1.8:1$
* Ratio of largest to smallest sample size = $16:9 \approx 1.8:1$  

Therefore, we will proceed with the test.
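The rule-of-thumb ratios above can be reproduced directly from the data. This is a minimal sketch that uses `tapply` to get the per-diet values in one call each; the names `s_by_diet`, `n_by_diet`, `sd_ratio`, and `n_ratio` are introduced here for illustration.

```r
# Rebuild the day-21 subset so this cell runs on its own
GrownChicks <- ChickWeight[ChickWeight$Time == 21, ]

s_by_diet <- tapply(GrownChicks$weight, GrownChicks$Diet, sd)      # subgroup standard deviations
n_by_diet <- tapply(GrownChicks$weight, GrownChicks$Diet, length)  # subgroup sample sizes

sd_ratio <- max(s_by_diet) / min(s_by_diet)
n_ratio  <- max(n_by_diet) / min(n_by_diet)

cat("Overall sample size:", sum(n_by_diet), "\n")
cat("Largest-to-smallest sd ratio:", round(sd_ratio, 2), "\n")
cat("Largest-to-smallest sample size ratio:", round(n_ratio, 2), "\n")
cat("Both ratios below 2:", sd_ratio < 2 && n_ratio < 2, "\n")
```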

#### Hypotheses

<center>
$H_{0}: \mu_1 = \mu_2 = \mu_3 = \mu_4$  
</center>
<center>
$H_{a}: $ At least one $\mu_i$ is different
</center>  

We calculate the test statistic below, using formulas given above.

In [None]:
#Calculate MSG, MSE, and F statistic

SST <- sum((GrownChicks$weight - grandMean)^2)
cat("Sum of Squares Total (SST):", SST, "\n")

SSG <- 0
for (i in 1:k) {
    SSG <- SSG + n[i]*(m[i]-grandMean)^2
}

cat("Between group Sum of Squares (SSG):",SSG,"\n")

MSG <- SSG/df1
cat ("Mean Square variation between Groups (MSG):", MSG,"\n")

SSE <- SST - SSG
cat("Sum of Squared Error within groups (SSE):", SSE, "\n")

MSE <- SSE/df2
cat ("Mean Square Error within groups (MSE):", MSE,"\n")

F_stat <- MSG/MSE
cat ("F statistic: ", F_stat)

In [None]:
#Visualize where our statistic falls on this F distribution

xvalues <- seq(0,5,0.05)     # Generate x-values
Fvalues <- df(xvalues, df1=df1, df2=df2)

title <- paste("F Distribution with df1=",df1,", df2=",df2)
plot (xvalues,Fvalues, main=title, ylab="Density", xlab="F Statistic", type="l", lwd=4)
abline(v=F_stat, col="red", lty=2)

In [None]:
#Find p-value for above statistic with designated degrees of freedom
#NOTE: for the ANOVA F test, the p-value is always the area in the upper tail

p_val <- pf(F_stat, df1=df1, df2=df2, lower.tail=FALSE)
cat("p value: ", p_val)

#### Interpretation
The p-value is very small, leading us to reject the null hypothesis and conclude that at least one of the diets leads to a different mean weight in chicks at 21 days. ANOVA does not tell us which diet(s) differ; follow-up analysis (such as pairwise comparisons) would be needed to identify them.
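One common way to follow up a significant ANOVA is Tukey's Honest Significant Difference procedure, which compares every pair of groups while adjusting for the number of comparisons. The sketch below shows the general shape using R's built-in `aov` and `TukeyHSD` functions; it is one possible option, not a method these notes prescribe, and the names `fit` and `tukey` are assumptions.

```r
# Rebuild the day-21 subset so this cell runs on its own
GrownChicks <- ChickWeight[ChickWeight$Time == 21, ]

fit   <- aov(weight ~ Diet, data = GrownChicks)   # one-way ANOVA fit
tukey <- TukeyHSD(fit, conf.level = 0.95)         # all pairwise diet comparisons

# Each row is one pair of diets: estimated difference in means,
# confidence interval bounds, and an adjusted p-value
print(tukey)
```

Pairs whose interval excludes 0 are the ones this procedure flags as different.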

### ANOVA Using R's Packaged Functions
**SURPRISE!!**  R has functions that will do the heavy lifting for you.  An example is shown below.

In [None]:
#Conduct ANOVA using functions in R Library
modelA <- lm(weight ~ Diet, data=GrownChicks)
anova(modelA)
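As a check, the F statistic in the `anova` table should match the one built from the sum of squares formulas. The sketch below recomputes both in one self-contained cell; the names `tab`, `n_by_diet`, `m_by_diet`, and `F_manual` are introduced here for illustration.

```r
# Rebuild the day-21 subset and the model so this cell runs on its own
GrownChicks <- ChickWeight[ChickWeight$Time == 21, ]
modelA <- lm(weight ~ Diet, data = GrownChicks)
tab <- anova(modelA)

# Recompute F from the sum of squares formulas
grandMean <- mean(GrownChicks$weight)
n_by_diet <- tapply(GrownChicks$weight, GrownChicks$Diet, length)
m_by_diet <- tapply(GrownChicks$weight, GrownChicks$Diet, mean)
k <- length(n_by_diet)
N <- sum(n_by_diet)

SSG <- sum(n_by_diet * (m_by_diet - grandMean)^2)
SSE <- sum((GrownChicks$weight - grandMean)^2) - SSG
F_manual <- (SSG / (k - 1)) / (SSE / (N - k))

cat("F from anova():", tab[["F value"]][1], "\n")
cat("F by hand:     ", F_manual, "\n")
cat("p from anova():", tab[["Pr(>F)"]][1], "\n")
```

The first row of the table corresponds to Diet (between groups) and the second to Residuals (within groups), so its `Df` column holds df1 and df2.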