Rebecca Black

## Determining the Optimal Chopstick Length
### A Repeated Measures Analysis of Variance

This is an analysis of a portion of an experiment designed to determine the most efficient length among chopsticks measuring 180, 210, 240, 270, 300, and 330 mm. The subjects were thirty-one male junior college students, and the response variable was a measure of how many peanuts the subject picked up and placed in a cup (a.k.a. Food Pinching Performance).

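Because each subject was tested with every chopstick length, responses from the same subject are correlated. An ordinary one-way analysis of variance assumes independent observations, so it is not appropriate here. Instead I will use a Treatment-by-Subjects ANOVA. It includes each subject as a blocking factor, so the variability due to differences among subjects is removed from the error term rather than ignored, as it would be in an ordinary ANOVA. A Single Factor Repeated Measures ANOVA would also work. However, in that case I would not be able to use Tukey's HSD test to conduct multiple comparisons among the treatments, because R's `TukeyHSD()` does not accept the multi-stratum model produced by the `Error()` term that is typically used to fit it.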
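The two formulations estimate the treatment effect in the same way. The R sketch below illustrates this on a small simulated balanced data set; the values are random and are not the chopstick data. The names `sim`, `subject`, `treatment`, and `y` are hypothetical. The sketch fits the additive Treatment-by-Subjects model and the repeated measures model with an `Error()` term. With one observation per subject-treatment cell, the treatment F statistic should be identical in both fits, but only the ordinary `aov` fit can be passed to `TukeyHSD()`.

```r
# Simulated balanced design: every subject is measured once under every treatment.
# These values are random and are NOT the chopstick data.
set.seed(1)
n_subj <- 8
sim <- expand.grid(subject = factor(1:n_subj),
                   treatment = factor(c(180, 240, 300)))
subj_offset <- rnorm(n_subj, sd = 4)   # stable per-subject differences
sim$y <- 25 + subj_offset[as.integer(sim$subject)] + rnorm(nrow(sim), sd = 2)

# Treatment-by-Subjects model: subject is an additive blocking factor
fit_tbys <- aov(y ~ treatment + subject, data = sim)
f_tbys <- summary(fit_tbys)[[1]][["F value"]][1]

# Single-factor repeated measures model: subject-level error strata
fit_rm <- aov(y ~ treatment + Error(subject / treatment), data = sim)
f_rm <- summary(fit_rm)[["Error: subject:treatment"]][[1]][["F value"]][1]

# The treatment F statistic is the same in both fits
c(treatment_by_subjects = f_tbys, repeated_measures = f_rm)

# TukeyHSD() has a method for the aov fit, but not for the aovlist fit_rm
tk <- TukeyHSD(fit_tbys, which = "treatment")
```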

This analysis was done in R.

### Read in the data

```r
chopsticks=read.csv('chopstick_effectiveness.csv', header=TRUE)
```

### Print the dimensions and variable names

```r
dim(chopsticks)
names(chopsticks)
```

### Convert the dataset to a data frame

```r
# read.csv() already returns a data.frame, so this is only a safeguard
chopsticks=as.data.frame(chopsticks)
```

### Look at some summary statistics

```r
str(chopsticks)
summary(chopsticks)
head(chopsticks, n=5)
```

```
'data.frame':	186 obs. of  3 variables:
 $ FoodPinchingEfficiency: num  19.6 27.2 28.8 31.2 21.9 ...
 $ Individual            : int  1 2 3 4 5 6 7 8 9 10 ...
 $ ChopstickLength       : int  180 180 180 180 180 180 180 180 180 180 ...

 FoodPinchingEfficiency   Individual ChopstickLength
 Min.   :14.47          Min.   : 1   Min.   :180    
 1st Qu.:22.54          1st Qu.: 8   1st Qu.:210    
 Median :24.91          Median :16   Median :255    
 Mean   :25.01          Mean   :16   Mean   :255    
 3rd Qu.:27.93          3rd Qu.:24   3rd Qu.:300    
 Max.   :36.15          Max.   :31   Max.   :330    

  FoodPinchingEfficiency Individual ChopstickLength
1                  19.55          1             180
2                  27.24          2             180
3                  28.76          3             180
4                  31.19          4             180
5                  21.91          5             180
```


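To run the ANOVA, the subject identifier (here, `Individual`) must be a factor. Left as an integer, `aov()` would treat it as a numeric covariate and fit a single slope instead of a separate effect for each subject. So I convert Individual from type int to type factor. I also need to do the same to ChopstickLength, since it is the treatment in this analysis. As an integer, it would be modeled as a linear trend in length (1 degree of freedom) rather than as six distinct levels (5 degrees of freedom).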
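The effect of this conversion is easiest to see in the degrees of freedom. The following R sketch uses a hypothetical toy data frame (`toy`, with random responses, not the chopstick data) that mimics the integer-coded columns of `chopsticks`. It fits the same model before and after converting the columns to factors.

```r
# Hypothetical toy data: 4 subjects, each measured at 3 lengths (random values).
set.seed(2)
toy <- expand.grid(Individual = 1:4, ChopstickLength = c(180, 240, 300))
toy$y <- rnorm(nrow(toy), mean = 25, sd = 3)

# Left as numbers, each predictor is fit as a single linear slope (1 df each)
fit_num <- aov(y ~ ChopstickLength + Individual, data = toy)

# As factors, each level gets its own effect (number of levels - 1 df each)
toy$Individual <- factor(toy$Individual)
toy$ChopstickLength <- factor(toy$ChopstickLength)
fit_fac <- aov(y ~ ChopstickLength + Individual, data = toy)

df_num <- summary(fit_num)[[1]][["Df"]]
df_fac <- summary(fit_fac)[[1]][["Df"]]
df_num  # 1 1 9: one slope per predictor, then residual df
df_fac  # 2 3 6: 2 for ChopstickLength, 3 for Individual, then residual df
```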

```r
chopsticks$Individual=as.factor(chopsticks$Individual)
chopsticks$ChopstickLength=as.factor(chopsticks$ChopstickLength)
```

Now to confirm the changes:

```r
str(chopsticks$Individual)
str(chopsticks$ChopstickLength)
```

```
 Factor w/ 31 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 Factor w/ 6 levels "180","210","240",..: 1 1 1 1 1 1 1 1 1 1 ...
```


Nice. Now I'm ready to go.

### Now for the questions I want to answer with this analysis

#### Question 1:
Do the treatment (chopstick length) effects differ?

To answer this question, I'll run a Treatment-by-Subjects ANOVA.

```r
aov.tbys = aov(FoodPinchingEfficiency ~ ChopstickLength + Individual, data=chopsticks)
summary(aov.tbys)
```

```
                 Df Sum Sq Mean Sq F value   Pr(>F)    
ChopstickLength   5  106.9   21.37   5.051 0.000262 ***
Individual       30 2277.5   75.92  17.944  < 2e-16 ***
Residuals       150  634.6    4.23                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

The treatment in question here is chopstick length. I want to determine whether mean food pinching efficiency differs among the lengths. The null hypothesis is that the mean food pinching efficiency is equal for all treatments (i.e., all six chopstick lengths). The p-value for the test on chopstick length is 0.000262, which is significant at $ \alpha $ = 0.01 (and even at 0.001). So we reject the null hypothesis of equal treatment means. We conclude that mean food pinching efficiency is not the same for all chopstick lengths, or equivalently, that at least one pair of treatment means differs.
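The F statistic and p-value in this table can be reproduced by hand from the sums of squares. The short R sketch below does that arithmetic with `pf()`. The table reports rounded sums of squares, so the recomputed values agree with the printed ones only approximately.

```r
# Rounded values copied from the Treatment-by-Subjects ANOVA table above
ss_len <- 106.9; df_len <- 5
ss_res <- 634.6; df_res <- 150

ms_len <- ss_len / df_len          # mean square for chopstick length
ms_res <- ss_res / df_res          # residual (error) mean square
f_len  <- ms_len / ms_res          # F statistic for the treatment
p_len  <- pf(f_len, df_len, df_res, lower.tail = FALSE)  # upper-tail p-value

round(c(F = f_len, p = p_len), 6)
```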

It is important to understand that if I had analyzed these data with an ordinary one-way ANOVA, the treatment differences would not have been detected. The variability among subjects would have stayed in the error term instead of being removed. To illustrate this, here are the results for the ordinary ANOVA, which ignores the subject structure.

```r
aov_ord = aov(FoodPinchingEfficiency ~ ChopstickLength, data=chopsticks)
summary(aov_ord)
```

```
                 Df Sum Sq Mean Sq F value Pr(>F)
ChopstickLength   5  106.9   21.37   1.321  0.257
Residuals       180 2912.2   16.18               
```

As you can see, the p-value is 0.257, which would not be considered significant at any conventional $ \alpha $ level.
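The reason for the difference can be read directly from the two tables. In the ordinary ANOVA, the Individual and Residuals sums of squares from the Treatment-by-Subjects model are pooled into a single error term. The sketch below repeats that arithmetic in R using the rounded values printed above, so the results match the ordinary ANOVA table only up to rounding.

```r
# Rounded values copied from the two ANOVA tables above
ss_len  <- 106.9;  df_len  <- 5
ss_subj <- 2277.5; df_subj <- 30
ss_res  <- 634.6;  df_res  <- 150

# Ordinary one-way ANOVA: subject variability is pooled into the error term
ss_err_ord <- ss_subj + ss_res      # about 2912, the ordinary Residuals Sum Sq
df_err_ord <- df_subj + df_res      # 180 error degrees of freedom
ms_err_ord <- ss_err_ord / df_err_ord

# Same treatment mean square, much larger error mean square, so a smaller F
f_ord <- (ss_len / df_len) / ms_err_ord
p_ord <- pf(f_ord, df_len, df_err_ord, lower.tail = FALSE)
round(c(MSE = ms_err_ord, F = f_ord, p = p_ord), 3)
```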

#### Question 2:
Given that the treatment effects differ, where do those differences occur (i.e., which chopstick lengths are significantly more or less efficient than others), and which chopstick length offers the greatest mean food pinching efficiency?

To answer this question, I'll use Tukey's Honestly Significant Difference (HSD) test.

```r
TukeyHSD(aov.tbys, which="ChopstickLength")
```

```
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = FoodPinchingEfficiency ~ ChopstickLength + Individual, data = chopsticks)

$ChopstickLength
               diff        lwr         upr     p adj
210-180  0.54870968 -0.9595748  2.05699418 0.8999148
240-180  1.38774194 -0.1205426  2.89602644 0.0904885
270-180 -0.61129032 -2.1195748  0.89699418 0.8503866
300-180  0.03290323 -1.4753813  1.54118773 0.9999999
330-180 -0.93548387 -2.4437684  0.57280063 0.4749602
240-210  0.83903226 -0.6692522  2.34731676 0.5959492
270-210 -1.16000000 -2.6682845  0.34828450 0.2346843
300-210 -0.51580645 -2.0240910  0.99247805 0.9213891
330-210 -1.48419355 -2.9924781  0.02409096 0.0565555
270-240 -1.99903226 -3.5073168 -0.49074775 0.0025803
300-240 -1.35483871 -2.8631232  0.15344579 0.1053005
330-240 -2.32322581 -3.8315103 -0.81494130 0.0002412
300-270  0.64419355 -0.8640910  2.15247805 0.8199855
330-270 -0.32419355 -1.8324781  1.18409096 0.9893780
330-300 -0.9683871
```

So if we use $ \alpha $ = 0.01, only the mean differences 270-240 (p = 0.0026) and 330-240 (p = 0.0002) are significant. Since both differences are negative, we can conclude that 240mm chopsticks have a higher mean food pinching efficiency than both 270mm and 330mm chopsticks. A higher (less stringent) $ \alpha $ level would allow some additional conclusions. At $ \alpha $ = 0.10, for example, the 240-180 (p = 0.090) and 330-210 (p = 0.057) differences would also be significant.
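Because the design is balanced (31 subjects at each of the 6 lengths), every Tukey interval above has the same half-width, $ q \sqrt{MS_E / n} $. Here $ q $ is the studentized range quantile for 6 means and 150 error degrees of freedom. The R sketch below recomputes that margin with `qtukey()` from the rounded residual mean square. It assumes `n = 31` observations per length, and the three differences it checks are copied from the output above. A difference is significant at level $ \alpha $ exactly when its absolute value exceeds the $ 1 - \alpha $ margin.

```r
# Rounded values from the Treatment-by-Subjects ANOVA table above
ms_res <- 634.6 / 150   # residual mean square
df_res <- 150           # residual degrees of freedom
k <- 6                  # number of chopstick lengths
n <- 31                 # subjects, i.e. observations per length

# Half-width of every Tukey interval: q quantile times the standard error of a mean
se_mean <- sqrt(ms_res / n)
hsd_95 <- qtukey(0.95, nmeans = k, df = df_res) * se_mean
hsd_99 <- qtukey(0.99, nmeans = k, df = df_res) * se_mean
c(hsd_95 = hsd_95, hsd_99 = hsd_99)

# Differences copied from the TukeyHSD output above
diffs <- c("270-240" = -1.99903226, "330-240" = -2.32322581, "330-210" = -1.48419355)
abs(diffs) > hsd_95   # TRUE where the 95% interval excludes zero
abs(diffs) > hsd_99   # TRUE where the comparison is significant at alpha = 0.01
```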

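So what do these results mean in the world of chopstick length selection? Consider the population under study: male junior college students, whom we are presumably using as a proxy for adults in general. For this population, 240mm chopsticks have the highest observed mean food pinching efficiency and are significantly more efficient than 270mm and 330mm chopsticks. Their advantage over 180mm, 210mm, and 300mm chopsticks is not significant at $ \alpha $ = 0.01. So 240mm is the best choice among the lengths tested, but these data do not rule out that 180mm, 210mm, or 300mm chopsticks perform comparably.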
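In a balanced design, the Tukey `diff` column is simply the difference between two sample means. The comparisons against 180mm therefore give each length's mean relative to the 180mm mean. The following R sketch ranks the lengths by that offset, using values copied from the output above; the names `offset_from_180` and `ranked` are hypothetical.

```r
# Estimated differences from the 180 mm mean, copied from the TukeyHSD output above
offset_from_180 <- c("180" = 0,
                     "210" = 0.54870968,
                     "240" = 1.38774194,
                     "270" = -0.61129032,
                     "300" = 0.03290323,
                     "330" = -0.93548387)

# Rank the lengths by estimated mean efficiency, highest first
ranked <- sort(offset_from_180, decreasing = TRUE)
names(ranked)      # ordering of lengths from most to least efficient
names(ranked)[1]   # the length with the highest observed mean
```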