-
Notifications
You must be signed in to change notification settings - Fork 4
/
s2_Lab6_Comparisons.Rmd
652 lines (439 loc) · 27.1 KB
/
s2_Lab6_Comparisons.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
```{r, include = FALSE}
source("global_stuff.R")
```
# Linear Contrasts
## Readings
Chapters 12, 13, and 14 from [@abdiExperimentalDesignAnalysis2009].
## Overview
## Practical I: Orthogonal Constrasts
<iframe width="560" height="315" src="https://www.youtube.com/embed/Og1cPcO551w" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
We will start with a practical example of how to do orthogonal contrasts in R. The following R code shows how to reproduce the example from 12.4 in @abdiExperimentalDesignAnalysis2009. The video briefly reviews that example and discusses some of the details of implementing orthogonal contrasts in R.
```{r}
library(tibble)
library(tidyr)
library(dplyr)
options(dplyr.summarise.inform = FALSE)
# create the example data
smith_example <- tribble(
~Same, ~Different, ~Imagery, ~Photo, ~Placebo,
#--|--|--|--|----
25,11,14,25,8,
26,21,15,15,20,
17,9,29,23,10,
15,6,10,21,7,
14,7,12,18,15,
17,14,22,24,7,
14,12,14,14,1,
20,4,20,27,17,
11,7,22,12,11,
21,19,12,11,4
) %>%
pivot_longer(cols = everything(),
names_to = "IV",
values_to = "DV") %>%
mutate(IV = factor(IV,levels = c("Same",
"Different",
"Imagery",
"Photo",
"Placebo")))
# Note: R automatically orders Factor levels alphabetically
# we later define contrasts between groups using
# the the order from the textbook
# so we need to explicitly declare the order of levels
# run the omnibus test
aov.out <- aov(DV~IV, smith_example)
summary(aov.out)
# check the existing contrasts for your factor
contrasts(smith_example$IV)
# define your new set of contrasts
c1 <- c(2,-3,2,2,-3)
c2 <- c(2,0,-1,-1,0)
c3 <- c(0,0,+1,-1,0)
c4 <- c(0,+1,0,0,-1)
# combine them into the columns of a matrix
my_contrasts <- cbind(c1, c2, c3, c4)
# assign them to the contrast property for your IV
contrasts(smith_example$IV) <- my_contrasts
# rerun ANOVA
aov.out <- aov(DV~IV, smith_example)
summary(aov.out)
# use summary.aov and split to print out contrast info
# define your contrast labels that will get printed to the table
(full_summary <- summary.aov(aov.out,
split=list(IV=list("(1+3+4) vs (2+5)"=1,
"(1) vs (3+4)" = 2,
"(3) vs (4)"= 3,
"(2) vs (5)"= 4)
)
)
)
```
```{r example_table, fig.cap="Printing an ANOVA table with papaja"}
library(papaja)
apa_table(apa_print(full_summary)$table)
```
As a sidenote, @abdiExperimentalDesignAnalysis2009 points out that when you use a complete set of orthogonal contrasts, the omnibus F-value is the same as the average of the F-values from each of the contrasts.
```{r}
full_summary[[1]]$`F value`[1]
mean(full_summary[[1]]$`F value`[2:5])
```
## Concept 1: Understanding orthogonal contrasts
<iframe width="560" height="315" src="https://www.youtube.com/embed/egoVLWApFMU" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
```{r}
knitr::include_graphics("imgs/LinearContrasts/LinearContrasts.004.jpeg")
```
The figure shows some group means that are different from the grand mean. The top right panel shows how each group mean can be considered as a unique source for each difference. The bottom three panels on the right show a set of three orthogonal contrasts, which is an alternative way of breaking down the sources of the differences.
Here are a similar pattern of group means, along with the grand mean, deviations, and sum of squares.
```{r}
group_means <- c(4,3,10,11)
(grand_mean <- mean(group_means))
(differences <- group_means-grand_mean)
(squared_differences <- differences^2)
(sum_squares <- sum(squared_differences))
```
If all of the groups are different, and each of them have an independent influence on the grand mean, then we can think of each group mean as having it's own unique effect (represented as four different colored lines in the top right of panel above).
Even though there are four presumed influences, because we are estimating them on the basis of the grand mean, one degree of freedom is lost. So, three of the group means could be any number, but the last one would be fixed in order for the set to produce the specific grand mean here.
When we define linear contrasts, we are defining the make-up of specific means in terms of combinations of the set of means. So, the default ANOVA assumption that each mean is unique, can be thought of itself as a very basic set of orthogonal contrasts.
If we enter the group names and means into a dataframe, and declare the IV as a factor, we can see the basic set of orthogonal contrasts.
```{r}
fake_data <- tibble(IV = factor(c("A","B","C","D")),
DV = c(4,3,10,11))
contrasts(fake_data$IV)
```
Let's walk through this and answer some rhetorical questions. Our question for each group mean is, "how will we estimate this mean?". First of all, we assume that all of the group means are deviations from the grand mean. So each group mean will be expressed as the sum of the grand mean and a deviation.
Let's start with D. What values from the data will we use to estimate the mean of D? The values in the column for D show the contrast **weights**. We are saying we will take 100% of deviation for D (an 4), and 0% of the other means.
```{r}
contrasts(fake_data$IV)[,'D']
contrasts(fake_data$IV)[,'D'] * differences
# the formula for D
grand_mean + (1 * differences[4])
```
The same goes for C and B. We will estimate each of those means, with the grand mean and their unique deviations
```{r}
contrasts(fake_data$IV)
contrasts(fake_data$IV) * differences
```
So far if we say that each mean is the sum of the grand mean and the unique deviations from the grand mean, we get the original mean back for B, C, and D:
```{r}
grand_mean*contrasts(fake_data$IV) + contrasts(fake_data$IV) * differences
```
Notice that there is no contrast weight for A. This is the last mean left. Remember the degrees of freedom issue. We can define three of means in terms of the grand mean, but the last one will be fixed. There is only one value that A can be, in order for the grand mean to be 7.
```{r}
grand_mean
# A has to be four, if the others are 3, 10, and 11)
mean(c(4, 3, 10, 11))
```
### Other orthogonal contrasts
```{r}
knitr::include_graphics("imgs/LinearContrasts/LinearContrasts.004.jpeg")
```
Just to keep this picture close at hand, the previous section showed how the top right panel, showing the idea that each mean is a unique source, can be described in terms of linear contrasts.
The same set of means can be described by **any** set of orthogonal linear contrasts. For example, consider the set of three contrasts in the figure.
```{r}
c1 <- c(-1,-1,1,1)
c2 <- c(1,-1,0,0)
c3 <- c(0,0,-1,1)
my_contrasts <- cbind(c1,c2,c3)
contrasts(fake_data$IV) <- my_contrasts
contrasts(fake_data$IV)
# check they are orthogonal
cor(contrasts(fake_data$IV))
```
The formula from the textbook for computing Sums of Squares for contrasts is:
$\frac{S(\sum{C_a M_a})^2}{\sum{C_a^2}}$
If we compute the SSs for each contrast, we can see that they add up to the same total SS (50) from the example above. This will be true for any set of orthogonal linear contrasts.
```{r}
# multiple contrast weights by group means
contrasts(fake_data$IV) * group_means
# Find the sums for each column
colSums(contrasts(fake_data$IV) * group_means)
# Square the sums
colSums(contrasts(fake_data$IV) * group_means)^2
# divide by the SS for the contrast weights
(colSums(contrasts(fake_data$IV) * group_means)^2)/ colSums(contrasts(fake_data$IV)^2)
```
### Explanation part 2
<iframe width="560" height="315" src="https://www.youtube.com/embed/ET-Hz1tUVKM" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
This is an addendum to the above "explanation", which I gave a low score to myself for explanatory value. I thought I could do better, so I came up with this...
**Here's the thing about orthogonal linear contrasts**: if you add them up in the right way, you are guaranteed to get the original means back.
Let's see this process in action:
From the above example, our means are:
```{r}
fake_data$DV
```
and our set of three linear contrasts are:
```{r}
contrasts(fake_data$IV)
```
The claim is that if I add these contrasts up in the right way, I can get back the original pattern of means.
Let's try adding, remember we start with the grand means. So, the question will be, "can we add some combination of the linear contrasts in such a way that we can get the original pattern of means back?".
```{r}
grand_means <- c(7,7,7,7)
grand_means
```
First, note that we need all of the contrasts to do this. For example, there is now way we can add just the first linear contrast and somehow get the original means back...
```{r}
grand_means + contrasts(fake_data$IV)[,1]
grand_means + contrasts(fake_data$IV)[,1]*2
grand_means + contrasts(fake_data$IV)[,1]*3
```
In the above example no matter how many times we add the linear contrast to the grand means, we can never get the pattern in the original means...A will also be the same as B, and C will always be the same as D. We won't capture those extra nuances, so we need the other two contrasts.
Check this out, I add three of the first contrast, and 1 each of the second and third contrasts. I'm getting something close to the pattern of original means. But, it's not clear to me exactly how much of each to add. The claim is that there is some combination that will produce the original means, what is it? what are the "coefficients" telling me how much of each contrast to add?
I could fiddle about by hand and try different coefficients...
```{r}
grand_means+
(contrasts(fake_data$IV)[,1]*3.5)+
(contrasts(fake_data$IV)[,2]*.5)+
(contrasts(fake_data$IV)[,3]*.5)
```
Or, we could connect this issue with the concept of linear regression. Effectively, we are trying to explain the pattern of means (4,3,10,11), in terms of the multiple *linear* regression of the three different linear contrasts. Our question is what are the weights for each?
```{r}
fake_data_2 <- fake_data
fake_data_2 <- cbind(fake_data,contrasts(fake_data$IV))
lm(DV ~ c1 + c2 + c3, data = fake_data_2 )
summary(lm(DV ~ c1 + c2 + c3, data = fake_data_2 ))
```
Ok, we just used multiple linear regression to find how much of each contrast we need to add together to reproduce the original pattern of means. The above coefficients were 3, 1, and, 1, we just need to change this to 3.5, .5, and, .5:
```{r}
grand_means+
(contrasts(fake_data$IV)[,1]*3.5)+
(contrasts(fake_data$IV)[,2]*.5)+
(contrasts(fake_data$IV)[,3]*.5)
```
Although we don't have time in this lab to dive into the issue. The above will always be true for any set of orthogonal linear contrasts. You can always decompose and recompose a set of means into weighted contributions of linear contrasts. I will leave it to you to define different orthogonal linear contrasts for this set of means, and then show that they can be added up to produce the original set of means (sounds like a good generalization problem).
To do one last demonstration. The point here is that any set of means can be described as the combination of a set of orthogonal contrasts. We made one example of a set of orthogonal linear contrasts for a four group situation, so we should be able to show that this set of contrasts can reproduce any set of means.
I've assembled code from our previous examples. We are using the previously defined set of orthogonal contrasts. Now, we should be able to put any numbers we want for the group means (DV), and find that our set of contrasts can perfectly explain them.
```{r}
# you should be able to change any of the DV numbers
# and always find a combination of contrasts
# that perfectly explains the data
fake_data <- tibble(IV = factor(c("A","B","C","D")),
DV = c(43,22,53,104))
c1 <- c(-1,-1,1,1)
c2 <- c(1,-1,0,0)
c3 <- c(0,0,-1,1)
my_contrasts <- cbind(c1,c2,c3)
contrasts(fake_data$IV) <- my_contrasts
fake_data_2 <- cbind(fake_data,contrasts(fake_data$IV))
lm(DV ~ c1 + c2 + c3, data = fake_data_2 )
summary(lm(DV ~ c1 + c2 + c3, data = fake_data_2 ))
```
<!--
composable on the basis of
-->
<!--
Consider showing examples of creating other orthogonal contrasts...but oh ya, I was going to do that for the assignments...
-->
<!--
do a monte-carlo simulation...familywise etc.
-->
## Concept 2: Family-wise error rate
<iframe width="560" height="315" src="https://www.youtube.com/embed/hh1BM6Z6O4M" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Orthogonal linear contrasts show how a single omnibus ANOVA can be broken down into a set of contrasts. This is useful especially when the contrasts are designed to examine patterns of interest.
At the same time, conducting linear contrasts is like conducting multiple independent significance tests, and as you conduct more tests, you will increase the likelihood of making an inferential error (e.g., rejecting the null even the effect isn't real).
This section is a supplement to the textbook section 12.3.2, describing a Monte Carlo simulation to illustrate the difference between to alphas:
1. $\alpha(PC)$ or alpha per comparison: the probability of making a type I error for a single test
2. $\alpha(PF)$, or alpha per family of comparisons, the probability of making at least one type I error across a whole set of related tests.
### Monte-Carlo simulation of two alphas
The textbook describes a monte-carlo simulation of a one-factor design with 6 groups. There are 100 observations per group, all sampled from a normal distribution.
For each simulation we do two things:
1. Conduct an omnibus ANOVA and compute F. If the p-value is smaller than .05, we "reject the null", and count this a type I error.
2. conduct 5 linear orthogonal contrasts, compute each F, and if any have p-values smaller than .05, we "reject" them, and count them as type I errors.
In the textbook example, there were 10,000 simulations. The below code contains the basic components we need to perform the simulation.
```{r}
# example dataframe to simulate null
sim_data <- tibble(DV = rnorm(6*100,0,1),
IV = factor(rep(1:6, each = 100)))
# example orthogonal linear contrasts
c1 <- c(1,-1,0,0,0,0)
c2 <- c(0,0,1,-1,0,0)
c3 <- c(0,0,0,0,1,-1)
c4 <- c(-1,-1,2,2,-1,-1)
c5 <- c(1,1,0,0,-1,-1)
# create contrast matrix
orth_contrasts <- cbind(c1,c2,c3,c4,c5)
# check contrasts are orthogonal
cor(orth_contrasts)
# assign new contrasts to IV
contrasts(sim_data$IV) <- orth_contrasts
# run ANOVA
summary.aov(aov(DV~IV, sim_data), split=list(IV=list("c1"=1,
"c2" = 2,
"c3"= 3,
"c4"= 4,
"c5" = 5)
))
```
When we conduct the simulation we want to count type I errors. Specifically, let's count type I errors for the omnibus test, as well as for the linear contrasts. Our simulation will be similar to the one reported in Table 12.1 from the textbook.
In terms of organizing the code, I will adopt a strategy of saving the necessary data from the simulation into a dataframe that can then be analyzed to produce a summary table.
```{r}
#create a tibble to store all the results
all_sim_data <- tibble()
# conduct simulation
for(i in 1:10000){
sim_data <- tibble(DV = rnorm(6*100,0,1),
IV = factor(rep(1:6, each = 100)))
contrasts(sim_data$IV) <- orth_contrasts
sim_output <- summary.aov(aov(DV~IV, sim_data), split=list(IV=list("c1"=1,
"c2" = 2,
"c3"= 3,
"c4"= 4,
"c5" = 5)
))
#save current results in a tibble
sim_results <- tibble(type = c("omnibus",rep("contrast",5)),
p_values = sim_output[[1]]$`Pr(>F)`[1:6],
sim_num = rep(i,6)
)
# add current results to the tibble storing all the results
all_sim_data <- rbind(all_sim_data,sim_results)
}
```
Let's use `dplyr` to count how many type I errors occurred for omnibus and contrast based F-tests.
```{r}
# analyze the sim data
type_I_errors <- all_sim_data %>%
mutate(type_I = p_values < .05) %>%
group_by(type, sim_num) %>%
summarize(counts = sum(type_I)) %>%
group_by(type,counts) %>%
summarize(type_I_frequency = sum(counts))
knitr::kable(type_I_errors)
```
The type I error rate for the omnibus test is pretty close to .05:
```{r}
type_I_errors %>%
filter(type == 'omnibus',
counts == 1) %>%
pull(type_I_frequency)/10000
```
Similarly, we conducted 50000 total linear contrasts. Looking at individual contrasts, the type I error rate is also close to .05:
```{r}
type_I_errors %>%
filter(type == 'contrast',
counts > 0) %>%
pull(type_I_frequency) %>%
sum()/50000
```
However, **the family-wise** type I error rate is larger. We conducted 10,000 simulated experiments, if asked the question, what was the probability of rejecting the null if any of the linear contrasts were significant, then we find:
```{r}
type_I_errors %>%
filter(type == 'contrast',
counts > 0) %>%
pull(type_I_frequency) %>%
sum()/10000
```
## Concept 3: Correcting for multiple comparisons
Your textbook describes several methods for "correcting" p-values for multiple comparisons to protect against the family-wise type I error rate. Some of these corrections include the Sidak, Bonferroni, Boole, and Dunn. I going to purposefully leave this concept section mostly blank; and, discuss why in class.
As a blunt perspective, I think it is generally useful to report how many tests you conducted, along with the uncorrected p-values for each. In my experience, many different people have different perspectives on what ought to be done. Most of the corrections can be done to the uncorrected p-values, so if a reader has a preference, you have supplied them with the necessary information for them to do the corrections that they think are necessary.
## Practical II: Non-orthogonal Contrasts
<iframe width="560" height="315" src="https://www.youtube.com/embed/8pPDvBEo_bE" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
This practical section will be brief. The purpose will be to show how to accomplish textbook examples from chapter 13 in R.
### 13.2.3 Pretend non-orthogonal contrasts are orthogonal
In this example, sample data from the "romeo and juliet" example are supplied, along with four linear contrasts testing different research hypotheses. The textbook shows examples of computing F statistics for each contrast.
We can do this in R too, **with some minor difficulties**.
```{r}
romeo_juliet <- tibble(subjects = 1:20,
Group = rep(c("Context Before",
"Partial Context",
"Context After",
"Without context"), each = 5),
Comprehension = c(5,9,8,4,9,
5,4,3,5,4,
2,4,5,4,1,
3,3,2,4,3
)
)
romeo_juliet$Group <- factor(romeo_juliet$Group,
levels = c("Context Before",
"Partial Context",
"Context After",
"Without context")
)
# define non-orthogonal contrasts
c1 <- c(1,1,1,-3)
c2 <- c(0,0,1,-1)
c3 <- c(3,-1,-1,-1)
c4 <- c(1,-1,0,0)
new_contrasts <- cbind(c1,c2,c3,c4)
cor(new_contrasts)
contrasts(romeo_juliet$Group) <- new_contrasts
# note, by default, correctly computes the first contrast, but not the rest...
summary.aov(aov(Comprehension~Group, romeo_juliet), split=list(Group=list("c1"=1, "c2" = 2, "c3"= 3, "c4" = 4)))
# you could enter them one at a time:
contrasts(romeo_juliet$Group) <- c1
summary.aov(aov(Comprehension~Group, romeo_juliet), split=list(Group=list("c1"=1)))
contrasts(romeo_juliet$Group) <- c2
summary.aov(aov(Comprehension~Group, romeo_juliet), split=list(Group=list("c2"=1)))
contrasts(romeo_juliet$Group) <- c3
summary.aov(aov(Comprehension~Group, romeo_juliet), split=list(Group=list("c3"=1)))
contrasts(romeo_juliet$Group) <- c4
summary.aov(aov(Comprehension~Group, romeo_juliet), split=list(Group=list("c4"=1)))
```
### 13.3.3 Multiple regression of non-orthogonal contrasts
Next we jump to the last example. We again use the "romeo and juliet" data, and we are given three non-orthogonal contrasts (table 13.5).
In this example, the textbook looks at how the F-values for each contrasts depend on whether you take the "traditional" approach, or consider a role for semi-partial correlation (the modern approach).
```{r}
romeo_juliet <- tibble(subjects = 1:20,
Group = rep(c("Context Before",
"Partial Context",
"Context After",
"Without context"), each = 5),
Comprehension = c(5,9,8,4,9,
5,4,3,5,4,
2,4,5,4,1,
3,3,2,4,3
)
)
romeo_juliet$Group <- factor(romeo_juliet$Group,
levels = c("Context Before",
"Partial Context",
"Context After",
"Without context")
)
# traditional approach
# define contrasts
c1 <- c(3,-1,-1,-1)
c2 <- c(1,1,-1,-1)
c3 <- c(1,-1,1,-1)
# run individual ANOVAs for each contrast
contrasts(romeo_juliet$Group) <- c1
summary.aov(aov(Comprehension~Group, romeo_juliet), split=list(Group=list("contrast"=1)))
contrasts(romeo_juliet$Group) <- c2
summary.aov(aov(Comprehension~Group, romeo_juliet), split=list(Group=list("contrast"=1)))
contrasts(romeo_juliet$Group) <- c3
summary.aov(aov(Comprehension~Group, romeo_juliet), split=list(Group=list("contrast"=1)))
# the modern approach
# add contrasts as factors in a multiple linear regression
romeo_juliet <- romeo_juliet %>%
mutate(c1 = rep(c(3,-1,-1,-1),each=5),
c2 = rep(c(1,1,-1,-1),each=5),
c3 = rep(c(1,-1,1,-1),each=5)
)
# conduct the multiple linear regression
summary(lm(Comprehension ~ c1 + c2 + c3 , romeo_juliet))
# note the t^2 values are the same as the F's from the textbook
# semi-partial correlations
library(ppcor)
spcor(romeo_juliet[,3:6])$estimate^2
```
## Lab 6 Generalization Assignment
<iframe width="560" height="315" src="https://www.youtube.com/embed/UkpAGiO1ic4" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
### Instructions
Your assignment instructions are the following:
1. Work inside the new R project for stats II that you created
2. Create a new R Markdown document called "Lab6.Rmd"
3. Use Lab6.Rmd to show your work attempting to solve the following generalization problems. Commit your work regularly so that it appears on your Github repository.
4. **For each problem, make a note about how much of the problem you believe you can solve independently without help**. For example, if you needed to watch the help video and are unable to solve the problem on your own without copying the answers, then your note would be 0. If you are confident you can complete the problem from scratch completely on your own, your note would be 100. It is OK to have all 0s or 100s anything in between.
5. Submit your github repository link for Lab 6 on blackboard.
### Problems
1. Section 12.3.3 from your textbook refers to: The problem with replications of a meaningless experiment: 'alpha and the captain's age'. The issue here is that if you run an ineffectual experiment enough times you can always find a significant result by chance. The textbook mentions that if you repeat an experiment 20 times, you are guaranteed to find a significant result with .64 probability, and the probability is .92 if you repeat the experiment 50 times.
a. Make use of the `rbinom()` function to show you can reproduce both probabilities. (1 point)
b. If the ineffectual experiment was conducted 20 times, and there were four groups, and the experimenter would accept a significant result from any of the orthogonal linear contrasts, what would be the probability of finding a significant result here? (1 point)
The next two questions draw a connection to a technique we have not yet discussed called p-curve analysis [@simonsohnPcurveKeyFiledrawer2014;@wallisCompoundingProbabilitiesIndependent1942]. P-curve analysis is sometimes used for purposes of meta-analyses to determine whether there is "good" evidence for an effect in the literature.
2. Consider that a researcher publishes a study showing a significant effect, p <. 05; but, in reality the researcher makes a type I error, and the manipulation did not cause any difference. If many other researchers replicated the study, what kind of p-values would they find? Use R to create a sampling distribution of p-values that would be expected in this situation. What shape does this distribution have? (2 points)
3. Now assume that the published result reflects a true effect. Specifically, let's imagine the study had two groups (between-subjects), with 20 subjects in each group. Assume that scores for subjects are all sampled from a normal distribution, and that group A has larger mean than group B by .5 standard deviations (e.g., Cohen's d = .5). If many other researchers replicated the study, what kind of p-values would they find? Use R to create a sampling distribution of p-values that would be expected in this situation. What shape does this distribution have? (2 points)
Bonus Questions
4. Same as #3, except that we now assume the design has four groups (between-subjects). Assume that group A has a mean that is .5 standard deviations larger than groups B, C, and D. Use R to create a sampling distribution of p-values that would be expected for the linear contrast evaluating the research hypothesis that A > B = C = D. (1 point)
5. Consider a one-factor between subjects ANOVA with four groups. Run two simulations of the null-hypothesis, one for the omnibus test, and one for the specific linear contrast mentioned above A > B = C = D. Is the probability of rejecting a type I error (for rejecting the null with alpha < .05) the same for the omnibus test versus a specific contrast? (1 point)
## References