-
Notifications
You must be signed in to change notification settings - Fork 7
/
33-Testing-MeanDifference.Rmd
979 lines (743 loc) · 39.3 KB
/
33-Testing-MeanDifference.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
# Tests for the mean difference (paired data) {#TestPairedMeans}
\index{Research question!relational}
<!-- Introductions; easier to separate by format -->
```{r, child = if (knitr::is_html_output()) {'./introductions/33-Testing-MeanDifference-HTML.Rmd'} else {'./introductions/33-Testing-MeanDifference-LaTeX.Rmd'}}
```
<!-- Define colours as appropriate -->
```{r, child = if (knitr::is_html_output()) {'./children/coloursHTML.Rmd'} else {'./children/coloursLaTeX.Rmd'}}
```
## Introduction: soil nitrogen {#PairedTTest-SoilN}
<div style="float:right; width: 222x; border: 1px; padding:10px">
<img src="Illustrations/meal-3236971_1280.jpg" width="200px"/>
</div>
@lambie2021microbial compared the percentage nitrogen (%N) in soils from irrigated and non-irrigated intensively-grazed pastures.
The researchers *paired* similar irrigated and non-irrigated sites (p. 338):
> The irrigated and non-irrigated pairs within each site were within $100$\ m of each other and were on the same soil, landform and usually the same farm with the same farm management on both treatments.
Pairing (Sect.\ \@ref(PairedIntro)) is a form of blocking (Sect.\ \@ref(ManagingConfounding)) and is used to manage confounding.
One RQ in the study was:
> For intensively grazed pastures sites, is there a mean reduction in percentage soil nitrogen (%N) when sites are irrigated, compared to non-irrigated?
The data are shown in
`r if( knitr::is_latex_output() ) {
'Table\\ \\@ref(tab:SoilCN).'
} else {
'the table below.'
}`
The parameter is $\mu_d$, the population mean *reduction* in %N when sites are irrigated, compared to non-irrigated.
::: {.tipBox .tip data-latex="{iconmonstr-info-6-240.png}"}
Explaining *how* the differences are computed is important.
The differences here are the %N in *non-irrigated* sites minus %N in *irrigated* sites.
The differences could be computed as the %N in *irrigated* sites minus %N in *non-irrigated* sites.
*Either is fine*, as long as you remain consistent throughout.
The meaning of any conclusions will be the same.
:::
```{r}
data(SoilCN)
SoilN <- subset(SoilCN,
select = c(IrrigatedN, NonirrigatedN))
SoilN$Change <- SoilN$NonirrigatedN - SoilN$IrrigatedN
Nlen <- length(SoilN$TotalCI)
if( knitr::is_latex_output() ) {
T1 <- knitr::kable(pad(SoilN[1:14, ],
surroundMaths = TRUE,
targetLength = 5,
digits = 2),
format = "latex",
valign = 't',
align = "c",
linesep = "",
col.names = c("irrigated",
"irrigated",
"reduction"),
row.names = FALSE,
escape = FALSE,
booktabs = TRUE) %>%
add_header_above( c("%N:" = 1,
"%N: Not" = 1,
"%N:" = 1),
line = FALSE,
bold = TRUE) %>%
row_spec(0, bold = TRUE)
T2 <- knitr::kable(pad(SoilN[15:28, ],
surroundMaths = TRUE,
targetLength = 5,
digits = 2),
format = "latex",
valign = 't',
align = "c",
linesep = "",
col.names = c("irrigated",
"irrigated",
"reduction"),
row.names = FALSE,
escape = FALSE,
booktabs = TRUE) %>%
add_header_above( c("%N:" = 1,
"%N: Not" = 1,
"%N:" = 1),
line = FALSE,
bold = TRUE) %>%
row_spec(0, bold = TRUE)
out <- knitr::kables(list(T1, T2),
format = "latex",
label = "SoilCN",
caption = "The percentage total soil nitrogen (\\%N) in irrigated and non-irrigated soils in $n = 28$ sites") %>%
kable_styling(font_size = 8)
out2 <- prepareSideBySideTable(out,
gap = "\\enskip")
out2
}
if( knitr::is_html_output() ) {
kable( pad(SoilN,
surroundMaths = TRUE,
targetLength = 5,
digits = 2),
format = "html",
align = "c",
booktabs = TRUE,
longtable = FALSE,
col.names = c("Irrigated", "Not irrigated", "Reduction"),
caption = "The percentage total nitrogen in irrigated and non-irrigated soils in $n = 28$ sites") %>%
row_spec(0, bold = TRUE)
}
```
The data are summarised graphically (Fig.\ \@ref(fig:SoilNGraphs)) and numerically (Table\ \@ref(tab:SoilNSummary)), using information provided by software (Fig.\ \@ref(fig:Nitrogenjamovi)).
```{r SoilNSummary}
SoilSummary <- array( dim = c(3, 4))
rownames(SoilSummary) <- c("Irrigated sites",
"Non-irrigated sites",
"Change")
colnames(SoilSummary) <- c("Mean",
"Std. dev.",
"Std. error",
"Sample size")
SoilSummary[, 1] <- colMeans(SoilN)
SoilSummary[, 2] <- apply(SoilN,
2,
"sd")
SoilSummary[, 3] <- apply(SoilN,
2,
"findStdError")
SoilSummary[, 4] <- apply(SoilN,
2,
"length")
# Do some appropriate rounding
SoilSummary <- round(SoilSummary, 6)
SoilSummary[1:2, 1] <- round(SoilSummary[1:2, 1], 6)
if( knitr::is_latex_output() ) {
knitr::kable(pad(SoilSummary,
surroundMaths = TRUE,
targetLength = c(5, 6, 6, 0),
digits = c(4, 4, 4, 0)),
format = "latex",
align = "c",
linesep = "",
caption = "The percentage total soil N in irrigated and non-irrigated sites",
col.names = c("Mean", "Std. dev.", "Std. error", "Sample size"),
row.names = TRUE,
escape = FALSE,
booktabs = TRUE) %>%
row_spec(0, bold = TRUE) %>%
row_spec(3, italic = TRUE) %>%
kable_styling(font_size = 8)
}
if( knitr::is_html_output() ) {
kable( pad(SoilSummary,
surroundMaths = TRUE,
targetLength = c(5, 6, 6, 0),
digits = c(4, 4, 4, 0)),
format = "html",
align = "c",
booktabs = TRUE,
longtable = FALSE,
col.names = c("Mean", "Std. dev.", "Std. error", "Sample size"),
caption = "The percentage total soil N in irrigated and non-irrigated sites") %>%
row_spec(0, bold = TRUE)
}
```
```{r, SoilNGraphs, fig.align="center", fig.cap="The reduction in \\%N when sites are irrigated, compared to non-irrigated. Left: A histogram of the differences (boundary values in the lower box). Right: a case-profile plot (solid lines, solid dots for lower percentage N in irrigated sites).", fig.show="hold", out.width='100%', fig.width = 9.25, fig.height = 3.5}
par( mfrow = c(1, 2))
hist(SoilN$Change,
right = FALSE,
breaks = seq(-0.15, 0.20, by = 0.05),
main = "Histogram of percentage\nsoil nitrogen reduction",
xlab = "Reduction in %N",
ylab = "No. of sites",
las = 1)
box()
plot( x = c(0.75, 2.25),
y = c(0.15, 0.7),
axes = FALSE,
type = "n",
main = "Case-profile plot of percentage\nsoil nitrogen changes",
xlab = "",
ylab = "Soil %\ N")
axis(side = 1,
at = 1:2,
labels = c("Non-Irrigated",
"Irrigated") )
axis(side = 2,
las = 1)
box()
delta <- 0.02
for (i in 1:length(SoilN$Change)){
lines( x = c(1 + ifelse(SoilN$Change[i] > 0, delta, -delta),
2 + ifelse(SoilN$Change[i] > 0, delta, -delta)),
y = c( SoilN$NonirrigatedN[i],
SoilN$IrrigatedN[i]),
lty = ifelse(SoilN$Change[i] > 0, 1, 2),
type = "b",
pch = ifelse(SoilN$Change[i] > 0, 19, 1))
}
box()
```
```{r Nitrogenjamovi, fig.cap="jamovi output for the nitrogen data", fig.align="center", out.width=c("60%","100%"), fig.show='hold'}
knitr::include_graphics("jamovi/SoilCN/SoilCN-Summary.png")
knitr::include_graphics("jamovi/SoilCN/SoilCN-Testing.png")
```
## Statistical hypotheses and notation
The RQ asks if the mean %N *reduces* in the *population* when sites are irrigated.
If the differences are defined as the %N for *non-irrigated* sites minus *irrigated* sites, then positive values refer to non-irrigated sites having larger %N than irrigated sites, on average.
Then, the RQ is asking if the mean difference is zero, or if it is *greater* than zero.
The *parameter* is the *population mean difference*.
The notation used (recapping Sect.\ \@ref(MeanDiffCI)) is:
* $\mu_d$: the mean *difference* in the *population* (in %N).
* $\bar{d}$: the mean *difference* in the *sample* (in %N).
* $s_d$: the *sample* standard deviation of the *differences* (in %N).
* $n$: the number of *differences*.
The *null* hypothesis is that 'there is *no mean change* in %N, in the population' (Sect.\ \@ref(AboutHypotheses)):
* $H_0$: $\mu_d = 0$.
This hypothesis, which we initially *assume* to be true, postulates that the mean reduction may not be zero in the *sample*, due to sampling variation.
Since the RQ asks specifically if mean %N *decreases*, the alternative hypothesis is *one-tailed* (Sect.\ \@ref(AboutHypotheses)).
According to how the differences have been defined, the alternative hypothesis is:
* $H_1$: $\mu_d > 0$ (i.e., one-tailed).
This hypothesis says that the mean change in the population is *greater than* zero, because of the wording of the RQ, and because of how the differences were defined.
(If the differences were defined in the opposite way---as '%N in *irrigated* sites minus *non-irrigated* sites'--- then the alternative hypothesis would be $\mu_d < 0$, which has the same *meaning*.)
## Tests for the mean difference: $t$-test
\index{Sampling distribution!mean difference}
The *sample* mean %N reduction varies depending on which one of the countless possible samples is randomly obtained, *even if* the mean reduction in the population is zero.
That is, the value of $\bar{d}$ varies across all possible samples even if $\mu_d = 0$.
The value of $\bar{d}$ has a sampling distribution.
::: {.definition #DEFSamplingDistributionMeanDiff name="Sampling distribution of a sample mean difference"}
The *sampling distribution of the sample mean difference* is (when certain conditions are met; Sect.\ \@ref(ValiditySampleMeanDiffTest)) described by
* an approximate normal distribution,
* centred around the sampling mean difference, whose value is $\mu_d$ (from $H_0$),
* with a standard deviation (called the *standard error* of $\bar{d}$) of
\begin{equation}
\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}},
(\#eq:StdErrorDifferencesTest)
\end{equation}
where $n$ is the number of differences, and $s_d$ is the standard deviation of the sample differences.
:::
The value of the *standard error of the differences* here is
$$
\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{0.0618}{\sqrt{28}} = 0.0117.
$$
This sampling distribution of $\bar{d}$ is shown in Fig.\ \@ref(fig:PercentageNSamplingDist).
```{r PercentageNSamplingDist, fig.cap="The sampling distribution is a normal distribution; it describes how the sample mean reduction in percentage N varies in samples of size $n = 28$ when the population mean reduction is $0$.", fig.align="center", fig.width=9.0, fig.height=2.5, out.width='95%'}
mn <- mean(SoilN$Change)
n <- length(SoilN$Change)
stdd <- sd(SoilN$Change)
se <- stdd/sqrt(n)
par( mar = c(4, 0.5, 0.5, 0.5) )
out <- plotNormal(0,
se,
xlab = "Sample mean reduction in %N",
cex.axis = 0.95,
ylim = c(0, 55),
showXlabels = c(
expression( -0.0352),
expression( -0.0234),
expression( -0.0117),
expression( 0 ),
expression( 0.0117),
expression( 0.0234),
expression( 0.0352) ) )
arrows(x0 = 0,
x1 = 0,
y0 = 45,
y1 = max(out$y),
angle = 15,
length = 0.1)
text(x = 0,
y = 45,
pos = 3,
labels = expression(Sampling~mean~difference))
arrows(x0 = 0,
x1 = 0 + se,
y0 = 0.30 * max(out$y),
y1 = 0.30 * max(out$y),
code = 3, # Arrows both ends
angle = 15,
length = 0.1)
text(x = 0 + (se / 2),
y = 0.30 * max(out$y),
labels = expression( atop(Std.~error,
s.e.(bar(italic(d)))==0.0117)) )
arrows(x0 = mn,
x1 = mn,
y0 = 0.7 * max(out$y),
y1 = 0,
angle = 15,
length = 0.1)
text(x = mn,
y = 0.7 * max(out$y),
pos = 3,
labels = expression(bar(italic(d)) == 0.0282) )
points(x = mn,
y = 0,
pch = 19,
cex = 0.6)
mtext(text = expression( italic(z) == 2.41),
side = 1,
at = mn,
cex = 0.8,
line = 0
)
```
The sample mean difference can be located on the sampling distribution (Fig.\ \@ref(fig:PercentageNSamplingDist)) by computing the $t$-score:\index{Hypothesis testing!mean difference}\index{Test statistic!F@$F$-score}
$$
t
= \frac{\bar{d} - \mu_{d}}{\text{s.e.}(\bar{d})}
= \frac{0.0282 - 0}{0.0117} = 2.41,
$$
following the ideas in Eq.\ \@ref(eq:tscore).
Software also displays the $t$-score (Fig.\ \@ref(fig:Nitrogenjamovi)).
A $P$-value determines if the sample data are consistent with the assumption (Table\ \@ref(tab:PvaluesInterpretation)).
Since $t = 2.41$, and since $t$-scores are like $z$-scores, the *one*-tailed $P$-value is smaller $0.025$ (based on the $68$--$95$--$99.7$ rule).\index{68@$68$--$95$--$99.7$ rule}
Software (Fig.\ \@ref(fig:Nitrogenjamovi)) reports that the *two*-tailed $P$-value is $0.02279$.
Hence, the *one*-tailed $P$-value is $0.02279/2 = 0.0114$.
::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
The software clarifies *how* the differences have been computed.
At the left of the output (Fig.\ \@ref(fig:Nitrogenjamovi)), the order implies the differences are found as `NonirrigatedN` minus `IrrigatedN`, the same as our definition.
:::
`r if (knitr::is_latex_output()) '<!--'`
<iframe src="https://learningapps.org/watch?v=pj3pt56fk22" style="border:0px;width:100%;height:500px" allowfullscreen="true" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe>
`r if (knitr::is_latex_output()) '-->'`
## Writing conclusions
The one-tailed $P$-value is $0.0114$, suggesting moderate evidence (Table\ \@ref(tab:PvaluesInterpretation)) to support $H_1$.
A conclusion requires an *answer to the RQ*, a summary of the *evidence* leading to that conclusion, and some *summary statistics*:
> Moderate evidence exists in the sample (paired $t = 2.41$; one-tailed $P = 0.0114$) of a mean reduction in percentage soil N from non-irrigated to irrigated sites (mean reduction: $0.0282$%; $n = 28$; $95$% CI from $0.0042$% to $0.0522$%).
The CI is found using the process described in Chap.\ \@ref(PairedIntro).
The wording implies the direction of the differences.
::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
Saying 'there is evidence of a difference' is insufficient.
You must state *which* measurement is, on average, higher (that is, what the differences *mean*).
:::
## Statistical validity conditions {#ValiditySampleMeanDiffTest}
\index{Sampling distribution!mean difference}
As with any hypothesis test, these results apply under certain conditions.
For a hypothesis test for the mean of paired data, these conditions are the same as for the CI for the mean difference for paired data (Sect.\ \@ref(ValidityPaired)), and similar to those for one sample mean.
Statistical validity can be assessed using these criteria:
* When $n \ge 25$, the test is statistically valid provided the distribution of differences is *not* highly skewed.
* When $n < 25$, the test is statistically valid only if the data come from a *population* of differences with a normal distribution.
The sample size of $25$ is a rough figure; some books give other values (such as $30$).
Data with severe skewness or large outliers may need a larger sample size for the test to be statistically valid.
This condition ensures that the *distribution of the sample mean differences has an approximate normal distribution* (so that, for example, the $68$--$95$--$99.7$ rule can be used).
Provided $n \ge 25$, this will be approximately true *even if* the distribution of the differences in the population does not have a normal distribution.
The units of analysis are also assumed to be independent (e.g., from a simple random sample).
If the statistical validity conditions are not met, other similar options include using a sign test of the differences\index{Sign test} or a Wilcoxon signed-rank test of the differences\index{Wilcoxon signed ranks test} [@conover2003practical], or using resampling methods [@efron2021computer].
For paired *qualitative* data, McNemar's test\index{McNemar's test} may be suitable.
::: {.example #StatisticalValidityWeightLossHT name="Statistical validity"}
For the %N data, the sample size is $n = 28$, so the test is statistically valid.
:::
## Example: invasive plants {#PairedInvasivePlantsHT}
(This study was seen in Sect.\ \@ref(PairedInvasivePlantsCI).)
Skypilot is a alpine wildflower native to the Colorado Rocky Mountains (USA).
In recent years, a willow shrub (*Salix*) has been encroaching on skypilot territory and, because willow often flowers early, @kettenbach2017shrub studied whether the willow may 'negatively affect pollination regimes of resident alpine wildflower species' (p.\ 6\ 965).
Data for both species was collected at $n = 25$ different sites, so the data are *paired* by site (Sect.\ \@ref(PairedIntro)).
The data are shown in
`r if( knitr::is_latex_output() ) {
'Table\\ \\@ref(tab:FloweringData).'
} else {
'Sect.\\ \\@ref(PairedInvasivePlantsCI).'
}`
The parameter is $\mu_d$, the population mean *difference* in day of first flowering for skypilot, less the day of first flowering for willow.
A *positive* value for the difference means that the skypilot values are larger, and hence that willow flowered first.
The RQ is:
> In the Colorado Rocky Mountains, is there a mean difference between first-flowering day for the native skypilot and encroaching willow?
The hypotheses are
$$
\text{$H_0$: $\mu_d = 0$}\quad\text{and}\quad\text{$H_1$: $\mu_d\ne 0$},
$$
where the alternative hypothesis is two-tailed, and $\mu$ is the mean difference between first-flowering day for the native skypilot and encroaching willow.
::: {.tipBox .tip data-latex="{iconmonstr-info-6-240.png}"}
Explaining *how* the differences are computed is important.
The differences here are skypilot minus willow first-flowering days.
However, the differences could be computed as willow minus skypilot first-flowering days.
*Either is fine*, as long as you remain consistent.
The *meaning* of any conclusions will be the same.
:::
The data are summarised graphically in Fig.\ \@ref(fig:FloweringPlots).
The numerical summary (Table\ \@ref(tab:FloweringSummaryHT)) and software output (Fig.\ \@ref(fig:FloweringjamoviHT)) are repeated here.
```{r FloweringjamoviHT, fig.cap="jamovi output for the flowering-day data", fig.align="center", out.width=c("100%"), fig.show='hold'}
knitr::include_graphics("jamovi/Flowering/FloweringAll.png")
```
```{r FloweringSummaryHT}
data(Flowering)
FloweringSummary <- array( dim = c(3, 4))
FloweringTab <- cbind( Flowering[, 1:2],
Change = Flowering[, 2] - Flowering[, 1])
rownames(FloweringSummary) <- c("Willow (encroaching)",
"Skypilot (native)",
"Differences")
colnames(FloweringSummary) <- c("Mean",
"Std. dev.",
"Std. error",
"Sample size")
FloweringSummary[, 1] <- colMeans(FloweringTab,
na.rm = TRUE)
FloweringSummary[, 2] <- apply(FloweringTab,
2,
"sd",
na.rm = TRUE)
FloweringSummary[, 3] <- apply(FloweringTab,
2,
"findStdError",
na.rm = TRUE)
FloweringSummary[, 4] <- apply(FloweringTab,
2,
"realLength")
# Do some appropriate rounding
FloweringSummary <- round(FloweringSummary, 4)
FloweringSummary[1:2, 1] <- round(FloweringSummary[1:2, 1], 3)
if( knitr::is_latex_output() ) {
knitr::kable(pad(FloweringSummary,
surroundMaths = TRUE,
targetLength = c(6, 6, 5, 0),
digits = c(2, 3, 3, 0)),
format = "latex",
align = "c",
linesep = "",
caption = "The day of first flowering for encroaching willow and native skypilot",
col.names = c("Mean", "Std. dev.", "Std. error", "Sample size"),
row.names = TRUE,
escape = FALSE,
booktabs = TRUE) %>%
row_spec(0, bold = TRUE) %>%
row_spec(3, italic = TRUE) %>%
kable_styling(font_size = 8)
}
if( knitr::is_html_output() ) {
kable( pad(FloweringSummary,
surroundMaths = TRUE,
targetLength = c(6, 6, 5, 0),
digits = c(2, 3, 3, 0)),
format = "html",
align = "c",
booktabs = TRUE,
longtable = FALSE,
col.names = c("Mean", "Std. dev.", "Std. error", "Sample size"),
caption = "The day of first flowering for encroaching willow and native skypilot") %>%
row_spec(0, bold = TRUE)
}
```
The standard error of the mean difference is $\text{s.e.}(\bar{d}) = 0.9396$ (Fig.\ \@ref(fig:Floweringjamovi) or Table\ \@ref(tab:FloweringSummary)).
The value of the test statistic (i.e., the $t$-score) is
\begin{align*}
t
= \frac{\bar{d} - \mu_d}{\text{s.e.}(\bar{d})}
= \frac{1.36 - 0}{0.9396} = 1.45,
\end{align*}
as in the output.
This is a relatively small value of $t$, so a large $P$-value is expected using the $68$--$95$--$99.7$ rule.
Indeed, the output shows that $P = 0.161$: there is *no evidence* of a mean difference in first-flowering day (i.e., the sample mean difference could reasonably be explained by sampling variation if $\mu_d = 0$).
We write:
> No evidence exists ($t = 1.45$; two-tailed $P = 0.161$) that the day of first-flowering is different for the encroaching willow and the native skypilot (mean difference: $1.36$ days earlier for willow; $95$% CI from $-0.57$ to $3.30$; $n = 25$).
The test is statistically valid since $n = 25$.
:::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
We *do not* say whether the evidence supports the null hypothesis.
We assume the null hypothesis is true, so we state how strong the evidence is to support the alternative hypothesis.
The current sample presents no evidence to contradict the assumption, but future evidence may emerge.
That is:
> No evidence of a difference *does not* mean evidence of no difference.
:::
::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
Be clear in your conclusion about *how* the differences are computed.
Make sure to interpret the CI consistent with how the differences were defined.
:::
## Example: chamomile tea {#ChamomileTea-Paired-HT}
(This study was seen in Sects.\ \@ref(ChamomileTea-Paired-CI) and\ \@ref(ChamomileTea-TwoMeans-CI).)
@rafraf2015effectiveness studied patients with Type\ 2 diabetes mellitus (T2DM).
They randomly allocated $32$ patients into a control group (who drank hot water), and another $32$ patients to receive chamomile tea.
Summary data are shown in Table\ \@ref(tab:TGsummaryTable).
In Sect.\ \@ref(ChamomileTea-Paired-CI), a CI was formed for the mean reduction in total glucose (TG) for the hot water and the tea groups separately.
However, we can also ask about whether the mean differences in TG are non-zero due to sampling variation or not.
The following descriptive RQs can be asked:
> * For patients with T2DM, is there a mean *change* in TG after eight weeks drinking *chamomile tea*?
> * For patients with T2DM, is there a mean *change* in TG after eight weeks drinking *hot water*?
The hypotheses are (where $\mu$ represent the mean change in TG after eight weeks):
\begin{align*}
\text{Chamomile tea group}: \qquad & \text{$H_0$: } \mu_d = 0\quad \text{vs}\quad\text{$H_1$: } \mu_d \ne 0;\\
\text{Hot water group:}. \qquad & \text{$H_0$: } \mu_d = 0\quad \text{vs}\quad\text{$H_1$: } \mu_d \ne 0.
\end{align*}
For the two groups, the test statistics are (using the standard errors in Sect.\ \@ref(ChamomileTea-Paired-CI)):
$$
t_T = \frac{38.62 - 0}{5.37} = 7.19\qquad\text{and}\qquad t_W = \frac{-7.12 - 0}{6.48} = -1.10,
$$
where the subscripts $T$ and $W$ refer to the tea and hot-water groups respectively.
The $t$-score for the tea-drinking group is *huge*, so the two-tailed $P$-value will be *very small* using the $68$--$95$--$99.7$ rule, and certainly smaller than $0.001$.
This means that there is evidence that chamomile tea had an impact on the mean change in TG.
In contrast, the $t$-score for the water-drinking group is *small*, so the two-tailed $P$-value will be *large* using the $68$--$95$--$99.7$ rule, and certainly larger than $0.10$.
This means there is no evidence that placebo treatment (hot water) had any impact on mean change in TG (as one might expect for a placebo).
We can write:
> There is very strong evidence ($t = 7.19$; two-tailed $P < 0.001$) of a mean change in TG for the chamomile-drinking groups (mean reduction: $38.62$\ mg.dl^$-1$^; $n = 32$; approx. $95$% CI: $27.88$ to $49.36$\ mg.dl^$-1$^), but *no* evidence ($t = -1.10$; two-tailed $P > 0.10$) of a mean change in the hot-water drinking group (mean reduction: $-7.12$\ mg.dl^$-1$^; $n = 32$; approx. $95$% CI: $-20.08$ to $5.84$\ mg.dl^$-1$^).
The sample sizes are larger than $25$, so the results are statistically valid.
The two groups appear to show different mean reductions in TG, depending on which group the subject is in.
This may suggest that chamomile tea reduces TG compared to the control, but perhaps the difference in TG between the hot water and tea groups is simply due to sampling variation.
This is considered in Sect.\ \@ref(ChamomileTea-TwoMeans-HT).
<iframe src="https://learningapps.org/watch?v=pj3pt56fk22" style="border:0px;width:100%;height:500px" allowfullscreen="true" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe>
## Chapter summary {#Chap29-Summary}
To test a hypothesis about a population mean difference $\mu_d$:
* Write the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).
* Initially *assume* the value of $\mu_d$ in the null hypothesis to be true.
* Then, describe the *sampling distribution*, which describes what to *expect* from the sample mean difference based on this assumption: under certain statistical validity conditions, the sample mean difference varies with:
* an approximate normal distribution,
* with sampling mean whose value is the value of $\mu_d$ (from $H_0$), and
* having a standard deviation of $\displaystyle \text{s.e.}(\bar{d}) =\frac{s_d}{\sqrt{n}}$.
* Compute the value of the *test statistic*:
$$
t = \frac{ \bar{d} - \mu}{\text{s.e.}(\bar{d})},
$$
where $\mu_d$ is the hypothesised value given in the null hypothesis.
* The $t$-value is like a $z$-score, and so an approximate *$P$-value* can be estimated using the $68$--$95$--$99.7$ rule, or found using software.
`r if (knitr::is_html_output()){
'The following short video may help explain some of these concepts:'
}`
<div style="text-align:center;">
```{r}
htmltools::tags$video(src = "./videos/PairedTTest.mp4",
width = "550",
controls = "controls",
loop = "loop",
style = "padding:5px; border: 2px solid gray;")
```
</div>
## Quick review questions {#Chap34-QuickReview}
::: {.webex-check .webex-box}
@bacho2019effects compared joint pain in stroke patients before and after a supervised exercise treatment.
The same participants ($n = 34$) were assessed *before* and *after* treatment.
The mean *improvement* in joint pain after $13$ weeks was $1.27$ (with a standard error of $0.57$) measured using a standardised tool.
1. True or false: Only 'before and after' studies can be paired. \tightlist
`r if( knitr::is_html_output() ) { torf( answer=FALSE )}`
1. True or false: The null hypothesis is about the *population* mean difference.
`r if( knitr::is_html_output() ) { torf( answer=TRUE )}`
1. What is the value of the test statistic (to two decimal places)?
`r if( knitr::is_html_output() ) { fitb( answer = 2.23, num = TRUE, tol = 0.01 )}`
1. What is the value of the two-tailed $P$-value?
`r if( knitr::is_html_output() ) {mcq(
c(answer = "Very small",
"Small",
"About 0.05",
"Large",
"Very large")
)}`
1. True or false: The 'test statistic' is a $t$-score for this test.
`r if( knitr::is_html_output() ) { torf( answer=TRUE )}`
:::
## Exercises {#TestPairedMeansExercises}
Answers to odd-numbered exercises are available in App.\ \@ref(Answers).
`r if( knitr::is_latex_output() ) "\\captionsetup{font=small}"`
::: {.exercise #MeanDiffGDiffsA}
A group of primary school children were asked to complete a certain task on both a personal computer (PC) and using a tablet computer.
If the differences were defined as the time to complete the task on the PC, minus the time to complete the same task on a tablet (one difference for each child), what do the difference *mean*?
:::
::: {.exercise #MeanDiffGDiffsB}
Suppose water quality was recorded $500$\ m upstream and $500$\ m downstream of $28$ different copper mines.
If the differences were defined as the pH downstream minus the water pH upstream for each river, what do the differences *mean*?
:::
::: {.exercise #MeanDiffGrowingSquashHT}
[*Dataset*: `Fruit`]
(These data were also seen in Exercise\ \@ref(exr:MeanDiffGrowingSquashCI).)
@mukherjee2019diversity studied the effect of rainfall on growing Chayote squash (*Sechium edule*).
They compared the size of the fruit in a year with normal rainfall (2015) compared to a dry year (2014) on $24$ farms:
> For Chayote squash grown in Bangalore, what is the mean difference in fruit weight between a normal and dry year?
Ten fruits were gathered from each farm in both years, and the average (mean) weight of the fruit recorded for the farm.
Since the same farms are used in both years, the data are *paired*
`r if( knitr::is_latex_output() ) {
'(Table\\ \\@ref(tab:FruitsData)).'
} else {
'(Exercise\\ \\@ref(exr:MeanDiffGrowingSquashCI)).'
}`
Data is missing for Farm\ 20 in the dry year (2014), so there are $n = 23$ differences.
1. Construct a suitable graph for the differences.
1. What is the parameter?
Carefully describe what it means.
1. Write down the hypotheses.
1. Compute the $t$-score.
1. Determine the $P$-value.
1. Write a conclusion.
1. Is the test statistically valid?
:::
::: {.exercise #TestPairedMeansCaptopril}
(This study was also seen in Exercise\ \@ref(exr:PairedCIExercisesCaptopril).)
In a study of hypertension [@data:hand:handbook; @data:macgregor:essential], patients were given a drug (Captopril) and their systolic blood pressure measured (in mm Hg) immediately before and two hours after being given the drug (data with Exercise\ \@ref(exr:PairedCIExercisesCaptopril)).
The aim is to see if there is evidence of a *reduction* in blood pressure after taking Captopril.
Using these data (Table\ \@ref(tab:CICaptoprilData)) and the software output (Fig.\ \@ref(fig:CaptoriljamoviHT)):
1. Explain why it is probably more sensible to compute differences as the *Before* minus the *After* measurements.
What do the differences *mean* when computed this way?
1. What is the parameter?
Carefully describe what it means.
1. Construct a suitable graph for the differences.
1. Write down the hypotheses.
1. Write down the $t$-score.
1. Write down the $P$-value.
1. Write a conclusion.
1. Is the test statistically valid?
:::
```{r CaptoriljamoviHT, fig.cap="jamovi output for the Captopril data", fig.align="center", out.width="80%", fig.show="hold"}
knitr::include_graphics("jamovi/CaptoprilAll/CaptoprilAll-PairedTOutput.png")
```
::: {.exercise #TestPairedMeansTasteOfBrocolli}
(These data were also seen in Exercise\ \@ref(exr:PairedCIExercisesBrocolli).)
People often struggle to eat the recommended intake of vegetables.
@data:Fritts2018:Vegetables explored ways to increase vegetable intake in teens.
Teens rated the taste of raw broccoli, and raw broccoli served with a specially-made dip.
Each teen ($n = 100$) had a *pair* of measurements: the taste rating of the broccoli *with* and *without* dip.
Taste was assessed using a '$100$\ mm visual analog scale', where a *higher* score means a *better* taste.
In summary:
* For raw broccoli, the mean taste rating was $56.0$ (with a standard deviation of $26.6$);
<!-- % (SDs); so if $n = 100$ we'd get SE: 2.647 -->
* For raw broccoli served with dip, the mean taste rating was $61.2$ (with a standard deviation of $28.7$).
Because the data are paired, the *differences* are the best way to describe the data.
The mean difference in the ratings was $5.2$, with standard error of $3.06$.
<!-- (working backwards from the $t$-score). Looks like $n = 101$. -->
Perform a hypothesis test to see if the use of dip increases the mean taste rating.
:::
::: {.exercise #TestPairedMeansSmokingAndExercise}
(This study was also seen in Exercise\ \@ref(exr:PairedCIExercisesSmokeExercise).)
@data:Allen2018:Smoking examined the effect of exercise on smoking.
Men and women were assessed on their 'intention to smoke', both before and after exercise for each subject (using two quantitative questionnaires).
Smokers ('smoking at least five cigarettes per day') aged\ $18$ to\ $40$ were enrolled for the study.
For the $23$ women in the study, the mean intention to smoke after exercise *reduced* by $0.66$ (with a standard error of $0.37$).
Perform a hypothesis test to determine if there is evidence of a population mean reduction in intention-to-smoke for women after exercising.
:::
::: {.exercise #TestPairedMeansFerritin}
In a study [@cressie1984use] conducted at the Adelaide Children's Hospital (p. 107; emphasis added):
> ...a group of beta thalassemia patients [...] were treated by a continuous infusion of desferrioxamine, in order to *reduce* their ferritin content...
Using the data
`r if (knitr::is_latex_output()) {
'in Table\\ \\@ref(tab:FerritinTable),'
} else {
'shown below,'
}`
conduct a hypothesis test to determine if there is evidence that the treatment reduces the ferritin content, as intended.
:::
```{r FerritinTable}
data(Ferritin) ### Exercise
FR <- Ferritin
if( knitr::is_latex_output() ) {
T1 <- kable( pad(FR[1:10, ],
surroundMaths = TRUE,
targetLength = c(4, 4, 5),
digits = 0),
format = "latex",
row.names = FALSE,
escape = FALSE,
align = "c",
col.names = c("Sept.", "March", "Reduction"),
booktabs = TRUE,
linesep = c("", "", "", "\\addlinespace"),
longtable = FALSE) %>%
row_spec(0, bold = TRUE)
T2 <- kable( pad( FR[11:20, ],
surroundMaths = TRUE,
targetLength = c(4, 4, 5),
digits = 0),
format = "latex",
row.names = FALSE,
escape = FALSE,
align = "c",
col.names = c("Sept.", "March", "Reduction"),
booktabs = TRUE,
linesep = c("", "", "", "\\addlinespace"),
longtable = FALSE) %>%
row_spec(0, bold = TRUE)
out <- knitr::kables(list(T1, T2),
format = "latex",
label = "FerritinTable",
caption = "The ferritin content (in $\\mu$g/L) for 20 thalassemia patients at the Adelaide Children's Hospital") %>%
kable_styling(font_size = 8)
out2 <- prepareSideBySideTable(out,
gap = "\\quad")
out2
}
if( knitr::is_html_output()) {
kable( pad(head(FR, 10),
surroundMaths = TRUE,
targetLength = c(4, 4, 5),
digits = 0),
format = "html",
align = "c",
longtable = FALSE,
caption = "The ferritin content (in $\\mu$g/L) for 20 thalassemia patients at the Adelaide Children's Hospital (first ten observations)",
booktabs = TRUE)
}
```
::: {.exercise #StressSurgeryHT}
(This study was also seen in Exercise\ \@ref(exr:StressSurgeryCI).)
The concentration of beta-endorphins in the blood is a sign of stress.
@hoaglin2011exploring measured the beta-endorphin concentration for $19$ patients about to undergo surgery [@data:hand:handbook].
Each patient had their beta-endorphin concentrations measured $12$--$14$ hours before surgery, and also $10$ minutes before surgery.
A numerical summary (Table\ \@ref(tab:StressDescriptivesjamoviHT)) can be produced from jamovi output.
Use the output to test the RQ.
:::
```{r StressDescriptivesjamoviHT, fig.cap="", fig.align="center", out.width='80%'}
data(Stress) ### Exercise
ST <- Stress
StressTab <- data.frame(
"Means" = c( colMeans(ST),
mean(ST$BeforeMins - ST$BeforeHours) ),
"Std deviation" = c( apply(ST, 2, "sd"),
sd(ST$BeforeMins - ST$BeforeHours) ),
"Std Error" = c( apply(ST, 2, function(x){sd(x)/sqrt(length(x))}),
sd(ST$BeforeMins - ST$BeforeHours)/sqrt(19) ),
"Sample size" = c( apply(ST, 2, length),
length(ST$BeforeMins - ST$BeforeHours) )
)
rownames(StressTab) <- c("12--14 hours before surgery",
"10 minutes before surgery",
"Increase")
if( knitr::is_latex_output() ) {
knitr::kable(pad(StressTab,
surroundMaths = TRUE,
targetLength = c(5, 5, 4, 2),
digits = c(2, 2, 2, 0)),
format = "latex",
align = c("c", "c", "c", "c"),
booktabs = TRUE,
longtable = TRUE,
escape = FALSE,
caption = "The surgery-stress data",
col.names = c("Sample mean",
"Std. dev.",
"Std. error",
"$n$"),
row.names = TRUE,
digits = 2) %>%
row_spec(0, bold = TRUE) %>%
row_spec(3, italic = TRUE) %>%
kable_styling(font_size = 8)
}
if( knitr::is_html_output() ) {
kable(StressTab,
format = "html",
booktabs = TRUE,
longtable = FALSE,
align = "r",
caption = "The surgery-stress data") %>%
column_spec(1, bold = TRUE) %>%
row_spec(1, bold = TRUE)
}
```
`r if( knitr::is_latex_output() ) "\\captionsetup{font=normalsize}"`
<!-- QUICK REVIEW ANSWERS -->
`r if (knitr::is_html_output()) '<!--'`
::: {.EOCanswerBox .EOCanswer data-latex="{iconmonstr-check-mark-14-240.png}"}
\textbf{Answers to \textit{Quick Revision} questions:}
**1.** False.
**2.** True.
**3.** $2.23$.
**4.** Very small.
**5.** True.
:::
`r if (knitr::is_html_output()) '-->'`