# Sample sizes for CIs {#EstimatingSampleSize}
```{r, child = if (knitr::is_html_output()) {'introductions/30-Sample-Size-Estimation-HTML.Rmd'} else {'introductions/30-Sample-Size-Estimation-LaTeX.Rmd'}}
```
## Introduction {#SampleSizeIntroduction}
\index{Sample size estimation}
A confidence interval is an interval which gives a range of values of the parameter that could plausibly have produced the observed value of the statistic.\index{Confidence intervals}
All other things being equal, a *larger* sample size gives a *more precise* (Sect.\ \@ref(PrecisionAccuracy)) estimate of the parameter.\index{Precision}
After all, that's why we prefer larger samples: to get more *precise* estimates, and hence narrower CIs.
If that were not the case, we could take the smallest, cheapest and easiest possible sample of size one... which is clearly absurd.
:::{.example #SampleSizeImpact name="Impact of sample size on CIs"}
Suppose we wish to estimate an unknown proportion, and find that $\hat{p} = 0.52$ from a sample of size $n = 25$.
The approximate $95$%\ CI is $0.52 \pm 0.200$ (so the *margin of error* is $0.200$).
If the estimate of $\hat{p} = 0.52$ was found from a sample of size $n = 100$ (rather than $n = 25$), a more precise estimate should be expected.
The approximate $95$%\ CI is $0.52\pm 0.100$; the margin of error is $0.100$.
If the estimate of $\hat{p} = 0.52$ was found from a sample of size $n = 400$, the approximate $95$%\ CI is $0.52\pm 0.050$; the margin of error is $0.050$.
At each step, the sample size was four times as large, but the margin of error was halved.
:::
```{r, child = if (knitr::is_html_output()) './children/SampleSizeCI/SampleSizeCI-HTML.Rmd'}
```
```{r, child = if (knitr::is_latex_output()) './children/SampleSizeCI/SampleSizeCI-LaTeX.Rmd'}
```
That is, improving precision gets more difficult as sample sizes get larger.
Large gains in precision are made by moderately increasing small sample sizes, but only small gains in precision are made by large increases in already-large sample sizes.
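
The pattern in Example\ \@ref(exm:SampleSizeImpact) can be checked with a short R sketch.
The sketch assumes that the approximate $95$%\ margin of error for a proportion is two standard errors, $2\times\sqrt{\hat{p}(1 - \hat{p})/n}$ (an assumption that reproduces the values quoted in the example); the function name `margin_of_error_prop` is made up for this illustration.

```r
# Approximate 95% margin of error for a sample proportion,
# ASSUMING the CI is the estimate give-or-take two standard errors.
margin_of_error_prop <- function(p_hat, n) {
  2 * sqrt(p_hat * (1 - p_hat) / n)
}

# Sample sizes from the example: each is four times the previous one
n <- c(25, 100, 400)
moe <- margin_of_error_prop(p_hat = 0.52, n = n)

# Tabulate the margin of error for each sample size
data.frame(n = n, margin_of_error = round(moe, 3))

# Ratio of successive margins of error: each is half the previous one
moe[-1] / moe[-length(moe)]
```

The margins of error are approximately $0.200$, $0.100$ and $0.050$: each fourfold increase in the sample size only halves the margin of error.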
::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
Remember that the sample size is the number of *units of analysis*.
:::
## General ideas {#SampleSizeIdeas}
If larger samples give more precise estimates, should the largest sample possible always be used?
Not necessarily: using large samples also has disadvantages:
* Studies with larger sample sizes take longer to complete.
* Studies with larger sample sizes are more expensive.
* Ethics committees aim to keep sample sizes as small as possible, so that:\index{Ethics}
- The environment is impacted as little as possible.
- The fewest possible animals are harmed.
- The fewest possible people are harmed or inconvenienced.
- Resources, time and money are not wasted.
::: {.example #Biochar name="The cost of research"}
@farrar2021biochar studied the residual effect of organic biochar compound fertilizers (BCFs) *two years* after application.
This study required planting turmeric in pots using soil previously treated with BCFs.
After the turmeric was grown, the concentration of potassium, phosphorus and nitrogen---as well as many trace minerals---was determined from the soil in *every* pot.
In addition, *every* turmeric plant was analysed for the number of shoots, the leaf mass fraction, and foliar nutrient information.
Clearly, every pot that is used comes with a substantial cost, both in terms of time and money.
:::
Determining the sample size to use is a trade-off between the advantages of increasing precision, and the challenges of cost, time, and remaining ethical (Chap.\ \@ref(Ethics)).
In addition, *how* the sample is obtained is also important: random samples give more *accurate* estimates (Sect.\ \@ref(PrecisionAccuracy)) than non-random samples.
For these reasons, researchers usually identify a margin of error that is meaningful (i.e., of practical importance)\index{Practical importance} in the context of their study.
:::{.example #SampleSizeMError name="Practical importance in sample size calculations"}
In a weight-loss study, estimating the weight reduction to within $1$\ g is far more precise than is necessary: a weight loss of $1$\ g is of no practical importance, but would require a massive sample size to estimate.\index{Practical importance}
In contrast, the sample size needed to detect a weight loss to within $50$\ kg would be far smaller.
However, a weight loss so great is of no practical importance either, as most people who are looking to lose weight are hoping to lose far less than $50$\ kg.
The researchers may decide that a weight loss to within $5$\ kg is sufficient to be of practical importance, and determine the sample size based on this value.
:::
In this chapter, we learn how to compute the (approximate) minimum sample size needed to obtain a given precision (i.e., for a given *margin of error*\index{Margin of error}) for a confidence interval.
We only study the estimation of sample sizes for constructing a CI in these situations:
* Estimating a proportion: Sect.\ \@ref(SampleSizeProportions).
* Estimating a mean: Sect.\ \@ref(SampleSizeOneMean).
* Estimating a mean difference: Sect.\ \@ref(SampleSizeMeanDifferences).
* Estimating a difference between two means: Sect.\ \@ref(SampleSizeDifferenceTwoMeans).
The formulas given in this chapter only apply for *forming $95$%\ CIs*, and are very *conservative*: they will probably give *minimum* sample sizes a bit *too large*, but that is better than being too small.
In any case, sample sizes slightly larger than calculated are often used anyway, to allow for *drop outs*:\index{Drop outs} animals or plants that die; people who can no longer be contacted; and so on.
<iframe src="https://learningapps.org/watch?v=pfduds6kt22" style="border:0px;width:100%;height:500px" allowfullscreen="true" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe>
## Sample size for estimating one proportion {#SampleSizeProportions}
\index{Sample size estimation!one proportion}
In Sect.\ \@ref(Female-Coffee-Drinkers), a CI was formed for the *population* proportion of female college students in the United States that drink coffee daily [@data:Kelpin2018:AlcoholCoffee].
From a sample of $n = 360$, the CI was $0.1694 \pm 0.0395$ (i.e., the *margin of error* is $0.0395$), or from $0.130$ to $0.209$.
To obtain a more precise estimate (i.e., a narrower CI), a larger sample is needed.
For instance, suppose we would like a CI with margin of error of $0.02$.
What size sample is needed?
Since we seek a *more* precise estimate, a *larger* sample is needed... but how much larger?
:::{.definition #SampleSizeProportion name="Sample size: proportion"}
Conservatively, the size of the *simple random sample* needed *for a $95$%\ CI for a proportion* with a specified margin of error is *at least*
\[
\frac{1}{(\text{Margin of error})^2}.
\]
:::
For the coffee-drinking situation above, a sample size of at least $\displaystyle 1\div (0.02^2) = 2\ 500$ female college students in the US is needed.
This is a substantial increase from the original sample size of $360$.
::: {.example #SampleSizep name="Sample size calculations for one proportion"}
To estimate the population proportion of South Africans that smoke, to within $0.07$ with $95$% confidence, a sample size of at least
\[
\frac{1}{(\text{Margin of error})^2} { = \frac{1}{0.07^2}}
\]
is needed; *at least* $n = 204.0816$ people.
In practice, *at least* $205$ people are needed to achieve this desired level of precision (that is, *always round up* in sample size calculations).
:::
::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
Always *round up* the result of the sample size calculation.
:::
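
The two calculations above can be reproduced with a small R sketch.
The function name `sample_size_proportion` is invented for this illustration, and the use of `signif()` before rounding up is a design choice (an assumption of this sketch, not part of the formula) to stop floating-point noise from pushing an exact answer such as $2\ 500$ up to $2\ 501$.

```r
# Conservative minimum sample size for a 95% CI for a proportion:
# 1 / (margin of error)^2, always rounded UP.
# signif() trims floating-point noise before ceiling() is applied.
sample_size_proportion <- function(margin_of_error) {
  ceiling(signif(1 / margin_of_error^2, digits = 10))
}

n_coffee  <- sample_size_proportion(0.02)  # coffee-drinking situation
n_smoking <- sample_size_proportion(0.07)  # smoking example (204.0816 rounds up)

c(coffee = n_coffee, smoking = n_smoking)
```

One way to see why the formula is conservative, assuming the margin of error is two standard errors: $\hat{p}(1 - \hat{p})$ is never larger than $0.25$ (its value when $\hat{p} = 0.5$), so the margin of error is never larger than $2\times\sqrt{0.25/n} = 1/\sqrt{n}$.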
`r if (knitr::is_html_output()){
'The following short video may help explain some of these concepts:'
}`
<iframe width="560" height="315" src="https://www.youtube.com/embed/-fflEggczG4" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture"></iframe>
## Sample size for estimating one mean {#SampleSizeOneMean}
\index{Sample size estimation!one mean}
<!-- ```{r SampleSizeCIWidthMean, fig.align="center", out.width = "65%", fig.cap='The approximate width of a $95$\\%\ CI for a mean, when various size samples are used. No values are given on the vertical axis, as the actual values depend on the value of the standard deviation, $s$', fig.height = 4.5} -->
<!-- ME <- function(n, s = 1){ -->
<!-- 1.96 * 2 * s / sqrt( n ) -->
<!-- } -->
<!-- n <- seq(5, 50, by = 1) -->
<!-- par( mfrow = c(1, 1), -->
<!-- mar = c(4, 5, 5, 3), # LINES on each side of plot -->
<!-- oma = c(1, 1, 1, 1) ) # OUTER margins, between plots and edges of canvas -->
<!-- plot( x = c( min(n), max(n) ), -->
<!-- y = c(-2, 2), -->
<!-- type = "n", -->
<!-- pch = 19, -->
<!-- las = 1, -->
<!-- axes = FALSE, -->
<!-- xlim = c(0, 50), -->
<!-- ylim <- c(-1.9, 1.9), -->
<!-- xlab = "Sample size", -->
<!-- ylab = "Estimates", -->
<!-- main = "The width of CI by sample size\n(for estimating a mean)") -->
<!-- axis(side = 1) -->
<!-- box() -->
<!-- abline(h = 0, -->
<!-- lwd = 1, -->
<!-- col = "grey") -->
<!-- lines(n, -->
<!-- 0 + ME(n), -->
<!-- type = "b", -->
<!-- pch = 19, -->
<!-- cex = 0.75, -->
<!-- col = plot.colour, -->
<!-- lty = 1, -->
<!-- lwd = 2) -->
<!-- lines(n, -->
<!-- 0.0 - ME(n), -->
<!-- type = "b", -->
<!-- pch = 19, -->
<!-- cex = 0.75, -->
<!-- col = plot.colour, -->
<!-- lty = 1, -->
<!-- lwd = 2) -->
<!-- text(10, -->
<!-- 0.0, -->
<!-- srt = 90, -->
<!-- "Approximate width\nof 95%\ CI") -->
<!-- n.example <- 10 -->
<!-- arrows(n.example, -->
<!-- 0.0 + ME(n.example), -->
<!-- n.example, -->
<!-- 0.0 - ME(n.example), -->
<!-- length = 0.1, -->
<!-- angle = 15, -->
<!-- lwd = 2, -->
<!-- code = 3, # Draw arrowhead at both ends -->
<!-- col = "black") -->
<!-- mtext(expression(" "*Mean*","~bar(italic(x))), -->
<!-- side = 4, -->
<!-- las = 1) -->
<!-- ``` -->
:::{.definition #SampleSizeMean name="Sample size: mean"}
Conservatively, the size of the *simple random sample* needed *for a $95$%\ CI for the mean* with a specified margin of error is *at least*
\[
\left( \frac{2 \times s}{\text{Margin of error}}\right)^2,
\]
where $s$ is an estimate of the standard deviation in the population.
:::
The formula requires a value for the sample standard deviation, $s$.
But if we don't have a sample yet... how can we have a value for the standard deviation of the *sample*?
An approximate value for $s$ is used, which can come from:
* the value of $s$ from the results of a pilot study (Sect.\ \@ref(Protocols)).\index{Pilot study}
* the results of a similar study, where the value $s$ there can be used (see Example\ \@ref(exm:SampleSizePeanuts)).
<div style="float:right; width: 222px; border: 1px; padding:10px">
<img src="Illustrations/tom-hermans-ZPfd3ZobOc0-unsplash.jpg" width="200px"/>
</div>
::: {.example #SampleSizePeanuts name="Sample size estimation for one mean"}
Sect.\ \@ref(Cadmium-In-Peanuts) discusses a study about the mean cadmium concentrations in peanuts in the United States, where $s = 0.0460$\ ppm [@data:Blair2017:Peanuts].
Suppose we wanted to estimate the mean cadmium concentration in *Australian* peanuts, to give-or-take $0.005$\ ppm with $95$% confidence.
We could use this value for $s$ as a starting point, and then compute:
\[
\left( \frac{2 \times 0.0460}{0.005}\right)^2 = 338.56;
\]
we would need at least $339$ peanuts.
:::
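
As a rough check of the peanut calculation, the formula can be written as a small R function.
The function name `sample_size_mean` is hypothetical, and the check of the resulting margin of error assumes the approximate $95$%\ CI is the estimate give-or-take two standard errors.

```r
# Conservative minimum sample size for a 95% CI for a mean:
# (2 * s / margin of error)^2, always rounded UP.
# signif() trims floating-point noise before ceiling() is applied.
sample_size_mean <- function(s, margin_of_error) {
  ceiling(signif((2 * s / margin_of_error)^2, digits = 10))
}

# Values from the peanut example: s = 0.0460 ppm, give-or-take 0.005 ppm
n_peanuts <- sample_size_mean(s = 0.0460, margin_of_error = 0.005)
n_peanuts

# Approximate margin of error (two standard errors) with that sample size,
# ASSUMING the new sample has the same standard deviation
moe_peanuts <- 2 * 0.0460 / sqrt(n_peanuts)
moe_peanuts
```

With $339$ peanuts, the approximate margin of error is just under the target of $0.005$\ ppm, provided the standard deviation in the new sample is close to the value borrowed from the earlier study.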
## Sample size for estimating a mean difference {#SampleSizeMeanDifferences}
\index{Sample size estimation!mean difference}
The ideas in the previous section also work for computing sample sizes for estimating *mean differences*, since the differences can be treated like a single sample.
:::{.definition #SampleSizeMeanDiff name="Sample size: mean difference"}
Conservatively, the size of the *simple random sample* needed *for a $95$%\ CI for the mean difference* with a specified margin of error is *at least*
\[
\left( \frac{2 \times s_d}{\text{Margin of error}}\right)^2,
\]
where $s_d$ is an estimate of the standard deviation of the differences in the population.
:::
Again, an approximate value for $s_d$ can come from a pilot study (Sect.\ \@ref(Protocols)),\index{Pilot study} or from the results of a similar study.
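```{r}
data(Diabetes)
# Paired differences: first SBP reading minus second SBP reading
Diabetes$Diff <- Diabetes$SBPfirst - Diabetes$SBPsecond
# Number of non-missing differences
n.D.diff <- sum( !is.na(Diabetes$Diff) )
se.D.diff <- sd(Diabetes$Diff, na.rm = TRUE) / sqrt(n.D.diff)
# Approximate 95% CI; na.rm = TRUE so missing values do not make the mean NA
ci.lo <- mean(Diabetes$Diff, na.rm = TRUE) - 2 * se.D.diff
ci.hi <- mean(Diabetes$Diff, na.rm = TRUE) + 2 * se.D.diff
```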
::: {.example #SampleSizeWeightGain name="Sample size estimation for mean differences"}
In Sect.\ \@ref(MeanDiffCI), a CI is computed for the mean weight gain by $n = 68$ Cornell University students from Week\ 1 to Week\ 12 [@levitsky2004freshman; @DASL:WeightChange].
The CI is $0.862\pm 0.232$\ kg, where the margin of error is $0.232$\ kg.
Suppose we wanted to estimate the mean weight change at a different university; we could use the value of $s$ from this study (i.e., $s = 0.956$).
Also, suppose we wanted a more precise estimate, to give-or-take $0.15$\ kg.
For a *more precise* estimate, we would need a *larger sample*.
So we compute:
\[
  \left( \frac{2 \times 0.956}{0.15}\right)^2 = 162.4775;
\]
we would need at least $163$ students after rounding up.
:::
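
The reason for rounding *up* can be seen by computing the approximate margin of error either side of the calculated sample size.
The following R sketch uses the standard deviation of the differences quoted in the example ($0.956$\ kg), and assumes the approximate $95$%\ margin of error is two standard errors; the helper `moe_at` is a name invented here.

```r
s_d    <- 0.956  # standard deviation of the differences (kg), from the earlier study
target <- 0.15   # desired margin of error (kg)

# Conservative minimum sample size, before and after rounding up
n_raw    <- (2 * s_d / target)^2
n_needed <- ceiling(n_raw)

# Approximate margin of error (two standard errors) for a sample of size n,
# ASSUMING the new sample has the same standard deviation of differences
moe_at <- function(n) 2 * s_d / sqrt(n)

c(n_raw         = n_raw,
  n_needed      = n_needed,
  moe_one_fewer = moe_at(n_needed - 1),  # slightly ABOVE the target
  moe_at_n      = moe_at(n_needed))      # slightly BELOW the target
```

Rounding down to $162$ students would give a margin of error slightly larger than the $0.15$\ kg requested, which is why sample sizes are always rounded up.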
## Sample size for estimating a difference between two means {#SampleSizeDifferenceTwoMeans}
\index{Sample size estimation!difference between means}
A formula for computing sample sizes for estimating the *difference between two means* is simple if we make some assumptions:
* the sample size in each group is the same; and
* the standard deviation in each group is the same.
Formulas are available for computing sample sizes without these restrictions, but are more complicated than that given here.
:::{.definition #SampleSizeDiffBetweenTwomeans name="Sample size: difference between two means"}
Conservatively, the size of the *simple random sample* needed *for a $95$%\ CI for the difference between two means* with a specified margin of error is *at least*
\[
2\times \left( \frac{2 \times s}{\text{Margin of error}}\right)^2
\]
for *each* sample, where $s$ is an estimate of the common standard deviation in the population for both groups.
:::
::: {.example #SampleSizeSpeeds name="Sample size estimation for difference between means"}
In Sect.\ \@ref(SpeedSignageCI), a CI is computed for the difference between the mean speeds of cars before and after signage was added [@ma2019impacts].
Suppose we wanted to estimate the difference between the mean speeds to within $5$\ km.h^-1^.
In Sect.\ \@ref(SpeedSignageCI), the two groups (before and after signage added) produced standard deviations of $13.194$ and $13.134$ (which are very similar).
We decide to use $s = 13.15$ in the sample-size calculation:
\[
2 \times \left( \frac{2 \times 13.15}{5}\right)^2 = 55.335.
\]
We would need to measure the speed of at least $56$ cars before and after the addition of signage.
:::
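
The speed calculation can be sketched in R as follows.
The function names are invented for this illustration; the check assumes equal sample sizes and a common standard deviation in both groups, so that the standard error of the difference is $\sqrt{s^2/n + s^2/n}$, and that the approximate $95$%\ margin of error is two standard errors.

```r
# Conservative minimum sample size PER GROUP for a 95% CI for the
# difference between two means: 2 * (2 * s / margin of error)^2, rounded UP.
# signif() trims floating-point noise before ceiling() is applied.
sample_size_two_means <- function(s, margin_of_error) {
  ceiling(signif(2 * (2 * s / margin_of_error)^2, digits = 10))
}

# Approximate margin of error for the difference between two means,
# ASSUMING n observations per group and a common standard deviation s
moe_two_means <- function(s, n) {
  2 * sqrt(s^2 / n + s^2 / n)
}

# Values from the speed example: s = 13.15, give-or-take 5 km/h
n_per_group <- sample_size_two_means(s = 13.15, margin_of_error = 5)
n_per_group

moe_two_means(s = 13.15, n = n_per_group)      # just under 5
moe_two_means(s = 13.15, n = n_per_group - 1)  # just over 5
```

Under these assumptions, the extra factor of $2$ in the formula comes from the two sources of variation (one from each group) in the standard error of the difference.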
## Other issues related to sample size {#SampleSizeOtherIssues}
The above calculations form just one part of the information needed to make the final decision about the necessary sample size.
For example, the *cost* (time and money) of taking a sample of this size has not been considered.
The calculations in this chapter assume a *simple random sample* will be used, which is often unreasonable.
Other, more complex, formulas are available for computing sample sizes for other random-sampling schemes (such as stratified samples).
However, the above calculations give an *estimate* of the *minimum* sample size required.
In addition, the calculations in this chapter are only for producing $95$% confidence intervals.
In practice, researchers often start with a slightly larger sample than calculated to allow for drop-outs\index{Drop outs} (e.g., plants die, or people withdraw from the study).
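
One common way to allow for drop-outs is to divide the calculated minimum sample size by the proportion of units expected to remain in the study.
This adjustment is a convention assumed for this sketch rather than a formula from this chapter, and the drop-out rate of $10$% below is a made-up value for illustration only; the function name `inflate_for_dropouts` is also hypothetical.

```r
# Inflate a calculated minimum sample size to allow for drop-outs.
# ASSUMPTION: dropout_rate is the researcher's guess at the proportion
# of units that will be lost before the study ends.
inflate_for_dropouts <- function(n_min, dropout_rate) {
  stopifnot(dropout_rate >= 0, dropout_rate < 1)
  # signif() trims floating-point noise before rounding up
  ceiling(signif(n_min / (1 - dropout_rate), digits = 10))
}

# Illustration: a calculated minimum of 163 units,
# with a HYPOTHETICAL 10% drop-out rate
n_recruit <- inflate_for_dropouts(n_min = 163, dropout_rate = 0.10)
n_recruit

# Expected number remaining after drop-outs: at least the minimum required
n_recruit * (1 - 0.10)
```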
## Example: emergency residential aged care
@dwyer2021residential studied residential aged care residents in Australia needing emergency care and recorded, among other information, the average age of such residents ($\bar{x} = 85$; $s = 7.3$) and the proportion of calls related to falls ($\hat{p} = 0.156$).
Suppose a similar study was to be conducted in New Zealand.
The aim was to estimate the mean age of residents to within $2$ years, and the proportion of incidents related to falls to within $0.10$.
The sample size required to meet the age requirement is at least
\[
n = \left(\frac{2\times s}{\text{Margin of error}}\right)^2 = \left(\frac{2\times 7.3}{2}\right)^2 = 53.29,
\]
or at least $54$ residents (rounding up).
The sample size required to meet the falls requirement is at least
\[
  n = \frac{1}{(\text{Margin of error})^2} = \frac{1}{0.1^2} = 100.
\]
Since the same subjects are needed for both estimates, at least $100$ residents are needed.
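
When one sample must meet several precision requirements, the largest of the calculated sample sizes is used.
A minimal R sketch of the aged-care calculation follows; the variable names are invented for this illustration, and `signif()` is used only to trim floating-point noise before rounding up.

```r
s_age     <- 7.3   # standard deviation of age (years), from the Australian study
moe_age   <- 2     # desired margin of error for the mean age (years)
moe_falls <- 0.10  # desired margin of error for the proportion of falls

# Conservative minimum sample sizes for each requirement, rounded UP
n_age   <- ceiling(signif((2 * s_age / moe_age)^2, digits = 10))  # from 53.29
n_falls <- ceiling(signif(1 / moe_falls^2, digits = 10))

# The same residents supply both estimates, so the larger value is needed
n_required <- max(n_age, n_falls)

c(age = n_age, falls = n_falls, required = n_required)
```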
## Chapter summary
Estimating a sample size is a compromise between increasing the precision of the estimate, and the need to remain ethical and reduce costs.
All other things being equal, making a sample size four times as large makes the confidence interval half as wide.
This means that large gains in precision are made by increasing small sample sizes, but only small gains are made by increasing already-large sample sizes.
## Quick review questions {#Chap30-QuickReview}
::: {.webex-check .webex-box}
1. True or false: A *larger* sample size produces a *more accurate* estimate of the parameter, all other things being equal. \tightlist
`r if( knitr::is_html_output() ) {torf(answer = FALSE )}`
1. True or false: A *larger* sample size produces a *more random* sample.
`r if( knitr::is_html_output() ) {torf(answer = FALSE )}`
1. True or false: We should always take the *largest* possible sample size.
`r if( knitr::is_html_output() ) {torf(answer = FALSE )}`
:::
`r if (!knitr::is_html_output()) '<!--'`
`r webexercises::hide()`
1. **FALSE**. All other things being equal, larger samples give more *precise* estimates of the unknown population parameter, but not necessarily more *accurate* estimates: accuracy depends on *how* the sample is obtained.
1. **FALSE**. The *size* of the sample, and *how* the sample was obtained, are two different issues.
1. **FALSE**. We also need to consider the cost (in terms of money and time) and ethical issues.
`r webexercises::unhide()`
`r if (!knitr::is_html_output()) '-->'`
## Exercises {#EstimatingSampleSizeExercises}
Answers to odd-numbered exercises are available in App.\ \@ref(Answers).
::: {.exercise #SampleSizeMean1}
Suppose we need to estimate a population *mean* (with $95$% confidence), using $s = 1$.
1. What size sample is needed to estimate the population mean within\ $0.4$?
1. What size sample is needed to estimate the population mean within\ $0.2$ (that is, the confidence interval will be *half* as wide as in the first calculation)?
1. What size sample is needed to estimate the population mean within\ $0.1$ (that is, the confidence interval will be *a quarter* as wide as in the first calculation)?
1. To get an estimate *half* as wide, how many *times* more units of analysis are needed?
1. To get an estimate *a quarter* as wide, how many *times* more units of analysis are needed?
:::
::: {.exercise #SampleSizeTwoMean1}
Suppose we need to estimate a difference between two population *means* (with $95$% confidence), using $s = 8$.
1. What size samples are needed to estimate the difference between the population means to within\ $4$?
1. What size samples are needed to estimate the difference between the population means to within\ $2$ (that is, the confidence interval will be *half* as wide as in the first calculation)?
1. What size samples are needed to estimate the difference between the population means to within\ $1$ (that is, the confidence interval will be *a quarter* as wide as in the first calculation)?
1. To get an estimate *half* as wide, how many *times* more units of analysis are needed?
1. To get an estimate *a quarter* as wide, how many *times* more units of analysis are needed?
:::
::: {.exercise #SampleSizePropEating}
@data:Mann12017:UniStudents studied the eating habits of university students in Canada (Exercise\ \@ref(exr:CIOneProportionSnacking)).
They estimated the proportion of Canadian students that ate a sufficient number of servings of grains each day.
Suppose we wished to repeat the study but for *New Zealand* university students; that is, we seek an estimate of the population proportion of New Zealand students that eat a sufficient number of servings of grains each day (with $95$% confidence).
1. What size sample is needed to estimate the proportion to give-or-take $0.01$?
2. What size sample is needed to estimate the proportion to give-or-take $0.02$?
3. What size sample is needed to estimate the proportion to give-or-take $0.10$?
4. Do you think this study would be costly, in terms of time and money?
:::
::: {.exercise #SampleSizeOneProportionAustSmokers}
We wish to estimate the population proportion of Australians that smoke.
1. Suppose we wish our $95$%\ CI to be give-or-take $0.05$.
How many Australians would need to be surveyed?
1. Suppose we wish our $95$%\ CI to be give-or-take $0.025$; that is, we wish to *halve* the width of the interval above.
How many Australians would need to be surveyed?
1. How many *times* as many Australians are needed to *halve* the width of the interval?
:::
::: {.exercise #SampleSizeMeanLungCapacity}
@data:Tager:FEV measured the lung capacity of 11-year-old girls in East Boston, using the *forced expiratory volume* (FEV) of the children (Exercise\ \@ref(exr:CIOneMeanLungCapacityInChildren)).
Suppose we wished to repeat the study, and find a $95$% confidence interval for the mean FEV for 11-year-old *Australian* girls.
Since Australian and American children might be somewhat similar, we could use (as a first approximation) the standard deviation from that study: $s = 0.43$ litres.
1. What size sample is needed to estimate the mean to give-or-take $0.02$ litres?
2. What size sample is needed to estimate the mean to give-or-take $0.05$ litres?
3. What size sample is needed to estimate the mean to give-or-take $0.10$ litres?
4. Suppose we wished to find a $99$% (not $95$%) confidence interval for the mean FEV for 11-year-old *Australian* girls, to give-or-take $0.10$ litres.
Would this sample size be *larger* or *smaller* than the sample size found for a $95$% confidence interval (also with give-or-take $0.10$ litres)?
5. Do you think this study would be costly, in terms of time and money?
:::
::: {.exercise #SampleSizeOneMeanBloodLossSampleSize}
@data:Williams2007:BloodLoss asked paramedics ($n = 199$) to estimate the amount of blood loss on four different surfaces.
When the actual amount of blood spill on concrete was $1000$\ ml, the mean guess was $846.4$\ ml (with a standard deviation of $651.1$\ ml).
For a different study:
1. how many paramedics are needed to estimate the mean with a precision of give-or-take $50$\ ml?
1. how many paramedics are needed to estimate the mean with a precision of give-or-take $25$\ ml?
1. how many times greater does the sample size need to be to *halve* the margin of error?
:::
::: {.exercise #SampleSizeInvasivePlants}
Skypilot is an alpine wildflower native to the Colorado Rocky Mountains (USA).
In recent years, a willow shrub has been encroaching on skypilot territory and, because willow often flowers early, @kettenbach2017shrub studied whether the willow may 'negatively affect pollination regimes of resident alpine wildflower species' (p.\ 6\,965).
Data for both species were collected at $25$ different sites, so the data are *paired* by site.
The 'first-flowering day' is the number of days since the start of the year (e.g., January\ $12$ is 'day\ $12$') when flowers were first observed.
Suppose a similar paired study was to be conducted on skypilot growing in Sierra Nevada, California.
Using the software output in Fig.\ \@ref(fig:Floweringjamovi):
1. determine the sample size needed to estimate the mean difference in first-flowering day to within two days.
1. determine the sample size needed to estimate the mean difference in first-flowering day to within three days.
:::
::: {.exercise #SampleSizeCaptopril}
@data:macgregor:essential studied treating hypertension with Captopril.
Patients had their systolic blood pressure measured (in mm Hg) immediately *before* and two hours *after* being given the drug.
A pilot study showed that the difference between the two measurements had a standard deviation of about $9$\ mm Hg.
1. Determine the sample size needed to estimate the mean reduction in *systolic* blood pressure to within $2$\ mm Hg.
1. Determine the sample size needed to estimate the mean reduction in *diastolic* blood pressure to within $1.5$\ mm Hg.
:::
::: {.exercise #SampleSizeWhales}
@agbayani2020growth studied gray whales (*Eschrichtius robustus*) and measured (among other things) the length of whales at birth.
Summary information is shown in Table\ \@ref(tab:WhaleInfo).
Suppose another research study wanted to study sperm whales, which are of approximately similar size.
1. Determine the sample size needed to estimate the difference between the mean lengths for female and male sperm whales at birth, to within $0.15$\ m.
1. Determine the sample size needed to estimate the difference between the mean lengths for female and male sperm whales at birth, to within $0.10$\ m.
1. Determine the sample size needed to estimate the difference between the mean lengths for female and male *goldfish* at birth, to within $1$\ mm.
:::
::: {.exercise #SampleSizePneumonia}
Suppose researchers are trialling a new drug to reduce the recovery time (compared to standard treatments) after contracting double pneumonia.
They conduct a pilot study, and find the standard deviation of the duration of the symptoms, in both groups, is about $s = 1.25$ days.
1. What size sample is needed to estimate the difference between the mean recovery times between the two treatments to within $1$\ day?
1. What size sample is needed to estimate the difference between the mean recovery times between the two treatments to within $0.5$\ days?
:::
<!-- QUICK REVIEW ANSWERS -->
`r if (knitr::is_html_output()) '<!--'`
::: {.EOCanswerBox .EOCanswer data-latex="{iconmonstr-check-mark-14-240.png}"}
\textbf{Answers to \textit{Quick Revision} questions:}
**1.** False.
**2.** False.
**3.** False.
:::
`r if (knitr::is_html_output()) '-->'`