-
Notifications
You must be signed in to change notification settings - Fork 14
/
skeleton.Rmd
847 lines (619 loc) · 28.2 KB
/
skeleton.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
---
title: "Outbreak report"
output: word_document
---
# Introduction to this template
This is a template which can be used to create an automated outbreak situation
report.
- It is organised by time, place and person.
- You can type normal text in white spaces (such as here) and r-code in grey
spaces (denoted by three backticks and r) (see [Rmarkdown
introduction](https://rmarkdown.rstudio.com/articles_intro.html) and
[Markdown basics](https://rmarkdown.rstudio.com/authoring_basics.html))
- Introductions and contents of sections are within square brackets "[...]" and
can be deleted as appropriate
- Examples of inline code (to automate updating numbers, e.g. line 148), can
similarly be removed/updated
- Code itself can be deleted, but as a word of caution: make sure you aren't
deleting bits where variables are created/manipulated, or at least update
them appropriatley
- For a more detailed exaplanation of this template, see [Wiki](https://github.com/R4EPI/sitrep/wiki)
## Installing and loading required packages
Several packages are required for different aspects of analysis with *R*.
You will need to install these before starting.
We will be using the following packages. Some of these packages automatically
install other packages they need to work (called dependencies).
These packages can be quite large and may take a while to download in the
field. If you have access to a USB key with these packages, it makes sense to
copy and paste the packages into Program Files/R/R-[version]/library.
```{r setup, include=FALSE}
# hide all code chunks in the output, but show errors
knitr::opts_chunk$set(echo = FALSE, error = TRUE, fig.width = 6*1.25, fig.height = 6)
# set default NA to - in output, define figure width/height
options(knitr.kable.NA = "-")
library(knitr) # for creating output doc
library(dplyr) # for cleaning/shaping data
library(ggplot2) # for plotting diagrams
# epi packages
library(sitrep) # for msf field epi functions
library(incidence) # for epicurves
library(ISOweek) # for creating epiweeks
library(epitools) # for creating 2by2 tables
#set default text size to 16 for plots
ggplot2::theme_set(theme_bw(base_size = 18))
# Set the day that defines the beginning of your epiweek.
# you can start the week on any day of the week
# (the ISO standard is to start on Monday)
aweek::set_week_start("Monday")
```
```{r read_data}
## Read data ------------------------------------
# CSV file
# linelist_raw <- rio::import("linelist.csv")
#
# Excel file
# to read in a specific sheet use "which"
# linelist_raw <- rio::import("linelist.xlsx", which = "Sheet1")
#
# Stata data file
# linelist_raw <- rio::import("linelist.dat")
#
# For password protected Excel file
# use the excel.link package
# library(excel.link)
# linelist_raw <- excel.link::xl.read.file("linelist.xlsx",
# xl.sheet = "Sheet1",
# password = askpass::askpass(prompt = "please enter file
# password"))
# load an example dataset (delete if you have your own data)
linelist_raw <- outbreaks::fluH7N9_china_2013
## Fixing variable names ----------------------
# a good first step is to assign standard column names so that subsequent code
# uses stable column names.
# in case the input data changes, you just need to fix the column mapping
# make a copy of your original dataset and name it linelist_cleaned
linelist_cleaned <- linelist_raw
# define clean variable names using clean_labels from the epitrix package
# this function is preset rules for variable naming
# for example it changes spaces and dots to "_" and characters to lowercase
cleaned_colnames <- epitrix::clean_labels(colnames(linelist_raw))
# overwrite variable names with defined clean names
colnames(linelist_cleaned) <- cleaned_colnames
# you can also change specific var names using the *rename* function
linelist_cleaned <- rename(linelist_cleaned, sex = gender)
```
```{r add_extra_data}
# THIS CAN BE DELETED IF YOUR DATA HAS THIS ALREADY A DIFFERENT DATASET
# its just to be able to demonstrate posibilities
# generate artificial lab tests, symptoms and contact vars
lab_results <- linelist_cleaned %>%
select(case_id) %>%
mutate(test_result = sample(c("Positive", "Negative"),
nrow(linelist_cleaned),
replace = TRUE),
symptoms = sample(c("Yes", "No"),
nrow(linelist_cleaned),
replace = TRUE),
contact = sample(c("Yes", "No"),
nrow(linelist_cleaned),
replace = TRUE)
)
# generate some artificial population data
population_data <- distinct(linelist_cleaned, province)
population_data$population <- as.integer(runif(nrow(population_data),
min = 10^3, max = 10^5))
```
```{r browse_data, eval = FALSE}
# Browsing data ---------------------------------
# here are a few ways to do data explorations
# view the first ten rows of data
head(linelist_cleaned, n = 10)
# view your whole dataset interactivley (in an excel style format)
## Remember that `View` needs to be written with a capital *V*
if (interactive()) View(linelist_cleaned)
# overview of variable types and contents
str(linelist_cleaned)
# gives mean, median and max values of variables
summary(linelist_cleaned)
# view unique values contained in variables
unique(linelist_cleaned$sex)
# another alternative is with the "summarytools package"
# use the dfSummary function in combination with view
# note that view is not capitalised with this package
# install.packages("summarytools")
# view(summarytools::dfSummary(linelist_cleaned))
```
```{r merge_lab_results}
# merging linelist with lab dataset
linelist_cleaned <- left_join(linelist_cleaned, lab_results,
by = "case_id")
```
```{r standardise_clean_data}
# Next, document anything to clean data. Use dplyr for that.
# make sure all date variables are formatted as dates
linelist_cleaned <- linelist_cleaned %>%
mutate_at(vars(matches("date|Date")), as.Date)
# create an age group variable by specifying categorical breaks
linelist_cleaned$age_group <- age_categories(linelist_cleaned$age,
breakers = c(0, 5, 10, 30, 50, 80))
# alternatively, create an age group variable specify a sequence
# linelist_cleaned$age_group <- age_categories(linelist_cleaned$age,
# lower = 0,
# upper = 100,
# by = 10)
# If you already have an age group variable defined, you should manually
# arrange the categories
# linelist_cleaned$age_group <- factor(linelist_cleaned$age_group,
# c("0-4y", "5-9y", "10-29y", "30-49y", "50-79y", "80+y"))
# Change the levels of a categorical variable
linelist_cleaned$sex <- recode_factor(linelist_cleaned$sex,
f = "Female",
m = "Male")
# create a case definition variable
# the tilda (~) is used to assign the new values (Conf, prob, susp, unknown)
linelist_cleaned <- linelist_cleaned %>%
mutate(case_def = case_when(
test_result == "Positive" ~ "Confirmed",
test_result == "Negative" & symptoms == "Yes" ~ "Probable",
test_result == "Negative" ~ "Suspected",
TRUE ~ "Unknown"
))
# fix any misspellings in the data
# linelist_cleaned <- linelist_cleaned %>%
# mutate(province = recode(province,
# # List all incorrect mis-spellings here
# "Peking" = "Beijing"
# ))
# create an epiweek variable
# floor_day shortens to only give you the week number (rather than including day as well)
# factor includes all weeks between the min and max as levels (useful for zero count weeks)
linelist_cleaned$epiweek <- aweek::date2week(linelist_cleaned$date_of_onset,
floor_day = TRUE,
factor = TRUE)
# ... TODO: add some snippets for cleaing data
# TODO: showcase and recommend the linelist package
```
```{r remove_personally_identifiable_information}
# You might want to remove columns and other personal data
# remove a hypothetical variable called "name"
# this var doesnt actually exist in our dataset
linelist_cleaned$name <- NULL
```
### Person
* [Who is affected: how many in total; male or female; young, adult or old? What are the links between affected people – work place, school, social gathering? Is there a high rate of illness in contacts? Is there a high rate of illness in health workers? You may want to include: a bar chart showing case numbers or incidence by age group and sex; attack rates (AR); and numbers of deaths (in suspected and confirmed cases), mortality rates and/or case fatality ratio (CFR)]
In total there were `r nrow(linelist_cleaned)` cases. There were
`r linelist_cleaned %>% filter(sex == "Female") %>% count()` females affected and
`r linelist_cleaned %>% filter(sex == "Male") %>% count()` males.
The most affected age group was `r descriptive(linelist_cleaned, "age_group") %>% slice(which.max(n)) %>% select(age_group)` years.
#### Age
Cases by sex
```{r describe_by_sex}
# get counts and proportions of cases by sex
descriptive(linelist_cleaned, "sex") %>%
# change table column names
# rename( new variable name = old variable name)
rename("Sex" = sex, "Cases (n)" = n,"Proportion (%)" = prop) %>%
kable(digits = 2)
```
Cases by age group.
```{r describe_by_age_group}
# get counts and proportions by age group
descriptive(linelist_cleaned, "age_group") %>%
# change variable names
rename("Age group (years)" = age_group,
"Cases (n)" = n,
"Proportion (%)" = prop) %>%
kable(digits = 2)
```
Cases by age group and definition
```{r describe_by_age_group_and_def}
# get counts and props of age groups by case definition
# include column and row totals
descriptive(linelist_cleaned, "age_group", "case_def", coltotals = TRUE, rowtotals = TRUE) %>%
rename("Age group (years)" = age_group) %>%
rename_redundant(prop = "%") %>%
augment_redundant("_n$" = " cases (n)") %>%
kable(digits = 2)
```
Cases by age group and sex
```{r describe_by_age_group_and_sex}
descriptive(linelist_cleaned, "age_group", "sex") %>%
rename("Age group (years)" = age_group) %>%
rename_redundant(prop = "%") %>%
augment_redundant("_n$" = " cases (n)") %>%
kable(digits = 2)
```
```{r filter_case_def_describe, eval = FALSE}
# you can also subset a descriptive table
# for example to only have confirmed cases
filter(linelist_cleaned, case_def == "Confirmed") %>%
descriptive("age_group", "sex") %>%
rename("Age group (years)" = age_group) %>%
rename_redundant(prop = "%") %>%
augment_redundant("_n$" = " cases (n)") %>%
kable(digits = 2)
# alternatively you could show a single age group
filter(linelist_cleaned, age_group == "10-29") %>%
descriptive("age_group", "sex") %>%
rename("Age group (years)" = age_group) %>%
rename_redundant(prop = "%") %>%
augment_redundant("_n$" = " cases (n)") %>%
kable(digits = 2)
```
Age pyramid
```{r age_pyramid, warning=FALSE}
# plot age pyramid
plot_age_pyramid(linelist_cleaned, age_group = "age_group", split_by = "sex") +
labs(x = "Cases (n)", y = "Age group (years)") + # change axis labels
theme(legend.position = "bottom", legend.title = element_blank()) # remove title and move legend
```
CFR
The case fatality ratio among those with known outcomes is below
```{r overall_cfr}
# use arguments from above to produce overal CFR
linelist_cleaned %>%
filter(!is.na(outcome)) %>% # remove rows with missing outcome
summarise(deaths = sum(outcome == "Death"), # tally deaths
population = n()) %>% # count population
do(case_fatality_rate(.$deaths, .$population)) %>% # calculate case fatality rate
rename("Deaths" = deaths,
"Population" = population,
"CFR (%)" = cfr,
"Lower 95% CI" = lower,
"Upper 95% CI" = upper) %>%
knitr::kable(digits = 2) # print nicely with 2 digits
```
CFR by sex
```{r cfr_by_sex}
# group by known outcome and sex
linelist_cleaned %>%
filter(!is.na(outcome)) %>% # remove rows with missing outcome
group_by(sex) %>% # group by sex
summarise(deaths = sum(outcome == "Death"), # tally deaths
population = n()) %>% # tally population
do(bind_cols(sex = .$sex, case_fatality_rate(.$deaths, .$population))) %>% # calculate case fatality rate
arrange(desc(lower)) %>% # sort by lower confidence interval
rename("Sex" = sex,
"Deaths" = deaths,
"Population" = population,
"CFR (%)" = cfr,
"Lower 95%CI" = lower,
"Upper 95%CI" = upper) %>%
knitr::kable(digits = 2)
```
CFR by age group
```{r cfr_by_age_group}
# group by known outcome and agegroup
linelist_cleaned %>%
filter(!is.na(outcome)) %>% # remove rows with missing outcome
group_by(age_group) %>% # group by age_group
summarise(deaths = sum(outcome == "Death"), # tally deaths
population = n()) %>% # tally population
do(bind_cols(age_group = .$age_group, case_fatality_rate(.$deaths, .$population))) %>% # calculate case fatality rate
arrange(desc(lower)) %>% # sort by lower confidence interval
tidyr::complete(age_group) %>% # Ensure all levels are represented
rename("Age group (years)" = age_group,
"Deaths" = deaths,
"Population" = population,
"CFR (%)" = cfr,
"Lower 95%CI" = lower,
"Upper 95%CI" = upper) %>%
knitr::kable(digits = 2)
```
CFR by case definition
```{r cfr_by_case_def}
# group by known outcome and case definition
linelist_cleaned %>%
filter(!is.na(outcome)) %>% # remove rows with missing outcome
group_by(case_def) %>% # group by case_def
summarise(deaths = sum(outcome == "Death"), # tally deaths
population = n()) %>% # tally population
do(bind_cols(case_def = .$case_def, case_fatality_rate(.$deaths, .$population))) %>% # calculate case fatality rate
arrange(desc(lower)) %>% # sort by lower confidence interval
tidyr::complete(case_def) %>% # Ensure all levels are represented
rename("Case definition" = case_def,
"Deaths" = deaths,
"Population" = population,
"CFR (%)" = cfr,
"Lower 95%CI" = lower,
"Upper 95%CI" = upper) %>%
knitr::kable(digits = 2)
```
#### Attack rate
The attack rate per 100,000 population is below - based on available population data for the whole country.
```{r collect_variables}
# define population
population <- sum(population_data$population)
deaths <- filter(linelist_cleaned, outcome == "Death") %>% nrow()
# counts and cummulative counts by week
cases <- count(linelist_cleaned, epiweek) %>%
mutate(cummulative = cumsum(n))
```
```{r attack_rate}
ar <- attack_rate(nrow(linelist_cleaned), population, multiplier = 100000)
rename(ar,
"Cases (n)" = cases,
"Population" = population,
"AR (per 100,000)" = ar,
"Lower 95%CI" = lower,
"Upper 95%CI" = upper) %>%
knitr::kable(digits = 2)
```
Here, we can see that the Attack Rate for a population of `r format(population, big.mark = ",")` was `r fmt_ci_df(ar)`.
The below gives the attack rate per week.
```{r attack_rate_per_week}
# attack rate for each week
attack_rate(cases$n, population, multiplier = 100000) %>%
# add the epiweek column to table
bind_cols(select(cases, epiweek), .) %>%
rename("Epiweek" = epiweek,
"Cases (n)" = cases,
"Population" = population,
"AR (per 100,000)" = ar,
"Lower 95%CI" = lower,
"Upper 95%CI" = upper) %>%
knitr::kable(digits = 2)
```
The below gives the cummulative attack rate per week.
```{r cummulative_attack_rate_per_week}
# cummulative attack rate by week
ar <- attack_rate(cases$cummulative, population, multiplier = 100000) %>%
# add the epiweek column to table
bind_cols(select(cases, epiweek), .) %>%
rename("Epiweek" = epiweek,
"Cases (n)" = cases,
"Population" = population,
"AR (per 100,000)" = ar,
"Lower 95%CI" = lower,
"Upper 95%CI" = upper) %>%
knitr::kable(digits = 2)
```
#### Mortality
Mortality rate per 10,000:
```{r mortality_rate}
mortality_rate(deaths, population, multiplier = 10^4) %>%
rename("Deaths" = deaths,
"Population" = population,
"Mortality (per 10,000)" = `mortality per 10 000`,
"Lower 95%CI" = lower,
"Upper 95%CI" = upper) %>%
kable(digits = 2)
```
#### 2x2 tables
```{r two_by_two, message = FALSE, warning = FALSE, results = 'asis'}
outcome <- linelist_cleaned$outcome == "Death"
is_male <- linelist_cleaned$sex == "Male"
is_child <- as.integer(linelist_cleaned$age) <= 12
univariate_analysis(measure = "OR",
digits = 3,
mergeCI = TRUE,
verbose = FALSE,
outcome = outcome,
is_male,
is_child) %>%
rename("Exposure" = exposure,
"Exp. cases (n)" = exp_cases,
"Unexp. cases (n)" = unexp_cases,
"Exp. controls (n)" = exp_noncases,
"Unexp. controls (n)" = unexp_noncases,
"OR" = estimate,
"95%CI" = ci
) %>%
# clean names of exposures
mutate(Exposure = recode(Exposure,
is_male = "Sex (male)",
is_child = "Age (<12yrs)")) %>%
huxtable::as_huxtable(add_colnames = TRUE) %>% # convert to a pretty table
huxtable::set_col_width(c(3, 1, 1, 1, 1, 1, 2)) %>% # set column widths so that exposure and CI are maximized
huxtable::print_md(max_width = 100) # print as a markdown table (so it can be converted)
```
### Time
* [When did the cases fall ill? Are numbers increasing or stable? You may want to include an Epi curve (bar chart showing number of new (suspected and confirmed) cases each day/week) ]
There were `r sum(is.na(linelist_cleaned$date_of_onset))` cases missing dates of onset.
```{r create_incidence, message = FALSE}
inc_week_7 <- incidence(linelist_cleaned$date_of_onset, interval = 7)
```
The peak of the outbreak was in `r ISOweek(find_peak(inc_week_7))`
```{r epicurve, message = FALSE}
# plot your epicurve
plot(inc_week_7, show_cases = TRUE, border = "black", n_breaks = nrow(inc_week_7)) +
scale_y_continuous(expand = c(0,0)) + # set origin for axes
theme_classic() + # give classic black/white graph
# add labels to axes and below chart
labs(x = "Calendar week", y = "Cases (n)",
captions = "Source: MoH of China data on xx/yy/zzzz") +
# change visuals of dates and remove legend title
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
legend.title = element_blank())
```
You may also want to stratify by gender.
```{r incidence_by_gender, message = FALSE}
inc_week_7 <- incidence(linelist_cleaned$date_of_onset,
interval = 7,
groups = linelist_cleaned$sex)
plot(inc_week_7, show_cases = TRUE, border = "black", n_breaks = nrow(inc_week_7)) +
labs(x = "Calendar week", y = "Cases (n)") +
scale_y_continuous(expand = c(0,0)) + # set origin for axes
theme_classic() + # give classic black/white graph
# add labels to axes and below chart
labs(x = "Calendar week", y = "Cases (n)",
captions = "Source: MoH of China data on xx/yy/zzzz") +
# change visuals of dates, remove legend title and move legend to bottom
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
legend.title = element_blank(), legend.position = "bottom")
```
You could similarly stratify by case definition (or any other categorical variable!)
```{r incidence_by_case_def, message = FALSE}
inc_week_7 <- incidence(linelist_cleaned$date_of_onset,
interval = 7,
groups = linelist_cleaned$case_def)
plot(inc_week_7, show_cases = TRUE, border = "black", n_breaks = nrow(inc_week_7)) +
labs(x = "Calendar week", y = "Cases (n)") +
scale_y_continuous(expand = c(0,0)) + # set origin for axes
theme_classic() + # give classic black/white graph
# add labels to axes and below chart
labs(x = "Calendar week", y = "Cases (n)",
captions = "Source: MoH of China data on xx/yy/zzzz") +
# change visuals of dates, remove legend title and move legend to bottom
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
legend.title = element_blank(), legend.position = "bottom")
```
Alternatively, you could stratify by sex among a subset of only confirmed cases.
```{r incidence_by_sex_confirmed, message = FALSE}
inc_week_7 <- incidence(linelist_cleaned$date_of_onset[linelist_cleaned$case_def == "Confirmed"],
interval = 7,
groups = linelist_cleaned$sex[linelist_cleaned$case_def == "Confirmed"])
plot(inc_week_7, show_cases = TRUE, border = "black", n_breaks = nrow(inc_week_7)) +
labs(x = "Calendar week", y = "Cases (n)") +
scale_y_continuous(expand = c(0,0)) + # set origin for axes
theme_classic() + # give classic black/white graph
# add labels to axes and below chart
labs(x = "Calendar week", y = "Cases (n)",
captions = "Source: MoH of China data on xx/yy/zzzz") +
# change visuals of dates, remove legend title and move legend to bottom
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
legend.title = element_blank(), legend.position = "bottom")
```
### Place
* [Across what area: one or several villages, all from same school, etc. You may want to include a map of the distribution of cases; attack rates by location]
#### Maps
If you do not have spatial data available, it may be worth calculating attack rates by region.
```{r attack_rate_by_region}
cases <- count(linelist_cleaned, province) %>% # cases for each week
left_join(population_data, by = "province") # merge population data
# attack rate for each week
attack_rate(cases$n, cases$population, multiplier = 100000) %>%
# add the epiweek column to table
bind_cols(select(cases, province), .) %>%
rename("Province" = province,
"Cases (n)" = cases,
"Population" = population,
"AR (per 100,000)" = ar,
"Lower 95%CI" = lower,
"Upper 95%CI" = upper) %>%
kable(digits = 2, format.args = list(big.mark = ",")) # set thousands separator
```
The following is one example of how to use and display spatial data.
```{r spatial_packages, message = FALSE, warning = FALSE}
# spatial packages
# library(raster) # for downloading GADM shapefiles
library(ggspatial) # for plotting maps and downloading tiles
library(sf) # for manipulating spatial objects easily
```
```{r download_spatial_maps, message=FALSE}
# reading in a shapefile
# shapefiles consist of multiple files
# so you dont need to specify the file type (reas_sf recognises it for you)
# map <- read_sf(here("mapfolder", "china"))
# download administrative boundaries
## view ISO3 codes for countries
# raster::getData("ISO3")
## retrieve province boundaries from the Global Administrative
## level = 1 specifies provinces
## must be possible to do this as sf directly no? Is available on GADM.org
map <- raster::getData("GADM", country = "CN", level = 1)
## changing GADM to a sf object
map <- st_as_sf(map)
## check the CRS
# st_crs(map)
## set the CRS if not present using EPSG value
# map <- st_set_crs(map, value = 4326) # Sets to WGS84
## Transform to a different CRS such as UTM, search online for relvant EPSG value
# map_otherprojection <- st_transform(map, value = 32646)
```
```{r subset_shapefiles, message = FALSE, warning = FALSE}
# subsetting shapefiles
# Subset map to provinces of interest
suppressWarnings({
mapsub <- map %>%
filter(NAME_1 %in% unique(linelist_cleaned$province)) %>%
sf::st_simplify(preserveTopology = TRUE, dTolerance = 0.05)
})
```
```{r random_points, message = FALSE}
# get random points in provinces occuring
## CAN BE DELETED ONCE WE USE A BETTER DATASET
## stupid work around because st_sample hasnt implemented exact number points yet
## get points
a <- st_sample(mapsub, nrow(linelist_cleaned), type = "random") %>%
st_cast("POINT") %>%
st_coordinates() %>%
data.frame() %>%
setNames(c("lon", "lat"))
## fix if too many or too few points
if (nrow(a) < nrow(linelist_cleaned)) {
b <- matrix(rep.int(c(NA, NA), nrow(linelist_cleaned) - nrow(a)), ncol = 2)
colnames(b) <- c("lon", "lat")
a <- rbind(a, b)
}
if (nrow(a) > nrow(linelist_cleaned)) {
a <- a[1:nrow(linelist_cleaned), ]
}
## merge to linelist
linelist_cleaned <- bind_cols(linelist_cleaned, a)
```
```{r plot_map, message = FALSE}
# downloading basemap tiles
## This requires running the plot first to download the tiles (basemap)
## Or you need to tell the command where to look for the map tile in the cachedir argument
# plot your basemap
base <- ggplot() +
annotation_map_tile(zoom = NULL, progress = "text", cachedir = "maps/tiles") + # osm tiles
geom_sf(data = mapsub, fill = NA, col = "grey50") + # shapefile as polygon
annotation_scale() # add a scalebar
```
```{r create_sf_from_linelist, message = FALSE}
# make linelist available for plotting
## this could probably be done directly in linelist_cleaned
## unsure how other functions would react to an sf + dataframe obj though
## It doesn't allow for missing values in the coordinates.
cases <- linelist_cleaned %>%
filter(!is.na(lat) & !is.na(lon)) %>% # Remove missing coordinates
st_as_sf(coords = c("lon", "lat"), crs = 4326)
```
##### Dot maps
```{r dot_maps, message = FALSE, warning = FALSE}
# dotmap
ggplot() +
annotation_map_tile(zoom = NULL, cachedir = "maps/tiles") + # osm tiles
geom_sf(data = mapsub, fill = NA, col = "grey50") + # shapefile as polygon
geom_sf(data = cases, aes(colour = case_def, fill = case_def)) + # cases as points
annotation_scale() + # add a scalebar
theme_void() # remove coordinates and axes
```
##### Choropleth maps
```{r choropleth_maps, message = FALSE, warning = FALSE}
# choropleth
## get counts by provinces
counts <- count(linelist_cleaned, province)
## merge population and get AR per 100000
counts <- left_join(counts, population_data, by = "province") %>%
mutate(AR = n/population * 100000)
## add counts to map data
mapsub <- left_join(mapsub, counts, by = c("NAME_1" = "province"))
ggplot() +
annotation_map_tile(zoom = NULL, cachedir = "maps/tiles") + # osm tiles
geom_sf(data = mapsub, aes(fill = AR), col = "grey50") + # shapefile as polygon
annotation_scale() + # add a scalebar
scale_fill_viridis_c(option = "C") + # color the scale to be perceptually uniform
theme_void() # remove coordinates and axes
```
#### Mortality rate per district
```{r mortality_rate_per_district}
linelist_cleaned %>%
filter(!is.na(outcome)) %>% # remove missing outcomes
group_by(province) %>% # group the provinces
summarise(deaths = sum(outcome == "Death"), # tally deaths
population = n()) %>% # tally population
do(bind_cols(province = .$province, mortality_rate(.$deaths, .$population, multiplier = 10^3))) %>% # calculate mortality rate
arrange(desc(lower)) %>% # sort by lower confidence interval
tidyr::complete(province) %>% # Ensure all levels are represented
rename("Province" = province,
"Number of deaths" = deaths,
"Population" = population,
"Mortality per 1,000" = `mortality per 1 000`,
"Lower 95% CI" = lower,
"Upper 95% CI" = upper
) %>%
kable(digits = 3, format.args = list(big.mark = ",")) # set thousands separator
```