inst/rmarkdown/templates/outbreak/skeleton/skeleton.Rmd

---
title: "Outbreak report"
output: word_document
---


# Introduction to this template

This is a template which can be used to create an automated outbreak situation
report.

- It is organised by time, place and person. 
- You can type normal text in white spaces (such as here) and r-code in grey
    spaces (denoted by three backticks and r) (see [Rmarkdown
    introduction](https://rmarkdown.rstudio.com/articles_intro.html) and
    [Markdown basics](https://rmarkdown.rstudio.com/authoring_basics.html))
- Introductions and contents of sections are within square brackets "[...]" and
    can be deleted as appropriate
- Examples of inline code (to automate updating numbers, e.g. line 148), can
    similarly be removed/updated
- Code itself can be deleted, but as a word of caution: make sure you aren't
    deleting bits where variables are created/manipulated, or at least update
    them appropriatley
- For a more detailed exaplanation of this template, see [Wiki](https://github.com/R4EPI/sitrep/wiki)

## Installing and loading required packages 

Several packages are required for different aspects of  analysis with *R*. 
You will need to install these before starting. 
We will be using the following packages. Some of these packages automatically
install other packages they need to work (called dependencies).

These packages can be quite large and may take a while to download in the
field. If you have access to a USB key with these packages, it makes sense to
copy and paste the packages into Program Files/R/R-[version]/library.


```{r setup, include=FALSE}
# hide all code chunks in the output, but show errors
knitr::opts_chunk$set(echo = FALSE, error = TRUE, fig.width = 6*1.25, fig.height = 6)
# set default NA to - in output, define figure width/height
options(knitr.kable.NA = "-")


library(knitr) # for creating output doc
library(dplyr) # for cleaning/shaping data
library(ggplot2) # for plotting diagrams

# epi packages
library(sitrep) # for msf field epi functions
library(incidence) # for epicurves
library(ISOweek) # for creating epiweeks
library(epitools) # for creating 2by2 tables

#set default text size to 16 for plots
ggplot2::theme_set(theme_bw(base_size = 18))

# Set the day that defines the beginning of your epiweek.
# you can start the week on any day of the week
# (the ISO standard is to start on Monday) 
aweek::set_week_start("Monday")
```


```{r read_data}
## Read data ------------------------------------
# CSV file
# linelist_raw <- rio::import("linelist.csv")
#
# Excel file
# to read in a specific sheet use "which"
# linelist_raw <- rio::import("linelist.xlsx", which = "Sheet1")
#
# Stata data file
# linelist_raw <- rio::import("linelist.dat")
#
# For password protected Excel file 
# use the excel.link package 
# library(excel.link)
# linelist_raw <- excel.link::xl.read.file("linelist.xlsx",
#                                          xl.sheet = "Sheet1",
#                                          password = askpass::askpass(prompt = "please enter file
#                                                                      password"))

# load an example dataset (delete if you have your own data)
linelist_raw <- outbreaks::fluH7N9_china_2013

## Fixing variable names ----------------------

# a good first step is to assign standard column names so that subsequent code
# uses stable column names. 
# in case the input data changes, you just need to fix the column mapping


# make a copy of your original dataset and name it linelist_cleaned
linelist_cleaned <- linelist_raw

# define clean variable names using clean_labels from the epitrix package
# this function is preset rules for variable naming 
# for example it changes spaces and dots to "_" and characters to lowercase
cleaned_colnames <- epitrix::clean_labels(colnames(linelist_raw))

# overwrite variable names with defined clean names
colnames(linelist_cleaned) <- cleaned_colnames

# you can also change specific var names using the *rename* function
linelist_cleaned <- rename(linelist_cleaned, sex = gender)
```


```{r add_extra_data}
# THIS CAN BE DELETED IF YOUR DATA HAS THIS ALREADY A DIFFERENT DATASET 
# its just to be able to demonstrate posibilities

# generate artificial lab tests, symptoms and contact vars
lab_results <- linelist_cleaned %>% 
                select(case_id) %>% 
                mutate(test_result = sample(c("Positive", "Negative"),
                                            nrow(linelist_cleaned), 
                                            replace = TRUE), 
                       symptoms = sample(c("Yes", "No"),
                                            nrow(linelist_cleaned), 
                                            replace = TRUE),
                       contact = sample(c("Yes", "No"),
                                            nrow(linelist_cleaned), 
                                            replace = TRUE)
                )

# generate some artificial population data
population_data <- distinct(linelist_cleaned, province)
population_data$population <- as.integer(runif(nrow(population_data), 
                                          min = 10^3, max = 10^5))
```


```{r browse_data, eval = FALSE}
# Browsing data ---------------------------------
# here are a few ways to do data explorations 

# view the first ten rows of data
head(linelist_cleaned, n = 10)

# view your whole dataset interactivley (in an excel style format)
## Remember that `View` needs to be written with a capital *V*
if (interactive()) View(linelist_cleaned)

# overview of variable types and contents
str(linelist_cleaned)

# gives mean, median and max values of variables
summary(linelist_cleaned)

# view unique values contained in variables 
unique(linelist_cleaned$sex)

# another alternative is with the "summarytools package"
# use the dfSummary function in combination with view
# note that view is not capitalised with this package
# install.packages("summarytools")
# view(summarytools::dfSummary(linelist_cleaned))
```


```{r merge_lab_results}
# merging linelist with lab dataset 
linelist_cleaned <- left_join(linelist_cleaned, lab_results, 
                              by = "case_id")
```


```{r standardise_clean_data}
# Next, document anything to clean data. Use dplyr for that.

# make sure all date variables are formatted as dates 
linelist_cleaned <- linelist_cleaned %>%
  mutate_at(vars(matches("date|Date")), as.Date)


# create an age group variable by specifying categorical breaks
linelist_cleaned$age_group <- age_categories(linelist_cleaned$age, 
                                             breakers = c(0, 5, 10, 30, 50, 80))

# alternatively, create an age group variable specify a sequence
# linelist_cleaned$age_group <- age_categories(linelist_cleaned$age,
#                                              lower = 0, 
#                                              upper = 100, 
#                                              by = 10)

# If you already have an age group variable defined, you should manually
# arrange the categories
# linelist_cleaned$age_group <- factor(linelist_cleaned$age_group,
#                                      c("0-4y", "5-9y", "10-29y", "30-49y", "50-79y", "80+y"))

# Change the levels of a categorical variable
linelist_cleaned$sex <- recode_factor(linelist_cleaned$sex, 
                                      f = "Female", 
                                      m = "Male")

# create a case definition variable 
# the tilda (~) is used to assign the new values (Conf, prob, susp, unknown)
linelist_cleaned <- linelist_cleaned %>%
  mutate(case_def = case_when(
    test_result == "Positive"                     ~ "Confirmed",
    test_result == "Negative" & symptoms == "Yes" ~ "Probable",
    test_result == "Negative"                     ~ "Suspected",
    TRUE                                          ~ "Unknown"
  ))

# fix any misspellings in the data
# linelist_cleaned <- linelist_cleaned %>%
#   mutate(province = recode(province,
#     # List all incorrect mis-spellings here
#     "Peking" = "Beijing"
#   ))

# create an epiweek variable
# floor_day shortens to only give you the week number (rather than including day as well) 
# factor includes all weeks between the min and max as levels (useful for zero count weeks)
linelist_cleaned$epiweek <- aweek::date2week(linelist_cleaned$date_of_onset, 
                                             floor_day = TRUE, 
                                             factor = TRUE)

# ... TODO: add some snippets for cleaing data
# TODO: showcase and recommend the linelist package
```


```{r remove_personally_identifiable_information}
# You might want to remove columns and other personal data

# remove a hypothetical variable called "name" 
# this var doesnt actually exist in our dataset
linelist_cleaned$name <- NULL
```


### Person

* [Who is affected: how many in total; male or female; young, adult or old? What are the links between affected people – work place, school, social gathering?  Is there a high rate of illness in contacts?  Is there a high rate of illness in health workers? You may want to include:  a bar chart showing case numbers or incidence by age group and sex; attack rates (AR); and numbers of deaths (in suspected and confirmed cases), mortality rates and/or case fatality ratio (CFR)]  


In total there were `r nrow(linelist_cleaned)` cases. There were
`r linelist_cleaned %>% filter(sex == "Female") %>% count()` females affected and
`r linelist_cleaned %>% filter(sex == "Male") %>% count()` males. 

The most affected age group was `r descriptive(linelist_cleaned, "age_group") %>% slice(which.max(n)) %>% select(age_group)` years. 


#### Age

Cases by sex

```{r describe_by_sex}
# get counts and proportions of cases by sex
descriptive(linelist_cleaned, "sex") %>%
  # change table column names 
  # rename( new variable name = old variable name)
  rename("Sex" = sex, "Cases (n)" = n,"Proportion (%)" =  prop) %>% 
  kable(digits = 2)
```

Cases by age group.


```{r describe_by_age_group} 
# get counts and proportions by age group
descriptive(linelist_cleaned, "age_group") %>%
  # change variable names 
  rename("Age group (years)" = age_group, 
         "Cases (n)" = n,
         "Proportion (%)" = prop) %>% 
  kable(digits = 2)
```


Cases by age group and definition 

```{r describe_by_age_group_and_def}
# get counts and props of age groups by case definition 
# include column and row totals 
descriptive(linelist_cleaned, "age_group", "case_def", coltotals = TRUE, rowtotals = TRUE) %>% 
  rename("Age group (years)" = age_group) %>%
  rename_redundant(prop = "%") %>%
  augment_redundant("_n$" = " cases (n)") %>%
  kable(digits = 2)
```


Cases by age group and sex 

```{r describe_by_age_group_and_sex}
descriptive(linelist_cleaned, "age_group", "sex") %>% 
  rename("Age group (years)" = age_group) %>%
  rename_redundant(prop = "%") %>%
  augment_redundant("_n$" = " cases (n)") %>%
  kable(digits = 2)
```


```{r filter_case_def_describe, eval = FALSE}

# you can also subset a descriptive table

# for example to only have confirmed cases 
filter(linelist_cleaned, case_def == "Confirmed") %>%
  descriptive("age_group", "sex") %>% 
  rename("Age group (years)" = age_group) %>%
  rename_redundant(prop = "%") %>%
  augment_redundant("_n$" = " cases (n)") %>%
  kable(digits = 2)


# alternatively you could show a single age group 
filter(linelist_cleaned, age_group == "10-29") %>%
  descriptive("age_group", "sex") %>% 
  rename("Age group (years)" = age_group) %>%
  rename_redundant(prop = "%") %>%
  augment_redundant("_n$" = " cases (n)") %>%
  kable(digits = 2)
```


Age pyramid

```{r age_pyramid, warning=FALSE}
# plot age pyramid 
plot_age_pyramid(linelist_cleaned, age_group = "age_group", split_by = "sex") + 
  labs(x = "Cases (n)", y = "Age group (years)") +                    # change axis labels
  theme(legend.position = "bottom", legend.title = element_blank())   # remove title and move legend
```


CFR 

The case fatality ratio among those with known outcomes is below

```{r overall_cfr}
# use arguments from above to produce overal CFR
linelist_cleaned %>% 
  filter(!is.na(outcome)) %>%                     # remove rows with missing outcome
  summarise(deaths = sum(outcome == "Death"),    # tally deaths
            population = n()) %>%                # count population
  do(case_fatality_rate(.$deaths, .$population)) %>% # calculate case fatality rate
  rename("Deaths" = deaths,
         "Population" = population,
         "CFR (%)" = cfr,
         "Lower 95% CI" = lower,
         "Upper 95% CI" = upper) %>%
  knitr::kable(digits = 2)                       # print nicely with 2 digits
```

CFR by sex 

```{r cfr_by_sex}
# group by known outcome and sex
linelist_cleaned %>% 
  filter(!is.na(outcome)) %>%                     # remove rows with missing outcome
  group_by(sex) %>%                              # group by sex
  summarise(deaths = sum(outcome == "Death"),    # tally deaths
            population = n()) %>%                # tally population
  do(bind_cols(sex = .$sex, case_fatality_rate(.$deaths, .$population))) %>% # calculate case fatality rate
  arrange(desc(lower)) %>%                       # sort by lower confidence interval
  rename("Sex" = sex, 
         "Deaths" = deaths, 
         "Population" = population, 
         "CFR (%)" = cfr, 
         "Lower 95%CI" = lower,
         "Upper 95%CI" = upper) %>% 
  knitr::kable(digits = 2)
```


CFR by age group

```{r cfr_by_age_group}
# group by known outcome and agegroup 
linelist_cleaned %>% 
  filter(!is.na(outcome)) %>%                     # remove rows with missing outcome
  group_by(age_group) %>%                        # group by age_group
  summarise(deaths = sum(outcome == "Death"),    # tally deaths
            population = n()) %>%                # tally population
  do(bind_cols(age_group = .$age_group, case_fatality_rate(.$deaths, .$population))) %>% # calculate case fatality rate
  arrange(desc(lower)) %>%                       # sort by lower confidence interval
  tidyr::complete(age_group) %>%                 # Ensure all levels are represented
  rename("Age group (years)" = age_group, 
         "Deaths" = deaths, 
         "Population" = population, 
         "CFR (%)" = cfr, 
         "Lower 95%CI" = lower,
         "Upper 95%CI" = upper) %>% 
  knitr::kable(digits = 2)
```

CFR by case definition 

```{r cfr_by_case_def}
# group by known outcome and case definition 
linelist_cleaned %>% 
  filter(!is.na(outcome)) %>%                     # remove rows with missing outcome
  group_by(case_def) %>%                         # group by case_def
  summarise(deaths = sum(outcome == "Death"),    # tally deaths
            population = n()) %>%                # tally population
  do(bind_cols(case_def = .$case_def, case_fatality_rate(.$deaths, .$population))) %>% # calculate case fatality rate
  arrange(desc(lower)) %>%                       # sort by lower confidence interval
  tidyr::complete(case_def) %>%                  # Ensure all levels are represented
  rename("Case definition" = case_def, 
         "Deaths" = deaths, 
         "Population" = population, 
         "CFR (%)" = cfr, 
         "Lower 95%CI" = lower,
         "Upper 95%CI" = upper) %>% 
  knitr::kable(digits = 2)
```


#### Attack rate

The attack rate per 100,000 population is below - based on available population data for the whole country. 

```{r collect_variables}
# define population 
population <- sum(population_data$population)
deaths     <- filter(linelist_cleaned, outcome == "Death") %>% nrow()
# counts and cummulative counts by week
cases <- count(linelist_cleaned, epiweek) %>% 
  mutate(cummulative = cumsum(n))
```

```{r attack_rate}
ar <- attack_rate(nrow(linelist_cleaned), population, multiplier = 100000) 
  rename(ar, 
         "Cases (n)" = cases,
         "Population" = population, 
         "AR (per 100,000)" = ar, 
         "Lower 95%CI" = lower,
         "Upper 95%CI" = upper) %>% 
  knitr::kable(digits = 2)
```

Here, we can see that the Attack Rate for a population of `r format(population, big.mark = ",")` was `r fmt_ci_df(ar)`.

The below gives the attack rate per week. 

```{r attack_rate_per_week}
# attack rate for each week
attack_rate(cases$n, population, multiplier = 100000) %>% 
  # add the epiweek column to table
  bind_cols(select(cases, epiweek), .) %>% 
  rename("Epiweek" = epiweek, 
         "Cases (n)" = cases, 
         "Population" = population, 
         "AR (per 100,000)" = ar, 
         "Lower 95%CI" = lower,
         "Upper 95%CI" = upper) %>% 
  knitr::kable(digits = 2)
```
 
The below gives the cummulative attack rate per week. 

```{r cummulative_attack_rate_per_week}
# cummulative attack rate by week
ar <- attack_rate(cases$cummulative, population, multiplier = 100000) %>% 
  # add the epiweek column to table
  bind_cols(select(cases, epiweek), .) %>% 
  rename("Epiweek" = epiweek, 
         "Cases (n)" = cases, 
         "Population" = population, 
         "AR (per 100,000)" = ar, 
         "Lower 95%CI" = lower,
         "Upper 95%CI" = upper) %>% 
  knitr::kable(digits = 2)
```


#### Mortality

Mortality rate per 10,000:

```{r mortality_rate}
mortality_rate(deaths, population, multiplier = 10^4) %>%
  rename("Deaths" = deaths, 
         "Population" = population, 
         "Mortality (per 10,000)" = `mortality per 10 000`, 
         "Lower 95%CI" = lower,
         "Upper 95%CI" = upper) %>% 
  kable(digits = 2)
```


#### 2x2 tables

```{r two_by_two, message = FALSE, warning = FALSE, results = 'asis'}
outcome <- linelist_cleaned$outcome == "Death"
is_male <- linelist_cleaned$sex == "Male"
is_child <- as.integer(linelist_cleaned$age) <= 12

univariate_analysis(measure = "OR", 
                    digits = 3, 
                    mergeCI = TRUE, 
                    verbose = FALSE,
                    outcome = outcome, 
                    is_male, 
                    is_child) %>% 
  rename("Exposure" = exposure, 
         "Exp. cases (n)" = exp_cases, 
         "Unexp. cases (n)" = unexp_cases, 
         "Exp. controls (n)" = exp_noncases, 
         "Unexp. controls (n)" = unexp_noncases,
         "OR" = estimate,
         "95%CI" = ci
         ) %>% 
  # clean names of exposures 
  mutate(Exposure = recode(Exposure, 
                           is_male = "Sex (male)", 
                           is_child = "Age (<12yrs)")) %>% 
    huxtable::as_huxtable(add_colnames = TRUE) %>%      # convert to a pretty table
    huxtable::set_col_width(c(3, 1, 1, 1, 1, 1, 2)) %>% # set column widths so that exposure and CI are maximized
    huxtable::print_md(max_width = 100)                 # print as a markdown table (so it can be converted)
```


### Time

* [When did the cases fall ill? Are numbers increasing or stable? You may want to include an Epi curve (bar chart showing number of new (suspected and confirmed) cases each day/week) ]


There were `r sum(is.na(linelist_cleaned$date_of_onset))` cases missing dates of onset. 

```{r create_incidence, message = FALSE}
inc_week_7 <- incidence(linelist_cleaned$date_of_onset, interval = 7)
```

The peak of the outbreak was in `r ISOweek(find_peak(inc_week_7))`

```{r epicurve, message = FALSE}
# plot your epicurve
plot(inc_week_7, show_cases = TRUE, border = "black", n_breaks = nrow(inc_week_7)) + 
  scale_y_continuous(expand = c(0,0)) +  # set origin for axes
  theme_classic() + # give classic black/white graph 
  # add labels to axes and below chart
  labs(x = "Calendar week", y = "Cases (n)", 
       captions = "Source: MoH of China data on xx/yy/zzzz") + 
  # change visuals of dates and remove legend title
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1), 
        legend.title = element_blank())  
```

You may also want to stratify by gender. 


```{r incidence_by_gender, message = FALSE}
inc_week_7 <- incidence(linelist_cleaned$date_of_onset, 
                        interval = 7, 
                        groups = linelist_cleaned$sex)


plot(inc_week_7, show_cases = TRUE, border = "black", n_breaks = nrow(inc_week_7)) + 
  labs(x = "Calendar week", y = "Cases (n)") + 
  scale_y_continuous(expand = c(0,0)) +  # set origin for axes
  theme_classic() + # give classic black/white graph 
  # add labels to axes and below chart
  labs(x = "Calendar week", y = "Cases (n)", 
       captions = "Source: MoH of China data on xx/yy/zzzz") + 
  # change visuals of dates, remove legend title and move legend to bottom
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1), 
        legend.title = element_blank(), legend.position = "bottom") 
```


You could similarly stratify by case definition (or any other categorical variable!)


```{r incidence_by_case_def, message = FALSE}
inc_week_7 <- incidence(linelist_cleaned$date_of_onset, 
                        interval = 7, 
                        groups = linelist_cleaned$case_def)


plot(inc_week_7, show_cases = TRUE, border = "black", n_breaks = nrow(inc_week_7)) + 
  labs(x = "Calendar week", y = "Cases (n)") + 
  scale_y_continuous(expand = c(0,0)) +  # set origin for axes
  theme_classic() + # give classic black/white graph 
  # add labels to axes and below chart
  labs(x = "Calendar week", y = "Cases (n)", 
       captions = "Source: MoH of China data on xx/yy/zzzz") + 
  # change visuals of dates, remove legend title and move legend to bottom
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1), 
        legend.title = element_blank(), legend.position = "bottom") 
```


Alternatively, you could stratify by sex among a subset of only confirmed cases. 


```{r incidence_by_sex_confirmed, message = FALSE}
inc_week_7 <- incidence(linelist_cleaned$date_of_onset[linelist_cleaned$case_def == "Confirmed"], 
                        interval = 7, 
                        groups = linelist_cleaned$sex[linelist_cleaned$case_def == "Confirmed"])


plot(inc_week_7, show_cases = TRUE, border = "black", n_breaks = nrow(inc_week_7)) + 
  labs(x = "Calendar week", y = "Cases (n)") + 
  scale_y_continuous(expand = c(0,0)) +  # set origin for axes
  theme_classic() + # give classic black/white graph 
  # add labels to axes and below chart
  labs(x = "Calendar week", y = "Cases (n)", 
       captions = "Source: MoH of China data on xx/yy/zzzz") + 
  # change visuals of dates, remove legend title and move legend to bottom
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1), 
        legend.title = element_blank(), legend.position = "bottom") 
```


### Place

*  [Across what area: one or several villages, all from same school, etc. You may want to include a map of the distribution of cases; attack rates by location]


#### Maps


If you do not have spatial data available, it may be worth calculating attack rates by region. 

```{r attack_rate_by_region}
cases <- count(linelist_cleaned, province) %>%   # cases for each week
  left_join(population_data, by = "province")    # merge population data 

# attack rate for each week
attack_rate(cases$n, cases$population, multiplier = 100000) %>% 
  # add the epiweek column to table
  bind_cols(select(cases, province), .) %>% 
  rename("Province" = province, 
         "Cases (n)" = cases, 
         "Population" = population, 
         "AR (per 100,000)" = ar, 
         "Lower 95%CI" = lower,
         "Upper 95%CI" = upper) %>% 
  kable(digits = 2, format.args = list(big.mark = ",")) # set thousands separator
```


The following is one example of how to use and display spatial data. 

```{r spatial_packages, message = FALSE, warning = FALSE}
# spatial packages 
# library(raster) # for downloading GADM shapefiles
library(ggspatial) # for plotting maps and downloading tiles
library(sf) # for manipulating spatial objects easily
```


```{r download_spatial_maps, message=FALSE}

# reading in a shapefile 
# shapefiles consist of multiple files
# so you dont need to specify the file type (reas_sf recognises it for you)
# map <- read_sf(here("mapfolder", "china"))


# download administrative boundaries 

## view ISO3 codes for countries 
# raster::getData("ISO3") 

## retrieve province boundaries from the Global Administrative 
## level = 1 specifies provinces 
## must be possible to do this as sf directly no? Is available on GADM.org
map <- raster::getData("GADM", country = "CN", level = 1)

## changing GADM to a sf object 
map <- st_as_sf(map)

## check the CRS
# st_crs(map)
## set the CRS if not present using EPSG value
# map <- st_set_crs(map, value = 4326) # Sets to WGS84
## Transform to a different CRS such as UTM, search online for relvant EPSG value
# map_otherprojection <- st_transform(map, value = 32646) 

```


```{r subset_shapefiles, message = FALSE, warning = FALSE}
# subsetting shapefiles 

# Subset map to provinces of interest
suppressWarnings({
mapsub <- map %>% 
  filter(NAME_1 %in% unique(linelist_cleaned$province)) %>%
  sf::st_simplify(preserveTopology = TRUE, dTolerance = 0.05)
})
```


```{r random_points, message = FALSE}

# get random points in provinces occuring 
## CAN BE DELETED ONCE WE USE A BETTER DATASET
## stupid work around because st_sample hasnt implemented exact number points yet

## get points
a <- st_sample(mapsub, nrow(linelist_cleaned), type = "random") %>% 
  st_cast("POINT") %>% 
  st_coordinates() %>% 
  data.frame() %>%
  setNames(c("lon", "lat"))

## fix if too many or too few points
if (nrow(a) < nrow(linelist_cleaned)) {
  b <- matrix(rep.int(c(NA, NA), nrow(linelist_cleaned) - nrow(a)), ncol = 2)
  colnames(b) <- c("lon", "lat")
  a <- rbind(a, b)
}

if (nrow(a) > nrow(linelist_cleaned)) {
  a <- a[1:nrow(linelist_cleaned), ]
}

## merge to linelist
linelist_cleaned <- bind_cols(linelist_cleaned, a)

```


```{r plot_map, message = FALSE}
# downloading basemap tiles 
## This requires running the plot first to download the tiles (basemap)
## Or you need to tell the command where to look for the map tile in the cachedir argument


# plot your basemap
base <- ggplot() + 
  annotation_map_tile(zoom = NULL, progress = "text", cachedir = "maps/tiles") + # osm tiles
  geom_sf(data = mapsub, fill = NA, col = "grey50") + # shapefile as polygon
  annotation_scale() # add a scalebar

```


```{r create_sf_from_linelist, message = FALSE}
# make linelist available for plotting 
## this could probably be done directly in linelist_cleaned
## unsure how other functions would react to an sf + dataframe obj though
## It doesn't allow for missing values in the coordinates.
cases <- linelist_cleaned %>% 
  filter(!is.na(lat) & !is.na(lon)) %>% # Remove missing coordinates
  st_as_sf(coords = c("lon", "lat"), crs = 4326)
```

##### Dot maps

```{r dot_maps, message = FALSE, warning = FALSE}
# dotmap 
ggplot() +
  annotation_map_tile(zoom = NULL, cachedir = "maps/tiles") + # osm tiles
  geom_sf(data = mapsub, fill = NA, col = "grey50") + # shapefile as polygon
  geom_sf(data = cases, aes(colour = case_def, fill = case_def)) + # cases as points
  annotation_scale() +  # add a scalebar
  theme_void() # remove coordinates and axes
```

##### Choropleth maps

```{r choropleth_maps, message = FALSE, warning = FALSE}
# choropleth 

## get counts by provinces 
counts <- count(linelist_cleaned, province) 

## merge population and get AR per 100000
counts <- left_join(counts, population_data, by = "province") %>% 
  mutate(AR = n/population * 100000)


## add counts to map data 
mapsub <- left_join(mapsub, counts, by = c("NAME_1" = "province"))


ggplot() +
  annotation_map_tile(zoom = NULL, cachedir = "maps/tiles") + # osm tiles
  geom_sf(data = mapsub, aes(fill = AR), col = "grey50") + # shapefile as polygon
  annotation_scale() + # add a scalebar
  scale_fill_viridis_c(option = "C") + # color the scale to be perceptually uniform
  theme_void() # remove coordinates and axes
```


#### Mortality rate per district

```{r mortality_rate_per_district}

linelist_cleaned %>%
  filter(!is.na(outcome)) %>%                     # remove missing outcomes
  group_by(province) %>%                         # group the provinces
  summarise(deaths = sum(outcome == "Death"),    # tally deaths
            population = n()) %>%                # tally population
  do(bind_cols(province = .$province, mortality_rate(.$deaths, .$population, multiplier = 10^3))) %>% # calculate mortality rate
  arrange(desc(lower)) %>%                       # sort by lower confidence interval
  tidyr::complete(province) %>%                  # Ensure all levels are represented
  rename("Province" = province, 
    "Number of deaths" = deaths, 
    "Population" = population,
    "Mortality per 1,000" = `mortality per 1 000`, 
    "Lower 95% CI" = lower, 
    "Upper 95% CI" = upper
  ) %>% 
  kable(digits = 3, format.args = list(big.mark = ",")) # set thousands separator
```