# Description of Swedish mortality data

In this notebook we will look at different ways of summarizing data on daily counts of deaths in Sweden. The data come from [Statistics Sweden](https://www.scb.se/hitta-statistik/corona/corona-i-statistiken/#Statistik). The dataset we will work with has been edited: all deaths without a determined date have been removed. This means that the total number of deaths will be slightly lower than it is in reality.

Start by loading the dataset and printing the first few lines.

In [None]:
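options(warn=-1)  # suppress warnings in the notebook output
library(dplyr)    # library() stops with an error if the package is missing
library(ggplot2)
options(repr.plot.width=14, repr.plot.height=8)  # plot size in the notebook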

data <- readRDS("mortality_data_from_SCB.rds")
head(data)

The dataset contains daily counts of deaths in Sweden from 2015-01-01 to 2021-01-31. Let's look at the average number of deaths per day and the total per year. Note that 2021 covers only January, so its yearly total is not comparable with the other years.

In [None]:
data %>%
  filter(sex=="both" & agegr=="all") %>%
  mutate(year=format(date,"%Y")) %>%
  group_by(year) %>%
  summarize(mean_p_day=mean(count), sum_p_year=sum(count))

We might want to break it down by sex.

In [None]:
data %>%
  filter(sex!="both" & agegr=="all") %>%
  mutate(year=format(date,"%Y")) %>%
  group_by(sex,year) %>%
  summarize(mean_p_day=mean(count), sum_p_year=sum(count))

Such annual summaries are very common and useful, but they hide some aspects of the data. Let's look at the total number of deaths in each calendar month, summed over the complete years 2015 to 2020.

In [None]:
data %>%
  filter(sex=="both" & agegr=="all" & date<"2021-01-01") %>%
  mutate(month=format(date,"%m")) %>%
  group_by(month) %>%
  summarize(death_p_month=sum(count))

In which months do the most people die in these data? How big are the differences?
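One thing to keep in mind when comparing months is that they differ in length. The sketch below uses made-up data, not the SCB data: a hypothetical data frame `toy` with exactly 250 deaths every day from 2015 to 2020. It is written in base R so that it runs without the data file, and it compares monthly totals with the average number of deaths per day in each month.

```r
# Hypothetical toy data: exactly 250 deaths every day, 2015 through 2020
toy <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2020-12-31"), by = "day"))
toy$count <- 250
toy$month <- format(toy$date, "%m")

# Total and mean per day for each calendar month, pooled over all years
by_month <- data.frame(
  month        = sort(unique(toy$month)),
  total        = as.vector(tapply(toy$count, toy$month, sum)),
  mean_per_day = as.vector(tapply(toy$count, toy$month, mean))
)
by_month
```

Even though the daily count never changes in this toy example, the total for February is lower than for January simply because February has fewer days. The mean per day is the same in every month, so it may be the fairer quantity to compare across months.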

Next, let's try to visualize the data. We will make a smoothed plot of the total number of deaths as a function of time.


In [None]:
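## prepare a dataset for plotting
## first we make a suitable filter to smooth the time series:
## 15 Gaussian weights (one per day in the window), normalized to sum to 1
ker <- dnorm(seq(-5,5,length.out=15),0,2)
ker <- ker/sum(ker)
## sort by date before filtering; stats::filter returns a ts object,
## so convert it to a plain numeric vector for ggplot
pdat <- data %>%
  filter(sex=="both" & agegr=="all") %>%
  arrange(date) %>%
  mutate(fcount=as.numeric(stats::filter(count,ker)))
ggplot(data=pdat, aes(x=date,y=fcount)) + geom_line() +
  ylab("deaths per day") + xlab("calendar date") + theme_bw(base_size=18)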
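To see what the smoothing filter does, here is a minimal sketch on made-up data rather than the SCB counts. It assumes a hypothetical series `x` of 30 days with a constant 250 deaths per day; only the kernel `ker` is taken from the cell above.

```r
# Same kernel as in the notebook: 15 Gaussian weights, normalized to sum to 1
ker <- dnorm(seq(-5, 5, length.out = 15), 0, 2)
ker <- ker / sum(ker)

# Hypothetical toy series: 30 days with a constant 250 deaths per day
x <- rep(250, 30)

# Centered weighted moving average (stats::filter defaults to sides = 2)
smoothed <- as.numeric(stats::filter(x, ker))

sum(ker)                       # the weights sum to 1
which(is.na(smoothed))         # days where the 15-day window does not fit
range(smoothed, na.rm = TRUE)  # smoothed values of the constant series
```

Because the weights sum to 1, a constant series is left unchanged by the filter. The first and last 7 days are `NA` since the 15-day window is incomplete there, which is why the smoothed curve starts and ends a week inside the date range of the data.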