# Excess Deaths

The link is current WW deaths by country from 2015 . The starting and ending day of the data is different for each country. Moreover, there are two different time unit for the countries, i.e. monthly and weekly. I assume all the countries stay in one time unit only. You may want to verify it. This is also part of the homework. https://github.com/akarlinsky/world_mortality/blob/main/world_mortality.csv

The main goal of the homework is to find the annual excess deaths for each country. For simplicity, the average annual deaths until the end of 2019 is considered as regular deaths. The annual deaths in 2020 and 2021 is the deaths for all causes (including regular and covid deaths). With the annual average deaths in these two time, we should be able to find the excess death in the covid years. 

## Initialize

In [2]:
import random
import pandas as pd
import numpy as np
import math
import json
import matplotlib.pyplot as plt
from pandas import Timestamp
from datetime import datetime
from time import time
from os import getcwd
from os.path import join
start = time()
path = join(getcwd().rstrip('src'), 'data/world_mortality.csv').replace('\\', '/')
print(path)
data = pd.read_csv(path)
end = time()
print('Reading time: ' + str(end-start))

d:/Note_Database/Subject/BD_ML Big Data and Machine Learning/BD_ML_Code/data/world_mortality.csv
Reading time: 0.12062597274780273


In [3]:
data.head()

Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
0,ALB,Albania,2015,1,monthly,2490.0
1,ALB,Albania,2015,2,monthly,2139.0
2,ALB,Albania,2015,3,monthly,2051.0
3,ALB,Albania,2015,4,monthly,1906.0
4,ALB,Albania,2015,5,monthly,1709.0


## Verify time unit

In [4]:
data.dtypes

iso3c            object
country_name     object
year              int64
time              int64
time_unit        object
deaths          float64
dtype: object

In [5]:
data_temp = data.copy()
columns = data_temp.columns
column_dict = []
for x in columns:
    c = data_temp[x].astype('category')
    d = dict(enumerate(c.cat.categories))
    column_dict.append(d)
    data_temp[x] = data_temp[x].astype('category').cat.codes
data_temp.dtypes

iso3c            int8
country_name     int8
year             int8
time             int8
time_unit        int8
deaths          int16
dtype: object

In [6]:
# print('iso3c:\t' + str(json.dumps(column_dict[0], indent=4)))
# print('country_name:\t' + str(json.dumps(column_dict[1], indent=4)))
# print('year:\t' + str(json.dumps(column_dict[2], indent=4)))
# print('time:\t' + str(json.dumps(column_dict[3], indent=4)))
print('time_unit:\t' + str(json.dumps(column_dict[4], indent=4)))
# print('deaths:\t' + str(json.dumps(column_dict[5], indent=4)))



time_unit:	{
    "0": "monthly",
    "1": "weekly"
}


## Calculate annual excess deaths for each country

In [8]:
pd.options.mode.chained_assignment = None
annual_death = pd.DataFrame()
year = [0, 0]
death = 0

for index, row in data.iterrows():
    year[1] = row['year']
    if year[0] != year[1]:
        conrow = data.iloc[index-1]
        conrow.deaths = death
        annual_death = pd.concat([annual_death, conrow], axis=1)
        death = 0
    death += row['deaths']
    year[0] = year[1]

annual_death = annual_death.transpose()
annual_death.drop(annual_death.index[0], inplace=True)
annual_death.head()

Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
11,ALB,Albania,2015,12,monthly,22418.0
23,ALB,Albania,2016,12,monthly,21388.0
35,ALB,Albania,2017,12,monthly,22232.0
47,ALB,Albania,2018,12,monthly,21804.0
59,ALB,Albania,2019,12,monthly,21937.0


## Separate averaged annual excess deaths before and after the end of 2019 for each country

In [30]:
annual_death.reset_index(drop=True, inplace=True)
annual_death_bf = pd.DataFrame()
annual_death_af = pd.DataFrame()

for index, row in annual_death.iterrows():
    conrow = annual_death.iloc[index-1]
    if row.year > 2019:
        annual_death_af = pd.concat(
            [annual_death_af, conrow], axis=1)
    else:
        annual_death_bf = pd.concat(
            [annual_death_bf, conrow], axis=1)

In [31]:
annual_death_bf = annual_death_bf.transpose()
annual_death_bf.drop(annual_death_bf.index[0], inplace=True)
annual_death_bf.head(10)


Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
0,ALB,Albania,2015,12,monthly,22418.0
1,ALB,Albania,2016,12,monthly,21388.0
2,ALB,Albania,2017,12,monthly,22232.0
3,ALB,Albania,2018,12,monthly,21804.0
7,ALB,Albania,2022,6,monthly,26490.875
8,DZA,Algeria,2018,12,monthly,177136.4
10,DZA,Algeria,2020,12,monthly,158762.5375
11,AND,Andorra,2015,12,monthly,282.0
12,AND,Andorra,2016,12,monthly,310.0
13,AND,Andorra,2017,12,monthly,323.0


In [27]:
annual_death_af = annual_death_af.transpose()
annual_death_af.drop(annual_death_af.index[0], inplace=True)
annual_death_af.head()

Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
6,ALB,Albania,2021,12,monthly,30580.0
22,ATG,Antigua and Barbuda,2020,12,monthly,22418.0
35,ARM,Armenia,2020,12,monthly,35371.0
36,ARM,Armenia,2021,12,monthly,34638.0
43,ABW,Aruba,2020,12,monthly,743.0
