# Excess Deaths

The link is current WW deaths by country from 2015 . The starting and ending day of the data is different for each country. Moreover, there are two different time unit for the countries, i.e. monthly and weekly. I assume all the countries stay in one time unit only. You may want to verify it. This is also part of the homework. https://github.com/akarlinsky/world_mortality/blob/main/world_mortality.csv

The main goal of the homework is to find the annual excess deaths for each country. For simplicity, the average annual deaths until the end of 2019 is considered as regular deaths. The annual deaths in 2020 and 2021 is the deaths for all causes (including regular and covid deaths). With the annual average deaths in these two time, we should be able to find the excess death in the covid years. 

## Initialize

In [220]:
import random
import pandas as pd
import numpy as np
import math
import json
import matplotlib.pyplot as plt
from pandas import Timestamp
from datetime import datetime
from time import time
from os import getcwd
from os.path import join
start = time()
path = join(getcwd().rstrip('src'), 'data/world_mortality.csv').replace('\\', '/')
print(path)
data = pd.read_csv(path)
end = time()
print('Reading time: ' + str(end-start))

d:/Note_Database/Subject/BD_ML Big Data and Machine Learning/BD_ML_Code/data/world_mortality.csv
Reading time: 0.035051584243774414


In [221]:
data.head()

Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
0,ALB,Albania,2015,1,monthly,2490.0
1,ALB,Albania,2015,2,monthly,2139.0
2,ALB,Albania,2015,3,monthly,2051.0
3,ALB,Albania,2015,4,monthly,1906.0
4,ALB,Albania,2015,5,monthly,1709.0


## Verify time unit

In [222]:
data.dtypes

iso3c            object
country_name     object
year              int64
time              int64
time_unit        object
deaths          float64
dtype: object

In [223]:
data_temp = data.copy()
columns = data_temp.columns
column_dict = []
for x in columns:
    c = data_temp[x].astype('category')
    d = dict(enumerate(c.cat.categories))
    column_dict.append(d)
    data_temp[x] = data_temp[x].astype('category').cat.codes
data_temp.dtypes

iso3c            int8
country_name     int8
year             int8
time             int8
time_unit        int8
deaths          int16
dtype: object

In [224]:
# print('iso3c:\t' + str(json.dumps(column_dict[0], indent=4)))
# print('country_name:\t' + str(json.dumps(column_dict[1], indent=4)))
# print('year:\t' + str(json.dumps(column_dict[2], indent=4)))
# print('time:\t' + str(json.dumps(column_dict[3], indent=4)))
print('time_unit:\t' + str(json.dumps(column_dict[4], indent=4)))
# print('deaths:\t' + str(json.dumps(column_dict[5], indent=4)))



time_unit:	{
    "0": "monthly",
    "1": "weekly"
}


## Sum death yearly

In [225]:
pd.options.mode.chained_assignment = None
annual_death = pd.DataFrame()
year = [0, 0]
death = 0

for index, row in data.iterrows():
    year[1] = row['year']
    if year[0] != year[1]:
        conrow = data.iloc[index-1]
        conrow.deaths = death
        annual_death = pd.concat([annual_death, conrow], axis=1)
        death = 0
    death += row['deaths']
    year[0] = year[1]

annual_death = annual_death.transpose()
annual_death.drop(annual_death.index[0], inplace=True)
annual_death.reset_index(drop=True, inplace=True)


In [226]:
path = join(getcwd().rstrip('src'),
            'data/bd_w10_hw/1_annual_death.csv').replace('\\', '/')
annual_death.to_csv(path)
annual_death.head()


Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
0,ALB,Albania,2015,12,monthly,22418.0
1,ALB,Albania,2016,12,monthly,21388.0
2,ALB,Albania,2017,12,monthly,22232.0
3,ALB,Albania,2018,12,monthly,21804.0
4,ALB,Albania,2019,12,monthly,21937.0


## Separate data before and after the end of 2019

In [227]:
annual_death_bf = pd.DataFrame()
annual_death_af = pd.DataFrame()

for index, row in annual_death.iterrows():
    if row.year > 2019:
        annual_death_af = pd.concat(
            [annual_death_af, row], axis=1)
    else:
        annual_death_bf = pd.concat(
            [annual_death_bf, row], axis=1)

annual_death_bf = annual_death_bf.transpose()
annual_death_bf.drop(annual_death_bf.index[0], inplace=True)
annual_death_bf.reset_index(drop=True, inplace=True)
annual_death_af = annual_death_af.transpose()
annual_death_af.drop(annual_death_af.index[0], inplace=True)
annual_death_af.reset_index(drop=True, inplace=True)


In [228]:
path = join(getcwd().rstrip('src'),
            'data/bd_w10_hw/2_annual_death_bf.csv').replace('\\', '/')
annual_death_bf.to_csv(path)
annual_death_bf.head()


Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
0,ALB,Albania,2016,12,monthly,21388.0
1,ALB,Albania,2017,12,monthly,22232.0
2,ALB,Albania,2018,12,monthly,21804.0
3,ALB,Albania,2019,12,monthly,21937.0
4,DZA,Algeria,2018,12,monthly,177136.4


In [229]:
path = join(getcwd().rstrip('src'),
            'data/bd_w10_hw/3_annual_death_af.csv').replace('\\', '/')
annual_death_af.to_csv(path)
annual_death_af.head()


Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
0,ALB,Albania,2021,12,monthly,30580.0
1,ALB,Albania,2022,6,monthly,12854.0
2,DZA,Algeria,2020,12,monthly,235628.0
3,AND,Andorra,2020,12,monthly,419.0
4,ATG,Antigua and Barbuda,2020,12,monthly,574.0


## Average death yearly

In [230]:
avg_adbf = pd.DataFrame()
location = ['na', 'na']
death = 0
year_count = 0

for index, row in annual_death_bf.iterrows():
    location[1] = row['iso3c']
    if location[1] != location[0]:
        conrow = annual_death_bf.iloc[index-1]
        try:
            conrow.deaths = round(death / year_count)
        except:
            conrow.deaths = 0
        avg_adbf = pd.concat([avg_adbf, conrow], axis=1)
        avg_death = 0
        year_count = 0
    death += row['deaths']
    year_count += 1
    location[0] = row.iso3c

avg_adbf = avg_adbf.transpose()
avg_adbf.drop(avg_adbf.index[0], inplace=True)
avg_adbf.reset_index(drop=True, inplace=True)

In [231]:
path = join(getcwd().rstrip('src'),
            'data/bd_w10_hw/4_avg_adbf.csv').replace('\\', '/')
avg_adbf.to_csv(path)
avg_adbf.head()


Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
0,ALB,Albania,2019,12,monthly,21840
1,DZA,Algeria,2019,12,monthly,222970
2,AND,Andorra,2019,12,monthly,89498
3,ATG,Antigua and Barbuda,2019,12,monthly,90069
4,ARG,Argentina,2019,12,monthly,431457


In [232]:
avg_adaf = pd.DataFrame()
location = ['na', 'na']
death = 0
year_count = 0

for index, row in annual_death_af.iterrows():
    location[1] = row['iso3c']
    if location[1] != location[0]:
        conrow = annual_death_af.iloc[index-1]
        try:
            conrow.deaths = round(death / year_count)
        except:
            conrow.deaths = 0
        avg_adaf = pd.concat([avg_adaf, conrow], axis=1)
        avg_death = 0
        year_count = 0
    death += row['deaths']
    year_count += 1
    location[0] = row.iso3c


avg_adaf = avg_adaf.transpose()
avg_adaf.drop(avg_adaf.index[0], inplace=True)
avg_adaf.reset_index(drop=True, inplace=True)

In [233]:
path = join(getcwd().rstrip('src'),
            'data/bd_w10_hw/5_avg_adaf.csv').replace('\\', '/')
avg_adaf.to_csv(path)
avg_adaf.head()

Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
0,ALB,Albania,2022,6,monthly,21717
1,DZA,Algeria,2020,12,monthly,279062
2,AND,Andorra,2020,12,monthly,279481
3,ATG,Antigua and Barbuda,2021,12,monthly,140352
4,ARG,Argentina,2020,12,monthly,656925


## Calculate excess deaths for each country

In [234]:
exc_adaf = annual_death_af.copy()

for index, row in exc_adaf.iterrows():
    location = row['iso3c']
    try:
        avg_death = avg_adbf.loc[avg_adbf['iso3c'] == location].deaths.values[0]
        row.deaths = row.deaths - avg_death
    except:
        print("Error>> loc/deah/avg_death: " + str(location) + "/" +
              str(row['deaths']) + "/" + str(avg_death))
        row.deaths = -1

Error>> loc/deah/avg_death: UZB/175637.0/19930658
Error>> loc/deah/avg_death: UZB/0/19930658


In [235]:
path = join(getcwd().rstrip('src'),
            'data/bd_w10_hw/6_exc_adaf.csv').replace('\\', '/')
exc_adaf.to_csv(path)
exc_adaf.head()


Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
0,ALB,Albania,2021,12,monthly,8740.0
1,ALB,Albania,2022,6,monthly,-123.0
2,DZA,Algeria,2020,12,monthly,56092.0
3,AND,Andorra,2020,12,monthly,189983.0
4,ATG,Antigua and Barbuda,2020,12,monthly,-89495.0


## Calculate averaged excess deaths for each country

In [236]:
avg_exc_adaf = avg_adaf.copy()

for index, row in avg_exc_adaf.iterrows():
    location = row['iso3c']
    avg_death = avg_adbf.loc[avg_adbf['iso3c'] == location].deaths.values[0]
    # print("deah/avg_death: " + str(row['deaths']) + "/" + str(avg_death))
    row.deaths = row.deaths - avg_deaths


In [237]:
path = join(getcwd().rstrip('src'),
            'data/bd_w10_hw/7_avg_exc_adaf.csv').replace('\\', '/')
avg_exc_adaf.to_csv(path)
avg_exc_adaf.head()


Unnamed: 0,iso3c,country_name,year,time,time_unit,deaths
0,ALB,Albania,2022,6,monthly,-123
1,DZA,Algeria,2020,12,monthly,56092
2,AND,Andorra,2020,12,monthly,189983
3,ATG,Antigua and Barbuda,2021,12,monthly,50283
4,ARG,Argentina,2020,12,monthly,225468


## Result

DataFrame "exc_adaf" or "[6_exc_adaf.csv](https://github.com/belongtothenight/BD_ML_Code/blob/main/data/bd_w10_hw/6_exc_adaf.csv)" stores data of "excess deaths after 2019 for each country".<br>
<br>
$\text{deaths in each row}=(\text{death by year})\quad-\quad\cfrac{\Sigma(\text{death by year before 2020})}{\text{year count before 2020}}$<br>
<br>
DataFrame "avg_exc_adaf" or "[7_avg_exc_adaf.csv](https://github.com/belongtothenight/BD_ML_Code/blob/main/data/bd_w10_hw/7_avg_exc_adaf.csv)" stores data of "averaged excess deaths after 2019 for each country".<br>
<br>
$\text{deaths in each row}=\cfrac{\Sigma(\text{death by year after 2020})}{\text{year count after 2020}}\quad-\quad\cfrac{\Sigma(\text{death by year before 2020})}{\text{year count before 2020}}$<br>
<br>
**Note**: Data of Uzbekistan (UZB) are not processed correctly, so it's absent in the final result.