# The Ebola Epidemic 2014-2016: a networking point of view
In 2014, a rise in cases of Ebola hemorrhagic fever was noticed in Guinea. It was caused by the Zaire Ebola virus, which is transmitted from wildlife to humans and has the highest case fatality rate of all Ebola virus strains. Human-to-human transmission led to rapid spread, and the World Health Organization (WHO) declared an official outbreak at 49 confirmed cases and 29 deaths. But this was just the beginning.
In 2.5 years, the epidemic resulted in 28 642 cases and 11 319 deaths. It spread to 10 different countries in West Africa, Europe and the USA and was labelled a Public Health Emergency. It was not until June 2016 that Guinea, the source of the infection, was declared Ebola-free.
This notebook describes the methods used to arrive at the final poster: finding the data and preparing it until it was ready to be visualised in Cytoscape.

The data were extracted from https://data.humdata.org/dataset/ebola-cases-2014, which is based on data from the WHO.

In [None]:
import pandas as pd
import numpy as np
import datetime
%matplotlib inline
from matplotlib import pyplot as plt
import seaborn as sns
df = pd.read_csv('EBOLA01.csv')

The data were imported and inspected. Each record consists of an indicator showing whether the value refers to cases or deaths, the corresponding country, the date, and the number of cases or deaths.

In [None]:
df.loc[4961]
df.tail()

Column names were assigned: the single raw column was first named `default` and then split on `;` into four named columns.

In [None]:
df.columns = ['default']
df.tail()

In [None]:
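# Split the raw text column on ';' into four fields (n must be passed by keyword in recent pandas)
df = pd.DataFrame(df['default'].str.split(';', n=3).tolist(), columns=['Ind', 'Country', 'Date', 'Value'])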

In [None]:
df.tail()

In [None]:
df['Date'] =  pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Value'] = df['Value'].astype(int)
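
As an alternative to splitting a single text column by hand, a semicolon-separated file can usually be parsed in one step by passing the separator to `read_csv`. The sketch below runs on a small in-memory sample rather than on `EBOLA01.csv`; the sample rows contain made-up numbers, and the absence of a header row is an assumption.

```python
import io
import pandas as pd

# Hypothetical sample in the 'indicator;country;date;value' layout (made-up numbers).
sample = io.StringIO(
    "Cases;Guinea;29/08/2014;600\n"
    "Deaths;Guinea;29/08/2014;400\n"
    "Cases;Liberia;29/08/2014;1300\n"
)

# sep=';' splits the fields while reading; header=None assumes there is no header row.
parsed = pd.read_csv(
    sample,
    sep=";",
    header=None,
    names=["Ind", "Country", "Date", "Value"],
)
parsed["Date"] = pd.to_datetime(parsed["Date"], format="%d/%m/%Y")
print(parsed.dtypes)
```

With this approach the `Value` column is read as an integer directly, so no separate `astype(int)` step is needed for clean input.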

Split the dataset into separate dataframes for deaths and cases.

In [None]:
# Empty placeholder dataframes; both are overwritten in the next cell
deaths = pd.DataFrame()
cases = pd.DataFrame()

In [None]:
deaths = df.loc[df.Ind == 'Deaths']
cases = df.loc[df.Ind == 'Cases']
cases.columns = ['Ind', 'Country', 'Date', 'Number of cases']
deaths.columns = ['Ind', 'Country', 'Date', 'Number of deaths']
cases = cases.reset_index()
deaths = deaths.reset_index()

In [None]:
print(cases.tail())
print(deaths.tail())

In [None]:
cases = cases.drop('index', axis=1)
cases = cases.drop('Ind', axis=1)
deaths = deaths.drop('index', axis=1)
deaths = deaths.drop('Ind', axis=1)

Now the case fatality rate can be calculated by merging the dataframes until there is a column of cases and a column of deaths for each country on each corresponding date. 

In [None]:
cfr = cases.merge(deaths, how = 'inner', on = ['Country', 'Date'])
cfr.tail()

In [None]:
cfr['CFR'] = cfr['Number of deaths'] / cfr['Number of cases'] * 100

In [None]:
cfr = cfr.round(3)
cfr.tail()
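
The case fatality rate used here is the number of deaths divided by the number of cases, times 100. The following minimal sketch repeats the merge-and-divide step on made-up numbers and additionally masks rows with zero cases, so that a 0/0 division shows up as a missing value. The frame names `toy_cases`, `toy_deaths` and `toy_cfr` are illustrative and not part of the notebook.

```python
import pandas as pd

# Made-up numbers, for illustration only.
dates = pd.to_datetime(["2014-09-01", "2014-09-08", "2014-09-01"])
toy_cases = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Date": dates,
    "Number of cases": [100, 200, 0],
})
toy_deaths = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Date": dates,
    "Number of deaths": [50, 120, 0],
})

# The inner merge keeps only (Country, Date) pairs present in both frames.
toy_cfr = toy_cases.merge(toy_deaths, how="inner", on=["Country", "Date"])

# Zero case counts become NaN, so the CFR is missing instead of undefined.
safe_cases = toy_cfr["Number of cases"].where(toy_cfr["Number of cases"] > 0)
toy_cfr["CFR"] = (toy_cfr["Number of deaths"] / safe_cases * 100).round(3)
print(toy_cfr)
```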

To place the countries on a world map, each country name was replaced by its three-letter ISO code.

In [None]:
# ISO codes for the countries:
cfr.Country.unique()

In [None]:
# Map country names to ISO 3166-1 alpha-3 codes (Guinea is 'GIN'; 'GNQ' would be Equatorial Guinea)
d = {'Guinea': 'GIN' , 'Liberia': 'LBR' , 'Nigeria': 'NGA' , 'Sierra Leone': 'SLE' , 'Senegal': 'SEN' ,
       'United States of America': 'USA' , 'Spain': 'ESP' , 'Mali': 'MLI' , 'United Kingdom': 'GBR',
       'Italy': 'ITA' , 'Liberia 2': 'LBR', 'Guinea 2': 'GIN'}

In [None]:
cfr['Country'] = cfr['Country'].replace(d)
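
`replace` leaves any name that is missing from the dictionary unchanged, so an unmapped country can go unnoticed. One way to check for this, sketched below, is to use `Series.map`, which turns unmapped names into `NaN`. The mapping `iso_map` and the name 'Atlantis' are hypothetical and chosen only to trigger the check.

```python
import pandas as pd

# Hypothetical mapping and names; 'Atlantis' is deliberately missing from the mapping.
iso_map = {"Liberia": "LBR", "Sierra Leone": "SLE"}
names = pd.Series(["Liberia", "Sierra Leone", "Atlantis"])

codes = names.map(iso_map)                       # names without a mapping become NaN
unmapped = names[codes.isna()].unique().tolist()  # collect the names that failed to map
print(unmapped)
```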

Now the large dataframe was split into data concerning 2014, 2015 and 2016 separately. 

In [None]:
# pd.Timestamp replaces pd.datetime, which is no longer available in recent pandas
split_date1 = pd.Timestamp(2015, 1, 1)
split_date2 = pd.Timestamp(2016, 1, 1)

# 'Date' is already a datetime column, so it can be compared directly
data_2014 = cfr[cfr['Date'] < split_date1]
# '>=' keeps records dated exactly 1 January 2015
data_2015 = cfr[(cfr['Date'] >= split_date1) & (cfr['Date'] < split_date2)]
data_2016 = cfr[cfr['Date'] >= split_date2]
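
A split by calendar year can also be written without explicit boundary dates, by grouping on the year of the `Date` column. This is a minimal sketch on made-up rows; the names `toy` and `by_year` are illustrative.

```python
import pandas as pd

# Made-up rows, including dates that fall exactly on a year boundary.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2014-12-31", "2015-01-01", "2015-06-15", "2016-01-01"]),
    "CFR": [50.0, 55.0, 60.0, 65.0],
})

# One dataframe per calendar year, each with a fresh 0-based index.
by_year = {
    int(year): part.reset_index(drop=True)
    for year, part in toy.groupby(toy["Date"].dt.year)
}
print({year: len(part) for year, part in by_year.items()})
```

Because every row belongs to exactly one year, no record can fall between two comparison operators.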

In [None]:
data_2014.head(75)

In [None]:
data_2014 = data_2014.reset_index()
data_2015 = data_2015.reset_index()
data_2016 = data_2016.reset_index()

In [None]:
data_2014 = data_2014.drop('index', axis=1)
data_2015 = data_2015.drop('index', axis=1)
data_2016 = data_2016.drop('index', axis=1)

In [None]:
data_2014.to_csv("data_2014.csv")
data_2015.to_csv("data_2015.csv")
data_2016.to_csv("data_2016.csv")

To obtain a world map showing the cases and CFR over time, Plotly Express was used.
To make both large and small numbers visible, the natural log of the number of cases was taken.

In [None]:
import plotly.express as px

In [None]:
data_2014["Number of cases log"] = data_2014["Number of cases"].apply(np.log)
data_2015["Number of cases log"] = data_2015["Number of cases"].apply(np.log)
data_2016["Number of cases log"] = data_2016["Number of cases"].apply(np.log)

In [None]:
data_2014.head()

In [None]:
fig = px.scatter_geo(data_2014, locations="Country",animation_frame="Date", size = 'Number of cases log', color = 'CFR', projection="natural earth")
fig.show()

In [None]:
fig = px.scatter_geo(data_2015, locations="Country",animation_frame="Date", size = 'Number of cases log', color = 'CFR', projection="natural earth")
fig.show()

In [None]:
fig = px.scatter_geo(data_2016, locations="Country",animation_frame="Date", size = 'Number of cases log', color = 'CFR', projection="natural earth")
fig.show()

Because an interactive view of how the cases and CFR grow over time cannot be shown on a 2D poster, each yearly dataframe was aggregated per country by taking the mean of the number of cases, the number of deaths, and the CFR.


In [None]:
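# numeric_only=True averages only the numeric columns and skips the 'Date' column
data_2014 = data_2014.groupby("Country").mean(numeric_only=True)
data_2015 = data_2015.groupby("Country").mean(numeric_only=True)
data_2016 = data_2016.groupby("Country").mean(numeric_only=True)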
data_2014 = data_2014.round(1)
data_2015 = data_2015.round(1)
data_2016 = data_2016.round(1)

In [None]:
data_2014.head()

In [None]:
data_2014.columns.unique()

In [None]:
data_2014['Country'] = data_2014.index
data_2015['Country'] = data_2015.index
data_2016['Country'] = data_2016.index
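
The per-country summary can also be written with named aggregation, which makes it explicit which statistic is taken for which column. The sketch below uses made-up numbers and hypothetical output column names. It also includes a maximum next to the mean, on the assumption that reported counts may be cumulative, in which case the maximum would be closer to a yearly total than the mean.

```python
import pandas as pd

# Made-up numbers, for illustration only.
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Number of cases": [10, 30, 5, 5],
    "Number of deaths": [4, 18, 1, 2],
    "CFR": [40.0, 60.0, 20.0, 40.0],
})

summary = (
    toy.groupby("Country")
    .agg(
        cases_mean=("Number of cases", "mean"),
        cases_max=("Number of cases", "max"),
        cfr_mean=("CFR", "mean"),
    )
    .round(1)
    .reset_index()  # turns the 'Country' index back into a regular column
)
print(summary)
```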

### Network visualisation
Now the data is prepared for use in Cytoscape. First, the source and target nodes were determined from the literature. <br>

Guinea ==> Sierra Leone<br>
Guinea ==> Liberia<br>
Sierra Leone ==> Liberia<br>
Guinea ==> Nigeria (by travelling to Guinea, quickly contained)<br>
Liberia ==> Spain<br>
Guinea ==> USA<br>
Liberia ==> USA<br>
Guinea ==> Senegal<br>
Sierra Leone ==> United Kingdom<br>
Sierra Leone ==> Italy<br>

The radius and colour of a node represent the CFR <br>
The weight of an edge represents the number of cases <br>
The length of an edge represents the distance between the countries <br>

In [None]:
# 'Country' is kept as the target-node column because later cells select it by that name.
# Default source for every edge: Guinea's ISO code, taken from the mapping d.
data_2014['Source'] = d['Guinea']
data_2015['Source'] = d['Guinea']
data_2016['Source'] = d['Guinea']

In [None]:
data_2014 = data_2014[['Country', 'Source', 'Number of cases', 'Number of cases log', 'Number of deaths', 'CFR']]
data_2015 = data_2015[['Country', 'Source', 'Number of cases','Number of cases log', 'Number of deaths', 'CFR']]
data_2016 = data_2016[['Country', 'Source', 'Number of cases', 'Number of cases log','Number of deaths', 'CFR']]

In [None]:
print(data_2014.Country.unique())
print(data_2015.Country.unique())
print(data_2016.Country.unique())

In [None]:
data_2014.loc[(data_2014.Country == 'ESP'), 'Source'] = 'LBR'
data_2014.loc[(data_2014.Country == 'USA'), 'Source'] = 'LBR'

data_2015.loc[(data_2015.Country == 'ESP'), 'Source'] = 'LBR'
data_2015.loc[(data_2015.Country == 'USA'), 'Source'] = 'LBR'
data_2015.loc[(data_2015.Country == 'GBR'), 'Source'] = 'SLE'
data_2015.loc[(data_2015.Country == 'ITA'), 'Source'] = 'SLE'

data_2016.loc[(data_2016.Country == 'ESP'), 'Source'] = 'LBR'
data_2016.loc[(data_2016.Country == 'USA'), 'Source'] = 'LBR'
data_2016.loc[(data_2016.Country == 'GBR'), 'Source'] = 'SLE'
data_2016.loc[(data_2016.Country == 'ITA'), 'Source'] = 'SLE'
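
The repeated `.loc` assignments can also be expressed as a single lookup: a dictionary lists the targets whose source differs from the default, and all other targets fall back to the default. The sketch below uses placeholder node codes (`N0` to `N3`) rather than the real country codes, and the names `source_overrides` and `default_source` are illustrative.

```python
import pandas as pd

# Placeholder node codes, not real ISO codes.
nodes = pd.DataFrame({"Country": ["N1", "N2", "N3"]})

default_source = "N0"             # source used when no override is listed
source_overrides = {"N3": "N1"}   # target -> source, for the exceptions only

# map() yields NaN for targets without an override; fillna() applies the default.
nodes["Source"] = nodes["Country"].map(source_overrides).fillna(default_source)
print(nodes)
```

Keeping the exceptions in one dictionary means the same rule can be applied to each yearly dataframe without repeating the assignments.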

In [None]:
data_2016

In [None]:
data_2014.to_csv("data_2014_cyto.csv")
data_2015.to_csv("data_2015_cyto.csv")
data_2016.to_csv("data_2016_cyto.csv")