# Capstone: Epidemic propagation

## Objectives
Can you run a simple spatio-temporal clustering of recent historic epidemics, and can this indicate where the next epidemic is likely to occur?

Can you pull in the main socio-economic features of the centroids of these clusters and run PCA to see what they have in common? What can we predict from this?

Can you include the propagation of epidemics and see how they spread?

Therefore:
Could the coronavirus outbreak have been predicted?
Where will the next epidemic occur, and how will it spread?
    
## Data required:
1. List of all recent epidemics (with location and year), including: SARS, novel coronavirus (2019-nCoV), MERS, Ebola, Zika, bird flu
2. Socio-economic data for each outbreak location: longitude / latitude, population, development index, cleanliness, and possibly others
3. Epidemic propagation: airport traffic at each infected city, and the volume of travel between affected cities
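The PCA step mentioned in the objectives could look roughly like the sketch below. It uses plain NumPy (standardise, then SVD) rather than any particular library's PCA class. The feature matrix is synthetic random data standing in for the socio-economic table, which has not been collected yet, so the numbers carry no meaning; the shape (6 locations, 4 features) is also just an assumption for illustration.

```python
import numpy as np

# SYNTHETIC placeholder data: 6 hypothetical locations x 4 hypothetical features.
# Replace with the real socio-economic table once it exists.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Standardise each feature so that no single unit (e.g. population) dominates.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via singular value decomposition of the standardised matrix.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

# Share of total variance captured by each principal component.
explained = S**2 / np.sum(S**2)

# Coordinates of each location in principal-component space.
scores = Z @ Vt.T

print(explained.round(3))
```

The rows of `Vt` are the component loadings; large absolute loadings would show which features the cluster centroids have in common.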

In [11]:
import pandas as pd
pd.set_option('display.max_rows',100, 'display.max_colwidth',1000)

In [4]:
epidemics_list = pd.read_csv('epidemics_list.csv')
epidemics_list.head()

Unnamed: 0,Death toll (estimate),Location,Date,Event,Disease
0,,Nigeria,2001,,Cholera
1,,South Africa,2001,,Cholera
2,299.0,Hong Kong,2002–2004,Timeline of the SARS outbreak,SARS coronavirus
3,349.0,China,2002–2004,Timeline of the SARS outbreak,SARS coronavirus
4,,Algeria,2003,,Plague


In [None]:
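# Labels must match the values in the 'Disease' column exactly.
# Note that Ebola also appears as 'Ebola virus disease' and
# 'Ebola virus disease\n\nEbola virus virion' in the data.
epi_interest = ['SARS coronavirus',
                'Ebola',
                'Middle East respiratory syndrome',
                'Zika virus',
                'Novel coronavirus (2019-nCoV)']

# What do I want for each epidemic?
# Timeline (year), city, country, long/lat, number of affected cases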

In [6]:
epidemics_list.Disease.unique()

array(['Cholera', 'SARS coronavirus', 'Plague', 'Leishmaniasis',
       'Dengue fever', 'Ebola', 'Yellow fever', 'Malaria',
       'Chikungunya\xa0virus', 'Poliomyelitis',
       'Hand, foot and mouth disease', 'Bubonic plague', 'Hepatitis B',
       'Mumps', 'Meningitis', 'Influenza', 'Measles',
       'Middle East respiratory syndrome',
       'Ebola virus disease\n\nEbola virus virion', 'Chikungunya',
       'Primarily\xa0Hepatitis E, but also\xa0Hepatitis A',
       'Influenza A virus subtype H1N1', 'Zika virus',
       'Japanese encephalitis', 'Nipah virus infection',
       'Ebola virus disease', 'Novel coronavirus (2019-nCoV)'],
      dtype=object)
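The labels above are not clean: some contain non-breaking spaces (`\xa0`), one carries trailing text after a line break, and Ebola appears under several names. One possible way to normalise them before filtering is sketched below. The helper `clean_disease` and the `ALIASES` mapping are hypothetical names introduced here, and treating 'Ebola' as the same label as 'Ebola virus disease' is an assumption.

```python
import pandas as pd

def clean_disease(label: str) -> str:
    """Normalise a raw Disease label."""
    # Replace non-breaking spaces with ordinary spaces.
    label = label.replace('\xa0', ' ')
    # Keep only the first line; text after a line break appears to be
    # leftover caption text rather than part of the disease name.
    label = label.split('\n')[0]
    return label.strip()

# Hypothetical alias table (assumption): map short names to one canonical label.
ALIASES = {'Ebola': 'Ebola virus disease'}

# A few raw labels copied from the unique() output.
raw = pd.Series(['Ebola',
                 'Ebola virus disease\n\nEbola virus virion',
                 'Chikungunya\xa0virus',
                 'Ebola virus disease'])

cleaned = raw.map(clean_disease).replace(ALIASES)
print(cleaned.tolist())
```

Applied to the full table, this would be `epidemics_list['Disease'].map(clean_disease).replace(ALIASES)`, after which a single `isin` filter would catch every Ebola row.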

In [9]:
epidemics_list[epidemics_list['Disease'].isin(['Ebola','Ebola virus disease\n\nEbola virus virion'])]

Unnamed: 0,Death toll (estimate),Location,Date,Event,Disease
9,,Sudan,2004,,Ebola
19,,Democratic Republic of the Congo,2007,Mweka ebola epidemic,Ebola
26,,Uganda,2007,,Ebola
48,"> 11,300",West Africa,2013–2016,Ebola virus epidemic in West Africa,Ebola virus disease\n\nEbola virus virion


In [56]:
epidemics_list[epidemics_list['Disease'].isin(['SARS coronavirus'])]

Unnamed: 0,Death toll (estimate),Location,Date,Event,Disease
2,299,Hong Kong,2002–2004,Timeline of the SARS outbreak,SARS coronavirus
3,349,China,2002–2004,Timeline of the SARS outbreak,SARS coronavirus
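The `Date` column mixes single years ('2001') with ranges ('2002–2004'), so it cannot be used directly as a timeline. A minimal sketch for splitting it into start and end years follows. The helper `year_span` and the column names `start_year` and `end_year` are hypothetical, and it assumes every entry contains at least one four-digit year; open-ended entries, if the full table has any, would need separate handling.

```python
import re
import pandas as pd

def year_span(text):
    """Return (first_year, last_year) found in a Date string."""
    years = [int(y) for y in re.findall(r'\d{4}', str(text))]
    if not years:
        return (None, None)
    return (min(years), max(years))

# Date strings copied from the outputs above.
dates = pd.Series(['2001', '2002–2004', '2013–2016'])

spans = pd.DataFrame([year_span(d) for d in dates],
                     columns=['start_year', 'end_year'])
print(spans)
```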


In [52]:
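ebola = pd.read_csv('ebola_data.csv')
ebola['Date'] = pd.to_datetime(ebola['Date'])
# Merge the duplicate country labels 'Guinea 2' and 'Liberia 2' into their base names
ebola['Country'] = ebola['Country'].replace({'Guinea 2': 'Guinea', 'Liberia 2': 'Liberia'})
# This groupby result is not assigned, so `ebola` itself is not aggregated
ebola.groupby(['Indicator','Country','Date']).sum()
ebola.sort_values('Date', inplace=True)
# Display with a fresh 0..n-1 index; `ebola` keeps its original index labels
ebola.reset_index(drop=True)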

Unnamed: 0,Indicator,Country,Date,value
0,Cumulative number of confirmed Ebola deaths,Guinea,2014-08-29,287.0
1,"Number of confirmed, probable and suspected Ebola cases in the last 21 days",Sierra Leone,2014-08-29,331.0
2,Number of confirmed Ebola cases in the last 21 days,Nigeria,2014-08-29,6.0
3,Number of probable Ebola cases in the last 21 days,Nigeria,2014-08-29,1.0
4,Number of suspected Ebola cases in the last 21 days,Nigeria,2014-08-29,3.0
5,"Number of confirmed, probable and suspected Ebola cases in the last 21 days",Nigeria,2014-08-29,10.0
6,Proportion of confirmed Ebola cases that are from the last 21 days,Guinea,2014-08-29,27.0
7,Proportion of probable Ebola cases that are from the last 21 days,Guinea,2014-08-29,5.0
8,Proportion of suspected Ebola cases that are from the last 21 days,Guinea,2014-08-29,80.0
9,"Proportion of confirmed, probable and suspected Ebola cases that are from the last 21 days",Guinea,2014-08-29,24.0
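The table is in long format, with one row per indicator, country and date. For comparing indicators side by side it may be easier to pivot it to one column per indicator. The sketch below does this on the four Nigeria rows shown above; the variable names `long_df` and `wide` are made up for the example, and `aggfunc='sum'` is an assumption about how duplicate rows should be combined.

```python
import pandas as pd

CONFIRMED = 'Number of confirmed Ebola cases in the last 21 days'
PROBABLE = 'Number of probable Ebola cases in the last 21 days'
SUSPECTED = 'Number of suspected Ebola cases in the last 21 days'
TOTAL = 'Number of confirmed, probable and suspected Ebola cases in the last 21 days'

# Rows copied from the output above (Nigeria, 2014-08-29).
long_df = pd.DataFrame(
    [(CONFIRMED, 'Nigeria', '2014-08-29', 6.0),
     (PROBABLE, 'Nigeria', '2014-08-29', 1.0),
     (SUSPECTED, 'Nigeria', '2014-08-29', 3.0),
     (TOTAL, 'Nigeria', '2014-08-29', 10.0)],
    columns=['Indicator', 'Country', 'Date', 'value'])
long_df['Date'] = pd.to_datetime(long_df['Date'])

# One row per (Country, Date), one column per indicator.
wide = long_df.pivot_table(index=['Country', 'Date'],
                           columns='Indicator',
                           values='value',
                           aggfunc='sum')

row = wide.loc[('Nigeria', pd.Timestamp('2014-08-29'))]
# Sanity check: the three parts add up to the reported total.
print(row[CONFIRMED] + row[PROBABLE] + row[SUSPECTED] == row[TOTAL])
```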


In [55]:
ebola[(ebola['Indicator']=='Cumulative number of confirmed Ebola cases') & (ebola['Country']=='Sierra Leone')]

Unnamed: 0,Indicator,Country,Date,value
5094,Cumulative number of confirmed Ebola cases,Sierra Leone,2014-08-29,935.0
4968,Cumulative number of confirmed Ebola cases,Sierra Leone,2014-09-05,1146.0
4854,Cumulative number of confirmed Ebola cases,Sierra Leone,2014-09-08,1234.0
4754,Cumulative number of confirmed Ebola cases,Sierra Leone,2014-09-12,1287.0
4656,Cumulative number of confirmed Ebola cases,Sierra Leone,2014-09-16,1464.0
4576,Cumulative number of confirmed Ebola cases,Sierra Leone,2014-09-18,1513.0
4536,Cumulative number of confirmed Ebola cases,Sierra Leone,2014-09-22,1640.0
4472,Cumulative number of confirmed Ebola cases,Sierra Leone,2014-09-24,1745.0
4432,Cumulative number of confirmed Ebola cases,Sierra Leone,2014-09-26,1816.0
4368,Cumulative number of confirmed Ebola cases,Sierra Leone,2014-10-01,2076.0
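Because these counts are cumulative and the reports are unevenly spaced, a first look at how fast the outbreak grew needs the difference between reports divided by the days between them. A rough sketch, using the ten Sierra Leone values shown above (the names `sl`, `new_cases`, `days` and `per_day` are just for this example):

```python
import pandas as pd

# Cumulative confirmed cases for Sierra Leone, copied from the output above.
sl = pd.Series(
    [935, 1146, 1234, 1287, 1464, 1513, 1640, 1745, 1816, 2076],
    index=pd.to_datetime(['2014-08-29', '2014-09-05', '2014-09-08',
                          '2014-09-12', '2014-09-16', '2014-09-18',
                          '2014-09-22', '2014-09-24', '2014-09-26',
                          '2014-10-01']),
    name='cumulative_confirmed')

# New confirmed cases since the previous report.
new_cases = sl.diff()

# Days elapsed since the previous report.
days = sl.index.to_series().diff().dt.days

# Average new confirmed cases per day over each reporting interval.
per_day = new_cases / days

print(pd.DataFrame({'new_cases': new_cases, 'days': days, 'per_day': per_day}))
```

On the real frame the same idea would apply after filtering by indicator and country and setting `Date` as the index.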
