In [None]:
import pandas as pd
import folium
import json
from folium import plugins

# Background
As COVID-19 continues to spread across the world, one of the greatest challenges to response efforts is a lack of data and evidence about infection and mortality rates. We will learn more about the virus and the populations most at risk as more test kits reach physicians and the total number of cases becomes clearer. Ideally, this information will be used to help target response efforts, guide policy decisions, inform the donor community, and support decision-making by businesses and multinational organizations whose work spans multiple countries. As a global development company with staff in over ___ countries, we find this kind of information invaluable as we think about ways to support our staff and prepare for what comes next.

Information from the earliest confirmed cases of COVID-19 indicates that some groups of people are at higher risk of infection than others. Public health organizations like the WHO and US CDC have released warnings for those at higher risk to take extra precautionary measures. To help DAI and other global development organizations better understand vulnerabilities in the communities in which they work, DAI Global Health and the Center for Digital Acceleration are tracking these risk factors, in addition to other potential correlates of mortality rates.

In this first installment of a multi-part series, we will keep an updated record of demographic, socio-economic, and environmental factors that are perceived to be correlated with heightened risk of mortality, and identify known country and subnational datasets that can help us identify countries at risk. We will also provide charts and maps to identify potential country-level “hotspots.”


# Theories

### (THIS SECTION WILL BE UPDATED AS WE LEARN MORE ABOUT RISK FACTORS)

Below is a list of factors that have emerged as drivers of increased risk of mortality. Some of these theories are based on information released by official public health bodies, while others are factors that could be proxies. As we learn more about possible drivers from the medical community, we will update this analysis with new datasets when possible.

1. People aged 65 and older are at higher risk (CDC)
2. People who smoke are at higher risk (CDC)
3. People living in countries with higher air pollution are at higher risk
4. People who have serious chronic medical conditions are at higher risk (CDC). These conditions include:
   1. Heart disease
   2. Diabetes
   3. Lung disease
5. People who have hypertension are at higher risk (Bloomberg News)

In [None]:
factor_df = pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vQ9Puqir6LrosixgUrjvXW09b58RzIsMOIdU1AmTqdTPM-Uki2nma39SGSN9ZzkqVQid8m6DT7nSHvq/pub?gid=0&single=true&output=csv')
factor_df.head()

In [None]:
# read in covid data
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-16-2020.csv'
covid_df = pd.read_csv(url)

# Mapping
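
The choropleth in this section joins `factor_df` to the GeoJSON features by matching the `ISO3` column against each feature's `ISO_A3` property, so rows whose codes match no feature should not appear on the map. A minimal sketch for spotting such rows follows. The helper name `unmatched_iso_codes` and the small sample inputs are hypothetical and only for illustration; in the notebook itself the call would take `factor_df` and `world_area` once both are loaded.

```python
import pandas as pd

def unmatched_iso_codes(factor_df, world_area, column='ISO3', prop='ISO_A3'):
    """Return the sorted codes in factor_df[column] that have no matching GeoJSON feature."""
    geo_codes = {feat['properties'].get(prop) for feat in world_area['features']}
    data_codes = set(factor_df[column].dropna())
    return sorted(data_codes - geo_codes)

# Hypothetical sample inputs, not the real datasets
sample_df = pd.DataFrame({'ISO3': ['JPN', 'ITA', 'XKX'],
                          'pop_over_65': [27.6, 22.8, None]})
sample_geo = {'type': 'FeatureCollection', 'features': [
    {'type': 'Feature', 'properties': {'ISO_A3': 'JPN'}, 'geometry': None},
    {'type': 'Feature', 'properties': {'ISO_A3': 'ITA'}, 'geometry': None},
]}

missing = unmatched_iso_codes(sample_df, sample_geo)
print(missing)
```

An empty list would suggest every row in the data has a shape to be drawn on; any codes returned are candidates for cleanup in the spreadsheet or the GeoJSON file.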

In [None]:
# load world geojson file
with open('countries.geojson') as f:
    world_area = json.load(f)

In [None]:
# function to create choropleth map with COVID case overlay
def create_map(factor_df, covid_df, variable_name, world_area):
    # initialize the map
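    # 'Mapbox Bright' is not bundled with recent folium releases;
    # 'CartoDB positron' is a built-in light basemap used here as a stand-in
    world_map = folium.Map(tiles='CartoDB positron')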
    
    # add choropleth base
    folium.Choropleth(
        geo_data=world_area,
        name=variable_name,
        data=factor_df,
        columns=['ISO3', variable_name],
        key_on='feature.properties.ISO_A3',
        fill_color='BuPu',
        fill_opacity=0.7,
        line_opacity=0.2,
        nan_fill_opacity=0,
        highlight=True,
        legend_name=variable_name).add_to(world_map)
    
    # add covid data - confirmed cases only
    for _, row in covid_df.iterrows():
        if row['Confirmed'] != 0:
            tooltip = (
                f"Country/Region: {row['Country/Region']}<br>"
                f"Province/State: {row['Province/State']}<br>"
                f"Confirmed Cases: {row['Confirmed']}"
            )
            # marker radius scales with the number of confirmed cases
            folium.CircleMarker(
                (row['Latitude'], row['Longitude']),
                radius=row['Confirmed'] / 1000,
                color='red', fill_color='red', fill_opacity=.5,
                tooltip=tooltip,
            ).add_to(world_map)
    folium.LayerControl().add_to(world_map)

    # save map as html for interactivity (loaded later with IFrame)
    world_map.save(variable_name + '.html')

    return world_map

In [None]:
for x in factor_df.columns[6:]:
    create_map(factor_df, covid_df, x, world_area)

# Charting


In [None]:
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure, imshow, axis
from matplotlib.image import imread
plt.style.use('ggplot')

In [None]:
# function to create standard bar chart
def create_bar(factor_df, variable_name):
    # sort by value & drop nulls
    factor2_df = factor_df.dropna(subset=[variable_name]).sort_values(by=[variable_name]).tail(25)
    
    # plot
    plt.figure(figsize=(10,10))
    plt.barh(factor2_df['NAME_ENGLI'], factor2_df[variable_name], color='#a1c7e3')
    plt.title(variable_name)
    
    # save as png
    plt.savefig(variable_name +'_bar.png')
    return None

In [None]:
# loop through all variables and create bar charts
for x in factor_df.columns[6:]:
    create_bar(factor_df, x)

In [None]:
# histogram for factor 1
plt.figure(figsize=(10,10))
# drop nulls first: missing values make the histogram's bin range undefined
plt.hist(factor_df['pop_over_65'].dropna(), bins=20, width=1, color='#a1c7e3')
plt.xlabel('Percent of population, 65+ yo')
plt.ylabel('Frequency')
plt.title('Age')
plt.savefig('pop_over_65_hist.png')

# Factor 1: Age
Official public health organizations have consistently identified age as a risk factor for becoming very sick from COVID-19, with older adults at particular risk. Data from the earliest cases in China, Italy, and South Korea show that mortality rates are higher among older adults. Using 2018 World Bank data, we looked at the percentage of the population aged 65 and older in each country. The map below shows the global distribution of these older adults: darker colors indicate a larger percentage of the country's population, and lighter colors a smaller percentage.

In [None]:
from IPython.display import Image, IFrame
IFrame(src='pop_over_65.html', width=700, height=600)

Here we've ranked the top 25 countries by the percentage of their population made up of older adults. Japan tops the list at 27.6%, while Italy ranks second in the world, with 22.8% of its population aged 65 or older.
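
For readers who want the ranking as a table rather than a chart, a sketch along the following lines may help. It uses `nlargest` instead of the `sort_values(...).tail(25)` call in `create_bar`, which should select the same rows in descending order. The function name `top_n` and the sample frame are hypothetical: only the Japan and Italy figures come from the text above, and the other rows are placeholders.

```python
import pandas as pd

# Hypothetical sample; only the Japan and Italy values come from the text above
sample_df = pd.DataFrame({
    'NAME_ENGLI': ['Japan', 'Italy', 'Country A', 'Country B', 'Country C'],
    'pop_over_65': [27.6, 22.8, 3.1, None, 12.4],
})

def top_n(df, variable_name, n=25):
    """Return the n rows with the largest values of variable_name, skipping nulls."""
    return (df.dropna(subset=[variable_name])
              .nlargest(n, variable_name)[['NAME_ENGLI', variable_name]]
              .reset_index(drop=True))

top2 = top_n(sample_df, 'pop_over_65', n=2)
print(top2)
```

In the notebook, `top_n(factor_df, 'pop_over_65')` would be the equivalent call, assuming `factor_df` carries the same `NAME_ENGLI` column used by the bar chart.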

In [None]:
Image('pop_over_65_bar.png')

Finally, we looked at how this percentage is distributed across countries. In most countries, older adults make up between 0% and 13% of the population. However, there is also a smaller group of countries with a slightly higher percentage of older adults, indicated by the second peak in the graph, ranging between 15% and 20% of the population.
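
The ranges mentioned above can also be checked numerically by binning the percentages and counting the share of countries in each bin. The following is a minimal sketch that assumes `pd.cut` is an acceptable way to group the values. The sample series is made up for illustration and does not contain the World Bank figures; in the notebook, `factor_df['pop_over_65']` would take its place.

```python
import pandas as pd

# Hypothetical percentages for illustration, not the World Bank values
sample = pd.Series([2.5, 4.0, 6.1, 9.8, 12.0, 16.3, 18.9, 27.6], name='pop_over_65')

# Bin edges follow the ranges discussed above, plus the gaps between them
bins = [0, 13, 15, 20, 100]
labels = ['0-13%', '13-15%', '15-20%', '20%+']
groups = pd.cut(sample.dropna(), bins=bins, labels=labels, include_lowest=True)

# Share of countries in each range; empty ranges are reported as 0
shares = (groups.astype(str)
                .value_counts(normalize=True)
                .reindex(labels, fill_value=0.0))
print(shares)
```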

In [None]:
Image('pop_over_65_hist.png')