# COVID-19 and Population project

In this project I evaluate Coronavirus (COVID-19) data to answer questions such as:

**1. Which variable is most strongly related to COVID-19 cases?**

**2. Is COVID-19 related to age?**

**3. Are countries with some kind of free healthcare policy more affected than the rest?**

### Section 1. Gather the data
The first step is to import the libraries and get the datasets:


In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import shapefile as shp
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
import seaborn as sns
import sup_func as sfu
import geograph_code as geo #not needed to be removed
import geopandas as gpd
%matplotlib inline

# Scrape the population and COVID-19 tables from worldometers.info
from io import StringIO

url_popu = 'https://www.worldometers.info/world-population/population-by-country/'
url_covid = 'https://www.worldometers.info/coronavirus/#countries'
r_popu = requests.get(url_popu)
r_covid = requests.get(url_covid)
# Stop early if either download failed
r_popu.raise_for_status()
r_covid.raise_for_status()
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup_popu = BeautifulSoup(r_popu.content, 'html.parser')
soup_covid = BeautifulSoup(r_covid.content, 'html.parser')
# Take the first <table> on each page
countries_popu = soup_popu.find_all('table')[0]
countries_covid = soup_covid.find_all('table')[0]
# Wrap the HTML in StringIO: recent pandas versions deprecate passing literal HTML strings
df_popu = pd.read_html(StringIO(str(countries_popu)))[0]
df_covid = pd.read_html(StringIO(str(countries_covid)))[0]

# Also load, from a CSV file, a list of countries with free healthcare policies
df_healthcare = pd.read_csv('countries with free healthcare.csv')

### 1.1. First visualization

Below are the first rows of each of the created **DataFrames**:

In [2]:
df_popu.head()

Unnamed: 0,#,Country (or dependency),Population (2020),Yearly Change,Net Change,Density (P/Km²),Land Area (Km²),Migrants (net),Fert. Rate,Med. Age,Urban Pop %,World Share
0,1,China,1439323776,0.39 %,5540090,153,9388211,-348399.0,1.7,38,61 %,18.47 %
1,2,India,1380004385,0.99 %,13586631,464,2973190,-532687.0,2.2,28,35 %,17.70 %
2,3,United States,331002651,0.59 %,1937734,36,9147420,954806.0,1.8,38,83 %,4.25 %
3,4,Indonesia,273523615,1.07 %,2898047,151,1811570,-98955.0,2.3,30,56 %,3.51 %
4,5,Pakistan,220892340,2.00 %,4327022,287,770880,-233379.0,3.6,23,35 %,2.83 %


In [3]:
df_covid.head()

Unnamed: 0,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,ActiveCases,"Serious,Critical",Tot Cases/1M pop,Deaths/1M pop,TotalTests,Tests/ 1M pop
0,World,3300058,81875,233639.0,5610,1037905.0,2028514,50958.0,423.0,30.0,,
1,USA,1092492,28298,63763.0,2108,151774.0,876955,15226.0,3301.0,193.0,6332995.0,19133.0
2,Spain,239639,2740,24543.0,268,137984.0,77112,2676.0,5125.0,525.0,1455306.0,31126.0
3,Italy,205463,1872,27967.0,285,75945.0,101551,1694.0,3398.0,463.0,1979217.0,32735.0
4,UK,171253,6032,26771.0,674,,144138,1559.0,2523.0,394.0,901905.0,13286.0


In [4]:
df_healthcare.head()

Unnamed: 0,name,pop2020
0,Albania,2877.797
1,Algeria,43851.044
2,Andorra,77.265
3,Antigua and Barbuda,97.929
4,Argentina,45195.774


### 1.2. Creating a new dataframe

I will use a new DataFrame based on the population data. I will call it **df_raw**.

In [5]:
#I get the columns that I want to use from df_popu

df_raw = pd.DataFrame()
df_raw['Country'] = df_popu['Country (or dependency)']
df_raw['Population'] = df_popu['Population (2020)']
df_raw['People_per_sqKm'] = df_popu['Density (P/Km²)']
df_raw['Migrants'] = df_popu['Migrants (net)']
df_raw['Avg_Age'] = df_popu['Med. Age']
df_raw['Urban_Pop'] = df_popu['Urban Pop %']

df_raw.head()

Unnamed: 0,Country,Population,People_per_sqKm,Migrants,Avg_Age,Urban_Pop
0,China,1439323776,153,-348399.0,38,61 %
1,India,1380004385,464,-532687.0,28,35 %
2,United States,331002651,36,954806.0,38,83 %
3,Indonesia,273523615,151,-98955.0,30,56 %
4,Pakistan,220892340,287,-233379.0,23,35 %
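
The same selection and renaming can also be written in one step with `DataFrame.rename`. The block below is only a sketch on a two-row stand-in for `df_popu`; the names `df_popu_sample` and `rename_map` are illustrative and not part of the notebook.

```python
import pandas as pd

# Two-row stand-in for df_popu, using values from the table shown earlier
df_popu_sample = pd.DataFrame({
    'Country (or dependency)': ['China', 'India'],
    'Population (2020)': [1439323776, 1380004385],
    'Density (P/Km²)': [153, 464],
    'Migrants (net)': [-348399.0, -532687.0],
    'Med. Age': ['38', '28'],
    'Urban Pop %': ['61 %', '35 %'],
    'World Share': ['18.47 %', '17.70 %'],
})

# Old column name -> new column name; columns not listed here are dropped
rename_map = {
    'Country (or dependency)': 'Country',
    'Population (2020)': 'Population',
    'Density (P/Km²)': 'People_per_sqKm',
    'Migrants (net)': 'Migrants',
    'Med. Age': 'Avg_Age',
    'Urban Pop %': 'Urban_Pop',
}

# Select the wanted columns first, then rename them
df_raw_sample = df_popu_sample[list(rename_map)].rename(columns=rename_map)
print(df_raw_sample.columns.tolist())
```

Keeping the mapping in one dictionary makes it easier to see which source column feeds each new column.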


### 1.3. Joining Dataframes

I am going to **join** df_raw with the COVID-19 data and the healthcare data.
All three datasets share the countries, although the spelling is not always the same. I have created two **dictionaries** that map the country names to a common spelling.

In [6]:

covid_dic = {'USA':'United States', 'UK':'United Kingdom', 'S. Korea':'South Korea', 'UAE':'United Arab Emirates',
               'Czechia':'Czech Republic (Czechia)', 'Ivory Coast':"Côte d'Ivoire", 'DRC':'DR Congo',
               'Palestine':'State of Palestine', 'CAR':'Central African Republic', 'Saint Kitts and Nevis':'Saint Kitts & Nevis',
               'St. Vincent Grenadines':'St. Vincent & Grenadines', 'Vatican City':'Holy See', 'St. Barth':'Saint Barthelemy',
               'Sao Tome and Principe':'Sao Tome & Principe', 'Saint Pierre Miquelon':'Saint Pierre & Miquelon'}

df_covid.replace({'Country,Other': covid_dic},  inplace = True)


healthcare_dic = {'Czech Republic':'Czech Republic (Czechia)',
           'Macau':'Macao',
           'Saint Vincent and the Grenadines':'St. Vincent & Grenadines'}

df_healthcare.replace({'name': healthcare_dic},  inplace = True)
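
To check whether the dictionaries cover every spelling difference, one option is to list the names in one table that have no counterpart in the other. The block below is a minimal sketch on short stand-in lists; the helper name `unmatched_names` and the sample Series are hypothetical.

```python
import pandas as pd

def unmatched_names(left: pd.Series, right: pd.Series) -> list:
    """Return the sorted names in `left` that do not appear in `right`."""
    return sorted(set(left.dropna()) - set(right.dropna()))

# Stand-ins for df_popu['Country (or dependency)'] and df_covid['Country,Other']
popu_names = pd.Series(['China', 'United States', 'United Kingdom', 'Spain'])
covid_names = pd.Series(['World', 'USA', 'Spain', 'UK'])

# Before the replacement: abbreviations and the aggregate 'World' row have no match
before = unmatched_names(covid_names, popu_names)
print(before)

# After applying a (shortened) mapping, only the aggregate row is left
covid_fixed = covid_names.replace({'USA': 'United States', 'UK': 'United Kingdom'})
after = unmatched_names(covid_fixed, popu_names)
print(after)
```

Any name that still shows up after the replacement is either an aggregate row such as `World` or a spelling that needs a new dictionary entry.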


In [7]:
df_healthcare.head()


Unnamed: 0,name,pop2020
0,Albania,2877.797
1,Algeria,43851.044
2,Andorra,77.265
3,Antigua and Barbuda,97.929
4,Argentina,45195.774


In [8]:
df_covid.head()

Unnamed: 0,"Country,Other",TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,ActiveCases,"Serious,Critical",Tot Cases/1M pop,Deaths/1M pop,TotalTests,Tests/ 1M pop
0,World,3300058,81875,233639.0,5610,1037905.0,2028514,50958.0,423.0,30.0,,
1,United States,1092492,28298,63763.0,2108,151774.0,876955,15226.0,3301.0,193.0,6332995.0,19133.0
2,Spain,239639,2740,24543.0,268,137984.0,77112,2676.0,5125.0,525.0,1455306.0,31126.0
3,Italy,205463,1872,27967.0,285,75945.0,101551,1694.0,3398.0,463.0,1979217.0,32735.0
4,United Kingdom,171253,6032,26771.0,674,,144138,1559.0,2523.0,394.0,901905.0,13286.0


In [9]:
#I remove the columns from df_covid that I am not interested in 
df_covid = df_covid.drop(['NewCases','NewDeaths','ActiveCases','Serious,Critical',df_covid.columns[8],'Tests/ 1M pop','Deaths/1M pop'],axis=1)
df_healthcare['Free healthcare'] = 1  # flag column: 1 marks a country with free healthcare
df_healthcare.drop(['pop2020'], axis=1, inplace=True)

In [10]:
df_covid.head()

Unnamed: 0,"Country,Other",TotalCases,TotalDeaths,TotalRecovered,TotalTests
0,World,3300058,233639.0,1037905.0,
1,United States,1092492,63763.0,151774.0,6332995.0
2,Spain,239639,24543.0,137984.0,1455306.0
3,Italy,205463,27967.0,75945.0,1979217.0
4,United Kingdom,171253,26771.0,,901905.0


In [11]:
df_healthcare.head()

Unnamed: 0,name,Free healthcare
0,Albania,1
1,Algeria,1
2,Andorra,1
3,Antigua and Barbuda,1
4,Argentina,1


#### Finally, the join

In [None]:
#I join both tables using as index the country
df_raw = df_raw.join(df_covid.set_index('Country,Other'), on='Country')
df_raw.head()

In [None]:
#I have called dtypes to check where I have non-numeric columns to change its type.
df_raw.dtypes

In [None]:
# Start with Urban_Pop: convert percentage strings to fractions (100 % -> 1.0, 0 % -> 0.0).
# 'N.A.' and any other value that cannot be parsed becomes NaN.
urban = df_raw['Urban_Pop'].astype(str).str.replace('%', '', regex=False).str.strip()
df_raw['Urban_Pop'] = pd.to_numeric(urban, errors='coerce') / 100

# Now replace each country name with an integer code (alphabetical order)
countries_list = sorted(df_raw['Country'].unique())
countries_dic = {country: k for k, country in enumerate(countries_list)}
# Reverse mapping (code -> name), needed later to restore the names
dic_reverse = {k: country for country, k in countries_dic.items()}
df_raw['Country'] = df_raw['Country'].map(countries_dic)

# The last one is Avg_Age: convert it to numeric, coercing errors (e.g. 'N.A.') to NaN
df_raw['Avg_Age'] = pd.to_numeric(df_raw['Avg_Age'], errors='coerce')
df_raw.dtypes

In [None]:
# Add a column flagging the countries with free healthcare; it might be useful


In [None]:
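# 'Country' holds integer codes at this point, so decode it back to names for the lookup
country_names = df_raw['Country'].replace(dic_reverse)
# 1 if the country appears in the free-healthcare list, 0 otherwise
df_raw['Free healthcare'] = country_names.isin(df_healthcare['name']).astype(int)

# The DataFrame is now ready: every column is numeric

df_raw.dtypes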

In [None]:
df_raw = df_raw[['Country','Population','People_per_sqKm','Migrants','Avg_Age',
                 'Urban_Pop','Free healthcare','TotalCases','TotalDeaths','TotalRecovered','TotalTests']]
df_raw.head()

In [None]:
equis = df_raw['Migrants']
ygriega = df_raw['TotalRecovered']
#mymodel = np.poly1d(np.polyfit(equis, ygriega, 3))

#myline = np.linspace(0, 1, 100)
plt.scatter(equis, ygriega)
#plt.plot(myline, mymodel(myline))
plt.show()

In [None]:
#3 possible evaluations, TotalCases, TotalDeaths and TotalRecovered
casos = ['TotalCases','TotalDeaths','TotalRecovered']
for caso in casos:
    X, y = sfu.edit_df(df_raw,caso)
    r2_scores_test, r2_scores_train, lm_model, X_train, X_test, y_train, y_test = sfu.get_stat_data(X,y)
    print("{} case: The rsquared on the training data was {}.  The rsquared on the test data was {}.".format(caso,r2_scores_train,r2_scores_test))
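
`sfu.edit_df` and `sfu.get_stat_data` come from the local `sup_func` module, whose code is not shown here. Judging only by the values it returns, `get_stat_data` probably splits the data, fits a linear regression and scores both parts. The block below is a sketch under that assumption; the function name `fit_and_score`, the test size, the random seed and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def fit_and_score(X, y, test_size=0.3, random_state=42):
    """Split the data, fit a linear model, and return the test and train R-squared."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    lm_model = LinearRegression()
    lm_model.fit(X_train, y_train)
    r2_train = r2_score(y_train, lm_model.predict(X_train))
    r2_test = r2_score(y_test, lm_model.predict(X_test))
    return r2_test, r2_train, lm_model

# Synthetic data (not the COVID-19 data): a linear signal plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

r2_test, r2_train, model = fit_and_score(X, y)
print("R-squared on train: {:.3f}, on test: {:.3f}".format(r2_train, r2_test))
```

A test score far below the training score would suggest that the model does not generalise beyond the countries it was fitted on.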
    

In [None]:
casos = ['TotalCases','TotalDeaths','TotalRecovered']
for caso in casos:
    X, y = sfu.edit_df(df_raw,caso)
    # The `normalize` argument was removed from LinearRegression in scikit-learn 1.2;
    # for ordinary least squares it does not change the predictions or the R-squared
    lm_model = LinearRegression() # Instantiate
    lm_model.fit(X, y)
    y_preds = lm_model.predict(X)
    rsquared_score = r2_score(y, y_preds)
    print("{} case: The R-squared on the full data was {}.".format(caso,rsquared_score))

# Rule of thumb for reading the R-squared value:
# - below 0.3: none or a very weak effect size
# - between 0.3 and 0.5: a weak or low effect size
# - between 0.5 and 0.7: a moderate effect size
# - above 0.7: a strong effect size
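
The rule of thumb above can be written as a small helper so that each score is printed with its label. This is only a sketch: the function name `effect_size_label` is hypothetical, and which side the exact boundary values 0.3, 0.5 and 0.7 fall on is an assumption, because the thresholds leave that open.

```python
def effect_size_label(r2: float) -> str:
    """Map an R-squared value to the effect-size label from the rule of thumb."""
    if r2 < 0.3:
        return 'none or very weak'
    if r2 < 0.5:
        return 'weak'
    if r2 < 0.7:
        return 'moderate'
    return 'strong'

# Example values, not results from the COVID-19 data
for value in (0.12, 0.45, 0.62, 0.81):
    print("R-squared {:.2f}: {} effect size".format(value, effect_size_label(value)))
```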

In [None]:
X, y = sfu.edit_df(df_raw, 'TotalRecovered')
r2_scores_test, r2_scores_train, lm_model, X_train, X_test, y_train, y_test = sfu.get_stat_data(X,y)

coef_df = sfu.coef_weights(lm_model.coef_, X_train)
coef_df

In [None]:
X, y = sfu.edit_df(df_raw, 'TotalCases')
r2_scores_test, r2_scores_train, lm_model, X_train, X_test, y_train, y_test = sfu.get_stat_data(X,y)

coef_df = sfu.coef_weights(lm_model.coef_, X_train)
coef_df

In [None]:
X, y = sfu.edit_df(df_raw, 'TotalDeaths')
r2_scores_test, r2_scores_train, lm_model, X_train, X_test, y_train, y_test = sfu.get_stat_data(X,y)

coef_df = sfu.coef_weights(lm_model.coef_, X_train)
coef_df

In [None]:
df_raw.hist();

In [None]:
df_corr = df_raw.corr()
# Keep the demographic variables as columns
df_corr = df_corr[['People_per_sqKm','Avg_Age','Urban_Pop','Migrants','Free healthcare']]
# Drop the same variables (plus Country and Population) from the rows, leaving only the COVID-19 totals
df_corre = df_corr.drop(['Country','Population','People_per_sqKm','Migrants','Avg_Age','Urban_Pop','Free healthcare'],axis=0)
df_corre

In [None]:
sns.heatmap(df_corre, annot= True, fmt='.2f');
#plt.savefig('heatmap.png')

In [None]:
sns.heatmap(df_raw.corr(), annot= True, fmt='.2f');


In [None]:

dic_fh = {1: 'Yes', 0:'No'}


In [None]:
df_c = pd.DataFrame()

In [None]:
df_c['Free Healthcare'] = df_raw['Free healthcare']
df_c.replace({'Free Healthcare':dic_fh}, inplace = True)
df_c['Population (2020)'] = df_raw['Population']
# Number of countries and total population in each group
group_dfc2 = df_c.groupby('Free Healthcare').agg(['count','sum'])
df_c['TotalCases'] = df_raw['TotalCases']
df_c['TotalDeaths'] = df_raw['TotalDeaths']
df_c['TotalRecovered'] = df_raw['TotalRecovered']
df_c['TotalTests'] = df_raw['TotalTests']
group_dfc = df_c.groupby('Free Healthcare').sum()
group_dfc['Tests per 1M P']= (group_dfc['TotalTests']*1e6)/group_dfc['Population (2020)']
group_dfc['Cases per 1M P']= (group_dfc['TotalCases']*1e6)/group_dfc['Population (2020)']
group_dfc['Deaths per 100 Cases'] = (group_dfc['TotalDeaths']*100)/group_dfc['TotalCases']
group_dfc['Recovered per 100 Cases'] = (group_dfc['TotalRecovered']*100)/group_dfc['TotalCases']
group_dfc['Deaths per 1M P'] = (group_dfc['TotalDeaths']*1e6)/group_dfc['Population (2020)']
group_dfc['Recovered per 1M P'] = (group_dfc['TotalRecovered']*1e6)/group_dfc['Population (2020)']
# Ratio of the 'No' group to the 'Yes' group, for cases per 1M people and for deaths per 100 cases
rate_expected = group_dfc.loc['No','Cases per 1M P']/group_dfc.loc['Yes','Cases per 1M P']
death_rate = group_dfc.loc['No','Deaths per 100 Cases']/group_dfc.loc['Yes','Deaths per 100 Cases']
# Scale the 'Yes' figures by those ratios to get the values expected for the 'No' group
expected_deaths = group_dfc.loc['Yes','Deaths per 100 Cases']*rate_expected
expected_recovered =  group_dfc.loc['Yes','Recovered per 100 Cases']*death_rate
"We might expect {:.2f} deaths per 100 cases in countries without free healthcare; however, we see 5.55 per 100 cases. Using the previous rate, we might also expect {:.2f} recovered cases per 100, but again we find only 15.3 recovered cases per 100.".format(expected_deaths,expected_recovered)

In [None]:


df_new = pd.DataFrame()
df_new['Tests per 1M P']=group_dfc['Tests per 1M P'].apply(lambda x: "{0:.2f}".format(x))
df_new['Cases per 1M P']=group_dfc['Cases per 1M P'].apply(lambda x: "{0:.2f}".format(x))
df_new['Deaths per 1M P']=group_dfc['Deaths per 1M P'].apply(lambda x: "{0:.2f}".format(x))
df_new['Case Recovery rate']=group_dfc['Recovered per 100 Cases'].apply(lambda x: "{0:.2f}%".format(x))
df_new['Case fatality rate']=group_dfc['Deaths per 100 Cases'].apply(lambda x: "{0:.2f}%".format(x))
df_new
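
The rates in this table are computed from group totals (summed cases divided by summed population), so large countries weigh more than small ones. That is not the same as averaging the per-country rates. The toy example below uses made-up numbers, not the worldometers data, to show how far the two can differ.

```python
import pandas as pd

# Made-up figures for three imaginary countries
toy = pd.DataFrame({
    'Free Healthcare': ['Yes', 'Yes', 'No'],
    'Population': [1_000_000, 9_000_000, 5_000_000],
    'TotalCases': [1_000, 900, 2_500],
})

# Approach used in the notebook: sum per group first, then compute the rate
totals = toy.groupby('Free Healthcare')[['Population', 'TotalCases']].sum()
totals['Cases per 1M P'] = totals['TotalCases'] * 1e6 / totals['Population']

# Alternative: compute the rate per country, then average within each group
toy['Cases per 1M P'] = toy['TotalCases'] * 1e6 / toy['Population']
mean_of_rates = toy.groupby('Free Healthcare')['Cases per 1M P'].mean()

print(totals['Cases per 1M P'])
print(mean_of_rates)
```

In the `Yes` group the population-weighted rate is 190 cases per 1M people, while the simple average of the two country rates (1000 and 100) is 550.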



In [None]:

index = ['Non-Free Healthcare','Free Healthcare']
df_p = pd.DataFrame(index=index)
# Select the groups by label ('No', 'Yes') instead of by position
df_p['Countries'] = group_dfc2[('Population (2020)', 'count')].loc[['No','Yes']].values
df_p['Population'] = group_dfc2[('Population (2020)', 'sum')].loc[['No','Yes']].values

fig = plt.figure(figsize=(6, 4)) # Create matplotlib figure


ax = fig.add_subplot() # Create matplotlib axes
ax = df_p['Countries'].plot(kind='bar', color='#dca07b', use_index=True)
ax2 = ax.twinx() # Create another axes that shares the same x-axis as ax.
ax2.plot(ax.get_xticks(),
         df_p['Population'].values / 1e9,  # scale to billions to match the axis label
         linestyle=' ',
         marker='o', color='#c5ed5a')
ax.set_ylabel('Number of countries')
ax2.set_ylabel('Billions of People')

ax.set_ylim((0, 160))



#ax2.xaxis.set_major_formatter(billi)



In [None]:

df_p['Population'] = df_p['Population'].apply(lambda x: "{0:.2f}B".format(x * 1e-9))


In [None]:
df_p


In [None]:
# Fill missing ages with the rounded mean age
df_raw['Avg_Age'] = df_raw['Avg_Age'].fillna(round(df_raw['Avg_Age'].mean(), 0))




In [None]:
df_raw['rank age'] = df_raw['Avg_Age'].apply(sfu.age_rank)
df_raw
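
`sfu.age_rank` is a helper from the local `sup_func` module, and its code is not shown here. One way such a ranking could be built is with `pd.cut`, which assigns each age to a labelled interval. The bin edges and labels below are hypothetical and are not the ones used by `age_rank`.

```python
import pandas as pd

# Example ages, not taken from the dataset
ages = pd.Series([18.0, 23.0, 28.0, 38.0, 47.0], name='Avg_Age')

# Hypothetical bin edges and labels; right=False makes each bin include its lower edge
bins = [0, 20, 30, 40, 120]
labels = ['under 20', '20-29', '30-39', '40 and over']
rank = pd.cut(ages, bins=bins, labels=labels, right=False)

print(rank.tolist())
```

Because `pd.cut` returns an ordered categorical, a later `groupby` on this column keeps the bins in age order without manual reindexing.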

In [None]:
df_cha = pd.DataFrame()

df_cha['Age Rank']=df_raw['rank age'] 
df_cha['TotalCases']=df_raw['TotalCases']
df_cha['Population (2020)']=df_raw['Population']



df_g= df_cha.groupby(['Age Rank']).agg(['sum','count'])
df_g



In [None]:
df_g['Number of Countries'] = df_g[('TotalCases','count')]
df_g['Cases per 1M P'] = (df_g[('TotalCases','sum')])*1e6/df_g[('Population (2020)','sum')]
df_g = df_g.drop([('TotalCases','sum'),('TotalCases','count'),('Population (2020)','sum'),('Population (2020)','count')],axis=1)

In [None]:

df_g=df_g.iloc[[4,2,1,0,3],[0,1]]
df_g

In [None]:
ax = df_g['Cases per 1M P'].plot.bar(rot=0,color='#3086c1')
ax2 = ax.twinx()

ax2.plot(ax.get_xticks(),
         df_g['Number of Countries'].values,
         linestyle=' ',
         marker='o', color='#c5ed5a')

ax.set_ylabel('Cases per 1M people')
ax2.set_ylabel('Number of countries')



In [None]:
# set the filepath and load
fp = 'World_Countries.shp'
#reading the file stored in variable fp
map_df = gpd.read_file(fp)
# check data type so we can see that this is not a normal dataframe, but a GEOdataframe
map_df.head()

In [None]:
map_df.plot()

In [None]:
df = gpd.read_file(fp)

In [None]:
df.head()

In [None]:
data_for_map = df.rename(index=str, columns={'COUNTRY': 'COUNTRY', 'geometry': 'geometry'})

In [None]:
# Restore the country names from the integer codes
df_raw['Country'] = df_raw['Country'].replace(dic_reverse)


In [None]:
arr_dict = {'Afghanistan':'Afghanistan',
'Albania':'Albania',
'Algeria':'Algeria',
'American Samoa':'American Samoa (US)',
'Andorra':'Andorra',
'Angola':'Angola',
'Anguilla':'Anguilla (UK)',
'Antigua and Barbuda':'Antigua and Barbuda',
'Argentina':'Argentina',
'Armenia':'Armenia',
'Aruba':'Aruba (Netherlands)',
'Australia':'Australia',
'Austria':'Austria',
'Azerbaijan':'Azerbaijan',
'Bahamas':'Bahamas',
'Bahrain':'Bahrain',
'Bangladesh':'Bangladesh',
'Barbados':'Barbados',
'Belarus':'Belarus',
'Belgium':'Belgium',
'Belize':'Belize',
'Benin':'Benin',
'Bermuda':'Bermuda (UK)',
'Bhutan':'Bhutan',
'Bolivia':'Bolivia',
'Bosnia and Herzegovina':'Bosnia and Herzegovina',
'Botswana':'Botswana',
'Brazil':'Brazil',
'British Virgin Islands':'British Virgin Islands(UK)',
'Brunei':'Brunei',
'Bulgaria':'Bulgaria',
'Burkina Faso':'Burkina Faso',
'Burundi':'Burundi',
'Cabo Verde':'Cape Verde',
'Cambodia':'Cambodia',
'Cameroon':'Cameroon',
'Canada':'Canada',
'Caribbean Netherlands':'Netherlands',
'Cayman Islands':'Cayman Islands (UK)',
'Central African Republic':'Central African Republic',
'Chad':'Chad',
'Channel Islands':'Channel Islands',
'Chile':'Chile',
'China':'China',
'Colombia':'Colombia',
'Comoros':'Comoros',
'Congo':'Congo',
'Cook Islands':'Cook Islands (New Zealand)',
'Costa Rica':'Costa Rica',
"Côte d'Ivoire":'Ivory Coast',
'Croatia':'Croatia',
'Cuba':'Cuba',
'Curaçao':'Curacao (Netherlands)',
'Cyprus':'Cyprus',
'Czech Republic (Czechia)':'Czech Republic',
'Denmark':'Denmark',
'Djibouti':'Djibouti',
'Dominica':'Dominica',
'Dominican Republic':'Dominican Republic',
'DR Congo':'Democratic Republic of the Congo',
'Ecuador':'Ecuador',
'Egypt':'Egypt',
'El Salvador':'El Salvador',
'Equatorial Guinea':'Equatorial Guinea',
'Eritrea':'Eritrea',
'Estonia':'Estonia',
'Eswatini':'Swaziland',
'Ethiopia':'Ethiopia',
'Faeroe Islands':'Faroe Islands (Denmark)',
'Falkland Islands':'Falkland Islands (UK)',
'Fiji':'Fiji',
'Finland':'Finland',
'France':'France',
'French Guiana':'French Guiana (France)',
'French Polynesia':'French Polynesia (France)',
'Gabon':'Gabon',
'Gambia':'Gambia',
'Georgia':'Georgia',
'Germany':'Germany',
'Ghana':'Ghana',
'Gibraltar':'Gibraltar (UK)',
'Greece':'Greece',
'Greenland':'Greenland (Denmark)',
'Grenada':'Grenada',
'Guadeloupe':'Guadeloupe (France)',
'Guam':'Guam (US)',
'Guatemala':'Guatemala',
'Guinea':'Guinea',
'Guinea-Bissau':'Guinea-Bissau',
'Guyana':'Guyana',
'Haiti':'Haiti',
'Holy See':'Holy See',
'Honduras':'Honduras',
'Hong Kong':'Hong Kong',
'Hungary':'Hungary',
'Iceland':'Iceland',
'India':'India',
'Indonesia':'Indonesia',
'Iran':'Iran',
'Iraq':'Iraq',
'Ireland':'Ireland',
'Isle of Man':'Isle of Man (UK)',
'Israel':'Israel',
'Italy':'Italy',
'Jamaica':'Jamaica',
'Japan':'Japan',
'Jordan':'Jordan',
'Kazakhstan':'Kazakhstan',
'Kenya':'Kenya',
'Kiribati':'Kiribati',
'Kuwait':'Kuwait',
'Kyrgyzstan':'Kyrgyzstan',
'Laos':'Laos',
'Latvia':'Latvia',
'Lebanon':'Lebanon',
'Lesotho':'Lesotho',
'Liberia':'Liberia',
'Libya':'Libya',
'Liechtenstein':'Liechtenstein',
'Lithuania':'Lithuania',
'Luxembourg':'Luxembourg',
'Macao':'Macao',
'Madagascar':'Madagascar',
'Malawi':'Malawi',
'Malaysia':'Malaysia',
'Maldives':'Maldives',
'Mali':'Mali',
'Malta':'Malta',
'Marshall Islands':'Marshall Islands',
'Martinique':'Martinique (France)',
'Mauritania':'Mauritania',
'Mauritius':'Mauritius',
'Mayotte':'Mayotte (France)',
'Mexico':'Mexico',
'Micronesia':'Federated States of Micronesia',
'Moldova':'Moldova',
'Monaco':'Monaco',
'Mongolia':'Mongolia',
'Montenegro':'Montenegro',
'Montserrat':'Montserrat',
'Morocco':'Morocco',
'Mozambique':'Mozambique',
'Myanmar':'Myanmar',
'Namibia':'Namibia',
'Nauru':'Nauru',
'Nepal':'Nepal',
'Netherlands':'Netherlands',
'New Caledonia':'New Caledonia (France)',
'New Zealand':'New Zealand',
'Nicaragua':'Nicaragua',
'Niger':'Niger',
'Nigeria':'Nigeria',
'Niue':'Niue (New Zealand)',
'North Korea':'North Korea',
'North Macedonia':'Macedonia',
'Northern Mariana Islands':'Northern Mariana Islands (US)',
'Norway':'Norway',
'Oman':'Oman',
'Pakistan':'Pakistan',
'Palau':'Palau (US)',
'Panama':'Panama',
'Papua New Guinea':'Papua New Guinea',
'Paraguay':'Paraguay',
'Peru':'Peru',
'Philippines':'Philippines',
'Poland':'Poland',
'Portugal':'Portugal',
'Puerto Rico':'Puerto Rico (US)',
'Qatar':'Qatar',
'Réunion':'Reunion (France)',
'Romania':'Romania',
'Russia':'Russia',
'Rwanda':'Rwanda',
'Saint Barthelemy':'Saint Barthelemy',
'Saint Helena':'St. Helena (UK)',
'Saint Kitts & Nevis':'St. Kitts and Nevis',
'Saint Lucia':'St. Lucia',
'Saint Martin':'Saint Martin',
'Saint Pierre & Miquelon':'St. Pierre and Miquelon (France)',
'Samoa':'Western Samoa',
'San Marino':'San Marino',
'Sao Tome & Principe':'Sao Tome and Principe',
'Saudi Arabia':'Saudi Arabia',
'Senegal':'Senegal',
'Serbia':'Serbia',
'Seychelles':'Seychelles',
'Sierra Leone':'Sierra Leone',
'Singapore':'Singapore',
'Sint Maarten':'Sint Maarten',
'Slovakia':'Slovakia',
'Slovenia':'Slovenia',
'Solomon Islands':'Solomon Islands',
'Somalia':'Somalia',
'South Africa':'South Africa',
'South Korea':'South Korea',
'South Sudan':'South Sudan',
'Spain':'Spain',
'Sri Lanka':'Sri Lanka',
'St. Vincent & Grenadines':'St. Vincent and the Grenadines',
'State of Palestine':'Palestine',
'Sudan':'Sudan',
'Suriname':'Suriname',
'Sweden':'Sweden',
'Switzerland':'Switzerland',
'Syria':'Syria',
'Taiwan':'Taiwan',
'Tajikistan':'Tajikistan',
'Tanzania':'Tanzania',
'Thailand':'Thailand',
'Timor-Leste':'East Timor',
'Togo':'Togo',
'Tokelau':'Tokelau (New Zealand)',
'Tonga':'Tonga',
'Trinidad and Tobago':'Trinidad and Tobago',
'Tunisia':'Tunisia',
'Turkey':'Turkey',
'Turkmenistan':'Turkmenistan',
'Turks and Caicos':'Turks and Caicos Islands (UK)',
'Tuvalu':'Tuvalu',
'U.S. Virgin Islands':'American Virgin Islands (US)',
'Uganda':'Uganda',
'Ukraine':'Ukraine',
'United Arab Emirates':'United Arab Emirates',
'United Kingdom':'United Kingdom',
'United States':'United States',
'Uruguay':'Uruguay',
'Uzbekistan':'Uzbekistan',
'Vanuatu':'Vanuatu',
'Venezuela':'Venezuela',
'Vietnam':'Vietnam',
'Wallis & Futuna':'Wallis and Futuna (France)',
'Western Sahara':'Western Sahara',
'Yemen':'Yemen',
'Zambia':'Zambia',
'Zimbabwe':'Zimbabwe'}

In [None]:
df_raw.replace({'Country': arr_dict},  inplace = True)

In [None]:
df_raw.head()

In [None]:
dfh = pd.DataFrame()
dfh['Country'] = df_raw['Country']
dfh['Free healthcare'] = df_raw['Free healthcare']
dfh['TotalCases'] = df_raw['TotalCases']
dfh['Population (2020)'] = df_raw['Population']
dfh['TotalDeaths'] = df_raw['TotalDeaths'].fillna(0)
dfh['Cases per 1M P']=(dfh['TotalCases']*1e6)/dfh['Population (2020)']
dfh['Deaths per 100 cases'] = (dfh['TotalDeaths']*100)/dfh['TotalCases']
dfh['Deaths per 1M P'] =(dfh['TotalDeaths']*1e6)/dfh['Population (2020)']
# Countries without reported figures get 0 instead of NaN
fill_cols = ['Deaths per 1M P','TotalCases','Deaths per 100 cases','Cases per 1M P']
dfh[fill_cols] = dfh[fill_cols].fillna(0)
dfh.isnull().mean()

In [None]:
merged = map_df.set_index('COUNTRY').join(dfh.set_index('Country'))

In [None]:
merged.head()

In [None]:
variable = 'Deaths per 1M P'
# set the range for the choropleth
vmin, vmax = 120, 220
# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(10, 6))
merged.plot(column=variable, cmap='BuGn', linewidth=0.8, ax=ax, edgecolor='0.8')

In [None]:
ax.axis('off')
# add a title
ax.set_title('Deaths per 1M P', fontdict={'fontsize': '25', 'fontweight' : '3'})
# create an annotation for the data source
ax.annotate('Source: worldometers.info',xy=(0.1, .08), xycoords='figure fraction', horizontalalignment='left', verticalalignment='top', fontsize=12, color='#555555')

In [None]:
sm = plt.cm.ScalarMappable(cmap='BuGn', norm=plt.Normalize(vmin=vmin, vmax=vmax))
# empty array for the data range
sm.set_array([])
# add the colorbar to the figure
#cbar = fig.colorbar(sm)
# save the map as a .png file
fig.savefig('map_export_{}.png'.format('Deaths per 1M P'), dpi=300)

In [None]:
variable = 'Cases per 1M P'
# set the range for the choropleth
vmin, vmax = 120, 220
# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(10, 6))
merged.plot(column=variable, cmap='BuGn', linewidth=0.8, ax=ax, edgecolor='0.8')

In [None]:
ax.axis('off')
# add a title
ax.set_title(variable, fontdict={'fontsize': '25', 'fontweight' : '3'})
# create an annotation for the data source
ax.annotate('Source: worldometers.info',xy=(0.1, .08), xycoords='figure fraction', horizontalalignment='left', verticalalignment='top', fontsize=12, color='#555555')

In [None]:
sm = plt.cm.ScalarMappable(cmap='BuGn', norm=plt.Normalize(vmin=vmin, vmax=vmax))
# empty array for the data range
sm.set_array([])
# add the colorbar to the figure
#cbar = fig.colorbar(sm)
# save the map as a .png file
fig.savefig('map_export_{}.png'.format(variable), dpi=300)

In [None]:
variable = 'Free healthcare'
# set the range for the choropleth
vmin, vmax = 120, 220
# create figure and axes for Matplotlib
fig, ax = plt.subplots(1, figsize=(10, 6))
merged.plot(column=variable, cmap='BuGn', linewidth=0.8, ax=ax, edgecolor='0.8')


In [None]:
ax.axis('off')
# add a title
ax.set_title('Countries with ' + variable, fontdict={'fontsize': '15', 'fontweight' : '3'})
# create an annotation for the data source
ax.annotate('Source: https://worldpopulationreview.com/',xy=(0.1, .08), xycoords='figure fraction', horizontalalignment='left', verticalalignment='top', fontsize=12, color='#555555')

In [None]:
sm = plt.cm.ScalarMappable(cmap='BuGn', norm=plt.Normalize(vmin=vmin, vmax=vmax))
# empty array for the data range
sm.set_array([])
# add the colorbar to the figure
#cbar = fig.colorbar(sm)
# save the map as a .png file
fig.savefig('map_export_{}.png'.format(variable), dpi=300)