<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#The-analysis" data-toc-modified-id="The-analysis-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>The analysis</a></span></li><li><span><a href="#Load-libraries" data-toc-modified-id="Load-libraries-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Load libraries</a></span></li><li><span><a href="#Import-datasets" data-toc-modified-id="Import-datasets-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Import datasets</a></span><ul class="toc-item"><li><span><a href="#Worldometer-data-and-WHO-data-for-filling-regions" data-toc-modified-id="Worldometer-data-and-WHO-data-for-filling-regions-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Worldometer data and WHO data for filling regions</a></span></li><li><span><a href="#Happiness-report-datasets" data-toc-modified-id="Happiness-report-datasets-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Happiness report datasets</a></span></li></ul></li><li><span><a href="#Color-palettes" data-toc-modified-id="Color-palettes-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Color palettes</a></span><ul class="toc-item"><li><span><a href="#Regions" data-toc-modified-id="Regions-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Regions</a></span></li><li><span><a href="#Variable-group" data-toc-modified-id="Variable-group-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Variable group</a></span></li></ul></li><li><span><a href="#Analyse-correlation" data-toc-modified-id="Analyse-correlation-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Analyse correlation</a></span></li><li><span><a href="#Analyse-the-score-influence-on-the-pandemic-progress" data-toc-modified-id="Analyse-the-score-influence-on-the-pandemic-progress-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Analyse the score influence on the pandemic progress</a></span></li><li><span><a href="#Focus-on-significant-correlations" data-toc-modified-id="Focus-on-significant-correlations-7"><span 
class="toc-item-num">7&nbsp;&nbsp;</span>Focus on significant correlations</a></span></li></ul></div>

# The analysis 

# Load libraries

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
#import colours as c
import re

# Import datasets

## Worldometer data and WHO data for filling regions

In [16]:
# read_csv opens and closes the file itself, so no explicit file handle is needed
worldometer = pd.read_csv('datasets/worldometer_data.csv')
worldometer.fillna(0, inplace=True)
worldometer.head()

Unnamed: 0,Country/Region,Continent,Population,TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",Tot Cases/1M pop,Deaths/1M pop,TotalTests,Tests/1M pop,WHO Region
0,USA,North America,331510400.0,7667817,30905.0,214884.0,273.0,4874445.0,25407.0,2578488.0,14200.0,23130.0,648.0,112261589.0,338637.0,Americas
1,India,Asia,1383530000.0,6682073,59893.0,103600.0,886.0,5659110.0,75657.0,919363.0,8944.0,4830.0,75.0,79982394.0,57810.0,South-EastAsia
2,Brazil,South America,212953900.0,4918022,2733.0,146417.0,42.0,4263208.0,0.0,508397.0,8318.0,23094.0,688.0,17900000.0,84056.0,Americas
3,Russia,Europe,145951000.0,1225889,10888.0,21475.0,117.0,982324.0,3181.0,222090.0,2300.0,8399.0,147.0,48042343.0,329168.0,Europe
4,Colombia,South America,51023640.0,855052,0.0,26712.0,0.0,761674.0,0.0,66666.0,2220.0,16758.0,524.0,3894289.0,76323.0,Americas


In [17]:
china=["China", "Asia",
       1439323776, 85470, 20, 4634, 0, 80628, 7, 208, 2, 59, 3, 160000000, 111163, 'Western Pacific']
worldometer.loc[len(worldometer)] = china

In [18]:
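# Some regions are written without spaces (e.g. "South-EastAsia"):
# insert a space before a capital letter that directly follows a word character.
# Missing regions were filled with 0 above and are left untouched.
worldometer["WHO Region"] = [
    re.sub(r"(\w)([A-Z])", r"\1 \2", region) if region != 0 else region
    for region in worldometer["WHO Region"]
]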
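
To see what the substitution does, here is a small standalone check. The first sample appears in the worldometer table above; the second is an assumed unspaced variant, and the helper name `space_camel_case` is only illustrative.

```python
import re

def space_camel_case(name):
    """Insert a space between a word character and the capital letter that follows it."""
    return re.sub(r"(\w)([A-Z])", r"\1 \2", name)

samples = ["South-EastAsia", "EasternMediterranean", "Western Pacific", "Americas"]
fixed = [space_camel_case(s) for s in samples]
print(fixed)
```

Names that already contain a space or hyphen before the capital are left alone, because neither character matches `\w`. The pattern is not safe for all-caps values: it would turn `"USA"` into `"U SA"`, so it should only be applied to the region column.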

In [19]:
w_regions = worldometer[worldometer["WHO Region"]!=0][["WHO Region", "Country/Region"]]

In [20]:
# read_csv handles the file itself, no explicit open/close required
full_grouped = pd.read_csv('datasets/full_grouped.csv')
full_grouped.fillna(0, inplace=True)
full_grouped.head()

Unnamed: 0,Date,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,WHO Region
0,2020-01-22,Afghanistan,0,0.0,0,0.0,0,0,0,Eastern Mediterranean
1,2020-01-22,Albania,0,0.0,0,0.0,0,0,0,Europe
2,2020-01-22,Algeria,0,0.0,0,0.0,0,0,0,Africa
3,2020-01-22,Andorra,0,0.0,0,0.0,0,0,0,Europe
4,2020-01-22,Angola,0,0.0,0,0.0,0,0,0,Africa


In [21]:
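# Match the whole name exactly; a regex substring match could rewrite any value containing "US"
full_grouped["Country/Region"] = full_grouped["Country/Region"].replace("US", "USA")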

I build a lookup table of country and WHO region pairs from both datasets and use it to fill in the WHO regions that are missing from the worldometer data.

In [22]:
new_df = pd.concat([full_grouped[["Country/Region", "WHO Region"]], w_regions]).drop_duplicates().reset_index(drop=True)
new_df.head()

Unnamed: 0,Country/Region,WHO Region
0,Afghanistan,Eastern Mediterranean
1,Albania,Europe
2,Algeria,Africa
3,Andorra,Europe
4,Angola,Africa


In [23]:
worldometer = worldometer.drop(columns="WHO Region")  # positional axis argument was removed in pandas 2.0
worldometer = pd.merge(worldometer, new_df, on='Country/Region')
worldometer["WHO Region"].unique()

array(['Americas', 'South-East Asia', 'Europe', 'Africa',
       'Eastern Mediterranean', 'Western Pacific'], dtype=object)
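
The fill step can be tried on a toy example. This is a minimal sketch with made-up frames (`grouped`, `known`, `stats` and the country "Atlantis" are illustrative, not part of the datasets); it follows the same concat, drop_duplicates and merge sequence as the cells above.

```python
import pandas as pd

# Toy stand-ins for the two region sources (values are illustrative only)
grouped = pd.DataFrame({
    "Country/Region": ["Albania", "Albania", "Angola"],
    "WHO Region": ["Europe", "Europe", "Africa"],
})
known = pd.DataFrame({
    "WHO Region": ["Europe", "Americas"],
    "Country/Region": ["Albania", "Brazil"],
})
stats = pd.DataFrame({
    "Country/Region": ["Albania", "Angola", "Brazil", "Atlantis"],
    "TotalCases": [10, 20, 30, 40],
})

# One row per (country, region) pair, whichever source it came from
lookup = (
    pd.concat([grouped[["Country/Region", "WHO Region"]], known])
    .drop_duplicates()
    .reset_index(drop=True)
)

# Default inner merge: countries without a known region are dropped
filled = pd.merge(stats, lookup, on="Country/Region")
print(filled)
```

Because the merge is an inner join, a country that has no region in either source ("Atlantis" here) disappears from the result, so the row count after the merge can be smaller than before it.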

In [24]:
# Note: despite the column name, this is recovered cases per 1M confirmed cases, not per 1M population
worldometer["Recovered/1M pop"] = worldometer["TotalRecovered"] / worldometer["TotalCases"] * 10**6
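
The ratio above divides by `TotalCases`, so it measures recovered cases per million confirmed cases. A quick check with the USA and India figures from the first table makes the difference from a per-population rate visible; the variable names are illustrative.

```python
# Figures taken from the worldometer table above
usa = {"Population": 331510400.0, "TotalCases": 7667817, "TotalRecovered": 4874445.0}
india = {"Population": 1383530000.0, "TotalCases": 6682073, "TotalRecovered": 5659110.0}

def recovered_per_million_cases(row):
    """Recovered cases per 1M confirmed cases (what the notebook column holds)."""
    return row["TotalRecovered"] / row["TotalCases"] * 10**6

def recovered_per_million_pop(row):
    """Recovered cases per 1M inhabitants (what the column name suggests)."""
    return row["TotalRecovered"] / row["Population"] * 10**6

usa_by_cases = recovered_per_million_cases(usa)
india_by_cases = recovered_per_million_cases(india)
print(usa_by_cases, recovered_per_million_pop(usa))
print(india_by_cases, recovered_per_million_pop(india))
```

A value computed this way can never exceed 1,000,000, which is consistent with the maximum reported by `describe()`.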

In [25]:
worldometer.shape

(183, 17)

In [26]:
worldometer.describe()

Unnamed: 0,Population,TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",Tot Cases/1M pop,Deaths/1M pop,TotalTests,Tests/1M pop,Recovered/1M pop
count,183.0,183.0,183.0,183.0,183.0,183.0,183.0,183.0,183.0,183.0,183.0,183.0,183.0,183.0
mean,42445450.0,194483.0,1103.84153,5705.786885,16.131148,143504.8,885.04918,36987.16,364.256831,5484.852459,129.575301,3685808.0,132335.7,758710.84664
std,152172200.0,840819.0,5149.942004,22098.068329,73.2074,640557.7,5934.60739,208587.3,1468.424926,7658.500215,209.713583,16047060.0,246020.6,222380.812223
min,33950.0,14.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0
25%,2759925.0,3132.5,0.0,54.0,0.0,1660.5,0.0,270.0,0.0,522.5,8.0,54791.0,11083.5,656473.910246
50%,9921338.0,14527.0,28.0,261.0,0.0,8308.0,2.0,1981.0,5.0,2698.0,40.0,296611.0,48693.0,826229.730187
75%,31356960.0,84583.0,445.5,1532.5,7.0,53200.5,273.0,10954.0,106.5,7684.0,142.5,1529686.0,148580.5,922938.720896
max,1439324000.0,7667817.0,59893.0,214884.0,886.0,5659110.0,75657.0,2578488.0,14200.0,45121.0,1237.0,160000000.0,1778274.0,1000000.0


In [27]:
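# Rows where the totals do not add up; .copy() lets the next cell add a column without a SettingWithCopyWarning
reported = worldometer["TotalRecovered"] + worldometer["TotalDeaths"] + worldometer["ActiveCases"]
dif = worldometer[worldometer["TotalCases"] != reported].copy()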

In [None]:
dif["difference"] = dif["TotalCases"] - (dif["TotalRecovered"]+dif["TotalDeaths"]+dif["ActiveCases"])

In [29]:
dif

Unnamed: 0,Country/Region,Continent,Population,TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,"Serious,Critical",Tot Cases/1M pop,Deaths/1M pop,TotalTests,Tests/1M pop,WHO Region,Recovered/1M pop,difference
5,Spain,Europe,46759558.0,852838,2099.0,32225.0,46.0,0.0,0.0,0.0,1580.0,18239.0,689.0,13689776.0,292770.0,Europe,0.0,820613.0
11,UK,Europe,67979397.0,515571,12594.0,42369.0,19.0,0.0,0.0,0.0,368.0,7584.0,623.0,25865851.0,380495.0,Europe,0.0,473202.0
27,Netherlands,Europe,17144850.0,140471,4579.0,6461.0,7.0,0.0,0.0,0.0,177.0,8193.0,377.0,2603166.0,151834.0,Europe,0.0,134010.0
41,Sweden,Europe,10115730.0,94283,0.0,5895.0,0.0,0.0,0.0,0.0,20.0,9320.0,583.0,1661484.0,164248.0,Europe,0.0,88388.0


In [30]:
dif["difference"].sum()

1516213.0
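
The consistency check can be reproduced by hand from the four rows listed above. This sketch copies their figures into a small frame; the names `rows`, `accounted` and `total_gap` are illustrative.

```python
import pandas as pd

# Figures copied from the four inconsistent rows shown above
rows = pd.DataFrame({
    "Country/Region": ["Spain", "UK", "Netherlands", "Sweden"],
    "TotalCases": [852838, 515571, 140471, 94283],
    "TotalDeaths": [32225.0, 42369.0, 6461.0, 5895.0],
    "TotalRecovered": [0.0, 0.0, 0.0, 0.0],
    "ActiveCases": [0.0, 0.0, 0.0, 0.0],
})

# Cases that are neither recovered, dead, nor active
accounted = rows["TotalRecovered"] + rows["TotalDeaths"] + rows["ActiveCases"]
rows["difference"] = rows["TotalCases"] - accounted
total_gap = rows["difference"].sum()

print(rows[["Country/Region", "difference"]])
print(total_gap)
```

In all four rows the recovered and active counts are 0. Since missing values were replaced with `fillna(0)` on import, the gap most likely reflects figures that were not reported rather than bookkeeping errors.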

## Happiness report datasets

In [31]:
differences_w = ['Trinidad and Tobago','UK','Congo','USA','Czechia','UAE','S. Korea','Palestine',]
differences_d = ['Trinidad & Tobago','United Kingdom','Congo (Brazzaville)','United States','Czech Republic',
                 'United Arab Emirates','South Korea','Palestinian Territories',]

In [33]:
# read_csv handles the file itself, no explicit open/close required
d19 = pd.read_csv('datasets/2019.csv')
d19.columns = ["rank", "Country/Region", "Score", "GDP", "Family", "Life expectancy", "Freedom", "Generosity", "Trust"]
d19.drop(columns=["rank"], inplace=True)
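# Map happiness-report names to worldometer names by exact match;
# with regex=True the parentheses in "Congo (Brazzaville)" are read as a regex group and never match
d19["Country/Region"] = d19["Country/Region"].replace(dict(zip(differences_d, differences_w)))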
d19.head()

Unnamed: 0,Country/Region,Score,GDP,Family,Life expectancy,Freedom,Generosity,Trust
0,Finland,7.769,1.34,1.587,0.986,0.596,0.153,0.393
1,Denmark,7.6,1.383,1.573,0.996,0.592,0.252,0.41
2,Norway,7.554,1.488,1.582,1.028,0.603,0.271,0.341
3,Iceland,7.494,1.38,1.624,1.026,0.591,0.354,0.118
4,Netherlands,7.488,1.396,1.522,0.999,0.557,0.322,0.298
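
One thing to watch when harmonizing country names: a name that contains regex metacharacters does not match itself when it is used as a pattern. The sketch below shows this for "Congo (Brazzaville)" and an exact-match alternative based on a dictionary; the `names` series and `mapping` are illustrative.

```python
import re
import pandas as pd

name = "Congo (Brazzaville)"

# As a regex, the parentheses form a group, so the pattern looks for "Congo Brazzaville"
unescaped_match = re.search(name, name)
# Escaping the metacharacters restores a literal match
escaped_match = re.search(re.escape(name), name)
print(unescaped_match, escaped_match)

# Exact, whole-value replacement with a dict avoids the problem entirely
names = pd.Series(["United Kingdom", "Congo (Brazzaville)", "Finland"])
mapping = {"United Kingdom": "UK", "Congo (Brazzaville)": "Congo"}
renamed = names.replace(mapping)
print(renamed.tolist())
```

Exact matching also avoids accidental partial replacements inside longer names.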


In [34]:
df = pd.merge(worldometer, d19, on='Country/Region')

In [35]:
df

Unnamed: 0,Country/Region,Continent,Population,TotalCases,NewCases,TotalDeaths,NewDeaths,TotalRecovered,NewRecovered,ActiveCases,...,Tests/1M pop,WHO Region,Recovered/1M pop,Score,GDP,Family,Life expectancy,Freedom,Generosity,Trust
0,USA,North America,3.315104e+08,7667817,30905.0,214884.0,273.0,4874445.0,25407.0,2578488.0,...,338637.0,Americas,635701.791005,6.892,1.433,1.457,0.874,0.454,0.280,0.128
1,India,Asia,1.383530e+09,6682073,59893.0,103600.0,886.0,5659110.0,75657.0,919363.0,...,57810.0,South-East Asia,846909.334873,4.015,0.755,0.765,0.588,0.498,0.200,0.085
2,Brazil,South America,2.129539e+08,4918022,2733.0,146417.0,42.0,4263208.0,0.0,508397.0,...,84056.0,Americas,866854.194634,6.300,1.004,1.439,0.802,0.390,0.099,0.086
3,Russia,Europe,1.459510e+08,1225889,10888.0,21475.0,117.0,982324.0,3181.0,222090.0,...,329168.0,Europe,801315.616667,5.648,1.183,1.452,0.726,0.334,0.082,0.031
4,Colombia,South America,5.102364e+07,855052,0.0,26712.0,0.0,761674.0,0.0,66666.0,...,76323.0,Americas,890792.606765,6.125,0.985,1.410,0.841,0.470,0.099,0.034
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
144,Mongolia,Asia,3.291861e+06,314,1.0,0.0,0.0,307.0,0.0,7.0,...,22371.0,Western Pacific,977707.006369,5.285,0.948,1.531,0.667,0.317,0.235,0.038
145,Bhutan,Asia,7.738120e+05,298,15.0,0.0,0.0,237.0,7.0,61.0,...,184166.0,South-East Asia,795302.013423,5.082,0.813,1.321,0.604,0.457,0.370,0.167
146,Cambodia,Asia,1.677868e+07,280,2.0,0.0,0.0,275.0,0.0,5.0,...,8589.0,Western Pacific,982142.857143,4.700,0.574,1.122,0.637,0.609,0.232,0.062
147,Laos,Asia,7.302776e+06,23,0.0,0.0,0.0,22.0,0.0,1.0,...,7546.0,Western Pacific,956521.739130,4.796,0.764,1.030,0.551,0.547,0.266,0.164
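
Since `pd.merge` defaults to an inner join, countries whose names differ between the two tables drop out silently. A minimal way to list them, assuming toy frames with made-up values (`left`, `right`, "Atlantis"), is an outer merge with `indicator=True`:

```python
import pandas as pd

# Toy frames; the numbers are made up, only the column names follow the notebook
left = pd.DataFrame({"Country/Region": ["USA", "Czechia", "Atlantis"], "TotalCases": [3, 2, 1]})
right = pd.DataFrame({"Country/Region": ["USA", "Czechia", "Finland"], "Score": [6.9, 6.8, 7.8]})

# The _merge column records whether each key was found on the left, the right, or both
check = pd.merge(left, right, on="Country/Region", how="outer", indicator=True)
only_left = check.loc[check["_merge"] == "left_only", "Country/Region"].tolist()
only_right = check.loc[check["_merge"] == "right_only", "Country/Region"].tolist()

print("missing from happiness data:", only_left)
print("missing from pandemic data:", only_right)
```

Running the same check on `worldometer` and `d19` would show which names still need an entry in `differences_w` and `differences_d`.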


# Color palettes

I use the divik package to get a consistent color theme across all palettes, regardless of their length.

In [86]:
from ast import literal_eval as make_tuple
from divik._inspect.color import make_colormap

In [87]:
def make_palette(length):
    """Return a list of `length` colors taken from the divik colormap."""
    # keep the second element of each colormap entry, which holds the color
    return [entry[1] for entry in make_colormap(list(range(length)))]
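
`make_colormap` lives in a private divik module, so it may not be available in every environment. As a fallback, here is a minimal sketch that builds a palette of any length from evenly spaced hues using only the standard library. It is a different technique from divik's colormap, and `make_hue_palette` with its `saturation` and `value` parameters is a hypothetical helper.

```python
import colorsys

def make_hue_palette(length, saturation=0.6, value=0.85):
    """Return `length` hex colors with evenly spaced hues (hypothetical helper, not divik)."""
    palette = []
    for i in range(length):
        # Spread the hues evenly around the color wheel
        r, g, b = colorsys.hsv_to_rgb(i / length, saturation, value)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

six_colors = make_hue_palette(6)
three_colors = make_hue_palette(3)
print(six_colors)
print(three_colors)
```

Every palette starts at the same hue and shares saturation and brightness, which keeps plots visually related even when the number of categories differs.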

## Regions

This palette provides one color for each WHO region:

In [28]:
worldometer["WHO Region"].unique()

array(['Americas', 'South-East Asia', 'Europe', 'Africa',
       'Eastern Mediterranean', 'Western Pacific'], dtype=object)

In [29]:
region_palette = make_palette(len(worldometer["WHO Region"].unique()))

## Variable group

In [None]:
var_palette = make_palette(3)

# Analyse correlation

In [207]:
# Columns to exclude: identifiers, categorical columns, absolute counts and the overall score
to_drop = ['Country/Region', 'Continent', 'Population', 'TotalCases', 'NewCases',
           'TotalDeaths', 'NewDeaths', 'TotalRecovered', 'NewRecovered',
           'ActiveCases', 'Serious,Critical',
           'TotalTests', 'WHO Region', 'Score']
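
The `to_drop` list suggests that the correlation is computed on the remaining per-capita and happiness columns. A minimal sketch of that step, assuming a Pearson correlation via `DataFrame.corr` and a toy frame with made-up numbers (`toy` and `to_drop_toy` are illustrative), could look like this:

```python
import pandas as pd

# Toy frame with made-up numbers; only the column names follow the notebook
toy = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "Score": [7.0, 6.0, 5.0, 4.0],
    "GDP": [1.0, 2.0, 3.0, 4.0],
    "Deaths/1M pop": [20.0, 40.0, 60.0, 80.0],
    "Tests/1M pop": [400.0, 300.0, 200.0, 100.0],
})
to_drop_toy = ["Country/Region", "Score"]

# Pairwise Pearson correlation between the remaining numeric columns
corr = toy.drop(columns=to_drop_toy).corr()
print(corr)
```

On the real data the same call would be `df.drop(columns=to_drop).corr()`; the resulting matrix can be passed to a heatmap such as `px.imshow`.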

# Analyse the score influence on the pandemic progress

# Focus on significant correlations