<a href="https://colab.research.google.com/github/Gichere/visualizing-covid19-worldwide/blob/main/visualizing_Covid_19_using_python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Installing & Importing modules

Installing all the modules we'll need for this project.


In [1]:
# Install Plotly
!pip install Plotly==4.12

# Install Dash
!pip install dash
!pip install dash-html-components
!pip install dash-core-components
!pip install dash-table

# Install Pycountry
!pip install pycountry

Collecting Plotly==4.12
  Downloading https://files.pythonhosted.org/packages/a6/66/af86e9d9bf1a3e4f2dabebeabd02a32e8ddf671a5d072b3af2b011efea99/plotly-4.12.0-py2.py3-none-any.whl (13.1MB)
     |████████████████████████████████| 13.1MB 324kB/s
Installing collected packages: Plotly
  Found existing installation: plotly 4.4.1
    Uninstalling plotly-4.4.1:
      Successfully uninstalled plotly-4.4.1
Successfully installed Plotly-4.12.0
Collecting dash
  Downloading https://files.pythonhosted.org/packages/dd/17/55244363969638edd1151de0ea4aa10e6a7849b42d7d0994e3082514e19d/dash-1.18.1.tar.gz (74kB)
     |████████████████████████████████| 81kB 3.5MB/s
Collecting flask-compress
  Downloading https://files.pythonhosted.org/packages/b2/7a/9c4641f975fb9daaf945dc39da6a52fd5693ab3bbc2d53780eab3b5106f4/Flask_Compress-1.8.0-py3-none-any.whl
Collecting dash_renderer==1.8.3
  Downloading https://files.pythonhosted.org/packages/72/fe/59a322edb128ad15205002c7b81e3f5e580f6791c4a

In [2]:
# Standard library
import os.path
import json
import subprocess
from collections import namedtuple

# Third-party
import numpy as np
import pandas as pd
import plotly.express as px
import pycountry as pc
import requests
from requests.adapters import HTTPAdapter
from requests.exceptions import RequestException
# Import Retry from urllib3 directly; requests.packages.urllib3 is a legacy alias
from urllib3.util.retry import Retry

## Downloading The Datasets

I will be using the 2020 World Bank Human Capital Index and the COVID-19 Worldometer Daily Snapshots datasets, both obtained from Kaggle and re-hosted on GitHub so they can be read directly from the URLs below.


In [3]:
url_covid = 'https://raw.githubusercontent.com/Gichere/datasets/main/worldometers_snapshots_October11_to_October12.csv'
url_hci = 'https://raw.githubusercontent.com/Gichere/datasets/main/hci_MaleFemale_september_2020.csv'

## Loading the datasets

In [4]:
covid_df = pd.read_csv(url_covid,usecols=[0, 1, 2, 3, 4, 5, 6])
hci_df = pd.read_csv(url_hci,usecols=['Country Name', 'Income Group', 'Expected Years of School'])

## Data preprocessing and wrangling

In [5]:
display(covid_df.head())
display(hci_df.head())

|   | Date | Country | Population | Total Tests | Total Cases | Total Deaths | Total Recovered |
|---|---|---|---|---|---|---|---|
| 0 | 2020-10-11 | USA | 331552784 | 118486898.0 | 7991998 | 219695.0 | 5128162.0 |
| 1 | 2020-10-11 | India | 1383826697 | 86877242.0 | 7119300 | 109184.0 | 6146427.0 |
| 2 | 2020-10-11 | Brazil | 212986866 | 17900000.0 | 5094979 | 150506.0 | 4470165.0 |
| 3 | 2020-10-11 | Russia | 145952340 | 50781349.0 | 1298718 | 22597.0 | 1020442.0 |
| 4 | 2020-10-11 | Colombia | 51035485 | 4173863.0 | 911316 | 27834.0 | 789787.0 |


|   | Country Name | Income Group | Expected Years of School |
|---|---|---|---|
| 0 | Afghanistan | Low income | 8.9 |
| 1 | Albania | Upper middle income | 12.9 |
| 2 | Algeria | Lower middle income | 11.8 |
| 3 | Angola | Lower middle income | 8.1 |
| 4 | Antigua and Barbuda | High income | 13.0 |


Looking at the datasets:
* In **covid_df**, there are two rows for every country, one for each snapshot date (2020-10-11 and 2020-10-12).
* The country names differ between the two datasets: for example, the U.S. is named *'USA'* in **covid_df** and *'United States'* in **hci_df**.
* The column names contain capital letters and spaces (e.g. *'Total Cases'*, *'Expected Years of School'*), so they can't be accessed as attributes such as `df.total_cases`.

## Cleaning The Data

To deal with the column names, we'll write a function that maps each column name to its snake_case form (lowercase, trimmed, with spaces replaced by underscores). The resulting dictionary can be passed to `rename` on both datasets.


In [6]:
def to_snakecase(cols):
  # Map each column name to snake_case: lowercase, trimmed, spaces -> underscores
  return {col: col.lower().strip().replace(' ', '_') for col in cols}
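A quick check of what the mapping looks like for a few of the column names shown above. The function is repeated here so the snippet runs on its own; the output dictionary is exactly what `DataFrame.rename(..., axis=1)` receives.

```python
def to_snakecase(cols):
    # Map each column name to snake_case: lowercase, trimmed, spaces -> underscores
    return {col: col.lower().strip().replace(' ', '_') for col in cols}

mapping = to_snakecase(['Country Name', 'Income Group', 'Expected Years of School'])
print(mapping)
# {'Country Name': 'country_name', 'Income Group': 'income_group', 'Expected Years of School': 'expected_years_of_school'}
```

Because `strip()` runs before `replace()`, stray leading or trailing spaces are removed instead of turning into underscores.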

To deal with the different country names, we'll define a function that uses the Pycountry library to map each name (or ISO code) to Pycountry's standard country name, so both datasets use the same format. Names it can't match come back as `None`.

In [7]:
def normalize_country(data):
  # Try each lookup in priority order and return Pycountry's standard name
  for field in ('official_name', 'name', 'alpha_3', 'alpha_2'):
    country = pc.countries.get(**{field: data})
    if country is not None:
      return country.name
  # No lookup matched this name
  return None
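The lookup cascade can be tried out without Pycountry. The sketch below swaps Pycountry for a tiny hypothetical in-memory registry (`REGISTRY` and `lookup` are illustrative stand-ins, not Pycountry's data or API) and applies the same priority order: official name, then common name, then alpha-3, then alpha-2. Collecting the inputs that come back as `None` is a quick way to see which countries would drop out of the merge later.

```python
from collections import namedtuple

Country = namedtuple('Country', ['name', 'official_name', 'alpha_2', 'alpha_3'])

# Hypothetical stand-in for pycountry.countries (illustration only)
REGISTRY = [
    Country('United States', 'United States of America', 'US', 'USA'),
    Country('India', 'Republic of India', 'IN', 'IND'),
]

def lookup(field, value):
    """Return the first record whose `field` equals `value`, else None."""
    return next((c for c in REGISTRY if getattr(c, field) == value), None)

def normalize_country(name):
    # Same priority order as the notebook's function
    for field in ('official_name', 'name', 'alpha_3', 'alpha_2'):
        match = lookup(field, name)
        if match is not None:
            return match.name
    return None

names = ['USA', 'United States', 'IN', 'Atlantis']
print({n: normalize_country(n) for n in names})
unmatched = [n for n in names if normalize_country(n) is None]
print(unmatched)  # ['Atlantis']
```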

We then run the to_snakecase function over both datasets and rename the country_name column in hci_df to country, so both datasets share the key we'll merge on later. In covid_df, we keep only the most recent snapshot (2020-10-12) and drop the date column. Finally, we apply the normalize_country function to both datasets so that the country names match across them.

In [8]:
covid_df.rename(to_snakecase(covid_df.columns), axis=1, inplace=True)
# Keep only the latest snapshot; .copy() avoids pandas' SettingWithCopyWarning on the edits below
covid_df = covid_df[covid_df.date == '2020-10-12'].copy()
covid_df.drop('date', axis=1, inplace=True)
covid_df['country'] = covid_df['country'].apply(normalize_country)
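Hard-coding `'2020-10-12'` works for this snapshot but has to be edited by hand if the CSV is refreshed. A minimal alternative, assuming the `date` column holds ISO `YYYY-MM-DD` strings (which sort correctly as text), selects the most recent date automatically. The toy frame below is an illustrative stand-in for `covid_df`.

```python
import pandas as pd

# Toy stand-in for covid_df after the snake_case rename (illustrative values)
snapshots = pd.DataFrame({
    'date': ['2020-10-11', '2020-10-11', '2020-10-12', '2020-10-12'],
    'country': ['USA', 'India', 'USA', 'India'],
    'total_cases': [100, 80, 110, 90],
})

# ISO dates compare correctly as strings, so max() is the latest snapshot
latest = snapshots['date'].max()
latest_df = (snapshots.loc[snapshots['date'] == latest]
             .drop(columns='date')
             .reset_index(drop=True))
print(latest_df)
```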

In [9]:
hci_df.rename(to_snakecase(hci_df.columns), axis=1, inplace=True)
hci_df.rename({'country_name':'country'}, axis=1, inplace=True)
hci_df.country = hci_df.country.apply(normalize_country)

## Merging The Datasets

With those problems handled, we can merge both datasets into a single dataset using `pd.merge`. By default it performs an inner join, so only countries present in both datasets are kept.


In [10]:
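# Drop names Pycountry couldn't normalize (None) before merging;
# pandas would otherwise match the null keys from both sides with each other
data = pd.merge(left=covid_df.dropna(subset=['country']),
                right=hci_df.dropna(subset=['country']),
                on='country')
data = data.reset_index(drop=True)
data.head()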

|   | country | population | total_tests | total_cases | total_deaths | total_recovered | income_group | expected_years_of_school |
|---|---|---|---|---|---|---|---|---|
| 0 | United States | 331552784 | 119497624.0 | 8037789 | 220011.0 | 5184615.0 | High income | 12.9 |
| 1 | India | 1383826697 | 87872093.0 | 7173565 | 109894.0 | 6224792.0 | Lower middle income | 11.1 |
| 2 | Brazil | 212986866 | 17900000.0 | 5103408 | 150709.0 | 4495269.0 | Upper middle income | 11.9 |
| 3 | Colombia | 51035485 | 4202181.0 | 919083 | 27985.0 | 798396.0 | Upper middle income | 12.9 |
| 4 | Spain | 46759952 | 14590713.0 | 918223 | 33124.0 | NaN | High income | 13.0 |
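
Because the merge is an inner join, a country whose name failed to normalize, or that appears in only one dataset, disappears without any message. One way to audit this, sketched here on toy frames (the case counts come from the table above; "Atlantis" and "Narnia" are made-up unmatched names), is an outer join with `indicator=True`, which adds a `_merge` column labelling each row `both`, `left_only` or `right_only`.

```python
import pandas as pd

# Toy stand-ins for covid_df and hci_df
covid = pd.DataFrame({'country': ['United States', 'India', 'Atlantis'],
                      'total_cases': [8037789, 7173565, 5]})
hci = pd.DataFrame({'country': ['United States', 'India', 'Narnia'],
                    'income_group': ['High income', 'Lower middle income', 'Low income']})

audit = pd.merge(covid, hci, on='country', how='outer', indicator=True)

# Rows that found no partner in the other frame
unmatched = audit.loc[audit['_merge'] != 'both', ['country', '_merge']]
print(unmatched)
```

Running the same audit on the real `covid_df` and `hci_df` lists the names that might need a manual mapping.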


## Plotting The Data

For this project, we'll create the following three plots and then combine them into a single interactive dashboard with Plotly Dash.


## Interest Variable VS Education Level

A scatter plot of every country, with expected years of school on the x-axis and the chosen interest variable (divided by population) on the y-axis. Marker size shows population and color shows income group.

In [11]:
data.describe()

|   | population | total_tests | total_cases | total_deaths | total_recovered | expected_years_of_school |
|---|---|---|---|---|---|---|
| count | 141.0 | 133.0 | 141.0 | 133.0 | 137.0 | 141.0 |
| mean | 48536220.0 | 4520382.0 | 247011.5 | 7288.097744 | 190088.5 | 11.311348 |
| std | 171992100.0 | 18806670.0 | 1001286.0 | 26221.801904 | 788947.5 | 2.375533 |
| min | 72037.0 | 96.0 | 2.0 | 1.0 | 24.0 | 4.2 |
| 25% | 3986972.0 | 155202.0 | 4905.0 | 93.0 | 3237.0 | 10.0 |
| 50% | 10231670.0 | 522112.0 | 26073.0 | 509.0 | 15975.0 | 12.2 |
| 75% | 33100420.0 | 2088941.0 | 108831.0 | 2179.0 | 80714.0 | 13.1 |
| max | 1439324000.0 | 160000000.0 | 8037789.0 | 220011.0 | 6224792.0 | 13.9 |
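
The `count` row already hints at missing data: `population`, `total_cases` and `expected_years_of_school` have 141 non-null values, but `total_tests` and `total_deaths` only 133 and `total_recovered` 137. A direct way to list the gaps per column is `isna().sum()`, shown here on a toy frame (Spain's missing `total_recovered` comes from the merged table above; "Country X" is a hypothetical row). The same call works on `data` unchanged.

```python
import pandas as pd

toy = pd.DataFrame({
    'country': ['United States', 'Spain', 'Country X'],
    'total_tests': [119497624.0, 14590713.0, None],
    'total_recovered': [5184615.0, None, 1000.0],
})

missing = toy.isna().sum()       # number of missing values per column
print(missing[missing > 0])      # only the columns that have gaps
```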


In [12]:
fig = px.scatter(data,
                 x='expected_years_of_school',
                 # Cases per person, so small and large countries are comparable
                 y=data.total_cases / data.population,
                 size='population',
                 color='income_group',
                 hover_name='country',
                 template='plotly_dark',
                 labels={'expected_years_of_school': 'Expected Years of School',
                         'y': 'Total Cases per Capita'},
                 title='Total Cases VS Education Level')
fig.show()

Notice how most countries are hard to see, because their markers are tiny next to those of the most populous countries. To counter that, the dashboard will include a slider that sets the maximum population of the countries displayed.
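Under the hood, that slider is just a boolean mask on `population`. A minimal sketch, with a hypothetical `max_pop` standing in for the slider value and a toy frame standing in for `data` (populations taken from the merged table above):

```python
import pandas as pd

toy = pd.DataFrame({
    'country': ['United States', 'India', 'Brazil', 'Colombia', 'Spain'],
    'population': [331552784, 1383826697, 212986866, 51035485, 46759952],
})

max_pop = 100_000_000  # hypothetical slider value
visible = toy[toy['population'] <= max_pop]
print(visible['country'].tolist())  # ['Colombia', 'Spain']
```

With the giants filtered out, the remaining markers are sized relative to each other and become readable.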

## Interest Variable Per Country

A bar chart of the chosen interest variable for each country, with bars colored by income group.

In [13]:
fig = px.bar(data, 
              x='country', 
              y='total_cases', 
              color='income_group',
              template='plotly_dark',
              labels={'country':'Country',
                      'total_cases':'Total Cases',
                      'total_tests':'Total Tests',
                      'total_deaths':'Total Deaths',
                      'total_recovered':'Total Recovered'},
              title='Total Cases per Country')
fig.show()

## Interest Variable Per Income Group

Like the previous chart, but showing the sum of the interest variable across all countries in each income group.

In [14]:
# numeric_only=True sums only the numeric columns, leaving out text columns such as country
fig = px.bar(data.groupby(by='income_group').sum(numeric_only=True).reset_index(),
             x='income_group',
             y='total_cases',
             color='income_group',
             template='plotly_dark',
             labels={'income_group':'Income Group',
                     'total_cases':'Total Cases',
                     'total_tests':'Total Tests',
                     'total_deaths':'Total Deaths',
                     'total_recovered':'Total Recovered'},
             title='Total Cases By Income Group')
fig.show()
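Calling `.sum()` on the whole grouped frame also tries to add up non-numeric columns such as `country`; depending on the pandas version it either concatenates the country names into one long string or drops the column. An alternative to `numeric_only=True` is to select the column of interest before summing. A sketch on a toy frame (case counts and income groups from the merged table above):

```python
import pandas as pd

toy = pd.DataFrame({
    'country': ['United States', 'Spain', 'India', 'Brazil'],
    'income_group': ['High income', 'High income',
                     'Lower middle income', 'Upper middle income'],
    'total_cases': [8037789, 918223, 7173565, 5103408],
})

# Sum only total_cases per group; as_index=False keeps income_group as a regular column
by_income = toy.groupby('income_group', as_index=False)[['total_cases']].sum()
print(by_income)
```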

## The Dash App
### The Final Visualization

At last, we put everything together into a final Dash app that uses Plotly Dash's layout and interactivity features; the interactivity is implemented through callbacks.

To display the dashboard, we'll save it to a .py file with the %%writefile cell magic at the top of the following cell, and then run that file as a separate Python process.
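`%%writefile` and `!python` are IPython conveniences. Outside a notebook, a rough plain-Python equivalent writes the source with `pathlib` and launches it with `subprocess`. The sketch below uses a trivial stand-in script (`APP_SOURCE` is illustrative, not the real app) and a temporary directory so it can run anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for the app source; the real cell writes the full Dash app instead
APP_SOURCE = "print('app started')\n"

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / 'my_dash_app.py'
    script.write_text(APP_SOURCE)  # what %%writefile does
    # What `!python my_dash_app.py` does: run the file with the current interpreter
    result = subprocess.run([sys.executable, str(script)],
                            capture_output=True, text=True, check=True)

print(result.stdout.strip())  # app started
```

The real server never exits on its own, so `subprocess.run` would block until it is stopped; `subprocess.Popen` would instead return immediately and leave the server running in the background.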

To make things easier, I've also uploaded the processed dataset to GitHub (see the URL in the `pd.read_csv` call in the cell below). That way the app can simply read that .csv file instead of repeating all of the preprocessing code from above.


In [27]:
%%writefile my_dash_app.py
import dash
from dash.dependencies import Output, Input
import dash_core_components as dcc
import dash_html_components as html
import plotly.express as px
import pandas as pd


external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

colors = {
    'background': '#FFFFFF',
    'text': '#7FDBFF'
}

# Reading The Dataset 

data = pd.read_csv('https://raw.githubusercontent.com/Gichere/datasets/main/covid_worldwide.csv')

# Defining App Layout 

app.layout = html.Div(style={'backgroundColor': colors['background']}, children=[
      html.H1('Visualizing Covid-19 Worldwide', style={'textAlign':'center'}),
      html.Div([
          html.Div([ 
              html.Label('Population'),
              dcc.Slider(
                  id='population-slider',
                  min=data.population.min(),
                  max=data.population.max(),
                  marks={
                    72037 : '72K',
                    80000000 : '80M',
                    150000000 : '150M',
                    300000000 : '300M',
                    700000000 : '700M',
                    1000000000 : '1B',
                    1439323776 : '1.4B' 
                  },
                  value=data.population.min(),
                  step=100000000,
                  updatemode='drag'
              )
          ]),
          html.Div([
              html.Label('Interest Variable'),
              dcc.Dropdown(
                  id='interest-variable',
                  options=[{'label':'Total Cases', 'value':'total_cases'},
                           {'label': 'Total Tests', 'value':'total_tests'},
                           {'label': 'Total Deaths', 'value':'total_deaths'},
                           {'label': 'Total Recovered', 'value':'total_recovered'}],
                  value='total_cases' 
              )
          ])
      ], style = {'width':'90%','margin':'auto'}),      
      html.Div([ 
              dcc.Graph(
                  id='covid-vs-edu',
              ),    
          html.Div(
              dcc.Graph(
                  id='covid-vs-income',
              )
      , style = {'width': '50%', 'display': 'inline-block'}),
          html.Div( 
              dcc.Graph(
                  id='covid-vs-income2',
              )
      , style = {'width': '50%', 'display': 'inline-block'})
      ], style = {'width':'90%','margin':'auto'})
])

def scatter_y_label(var):
  # Axis label for each interest variable once it is expressed as a share of the population
  labels = {'total_cases': 'Percentage Infected',
            'total_tests': 'Percentage Tested',
            'total_deaths': 'Percentage Dead',
            'total_recovered': 'Percentage Recovered'}
  return labels.get(var)

# Variable VS Education Level Scatter Plot

@app.callback(Output('covid-vs-edu', 'figure'),
              [Input('population-slider', 'value'),
               Input('interest-variable', 'value')])
def update_scatter(selected_pop, interest_var):
  # Keep only countries whose population is at or below the slider value
  filtered = data[data.population <= selected_pop]
  fig = px.scatter(filtered,
                  x='expected_years_of_school',
                  # Multiply by 100 so the values match the "Percentage ..." axis label
                  y=filtered[interest_var] / filtered.population * 100,
                  size='population',
                  color='income_group',
                  hover_name='country',
                  template='plotly_dark',
                  labels={'expected_years_of_school':'Expected Years of School',
                          'y': scatter_y_label(interest_var)},
                  title=f'{scatter_y_label(interest_var)} VS Education Level')
  fig.update_layout(transition_duration=500)
  return fig

# Variable Per Income Group Bar Chart

@app.callback(Output('covid-vs-income', 'figure'),
              [Input('population-slider', 'value'),
               Input('interest-variable', 'value')])
def update_income_bar(selected_pop, interest_var):
  # Filter by population, then total the numeric columns within each income group
  filtered = (data[data.population <= selected_pop]
              .groupby(by='income_group')
              .sum(numeric_only=True)
              .reset_index())
  fig = px.bar(filtered,
                  x='income_group',
                  y=interest_var,
                  color='income_group',
                  template='plotly_dark',
                  labels={'income_group':'Income Group',
                          'total_cases':'Total Cases',
                          'total_tests':'Total Tests',
                          'total_deaths':'Total Deaths',
                          'total_recovered':'Total Recovered'},
                  # e.g. 'total_deaths' -> 'Total Deaths By Income Group'
                  title=f"{interest_var.replace('_', ' ').title()} By Income Group")
  return fig

# Variable Per Country Bar Chart

@app.callback(Output('covid-vs-income2', 'figure'),
              [Input('population-slider', 'value'),
               Input('interest-variable', 'value')])
def update_country_bar(selected_pop, interest_var):
  # Keep only countries whose population is at or below the slider value
  filtered = data[data.population <= selected_pop]
  fig = px.bar(filtered,
                x='country',
                y=interest_var,
                color='income_group',
                template='plotly_dark',
                labels={'country':'Country',
                        'total_cases':'Total Cases',
                        'total_tests':'Total Tests',
                        'total_deaths':'Total Deaths',
                        'total_recovered':'Total Recovered'},
                # e.g. 'total_tests' -> 'Total Tests per Country'
                title=f"{interest_var.replace('_', ' ').title()} per Country")
  return fig

if __name__ == '__main__':
    app.run_server(debug=False, use_reloader=False)

Overwriting my_dash_app.py


In [16]:
!python my_dash_app.py

Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "my_dash_app" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
^C

Colab runs the notebook on a remote machine, so the app's http://127.0.0.1:8050/ address is not reachable from your own browser. The next cells download ngrok and use it to expose port 8050 through a public URL.


In [18]:
def download_ngrok():
    # Download and unzip the ngrok binary only if it isn't already in the working directory
    if not os.path.isfile('ngrok'):
        !wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
        !unzip -o ngrok-stable-linux-amd64.zip
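`!wget` and `!unzip` only work inside IPython. If the helper ever needs to run as a plain script, a rough standard-library equivalent is sketched below. The `dest_dir` and `url` parameters are illustrative additions (the URL default is the one used above), and the explicit `chmod` is there because `zipfile` has historically not restored Unix permission bits on extraction, unlike the `unzip` command.

```python
import os
import urllib.request
import zipfile

NGROK_ZIP_URL = 'https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip'

def download_ngrok(dest_dir='.', url=NGROK_ZIP_URL):
    """Fetch and unpack the ngrok binary into dest_dir unless it is already there."""
    binary = os.path.join(dest_dir, 'ngrok')
    if os.path.isfile(binary):
        return binary  # already present: skip the download entirely
    zip_path = os.path.join(dest_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, zip_path)  # replaces !wget
    with zipfile.ZipFile(zip_path) as zf:      # replaces !unzip -o
        zf.extractall(dest_dir)
    os.chmod(binary, 0o755)  # make sure the extracted binary is executable
    return binary
```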

In [19]:
Response = namedtuple('Response', ['url', 'error'])

def get_tunnel():
    try:
        # Start ngrok in the background, forwarding public traffic to the Dash port (8050)
        tunnel_proc = subprocess.Popen(['./ngrok', 'http', '8050'])

        # ngrok's local API may not be up yet, so retry the connection a few times
        session = requests.Session()
        retry = Retry(connect=3, backoff_factor=0.5)
        adapter = HTTPAdapter(max_retries=retry)
        session.mount('http://', adapter)

        res = session.get('http://localhost:4040/api/tunnels')
        res.raise_for_status()

        tunnel_cfg = json.loads(res.text)
        tunnel_url = tunnel_cfg['tunnels'][0]['public_url']

        return Response(url=tunnel_url, error=None)
    except (RequestException, KeyError, IndexError) as e:
        # IndexError: no tunnel registered yet; KeyError: unexpected response shape
        return Response(url=None, error=str(e))
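`get_tunnel` reads the tunnel list once, right after starting ngrok; if the tunnel has not been registered yet, the list may still be empty. A minimal polling sketch follows. `fetch_tunnels` is a hypothetical injection point (any zero-argument callable returning the parsed JSON from `http://localhost:4040/api/tunnels`), which also lets the logic be tested without running ngrok. Preferring the `https://` URL when both schemes are listed is a design choice here, not something the notebook does.

```python
import time

def wait_for_public_url(fetch_tunnels, attempts=10, delay=0.5):
    """Poll ngrok's tunnel list until a public URL appears; return None on timeout.

    Example (assumes ngrok is running locally):
        wait_for_public_url(lambda: requests.get('http://localhost:4040/api/tunnels').json())
    """
    for _ in range(attempts):
        tunnels = fetch_tunnels().get('tunnels', [])
        if tunnels:
            urls = [t['public_url'] for t in tunnels]
            # Prefer the HTTPS endpoint when ngrok exposes both schemes
            return next((u for u in urls if u.startswith('https://')), urls[0])
        time.sleep(delay)  # tunnel not registered yet; try again
    return None
```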

In [20]:
download_ngrok()

--2020-12-12 18:45:25--  https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
Resolving bin.equinox.io (bin.equinox.io)... 35.174.46.144, 52.22.13.178, 52.54.205.131, ...
Connecting to bin.equinox.io (bin.equinox.io)|35.174.46.144|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13773305 (13M) [application/octet-stream]
Saving to: ‘ngrok-stable-linux-amd64.zip’


2020-12-12 18:45:26 (19.1 MB/s) - ‘ngrok-stable-linux-amd64.zip’ saved [13773305/13773305]

Archive:  ngrok-stable-linux-amd64.zip
  inflating: ngrok                   


In [None]:
out = get_tunnel()
print(out)
!python my_dash_app.py

Response(url='http://973c14492edc.ngrok.io', error=None)
Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "my_dash_app" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
127.0.0.1 - - [12/Dec/2020 19:01:36] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [12/Dec/2020 19:01:36] "GET /_dash-component-suites/dash_renderer/react@16.v1_8_3m1607798580.14.0.min.js HTTP/1.1" 200 -
127.0.0.1 - - [12/Dec/2020 19:01:36] "GET /_dash-component-suites/dash_renderer/polyfill@7.v1_8_3m1607798580.8.7.min.js HTTP/1.1" 200 -
127.0.0.1 - - [12/Dec/2020 19:01:36] "GET /_dash-component-suites/dash_renderer/react-dom@16.v1_8_3m1607798580.14.0.min.js HTTP/1.1" 200 -
127.0.0.1 - - [12/Dec/2020 19:01:37] "GET /_dash-component-suites/dash_renderer/prop-types@15.v1_8_3m1607798580.7.2.min.js HTTP/1.1" 200 -
127.0.0.1 - - [12/Dec/2020 19:01:37] "[3