![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banner_Top_06.06.18.jpg?raw=true)  


## Plotting the logarithmic scale on cumulative COVID-19 cases per country

In this notebook we will have an opportunity to plot the cumulative number of confirmed COVID-19 cases per country, the cumulative number of deaths per country, and plot the logarithmic scale correspondingly. 


### What is a logarithmic scale?

A logarithmic scale is a nonlinear scale often used when analyzing a large range of quantities. Instead of increasing in equal increments, each interval is increased by a factor of the base of the logarithm. Typically, a base 10 and base $e$ scale are used. In this notebook, we will use base 10. 

Let's say you have a variable $y$ which [grows exponentially](https://en.wikipedia.org/wiki/Exponential_growth), that is, 

on the first day, $y=10$, 

on the second day, $y = 100$, 

on the third day, $y = 1000$...

What this means is that every day, the value of y will increase by a factor of ten.

### Why logarithmic scale?

Using a logarithmic scale is useful when the largest numbers in the data are hundreds or thousands of times larger than the smallest numbers. 

In our previous example, 

on the first day, $log_{(10)} (y) = 1$, 

on the second day, $log_{(10)} (y) = 2$, 

and on the third day, $log_{(10)} (y) = 3$.

### COVID-19 number of confirmed cases grow exponentially

Many articles, [including this one](https://ourworldindata.org/coronavirus) have noted that the number of confirmed cases is growing exponentially - this means that every day the number of confirmed cases is increasing by a factor "x". This number varies across each country. In this notebook we will explore how this is the case. 

Press the >| Run button to run the next cell.


In [1]:
import requests as r
import pandas as pd
from pandas.io.json import json_normalize
import cufflinks as cf
import numpy as np
import plotly.graph_objs as go
#com/mand to display graphics correctly in a Jupyter notebook
cf.go_offline()
print("Sucess!")

Sucess!


We will begin by downloading the data via an [API](https://en.wikipedia.org/wiki/Application_programming_interface) developed by [Omar Laraqui](https://github.com/Omaroid).

The API gets the latest data from [the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)](https://github.com/CSSEGISandData/COVID-19)

Using an API lets us more easily obtain the latest data and helps us parse it using [JSON](https://en.wikipedia.org/wiki/JSON) format. 

Press the >| Run button to get the latest data. 

In [2]:
# Get the latest data
# Confirmed
try:
    API_response_confirmed = r.get("https://covid19api.herokuapp.com/confirmed")
    data = API_response_confirmed.json() # Check the JSON Response Content documentation below
    confirmed_df = json_normalize(data,record_path=["locations"])
    
    print("Confirmed cases download was successful!")
except:
    print("Error: check GitHub is functioning appropriately, check https://covid19api.herokuapp.com/ is not down, check fields were not renamed")
# Deaths
try:
    API_response_death = r.get("https://covid19api.herokuapp.com/deaths")
    data1 = API_response_death.json() # Check the JSON Response Content documentation below
    death_df = json_normalize(data1,record_path=["locations"])
    
    print("Death cases download was successful!")
except:
    print("Error: check GitHub is functioning appropriately, check https://covid19api.herokuapp.com/ is not down, check fields were not renamed")
# Latest
try:
    API_summary = r.get("https://covid19api.herokuapp.com/latest")
    data2 = API_summary.json()
    summary  = json_normalize(data2)
    print("Latest cases download was successful!")
except:
    print("Error: check GitHub is functioning appropriately, check https://covid19api.herokuapp.com/ is not down, check fields were not renamed")


pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead



Confirmed cases download was successful!



pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead



Death cases download was successful!
Latest cases download was successful!



pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead



Now that we have downloaded the data, let's take a look at our dataframes:

In [3]:
print("Confirmed cases, first 5 entries")
confirmed_df.head(5)

Confirmed cases, first 5 entries


Unnamed: 0,country,country_code,latest,province,coordinates.latitude,coordinates.longitude,history.1/22/20,history.1/23/20,history.1/24/20,history.1/25/20,...,history.3/8/20,history.3/9/20,history.4/1/20,history.4/2/20,history.4/3/20,history.4/4/20,history.4/5/20,history.4/6/20,history.4/7/20,history.4/8/20
0,Afghanistan,AF,444,,33.0,65.0,0,0,0,0,...,4,4,237,273,281,299,349,367,423,444
1,Albania,AL,400,,41.1533,20.1683,0,0,0,0,...,0,2,259,277,304,333,361,377,383,400
2,Algeria,DZ,1572,,28.0339,1.6596,0,0,0,0,...,19,20,847,986,1171,1251,1320,1423,1468,1572
3,Andorra,AD,564,,42.5063,1.5218,0,0,0,0,...,1,1,390,428,439,466,501,525,545,564
4,Angola,AO,19,,-11.2027,17.8739,0,0,0,0,...,0,0,8,8,8,10,14,16,17,19


In [4]:
print("Death cases, first 5 entries")
death_df.head(5)

Death cases, first 5 entries


Unnamed: 0,country,country_code,latest,province,coordinates.latitude,coordinates.longitude,history.1/22/20,history.1/23/20,history.1/24/20,history.1/25/20,...,history.3/8/20,history.3/9/20,history.4/1/20,history.4/2/20,history.4/3/20,history.4/4/20,history.4/5/20,history.4/6/20,history.4/7/20,history.4/8/20
0,Afghanistan,AF,14,,33.0,65.0,0,0,0,0,...,0,0,4,6,6,7,7,11,14,14
1,Albania,AL,22,,41.1533,20.1683,0,0,0,0,...,0,0,15,16,17,20,20,21,22,22
2,Algeria,DZ,205,,28.0339,1.6596,0,0,0,0,...,0,0,58,86,105,130,152,173,193,205
3,Andorra,AD,23,,42.5063,1.5218,0,0,0,0,...,0,0,14,15,16,17,18,21,22,23
4,Angola,AO,2,,-11.2027,17.8739,0,0,0,0,...,0,0,2,2,2,2,2,2,2,2


In [5]:
print("Summary data, latest cases")
summary

Summary data, latest cases


Unnamed: 0,confirmed,deaths,recovered
0,1511104,88338,328661


### Data cleanup

We need to manipulate the data a bit to remove the "history." and "coordinates." from the dates. 

In [6]:
# Flattening the data 
flat_confirmed = json_normalize(data=data['locations'])
flat_death = json_normalize(data=data1['locations'])
flat_confirmed.set_index('country', inplace=True)
flat_death.set_index('country', inplace=True)

# Define a function to drop the history.prefix
# Create function drop_prefix
def drop_prefix(self, prefix):
    self.columns = self.columns.str.lstrip(prefix)
    return self

# Call function
pd.core.frame.DataFrame.drop_prefix = drop_prefix

# Define function which removes history. prefix, and orders the column dates in ascending order
def order_dates(flat_df):

    # Drop prefix
    flat_df.drop_prefix('history.')
    flat_df.drop_prefix("coordinates.")
    # Isolate dates columns
    flat_df.iloc[:,6:].columns = pd.to_datetime(flat_df.iloc[:,6:].columns)
    # Transform to datetim format
    sub = flat_df.iloc[:,6:]
    sub.columns = pd.to_datetime(sub.columns)
    # Sort
    sub2 = sub.reindex(sorted(sub.columns), axis=1)
    sub3 = flat_df.reindex(sorted(flat_df.columns),axis=1).iloc[:,-5:]
    # Concatenate
    final = pd.concat([sub2,sub3], axis=1, sort=False)
    return final

# Apply function
final_confirmed = order_dates(flat_confirmed)

final_deaths = order_dates(flat_death)




pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead


pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead



In [7]:
print("Cleaned up dataframe for confirmed cases")
final_confirmed.head(5)

Cleaned up dataframe for confirmed cases


Unnamed: 0_level_0,2020-01-23 00:00:00,2020-01-24 00:00:00,2020-01-25 00:00:00,2020-01-26 00:00:00,2020-01-27 00:00:00,2020-01-28 00:00:00,2020-01-29 00:00:00,2020-01-30 00:00:00,2020-01-31 00:00:00,2020-02-01 00:00:00,...,2020-04-04 00:00:00,2020-04-05 00:00:00,2020-04-06 00:00:00,2020-04-07 00:00:00,2020-04-08 00:00:00,latest,latitude,longitude,province,untry_code
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,0,0,0,0,0,0,0,0,0,0,...,299,349,367,423,444,444,33.0,65.0,,AF
Albania,0,0,0,0,0,0,0,0,0,0,...,333,361,377,383,400,400,41.1533,20.1683,,AL
Algeria,0,0,0,0,0,0,0,0,0,0,...,1251,1320,1423,1468,1572,1572,28.0339,1.6596,,DZ
Andorra,0,0,0,0,0,0,0,0,0,0,...,466,501,525,545,564,564,42.5063,1.5218,,AD
Angola,0,0,0,0,0,0,0,0,0,0,...,10,14,16,17,19,19,-11.2027,17.8739,,AO


In [8]:
print("Cleaned up dataframe for deaths")
final_deaths.head(5)

Cleaned up dataframe for deaths


Unnamed: 0_level_0,2020-01-23 00:00:00,2020-01-24 00:00:00,2020-01-25 00:00:00,2020-01-26 00:00:00,2020-01-27 00:00:00,2020-01-28 00:00:00,2020-01-29 00:00:00,2020-01-30 00:00:00,2020-01-31 00:00:00,2020-02-01 00:00:00,...,2020-04-04 00:00:00,2020-04-05 00:00:00,2020-04-06 00:00:00,2020-04-07 00:00:00,2020-04-08 00:00:00,latest,latitude,longitude,province,untry_code
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,0,0,0,0,0,0,0,0,0,0,...,7,7,11,14,14,14,33.0,65.0,,AF
Albania,0,0,0,0,0,0,0,0,0,0,...,20,20,21,22,22,22,41.1533,20.1683,,AL
Algeria,0,0,0,0,0,0,0,0,0,0,...,130,152,173,193,205,205,28.0339,1.6596,,DZ
Andorra,0,0,0,0,0,0,0,0,0,0,...,17,18,21,22,23,23,42.5063,1.5218,,AD
Angola,0,0,0,0,0,0,0,0,0,0,...,2,2,2,2,2,2,-11.2027,17.8739,,AO


## Visualizing the data

In the next few cells we will manipulate the data one more time to visualize. 

In [9]:
# We will plot the log projection along with the cumulative number of cases
def plot_log_function(country,final_df,type_case):
    
    latest_arr = []
    date_arr = []
    for item in final_df[final_df.index==country].iloc[:,0:-5].columns:
        date_arr.append(item)
        latest_arr.append(final_df[final_df.index==country][item].sum())

    final_confirmed_red = pd.DataFrame({"Date":date_arr,"CumulativeTotal":latest_arr})

    
    
    x = final_confirmed_red.Date
    y = final_confirmed_red.CumulativeTotal

    npy = np.array(y.to_list())
    l_y = np.log10(npy, where=0<npy, out=np.nan*npy)


    trace1 = go.Bar(x=x,y=y,name=country)
    trace2 = go.Scatter(x=x,y=l_y,name='Log ' + str(country),yaxis='y2')
    layout = go.Layout(
        title= ('Number of ' + str(type_case) + ' cases for ' + str(country)),
        yaxis=dict(title='Total Number of ' + str(type_case) + ' cases',\
                   titlefont=dict(color='blue'), tickfont=dict(color='blue')),
        yaxis2=dict(title='Logarithmic curve', titlefont=dict(color='red'), \
                    tickfont=dict(color='red'), overlaying='y', side='right'),
        showlegend=False)
    fig = go.Figure(data=[trace1,trace2],layout=layout)
    fig.update_yaxes(showgrid=True)
    fig.show()   
    


#### Exercise

Run the cell below to get the list of countries.

Pick a country you are interested in from the list. 


In [10]:
countries_regions = final_confirmed.index.unique().tolist()

countries_regions

['Afghanistan',
 'Albania',
 'Algeria',
 'Andorra',
 'Angola',
 'Antigua and Barbuda',
 'Argentina',
 'Armenia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Benin',
 'Bhutan',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Brazil',
 'Brunei',
 'Bulgaria',
 'Burkina Faso',
 'Cabo Verde',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Central African Republic',
 'Chad',
 'Chile',
 'China',
 'Colombia',
 'Congo (Brazzaville)',
 'Congo (Kinshasa)',
 'Costa Rica',
 "Cote d'Ivoire",
 'Croatia',
 'Diamond Princess',
 'Cuba',
 'Cyprus',
 'Czechia',
 'Denmark',
 'Djibouti',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Eritrea',
 'Estonia',
 'Eswatini',
 'Ethiopia',
 'Fiji',
 'Finland',
 'France',
 'Gabon',
 'Gambia',
 'Georgia',
 'Germany',
 'Ghana',
 'Greece',
 'Guatemala',
 'Guinea',
 'Guyana',
 'Haiti',
 'Holy See',
 'Honduras',
 'Hungary',
 'Iceland',
 'India',
 'Indonesia',
 'Iran',
 'Iraq'

Once you picked a country, enter it in the cell below and run. 

Remember to use quotation marks ""!

"Canada" is provided as an example.

In [11]:
country = "Canada"

plot_log_function(country,final_confirmed,"confirmed")
plot_log_function(country,final_deaths,"death")

### Observations

Try multiple countries and compare the red curve with the logarithmic values against the actual values. 

For example: try China, US, Canada, Italy. How does the number of actual cases change? Remember that we are computing log base 10 - which means that the log scale tells us by how many factors of 10 the number of confirmed and deaths have changed over time. 



[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)