![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Callysto’s Weekly Data Visualization


## Plotting Daily and Cumulative COVID-19 Cases per Country on a Logarithmic Scale

### Recommended grade level: 9-12

Callysto's Weekly Data Visualization is a learning resource for teachers and students that helps to develop data literacy through the interpretation of graphs and the application of the data science analysis process. The steps of this process are listed here and applied to the weekly topic. 

1. **Question** - What are you trying to answer?
2. **Gather** - Find the data source(s) you will need. 
3. **Organize** - Arrange the data so that you can easily explore it.
4. **Explore** - Examine the data to find evidence to answer your question. 
5. **Visualize** - Create a visualization that represents your data evidence.
6. **Communicate** - Explain how your data visualization helps answer your question. Cite your data source. 

In this notebook we will explore the logarithmic scale (also known as the log scale) and how it can help us determine whether a country is experiencing exponential growth in the number of reported COVID-19 cases.


### The number of confirmed COVID-19 cases can grow exponentially

Since the beginning of the COVID-19 outbreak, health authorities have been tracking the total number of confirmed cases on a daily basis. During the first few months of the outbreak, the data show exponential growth, followed by a decrease in the number of daily cases.

Logarithmic scale charts can help show how fast the number of cases is increasing. 
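To make this concrete, here is a minimal sketch using made-up numbers (a hypothetical outbreak that starts with 5 cases and doubles every 3 days, not real COVID-19 data). It compares the size of each step on a linear scale with the size of each step after taking the base-10 logarithm.

```python
import math

# Hypothetical outbreak (made-up numbers): 5 cases on day 0, doubling every 3 days.
initial_cases = 5
doubling_time_days = 3

days = list(range(0, 31, 3))
cases = [initial_cases * 2 ** (d / doubling_time_days) for d in days]
log_cases = [math.log10(c) for c in cases]

# On a linear scale, the step between consecutive readings keeps growing.
linear_steps = [b - a for a, b in zip(cases, cases[1:])]
# On a log scale, every step has the same size: log10(2) per doubling.
log_steps = [b - a for a, b in zip(log_cases, log_cases[1:])]

for d, c, lc in zip(days, cases, log_cases):
    print(f"day {d:2d}: cases = {c:8.0f}   log10(cases) = {lc:.2f}")
```

Because the log steps are all equal, exponential growth appears as a straight line on a log-scale chart, and a steeper line means faster growth.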

## Question:

How can using a log scale help us identify when the number of COVID-19 cases is increasing exponentially?

Let's take a look at some data[1].

[1] Data source: Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), https://systems.jhu.edu/, which created and maintains the repository at https://github.com/CSSEGISandData/COVID-19.

## Gather & Organize

The file `download_and_parse_data.py` contains Python code that imports the Python programming libraries we need to gather and organize the data to answer our question. The code also manipulates, cleans and presents the data so that it is easier to work with. 
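As a rough illustration of the kind of organizing involved, the sketch below uses a tiny made-up table (not real data, and not the script itself). It assumes a wide layout like the one in the JHU CSSE files, with one row per province and one column per date holding cumulative counts. The variable names are chosen for this example only.

```python
import pandas as pd

# Made-up cumulative counts in a wide layout:
# one row per province, one column per date.
raw = pd.DataFrame(
    {
        "Province/State": ["Alberta", "Ontario", "Quebec"],
        "Country/Region": ["Canada", "Canada", "Canada"],
        "3/1/20": [0, 15, 1],
        "3/2/20": [0, 18, 1],
        "3/3/20": [1, 20, 2],
        "3/4/20": [2, 22, 4],
    }
)

# Keep one country, then put dates on the rows and provinces on the columns.
canada = raw[raw["Country/Region"] == "Canada"]
by_province = canada.drop(columns="Country/Region").set_index("Province/State").T

# Cumulative total for the whole country on each date.
cumulative_total = by_province.sum(axis=1)

# Daily new cases are the day-to-day differences of the cumulative total.
daily_new = cumulative_total.diff()

print(pd.DataFrame({"cumulative": cumulative_total, "daily_new": daily_new}))
```

The first value of `daily_new` is missing (`NaN`) because there is no earlier day to subtract from.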

You can see the full script [here](https://github.com/callysto/data-viz-of-the-week/blob/main/covid-19-visualizations/scripts/download_and_parse_data.py).

Run the cell below to execute this code. 

In [None]:
%run -i ./scripts/download_and_parse_data.py
print("Success!")

## Explore & Visualize

The code below will be used to help us find evidence to answer our question. This can involve looking at data in table format, applying math and statistics, and creating different types of visualizations.

In this case, we will start by looking at the daily number of confirmed COVID-19 cases in Canada. 

Run the cell below to generate a visualization. 

In [None]:
try:
    # Select country
    country = "Canada"
    # Subset data to extract information for Canada
    by_prov = final_confirmed[final_confirmed.index == country].set_index("province").T.iloc[:-4,]
    # Sum across provinces to get the cumulative national total for each date
    by_prov["TotalDailyCase"] = by_prov.sum(axis=1)

    # Differencing the cumulative counts gives the number of new cases each day
    non_cumulative_cases = by_prov.diff(axis=0)
    # Create figure
    trace3 = go.Scatter(x=non_cumulative_cases.index, y=non_cumulative_cases["TotalDailyCase"])
    layout = go.Layout(
        title="Daily number of COVID-19 confirmed cases in " + str(country),
        yaxis=dict(title=dict(text="Daily Number of Reported Infections", font=dict(color="blue")),
                   tickfont=dict(color="blue")),
        showlegend=False)
    fig = go.Figure(data=[trace3], layout=layout)
    # Display figure
    fig.show()
except NameError:
    # Sanity check: these names are only defined once the setup script has run
    print("WARNING")
    print("Please ensure you have run code cell '%run -i ./scripts/download_and_parse_data.py' in this notebook.")

## Communicate

We can see that in Canada, between March and May 2020, COVID-19 spread exponentially. Between May and July, the number of daily cases decreased. Between August and October 2020, the number of infections increased rapidly once more.

Recall that on a logarithmic scale, each step up the y-axis (the dependent variable) multiplies the value by a set factor. This factor is usually, though not always, 10. Estimating the growth factor of the outbreak itself (the average number of new infections deriving from an existing infection, known as R0) is challenging for a number of reasons. The article ["Why R0 Is Problematic for Predicting COVID-19 Spread"](https://www.the-scientist.com/features/why-r0-is-problematic-for-predicting-covid-19-spread-67690) by Katarina Zimmer (Jul 13, 2020) outlines some of those reasons.
 
Even if we don't know the factor by which the number of cases grows, the logarithmic scale is a great way to see the rate of change in new confirmed infections. When we plot confirmed COVID-19 cases on a logarithmic scale, we can see when the rate of infection starts to level off. This leveling off signals that exponential growth has stopped. We'll use a factor of 10 to identify when the rate of infection is increasing exponentially.
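One way to see leveling off in numbers is to look at how much the base-10 logarithm changes from one day to the next. The sketch below uses made-up daily counts, and the cutoff of 0.1 is an arbitrary value chosen for this illustration, not a standard threshold.

```python
import math

# Made-up daily case counts: fast growth at first, then a plateau.
daily_cases = [10, 20, 40, 80, 160, 170, 175, 176]

log_cases = [math.log10(c) for c in daily_cases]

# The change in log10 between consecutive days is the log of the day-over-day ratio.
log_changes = [b - a for a, b in zip(log_cases, log_cases[1:])]

for day, change in enumerate(log_changes, start=1):
    ratio = 10 ** change  # day-over-day multiplication factor
    # 0.1 is an arbitrary cutoff used only for this example.
    trend = "growing fast" if change > 0.1 else "leveling off"
    print(f"day {day}: log10 change = {change:.3f}, ratio = {ratio:.2f} ({trend})")
```

While the counts double, the log changes stay near log10(2), which is about 0.3. Once the counts plateau, the log changes shrink toward zero and the log curve flattens.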

Let's take a look at this using a graph.


## Explore & Visualize

Run the cell below to display the log curve of COVID-19 cases.

<b>Notes: </b>

a) Enter the name of a country in the text box. This will display the log curve for the raw (daily) or cumulative number of cases for that country.

b) Toggle the "Get cumulative results" checkbox to see how the log scale changes when we plot the daily number of cases versus the cumulative number of cases.

c) To see results for a different country, use the backspace or delete key to remove the current selection, then type the name of a new country (the drop-down menu can help you find it).


In [None]:
try:
    display(tab)
except NameError:
    # Sanity check: `tab` is only defined once the setup script has run
    print("WARNING")
    print("Please ensure you have run code cell '%run -i ./scripts/download_and_parse_data.py' in this notebook.")

## Communicate

The plots above contain the daily total number of confirmed cases and deaths (marked in blue), along with the logarithmic scale associated with those cases (marked in red). 

As an example, if we enter "Canada" in the text box and leave the "Get cumulative results" checkbox unchecked, we see that during the first few months after cases started being reported, the number of confirmed infections grew exponentially.

Using a factor of 10 (similar to the Richter scale), we see that during the summer months of 2020 the daily counts stayed between level 2 and level 3 on the logarithmic scale. This means that between 100 and 1,000 cases were reported each day during the summer. During the fall months of 2020, the daily counts stayed between level 3 and level 4, which means that between 1,000 and 10,000 cases were reported each day.

This indicates that the number of daily cases increased by roughly a factor of 10 from summer to fall, and thus the number of confirmed COVID-19 cases in Canada re-entered exponential growth.
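The arithmetic behind these levels can be checked with a short sketch. The helper names `log_level_range` and `log_level` are made up for this example, and the sample counts are illustrative rather than real data.

```python
import math

def log_level_range(level):
    """Return the range of counts between log10 level `level` and `level + 1`."""
    return 10 ** level, 10 ** (level + 1)

def log_level(count):
    """Return the whole-number log10 level that a positive count falls in."""
    return math.floor(math.log10(count))

print(log_level_range(2))  # counts between level 2 and level 3
print(log_level_range(3))  # counts between level 3 and level 4

# Made-up daily counts for illustration only.
for count in [250, 900, 1500, 8000]:
    print(count, "-> level", log_level(count))
```

Moving up one whole level on a base-10 log scale always means multiplying the count by 10, whichever level you start from.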

___

### Further questions

Use the widget above to explore how the number of COVID-19 cases behaves on the log scale for countries in different parts of the world. Can you identify countries whose growth is exponential?

Explore the cumulative number of cases for different countries. These plots can also indicate when the number of infections is growing exponentially. Identify countries whose log scale indicates exponential growth. 
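When a cumulative curve looks like a straight line on the log scale, its slope tells you how fast cases are doubling. Here is a minimal sketch with made-up cumulative counts. It assumes growth is roughly exponential across the whole window and uses only the first and last values.

```python
import math

# Made-up cumulative counts, one value per day.
cumulative = [100, 126, 159, 200, 252, 317, 400]

log_cumulative = [math.log10(c) for c in cumulative]

# Average slope of the log curve across the window (log10 units per day).
n_days = len(cumulative) - 1
slope = (log_cumulative[-1] - log_cumulative[0]) / n_days

# If growth is exponential, cases double every log10(2) / slope days.
doubling_time = math.log10(2) / slope
print(f"slope = {slope:.3f} per day, doubling time is about {doubling_time:.1f} days")
```

A steeper straight line on the log plot gives a larger slope and a shorter doubling time. A flattening line means the doubling time is getting longer.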


[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)