## To use this slideshow:
- Run all of the cells, using the menu item: **Kernel** then **Restart & Run All**
- Return to this top cell
- Click the "Slideshow" menu item above, which looks like this:
![](images/SlideIcon.png)

![](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banner_Top_06.06.18.jpg?raw=true)



<img src="Graphing-real-world data_Notebook-Banner.jpg" alt="Drawing" style="width: 1600px;"/>

#### http://tinyurl.com/yd8q2lk6
    
May 14, 2020 with Laura G. Funderburk




In [None]:
import requests
import json
url = 'https://health.canada.ca/en/epathogen/search'

resp = requests.get(url)
data = resp.json()

In [None]:
import pandas as pd

# Collect one list per field from the JSON records
pathogen_cont_level = []
pathogen_name = []
pathogen_type = []
human_risk_group = []
animal_risk_group = []
containment_considerations = []
for record in data['results'].values():
    pathogen_cont_level.append(record['ContainmentLevel'])
    pathogen_name.append(record['name'])
    pathogen_type.append(record['type'])
    human_risk_group.append(record['HumanRiskGroup'])
    animal_risk_group.append(record['AnimalRiskGroup'])
    containment_considerations.append(record['ConsiderationsForContainment'])

# Assemble the lists into a DataFrame, one column per field
dictionary_struc = {
                    "Pathogen-Containment-Level":pathogen_cont_level,
                    "Pathogen-Considerations-For-Containment":containment_considerations,
                    "Pathogen-Name":pathogen_name,
                    "Pathogen-Type":pathogen_type,
                    "Human-Risk-Group":human_risk_group,
                    "Animal-Risk-Group":animal_risk_group
}
pathogen_df = pd.DataFrame(dictionary_struc)
# Keep a single row per pathogen name
unique_pathogen_name = pathogen_df.drop_duplicates(subset=["Pathogen-Name"])
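
As an aside, the same kind of table can be built more compactly. The sketch below assumes the records form a dictionary of dictionaries, as the loop above suggests; `sample_results` and its two records are invented for illustration and are not taken from the ePATHogen data.

```python
import pandas as pd

# Hypothetical stand-in for data['results']; both records are invented for illustration
sample_results = {
    "1": {"name": "Pathogen A", "type": "Virus",
          "HumanRiskGroup": "RG2", "ContainmentLevel": "CL2"},
    "2": {"name": "Pathogen B", "type": "Bacteria",
          "HumanRiskGroup": "RG3", "ContainmentLevel": "CL3"},
}

# orient="index" turns each outer key into a row and each inner key into a column
sample_df = pd.DataFrame.from_dict(sample_results, orient="index")

# Rename the raw field names to the column labels used in this notebook
sample_df = sample_df.rename(columns={
    "name": "Pathogen-Name",
    "type": "Pathogen-Type",
    "HumanRiskGroup": "Human-Risk-Group",
    "ContainmentLevel": "Pathogen-Containment-Level",
})
print(sample_df)
```

Whether this is preferable to the explicit loop is a matter of taste: the loop shows each step, while `from_dict` leaves the bookkeeping to pandas.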

## Overview
- What is data?
- What is data visualization and what makes it useful?
- Examples of data visualization
- An example using open data: ePATHogen Risk Group Dataset 

## What is data?

- We refer to "data" as a given collection of information

- Information can be numeric or categorical

- For example: counting the number of people aged 30 or older in a city produces both numerical data (the number of people) and categorical data (the age group and the city)
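
To make the distinction concrete, here is a small sketch using pandas. The table `people`, its column names, and every number in it are made up for illustration.

```python
import pandas as pd

# Hypothetical survey rows, invented for illustration only
people = pd.DataFrame({
    "city": ["Calgary", "Calgary", "Edmonton", "Edmonton"],
    "age_group": ["under 30", "30 or older", "under 30", "30 or older"],
    "number_of_people": [120, 340, 95, 280],
})

# Numeric columns hold quantities we can add up; the rest hold labels (categories)
numeric_columns = people.select_dtypes(include="number").columns.tolist()
categorical_columns = people.select_dtypes(exclude="number").columns.tolist()

# Total number of people aged 30 or older across both cities
total_30_plus = people.loc[people["age_group"] == "30 or older", "number_of_people"].sum()

print("Numeric:", numeric_columns)
print("Categorical:", categorical_columns)
print("People aged 30 or older:", total_30_plus)
```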


## What is data visualization?

- The process of representing data in a graphical or pictorial format


<center><img src="SinglePlot.png" alt="Drawing" style="width: 500px;"/></center>


## What makes data visualization useful?

- Data visualizations can be helpful when dealing with large amounts of data 

- Data visualizations can reveal patterns 

- Application in multiple areas of interest:

    - Census data (key population metrics)
    
    - Grade distribution of a class
    
    - Economics
    
    - Biology (phylogenetics for example)

## An example using Python and Jupyter notebooks

Let's take class grades and plot frequency of grades. 

Suppose we have 20 students and we want to know how the group did overall.  Staring at the table is not necessarily the best way to identify a pattern. Let's plot. 

| Student Number | Final Grade for course | Student Number | Final Grade for course | 
| -| -| -| -|
| 100| A |110| A |
| 101| B |111| B |
| 102| C |112| C |
| 103| C |113| F |
| 104| C |114| F |
| 105| B |115| B |
| 106| B |116| C |
| 107| B |117| F |
| 108| B |118| B |
| 109| B |119| A |


In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
student_number = [str(100 +i) for i in range(20)]
grade = ["A","B","C","C","C","B","B","B","B","B","A","B","C","F","F","B","C","F","B","A"]

plt.title("Grade frequency for class")
plt.xlabel("Grade")
plt.ylabel("Frequency")
plt.hist(grade);

plt.show()
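
The histogram shows the pattern at a glance. To read off the exact counts behind the bars, one option is to tally the grades with `collections.Counter` from the Python standard library. This is a minimal sketch that repeats the grade list from the table so it can run on its own.

```python
from collections import Counter

# Same grades as in the table above, in student-number order
grade = ["A","B","C","C","C","B","B","B","B","B",
         "A","B","C","F","F","B","C","F","B","A"]

# Counter maps each distinct grade to the number of times it appears
grade_counts = Counter(grade)

for letter in sorted(grade_counts):
    print(letter, grade_counts[letter])
```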

## An example using open data: ePATHogen Risk Group Dataset

Publisher - Current Organization Name: Public Health Agency of Canada

#### Questions: 

1. We want to understand what pathogens are being studied by Canadian researchers

2. We want to understand how these pathogens are classified using the following metrics: 

    2.1 Pathogen Containment Level
    
    2.2 Human Risk Group
    
###### Source https://health.canada.ca/en/epathogen 
###### Licence https://open.canada.ca/en/open-government-licence-canada 

## Disclaimer: 

The exercises in this notebook are purely for exploratory and educational purposes. The information here should not be taken as official advice or recommendations at the individual, provincial, national, or international level. 

## Human Risk Group (RG)

Classification of biological material based on:

- pathogenicity (does it cause disease)
- virulence (how severe is the disease)
- risk of spread
- availability of effective treatments

It describes, among other things, the risk to the health of individuals and the public. 

Source: https://www.canada.ca/en/public-health/services/canadian-biosafety-standards-guidelines/second-edition.html#a2.2 

## Containment Level (CL) 

In order to study and handle pathogens, researchers developed standards for handling infectious material safely in a laboratory. 

There are four containment levels ranging from a basic laboratory (containment level CL1), to the highest level of containment (containment level CL4).


Source: https://www.canada.ca/en/public-health/services/canadian-biosafety-standards-guidelines/second-edition.html#a2.2 

### Containment Level (CL)

Minimum physical containment and operational practice requirements for handling infectious material or toxins safely in laboratory, large scale production, and animal work environments. There are four containment levels ranging from a basic laboratory (CL1) to the highest level of containment (CL4). 

The Canadian Biosafety Standard (CBS) describes three containment levels regulated by the Public Health Agency of Canada (PHAC) and the Canadian Food Inspection Agency (CFIA), ranging from the lowest level permitted to work with pathogens, toxins, and other regulated infectious material (CL2) to the highest level of containment (CL4). A containment zone itself is a physical area that meets the requirements for a specified containment level. 

### Risk group (RG)

The classification of biological material based on its inherent characteristics, including pathogenicity, virulence, risk of spread, and availability of effective prophylactic or therapeutic treatments, that describes the risk to the health of individuals and the public as well as the health of animals and the animal population.

## Categorizing Risk Groups and Containment Levels

| Risk Group| Containment Level | Risk to Individual | Risk to Community |
| -| - | - | -| 
| RG1 | CL1|Low | Low |
| RG2| CL2 | Moderate | Low |
| RG3| CL3 | High | Low |
| RG4| CL4 | High | High |

In general, the containment level and risk group of the pathogen are the same (e.g., RG2 pathogens are handled at CL2); however, there are exceptions. 
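
The general rule from the table can be written as a small lookup, which makes it easy to spot exceptions. This is only a sketch: the helper `follows_general_rule` and the pairs in `examples` are hypothetical and are not drawn from the dataset.

```python
# Usual pairing of risk group and containment level, taken from the table above
EXPECTED_CL = {"RG1": "CL1", "RG2": "CL2", "RG3": "CL3", "RG4": "CL4"}

def follows_general_rule(risk_group, containment_level):
    """Return True when the containment level matches the usual pairing."""
    return EXPECTED_CL.get(risk_group) == containment_level

# Hypothetical (risk group, containment level) pairs, invented for illustration
examples = [("RG2", "CL2"), ("RG3", "CL3"), ("RG3", "CL2")]

# Keep only the pairs that break the general rule
exceptions = [pair for pair in examples if not follows_general_rule(*pair)]
print(exceptions)
```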

Let's explore data!

#### 2.3.1.1 Risk Group 1 (RG1; low individual and community risk)
A microorganism, nucleic acid, or protein that is either a) not capable of causing human or animal disease; or b) capable of causing human or animal disease, but unlikely to do so. RG1 organisms capable of causing disease are considered pathogens that pose a low risk to the health of individuals or animals, and a low risk to public health and the animal population. RG1 pathogens can be opportunistic and may pose a threat to immunocompromised individuals. Neither of the RG1 subsets is regulated by the PHAC or the CFIA due to the low risk to public health and the animal population.

#### 2.3.1.2 Risk Group 2 (RG2; moderate individual risk, low community risk)
A pathogen or toxin that poses a moderate risk to the health of individuals or animals, and a low risk to public health and the animal population. These pathogens are able to cause serious disease in a human or animal but are unlikely to do so. Effective treatment and preventive measures are available and the risk of spread of diseases caused by these pathogens is low. Examples of RG2 human pathogens are included in Schedule 2 of the HPTA.

#### 2.3.1.3 Risk Group 3 (RG3; high individual risk, low community risk)
A pathogen that poses a high risk to the health of individuals or animals, and a low risk to public health. These pathogens are likely to cause serious disease in a human or animal. Effective treatment and preventive measures are usually available and the risk of spread of disease caused by these pathogens is low for the public. The risk of spread to the animal population, however, can range from low to high depending on the pathogen. Examples of RG3 human pathogens are included in Schedule 3 of the HPTA.

#### 2.3.1.4 Risk Group 4 (RG4; high individual risk, high community risk)
A pathogen that poses a high risk to the health of individuals or animals and a high risk to public health. These pathogens are likely to cause serious disease in a human or animal which can often lead to death. Effective treatment and preventive measures are not usually available and the risk of spread of disease caused by these pathogens is high for the public. The risk of spread of disease to the animal population, however, ranges from low to high depending on the pathogen. Examples of RG4 human pathogens are included in Schedule 4 of the HPTA.

In [None]:
import requests
import json
url = 'https://health.canada.ca/en/epathogen/search'

resp = requests.get(url)
data = resp.json()

# ... plus a tiny bit of manipulation 

unique_pathogen_name.head()

### Time for a quick poll.... 

What human risk group do you think SARS-CoV-2, the virus responsible for causing COVID-19 in humans, belongs to?

In [None]:
# Let's find SARS-CoV-2
print("Finding SARS-CoV-2")
display(unique_pathogen_name[unique_pathogen_name["Pathogen-Name"].str.contains("SARS-CoV-2")])

# Display pathogens in different risk levels
#print("Finding Pathogens in RG4")
#display(unique_pathogen_name[unique_pathogen_name["Human-Risk-Group"]=="RG4"].head())

## Activity: 
Pick a pathogen and find a picture - share something of interest with the group

<center><img src="./images/Zika-Virus-20-M-1.jpg" alt="Drawing" style="width: 200px;"/></center>
<center>Tick-borne encephalitis virus</center>

A human viral infectious disease involving the central nervous system. It is transmitted by the bite of infected ticks, found in woodland habitats. 
There is no specific antiviral therapy for TBE; treatment relies on supportive management. Meningitis, encephalitis, or meningomyelitis requires hospitalisation and supportive care based on syndrome severity. Classified as RG4 and CL4. 

## Reflect: 

Why do you think SARS-CoV-2 was categorized as RG3 and not RG4? 


| Risk Group| Containment Level | Risk to Individual | Risk to Community |
| -| - | - | -| 
| RG1 | CL1|Low | Low |
| RG2| CL2 | Moderate | Low |
| RG3| CL3 | High | Low |
| RG4| CL4 | High | High |

## Questions of interest

- What else can we learn about the data? 
- How many different kinds of pathogens are there?
- How are they classified according to RG and CL?

In [None]:
#!pip install plotly==4.4.1 --user
import plotly.express as px
category = "Pathogen-Type"
#display(unique_pathogen_name[category].value_counts())
display(unique_pathogen_name.head(0))
px.histogram(unique_pathogen_name,category,
            title="Distribution of data points in dataset").update_xaxes(categoryorder= "total ascending")

## RG/CL Classification for each Pathogen

Most pathogens are classified as RG1 and RG2. 

Can we identify how different pathogens are classified? 

How do bacteria differ from viruses in the way they are classified? 

We will use a heatmap. Heatmaps allow us to visualize the magnitude of a phenomenon as color in two dimensions.
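
The color of each heatmap cell encodes a count: how many rows share a given pair of categories. As a rough sketch of that idea, `pd.crosstab` computes the same kind of table of counts. The `sample` rows below are invented for illustration and are not taken from the ePATHogen data.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only
sample = pd.DataFrame({
    "Pathogen-Type": ["Bacteria", "Bacteria", "Virus", "Virus", "Fungus"],
    "Human-Risk-Group": ["RG2", "RG2", "RG2", "RG3", "RG2"],
})

# Count how many rows fall into each (pathogen type, risk group) pair;
# a density heatmap colors each cell according to a count like this
counts = pd.crosstab(sample["Pathogen-Type"], sample["Human-Risk-Group"])
print(counts)
```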

In [None]:
# Use a density heatmap
# Pathogen-Containment-Level
px.density_heatmap(unique_pathogen_name,'Human-Risk-Group','Pathogen-Type', 
                   color_continuous_scale=px.colors.sequential.Viridis,
                  title="Density Heatmap: Human Risk Group vs Pathogen Type")\
.update_xaxes(categoryorder= "total ascending").update_yaxes(categoryorder= "total ascending")

In [None]:
# Use pandas to remove bacteria from the data set
# Pathogen-Containment-Level
all_but_bacteria = unique_pathogen_name[unique_pathogen_name["Pathogen-Type"]!="Bacteria"]
# Plot density heatmap
px.density_heatmap(all_but_bacteria,'Human-Risk-Group','Pathogen-Type', 
                   color_continuous_scale=px.colors.sequential.Viridis,
                   title="Density Heatmap: Human Risk Group vs Pathogen Type")\
.update_xaxes(categoryorder= "total ascending").update_yaxes(categoryorder= "total ascending")

## Visualizing the relationship between containment level (CL) and human risk group (RG) for each pathogen

Let's create a plot that lets us compare both simultaneously using a contour map. A contour line is a plane section of the three-dimensional graph of a function f(x, y), taken parallel to the (x, y)-plane; a contour map shows several such lines together. 

In this case, we are representing 3 variables: pathogen type, CL and RG.


<center><img src="./images/1920px-Courbe_niveau.svg.png" alt="Drawing" style="width: 400px;"/></center>


In [None]:
# List the pathogen types available in the dataset
print(unique_pathogen_name["Pathogen-Type"].unique())
pat_type = "Virus"
# Keep only the rows for the selected pathogen type
selected = unique_pathogen_name[unique_pathogen_name["Pathogen-Type"]==pat_type]
px.density_contour(selected,
                   'Human-Risk-Group','Pathogen-Containment-Level',
                   title="Density contour plot for " + pat_type + " displaying containment level and human risk group")
# Optional: append the following to the call above to order the axis categories by count
#.update_yaxes(categoryorder= "total ascending").update_xaxes(categoryorder= "total ascending")

## Categorizing Risk Groups and Containment Levels

| Risk Group| Containment Level | Risk to Individual | Risk to Community |
| -| - | - | -| 
| RG1 | CL1|Low | Low |
| RG2| CL2 | Moderate | Low |
| RG3| CL3 | High | Low |
| RG4| CL4 | High | High |

In general, the containment level and risk group of the pathogen are the same (e.g., RG2 pathogens are handled at CL2); however, there are exceptions. 

We saw from the contour maps that the relationship above holds most of the time, but it is not a clear-cut rule. 

## Summary

This notebook was a brief introduction to visualizing data using Python and Jupyter notebooks. 

We explored the ePATHogen Risk Group Database, and learned more about how each pathogen is classified according to Human Risk Group and Containment Level. 

We learned that SARS-CoV-2 (the pathogen responsible for causing COVID-19) is a virus with Human Risk Group RG3 and Containment Level CL3. 




![](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg?raw=true)