# Programming assignment 1: Visualizing COVID-19 data

## Disclaimer!!

##### - This is not an epidemiology study. It is simply a visualization exercise. Some epidemiology terms may not be correct; see this related [tweet](https://twitter.com/tobiasgalla/status/1245612214632902656).
##### - "Not enough testing was done in my country" is a true statement for every country/province/county/city. This is the data we have to work with and from which we infer some understanding of what's happening. Data is always incomplete and inaccurate.
##### - Please use only the raw data I share. This makes marking consistent for the TAs. You can use more recent data to further customize your own dashboard outside this assignment.

# Geographic distribution of worldwide COVID-19 data

Your team wants to develop a dashboard to present daily visualizations of COVID-19 data collected from the [European Centre for Disease Prevention and Control](https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide). Every day, your team downloads the .xlsx file from the website, reads it, performs some transformations and produces several visualizations that are presented in the dashboard.

Each row of the table contains the number of new cases and deaths reported per day and country. Every section below describes a specific task or visualization to perform.

# 6. Interactive Visualizations: Covid_F_Bokeh

So far, we have only seen dynamic visualizations, i.e., plots with time on the x-axis. Their limitations are obvious: too many lines, too many colours and difficulty in identifying the countries. In this section, we will use [Bokeh](https://bokeh.org/) to plot the current cumulative rates per 1M, i.e., the values on the last reported day for every country. This is considered a static plot, since it only shows the current state of each country and not its dynamics. However, Bokeh can also be used to generate the plots described in the previous sections.

Bokeh is a Python package that enables you to make simple, complex and specialized interactive plots. The core data structure in Bokeh is called **ColumnDataSource**: it maps column names to equally long sequences of values, so glyphs, labels and tooltips can refer to the data by column name. Here, we will convert the pandas dataframe to a ColumnDataSource to be able to build a Bokeh plot. The main building blocks of Bokeh plots are called **glyphs**. These are the circles, squares and other symbols you use to create plots.
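A minimal sketch of that conversion, using an invented two-country dataframe whose index is named `Country`, as it will be for `max_cases` later in the notebook. A named index becomes an ordinary column of the source, which is what allows the tooltips and labels in step 8 to refer to `Country`.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Invented values; the index name plays the role of 'Country' in max_cases
df = pd.DataFrame(
    {"Cases": [100, 250], "Population": [1_000_000, 5_000_000]},
    index=pd.Index(["A", "B"], name="Country"),
)

source = ColumnDataSource(df)

# The named index becomes a regular column next to the original ones
print(sorted(source.data.keys()))  # ['Cases', 'Country', 'Population']
print(list(source.data["Country"]))  # ['A', 'B']
```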

In the file `covid_f_bokeh.ipynb`, we follow these steps:

## 1. Import necessary packages and functions. Look in the code below to see where each function is used.

In [1]:
```python
# Data from here:
# https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide
import pandas as pd  # pd.read_excel needs the openpyxl package for .xlsx files

from bokeh.models import ColumnDataSource, LabelSet, Span, Label, NumeralTickFormatter
from bokeh.models import ColorBar  # only used by the commented-out colour bar
from bokeh.plotting import figure, output_file, show
from bokeh.transform import linear_cmap  # only used by the commented-out colour mapping
from bokeh.palettes import Spectral10
from bokeh.io import curdoc
```

## 2. Read the master table with the cumulative cases.

In [2]:
```python
# READ THE MASTER TABLE AND DROP THE UNNAMED INDEX COLUMN ('Unnamed: 0') SAVED WITH IT
cumalative_covid = pd.read_excel("Covid_A_Master_Cumalative_Table.xlsx")
cumalative_covid.drop(columns=['Unnamed: 0'], inplace=True)
```

## 3. Filter the last day of the results and select the top N = 50 countries in terms of cumulative cases (Greece is always added to the list).

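In [3]:
```python
# N: HOW MANY TOP COUNTRIES YOU WANT TO PICK IN TERMS OF # OF CASES
N = 50
# After sorting by date, groupby().last() keeps the last non-null value of every column per country
last_day = cumalative_covid.sort_values('Date', ascending=True).groupby('Country').last()
last_day['Country'] = last_day.index
last_day.sort_values('Cases', ascending=False, inplace=True)

# Top N countries by cumulative cases, plus Greece (isin() ignores the duplicate if Greece is already in the top N)
top_countries = last_day.head(N)['Country'].tolist() + ['Greece']
top_countries_covid = cumalative_covid[cumalative_covid.Country.isin(top_countries)]
```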
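The selection above can be checked on a tiny table. This sketch assumes only the `Date`, `Country` and cumulative `Cases` columns that the cell uses; the three countries and their numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical cumulative table: two dates for three invented countries
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-04-01", "2020-04-02"] * 3),
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Cases": [10, 15, 100, 120, 50, 55],
})

# Same steps as in the notebook: latest row per country, then rank by cases
last_day = toy.sort_values("Date").groupby("Country").last()
last_day = last_day.sort_values("Cases", ascending=False)

N = 2
top = last_day.head(N).index.tolist()
print(top)  # ['B', 'C']
```

Country A keeps its 2020-04-02 value of 15 cases, which is too small to enter the top 2.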

## 4. Calculate 3 relevant metrics: Case fatality rate, Infection rate per 1M and Mortality rate per 1M.

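In [4]:
```python
# Cases and Deaths are cumulative and Population/Continent are constant per country,
# so max() returns each country's latest totals (barring downward corrections in the data)
max_cases = top_countries_covid.groupby(by=['Country']).max()
max_cases['Case_fatality_rate'] = max_cases['Deaths']/max_cases['Cases']*100
# sort_values returns a new dataframe, so the result must be assigned to take effect
max_cases = max_cases.sort_values('Cases', ascending=False)
max_cases['Infection_rate_per_1M'] = max_cases['Cases']/max_cases['Population']*1000000
max_cases['Mortality_rate_per_1M'] = max_cases['Deaths']/max_cases['Population']*1000000
# max_cases.to_excel('covid_b_cases_cumulative_table.xlsx')
```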
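For reference, the three metrics computed in this step can be written with $C$, $D$ and $P$ standing for a country's cumulative cases, cumulative deaths and population (symbols introduced here only for this summary). The last equality follows from the first three and links the two plots of this section: on log-log axes, countries with the same case fatality rate lie on a straight line of slope 1.

$$
\begin{aligned}
\text{Case fatality rate (percent)} &= \frac{D}{C} \times 100 \\
\text{Infection rate per 1M} &= \frac{C}{P} \times 10^{6} \\
\text{Mortality rate per 1M} &= \frac{D}{P} \times 10^{6}
  = \text{Infection rate per 1M} \times \frac{\text{Case fatality rate (percent)}}{100}
\end{aligned}
$$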

## 5. Create a new column to represent the color according to the continent of each country.

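In [5]:
```python
# One colour per continent, taken from the Spectral10 palette
max_cases.loc[max_cases['Continent'] == 'Asia', 'Color'] = Spectral10[0]
max_cases.loc[max_cases['Continent'] == 'Europe', 'Color'] = Spectral10[3]
max_cases.loc[max_cases['Continent'] == 'America', 'Color'] = Spectral10[6]
max_cases.loc[max_cases['Continent'] == 'Africa', 'Color'] = Spectral10[9]
# Assumption: countries from any other continent (e.g. Oceania) get grey instead of a missing colour
max_cases['Color'] = max_cases['Color'].fillna('grey')
```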

## 6. Calculate the average values for the 3 relevant metrics. These will be used as baselines in the graphs we will plot later.

In [6]:
```python
# Pooled averages over all selected countries: totals divided by totals. This weights each
# country by its population (by its number of cases for the case fatality rate).
Average_infection_rate_per_1m = max_cases['Cases'].sum()/max_cases['Population'].sum()*1000000
print("The average infection rate is: {:.1f} per 1M people".format(Average_infection_rate_per_1m))

Average_death_rate_per_1m = max_cases['Deaths'].sum()/max_cases['Population'].sum()*1000000
print("The average mortality rate is: {:.1f} per 1M people".format(Average_death_rate_per_1m))

Average_case_fatality_rate = max_cases['Deaths'].sum()/max_cases['Cases'].sum()*100
print("The average case fatality rate is: {:.1f}% of confirmed cases".format(Average_case_fatality_rate))
```

```
The average infection rate is: 3852.8 per 1M people
The average mortality rate is: 134.2 per 1M people
The average case fatality rate is: 3.5% of confirmed cases
```
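The averages above are pooled, so the most populous countries dominate them. A minimal sketch with two made-up countries (numbers invented for illustration) compares the pooled rate, its equivalent population-weighted form and the plain mean of the per-country rates:

```python
# Two hypothetical countries: (cases, population)
countries = {
    "Small": (1_000, 1_000_000),     # 1,000 cases per 1M
    "Large": (10_000, 100_000_000),  # 100 cases per 1M
}

rates = {name: c / p * 1_000_000 for name, (c, p) in countries.items()}

# Pooled rate, as in the notebook: total cases / total population
total_cases = sum(c for c, _ in countries.values())
total_pop = sum(p for _, p in countries.values())
pooled = total_cases / total_pop * 1_000_000

# The same value written as a population-weighted mean of the per-country rates
weighted = sum(rates[n] * p for n, (_, p) in countries.items()) / total_pop

# The unweighted mean treats every country equally
unweighted = sum(rates.values()) / len(rates)

print(f"pooled={pooled:.1f}, weighted={weighted:.1f}, unweighted={unweighted:.1f}")
```

Here the pooled rate is about 108.9 per 1M, while the unweighted mean of the two country rates is 550.0 per 1M.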


## 7. Generate the ColumnDataSource data structure from the pandas dataframe `max_cases`, one for each plot. This is the data structure needed for further plotting in Bokeh.

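In [7]:
```python
# Source for the first plot; the 'Country' index becomes a column with the same name
source_1 = ColumnDataSource(max_cases)
```

In [8]:
```python
# Separate, identical source for the second plot
source_2 = ColumnDataSource(max_cases)
```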

## 8. Here, we will configure all the components needed to generate the Bokeh figure:

- Define the title and tools we want to include.
- Create a new figure with `figure()`, passing the title, tools, width and axis types as arguments.
- Configure colours, axis labels, font size and number format.
- Call the **glyph** method **circle** to plot x against y from the `source_1` data structure. Use `'Continent'` as the legend group, the respective `'Color'` as the circle colour and black as the outline of each circle.
- Configure the information we want to present when **hovering** over the different data points. Define variables and formats.
- **Annotate with labels:** use the country names to annotate each circle. An offset is defined along with other properties of the text such as font size, colour and style. The annotation is an additional component (a layout) that is overlaid on the plot together with its other components. The **`add_layout`** method adds the newly created component to the existing plot.
- **Annotate with horizontal and vertical lines** representing the average y and x (hline and vline, respectively). The lines extend to the edge of the plot area.
- **Add a text annotation** to explain what the dashed lines show.
- **Output the file** as an **.html file**. This type of file can be opened in a browser or embedded in a website and contains all the information needed to let the user interact with the plot.
- Finally, we show the generated plot in a new browser tab.
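The two plotting cells that follow repeat these steps with different columns, labels and reference values, so they could be wrapped in one function. The sketch below is an illustration, not the notebook's code: the name `rate_scatter` and its parameters are hypothetical, the toy dataframe is invented, and it assumes the source provides `Country`, `Continent` and `Color` columns like `max_cases`. It uses `scatter(marker="circle")`, which draws the same circles as the `circle` glyph while avoiding the deprecation warning that `circle(size=...)` raises from Bokeh 3.4; `width=` assumes Bokeh 2.4 or later.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, LabelSet, Span
from bokeh.plotting import figure


def rate_scatter(source, x, y, x_avg, y_avg, title, x_label, y_label, y_log=False):
    """Hypothetical helper: one rate against another, with dashed average lines."""
    p = figure(title=title, tools="hover, pan, wheel_zoom, box_zoom, reset, save",
               toolbar_location="above", width=1200, x_axis_type="log",
               y_axis_type="log" if y_log else "linear")
    p.xaxis.axis_label = x_label
    p.yaxis.axis_label = y_label

    # One circle per row, filled with the row's 'Color' and grouped in the legend by 'Continent'
    p.scatter(x=x, y=y, source=source, marker="circle", size=12,
              fill_color="Color", line_color="black", fill_alpha=0.8,
              legend_group="Continent")

    # Tooltips refer to source columns as @{column}, followed by a number format
    p.hover.tooltips = [
        ("Country", "@{Country}"),
        (x_label, "@{%s}{0,0}" % x),
        (y_label, "@{%s}{0.1}" % y),
    ]

    # Country name above each circle
    p.add_layout(LabelSet(x=x, y=y, text="Country", y_offset=8, source=source,
                          text_font_size="11px", text_align="center"))

    # Dashed reference lines: horizontal at the y average, vertical at the x average
    for location, dimension in ((y_avg, "width"), (x_avg, "height")):
        p.add_layout(Span(location=location, dimension=dimension, line_color="black",
                          line_dash="dashed", line_width=1))
    return p


# Invented data for two countries
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "Continent": ["Europe", "Asia"],
    "Color": ["#3288bd", "#d53e4f"],
    "Infection_rate_per_1M": [1000.0, 5000.0],
    "Case_fatality_rate": [2.0, 4.0],
})
p = rate_scatter(ColumnDataSource(toy), "Infection_rate_per_1M", "Case_fatality_rate",
                 x_avg=3000.0, y_avg=3.0, title="Toy example",
                 x_label="Infection rate per 1M", y_label="Case fatality rate (%)")
```

Calling `output_file(...)` and then `show(p)` would open the toy plot; with the real data one would pass `source_1` or `source_2`, the metric column names and the averages computed in step 6.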

In [9]:
```python
# Figure: title, interactive tools and a log-scaled x-axis
TITLE = "Infection Rate vs Case Fatality Rate"
TOOLS = "hover, pan, wheel_zoom, box_zoom, reset, save"
# 'width' replaces 'plot_width', which was removed in Bokeh 3.0
p = figure(title=TITLE, tools=TOOLS, toolbar_location="above", width=1200, x_axis_type="log")
p.toolbar.logo = "grey"
p.background_fill_color = "#dddddd"

# Axis labels, font sizes and number format (thousands separators on the x-axis)
p.xaxis.axis_label = "Infection rate per 1M"
p.yaxis.axis_label = "Case fatality rate (%)"
p.xaxis.axis_label_text_font_size = "14pt"
p.yaxis.axis_label_text_font_size = "14pt"
p.xaxis.major_label_text_font_size = "14pt"
p.yaxis.major_label_text_font_size = "14pt"
p.xaxis.formatter = NumeralTickFormatter(format="0,0")
p.grid.grid_line_color = "white"

# # Use the field name of the column source
# mapper = linear_cmap('Infection_rate_per_1M', 'Turbo256', max_cases['Infection_rate_per_1M'].min(), max_cases['Infection_rate_per_1M'].max())

# Glyphs: one circle per country, coloured by continent and outlined in black
# (from Bokeh 3.4, p.scatter(..., marker='circle') avoids the circle(size=...) deprecation warning)
p.circle("Infection_rate_per_1M", "Case_fatality_rate", size=12, source=source_1,
#          fill_color=linear_cmap('Infection_rate_per_1M', 'Turbo256', max_cases['Infection_rate_per_1M'].min(), max_cases['Infection_rate_per_1M'].max()),
         legend_group='Continent',
         color='Color',
         line_color='black',
         fill_alpha=0.8)

# Hover: @column selects a source column, {...} gives its number format
p.hover.tooltips = [
    ("Country", "@{Country}"),
    ("Infection rate per 1M", "@Infection_rate_per_1M{0,0}"),
    ("Case fatality rate (%)", "@Case_fatality_rate{0.1}"),
    ("Cases", "@Cases{0,0}"),
    ("Deaths", "@Deaths{0,0}"),
]

# Country name above each circle
labels = LabelSet(x="Infection_rate_per_1M", y="Case_fatality_rate", text="Country", y_offset=8,
                  text_font_size="11px", text_color="black", text_font_style='bold',
                  source=source_1, text_align='center')
p.add_layout(labels)

# Dashed lines: horizontal at the average y, vertical at the average x
hline = Span(location=Average_case_fatality_rate,
             dimension='width', line_color='black',
             line_dash='dashed', line_width=1)
p.add_layout(hline)

vline = Span(location=Average_infection_rate_per_1m,
             dimension='height', line_color='black',
             line_dash='dashed', line_width=1)
p.add_layout(vline)

# Text box explaining the dashed lines, placed at data coordinates (x=100, y=12)
# (the render_mode argument was removed in Bokeh 3.0)
citation = Label(x=100, y=12, text='Dashed lines show the average of each axis',
                 text_font_size="12px",
                 text_color="black",
                 text_font_style='bold',
                 border_line_color='black',
                 border_line_alpha=1.0,
                 background_fill_color='white',
                 background_fill_alpha=1.0)
p.add_layout(citation)

# color_bar = ColorBar(color_mapper=mapper['transform'], width=8,  location=(0,0))
# p.add_layout(color_bar, 'right')

# Save as a standalone HTML file and open it in a new browser tab
output_file("Infection_Rate_vs_Case_Fatality.html", title="Infection Rate vs Case Fatality")

show(p)
```

In [10]:
```python
# Remove the first figure from the current document before building the second one
curdoc().clear()
```

In [11]:
```python
# Figure: title, interactive tools and log-scaled x- and y-axes
TITLE = "Infection Rate per 1M vs Fatality Rate per 1M"
TOOLS = "hover, pan, wheel_zoom, box_zoom, reset, save"
# 'width' replaces 'plot_width', which was removed in Bokeh 3.0
p = figure(title=TITLE, tools=TOOLS, toolbar_location="above", width=1200, x_axis_type="log", y_axis_type="log")
p.toolbar.logo = "grey"
p.background_fill_color = "#dddddd"

# Axis labels, font sizes and number format (thousands separators on the x-axis)
p.xaxis.axis_label = "Infection rate per 1M"
p.yaxis.axis_label = "Fatality rate per 1M"
p.xaxis.axis_label_text_font_size = "14pt"
p.yaxis.axis_label_text_font_size = "14pt"
p.xaxis.major_label_text_font_size = "14pt"
p.yaxis.major_label_text_font_size = "14pt"
p.xaxis.formatter = NumeralTickFormatter(format="0,0")
p.grid.grid_line_color = "white"

# # Use the field name of the column source
# mapper = linear_cmap('Infection_rate_per_1M', 'Turbo256', max_cases['Infection_rate_per_1M'].min(), max_cases['Infection_rate_per_1M'].max())

# Glyphs: one circle per country, coloured by continent and outlined in black
# (from Bokeh 3.4, p.scatter(..., marker='circle') avoids the circle(size=...) deprecation warning)
p.circle("Infection_rate_per_1M", "Mortality_rate_per_1M", size=12, source=source_2,
#          fill_color=linear_cmap('Infection_rate_per_1M', 'Turbo256', max_cases['Infection_rate_per_1M'].min(), max_cases['Infection_rate_per_1M'].max()),
         legend_group='Continent',
         color='Color',
         line_color='black',
         fill_alpha=0.8)

# Hover: @column selects a source column, {...} gives its number format
p.hover.tooltips = [
    ("Country", "@{Country}"),
    ("Infection rate per 1M", "@Infection_rate_per_1M{0,0}"),
    ("Mortality rate per 1M", "@Mortality_rate_per_1M{0,0}"),
    ("Cases", "@Cases{0,0}"),
    ("Deaths", "@Deaths{0,0}"),
]

# Country name above each circle
labels = LabelSet(x="Infection_rate_per_1M", y="Mortality_rate_per_1M", text="Country", y_offset=8,
                  text_font_size="11px", text_color="black", text_font_style='bold',
                  source=source_2, text_align='center')
p.add_layout(labels)

# Dashed lines: horizontal at the average y, vertical at the average x
hline = Span(location=Average_death_rate_per_1m,
             dimension='width', line_color='black',
             line_dash='dashed', line_width=1)
p.add_layout(hline)

vline = Span(location=Average_infection_rate_per_1m,
             dimension='height', line_color='black',
             line_dash='dashed', line_width=1)
p.add_layout(vline)

# Text box explaining the dashed lines, placed at data coordinates (x=100, y=1000)
# (the render_mode argument was removed in Bokeh 3.0)
citation = Label(x=100, y=1000, text='Dashed lines show the average of each axis',
                 text_font_size="12px",
                 text_color="black",
                 text_font_style='bold',
                 border_line_color='black',
                 border_line_alpha=1.0,
                 background_fill_color='white',
                 background_fill_alpha=1.0)
p.add_layout(citation)

# color_bar = ColorBar(color_mapper=mapper['transform'], width=8,  location=(0,0))
# p.add_layout(color_bar, 'right')

# Save as a standalone HTML file and open it in a new browser tab
output_file("Infection_Rate_per_1M_vs_Fatality_Rate_per_1M.html", title="Infection Rate per 1M vs Fatality Rate per 1M")

show(p)
```