# Group 1 - Data Project - Covid-19  

> **Note the following (to be deleted eventually):**
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code from [lecture 5](https://numeconcopenhagen.netlify.com/lectures/Workflow_and_debugging).
> 1. Remember this [guide](https://www.markdownguide.org/basic-syntax/) on markdown and (a bit of) latex.
> 1. Turn on automatic numbering by clicking on the small icon on top of the table of contents in the left sidebar.
> 1. The `dataproject.py` file includes a function which will be used multiple times in this notebook.

Imports and set magics:

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
import folium
import plotly.express as px
import requests
import numpy as np
from matplotlib_venn import venn2 # install with pip install matplotlib-venn
from ipywidgets import interact, interactive, fixed, interact_manual

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# local modules
import dataproject

# Read and clean data

## Covid-19 data retrieved from The Humanitarian Data Exchange, collected by Johns Hopkins University. We use data on confirmed covid-19 cases, deaths due to covid-19, recovered covid-19 patients, and data on each individual country ##

**Read the CSSEGIS data** on covid-19 from the official data repository for the 2019 Novel Coronavirus Visual Dashboard, operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) and supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL). The data is then **cleaned** by renaming and dropping columns:

In [3]:
# a. Loading data
death = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
confirmed = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
recovered = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
country = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv')

# b. Renaming the column names to lowercase
country.columns = map(str.lower, country.columns)
confirmed.columns = map(str.lower, confirmed.columns)
death.columns = map(str.lower, death.columns)
recovered.columns = map(str.lower, recovered.columns)

# c. Renaming country/region to country
confirmed = confirmed.rename(columns={'country/region': 'country'})
recovered = recovered.rename(columns={'country/region': 'country'})
death = death.rename(columns={'country/region': 'country'})
country = country.rename(columns={'country_region': 'country'})

# d. Dropping columns
drop_these = ['province/state', 'lat', 'long']
confirmed.drop(drop_these, axis=1, inplace=True)
recovered.drop(drop_these, axis=1, inplace=True)
death.drop(drop_these, axis=1, inplace=True)
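
Once `province/state` is dropped, countries that the source reports by province (for example Australia, Canada and China) still occupy several rows each. A minimal sketch of collapsing them to one row per country is shown here on a toy frame. The helper name `aggregate_by_country` and the numbers are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the cleaned `confirmed` frame (values are made up for illustration)
toy_confirmed = pd.DataFrame({
    'country': ['Australia', 'Australia', 'Denmark'],
    '4/3/20': [100, 50, 30],
    '4/4/20': [120, 60, 40],
})

def aggregate_by_country(df):
    """Sum all rows that share a country so each country appears exactly once."""
    return df.groupby('country', as_index=False).sum()

toy_by_country = aggregate_by_country(toy_confirmed)
print(toy_by_country)
```

The same helper could be applied to `confirmed`, `recovered` and `death` if a single time series per country is wanted.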

**Visualising the worst-hit countries**


In [7]:
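# Sort countries by confirmed cases, highest first
sorted_country = country.sort_values('confirmed', ascending=False)

def highlight_col(x):
    """Return a frame of CSS styles colouring the confirmed, deaths and recovered columns."""
    b = 'background-color: blue'
    d = 'background-color: darkblue'
    g = 'background-color: green'
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    df1.iloc[:, 4] = d  # confirmed
    df1.iloc[:, 5] = b  # deaths
    df1.iloc[:, 6] = g  # recovered
    return df1

def show_latest_cases(n):
    """Show the n countries with the most confirmed cases as a styled table."""
    n = int(n)  # the text widget passes n as a string
    return sorted_country.head(n).style.apply(highlight_col, axis=None)

widgets.interact(show_latest_cases, n='10')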

|  | country | last_update | lat | long_ | confirmed | deaths | recovered | active |
|---|---|---|---|---|---|---|---|---|
| 17 | US | 2020-04-04 09:36:23 | 40.0 | -100.0 | 278458 | 7159 | 9897 | 0 |
| 10 | Italy | 2020-04-04 09:36:02 | 41.8719 | 12.5674 | 119827 | 14681 | 19758 | 85388 |
| 158 | Spain | 2020-04-04 09:36:02 | 40.4637 | -3.74922 | 119199 | 11198 | 30513 | 77488 |
| 7 | Germany | 2020-04-04 09:36:02 | 51.1657 | 10.4515 | 91159 | 1275 | 24575 | 65309 |
| 6 | France | 2020-04-04 09:36:02 | 46.2276 | 2.2137 | 83029 | 6520 | 14135 | 62374 |
| 3 | China | 2020-04-04 08:37:37 | 30.5928 | 114.305 | 82526 | 3330 | 76942 | 2254 |
| 89 | Iran | 2020-04-04 09:36:02 | 32.4279 | 53.688 | 53183 | 3294 | 17935 | 31954 |
| 16 | United Kingdom | 2020-04-04 09:36:02 | 55.0 | -3.0 | 38697 | 3611 | 209 | 34877 |
| 170 | Turkey | 2020-04-04 09:36:02 | 38.9637 | 35.2433 | 20921 | 425 | 484 | 20012 |
| 15 | Switzerland | 2020-04-04 09:36:02 | 46.8182 | 8.2275 | 19702 | 604 | 4846 | 14252 |


<function __main__.show_latest_cases(n)>
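
The widget above takes `n` as text (`n='10'`), so an empty or non-numeric entry makes `int(n)` fail. A hedged sketch of the selection step with an integer input follows. The function name `top_countries` and the frame `toy_country` are hypothetical, and the four rows are copied from the table above.

```python
import pandas as pd

# Four rows copied from the table above; `toy_country` stands in for `country`
toy_country = pd.DataFrame({
    'country': ['Italy', 'US', 'Germany', 'Spain'],
    'confirmed': [119827, 278458, 91159, 119199],
})

def top_countries(df, n=10, by='confirmed'):
    """Return the n rows with the largest values in column `by`."""
    n = max(1, int(n))  # guard against zero or negative input
    return df.sort_values(by, ascending=False).head(n)

top_two = top_countries(toy_country, n=2)
print(top_two)

# In the notebook, passing an integer range makes ipywidgets render a slider:
# widgets.interact(lambda n: top_countries(country, n), n=(1, 50))
```

With a slider, only valid integers reach the function, so the conversion from text is no longer needed.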

In [13]:
px.bar(
    sorted_country.head(15),
    x = "country",
    y = "deaths",
    title="Deaths in the 15 countries with the most confirmed Covid-19 cases",  # plot title
    color_discrete_sequence=["blue"], 
    height=500,
    width=800
)
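
The chart above ranks countries by confirmed cases but plots deaths, and the number of bars is set separately from the title. One possible way to keep the two in step, sketched on a toy frame, is to rank by deaths and build the title from the selection itself. The helper `deaths_ranking` is a hypothetical name, and the five rows are copied from the table earlier in this notebook.

```python
import pandas as pd

# Five rows copied from the table earlier in the notebook
toy_country = pd.DataFrame({
    'country': ['US', 'Italy', 'Spain', 'Germany', 'France'],
    'deaths': [7159, 14681, 11198, 1275, 6520],
})

def deaths_ranking(df, n):
    """Return the n countries with the most deaths and a matching chart title."""
    top = df.nlargest(n, 'deaths')
    title = f"{len(top)} countries with the most Covid-19 deaths"
    return top, title

top_deaths, chart_title = deaths_ranking(toy_country, 3)
print(chart_title)
print(top_deaths)
```

The two return values could then be passed on as `px.bar(top_deaths, x="country", y="deaths", title=chart_title)`.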