# Group 1 - Data Project - Covid-19  

> **Note the following (and we should delete this eventually):**
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code from [lecture 5](https://numeconcopenhagen.netlify.com/lectures/Workflow_and_debugging).
> 1. Remember this [guide](https://www.markdownguide.org/basic-syntax/) on markdown and (a bit of) latex.
> 1. Turn on automatic numbering by clicking on the small icon on top of the table of contents in the left sidebar.
> 1. The `dataproject.py` file includes a function which will be used multiple times in this notebook.

Imports and set magics:

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
import folium
import plotly.express as px
import requests
import numpy as np 
from matplotlib_venn import venn2 # install with pip install matplotlib-venn
from ipywidgets import interact, interactive, fixed, interact_manual

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# local modules
import dataproject

# Read and clean data

## Covid-19 data from The Humanitarian Data Exchange, collected by Johns Hopkins University

We use data on confirmed Covid-19 cases, deaths due to Covid-19 and recovered patients, plus a summary table for each individual country.

We **read the CSSEGIS data** on Covid-19 from the official data repository for the 2019 Novel Coronavirus Visual Dashboard, operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) and supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL). The data is then **cleaned** by renaming and dropping columns:

In [2]:
# a. Loading data
death = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
confirmed = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
recovered = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
country = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv')


# b. Rename 'Country/Region' (and 'Country_Region') to 'Country'
confirmed = confirmed.rename(columns={'Country/Region': 'Country'})
recovered = recovered.rename(columns={'Country/Region': 'Country'})
death = death.rename(columns={'Country/Region': 'Country'})
country = country.rename(columns={'Country_Region': 'Country'})

# c. Drop columns that are not used in the analysis
drop_these = ['Province/State', 'Lat', 'Long']
confirmed = confirmed.drop(columns=drop_these)
recovered = recovered.drop(columns=drop_these)
death = death.drop(columns=drop_these)
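Dropping `Province/State` still leaves several rows for countries that the JHU CSSE time series report by province or state. A minimal sketch of collapsing those rows into one national series per country, assuming the raw column layout used above (`Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date). `to_country_level` is a hypothetical helper, not part of `dataproject.py`, and the toy frame is illustrative only:

```python
import pandas as pd


def to_country_level(raw: pd.DataFrame) -> pd.DataFrame:
    """Collapse a wide JHU CSSE time series to one row per country.

    Assumes the raw layout: 'Province/State', 'Country/Region', 'Lat',
    'Long', followed by one column per date.
    """
    df = raw.rename(columns={'Country/Region': 'Country'})
    df = df.drop(columns=['Province/State', 'Lat', 'Long'])
    # Countries reported by province span several rows; summing the date
    # columns within each country gives a single national series.
    return df.groupby('Country').sum()


# Toy frame with placeholder names, not real data
toy = pd.DataFrame({
    'Province/State': ['Region 1', 'Region 2', None],
    'Country/Region': ['Country A', 'Country A', 'Country B'],
    'Lat': [0.0, 0.0, 0.0],
    'Long': [0.0, 0.0, 0.0],
    '1/22/20': [1, 2, 3],
    '1/23/20': [4, 5, 6],
})
print(to_country_level(toy))
```

Applied to the raw frames, e.g. `to_country_level(pd.read_csv(url))`, this would replace the separate renaming and dropping steps for the three time series.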

**Visualising the worst-hit countries**


In [3]:
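# Sort countries by confirmed cases, most affected first
sorted_country = country.sort_values('Confirmed', ascending=False)

def highlight_col(x):
    """Colour the columns at positions 4, 5 and 6 (assumed to be Confirmed,
    Deaths and Recovered in cases_country.csv)."""
    styles = pd.DataFrame('', index=x.index, columns=x.columns)
    styles.iloc[:, 4] = 'background-color: darkblue'
    styles.iloc[:, 5] = 'background-color: blue'
    styles.iloc[:, 6] = 'background-color: green'
    return styles

def show_latest_cases(n):
    """Return a styled table of the n countries with the most confirmed cases."""
    n = int(n)  # the text widget passes n as a string
    return sorted_country.head(n).style.apply(highlight_col, axis=None)

interact(show_latest_cases, n='10')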

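`highlight_col` selects columns by position, so it would colour the wrong columns if the column order of `cases_country.csv` changed. A sketch of the same styling keyed on column names instead; the names `Confirmed`, `Deaths` and `Recovered` and the colour mapping (which assumes positions 4 to 6 hold those three columns) are assumptions, and `highlight_by_name` is a hypothetical helper:

```python
import pandas as pd

# Assumed mapping from column name to CSS, mirroring highlight_col's colours
COLUMN_STYLES = {
    'Confirmed': 'background-color: darkblue',
    'Deaths': 'background-color: blue',
    'Recovered': 'background-color: green',
}


def highlight_by_name(frame: pd.DataFrame, styles: dict = COLUMN_STYLES) -> pd.DataFrame:
    """Return a same-shaped frame of CSS strings for Styler.apply(..., axis=None)."""
    css = pd.DataFrame('', index=frame.index, columns=frame.columns)
    for column, rule in styles.items():
        # Skip names that are absent instead of raising a KeyError
        if column in css.columns:
            css[column] = rule
    return css
```

It plugs into the existing call unchanged: `sorted_country.head(10).style.apply(highlight_by_name, axis=None)`.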

In [4]:
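def bubble_chart(n):
    """Bubble chart of the n countries with the most confirmed cases."""
    fig = px.scatter(sorted_country.head(n), x="Country", y="Confirmed", size="Confirmed",
                     color="Country", hover_name="Country", size_max=60)
    fig.update_layout(
        title=f"{n} worst-hit countries",
        xaxis_title="Countries",
        yaxis_title="Confirmed cases",
        width=700,
    )
    fig.show()

# Keep n positive: head() with a negative n drops rows instead of selecting them
interact(bubble_chart, n=widgets.IntSlider(value=10, min=1, max=30, description='n'))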


The plot shows the worst-hit countries in terms of confirmed cases, with the US having the highest number of cases. To judge which countries are truly worst hit, however, one would need to look at the numbers relative to the size of each country's population.
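Normalising by population is a one-line operation once a population figure is available per country. A minimal sketch, assuming a hypothetical `population` Series indexed by country name (no such column is loaded above); `per_100k` is illustrative and the toy numbers are not real data:

```python
import pandas as pd


def per_100k(counts: pd.Series, population: pd.Series) -> pd.Series:
    """Express counts per 100,000 inhabitants, aligned on the country index."""
    # Index alignment means a country missing from either Series yields NaN
    # rather than being matched to the wrong population.
    return counts.mul(100_000).div(population)


# Toy numbers with placeholder names, purely illustrative
cases = pd.Series({'Country A': 500, 'Country B': 2_000})
population = pd.Series({'Country A': 1_000_000, 'Country B': 50_000_000})
print(per_100k(cases, population))  # Country A: 50.0, Country B: 4.0
```

Country B has four times as many cases as Country A but far fewer per inhabitant, which is the kind of reordering that absolute counts hide. In the notebook the input could be `sorted_country.set_index('Country')['Confirmed']`.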

**Visualisation of the worst-affected countries in terms of deaths**

In [5]:
px.bar(
    sorted_country.head(10),  # the 10 countries with the most confirmed cases
    x="Country",
    y="Deaths",
    title="Deaths in the 10 countries with the most confirmed Covid-19 cases",
    color_discrete_sequence=["blue"],
    height=500,
    width=800,
)

Among the ten countries with the most confirmed cases, the US, Spain, Italy and France are the worst hit in terms of deaths caused by Covid-19.
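The bar chart takes the ten countries with the most confirmed cases and plots their deaths; it does not rank countries by deaths directly. A sketch of ranking on any column, assuming the cleaned `country` frame; `top_n_by` is a hypothetical helper built on pandas' `DataFrame.nlargest`:

```python
import pandas as pd


def top_n_by(frame: pd.DataFrame, column: str, n: int = 10) -> pd.DataFrame:
    """Return the n rows with the largest values in `column`, largest first."""
    return frame.nlargest(n, column)


# In the notebook this could feed the bar chart directly, e.g.
# px.bar(top_n_by(country, 'Deaths'), x='Country', y='Deaths')
```

Ranking by deaths and by confirmed cases can select different sets of countries, so the choice of sort column changes what the chart shows.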