# Covid-19 Project

Welcome to my Covid-19 data project, where I explore Covid-19 data and answer questions about the effectiveness of different countries' policy measures and which countries have been hit hardest by the pandemic. I will also look at underlying health issues and their effect on the death rate. Finally, I'll use data visualisation libraries to plot the data and bring some of it alive!

Python libraries used in this project:
1. Pandas
2. Numpy
3. Matplotlib - %matplotlib inline allows the plot to show in the notebook
4. Plotly
5. Seaborn

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.graph_objects as go 
import plotly.express as px

In [None]:
## Load the Covid-19 dataset from Our World in Data (OWID), saved locally as a CSV
c19_data = 'owid-covid-data.csv'
df = pd.read_csv(c19_data)
df.head()

In [None]:
## Create an array of the unique country names
country_df = df.location.unique()
print(country_df.size)
## Select a specific country (Ghana) and its total cases
get_country = df[df['location'] == 'Ghana'].index
ghana_tc = df.loc[get_country, ['date', 'total_cases']]
## set_index and dropna return new frames, so assign the result back
ghana_tc = ghana_tc.set_index('date').dropna()
ghana_tc

In [None]:
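## Plot Ghana's total cases using matplotlib, with a 7-day moving average
ax = ghana_tc.plot(figsize=(12,6))
## Draw the moving average on the same axes and label it in the legend
ghana_tc['total_cases'].rolling(window=7).mean().plot(ax=ax, label='7-day moving average')
ax.legend()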
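
To make the moving average concrete, here is a minimal sketch of what `rolling(window=7).mean()` does on a small series. The numbers are made up for illustration and are not taken from the OWID data; the names `toy_cases`, `toy_ma` and `toy_ma_partial` are hypothetical.

```python
import pandas as pd

# Made-up cumulative case counts for ten consecutive days (illustrative only)
toy_cases = pd.Series(
    [0, 2, 5, 9, 14, 20, 27, 35, 44, 54],
    index=pd.date_range('2020-03-01', periods=10, freq='D'),
    name='total_cases',
)

# Each value is the mean of the current day and the six days before it.
# The first six entries are NaN because the window is not yet full.
toy_ma = toy_cases.rolling(window=7).mean()

# With min_periods=1 the mean is taken over however many days are available
toy_ma_partial = toy_cases.rolling(window=7, min_periods=1).mean()

print(toy_ma)
print(toy_ma_partial)
```

Whether to leave the first six days empty or use `min_periods=1` is a judgment call: the partial window gives a line from day one, but its early values are averaged over fewer days and are therefore noisier.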

# Policy Measure: Looking at the effectiveness of a lockdown

This section looks at the effectiveness of a national lockdown at any time during the pandemic. To do so, I'll compare the United Kingdom and Sweden. During the peak of the pandemic, the United Kingdom implemented a national lockdown, whereas Sweden opted against such a measure.

In [None]:
def plot_covid_data(country, col, plot_ma=False, y_max=3000):
    """Plot `col` over time for `country`, optionally with a 7-day moving average."""
    # Keep only the date and the requested column for this country
    world_df = df.loc[df['location'] == country, ['date', col]]
    # set_index and dropna return new frames, so assign the result back
    world_df = world_df.set_index('date').dropna()
    ax = world_df.plot(figsize=(12,6), ylim=[0, y_max])

    if plot_ma:
        # Overlay the moving average on the same axes
        world_df[col].rolling(window=7).mean().plot(ax=ax, label='7-day moving average')
        ax.legend()

plot_covid_data('United Kingdom', 'new_cases_per_million', True, 1000)
plot_covid_data('Sweden', 'new_cases_per_million', True)
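
Because the two charts above use different y-axis limits, it can help to put both countries in one table on a shared scale. The following is a rough sketch of one way to do that, not part of the original analysis. `compare_countries` is a hypothetical helper, and the toy values are made up so the example runs on its own.

```python
import pandas as pd

def compare_countries(data, countries, col, window=7):
    """Return a date-indexed frame with one smoothed column per country."""
    subset = data.loc[data['location'].isin(countries), ['date', 'location', col]]
    # One row per date, one column per country
    wide = subset.pivot(index='date', columns='location', values=col).sort_index()
    # Smooth each country's column with a rolling mean
    return wide.rolling(window=window, min_periods=1).mean()

# Made-up values for illustration only
toy = pd.DataFrame({
    'date': ['2020-04-01', '2020-04-02', '2020-04-03'] * 2,
    'location': ['United Kingdom'] * 3 + ['Sweden'] * 3,
    'new_cases_per_million': [60.0, 70.0, 80.0, 40.0, 50.0, 45.0],
})

smoothed = compare_countries(toy, ['United Kingdom', 'Sweden'],
                             'new_cases_per_million', window=2)
print(smoothed)
```

With the real data, `compare_countries(df, ['United Kingdom', 'Sweden'], 'new_cases_per_million').plot(figsize=(12,6))` should draw both lines on the same axes, assuming `df` is the frame loaded earlier.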

# Population: Looking at the median age
In this section, we look at the correlation between median age and new deaths per million. Research suggests that younger people are more likely to recover from the virus than people aged 60 and over. To explore this, we will look at two countries that differ in median age. The first is Germany, with a median age of 47.1 and a population of 83 million; the second is South Korea, with a median age of 41.8 and a population of 52 million.

In [None]:
plot_covid_data('Germany','new_deaths_per_million', True, 15)
plot_covid_data('South Korea','new_deaths_per_million', True, 1)
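
The median age and population figures quoted above can be checked against the dataset itself. This is a minimal sketch, assuming the CSV carries `median_age` and `population` columns with one constant value per country; `country_profile` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd

def country_profile(data, countries, cols=('median_age', 'population')):
    """Return the first non-missing value of each column in `cols` per country."""
    subset = data[data['location'].isin(countries)]
    # GroupBy.first() skips missing values, so leading NaN rows are ignored
    return subset.groupby('location')[list(cols)].first()

# Made-up values for illustration only
toy = pd.DataFrame({
    'location': ['A', 'A', 'B', 'B'],
    'median_age': [None, 30.0, 45.0, 45.0],
    'population': [1000.0, 1000.0, 2000.0, 2000.0],
})

profile = country_profile(toy, ['A', 'B'])
print(profile)
```

With the real data the call would be `country_profile(df, ['Germany', 'South Korea'])`.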

# Obesity: Underlying Health issues and the effect on death rate
This section looks at how underlying health issues, such as obesity, affect the death rate. Research suggests that people with underlying health conditions, in particular respiratory and cardiovascular issues, are at greater risk from the virus. Canada has a prevalent obesity problem, while India has a relatively low obesity rate.

In [None]:
plot_covid_data('Canada','new_deaths_per_million', True, 8)
plot_covid_data('India','new_deaths_per_million', True, 2.5)

# Testing Measures
Lastly, this section looks at the argument for extensive testing across all areas of daily life, such as airports, care homes and hospitals. Frequent testing allows countries to detect and trace the virus more effectively and to spot early signs of a surge. Governments can then react to the testing data and implement new restrictions to help control the infection rate. The UAE has a widespread testing regime, whereas testing is less common in South Africa.

In [None]:
plot_covid_data('United Arab Emirates','new_deaths_per_million', True, 1.5)
plot_covid_data('South Africa','new_deaths_per_million', True, 13)
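
The plots above show deaths rather than testing volume, so the testing claim itself could be checked separately. This is a hedged sketch that assumes the dataset has a `total_tests_per_thousand` column; `latest_value` is a hypothetical helper and the toy numbers are made up.

```python
import pandas as pd

def latest_value(data, countries, col):
    """Return the most recent non-missing value of `col` for each country."""
    subset = data.loc[data['location'].isin(countries), ['date', 'location', col]].dropna()
    # Sort by date so the last row in each group is the newest report
    return subset.sort_values('date').groupby('location')[col].last()

# Made-up values for illustration only
toy = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-02', '2021-01-01', '2021-01-02'],
    'location': ['A', 'A', 'B', 'B'],
    'total_tests_per_thousand': [150.0, None, 20.0, 25.0],
})

latest = latest_value(toy, ['A', 'B'], 'total_tests_per_thousand')
print(latest)
```

Testing figures are often reported with gaps, which is why the sketch drops missing rows before picking the newest one instead of reading a single fixed date.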

# Scatter plot
This scatter plot shows the correlation between total deaths per million and the share of the population aged 65 and over. The data in the scatter plot is taken from 2021-01-02.

In [None]:
def scatter_graph(date, col1, col2):
    # DataFrame.append was removed in pandas 2.0, so select every location's
    # row for the given date in one step instead of appending country by country
    cols = ['date', 'location', col1, col2]
    tot_df = df.loc[df['date'] == date, cols].fillna(0)
    tot_df = tot_df.set_index('date')

    # trendline='ols' needs statsmodels installed; trendline_scope='overall'
    # (plotly 5.2 or later) fits one line across all locations rather than one per colour
    fig = px.scatter(data_frame=tot_df, x=col2, y=col1, color='location',
                     trendline='ols', trendline_scope='overall', hover_data=['location'])
    fig.show()
    return tot_df

scatter_graph('2021-01-02', 'total_deaths_per_million', 'aged_65_older')
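
A single number can complement the scatter plot. The sketch below computes the Pearson correlation coefficient between the two columns on one date. `correlation_on_date` is a hypothetical helper, and the toy data is made up and perfectly linear, so it says nothing about the real result.

```python
import pandas as pd

def correlation_on_date(data, date, col1, col2):
    """Pearson correlation between col1 and col2 across locations on one date."""
    # Drop rows with missing values rather than filling them with zero
    snapshot = data.loc[data['date'] == date, [col1, col2]].dropna()
    return snapshot[col1].corr(snapshot[col2])

# Made-up values for illustration only
toy = pd.DataFrame({
    'date': ['2021-01-02'] * 4,
    'location': ['A', 'B', 'C', 'D'],
    'aged_65_older': [5.0, 10.0, 15.0, 20.0],
    'total_deaths_per_million': [100.0, 200.0, 300.0, 400.0],
})

r = correlation_on_date(toy, '2021-01-02', 'total_deaths_per_million', 'aged_65_older')
print(round(r, 3))
```

Note the design difference from `scatter_graph`: this sketch drops locations with missing values, whereas the plot fills them with zero. Zero-filled rows pull the trendline towards the origin, so the two approaches may not agree on the real data.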