# COVID-19 in Ireland
__Alessandra Ravida__ - _UCDPA Certificate in Introductory Data Analytics - January 2022_

***

## Abstract
In this project I am analysing Covid-19 datasets in Ireland. Firstly I'll explore the datasets per se, looking for interesting patterns, and then I'll try to contextualise patterns and explore correlations with other data (such as infection trends in other countries, temperature and weather conditions, and the impact of some govenrnment restrictions on some specific cohorts).

***

## Coding

__Import all relevant packages__

In [3]:
#Here I import requests, numpy, pandas, matplotlib and seaborn

import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

__Importing COVID-19 Dataset for Ireland__
[Click here for dataset source](https://covid-19.geohive.ie/datasets/d8eb52d56273413b84b0187a4e9117be/explore?showTable=true)

In [19]:
#This dataset is a csv file saved locally, I import it as Pandas DataFrame.

covid_IRL = pd.read_csv('/Users/Alessandra/Dropbox/Data Analytics/Project/UCDPA_AlessandraRavida/Data/COVID-19_HPSC_Detailed_Statistics_Profile.csv')

__Understanding and Cleaning the dataset__

In [25]:
#I start analysing structure of the df
covid_IRL.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 673 entries, 0 to 672
Data columns (total 41 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   X                            673 non-null    float64
 1   Y                            673 non-null    float64
 2   Date                         673 non-null    object 
 3   ConfirmedCovidCases          673 non-null    int64  
 4   TotalConfirmedCovidCases     673 non-null    int64  
 5   ConfirmedCovidDeaths         502 non-null    float64
 6   TotalCovidDeaths             673 non-null    int64  
 7   StatisticsProfileDate        673 non-null    object 
 8   CovidCasesConfirmed          671 non-null    float64
 9   HospitalisedCovidCases       671 non-null    float64
 10  RequiringICUCovidCases       671 non-null    float64
 11  HealthcareWorkersCovidCases  671 non-null    float64
 12  ClustersNotified             651 non-null    float64
 13  HospitalisedAged5   

In [20]:
#look at the first 5 rows of the dataset

covid_IRL.head()

Unnamed: 0,X,Y,Date,ConfirmedCovidCases,TotalConfirmedCovidCases,ConfirmedCovidDeaths,TotalCovidDeaths,StatisticsProfileDate,CovidCasesConfirmed,HospitalisedCovidCases,...,CommunityTransmission,CloseContact,TravelAbroad,FID,HospitalisedAged65to74,HospitalisedAged75to84,HospitalisedAged85up,Aged65to74,Aged75to84,Aged85up
0,-7.692596,53.288234,2020/02/29 00:00:00+00,1,1,0.0,0,2020/02/27 00:00:00+00,,,...,0,0,0,1,,,,,,
1,-7.692596,53.288234,2020/03/03 00:00:00+00,1,2,0.0,0,2020/03/01 00:00:00+00,,,...,0,0,0,2,,,,,,
2,-7.692596,53.288234,2020/03/04 00:00:00+00,4,6,0.0,0,2020/03/02 00:00:00+00,1.0,0.0,...,0,0,0,3,0.0,0.0,0.0,0.0,0.0,0.0
3,-7.692596,53.288234,2020/03/05 00:00:00+00,7,13,0.0,0,2020/03/03 00:00:00+00,2.0,1.0,...,0,0,0,4,0.0,0.0,0.0,0.0,0.0,0.0
4,-7.692596,53.288234,2020/03/06 00:00:00+00,5,18,0.0,0,2020/03/04 00:00:00+00,5.0,4.0,...,0,0,0,5,0.0,0.0,0.0,0.0,0.0,0.0


In [32]:
#First date in dataset is 29th of Feb 2020
#I now check what's the last date

covid_IRL.loc[672 , "Date"]

'2022/01/03 00:00:00+00'

In [18]:
#There is a mixture of NaN and 0.0 in the df.
#I check more rows to see if I can convert the NaN to 0.0 across the board

covid_IRL.head(20)

Unnamed: 0,X,Y,Date,ConfirmedCovidCases,TotalConfirmedCovidCases,ConfirmedCovidDeaths,TotalCovidDeaths,StatisticsProfileDate,CovidCasesConfirmed,HospitalisedCovidCases,...,CommunityTransmission,CloseContact,TravelAbroad,FID,HospitalisedAged65to74,HospitalisedAged75to84,HospitalisedAged85up,Aged65to74,Aged75to84,Aged85up
0,-7.692596,53.288234,2020/02/29 00:00:00+00,1,1,0.0,0,2020/02/27 00:00:00+00,,,...,0,0,0,1,,,,,,
1,-7.692596,53.288234,2020/03/03 00:00:00+00,1,2,0.0,0,2020/03/01 00:00:00+00,,,...,0,0,0,2,,,,,,
2,-7.692596,53.288234,2020/03/04 00:00:00+00,4,6,0.0,0,2020/03/02 00:00:00+00,1.0,0.0,...,0,0,0,3,0.0,0.0,0.0,0.0,0.0,0.0
3,-7.692596,53.288234,2020/03/05 00:00:00+00,7,13,0.0,0,2020/03/03 00:00:00+00,2.0,1.0,...,0,0,0,4,0.0,0.0,0.0,0.0,0.0,0.0
4,-7.692596,53.288234,2020/03/06 00:00:00+00,5,18,0.0,0,2020/03/04 00:00:00+00,5.0,4.0,...,0,0,0,5,0.0,0.0,0.0,0.0,0.0,0.0
5,-7.692596,53.288234,2020/03/07 00:00:00+00,1,19,0.0,0,2020/03/05 00:00:00+00,8.0,7.0,...,0,0,0,6,0.0,0.0,0.0,0.0,0.0,0.0
6,-7.692596,53.288234,2020/03/08 00:00:00+00,2,21,0.0,0,2020/03/06 00:00:00+00,13.0,9.0,...,0,0,0,7,0.0,0.0,0.0,0.0,0.0,0.0
7,-7.692596,53.288234,2020/03/09 00:00:00+00,3,24,0.0,0,2020/03/07 00:00:00+00,16.0,11.0,...,0,0,0,8,0.0,0.0,0.0,0.0,0.0,0.0
8,-7.692596,53.288234,2020/03/10 00:00:00+00,10,34,0.0,0,2020/03/08 00:00:00+00,18.0,13.0,...,0,0,0,9,2.0,0.0,0.0,2.0,0.0,0.0
9,-7.692596,53.288234,2020/03/11 00:00:00+00,9,43,1.0,1,2020/03/09 00:00:00+00,25.0,18.0,...,0,0,0,10,2.0,0.0,0.0,3.0,0.0,0.0


In [36]:
""" I believe that for the first few days they might not have 
tested all hospitalised patients for COVID hence it's more correct 
to report those values as NaN rather than 0.0, however I think that
for the purposese of my analysis, it's safe to convert NaN into 0.0."""

#I check in how many more instances I find NaN in the dataset

covid_IRL.isna().sum()

X                                0
Y                                0
Date                             0
ConfirmedCovidCases              0
TotalConfirmedCovidCases         0
ConfirmedCovidDeaths           171
TotalCovidDeaths                 0
StatisticsProfileDate            0
CovidCasesConfirmed              2
HospitalisedCovidCases           2
RequiringICUCovidCases           2
HealthcareWorkersCovidCases      2
ClustersNotified                22
HospitalisedAged5                2
HospitalisedAged5to14            2
HospitalisedAged15to24           2
HospitalisedAged25to34           2
HospitalisedAged35to44           2
HospitalisedAged45to54           2
HospitalisedAged55to64           2
Male                             2
Female                           2
Unknown                          2
Aged1to4                         2
Aged5to14                        2
Aged15to24                       2
Aged25to34                       2
Aged35to44                       2
Aged45to54          

In [None]:
"""The column with most NaN is ConfirmedCovidDeaths, 
this i due to the fact that the reporting system for 
COVID deaths has changed in the summer from a daily
to a weekly report, hence numbers appear now only 
once a week."""

__Note:__ It might be better to analyse data on a weekly bases rather than on a dayly basis

In [115]:
#replacing NaN with 0 and check if successful

covid_IRL.fillna(0)
covid_IRL.isna().sum()

X                               0
Y                               0
Date                            0
ConfirmedCovidCases             0
TotalConfirmedCovidCases        0
ConfirmedCovidDeaths            0
TotalCovidDeaths                0
StatisticsProfileDate           0
CovidCasesConfirmed             2
HospitalisedCovidCases          2
RequiringICUCovidCases          2
HealthcareWorkersCovidCases     2
ClustersNotified               22
HospitalisedAged5               2
HospitalisedAged5to14           2
HospitalisedAged15to24          2
HospitalisedAged25to34          2
HospitalisedAged35to44          2
HospitalisedAged45to54          2
HospitalisedAged55to64          2
Male                            2
Female                          2
Unknown                         2
Aged1to4                        2
Aged5to14                       2
Aged15to24                      2
Aged25to34                      2
Aged35to44                      2
Aged45to54                      2
Aged55to64    

In [121]:
covid_IRL.isnull().sum()

X                               0
Y                               0
Date                            0
ConfirmedCovidCases             0
TotalConfirmedCovidCases        0
ConfirmedCovidDeaths            0
TotalCovidDeaths                0
StatisticsProfileDate           0
CovidCasesConfirmed             2
HospitalisedCovidCases          2
RequiringICUCovidCases          2
HealthcareWorkersCovidCases     2
ClustersNotified               22
HospitalisedAged5               2
HospitalisedAged5to14           2
HospitalisedAged15to24          2
HospitalisedAged25to34          2
HospitalisedAged35to44          2
HospitalisedAged45to54          2
HospitalisedAged55to64          2
Male                            2
Female                          2
Unknown                         2
Aged1to4                        2
Aged5to14                       2
Aged15to24                      2
Aged25to34                      2
Aged35to44                      2
Aged45to54                      2
Aged55to64    