<a href="https://colab.research.google.com/github/MC-Codingcat/NL-COVID-19-Data-Analysis/blob/main/NL_COVID_19_Analysis_Trend.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **NL COVID-19 Analysis: Trend**
This project explores time series analysis of COVID-19 data from RIVM (the Dutch National Institute for Public Health and the Environment). Each record includes its date of report with a timestamp.

Data source: [*RIVM-Covid-19 cumulatieve aantallen per gemeente*](https://data.rivm.nl/meta/srv/api/records/1c0fcd57-1102-4620-9cfa-441e93ea5604)

The data was cleaned before this analysis. Dropping empty cells and handling abnormal values is covered in the companion project, [*NL COVID-19 Analysis: Relevance*](https://colab.research.google.com/github/MC-Codingcat/NL-COVID-19-Data-Analysis/blob/main/NL_COVID_19_Analysis_Relevance.ipynb)
<br><br/>
____
### **Important data descriptions:**
(Also see the explanation here on [*Overheid.nl*](https://data.overheid.nl/en/dataset/11508-covid-19-aantallen-gemeente-cumulatief))
* The Netherlands has reached an endemic phase for the SARS-CoV-2 virus (coronavirus), and the GGD test sites were closed as of **March 17, 2023**. As a result, the data has not been updated since April 1, 2023.
* As of January 1, 2023, RIVM no longer collects additional information, so no deaths have been reported since that date. The [Deceased] column has been set to 9999 from **January 1, 2023** onward.
* The [Hospital_admission] variable is no longer updated and has been given the value 9999 for records with [Date_of_report] from **January 18, 2022** onward. For the number of hospital admissions, RIVM refers to the hospital admissions registered by the [NICE Foundation](https://data.rivm.nl/covid-19/COVID-19_ziekenhuiss.html).
<br><br/>
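The 9999 placeholders described above are not real counts, so they should be excluded before any arithmetic on these columns. The following is a minimal sketch of one way to do that, assuming the column names shown in this notebook. The `sample` frame and its values are made up for illustration, and this is not necessarily how the companion project handles them.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the RIVM table; the numbers are illustrative only.
sample = pd.DataFrame({
    "Date_of_report": ["2022-01-17 10:00:00", "2022-01-18 10:00:00", "2023-01-01 10:00:00"],
    "Total_reported": [120, 125, 300],
    "Hospital_admission": [4.0, 9999.0, 9999.0],
    "Deceased": [1.0, 1.0, 9999.0],
})

SENTINEL = 9999  # placeholder meaning "no longer reported", not a real count

# Replace the placeholder with NaN only in the two affected columns,
# so that Total_reported is left untouched.
cleaned = sample.copy()
sentinel_cols = ["Hospital_admission", "Deceased"]
cleaned[sentinel_cols] = cleaned[sentinel_cols].replace(SENTINEL, np.nan)

print(cleaned)
```

A value-based replacement like this would also blank a genuine count of exactly 9999. If that is a concern, masking by the cut-off dates listed above is a stricter alternative.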


## **The Imports**

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# allow reading data files from Google Drive
from google.colab import drive
drive.mount('/content/drive')

# filter warnings
import warnings
warnings.filterwarnings('ignore')

Mounted at /content/drive



## **Get the Data**

In [2]:
covid = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/NL Covid-19 analysis/cumulative-gm.csv',sep=',',header=0)
covid.head()

| | Unnamed: 0 | Date_of_report | Municipality_name | Province | Total_reported | Hospital_admission | Deceased |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 2020-03-13 10:00:00 | Appingedam | Groningen | 0 | 0.0 | 0.0 |
| 1 | 1 | 2020-03-13 10:00:00 | Delfzijl | Groningen | 0 | 0.0 | 0.0 |
| 2 | 2 | 2020-03-13 10:00:00 | Groningen | Groningen | 3 | 0.0 | 0.0 |
| 3 | 3 | 2020-03-13 10:00:00 | Loppersum | Groningen | 0 | 0.0 | 0.0 |
| 4 | 4 | 2020-03-13 10:00:00 | Almere | Flevoland | 1 | 1.0 | 0.0 |


In [3]:
covid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 389421 entries, 0 to 389420
Data columns (total 7 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   Unnamed: 0          389421 non-null  int64  
 1   Date_of_report      389421 non-null  object 
 2   Municipality_name   389421 non-null  object 
 3   Province            389421 non-null  object 
 4   Total_reported      389421 non-null  int64  
 5   Hospital_admission  238852 non-null  float64
 6   Deceased            358641 non-null  float64
dtypes: float64(2), int64(2), object(3)
memory usage: 20.8+ MB
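
The `info()` output lists `Date_of_report` with dtype `object`, meaning the timestamps are still plain strings. Time series operations generally expect a datetime dtype. The sketch below shows one way to convert the column, assuming every value follows the `YYYY-MM-DD HH:MM:SS` pattern seen in `covid.head()`. The two-row `sample` frame is hypothetical.

```python
import pandas as pd

# Hypothetical two-row sample using the timestamp format shown in covid.head().
sample = pd.DataFrame({
    "Date_of_report": ["2020-03-13 10:00:00", "2020-03-14 10:00:00"],
    "Total_reported": [3, 5],
})

# Parse the strings into datetimes; an explicit format makes a malformed
# value raise an error instead of being guessed at.
sample["Date_of_report"] = pd.to_datetime(
    sample["Date_of_report"], format="%Y-%m-%d %H:%M:%S"
)

print(sample.dtypes)
```

The same call applied to `covid["Date_of_report"]` would make date-based sorting, resampling, and plotting available on the full dataset.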


The reported case counts are currently **_cumulative_**. A time series analysis needs the number of cases reported per day, which can be derived from the cumulative totals with the **.diff()** method.
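
Because the table holds one cumulative series per municipality, the difference has to be taken within each municipality and in date order. Otherwise `.diff()` would subtract one town's total from another's. The following is a minimal sketch under those assumptions. The `sample` values and the `Daily_reported` column name are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical cumulative counts for two municipalities over three days.
sample = pd.DataFrame({
    "Date_of_report": pd.to_datetime(
        ["2020-03-13", "2020-03-13", "2020-03-14",
         "2020-03-14", "2020-03-15", "2020-03-15"]
    ),
    "Municipality_name": ["Groningen", "Almere"] * 3,
    "Total_reported": [3, 1, 5, 1, 9, 4],
})

# Sort so that each municipality's rows are in chronological order.
sample = sample.sort_values(["Municipality_name", "Date_of_report"])

# Difference within each municipality: today's total minus the previous total.
# The first row of every municipality has no predecessor and becomes NaN.
sample["Daily_reported"] = (
    sample.groupby("Municipality_name")["Total_reported"].diff()
)

print(sample)
```

The leading NaN per municipality can be dropped or filled, depending on whether the first cumulative value should count as that day's cases.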