# Bayesian Inference of COVID-19 Cases Evolution in Catalonia

## A Bayesian Approach to Understanding the Spread of the Pandemic

**Authors:** Robert Torrell Belzach, Nabil El Bachiri

**Date:** 16/05/2024

---

## Abstract

This report describes the initial results of applying a Bayesian approach to define a statistical model that explains the spread, damage and effects of the COVID-19 virus across the whole Catalan territory and society.
The objective of this analysis is to understand how the disease spreads through the population and to infer which parameters most affected how effectively the virus spread and how much damage it caused. By identifying the most influential parameters, we can suggest improvements and decisions that Catalan society could adopt in a similar health crisis to minimize its negative effects.

---

## Introduction

The COVID-19 pandemic has been a historic crisis that put our modern society in check and left terrible damage, but it also united humanity in the effort to study, contain and eradicate this dangerous disease. Unique to this pandemic has been the intense and successful collection of enormous amounts of data about the virus and the crisis by the scientific community. This international collaboration has provided many interesting insights and informative empirical observations about the nature of an infectious disease. These cover both its spread and damage through society and its repercussions at every level of our lives: health, economic, psychological and even cultural effects, as well as the after-effects that followed.

In this project, we will look for interesting insights in this data and use a Bayesian approach to define a statistical model that describes how such a disease spreads and affects society, in this case across the whole Catalan territory and its citizens.
By carrying out this "post-crisis" analysis, we could extract useful information that helps us prevent the potentially terrible damage should a similar disease arise again. By reflecting on what we endured, we can improve our society and be better prepared for whatever dangers appear in the future.

From now on, the "COVID disease spread", the "COVID damage to society" and the "other effects of the COVID pandemic on Catalan society" will together be referred to as the "phenomena" to study.

## Objective

Specifically, this project will try to:
- Obtain useful empirical datasets that show the spread and effects of COVID on the Catalan territory and society during the pandemic emergency period.
  - Perform an initial data exploration process to better understand the properties of the data at hand.
  - Perform the necessary data cleanup, normalization and preprocessing to better model and process the phenomena.
- Design different Bayesian models of increasing complexity in their hierarchical structure, to try to obtain a valid explanation of the phenomena.
  - We will try to justify the priors of each model so that they reasonably reflect what could be expected of the disease and its effects.
  - Our ideal objective is to develop a single, complex but informative hierarchical Bayesian model that explains all three phenomena and the relations between them.
- Formulate different hypotheses about the phenomena and try to assess their truthfulness with the inferred posterior distributions of the designed models.


## Data exploration

As in any data science project, we begin by importing our datasets. For this project, we found many online repositories from public institutions that have openly published the data they collected about the pandemic. Thanks to their efforts, we can carry out our study with millions of observations.

The complete list of data sources is given in the Bibliography section of this report. In summary, the data covers:

- the evolution of the virus;
- the damage the virus caused to society;
- the repercussions of the pandemic;
- the actions taken by society in response to the pandemic.

Even though we have compiled all this data, the first models we create will only take into account the evolution of the virus; the more complex models will incorporate the extra data as needed. The data exploration, however, covers all the available data.

We first import each CSV file as an independent pandas DataFrame.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import math

# Quarterly confirmed cases, by comarca and by municipality
df_casos_comarques_trimestral = pd.read_csv('./Datasets/casos_comarques_trimestral.csv')
df_casos_municipi_trimestral = pd.read_csv('./Datasets/casos_municipi_trimestral.csv')

# Deaths by comarca, at monthly and at quarterly resolution
df_defuncions_comarques_mensual = pd.read_csv('./Datasets/defuncions_comarca_mensual.csv')
df_defuncions_comarques_trimestral = pd.read_csv('./Datasets/defuncions_comarques_trimestral.csv')

# Monthly vaccination figures by comarca
df_vacunes_comarques_mensual = pd.read_csv('./Datasets/vacunes_comarques_mensual.csv')

# Weekly mobility (INE); despite the variable name, this file is broken down by province
df_mobilitat_comarques_setmanal = pd.read_csv('./Datasets/mobilitat_provincies_setmanal_v3.csv', delimiter=";")



In [2]:
df_vacunes_comarques_mensual.head()

|   | mes | comarques i Aran | concepte | sexe | estat | valor |
|---|---|---|---|---|---|---|
| 0 | 12/2020 | Alt Camp | dosis administrades | homes |  | 7.0 |
| 1 | 12/2020 | Alt Camp | dosis administrades | dones |  | 34.0 |
| 2 | 12/2020 | Alt Camp | dosis administrades | total |  | 41.0 |
| 3 | 12/2020 | Alt Camp | persones vacunades amb primera dosi | homes |  | 7.0 |
| 4 | 12/2020 | Alt Camp | persones vacunades amb primera dosi | dones |  | 34.0 |


In [3]:
df_casos_comarques_trimestral.head()

|   | trimestre | comarques i Aran | concepte | sexe | estat | valor |
|---|---|---|---|---|---|---|
| 0 | T1/2020 | Alt Camp | casos confirmats | homes |  | 36.0 |
| 1 | T1/2020 | Alt Camp | casos confirmats | dones |  | 39.0 |
| 2 | T1/2020 | Alt Camp | casos confirmats | total |  | 75.0 |
| 3 | T1/2020 | Alt Camp | casos per cada 100 habitants | homes |  | 0.2 |
| 4 | T1/2020 | Alt Camp | casos per cada 100 habitants | dones |  | 0.2 |


In [4]:
df_casos_municipi_trimestral.head()

|   | trimestre | municipis | concepte | sexe | estat | valor |
|---|---|---|---|---|---|---|
| 0 | T1/2020 | Abrera | casos confirmats | homes |  | 16.0 |
| 1 | T1/2020 | Abrera | casos confirmats | dones |  | 17.0 |
| 2 | T1/2020 | Abrera | casos confirmats | total |  | 33.0 |
| 3 | T1/2020 | Abrera | casos per cada 100 habitants | homes |  | 0.3 |
| 4 | T1/2020 | Abrera | casos per cada 100 habitants | dones |  | 0.3 |


In [5]:
df_defuncions_comarques_mensual.head()

|   | mes | comarques i Aran | concepte | estat | valor |
|---|---|---|---|---|---|
| 0 | 03/2020 | Alt Camp | defuncions |  | 10 |
| 1 | 03/2020 | Alt Empordà | defuncions |  | 2 |
| 2 | 03/2020 | Alt Penedès | defuncions |  | 22 |
| 3 | 03/2020 | Alt Urgell | defuncions |  | 3 |
| 4 | 03/2020 | Alta Ribagorça | defuncions |  | 0 |
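The monthly and quarterly death tables overlap, which allows a quick consistency check. If both files come from the same register, aggregating the monthly counts to quarters should reproduce the quarterly `total` counts. For example, Alt Camp has 10 deaths in 03/2020 and 10 in T1/2020. Below is a minimal sketch on toy frames that mimic the two layouts above. The helper name `monthly_to_quarter_label` and the toy numbers are illustrative assumptions. The sketch also assumes deaths are per-period counts, so summing the months within a quarter is appropriate.

```python
import pandas as pd

def monthly_to_quarter_label(mes: str) -> str:
    """Map a 'MM/YYYY' month label to a 'YYYY/Tq' quarter label (hypothetical helper)."""
    month, year = mes.split('/')
    return f"{year}/T{(int(month) - 1) // 3 + 1}"

# Toy frames mimicking the monthly and quarterly death tables (values are illustrative)
monthly = pd.DataFrame({
    'mes': ['01/2020', '02/2020', '03/2020', '04/2020'],
    'comarques i Aran': ['Alt Camp'] * 4,
    'valor': [0, 0, 10, 15],
})
quarterly = pd.DataFrame({
    'trimestre': ['2020/T1', '2020/T2'],
    'comarques i Aran': ['Alt Camp'] * 2,
    'defuncions': [10.0, 15.0],
})

# Assuming deaths are per-period counts, sum the months that fall in each quarter
monthly['trimestre'] = monthly['mes'].map(monthly_to_quarter_label)
monthly_q = (monthly.groupby(['comarques i Aran', 'trimestre'], as_index=False)['valor']
             .sum()
             .rename(columns={'valor': 'defuncions_from_monthly'}))

# Outer merge so that quarters present in only one source also show up (as NaN)
check = quarterly.merge(monthly_q, on=['comarques i Aran', 'trimestre'], how='outer')
check['diff'] = check['defuncions'] - check['defuncions_from_monthly']
print(check)
```

On the real data, a nonzero `diff` or a NaN row would point to months missing from one file or to a difference in how the two tables were compiled.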


In [6]:
df_defuncions_comarques_trimestral.head()

|   | trimestre | comarques i Aran | concepte | sexe | estat | valor |
|---|---|---|---|---|---|---|
| 0 | T1/2020 | Alt Camp | defuncions | homes |  | 6.0 |
| 1 | T1/2020 | Alt Camp | defuncions | dones |  | 4.0 |
| 2 | T1/2020 | Alt Camp | defuncions | total |  | 10.0 |
| 3 | T1/2020 | Alt Camp | defuncions per cada 1.000 habitants | homes |  | 0.3 |
| 4 | T1/2020 | Alt Camp | defuncions per cada 1.000 habitants | dones |  | 0.2 |



In [8]:
df_mobilitat_comarques_setmanal.head()

|   | Total Nacional | Comunidades y Ciudades Autónomas | Provincias | Islas | Tipo de dato | Periodo | Total |
|---|---|---|---|---|---|---|---|
| 0 | Total Nacional | Cataluña | Barcelona |  | Porcentaje de población que sale del área | 29/12/2021 | 1787 |
| 1 | Total Nacional | Cataluña | Barcelona |  | Porcentaje de población que sale del área | 26/12/2021 | 1226 |
| 2 | Total Nacional | Cataluña | Barcelona |  | Porcentaje de población que sale del área | 22/12/2021 | 221 |
| 3 | Total Nacional | Cataluña | Barcelona |  | Porcentaje de población que sale del área | 19/12/2021 | 15 |
| 4 | Total Nacional | Cataluña | Barcelona |  | Porcentaje de población que sale del área | 15/12/2021 | 2476 |
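The mobility data is weekly and by province, while the other tables are quarterly and by comarca. One way to put it on the same time axis is to parse the `Periodo` dates, which appear to be `DD/MM/YYYY`, and average the weekly values within each quarter. This produces `YYYY/Tq` labels like the ones used elsewhere in this notebook. Below is a minimal sketch on a toy frame. Using the mean as the quarterly summary is an assumption. The `Total` values in the real file may also need cleaning first; for instance, if they are stored with a decimal comma, they would need `pd.read_csv(..., decimal=',')`.

```python
import pandas as pd

# Toy frame with the same layout as the weekly INE mobility table (values are illustrative)
mob = pd.DataFrame({
    'Provincias': ['Barcelona'] * 4,
    'Periodo': ['15/12/2021', '22/12/2021', '06/01/2022', '13/01/2022'],
    'Total': [20.0, 22.0, 18.0, 16.0],
})

# Parse day-first dates with an explicit format to avoid day/month ambiguity
dates = pd.to_datetime(mob['Periodo'], format='%d/%m/%Y')

# Build 'YYYY/Tq' labels matching the quarterly tables
mob['trimestre'] = dates.dt.year.astype(str) + '/T' + dates.dt.quarter.astype(str)

# Average the weekly values within each province and quarter
mob_q = mob.groupby(['Provincias', 'trimestre'], as_index=False)['Total'].mean()
print(mob_q)
```

Province-level values would still have to be mapped onto comarques (or the other tables aggregated to provinces) before they can be merged.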


In [9]:
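def transform_by_concepte(df, index):
    # Long-to-wide: one column per 'concepte', summing 'valor' for each index combination
    pivot_df = df.pivot_table(index=index, columns='concepte', values='valor', aggfunc='sum')
    pivot_df.reset_index(inplace=True)
    return pivot_df

# Turn 'T1/2020' (or 'MM/YYYY') into '2020/T1' (or 'YYYY/MM') so that string order is chronological
def reorder_quarter(quarter):
    q, year = quarter.split('/')
    return f"{year}/{q}"

def reorder_quarter_df(df, index_str):
    # Relabel and sort in place: reassigning df inside the function would not affect the caller's frame
    df[index_str] = df[index_str].apply(reorder_quarter)
    df.sort_values(by=index_str, inplace=True)
    df.reset_index(drop=True, inplace=True)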

def parse_date_to_quarter(df, index):
    # Expects 'mes' labels already reordered to 'YYYY/MM'; maps each month to 'YYYY/Tq'
    df = df.copy()
    df['mes'] = df['mes'].apply(lambda x: x.split('/')[0] + '/T' + str((int(x.split('/')[1]) - 1) // 3 + 1))

    # Sum the monthly values of each concept within the quarter
    df = df.groupby(['mes', 'comarques i Aran', 'concepte', 'sexe']).agg({'valor': 'sum'}).reset_index()

    # One column per concept
    return transform_by_concepte(df, index)
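`reorder_quarter` exists because the labels are compared as plain strings. As raw labels, `'T1/2021'` sorts before `'T2/2020'`, whereas after reordering, `'2020/T2'` sorts before `'2021/T1'`. The same trick is applied to the `MM/YYYY` month labels before `parse_date_to_quarter` maps months to quarters with `(month - 1) // 3 + 1`. Below is a small self-contained check of both ideas. The names `reorder_label` and `to_quarter` are hypothetical stand-ins, not functions of the notebook. The month case also assumes zero-padded months, as in the `03/2020` labels above.

```python
def reorder_label(label: str) -> str:
    """Turn 'T1/2020' (or a zero-padded 'MM/YYYY') into 'YYYY/...' so string order is chronological."""
    head, year = label.split('/')
    return f"{year}/{head}"

def to_quarter(month: int) -> int:
    """Map a month number 1..12 to its quarter 1..4."""
    return (month - 1) // 3 + 1

raw = ['T2/2020', 'T1/2021', 'T4/2020']
print(sorted(raw))                             # plain string order mixes up the years
print(sorted(reorder_label(q) for q in raw))   # year-first labels sort chronologically
print([to_quarter(m) for m in range(1, 13)])
```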

In [10]:
# Keep only the sexe == "total" rows and drop the 'estat' column; .copy() yields independent frames that can be modified safely
df_casos_comarques_trimestral_total = df_casos_comarques_trimestral[df_casos_comarques_trimestral["sexe"] == "total"].drop(columns=['estat']).copy()
df_casos_municipi_trimestral_total = df_casos_municipi_trimestral[df_casos_municipi_trimestral["sexe"] == "total"].drop(columns=['estat']).copy()
df_defuncions_comarques_trimestral_total = df_defuncions_comarques_trimestral[df_defuncions_comarques_trimestral["sexe"] == "total"].drop(columns=['estat']).copy()
df_vacunes_comarques_mensual_total = df_vacunes_comarques_mensual[df_vacunes_comarques_mensual["sexe"] == "total"].drop(columns=['estat']).copy()

reorder_quarter_df(df_casos_comarques_trimestral_total, 'trimestre')
reorder_quarter_df(df_casos_municipi_trimestral_total, 'trimestre')
reorder_quarter_df(df_defuncions_comarques_trimestral_total, 'trimestre')
reorder_quarter_df(df_vacunes_comarques_mensual_total, 'mes')

df_vacunes_comarques_trimestral_total = parse_date_to_quarter(df_vacunes_comarques_mensual_total, ["comarques i Aran", "mes"])
df_vacunes_comarques_trimestral_total.rename(columns={'mes': 'trimestre'}, inplace=True)

df_casos_comarques_trimestral_total = transform_by_concepte(df_casos_comarques_trimestral_total, ["comarques i Aran", "trimestre"])
df_casos_municipi_trimestral_total = transform_by_concepte(df_casos_municipi_trimestral_total, ["municipis", "trimestre"])
df_defuncions_comarques_trimestral_total = transform_by_concepte(df_defuncions_comarques_trimestral_total, ["comarques i Aran", "trimestre"])
df_comarques_trimestral_total = pd.merge(df_casos_comarques_trimestral_total, 
                                                     df_defuncions_comarques_trimestral_total, 
                                                     on=['comarques i Aran', 'trimestre'], how='outer')


merge_columns = df_vacunes_comarques_trimestral_total.columns.difference(df_comarques_trimestral_total.columns)

df_comarques_trimestral_total = pd.merge(df_comarques_trimestral_total, 
                                                     df_vacunes_comarques_trimestral_total, 
                                                     on=['comarques i Aran', 'trimestre'], how='left')

df_comarques_trimestral_total[merge_columns] = df_comarques_trimestral_total[merge_columns].fillna(0)

In [11]:
df_vacunes_comarques_trimestral_total[0:10]

| concepte | comarques i Aran | trimestre | dosis administrades | dosis administrades per cada 100 persones | persones vacunades amb dosi addicional | persones vacunades amb primera dosi | persones vacunades amb segona dosi | població vacunada amb dosi addicional (%) | població vacunada amb primera dosi (%) | població vacunada amb segona dosi (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alt Camp | 2020/T4 | 41.0 | 0.1 | 0.0 | 41.0 | 0.0 | 0.0 | 0.1 | 0.0 |
| 1 | Alt Camp | 2021/T1 | 12408.0 | 27.7 | 0.0 | 8763.0 | 3645.0 | 0.0 | 19.5 | 8.1 |
| 2 | Alt Camp | 2021/T2 | 79735.0 | 177.8 | 0.0 | 52202.0 | 27533.0 | 0.0 | 116.3 | 61.4 |
| 3 | Alt Camp | 2021/T3 | 175756.0 | 391.8 | 370.0 | 94939.0 | 80447.0 | 0.8 | 211.5 | 179.3 |
| 4 | Alt Camp | 2021/T4 | 211880.0 | 472.3 | 16035.0 | 103894.0 | 91951.0 | 35.7 | 231.5 | 205.0 |
| 5 | Alt Camp | 2022/T1 | 267732.0 | 596.8 | 56824.0 | 110804.0 | 100083.0 | 126.7 | 247.1 | 223.0 |
| 6 | Alt Camp | 2022/T2 | 273173.0 | 608.9 | 59218.0 | 111083.0 | 102670.0 | 132.0 | 247.6 | 228.8 |
| 7 | Alt Empordà | 2020/T4 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Alt Empordà | 2021/T1 | 31986.0 | 22.9 | 0.0 | 23391.0 | 8595.0 | 0.0 | 16.8 | 6.1 |
| 9 | Alt Empordà | 2021/T2 | 216775.0 | 155.5 | 0.0 | 141007.0 | 75768.0 | 0.0 | 101.1 | 54.4 |
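Some values in this table cannot be per-quarter totals. `població vacunada amb primera dosi (%)` reaches 211.5 for Alt Camp in 2021/T3 and 247.6 in 2022/T2, well above 100% of the population. This suggests the monthly vaccination series are running (cumulative) totals, so summing three months counts each quarter roughly three times. Consistent with that reading, a third of 247.6 (about 82.5%) would be a plausible first-dose coverage. If so, the quarterly value should be the last month of each quarter rather than the sum. Below is a minimal sketch of that alternative on toy data. It assumes cumulative monthly values and zero-padded `YYYY/MM` labels, as produced by `reorder_quarter`. It is a suggested correction, not the notebook's current method.

```python
import pandas as pd

# Toy cumulative monthly series for one comarca and one concept (values are illustrative)
monthly = pd.DataFrame({
    'mes': ['2021/01', '2021/02', '2021/03', '2021/04', '2021/05', '2021/06'],
    'comarques i Aran': ['Alt Camp'] * 6,
    'concepte': ['població vacunada amb primera dosi (%)'] * 6,
    'valor': [2.0, 5.0, 9.0, 20.0, 35.0, 50.0],
})

# 'YYYY/MM' -> 'YYYY/Tq'
year = monthly['mes'].str.slice(0, 4)
month = monthly['mes'].str.slice(5, 7).astype(int)
monthly['trimestre'] = year + '/T' + ((month - 1) // 3 + 1).astype(str)

# Sort chronologically, then keep the end-of-quarter value of each running total
monthly = monthly.sort_values('mes')
quarterly = (monthly.groupby(['comarques i Aran', 'concepte', 'trimestre'], as_index=False)['valor']
             .last())
print(quarterly)

# For comparison, summing the running totals gives 16.0 and 105.0 (the latter above 100%)
summed = monthly.groupby('trimestre')['valor'].sum()
print(summed)
```

Flow variables (such as the cases and deaths counts) would still be summed; only stock-like, cumulative columns would use `last()`.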


In [12]:
df_comarques_trimestral_total

| concepte | comarques i Aran | trimestre | casos confirmats | casos per cada 100 habitants | defuncions | defuncions per cada 1.000 habitants | dosis administrades | dosis administrades per cada 100 persones | persones vacunades amb dosi addicional | persones vacunades amb primera dosi | persones vacunades amb segona dosi | població vacunada amb dosi addicional (%) | població vacunada amb primera dosi (%) | població vacunada amb segona dosi (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alt Camp | 2020/T1 | 75.0 | 0.2 | 10.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Alt Camp | 2020/T2 | 93.0 | 0.2 | 21.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Alt Camp | 2020/T3 | 277.0 | 0.6 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Alt Camp | 2020/T4 | 932.0 | 2.1 | 14.0 | 0.3 | 41.0 | 0.1 | 0.0 | 41.0 | 0.0 | 0.0 | 0.1 | 0.0 |
| 4 | Alt Camp | 2021/T1 | 838.0 | 1.9 | 22.0 | 0.5 | 12408.0 | 27.7 | 0.0 | 8763.0 | 3645.0 | 0.0 | 19.5 | 8.1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 425 | Vallès Oriental | 2021/T2 | 6163.0 | 1.5 | 33.0 | 0.1 | 708078.0 | 171.3 | 0.0 | 467561.0 | 240517.0 | 0.0 | 113.1 | 58.2 |
| 426 | Vallès Oriental | 2021/T3 | 12111.0 | 2.9 | 80.0 | 0.2 | 1642530.0 | 397.4 | 3195.0 | 897358.0 | 741977.0 | 0.8 | 217.1 | 179.5 |
| 427 | Vallès Oriental | 2021/T4 | 18085.0 | 4.4 | 30.0 | 0.1 | 1976461.0 | 478.0 | 137645.0 | 986868.0 | 851948.0 | 33.3 | 238.7 | 206.0 |
| 428 | Vallès Oriental | 2022/T1 | 66559.0 | 16.1 | 132.0 | 0.3 | 2488613.0 | 601.9 | 508619.0 | 1047238.0 | 932254.0 | 123.0 | 253.4 | 225.4 |
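The merged table combines the cases and deaths tables with an outer merge, then adds the vaccination data with a left merge followed by `fillna(0)`. Any key that fails to match would therefore silently become a row of NaN or of zero vaccinations. This could happen, for example, if a comarca name is spelled or encoded differently in two files. pandas' `merge(..., indicator=True)` adds a `_merge` column that exposes such rows. Below is a minimal sketch on toy frames. The unaccented `'Alt Emporda'` is an illustrative mismatch, not one observed in the data. The sketch uses an outer merge, even where the notebook uses a left one, so that mismatches on either side show up.

```python
import pandas as pd

keys = ['comarques i Aran', 'trimestre']

cases = pd.DataFrame({
    'comarques i Aran': ['Alt Camp', 'Alt Camp', 'Alt Empordà'],
    'trimestre': ['2020/T4', '2021/T1', '2021/T1'],
    'casos confirmats': [932.0, 838.0, 100.0],
})
vaccines = pd.DataFrame({
    'comarques i Aran': ['Alt Camp', 'Alt Camp', 'Alt Emporda'],  # illustrative spelling mismatch
    'trimestre': ['2020/T4', '2021/T1', '2021/T1'],
    'dosis administrades': [41.0, 12408.0, 31986.0],
})

merged = cases.merge(vaccines, on=keys, how='outer', indicator=True)

# Rows present in only one table point to key mismatches (or genuinely missing periods)
unmatched = merged[merged['_merge'] != 'both']
print(unmatched[keys + ['_merge']])
```

On the real frames, quarters before December 2020 are expected to appear as `left_only`, because there was no vaccination yet. Any other unmatched row deserves a closer look before filling with zeros.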


In [13]:
df_casos_municipi_trimestral_total.head()

| concepte | municipis | trimestre | casos confirmats | casos per cada 100 habitants |
|---|---|---|---|---|
| 0 | Abrera | 2020/T1 | 33.0 | 0.3 |
| 1 | Abrera | 2020/T2 | 107.0 | 0.9 |
| 2 | Abrera | 2020/T3 | 90.0 | 0.7 |
| 3 | Abrera | 2020/T4 | 264.0 | 2.1 |
| 4 | Abrera | 2021/T1 | 224.0 | 1.8 |


In [14]:
df_defuncions_comarques_trimestral_total.head()

| concepte | comarques i Aran | trimestre | defuncions | defuncions per cada 1.000 habitants |
|---|---|---|---|---|
| 0 | Alt Camp | 2020/T1 | 10.0 | 0.2 |
| 1 | Alt Camp | 2020/T2 | 21.0 | 0.5 |
| 2 | Alt Camp | 2020/T3 | 2.0 | 0.0 |
| 3 | Alt Camp | 2020/T4 | 14.0 | 0.3 |
| 4 | Alt Camp | 2021/T1 | 22.0 | 0.5 |


Notes:

- We need to think in terms of key indicators:
  -  COVID infections
  -  COVID deaths
    -  Deaths of people over 60
    -  Deaths of people under 60
  -  Territorial coverage
  -  Vaccination
  -  Hospital capacity
  -  Cost of living / food
  -  National mobility
  -  Productivity
  -  Poverty
  -  Other medical patients
  -  Deaths of other medical patients
 
- Models to build:
  - Simple baseline linear regression
  - "Custom" hierarchical model, a.k.a. the Poisson-Nabil model
  - ARMA(1,1) (presumably the one in the Stan documentation will do?)
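Below is a deliberately simple Bayesian baseline for the count data. It is not the hierarchical "Poisson-Nabil" model, whose specification is not given here. Quarterly confirmed cases y in a comarca with population N are modelled as y ~ Poisson(N * lambda), with a Gamma(alpha, beta) prior (shape, rate) on the per-capita rate lambda. The Gamma prior is conjugate, so the posterior is Gamma(alpha + sum of y, beta + sum of N). The sketch also shows how a hypothesis can be answered with posterior draws, here "the rate was higher in 2020/T4 than in 2020/T3". The case counts are Alt Camp's from the merged table. The population of 45,000 and the Gamma(1, 1) prior are placeholders, not estimates. A Poisson likelihood also ignores overdispersion; a negative binomial is a common alternative.

```python
import numpy as np

def gamma_poisson_posterior(cases, population, alpha=1.0, beta=1.0):
    """Return the posterior (shape, rate) of a per-capita rate lambda, assuming
    cases_i ~ Poisson(population_i * lambda) and lambda ~ Gamma(alpha, beta), beta being a rate."""
    cases = np.asarray(cases, dtype=float)
    population = np.asarray(population, dtype=float)
    return alpha + cases.sum(), beta + population.sum()

# Alt Camp confirmed cases in 2020/T3 and 2020/T4 (from the merged table);
# the population of 45,000 is a hypothetical placeholder, not taken from the datasets
pop = 45_000
shape_t3, rate_t3 = gamma_poisson_posterior([277], [pop])
shape_t4, rate_t4 = gamma_poisson_posterior([932], [pop])

# Monte Carlo draws from each posterior; NumPy's gamma() is parameterized by scale = 1 / rate
rng = np.random.default_rng(0)
draws_t3 = rng.gamma(shape_t3, 1.0 / rate_t3, size=20_000)
draws_t4 = rng.gamma(shape_t4, 1.0 / rate_t4, size=20_000)

# Posterior probability of the hypothesis "the per-capita rate was higher in 2020/T4"
p_higher = np.mean(draws_t4 > draws_t3)
print(f"posterior mean rate 2020/T4: {shape_t4 / rate_t4:.5f} cases per person")
print(f"P(rate T4 > rate T3 | data) = {p_higher:.3f}")
```

A hierarchical version would replace the fixed alpha and beta with hyperpriors shared across comarques, which typically calls for MCMC (for example in Stan) rather than a closed form.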

## Bibliography

### Datasets
https://analisi.transparenciacatalunya.cat/Salut/Registre-de-defuncions-per-COVID-19-a-Catalunya-pe/uqk7-bf9s/about_data

https://analisi.transparenciacatalunya.cat/Salut/Dades-di-ries-de-COVID-19-per-comarca/c7sd-zy9j/about_data

https://www.idescat.cat/dades/obertes/covid

https://ourworldindata.org/coronavirus

https://www.idescat.cat/dades/obertes/ist

https://www.idescat.cat/dades/micro/

https://www.ine.es/experimental/movilidad/experimental_em4.htm#tablas_resultados

https://www.ine.es/jaxiT3/Datos.htm?t=48252#_tabs-tabla