# <center>Mortality prediction based on VAERS post-vaccination adverse reactions</center>

## *<center>Authors: Szymon Szewczyk, Łukasz Szyszka</center>*

### *<center>July 7, 2024</center>*

---


### Legal Disclaimer

The purpose of this project is to develop Bayesian models for predicting the probability of patient mortality due to adverse reactions following COVID-19 vaccination. This project is intended solely for educational purposes. The analysis and results derived from this project should not be interpreted as advice for or against any type of vaccination. The authors strongly recommend that individuals consult appropriate healthcare professionals before making any health-related decisions. The authors disclaim any responsibility for any consequences resulting from actions taken by individuals based on information from this project, in compliance with applicable US and EU laws.

### 1. Problem formulation [0-5 pts]:
- is the problem clearly stated [1 pt]
- what is the point of creating the model, are potential use cases defined [1 pt]
- where does the data come from and what does it contain [1 pt]
- DAG has been drawn [1 pt]
- confounding structures (pipe, fork, collider) were described [1 pt]

The Bayesian models developed in this project are designed to predict the probability of patient mortality due to adverse reactions following COVID-19 vaccination. The predictions are based on predictors such as the patient's age, sex, the number of days since vaccination, and the number of days the patient spent in hospital due to the adverse reaction.

The objective of creating these models is to present the probability of patient mortality in an unbiased and objective manner. The first model is simpler, using two predictors (age and sex), while the second model uses four (adding the number of days since vaccination and the number of days spent in hospital). Both models can be used in the same way to illustrate how each predictor influences the predicted probability of death. Healthcare is a critical aspect of our lives, and unbiased research is essential for improving the quality of medicine and medical services.

The data used to create the models comes from the VAERS datasets for 2022, available at https://vaers.hhs.gov/data/datasets.html. It contains reported cases of adverse reactions following vaccination in the United States in 2022.
The names and descriptions of the columns can be found below:

#### VAERSDATA.csv:
VAERS_ID: VAERS identification number\
RECVDATE: Date report was received\
STATE: State\
AGE_YRS: Age in years\
CAGE_YR: Calculated age of patient in years *\
CAGE_MO: Calculated age of patient in months *\
SEX: Sex\
RPT_DATE: Date form completed\
SYMPTOM_TEXT: Reported symptom text\
DIED: Died\
DATEDIED: Date of death\
L_THREAT: Life-threatening illness\
ER_VISIT: Emergency room or doctor visit\
HOSPITAL: Hospitalized\
HOSPDAYS: Number of days hospitalized\
X_STAY: Prolongation of existing hospitalization\
DISABLE: Disability\
RECOVD: Recovered\
VAX_DATE: Vaccination date\
ONSET_DATE: Adverse event onset date\
NUMDAYS: Number of days (onset date – vaccination date)\
LAB_DATA: Diagnostic laboratory data\
V_ADMINBY: Type of facility where vaccine was administered\
V_FUNDBY: Type of funds used to purchase vaccines\
OTHER_MEDS: Other medications\
CUR_ILL: Illnesses at time of vaccination\
HISTORY: Chronic or long-standing health conditions\
PRIOR_VAX: Prior vaccination event information\
SPLTTYPE: Manufacturer/immunization project report number\
FORM_VERS: VAERS form version 1 or 2\
TODAYS_DATE: Date form completed\
BIRTH_DEFECT: Congenital anomaly or birth defect\
OFC_VISIT: Doctor or other healthcare provider office/clinic visit\
ER_ED_VISIT: Emergency room/department or urgent care\
ALLERGIES: Allergies to medications, food, or other products

\* The sum of the two variables CAGE_YR and CAGE_MO gives the calculated age of a person. For example, if CAGE_YR = 1
and CAGE_MO = 0.5, the individual's age is 1.5 years, i.e. 1 year and 6 months.
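
A minimal sketch of this calculation with pandas, assuming the table has been loaded into a DataFrame. The helper name `calculated_age` is hypothetical, and treating a missing component as 0 when the other one is present is our assumption, not a rule stated by VAERS:

```python
import pandas as pd


def calculated_age(df: pd.DataFrame) -> pd.Series:
    """Return CAGE_YR + CAGE_MO as the calculated age in years.

    Assumption: a missing component counts as 0 when the other one is present;
    rows where both are missing stay NaN.
    """
    years = df["CAGE_YR"]
    months = df["CAGE_MO"]
    both_missing = years.isna() & months.isna()
    age = years.fillna(0) + months.fillna(0)
    return age.mask(both_missing)


# Toy rows (not real VAERS records): 1 year + 0.5, 45 years only, nothing reported.
toy = pd.DataFrame({"CAGE_YR": [1, 45, None], "CAGE_MO": [0.5, None, None]})
print(calculated_age(toy).tolist())  # [1.5, 45.0, nan]
```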

#### VAERSVAX.csv:
VAERS_ID: VAERS identification number\
VAX_TYPE: Administered vaccine type\
VAX_MANU: Vaccine manufacturer\
VAX_LOT: Manufacturer’s vaccine lot\
VAX_DOSE_SERIES: Number of doses administered\
VAX_ROUTE: Vaccination route\
VAX_SITE: Vaccination site\
VAX_NAME: Vaccination name
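
The two tables share the VAERS_ID column, so vaccine information can be attached to each report with a join. A minimal pandas sketch; the function name is hypothetical, and the assumption that COVID-19 vaccines are coded with a VAX_TYPE starting with "COVID19" should be checked against the actual 2022 files:

```python
import pandas as pd


def join_covid_reports(data: pd.DataFrame, vax: pd.DataFrame) -> pd.DataFrame:
    """Attach vaccine information to each report and keep COVID-19 reports only."""
    # Assumption: COVID-19 vaccines are coded with a VAX_TYPE starting with "COVID19".
    covid = vax[vax["VAX_TYPE"].astype(str).str.startswith("COVID19")]
    # A report may list several vaccines; keep the first COVID-19 row per report.
    covid = covid.drop_duplicates(subset="VAERS_ID")
    return data.merge(covid, on="VAERS_ID", how="inner")


# Toy tables (not real VAERS records).
data = pd.DataFrame({"VAERS_ID": [1, 2, 3], "AGE_YRS": [70, 30, 50]})
vax = pd.DataFrame({
    "VAERS_ID": [1, 1, 2, 3],
    "VAX_TYPE": ["COVID19", "FLU4", "COVID19", "HPV9"],
})
print(join_covid_reports(data, vax)["VAERS_ID"].tolist())  # [1, 2]
```

Keeping only one vaccine row per report avoids counting the same death twice when a report lists several vaccines.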


#### About VAERS

Source: https://vaers.hhs.gov/about.html

Established in 1990, the Vaccine Adverse Event Reporting System (VAERS) is a national early warning system to detect possible safety problems in U.S.-licensed vaccines. VAERS is co-managed by the Centers for Disease Control and Prevention (CDC) and the U.S. Food and Drug Administration (FDA). VAERS accepts and analyzes reports of adverse events (possible side effects) after a person has received a vaccination. Anyone can report an adverse event to VAERS. Healthcare professionals are required to report certain adverse events and vaccine manufacturers are required to report all adverse events that come to their attention.

The directed acyclic graph (DAG) used to create the models is shown below:


![DAG graph](DAG_DA_2.png)

The confounding structures (pipes, forks, colliders) in the DAG are described below:

#### Pipes: When one variable influences the next in a direct sequence.

'Patient's age' influences both 'Probability of patient reporting illness/adverse reaction(s)' and 'Patient's history of illnesses'.\
'Patient's sex' influences both 'General probability of females suffering from adverse reaction(s) (usually mild ones)' and 'Probability of males suffering from serious adverse reaction(s)'.\
These four intermediate variables in turn influence 'Probability that a patient needs hospitalization', so age and sex affect hospitalization only through them (e.g. 'Patient's age' → 'Patient's history of illnesses' → 'Probability that a patient needs hospitalization').\
Finally, 'Number of days spent in hospital' influences 'Probability of patient mortality'.

'Type of vaccine (mRNA / Protein subunit)' influences 'Probability of patient mortality'.\
'Vaccine manufacturer' influences 'Probability of patient mortality'.\
'Number of days since vaccination' influences 'Probability of patient mortality'.\
'State of residence (quality of healthcare)' influences 'Probability of patient mortality'.

#### Forks: When one variable influences multiple variables.

Patient's age influences:
- 'Probability of patient reporting illness/adverse reaction(s)'
- 'Patient's history of illnesses'

Patient's sex influences:
- 'General probability of females suffering from adverse reaction(s) (usually mild ones)'
- 'Probability of males suffering from serious adverse reaction(s)'

#### Colliders: When two or more variables influence a single variable.

'Probability that a patient needs hospitalization' is influenced by:
- 'Probability of patient reporting illness/adverse reaction(s)'
- 'Patient's history of illnesses'
- 'General probability of females suffering from adverse reaction(s) (usually mild ones)'
- 'Probability of males suffering from serious adverse reaction(s)'

'Probability of patient mortality' is influenced by:
- 'Patient's age'
- 'Patient's sex'
- 'Number of days spent in hospital'
- 'Type of vaccine (mRNA / Protein subunit)'
- 'Vaccine manufacturer'
- 'Number of days since vaccination'
- 'State of residence (quality of healthcare)'
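
The fork and collider lists above follow mechanically from the DAG's edge list. A small sketch that encodes only the edges stated in this section (the node names are our own shorthand for the DAG labels) and recovers forks (nodes with more than one child) and colliders (nodes with more than one parent):

```python
from collections import defaultdict

# Edges as stated in the text; names are shorthand for the DAG labels.
EDGES = [
    ("age", "reporting"),
    ("age", "history"),
    ("sex", "female_mild_ar"),
    ("sex", "male_serious_ar"),
    ("reporting", "hospitalization"),
    ("history", "hospitalization"),
    ("female_mild_ar", "hospitalization"),
    ("male_serious_ar", "hospitalization"),
    ("age", "mortality"),
    ("sex", "mortality"),
    ("hospital_days", "mortality"),
    ("vaccine_type", "mortality"),
    ("manufacturer", "mortality"),
    ("days_since_vaccination", "mortality"),
    ("state", "mortality"),
]


def forks_and_colliders(edges):
    """Return {node: children} for forks and {node: parents} for colliders."""
    children = defaultdict(set)
    parents = defaultdict(set)
    for src, dst in edges:
        children[src].add(dst)
        parents[dst].add(src)
    forks = {n: sorted(c) for n, c in children.items() if len(c) > 1}
    colliders = {n: sorted(p) for n, p in parents.items() if len(p) > 1}
    return forks, colliders


forks, colliders = forks_and_colliders(EDGES)
print(sorted(forks))      # ['age', 'sex']
print(sorted(colliders))  # ['hospitalization', 'mortality']
```

Because 'Patient's age' and 'Patient's sex' also point directly at 'Probability of patient mortality' (per the collider list), each of them has three children in this encoding rather than the two listed under Forks.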



### 2. Data preprocessing [0-2 pts]:
- is the preprocessing step clearly described [1 pt]
- reasoning and types of actions taken on the dataset have been described [1 pt]

#### Original Data
TODO: Load original data
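
A minimal loading sketch with pandas, assuming the files are latin-1 encoded and that only a handful of columns are needed (both the encoding and the column selection are our assumptions). In-memory bytes stand in for the real files so the snippet runs on its own:

```python
import io

import pandas as pd

# Hypothetical column selection: only the fields the models are expected to need.
DATA_COLUMNS = ["VAERS_ID", "STATE", "AGE_YRS", "SEX", "DIED", "HOSPDAYS", "NUMDAYS"]
VAX_COLUMNS = ["VAERS_ID", "VAX_TYPE", "VAX_MANU"]


def load_vaers(data_src, vax_src):
    """Read the VAERS data and vaccine tables, keeping only selected columns."""
    # Assumption: the files are latin-1 encoded (free-text fields may not be UTF-8).
    data = pd.read_csv(data_src, usecols=DATA_COLUMNS, encoding="latin-1", low_memory=False)
    vax = pd.read_csv(vax_src, usecols=VAX_COLUMNS, encoding="latin-1")
    return data, vax


# In-memory stand-ins for the real files (toy rows, not VAERS records).
demo_data = io.BytesIO(
    b"VAERS_ID,STATE,AGE_YRS,SEX,DIED,HOSPDAYS,NUMDAYS,SYMPTOM_TEXT\n"
    b"1,TX,70,M,Y,5,2,x\n"
    b"2,CA,30,F,,,1,y\n"
)
demo_vax = io.BytesIO(
    b"VAERS_ID,VAX_TYPE,VAX_MANU,VAX_LOT\n"
    b"1,COVID19,MODERNA,A1\n"
    b"2,COVID19,JANSSEN,B2\n"
)
data, vax = load_vaers(demo_data, demo_vax)
print(data.shape, vax.shape)  # (2, 7) (2, 3)
```

With the downloaded files this would be called as, for example, `load_vaers("2022VAERSDATA.csv", "2022VAERSVAX.csv")`; the exact file names are assumed from the VAERS download archive and may differ.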


#### Data preprocessing
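
A hypothetical sketch of cleaning steps that the two models would need (binary outcome, encoded sex, filled hospital days). These steps are illustrative assumptions, not a description of the authors' pipeline; in particular, reading a blank HOSPDAYS as 0 days should be justified or replaced:

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning steps for the mortality models (not the authors' pipeline)."""
    out = df.copy()
    # DIED is "Y" for reported deaths and blank otherwise -> binary outcome.
    out["died"] = (out["DIED"] == "Y").astype(int)
    # Assumption: a blank HOSPDAYS means no hospital stay was reported (0 days).
    out["HOSPDAYS"] = out["HOSPDAYS"].fillna(0)
    # Keep rows with a known age and sex M or F ("U" = unknown is dropped).
    keep = out["AGE_YRS"].notna() & out["SEX"].isin(["M", "F"])
    out = out.loc[keep].copy()
    # Sex encoded as an indicator: 1 = male, 0 = female.
    out["male"] = (out["SEX"] == "M").astype(int)
    return out.reset_index(drop=True)


# Toy rows, not real VAERS records.
toy = pd.DataFrame({
    "AGE_YRS": [70, None, 30, 50],
    "SEX": ["M", "F", "F", "U"],
    "DIED": ["Y", None, None, None],
    "HOSPDAYS": [5, None, None, 2],
})
clean = preprocess(toy)
print(clean[["died", "male", "HOSPDAYS"]])
```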

### 3. Model [0-4 pts]
- are two different models specified [1 pt]
- are the differences between the two models explained [1 pt]
- is the difference between the models justified (e.g. does adding an additional parameter make sense?) [1 pt]
- are the models sufficiently described (what are the formulas, what are the parameters, what data are required) [1 pt]

### First model

The first model uses two predictors:
- Patient's age
- Patient's sex


### Second model

The second model extends the first model by adding two new predictors:
- Number of days since vaccination
- Number of days spent in hospital
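
Under the same logistic-regression assumption, the second model adds two columns (days since vaccination and days in hospital, both assumed standardized) to the linear predictor. The sketch below, with hypothetical names and toy inputs, shows that the models are nested: setting the two new coefficients to zero reproduces the two-predictor model.

```python
import numpy as np


def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))


def standardize(x):
    """Center and scale a predictor (assumed preprocessing step)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def mortality_prob_model2(age_std, male, numdays_std, hospdays_std, alpha, beta):
    """P(death) under an assumed logistic model with four predictors.

    beta = (beta_age, beta_male, beta_numdays, beta_hospdays)
    """
    X = np.column_stack([age_std, male, numdays_std, hospdays_std]).astype(float)
    return inv_logit(alpha + X @ np.asarray(beta, dtype=float))


# Toy inputs, not real VAERS records.
age_std = standardize([25, 40, 60, 80])
male = np.array([0, 1, 0, 1])
numdays_std = standardize([1, 3, 10, 30])
hospdays_std = standardize([0, 0, 4, 12])

# With the two new coefficients at 0, model 2 collapses to model 1.
p_nested = mortality_prob_model2(age_std, male, numdays_std, hospdays_std, -3.0, [0.8, 0.3, 0.0, 0.0])
p_model1 = inv_logit(-3.0 + 0.8 * age_std + 0.3 * male)
print(np.allclose(p_nested, p_model1))  # True
```

This nesting is what makes the comparison of the two models meaningful: the second model can only improve on the first if the two extra coefficients carry information.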

### 4. Priors [0-4 pts]
- Is it explained why particular priors for parameters were selected [1 pt]
- Have prior predictive checks been done for parameters (do parameters simulated from priors make sense) [1 pt]
- Have prior predictive checks been done for measurements (do measurements simulated from priors make sense) [1 pt]
- How prior parameters were selected [1 pt]
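
A prior predictive check can be sketched by drawing parameters from the priors, converting them to death probabilities, and simulating outcomes. The Normal priors below (intercept centered at -3 on the logit scale, i.e. roughly a 5% baseline probability, and slopes centered at 0) are illustrative assumptions for the two-predictor logistic model, not the priors chosen in this project:

```python
import numpy as np


def prior_predictive_model1(age_std, male, n_draws=1000, seed=0,
                            alpha_mu=-3.0, alpha_sd=1.0, beta_sd=0.5):
    """Simulate parameters and outcomes from assumed Normal priors.

    Prior choices are illustrative assumptions:
    alpha ~ Normal(alpha_mu, alpha_sd), beta_age, beta_male ~ Normal(0, beta_sd).
    """
    rng = np.random.default_rng(seed)
    age_std = np.asarray(age_std, dtype=float)
    male = np.asarray(male, dtype=float)
    alpha = rng.normal(alpha_mu, alpha_sd, size=n_draws)
    beta_age = rng.normal(0.0, beta_sd, size=n_draws)
    beta_male = rng.normal(0.0, beta_sd, size=n_draws)
    # One row per prior draw, one column per patient.
    eta = alpha[:, None] + beta_age[:, None] * age_std + beta_male[:, None] * male
    p = 1.0 / (1.0 + np.exp(-eta))
    y_sim = rng.binomial(1, p)
    return {"alpha": alpha, "beta_age": beta_age, "beta_male": beta_male, "p": p, "y": y_sim}


sim = prior_predictive_model1(age_std=[-1.0, 0.0, 1.5], male=[0, 1, 1])
# Parameter check: baseline probabilities implied by the intercept prior.
print(np.percentile(1.0 / (1.0 + np.exp(-sim["alpha"])), [5, 50, 95]))
# Measurement check: share of simulated deaths per prior draw.
print(np.percentile(sim["y"].mean(axis=1), [5, 50, 95]))
```

If the simulated death shares routinely approach 0 or 1, the priors are likely too wide on the logit scale and should be tightened.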

### 5. Posterior analysis (model 1) [0-4 pts]
- were there any issues with the sampling? if there were what kind of ideas for mitigation were used [1 pt]
- are the samples from posterior predictive distribution analyzed [1 pt]
- are the data consistent with posterior predictive samples and is it sufficiently commented (if they are not then is the justification provided)
- have parameter marginal distributions been analyzed (histograms of individual parameters plus summaries, are they diffuse or concentrated, what can we say about values) [1 pt]
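
One common posterior predictive check for a binary outcome is to simulate replicated outcomes from posterior draws and compare a summary statistic, here the total number of deaths, with the observed value. A minimal sketch, assuming the posterior draws of each patient's death probability are available as a NumPy array (for example exported from the sampler); the function name and the toy posterior are hypothetical, and the same check applies to either model:

```python
import numpy as np


def ppc_death_count(p_draws, y_obs, seed=0):
    """Posterior predictive check on the total number of deaths.

    p_draws: (n_draws, n_patients) posterior draws of each patient's death probability.
    y_obs:   (n_patients,) observed 0/1 outcomes.
    Returns the replicated totals, the observed total and the share of
    replicated totals at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    y_rep = rng.binomial(1, np.asarray(p_draws, dtype=float))
    rep_totals = y_rep.sum(axis=1)
    obs_total = int(np.sum(y_obs))
    tail_prob = float(np.mean(rep_totals >= obs_total))
    return rep_totals, obs_total, tail_prob


# Toy posterior: 500 draws for 4 patients, not real model output.
rng = np.random.default_rng(1)
p_draws = rng.beta(2, 20, size=(500, 4))
rep_totals, obs_total, tail_prob = ppc_death_count(p_draws, y_obs=[0, 1, 0, 0])
print(obs_total, tail_prob)
```

A tail probability close to 0 or 1 means the observed death count is unusual under the model, which would call for a comment or a revised specification; a histogram of `rep_totals` with the observed total marked shows the same thing visually.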

### 6. Posterior analysis (model 2) [0-4 pts]
- were there any issues with the sampling? if there were what kind of ideas for mitigation were used [1 pt]
- are the samples from posterior predictive distribution analyzed [1 pt]
- are the data consistent with posterior predictive samples and is it sufficiently commented (if they are not then is the justification provided)
- have parameter marginal distributions been analyzed (histograms of individual parameters plus summaries, are they diffuse or concentrated, what can we say about values) [1 pt]

### 7. Model comparison [0-4 pts]
- have models been compared using information criteria [1 pt]
- have result for WAIC been discussed (is there a clear winner, or is there an overlap, were there any warnings) [1 pt]
- have result for PSIS-LOO been discussed (is there a clear winner, or is there an overlap, were there any warnings) [1 pt]
- was the model comparison discussed? Do the authors agree with the information criteria? Why, in your opinion, is one model better than the other? [1 pt]
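
WAIC can be computed from a matrix of pointwise log-likelihood values (one row per posterior draw, one column per observation) as $-2(\text{lppd} - p_{\text{WAIC}})$. The sketch below implements that definition in plain NumPy on toy matrices. In practice, ArviZ's `az.waic`, `az.loo` (PSIS-LOO, including Pareto k warnings) and `az.compare` would likely be used instead, since they also report standard errors of the differences; PSIS-LOO is not reproduced here because Pareto smoothing is more involved.

```python
import numpy as np


def waic(log_lik):
    """WAIC on the deviance scale (lower is better) and its penalty p_waic.

    log_lik: (n_draws, n_obs) pointwise log-likelihood evaluated at posterior draws.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density, using a stable log-mean-exp over draws.
    max_ll = log_lik.max(axis=0)
    lppd_i = max_ll + np.log(np.exp(log_lik - max_ll).sum(axis=0)) - np.log(n_draws)
    # Effective number of parameters: posterior variance of each log-likelihood term.
    p_waic_i = log_lik.var(axis=0, ddof=1)
    elpd_waic = np.sum(lppd_i - p_waic_i)
    return -2.0 * elpd_waic, float(np.sum(p_waic_i))


# Toy log-likelihood matrices for two models (not real output).
rng = np.random.default_rng(0)
ll_model1 = np.log(rng.uniform(0.3, 0.6, size=(1000, 50)))
ll_model2 = np.log(rng.uniform(0.4, 0.7, size=(1000, 50)))
for name, ll in [("model 1", ll_model1), ("model 2", ll_model2)]:
    w, p = waic(ll)
    print(f"{name}: WAIC = {w:.1f}, p_waic = {p:.2f}")
```

A large `p_waic` contribution from individual observations (ArviZ warns when any pointwise variance exceeds 0.4) signals that WAIC may be unreliable and PSIS-LOO should be preferred.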