# Evaluating Vaccines: A Comparative Analysis of COVID-19 Vaccines

Kasey Nastahunina NetID: knasta3

Daniel Quirke NetID: dquirk2

Maxwell Olmen NetID: molme2

Divyal Desle NetID: ddesle2

Ananya Mate NetID: amate4

The project aims to assess the effectiveness and popularity of COVID-19 vaccines at different stages in preventing infections, reducing severe cases, and mitigating transmission rates. With the global rollout of multiple vaccines, understanding their relative performance is crucial for optimizing vaccination strategies and addressing potential concerns. This analysis will involve the collection and integration of real-world data from diverse sources (e.g., CDC, Covid Tracking, HealthCare) and public health records. The project will employ advanced statistical techniques and machine learning algorithms to evaluate vaccine efficacy at each stage (1st dose, 2nd dose, boosters) across geographic regions.

### Key objectives

1. Exploratory Data Analysis (EDA) - initial data exploration to understand the structure, patterns, and characteristics of the dataset.

2. Visualizations - summary statistics, visualizations, and plots to gain insights.

3. Data Preparation - transform features to create meaningful variables for analysis.

4. Hypothesis - Based on the data analysis of the problem, formulate one or more null hypotheses (H0) and alternative hypotheses (H1).

5. Machine Learning Analysis - Choose appropriate machine learning or statistical models based on the project's goal. Split the data into training, validation, and test sets for model development and evaluation. Train and fine-tune the models using appropriate techniques (e.g., cross-validation, hyperparameter tuning).

6. Evaluation and Validation - assess model performance using relevant metrics (e.g., accuracy, precision, recall, F1-score for classification; RMSE, MAE for regression).

7. Interpretation - interpret model results and provide meaningful insights. Create visualizations, reports, and dashboards to report findings.

8. Conclusion and Final Report - Summarize the findings, conclusions, and actionable recommendations. Present the project's achievements and potential next steps.
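
Steps 5 and 6 above can be sketched in a few lines. This is a minimal illustration, not the project's final pipeline: the function names (`train_val_test_split`, `rmse`, `mae`), the 70/15/15 split, and the toy numbers are all assumptions made for the example.

```python
import numpy as np

def train_val_test_split(n_rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Return shuffled, non-overlapping row indices for train, validation, and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_frac))
    n_val = int(round(n_rows * val_frac))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return train_idx, val_idx, test_idx

def rmse(y_true, y_pred):
    """Root mean squared error for a regression model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error for a regression model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy illustration only: 100 hypothetical rows and three hypothetical predictions.
train_idx, val_idx, test_idx = train_val_test_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Fixing the random seed keeps the split reproducible between notebook runs, which makes model comparisons easier to trust.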


## Goal

The project's goal is to help develop new hypotheses and strategies for controlling the spread of infectious diseases in the future. With a wider range of data available, we can understand various aspects of vaccination efforts. This could include studying the efficacy of different vaccine types, analyzing the impact of vaccination on different demographics, and evaluating the long-term effectiveness of booster shots.

Ultimately, the effectiveness of COVID-19 vaccines is critical to combating a pandemic that has affected numerous lives and continues to affect our society years later. By studying and analyzing this topic in depth, we can contribute to the global understanding of vaccination as a powerful tool for controlling infectious diseases and inform more effective strategies in the future.


In [11]:
import pandas as pd
import numpy as np

## Part 1: Covid-19 Data Exploration

The COVID-19 data was found on the World Health Organization (WHO) webpage. It is used to track COVID-19 cases and deaths. There are multiple columns, such as the date on which new cases and deaths were reported, the country and country code, the WHO region, and the new and cumulative counts of cases and deaths for each country.

Here are some of the data tables:

In [27]:
# Load the WHO daily COVID-19 case and death counts
covidData = pd.read_csv("WHO-COVID-19-global-data.csv")
covidData.tail(10)

Unnamed: 0,Date_reported,Country_code,Country,WHO_region,New_cases,Cumulative_cases,New_deaths,Cumulative_deaths
331790,2023-10-24,ZW,Zimbabwe,AFRO,0,265821,0,5720
331791,2023-10-25,ZW,Zimbabwe,AFRO,0,265821,0,5720
331792,2023-10-26,ZW,Zimbabwe,AFRO,0,265821,0,5720
331793,2023-10-27,ZW,Zimbabwe,AFRO,0,265821,0,5720
331794,2023-10-28,ZW,Zimbabwe,AFRO,0,265821,0,5720
331795,2023-10-29,ZW,Zimbabwe,AFRO,0,265821,0,5720
331796,2023-10-30,ZW,Zimbabwe,AFRO,0,265821,0,5720
331797,2023-10-31,ZW,Zimbabwe,AFRO,0,265821,0,5720
331798,2023-11-01,ZW,Zimbabwe,AFRO,0,265821,0,5720
331799,2023-11-02,ZW,Zimbabwe,AFRO,0,265821,0,5720


In [28]:
covidData.describe()

Unnamed: 0,New_cases,Cumulative_cases,New_deaths,Cumulative_deaths
count,331800.0,331800.0,331800.0,331800.0
mean,2325.737,1518453.0,21.027797,18345.86
std,38702.92,6883106.0,147.262467,76553.02
min,-65079.0,0.0,-3520.0,0.0
25%,0.0,2664.75,0.0,21.0
50%,0.0,37503.0,0.0,409.0
75%,146.0,439900.2,1.0,5874.25
max,6966046.0,103436800.0,11447.0,1138309.0
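
The summary above shows negative minimums for `New_cases` (-65079) and `New_deaths` (-3520). Daily counts should not normally be negative, so these rows deserve a closer look; they may reflect retrospective corrections by reporting countries, which is an assumption to confirm against the WHO data notes. The sketch below counts such rows on a small toy frame. The helper name `count_negative_reports` and the clipping step are hypothetical choices, not the project's settled method.

```python
import pandas as pd

# Toy rows for illustration only; they are not taken from the WHO file.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B"],
    "New_cases": [10, -3, 5, 0, 7],
    "New_deaths": [1, 0, -1, 0, 0],
})

def count_negative_reports(df, columns=("New_cases", "New_deaths")):
    """Count the rows with a negative value in each daily-count column."""
    return {col: int((df[col] < 0).sum()) for col in columns}

negatives = count_negative_reports(toy)
print(negatives)

# One possible treatment (an assumption): work on a copy and clip
# negative daily counts to zero, leaving the original frame untouched.
daily_columns = ["New_cases", "New_deaths"]
cleaned = toy.copy()
cleaned[daily_columns] = cleaned[daily_columns].clip(lower=0)
```

With the real data, the same helper would be called as `count_negative_reports(covidData)`.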





#### Data Preparation

The dataset is large, so we will need to filter and reshape it before using it in the project. Here are a few examples of how to manipulate the dataset to find the needed information:

In [14]:
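# Convert the date column to datetime so it can be compared and sorted
covidData['Date_reported'] = pd.to_datetime(covidData['Date_reported'])
# Select every country's row for one specific date
selected_date_rows = covidData[covidData['Date_reported'] == '2020-09-09']

result = selected_date_rows[['Date_reported', 'Country', 'Cumulative_cases', 'Cumulative_deaths']]

display(result)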

Unnamed: 0,Date_reported,Country,Cumulative_cases,Cumulative_deaths
250,2020-09-09,Afghanistan,38520,1418
1650,2020-09-09,Albania,10406,319
3050,2020-09-09,Algeria,46938,1569
4450,2020-09-09,American Samoa,0,0
5850,2020-09-09,Andorra,1261,53
...,...,...,...,...
325050,2020-09-09,Viet Nam,1054,35
326450,2020-09-09,Wallis and Futuna,0,0
327850,2020-09-09,Yemen,1998,577
329250,2020-09-09,Zambia,12952,298
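
A per-date snapshot like the one above can also be rolled up by WHO region, which is useful for the regional comparisons the project plans. The following is a minimal sketch on toy rows. The helper name `regional_snapshot` is hypothetical and the values are made up for illustration; only the column names come from the WHO file.

```python
import pandas as pd

# Toy rows shaped like the WHO file; the values are made up for illustration.
toy = pd.DataFrame({
    "Date_reported": pd.to_datetime(
        ["2020-09-09", "2020-09-09", "2020-09-09", "2020-09-10"]
    ),
    "Country": ["A", "B", "C", "A"],
    "WHO_region": ["AFRO", "AFRO", "EURO", "AFRO"],
    "Cumulative_cases": [100, 50, 200, 110],
    "Cumulative_deaths": [5, 1, 10, 6],
})

def regional_snapshot(df, date):
    """Sum cumulative cases and deaths per WHO region on a single date."""
    on_date = df[df["Date_reported"] == pd.Timestamp(date)]
    totals = on_date.groupby("WHO_region")[["Cumulative_cases", "Cumulative_deaths"]]
    return totals.sum()

snapshot = regional_snapshot(toy, "2020-09-09")
print(snapshot)
```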


In [15]:
covidData['Date_reported'] = pd.to_datetime(covidData['Date_reported'])

latest_date_row = covidData[covidData['Date_reported'] == covidData['Date_reported'].max()]

# Calculate the total cases for the latest date
total_cases_latest_date = latest_date_row['Cumulative_cases'].sum()

print("Total cases on latest date:", total_cases_latest_date)

Total cases on latest date: 771679618


In [16]:
covidData['Date_reported'] = pd.to_datetime(covidData['Date_reported'])

latest_date_row = covidData[covidData['Date_reported'] == covidData['Date_reported'].max()]

total_deaths_latest_date = latest_date_row['Cumulative_deaths'].sum()

print("Total deaths on latest date:", total_deaths_latest_date)

Total deaths on latest date: 6977023
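
The two cells above repeat the same filtering step. One way to avoid the repetition is a small helper that returns both totals at once. This is a sketch under two assumptions: the helper name `totals_on_latest_date` is hypothetical, and every country is assumed to have a row on the most recent date.

```python
import pandas as pd

def totals_on_latest_date(df):
    """Sum cumulative cases and deaths over all countries on the most recent date."""
    dates = pd.to_datetime(df["Date_reported"])
    latest = df[dates == dates.max()]
    return {
        "date": dates.max(),
        "total_cases": int(latest["Cumulative_cases"].sum()),
        "total_deaths": int(latest["Cumulative_deaths"].sum()),
    }

# Toy rows for illustration only; they are not taken from the WHO file.
toy = pd.DataFrame({
    "Date_reported": ["2023-11-01", "2023-11-02", "2023-11-01", "2023-11-02"],
    "Country": ["A", "A", "B", "B"],
    "Cumulative_cases": [90, 100, 40, 50],
    "Cumulative_deaths": [4, 5, 1, 2],
})

totals = totals_on_latest_date(toy)
print(totals["total_cases"], totals["total_deaths"])
```

If some countries stopped reporting earlier than others, taking each country's last available row would be a safer variant than filtering on one global date.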


## Part 2: Vaccine Data Exploration

The vaccine data was found on the World Health Organization (WHO) webpage. It is used to track COVID-19 vaccine types, usage, etc. There are multiple columns, such as the name of the country, the total number of vaccinations, the number of people who received different doses, and the number of people who received different doses per 100 people.

Here are some of the data tables:

In [23]:
# Load the WHO per-country vaccination summary
vaccineData = pd.read_csv("vaccination-data.csv")
vaccineData.tail(15)

Unnamed: 0,COUNTRY,ISO3,WHO_REGION,DATA_SOURCE,DATE_UPDATED,TOTAL_VACCINATIONS,PERSONS_VACCINATED_1PLUS_DOSE,TOTAL_VACCINATIONS_PER100,PERSONS_VACCINATED_1PLUS_DOSE_PER100,PERSONS_LAST_DOSE,PERSONS_LAST_DOSE_PER100,VACCINES_USED,FIRST_VACCINE_DATE,NUMBER_VACCINES_TYPES_USED,PERSONS_BOOSTER_ADD_DOSE,PERSONS_BOOSTER_ADD_DOSE_PER100
214,Dominican Republic,DOM,AMRO,REPORTING,2023-06-02,15972891.0,7322815,147.244,67.504,6095810,56.193,,2021-02-16,5.0,2554266.0,23.546
215,Germany,DEU,EURO,REPORTING,2023-09-03,193241457.0,64868618,232.4,77.932,63566137,76.368,,2020-12-23,9.0,52143826.0,62.645
216,Gibraltar,GIB,EURO,OWID,2022-12-16,132810.0,42175,394.2,125.182,41465,123.074,,,2.0,49170.0,145.944
217,Guatemala,GTM,AMRO,REPORTING,2023-06-02,20370595.0,8935137,113.703,49.874,7127944,39.786,,2021-02-25,5.0,3719543.0,20.762
218,Israel,ISR,EURO,REPORTING,2022-06-05,17915305.0,7055466,207.0,79.274,6385731,71.749,,2020-12-16,3.0,4474108.0,50.271
219,Kazakhstan,KAZ,EURO,REPORTING,2023-04-30,38355605.0,12443364,204.3,64.821,13070774,68.089,,2021-04-07,5.0,6801526.0,35.431
220,Kuwait,KWT,EMRO,REPORTING,2023-11-05,8261159.0,3457471,193.444,80.96,3346144,78.354,,2020-12-28,10.0,1457544.0,34.13
221,Lao People's Democratic Republic,LAO,WPRO,REPORTING,2023-05-21,13879410.0,6324678,190.768,86.93,5691962,78.234,,2020-11-25,8.0,2451034.0,33.689
222,Micronesia (Federated States of),FSM,WPRO,REPORTING,2023-03-28,198749.0,83871,172.791,72.917,80910,70.342,,2021-01-13,4.0,30537.0,26.549
223,Rwanda,RWA,AFRO,REPORTING,2023-06-04,27322059.0,10884714,210.945,84.037,10399665,80.293,,2021-03-05,7.0,4225194.0,32.621


In [24]:
vaccineData.describe()

Unnamed: 0,TOTAL_VACCINATIONS,PERSONS_VACCINATED_1PLUS_DOSE,TOTAL_VACCINATIONS_PER100,PERSONS_VACCINATED_1PLUS_DOSE_PER100,PERSONS_LAST_DOSE,PERSONS_LAST_DOSE_PER100,VACCINES_USED,NUMBER_VACCINES_TYPES_USED,PERSONS_BOOSTER_ADD_DOSE,PERSONS_BOOSTER_ADD_DOSE_PER100
count,228.0,229.0,224.0,229.0,229.0,229.0,0.0,225.0,213.0,213.0
mean,59361730.0,24444420.0,157.525549,64.02152,22525110.0,59.055817,,5.004444,11676330.0,32.384286
std,283992300.0,113883400.0,84.384285,25.279778,108429800.0,25.316711,,2.925409,60964850.0,25.255548
min,117.0,0.0,0.348,0.0,0.0,0.0,,1.0,0.0,0.0
25%,473655.2,192371.0,87.04,45.654,184801.0,39.85,,3.0,45995.0,8.574
50%,4749752.0,2740227.0,161.0555,68.86,2484985.0,63.703,,5.0,646504.0,30.681
75%,23365450.0,10884710.0,223.5425,82.616,9574047.0,78.354,,7.0,4474108.0,53.03
max,3516881000.0,1318027000.0,469.778,163.185,1284480000.0,163.185,,12.0,834060100.0,145.944
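
Two things stand out in the summary above. `VACCINES_USED` has a count of 0, so the column holds no numeric values here. The per-100 columns also reach values above 100 (the maximum of `PERSONS_VACCINATED_1PLUS_DOSE_PER100` is 163.185), which cannot be read as a plain percentage of the population. A sketch of how such columns and rows could be flagged before modeling follows. The toy values and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; only the column names come from the WHO file.
toy = pd.DataFrame({
    "COUNTRY": ["A", "B", "C"],
    "PERSONS_VACCINATED_1PLUS_DOSE_PER100": [67.5, 125.2, 49.9],
    "PERSONS_BOOSTER_ADD_DOSE_PER100": [23.5, np.nan, 20.8],
    "VACCINES_USED": [np.nan, np.nan, np.nan],
})

# Share of missing values per column (0.0 = complete, 1.0 = entirely empty).
missing_share = toy.isna().mean()

# Columns with no usable values at all are candidates for dropping.
empty_columns = missing_share[missing_share == 1.0].index.tolist()

# Rows where first-dose coverage exceeds 100 per 100 people need a manual check.
over_100 = toy[toy["PERSONS_VACCINATED_1PLUS_DOSE_PER100"] > 100]

print(empty_columns)
print(over_100["COUNTRY"].tolist())
```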


In [26]:
selected_columns = ['COUNTRY', 'DATE_UPDATED', 'TOTAL_VACCINATIONS', 'PERSONS_VACCINATED_1PLUS_DOSE', 'PERSONS_LAST_DOSE', 'PERSONS_BOOSTER_ADD_DOSE']
vaccineData_selected = vaccineData[selected_columns]

# Display the selected columns
display(vaccineData_selected)

Unnamed: 0,COUNTRY,DATE_UPDATED,TOTAL_VACCINATIONS,PERSONS_VACCINATED_1PLUS_DOSE,PERSONS_LAST_DOSE,PERSONS_BOOSTER_ADD_DOSE
0,American Samoa,2023-03-29,114706.0,46206,42479,24160.0
1,Austria,2023-07-02,20403676.0,6899873,6682372,5405966.0
2,Bangladesh,2023-10-18,362229859.0,151504394,142193276,68532189.0
3,Brunei Darussalam,2023-06-30,1293100.0,451149,446714,340466.0
4,Bulgaria,2023-09-24,4618931.0,2108544,2080324,833135.0
...,...,...,...,...,...,...
224,"Saint Helena, Ascension and Tristan da Cunha",2021-05-05,7892.0,4361,3531,
225,Saint Lucia,2023-06-02,122977.0,60140,54971,7866.0
226,Serbia,2023-03-05,6722671.0,3398116,3324555,0.0
227,Sweden,2023-10-01,23237123.0,7473037,7334822,5542831.0
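
Because the project compares vaccination stages (1st dose, last dose, boosters), the selected columns can be turned into stage-to-stage ratios. The sketch below uses four rows copied from the output above. The derived column names `COMPLETION_RATE` and `BOOSTER_RATE` and the helper `safe_ratio` are hypothetical, introduced only for this example.

```python
import pandas as pd

# Four rows copied from the displayed output above.
sample = pd.DataFrame({
    "COUNTRY": ["American Samoa", "Austria", "Brunei Darussalam", "Bulgaria"],
    "PERSONS_VACCINATED_1PLUS_DOSE": [46206, 6899873, 451149, 2108544],
    "PERSONS_LAST_DOSE": [42479, 6682372, 446714, 2080324],
    "PERSONS_BOOSTER_ADD_DOSE": [24160.0, 5405966.0, 340466.0, 833135.0],
})

def safe_ratio(numerator, denominator):
    """Divide two columns, returning NaN where the denominator is zero or missing."""
    return numerator / denominator.where(denominator > 0)

# Share of people with at least one dose who completed the primary series.
sample["COMPLETION_RATE"] = safe_ratio(
    sample["PERSONS_LAST_DOSE"], sample["PERSONS_VACCINATED_1PLUS_DOSE"]
)
# Share of people with a completed series who received a booster or additional dose.
sample["BOOSTER_RATE"] = safe_ratio(
    sample["PERSONS_BOOSTER_ADD_DOSE"], sample["PERSONS_LAST_DOSE"]
)

print(sample[["COUNTRY", "COMPLETION_RATE", "BOOSTER_RATE"]].round(3))
```

These ratios are not guaranteed to stay at or below 1. In the `tail` output earlier, Kazakhstan reports more people with a last dose than with a first dose, so the full dataset should be checked for such cases before the ratios are used as features.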
