# Overview of COVID-19 World Vaccination Progress

#### Imported packages

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## 1. Introduction

This notebook gives a brief overview of the progress countries have made in vaccinating their populations. The dataset was taken from Kaggle: "COVID-19 World Vaccination Progress. Daily and Total Vaccination for COVID-19 in the World", and is available [here](https://www.kaggle.com/gpreda/covid-world-vaccination-progress?select=country_vaccinations.csv). Figures were collected daily from the *Our World in Data* GitHub repository for COVID-19, then merged and uploaded.

The data contains the following information:

- Country - this is the country for which the vaccination information is provided;
- Country ISO Code - ISO code for the country;
- Date - date for the data entry; for some of the dates only the daily vaccinations are available, for others, only the (cumulative) total;
- Total number of vaccinations - this is the absolute number of total immunizations in the country;
- Total number of people vaccinated - a person, depending on the immunization scheme, will receive one or more (typically 2) doses; at a certain moment, the number of vaccinations might be larger than the number of people;
- Total number of people fully vaccinated - this is the number of people who received the entire set of doses according to the immunization scheme (typically 2); at a certain moment in time, there might be a certain number of people who received one dose and another (smaller) number of people who received all doses in the scheme;
- Daily vaccinations (raw) - for a certain data entry, the number of vaccinations for that date/country;
- Daily vaccinations - for a certain data entry, the number of vaccinations for that date/country;
- Total vaccinations per hundred - ratio (in percent) between vaccination number and total population up to the date in the country;
- Total number of people vaccinated per hundred - ratio (in percent) between population immunized and total population up to the date in the country;
- Total number of people fully vaccinated per hundred - ratio (in percent) between population fully immunized and total population up to the date in the country;
- Number of vaccinations per day - number of daily vaccinations for that day and country;
- Daily vaccinations per million - ratio (in ppm) between vaccination number and total population for the current date in the country;
- Vaccines used in the country - the vaccines used in the country (up to date), given as a combination of names such as `Moderna, Oxford/AstraZeneca, Pfizer/BioNTech`;
- Source name - source of the information (national authority, international organization, local organization etc.);
- Source website - website of the source of information.
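The "per hundred" and "per million" columns described above are plain ratios to the population. The sketch below only illustrates that arithmetic; the helper names and all figures are hypothetical and do not come from the dataset.

```python
def per_hundred(count, population):
    """Return count as a share of the population, expressed per 100 people."""
    return 100 * count / population


def per_million(count, population):
    """Return count scaled to one million inhabitants."""
    return 1_000_000 * count / population


# Hypothetical figures, chosen only to make the arithmetic easy to follow
population = 7_000_000
people_vaccinated = 350_000
daily_vaccinations = 14_000

print(per_hundred(people_vaccinated, population))   # 5.0
print(per_million(daily_vaccinations, population))  # 2000.0
```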

## 2. Load the dataset

The dataset was loaded and stored in `country_vaccinations`. A sample of 10 entries is displayed below.

In [None]:
country_vaccinations = pd.read_csv("../input/covid-world-vaccination-progress/country_vaccinations.csv")

In [None]:
country_vaccinations.sample(10)

The dataset's shape (rows, columns) is displayed below.

In [None]:
country_vaccinations.shape

## 3. Exploratory Data Analysis

The cell below shows the names of all columns in the dataset.

In [None]:
country_vaccinations.columns

The information was collected from 170 countries (as of 8 April 2021). These are listed below.

In [None]:
len(country_vaccinations.country.unique())

In [None]:
country_vaccinations.country.unique()

Data were collected daily per country. There are figures for 115 days (as of 8 April 2021), starting from 13 December 2020.

In [None]:
country_vaccinations.date.unique()

In [None]:
# Count days with entries
len(country_vaccinations.date.unique())
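The `date` column is stored as text, so the first and last day are easier to read off after converting it to a datetime type. The sketch below uses a small made-up frame (`sample` is not part of the notebook) to show the idea; note that the number of distinct days with entries can be smaller than the calendar span if some days are absent.

```python
import pandas as pd

# Toy stand-in for the "date" column; the real values come from the CSV file
sample = pd.DataFrame(
    {"date": ["2020-12-13", "2020-12-14", "2020-12-14", "2020-12-16"]}
)

dates = pd.to_datetime(sample["date"])

first_day = dates.min()
last_day = dates.max()
days_with_entries = dates.nunique()                # distinct days present
calendar_span = (last_day - first_day).days + 1    # inclusive calendar span

print(first_day.date(), last_day.date(), days_with_entries, calendar_span)
```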

The dataset's description shows that there are many missing values. General statistics are displayed below.

In [None]:
country_vaccinations.describe().T
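Missing values can also be counted directly instead of being inferred from the `count` row of `describe()`. A minimal sketch on a made-up frame (`toy` and its values are illustrative only):

```python
import numpy as np
import pandas as pd

# Made-up data with a few gaps, mimicking two of the numeric columns
toy = pd.DataFrame({
    "country": ["A", "A", "B"],
    "total_vaccinations": [100.0, np.nan, 50.0],
    "people_fully_vaccinated": [np.nan, np.nan, 10.0],
})

missing_count = toy.isna().sum()          # missing cells per column
missing_share = toy.isna().mean() * 100   # same, as a percentage of rows

print(missing_count)
print(missing_share.round(1))
```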

The first three and the last three columns are of 'object' type. The remaining ones contain floating point values.

In [None]:
country_vaccinations.dtypes

Columns "iso_code", "source_name", and "source_website" do not hold information valuable for this analysis. Therefore, they were removed from the dataset.

In [None]:
country_vaccinations = country_vaccinations.drop(["iso_code", "source_name", "source_website"], axis = 1)

In [None]:
country_vaccinations.shape

It would be interesting to see which vaccines were used most. A brief check shows that the dataset does not report figures for each vaccine type (producer); instead, each entry lists a combination such as `Moderna, Oxford/AstraZeneca, Pfizer/BioNTech`, which makes it impossible to attribute vaccinations to individual vaccines.

In [None]:
country_vaccinations.vaccines.unique()

Counting the entries for each vaccine combination is not very informative either.

In [None]:
country_vaccinations.groupby("vaccines").size()
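Although doses cannot be attributed to individual vaccines, the combination strings can still be split to see in how many countries each vaccine appears. The sketch below assumes the names are separated by a comma and a space, as in the example above, and runs on a made-up frame rather than the real data.

```python
import pandas as pd

# Made-up rows; the real "vaccines" column repeats the same string per country
toy = pd.DataFrame({
    "country": ["A", "A", "B", "C"],
    "vaccines": [
        "Moderna, Pfizer/BioNTech",
        "Moderna, Pfizer/BioNTech",
        "Pfizer/BioNTech",
        "Moderna, Oxford/AstraZeneca, Pfizer/BioNTech",
    ],
})

# One row per (country, combination), then one row per (country, vaccine)
per_country = toy[["country", "vaccines"]].drop_duplicates()
individual = per_country.assign(
    vaccine=per_country["vaccines"].str.split(", ")
).explode("vaccine")

# Number of distinct countries in which each vaccine is listed
countries_per_vaccine = (
    individual.groupby("vaccine")["country"].nunique().sort_values(ascending=False)
)
print(countries_per_vaccine)
```

This counts countries, not doses, so it says nothing about how widely each vaccine was actually administered.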

### 3.1. Fully vaccinated people

This first sub-chapter explores the number of fully vaccinated people in each country.

"Fully vaccinated people" are those who received the entire set of doses according to the immunization scheme (typically 2). At a certain moment in time, there might be a certain number of people who received one dose and another (smaller) number of people who received all doses in the scheme.

The code lines below extract data for fully vaccinated people per country and take the maximum value for each country.

In [None]:
fully_vaccinated_people = country_vaccinations[["country", "people_fully_vaccinated"]]

In [None]:
fully_vaccinated_people_final = fully_vaccinated_people[fully_vaccinated_people["people_fully_vaccinated"] == fully_vaccinated_people.groupby("country")["people_fully_vaccinated"].transform("max")]

In [None]:
fully_vaccinated_people_final

The selection above contains 125 entries. However, some countries recorded the same figure on two or more consecutive days, so they appear more than once. The next line removes these duplicates.

In [None]:
fully_vaccinated_people_final = fully_vaccinated_people_final.drop_duplicates()

In [None]:
fully_vaccinated_people_final.shape
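An alternative that avoids the duplicate rows altogether is to aggregate per country, which returns exactly one value for each country. This is a sketch on made-up data, not the notebook's own approach; like the filter above, it assumes that the largest recorded value is the latest cumulative figure.

```python
import numpy as np
import pandas as pd

# Made-up data: country A repeats its maximum, country C has no figures at all
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "C"],
    "people_fully_vaccinated": [10.0, 25.0, 25.0, np.nan, 7.0, np.nan],
})

# max() skips missing values; dropna() removes countries without any figure
latest = toy.groupby("country")["people_fully_vaccinated"].max().dropna()
print(latest)
```

The result is a Series indexed by country, so it can be passed to a bar plot without calling `set_index` first.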

A brief check for missing values shows that no cell is empty.

In [None]:
fully_vaccinated_people_final.isna().any()

The progress made so far in fully vaccinating national populations is displayed below. Some countries (e.g. the United States, India, Israel) immunized many more citizens than others (e.g. Albania).

Data are plotted on a logarithmic scale, which makes comparisons easier and the graph more readable.

In [None]:
# Function to display vaccination progress
def plot_vaccination_status(dataset, title = None, ylabel = None):
    plt.style.use("Solarize_Light2")
    dataset.plot(kind = "bar", figsize = (18, 6), logy = True, color = "teal")
    
    if title is not None:
        plt.title(title)
    
    plt.xlabel("Countries")
    if ylabel is not None:
        plt.ylabel(ylabel)
    
    plt.margins(x = 0)
    plt.xticks(fontsize = 10, rotation = 90)
    plt.show()

In [None]:
plot_vaccination_status(fully_vaccinated_people_final.set_index("country"), 
                        "Fully COVID-19 vaccinated people per country until 08 April, 2021",
                        "Number of people")

### 3.2. Total vaccinations made so far

This sub-chapter explores the total number of vaccinations, i.e. the absolute number of immunizations made in each country. Total vaccinations data and corresponding country are extracted and stored in a separate variable (`total_vaccinations_per_country`).

In [None]:
total_vaccinations_per_country = country_vaccinations[["country", "total_vaccinations"]]

In [None]:
total_vaccinations_per_country

As above, the largest value per country was taken as the final figure.

In [None]:
total_vaccinations_per_country_final = total_vaccinations_per_country[total_vaccinations_per_country["total_vaccinations"] == total_vaccinations_per_country.groupby("country")["total_vaccinations"].transform("max")]

In [None]:
total_vaccinations_per_country_final.shape

In [None]:
# Check for duplicates
total_vaccinations_per_country_final.drop_duplicates()

Total vaccination data are available for all 170 countries. No duplicates (see above) or missing values (see below) were identified.

In [None]:
total_vaccinations_per_country_final.isna().any()
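Both checks can also be reduced to a single number each, which is easier to read than a full table. A minimal sketch on a made-up frame (`toy` is illustrative only):

```python
import pandas as pd

# Made-up data with one repeated row and no gaps
toy = pd.DataFrame({
    "country": ["A", "A", "B"],
    "total_vaccinations": [100.0, 100.0, 40.0],
})

duplicate_rows = toy.duplicated().sum()    # rows identical to an earlier row
missing_values = toy.isna().sum().sum()    # empty cells in the whole frame

print(duplicate_rows, missing_values)
```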

Total vaccinations per country are displayed on the plots below.

The data show that China and the United States are leaders, whereas countries such as Grenada and Montenegro are far behind. Such a comparison, however, is not quite appropriate, since China and the United States have far larger populations than small countries like Montenegro. Therefore, the next sub-chapter explores vaccinated individuals per hundred people.
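The plotting cells that follow slice the table at fixed positions (`[:57]`, `[57:114]`, `[114:]`). If the number of countries changes, a small helper can produce the slices instead. The function `split_into_chunks` below is a hypothetical helper, shown on a made-up Series.

```python
import pandas as pd


def split_into_chunks(data, chunk_size):
    """Yield consecutive slices of at most chunk_size rows."""
    for start in range(0, len(data), chunk_size):
        yield data.iloc[start:start + chunk_size]


# Made-up data: ten "countries" split into groups of four
toy = pd.Series(range(10), index=[f"country_{i}" for i in range(10)])
chunks = list(split_into_chunks(toy, 4))

print([len(chunk) for chunk in chunks])  # [4, 4, 2]
```

Each chunk could then be passed to `plot_vaccination_status` in a loop, with the part number added to the title.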

In [None]:
plot_vaccination_status(total_vaccinations_per_country_final.set_index("country")[:57],
                       "Total vaccinations per country until 8 April, 2021 [part I]",
                       "Vaccinations")

In [None]:
plot_vaccination_status(total_vaccinations_per_country_final.set_index("country")[57:114],
                       "Total vaccinations per country until 8 April, 2021 [part II]",
                       "Vaccinations")

In [None]:
plot_vaccination_status(total_vaccinations_per_country_final.set_index("country")[114:],
                       "Total vaccinations per country until 8 April, 2021 [part III]",
                       "Vaccinations")

### 3.3. Vaccinated people per hundred

The total number of people vaccinated per hundred shows the ratio (in percent) between population immunized against COVID-19 and total population up to the date in the country. 

Data were extracted from the main table and stored in `vaccinated_people_per_hundred`.

In [None]:
vaccinated_people_per_hundred = country_vaccinations[["country", "people_vaccinated_per_hundred"]]

In [None]:
vaccinated_people_per_hundred

The maximum value per country was taken and stored in the final variable.

In [None]:
vaccinated_people_per_hundred_final = vaccinated_people_per_hundred[vaccinated_people_per_hundred["people_vaccinated_per_hundred"] == vaccinated_people_per_hundred.groupby("country")["people_vaccinated_per_hundred"].transform("max")]

In [None]:
vaccinated_people_per_hundred_final.shape

A brief check showed that there were duplicate values, which were successfully removed. There were no missing values or wrong data types.

In [None]:
vaccinated_people_per_hundred_final = vaccinated_people_per_hundred_final.drop_duplicates()

In [None]:
vaccinated_people_per_hundred_final.shape

In [None]:
vaccinated_people_per_hundred_final.isna().any()

The plotting function was not used in this case since a logarithmic scale is unnecessary.

The code lines below display the progress made by countries in COVID-19 immunization, per hundred people.

In [None]:
# Display the first part of the dataset
vaccinated_people_per_hundred_final.set_index("country")[:57].plot(kind = "bar", figsize = (18, 6), color = "coral")
plt.title("Vaccinated people, per hundred, until 8 April, 2021 [part I]")
plt.xlabel("Countries")
plt.ylabel("Vaccinated people, per hundred")
plt.margins(x = 0)
plt.xticks(fontsize = 10, rotation = 90)
plt.show()

In [None]:
# Display the second part of the dataset
vaccinated_people_per_hundred_final.set_index("country")[57:114].plot(kind = "bar", figsize = (18, 6), color = "coral")
plt.title("Vaccinated people, per hundred, until 8 April, 2021 [part II]")
plt.xlabel("Countries")
plt.ylabel("Vaccinated people, per hundred")
plt.margins(x = 0)
plt.xticks(fontsize = 10, rotation = 90)
plt.show()

In [None]:
# Display the third part of the dataset
vaccinated_people_per_hundred_final.set_index("country")[114:].plot(kind = "bar", figsize = (18, 6), color = "coral")
plt.title("Vaccinated people, per hundred, until 8 April, 2021 [part III]")
plt.xlabel("Countries")
plt.ylabel("Vaccinated people, per hundred")
plt.margins(x = 0)
plt.xticks(fontsize = 10, rotation = 90)
plt.show()

The figures above reveal an interesting trend: the UK and Commonwealth countries, save Australia and Canada, have the highest levels of vaccinated people (per hundred). Even the United States and EU countries are lagging behind. 
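Reading the leaders off three separate bar charts is error-prone; ranking the values makes the comparison explicit. The sketch below shows one way to do that with `nlargest`, on made-up numbers that do not reflect the real dataset.

```python
import pandas as pd

# Made-up per-hundred figures for four placeholder countries
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "people_vaccinated_per_hundred": [12.5, 61.0, 3.2, 47.1],
})

# Three highest values, sorted in descending order
top = toy.set_index("country")["people_vaccinated_per_hundred"].nlargest(3)
print(top)
```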

Another interesting way to analyse vaccination rates is to see how the number of immunized people grows over time. The next sub-chapter explores the pace of COVID-19 vaccination in Bulgaria.

### 3.4. Vaccination rates in Bulgaria

The data for Bulgaria were extracted from the main table and stored in `vaccinations_bulgaria`.

In [None]:
vaccinations_bulgaria = country_vaccinations[country_vaccinations["country"] == "Bulgaria"]

In [None]:
vaccinations_bulgaria.head()

A brief check shows that Bulgaria began immunizations later than the first countries in the dataset. In fact, the EU started to vaccinate its citizens at the end of December 2020, only after the European Medicines Agency approved the first vaccine (Pfizer/BioNTech).

In [None]:
vaccinations_bulgaria.shape

Exploring change over time requires extracting data about daily vaccinations. This is performed below.

In [None]:
daily_vacciantions_bulgaria = vaccinations_bulgaria[["date", "daily_vaccinations"]]

In [None]:
daily_vacciantions_bulgaria

Missing values were intentionally left in the dataset.

Initially, due to the lack of vaccines, daily vaccination rates were well below 2000. From the end of January 2021, for some 20 days, the country recorded around 3000 shots per day. Thereafter, figures rose to more than 10000 daily.

In [None]:
# Plot daily vaccinations in Bulgaria
daily_vacciantions_bulgaria.set_index("date").plot(kind = "bar", figsize = (20, 6), color = "royalblue")
plt.title("Daily COVID-19 vaccinations in Bulgaria")
plt.xlabel("Date")
plt.ylabel("Number of vaccinations")
plt.margins(x = 0)
plt.show()
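The phases described above can be summarised numerically by averaging the daily figures per calendar month. This is a sketch on made-up values (not Bulgaria's actual figures); it parses the `date` column first and lets `mean()` skip missing entries.

```python
import numpy as np
import pandas as pd

# Made-up daily figures spanning two months, with one missing value
toy = pd.DataFrame({
    "date": ["2021-01-30", "2021-01-31", "2021-02-01", "2021-02-02"],
    "daily_vaccinations": [np.nan, 1000.0, 3000.0, 5000.0],
})
toy["date"] = pd.to_datetime(toy["date"])

# Group by calendar month and average; NaN values are ignored by mean()
monthly_mean = toy.groupby(toy["date"].dt.to_period("M"))["daily_vaccinations"].mean()
print(monthly_mean)
```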

## 4. Conclusion

The COVID-19 World Vaccination Progress dataset allows exploring and comparing countries’ daily and total vaccination levels. Large countries such as the United States and China have recorded the highest numbers of vaccinations so far, but the UK and its Commonwealth territories lead in immunizations per hundred people. For a particular country, the data can reveal how vaccination evolved over time.