# Visualising the Current Status of COVID-19 Vaccination

## Introduction

#### This notebook performs exploratory data analysis on multiple datasets and visualises the consolidated data.

The main objective is to visualise the global COVID-19 vaccination process and monitor its progress.

Plotly is used to visualise the data.

#### The datasets used are:

- COVID-19 World Vaccination Progress
- Population by Country - 2020

#### Visualisations answer three simple questions:

- What vaccines are used and in which countries?
- Which country has vaccinated the most people?
- Which country has vaccinated the largest percentage of its population?

## 1. Import necessary libraries

In [None]:
import pandas as pd
import numpy as np
import math
import cufflinks as cf
import plotly
import plotly.express as px
import plotly.graph_objects as go
%matplotlib inline

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot 
init_notebook_mode(connected=True)
cf.go_offline()

## 2. EDA on Vaccine Data Frame

### Read the vaccine data frame with the necessary columns and parse the date column

In [None]:
df_vac = pd.read_csv('../input/covid-world-vaccination-progress/country_vaccinations.csv',
                 usecols=["country", "iso_code", "date","people_vaccinated", "vaccines"],
                 parse_dates = ['date'])

### Inspect the vaccine data frame

In [None]:
df_vac.info()

### See which rows do not have an ISO code

In [None]:
df_vac_na = df_vac[df_vac['iso_code'].isnull()]

df_vac_na['country'].unique()

### Drop the rows for the countries above, since they are already included in the United Kingdom

In [None]:
# This command will also drop the rows that have no value in the people_vaccinated column.
# Since this column is a cumulative sum, we will use max() to get the latest value in the final data frame.

df_vac = df_vac.dropna()
df_vac.info()
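
To illustrate what `dropna()` does in the cell above, here is a minimal sketch on a tiny made-up frame (the rows are invented for illustration and are not taken from the dataset). Without arguments, `dropna()` removes every row that has a missing value in any column; the `subset` parameter restricts the check to the named columns.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not real vaccination figures
toy = pd.DataFrame({
    "country": ["A", "A", "B", "C"],
    "iso_code": ["AAA", "AAA", None, "CCC"],
    "people_vaccinated": [10.0, np.nan, 5.0, 7.0],
})

# Default behaviour: a row is dropped if any column is missing
dropped_any = toy.dropna()

# Only rows with a missing iso_code are dropped
dropped_iso_only = toy.dropna(subset=["iso_code"])

print(len(dropped_any), len(dropped_iso_only))
```

Using `subset` is one way to keep the two clean-up steps (missing ISO codes, missing counts) separate if that is ever needed.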

### Create a data frame with the maximum value of people vaccinated, grouped by country

In [None]:
df_vac = pd.DataFrame(df_vac.groupby(["country","iso_code",'vaccines'])["people_vaccinated"].max())

# Reset the index so the grouping keys become regular columns for the merge

df_vac = df_vac.reset_index()

df_vac.head()

In [None]:
df_vac.info()
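
As a sketch of why `max()` works here, consider a made-up cumulative series for two hypothetical countries. Assuming the reported counts never decrease over time, the maximum per group is the latest reported total.

```python
import pandas as pd

# Made-up cumulative counts for two hypothetical countries
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "people_vaccinated": [100, 250, 400, 30, 90],
})

# The maximum of a non-decreasing cumulative series is its most recent value
latest = toy.groupby("country", as_index=False)["people_vaccinated"].max()

print(latest)
```

Passing `as_index=False` keeps `country` as a regular column, so no index reset is needed afterwards.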

The vaccine data frame is now ready to be merged. Next, we perform EDA on the population data frame.

## 3. EDA on Population Data Frame

### Read the population data frame and rename the country column

In [None]:
df_pop = pd.read_csv('../input/population-by-country-2020/population_by_country_2020.csv')

df_pop.rename(columns={'Country (or dependency)': 'country'}, inplace=True)

df_pop.info()

### Merge two data frames

In [None]:
df = pd.merge(df_vac, df_pop, how='left', on='country')

In [None]:
df.info()
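
One way to see which rows found no match in a left merge is the `indicator` parameter of `pd.merge`, which adds a `_merge` column. The sketch below uses invented frames; only the `country` and `Population (2020)` column names mirror the notebook.

```python
import pandas as pd

# Invented frames; the values are placeholders
vac = pd.DataFrame({"country": ["A", "B", "C"],
                    "people_vaccinated": [10, 20, 30]})
pop = pd.DataFrame({"country": ["A", "B"],
                    "Population (2020)": [1000, 2000]})

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'
merged = pd.merge(vac, pop, how="left", on="country", indicator=True)

# Countries present in the vaccine data but absent from the population data
unmatched = merged.loc[merged["_merge"] == "left_only", "country"].tolist()

print(unmatched)
```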

### Population is missing for six countries, so let's see which ones they are

In [None]:
df_na = df[df['Population (2020)'].isnull()]

df_na['country'].unique()

### Investigate the population data frame to see if we can find their equivalents

In [None]:
df_pop['country'].unique()

### Replace the countries in the population data frame with their corresponding name in the vaccine data frame, and follow the same steps

In [None]:
df_pop['country'] = df_pop['country'].replace({
    "Côte d'Ivoire": "Cote d'Ivoire", 
    "Czech Republic (Czechia)": "Czechia", 
    "Turks and Caicos": "Turks and Caicos Islands"})

df = pd.merge(df_vac, df_pop, how='left', on='country')

df_na = df[df['Population (2020)'].isnull()]

df_na['country'].unique()

In [None]:
df.info()
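
The renaming step can be checked with a set difference between the two name lists. This is only a sketch on short hand-written lists, not the full datasets: after applying the mapping, any name left in the difference has no equivalent in the population data.

```python
import pandas as pd

# Short hand-written name lists standing in for the two country columns
vac_names = pd.Series(["Czechia", "Cote d'Ivoire", "Jersey"])
pop_names = pd.Series(["Czech Republic (Czechia)", "Côte d'Ivoire"])

# A dict passed to replace() swaps exact matches only
pop_names = pop_names.replace({
    "Czech Republic (Czechia)": "Czechia",
    "Côte d'Ivoire": "Cote d'Ivoire",
})

# Names in the vaccine data that still have no population counterpart
still_missing = sorted(set(vac_names) - set(pop_names))

print(still_missing)
```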

### Since the population data does not include these countries, we can drop their rows

In [None]:
# Before dropping rows, drop the Migrants (net) column: it has more missing
# values, so keeping it would make dropna() discard otherwise complete rows.
# Also drop Yearly Change, Fert. Rate, Med. Age, Urban Pop % and World Share,
# since they are not relevant to this analysis.

df = df.drop(columns=['Migrants (net)', 'Yearly Change', 'Fert. Rate',
                      'Med. Age', 'Urban Pop %', 'World Share'])

# Drop rows with NaN values one more time to remove the rows for Guernsey and Jersey

df_f = df.dropna()

df_f.info()

### Have a look at the final data frame

In [None]:
df_f = df_f.reset_index(drop=True)

df_f.head()

## 4. Creating a Detailed Vaccine Data Frame

### Have a look at the vaccines column to see how many unique vaccine groups are present

In [None]:
df_f['vaccines'].unique()

### Create boolean columns for all vaccines and concatenate them in one data frame

In [None]:
vaccine_names = ['Pfizer/BioNTech', 'Sputnik V', 'Oxford/AstraZeneca',
                 'Sinopharm/Beijing', 'Moderna', 'Sinovac', 'Covaxin',
                 'EpiVacCorona', 'Sinopharm/Wuhan']

# One boolean column per vaccine: True where the name appears in the vaccines string
result = pd.DataFrame({name: df_f['vaccines'].str.contains(name, regex=False)
                       for name in vaccine_names}).reset_index(drop=True)

# Convert True/False to 1/0 so the columns can be summed

result = result.astype(int)

result.head()
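
As an alternative sketch, `Series.str.get_dummies` can build the same kind of 0/1 columns in one call. It assumes the vaccine names in each string are separated by a comma and a space, which is an assumption about the data format. The strings below are made up.

```python
import pandas as pd

# Made-up vaccine strings; the ", " separator is an assumption about the data
vaccines = pd.Series([
    "Pfizer/BioNTech",
    "Moderna, Pfizer/BioNTech",
    "Sputnik V",
])

# Split each string on the separator and create one indicator column per name
flags = vaccines.str.get_dummies(sep=", ")

# Row-wise sum gives the number of different vaccines per row
n_vaccines = flags.sum(axis=1)

print(flags)
```

This approach discovers the vaccine names from the data instead of listing them by hand, at the cost of depending on a consistent separator.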

### Merge the detailed vaccine data frame into the main data frame, then add a column for the total number of different vaccines

In [None]:
df_f = pd.concat([df_f, result], axis=1)

# Sum the 0/1 vaccine columns row by row to count the different vaccines per country
df_f['Different Vaccines'] = df_f[list(result.columns)].sum(axis=1)

df_f.head()

Now we have our final data frame with no missing values, so we can start visualising the data.

## 5. Data Visualisation

### Which vaccine is most commonly used around the world?

In [None]:
# Create a data frame to calculate the usage of every individual vaccine in different countries

summ = df_f.iloc[:,8:17].sum()
df_vacc = pd.DataFrame(summ, columns=['Usage'])
df_vacc = df_vacc.sort_values('Usage', ascending=False)
df_vacc.reset_index(level=0, inplace=True)

# Visualise the vaccine usage

fig = px.bar(df_vacc,
              x = 'index',
              y = 'Usage',
              color='Usage',
              hover_name = 'index',
              hover_data = ['Usage'],
              color_continuous_scale = 'Magenta',
              labels = {'index':'Vaccine','Usage':'# of Countries Using'},
              height=500,
              text = 'Usage')

fig.update_layout(uniformtext_minsize = 15,
                   xaxis_tickangle = -45,
                   title = 'Usage of Vaccines by Countries',
                   title_x = 0.5,
                   showlegend=False)

fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')

fig.update(layout_coloraxis_showscale=False)
fig.update_layout({'plot_bgcolor': 'rgba(0, 0, 0, 0)', 'paper_bgcolor': 'rgba(0, 0, 0, 0)',})

fig.show()

### Vaccination Race

It appears that the vaccine developed by Pfizer/BioNTech benefits from being first to market. Although the vaccine developed by Oxford/AstraZeneca was released some time later, it is a strong competitor in the vaccination race since it is easier to roll out in terms of handling and transportation. Currently, these two are the most common COVID-19 vaccines around the world, having been rolled out in more than 60 countries.

### What vaccines are used and in which countries?

In [None]:
df_f['Log Scale'] = df_f['people_vaccinated'].apply(lambda x : math.log2(x+1))

df_f_sorted = df_f.sort_values('Different Vaccines', ascending = False)

fig = px.bar(df_f_sorted,
              x = 'country',
              y = 'Different Vaccines',
              color='Different Vaccines',
              hover_name = 'country',
              hover_data = ['vaccines'],
              color_continuous_scale = 'Blues',
              labels = {'country':'Country', 'vaccines':'Used Vaccines'},  # change the labels shown in the tooltip
              height=500)

fig.update_layout(uniformtext_minsize = 15,
                   xaxis_tickangle = -45,
                   title = 'Total Number of Different Vaccines Used by Countries',
                   title_x = 0.5)

fig.update_layout({'plot_bgcolor': 'rgba(0, 0, 0, 0)', 'paper_bgcolor': 'rgba(0, 0, 0, 0)',})

fig.update(layout_coloraxis_showscale=False)

fig.show()
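
The `Log Scale` column created in the cell above compresses the very wide range of counts so that it can drive a colour scale. As a sketch on made-up numbers, the same transformation can be written in vectorised form with NumPy. Adding 1 before taking the logarithm keeps a count of zero well-defined.

```python
import math

import numpy as np
import pandas as pd

# Made-up counts spanning several orders of magnitude
counts = pd.Series([0, 1, 1023, 1_048_575])

# Row-by-row version, as in the notebook
log_apply = counts.apply(lambda x: math.log2(x + 1))

# Vectorised equivalent: NumPy applies log2 to the whole Series at once
log_vectorised = np.log2(counts + 1)

print(log_vectorised.tolist())
```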

### Does more money mean more different types of vaccine?

It looks like the UAE and Hungary each use 5 of the 9 different vaccines. They are followed by Bahrain and Serbia, with 4 different vaccines each. Let's look at the distribution on a map to understand it better.

In [None]:
fig = px.choropleth(df_f,
                    locations="country", 
                    locationmode='country names',
                    color="Different Vaccines", 
                    hover_name="country", 
                    hover_data=['Different Vaccines','vaccines'],
                    color_continuous_scale="Blues",
                    labels={'country':'Country','vaccines':'Used Vaccines'})

fig.update_layout(title="Total Number of Different Vaccines Used by Countries",
                  title_x=0.5)

fig.update(layout_coloraxis_showscale=False)

fig.show()

### Which country has vaccinated the most people?

In [None]:
# Sort data frame by people_vaccinated
df_f_sorted = df_f.sort_values('people_vaccinated', ascending = False)

# Select the first 20 rows to match the "Top 20" chart title
df_f_sorted = df_f_sorted.iloc[0:20,:]

# Plot the bar chart
fig = px.bar(df_f_sorted,
              x = 'country',
              y = 'people_vaccinated',
              color='Log Scale',
              hover_name = 'country',
              hover_data = ['people_vaccinated'],
              color_continuous_scale = 'mint',
              labels = {'country':'Country','people_vaccinated':'People Vaccinated'},
              height=500,
              text='people_vaccinated')

fig.update_layout(uniformtext_minsize = 15,
                   xaxis_tickangle = -45,
                   title = 'Total Vaccinated People by Top 20 Countries',
                   title_x = 0.5)

fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.update_layout({'plot_bgcolor': 'rgba(0, 0, 0, 0)', 'paper_bgcolor': 'rgba(0, 0, 0, 0)',})

fig.update(layout_coloraxis_showscale=False)

fig.show()
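
Selecting the top rows relies on the fact that the end of an `iloc` slice is exclusive, so `iloc[0:n]` returns exactly `n` rows. The sketch below uses invented numbers to compare that slice with `DataFrame.nlargest`, which sorts and selects in one step.

```python
import pandas as pd

# Invented counts for five hypothetical countries
toy = pd.DataFrame({
    "country": list("ABCDE"),
    "people_vaccinated": [50, 10, 40, 30, 20],
})

# Sort, then slice: the slice end is exclusive, so 0:3 yields three rows
top3_slice = toy.sort_values("people_vaccinated", ascending=False).iloc[0:3]

# Equivalent one-step selection
top3_nlargest = toy.nlargest(3, "people_vaccinated")

print(top3_nlargest["country"].tolist())
```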

### More population, more people to vaccinate

The United States is far ahead of most countries, with almost 70M people vaccinated. It is followed by India, the UK, Brazil and Turkey. Given that most of these countries are among the most populous in the world, it is not surprising that they have vaccinated more people than the rest.

In [None]:
fig = px.choropleth(df_f,
                    locations="country", 
                    locationmode='country names',
                    color="Log Scale", 
                    hover_name="country", 
                    hover_data=['people_vaccinated'],
                    color_continuous_scale="mint",
                    labels={'country':'Country','people_vaccinated':'People Vaccinated'})

fig.update_layout(title="Total Vaccinated People by Country", title_x=0.5)

fig.update(layout_coloraxis_showscale=False)

fig.show()

### Numbers are definitely not wrong, but might be misleading

It is clear that countries with large populations also have large numbers of people vaccinated. To better understand each country's success in the vaccination process, we should look at the percentage of the population vaccinated.
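
A small worked example with invented figures shows how the ranking can flip once population is taken into account. The column names below are placeholders chosen for the sketch, and the share is expressed as a percentage.

```python
import pandas as pd

# Invented figures: a large country with many doses, a small one with few
toy = pd.DataFrame({
    "country": ["Large", "Small"],
    "people_vaccinated": [1_000_000, 30_000],
    "population": [100_000_000, 40_000],
})

# Share of the population vaccinated, as a percentage
toy["pct_vaccinated"] = toy["people_vaccinated"] / toy["population"] * 100

# Ranking by percentage puts the small country first
ranked = toy.sort_values("pct_vaccinated", ascending=False)

print(ranked[["country", "pct_vaccinated"]])
```

The large country leads on absolute numbers, but only 1% of its population is covered, against 75% for the small one.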

### Which country has vaccinated the largest percentage of its population?

In [None]:
# Express the share of the population vaccinated as a percentage (0-100)
df_f['pct_vaccinated'] = df_f['people_vaccinated'] / df_f['Population (2020)'] * 100

# Sort data frame by pct_vaccinated
df_f_sorted = df_f.sort_values('pct_vaccinated', ascending = False)

# Select the first 20 rows to match the "Top 20" chart title
df_f_sorted = df_f_sorted.iloc[0:20,:]

# Plot the bar chart
fig = px.bar(df_f_sorted,
              x = 'country',
              y = 'pct_vaccinated',
              color='pct_vaccinated',
              hover_name = 'country',
              hover_data = ['pct_vaccinated'],
              color_continuous_scale = 'Peach',
              labels = {'country':'Country','pct_vaccinated':'% of Population Vaccinated'},
              height=500)

fig.update_layout(uniformtext_minsize = 15,
                   xaxis_tickangle = -45,
                   title = 'Total Vaccination % of Population by Top 20 Countries',
                   title_x = 0.5)

fig.update(layout_coloraxis_showscale=False)
fig.update_layout({'plot_bgcolor': 'rgba(0, 0, 0, 0)', 'paper_bgcolor': 'rgba(0, 0, 0, 0)',})

fig.show()

### Surprise surprise!

Let's be honest: no one expected Gibraltar or Seychelles to lead in the percentage of the population vaccinated, yet here they are...

The final results show the top 5 best-performing countries in the vaccination process:

- Gibraltar
- Seychelles
- Israel
- Falkland Islands
- United Arab Emirates



In [None]:
fig = px.choropleth(df_f,
                    locations="country", 
                    locationmode='country names',
                    color="pct_vaccinated", 
                    hover_name="country", 
                    hover_data=['pct_vaccinated'],
                    color_continuous_scale="Peach",
                    labels={'country':'Country','pct_vaccinated':'% of Population Vaccinated'})

fig.update_layout(title="Total Vaccination % of Population by Country", title_x=0.5)

fig.update(layout_coloraxis_showscale=False)

fig.show()

## 6. Conclusion

### Relativity is everything

Considering that every country is responsible for its own people (apart from minority groups such as immigrants), success in the vaccination process should not be assessed only by the number of people vaccinated. The percentage of the population vaccinated is a more accurate indicator of success, and it shows that the measure of success is quality rather than quantity.

## Before closing...

### If you've made it this far in the notebook, first, congratulations! And second, please let me know your thoughts!

- LinkedIn: https://www.linkedin.com/in/emrecanokten/
- Kaggle: https://www.kaggle.com/emrecano

### And do not forget to upvote the notebook or like the post (whichever is applicable)!

# Thanks a lot!