# Covid vaccines used by country worldwide

As I was curious about the usage of the different vaccines worldwide, I decided to make a Notebook about it. Gabriel Preda already made a dataset with the required data downloaded from Our World in Data, so I am using his dataset.

## Table of contents
[**1. Loading and preparing data**](#1.-Loading-and-preparing-data)

[**2. Which vaccines are used where?**](#2.-Which-vaccines-are-used-where?)

[2.1 Worldwide usage maps by vaccin](#2.1-Worldwide-usage-maps-by-vaccin)

[2.2 Worldwide map plotting the number of vaccines used by country](#2.2-Worldwide-map-plotting-the-number-of-vaccines-used-by-country)

[2.3 Usage of vaccines in Europe by country](#2.3-Usage-of-vaccines-in-Europe-by-country)

[2.4 Usage of vaccines in Europe over time](#2.4-Usage-of-vaccines-in-Europe-over-time)

[**3. Overall vaccination rates**](#3.-Overall-vaccination-rates)

[3.1 Vaccination rates by country](#3.1-Vaccination-rates-by-country)

[3.2 Vaccination rates by continent](#3.2-Vaccination-rates-by-continent)


In [1]:
conda install nbconvert

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


In [2]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import plotly.express as px
import plotly.graph_objs as go

from plotly.offline import iplot, init_notebook_mode
import cufflinks
# cufflinks.go_offline(connected=True)
# init_notebook_mode(connected=True)

import warnings
warnings.filterwarnings("ignore")

# 1. Loading and preparing data

In [3]:
manu = pd.read_csv("./country_vaccinations_by_manufacturer.csv")
manu['date'] = pd.to_datetime(manu.date)

country = pd.read_csv("./country_vaccinations.csv")
country['date'] = pd.to_datetime(country.date)

The file with country vaccinations by manufacturer seems the most promising, as it holds detailed information on the vaccinations given per vaccine, per date and per country.

In [4]:
manu.head()

Unnamed: 0,location,date,vaccine,total_vaccinations
0,Austria,2021-01-08,Johnson&Johnson,0
1,Austria,2021-01-08,Moderna,0
2,Austria,2021-01-08,Oxford/AstraZeneca,0
3,Austria,2021-01-08,Pfizer/BioNTech,30938
4,Austria,2021-01-15,Johnson&Johnson,0


However, it turns out that this information is only available for a limited number of countries.

In [5]:
manu.location.unique()

array(['Austria', 'Belgium', 'Bulgaria', 'Chile', 'Croatia', 'Cyprus',
       'Czechia', 'Denmark', 'Estonia', 'Finland', 'France', 'Germany',
       'Hungary', 'Iceland', 'Ireland', 'Italy', 'Japan', 'Latvia',
       'Liechtenstein', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands',
       'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain',
       'Sweden', 'Switzerland', 'United States', 'Uruguay'], dtype=object)

Although less ideal, it turns out that the country file holds a column that specifies all vaccines used in a country.

In [6]:
#temporarily set max_colwidth to None
pd.set_option('display.max_colwidth', None)
country[['country', 'date', 'vaccines']].head()

Unnamed: 0,country,date,vaccines
0,Afghanistan,2021-02-22,"BBIBP-CorV, Oxford/AstraZeneca, Pfizer/BioNTech"
1,Afghanistan,2021-02-23,"BBIBP-CorV, Oxford/AstraZeneca, Pfizer/BioNTech"
2,Afghanistan,2021-02-24,"BBIBP-CorV, Oxford/AstraZeneca, Pfizer/BioNTech"
3,Afghanistan,2021-02-25,"BBIBP-CorV, Oxford/AstraZeneca, Pfizer/BioNTech"
4,Afghanistan,2021-02-26,"BBIBP-CorV, Oxford/AstraZeneca, Pfizer/BioNTech"


Now, I want to keep the best-known vaccines (Pfizer, Moderna, AstraZeneca and Johnson&Johnson) as they are and group the lesser-known brands by the country where they are made.

In [7]:
#strip site suffixes and partner names so that each brand has one short name
country['vaccines'] = country['vaccines'].str.replace("/Beijing|/Wuhan|/HayatVax|/BioNTech|Oxford/", "", regex=True)

#replace Chinese vaccines with one label
country['vaccines'] = country['vaccines'].str.replace("Sinopharm|BBIBP-CorV|Sinovac|CanSino|RBD-Dimer", "Chinese", regex=True)

#replace Russian vaccines with one label
country['vaccines'] = country['vaccines'].str.replace("EpiVacCorona|Sputnik V", "Russian", regex=True)

#replace Cuban vaccines with one label
country['vaccines'] = country['vaccines'].str.replace("Soberana02|Abdala", "Cuban", regex=True)

#replace some others
to_replace = {'Covaxin': 'Indian',
              'QazVac': 'Kazachstan'}
country['vaccines'] = country['vaccines'].replace(to_replace, regex=True)
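
To see what these replacements do, here is a minimal sketch on three made-up strings in the same format as the `vaccines` column. The `toy` Series is an assumption for illustration only; the patterns are the ones used above.

```python
import pandas as pd

# Made-up examples in the same format as the 'vaccines' column
toy = pd.Series([
    "BBIBP-CorV, Oxford/AstraZeneca, Pfizer/BioNTech",
    "Sinopharm/Beijing, Sputnik V",
    "Covaxin, Moderna",
])

# Step 1: strip suffixes and prefixes so each brand has one short name
toy = toy.str.replace("/Beijing|/Wuhan|/HayatVax|/BioNTech|Oxford/", "", regex=True)

# Step 2: map lesser-known brands to a country label
toy = toy.str.replace("Sinopharm|BBIBP-CorV|Sinovac|CanSino|RBD-Dimer", "Chinese", regex=True)
toy = toy.str.replace("EpiVacCorona|Sputnik V", "Russian", regex=True)
toy = toy.replace({"Covaxin": "Indian"}, regex=True)

print(toy.tolist())
```

A country that uses two brands from the same group ends up with the label twice in its string (for instance "Chinese, Chinese"). That is harmless here, because the list of labels is built from a set.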

In [8]:
#making a list of all vaccines
vac_list = [x.split(", ") for x in country.vaccines.values]
vaccins = [item for elem in vac_list for item in elem]
vaccins = set(vaccins)
vaccins = list(vaccins)
vaccins

['Chinese',
 'AstraZeneca',
 'Moderna',
 'Russian',
 'Cuban',
 'Kazachstan',
 'Johnson&Johnson',
 'Indian',
 'Pfizer']

Now, I want to add a True/False column for each vaccine, and keep only the row with the most recent information for each country.

In [9]:
#adding a column with True/False for each vaccine
for vaccin in vaccins:
    country[vaccin] = np.where(country['vaccines'].str.contains(vaccin), True, False)

country = country.sort_values(by = ['country', 'date'], ascending = [True, False])
#explicit copy, so that adding columns later does not trigger a SettingWithCopyWarning
country_latest = country.drop_duplicates(subset = "country", keep = "first").copy()

#head of selected columns only
country_latest.iloc[:, np.r_[0,12, 15:len(country_latest.columns)]].head()

Unnamed: 0,country,vaccines,Chinese,AstraZeneca,Moderna,Russian,Cuban,Kazachstan,Johnson&Johnson,Indian,Pfizer
133,Afghanistan,"Chinese, AstraZeneca, Pfizer",True,True,False,False,False,False,False,False,True
310,Albania,"AstraZeneca, Pfizer, Chinese, Russian",True,True,False,True,False,False,False,False,True
439,Algeria,"AstraZeneca, Russian",False,True,False,True,False,False,False,False,False
594,Andorra,"AstraZeneca, Pfizer",False,True,False,False,False,False,False,False,True
721,Angola,AstraZeneca,False,True,False,False,False,False,False,False,False
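
As an alternative to looping over `str.contains`, the flags can also be built in one go with `str.get_dummies`. The sketch below runs on made-up data (the countries "A" and "B" are hypothetical) and then keeps the latest row per country in the same way as above. It is only an illustration, not the code that produced the table.

```python
import pandas as pd

# Made-up data: country A has two dates, country B has one
toy = pd.DataFrame({
    "country": ["A", "A", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-01"]),
    "vaccines": ["Pfizer", "Pfizer, Moderna", "Chinese, Russian"],
})

# One 0/1 column per distinct label, converted to True/False
flags = toy["vaccines"].str.get_dummies(sep=", ").astype(bool)
toy = pd.concat([toy, flags], axis=1)

# Latest row per country: sort newest first, keep the first row of each country
latest = (toy.sort_values(["country", "date"], ascending=[True, False])
             .drop_duplicates(subset="country", keep="first"))
print(latest)
```

`get_dummies` matches whole labels between separators, whereas `str.contains` matches substrings. With the nine labels used here that makes no difference, but it would if one label were part of another.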


# 2. Which vaccines are used where?

## 2.1 Worldwide usage maps by vaccin

Below, I am making a function that lets me specify which vaccine to display on a worldwide map and which color to give to the countries using that vaccine.

In [10]:
def plot_vaccin(color, vaccin):
    fig = px.choropleth(country_latest, locations="iso_code",
                        color=vaccin,
                        hover_name="country",
                        color_discrete_map={True: color, False: 'lightgrey'})

    layout = go.Layout(
        title=go.layout.Title(
            text= f"<b>Countries using {vaccin} vaccine</b>",
            x=0.5
        ),
        showlegend=False,
        font=dict(size=14),
        width = 750,
        height = 350,
        margin=dict(l=0,r=0,b=0,t=30)
    )

    fig.update_layout(layout)

    fig.show()

As we can see, Pfizer is used a lot in the Western world.

In [11]:
plot_vaccin('red', 'Pfizer')

Moderna's footprint is smaller: it is used mostly in the US, Canada and Europe.

In [12]:
plot_vaccin('green', "Moderna")

AstraZeneca is used in a lot of countries, including most African and South American countries. Some notable countries missing here are the US, South Africa and some European countries such as Norway and Denmark.

In [13]:
plot_vaccin('brown', "AstraZeneca")

Johnson & Johnson's footprint is smaller. Notably, South Africa uses Johnson & Johnson but not AstraZeneca (the opposite of most African countries).

In [14]:
plot_vaccin('orange', "Johnson&Johnson")

Chinese brands are used in a lot of Asian countries, and almost all South American countries also use at least some Chinese vaccines.

In [15]:
plot_vaccin('blue', "Chinese")

Besides Russia itself, a significant number of countries also use Russian vaccines (I suppose mostly Sputnik V, as I had not yet heard of the other Russian vaccine).

In [16]:
plot_vaccin('yellow', "Russian")

The Indian vaccine (Covaxin) is used in India and a few other countries.

In [17]:
plot_vaccin('magenta', "Indian")

I had not heard about the Cuban vaccines yet, but it turns out that those are only used in Cuba itself.

In [18]:
plot_vaccin('goldenrod', "Cuban")

The same applies to the vaccine made in Kazakhstan: it is only used in the country itself.

In [19]:
plot_vaccin('darkblue', "Kazachstan")

## 2.2 Worldwide map plotting the number of vaccines used by country

As you can see, many countries use only one or two different vaccines.

In [20]:
country_latest['Vaccins_used']= country_latest.iloc[:, -9:].sum(axis=1)
country_latest.Vaccins_used.value_counts()

1    64
2    63
3    47
4    39
6     2
5     2
Name: Vaccins_used, dtype: int64
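
The count above relies on the nine flag columns being the last nine columns of the frame. A sketch of the same count with the columns selected by name, on made-up flags (the three countries and the shortened `flag_cols` list are assumptions for illustration):

```python
import pandas as pd

# Made-up flags for three countries; the column names mirror the notebook
flag_cols = ["Pfizer", "Moderna", "Chinese"]
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "Pfizer": [True, True, False],
    "Moderna": [True, False, False],
    "Chinese": [False, True, True],
})

# Summing booleans counts the True values per row
toy["Vaccins_used"] = toy[flag_cols].sum(axis=1)
counts = toy["Vaccins_used"].value_counts()
print(counts)
```

In the notebook itself, the `vaccins` list from section 1 could play the role of `flag_cols`, so that the result no longer depends on column order.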

Notably, only four countries use more than 4 different vaccines, among them Libya and Hungary.

In [21]:
#function as I will make a similar map later
def plot_ww_numbers(data, color, hover_data, title):
    fig = px.choropleth(data, locations="iso_code",
                        color= color,
                        hover_data= hover_data)

    layout = go.Layout(title=go.layout.Title(
        text= f"<b>{title}</b>",
        x=0.5
        ),
        font=dict(size=14),
        width = 750,
        height = 350,
        margin=dict(l=0,r=0,b=0,t=30)
                      )

    fig.update_layout(layout)

    fig.show()


In [22]:
plot_ww_numbers(data = country_latest,
                color = 'Vaccins_used',
                hover_data= ["country", "vaccines"],
                title = 'Number of different vaccines used by country')

## 2.3 Usage of vaccines in Europe by country

As mentioned before, the file by manufacturer does not hold all countries. However, it does contain detailed info on most European countries. I decided to get rid of Chile, the United States and Uruguay and to dig deeper anyway. Note that Japan is not on this list, so it stays in the data.

In [23]:
non_european = ['Chile', 'United States', 'Uruguay']
manu = manu.query('location not in @non_european')
manu.location.nunique()

30

Just by checking some info for my own country, I see that the numbers are clearly cumulative.

In [24]:
manu[manu.location == "Netherlands"].tail(8)

Unnamed: 0,location,date,vaccine,total_vaccinations
6301,Netherlands,2021-06-25,Johnson&Johnson,527331
6302,Netherlands,2021-06-25,Moderna,1178312
6303,Netherlands,2021-06-25,Oxford/AstraZeneca,2719628
6304,Netherlands,2021-06-25,Pfizer/BioNTech,10862816
6305,Netherlands,2021-07-02,Johnson&Johnson,580987
6306,Netherlands,2021-07-02,Moderna,1229660
6307,Netherlands,2021-07-02,Oxford/AstraZeneca,2795756
6308,Netherlands,2021-07-02,Pfizer/BioNTech,11324449


In [25]:
manu_totals = manu.copy()

#keeping only the latest info
manu_totals = manu_totals.sort_values(by = ['location', 'date', 'vaccine'], ascending = [True, False, True])
manu_totals = manu_totals.drop(columns = "date")
manu_totals = manu_totals.drop_duplicates(subset = ['location', 'vaccine'], keep = "first")

manu_totals = manu_totals.pivot(index = ['location'], columns = 'vaccine', values = 'total_vaccinations')
manu_totals = manu_totals.fillna(0)
manu_totals.iloc[:, -6:] = manu_totals.iloc[:, -6:].astype(int)

#convert to percent
cols = list(manu_totals.columns)
manu_totals[cols] = manu_totals[cols].div(manu_totals[cols].sum(axis=1), axis=0).multiply(100)
manu_totals = manu_totals.round(1)
                                
#adding ISO codes from my own dataset https://www.kaggle.com/erikbruin/countries-of-the-world-iso-codes-and-population
countries_iso = pd.read_csv("./country_codes_2020.csv")
countries_iso = countries_iso.rename(columns = {'name': 'location', 'cca3': 'iso_code'})

manu_totals = manu_totals.reset_index()
manu_totals = manu_totals.merge(countries_iso[['location', 'iso_code']], on = 'location', how = "left")
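
With `how = "left"`, a location whose name is spelled differently in the ISO file silently gets a missing `iso_code` and then disappears from the map. A minimal sketch of how such rows could be spotted with the `indicator` option of `merge`. The two small frames and the spelling mismatch are made up for illustration; I have not checked which names, if any, fail to match in the real files.

```python
import pandas as pd

# Made-up frames: one name is spelled differently in the lookup table
totals = pd.DataFrame({"location": ["Austria", "Czechia"]})
iso = pd.DataFrame({
    "location": ["Austria", "Czech Republic"],
    "iso_code": ["AUT", "CZE"],
})

# indicator=True adds a '_merge' column that tells where each row came from
merged = totals.merge(iso, on="location", how="left", indicator=True)

# Rows found only on the left side have no ISO code
unmatched = merged.loc[merged["_merge"] == "left_only", "location"].tolist()
print(unmatched)
```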

This time, I am interested in what percentage of the vaccines used was mRNA (Pfizer and Moderna).

In [26]:
manu_totals['mRNA'] = manu_totals['Moderna'] + manu_totals['Pfizer/BioNTech']
manu_totals.head()

Unnamed: 0,location,Johnson&Johnson,Moderna,Oxford/AstraZeneca,Pfizer/BioNTech,Sinopharm/Beijing,Sputnik V,iso_code,mRNA
0,Austria,1.6,9.8,17.0,71.6,0.0,0.0,AUT,81.4
1,Belgium,2.4,9.2,18.4,70.1,0.0,0.0,BEL,79.3
2,Bulgaria,2.2,13.1,26.4,58.3,0.0,0.0,BGR,71.4
3,Croatia,0.9,10.7,18.8,69.6,0.0,0.0,HRV,80.3
4,Cyprus,1.8,8.8,25.1,64.3,0.0,0.0,CYP,73.1
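
The percentage step divides each row by its own row total. A small sketch with made-up totals for two hypothetical locations "X" and "Y" shows the mechanics, including the mRNA share:

```python
import pandas as pd

# Made-up cumulative totals per vaccine for two locations
totals = pd.DataFrame(
    {
        "Moderna": [100, 50],
        "Oxford/AstraZeneca": [300, 0],
        "Pfizer/BioNTech": [600, 150],
    },
    index=pd.Index(["X", "Y"], name="location"),
)

# Divide every row by its row sum (axis=0 aligns the sums with the index)
share = totals.div(totals.sum(axis=1), axis=0).mul(100).round(1)

# mRNA share is the sum of the two mRNA vaccines
share["mRNA"] = share["Moderna"] + share["Pfizer/BioNTech"]
print(share)
```

Because the mRNA column adds two already rounded percentages, it can be off by 0.1 compared with rounding after adding.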


In [27]:
fig = px.choropleth(manu_totals, locations="iso_code",
                    color='mRNA',
                    hover_data= ["location"])

layout = go.Layout(
    title=go.layout.Title(
        text= f"<b>Percent mRNA vaccines used by country</b>",
        x=0.5
    ),
    font=dict(size=14),
    autosize=False,
    width = 800,
    height = 600,
    margin=dict(l=0,r=0,b=0,t=30)
)
fig.update_geos(scope="europe")
fig.update_layout(layout)

fig.show()


We see that Hungary in particular uses a lot less mRNA vaccine. This is because Hungary is the only European country that also uses Sputnik V and Sinopharm.

In [28]:
manu_totals[manu_totals.location == "Hungary"]

Unnamed: 0,location,Johnson&Johnson,Moderna,Oxford/AstraZeneca,Pfizer/BioNTech,Sinopharm/Beijing,Sputnik V,iso_code,mRNA
11,Hungary,0.8,6.4,11.1,44.0,20.0,17.5,HUN,50.4


## 2.4 Usage of vaccines in Europe over time


In [29]:
manu = manu.sort_values(by=['location', 'vaccine', 'date'], ascending = [True, True, False])
manu = manu.reset_index(drop=True)

manu['new'] = 0
for i in range(len(manu)-1):
    if ((manu.loc[i+1, 'vaccine'] == manu.loc[i, 'vaccine']) & (manu.loc[i+1, 'location'] == manu.loc[i, 'location'])):
        manu.loc[i, 'new'] = (manu.loc[i, 'total_vaccinations'] - manu.loc[i+1, 'total_vaccinations'])
    else:
        manu.loc[i, 'new'] = manu.loc[i, 'total_vaccinations']
        
manu.head()

Unnamed: 0,location,date,vaccine,total_vaccinations,new
0,Austria,2021-07-06,Johnson&Johnson,130125,4362
1,Austria,2021-07-02,Johnson&Johnson,125763,24385
2,Austria,2021-06-25,Johnson&Johnson,101378,8381
3,Austria,2021-06-18,Johnson&Johnson,92997,14529
4,Austria,2021-06-11,Johnson&Johnson,78468,21394
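
The loop above turns cumulative totals into new doses per reporting date. The same idea can be sketched without a loop, using `groupby` and `diff`. The data below is made up, and the sort order is ascending by date here, which differs from the cell above.

```python
import pandas as pd

# Made-up cumulative totals for one vaccine in two locations
toy = pd.DataFrame({
    "location": ["A"] * 3 + ["B"] * 2,
    "vaccine": ["Moderna"] * 5,
    "date": pd.to_datetime(["2021-01-01", "2021-01-08", "2021-01-15",
                            "2021-01-01", "2021-01-08"]),
    "total_vaccinations": [10, 25, 45, 5, 12],
})

toy = toy.sort_values(["location", "vaccine", "date"])

# Difference with the previous report of the same location and vaccine
grouped = toy.groupby(["location", "vaccine"])["total_vaccinations"]
toy["new"] = grouped.diff()

# The first report of each group has no predecessor: its total is all new
toy["new"] = toy["new"].fillna(toy["total_vaccinations"]).astype(int)
print(toy)
```

One difference with the loop: `range(len(manu)-1)` never visits the very last row, so that row keeps `new = 0`, while the grouped version fills the first report of every group.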


In [30]:
new_vaccines = manu[['date', 'vaccine', 'new']]

df = new_vaccines[['date', 'vaccine']]
df.drop_duplicates(inplace = True)
df = df.sort_values(by = ['vaccine', 'date'], ascending = [True, False]).reset_index(drop=True)
df['before'] = 0

for i,row in df.iterrows():
    total_before = new_vaccines[((new_vaccines.vaccine == row.vaccine) & (new_vaccines.date <= row.date))]['new'].sum()
    df.loc[i, 'before'] = total_before
    
df = df.pivot(index= 'date', columns = 'vaccine', values = 'before')
df = df[['Johnson&Johnson', 'Moderna', 'Oxford/AstraZeneca', 'Pfizer/BioNTech']] #getting rid of Sputnik and Sinopharm
df.head()

vaccine,Johnson&Johnson,Moderna,Oxford/AstraZeneca,Pfizer/BioNTech
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2020-12-04,,1.0,,
2020-12-07,,,,1.0
2020-12-16,,,,2.0
2020-12-20,,4.0,,
2020-12-22,,,,4.0
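
The running totals per vaccine can also be sketched with `groupby` and `cumsum` instead of a row-by-row sum. This is a sketch on made-up numbers, not the code behind the table above.

```python
import pandas as pd

# Made-up new doses; two rows share the same date and vaccine (two countries)
new_doses = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-01",
                            "2021-01-08", "2021-01-08"]),
    "vaccine": ["Moderna", "Moderna", "Pfizer/BioNTech",
                "Moderna", "Pfizer/BioNTech"],
    "new": [10, 5, 100, 15, 200],
})

# Sum over countries per date and vaccine, one column per vaccine
per_day = (new_doses.groupby(["date", "vaccine"])["new"].sum()
                    .unstack("vaccine")
                    .fillna(0))

# Running total over time
cumulative = per_day.sort_index().cumsum()
print(cumulative)
```

Because of `fillna(0)`, a date without a report for some vaccine carries the previous total forward here, whereas the pivot above leaves a gap (NaN) on such dates.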


As you can see below, Europe is increasingly relying on Pfizer. Pfizer has been the most used vaccine since the start of the vaccinations, and its use is still trending upward. The amounts of Moderna and AstraZeneca used are also significant. Johnson & Johnson is not used much yet.

In [31]:
df.iplot(mode = 'lines', xTitle = 'Date', yTitle = 'Total vaccinations', title = 'Cumulative vaccinations in Europe by vaccine')

# 3. Overall vaccination rates

## 3.1 Vaccination rates by country

In [32]:
#calculating population
country_latest['population'] = round((country_latest['total_vaccinations']/country_latest['total_vaccinations_per_hundred']*100), 0)

cols_to_keep = ['country', 'iso_code', 'total_vaccinations', 'total_vaccinations_per_hundred', 'population']
country_latest[cols_to_keep].sort_values(by = "total_vaccinations_per_hundred", ascending = [False]).head(10)

Unnamed: 0,country,iso_code,total_vaccinations,total_vaccinations_per_hundred,population
10182,Gibraltar,GIB,78125.0,231.89,33691.0
27623,United Arab Emirates,ARE,15574641.0,157.47,9890545.0
16545,Malta,MLT,685024.0,155.14,441552.0
4926,Cayman Islands,CYM,93748.0,142.65,65719.0
23754,Seychelles,SYC,138581.0,140.92,98340.0
22780,San Marino,SMR,44657.0,131.58,33939.0
12840,Isle of Man,IMN,110561.0,130.02,85034.0
22220,Saint Helena,SHN,7892.0,130.0,6071.0
3192,Bermuda,BMU,80722.0,129.63,62271.0
11887,Iceland,ISL,435373.0,127.58,341255.0
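
The population column is derived by inverting the per-hundred figure: population = total_vaccinations / total_vaccinations_per_hundred * 100. A quick check of that arithmetic with two rows copied from the table above:

```python
# (total_vaccinations, total_vaccinations_per_hundred) copied from the table above
rows = {
    "Gibraltar": (78125.0, 231.89),
    "Malta": (685024.0, 155.14),
}

# Invert the per-hundred rate to get back the population it was based on
population = {
    name: round(total / per_hundred * 100)
    for name, (total, per_hundred) in rows.items()
}
print(population)
```

Rates above 100 are possible because `total_vaccinations` counts doses given, not people vaccinated.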


In [33]:
#rename to make name short for legend
country_latest = country_latest.rename(columns = {'total_vaccinations_per_hundred': 'per 100'})

#taking out very small countries at least gets rid of outlier Gibraltar (want to keep Emirates with nearly 10 million population)
plot_ww_numbers(data = country_latest[country_latest.population > 50000],
                color = 'per 100',
                hover_data= ["country"],
                title = 'Total vaccines given per country per 100 inhabitants')

## 3.2 Vaccination rates by continent

Adding the continents leads to some duplicates, as for instance parts of Russia and Turkey are in Europe and other parts are in Asia. To keep it simple, I am assigning those countries to Asia.

We see that vaccination rates in North America and Europe are high, and that the rate in Africa is low. No surprises here. What is somewhat surprising to me is the low rate in Oceania, as I was expecting the rich countries Australia and New Zealand to have high rates too.

In [34]:
#adding continent codes from this dataset https://www.kaggle.com/erikbruin/countries-of-the-world-iso-codes-and-population
cols_to_load = ['Three_Letter_Country_Code', 'Continent_Name']
continents = pd.read_csv("./country-and-continent-codes-list-csv.csv")[cols_to_load]
continents = continents.rename(columns = {'Three_Letter_Country_Code': 'iso_code'})

country_latest = country_latest.merge(continents, on = "iso_code", how = "inner")
continent_latest = country_latest.copy()
continent_latest = continent_latest[['Continent_Name', 'country', 'iso_code', 'population', 'total_vaccinations']]

#assign Russia, Turkey to Asia
continent_latest = continent_latest.sort_values(by = ['iso_code', 'Continent_Name'], ascending = [True, True])
continent_latest = continent_latest.drop_duplicates(subset = "iso_code", keep = "first")

#info from 3 countries is missing. Remove them to keep it simple
continent_latest = continent_latest[continent_latest.total_vaccinations.notna()]

continent_latest[['population', 'total_vaccinations']] = continent_latest[['population', 'total_vaccinations']].astype(int)

continent_latest['continent_population'] = continent_latest.groupby('Continent_Name')['population'].transform('sum')
continent_latest['continent_vaccinations'] = continent_latest.groupby('Continent_Name')['total_vaccinations'].transform('sum')
continent_latest['per 100'] = round(((continent_latest['continent_vaccinations']/continent_latest['continent_population'])*100), 1)
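
The continent rate above is total doses divided by total population, which is not the same as averaging the country rates. A small sketch on made-up numbers (three hypothetical countries) shows the difference:

```python
import pandas as pd

# Made-up countries: a small one with a high rate and a large one with a lower rate
toy = pd.DataFrame({
    "Continent_Name": ["Europe", "Europe", "Africa"],
    "population": [1000, 3000, 2000],
    "total_vaccinations": [900, 1500, 100],
})

# Population-weighted rate: sum doses and population first, then divide
by_continent = toy.groupby("Continent_Name")[["population", "total_vaccinations"]].sum()
by_continent["per 100"] = (
    by_continent["total_vaccinations"] / by_continent["population"] * 100
).round(1)

# Naive alternative: the plain mean of the country rates
toy["rate"] = toy["total_vaccinations"] / toy["population"] * 100
naive = toy.groupby("Continent_Name")["rate"].mean()

print(by_continent)
print(naive)
```

In this toy example the weighted European rate is 60 per 100, while the plain mean of the two country rates is 70, because the small country counts as much as the large one in the mean.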

In [35]:
plot_ww_numbers(data = continent_latest,
                color = 'per 100',
                hover_data= ["country", "Continent_Name"],
                title = 'Total vaccines given per continent per 100 inhabitants')

**To be continued: Please stay tuned!**