<h1><center>COVID-19 World Vaccination Progress - data visualization and analysis</center></h1>

<center><img width="800" height="600" src="https://www.aeccglobal.co.th/wp-content/uploads/2020/12/shutterstock_1716494026.jpg"></center>

<a id="top"></a>

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list" style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:130%;
           font-family:Verdana;
           letter-spacing:0.5px" role="tab" aria-controls="home"><center>Navigation</center></h3>

[1. Data overview](#1)<br><br>
[2. Data preparation](#2)<br><br>
[3. Leaders in the number of vaccinations](#3)<br><br>
[4. Day by day progress by country](#4)<br><br>
[5. What vaccines are used and in which countries?](#5)<br><br>
[6. Vaccination progress on choropleth](#6)<br><br>

<a id="1"></a>
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:150%;
           font-family:Verdana;
           letter-spacing:0.5px" role="tab" aria-controls="home"><center>Data overview</center></h2>

This dataset includes information about:  

* **Country** - this is the country for which the vaccination information is provided;     
* **Country ISO Code** - ISO code for the country;   
* **Date** - date for the data entry; for some of the dates we have only the daily vaccinations, for others, only the (cumulative) total;   
* **Total number of vaccinations** - this is the absolute number of total immunizations in the country;  
* **Total number of people vaccinated** - a person, depending on the immunization scheme, will receive one or more (typically 2) vaccine doses; at a certain moment, the number of vaccinations might be larger than the number of people;  
* **Total number of people fully vaccinated** - this is the number of people that received the entire set of immunizations according to the immunization scheme (typically 2); at a certain moment in time, there might be a certain number of people that received one dose and another (smaller) number of people that received all doses in the scheme;  
* **Daily vaccinations (raw)** - for a certain data entry, the number of vaccinations for that date/country;  
* **Daily vaccinations** - for a certain data entry, the number of vaccinations for that date/country;  
* **Total vaccinations per hundred** - ratio (in percent) between vaccination number and total population up to the date in the country;  
* **Total number of people vaccinated per hundred** - ratio (in percent) between population immunized and total population up to the date in the country;  
* **Total number of people fully vaccinated per hundred** - ratio (in percent) between population fully immunized and total population up to the date in the country;   
* **Number of vaccinations per day** - number of daily vaccinations for that day and country;   
* **Daily vaccinations per million** - ratio (in ppm) between vaccination number and total population for the current date in the country;    
* **Vaccines used in the country** - names of the vaccines used in the country (up to date);    
* **Source name** - source of the information (national authority, international organization, local organization etc.);   
* **Source website** - website of the source of information;  
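
The "per hundred" and "per million" columns above are simple ratios to the population. A minimal sketch of that arithmetic is shown below; the helper names `per_hundred` and `per_million` and the population figures are hypothetical and only illustrate the definitions, they are not part of the dataset.

```python
def per_hundred(count, population):
    """Return `count` expressed per 100 inhabitants."""
    return count * 100 / population


def per_million(count, population):
    """Return `count` expressed per 1,000,000 inhabitants."""
    return count * 1_000_000 / population


# Hypothetical country: 2,000,000 inhabitants,
# 150,000 doses given so far and 10,000 doses given today.
population = 2_000_000
print(per_hundred(150_000, population))   # 7.5
print(per_million(10_000, population))    # 5000.0
```

Because the totals count doses, a "per hundred" value can exceed the share of people who have actually been vaccinated.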

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.express as px
import plotly.graph_objects as go
from datetime import timedelta


import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

pd.options.mode.chained_assignment = None

Let's take a look at the data

In [None]:
data = pd.read_csv("/kaggle/input/covid-world-vaccination-progress/country_vaccinations.csv")
data.head()

Some columns are not needed for analyzing vaccination progress, so let's drop them.

In [None]:
data.drop(['source_name', 'source_website'], axis = 1, inplace = True)
data.head()

As we can see, the dataset contains many NaN values. For correct visualization, some columns need to be interpolated.

In [None]:
data.isnull().sum(axis=0)
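
Besides the absolute counts, the share of missing values per column is often easier to compare. A small sketch on a made-up frame (the `toy` data is an assumption, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are invented)
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B"],
    "total_vaccinations": [100.0, np.nan, 300.0, np.nan],
    "daily_vaccinations": [np.nan, 100.0, 100.0, 50.0],
})

missing_count = toy.isnull().sum(axis=0)    # number of NaN values per column
missing_share = toy.isnull().mean(axis=0)   # fraction of NaN values per column

print(missing_count)
print(missing_share)
```

Here `total_vaccinations` is missing in 2 of 4 rows (0.5) and `daily_vaccinations` in 1 of 4 rows (0.25).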

<a id="2"></a>
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:150%;
           font-family:Verdana;
           letter-spacing:0.5px" role="tab" aria-controls="home"><center>Data preparation</center></h2>

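To check the progress of vaccination in each country, we need more complete data on daily vaccinations, the number of fully vaccinated people and total vaccinations for each date. For now, we interpolate the missing values in the `total_vaccinations` column and carry each country's last row forward to the latest date in the dataset.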

In [None]:
def fix_total(data_to_fix):
    # Work on a copy so the caller's DataFrame is not modified
    data_to_fix = data_to_fix.copy()
    data_to_fix['date'] = pd.to_datetime(data_to_fix['date'])
    max_date = data_to_fix['date'].max()

    extra_rows = []
    for country in data_to_fix['country'].unique():
        mask = data_to_fix['country'] == country
        # Linearly interpolate gaps in the cumulative total and round down to whole doses
        interpolation = data_to_fix.loc[mask, 'total_vaccinations'].interpolate()
        data_to_fix.loc[mask, 'total_vaccinations'] = interpolation.apply(np.floor)

        # Carry the country's last known row forward to the latest date in the dataset
        last_date = data_to_fix.loc[mask, 'date'].max()
        cur_row = data_to_fix.loc[mask & (data_to_fix['date'] == last_date)]
        for day in range(1, (max_date - last_date).days + 1):
            next_row = cur_row.copy()
            next_row['date'] = cur_row['date'] + timedelta(days=day)
            extra_rows.append(next_row)

    # DataFrame.append was removed in pandas 2.0, so collect the rows and concatenate once
    return pd.concat([data_to_fix] + extra_rows)

In [None]:
vaccinations = fix_total(data[["country", "date", "total_vaccinations", "daily_vaccinations", "vaccines"]])
vaccinations["date"] = pd.to_datetime(vaccinations["date"])
vaccinations = vaccinations.set_index(["country"])
vaccinations.head()
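
To make the interpolation step concrete, here is a minimal sketch on a made-up series of cumulative totals (the numbers are assumptions, not real data). Linear interpolation fills the gaps between known values, and `np.floor` rounds the result down to whole doses.

```python
import numpy as np
import pandas as pd

# Cumulative totals with two missing days in the middle (invented values)
toy_total = pd.Series([10.0, np.nan, np.nan, 15.0])

# Default linear interpolation: 10, 11.67, 13.33, 15
interpolated = toy_total.interpolate()

# Round down, since a fraction of a dose makes no sense
filled = interpolated.apply(np.floor)

print(filled.tolist())  # [10.0, 11.0, 13.0, 15.0]
```

Interpolated values are estimates, not reported figures, so they are best used for plotting trends rather than for exact comparisons.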

<a id="3"></a>
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:150%;
           font-family:Verdana;
           letter-spacing:0.5px" role="tab" aria-controls="home"><center>Leaders in the number of vaccinations</center></h2>

Now, let's see which countries have administered the most vaccinations. Here are the top 10 countries by total number of vaccinations.

In [None]:
total_vaccinations_top10 = vaccinations.groupby('country')['total_vaccinations'].max()
total_vaccinations_top10 = total_vaccinations_top10.sort_values(ascending=False)[:10]

In [None]:
fig = px.bar(x=total_vaccinations_top10.index, y=total_vaccinations_top10.values,
             color=total_vaccinations_top10.index,
             labels={"x": "country", "y": "total vaccinations"},
             color_discrete_sequence =px.colors.sequential.Viridis)
fig.show()

As we can see, the USA has administered the most vaccinations (27.8 million). China has slightly fewer vaccinations than the United States, but many times more than other countries. In Europe, the leaders in number of vaccinations are Germany, Italy and the United Kingdom (which includes England).

<a id="4"></a>
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:150%;
           font-family:Verdana;
           letter-spacing:0.5px" role="tab" aria-controls="home"><center>Day by day progress by country</center></h2>

Let's see how the daily number of vaccinations changes in each country. Use the drop-down menu to choose the country whose day-by-day progress you want to see.

In [None]:
def create_dropdown_scatter(feature, title):
    fig = go.Figure()

    country_plot_names = []
    buttons = []

    default_country = "Argentina"
    days_of_vacc = vaccinations.groupby('country').count()['date']

    # Add one trace per country with more than two data points; only the default is visible at first
    for country in days_of_vacc[days_of_vacc > 2].index:
        country_data = vaccinations.loc[country]
        fig.add_trace(go.Scatter(x=country_data["date"],
                                 y=country_data[feature],
                                 visible=(country == default_country),
                                 name=country,
                                 mode="lines+markers",
                                 hovertemplate="Date: %{x}<br>Value: %{y}"))
        country_plot_names.append(country)

    # One button per plotted country, so every menu entry has a matching trace
    for country in country_plot_names:
        buttons.append(dict(method='update',
                            label=country,
                            args=[{'visible': [country == r for r in country_plot_names]}]))

    # Show the default country as selected in the menu (fall back to the first entry if it is absent)
    active = country_plot_names.index(default_country) if default_country in country_plot_names else 0

    # Add the dropdown menu to the figure
    fig.update_layout(showlegend=False,
                      updatemenus=[{"buttons": buttons, "direction": "down", "active": active, "showactive": True, "x": 0.5, "y": 1.15}],
                      title={"text": title})
    fig.show()


In [None]:
create_dropdown_scatter("daily_vaccinations", "Daily vaccinations by country")
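
The drop-down works by toggling trace visibility: each button carries a list of booleans, one per trace, and only the selected country's entry is `True`. A minimal sketch of that idea without Plotly (the helper `visibility_mask` and the three country names are illustrative assumptions):

```python
def visibility_mask(selected, trace_names):
    """Return one boolean per trace; True only for the selected trace."""
    return [name == selected for name in trace_names]


# Traces are assumed to have been added to the figure in this order
trace_names = ["Argentina", "Israel", "Germany"]

# One button per trace, in the same dict shape that the `updatemenus` layout option expects
buttons = [
    {"method": "update",
     "label": name,
     "args": [{"visible": visibility_mask(name, trace_names)}]}
    for name in trace_names
]

print(buttons[1]["args"][0]["visible"])  # [False, True, False]
```

The mask must list the traces in the same order in which they were added to the figure, otherwise a button would reveal the wrong country.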

This plot shows how the total number of vaccinations changes in each country. Don't forget to use the drop-down menu.

In [None]:
create_dropdown_scatter("total_vaccinations", "Total vaccinations by country")

<a id="5"></a>
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:150%;
           font-family:Verdana;
           letter-spacing:0.5px" role="tab" aria-controls="home"><center>What vaccines are used and in which countries?</center></h2>

Let's explore which vaccination schemes are the most commonly used. Here are the top 5.

In [None]:
vaccines = vaccinations.groupby(['vaccines', 'date']).sum().reset_index()
vaccines_top5 = vaccines.groupby("vaccines").max()['total_vaccinations'].reset_index()
vaccines_top5 = vaccines_top5.nlargest(5, columns=['total_vaccinations'])
vaccines_top5

So, the most commonly used vaccination scheme is "Moderna, Pfizer/BioNTech". As we can see, the Pfizer/BioNTech vaccine appears in 3 of the 5 most used vaccination schemes.
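
Since a scheme is stored as a comma-separated string of vaccine names, counting the schemes that include a given vaccine comes down to splitting the string. A small sketch follows; the helper `schemes_containing` is hypothetical and the list of schemes is illustrative, not the actual output of the cell above.

```python
def schemes_containing(vaccine, schemes):
    """Return the schemes whose comma-separated vaccine list includes `vaccine`."""
    return [s for s in schemes
            if vaccine in [name.strip() for name in s.split(",")]]


# Illustrative scheme strings in the same format as the `vaccines` column
example_schemes = [
    "Moderna, Pfizer/BioNTech",
    "CNBG, Sinovac",
    "Pfizer/BioNTech",
    "Oxford/AstraZeneca, Pfizer/BioNTech",
    "Sputnik V",
]

matches = schemes_containing("Pfizer/BioNTech", example_schemes)
print(len(matches), "of", len(example_schemes))  # 3 of 5
```

Splitting on the comma, rather than using a plain substring test, avoids false matches when one vaccine name is contained in another.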

In [None]:
fig = px.bar(x = vaccines_top5['vaccines'], y=vaccines_top5['total_vaccinations'], 
             labels={"x": "vaccines", "y": "total vaccinations"},
             color=vaccines_top5['vaccines'],
             color_discrete_sequence =px.colors.sequential.Viridis[1:][::2])
fig.show()

Now, let's explore which vaccine is used in which countries.

In [None]:
total_vaccinations = vaccinations.groupby(['country']).max()[["total_vaccinations", "vaccines"]].reset_index()
total_vaccinations

In [None]:
fig = px.choropleth(total_vaccinations, locations = 'country',locationmode = 'country names',color = 'vaccines',
                   title = 'Vaccines used for each country',hover_data= ['total_vaccinations'],
                   color_discrete_map=dict(zip(total_vaccinations['vaccines'], px.colors.sequential.Viridis)),
                   labels={'vaccines': 'Name of vaccine', 'country': 'Country', 'total_vaccinations': 'Number of vaccinations'})
fig.update_geos(
    visible=True, 
    resolution=50,
    showcountries=True, 
    countrycolor="darkgrey"
    )
fig.update_layout(
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    ),
)
fig.show()

We can see the large violet area of the Sputnik V vaccine, but it is used only in Russia and Argentina. Most European, Middle Eastern and North American countries use Pfizer/BioNTech. "CNBG, Sinovac" is the second most used vaccination scheme, but it is used only in China. The Oxford/AstraZeneca vaccine is distributed in the United Kingdom, Brazil, Myanmar and India.

Let's check the day-by-day progress of each vaccination scheme.

In [None]:
fig=go.Figure()


title = "Total vaccinations by vaccines"

for vaccine in vaccines['vaccines'].unique():
    data_vaccine = vaccines[vaccines['vaccines'] == vaccine]
    # One line per vaccination scheme, showing its cumulative total over time
    fig.add_trace(go.Scatter(x=data_vaccine['date'], 
                             y=data_vaccine['total_vaccinations'], 
                             name=vaccine,
                             mode="lines+markers",
                             hovertemplate="Date: %{x}<br>Value: %{y}"))

# Set the figure title
fig.update_layout(title={"text": title})
fig.show()

<a id="6"></a>
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:150%;
           font-family:Verdana;
           letter-spacing:0.5px" role="tab" aria-controls="home"><center>Vaccination progress on choropleth</center></h2>

As we saw before, the USA and China have administered the most vaccinations, but how many vaccinations have been given per hundred people?

In [None]:
total_vac_hundred = data[['country', 'iso_code', 'total_vaccinations_per_hundred']]
total_vac_hundred['total_vaccinations_per_hundred'] = total_vac_hundred['total_vaccinations_per_hundred'].fillna(0)
total_vac_hundred = total_vac_hundred.groupby(['country', 'iso_code']).max().reset_index()

In [None]:
def create_choropleth(loc, z, text, title):
    fig = go.Figure(data=go.Choropleth(
        locations = loc,
        z = z,
        text = text,
        colorscale = px.colors.sequential.speed[::-1][::2][1:5],
        autocolorscale=False,
        reversescale=True,
        marker_line_color='darkgray',
        marker_line_width=0.5,
    ))
    
    fig.update_geos(
        visible=True, 
        resolution=50,
        showcountries=True, 
        countrycolor="darkgrey"
        )

    fig.update_layout(
        title_text=title,
        geo=dict(
            showframe=False,
            showcoastlines=False,
            projection_type='equirectangular'
        )
    )

    fig.show()

In [None]:
create_choropleth(total_vac_hundred['iso_code'], 
                  total_vac_hundred['total_vaccinations_per_hundred'], 
                  total_vac_hundred['country'], 
                  'Total vaccinations per hundred')

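Israel and the UAE have the brightest colors, which means these countries have administered the most vaccinations relative to their population: 53 per hundred people for Israel and 30 for the UAE. In the USA only about 9 vaccinations per hundred people have been given, and in China about 1. Because this metric counts doses rather than people, it can exceed the share of people vaccinated.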
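
Each country's value on the map is the maximum over all dates, taken after missing values are replaced with 0. A toy sketch of that aggregation (the two countries and their numbers are invented) shows one consequence: a country with no data at all ends up with 0, which looks the same as a country that has reported zero vaccinations.

```python
import numpy as np
import pandas as pd

# Country A has two reported values, country B has none (invented data)
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "iso_code": ["AAA", "AAA", "BBB", "BBB"],
    "total_vaccinations_per_hundred": [1.5, 3.0, np.nan, np.nan],
})

toy["total_vaccinations_per_hundred"] = toy["total_vaccinations_per_hundred"].fillna(0)
latest = toy.groupby(["country", "iso_code"]).max().reset_index()

print(latest)
```

In `latest`, country A gets 3.0 and country B gets 0.0, so a zero on the map should be read as "zero or not reported".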

In [None]:
fully_vac_hundred = data[['country', 'iso_code', 'people_fully_vaccinated_per_hundred']]
fully_vac_hundred['people_fully_vaccinated_per_hundred'] = fully_vac_hundred['people_fully_vaccinated_per_hundred'].fillna(0)
fully_vac_hundred = fully_vac_hundred.groupby(['country', 'iso_code']).max().reset_index()

In [None]:
create_choropleth(fully_vac_hundred['iso_code'], 
                  fully_vac_hundred['people_fully_vaccinated_per_hundred'], 
                  fully_vac_hundred['country'], 
                  'People fully vaccinated per hundred')

Fully vaccinated is the number of people who received the entire set of immunizations according to the immunization scheme (typically 2 doses). Again, Israel and the UAE have the largest share of fully vaccinated people. However, we have no data on fully vaccinated people for other countries.

The next choropleth shows how many people per million are vaccinated per day in each country (the map uses each country's highest daily value).

In [None]:
dayly_vac_million = data[['country', 'iso_code', 'daily_vaccinations_per_million']]
dayly_vac_million['daily_vaccinations_per_million'] = dayly_vac_million['daily_vaccinations_per_million'].fillna(0)
dayly_vac_million = dayly_vac_million.groupby(['country', 'iso_code']).max().reset_index()

In [None]:
create_choropleth(dayly_vac_million['iso_code'], 
                  dayly_vac_million['daily_vaccinations_per_million'], 
                  dayly_vac_million['country'], 
                  'Daily vaccinations per million')

Let's explore which countries have the most daily vaccinations.

In [None]:
dayly_vac = data[['country', 'iso_code', 'daily_vaccinations']]
dayly_vac['daily_vaccinations'] = dayly_vac['daily_vaccinations'].fillna(0)
dayly_vac = dayly_vac.groupby(['country', 'iso_code']).max().reset_index()

In [None]:
create_choropleth(dayly_vac['iso_code'], 
                  dayly_vac['daily_vaccinations'], 
                  dayly_vac['country'], 
                  'Daily vaccinations')

More than 1 million people per day are vaccinated in the USA and China. This is the best result among all countries, but because of their large populations these countries still have a small percentage of vaccinated people.

I will keep updating this notebook ⏳

