# <center> COVID-19 World Vaccination Progress - Data Visualization And Analysis

![width=100](http://images.financialexpress.com/2020/11/Covid-19-vaccine-reuters.jpg)

 # <center>Overview

**This notebook explores COVID-19 vaccination progress around the world, across different regions.**

This is a basic exploratory data analysis of the COVID-19 vaccination drive going on around the world. For the past 11 months we have been stuck in our rooms, and the vaccines are now giving us some hope.

* The dataset starts on 2020-12-13, the first date for which it records vaccinations, and captures data up to 2021-02-13.

* To date, the vaccination drive is under way in 78 different countries, with more than 10 types of vaccines in use.
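
The date range and the counts quoted above can be checked directly from the data. The following is a minimal sketch on a small hand-made frame (the rows are illustrative, not taken from the dataset); the column names `country`, `date`, and `vaccines` follow the CSV loaded later in this notebook.

```python
import pandas as pd

# Illustrative rows only; the real data comes from country_vaccinations.csv.
sample_df = pd.DataFrame({
    "country": ["A", "A", "B", "C"],
    "date": ["2020-12-20", "2021-02-13", "2020-12-15", "2021-01-05"],
    "vaccines": ["Moderna, Pfizer/BioNTech", "Moderna, Pfizer/BioNTech",
                 "Sputnik V", "Sputnik V"],
})

# Parse the dates so that min/max compare chronologically.
dates = pd.to_datetime(sample_df["date"])
first_day, last_day = dates.min().date(), dates.max().date()

# Number of distinct countries in the frame.
n_countries = sample_df["country"].nunique()

# Individual vaccine names: split each comma-separated scheme and collect the parts.
vaccine_names = {
    name.strip()
    for scheme in sample_df["vaccines"]
    for name in scheme.split(",")
}

print(first_day, last_day, n_countries, sorted(vaccine_names))
```

Running the same three steps on `covid_df` should reproduce the figures listed in the bullets, provided the CSV version matches.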

Hopefully this drive will bring success and save millions of lives around the world.

I have done an exploratory data analysis with interactive maps and charts.

*If you like this project then don't forget to upvote👍*

# <center> Data Overview

This dataset includes information about:

**Country** - this is the country for which the vaccination information is provided;

**Country ISO Code** - ISO code for the country;

**Date** - date for the data entry; for some of the dates we have only the daily vaccinations, for others, only the (cumulative) total;

**Total number of vaccinations** - this is the absolute number of total immunizations in the country;

**Total number of people vaccinated** - a person, depending on the immunization scheme, will receive one or more (typically 2) doses; at any given moment, the number of vaccinations might therefore be larger than the number of people vaccinated;

**Total number of people fully vaccinated** - the number of people who have received the entire set of doses required by the immunization scheme (typically 2); at any given moment, some people will have received only one dose, while a smaller number will have received all doses in the scheme;

**Daily vaccinations (raw)** - for a certain data entry, the number of vaccinations for that date/country;

**Daily vaccinations** - for a certain data entry, the number of vaccinations for that date/country;

**Total vaccinations per hundred** - ratio (in percent) between vaccination number and total population up to the date in the country;

**Total number of people vaccinated per hundred** - ratio (in percent) between population immunized and total population up to the date in the country;

**Total number of people fully vaccinated per hundred** - ratio (in percent) between population fully immunized and total population up to the date in the country;

**Number of vaccinations per day** - number of daily vaccinations for that day and country;

**Daily vaccinations per million** - ratio (in ppm) between vaccination number and total population for the current date in the country;

**Vaccines used in the country** - names of the vaccines used in the country (up to date);

**Source name** - source of the information (national authority, international organization, local organization etc.);

**Source website** - website of the source of information;
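
The "per hundred" columns are plain ratios against population, and the difference between doses and people follows from the dosing scheme. As a rough sketch with made-up numbers (the helper `per_hundred`, the population figure, and the two-dose assumption are all hypothetical, not part of the dataset):

```python
def per_hundred(count: float, population: float) -> float:
    """Return `count` expressed per 100 inhabitants."""
    return 100 * count / population


# Hypothetical country with 10 million inhabitants.
population = 10_000_000
people_vaccinated = 1_500_000        # received at least one dose
people_fully_vaccinated = 500_000    # completed an assumed 2-dose scheme

# Fully vaccinated people account for 2 doses each, everyone else for 1 dose so far.
total_vaccinations = (
    (people_vaccinated - people_fully_vaccinated) + 2 * people_fully_vaccinated
)

total_vaccinations_per_hundred = per_hundred(total_vaccinations, population)
people_vaccinated_per_hundred = per_hundred(people_vaccinated, population)
people_fully_vaccinated_per_hundred = per_hundred(people_fully_vaccinated, population)

print(total_vaccinations_per_hundred,
      people_vaccinated_per_hundred,
      people_fully_vaccinated_per_hundred)
```

In this toy case the doses-per-hundred figure is larger than the people-per-hundred figure, which is why the two columns should not be read interchangeably.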

### IMPORT:

In [None]:
import pandas as pd
import numpy as np


#for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

import cufflinks as cf
import plotly.graph_objs as go
import plotly.express as px
sns.set_style('whitegrid')
%matplotlib inline

In [None]:
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
# load plotly.js in the notebook so figures render inline
init_notebook_mode(connected=True)

# use cufflinks in offline mode;
# note the parentheses: go_offline must be called to take effect
cf.go_offline()

In [None]:

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))



In [None]:
covid_df = pd.read_csv('/kaggle/input/covid-world-vaccination-progress/country_vaccinations.csv')
covid_df

### Data Cleaning

In [None]:
# let's drop source_name and source_website, which are not needed for the analysis
covid_df.drop(['source_name','source_website'], axis=1, inplace = True)
covid_df

In [None]:
covid_df.head()

In [None]:
covid_df.isnull().sum(axis=0)

In [None]:
covid_df.shape

In [None]:
covid_df.dropna(subset=["total_vaccinations", "people_vaccinated"], how="all", inplace=True)
covid_df

In [None]:
covid_df.isnull().sum()
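
The `dropna(subset=..., how="all")` call above removes a row only when both listed columns are missing; a row with just one of them filled in is kept. A small sketch on made-up rows (not taken from the dataset) illustrates the difference from the default `how="any"`:

```python
import numpy as np
import pandas as pd

# Made-up rows covering every combination of present/missing values.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "total_vaccinations": [100.0, np.nan, 50.0, np.nan],
    "people_vaccinated": [80.0, 40.0, np.nan, np.nan],
})

cols = ["total_vaccinations", "people_vaccinated"]

# how="all": drop a row only if every column in `subset` is NaN.
kept_all = toy.dropna(subset=cols, how="all")

# how="any" (the default): drop a row if at least one column in `subset` is NaN.
kept_any = toy.dropna(subset=cols, how="any")

print(kept_all["country"].tolist())
print(kept_any["country"].tolist())
```

This is why the second null count can still show missing values in either column after cleaning.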

**We are going to answer these 3 questions using exploratory data analysis.**

1. [What vaccines are used and in which countries?](http://www.kaggle.com/dataay/notebookb1bd0495c6/edit/)
2. [Which country has vaccinated the most people?](http://www.kaggle.com/dataay/notebookb1bd0495c6/edit/)
3. [Which country has vaccinated the largest percentage of its population?](http://www.kaggle.com/dataay/notebookb1bd0495c6/edit/)

### <center> **[What vaccines are used and in which countries?](http://www.kaggle.com/dataay/notebookb1bd0495c6/edit/)**

#### Hover and Zoom on the country to view the details of vaccines being used

In [None]:
vaccinebycountry_df = covid_df[['country','iso_code','vaccines']]
vaccinebycountry_df                         

In [None]:
vaccinesbycountry_grd = vaccinebycountry_df.groupby('country').max()
vaccinesbycountry_grd

In [None]:
fig = px.choropleth(vaccinesbycountry_grd, locations='iso_code', projection='natural earth',
                   color=vaccinesbycountry_grd.index, hover_name='vaccines')

fig.update_layout(title="Vaccines used by each Country")
iplot(fig)

#Hover and Zoom on the country to view the details of vaccines being used

#### Let's explore which vaccination schemes are most commonly used. Here are the top 5.



In [None]:
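# sum only the numeric columns; text columns such as country would otherwise be concatenated
vaccines = covid_df.groupby(['vaccines','date']).sum(numeric_only=True).reset_index()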
vaccines_top5 = vaccines.groupby('vaccines').max()['total_vaccinations'].reset_index()
vaccines_top5 = vaccines_top5.nlargest(5, columns=['total_vaccinations'])
vaccines_top5

In [None]:
fig = px.bar(x=vaccines_top5['vaccines'], y=vaccines_top5['total_vaccinations'],
            color=vaccines_top5['vaccines'],
            color_discrete_sequence = px.colors.sequential.Viridis[1:][::2])
fig.show()

#### So, the most commonly used vaccination scheme is "Johnson&Johnson, Moderna, Pfizer/BioNTech". And, as we can see, the Pfizer/BioNTech vaccine is present in 3 of the 5 most used vaccination schemes.
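
To check how often a single vaccine appears across schemes, each scheme string can be split on commas and the individual names counted. A minimal sketch, assuming a hand-written list of schemes in the same format as the `vaccines` column (illustrative, not the actual top-5 output):

```python
from collections import Counter

# Illustrative scheme strings, comma-separated like the `vaccines` column.
schemes = [
    "Johnson&Johnson, Moderna, Pfizer/BioNTech",
    "Oxford/AstraZeneca, Pfizer/BioNTech",
    "Sputnik V",
    "Moderna, Oxford/AstraZeneca, Pfizer/BioNTech",
    "Sinovac",
]

# Split each scheme into vaccine names and count in how many schemes each appears.
vaccine_counts = Counter(
    name.strip() for scheme in schemes for name in scheme.split(",")
)

for name, count in vaccine_counts.most_common():
    print(f"{name}: {count}/{len(schemes)} schemes")
```

Passing `vaccines_top5['vaccines']` instead of the hand-written list would give the count for the real top 5.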



### Let's look at the total vaccinations and the vaccines used in each country

In [None]:
vaccinebycountry_df = covid_df[['country','iso_code','vaccines', 'total_vaccinations']]
vaccinebycountry_df


In [None]:
total_vaccinations= vaccinebycountry_df.groupby(['country']).max()[['total_vaccinations'
                     ,'vaccines','iso_code']].reset_index()
total_vaccinations

In [None]:
fig = px.choropleth(total_vaccinations, locations = 'country', locationmode = 'country names',color = 'vaccines',
                   title = 'total Vaccines used for each country',hover_data= ['total_vaccinations'],
                    color_discrete_map=dict(zip(total_vaccinations['vaccines'], px.colors.sequential.Viridis)),
                   labels={'vaccines': 'Name of vaccine', 'country': 'Country',
                           'total_vaccinations': 'Number of vaccinations'})
fig.update_geos(
    visible=True, 
    resolution=50,
    showcountries=True, 
    countrycolor="darkgrey"
    )
fig.update_layout(
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    ),
)
fig.show()

#### We can see the large green area of the Sputnik V vaccine, but it is used only in Russia and Argentina. Most European, Middle Eastern, and North American countries use Pfizer/BioNTech. "CNBG, Sinovac" is the second most used vaccination scheme, but it is used only in China. The Oxford/AstraZeneca vaccine is distributed in the United Kingdom, Brazil, Myanmar, and India.



In [None]:
countries = covid_df.country.unique()
print(countries)
print(len(countries))

### <center> **[Which country has vaccinated the most people?](http://www.kaggle.com/dataay/notebookb1bd0495c6/edit/)**

**Now, let's see which countries have vaccinated the most people. Here are the top 10 countries by number of vaccinations.**



In [None]:
# total_vaccinations is cumulative, so the per-country maximum is the latest total
total_vaccinations_top10 = covid_df.groupby('country')['total_vaccinations'].max()
total_vaccinations_top10 = total_vaccinations_top10.sort_values(ascending=False).head(10)
total_vaccinations_top10

#### As the bar chart below shows, the USA has administered the most vaccinations (150.3 million). China has slightly fewer vaccinations than the United States, but many times more than other countries. In Europe, the leaders in number of vaccinations are Germany and the United Kingdom (the UK figure includes England).

In [None]:
fig = px.bar(x=total_vaccinations_top10.index ,y=total_vaccinations_top10.values,
            color=total_vaccinations_top10.index,
            labels={"x": "country", "y": "total vaccinations"},
            color_discrete_sequence =px.colors.sequential.Viridis)

fig.show()

### <center> [Which country has vaccinated the largest percentage of its population?](http://www.kaggle.com/dataay/notebookb1bd0495c6/edit/)

### As we saw above, the USA and China have vaccinated the most people, but how many vaccinations have been given per hundred people?

In [None]:
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
total_vac_hundrd = covid_df[['country','iso_code', 'total_vaccinations_per_hundred']].copy()
total_vac_hundrd['total_vaccinations_per_hundred'] = total_vac_hundrd['total_vaccinations_per_hundred'].fillna(0)
total_vac_hundrd = total_vac_hundrd.groupby(['country', 'iso_code']).max().reset_index()

In [None]:
def create_choropleth(loc, z, text, title):
    """Draw a world choropleth of the values `z`, keyed by the ISO codes in `loc`."""
    fig = go.Figure(data=go.Choropleth(
        locations=loc,
        z=z,
        text=text,
        colorscale='viridis',
        autocolorscale=False,
        reversescale=True,
        marker_line_color='white',
        marker_line_width=0.5
    ))

    fig.update_geos(
        visible=True,
        resolution=50,
        showcountries=True,
        countrycolor='darkgrey'
    )

    fig.update_layout(
        title_text=title,
        geo=dict(
            showframe=False,
            showcoastlines=False,
            projection_type='natural earth'
        )
    )

    fig.show()
    

In [None]:
create_choropleth(total_vac_hundrd['iso_code'],
                  total_vac_hundrd['total_vaccinations_per_hundred'], 
                  total_vac_hundrd['country'], 
                  'Total vaccinations per hundred')

#### Israel and the UAE have the darkest colors, which means these countries have the highest number of vaccinations relative to their population. In the USA there are only 44 vaccinations per hundred people; in China the figure is 8.

### What about how many people are fully vaccinated per hundred of population?







In [None]:
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
fully_vac_hundrd = covid_df[['country', 'iso_code', 'people_fully_vaccinated_per_hundred']].copy()
fully_vac_hundrd['people_fully_vaccinated_per_hundred'] = fully_vac_hundrd['people_fully_vaccinated_per_hundred'].fillna(0)
fully_vac_hundrd = fully_vac_hundrd.groupby(['country', 'iso_code']).max().reset_index()

In [None]:
create_choropleth(fully_vac_hundrd['iso_code'], 
                  fully_vac_hundrd['people_fully_vaccinated_per_hundred'], 
                  fully_vac_hundrd['country'], 
                  'People fully vaccinated per hundred')

###### Fully vaccinated is the number of people who have received the entire set of doses required by the immunization scheme. The largest percentages of fully vaccinated people are in Israel and the UAE, but we have no data on full vaccination for the other countries.
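
Because missing values were replaced with 0 before plotting, a country with no reported figure looks the same on the map as a country with a true value of 0. One way to tell them apart is to count the non-missing entries per country before filling. A sketch on made-up rows (country labels and values are illustrative):

```python
import numpy as np
import pandas as pd

# Made-up rows: country "B" never reports the fully-vaccinated figure.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "people_fully_vaccinated_per_hundred": [10.0, 25.0, np.nan, np.nan, 1.5],
})

# GroupBy.count() counts only non-missing values in each group.
reported = toy.groupby("country")["people_fully_vaccinated_per_hundred"].count()

# Countries whose count is 0 have no data at all for this column.
countries_without_data = reported[reported == 0].index.tolist()

print(countries_without_data)
```

Applying the same count to `covid_df` before the `fillna(0)` step would list the countries for which the map shows a filled-in zero rather than a reported value.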

### The next choropleth shows how many vaccinations per million people are given every day in different countries.

In [None]:
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
daily_vac_million = covid_df[['country', 'iso_code', 'daily_vaccinations_per_million']].copy()
daily_vac_million['daily_vaccinations_per_million'] = daily_vac_million['daily_vaccinations_per_million'].fillna(0)
daily_vac_million = daily_vac_million.groupby(['country', 'iso_code']).max().reset_index()

create_choropleth(daily_vac_million['iso_code'],
                  daily_vac_million['daily_vaccinations_per_million'],
                  daily_vac_million['country'],
                  'Daily vaccinations per million')


##### Daily vaccinations per million are highest in the UAE and Israel.

### Let's explore which countries have the most daily vaccinations.

In [None]:
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
daily_vac = covid_df[['country', 'iso_code', 'daily_vaccinations']].copy()
daily_vac['daily_vaccinations'] = daily_vac['daily_vaccinations'].fillna(0)
daily_vac = daily_vac.groupby(['country', 'iso_code']).max().reset_index()

create_choropleth(daily_vac['iso_code'],
                  daily_vac['daily_vaccinations'],
                  daily_vac['country'],
                  'Daily vaccinations')

##### **More than 1 million people per day are vaccinated in the USA, China, and India. This is the best result among all countries, but because of their large populations these countries still have a relatively small percentage of vaccinated people.**
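
The effect of population size is easy to see with a little arithmetic. In the sketch below the two countries and all figures are hypothetical; only the per-million scaling mirrors the `daily_vaccinations_per_million` column.

```python
# Hypothetical figures, chosen only to illustrate the scaling.
countries = {
    "Large country": {"population": 1_000_000_000, "daily_vaccinations": 1_000_000},
    "Small country": {"population": 10_000_000, "daily_vaccinations": 100_000},
}

# Daily vaccinations scaled to a population of one million.
daily_per_million = {
    name: 1_000_000 * stats["daily_vaccinations"] / stats["population"]
    for name, stats in countries.items()
}

for name, rate in daily_per_million.items():
    print(f"{name}: {rate:.0f} daily vaccinations per million")
```

The large country gives ten times as many doses per day in absolute terms, yet its per-million rate is ten times lower.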

### Let's look at highly vaccinated countries

In [None]:
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
total_vac_hundred = covid_df[['country', 'total_vaccinations_per_hundred']].copy()
total_vac_hundred['total_vaccinations_per_hundred'] = total_vac_hundred['total_vaccinations_per_hundred'].fillna(0)
total_vac_hundred.head()



In [None]:
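# the column is cumulative, so take the per-country maximum (the latest value);
# a mean would depend on how long and how often each country has been reporting
total_vac_hundred = total_vac_hundred.groupby('country').max()
total_vac_hundred.head()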

In [None]:
#Sorting the data based on total_vaccinations_per_hundred from higher to lower
total_vac_hundred.sort_values(by="total_vaccinations_per_hundred", ascending=False, inplace=True)
total_vac_hundred

In [None]:
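# the frame is already sorted in descending order, so the first 25 rows are the top 25;
# x, y and the color array must all have the same length
top25 = total_vac_hundred.head(25)
trace = go.Bar(x=top25.index, y=top25["total_vaccinations_per_hundred"],
               marker=dict(
                   color=np.arange(len(top25))
               ))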

fig = go.Figure(data=[trace])
fig.update_layout(title="Top 25 highly vaccinated countries (as per population)")
fig.update_xaxes(title="Country")
fig.update_yaxes(title="Vaccination per hundred")

iplot(fig)

##### We can observe that Israel, the UAE, etc. have a higher ratio of total vaccinations per hundred than the USA and the UK.



## <center> Upvote if you like this notebook! Looking forward to your feedback😊
