# COVID-19 World Vaccination Progress (EDA) 📊
This notebook explores the COVID-19 World Vaccination Progress dataset, which tracks the vaccination campaigns running around the world to control the pandemic that started in 2019. We perform Exploratory Data Analysis (EDA) to understand the features in the dataset and which factors are associated with a higher vaccination rate in a particular country. At the end of the EDA, we summarize the observations drawn from the dataset.

Before starting the analysis, we list the key steps followed in this notebook to stay on track:
* Load the dataset.
* Understand the dataset (mean, median, min, max, etc.).
* Handle missing values.
* Normalize the dataset where required.
* Perform feature engineering.
* Analyze the dataset and draw conclusions.

# Content
This section describes the columns in the dataset, which makes the dataset easier to understand and analyze.

| Features | Description |
| -------- | ----------- |
| Country | this is the country for which the vaccination information is provided |
| Country ISO Code | ISO code for the country |
| Date | date for the data entry; for some of the dates we have only the daily vaccinations, for others, only the (cumulative) total |
| Total number of vaccinations | this is the absolute number of total immunizations in the country |
| Total number of people vaccinated | a person, depending on the immunization scheme, will receive one or more (typically 2) vaccines |
| Total number of people fully vaccinated | this is the number of people that received the entire set of immunization according to the immunization scheme (typically 2) |
| Daily vaccinations (raw) | for a certain data entry, the number of vaccination for that date/country |
| Daily vaccinations | for a certain data entry, the number of vaccination for that date/country |
| Total vaccinations per hundred | ratio (in percent) between vaccination number and total population up to the date in the country |
| Total number of people vaccinated per hundred | ratio (in percent) between population immunized and total population up to the date in the country |
| Total number of people fully vaccinated per hundred | ratio (in percent) between population fully immunized and total population up to the date in the country |
| Number of vaccinations per day | number of daily vaccination for that day and country |
| Daily vaccinations per million | ratio (in ppm) between vaccination number and total population for the current date in the country |
| Vaccines used in the country | total number of vaccines used in the country (up to date) |
| Source name | source of the information (national authority, international organization, local organization etc.) |
| Source website | website of the source of information |
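
The ratio columns in the table can be read as a count rescaled by the population. The sketch below illustrates that reading only; the helper names and the numbers are hypothetical, and the dataset's own values may rely on different population estimates.

```python
def per_hundred(count: float, population: float) -> float:
    """Return `count` expressed per 100 inhabitants."""
    return count * 100 / population


def per_million(count: float, population: float) -> float:
    """Return `count` expressed per 1,000,000 inhabitants."""
    return count * 1_000_000 / population


# Hypothetical example: a territory of 50,000 people.
population = 50_000
print(per_hundred(25_000, population))  # 25,000 doses in total
print(per_million(500, population))     # 500 doses on one day
```

Because the total number of vaccinations counts doses rather than people, its per-hundred value can exceed 100 when the immunization scheme requires two doses.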

# Load the Dataset

In [None]:
# import the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
sns.set_style('whitegrid')
import warnings
warnings.filterwarnings('ignore')
import missingno
import geopandas

In [None]:
df = pd.read_csv('../input/covid-world-vaccination-progress/country_vaccinations.csv')
df.head()

In [None]:
print(f"Length of the dataset is: {len(df)}")

In [None]:
df.describe()

In [None]:
missingno.bar(df);

So, seven columns in the dataset contain null values, and `people_fully_vaccinated` and `people_fully_vaccinated_per_hundred` have the most null values.

# Handling the Missing Values
In this section, we fill all the missing values present in the dataset. This makes our dataset more consistent for performing the EDA.

In [None]:
df.head()

In [None]:
# fraction of missing values per column; print only the columns that have any
data = df.isna().sum()/len(df)
for column, fraction in data.items():
    if fraction != 0.0:
        print(column, ": ", fraction)

In most cases we cannot tell how many people were covered by the vaccination on a particular date. So, to keep the dataset consistent, we fill the missing values with 0, assuming that no one got the vaccine on that date. We come back to the reason for the missing values when we perform the EDA.

In [None]:
# columns whose missing entries are replaced with 0.0
fill_columns = [
    'total_vaccinations',
    'people_vaccinated',
    'people_fully_vaccinated',
    'daily_vaccinations_raw',
    'total_vaccinations_per_hundred',
    'people_vaccinated_per_hundred',
    'people_fully_vaccinated_per_hundred',
]
df[fill_columns] = df[fill_columns].fillna(0.0)
# drop the rows that still have a missing value in any other column
df.dropna(inplace=True)

In [None]:
df.isna().sum()/len(df)

So, now we don't have any missing values in our dataset. For now, we fill all the missing values with 0.0, as we don't know what the true values should be.
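
Since some columns are cumulative totals, a zero fill makes the total appear to drop on days without a report. The sketch below compares the zero fill with one possible alternative, a per-country forward fill. The toy data is hypothetical, and the forward fill is only an illustration, not what this notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a cumulative total with one unreported day per country.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"] * 2,
    "total_vaccinations": [100.0, np.nan, 300.0, np.nan, 50.0, 80.0],
})

# Zero fill: the cumulative total appears to fall back to 0 on unreported days.
zero_filled = toy["total_vaccinations"].fillna(0.0)

# Alternative: carry the last reported total forward within each country,
# and use 0 only for the days before the first report.
forward_filled = (
    toy.sort_values(["country", "date"])
       .groupby("country")["total_vaccinations"]
       .ffill()
       .fillna(0.0)
)

print(zero_filled.tolist())
print(forward_filled.tolist())
```

For aggregates such as the per-country maximum, both fills give the same answer as long as a country has at least one reported value, because the added zeros never exceed a reported total.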

# Feature Engineering 
In this section, we create new fields from the existing columns so that the analysis is cleaner and easier to follow.

In [None]:
df.head()

We can perform the feature engineering over:
* `date` column
* `vaccines` column

In [None]:
date_month = [date.split('-')[1] for date in df.date]
df['date_month'] = date_month
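
The month can also be derived by parsing the dates instead of splitting strings. The following is a minimal sketch on hypothetical sample rows, assuming the `date` column uses the `YYYY-MM-DD` format; the `year_month` column is an illustrative addition that is not part of the notebook.

```python
import pandas as pd

# Hypothetical sample in the same YYYY-MM-DD format as the `date` column.
sample = pd.DataFrame({"date": ["2020-12-28", "2021-01-15", "2021-03-02"]})

parsed = pd.to_datetime(sample["date"], format="%Y-%m-%d")

# Zero-padded month string, matching what the string split produces.
sample["date_month"] = parsed.dt.strftime("%m")

# Year-month label, which keeps December 2020 apart from any later December.
sample["year_month"] = parsed.dt.to_period("M").astype(str)

print(sample)
```

Parsing also fails loudly on malformed dates, whereas the string split would silently return whatever sits between the hyphens.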

In [None]:
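from collections import Counter

# count how many rows mention each vaccine
vaccines = []
for vaccine in df.vaccines:
    # strip whitespace so ' Moderna' and 'Moderna' are counted together
    vaccines.extend(name.strip() for name in vaccine.split(','))
vaccines_update = dict(Counter(vaccines))
vaccines_update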

# Exploratory Data Analysis
In this section, we perform the EDA to understand the dataset and draw conclusions at the end of the notebook. Before that, we list a set of questions that we answer while performing the EDA.

**Questions**
* Which vaccines are used, and in which countries?
* Which country has vaccinated the most people?
* Which country has vaccinated the largest percentage of its population?
* In which country is the vaccination programme most advanced?
* Where are the most people vaccinated per day, both in absolute numbers and as a percentage of the entire population?
* In which month were the most people vaccinated?
* How many people are fully vaccinated (country-wise)?

In [None]:
df.head()

In [None]:
data = df.groupby('country')['vaccines'].unique()
data_df = pd.DataFrame(data)


In [None]:
data_df.style
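
The first question can also be read in the other direction, from vaccine to countries. The sketch below shows one way to build that view with `str.split` and `explode`. The sample rows are hypothetical, and it assumes the `vaccines` column holds comma-separated names.

```python
import pandas as pd

# Hypothetical rows mimicking the `country` and `vaccines` columns.
sample = pd.DataFrame({
    "country": ["X", "X", "Y"],
    "vaccines": ["Moderna, Pfizer/BioNTech", "Moderna, Pfizer/BioNTech", "Sputnik V"],
})

# One row per (country, vaccine) pair, without duplicates.
pairs = (
    sample[["country", "vaccines"]]
    .drop_duplicates()
    .assign(vaccine=lambda d: d["vaccines"].str.split(","))
    .explode("vaccine")
)
pairs["vaccine"] = pairs["vaccine"].str.strip()

# Sorted list of countries for each vaccine.
countries_by_vaccine = pairs.groupby("vaccine")["country"].apply(sorted).to_dict()
print(countries_by_vaccine)
```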

In [None]:
plt.figure(figsize=(20, 6))
df.groupby('country')['total_vaccinations'].max().sort_values(ascending=False)[:50].plot(kind='bar');

So, this graph shows that `United States`, `United Kingdom`, `England`, `India` and `China` are the top five entries by total number of vaccinations administered (the dataset lists `England` separately from `United Kingdom`).

In [None]:
plt.figure(figsize=(20, 6))
data = df.groupby('country')['total_vaccinations_per_hundred'].max().sort_values(ascending=False)[:50]
plt.bar(data.index, data);
plt.xticks(rotation=90)  # rotation must be a number or 'vertical'/'horizontal'
plt.yticks(np.arange(10.0, 160.0, step=20.0))
plt.title('Country-wise Total Vaccination per Hundred', fontsize=24, fontweight='bold')
plt.ylabel('total vaccination per hundred');

`Gibraltar` has the most vaccinations per hundred people. This may be due to its small population (33,701 in 2019).

In [None]:
plt.figure(figsize=(20, 6))
df.groupby('country')['daily_vaccinations_per_million'].max().sort_values(ascending=False)[:50].plot(kind='bar')
plt.ylabel('daily vaccinations per million')
plt.title('Daily Vaccination in Country (in ppm)', fontsize=24, fontweight='bold');


`Falkland Islands` leads in daily vaccinations per million. This may be because the islands have a small population, so a modest number of daily vaccinations covers a large share of the residents.

In [None]:
data = df.groupby('date_month')['daily_vaccinations'].sum().sort_values(ascending=False)
plt.bar(data.index, data);
plt.xlabel('month');
plt.ylabel('total vaccination in month')
plt.title('Total Vaccination in a Month all over the World', fontsize=11, fontweight='bold');

March is the month with the most vaccinations worldwide. This may be because most countries had a vaccine by March and were trying to vaccinate as much of their population as possible.

In [None]:
plt.figure(figsize=(20, 6))
df.groupby('country')['people_fully_vaccinated_per_hundred'].max().sort_values(ascending=False)[:50].plot(kind='bar')
plt.ylabel('people_fully_vaccinated_per_hundred')
plt.title('People fully vaccinated in Country', fontsize=24, fontweight='bold');

`Gibraltar` has the highest share of fully vaccinated people in the world. As its population is small in comparison to other countries, it can cover its whole population faster than others.

# Summary
Vaccination programmes are progressing at a high rate all over the world. Some countries are covering their population faster than others, while in some countries the programme has only recently started. In some parts of the world, vaccination began in December 2020. In some less developed countries the programme has not started yet, although a few people have already been vaccinated (possibly high-profile individuals).