# COVID-19 Vaccination Data Analysis

This project analyzes the "**COVID-19 World Vaccination Progress**" dataset with Python. The dataset was collected from Kaggle, the world's largest data science community with powerful tools and resources.

![COVID-19 Vaccination image](https://assetsds.cdnedge.bluemix.net/sites/default/files/styles/very_big_1/public/feature/images/2021/01/11/covid-vaccine.jpg?itok=xMzn_yrM)

This dataset contains 35310 rows and 15 columns, which makes it informative to analyze. This project analyzes several fields of the COVID-19 World Vaccination Progress data, such as `country`, `total_vaccinations`, `people_vaccinated`, `daily_vaccinations`, `total_vaccinations_per_hundred`, `people_vaccinated_per_hundred`, `people_fully_vaccinated_per_hundred`, `vaccines`, and more.

Libraries used:

* pandas
* matplotlib
* seaborn

## Data Preparation and Cleaning




> - Load the dataset into a data frame using Pandas
> - Explore the number of rows & columns, ranges of values etc.
> - Handle missing, incorrect and invalid data

In [None]:
import pandas as pd

In [None]:
vaccinations_df = pd.read_csv('../input/covid-world-vaccination-progress/country_vaccinations.csv')

In [None]:
vaccinations_df

In [None]:
vaccinations_df.info()

In [None]:
vaccinations_df.columns

In [None]:
vaccinations_df.shape

In [None]:
vaccinations_df.describe()

In [None]:
vaccinations_df.isnull().sum()

In [None]:
vaccinations_df.fillna(value=0, inplace=True)
date = vaccinations_df.date.str.split('-', expand=True)
date
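
The cell above fills every missing value with 0, which also writes the number 0 into text columns. As a hedged alternative, the sketch below measures the share of missing values per column and fills only the numeric columns. The toy frame and its values are made up for illustration; only the column names `country` and `total_vaccinations` come from the dataset, and `source_name` is assumed to be a text column.

```python
import pandas as pd

# Toy frame standing in for the real dataset (all values are made up).
toy_missing_df = pd.DataFrame({
    "country": ["A", "A", "B"],
    "total_vaccinations": [10.0, None, 5.0],
    "source_name": ["Ministry", None, "WHO"],
})

# Share of missing values in each column (0.0 = none, 1.0 = all missing).
missing_share = toy_missing_df.isnull().mean()

# Fill only the numeric columns with 0 and leave text columns untouched.
numeric_cols = toy_missing_df.select_dtypes(include="number").columns
filled_df = toy_missing_df.copy()
filled_df[numeric_cols] = filled_df[numeric_cols].fillna(0)

print(missing_share)
print(filled_df)
```

Whether 0 is a sensible fill value depends on the column: for cumulative counts, a 0 in the middle of a series can look like a sudden drop.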

In [None]:
vaccinations_df['year'] = date[0]
vaccinations_df['month'] = date[1]
vaccinations_df['day'] = date[2]

vaccinations_df.year = pd.to_numeric(vaccinations_df.year)
vaccinations_df.month = pd.to_numeric(vaccinations_df.month)
vaccinations_df.day = pd.to_numeric(vaccinations_df.day)

vaccinations_df.date = pd.to_datetime(vaccinations_df.date)

vaccinations_df.head()

In [None]:
vaccinations_df.info()
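
The year, month, and day columns can also be derived from the parsed date instead of splitting the string. This is a minimal sketch on a made-up two-row frame, assuming the dates are in `YYYY-MM-DD` form as the split on `-` above suggests.

```python
import pandas as pd

# Toy frame with made-up dates in the assumed YYYY-MM-DD format.
toy_dates_df = pd.DataFrame({"date": ["2021-02-07", "2021-03-15"]})

# Parse once, then read the parts through the .dt accessor.
toy_dates_df["date"] = pd.to_datetime(toy_dates_df["date"])
toy_dates_df["year"] = toy_dates_df["date"].dt.year
toy_dates_df["month"] = toy_dates_df["date"].dt.month
toy_dates_df["day"] = toy_dates_df["date"].dt.day

print(toy_dates_df)
```

The resulting columns are already integers, so no `pd.to_numeric` step is needed.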

## Exploratory Analysis and Visualization




Let's begin by importing `matplotlib.pyplot` and `seaborn`.

In [None]:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

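Explore the mean, min, and max of the numeric columns.

In [None]:
# numeric_only=True skips text columns, which recent pandas versions no longer drop silently
vaccinations_df.mean(numeric_only=True)

In [None]:
vaccinations_df.min(numeric_only=True)

In [None]:
vaccinations_df.max(numeric_only=True)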

Explore the country column.

In [None]:
vaccinations_df.country.value_counts()

In [None]:
vaccinations_df.country

In [None]:
vaccinations_df.country.nunique()

Explore the min and max number of fully vaccinated people.

In [None]:
vaccinations_df.people_fully_vaccinated.min()

In [None]:
vaccinations_df.people_fully_vaccinated.max()

Explore the min and max date.

In [None]:
vaccinations_df.date.min()

In [None]:
vaccinations_df.date.max()

Explore how the number of daily vaccinations changes over time.

In [None]:
plt.figure(figsize=(16,8))
sns.lineplot(x=vaccinations_df.date, y=vaccinations_df.daily_vaccinations)
plt.title('Daily vaccinations over time')
plt.show()
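
Because the plot above receives one row per country per date, seaborn aggregates the rows that share a date (by default the mean, with a confidence band). If a worldwide total per day is wanted instead, one option is to sum by date first. The sketch below uses a made-up toy frame; it also assumes every row is a distinct country, so if the data lists both a country and its constituent regions, a plain sum would double count.

```python
import pandas as pd

# Toy frame with made-up values: two countries over two dates.
toy_daily_df = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"]),
    "country": ["A", "B", "A", "B"],
    "daily_vaccinations": [100.0, 50.0, 120.0, 80.0],
})

# One total per date across all countries.
daily_totals = toy_daily_df.groupby("date")["daily_vaccinations"].sum()

print(daily_totals)
```

The resulting series can be passed to `sns.lineplot` with `x=daily_totals.index` and `y=daily_totals.values`.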

Explore from which date vaccination began to accelerate in the five countries with the most total vaccinations.

In [None]:
# Names of the five countries with the highest total vaccinations
countries = vaccinations_df.groupby('country')['total_vaccinations'].max().sort_values(ascending=False)[:5].index

# DataFrame.append was removed in pandas 2.0; select the matching rows with isin instead
top_countries = vaccinations_df[vaccinations_df['country'].isin(countries)]

In [None]:
plt.figure(figsize=(20,8))
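# Recent seaborn versions require x and y as keyword arguments.
# errorbar=None needs seaborn >= 0.12; on older versions use ci=None instead.
sns.lineplot(data=top_countries, x='date', y='daily_vaccinations_per_million', hue='country', errorbar=None)
plt.title('Daily vaccinations per million in the top five countries');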

## Asking and Answering Questions



#### Q1: Which country has the most fully vaccinated people?

In [None]:
fully_vaccinated = vaccinations_df.groupby("country")["people_fully_vaccinated"].max().sort_values(ascending= False).head(25)

In [None]:
fully_vaccinated.reset_index()

In [None]:
plt.figure(figsize=(16,10))
ax = sns.barplot(x=fully_vaccinated, y=fully_vaccinated.index)
plt.xlabel("Fully Vaccinated")
plt.ylabel("Country");
plt.title('Which country has the most fully vaccinated people?');

for patch in ax.patches:
    width = patch.get_width()
    height = patch.get_height()
    x = patch.get_x()
    y = patch.get_y()
    
    plt.text(width + x, height + y, '{:.1f} '.format(width))
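
The same labeling loop is repeated for each bar chart in this notebook, so it could be wrapped in a small function. The sketch below is one way to do that; `label_bars` is a hypothetical helper name, the bar values are made up, and plain matplotlib bars stand in for the seaborn plot so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; not needed in a notebook
import matplotlib.pyplot as plt


def label_bars(ax, fmt="{:.1f}"):
    """Write each horizontal bar's width just past its right edge (hypothetical helper)."""
    labels = []
    for patch in ax.patches:
        width = patch.get_width()
        # Centre the label vertically on the bar.
        y_mid = patch.get_y() + patch.get_height() / 2
        text = ax.text(patch.get_x() + width, y_mid, " " + fmt.format(width), va="center")
        labels.append(text)
    return labels


fig, ax = plt.subplots()
ax.barh(["A", "B"], [3.0, 1.5])  # made-up values
bar_labels = label_bars(ax)
```

With a seaborn plot, the call would be `label_bars(ax)` on the axes returned by `sns.barplot`, and a format such as `"{:.1f} %"` can be passed for percentage charts.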

#### Q2: Which countries administered the most daily COVID-19 vaccine doses per million people?

In [None]:
daily_vaccinations_per_million = vaccinations_df.groupby("country")["daily_vaccinations_per_million"].max().sort_values(ascending= False).head(15)

In [None]:
daily_vaccinations_per_million.reset_index()

In [None]:
plt.figure(figsize=(12,8))
ax = sns.barplot(x=daily_vaccinations_per_million, y=daily_vaccinations_per_million.index )
plt.xlabel("daily vaccinations per million")
plt.ylabel("Country")
plt.title("Daily COVID-19 vaccine doses administered per million people");

for patch in ax.patches:
    width = patch.get_width()
    height = patch.get_height()
    x = patch.get_x()
    y = patch.get_y()
    
    plt.text(width + x, height + y, '{:.1f} '.format(width))

#### Q3: How many people were vaccinated daily in Bangladesh?

In [None]:
bangladesh_df = vaccinations_df[vaccinations_df['country'] == 'Bangladesh']
bangladesh_df

In [None]:
bangladesh_df.info()

In [None]:
bangladesh_df.daily_vaccinations_raw.sum()

In [None]:
plt.figure(figsize=(20,10))
sns.lineplot(x=bangladesh_df.date, y=bangladesh_df.daily_vaccinations_raw)
plt.xlabel("Date")
plt.ylabel("Daily vaccinations")
plt.title('How many people were vaccinated daily in Bangladesh?');

#### Q4: How many people have taken at least one dose of vaccine in Bangladesh?

In [None]:
# people_vaccinated counts people with at least one dose; total_vaccinations counts doses
total_vaccinated_bd = bangladesh_df.people_vaccinated.max()/1000000

In [None]:
print("{0:.2f} M people have taken at least one dose of vaccine in Bangladesh.".format(total_vaccinated_bd))

#### Q5: How many people are fully vaccinated in Bangladesh?

In [None]:
fully_vaccinated_bd = bangladesh_df.people_fully_vaccinated.max()/1000000

In [None]:
print("Total fully vaccinated people in Bangladesh: {0:.2f}M".format(fully_vaccinated_bd))

#### Q6: Which country has fully vaccinated the largest share of its population?

In [None]:
# Use the fully vaccinated share so the result matches the question;
# total_vaccinations_per_hundred counts doses and can exceed 100.
population_country = vaccinations_df.groupby('country')['people_fully_vaccinated_per_hundred'].max().sort_values(ascending=False).head(15)

In [None]:
population_country.reset_index()

In [None]:
plt.figure(figsize=(15, 8))
ax = sns.barplot(x=population_country, y=population_country.index)
plt.title('People Fully Vaccinated / Population')
plt.xlabel('People fully vaccinated per hundred')
plt.ylabel('Country')

for patch in ax.patches:
    width = patch.get_width()
    height = patch.get_height()
    x = patch.get_x()
    y = patch.get_y()
    
    plt.text(width + x, height + y, '{:.1f} %'.format(width))

## Conclusion

This concludes the analysis of the COVID-19 vaccination data. Future work may extend the analysis of this dataset.

* Dataset link - https://www.kaggle.com/gpreda/covid-world-vaccination-progress