<a id="1.1"></a>
<h3 style="background-color:skyblue;font-family:newtimeroman;font-size:200%;text-align:center">How many people have received a coronavirus vaccine?</h3>

Tracking COVID-19 vaccination rates is crucial to understand the scale of protection against the virus, and how this is distributed across the global population.

A global, aggregated database on COVID-19 vaccination rates is essential to monitor progress, but it is unfortunately not yet available. 

Until such a database is made available, Our World in Data will be tracking recent announcements on the first countries to administer these vaccinations.

https://ourworldindata.org/covid-vaccinations

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objs as go
import plotly.offline as py
import plotly.express as px
from plotly.offline import iplot
import seaborn
import cv2 as cv

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
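IMG_PATH = "../input/cusersmarilonedriveimagensvaccinationpng/covid-vaccination-doses-per-capita.png"

# cv.imread returns None (it does not raise) when the file cannot be read
imgArray = cv.imread(IMG_PATH)
if imgArray is None:
    raise FileNotFoundError(f"Could not read image at {IMG_PATH}")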

In [None]:
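# OpenCV loads images in BGR channel order, so the colors appear swapped here;
# the image is converted to RGB in a later cell
plt.imshow(imgArray)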

plt.show()

In [None]:
nRowsRead = 1000  # set to None to read the whole file
df = pd.read_csv('../input/cusersmarildownloadsvaccinationcsv/vaccination.csv', delimiter=';', encoding="utf8", nrows=nRowsRead)
df.dataframeName = 'vaccination.csv'
nRow, nCol = df.shape
print(f'There are {nRow} rows and {nCol} columns')
df.head()

In [None]:
convertedArray = cv.cvtColor(imgArray, cv.COLOR_BGR2RGB)

plt.subplots(figsize=(15, 10))
plt.imshow(convertedArray)
plt.show()

<div class="alert alert-block alert-success">
    The chart above shows the number of COVID-19 vaccination doses administered per 100 people within a given population. Note that this does not measure the total number of people who have been vaccinated, since full vaccination usually requires two doses.
    https://ourworldindata.org/covid-vaccinations

</div>
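
To make the distinction concrete, here is a minimal sketch of how a doses-per-100-people figure could be computed, and why it is not the same as the share of people vaccinated. The function name `doses_per_hundred` and all numbers are hypothetical illustrations; they are not taken from the Our World in Data dataset.

```python
def doses_per_hundred(total_doses, population):
    """Return the number of vaccine doses administered per 100 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * total_doses / population


# Hypothetical example: 150,000 doses given in a population of 1,000,000
rate = doses_per_hundred(150_000, 1_000_000)
print(f"Doses per 100 people: {rate:.1f}")

# If every recipient had received two doses, those 150,000 doses would
# correspond to only 75,000 people, i.e. half the per-dose figure.
people_if_two_doses_each = 150_000 / 2
print(f"People reached if everyone got two doses: {people_if_two_doses_each:.0f}")
```

Under these assumptions, the per-dose rate is an upper bound on the share of people who received at least one dose.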

In [None]:
df.isnull().sum()

# Code by Shubham: https://www.kaggle.com/shubham47/eda-with-seaborn-on-churn-dataset/notebook

In [None]:
ax = sns.catplot(y='total_vaccinations_per_hundred', kind='count', data=df, height=2.6, aspect=2.5)

In [None]:
from matplotlib.ticker import PercentFormatter


def barplot_percentages(feature, orient='v', axis_name="percentage of Total Vaccinations"):
    # Count each (feature, total_vaccinations_per_hundred) pair and express
    # the count as a fraction of all rows in df
    g = (df.groupby(feature)["total_vaccinations_per_hundred"]
           .value_counts()
           .rename(axis_name)  # name the counts explicitly; works across pandas versions
           .reset_index())
    g[axis_name] = g[axis_name] / len(df)
    if orient == 'v':
        ax = sns.barplot(x=feature, y=axis_name, hue='total_vaccinations_per_hundred', data=g, orient=orient)
        # Show the numeric axis as percentages (values are fractions of 1)
        ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0, decimals=0))
    else:
        ax = sns.barplot(x=axis_name, y=feature, hue='total_vaccinations_per_hundred', data=g, orient=orient)
        ax.xaxis.set_major_formatter(PercentFormatter(xmax=1.0, decimals=0))
    plt.xticks(rotation=45)


barplot_percentages("Date")

In [None]:
plt.figure(figsize=(9, 4.5))
barplot_percentages("Date", orient='h')

In [None]:
plt.figure(figsize=(9, 4.5))
barplot_percentages("Entity", orient='h')

In [None]:
df.columns.tolist()

In [None]:
from sklearn.ensemble import RandomForestClassifier

params = {'random_state': 0, 'n_jobs': 4, 'n_estimators': 5000, 'max_depth': 8}
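# One-hot encode the categorical columns
df = pd.get_dummies(df)
# Target: indicator column for rows dated 13/12/2020
target = 'Date_13/12/2020'
# Drop the redundant 'Code_OWID_WRL' column and the target itself,
# so the target does not leak into the features
drop = ['Code_OWID_WRL', target]
x, y = df.drop(drop, axis=1), df[target]
# Fit the RandomForest classifier
clf = RandomForestClassifier(**params)
clf = clf.fit(x, y)
# Plot feature importances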
imp = pd.Series(data=clf.feature_importances_, index=x.columns).sort_values(ascending=False)
plt.figure(figsize=(10,12))
plt.title("Feature importance")
ax = sns.barplot(y=imp.index, x=imp.values, palette="Blues_d", orient='h')

In [None]:
#Code by Olga Belitskaya https://www.kaggle.com/olgabelitskaya/sequential-data/comments
from IPython.display import display,HTML
c1,c2,f1,f2,fs1,fs2=\
'#eb3434','#eb3446','Akronim','Smokum',30,15
def dhtml(string,fontcolor=c1,font=f1,fontsize=fs1):
    display(HTML("""<style>
    @import 'https://fonts.googleapis.com/css?family="""\
    +font+"""&effect=3d-float';</style>
    <h1 class='font-effect-3d-float' style='font-family:"""+\
    font+"""; color:"""+fontcolor+"""; font-size:"""+\
    str(fontsize)+"""px;'>%s</h1>"""%string))
    
    
dhtml('Covid the next TB? Marília Prata, @mpwolke was Here' )