# COVID-19 World Vaccination Progress
-----------------------------------------

Data is collected daily from the [Our World in Data](https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv) COVID-19 GitHub repository, then merged and uploaded.



## Content
----------------

1. **`location`**: name of the country (or region within a country).
2. **`iso_code`**: ISO 3166-1 alpha-3 – three-letter country codes. 
3. **`date`**:  date of the observation.
4. **`total_vaccinations`**: total number of doses administered. Each dose is counted individually, so this may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people may receive multiple doses). If a person receives one dose of the vaccine, this metric goes up by 1. If they receive a second dose, it goes up by 1 again.
5. **`people_vaccinated`**:   total number of people who received at least one vaccine dose. If a person receives the first dose of a 2-dose vaccine, this metric goes up by 1. If they receive the second dose, the metric stays the same.
6. **`people_fully_vaccinated`**: total number of people who received all doses prescribed by the vaccination protocol. If a person receives the first dose of a 2-dose vaccine, this metric stays the same. If they receive the second dose, the metric goes up by 1.
7. **`daily_vaccinations_raw`**: daily change in the total number of doses administered. It is only calculated for consecutive days. This is a raw measure provided for data checks and transparency, but we strongly recommend that any analysis on daily vaccination rates be conducted using `daily_vaccinations` instead.
8. **`daily_vaccinations`**:  new doses administered per day (7-day smoothed). For countries that don't report data on a daily basis, we assume that doses changed equally on a daily basis over any periods in which no data was reported. This produces a complete series of daily figures, which is then averaged over a rolling 7-day window.
9. **`total_vaccinations_per_hundred`**: `total_vaccinations` per 100 people in the total population of the country.
10. **`people_vaccinated_per_hundred`**: `people_vaccinated` per 100 people in the total population of the country.
11. **`people_fully_vaccinated_per_hundred`**: `people_fully_vaccinated` per 100 people in the total population of the country.
12. **`daily_vaccinations_per_million`**: `daily_vaccinations` per 1,000,000 people in the total population of the country.
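
The description of `daily_vaccinations` above can be illustrated with a small sketch. It assumes that "doses changed equally on a daily basis" corresponds to linear interpolation of the cumulative total, and that the 7-day window is allowed to be shorter at the start of the series. Both are our assumptions, not the exact Our World in Data pipeline, and the function name `smooth_daily_vaccinations` is hypothetical.

```python
import numpy as np
import pandas as pd


def smooth_daily_vaccinations(total_vaccinations: pd.Series) -> pd.Series:
    """Sketch of a 7-day smoothed daily series built from cumulative totals.

    Assumption: gaps between two reports are filled by linear interpolation,
    i.e. the doses are spread equally over the unreported days.
    """
    # Fill unreported days that lie between two known totals
    filled = total_vaccinations.interpolate(method="linear", limit_area="inside")
    # Day-over-day change in the cumulative total
    daily = filled.diff()
    # Rolling 7-day average; min_periods=1 is an assumption for the series start
    return daily.rolling(window=7, min_periods=1).mean()


# Made-up example: totals reported on the first and fourth day only
dates = pd.date_range("2021-02-22", periods=4, freq="D")
totals = pd.Series([0.0, np.nan, np.nan, 30.0], index=dates)
smoothed = smooth_daily_vaccinations(totals)
print(smoothed)
```

In this toy case the 30 doses are spread over three days, so each of those days gets 10 doses before averaging.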

## Dependencies
-------------------

In [2]:
import pandas as pd
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

import streamlit as st
import os
import pycountry

## Load the Data
----------------

In [68]:
df = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv')


## Exploratory Data Analysis
--------------------------

In [114]:
df.head()

Unnamed: 0,location,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,people_vaccinated_once
0,Afghanistan,AFG,2021-02-22,0.0,0.0,,,,0.0,0.0,,,
1,Afghanistan,AFG,2021-02-23,,,,,1367.0,,,,35.0,
2,Afghanistan,AFG,2021-02-24,,,,,1367.0,,,,35.0,
3,Afghanistan,AFG,2021-02-25,,,,,1367.0,,,,35.0,
4,Afghanistan,AFG,2021-02-26,,,,,1367.0,,,,35.0,


In [70]:
df.shape

(22866, 12)

Let's engineer a new feature: the number of people who have been vaccinated only once.

In [71]:
df['people_vaccinated_once'] = df['people_vaccinated']-df['people_fully_vaccinated']
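
To make the arithmetic concrete, here is a minimal sketch on made-up numbers (the values are illustrative, not taken from the dataset). It also shows that the result is NaN whenever either input is missing, which happens often in this data.

```python
import numpy as np
import pandas as pd

# Toy rows with made-up values, using the dataset's column names
toy = pd.DataFrame({
    "people_vaccinated": [100.0, 250.0, 300.0],
    "people_fully_vaccinated": [40.0, np.nan, 300.0],
})

# People with at least one dose minus people with all prescribed doses
toy["people_vaccinated_once"] = toy["people_vaccinated"] - toy["people_fully_vaccinated"]
print(toy)
```

The second row stays NaN because `people_fully_vaccinated` is missing there, so the new column inherits the gaps of both inputs.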

Now, let's check the locations that we have!

In [72]:
df.location.unique().tolist()

['Afghanistan',
 'Africa',
 'Albania',
 'Algeria',
 'Andorra',
 'Angola',
 'Anguilla',
 'Antigua and Barbuda',
 'Argentina',
 'Armenia',
 'Aruba',
 'Asia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bermuda',
 'Bhutan',
 'Bolivia',
 'Bonaire Sint Eustatius and Saba',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'British Virgin Islands',
 'Brunei',
 'Bulgaria',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Cape Verde',
 'Cayman Islands',
 'Central African Republic',
 'Chile',
 'China',
 'Colombia',
 'Comoros',
 'Congo',
 'Cook Islands',
 'Costa Rica',
 "Cote d'Ivoire",
 'Croatia',
 'Cuba',
 'Curacao',
 'Cyprus',
 'Czechia',
 'Democratic Republic of Congo',
 'Denmark',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'England',
 'Equatorial Guinea',
 'Estonia',
 'Eswatini',
 'Ethiopia',
 'Europe',
 'European Union',
 'Faeroe Islands',
 'Falkland Islands'

Some **locations** are countries, others are continents, and some are even income groups. So we will separate them into four types of data (the `World` aggregate is also kept on its own):

* All locations
* Countries
* Continents
* Income groups

In [73]:
all_vals = df.location.unique().tolist()
continents_vals = ['Africa','Asia','Europe', 'European Union','North America', 'Oceania','South America']
income_vals = ['High income','Low income', 'Lower middle income', 'Upper middle income']
world_val = ['World'] 
excluded_vals = continents_vals+income_vals+world_val
countries_vals = [value for value in all_vals if value not in excluded_vals]

In [74]:
df_continents = df[df.location.isin(continents_vals)]
df_income = df[df.location.isin(income_vals)]
df_world = df[df.location.isin(world_val)]
df_countries = df[df.location.isin(countries_vals)]
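
A quick sanity check for the split above is that every row lands in exactly one group. The sketch below reproduces the same lists on a toy frame (the rows are made up) and verifies that the groups do not overlap and do not leave anything out.

```python
import pandas as pd

# Toy stand-in for df; only the `location` column matters here
toy = pd.DataFrame({"location": ["Brazil", "Asia", "World", "High income", "Brazil", "Europe"]})

continents_vals = ['Africa', 'Asia', 'Europe', 'European Union', 'North America', 'Oceania', 'South America']
income_vals = ['High income', 'Low income', 'Lower middle income', 'Upper middle income']
world_val = ['World']
excluded = set(continents_vals + income_vals + world_val)

# Boolean masks, one per group
is_continent = toy.location.isin(continents_vals)
is_income = toy.location.isin(income_vals)
is_world = toy.location.isin(world_val)
is_country = ~toy.location.isin(excluded)

# Count how many groups each row belongs to; it should always be 1
group_count = (
    is_continent.astype(int) + is_income.astype(int)
    + is_world.astype(int) + is_country.astype(int)
)
print(group_count.eq(1).all())
```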

Now, we will insert the ISO-3 code for the countries so we can later build a map figure.

In [115]:
countries = {}
for country in pycountry.countries:
    countries[country.name] = country.alpha_3

In [116]:
countries

{'Aruba': 'ABW',
 'Afghanistan': 'AFG',
 'Angola': 'AGO',
 'Anguilla': 'AIA',
 'Åland Islands': 'ALA',
 'Albania': 'ALB',
 'Andorra': 'AND',
 'United Arab Emirates': 'ARE',
 'Argentina': 'ARG',
 'Armenia': 'ARM',
 'American Samoa': 'ASM',
 'Antarctica': 'ATA',
 'French Southern Territories': 'ATF',
 'Antigua and Barbuda': 'ATG',
 'Australia': 'AUS',
 'Austria': 'AUT',
 'Azerbaijan': 'AZE',
 'Burundi': 'BDI',
 'Belgium': 'BEL',
 'Benin': 'BEN',
 'Bonaire, Sint Eustatius and Saba': 'BES',
 'Burkina Faso': 'BFA',
 'Bangladesh': 'BGD',
 'Bulgaria': 'BGR',
 'Bahrain': 'BHR',
 'Bahamas': 'BHS',
 'Bosnia and Herzegovina': 'BIH',
 'Saint Barthélemy': 'BLM',
 'Belarus': 'BLR',
 'Belize': 'BLZ',
 'Bermuda': 'BMU',
 'Bolivia, Plurinational State of': 'BOL',
 'Brazil': 'BRA',
 'Barbados': 'BRB',
 'Brunei Darussalam': 'BRN',
 'Bhutan': 'BTN',
 'Bouvet Island': 'BVT',
 'Botswana': 'BWA',
 'Central African Republic': 'CAF',
 'Canada': 'CAN',
 'Cocos (Keeling) Islands': 'CCK',
 'Switzerland': 'CHE',
 

Now let's create the **`iso_code`** feature

In [117]:
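path = os.getcwd().replace("notebooks","")
# Work on an explicit copy so the assignment below does not write into a slice of df
df_countries = df_countries.copy()
df_countries["iso_code"] = df_countries["location"].map(countries)
df_countries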

Unnamed: 0,location,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,people_vaccinated_once
0,Afghanistan,AFG,2021-02-22,0.0,0.0,,,,0.00,0.00,,,
1,Afghanistan,AFG,2021-02-23,,,,,1367.0,,,,35.0,
2,Afghanistan,AFG,2021-02-24,,,,,1367.0,,,,35.0,
3,Afghanistan,AFG,2021-02-25,,,,,1367.0,,,,35.0,
4,Afghanistan,AFG,2021-02-26,,,,,1367.0,,,,35.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
22861,Zimbabwe,ZWE,2021-05-24,914921.0,633635.0,281286.0,3888.0,14380.0,6.16,4.26,1.89,968.0,352349.0
22862,Zimbabwe,ZWE,2021-05-25,927990.0,639553.0,288437.0,13069.0,13719.0,6.24,4.30,1.94,923.0,351116.0
22863,Zimbabwe,ZWE,2021-05-26,937040.0,643531.0,293509.0,9050.0,13194.0,6.30,4.33,1.97,888.0,350022.0
22864,Zimbabwe,ZWE,2021-05-27,953389.0,648121.0,305268.0,16349.0,12285.0,6.41,4.36,2.05,827.0,342853.0


If we check the new feature (**`iso_code`**), we will see that there are some NaN values. This is because some country names are slightly different from those used in the *pycountry* library, so we will fix that. First, let's drop England, Northern Ireland and Scotland: they are parts of the United Kingdom, which is the only one of these with an ISO code, and since the dataset already has rows for the United Kingdom as a whole, we keep those.

In [118]:
df_countries = df_countries[df_countries.location != 'England']
df_countries = df_countries[df_countries.location != 'Northern Ireland']
df_countries = df_countries[df_countries.location != 'Scotland']

In [119]:
df_countries.iso_code.isnull().sum()

1957
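
To see which names are behind those missing codes, we can list the locations whose `iso_code` is still null. The sketch below uses a toy frame and a tiny hand-written mapping in place of the full *pycountry* dictionary, so it is only an illustration of the idea.

```python
import pandas as pd

# Hand-written subset of the name -> alpha-3 mapping (stand-in for pycountry)
name_to_iso = {"Brazil": "BRA", "Brunei Darussalam": "BRN"}

toy = pd.DataFrame({"location": ["Brazil", "Brunei", "Brunei", "Brazil"]})
toy["iso_code"] = toy["location"].map(name_to_iso)

# Locations whose name did not match any key in the mapping
unmatched = sorted(toy.loc[toy["iso_code"].isnull(), "location"].unique())
print(unmatched)
```

Running the same selection on `df_countries` should give the list of names that need a manual ISO code.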

In [120]:
df_countries

Unnamed: 0,location,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,people_vaccinated_once
0,Afghanistan,AFG,2021-02-22,0.0,0.0,,,,0.00,0.00,,,
1,Afghanistan,AFG,2021-02-23,,,,,1367.0,,,,35.0,
2,Afghanistan,AFG,2021-02-24,,,,,1367.0,,,,35.0,
3,Afghanistan,AFG,2021-02-25,,,,,1367.0,,,,35.0,
4,Afghanistan,AFG,2021-02-26,,,,,1367.0,,,,35.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
22861,Zimbabwe,ZWE,2021-05-24,914921.0,633635.0,281286.0,3888.0,14380.0,6.16,4.26,1.89,968.0,352349.0
22862,Zimbabwe,ZWE,2021-05-25,927990.0,639553.0,288437.0,13069.0,13719.0,6.24,4.30,1.94,923.0,351116.0
22863,Zimbabwe,ZWE,2021-05-26,937040.0,643531.0,293509.0,9050.0,13194.0,6.30,4.33,1.97,888.0,350022.0
22864,Zimbabwe,ZWE,2021-05-27,953389.0,648121.0,305268.0,16349.0,12285.0,6.41,4.36,2.05,827.0,342853.0


Now we will create a dictionary where the keys are country names and the values are the ISO codes.

In [121]:
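# Manual name -> ISO 3166-1 alpha-3 mapping for names that pycountry spells differently
truth_vals = {'Bolivia':'BOL', 'Bonaire Sint Eustatius and Saba':'BES',
              'British Virgin Islands':'VGB', 'Brunei':'BRN', 'Cape Verde':'CPV',
              'Curacao':'CUW', 'Faeroe Islands':'FRO','Falkland Islands':'FLK','Iran':'IRN',
              'Laos':'LAO','Moldova':'MDA','Northern Cyprus':'CYP',
              'Palestine':'PSE','Russia':'RUS','Saint Helena':'SHN','South Korea':'KOR',
              'Timor':'TLS','Vietnam':'VNM',"Cote d'Ivoire":'CIV',
              'Democratic Republic of Congo':'COD', 'Kosovo':'XKX', 'Syria':'SYR', 'Taiwan':'TWN',
              'Venezuela':'VEN'
             }

# Wales has no alpha-3 code of its own ('WLF' is Wallis and Futuna),
# so drop it like the other UK nations; United Kingdom already covers it
df_countries = df_countries[df_countries.location != 'Wales']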


Finally, let's fill in the missing **`iso_code`** values.

In [122]:
for key, val in truth_vals.items():
    inds = df_countries[df_countries.location==key].index
    df_countries.loc[inds,'iso_code'] = val

In [123]:
df_countries.iso_code.isnull().sum()

0
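
The loop above can also be written without explicit iteration. As a sketch, mapping the location names through the dictionary and using the result only where `iso_code` is missing should give the same outcome here, because the manual entries are exactly the names that had no match. The toy frame and the short `manual_iso` excerpt below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small excerpt of the manual mapping, for illustration only
manual_iso = {"Brunei": "BRN", "Russia": "RUS"}

toy = pd.DataFrame({
    "location": ["Brazil", "Brunei", "Russia"],
    "iso_code": ["BRA", np.nan, np.nan],
})

# Keep existing codes and fill the gaps from the manual mapping
toy["iso_code"] = toy["iso_code"].fillna(toy["location"].map(manual_iso))
print(toy)
```

Unlike the loop, this version never overwrites a code that is already present.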

Cool!! Now, let's plot some information.

## Vaccination Map
--------------------------

In [124]:
df_c = df_countries.dropna(subset=['people_fully_vaccinated'])

In [125]:
fig = px.scatter_geo(df_c, locations="iso_code", color="location", size="people_fully_vaccinated",
                animation_frame='date',color_discrete_sequence=px.colors.qualitative.Dark24,
                projection="natural earth")
fig.update_geos(
    showcountries=True, countrycolor="Black",
    showocean=True, oceancolor="LightBlue",
    showland=True, landcolor="LightGreen",
)
fig.show()
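
One detail worth checking with animated figures: as far as we know, Plotly Express creates the animation frames in the order in which the `animation_frame` values first appear in the data. Because our frame is ordered by location first, dates that are missing for the first country can end up out of sequence. A minimal sketch of a fix, on made-up rows, is to sort by date before plotting.

```python
import pandas as pd

# Toy rows ordered by location, as in df_countries (values are made up)
toy = pd.DataFrame({
    "location": ["Brazil", "Brazil", "Chile", "Chile"],
    "date": ["2021-02-02", "2021-02-03", "2021-02-01", "2021-02-02"],
    "people_fully_vaccinated": [10.0, 20.0, 5.0, 8.0],
})

# Order of first appearance before sorting: the earliest date comes last
print(toy["date"].unique().tolist())

# ISO-formatted date strings sort chronologically, so a plain sort is enough
toy_sorted = toy.sort_values(["date", "location"]).reset_index(drop=True)
print(toy_sorted["date"].unique().tolist())
```

The sorted frame could then be passed to `px.scatter_geo` in place of `df_c`.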

## Most Vaccinated Countries Chart
---------------------------------------------------

Group the countries dataframe by country.

In [126]:
df_loc = df_countries.groupby(['location']).max()

Now, let's sort the values by the **`people_fully_vaccinated`**, and then choose this column.

In [128]:
fully_vaccinated = df_loc.sort_values(by='people_fully_vaccinated', ascending=False).people_fully_vaccinated.dropna().to_frame()
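
Since `people_fully_vaccinated` is a cumulative count, the per-country maximum should normally be the latest reported total. A more direct way to get the top countries, sketched below on made-up numbers, is to aggregate only that column and use `nlargest`.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; location "C" never reports the metric
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B", "C"],
    "people_fully_vaccinated": [10.0, 30.0, np.nan, 50.0, np.nan],
})

# Per-location maximum of the single column of interest (NaN is skipped)
per_country = toy.groupby("location")["people_fully_vaccinated"].max()

# Two largest values, returned in descending order
top = per_country.nlargest(2).to_frame()
print(top)
```

This avoids taking the maximum of every other column, which the chart does not use.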
    

Now let's plot it!

In [129]:
# Use only the five countries with the most fully vaccinated people
top5 = fully_vaccinated.iloc[:5, :]
graph = px.pie(
    top5, names=top5.index, values='people_fully_vaccinated',
    labels={'label':'Country','value':'People'}
)
graph.update_layout(
    title="Top 5 countries by number of people fully vaccinated"
)
graph.show()

## Vaccinations per Continent
--------------------------------------

Now let's see the vaccinations per continent.

In [130]:
df_conts = df_continents.groupby(['location']).max()

Now we will sort by **`people_fully_vaccinated`**.

In [131]:
df_conts = df_conts.sort_values(by='people_fully_vaccinated', ascending=False).people_fully_vaccinated.dropna().to_frame()

Now, let's plot it!

In [134]:
plot = px.bar(df_conts, df_conts.index, 'people_fully_vaccinated', height=700)
plot.update_layout(
    xaxis={'title':'Continents'},
    yaxis={'title':'Fully Vaccinated'},
    title="People fully vaccinated per continent"
)
plot.show()

Finally, let's see the time evolution of the vaccination per country.

## Vaccination Time Evolution
--------------------------------------

Let's define the countries we want to compare.

In [135]:
countries = ['Brazil','United States','United Kingdom','India','China']

In [139]:
fig = go.Figure()

for country in countries:

    # Keep only the rows where the plotted column is present
    df_country = df_countries[df_countries.location==country].dropna(subset=['people_fully_vaccinated'])

    fig.add_trace(
        go.Scatter(
            x=df_country.date,
            y=df_country.people_fully_vaccinated,
            mode='lines+markers',
            name=country
        )
    )

fig.update_layout(
    xaxis={'title':'Date'},
    yaxis={'title':'People fully vaccinated'},
    title="People fully vaccinated over time"
)
fig.show()