# COVID-19 Vaccination Analysis in the World and in Portugal

It is time to be optimistic about the future. The vaccine has arrived, and there is a massive push to fight COVID-19.

Before starting... Hi everyone! I'm João, from Portugal, and this is my first project. I started learning Python & data science one month ago. I'm still learning the basics, but this is fun!

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time
import datetime

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## Analysing the Data

For this analysis, I only used one file. It is essential to understand that the column "people_fully_vaccinated_per_hundred" has missing values because it takes around one month to receive the second dose and be fully vaccinated.

It is also crucial to understand that the data only covers the first days of the vaccination campaigns. More time is needed before it can answer more detailed questions. Many countries have missing values.
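
A quick way to see how incomplete each column is: compute the share of missing values per column. This is only a minimal sketch. The frame `sample` and its numbers are invented for illustration so the snippet runs on its own; on the notebook's `data` the equivalent call would be `data.isna().mean()`.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-dataset with invented numbers, only to illustrate the check.
sample = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "total_vaccinations": [100.0, 250.0, np.nan, 80.0],
    "people_fully_vaccinated_per_hundred": [np.nan, np.nan, np.nan, 0.1],
})

# Share of missing values per column, from most to least incomplete.
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Columns with a high share of missing values (here the fully vaccinated column) should be read with extra care in the plots.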

I also added the European countries and their populations manually. I'm from Portugal, and I would like to take a closer look at the situation there.

In [None]:
data = pd.read_csv("../input/covid-world-vaccination-progress/country_vaccinations.csv")

europe = ["France", "Spain", "Germany", "Scotland", "Croatia", "Italy", "Denmark", "Slovenia", "Iceland", "Portugal", "Northern Ireland", "Lithuania", "Romania", "Poland", "Ireland", "Finland", "Sweden"]
# Population of each country above, in millions (entered by hand, same order as `europe`)
pop_eu = [67, 47, 83, 5.5, 4, 60.3, 5.8, 2, 0.3, 11, 1.8, 2.7, 19.4, 38, 4.9, 5.5, 10.2]

data.info()
data.shape

In [None]:
data.head()

## World
Before looking at the numbers in Europe, let's take a quick look at the global picture.

In [None]:
from matplotlib.ticker import FuncFormatter

# Group the full dataset: drop_duplicates(subset="country") keeps only each country's
# first row, so max() would return the first reported value instead of the latest total.
global_vac = data.groupby("country")["total_vaccinations"].max().sort_values(ascending=False)

ax = global_vac.plot(kind="bar", rot=90, figsize=(20, 5), color="salmon")
plt.title("Total vaccinations in the world")
plt.ylabel("Total")
# Label the y axis in millions instead of hard-coding tick positions
ax.yaxis.set_major_formatter(FuncFormatter(lambda value, _: f"{value / 1e6:g}M"))

plt.show()

In [None]:
total_vac_hundred = data.groupby("country")["total_vaccinations_per_hundred"].max().sort_values(ascending=False)

total_vac_hundred.plot(kind="bar", rot=90, figsize=(20, 5), color="salmon")
plt.title("Total vaccinations per hundred in the world")
plt.ylabel("Total")

plt.show()

## Europe
Now a quick look at Europe.

In [None]:
global_vac[europe].plot(kind="bar", rot=45, figsize=(10, 5), color="salmon")

plt.title("Total vaccinations in Europe")
plt.ylabel("Total")

plt.show()

In [None]:
total_vac_hundred[europe].plot(kind="bar", rot=45, figsize=(20, 5), color="salmon")

plt.title("Total vaccinations per hundred in Europe")
plt.ylabel("Total")

plt.show()

### Compare country size vs people fully vaccinated and total vaccinations
Does the size of the country matter?

Countries with larger populations usually have more resources. Does that mean that they respond better to COVID-19? Or do they struggle more than smaller countries?

It would be essential to add each country's GDP to this analysis and find out whether it is related to the vaccination response. (Next time!)
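
To compare countries of different sizes fairly, raw totals can be turned into doses per hundred inhabitants. Here is a minimal sketch of that arithmetic. The country names and figures are invented for illustration; the only thing taken from this notebook is the convention that populations are written in millions.

```python
import pandas as pd

# Hypothetical figures, invented for illustration (population in millions).
population_millions = pd.Series({"Small": 2.0, "Medium": 10.0, "Large": 80.0})
total_doses = pd.Series({"Small": 50_000, "Medium": 300_000, "Large": 1_600_000})

# Doses per 100 inhabitants: doses / (population in people) * 100.
doses_per_hundred = total_doses / (population_millions * 1_000_000) * 100
print(doses_per_hundred.round(2))
```

In this made-up example the largest country has by far the most doses but the lowest rate per hundred, which is why both views are worth plotting.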

In [None]:
col_eu = []

np_pop = np.array(pop_eu)
np_pop = np_pop * 20

# Colour each country by population bracket (in millions)
for pop_size in pop_eu:
    if pop_size < 5:
        col_eu.append("darkkhaki")
    elif pop_size < 10:
        col_eu.append("goldenrod")
    elif pop_size < 15:
        col_eu.append("darkorange")
    elif pop_size < 20:
        col_eu.append("orangered")
    else:
        col_eu.append("darkred")

data_euro = data[data["country"].isin(europe)]

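# groupby sorts countries alphabetically; reindex to the order of `europe`
# so that marker sizes, colours and labels line up with the right points
pop_full = data_euro.groupby("country")["people_fully_vaccinated"].max().reindex(europe)
total_vac = data_euro.groupby("country")["total_vaccinations"].max().reindex(europe)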

fig, ax = plt.subplots(figsize=(20, 10))

ax.scatter(total_vac, pop_full, s=np_pop, alpha = 0.8, c=col_eu)

ax.set_ylabel("People fully vaccinated")
ax.set_xlabel("Total vaccinations")
ax.set_title("Compare country size vs people fully vaccinated and total vaccinations")
# Tick positions are in units of one, so 500000 is 0.5 million (M), not billion
plt.xticks([0, 500000, 1000000, 1500000, 2000000, 2500000, 3000000, 3500000], ["0", "0.5M", "1M", "1.5M", "2M", "2.5M", "3M", "3.5M"])
plt.yticks([0, 200000, 400000, 600000, 800000, 1000000, 1200000], ["0", "0.2M", "0.4M", "0.6M", "0.8M", "1M", "1.2M"])


for x_pos, y_pos, label in zip(total_vac, pop_full, europe):
    ax.annotate(label,
                xy=(x_pos, y_pos),
                xytext=(15, 0),
                textcoords='offset points',
                ha='left',
                va='center')


plt.show()

## Portugal

Portugal, my country :)

In [None]:
portugal = data[data["country"] == "Portugal"]

fig, ax = plt.subplots(figsize=(20, 8))

# Draw both series from zero: stacking with bottom= would add the two counts together
ax.bar(portugal["date"], portugal["total_vaccinations"], color="salmon", label="Total vaccinations")
ax.bar(portugal["date"], portugal["people_fully_vaccinated"], color="gold", label="People fully vaccinated")

ax.tick_params(axis="x", rotation=45)
ax.set_title("Total vaccinations & total fully vaccinated in Portugal")
ax.legend()

plt.show()

### Portugal Vaccination Plan
Portugal is aiming to reach herd immunity by the end of the summer. To get there, it will need to vaccinate 70% of its total population.

In this graph, I draw a line to check whether Portugal is on track to reach this goal. I know that it shouldn't be a straight line, but I couldn't find the correct data for Portugal's vaccination plan.
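
The straight line can also be read as numbers. The sketch below works out the 70% target from the population of about 11 million used earlier in this notebook, and interpolates linearly between the two plan points to get the planned total on any date. The helper name `planned_total` is made up for this example, and the straight-line plan is my simplification, not Portugal's official schedule.

```python
import numpy as np
import pandas as pd

# Straight-line plan: 550,000 on 2021-02-11 and 7,700,000 on 2021-09-01.
plan_dates = pd.to_datetime(["2021-02-11", "2021-09-01"])
plan_totals = np.array([550_000, 7_700_000])

# 70% of roughly 11 million inhabitants (population figure entered by hand above).
target = 0.70 * 11_000_000

def planned_total(day):
    """Linearly interpolate the planned total for a given date (hypothetical helper)."""
    x = pd.Timestamp(day).toordinal()
    xs = [d.toordinal() for d in plan_dates]
    return float(np.interp(x, xs, plan_totals))

# Average daily increase needed to follow the straight line.
days = (plan_dates[1] - plan_dates[0]).days
daily_pace = (plan_totals[1] - plan_totals[0]) / days

print(target, planned_total("2021-05-23"), daily_pace)
```

Comparing `planned_total(date)` with the real value for the same date shows whether the country is ahead of or behind the line.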

In [None]:
plan_pt = {
    "date": ["2021-02-11", "2021-09-01"],
    "total": [550000, 7700000]
}

previsoes = pd.DataFrame.from_dict(plan_pt)

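# Convert the dates on a new frame to avoid pandas' SettingWithCopyWarning
# ("previsoes" is Portuguese for "forecasts")
portugal = portugal.assign(date=pd.to_datetime(portugal["date"]))
previsoes["date"] = pd.to_datetime(previsoes["date"])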

fig, ax = plt.subplots(figsize=(20, 8))

ax.bar(portugal["date"], portugal["total_vaccinations"], color="brown")
ax.plot(previsoes["date"], previsoes["total"], color="red")

plt.show()