<a href="https://www.kaggle.com/code/danielebaldoni/italian-vaccination?scriptVersionId=97568923" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

<img src="https://upload.wikimedia.org/wikipedia/commons/0/03/Flag_of_Italy.svg"
     style="width: 60%; margin: auto; padding-bottom: 20px;">

![ ](https://sif-website.s3.amazonaws.com/uploads/magazine_item/image/210/vaccino_anti_covid.jpg)

<h1 style='background:#26A2AB; border:0; color:black'><center>Introduction</center></h1> 

The data contains the following information:  
* **administration_date** - Date of the data entry; for some dates, daily vaccinations are available only for specific regions and specific age ranges
* **supplier** - Name of the vaccine supplier
* **region** - Acronym of the Italian region
* **age_range** - Age group
* **males** - Number of vaccinated males
* **females** - Number of vaccinated females
* **first_dose** - Number of administered first doses
* **second_dose** - Number of administered second doses
* **previous_infection** - Number of doses administered to subjects previously infected with COVID-19
* **additional_booster_dose** - Number of administered additional doses or boosters
* **NUTS1_code** - NUTS stands for Nomenclature of Territorial Units for Statistics; this is the code of one of the 5 Italian geographic areas
* **NUTS2_code** - Like NUTS1_code, but at the level of the 21 regions instead of the geographic areas
* **ISTAT_regional_code** - Short numeric code assigned to each administrative division
* **region_name** - Full name of the region  

The NUTS codes are listed here:
https://en.wikipedia.org/wiki/NUTS_statistical_regions_of_Italy

#<a id="0"></a>

<a class="anchor" id="0.1"></a>
### Content  

* <a href='#1'>Analysis preparation</a>
* <a href='#2'>How many are vaccinated for Gender?</a>
* <a href='#3'>How many are vaccinated for Class of Age?</a>
* <a href='#4'>How many are vaccinated for Supplier?</a>
* <a href='#5'>How many are vaccinated for Region?</a>
* <a href='#6'>What vaccines are used in each region?</a> 
* <a href='#7'>How many are vaccinated (total and as percent from population)?</a>    
* <a href='#8'>General trends</a>  


### Last updated


Analysis Preparation <a class="anchor" id="1" ></a>
===
                                                                                            

In [None]:
data=pd.read_csv("/kaggle/input/d/arthurio/italian-vaccination/italian_vaccination.csv")
data.head()

In [None]:
data

In [None]:
data.shape

As the cell above shows, the dataset is large: more than 150k rows and 14 columns. 
That's because the data for every day are split by age class and region. For example, the first 146 rows all relate to the single day 2020-12-27.

We need some grouping operations to better understand the different aspects of the data.
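
As a minimal sketch of such a grouping, the example below collapses regions and age classes into one row per day. It assumes a small synthetic frame that only borrows the column names of the dataset; the values and the name `toy` are invented for illustration.

```python
import pandas as pd

# Synthetic rows: one row per (date, region, age_range), as in the real dataset.
toy = pd.DataFrame({
    "administration_date": ["2020-12-27", "2020-12-27", "2020-12-28"],
    "region": ["LAZ", "LOM", "LAZ"],
    "age_range": ["80-89", "30-39", "80-89"],
    "males": [10, 5, 7],
    "females": [12, 6, 8],
})

# Collapse regions and age classes into one row per day.
per_day = toy.groupby("administration_date")[["males", "females"]].sum()
per_day["total"] = per_day["males"] + per_day["females"]
print(per_day)
```

The same pattern works with `"region_name"`, `"age_range"` or `"supplier"` as the grouping key.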

In [None]:
data.nunique()

We have 5 different suppliers, 21 different regions (19 regions plus the autonomous provinces of Trento and Bolzano) and 10 different age ranges, so there are many possible combinations. Keep in mind, though, that not every combination of these features has data for every single day, which is why a line plot may not represent the vaccination trend uniformly.

One reason is that some vaccines were withdrawn and others were introduced later.

As an example, the cells below show how the number of rows differs across a few dates.

In [None]:
data.columns

In [None]:
# Rows per date: the first date (December 2020) has 147 rows, the one in March 2021 has 461 and the one in January 2022 has 407
sample_dates=['2020-12-27','2021-03-15','2022-01-05','2022-06-03']
tuple(int((data.administration_date==d).sum()) for d in sample_dates)

In [None]:
data.region.value_counts()

At the same time, we have to consider that the frequency of data is not the same for every region. Lazio was one of the fastest regions in this campaign and is one of the most populated, which is why it has the highest frequency of data.

In [None]:
#Number of rows for different vaccines
data.supplier.value_counts()

The sum of Males and Females should be equal to: 
first doses + second doses + previous infection + additional dose + booster dose
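
This identity can be checked row by row. The sketch below is only an illustration on synthetic data: the list of dose columns is an assumption based on the column names used later in this notebook, and `find_inconsistent_rows` is a hypothetical helper, not part of the dataset or of pandas.

```python
import pandas as pd

# Assumed dose columns (taken from the column names used in this notebook).
DOSE_COLUMNS = [
    "first_dose", "second_dose", "previous_infection",
    "additional_booster_dose", "booster_immuno", "second_booster",
]

def find_inconsistent_rows(frame, dose_columns=DOSE_COLUMNS):
    """Return the rows where males + females differs from the sum of the dose columns."""
    by_gender = frame["males"] + frame["females"]
    by_dose = frame[dose_columns].sum(axis=1)
    return frame[by_gender != by_dose]

# Synthetic example: row 0 is consistent (5 == 5), row 1 is not (5 != 3).
toy = pd.DataFrame({
    "males": [3, 4], "females": [2, 1],
    "first_dose": [4, 1], "second_dose": [1, 1], "previous_infection": [0, 0],
    "additional_booster_dose": [0, 1], "booster_immuno": [0, 0], "second_booster": [0, 0],
})
bad = find_inconsistent_rows(toy)
print(bad.index.tolist())
```

On the real data, `find_inconsistent_rows(data)` would list any rows that break the identity.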

In [None]:
data.info()

We have both categorical and numerical features, and we can make different kinds of plots depending on how these types of features are combined.

Age is grouped into classes, so it is still categorical, but we could transform it to treat it as numeric.

We can also group the data by the various levels of the categorical features.
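
One possible transformation of the age classes is sketched below. It assumes labels of the form `"20-29"` or `"90+"` (the exact labels in the dataset may differ), and `age_lower_bound` is a hypothetical helper introduced only for this example.

```python
import pandas as pd

def age_lower_bound(label):
    """Return the lower bound of a label such as '20-29' or '90+' as an integer."""
    return int(label.replace("+", "").split("-")[0])

# Assumed label format; the real age_range values may differ.
ages = pd.Series(["90+", "20-29", "05-11", "80-89"])

# Option 1: a plain numeric column with the lower bound of each class.
lower = ages.map(age_lower_bound)

# Option 2: an ordered categorical, so that sorting and plotting follow age order.
ordered = pd.Categorical(
    ages,
    categories=sorted(ages.unique(), key=age_lower_bound),
    ordered=True,
)
print(lower.tolist())
print(list(ordered.categories))
```

The ordered categorical keeps the original labels on the axis while still sorting the bars from the youngest to the oldest class.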

How many are vaccinated for Gender? <a class="anchor" id="2"></a>
===


First of all we import the 2 most widely used plotting libraries, 
*Matplotlib* and *Seaborn*.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

In total there are 64596637 vaccinations for women and 60895478 for men, as we can see below.

In [None]:
data.females.sum(),data.males.sum()

In [None]:
total_vaccinations=data.males.sum()+data.females.sum()
total_vaccinations

Let's organize this data in a tabular way

In [None]:
print("Number of vaccinations by gender")
# Build a one-column table indexed by gender
tot=pd.DataFrame({"Total":[data.males.sum(),data.females.sum()]},index=["Males","Females"])
tot

The difference is small.

In [None]:
plt.title("Vaccinations by gender")
plt.bar(tot.index,list(tot["Total"]),color=["b","pink"])
plt.show()



Of course, these totals have to be weighed against the actual gender distribution of the Italian population.
According to Statista, 28864088 males and 30393478 females are currently registered in Italy.

Source:
https://www.statista.com/statistics/786485/population-by-gender-in-italy/

In [None]:
tot["Registered"]=[28864088,30393478]
tot["Prop"]=tot.Total/tot.Registered
tot

In [None]:
plt.title("Vaccinations per registered resident by gender")
plt.bar(tot.index,list(tot["Prop"]),color=["b","pink"])
plt.show()

The 2 tables below show the overall number of vaccinations by gender in the different regions.

In [None]:
print("Number of Vaccination for Region on male population")
pd.DataFrame(data.groupby("region_name")["males"].sum().sort_values(ascending =False))

In [None]:
print("Number of Vaccination for Region on female population")
pd.DataFrame(data.groupby("region_name")["females"].sum().sort_values(ascending = False))

The table below shows the number of vaccinations of each kind per day. At the beginning there were only 2 kinds of vaccination (first doses and doses for people with a previous infection).
Many people have already been vaccinated at this stage, so most of the doses are now additional boosters for people who already had 2 doses, but the numbers of first and second doses tell us that the number of people without any vaccination is still decreasing by a few tens of thousands per day.

Data Grouped by day
----

In [None]:
data.columns

In [None]:
# Select several columns with a list (double brackets)
data.groupby("administration_date")[['first_dose', 'second_dose', 'previous_infection',
       'additional_booster_dose', 'booster_immuno', 'second_booster']].sum()

In [None]:
data

In [None]:
data["dailytotal"]=data.males+data.females
data


In [None]:
# Daily totals by gender
daily_gender=data.groupby("administration_date")[["males","females"]].sum()
daily_gender["Total"]=daily_gender.males+daily_gender.females
daily_gender

How many are vaccinated for Class of Age <a class='anchor' id= 3></a>
===


In [None]:
data.age_range

In [None]:
print("Number of Vaccination for Age in class for the male population")
data.groupby("age_range")["males"].sum().sort_values(ascending = False)

In [None]:
print("Number of Vaccination for Age in class for the female population")
data.groupby("age_range")["females"].sum().sort_values()

In [None]:
total_vaccinations=data.males.sum()+data.females.sum()
plt.title("Distribution of the total vaccinations by age")
# Share of all doses that went to each age class
age_vac=pd.DataFrame(data.groupby("age_range")["dailytotal"].sum().sort_values()/total_vaccinations)
age_vac.columns=["Total"]
plt.bar(age_vac.index,list(age_vac["Total"]))
plt.show()

In [None]:
print(((data.groupby("age_range")["dailytotal"].sum().sort_values()/total_vaccinations)*100).round(2))

In [None]:
data.males.sum()

In [None]:
plt.title("Distribution of Vaccine for Age on Males")
age_m=pd.DataFrame(data.groupby("age_range")["males"].sum().sort_values()/data.males.sum())
#(data.groupby("age_range")["dailytotal"].sum().sort_values()/60627130)
age_m.columns=["Total"]
plt.bar(age_m.index,list(age_m["Total"]))
plt.show()

In [None]:
((data.groupby("age_range")["males"].sum().sort_values()/data.males.sum())*100).round(2).sort_values(ascending=False)


In [None]:
plt.title("Distribution of Vaccine for Age on Females")
age_f= pd.DataFrame(data.groupby("age_range")["females"].sum().sort_values()/data.females.sum())
age_f.columns=["Total"]
plt.bar(age_f.index,list(age_f["Total"]))
plt.show()

We can find some differences in the percentages, but to draw conclusions we would need to know the demographic structure of the population. 

For example, the percentage of very old people (90+) among vaccinated women is double that among vaccinated men, but this could be because women have, on average, a higher life expectancy.

In general, we don't see a marked difference between the two genders across the different age classes.
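
To illustrate why the demographic structure matters, the sketch below compares the share of doses with the doses per resident for three age classes. All numbers, including the population figures, are invented for the example and are not taken from the dataset or from official statistics.

```python
import pandas as pd

# Hypothetical doses and hypothetical resident population per age class.
doses = pd.Series({"20-29": 300, "80-89": 200, "90+": 60}, name="doses")
population = pd.Series({"20-29": 200, "80-89": 80, "90+": 20}, name="population")

# Share of all doses: depends on how large each age class is.
share_of_doses = doses / doses.sum()

# Doses per resident: comparable across classes of different size.
doses_per_resident = doses / population

print(share_of_doses.round(3))
print(doses_per_resident)
```

In this toy example the 90+ class has the smallest share of doses but the highest number of doses per resident, which is the kind of effect that the raw percentages above cannot reveal.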


In [None]:
data.columns

How many are vaccinated for Supplier? <a class="anchor" id="4"></a>
===


In [None]:
male=pd.DataFrame(data.groupby("supplier")["males"].sum().sort_values(ascending = False)/data.males.sum())
female=pd.DataFrame(data.groupby("supplier")["females"].sum().sort_values(ascending = False)/data.females.sum())

In [None]:
male

In [None]:
female

We can note 2 facts:
- Women seem to prefer Pfizer slightly more than men do.
- Janssen seems to be more widespread among men.

The differences, anyway, are not very strong.


In [None]:
data.groupby("supplier")["males"].sum(),data.groupby("supplier")["females"].sum()

In [None]:
plt.figure(figsize=(10,7))
plt.title ("Suppliers for Men")
sns.barplot(x=male.index,y=male.males)
plt.xticks(rotation=30)
plt.show()

In [None]:
plt.figure(figsize=(9,7))
plt.title ("Suppliers for Women")
sns.barplot(x=female.index,y=female.females)
plt.xticks(rotation=30)
plt.show()

In [None]:
data.supplier.value_counts()

In [None]:
supplier_tot=data.groupby("supplier").sum(numeric_only=True) # sum only the numeric columns
supplier_tot

In [None]:
plt.figure(figsize=(8,5))
plt.title ("Suppliers overall")
sns.barplot(x=supplier_tot.index,y=supplier_tot.dailytotal)
plt.xticks(rotation=45)
plt.show()

In [None]:
male

In [None]:
# Note: massi is another name for the same DataFrame as male (no copy is made),
# so the changes below also apply to male
massi=male
massi["gender"]="male"
massi.columns=["tot","gender"]
massi


In [None]:
female["gender"]="female"
female.columns=["tot","gender"]
female


In [None]:
data.groupby("supplier")["males"].sum()

In [None]:
# Absolute number of doses per supplier, split by gender
supp_man=data.groupby("supplier")["males"].sum()
supp_women=data.groupby("supplier")["females"].sum()
supplier_gender={"man":supp_man,"woman":supp_women}
pd.DataFrame(supplier_gender,index=male.index.sort_values())

In [None]:

massi.tot=data.groupby("supplier")["males"].sum().sort_values(ascending = False)
female.tot=data.groupby("supplier")["females"].sum().sort_values(ascending = False)
genderdf=[male,female]
genderdf=pd.concat(genderdf)
genderdf["supplier"]=genderdf.index
genderdf


In [None]:
plt.figure(figsize=(10,5))
ax=sns.barplot(x="supplier", y= "tot",hue="gender", data =genderdf)
plt.xticks(rotation=30)


How many are vaccinated for Region? <a class="anchor" id="5"></a>
===

In [None]:
regions=pd.read_csv("/kaggle/input/italian-regions/ita_reg_ann_data.csv")
regions

In [None]:
#Let's change the order so that we can compare the two datasets
regions=regions.sort_values(by="den_reg")
regions



In [None]:
regions1=regions.copy()
#regions1.append(regions[5:6])
regions1

In [None]:
regions1.loc[5,"den_reg"]="Friuli-Venezia Giulia" # .loc avoids chained assignment
regions1=regions1.drop(index=(3))
regions1

In [None]:
data['date'] = pd.to_datetime(data['administration_date'])
data

In [None]:
data.info()

In [None]:
data.head()

In [None]:
reg_name=data.groupby("region_name")[["dailytotal","first_dose","second_dose","previous_infection","additional_booster_dose","booster_immuno","second_booster","males","females"]].sum()
# The two autonomous provinces are analyzed separately further down
reg_name1=reg_name.drop(index=["Provincia Autonoma Bolzano / Bozen","Provincia Autonoma Trento"])
reg_name1

In [None]:
a=pd.DataFrame(data.groupby("region_name")["previous_infection"].sum())
a["first_dose"]=data.groupby("region_name")["first_dose"].sum()
a["second_dose"]=data.groupby("region_name")["second_dose"].sum()
a["booster_dose1"]=data.groupby("region_name")["additional_booster_dose"].sum()
a["booster_dose_inf"]=data.groupby("region_name")["booster_immuno"].sum()
a["booster_dose2"]=data.groupby("region_name")["second_booster"].sum()
a

In [None]:
# Attach the resident population by position; this relies on both tables being sorted by region name
reg_name1["pop_resid"]=regions1["pop_resid"].to_numpy()
reg_name1["propTV"]=reg_name1["dailytotal"]/reg_name1["pop_resid"] # number of doses per person
reg_name1

In [None]:
reg_name1.sort_values(by=("propTV"),ascending = False).head()

In [None]:
reg_name1.sort_values(by=("propTV"),ascending = False).tail()

In [None]:
T_A_A=reg_name[12:14]
reg_name[12:14]

In [None]:
T_A_A.iloc[0,0]

In [None]:
# For the Trentino-Alto Adige region the value is one of the lowest in our distribution:
# at 2.02 it is higher only than Sicilia, Calabria, Sardegna and Campania, which are all southern regions.
vaccination_TAA= T_A_A.iloc[0,0]+T_A_A.iloc[1,0]
propTV_TAA=vaccination_TAA/1072276
propTV_TAA

We can also analyze the 2 communities individually. The populations found online are:

Bolzano province = 536838

Trento province = 542235

Their sum (1079073) is more or less the same as the figure in our dataset (1072276).

https://it.wikipedia.org/wiki/Trentino-Alto_Adige


In [None]:
propBA=T_A_A.iloc[0,0]/536838
print(f"Provincia di Bolzano: {propBA}")
propTR=T_A_A.iloc[1,0]/542235
print(f"Provincia di Trento: {propTR}")

The proportion is quite different between the 2 provinces: 1.95 for the autonomous province of Bolzano, which is the second lowest value (higher only than Sicilia), and 2.13 for the autonomous province of Trento, which is close to the best values among the other regions. This is curious considering how close the two provinces are geographically. 

In [None]:
data.first_dose.sum(),data.second_dose.sum(),data.additional_booster_dose.sum(),data.previous_infection.sum(),data.booster_immuno.sum(),data.second_booster.sum()


This is interesting because some regions have a higher percentage of people with a previous infection than of people vaccinated with the standard vaccination cycle, and vice versa. 

In [None]:
data.columns

In [None]:
percentages=pd.DataFrame(data.groupby("region_name")["previous_infection"].sum()/data["previous_infection"].sum())*100
percentages["d1"]=(data.groupby("region_name")["first_dose"].sum()/data.first_dose.sum())*100
percentages["d2"]=(data.groupby("region_name")["second_dose"].sum()/data.second_dose.sum())*100
percentages["db1"]=(data.groupby("region_name")["additional_booster_dose"].sum()/data.additional_booster_dose.sum())*100
percentages["dbi"]=(data.groupby("region_name")["booster_immuno"].sum()/data.booster_immuno.sum())*100
percentages["db2"]=(data.groupby("region_name")["second_booster"].sum()/data.second_booster.sum())*100
percentages

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(16,7))
sns.barplot(x=data["region_name"],y=data["previous_infection"])
plt.xticks(rotation=90)
plt.show()

In [None]:
plt.figure(figsize=(16,7))
sns.barplot(x=data["region_name"],y=data["first_dose"])
plt.xticks(rotation=60)
plt.show()

In [None]:
reg_name1["1stProp"]=reg_name1["first_dose"]/reg_name1["pop_resid"]
reg_name1.sort_values(by=("1stProp"),ascending = False).head()

In [None]:
reg_name1["2ndProp"]=reg_name1["second_dose"]/reg_name1["pop_resid"]
reg_name1.sort_values(by=("2ndProp"),ascending = False).head()

In [None]:
reg_name1["boostProp"]=reg_name1["additional_booster_dose"]/reg_name1["pop_resid"]
reg_name1.sort_values(by=("boostProp"),ascending = False).head()

In [None]:
reg_name1["boostProp2"]=reg_name1["second_booster"]/reg_name1["pop_resid"]
reg_name1.sort_values(by=("boostProp2"),ascending = False).head()

In [None]:
reg_name1["dpiProp"]=reg_name1["previous_infection"]/reg_name1["pop_resid"]
# Sort by the column that was just computed
reg_name1.sort_values(by=("dpiProp"),ascending = False).head()

In [None]:
a=reg_name1.sort_values(by=("propTV"),ascending = False)
plt.figure(figsize=(16,7))
fig=sns.barplot(x=a["propTV"],y=a.index)
plt.title("Proportions of Total Vaccination for Regions")
plt.show()
print("The region with the highest proportion of total vaccinations is:", a.index[0]) 

a=reg_name1.sort_values(by=("1stProp"),ascending = False)
plt.figure(figsize=(16,7))
sns.barplot(x=a["1stProp"],y=a.index)
plt.title("Proportions of first doses for Regions")
plt.show()
print("The region with the highest proportion of first doses is:",a.index[0])

a=reg_name1.sort_values(by=("2ndProp"),ascending = False)
plt.figure(figsize=(16,7))
sns.barplot(x=a["2ndProp"],y=a.index)
plt.title("Proportions of second doses for Regions")
plt.show()
print("The region with the highest proportion of second doses is:", a.index[0])

a=reg_name1.sort_values(by=("boostProp"),ascending = False)
plt.figure(figsize=(16,7))
sns.barplot(x=a["boostProp"],y=a.index)
plt.title("Proportions of booster doses for Regions")
plt.show()
print("The region with the highest proportion of booster doses is:",a.index[0])


a=reg_name1.sort_values(by=("boostProp2"),ascending = False)
plt.figure(figsize=(16,7))
sns.barplot(x=a["boostProp2"],y=a.index)
plt.title("Proportions of 2nd booster doses for Regions")
plt.show()
print("The region with the highest proportion of second booster doses is:",a.index[0])

In [None]:
plt.figure(figsize=(16,7))
plt.title("Booster Vaccinated for Region")
sns.barplot(x=data["region_name"],y=data["additional_booster_dose"])

plt.xticks(rotation=80)
plt.show()

General Trends <a class="anchor" id="8"></a>
===

Data Grouped by day
===
To make an analysis of trends we need to make some grouping operations on the Dataset. 
Here we can group the data by day

In [None]:
data.columns

In [None]:
# Select several columns with a list (double brackets)
data.groupby('administration_date')[['first_dose', 'second_dose', 'previous_infection',
       'additional_booster_dose', 'booster_immuno', 'second_booster']].sum()

Astrazeneca was the most used vaccine in the first period of the campaign, while Pfizer was the most widespread during the last year, except for the last weeks, when further doses of Pfizer were hard to find.

In the last part of the series a new kind of vaccine appears, "Pfizer Pediatrico", which is for young children. 
The government authorized the vaccine for young children only in the last weeks, which is why it shows up only on the right side of the graph.

Because of this recent authorization, we do not yet have data on second doses for children.

Focusing on the last period, boosters with the Moderna vaccine become more common. This is because of the shortage of Pfizer doses: most people could only get a Moderna booster.

In [None]:
df = data[['date', 'supplier', 'region', 'age_range', 'males',
       'females', 'first_dose', 'second_dose', 'previous_infection',
       'additional_booster_dose', 'booster_immuno', 'second_booster' ,'region_name']].copy()
df.head()

In [None]:
df["TotVaccine"]=data["males"]+data["females"]
df

Let's see how the total number of daily vaccinations has evolved over time.

The campaign began in January 2021. We have 2 peaks. The first one comes after the introduction of the Green Pass in July. In this phase most of the people vaccinated were elderly people and the part of the population that accepted this decision without particular problems. The second peak is in January, with the introduction of vaccines for young children together with the new reinforced Green Pass. Under this law many people who were not vaccinated chose to get the first dose, which allowed them to work. In this phase the vaccinated population is more varied, including those who were skeptical at the beginning as well as young people.

The goal of the Italian government was to get 80% of the total population vaccinated, or at least 90% of elderly people and high-risk citizens. We can say that this goal has been achieved despite a slow start; the next step is to increase this percentage for the total population, young people included.
We will see in the coming months.

https://www.reuters.com/article/us-health-coronavirus-italy-vaccines-idUSKBN2B50L0

In [None]:
"""
plt.figure(figsize=(20,7))
sns.lineplot(x="date",y="TotVaccine",data=df).set_xticklabels(["2020-12-27","2021-03-17","2021-07-01","2021-08-06","2021-12-06"])
plt.title("Previous infection")
plt.show()
"""

Now let's aggregate the vaccinations into daily administrations
---

We will create a new dataframe with the data grouped into daily totals of vaccinations and call it "daily".

In [None]:
data.columns

In [None]:
df.columns=['date', 'supplier', 'region', 'age_range', 'males', 'females', 'first_dose', 'second_dose',
            'previous_infection', 'additional_booster_dose','booster_immuno', 'second_booster', 'region_name', 'TotVaccine']
df.head()

In [None]:
daily=pd.DataFrame(df.groupby("date")["TotVaccine"].sum())
# Take the dates from the grouped index so they always line up with the sums
daily["day"]=daily.index
daily

In [None]:
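plt.figure(figsize=(20,7))
ax=sns.lineplot(x="day",y="TotVaccine",data=daily)
# Mark the highlighted dates with vertical lines; set_xticklabels would only relabel the existing ticks
for d in ["2020-12-27","2021-01-25","2021-03-01","2021-03-17","2021-05-05","2021-07-01","2021-08-06","2021-12-06"]:
    ax.axvline(pd.Timestamp(d),color="gray",linestyle="--")
plt.title("Total Vaccines")
plt.show()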

In the last graph I decided to highlight some specific dates.

What do they represent?

The vaccination campaign began with the first vaccinations in December 2020. Then, after a slow increase, in March 2021 the government appointed General Francesco Paolo Figliuolo as special commissioner for the campaign, with the task of organizing the distribution of the vaccines.

In mid-March 2021 several European countries, including Italy, temporarily suspended the administration of AstraZeneca. We can see the effect of this decision in the first slowdown of the campaign, caused by the panic that followed the news about the side effects of the AstraZeneca vaccine in part of the population.

After the slowdown vaccinations resumed, this time with other vaccines (especially Pfizer and Moderna), until the summer.
Once the vaccination of the people who were easily convinced (older age groups and high-risk people) was almost finished, the government decided to introduce the Green Pass to convince the remaining part of the population (younger age groups and skeptical people).

The first version was discussed in June-July 2021 and became effective on August 6th 2021. The number of daily vaccinations was generally stable and began to decrease at the end of summer, because most of the population was by then vaccinated.

The last significant date is December 6th 2021, the day the "Super Green Pass" was introduced, a new measure with more restrictive requirements. 

Under the first version of the Green Pass, skeptical people could still take part in social life, going to work or attending meeting places without being vaccinated, with just a negative antigen or molecular swab. 

With the Super Green Pass this is no longer possible.

At the same time, a new wave of infections and the introduction of the booster dose contributed to the renewed increase in daily vaccinations.


In [None]:
daily=pd.DataFrame(df.groupby("date")[["TotVaccine",'first_dose', 'second_dose', 'previous_infection',
       'additional_booster_dose', 'booster_immuno', 'second_booster',"males","females"]].sum())
# Take the dates from the grouped index so they always line up with the sums
daily["day"]=daily.index
daily

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(x="day",y="males",data=daily,label="males")
sns.lineplot(x="day",y="females",data=daily,label="females")
plt.title("Total Vaccines for gender")
plt.show()

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(x="day",y="first_dose",data=daily)
plt.title("First doses")
plt.show()

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(x="day",y="second_dose",data=daily)
plt.title("Second doses")
plt.show()

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(x="day",y="previous_infection",data=daily)
plt.title("Previous infections")
plt.show()

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(x="day",y="additional_booster_dose",data=daily)
plt.title("Additional booster dose")
plt.show()

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(x="day",y="second_booster",data=daily)
plt.title("Second booster dose")
plt.show()

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(x="day",y="booster_immuno",data=daily)
plt.title("Additional booster infection")
plt.show()

In [None]:
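pfizer=data[data["supplier"]=="Pfizer/BioNTech"]
pfizer=pd.DataFrame(pfizer.groupby("date")["dailytotal"].sum())
# Use the grouped index as the day column; df.date.unique() would fail if Pfizer has no rows on some date
pfizer["day"]=pfizer.index
pfizer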

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(x="date",y="dailytotal",data=pfizer)
plt.title ("Pfizer vaccinations")

In [None]:
moderna=data[data["supplier"]=="Moderna"]
moderna=pd.DataFrame(moderna.groupby("date")["dailytotal"].sum())
#moderna["day"]=df.date.unique()

plt.figure(figsize=(20,7))
sns.lineplot(x="date",y="dailytotal",data=moderna)
plt.title ("Moderna vaccinations")


For Moderna I need to insert a missing value for 2020-12-31, i.e. at row index 4:
https://stackoverflow.com/questions/15888648/is-it-possible-to-insert-a-row-at-an-arbitrary-position-in-a-dataframe-using-pan
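
An alternative to inserting a row at a fixed position is to reindex the daily series over a complete date range. The sketch below uses a synthetic frame (`moderna_toy`, invented values) shaped like the `moderna` table; filling the gap with 0 assumes that no doses were administered on the missing day, which may not hold for the real data.

```python
import pandas as pd

# Synthetic daily totals with a gap on 2020-12-31.
moderna_toy = pd.DataFrame(
    {"dailytotal": [5, 8, 13]},
    index=pd.to_datetime(["2020-12-29", "2020-12-30", "2021-01-01"]),
)
moderna_toy.index.name = "date"

# Build the complete range of days and reindex; missing days get 0.
full_range = pd.date_range(moderna_toy.index.min(), moderna_toy.index.max(), freq="D")
filled = moderna_toy.reindex(full_range, fill_value=0)
filled.index.name = "date"
print(filled)
```

Using `fill_value=float("nan")` instead would keep the gap visible in a line plot rather than drawing a drop to zero.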


In [None]:
data.supplier.unique()
janssen=data[data["supplier"]=="Janssen"]
janssen=pd.DataFrame(janssen.groupby("date")["dailytotal"].sum())
janssen

plt.figure(figsize=(20,7))
sns.lineplot(x="date",y="dailytotal",data=janssen)
plt.title ("Janssen vaccinations")

In [None]:
data.supplier.unique()
astra=data[data["supplier"]=="Vaxzevria (AstraZeneca)"]
astra=pd.DataFrame(astra.groupby("date")["dailytotal"].sum())
astra
plt.figure(figsize=(20,7))
sns.lineplot(x="date",y="dailytotal",data=astra)
plt.title ("Astrazeneca vaccinations")


In [None]:
pfizch=data[data["supplier"]=="Pfizer for children"]
pfizch=pd.DataFrame(pfizch.groupby("date")["dailytotal"].sum())
pfizch
plt.figure(figsize=(20,7))
sns.lineplot(x="date",y="dailytotal",data=pfizch)
plt.title ("Pfizer for children's vaccinations")

In [None]:
nova=data[data["supplier"]=="Novavax"]
# Daily Novavax totals
nova=pd.DataFrame(nova.groupby("date")["dailytotal"].sum())
nova
plt.figure(figsize=(20,7))
sns.lineplot(x="date",y="dailytotal",data=nova)
plt.title ("Novavax vaccinations")

In [None]:
nova

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(x="day",y="dailytotal",data=pfizer,label= "Pfizer").get_figure()
sns.lineplot(x="date",y="dailytotal",data=moderna,label="Moderna")
sns.lineplot(x="date",y="dailytotal",data=janssen,label = "Janssen")
sns.lineplot(x="date",y="dailytotal",data=pfizch,label ="Pfizch")
sns.lineplot(x="date",y="dailytotal",data=astra,label="Astra")
sns.lineplot(x="date",y="dailytotal",data=nova,label="Novavax")
sns.lineplot(x="date",y="TotVaccine",data=daily,label="Total")
plt.title("Suppliers")
plt.savefig('save_as_a_png.png')
plt.show()

In [None]:
plt.figure(figsize=(20,7))
sns.lineplot(x="day",y="dailytotal",data=pfizer,label= "Pfizer").get_figure()
sns.lineplot(x="date",y="dailytotal",data=moderna,label="Moderna")
sns.lineplot(x="date",y="dailytotal",data=janssen,label = "Janssen")
sns.lineplot(x="date",y="dailytotal",data=pfizch,label ="Pfizch")
sns.lineplot(x="date",y="dailytotal",data=astra,label="Astra")
sns.lineplot(x="date",y="dailytotal",data=nova,label="Novavax")
plt.title("Suppliers")
plt.savefig('save_as_a_png.png')
plt.show()

In [None]:
Vaccination=pd.read_csv("../input/covid-world-vaccination-progress/country_vaccinations.csv")
Vaccination

We also found another dataset that can be used to confirm and extend our research.

In [None]:
Italy=Vaccination[Vaccination.country=="Italy"]
Italy.tail(3)

In [None]:
Vaccination.isnull().mean()

In [None]:
Vaccination.groupby("country")["people_vaccinated"].max()

In [None]:
Vaccination.pivot_table(values="people_fully_vaccinated",index=["country"],aggfunc="max")

In [None]:
Vaccination[["country","people_vaccinated"]].groupby("country").max()

In [None]:
Vaccination[["country","people_vaccinated_per_hundred"]].groupby("country").max()

In [None]:
Vaccination.info()

In [None]:
Vaccination[["country","total_vaccinations"]].groupby("country").max()