# ***Covid-19 Vaccines Analysis with Python***  
There was a time when Covid-19 got out of hand. Lockdowns did not fully stop the rapid rise in cases, and in the countries where cases were brought under control, the economy was sacrificed. In such a situation, vaccines are seen as the only tool that can help the world fight Covid-19. In this article, I will walk you through the task of Covid-19 vaccines analysis with Python.  


## ***Covid-19 Vaccines Analysis***  
Many vaccines have been introduced so far to fight Covid-19. No vaccine has guaranteed 100% efficacy so far, but most manufacturers claim that, even though their vaccine is not 100% effective, it will still save your life by giving you immunity.  
  
Thus each country tries to vaccinate a large part of its population without depending on a single vaccine. That is what I am going to analyze in this article: which vaccines, and how many, each country is using to fight Covid-19.  
In the section below, I will take you through a data science tutorial on Covid-19 vaccines analysis with Python.  



## ***Covid-19 Vaccines Analysis with Python***  
The dataset that I will be using here for the task of Covid-19 vaccines analysis is taken from Kaggle. Let's start by importing the necessary Python libraries and the dataset.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


data = pd.read_csv("/content/country_vaccinations.csv")
data.head()

Unnamed: 0,country,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,vaccines,source_name,source_website
0,Afghanistan,AFG,2021-02-22,0.0,0.0,,,,0.0,0.0,,,Oxford/AstraZeneca,Government of Afghanistan,http://www.xinhuanet.com/english/asiapacific/2...
1,Afghanistan,AFG,2021-02-23,,,,,1367.0,,,,35.0,Oxford/AstraZeneca,Government of Afghanistan,http://www.xinhuanet.com/english/asiapacific/2...
2,Afghanistan,AFG,2021-02-24,,,,,1367.0,,,,35.0,Oxford/AstraZeneca,Government of Afghanistan,http://www.xinhuanet.com/english/asiapacific/2...
3,Afghanistan,AFG,2021-02-25,,,,,1367.0,,,,35.0,Oxford/AstraZeneca,Government of Afghanistan,http://www.xinhuanet.com/english/asiapacific/2...
4,Afghanistan,AFG,2021-02-26,,,,,1367.0,,,,35.0,Oxford/AstraZeneca,Government of Afghanistan,http://www.xinhuanet.com/english/asiapacific/2...


Now let's explore this data before we start analyzing the vaccines used by each country:

In [2]:
data.describe()

Unnamed: 0,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million
count,6646.0,5987.0,4284.0,5566.0,10960.0,6646.0,5987.0,4284.0,10960.0
mean,3708106.0,2631279.0,1197255.0,120445.8,70252.38,12.425435,9.357416,4.473908,2817.128741
std,14291810.0,9188796.0,4956694.0,445506.7,301269.2,20.459554,13.74141,8.981476,4968.009025
min,0.0,0.0,1.0,-29286.0,0.0,0.0,0.0,0.0,0.0
25%,52196.75,49398.5,22317.75,3178.75,928.75,0.98,0.94,0.48,322.0
50%,351536.0,295828.0,150083.5,15385.5,5948.5,4.82,3.81,1.71,1397.0
75%,1627582.0,1212549.0,581850.8,59946.75,26858.25,14.76,11.29,4.4825,3507.75
max,187047100.0,119242900.0,72630890.0,7185000.0,5190143.0,188.99,100.85,88.13,118759.0
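
The `count` row above differs from column to column, which suggests that many entries are missing. A minimal sketch of how the gaps could be measured is shown here on a tiny hand-made sample (the `sample` frame and its values are illustrative assumptions, not the real dataset):

```python
import numpy as np
import pandas as pd

# Tiny hypothetical sample that mimics a few columns of the dataset
sample = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "total_vaccinations": [0.0, np.nan, np.nan],
    "daily_vaccinations": [np.nan, 1367.0, 1367.0],
})

# Number of missing entries in each column
missing_counts = sample.isna().sum()
print(missing_counts)

# Share of missing entries in each column, as a percentage
missing_pct = (sample.isna().mean() * 100).round(1)
print(missing_pct)
```

Running the same two lines on `data` would show which columns are too sparse to rely on.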


In [5]:
# Optional: parse the date column with pd.to_datetime(data.date)
# Exclude England, Scotland, Wales and Northern Ireland by name
data = data[~data.country.isin(["England", "Scotland", "Wales", "Northern Ireland"])]

data.country.value_counts()


Canada           119
China            118
Russia           118
Israel           114
United States    113
                ... 
Eswatini           7
Brunei             5
Mali               4
Armenia            1
Laos               1
Name: country, Length: 172, dtype: int64
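
The record counts above differ a lot between countries. One way to see what period each country covers is to parse the `date` column and aggregate it per country. This is only a sketch on a hypothetical sample; the `sample` and `coverage` names are my own:

```python
import pandas as pd

# Hypothetical rows with the same column names as the dataset
sample = pd.DataFrame({
    "country": ["Canada", "Canada", "Canada", "Mali"],
    "date": ["2021-02-22", "2021-02-23", "2021-02-24", "2021-03-31"],
})

# Convert the text dates into real datetime values
sample["date"] = pd.to_datetime(sample["date"])

# First date, last date and number of records for each country
coverage = sample.groupby("country")["date"].agg(["min", "max", "count"])
print(coverage)
```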

Now let's explore the vaccines available in this dataset:

In [6]:
data.vaccines.value_counts()

Moderna, Oxford/AstraZeneca, Pfizer/BioNTech                                          2609
Oxford/AstraZeneca                                                                    1829
Pfizer/BioNTech                                                                       1412
Oxford/AstraZeneca, Pfizer/BioNTech                                                    829
Pfizer/BioNTech, Sinovac                                                               480
Moderna, Pfizer/BioNTech                                                               408
Sputnik V                                                                              375
Oxford/AstraZeneca, Sinovac                                                            307
Oxford/AstraZeneca, Sinopharm/Beijing                                                  268
Oxford/AstraZeneca, Pfizer/BioNTech, Sinopharm/Beijing, Sputnik V                      260
Oxford/AstraZeneca, Sinopharm/Beijing, Sputnik V                                       236
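
Note that `value_counts()` counts rows, so countries with more daily records weigh more, and each value is a whole combination, not a single vaccine. A possible way to count countries per individual vaccine is to keep one row per country and split the combined string. The sketch below assumes the names are always separated by `", "`, and the rows (including the made-up country "Exampleland") are illustrative only:

```python
import pandas as pd

# Hypothetical rows; "Exampleland" is a made-up country for illustration
sample = pd.DataFrame({
    "country": ["Chile", "Chile", "Belarus", "Afghanistan", "Exampleland"],
    "vaccines": [
        "Pfizer/BioNTech, Sinovac",
        "Pfizer/BioNTech, Sinovac",
        "Sputnik V",
        "Oxford/AstraZeneca",
        "Moderna, Oxford/AstraZeneca, Pfizer/BioNTech",
    ],
})

# Keep one row per country/combination so daily records are not counted twice
per_country = sample.drop_duplicates(subset=["country", "vaccines"])

# Split each combination into a list and give every vaccine its own row
individual = per_country.assign(
    vaccine=per_country["vaccines"].str.split(", ")
).explode("vaccine")

# Number of distinct countries using each individual vaccine
countries_per_vaccine = individual.groupby("vaccine")["country"].nunique()
print(countries_per_vaccine.sort_values(ascending=False))

# Number of distinct vaccines used by each country
vaccines_per_country = individual.groupby("country")["vaccine"].nunique()
print(vaccines_per_country)
```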

So we have almost all the Covid-19 vaccines available in this dataset. Now I will create a new DataFrame by selecting only the vaccines and country columns to explore which vaccine is used by which country:

In [7]:
df = data[["vaccines", "country"]]
df.head()

Unnamed: 0,vaccines,country
0,Oxford/AstraZeneca,Afghanistan
1,Oxford/AstraZeneca,Afghanistan
2,Oxford/AstraZeneca,Afghanistan
3,Oxford/AstraZeneca,Afghanistan
4,Oxford/AstraZeneca,Afghanistan


Now let's see which countries are using each of the vaccine combinations mentioned in this data:

In [13]:
# Map each vaccine combination to the set of countries using it
vaccines = {}
for combo in df.vaccines.unique():
  vaccines[combo] = set(df.loc[df["vaccines"] == combo, "country"])

# Print every combination followed by its countries
for combo, countries in vaccines.items():
  print(f"{combo}:>>{countries}")

Oxford/AstraZeneca:>>{'Sri Lanka', 'Mongolia', 'Sao Tome and Principe', 'Botswana', 'Mali', 'Mauritius', 'Moldova', 'Falkland Islands', 'Belize', 'Trinidad and Tobago', 'Eswatini', 'Vietnam', 'Guyana', 'Jamaica', 'Gambia', 'Nigeria', 'Myanmar', 'Papua New Guinea', 'Bahamas', 'Saint Helena', 'Solomon Islands', 'Togo', 'Sierra Leone', 'Kenya', 'Nepal', 'Grenada', 'Bangladesh', 'Montserrat', 'Saint Lucia', 'Anguilla', 'Sudan', 'Antigua and Barbuda', 'Angola', 'Kosovo', 'Suriname', "Cote d'Ivoire", 'Georgia', 'Taiwan', 'Ghana', 'Brunei', 'Saint Kitts and Nevis', 'Afghanistan', 'Saint Vincent and the Grenadines', 'Barbados', 'Maldives', 'Bhutan', 'Ukraine', 'Cape Verde', 'Uganda', 'Dominica', 'Malawi', 'Uzbekistan'}
Pfizer/BioNTech, Sinovac:>>{'Colombia', 'Malaysia', 'Hong Kong', 'Turkey', 'Uruguay', 'Albania', 'Chile'}
Sputnik V:>>{'Belarus', 'Algeria', 'Guinea', 'Paraguay', 'Syria', 'Armenia', 'Kazakhstan', 'Venezuela', 'Iran'}
Pfizer/BioNTech:>>{'Cayman Islands', 'Cyprus', 'North Macedon
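
The printed sets show which countries use each combination. To get the number of countries per combination as well, a `groupby` with `nunique` is one option. This is a sketch on a small sample whose rows are taken from the output above; `combo_counts` is a name I introduce here:

```python
import pandas as pd

# Small sample; repeated rows stand in for the daily records of a country
sample = pd.DataFrame({
    "country": ["Chile", "Chile", "Turkey", "Colombia", "Belarus", "Algeria"],
    "vaccines": [
        "Pfizer/BioNTech, Sinovac",
        "Pfizer/BioNTech, Sinovac",
        "Pfizer/BioNTech, Sinovac",
        "Pfizer/BioNTech, Sinovac",
        "Sputnik V",
        "Sputnik V",
    ],
})

# Count distinct countries per combination, most widely used first
combo_counts = (
    sample.groupby("vaccines")["country"]
    .nunique()
    .sort_values(ascending=False)
)
print(combo_counts)
```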

Now let's visualize this data to have a look at what combination of vaccines every country is using:

In [14]:

import plotly.express as px

vaccine_map = px.choropleth(data, locations = 'iso_code', color = 'vaccines')
vaccine_map.update_layout(height=300, margin={"r":0,"t":0,"l":0,"b":0})
vaccine_map.show()
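
The map above is built from every daily record. Since only the vaccine combination per country is plotted, it may be lighter to keep a single row per country first. The sketch below assumes each country has the same `vaccines` value on all of its rows; `build_vaccine_map` is a hypothetical helper, and the sample rows are illustrative:

```python
import pandas as pd

# Hypothetical sample with repeated daily rows per country
sample = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Chile", "Chile"],
    "iso_code": ["AFG", "AFG", "CHL", "CHL"],
    "vaccines": ["Oxford/AstraZeneca"] * 2 + ["Pfizer/BioNTech, Sinovac"] * 2,
})

# Keep the first row of every country (assumes the combination never changes)
one_row_per_country = sample.drop_duplicates(subset="iso_code")[
    ["country", "iso_code", "vaccines"]
]
print(one_row_per_country)


def build_vaccine_map(frame):
    """Hypothetical helper: choropleth coloured by vaccine combination."""
    # Imported here so the de-duplication step above runs without plotly
    import plotly.express as px

    fig = px.choropleth(
        frame, locations="iso_code", color="vaccines", hover_name="country"
    )
    fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
    return fig
```

Calling `build_vaccine_map(one_row_per_country).show()` would then display the map.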

## ***Summary***  
So this is how we can analyze the types of vaccines used by each country. There is a lot more that can be explored in this dataset.

# ***Web Scraping with Python***



Web scraping is a technique for extracting large amounts of data from websites and saving it to a local file on your computer.

The data displayed by most websites can only be viewed with a web browser. These sites do not offer a way to save a copy of the data for personal use.

The only option is then to copy and paste the data manually, which can take many hours or sometimes even days. Web scraping is the technique of automating this process.

So, instead of copying data from websites by hand, web scraping software performs the same task in a fraction of the time.

In this tutorial, we will build a Python program that extracts data from Wikipedia on the topic of "Data science".
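
Before fetching a live page, the core idea can be tried on a small inline HTML string: parse the markup and keep only the text of the `<p>` tags. This is a minimal sketch; the `html` string is made up, and it uses Python's built-in `"html.parser"` backend so that no extra parser needs to be installed:

```python
import bs4 as bs

# Made-up HTML snippet standing in for a downloaded page
html = """
<html><body>
<h1>Data science</h1>
<p>Data science is an interdisciplinary field.[1]</p>
<div>navigation menu</div>
<p>It is related to data mining.</p>
</body></html>
"""

# Parse the markup with the parser from the standard library
soup = bs.BeautifulSoup(html, "html.parser")

# Collect the text of every paragraph tag, ignoring headings and menus
paragraphs = [p.get_text() for p in soup.find_all("p")]
text = " ".join(paragraphs)
print(text)
```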

In [17]:
import nltk
import urllib.request
import bs4 as bs
import re
from nltk.corpus import stopwords
nltk.download('stopwords')
nltk.download('punkt')
# Getting the data source
source = urllib.request.urlopen('https://en.wikipedia.org/wiki/Data_science').read()

# Parsing the data/ creating BeautifulSoup object
soup = bs.BeautifulSoup(source,'lxml')

# Fetching the data
text = ' '
for paragraph in soup.find_all('p'):
    text += paragraph.text

# Preprocessing the data
text = re.sub(r'\[[0-9]*\]',' ',text)
text = re.sub(r'\s+',' ',text)
text = text.lower()
text = re.sub(r'\d',' ',text)
text = re.sub(r'\s+',' ',text)

# Preparing the dataset
sentences = nltk.sent_tokenize(text)

sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
print(sentences)
for i in sentences:
  print(i)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[['data', 'science', 'is', 'an', 'interdisciplinary', 'field', 'that', 'uses', 'scientific', 'methods', ',', 'processes', ',', 'algorithms', 'and', 'systems', 'to', 'extract', 'knowledge', 'and', 'insights', 'from', 'structured', 'and', 'unstructured', 'data', ',', 'and', 'apply', 'knowledge', 'and', 'actionable', 'insights', 'from', 'data', 'across', 'a', 'broad', 'range', 'of', 'application', 'domains', '.'], ['data', 'science', 'is', 'related', 'to', 'data', 'mining', ',', 'machine', 'learning', 'and', 'big', 'data', '.'], ['data', 'science', 'is', 'a', '``', 'concept', 'to', 'unify', 'statistics', ',', 'data', 'analysis', ',', 'informatics', ',', 'and', 'their', 'related', 'methods', "''", 'in', 'order', 'to', '``', 'understand', 'and', 'analyze', 'actual', 'ph
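
The preprocessing steps used above can be checked on a short string without downloading anything. The sketch below repeats the same regular expressions, then removes stop words, which the cell above imports but never applies. To stay self-contained it uses a tiny hard-coded stop-word set (an assumption for illustration); with NLTK installed, `stopwords.words('english')` could be used instead:

```python
import re

# Made-up text with citation markers, extra whitespace and digits
raw = "Data science[1] is  a field.[23]\nIt dates to 1962."

text = re.sub(r'\[[0-9]*\]', ' ', raw)    # drop citation markers like [1]
text = re.sub(r'\s+', ' ', text)          # collapse runs of whitespace
text = text.lower()                       # normalise case
text = re.sub(r'\d', ' ', text)           # replace digits with spaces
text = re.sub(r'\s+', ' ', text).strip()  # collapse whitespace again
print(text)

# Tiny illustrative stop-word set, not the full NLTK list
stop_words = {"is", "a", "it", "to"}

# Keep alphabetic tokens that are not stop words
words = re.findall(r'[a-z]+', text)
content_words = [w for w in words if w not in stop_words]
print(content_words)
```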