# Data Visualization Project - Data Engineering

This notebook contains all the data manipulations we will perform throughout the development of the Covid-19 poster project for the Data Visualization curricular unit.

The goal of the project is to show that there is flagrant inequality in access to Covid-19 vaccines between developed and developing countries. To do that, we combine data from different sources into a single, solid dataset that can be used in a data visualization tool such as Tableau or Microsoft Power BI.

#### Brief outline of desired columns and the sources used:

#### Part 1 - General country information and representation

1. Country Name
2. Location - polygon geometry - to allow for map representation

Source: World map shapefile, a file with the data needed for world map visualization: https://hub.arcgis.com/datasets/2b93b06dc0dc4e809d3c8db5cb96ba69_0

3. GDP
4. Population
5. GDP per capita

Sources: IMF, World Bank; https://www.imf.org/en/Publications/WEO/weo-database/2020/October

#### Part 2 - Covid Vaccine Data

Contracted quantity by manufacturer:
https://launchandscalefaster.org/COVID-19

Specifically:
https://public.tableau.com/vizql/w/TimelineofCOVIDVaccineProcurementDeals_16125539354560/v/Dashboard1/viewData/sessions/BD1E18003B5448B88669524972EB60A5-0:0/views/16126187992227925297_15952188591581136529?maxrows=200&viz=%7B%22worksheet%22%3A%22Sheet%201%22%2C%22dashboard%22%3A%22Dashboard%201%22%7D

Vaccination by country - other vaccination data (number of vaccine doses administered): https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations

Vaccination by manufacturer - vaccinations performed (not bought): https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations-by-manufacturer.csv

Price of vaccines - UNICEF - a source we may rely on:
https://app.powerbi.com/view?r=eyJrIjoiNmE0YjZiNzUtZjk2OS00ZTg4LThlMzMtNTRhNzE0NzA4YmZlIiwidCI6Ijc3NDEwMTk1LTE0ZTEtNGZiOC05MDRiLWFiMTg5MjAyMzY2NyIsImMiOjh9&pageName=ReportSectiona329b3eafd86059a947b

Scheduled (expected) date for the first vaccine deliveries

#### Part 3 - The Dream - to think about only after the data for Parts 1 and 2 has been found.

In countries still without a vaccine, this many people have already died........ How many more are we willing to have or accept?
Confirmed deaths
Projected deaths until the country has the vaccine (if one is expected) - not available

Delivery deadline
Expected contract date
Quantities actually delivered in each time period!

In [1]:
#pip install  openpyxl 

In [2]:
#pip install geopandas

In [3]:
import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from math import ceil
import warnings

import os
warnings.filterwarnings("ignore")
import matplotlib.gridspec as gspec



#import plotly.graph_objects as go
#import plotly.express as pex


In [4]:
import geopandas as gdp

In [5]:
#Select the path where you've put your dataset provided in Moodle

#step 1: check current file directory
path = os.getcwd()
files = os.listdir(path)
files

['.ipynb_checkpoints',
 'Data_Viz_CoVID-Copy2.ipynb',
 'general',
 'SHP_exp',
 'vaccinations']

In [6]:
os.chdir('general')
path = os.getcwd()
files = os.listdir(path)
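
The folder navigation above relies on changing the working directory. As a minimal sketch of an alternative, the paths can be built explicitly with `pathlib`, so that no cell depends on where a previous `os.chdir` left off. The names `BASE_DIR`, `GENERAL_DIR`, `VACCINATIONS_DIR`, `EXPORT_DIR` and `data_path` are assumptions introduced here for illustration; only the folder and file names come from the listing above.

```python
from pathlib import Path

# Hypothetical constants mirroring the folders listed above; adjust BASE_DIR as needed.
BASE_DIR = Path(".")
GENERAL_DIR = BASE_DIR / "general"
VACCINATIONS_DIR = BASE_DIR / "vaccinations"
EXPORT_DIR = BASE_DIR / "SHP_exp"


def data_path(folder: Path, filename: str) -> Path:
    """Return the path of a file inside one of the project folders."""
    return folder / filename


# Example: the population workbook read later in the notebook.
pop_path = data_path(GENERAL_DIR, "POP.xlsx")
print(pop_path)
```

A path built this way can be passed directly to `pd.read_excel` or `pd.read_csv`, which accept path objects.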

In [7]:
pop = pd.read_excel('POP.xlsx')

#load gdp dataset
GDP = pd.read_excel("GDP,US.xlsx")

In [8]:
GDP

Unnamed: 0,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",GDP p/capita
0,Afghanistan,19.006,499.441
1,Albania,14.034,4898.277
2,Algeria,147.323,3331.076
3,Angola,62.724,2021.310
4,Antigua and Barbuda,1.389,14158.571
...,...,...,...
187,Vietnam,340.602,3497.512
188,West Bank and Gaza,14.750,2894.069
189,Yemen,20.948,645.126
190,Zambia,18.909,1001.440


In [9]:
data = pd.merge(GDP, pop, left_on='Country', right_on='Country')
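
An inner merge silently drops countries whose names differ between the two tables. The sketch below, on toy data with made-up values, shows one way to audit a merge before trusting it: an outer join with `indicator=True` marks where each row came from, and `validate="one_to_one"` raises if a country name is duplicated on either side. The frames `gdp_toy` and `pop_toy` are stand-ins, not the notebook's real tables.

```python
import pandas as pd

# Toy stand-ins for the GDP and population tables (values are illustrative only).
gdp_toy = pd.DataFrame({"Country": ["Albania", "Angola", "Korea"],
                        "GDP": [1.0, 2.0, 3.0]})
pop_toy = pd.DataFrame({"Country": ["Albania", "Angola", "Zambia"],
                        "Population": [10.0, 20.0, 30.0]})

# Outer join keeps every row; the _merge column records its origin.
audit = pd.merge(gdp_toy, pop_toy, on="Country", how="outer",
                 validate="one_to_one", indicator=True)

# Rows present in only one of the two tables would be lost by an inner merge.
unmatched = audit.loc[audit["_merge"] != "both", ["Country", "_merge"]]
print(unmatched)
```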

In [10]:
data

Unnamed: 0,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",GDP p/capita,Population in 2020 (Millions)
0,Afghanistan,19.006,499.441,38.055
1,Albania,14.034,4898.277,2.865
2,Algeria,147.323,3331.076,44.227
3,Angola,62.724,2021.310,31.031
4,Antigua and Barbuda,1.389,14158.571,0.098
...,...,...,...,...
187,Vietnam,340.602,3497.512,97.384
188,West Bank and Gaza,14.750,2894.069,5.097
189,Yemen,20.948,645.126,32.471
190,Zambia,18.909,1001.440,18.882


In [11]:
data = data[['Country','GDP, current prices, in  2020 (Bilions U.S. dollar)','Population in 2020 (Millions)','GDP p/capita']]

In [12]:
data

Unnamed: 0,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",Population in 2020 (Millions),GDP p/capita
0,Afghanistan,19.006,38.055,499.441
1,Albania,14.034,2.865,4898.277
2,Algeria,147.323,44.227,3331.076
3,Angola,62.724,31.031,2021.310
4,Antigua and Barbuda,1.389,0.098,14158.571
...,...,...,...,...
187,Vietnam,340.602,97.384,3497.512
188,West Bank and Gaza,14.750,5.097,2894.069
189,Yemen,20.948,32.471,645.126
190,Zambia,18.909,18.882,1001.440


In [13]:
geo = gdp.read_file('World_Countries__Generalized_.shp')

In [14]:
geo = geo[['COUNTRY','geometry']]

In [15]:
geo

Unnamed: 0,COUNTRY,geometry
0,American Samoa,"POLYGON ((-170.74390 -14.37555, -170.74942 -14..."
1,United States Minor Outlying Islands,"MULTIPOLYGON (((-160.02114 -0.39805, -160.0281..."
2,Cook Islands,"MULTIPOLYGON (((-159.74698 -21.25667, -159.793..."
3,French Polynesia,"MULTIPOLYGON (((-149.17920 -17.87084, -149.258..."
4,Niue,"POLYGON ((-169.89389 -19.14556, -169.93088 -19..."
...,...,...
244,Northern Mariana Islands,"MULTIPOLYGON (((145.73468 15.08722, 145.72830 ..."
245,Palau,"MULTIPOLYGON (((134.53137 7.35444, 134.52234 7..."
246,Russian Federation,"MULTIPOLYGON (((-179.99999 68.98010, -179.9580..."
247,Spain,"MULTIPOLYGON (((-2.91472 35.27361, -2.93924 35..."


In [16]:
exp = pd.merge(geo, data, left_on='COUNTRY', right_on='Country')

In [17]:
exp

Unnamed: 0,COUNTRY,geometry,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",Population in 2020 (Millions),GDP p/capita
0,Samoa,"MULTIPOLYGON (((-172.59650 -13.50911, -172.551...",Samoa,0.829,0.203,4083.806
1,Tonga,"MULTIPOLYGON (((-175.14529 -21.26806, -175.186...",Tonga,0.503,0.100,5023.166
2,El Salvador,"POLYGON ((-87.69467 13.81901, -87.72501 13.733...",El Salvador,24.784,6.486,3821.286
3,Guatemala,"POLYGON ((-89.34831 14.43198, -89.43556 14.414...",Guatemala,76.191,17.971,4239.672
4,Mexico,"MULTIPOLYGON (((-111.56001 24.42945, -111.5761...",Mexico,1040.372,128.933,8069.104
...,...,...,...,...,...,...
181,Marshall Islands,"MULTIPOLYGON (((168.78637 7.28889, 168.76721 7...",Marshall Islands,0.225,0.055,4070.617
182,Micronesia,"MULTIPOLYGON (((158.22775 6.78055, 158.18469 6...",Micronesia,0.395,0.103,3854.743
183,Palau,"MULTIPOLYGON (((134.53137 7.35444, 134.52234 7...",Palau,0.251,0.018,14232.720
184,Russian Federation,"MULTIPOLYGON (((-179.99999 68.98010, -179.9580...",Russian Federation,1464.078,146.812,9972.495


In [18]:
a = exp['Country'].to_list()

In [19]:
b = data['Country'].to_list()

In [20]:
# countries from `data` that were dropped by the merge because the shapefile has no matching name
print([x for x in b if x not in set(a)])

['Hong Kong SAR', 'Korea', 'Kosovo', 'Macao SAR', 'Taiwan Province of China', 'West Bank and Gaza']
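
Names dropped like this are often present in the other table under a slightly different spelling. As a rough sketch, the standard library's `difflib.get_close_matches` can propose candidates to review by hand. The `shapefile_names` list below is hypothetical and not taken from the actual shapefile; `suggest_names` is a helper invented for this example.

```python
from difflib import get_close_matches


def suggest_names(missing, candidates, n=3, cutoff=0.5):
    """Map each unmatched name to its closest candidates (best match first)."""
    return {name: get_close_matches(name, candidates, n=n, cutoff=cutoff)
            for name in missing}


missing = ["Korea", "Hong Kong SAR"]
# Hypothetical names on the other side of the merge.
shapefile_names = ["South Korea", "North Korea", "Hong Kong", "Kosovo"]

suggestions = suggest_names(missing, shapefile_names)
for name, options in suggestions.items():
    print(name, "->", options)
```

The suggestions are only string-similarity guesses, so each one still needs a manual check before it goes into a renaming table.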


In [21]:
exp

Unnamed: 0,COUNTRY,geometry,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",Population in 2020 (Millions),GDP p/capita
0,Samoa,"MULTIPOLYGON (((-172.59650 -13.50911, -172.551...",Samoa,0.829,0.203,4083.806
1,Tonga,"MULTIPOLYGON (((-175.14529 -21.26806, -175.186...",Tonga,0.503,0.100,5023.166
2,El Salvador,"POLYGON ((-87.69467 13.81901, -87.72501 13.733...",El Salvador,24.784,6.486,3821.286
3,Guatemala,"POLYGON ((-89.34831 14.43198, -89.43556 14.414...",Guatemala,76.191,17.971,4239.672
4,Mexico,"MULTIPOLYGON (((-111.56001 24.42945, -111.5761...",Mexico,1040.372,128.933,8069.104
...,...,...,...,...,...,...
181,Marshall Islands,"MULTIPOLYGON (((168.78637 7.28889, 168.76721 7...",Marshall Islands,0.225,0.055,4070.617
182,Micronesia,"MULTIPOLYGON (((158.22775 6.78055, 158.18469 6...",Micronesia,0.395,0.103,3854.743
183,Palau,"MULTIPOLYGON (((134.53137 7.35444, 134.52234 7...",Palau,0.251,0.018,14232.720
184,Russian Federation,"MULTIPOLYGON (((-179.99999 68.98010, -179.9580...",Russian Federation,1464.078,146.812,9972.495


In [22]:
#print_SHP to test

#exp.to_file("test.shp", driver='ESRI Shapefile')

## Joining part 2:

In [23]:
#vaccines by manufacturer
#the following code searches a GitHub directory and loads all the csv files in it into a dictionary

#in this case, it looks into the country_data folder of OWID's covid-19-data repository
#Probably not very efficient, but it works

#TO RUN THIS CELL YOU MIGHT HAVE TO INSTALL BS4 AND REQUESTS, uncomment if needed
#!pip install bs4
#!pip install requests

# Import the required packages: 
from bs4 import BeautifulSoup
import requests
import pandas as pd
import re 

# Store the url as a string scalar: url => str
url = 'https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations/country_data'

# Issue request: r => requests.models.Response
r = requests.get(url)

# Extract text: html_doc => str
html_doc = r.text

# Parse the HTML with an explicit parser: soup => bs4.BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

# Find all 'a' tags (which define hyperlinks): a_tags => bs4.element.ResultSet
a_tags = soup.find_all('a')

# Store a list of urls ending in .csv (skipping 'a' tags without an href): urls => list
urls = ['https://raw.githubusercontent.com' + re.sub('/blob', '', link.get('href'))
        for link in a_tags if '.csv' in (link.get('href') or '')]

# Store a list of Data Frame names to be assigned to the list: df_list_names => list
df_list_names = [url.split('.csv')[0].split('/')[url.count('/')] for url in urls]

# Read every csv file into a DataFrame: df_list => list
df_list = [pd.read_csv(url, sep = ',') for url in urls]

# Name the dataframes in the list, coerce to a dictionary: df_dict => dict
df_dict = dict(zip(df_list_names, df_list))
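
The cell above turns GitHub page links into raw file URLs with a regular expression and string splitting. The sketch below isolates those two steps as small functions that can be checked without a network call. `blob_to_raw` and `csv_stem` are hypothetical helper names, and the example URL is the manufacturer file linked earlier in this notebook. It assumes links of the form `github.com/<owner>/<repo>/blob/<branch>/<path>`.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def blob_to_raw(url: str) -> str:
    """Convert a github.com '/blob/' file URL to its raw.githubusercontent.com form."""
    parts = urlparse(url)
    # Expected path layout: /<owner>/<repo>/blob/<branch>/<path to file>
    owner, repo, marker, *rest = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or marker != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


def csv_stem(url: str) -> str:
    """Return the file name without its extension, decoding %20 and similar escapes."""
    return PurePosixPath(unquote(urlparse(url).path)).stem


page_url = ("https://github.com/owid/covid-19-data/blob/master/public/data/"
            "vaccinations/vaccinations-by-manufacturer.csv")
raw_url = blob_to_raw(page_url)
print(raw_url)
print(csv_stem(raw_url))
```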

In [24]:
df_dict

{'Albania':    location        date          vaccine  \
 0   Albania  2021-01-10  Pfizer/BioNTech   
 1   Albania  2021-01-12  Pfizer/BioNTech   
 2   Albania  2021-01-13  Pfizer/BioNTech   
 3   Albania  2021-01-14  Pfizer/BioNTech   
 4   Albania  2021-01-15  Pfizer/BioNTech   
 5   Albania  2021-01-16  Pfizer/BioNTech   
 6   Albania  2021-01-17  Pfizer/BioNTech   
 7   Albania  2021-01-18  Pfizer/BioNTech   
 8   Albania  2021-01-19  Pfizer/BioNTech   
 9   Albania  2021-01-20  Pfizer/BioNTech   
 10  Albania  2021-01-21  Pfizer/BioNTech   
 11  Albania  2021-02-02  Pfizer/BioNTech   
 12  Albania  2021-02-09  Pfizer/BioNTech   
 13  Albania  2021-02-17  Pfizer/BioNTech   
 14  Albania  2021-02-18  Pfizer/BioNTech   
 15  Albania  2021-02-19  Pfizer/BioNTech   
 16  Albania  2021-02-22  Pfizer/BioNTech   
 17  Albania  2021-02-25  Pfizer/BioNTech   
 18  Albania  2021-03-01  Pfizer/BioNTech   
 19  Albania  2021-03-03  Pfizer/BioNTech   
 
                                          

In [25]:
#convert the dict into a single dataframe
#add the dictionary key as a column of each dataframe
for key in df_dict.keys():
    df_dict[key]['key'] = key 

# concatenating the DataFrames
countries = pd.concat(df_dict.values())
countries

Unnamed: 0,location,date,vaccine,source_url,total_vaccinations,people_vaccinated,people_fully_vaccinated,key
0,Albania,2021-01-10,Pfizer/BioNTech,https://www.france24.com/en/live-news/20210111...,0.0,0.0,,Albania
1,Albania,2021-01-12,Pfizer/BioNTech,https://shendetesia.gov.al/dita-iii-e-vaksinim...,128.0,128.0,,Albania
2,Albania,2021-01-13,Pfizer/BioNTech,https://shendetesia.gov.al/dita-iii-e-vaksinim...,188.0,188.0,,Albania
3,Albania,2021-01-14,Pfizer/BioNTech,https://shendetesia.gov.al/dita-iv-e-vaksinimi...,266.0,266.0,,Albania
4,Albania,2021-01-15,Pfizer/BioNTech,https://shendetesia.gov.al/dita-peste-e-vaksin...,308.0,308.0,,Albania
...,...,...,...,...,...,...,...,...
8,Zimbabwe,2021-03-01,Sinopharm/Beijing,https://twitter.com/MoHCCZim/status/1366477055...,21456.0,21456.0,,Zimbabwe
9,Zimbabwe,2021-03-02,Sinopharm/Beijing,https://twitter.com/MoHCCZim/status/1366851011...,25077.0,25077.0,,Zimbabwe
10,Zimbabwe,2021-03-03,Sinopharm/Beijing,https://twitter.com/MoHCCZim/status/1367208409...,27970.0,27970.0,,Zimbabwe
11,Zimbabwe,2021-03-04,Sinopharm/Beijing,https://twitter.com/MoHCCZim/status/1367546700...,30658.0,30658.0,,Zimbabwe
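
Adding the key in a loop and then concatenating can also be done in one step, because `pd.concat` accepts a dictionary and uses its keys as an outer index level. This is a minimal sketch on toy frames, not a replacement tested against the real `df_dict`.

```python
import pandas as pd

# Toy dictionary of per-country frames (illustrative values only).
frames = {
    "A": pd.DataFrame({"date": ["2021-01-01", "2021-01-02"], "total": [1, 2]}),
    "B": pd.DataFrame({"date": ["2021-01-01"], "total": [5]}),
}

# The dict keys become the outer index level, named "key" here.
combined = (pd.concat(frames, names=["key", "row"])
              .reset_index(level="key")   # move the key into a regular column
              .reset_index(drop=True))    # discard the per-frame row numbers
print(combined)
```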


In [26]:
#keep the last row per country (the most recent update)

countries = countries.drop_duplicates(subset='key', keep="last").drop(['source_url', 'key'], axis = 1)
countries

Unnamed: 0,location,date,vaccine,total_vaccinations,people_vaccinated,people_fully_vaccinated
19,Albania,2021-03-03,Pfizer/BioNTech,15793.0,,
2,Algeria,2021-02-19,Sputnik V,75000.0,,
6,Andorra,2021-02-26,Pfizer/BioNTech,2526.0,2526.0,
3,Anguilla,2021-02-26,Oxford/AstraZeneca,3929.0,3929.0,
47,Argentina,2021-03-05,Sputnik V,1357596.0,1030504.0,327092.0
...,...,...,...,...,...,...
61,United States,2021-03-05,"Moderna, Pfizer/BioNTech",85008094.0,55547697.0,28701201.0
5,Uruguay,2021-03-05,Sinovac,70408.0,70408.0,
1,Venezuela,2021-02-22,Sputnik V,157.0,157.0,
55,Wales,2021-03-04,"Oxford/AstraZeneca, Pfizer/BioNTech",1121861.0,967042.0,154819.0
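
Keeping the last row per country gives the most recent figures only if each country's rows are already in date order. A small sketch on toy data, assuming that is not guaranteed, sorts by date first so that "last" always means "latest".

```python
import pandas as pd

# Toy data deliberately out of date order (illustrative values only).
toy = pd.DataFrame({
    "location": ["X", "X", "Y", "Y"],
    "date": ["2021-03-03", "2021-01-10", "2021-02-19", "2021-02-01"],
    "total_vaccinations": [300.0, 10.0, 75.0, 20.0],
})
toy["date"] = pd.to_datetime(toy["date"])

# Sort within each location by date, then keep the final (latest) row.
latest = (toy.sort_values(["location", "date"])
             .drop_duplicates(subset="location", keep="last")
             .reset_index(drop=True))
print(latest)
```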


In [27]:
#go back
os.chdir('..')
path = os.getcwd()
files = os.listdir(path)

#go to vaccinations
os.chdir('vaccinations')
path = os.getcwd()
files = os.listdir(path)

In [28]:
files

['country_data',
 'locations.csv',
 'README.md',
 'Sheet_1_data.csv',
 'us_state_vaccinations.csv',
 'vaccinations-by-manufacturer.csv',
 'vaccinations.csv',
 'vaccinations.json',
 'vaccine proc_data_26_02.csv']

In [29]:
#load the vaccine procurement dataset

vaccines = pd.read_csv(r'vaccine proc_data_26_02.csv')
vaccines

Unnamed: 0,Country seperate (group),subtitle,Page 1,Company and Scientific Name1,Deal not on Map,Deal Period1 11,"Potential (1=yes, 0=no)1",3 Star Note,Company's Country,Country seperate,...,Purchaser Entity / Country1,Purchaser's country Economic Status,Purchaser's Country Income Status,Tooltip deal amount,Type of Vaccine,Year,% Of National Population Able To Be Vaccinated,Deal Amount,Number of people able to be vaccinated with doses procured,Population
0,African Union,This map shows the percentage of the populatio...,January 2021,Oxford-AstraZeneca _AZD1222,0,January 2021,Confirmed,0,UK,Burundi,...,African Union,Low income,LMIC,vaccines,Adenoviral,2021,3.676475,100000000.0,50000000.0,1359998350
1,African Union,This map shows the percentage of the populatio...,February 2021,Oxford-AstraZeneca _AZD1222,0,January 2021,Confirmed,0,UK,Burundi,...,African Union,Low income,LMIC,vaccines,Adenoviral,2021,3.676475,100000000.0,50000000.0,1359998350
2,African Union,This map shows the percentage of the populatio...,January 2021,Oxford-AstraZeneca _AZD1222,0,January 2021,Confirmed,0,UK,Cameroon,...,African Union,Low income,LMIC,unknown amount,Adenoviral,2021,,,,1359998350
3,African Union,This map shows the percentage of the populatio...,February 2021,Oxford-AstraZeneca _AZD1222,0,January 2021,Confirmed,0,UK,Cameroon,...,African Union,Low income,LMIC,unknown amount,Adenoviral,2021,,,,1359998350
4,African Union,This map shows the percentage of the populatio...,January 2021,Oxford-AstraZeneca _AZD1222,0,January 2021,Confirmed,0,UK,Central African Republic,...,African Union,Low income,LMIC,unknown amount,Adenoviral,2021,,,,1359998350
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3269,Canada,This map shows the percentage of the populatio...,January 2021,Pfizer-BioNTech_BNT162,0,January 2021,Confirmed,1,USA/Germany,Canada,...,Canada,High income,HIC,vaccines,mRNA,2021,26.603342,20000000.0,10000000.0,37589262
3270,USA,This map shows the percentage of the populatio...,February 2021,Moderna_mRNA-1273,0,January 2021,Confirmed,1,USA,USA,...,USA,High income,HIC,vaccines,mRNA,2021,15.232779,100000000.0,50000000.0,328239523
3271,USA,This map shows the percentage of the populatio...,February 2021,Pfizer-BioNTech_BNT162,0,January 2021,Confirmed,1,USA/Germany,USA,...,USA,High income,HIC,vaccines,mRNA,2021,15.232779,100000000.0,50000000.0,328239523
3272,USA,This map shows the percentage of the populatio...,January 2021,Moderna_mRNA-1273,0,January 2021,Confirmed,1,USA,USA,...,USA,High income,HIC,vaccines,mRNA,2021,15.232779,100000000.0,50000000.0,328239523


In [30]:
short_vaccine = pd.read_csv(r'Sheet_1_data.csv')
short_vaccine

Unnamed: 0,Country seperate (group),subtitle,Page 1,Month,title vaccine,Latitude (generated),Longitude (generated),% Of National Population Able To Be Vaccinated,Deal Amount
0,USA,This map shows publicly-reported vaccine purch...,May 2020,May,only Oxford-AstraZeneca _AZD1222,40.079201,-98.816399,45.698336,300000000.0
1,UK,This map shows publicly-reported vaccine purch...,May 2020,May,only Oxford-AstraZeneca _AZD1222,52.289001,-1.259000,74.811768,100000000.0
2,USA,This map shows publicly-reported vaccine purch...,June 2020,*,*,40.079201,-98.816399,47.221614,310000000.0
3,UK,This map shows publicly-reported vaccine purch...,June 2020,May,only Oxford-AstraZeneca _AZD1222,52.289001,-1.259000,74.811768,100000000.0
4,Israel,This map shows publicly-reported vaccine purch...,June 2020,Jun,only Moderna_mRNA-1273,30.992001,34.834000,11.045696,2000000.0
...,...,...,...,...,...,...,...,...,...
307,Azerbaijan,This map shows publicly-reported vaccine purch...,March 2021,Jan,only Sinovac_Coronavac,40.459999,47.882999,19.953472,4000000.0
308,Australia,This map shows publicly-reported vaccine purch...,March 2021,*,*,-24.577999,133.582001,246.015001,124800000.0
309,Argentina,This map shows publicly-reported vaccine purch...,March 2021,Nov,*,-33.166000,-64.309998,52.293444,47000000.0
310,Albania,This map shows publicly-reported vaccine purch...,March 2021,Jan,only Pfizer-BioNTech_BNT162,40.653999,20.076000,8.759049,500000.0


In [31]:
#short_vaccine.info()

In [32]:
#vaccines.info()

In [33]:
#replace country names so that they match across datasets

country_renames = {
    'UK': 'United Kingdom',
    'USA': 'United States',
    'Príncipe': 'Sao Tome and Principe',
    'São Tomé': 'Sao Tome and Principe',
    'South Korea': 'Korea',
    'Côte d’Ivoire': "Côte d'Ivoire",
    'DR Congo': 'Congo DRC',
    'Congo Republic': 'Congo',
    'Taiwan': 'Taiwan Province of China',
    'Hong Kong': 'Hong Kong SAR',
}
vaccines['Country seperate'] = vaccines['Country seperate'].replace(country_renames)

In [34]:
countries['location'] = countries['location'].replace('Czechia', 'Czech Republic')
countries['location'] = countries['location'].replace('Russia', 'Russian Federation')

In [35]:
#compare the country names in both tables to see where they differ

#unique countries in the vaccines df

a = list(vaccines['Country seperate'].unique())

#unique countries in the countries df

b = list(countries['location'].unique())

#countries present in vaccines but missing from countries
new_list = list(set(a).difference(set(b)))

In [36]:
new_list1 = list(set(b).difference(set(a)))
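
The two set differences above can be wrapped in one small helper that reports both directions and the overlap at once. `compare_keys` is a hypothetical name, and the example lists are toy values rather than the real country lists.

```python
def compare_keys(left, right):
    """Report which keys appear only on one side and which appear on both."""
    left_set, right_set = set(left), set(right)
    return {
        "only_left": sorted(left_set - right_set),
        "only_right": sorted(right_set - left_set),
        "both": sorted(left_set & right_set),
    }


# Toy example: names from two tables that should share a key.
report = compare_keys(["Korea", "Canada", "COVAX"], ["Canada", "Wales"])
for side, names in report.items():
    print(side, names)
```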

In [37]:
#getting the set of countries in the world map shapefile
#merge exp with vaccines and countries

In [38]:
#drop irrelevant columns

vaccines = vaccines.drop(['subtitle', 'Page 1', 'Purchaser Entity / Country1', 'Tooltip deal amount'], axis = 1)

In [74]:
#NEXT STEP - MERGE THE ORIGINAL

pro_vaxers = vaccines.drop_duplicates()


Unnamed: 0,Country seperate (group),Company and Scientific Name1,Deal not on Map,Deal Period1 11,"Potential (1=yes, 0=no)1",3 Star Note,Company's Country,Country seperate,Date of Deal,Day,...,Number of Doses Needed per Person,Partners,Purchaser's country Economic Status,Purchaser's Country Income Status,Type of Vaccine,Year,% Of National Population Able To Be Vaccinated,Deal Amount,Number of people able to be vaccinated with doses procured,Population
0,African Union,Oxford-AstraZeneca _AZD1222,0,January 2021,Confirmed,0,UK,Burundi,Thu Jan 14 00:00:00 EST 2021,14,...,2,AstraZeneca,Low income,LMIC,Adenoviral,2021,3.676475,100000000.0,50000000.0,1359998350
2,African Union,Oxford-AstraZeneca _AZD1222,0,January 2021,Confirmed,0,UK,Cameroon,Thu Jan 14 00:00:00 EST 2021,14,...,2,AstraZeneca,Low income,LMIC,Adenoviral,2021,,,,1359998350
4,African Union,Oxford-AstraZeneca _AZD1222,0,January 2021,Confirmed,0,UK,Central African Republic,Thu Jan 14 00:00:00 EST 2021,14,...,2,AstraZeneca,Low income,LMIC,Adenoviral,2021,,,,1359998350
6,African Union,Oxford-AstraZeneca _AZD1222,0,January 2021,Confirmed,0,UK,Chad,Thu Jan 14 00:00:00 EST 2021,14,...,2,AstraZeneca,Low income,LMIC,Adenoviral,2021,,,,1359998350
8,African Union,Oxford-AstraZeneca _AZD1222,0,January 2021,Confirmed,0,UK,Congo,Thu Jan 14 00:00:00 EST 2021,14,...,2,AstraZeneca,Low income,LMIC,Adenoviral,2021,,,,1359998350
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3144,USA,Pfizer-BioNTech_BNT162,0,December 2020,Confirmed,1,USA/Germany,United States,Wed Dec 23 00:00:00 EST 2020,23,...,2,BioNTech and Fosun Pharma,High income,HIC,mRNA,2020,15.232779,100000000.0,50000000.0,328239523
3267,UK,Valneva_VLA2001,0,February 2021,Confirmed,1,France,United Kingdom,Tue Feb 02 00:00:00 EST 2021,2,...,2,NIH,High income,HIC,Protein adjuvant,2021,29.924707,40000000.0,20000000.0,66834405
3268,Canada,Pfizer-BioNTech_BNT162,0,January 2021,Confirmed,1,USA/Germany,Canada,Tue Jan 12 00:00:00 EST 2021,12,...,2,BioNTech and Fosun Pharma,High income,HIC,mRNA,2021,26.603342,20000000.0,10000000.0,37589262
3270,USA,Moderna_mRNA-1273,0,January 2021,Confirmed,1,USA,United States,Wed Jan 27 00:00:00 EST 2021,27,...,2,NIH,High income,HIC,mRNA,2021,15.232779,100000000.0,50000000.0,328239523


In [82]:
#group by the purchaser column 'Country seperate (group)': the pivot table and the renames to 'COUNTRY' below expect this key
proer_vaxxers = pro_vaxers.groupby(['Country seperate (group)', 'Company and Scientific Name1', 'Potential (1=yes, 0=no)1']).agg(
             Population = ('Population', 'mean'),
             Percent_pop_covered =('% Of National Population Able To Be Vaccinated', 'sum'),
             Vaccines_bought = ('Deal Amount', 'sum'),
             People_covered = ('Number of people able to be vaccinated with doses procured', 'sum'))

In [83]:
proer_vaxxers = proer_vaxxers.reset_index()

In [84]:
#create pivot table to relate vaccines bought by manufacturer to countries

first_pivot = pd.pivot_table(proer_vaxxers, values='Vaccines_bought', index=['Country seperate (group)'],
                    columns=['Company and Scientific Name1'])

#reset index
first_pivot = first_pivot.reset_index()
first_pivot

In [43]:
first_pivot.rename(columns = {'Country seperate (group)': 'COUNTRY'}, inplace = True)
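
The long-to-wide reshaping behind `first_pivot` can be illustrated on toy data. One detail worth knowing is that `pd.pivot_table` aggregates with the mean by default, so if a buyer and vaccine pair can appear on more than one row, an explicit `aggfunc="sum"` may be closer to the intent of totalling doses. The column names `buyer`, `vaccine` and `doses` below are placeholders.

```python
import pandas as pd

# Toy long table: one row per (buyer, vaccine) deal; the first pair appears twice.
long_deals = pd.DataFrame({
    "buyer": ["A", "A", "A", "B"],
    "vaccine": ["V1", "V1", "V2", "V1"],
    "doses": [10.0, 30.0, 20.0, 5.0],
})

# One row per buyer, one column per vaccine, doses summed within each cell.
wide = pd.pivot_table(long_deals, values="doses", index="buyer",
                      columns="vaccine", aggfunc="sum").reset_index()
wide.columns.name = None  # drop the leftover "vaccine" label on the column axis
print(wide)
```

Cells with no deal stay missing (`NaN`), which is what the later `dropna` and `stack` steps rely on.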

In [44]:
#merge with original shapefile, left join - keeps all original countries of exp and adds column values for those where there is information available

exp = exp.merge(first_pivot, on='COUNTRY', how='left')
exp

Unnamed: 0,COUNTRY,geometry,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",Population in 2020 (Millions),GDP p/capita,Arcturus Therapeutics_LUNAR-COV19,Bharat Biotech_COVAXIN,COVAX Vaccines,COVAXX (United Biomedical)_UB-162,...,Janssen (J&J)_Ad26.COV2.S,Medicago_CoVLP,Moderna_mRNA-1273,Novavax_NVX-CoV2373,Oxford-AstraZeneca _AZD1222,Pfizer-BioNTech_BNT162,Sanofi-GSK_SARS-CoV-2 Vaccine,Sinopharm,Sinovac_Coronavac,Valneva_VLA2001
0,Samoa,"MULTIPOLYGON (((-172.59650 -13.50911, -172.551...",Samoa,0.829,0.203,4083.806,,,,,...,,,,,,,,,,
1,Tonga,"MULTIPOLYGON (((-175.14529 -21.26806, -175.186...",Tonga,0.503,0.100,5023.166,,,,,...,,,,,,,,,,
2,El Salvador,"POLYGON ((-87.69467 13.81901, -87.72501 13.733...",El Salvador,24.784,6.486,3821.286,,,,,...,,,,,2000000.0,,,,,
3,Guatemala,"POLYGON ((-89.34831 14.43198, -89.43556 14.414...",Guatemala,76.191,17.971,4239.672,,,,,...,,,,,,,,,,
4,Mexico,"MULTIPOLYGON (((-111.56001 24.42945, -111.5761...",Mexico,1040.372,128.933,8069.104,,,,,...,,,,,77400000.0,34400000.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
181,Marshall Islands,"MULTIPOLYGON (((168.78637 7.28889, 168.76721 7...",Marshall Islands,0.225,0.055,4070.617,,,,,...,,,,,,,,,,
182,Micronesia,"MULTIPOLYGON (((158.22775 6.78055, 158.18469 6...",Micronesia,0.395,0.103,3854.743,,,,,...,,,,,,,,,,
183,Palau,"MULTIPOLYGON (((134.53137 7.35444, 134.52234 7...",Palau,0.251,0.018,14232.720,,,,,...,,,,,,,,,,
184,Russian Federation,"MULTIPOLYGON (((-179.99999 68.98010, -179.9580...",Russian Federation,1464.078,146.812,9972.495,,,,,...,,,,,,,,,,


In [45]:
#now, to do the same with the countries table
countries = countries.rename(columns = {'location': 'COUNTRY', 'date': 'last_update'}).drop('vaccine', axis = 1)
countries

Unnamed: 0,COUNTRY,last_update,total_vaccinations,people_vaccinated,people_fully_vaccinated
19,Albania,2021-03-03,15793.0,,
2,Algeria,2021-02-19,75000.0,,
6,Andorra,2021-02-26,2526.0,2526.0,
3,Anguilla,2021-02-26,3929.0,3929.0,
47,Argentina,2021-03-05,1357596.0,1030504.0,327092.0
...,...,...,...,...,...
61,United States,2021-03-05,85008094.0,55547697.0,28701201.0
5,Uruguay,2021-03-05,70408.0,70408.0,
1,Venezuela,2021-02-22,157.0,157.0,
55,Wales,2021-03-04,1121861.0,967042.0,154819.0


In [46]:
#merge with original shapefile, left join - keeps all original countries of exp and adds column values for those where there is information available

exp = exp.merge(countries, on='COUNTRY', how='left')
exp

Unnamed: 0,COUNTRY,geometry,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",Population in 2020 (Millions),GDP p/capita,Arcturus Therapeutics_LUNAR-COV19,Bharat Biotech_COVAXIN,COVAX Vaccines,COVAXX (United Biomedical)_UB-162,...,Oxford-AstraZeneca _AZD1222,Pfizer-BioNTech_BNT162,Sanofi-GSK_SARS-CoV-2 Vaccine,Sinopharm,Sinovac_Coronavac,Valneva_VLA2001,last_update,total_vaccinations,people_vaccinated,people_fully_vaccinated
0,Samoa,"MULTIPOLYGON (((-172.59650 -13.50911, -172.551...",Samoa,0.829,0.203,4083.806,,,,,...,,,,,,,,,,
1,Tonga,"MULTIPOLYGON (((-175.14529 -21.26806, -175.186...",Tonga,0.503,0.100,5023.166,,,,,...,,,,,,,,,,
2,El Salvador,"POLYGON ((-87.69467 13.81901, -87.72501 13.733...",El Salvador,24.784,6.486,3821.286,,,,,...,2000000.0,,,,,,2021-02-25,16000.0,16000.0,
3,Guatemala,"POLYGON ((-89.34831 14.43198, -89.43556 14.414...",Guatemala,76.191,17.971,4239.672,,,,,...,,,,,,,2021-03-01,2427.0,,
4,Mexico,"MULTIPOLYGON (((-111.56001 24.42945, -111.5761...",Mexico,1040.372,128.933,8069.104,,,,,...,77400000.0,34400000.0,,,,,2021-03-05,2731900.0,2128766.0,603134.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
181,Marshall Islands,"MULTIPOLYGON (((168.78637 7.28889, 168.76721 7...",Marshall Islands,0.225,0.055,4070.617,,,,,...,,,,,,,,,,
182,Micronesia,"MULTIPOLYGON (((158.22775 6.78055, 158.18469 6...",Micronesia,0.395,0.103,3854.743,,,,,...,,,,,,,,,,
183,Palau,"MULTIPOLYGON (((134.53137 7.35444, 134.52234 7...",Palau,0.251,0.018,14232.720,,,,,...,,,,,,,2021-02-01,3109.0,,
184,Russian Federation,"MULTIPOLYGON (((-179.99999 68.98010, -179.9580...",Russian Federation,1464.078,146.812,9972.495,,,,,...,,,,,,,2021-03-05,6301854.0,4908178.0,1393676.0


In [47]:
proer_vaxxers.rename(columns = {'Country seperate (group)': 'COUNTRY'}, inplace = True)

In [48]:
#go back
os.chdir('..')
path = os.getcwd()
files = os.listdir(path)

#go to SHP_exp folder
os.chdir('SHP_exp')
path = os.getcwd()
files = os.listdir(path)

#export and pray it works
exp.to_file("test.shp", driver='ESRI Shapefile')
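
One caveat with the ESRI Shapefile format is that attribute field names are limited to 10 characters, so long column names are truncated on export and can end up colliding. A minimal sketch of one workaround is to build short, unique names first and rename the columns before calling `to_file`. `shorten_columns` is a hypothetical helper, and the second "Population" column in the example is invented only to show how a collision is resolved.

```python
def shorten_columns(columns, max_len=10, keep=("geometry",)):
    """Map each column to a unique alphanumeric name of at most max_len characters."""
    mapping, used = {}, set(keep)
    for col in columns:
        if col in keep:
            continue  # leave the geometry column untouched
        base = "".join(ch for ch in col if ch.isalnum())[:max_len] or "col"
        name, i = base, 1
        while name in used:
            # On a collision, replace the tail of the name with a counter.
            suffix = str(i)
            name = base[:max_len - len(suffix)] + suffix
            i += 1
        used.add(name)
        mapping[col] = name
    return mapping


cols = ["COUNTRY", "geometry", "GDP p/capita",
        "Population in 2020 (Millions)",
        "Population in 2021 (Millions)"]  # the 2021 column is hypothetical
mapping = shorten_columns(cols)
print(mapping)
```

The resulting dictionary can be passed to `rename(columns=mapping)` on the GeoDataFrame before exporting. Keeping the mapping around also makes it possible to restore the long names in the visualization tool.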

In [49]:
first_pivot.head(40)

Company and Scientific Name1,COUNTRY,Arcturus Therapeutics_LUNAR-COV19,Bharat Biotech_COVAXIN,COVAX Vaccines,COVAXX (United Biomedical)_UB-162,CanSino Biologics_Ad5-nCoV,CureVac_CVnCov,Gamaleya Research Institute_Sputnik V,Janssen (J&J)_Ad26.COV2.S,Medicago_CoVLP,Moderna_mRNA-1273,Novavax_NVX-CoV2373,Oxford-AstraZeneca _AZD1222,Pfizer-BioNTech_BNT162,Sanofi-GSK_SARS-CoV-2 Vaccine,Sinopharm,Sinovac_Coronavac,Valneva_VLA2001
0,African Union,,,,,,,,120000000.0,,,,500000000.0,52000000.0,,,,
1,Albania,,,,,,,,,,,,,500000.0,,,,
2,Argentina,,,,,,,25000000.0,,,,,22000000.0,,,,,
3,Australia,,,,,,,,,,,51000000.0,53800000.0,20000000.0,,,,
4,Azerbaijan,,,,,,,,,,,,,,,,4000000.0,
5,Bangladesh,,,,,,,,,,,,33000000.0,,,,,
6,Bolivia,,,,,,,2600000.0,,,,,5000000.0,,,,,
7,Brazil,,20000000.0,,,,,10000000.0,,,,,102000000.0,,,,100000000.0,
8,COVAX,,,210000000.0,,,,,500000000.0,,,,170000000.0,40000000.0,200000000.0,,,
9,Canada,,,,,,,,38000000.0,76000000.0,40000000.0,52000000.0,20000000.0,40000000.0,72000000.0,,,


In [50]:
first_pivot.columns

Index(['COUNTRY', 'Arcturus Therapeutics_LUNAR-COV19',
       'Bharat Biotech_COVAXIN', 'COVAX Vaccines',
       'COVAXX (United Biomedical)_UB-162', 'CanSino Biologics_Ad5-nCoV',
       'CureVac_CVnCov', 'Gamaleya Research Institute_Sputnik V',
       'Janssen (J&J)_Ad26.COV2.S', 'Medicago_CoVLP', 'Moderna_mRNA-1273',
       'Novavax_NVX-CoV2373', 'Oxford-AstraZeneca _AZD1222',
       'Pfizer-BioNTech_BNT162', 'Sanofi-GSK_SARS-CoV-2 Vaccine', 'Sinopharm',
       'Sinovac_Coronavac', 'Valneva_VLA2001'],
      dtype='object', name='Company and Scientific Name1')

In [51]:
df=first_pivot[['COUNTRY',
                'COVAX Vaccines',
                'Pfizer-BioNTech_BNT162',
                'Janssen (J&J)_Ad26.COV2.S',
                'Oxford-AstraZeneca _AZD1222',
                'Moderna_mRNA-1273',
                'Sinovac_Coronavac',
                'Gamaleya Research Institute_Sputnik V' ]]

In [52]:
df

Company and Scientific Name1,COUNTRY,COVAX Vaccines,Pfizer-BioNTech_BNT162,Janssen (J&J)_Ad26.COV2.S,Oxford-AstraZeneca _AZD1222,Moderna_mRNA-1273,Sinovac_Coronavac,Gamaleya Research Institute_Sputnik V
0,African Union,,52000000.0,120000000.0,500000000.0,,,
1,Albania,,500000.0,,,,,
2,Argentina,,,,22000000.0,,,25000000.0
3,Australia,,20000000.0,,53800000.0,,,
4,Azerbaijan,,,,,,4000000.0,
5,Bangladesh,,,,33000000.0,,,
6,Bolivia,,,,5000000.0,,,2600000.0
7,Brazil,,,,102000000.0,,100000000.0,10000000.0
8,COVAX,210000000.0,40000000.0,500000000.0,170000000.0,,,
9,Canada,,40000000.0,38000000.0,20000000.0,40000000.0,,


In [53]:
df=df.rename(columns = {'COVAX Vaccines': 'COVAX',
                     'Pfizer-BioNTech_BNT162':'Pfizer/BioNTech',
                     'Janssen (J&J)_Ad26.COV2.S':"Jhonson&Jhonson",
                     'Oxford-AstraZeneca _AZD1222':'Oxford-AstraZeneca',
                     'Moderna_mRNA-1273':'Moderna',
                     'Sinovac_Coronavac':'Sinovac',
                     'Sanofi-GSK_SARS-CoV-2 Vaccine':'Sanofi/GSK',
                     'Gamaleya Research Institute_Sputnik V':'SputnikV'})

In [54]:
df

Company and Scientific Name1,COUNTRY,COVAX,Pfizer/BioNTech,Jhonson&Jhonson,Oxford-AstraZeneca,Moderna,Sinovac,SputnikV
0,African Union,,52000000.0,120000000.0,500000000.0,,,
1,Albania,,500000.0,,,,,
2,Argentina,,,,22000000.0,,,25000000.0
3,Australia,,20000000.0,,53800000.0,,,
4,Azerbaijan,,,,,,4000000.0,
5,Bangladesh,,,,33000000.0,,,
6,Bolivia,,,,5000000.0,,,2600000.0
7,Brazil,,,,102000000.0,,100000000.0,10000000.0
8,COVAX,210000000.0,40000000.0,500000000.0,170000000.0,,,
9,Canada,,40000000.0,38000000.0,20000000.0,40000000.0,,


In [55]:
df=df.replace(0.0, np.nan)

In [56]:
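#keep rows with at least 2 non-missing values, i.e. COUNTRY plus at least one vaccine column
#(recent pandas versions do not allow passing both how and thresh)
df=df.dropna(thresh=2)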
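
The `thresh=2` filter keeps a row only when it has at least two non-missing values. Since `COUNTRY` is always filled, that means at least one vaccine column must hold a number. A toy example makes the effect visible.

```python
import numpy as np
import pandas as pd

# Toy wide table: buyer B has no deal in any vaccine column.
toy = pd.DataFrame({
    "COUNTRY": ["A", "B", "C"],
    "V1": [1.0, np.nan, np.nan],
    "V2": [np.nan, np.nan, 3.0],
})

# Rows need at least 2 non-missing values to survive.
kept = toy.dropna(thresh=2)
print(kept)
```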

In [57]:
df

Company and Scientific Name1,COUNTRY,COVAX,Pfizer/BioNTech,Jhonson&Jhonson,Oxford-AstraZeneca,Moderna,Sinovac,SputnikV
0,African Union,,52000000.0,120000000.0,500000000.0,,,
1,Albania,,500000.0,,,,,
2,Argentina,,,,22000000.0,,,25000000.0
3,Australia,,20000000.0,,53800000.0,,,
4,Azerbaijan,,,,,,4000000.0,
5,Bangladesh,,,,33000000.0,,,
6,Bolivia,,,,5000000.0,,,2600000.0
7,Brazil,,,,102000000.0,,100000000.0,10000000.0
8,COVAX,210000000.0,40000000.0,500000000.0,170000000.0,,,
9,Canada,,40000000.0,38000000.0,20000000.0,40000000.0,,


In [58]:
df2=df.set_index('COUNTRY').stack().to_frame('values').reset_index()
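
The cell above reshapes the table from one column per lab into one row per (country, lab) pair, dropping the empty combinations. A minimal sketch of the same idea follows, using `melt` plus `dropna` instead of `stack`, on the assumption that this is easier to read and does not depend on how a given pandas version treats missing values in `stack`. The frame `wide` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: one column per lab, NaN where no deal exists.
wide = pd.DataFrame({
    "COUNTRY": ["A", "B"],
    "Lab1": [10.0, np.nan],
    "Lab2": [np.nan, 5.0],
})

# Wide -> long: one row per (COUNTRY, Labs) pair, then drop the empty pairs.
long = (
    wide.melt(id_vars="COUNTRY", var_name="Labs", value_name="vaccines")
        .dropna(subset=["vaccines"])
        .reset_index(drop=True)
)
long_records = list(long.itertuples(index=False, name=None))
print(long_records)
```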

In [59]:
df2.rename(columns={"Company and Scientific Name1":"Labs",
                   "values":"vaccines"}, inplace=True)

In [60]:
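# Left-merge on COUNTRY only: every lab row is paired with every deal row of the
# same country, so mismatched lab/deal pairs have to be filtered out afterwards.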
df2 = df2.merge(proer_vaxxers, on='COUNTRY', how = 'left')
df2

Unnamed: 0,COUNTRY,Labs,vaccines,Company and Scientific Name1,"Potential (1=yes, 0=no)1",Population,Percent_pop_covered,Vaccines_bought,People_covered
0,African Union,Pfizer/BioNTech,52000000.0,Janssen (J&J)_Ad26.COV2.S,Confirmed,1.359998e+09,8.823540,120000000.0,120000000.0
1,African Union,Pfizer/BioNTech,52000000.0,Oxford-AstraZeneca _AZD1222,Confirmed,1.359998e+09,18.382375,500000000.0,250000000.0
2,African Union,Pfizer/BioNTech,52000000.0,Pfizer-BioNTech_BNT162,Confirmed,1.335921e+09,10.389106,52000000.0,26000000.0
3,African Union,Jhonson&Jhonson,120000000.0,Janssen (J&J)_Ad26.COV2.S,Confirmed,1.359998e+09,8.823540,120000000.0,120000000.0
4,African Union,Jhonson&Jhonson,120000000.0,Oxford-AstraZeneca _AZD1222,Confirmed,1.359998e+09,18.382375,500000000.0,250000000.0
...,...,...,...,...,...,...,...,...,...
388,Venezuela,SputnikV,10000000.0,Gamaleya Research Institute_Sputnik V,Confirmed,2.851583e+07,17.534121,10000000.0,5000000.0
389,Vietnam,Oxford-AstraZeneca,30000000.0,Gamaleya Research Institute_Sputnik V,Confirmed,9.646211e+07,25.916913,50000000.0,25000000.0
390,Vietnam,Oxford-AstraZeneca,30000000.0,Oxford-AstraZeneca _AZD1222,Confirmed,9.646211e+07,15.550148,30000000.0,15000000.0
391,Vietnam,SputnikV,50000000.0,Gamaleya Research Institute_Sputnik V,Confirmed,9.646211e+07,25.916913,50000000.0,25000000.0


In [61]:
df2['Labs'].unique()

array(['Pfizer/BioNTech', 'Jhonson&Jhonson', 'Oxford-AstraZeneca',
       'SputnikV', 'Sinovac', 'COVAX', 'Moderna'], dtype=object)

In [62]:
df2['Company and Scientific Name1'].unique()

# Source name -> short label used in 'Labs':
#   'COVAX Vaccines'                        -> 'COVAX'
#   'Pfizer-BioNTech_BNT162'                -> 'Pfizer/BioNTech'
#   'Janssen (J&J)_Ad26.COV2.S'             -> 'Jhonson&Jhonson'
#   'Oxford-AstraZeneca _AZD1222'           -> 'Oxford-AstraZeneca'
#   'Moderna_mRNA-1273'                     -> 'Moderna'
#   'Sinovac_Coronavac'                     -> 'Sinovac'
#   'Sanofi-GSK_SARS-CoV-2 Vaccine'         -> 'Sanofi/GSK'
#   'Gamaleya Research Institute_Sputnik V' -> 'SputnikV'

array(['Janssen (J&J)_Ad26.COV2.S', 'Oxford-AstraZeneca _AZD1222',
       'Pfizer-BioNTech_BNT162', 'Gamaleya Research Institute_Sputnik V',
       'Novavax_NVX-CoV2373', 'Sinovac_Coronavac',
       'Bharat Biotech_COVAXIN', 'COVAX Vaccines',
       'Sanofi-GSK_SARS-CoV-2 Vaccine', 'Medicago_CoVLP',
       'Moderna_mRNA-1273', 'COVAXX (United Biomedical)_UB-162',
       'CureVac_CVnCov', 'Sinopharm', 'CanSino Biologics_Ad5-nCoV',
       'Arcturus Therapeutics_LUNAR-COV19', 'Valneva_VLA2001'],
      dtype=object)

In [63]:
# Keep only the rows where the short lab label ('Labs') and the original source
# name refer to the same manufacturer: one dataframe per manufacturer.
pfizer = df2[(df2['Labs']=='Pfizer/BioNTech') & (df2['Company and Scientific Name1']=='Pfizer-BioNTech_BNT162')]
jj = df2[(df2['Labs']=='Jhonson&Jhonson') & (df2['Company and Scientific Name1']=='Janssen (J&J)_Ad26.COV2.S')]
oxf = df2[(df2['Labs']=='Oxford-AstraZeneca') & (df2['Company and Scientific Name1']=='Oxford-AstraZeneca _AZD1222')]
mod = df2[(df2['Labs']=='Moderna') & (df2['Company and Scientific Name1']=='Moderna_mRNA-1273')]
sinovac = df2[(df2['Labs']=='Sinovac') & (df2['Company and Scientific Name1']=='Sinovac_Coronavac')]
sanofi = df2[(df2['Labs']=='Sanofi/GSK') & (df2['Company and Scientific Name1']=='Sanofi-GSK_SARS-CoV-2 Vaccine')]
sputnik = df2[(df2['Labs']=='SputnikV') & (df2['Company and Scientific Name1']=='Gamaleya Research Institute_Sputnik V')]

# Concatenate the per-manufacturer dataframes and replace df2 with the result
pdList = [pfizer, jj, oxf, mod, sinovac, sanofi, sputnik]
df2 = pd.concat(pdList)
df2

Unnamed: 0,COUNTRY,Labs,vaccines,Company and Scientific Name1,"Potential (1=yes, 0=no)1",Population,Percent_pop_covered,Vaccines_bought,People_covered
2,African Union,Pfizer/BioNTech,52000000.0,Pfizer-BioNTech_BNT162,Confirmed,1.335921e+09,10.389106,52000000.0,26000000.0
9,Albania,Pfizer/BioNTech,500000.0,Pfizer-BioNTech_BNT162,Confirmed,2.854191e+06,8.759049,500000.0,250000.0
16,Australia,Pfizer/BioNTech,20000000.0,Pfizer-BioNTech_BNT162,Confirmed,2.536431e+07,39.425481,20000000.0,10000000.0
46,COVAX,Pfizer/BioNTech,40000000.0,Pfizer-BioNTech_BNT162,Confirmed,5.047561e+06,396.230972,40000000.0,20000000.0
63,Canada,Pfizer/BioNTech,40000000.0,Pfizer-BioNTech_BNT162,Confirmed,3.758926e+07,53.206684,40000000.0,20000000.0
...,...,...,...,...,...,...,...,...,...
267,Palestine,SputnikV,10000.0,Gamaleya Research Institute_Sputnik V,Confirmed,5.168185e+06,0.096746,10000.0,5000.0
294,Serbia,SputnikV,2000000.0,Gamaleya Research Institute_Sputnik V,Confirmed,6.944975e+06,14.398900,2000000.0,1000000.0
387,Uzbekistan,SputnikV,35000000.0,Gamaleya Research Institute_Sputnik V,Confirmed,3.358065e+07,52.113345,35000000.0,17500000.0
388,Venezuela,SputnikV,10000000.0,Gamaleya Research Institute_Sputnik V,Confirmed,2.851583e+07,17.534121,10000000.0,5000000.0
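
The per-manufacturer filters above all apply the same rule: keep a row only when its short label in `Labs` corresponds to its source name in `Company and Scientific Name1`. One possible way to express that rule in a single step, sketched on a toy frame, is to map the source names to short labels and compare. `LAB_NAMES` and `merged` are hypothetical names; the two dictionary entries are taken from the rename step earlier in the notebook.

```python
import pandas as pd

# Source name -> short label (a subset of the notebook's rename mapping).
LAB_NAMES = {
    "Pfizer-BioNTech_BNT162": "Pfizer/BioNTech",
    "Moderna_mRNA-1273": "Moderna",
}

# Hypothetical stand-in for the merged frame: each lab label is paired with
# every deal of the same country, so half of the pairs are mismatched.
merged = pd.DataFrame({
    "COUNTRY": ["A", "A", "A", "A"],
    "Labs": ["Pfizer/BioNTech", "Pfizer/BioNTech", "Moderna", "Moderna"],
    "Company and Scientific Name1": [
        "Pfizer-BioNTech_BNT162", "Moderna_mRNA-1273",
        "Pfizer-BioNTech_BNT162", "Moderna_mRNA-1273",
    ],
})

# Unmapped source names become NaN and therefore never match a label.
matches = merged["Company and Scientific Name1"].map(LAB_NAMES) == merged["Labs"]
filtered = merged[matches]
print(filtered.index.tolist())
```

Unlike the concatenation above, this keeps the rows in their input order rather than grouping them by manufacturer.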


In [64]:
df2 = df2.sort_values(by = 'COUNTRY').drop(['Company and Scientific Name1', 'Vaccines_bought'], axis = 1).reset_index(drop = True)

In [65]:
df2['COUNTRY'].unique()

array(['African Union', 'Albania', 'Argentina', 'Australia', 'Azerbaijan',
       'Bangladesh', 'Bolivia', 'Brazil', 'COVAX', 'Canada', 'Chile',
       'China', 'Colombia', 'Costa Rica', 'Dominican Republic', 'Ecuador',
       'El Salvador', 'European Union', 'Hong Kong', 'India', 'Indonesia',
       'Iraq', 'Israel', 'Japan', 'Jordan', 'Kazakhstan', 'Kuwait',
       'Latin America w/o Brazil', 'Lebanon', 'Malaysia', 'Mexico',
       'Nepal', 'New Zealand', 'North Macedonia', 'Oman', 'Palestine',
       'Panama', 'Peru', 'Philippines', 'Saudi Arabia', 'Serbia',
       'South Korea', 'Taiwan', 'Thailand', 'Turkey', 'UK', 'USA',
       'Ukraine', 'Uruguay', 'Uzbekistan', 'Venezuela', 'Vietnam'],
      dtype=object)

In [66]:
data

Unnamed: 0,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",Population in 2020 (Millions),GDP p/capita
0,Afghanistan,19.006,38.055,499.441
1,Albania,14.034,2.865,4898.277
2,Algeria,147.323,44.227,3331.076
3,Angola,62.724,31.031,2021.310
4,Antigua and Barbuda,1.389,0.098,14158.571
...,...,...,...,...
187,Vietnam,340.602,97.384,3497.512
188,West Bank and Gaza,14.750,5.097,2894.069
189,Yemen,20.948,32.471,645.126
190,Zambia,18.909,18.882,1001.440


In [67]:
data = data.drop('Population in 2020 (Millions)', axis = 1).rename(columns = {'Country': 'COUNTRY'})
data

Unnamed: 0,COUNTRY,"GDP, current prices, in 2020 (Bilions U.S. dollar)",GDP p/capita
0,Afghanistan,19.006,499.441
1,Albania,14.034,4898.277
2,Algeria,147.323,3331.076
3,Angola,62.724,2021.310
4,Antigua and Barbuda,1.389,14158.571
...,...,...,...
187,Vietnam,340.602,3497.512
188,West Bank and Gaza,14.750,2894.069
189,Yemen,20.948,645.126
190,Zambia,18.909,1001.440


In [68]:
df2 = df2.merge(data, on = 'COUNTRY', how = 'left')
df2

Unnamed: 0,COUNTRY,Labs,vaccines,"Potential (1=yes, 0=no)1",Population,Percent_pop_covered,People_covered,"GDP, current prices, in 2020 (Bilions U.S. dollar)",GDP p/capita
0,African Union,Pfizer/BioNTech,52000000.0,Confirmed,1.335921e+09,10.389106,26000000.0,,
1,African Union,Oxford-AstraZeneca,500000000.0,Confirmed,1.359998e+09,18.382375,250000000.0,,
2,African Union,Jhonson&Jhonson,120000000.0,Confirmed,1.359998e+09,8.823540,120000000.0,,
3,Albania,Pfizer/BioNTech,500000.0,Confirmed,2.854191e+06,8.759049,250000.0,14.034,4898.277
4,Argentina,Oxford-AstraZeneca,22000000.0,Confirmed,4.493871e+07,24.477782,11000000.0,382.760,8433.039
...,...,...,...,...,...,...,...,...,...
105,Uruguay,Sinovac,1700000.0,Confirmed,3.461734e+06,24.554169,850000.0,54.135,15331.717
106,Uzbekistan,SputnikV,35000000.0,Confirmed,3.358065e+07,52.113345,17500000.0,59.771,1762.856
107,Venezuela,SputnikV,10000000.0,Confirmed,2.851583e+07,17.534121,5000000.0,48.610,1739.112
108,Vietnam,Oxford-AstraZeneca,30000000.0,Confirmed,9.646211e+07,15.550148,15000000.0,340.602,3497.512


In [69]:
df2['COUNTRY'].unique()

array(['African Union', 'Albania', 'Argentina', 'Australia', 'Azerbaijan',
       'Bangladesh', 'Bolivia', 'Brazil', 'COVAX', 'Canada', 'Chile',
       'China', 'Colombia', 'Costa Rica', 'Dominican Republic', 'Ecuador',
       'El Salvador', 'European Union', 'Hong Kong', 'India', 'Indonesia',
       'Iraq', 'Israel', 'Japan', 'Jordan', 'Kazakhstan', 'Kuwait',
       'Latin America w/o Brazil', 'Lebanon', 'Malaysia', 'Mexico',
       'Nepal', 'New Zealand', 'North Macedonia', 'Oman', 'Palestine',
       'Panama', 'Peru', 'Philippines', 'Saudi Arabia', 'Serbia',
       'South Korea', 'Taiwan', 'Thailand', 'Turkey', 'UK', 'USA',
       'Ukraine', 'Uruguay', 'Uzbekistan', 'Venezuela', 'Vietnam'],
      dtype=object)

In [70]:
df2.to_csv('Teste_mapa.csv')

In [71]:
null_data = df2[df2.isnull().any(axis=1)]
null_data

Unnamed: 0,COUNTRY,Labs,vaccines,"Potential (1=yes, 0=no)1",Population,Percent_pop_covered,People_covered,"GDP, current prices, in 2020 (Bilions U.S. dollar)",GDP p/capita
0,African Union,Pfizer/BioNTech,52000000.0,Confirmed,1335921000.0,10.389106,26000000.0,,
1,African Union,Oxford-AstraZeneca,500000000.0,Confirmed,1359998000.0,18.382375,250000000.0,,
2,African Union,Jhonson&Jhonson,120000000.0,Confirmed,1359998000.0,8.82354,120000000.0,,
15,COVAX,Oxford-AstraZeneca,170000000.0,Confirmed,5047561.0,1683.98163,85000000.0,,
16,COVAX,Pfizer/BioNTech,40000000.0,Confirmed,5047561.0,396.230972,20000000.0,,
17,COVAX,Jhonson&Jhonson,500000000.0,Confirmed,50339440.0,993.256918,500000000.0,,
39,European Union,Moderna,160000000.0,Confirmed,447512000.0,17.876614,80000000.0,,
40,European Union,SputnikV,2000000.0,Confirmed,9769949.0,10.235468,1000000.0,,
41,European Union,Jhonson&Jhonson,200000000.0,Confirmed,447512000.0,44.691535,200000000.0,,
42,European Union,Oxford-AstraZeneca,400000000.0,Confirmed,447512000.0,44.691535,200000000.0,,


In [73]:
countries['COUNTRY'].unique()

array(['Albania', 'Algeria', 'Andorra', 'Anguilla', 'Argentina',
       'Australia', 'Austria', 'Azerbaijan', 'Bahrain', 'Bangladesh',
       'Barbados', 'Belarus', 'Belgium', 'Belize', 'Bermuda', 'Bolivia',
       'Brazil', 'Bulgaria', 'Cambodia', 'Canada', 'Cayman Islands',
       'Chile', 'China', 'Colombia', 'Costa Rica', 'Croatia', 'Cyprus',
       'Czech Republic', 'Denmark', 'Dominican Republic', 'Ecuador',
       'Egypt', 'El Salvador', 'England', 'Estonia', 'Faeroe Islands',
       'Falkland Islands', 'Finland', 'France', 'Germany', 'Gibraltar',
       'Greece', 'Greenland', 'Guatemala', 'Guernsey', 'Guinea', 'Guyana',
       'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India',
       'Indonesia', 'Iran', 'Ireland', 'Isle of Man', 'Israel', 'Italy',
       'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kuwait', 'Latvia',
       'Lebanon', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao',
       'Malaysia', 'Maldives', 'Malta', 'Mauritius', 'Mexico', 'Monaco',
       'Mongol

In [77]:
## Step 1: build the collection of member countries for each group.

Af_union = set(pro_vaxers[pro_vaxers['Country seperate (group)'] == 'African Union']['Country seperate'])
UK = ['England', 'Scotland', 'Wales', 'Northern Ireland']
EU = set(pro_vaxers[pro_vaxers['Country seperate (group)'] == 'European Union']['Country seperate'])


In [81]:
pro_vaxers[pro_vaxers['Country seperate (group)'] == 'European Union']['Country seperate'].unique()

array(['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus',
       'Czech Republic', 'Denmark', 'Estonia', 'Finland', 'France',
       'Germany', 'Greece', 'Hungary', 'Ireland', 'Italy', 'Latvia',
       'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 'Poland',
       'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain', 'Sweden',
       'Norway', 'Switzerland', 'Iceland'], dtype=object)
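
The group memberships collected in Step 1 could later be used to spread a group-level row over its member countries so that each member can be drawn on the map. The sketch below is only one possible approach and assumes that is the intended next step. `groups`, `deals` and the `member` column are hypothetical names, the quantities are invented, and only the UK member list comes from the notebook.

```python
import pandas as pd

# Group name -> member countries (UK list as defined in Step 1).
groups = {"UK": ["England", "Scotland", "Wales", "Northern Ireland"]}

# Hypothetical deals table with one group row and one ordinary country row.
deals = pd.DataFrame({"COUNTRY": ["UK", "Albania"], "vaccines": [100, 5]})

# Groups get their member list; ordinary countries map to themselves.
deals["member"] = [groups.get(c, [c]) for c in deals["COUNTRY"]]

# explode() turns each list element into its own row, repeating other columns.
exploded = deals.explode("member", ignore_index=True)
print(exploded["member"].tolist())
```

Note that the group-level quantity is repeated on every member row, so it should not be summed across members.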

### Sankey Chart

In [None]:


### First, define a unique index for every node in the Sankey chart

# One node per lab and per country; dict.fromkeys removes duplicates while
# keeping first-seen order.
all_nodes = list(dict.fromkeys(df2.Labs.tolist() + df2.COUNTRY.tolist()))
node_index = {node: i for i, node in enumerate(all_nodes)}

source_indices = [node_index[lab] for lab in df2.Labs]
target_indices = [node_index[country] for country in df2.COUNTRY]


### colors
colors = pex.colors.sequential.haline

# One random color per node; each link takes the color of its source lab.
node_colors_mappings = {node: np.random.choice(colors) for node in all_nodes}
node_colors = [node_colors_mappings[node] for node in all_nodes]
edge_colors = [node_colors_mappings[lab] for lab in df2.Labs]


fig = go.Figure(data=[go.Sankey(
    arrangement="snap",
    orientation="h",

    # Define nodes (label and color must have one entry per node)
    node=dict(
        pad=45,
        thickness=15,
        line=dict(color="black", width=1.0),
        label=all_nodes,
        color=node_colors),

    # Add links (source, target, value and color have one entry per row of df2)
    link=dict(
        source=source_indices,
        target=target_indices,
        value=df2.vaccines,
        color=edge_colors),
)])

fig.update_layout(title_text="Vaccines", font_size=10, height=1250, width=1000,
                  plot_bgcolor='black', paper_bgcolor='white')

fig.show()