# Data Visualization Project - Data Engineering

This notebook contains all the data manipulations we will perform throughout the development of the Covid-19 poster project for the Data Visualization curricular unit.

The goal of the project is to show that there is flagrant inequality in access to Covid-19 vaccines between developed and developing countries. To do that, we will combine data from different sources and output a solid dataset that can be used in a data visualization tool such as Tableau or Microsoft Power BI.

#### Brief outline of the desired columns and the sources used:

#### Part 1 - General country information and representation

1. Country Name
2. Location (polygon geometry, to allow map representation)
Source: World map shapefile, a file with the data needed for world map visualization: https://hub.arcgis.com/datasets/2b93b06dc0dc4e809d3c8db5cb96ba69_0

3. GDP 
4. Population 
5. GDP per capita
Sources: IMF, World Bank: https://www.imf.org/en/Publications/WEO/weo-database/2020/October

#### Part 2 - Covid Vaccine Data

Contracted quantity by manufacturer:
https://launchandscalefaster.org/COVID-19

Specifically:
https://public.tableau.com/vizql/w/TimelineofCOVIDVaccineProcurementDeals_16125539354560/v/Dashboard1/viewData/sessions/BD1E18003B5448B88669524972EB60A5-0:0/views/16126187992227925297_15952188591581136529?maxrows=200&viz=%7B%22worksheet%22%3A%22Sheet%201%22%2C%22dashboard%22%3A%22Dashboard%201%22%7D

Vaccination by country (number of doses administered): https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations

Vaccination by manufacturer (doses administered, not purchased): https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations-by-manufacturer.csv

Price of vaccines (UNICEF; we may rely on this source):
https://app.powerbi.com/view?r=eyJrIjoiNmE0YjZiNzUtZjk2OS00ZTg4LThlMzMtNTRhNzE0NzA4YmZlIiwidCI6Ijc3NDEwMTk1LTE0ZTEtNGZiOC05MDRiLWFiMTg5MjAyMzY2NyIsImMiOjh9&pageName=ReportSectiona329b3eafd86059a947b

Scheduled (expected) date of the first vaccine deliveries

#### Part 3 - The Dream - to be considered only after the data for Parts 1 and 2 has been found.

In countries still without a vaccine, this many people have already died: ........ How many more are we willing to have or accept?
Confirmed deaths
Projected deaths until the country is expected to have the vaccine - not available

Delivery deadline
Expected contract date
Quantities actually delivered in each time period!

In [13]:
!pip install openpyxl

Collecting openpyxl
  Downloading openpyxl-3.0.6-py2.py3-none-any.whl (242 kB)
Collecting et-xmlfile
  Downloading et_xmlfile-1.0.1.tar.gz (8.4 kB)
Collecting jdcal
  Downloading jdcal-1.4.1-py2.py3-none-any.whl (9.5 kB)
Building wheels for collected packages: et-xmlfile
  Building wheel for et-xmlfile (setup.py): started
  Building wheel for et-xmlfile (setup.py): finished with status 'done'
  Created wheel for et-xmlfile: filename=et_xmlfile-1.0.1-py3-none-any.whl size=8913 sha256=572da4af1280e1e96f6b8ee71966b994e1dd2436e7c7d3057d57d245d177b793
  Stored in directory: c:\users\henrique costa\appdata\local\pip\cache\wheels\6e\df\38\abda47b884e3e25f9f9b6430e5ce44c47670758a50c0c51759
Successfully built et-xmlfile
Installing collected packages: jdcal, et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.0.1 jdcal-1.4.1 openpyxl-3.0.6


In [1]:
import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from math import ceil
import warnings

warnings.filterwarnings("ignore")
import matplotlib.gridspec as gspec

In [2]:
# Note: the alias is `gdp` (not the conventional `gpd`); later cells rely on it.
import geopandas as gdp

In [12]:
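# Base URL of the folder holding the datasets provided in Moodle.
# NOTE: raw.githubusercontent.com URLs normally have the form
# <owner>/<repo>/<branch>/<path>, with no 'blob' segment; adjust the base below
# to your own copy of the data if the request fails.
path_datasets = 'https://raw.githubusercontent.com/Data-Visualization/blob/main/Datasets%20and%20SHPs/'
path_general = path_datasets + 'general/'

pop = pd.read_excel(path_general + 'POP.xlsx')
pop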

HTTPError: HTTP Error 404: Not Found
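
The 404 is consistent with a malformed raw-file address: a `github.com` page URL contains a `blob` segment, while the matching `raw.githubusercontent.com` URL does not. As a minimal sketch, the helper below converts one form into the other. The function name `github_blob_to_raw` and the owner `some-user` are hypothetical and used only for illustration; the real repository owner is not given in this notebook.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url: str) -> str:
    """Convert a github.com 'blob' page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<branch>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, branch, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, branch, *path])


# Hypothetical owner name; only the folder layout mirrors the cell above.
page_url = (
    "https://github.com/some-user/Data-Visualization/blob/main/"
    "Datasets%20and%20SHPs/general/POP.xlsx"
)
raw_url = github_blob_to_raw(page_url)
print(raw_url)
```

The resulting string can be passed to `pd.read_excel` in place of a hand-assembled path, or the files can simply be read from a local folder, as the later cells do.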

In [51]:
pop

Unnamed: 0,Country,Population in 2020 (Millions)
0,Afghanistan,38.055
1,Albania,2.865
2,Algeria,44.227
3,Angola,31.031
4,Antigua and Barbuda,0.098
...,...,...
187,Vietnam,97.384
188,West Bank and Gaza,5.097
189,Yemen,32.471
190,Zambia,18.882


In [52]:
GDP = pd.read_excel("GDP,US.xls.xlsx")

In [53]:
GDP

Unnamed: 0,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",GDP p/capita
0,Afghanistan,19.006,499.441
1,Albania,14.034,4898.277
2,Algeria,147.323,3331.076
3,Angola,62.724,2021.310
4,Antigua and Barbuda,1.389,14158.571
...,...,...,...
187,Vietnam,340.602,3497.512
188,West Bank and Gaza,14.750,2894.069
189,Yemen,20.948,645.126
190,Zambia,18.909,1001.440


In [54]:
data = pd.merge(GDP, pop, on='Country')

In [55]:
data

Unnamed: 0,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",GDP p/capita,Population in 2020 (Millions)
0,Afghanistan,19.006,499.441,38.055
1,Albania,14.034,4898.277,2.865
2,Algeria,147.323,3331.076,44.227
3,Angola,62.724,2021.310,31.031
4,Antigua and Barbuda,1.389,14158.571,0.098
...,...,...,...,...
187,Vietnam,340.602,3497.512,97.384
188,West Bank and Gaza,14.750,2894.069,5.097
189,Yemen,20.948,645.126,32.471
190,Zambia,18.909,1001.440,18.882


In [56]:
data = data[['Country','GDP, current prices, in  2020 (Bilions U.S. dollar)','Population in 2020 (Millions)','GDP p/capita']]

In [57]:
data

Unnamed: 0,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",Population in 2020 (Millions),GDP p/capita
0,Afghanistan,19.006,38.055,499.441
1,Albania,14.034,2.865,4898.277
2,Algeria,147.323,44.227,3331.076
3,Angola,62.724,31.031,2021.310
4,Antigua and Barbuda,1.389,0.098,14158.571
...,...,...,...,...
187,Vietnam,340.602,97.384,3497.512
188,West Bank and Gaza,14.750,5.097,2894.069
189,Yemen,20.948,32.471,645.126
190,Zambia,18.909,18.882,1001.440
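
Because GDP is reported in billions of U.S. dollars and population in millions, GDP per capita should be close to GDP × 10⁹ / (population × 10⁶). The sketch below checks this on the five leading rows shown above. The short column names and the 1% tolerance are assumptions made for the example, not part of the original dataset.

```python
import pandas as pd

# Values copied from the first five rows of the table above.
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "Angola", "Antigua and Barbuda"],
    "gdp_billions_usd": [19.006, 14.034, 147.323, 62.724, 1.389],
    "population_millions": [38.055, 2.865, 44.227, 31.031, 0.098],
    "gdp_per_capita_usd": [499.441, 4898.277, 3331.076, 2021.310, 14158.571],
})

# Convert both columns to base units before dividing.
recomputed = (sample["gdp_billions_usd"] * 1e9) / (sample["population_millions"] * 1e6)

# Relative gap between the recomputed and the reported GDP per capita.
rel_diff = (recomputed - sample["gdp_per_capita_usd"]).abs() / sample["gdp_per_capita_usd"]

# Assumed tolerance of 1% to absorb rounding in the published figures.
print(sample.assign(recomputed=recomputed.round(3), within_1pct=rel_diff < 0.01))
```

A row that fails the check would point to a unit mismatch or a mis-joined country rather than to rounding.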


In [58]:
geo = gdp.read_file('World_Countries__Generalized_.shp')

In [59]:
geo = geo[['COUNTRY','geometry']]

In [60]:
geo

Unnamed: 0,COUNTRY,geometry
0,American Samoa,"POLYGON ((-170.74390 -14.37555, -170.74942 -14..."
1,United States Minor Outlying Islands,"MULTIPOLYGON (((-160.02114 -0.39805, -160.0281..."
2,Cook Islands,"MULTIPOLYGON (((-159.74698 -21.25667, -159.793..."
3,French Polynesia,"MULTIPOLYGON (((-149.17920 -17.87084, -149.258..."
4,Niue,"POLYGON ((-169.89389 -19.14556, -169.93088 -19..."
...,...,...
244,Northern Mariana Islands,"MULTIPOLYGON (((145.73468 15.08722, 145.72830 ..."
245,Palau,"MULTIPOLYGON (((134.53137 7.35444, 134.52234 7..."
246,Russian Federation,"MULTIPOLYGON (((-179.99999 68.98010, -179.9580..."
247,Spain,"MULTIPOLYGON (((-2.91472 35.27361, -2.93924 35..."


In [61]:
exp = pd.merge(data, geo, left_on='Country', right_on='COUNTRY')

In [62]:
exp

Unnamed: 0,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",Population in 2020 (Millions),GDP p/capita,COUNTRY,geometry
0,Afghanistan,19.006,38.055,499.441,Afghanistan,"POLYGON ((61.27655 35.60725, 61.29638 35.62854..."
1,Albania,14.034,2.865,4898.277,Albania,"POLYGON ((19.57083 41.68527, 19.58195 41.69569..."
2,Algeria,147.323,44.227,3331.076,Algeria,"POLYGON ((4.60335 36.88791, 4.63555 36.88638, ..."
3,Angola,62.724,31.031,2021.310,Angola,"MULTIPOLYGON (((23.47611 -17.62584, 23.28916 -..."
4,Antigua and Barbuda,1.389,0.098,14158.571,Antigua and Barbuda,"MULTIPOLYGON (((-61.73806 16.98972, -61.82917 ..."
...,...,...,...,...,...,...
181,Venezuela,48.610,27.951,1739.112,Venezuela,"MULTIPOLYGON (((-66.31029 10.62602, -66.28309 ..."
182,Vietnam,340.602,97.384,3497.512,Vietnam,"MULTIPOLYGON (((107.07896 17.10804, 107.08333 ..."
183,Yemen,20.948,32.471,645.126,Yemen,"MULTIPOLYGON (((47.25445 13.61528, 47.16888 13..."
184,Zambia,18.909,18.882,1001.440,Zambia,"POLYGON ((30.21302 -14.98172, 30.21916 -15.096..."


In [63]:
a = exp['Country'].to_list()

In [64]:
b = data['Country'].to_list()

In [65]:
# Countries present in `data` but missing from `exp`, i.e. dropped by the merge for lack of a matching shapefile entry
print([x for x in b if x not in set(a)])

['Hong Kong SAR', 'Korea', 'Kosovo', 'Macao SAR', 'Taiwan Province of China', 'West Bank and Gaza']
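
One way to recover such rows is to harmonise the names before merging and to use a left merge with `indicator=True`, which flags any rows that are still unmatched. The sketch below uses toy frames. The mapping `name_fixes` (for example `"Korea"` to `"South Korea"`) is a hypothetical illustration; the actual spellings in the shapefile would have to be looked up in `geo['COUNTRY']`.

```python
import pandas as pd

# Toy stand-ins for `data` and `geo`; geometry is replaced by placeholder strings.
data_demo = pd.DataFrame({"Country": ["Albania", "Korea", "Kosovo"]})
geo_demo = pd.DataFrame({
    "COUNTRY": ["Albania", "South Korea"],
    "geometry": ["<polygon>", "<polygon>"],
})

# Hypothetical mapping from IMF-style names to shapefile names.
name_fixes = {"Korea": "South Korea"}
data_demo["join_name"] = data_demo["Country"].replace(name_fixes)

# A left merge keeps every row of data_demo; `_merge` records where each row matched.
merged = data_demo.merge(
    geo_demo, how="left", left_on="join_name", right_on="COUNTRY", indicator=True
)

# Rows with no shapefile counterpart even after renaming.
unmatched = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()
print(unmatched)
```

Unlike the default inner merge, this keeps unmatched countries in the table, with missing geometry, so they can be inspected or fixed instead of silently disappearing.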


In [66]:
exp

Unnamed: 0,Country,"GDP, current prices, in 2020 (Bilions U.S. dollar)",Population in 2020 (Millions),GDP p/capita,COUNTRY,geometry
0,Afghanistan,19.006,38.055,499.441,Afghanistan,"POLYGON ((61.27655 35.60725, 61.29638 35.62854..."
1,Albania,14.034,2.865,4898.277,Albania,"POLYGON ((19.57083 41.68527, 19.58195 41.69569..."
2,Algeria,147.323,44.227,3331.076,Algeria,"POLYGON ((4.60335 36.88791, 4.63555 36.88638, ..."
3,Angola,62.724,31.031,2021.310,Angola,"MULTIPOLYGON (((23.47611 -17.62584, 23.28916 -..."
4,Antigua and Barbuda,1.389,0.098,14158.571,Antigua and Barbuda,"MULTIPOLYGON (((-61.73806 16.98972, -61.82917 ..."
...,...,...,...,...,...,...
181,Venezuela,48.610,27.951,1739.112,Venezuela,"MULTIPOLYGON (((-66.31029 10.62602, -66.28309 ..."
182,Vietnam,340.602,97.384,3497.512,Vietnam,"MULTIPOLYGON (((107.07896 17.10804, 107.08333 ..."
183,Yemen,20.948,32.471,645.126,Yemen,"MULTIPOLYGON (((47.25445 13.61528, 47.16888 13..."
184,Zambia,18.909,18.882,1001.440,Zambia,"POLYGON ((30.21302 -14.98172, 30.21916 -15.096..."
