# Notebook to download data

This notebook downloads data for COVID-related analysis from several online sources.

First of all, we need to import a few useful libraries.

In [92]:
import pandas as pd
import os
from pandasdmx import Request 
import eurostat

Make sure all packages are installed. If one is missing, run `pip install <package-name>` in the console or terminal. We can now start to get the data.
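As a quick check before running the rest of the notebook, the snippet below lists which of the imported packages cannot be found. It is a minimal sketch that uses only the standard library; the helper name `missing_packages` is illustrative and not part of the original notebook, and the install hint assumes the pip names match the import names.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Top-level import names used in this notebook.
required = ["pandas", "pandasdmx", "eurostat"]
missing = missing_packages(required)
if missing:
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```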

## Data From [Johns Hopkins](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series)

Johns Hopkins provides daily data on **confirmed cases**, **deaths** and **recoveries** for many countries in the world. The data can be freely accessed through GitHub.

In [12]:
url_JH_confirmed = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
url_JH_death     = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
url_JH_recovered = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv'
# The first column (Province/State) is used as the index; it holds no dates to parse
df_JH_confirmed  = pd.read_csv(url_JH_confirmed, index_col=0)
df_JH_death      = pd.read_csv(url_JH_death, index_col=0)
df_JH_recovered  = pd.read_csv(url_JH_recovered, index_col=0)

Here is an extract of the confirmed cases data:

In [13]:
df_JH_confirmed.head()

Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,...,3/16/20,3/17/20,3/18/20,3/19/20,3/20/20,3/21/20,3/22/20,3/23/20,3/24/20,3/25/20
,Afghanistan,33.0,65.0,0,0,0,0,0,0,0,...,21,22,22,22,24,24,40,40,74,84
,Albania,41.1533,20.1683,0,0,0,0,0,0,0,...,51,55,59,64,70,76,89,104,123,146
,Algeria,28.0339,1.6596,0,0,0,0,0,0,0,...,54,60,74,87,90,139,201,230,264,302
,Andorra,42.5063,1.5218,0,0,0,0,0,0,0,...,2,39,39,53,75,88,113,133,164,188
,Angola,-11.2027,17.8739,0,0,0,0,0,0,0,...,0,0,0,0,1,2,2,3,3,3
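The table has one column per date, which is awkward for time-series work. The sketch below shows one possible way to reshape it into long format (one row per location and date) with `melt`. The function name `jh_to_long` and the small hand-made sample are illustrative assumptions; the sample reuses a few values from the extract above and assumes `Province/State` is the index, as in the loading cell.

```python
import pandas as pd


def jh_to_long(df, value_name="confirmed"):
    """Reshape a Johns Hopkins wide table (one column per date) into long format."""
    id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
    # Move Province/State from the index back into a regular column before melting.
    long_df = df.reset_index().melt(id_vars=id_cols, var_name="date", value_name=value_name)
    # Column labels such as "3/25/20" are month/day/two-digit year.
    long_df["date"] = pd.to_datetime(long_df["date"], format="%m/%d/%y")
    return long_df


# Tiny sample with the same layout as df_JH_confirmed (hypothetical subset).
sample = pd.DataFrame(
    {
        "Country/Region": ["Afghanistan", "Albania"],
        "Lat": [33.0, 41.1533],
        "Long": [65.0, 20.1683],
        "3/24/20": [74, 123],
        "3/25/20": [84, 146],
    },
    index=pd.Index([None, None], name="Province/State"),
)
long_sample = jh_to_long(sample)
print(long_sample)
```

On the real data the call would be `jh_to_long(df_JH_confirmed)`, with `value_name` changed for the deaths and recovered tables.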


## Data From [World Population Prospects](https://population.un.org/wpp/Download/Standard/CSV/)

We can use the same method as before to conveniently download the data from the United Nations website. Downloading the data takes some time.

In [20]:
url_WPP_tot_pop     = 'https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/CSV_FILES/WPP2019_TotalPopulationBySex.csv'
url_WPP_pop_age_sex = 'https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/CSV_FILES/WPP2019_PopulationByAgeSex_Medium.csv'
url_WPP_feritlity   = 'https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/CSV_FILES/WPP2019_Fertility_by_Age.csv'
# The first column (LocID) is used as the index; it is a numeric code, not a date
df_WPP_tot_pop      = pd.read_csv(url_WPP_tot_pop, index_col=0)
df_WPP_pop_age_sex  = pd.read_csv(url_WPP_pop_age_sex, index_col=0)
df_WPP_feritlity    = pd.read_csv(url_WPP_feritlity, index_col=0)

Here is an extract of the total population data:

In [22]:
df_WPP_tot_pop.head()

LocID,Location,VarID,Variant,Time,MidPeriod,PopMale,PopFemale,PopTotal,PopDensity
4,Afghanistan,2,Medium,1950,1950.5,4099.243,3652.874,7752.117,11.874
4,Afghanistan,2,Medium,1951,1951.5,4134.756,3705.395,7840.151,12.009
4,Afghanistan,2,Medium,1952,1952.5,4174.45,3761.546,7935.996,12.156
4,Afghanistan,2,Medium,1953,1953.5,4218.336,3821.348,8039.684,12.315
4,Afghanistan,2,Medium,1954,1954.5,4266.484,3884.832,8151.316,12.486
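Because these files are large, it can help to download each one once and read the local copy on later runs. The sketch below is one way to do that, not part of the original notebook: the helper `read_csv_cached` and the `cache/` folder name are hypothetical, and the raw file is stored unchanged so that any `pd.read_csv` options behave the same on the cached copy.

```python
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd


def read_csv_cached(url, cache_path, **read_csv_kwargs):
    """Download url to cache_path on first use, then always read the local copy."""
    cache_path = Path(cache_path)
    if not cache_path.exists():
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(cache_path))  # Stores the file exactly as served
    return pd.read_csv(cache_path, **read_csv_kwargs)


# Example (not run here); url_WPP_tot_pop is defined in the cell above:
# df_WPP_tot_pop = read_csv_cached(url_WPP_tot_pop, "cache/WPP2019_TotalPopulationBySex.csv", index_col=0)
```

Deleting the cached file forces a fresh download on the next call.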


## Data From [Eurostat](https://ec.europa.eu/eurostat/web/population-demography-migration-projections/data/database)

The `eurostat` library makes it possible to download demographic data in one line of code. We download population data as well as national accounts data. The data need to be cleaned before later use.

In [88]:
data_EU_pop        = eurostat.get_data_df('demo_r_gind3')   # Population
data_EU_gdp        = eurostat.get_data_df('nama_10_gdp')    # GDP
data_EU_cons       = eurostat.get_data_df('nama_10_fcs')    # Consumption
data_EU_trade      = eurostat.get_data_df('nama_10_exi')    # Import export
data_EU_short_rate = eurostat.get_data_df('irt_st_a')       # Short term int rates on govt bonds
data_EU_long_rate  = eurostat.get_data_df('irt_lt_gby10_a') # 10 y int rates on govt bonds
data_EU_unemp      = eurostat.get_data_df('une_rt_a')       # Unemployment by sex and age
data_EU_inv        = eurostat.get_data_df('nama_10_an6')    # Gross capital formation (gross investment)

Here is an extract of the GDP data:

In [89]:
data_EU_gdp.head()

,unit,na_item,geo\time,2019,2018,2017,2016,2015,2014,2013,...,1984,1983,1982,1981,1980,1979,1978,1977,1976,1975
0,CLV05_MEUR,B1G,AL,,8909.9,8556.9,8237.4,7982.7,7805.2,7642.3,...,,,,,,,,,,
1,CLV05_MEUR,B1G,AT,278666.6,274012.8,266975.6,260239.3,255017.1,252879.7,251167.1,...,,,,,,,,,,
2,CLV05_MEUR,B1G,BA,,10040.3,10132.1,9809.2,9509.7,9223.0,9117.3,...,,,,,,,,,,
3,CLV05_MEUR,B1G,BE,340098.7,335501.2,330669.4,324396.7,320627.1,313684.5,308495.6,...,,,,,,,,,,
4,CLV05_MEUR,B1G,BG,30629.9,29749.3,28750.3,27602.5,26661.0,25768.7,25219.2,...,,,,,,,,,,
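As an illustration of the cleaning step mentioned above, the sketch below reshapes a Eurostat table from one column per year into long format and drops missing values. It assumes the layout shown in the extract (a `geo\time`-style column plus year columns); the function name `eurostat_to_long` and the two-row sample are hypothetical, and column names may differ between versions of the `eurostat` package.

```python
import pandas as pd


def eurostat_to_long(df, value_name="value"):
    """Reshape a Eurostat wide table (one column per year) into long format."""
    # Rename a column such as "geo\time" to "geo" by keeping the part before the backslash.
    renames = {c: c.split("\\")[0] for c in df.columns if isinstance(c, str) and "\\" in c}
    df = df.rename(columns=renames)
    year_cols = [c for c in df.columns if str(c).isdigit()]
    id_cols = [c for c in df.columns if c not in year_cols]
    long_df = df.melt(id_vars=id_cols, value_vars=year_cols, var_name="year", value_name=value_name)
    long_df["year"] = long_df["year"].astype(int)
    # Drop year/country pairs with no observation and order the rows.
    long_df = long_df.dropna(subset=[value_name])
    return long_df.sort_values(id_cols + ["year"]).reset_index(drop=True)


# Two rows copied from the extract above, restricted to two years.
sample = pd.DataFrame(
    {
        "unit": ["CLV05_MEUR", "CLV05_MEUR"],
        "na_item": ["B1G", "B1G"],
        "geo\\time": ["AL", "AT"],
        "2019": [None, 278666.6],
        "2018": [8909.9, 274012.8],
    }
)
tidy = eurostat_to_long(sample)
print(tidy)
```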


## Save all data

Run the lines below **only** if you need to save data locally. First of all, change your working directory to the path where you want data to be saved. 

In [None]:
print(os.getcwd())  # Current directory
os.chdir(NEW_PATH)  # NEW_PATH is a placeholder: replace it with the target directory
print(os.getcwd())  # New directory

In [None]:
# to_csv returns None when given a path, so the result is not assigned back to the variables.
# The Johns Hopkins and WPP frames keep Province/State and LocID in the index, so the index is written too.
df_JH_confirmed.to_csv("df_JH_confirmed.csv")
df_JH_death.to_csv("df_JH_death.csv")
df_JH_recovered.to_csv("df_JH_recovered.csv")
df_WPP_tot_pop.to_csv("df_WPP_tot_pop.csv")
df_WPP_pop_age_sex.to_csv("df_WPP_pop_age_sex.csv")
df_WPP_feritlity.to_csv("df_WPP_feritlity.csv")
# The Eurostat frames have a default integer index, which is not needed in the files.
data_EU_pop.to_csv("data_EU_pop.csv", index=False)
data_EU_gdp.to_csv("data_EU_gdp.csv", index=False)
data_EU_cons.to_csv("data_EU_cons.csv", index=False)
data_EU_trade.to_csv("data_EU_trade.csv", index=False)
data_EU_short_rate.to_csv("data_EU_short_rate.csv", index=False)
data_EU_long_rate.to_csv("data_EU_long_rate.csv", index=False)
data_EU_unemp.to_csv("data_EU_unemp.csv", index=False)
data_EU_inv.to_csv("data_EU_inv.csv", index=False)
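An alternative to changing the working directory is to write every table into an explicit output folder. The sketch below shows one way this could look; the helper `save_frames` and the folder name `data` are hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def save_frames(frames, out_dir, keep_index=True):
    """Write each DataFrame in a {name: DataFrame} dict to out_dir/<name>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in frames.items():
        path = out_dir / f"{name}.csv"
        df.to_csv(path, index=keep_index)
        written.append(path)
    return written


# Example (not run here), using names defined earlier in this notebook:
# save_frames({"df_JH_confirmed": df_JH_confirmed, "df_WPP_tot_pop": df_WPP_tot_pop}, "data")
# save_frames({"data_EU_gdp": data_EU_gdp}, "data", keep_index=False)
```

Keeping the index matters for the Johns Hopkins and WPP tables, where it holds `Province/State` and `LocID`; for the Eurostat tables it can be dropped.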