Data download phase:

We first need to install pandas-datareader, the companion package that lets pandas read remote data sources:

In [1]:
!pip install pandas-datareader



Import the pandas library:

In [2]:
import pandas as pd

Here pandas-datareader searches the World Bank indicator catalog for the terms we give it:

In [3]:
from pandas_datareader import wb
# matches = wb.search('gdp.*capita.*const')
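
As a rough illustration of how such a search pattern narrows a catalog, the sketch below applies the same regular expression to a small hand-written list of indicator names. The list and the `find_matches` helper are assumptions made for illustration; the real `wb.search` call queries the World Bank catalog over the network and returns a DataFrame.

```python
import re

# Hypothetical sample of indicator names; the real catalog is much larger.
sample_names = [
    "GDP per capita (constant 2015 US$)",
    "GDP per capita (current US$)",
    "GDP per capita, PPP (constant 2017 international $)",
    "Life expectancy at birth, total (years)",
]


def find_matches(pattern, names):
    """Return the names that match the regular expression, ignoring case."""
    regex = re.compile(pattern, flags=re.IGNORECASE)
    return [name for name in names if regex.search(name)]


# Same pattern as in the commented-out search above.
matches = find_matches("gdp.*capita.*const", sample_names)
for name in matches:
    print(name)
```

Only the names that contain "gdp", "capita" and "const" in that order are kept, so the current-price series and the life expectancy series are filtered out.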

You need to know the "indicator" code used by the website we are pulling the data from.
https://data.worldbank.org/indicator/NY.GDP.PCAP.KD

We tell it to fetch GDP per capita.

METADATA:
Indicator Code = NY.GDP.PCAP.KD
Indicator Name = GDP per capita (constant US$)

In [55]:
data_gdp = wb.download(indicator='NY.GDP.PCAP.KD', country=[], start=1960, end=2020)


MultiIndex([('Africa Eastern and Southern', '2020'),
            ('Africa Eastern and Southern', '2019'),
            ('Africa Eastern and Southern', '2018'),
            ('Africa Eastern and Southern', '2017'),
            ('Africa Eastern and Southern', '2016'),
            ('Africa Eastern and Southern', '2015'),
            ('Africa Eastern and Southern', '2014'),
            ('Africa Eastern and Southern', '2013'),
            ('Africa Eastern and Southern', '2012'),
            ('Africa Eastern and Southern', '2011'),
            ...
            (                   'Zimbabwe', '1969'),
            (                   'Zimbabwe', '1968'),
            (                   'Zimbabwe', '1967'),
            (                   'Zimbabwe', '1966'),
            (                   'Zimbabwe', '1965'),
            (                   'Zimbabwe', '1964'),
            (                   'Zimbabwe', '1963'),
            (                   'Zimbabwe', '1962'),
            (                 

Print to verify:

In [57]:
print(data_gdp)

                                  NY.GDP.PCAP.KD
country                     year                
Africa Eastern and Southern 2020     1418.380523
                            2019     1499.256275
                            2018     1507.861055
                            2017     1507.821357
                            2016     1501.671310
...                                          ...
Zimbabwe                    1964     1152.997692
                            1963     1206.107233
                            1962     1174.431444
                            1961     1197.603795
                            1960     1164.740250

[16226 rows x 1 columns]
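
The download is indexed by a (country, year) MultiIndex whose year labels are strings. The sketch below shows a few ways to slice a frame of that shape; `toy_gdp` and its values are made up for illustration and are not World Bank figures.

```python
import pandas as pd

# Toy frame with the same shape as the download: (country, year) MultiIndex,
# with years stored as strings. The values are invented for illustration.
index = pd.MultiIndex.from_product(
    [["Mexico", "Zimbabwe"], ["2020", "2019"]], names=["country", "year"]
)
toy_gdp = pd.DataFrame({"NY.GDP.PCAP.KD": [10.0, 11.0, 20.0, 21.0]}, index=index)

# All years for one country.
one_country = toy_gdp.loc["Zimbabwe"]

# One year across all countries; note that the year label is a string.
one_year = toy_gdp.xs("2019", level="year")

# Move the index levels into ordinary columns when a flat table is easier.
flat = toy_gdp.reset_index()

print(one_country)
print(one_year)
print(flat.columns.tolist())
```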


In [6]:
# This is one way to query specific countries
# data_gdp = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'CA', 'MX'], start=1980, end=2020)

Next we download "Life expectancy".

METADATA:
Indicator Code = SP.DYN.LE00.IN
Indicator Name = Life expectancy at birth, total (years)

In [7]:
data_life = wb.download(indicator='SP.DYN.LE00.IN', country=[], start=1960, end=2020)
print(data_life)

                                  SP.DYN.LE00.IN
country                     year                
Africa Eastern and Southern 2020             NaN
                            2019       64.005197
                            2018       63.648988
                            2017       63.246264
                            2016       62.787681
...                                          ...
Zimbabwe                    1964       54.849000
                            1963       54.403000
                            1962       53.946000
                            1961       53.483000
                            1960       53.019000

[16226 rows x 1 columns]


Next we download the infant mortality data (the indicator used is the neonatal mortality rate).

METADATA:
Indicator Code = SH.DYN.NMRT
Indicator Name = Mortality rate, neonatal (per 1,000 live births)

In [8]:
data_mort = wb.download(indicator='SH.DYN.NMRT', country=[], start=1960, end=2020)
print(data_mort)

                                  SH.DYN.NMRT
country                     year             
Africa Eastern and Southern 2020          NaN
                            2019    24.387831
                            2018    24.938211
                            2017    25.480369
                            2016    26.014181
...                                       ...
Zimbabwe                    1964          NaN
                            1963          NaN
                            1962          NaN
                            1961          NaN
                            1960          NaN

[16226 rows x 1 columns]


Next we download the population data.

METADATA:
Indicator Code = SP.POP.TOTL
Indicator Name = Population, total

In [9]:
data_pop = wb.download(indicator='SP.POP.TOTL', country=[], start=1960, end=2020)
print(data_pop)

                                  SP.POP.TOTL
country                     year             
Africa Eastern and Southern 2020  677243299.0
                            2019  660046272.0
                            2018  643090131.0
                            2017  626392880.0
                            2016  609978946.0
...                                       ...
Zimbabwe                    1964    4322854.0
                            1963    4178726.0
                            1962    4039209.0
                            1961    3905038.0
                            1960    3776679.0

[16226 rows x 1 columns]


**Region information for each country**

In [54]:
region = pd.read_csv("Region.csv", encoding="windows_1258")
region = region.set_index("Name")


country,year,Region
Africa Eastern and Southern,2020,
Africa Eastern and Southern,2019,
Africa Eastern and Southern,2018,
Africa Eastern and Southern,2017,
Africa Eastern and Southern,2016,
...,...,...
Zimbabwe,1964,
Zimbabwe,1963,
Zimbabwe,1962,
Zimbabwe,1961,


We now combine all the data (GDP per capita, life expectancy, population and infant mortality) into a single DataFrame that we will use from here on.

In [58]:
df1 = pd.concat([data_gdp, data_life.reindex(data_gdp.index), data_pop.reindex(data_gdp.index), data_mort.reindex(data_gdp.index)], axis=1)
df1.columns

Index(['NY.GDP.PCAP.KD', 'SP.DYN.LE00.IN', 'SP.POP.TOTL', 'SH.DYN.NMRT'], dtype='object')
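
To make the alignment step concrete, the sketch below repeats the `reindex` plus `concat` pattern on two tiny frames whose rows are in different orders. The frame names, column names and values are assumptions for illustration.

```python
import pandas as pd

idx_a = pd.MultiIndex.from_tuples(
    [("Mexico", "2020"), ("Mexico", "2019")], names=["country", "year"]
)
# Same labels in a different order, plus one extra row.
idx_b = pd.MultiIndex.from_tuples(
    [("Mexico", "2019"), ("Mexico", "2020"), ("Mexico", "2018")],
    names=["country", "year"],
)
gdp = pd.DataFrame({"gdp": [1.0, 2.0]}, index=idx_a)
life = pd.DataFrame({"life": [70.0, 71.0, 69.0]}, index=idx_b)

# reindex reorders `life` to follow gdp's index and drops labels not in it.
aligned = life.reindex(gdp.index)

# With identical indexes, concat along axis=1 places the columns side by side.
combined = pd.concat([gdp, aligned], axis=1)
print(combined)
```

Reindexing every frame against one reference index keeps the row order and row count of the first frame, which is why all four indicators end up with the same 16226 rows.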

# Cleaning the dataset

We clean the dataset with the dropna() method of the DataFrame class, which removes every row that contains a NaN (missing value).

Finally, we save the data to a CSV file.

In [23]:
df = df1.dropna()
df

# Save to a CSV file
df.to_csv("datos.csv")
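
Before dropping rows it can help to see how many values are missing per column, since `dropna()` with default arguments discards a row if any single column is NaN. A minimal sketch on invented data (the `toy` frame is an assumption, not part of the notebook):

```python
import pandas as pd

nan = float("nan")
toy = pd.DataFrame(
    {
        "gdp": [1.0, 2.0, nan, 4.0],
        "life": [60.0, nan, 62.0, 63.0],
    },
    index=["a", "b", "c", "d"],
)

# Count missing values per column before dropping anything.
missing_per_column = toy.isna().sum()

# dropna() with default arguments removes every row with at least one NaN.
clean = toy.dropna()
rows_removed = len(toy) - len(clean)

print(missing_per_column)
print(clean)
print(rows_removed)
```

Here each column is missing only one value, yet two of the four rows are removed because the gaps fall on different rows.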

In [24]:
df

country,year,NY.GDP.PCAP.KD,SP.DYN.LE00.IN,SP.POP.TOTL,SH.DYN.NMRT
Africa Eastern and Southern,2019,1499.256275,64.005197,660046272.0,24.387831
Africa Eastern and Southern,2018,1507.861055,63.648988,643090131.0,24.938211
Africa Eastern and Southern,2017,1507.821357,63.246264,626392880.0,25.480369
Africa Eastern and Southern,2016,1501.671310,62.787681,609978946.0,26.014181
Africa Eastern and Southern,2015,1507.800256,62.259288,593871847.0,26.601909
...,...,...,...,...,...
Zimbabwe,1969,1290.313227,56.656000,5111326.0,31.700000
Zimbabwe,1968,1187.023436,56.362000,4941901.0,31.700000
Zimbabwe,1967,1203.561658,56.034000,4779825.0,31.800000
Zimbabwe,1966,1148.226118,55.671000,4623340.0,32.000000


In [59]:
df = pd.read_csv("datos.csv")
df

,country,year,NY.GDP.PCAP.KD,SP.DYN.LE00.IN,SP.POP.TOTL,SH.DYN.NMRT
0,Africa Eastern and Southern,2019,1499.256275,64.005197,660046272.0,24.387831
1,Africa Eastern and Southern,2018,1507.861055,63.648988,643090131.0,24.938211
2,Africa Eastern and Southern,2017,1507.821357,63.246264,626392880.0,25.480369
3,Africa Eastern and Southern,2016,1501.671310,62.787681,609978946.0,26.014181
4,Africa Eastern and Southern,2015,1507.800256,62.259288,593871847.0,26.601909
...,...,...,...,...,...,...
9145,Zimbabwe,1969,1290.313227,56.656000,5111326.0,31.700000
9146,Zimbabwe,1968,1187.023436,56.362000,4941901.0,31.700000
9147,Zimbabwe,1967,1203.561658,56.034000,4779825.0,31.800000
9148,Zimbabwe,1966,1148.226118,55.671000,4623340.0,32.000000
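
Reading the CSV back with default arguments turns the former index levels into ordinary columns and parses the year as an integer. The sketch below shows this round trip on a toy frame, using an in-memory buffer instead of a file; it also shows `index_col` as one way to restore the MultiIndex if that is preferred. The frame and its values are assumptions for illustration.

```python
import io

import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Mexico", "2020"), ("Mexico", "2019")], names=["country", "year"]
)
toy = pd.DataFrame({"gdp": [1.0, 2.0]}, index=index)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer)

# Default read: the index levels come back as ordinary columns.
buffer.seek(0)
flat = pd.read_csv(buffer)

# Passing index_col restores the (country, year) MultiIndex.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=["country", "year"])

print(flat.columns.tolist())
print(list(restored.index.names))
print(flat["year"].tolist())
```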


In [64]:
df3 = df.set_index("country")
df4 = pd.concat([df3, region.reindex(df3.index)], axis=1)
df4

country,year,NY.GDP.PCAP.KD,SP.DYN.LE00.IN,SP.POP.TOTL,SH.DYN.NMRT,Region
Africa Eastern and Southern,2019,1499.256275,64.005197,660046272.0,24.387831,
Africa Eastern and Southern,2018,1507.861055,63.648988,643090131.0,24.938211,
Africa Eastern and Southern,2017,1507.821357,63.246264,626392880.0,25.480369,
Africa Eastern and Southern,2016,1501.671310,62.787681,609978946.0,26.014181,
Africa Eastern and Southern,2015,1507.800256,62.259288,593871847.0,26.601909,
...,...,...,...,...,...,...
Zimbabwe,1969,1290.313227,56.656000,5111326.0,31.700000,Sub-Saharan Africa
Zimbabwe,1968,1187.023436,56.362000,4941901.0,31.700000,Sub-Saharan Africa
Zimbabwe,1967,1203.561658,56.034000,4779825.0,31.800000,Sub-Saharan Africa
Zimbabwe,1966,1148.226118,55.671000,4623340.0,32.000000,Sub-Saharan Africa


In [67]:
df_final = df4.dropna()

df_final.to_csv("datos2.csv")
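
The effect of the region lookup followed by `dropna()` can be sketched on toy data: aggregates such as "Africa Eastern and Southern" have no entry in the region table, so they receive NaN and are then dropped. Note that this sketch uses `Index.map` for the lookup rather than the `concat`/`reindex` combination used in the notebook, and that the `region` table below is a hypothetical stand-in for Region.csv.

```python
import pandas as pd

# Hypothetical lookup table standing in for Region.csv, indexed by country name.
region = pd.DataFrame(
    {"Region": ["South Asia", "Sub-Saharan Africa"]},
    index=pd.Index(["Afghanistan", "Zimbabwe"], name="Name"),
)

# Toy indicator table; "Africa Eastern and Southern" is an aggregate, not a country.
data = pd.DataFrame(
    {
        "country": ["Africa Eastern and Southern", "Afghanistan", "Zimbabwe", "Zimbabwe"],
        "year": [2019, 2019, 2019, 2018],
        "gdp": [1.0, 2.0, 3.0, 4.0],
    }
).set_index("country")

# Look up each row's region by country name; names missing from the lookup get NaN.
with_region = data.assign(Region=data.index.map(region["Region"]))

# Dropping rows with NaN now also removes the aggregate rows.
countries_only = with_region.dropna()
print(countries_only)
```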

In [68]:
df_final

country,year,NY.GDP.PCAP.KD,SP.DYN.LE00.IN,SP.POP.TOTL,SH.DYN.NMRT,Region
Afghanistan,2019,577.563058,64.833,38041757.0,35.9,South Asia
Afghanistan,2018,568.827927,64.486,37171922.0,36.9,South Asia
Afghanistan,2017,575.707053,64.130,36296111.0,38.0,South Asia
Afghanistan,2016,575.334442,63.763,35383028.0,39.1,South Asia
Afghanistan,2015,578.466353,63.377,34413603.0,40.3,South Asia
...,...,...,...,...,...,...
Zimbabwe,1969,1290.313227,56.656,5111326.0,31.7,Sub-Saharan Africa
Zimbabwe,1968,1187.023436,56.362,4941901.0,31.7,Sub-Saharan Africa
Zimbabwe,1967,1203.561658,56.034,4779825.0,31.8,Sub-Saharan Africa
Zimbabwe,1966,1148.226118,55.671,4623340.0,32.0,Sub-Saharan Africa


In [30]:
inter = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminderDataFiveYear.csv')
inter["country"].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
       'Australia', 'Austria', 'Bahrain', 'Bangladesh', 'Belgium',
       'Benin', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon',
       'Canada', 'Central African Republic', 'Chad', 'Chile', 'China',
       'Colombia', 'Comoros', 'Congo, Dem. Rep.', 'Congo, Rep.',
       'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Czech Republic',
       'Denmark', 'Djibouti', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Ethiopia',
       'Finland', 'France', 'Gabon', 'Gambia', 'Germany', 'Ghana',
       'Greece', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Haiti',
       'Honduras', 'Hong Kong, China', 'Hungary', 'Iceland', 'India',
       'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy',
       'Jamaica', 'Japan', 'Jordan', 'Kenya', 'Korea, Dem. Rep.',
       'Korea, Rep.', 'Kuwait', 'Leba