ETL. Extracción y transformación
EDA. Análisis preliminar de calidad de datos

In [10]:
import pandas as pd
import requests
import json

Como el objetivo principal del trabajo es analizar cúales con los factores socioeconómicos que más inciden en la Eperanza de Vida de los (35) países de que integran la OEA (Organización de Estados Americanos), nos vamos vamos a focalizar en buscar indicadores que tengan que ver con el desarrollo económico y el crecimiento de los países. 
Como fuente de datos, nos vamos a nutrir de las bases de datos del World Bank y de base de datos externas de otros organismos internacionales especializados en desarrollo.
Vamos a poceder a extrer los datos de la base del Banco Mundial utilizando la API que provee el WB.
Creamos un código para poder extraer los distintos indicadores potenciales para ser analizados, de las distintas bases de datos que se elojan en el sitio del organismo internacional. Las principales bases de datos utilizadas del BM son World Economic Indicators(WEI) y Health & Nutrition indicators.
El análisis preliminar nos permite hacer un corte en los datos, que se tomarán datos a partir de 1990, debido a al aumento de la disponibilidad de la data del WB y la externa al WB (aún a definir?). 
Elegimos a priori aproximadamente 50 indicadores, según los grandes temas o tópicos de las bases del BM que tienen que ver con factores socioeconómicos que más inciden en la esperanza de vida, y luego apoyados en base a literatura especializada en el tema. (Fuente)
Al analizar los indicadores prelimminarmente, advertimos que hay variables que podrían enriquecer nuestros análisis, pero que si bien existen en la base de datos del BM, hay muchos nulos y la calidad por tanto no es buena. Dedicimos usar bases de datos externas para aumentar la calidad de los datos, y contar con indicadores confiables en hábitos como consumo de Tabaco, Obesidaud, y gasto publico social de los gobiernos.
Bases de datos externas utilizadas:  Cepal, Naciones Unidas


In [11]:
#Etracción de Metadata para extraer luego los datos de cógidos de países e indicadores necesarios
# 50 indicadores

In [12]:
#Código para extraer los datos con la API del WB
#particionamos la información de los países e indicadores para realizar la extracción
# Países de la OEA
countries = ["USA",
    "ATG",
    "ARG",
    "BHS",
    "BRB",
    "BLZ",
    
]


# Indicadores elegidos luego de consultar las distintas base de datos en desarrollo y crecimiento del WB

indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]

# Años deseados para extraer los datos
#En base analisis preliminares y a la disponibildiad de las series de tiempo de lo indicadores, 
# optamos por tomar la data a partir de 1990, cuando comienza a aumentar la cantidad de datos 
# relacionados con el desarrollo
#Vamos a bajar los datos en distintos dataframes que luego se van a concatenar para unificarlo en un dataset
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df0 = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df0

Unnamed: 0,País,Indicador,Año,Valor
0,United States,Account ownership at a financial institution o...,2022,
1,United States,Account ownership at a financial institution o...,2021,94.95
2,United States,Account ownership at a financial institution o...,2020,
3,United States,Account ownership at a financial institution o...,2019,
4,United States,Account ownership at a financial institution o...,2018,
...,...,...,...,...
9697,Belize,Total alcohol consumption per capita (liters o...,1994,
9698,Belize,Total alcohol consumption per capita (liters o...,1993,
9699,Belize,Total alcohol consumption per capita (liters o...,1992,
9700,Belize,Total alcohol consumption per capita (liters o...,1991,


In [13]:
#Continuamos con la extraccion extracción de datos de los países
countries = ["CAN",
    "CHL",
    "COL",
    "CRI",
    "CUB",
    "DOM",
    "ECU"]
    
    

# Indicadores que deseas consultar
indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]




# Años elegidos
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df

Unnamed: 0,País,Indicador,Año,Valor
0,Canada,Account ownership at a financial institution o...,2022,
1,Canada,Account ownership at a financial institution o...,2021,99.63
2,Canada,Account ownership at a financial institution o...,2020,
3,Canada,Account ownership at a financial institution o...,2019,
4,Canada,Account ownership at a financial institution o...,2018,
...,...,...,...,...
11314,Ecuador,Total alcohol consumption per capita (liters o...,1994,
11315,Ecuador,Total alcohol consumption per capita (liters o...,1993,
11316,Ecuador,Total alcohol consumption per capita (liters o...,1992,
11317,Ecuador,Total alcohol consumption per capita (liters o...,1991,


In [14]:
#Continuamos con el mismo proceso
countries = [
    "GRD",
    "GTM",
    "GUY",
    "HTI",
    "HND",
    "JAM" ]
# Indicadores que deseas consultar
indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]




# Años elegidos
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df_a = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df_a

Unnamed: 0,País,Indicador,Año,Valor
0,Grenada,Account ownership at a financial institution o...,2022,
1,Grenada,Account ownership at a financial institution o...,2021,
2,Grenada,Account ownership at a financial institution o...,2020,
3,Grenada,Account ownership at a financial institution o...,2019,
4,Grenada,Account ownership at a financial institution o...,2018,
...,...,...,...,...
9697,Jamaica,Total alcohol consumption per capita (liters o...,1994,
9698,Jamaica,Total alcohol consumption per capita (liters o...,1993,
9699,Jamaica,Total alcohol consumption per capita (liters o...,1992,
9700,Jamaica,Total alcohol consumption per capita (liters o...,1991,


In [19]:
#Sigue la lista de países para exraer la data ...
countries = [ 
             "BOL", "BRA", "PER", "DOM", "KNA", "SLV"
             ]
# Indicadores que deseas consultar
indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]

# Años elegidos 
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df1 = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df1

Unnamed: 0,País,Indicador,Año,Valor
0,Bolivia,Account ownership at a financial institution o...,2022,
1,Bolivia,Account ownership at a financial institution o...,2021,68.89
2,Bolivia,Account ownership at a financial institution o...,2020,
3,Bolivia,Account ownership at a financial institution o...,2019,
4,Bolivia,Account ownership at a financial institution o...,2018,
...,...,...,...,...
9697,El Salvador,Total alcohol consumption per capita (liters o...,1994,
9698,El Salvador,Total alcohol consumption per capita (liters o...,1993,
9699,El Salvador,Total alcohol consumption per capita (liters o...,1992,
9700,El Salvador,Total alcohol consumption per capita (liters o...,1991,


In [23]:
#Sigue la lista de países para exraer la data ...
countries = [
    "MEX", "NIC", "PAN", "PRY" 
]


# Indicadores que deseas consultar
indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]

# Años elegidos 
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df2 = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df2

Unnamed: 0,País,Indicador,Año,Valor
0,Mexico,Account ownership at a financial institution o...,2022,48.97
1,Mexico,Account ownership at a financial institution o...,2021,
2,Mexico,Account ownership at a financial institution o...,2020,
3,Mexico,Account ownership at a financial institution o...,2019,
4,Mexico,Account ownership at a financial institution o...,2018,
...,...,...,...,...
6463,Paraguay,Total alcohol consumption per capita (liters o...,1994,
6464,Paraguay,Total alcohol consumption per capita (liters o...,1993,
6465,Paraguay,Total alcohol consumption per capita (liters o...,1992,
6466,Paraguay,Total alcohol consumption per capita (liters o...,1991,


In [20]:
#Ultima extracción de data para analizar los países que integran la OEA
countries =[ "LCA", "VCT", "SUR", "TTO", "URY", "VEN"  ]

# Indicadores que deseas consultar
indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]




# Años elegidos
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df3 = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df3

Unnamed: 0,País,Indicador,Año,Valor
0,St. Lucia,Account ownership at a financial institution o...,2022,
1,St. Lucia,Account ownership at a financial institution o...,2021,
2,St. Lucia,Account ownership at a financial institution o...,2020,
3,St. Lucia,Account ownership at a financial institution o...,2019,
4,St. Lucia,Account ownership at a financial institution o...,2018,
...,...,...,...,...
9697,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1994,
9698,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1993,
9699,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1992,
9700,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1991,


In [93]:
# Unificamos los dataframes
concatenated_df2 = pd.concat([data_df0, data_df, data_df1, data_df3, data_df2, data_df_a], ignore_index=True)

# El argumento 'ignore_index=True' restablece los índices de fila para que sean secuenciales.

# Ahora 'concatenated_df' contiene todos los datos de los tres DataFrames.


In [26]:
concatenated_df2

Unnamed: 0,País,Indicador,Año,Valor
0,United States,Account ownership at a financial institution o...,2022,
1,United States,Account ownership at a financial institution o...,2021,94.95
2,United States,Account ownership at a financial institution o...,2020,
3,United States,Account ownership at a financial institution o...,2019,
4,United States,Account ownership at a financial institution o...,2018,
...,...,...,...,...
56590,Jamaica,Total alcohol consumption per capita (liters o...,1994,
56591,Jamaica,Total alcohol consumption per capita (liters o...,1993,
56592,Jamaica,Total alcohol consumption per capita (liters o...,1992,
56593,Jamaica,Total alcohol consumption per capita (liters o...,1991,


In [27]:
concatenated_df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56595 entries, 0 to 56594
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   País       56595 non-null  object 
 1   Indicador  56595 non-null  object 
 2   Año        56595 non-null  object 
 3   Valor      39436 non-null  float64
dtypes: float64(1), object(3)
memory usage: 1.7+ MB


In [83]:
#Agregamos data de unos indicadores Obesidad, Gasto publico social, y Consumo de Tabado,  de base de datos externna
#Falta codigo para extraer base de datos eternas
#preliminarmente subimos lso cvs.
archivo1_csv = 'obesidad.csv'
archivo2_csv = "consumo_tabaco (1).csv"
archivo3_csv = "gasto_social.csv"

# Cargar el archivo CSV en un DataFrame
data_df5 = pd.read_csv(archivo1_csv)
data_df6 = pd.read_csv(archivo2_csv)
data_df7 = pd.read_csv(archivo3_csv)


In [95]:
data_df5 = data_df5.reset_index(drop=True)
data_df6 = data_df6.reset_index(drop=True)
data_df7 = data_df7.reset_index(drop=True)


In [96]:
#unificamos datafraames de fuentes externas , para Tabaco, Obesidad y Gasto público social
concatenated_df3 = pd.concat([data_df5, data_df6, data_df7], ignore_index=True)

In [97]:
concatenated_df3.head()

Unnamed: 0,Country,Año,"Proportion of adults who are obese, 20 years old and over ,by sex","Age-standardized prevalence of current tobacco use among persons aged 15 years and older, by sex",Gasto público social según la clasificación por funciones del gobierno
0,Antigua and Barbuda,1990,8.6,,
1,Antigua and Barbuda,1991,8.9,,
2,Antigua and Barbuda,1992,9.2,,
3,Antigua and Barbuda,1993,9.6,,
4,Antigua and Barbuda,1994,9.9,,


In [98]:
#Cambiar nombre columna de "Country" a "País"

# Cambiar el nombre de la columna 'Año'
concatenated_df3 = concatenated_df3.rename(columns={'Country': 'País'})


In [99]:
concatenated_df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56595 entries, 0 to 56594
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   País       56595 non-null  object 
 1   Indicador  56595 non-null  object 
 2   Año        56595 non-null  object 
 3   Valor      39436 non-null  float64
dtypes: float64(1), object(3)
memory usage: 1.7+ MB


In [100]:
concatenated_df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3947 entries, 0 to 3946
Data columns (total 5 columns):
 #   Column                                                                                            Non-Null Count  Dtype  
---  ------                                                                                            --------------  -----  
 0   País                                                                                              3947 non-null   object 
 1   Año                                                                                               3947 non-null   int64  
 2   Proportion of adults who are obese, 20 years old and over ,by sex                                 2835 non-null   float64
 3   Age-standardized prevalence of current tobacco use among persons aged 15 years and older, by sex  504 non-null    float64
 4   Gasto público social según la clasificación por funciones del gobierno                            608 non-null    float64
dtype

In [105]:
#Revisar este codigo

# Crear la columna "Indicador" combinando las tres columnas específicas
concatenated_df3["Indicador"] = concatenated_df3[["Proportion of adults who are obese, 20 years old and over ,by sex", "Age-standardized prevalence of current tobacco use among persons aged 15 years and older, by sex", "Gasto público social según la clasificación por funciones del gobierno"]].astype(str).apply(lambda x: ' '.join(x.dropna()), axis=1)

# Seleccionar solo las columnas que necesitas de concatenated_df3
concatenated_df3 = concatenated_df3[["Country", "Año", "Indicador", "Valor"]]

# Restablecer los índices antes de la concatenación
concatenated_df2.reset_index(drop=True, inplace=True)
columnas_necesarias.reset_index(drop=True, inplace=True)

# Concatenar concatenated_df2 y columnas_necesarias
concatenated_df2_df3 = pd.concat([concatenated_df2, concatenated_df3], ignore_index=True)


KeyError: "['Country', 'Valor'] not in index"

In [106]:
concatenated_df2_df3

Unnamed: 0,País,Indicador,Año,Valor
0,United States,Account ownership at a financial institution o...,2022,
1,United States,Account ownership at a financial institution o...,2021,94.95
2,United States,Account ownership at a financial institution o...,2020,
3,United States,Account ownership at a financial institution o...,2019,
4,United States,Account ownership at a financial institution o...,2018,
...,...,...,...,...
66292,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1994,
66293,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1993,
66294,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1992,
66295,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1991,


In [35]:
#Vemos la estructura de variables y filas
#Observamos que hay muchos nulos en la columna de los vaores numericos, que haya que analiza en detalle
concatenated_df2_df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66297 entries, 0 to 66296
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   País       66297 non-null  object 
 1   Indicador  66297 non-null  object 
 2   Año        66297 non-null  object 
 3   Valor      45940 non-null  float64
dtypes: float64(1), object(3)
memory usage: 2.0+ MB


In [36]:

#Vamos a agrupar los indicadores por países y por año 
# 1. Cambia el tipo de dato de la columna 'Año' a numérico (int)
concatenated_df2_df3['Año'] = pd.to_numeric(concatenated_df2_df3['Año'], errors='coerce')

# 2. Realiza una agregación (por ejemplo, suma) en caso de entradas duplicadas
df_aggregated_df2_df3 = concatenated_df2_df3.groupby(['País', 'Año', 'Indicador'])['Valor'].sum().unstack()

# Esto creará un nuevo DataFrame 'df_aggregated' con cada indicador como una columna individual y las entradas duplicadas se agregan mediante la suma (Ver concepto).


In [38]:
#Exportamos la tabla final de datos crudos a un csv 
nombre_del_archivo = 'df_aggregated_df2_df3.csv'
df_aggregated_df2_df3.to_csv(nombre_del_archivo, index=False)

In [None]:
#Verifico que la concatenacion esté correcta

# Concatenar DataFrames verticalmente (incorrecto en este caso)
concatenated_df2_df3 = pd.concat([df2, df3], axis=0)

# Verificar las dimensiones después de la concatenación
expected_shape = (df1.shape[0] + df2.shape[0], max(df1.shape[1], df2.shape[1]))
if concatenated_df.shape == expected_shape:
    print("Dimensiones correctas después de la concatenación.")
else:
    print("Dimensiones incorrectas después de la concatenación.")

# Verificar las columnas después de la concatenación
expected_columns = df1.columns.union(df2.columns)
if concatenated_df.columns.equals(expected_columns):
    print("Columnas correctas después de la concatenación.")
else:
    print("Columnas incorrectas después de la concatenación.")


In [39]:
#Tabla de indicadores por países y año

df_aggregated_df2_df3

Unnamed: 0_level_0,Indicador,Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+),"Birth rate, crude (per 1,000 people)",CO2 emissions (kg per PPP $ of GDP),Capital health expenditure (% of GDP),"Compulsory education, duration (years)",Control of Corruption: Estimate,"Current health expenditure per capita, PPP (current international $)","Death rate, crude (per 1,000 people)",Diabetes prevalence (% of population ages 20 to 79),Domestic general government health expenditure (% of GDP),...,Poverty gap at $6.85 a day (2017 PPP) (%),Pregnant women receiving prenatal care (%),Prevalence of overweight (% of adults),"Prevalence of overweight, weight for height (% of children under 5)",Prevalence of undernourishment (% of population),Rural population (% of total population),"Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)","Unemployment, total (% of total labor force) (modeled ILO estimate)",Urban population,Urban population (% of total population)
País,Año,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
Antigua and Barbuda,1990,0.00,20.916,0.301290,0.0,0.0,0.000000,0.000000,7.666,0.0,0.000000,...,0.0,0.0,33.7,0.0,0.0,64.574,0.0000,0.000,22435.0,35.426
Antigua and Barbuda,1991,0.00,18.504,0.289409,0.0,0.0,0.000000,0.000000,7.753,0.0,0.000000,...,0.0,0.0,34.4,0.0,0.0,64.535,0.0000,0.000,22568.0,35.465
Antigua and Barbuda,1992,0.00,19.228,0.276056,0.0,0.0,0.000000,0.000000,7.745,0.0,0.000000,...,0.0,0.0,34.9,0.0,0.0,64.915,0.0000,0.000,22686.0,35.085
Antigua and Barbuda,1993,0.00,18.597,0.260243,0.0,0.0,0.000000,0.000000,7.666,0.0,0.000000,...,0.0,0.0,35.5,0.0,0.0,65.291,0.0000,0.000,22850.0,34.709
Antigua and Barbuda,1994,0.00,18.990,0.242297,0.0,0.0,0.000000,0.000000,7.533,0.0,0.000000,...,0.0,0.0,36.1,0.0,0.0,65.666,0.0000,0.000,23029.0,34.334
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Venezuela, RB",2018,0.00,35.664,0.000000,0.0,34.0,-3.024075,1277.226835,13.728,0.0,2.491263,...,0.0,0.0,0.0,0.0,45.4,23.584,0.0000,10.040,52617224.0,176.416
"Venezuela, RB",2019,0.00,33.776,0.000000,0.0,34.0,-3.055198,938.268572,14.034,0.0,1.623506,...,0.0,0.0,0.0,0.0,49.8,23.520,6.0325,10.184,51129226.0,176.480
"Venezuela, RB",2020,0.00,32.426,0.000000,0.0,34.0,-3.152036,1180.238466,15.150,0.0,3.353269,...,0.0,0.0,0.0,0.0,45.8,23.442,0.0000,15.060,50302174.0,176.558
"Venezuela, RB",2021,168.78,31.754,0.000000,0.0,34.0,-3.244983,0.000000,16.248,19.2,0.000000,...,0.0,0.0,0.0,0.0,0.0,23.350,0.0000,12.942,49815066.0,176.650


In [47]:
#Composición de la tabla, nuevas columnas de indicadores y filas con años/países
df_aggregated_df2_df3.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1122 entries, ('Antigua and Barbuda', 1990) to ('Venezuela, RB', 2022)
Data columns (total 49 columns):
 #   Column                                                                                                           Non-Null Count  Dtype  
---  ------                                                                                                           --------------  -----  
 0   Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+)  1122 non-null   float64
 1   Birth rate, crude (per 1,000 people)                                                                             1122 non-null   float64
 2   CO2 emissions (kg per PPP $ of GDP)                                                                              1122 non-null   float64
 3   Capital health expenditure (% of GDP)                                                                            1122 non-null   fl

In [46]:
# Para contar tanto los valores nulos (NaN) como los ceros:
nulos_ceros = df_aggregated_df2_df3.isnull().sum() + (df_aggregated_df2_df3 == 0).sum()

print(nulos_ceros) 
print("(",(nulos_ceros/1122)*100,")%")
nulos_ceros[]


Indicador
Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+)    1034
Birth rate, crude (per 1,000 people)                                                                                 34
CO2 emissions (kg per PPP $ of GDP)                                                                                 108
Capital health expenditure (% of GDP)                                                                               493
Compulsory education, duration (years)                                                                              279
Control of Corruption: Estimate                                                                                     321
Current health expenditure per capita, PPP (current international $)                                                405
Death rate, crude (per 1,000 people)                                                                                 34
Diabetes prevalence (% of popu

In [48]:
df_aggregated_df2_df3.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1122 entries, ('Antigua and Barbuda', 1990) to ('Venezuela, RB', 2022)
Data columns (total 49 columns):
 #   Column                                                                                                           Non-Null Count  Dtype  
---  ------                                                                                                           --------------  -----  
 0   Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+)  1122 non-null   float64
 1   Birth rate, crude (per 1,000 people)                                                                             1122 non-null   float64
 2   CO2 emissions (kg per PPP $ of GDP)                                                                              1122 non-null   float64
 3   Capital health expenditure (% of GDP)                                                                            1122 non-null   fl

In [52]:
# Eliminamos columnas que tienen más de 65% de nulos de la base del BM
#Tomamos nota de las variables qeu eliminamos, para tomarls de fuenes externas
#Dejamos aún algunas columnas que tiene muchos nulos evidentes, y dejamos aquellos que podrían mejorar luego de la limpieza de datos.
# Lista de palabras clave para las columnas a eliminar
palabras_a_eliminar = [
    "Account ownership",
    "Educational attainment,",
    "Educational attainment,",
    "Infant and young",
    "Literacy rate,",
    "pure alcohol"
]

# Encuentra y elimina las columnas que contienen alguna de las palabras clave
columnas_a_eliminar = [col for col in df_aggregated_df2_df3.columns if any(palabra in col for palabra in palabras_a_eliminar)]
df_filtrado_nulos = df_aggregated_df2_df3.drop(columns=columnas_a_eliminar)




In [53]:
#Eliminar Columna indicador Diabetes en niños

columna_a_eliminar = next((col for col in df_filtrado_nulos.columns if col.startswith("Diabetes prevalence") and "child" in col), None)

if columna_a_eliminar is not None:
    df_filtrado_nulos = df_filtrado_nulos.drop(columns=columna_a_eliminar)
     


In [54]:
df_filtrado_nulos.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1122 entries, ('Antigua and Barbuda', 1990) to ('Venezuela, RB', 2022)
Data columns (total 43 columns):
 #   Column                                                                                        Non-Null Count  Dtype  
---  ------                                                                                        --------------  -----  
 0   Birth rate, crude (per 1,000 people)                                                          1122 non-null   float64
 1   CO2 emissions (kg per PPP $ of GDP)                                                           1122 non-null   float64
 2   Capital health expenditure (% of GDP)                                                         1122 non-null   float64
 3   Compulsory education, duration (years)                                                        1122 non-null   float64
 4   Control of Corruption: Estimate                                                               1122 non-

In [None]:
#La base de datos mantiene aún muchos nulos en datos de países de las islas del Caribe.
#Para aumentar la calidad de los datos, vamos a quitar del análisis los países con una población menor a 2 millones de personas, que básicamente son las islas mencionadas más pequeñas.
#Umbral = 2000000





# Define un umbral de población (2 millones en este caso)
umbral_población = 2000000

# Filtra los países con población mayor al umbral
df_filtrado_nulos_Sin_Caribe = df_filtrado_nulos.groupby('País').filter(lambda x: x['Population, total'].max() > umbral_población)




In [None]:
df_filtrado_nulos_Sin_Caribe.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 759 entries, ('Argentina', 1990) to ('Venezuela, RB', 2022)
Data columns (total 44 columns):
 #   Column                                                                                                Non-Null Count  Dtype  
---  ------                                                                                                --------------  -----  
 0   Birth rate, crude (per 1,000 people)                                                                  759 non-null    float64
 1   CO2 emissions (kg per PPP $ of GDP)                                                                   759 non-null    float64
 2   Capital health expenditure (% of GDP)                                                                 759 non-null    float64
 3   Compulsory education, duration (years)                                                                759 non-null    float64
 4   Control of Corruption: Estimate                                   

In [None]:
df_filtrado_nulos_Sin_Caribe.head()

Unnamed: 0_level_0,Indicador,"Birth rate, crude (per 1,000 people)",CO2 emissions (kg per PPP $ of GDP),Capital health expenditure (% of GDP),"Compulsory education, duration (years)",Control of Corruption: Estimate,"Current health expenditure per capita, PPP (current international $)","Death rate, crude (per 1,000 people)",Diabetes prevalence (% of population ages 20 to 79),Domestic general government health expenditure (% of GDP),"Domestic general government health expenditure per capita, PPP (current international $)",...,Poverty gap at $6.85 a day (2017 PPP) (%),Pregnant women receiving prenatal care (%),Prevalence of overweight (% of adults),"Prevalence of overweight, weight for height (% of children under 5)",Prevalence of undernourishment (% of population),Rural population (% of total population),"Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)","Unemployment, total (% of total labor force) (modeled ILO estimate)",Urban population,Urban population (% of total population)
País,Año,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
Argentina,1990,21.989,0.427859,0.0,0.0,0.0,0.0,7.743,0.0,0.0,0.0,...,0.0,0.0,48.7,0.0,0.0,13.016,0.0,0.0,28389540.0,86.984
Argentina,1991,21.844,0.400371,0.0,0.0,0.0,0.0,7.536,0.0,0.0,0.0,...,4.5,0.0,49.3,0.0,0.0,12.672,0.0,5.44,28910601.0,87.328
Argentina,1992,21.683,0.369563,0.0,0.0,0.0,0.0,7.595,0.0,0.0,0.0,...,5.0,0.0,49.9,0.0,0.0,12.458,0.0,6.36,29386348.0,87.542
Argentina,1993,21.57,0.340845,0.0,0.0,0.0,0.0,7.631,0.0,0.0,0.0,...,5.5,95.0,50.5,0.0,0.0,12.248,0.0,10.1,29859584.0,87.752
Argentina,1994,21.419,0.320044,0.0,0.0,0.0,0.0,7.403,0.0,0.0,0.0,...,5.7,0.0,51.1,11.1,0.0,12.04,0.0,11.76,30336257.0,87.96


In [None]:
import pandas as pd

# Supongamos que tienes un DataFrame llamado df_filtrado_nulos_Sin_Caribe
# y que 'País' y 'Año' están en el índice

# 1. Filtrar las columnas numéricas
columnas_numericas = df_filtrado_nulos_Sin_Caribe.select_dtypes(include='number')

# 2. Crear un DataFrame vacío para almacenar las correlaciones por país
correlaciones_por_pais = pd.DataFrame()

# 3. Iterar a través de los países y calcular las correlaciones para cada uno
for pais in df_filtrado_nulos_Sin_Caribe.index.get_level_values('País').unique():
    datos_pais = df_filtrado_nulos_Sin_Caribe.xs(pais, level='País', drop_level=False)
    correlaciones = datos_pais.corrwith(datos_pais['Life expectancy at birth, total (years)'])
    correlaciones_por_pais = pd.concat([correlaciones_por_pais, correlaciones], axis=1, sort=False)

# Transponer el DataFrame de correlaciones para que los países sean las filas y los indicadores las columnas
correlaciones_por_pais = correlaciones_por_pais.T

# Imprimir las correlaciones
print("Correlaciones de indicadores numéricos por país con Esperanza de Vida:")
print(correlaciones_por_pais)


Correlaciones de indicadores numéricos por país con Esperanza de Vida:
   Birth rate, crude (per 1,000 people)  CO2 emissions (kg per PPP $ of GDP)  \
0                              0.769019                             0.400697   
0                              0.553211                             0.159017   
0                              0.477770                             0.483241   
0                              0.767757                             0.348921   
0                              0.542633                             0.292770   
0                              0.470423                             0.229455   
0                              0.558151                             0.378914   
0                              0.676471                                  NaN   
0                              0.660309                             0.311954   
0                              0.593779                             0.484526   
0                              0.498134          

In [None]:
correlaciones_por_pais.head(100)

Unnamed: 0,"Birth rate, crude (per 1,000 people)",CO2 emissions (kg per PPP $ of GDP),Capital health expenditure (% of GDP),"Compulsory education, duration (years)",Control of Corruption: Estimate,"Current health expenditure per capita, PPP (current international $)","Death rate, crude (per 1,000 people)",Diabetes prevalence (% of population ages 20 to 79),Domestic general government health expenditure (% of GDP),"Domestic general government health expenditure per capita, PPP (current international $)",...,Poverty gap at $6.85 a day (2017 PPP) (%),Pregnant women receiving prenatal care (%),Prevalence of overweight (% of adults),"Prevalence of overweight, weight for height (% of children under 5)",Prevalence of undernourishment (% of population),Rural population (% of total population),"Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)","Unemployment, total (% of total labor force) (modeled ILO estimate)",Urban population,Urban population (% of total population)
0,0.769019,0.400697,0.219555,-0.043365,0.098457,0.312727,0.967689,0.062824,0.323658,0.303186,...,0.194124,0.075038,0.335196,0.058614,0.294207,0.149099,0.107467,0.137313,-0.19216,-0.149099
0,0.553211,0.159017,0.433155,0.101643,-0.002336,0.415349,0.495251,0.079589,0.452302,0.387887,...,0.242822,0.123391,0.322811,0.094767,0.368006,-0.010054,0.153904,-0.145745,-0.040174,0.010054
0,0.47777,0.483241,0.275348,-0.009387,0.2138,0.408673,0.887702,0.08003,0.401578,0.407687,...,0.159368,0.11913,0.307279,0.085662,0.242975,0.022155,0.124308,0.104225,-0.047544,-0.022155
0,0.767757,0.348921,0.334708,-0.00853,0.023635,0.33738,0.985748,0.079702,0.34597,0.33646,...,0.179966,0.036705,0.330184,0.029653,0.302708,0.095959,0.100214,0.162844,-0.209193,-0.095959
0,0.542633,0.29277,0.289604,0.021668,0.097501,0.341573,0.901336,0.074265,0.370077,0.317375,...,0.032427,-0.011661,0.319733,0.120969,0.335737,0.051659,0.113746,0.067728,-0.16143,-0.051659
0,0.470423,0.229455,0.226812,0.015002,-0.012546,0.380711,0.812565,0.067595,0.395621,0.381242,...,0.254229,0.10164,0.319394,0.082999,0.337671,0.086413,0.121109,0.00751,-0.119684,-0.086413
0,0.558151,0.378914,0.256316,-0.071576,0.013785,0.273027,0.762766,0.047761,0.299825,0.267628,...,0.116808,0.103081,0.352482,0.056814,0.274131,0.201812,0.089783,-0.20518,-0.21308,-0.201812
0,0.676471,,0.189297,-0.016219,0.28606,0.281334,0.707088,0.035484,0.311878,0.277205,...,,0.167485,0.372768,0.047309,0.311234,0.143097,0.102004,0.09251,-0.047479,-0.143097
0,0.660309,0.311954,0.085722,-0.040619,-0.174798,0.33215,0.973464,0.07748,0.334782,0.301088,...,0.279958,0.109584,0.323019,0.096929,0.222665,0.13284,0.104055,-0.080883,-0.158273,-0.13284
0,0.593779,0.484526,0.32436,-0.005803,-0.043671,0.346861,0.772898,0.064582,0.335491,0.307159,...,0.069732,0.021434,0.348186,0.143206,0.279982,0.061489,0.121098,0.008559,-0.147819,-0.061489


In [None]:
# Especifica el nombre del archivo CSV sin la ruta completa
nombre_archivo_csv = 'Tabla_sin_caribe.csv'

# Exporta el DataFrame a un archivo CSV en la misma carpeta
df_filtrado_nulos_Sin_Caribe.to_csv(nombre_archivo_csv)
