ETL. Extracción y transformación
EDA. Análisis preliminar de calidad de datos

In [1]:
import pandas as pd
import requests
import json

Como el objetivo principal del trabajo es analizar cúales con los factores socioeconómicos que más inciden en la Eperanza de Vida de los (35) países de que integran la OEA (Organización de Estados Americanos), nos vamos vamos a focalizar en buscar indicadores que tengan que ver con el desarrollo económico y el crecimiento de los países. 
Como fuente de datos, nos vamos a nutrir de las bases de datos del World Bank y de base de datos externas de otros organismos internacionales especializados en desarrollo.
Vamos a poceder a extrer los datos de la base del Banco Mundial utilizando la API que provee el WB.
Creamos un código para poder extraer los distintos indicadores potenciales para ser analizados, de las distintas bases de datos que se elojan en el sitio del organismo internacional. Las principales bases de datos utilizadas del BM son World Economic Indicators(WEI) y Health & Nutrition indicators.
El análisis preliminar nos permite hacer un corte en los datos, que se tomarán datos a partir de 1990, debido a al aumento de la disponibilidad de la data del WB y la externa al WB (aún a definir?). 
Elegimos a priori aproximadamente 50 indicadores, según los grandes temas o tópicos de las bases del BM que tienen que ver con factores socioeconómicos que más inciden en la esperanza de vida, y luego apoyados en base a literatura especializada en el tema. (Fuente)
Al analizar los indicadores prelimminarmente, advertimos que hay variables que podrían enriquecer nuestros análisis, pero que si bien existen en la base de datos del BM, hay muchos nulos y la calidad por tanto no es buena. Dedicimos usar bases de datos externas para aumentar la calidad de los datos, y contar con indicadores confiables en hábitos como consumo de Tabaco, Obesidad, y gasto publico social de los gobiernos.
Bases de datos externas utilizadas:  Cepal, Naciones Unidas


In [2]:
#Etracción de Metadata para extraer luego los datos de cógidos de países e indicadores necesarios
# 50 indicadores

In [3]:
#Código para extraer los datos con la API del WB
#particionamos la información de los países e indicadores para realizar la extracción
# Países de la OEA
countries = ["USA",
    "ATG",
    "ARG",
    "BHS",
    "BRB",
    "BLZ",
    
]


# Indicadores elegidos luego de consultar las distintas base de datos en desarrollo y crecimiento del WB

indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]

# Años deseados para extraer los datos
#En base analisis preliminares y a la disponibildiad de las series de tiempo de lo indicadores, 
# optamos por tomar la data a partir de 1990, cuando comienza a aumentar la cantidad de datos 
# relacionados con el desarrollo
#Vamos a bajar los datos en distintos dataframes que luego se van a concatenar para unificarlo en un dataset
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df0 = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df0

Unnamed: 0,País,Indicador,Año,Valor
0,United States,Account ownership at a financial institution o...,2022,
1,United States,Account ownership at a financial institution o...,2021,94.95
2,United States,Account ownership at a financial institution o...,2020,
3,United States,Account ownership at a financial institution o...,2019,
4,United States,Account ownership at a financial institution o...,2018,
...,...,...,...,...
9697,Belize,Total alcohol consumption per capita (liters o...,1994,
9698,Belize,Total alcohol consumption per capita (liters o...,1993,
9699,Belize,Total alcohol consumption per capita (liters o...,1992,
9700,Belize,Total alcohol consumption per capita (liters o...,1991,


In [4]:
#Continuamos con la extraccion extracción de datos de los países
countries = ["CAN",
    "CHL",
    "COL",
    "CRI",
    "CUB",
    "DOM",
    "ECU"]
    
    

# Indicadores que deseas consultar
indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]




# Años elegidos
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df

Unnamed: 0,País,Indicador,Año,Valor
0,Canada,Account ownership at a financial institution o...,2022,
1,Canada,Account ownership at a financial institution o...,2021,99.63
2,Canada,Account ownership at a financial institution o...,2020,
3,Canada,Account ownership at a financial institution o...,2019,
4,Canada,Account ownership at a financial institution o...,2018,
...,...,...,...,...
11281,Ecuador,Total alcohol consumption per capita (liters o...,1994,
11282,Ecuador,Total alcohol consumption per capita (liters o...,1993,
11283,Ecuador,Total alcohol consumption per capita (liters o...,1992,
11284,Ecuador,Total alcohol consumption per capita (liters o...,1991,


In [5]:
#Continuamos con el mismo proceso
countries = [
    "GRD",
    "GTM",
    "GUY",
    "HTI",
    "HND",
    "JAM" ]
# Indicadores que deseas consultar
indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]




# Años elegidos
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df_a = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df_a

Unnamed: 0,País,Indicador,Año,Valor
0,Grenada,Account ownership at a financial institution o...,2022,
1,Grenada,Account ownership at a financial institution o...,2021,
2,Grenada,Account ownership at a financial institution o...,2020,
3,Grenada,Account ownership at a financial institution o...,2019,
4,Grenada,Account ownership at a financial institution o...,2018,
...,...,...,...,...
9697,Jamaica,Total alcohol consumption per capita (liters o...,1994,
9698,Jamaica,Total alcohol consumption per capita (liters o...,1993,
9699,Jamaica,Total alcohol consumption per capita (liters o...,1992,
9700,Jamaica,Total alcohol consumption per capita (liters o...,1991,


In [6]:
#Sigue la lista de países para exraer la data ...
countries = [ 
             "BOL", "BRA", "PER", "DOM", "KNA", "SLV"
             ]
# Indicadores que deseas consultar
indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]

# Años elegidos 
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df1 = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df1

Unnamed: 0,País,Indicador,Año,Valor
0,Bolivia,Account ownership at a financial institution o...,2022,
1,Bolivia,Account ownership at a financial institution o...,2021,68.89
2,Bolivia,Account ownership at a financial institution o...,2020,
3,Bolivia,Account ownership at a financial institution o...,2019,
4,Bolivia,Account ownership at a financial institution o...,2018,
...,...,...,...,...
9697,El Salvador,Total alcohol consumption per capita (liters o...,1994,
9698,El Salvador,Total alcohol consumption per capita (liters o...,1993,
9699,El Salvador,Total alcohol consumption per capita (liters o...,1992,
9700,El Salvador,Total alcohol consumption per capita (liters o...,1991,


In [7]:
#Sigue la lista de países para exraer la data ...
countries = [
    "MEX", "NIC", "PAN", "PRY" 
]


# Indicadores que deseas consultar
indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]

# Años elegidos 
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df2 = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df2

Unnamed: 0,País,Indicador,Año,Valor
0,Mexico,Account ownership at a financial institution o...,2022,48.97
1,Mexico,Account ownership at a financial institution o...,2021,
2,Mexico,Account ownership at a financial institution o...,2020,
3,Mexico,Account ownership at a financial institution o...,2019,
4,Mexico,Account ownership at a financial institution o...,2018,
...,...,...,...,...
6463,Paraguay,Total alcohol consumption per capita (liters o...,1994,
6464,Paraguay,Total alcohol consumption per capita (liters o...,1993,
6465,Paraguay,Total alcohol consumption per capita (liters o...,1992,
6466,Paraguay,Total alcohol consumption per capita (liters o...,1991,


In [8]:
#Ultima extracción de data para analizar los países que integran la OEA
countries =[ "LCA", "VCT", "SUR", "TTO", "URY", "VEN"  ]

# Indicadores que deseas consultar
indicators = [
    "FX.OWN.TOTL.ZS",
    "SP.DYN.CBRT.IN",
    "SH.XPD.KHEX.GD.ZS",
    "EN.ATM.CO2E.PP.GD",
    "SE.COM.DURS",
    "CC.EST",
    "SH.XPD.CHEX.PP.CD",
    "SP.DYN.CDRT.IN",
    "SH.XPD.GHED.GD.ZS",
    "SH.XPD.GHED.PP.CD",
    "SH.XPD.PVTD.CH.ZS",
    "SE.TER.CUAT.BA.ZS",
    "SE.SEC.CUAT.LO.ZS",
    "SP.DYN.TFRT.IN",
    "NY.GDP.MKTP.PP.KD",
    "SI.POV.GINI",
    "SH.STA.IYCF.ZS",
    "SP.DYN.LE00.FE.IN",
    "SP.DYN.LE00.IN",
    "SE.ADT.LITR.ZS",
    "SP.DYN.AMRT.FE",
    "SP.DYN.AMRT.MA",
    "SP.DYN.IMRT.IN",
    "SH.DTH.NMRT",
    "SH.STA.BASS.ZS",
    "SH.H2O.SMDW.ZS",
    "EN.ATM.PM25.MC.ZS",
    "PV.EST",
    "SP.POP.80UP.FE",
    "SP.POP.80UP.MA",
    "SP.POP.TOTL.FE.IN",
    "SP.POP.TOTL.FE.ZS",
    "SP.POP.TOTL.MA.IN",
    "SP.POP.TOTL.MA.ZS",
    "SP.POP.TOTL",
    "SI.POV.UMIC.GP",
    "SH.STA.OWAD.ZS",
    "SH.STA.OWGH.ZS",
    "SE.XPD.TOTL.GD.ZS",
    "SP.RUR.TOTL.ZS",
    "SL.UEM.TOTL.ZS",
    "SP.URB.TOTL",
    "SP.URB.TOTL.IN.ZS",
      "EN.POP.DNST",
"SP.DYN.LE00.MA.IN",
"SH.STA.ANVC.ZS",
"SN.ITK.DEFC.ZS",
"SH.STA.DIAB.ZS",
"SH.ALC.PCAP.LI"
]




# Años elegidos
start_year = "1990"
end_year = "2022"

# Lista para almacenar DataFrames individuales
data_frames = []

# URL base de la API del Banco Mundial
base_url = "http://api.worldbank.org/v2/country"

# Realiza las consultas para cada país, indicador y año
for country_code in countries:
    for indicator in indicators:
        # Construye la URL de la consulta
        url = f"{base_url}/{country_code}/indicator/{indicator}?date={start_year}:{end_year}&format=json"
        
        # Realiza la solicitud GET a la API del Banco Mundial
        response = requests.get(url)
        
        # Verifica si la solicitud fue exitosa
        if response.status_code == 200:
            data = response.json()
            # Los datos se encuentran en data[1]
            for entry in data[1]:
                year = entry['date']
                value = entry['value']
                indicator_name = entry['indicator']['value'] 
                country_name = entry['country']['value']
                data_frames.append(pd.DataFrame({"País": [country_name], "Indicador": [indicator_name], "Año": [year], "Valor": [value]}))

# Concatenar todos los DataFrames individuales en uno
data_df3 = pd.concat(data_frames, ignore_index=True)

# Mostrar los datos en una tabla
data_df3

Unnamed: 0,País,Indicador,Año,Valor
0,St. Lucia,Account ownership at a financial institution o...,2022,
1,St. Lucia,Account ownership at a financial institution o...,2021,
2,St. Lucia,Account ownership at a financial institution o...,2020,
3,St. Lucia,Account ownership at a financial institution o...,2019,
4,St. Lucia,Account ownership at a financial institution o...,2018,
...,...,...,...,...
9697,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1994,
9698,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1993,
9699,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1992,
9700,"Venezuela, RB",Total alcohol consumption per capita (liters o...,1991,


In [9]:
# Unificamos los dataframes
concatenated_df2 = pd.concat([data_df0, data_df, data_df1, data_df3, data_df2, data_df_a], ignore_index=True)

# El argumento 'ignore_index=True' restablece los índices de fila para que sean secuenciales.

# Ahora 'concatenated_df' contiene todos los datos de los tres DataFrames.


In [10]:
concatenated_df2

Unnamed: 0,País,Indicador,Año,Valor
0,United States,Account ownership at a financial institution o...,2022,
1,United States,Account ownership at a financial institution o...,2021,94.95
2,United States,Account ownership at a financial institution o...,2020,
3,United States,Account ownership at a financial institution o...,2019,
4,United States,Account ownership at a financial institution o...,2018,
...,...,...,...,...
56557,Jamaica,Total alcohol consumption per capita (liters o...,1994,
56558,Jamaica,Total alcohol consumption per capita (liters o...,1993,
56559,Jamaica,Total alcohol consumption per capita (liters o...,1992,
56560,Jamaica,Total alcohol consumption per capita (liters o...,1991,


In [11]:
concatenated_df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56562 entries, 0 to 56561
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   País       56562 non-null  object 
 1   Indicador  56562 non-null  object 
 2   Año        56562 non-null  object 
 3   Valor      39404 non-null  float64
dtypes: float64(1), object(3)
memory usage: 1.7+ MB


In [43]:
#Agregamos data de unos indicadores Obesidad, Gasto publico social, y Consumo de Tabado,  de base de datos externna
#Falta codigo para extraer base de datos eternas
#preliminarmente subimos lso cvs.
archivo1_csv = '../proyecto_final_bootcamp_henry//datasets/obesidad.csv'
archivo2_csv = "../proyecto_final_bootcamp_henry/datasets/consumo_tabaco (1).csv"
archivo3_csv = "../proyecto_final_bootcamp_henry/datasets/gasto_social.csv"

# Cargar el archivo CSV en un DataFrame
data_df5 = pd.read_csv(archivo1_csv)
data_df6 = pd.read_csv(archivo2_csv)
data_df7 = pd.read_csv(archivo3_csv)


In [67]:
#unificamos datafraames de fuentes externas , para Tabaco, Obesidad y Gasto público social
concatenated_df3 = pd.concat([data_df5, data_df6, data_df7], ignore_index=True)

In [68]:
concatenated_df3.head()

Unnamed: 0,Country,Año,"Proportion of adults who are obese, 20 years old and over ,by sex","Age-standardized prevalence of current tobacco use among persons aged 15 years and older, by sex",Gasto público social según la clasificación por funciones del gobierno
0,Antigua and Barbuda,1990,8.6,,
1,Antigua and Barbuda,1991,8.9,,
2,Antigua and Barbuda,1992,9.2,,
3,Antigua and Barbuda,1993,9.6,,
4,Antigua and Barbuda,1994,9.9,,


In [69]:
#Cambiar nombre columna de "Country" a "País"

# Cambiar el nombre de la columna 'Año'
concatenated_df3 = concatenated_df3.rename(columns={'Country': 'País'})


In [None]:
concatenated_df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56595 entries, 0 to 56594
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   País       56595 non-null  object 
 1   Indicador  56595 non-null  object 
 2   Año        56595 non-null  object 
 3   Valor      39436 non-null  float64
dtypes: float64(1), object(3)
memory usage: 1.7+ MB


In [70]:
concatenated_df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3947 entries, 0 to 3946
Data columns (total 5 columns):
 #   Column                                                                                            Non-Null Count  Dtype  
---  ------                                                                                            --------------  -----  
 0   País                                                                                              3947 non-null   object 
 1   Año                                                                                               3947 non-null   int64  
 2   Proportion of adults who are obese, 20 years old and over ,by sex                                 2835 non-null   float64
 3   Age-standardized prevalence of current tobacco use among persons aged 15 years and older, by sex  504 non-null    float64
 4   Gasto público social según la clasificación por funciones del gobierno                            608 non-null    float64
dtype

In [76]:
import pandas as pd

# Suponiendo que concatenated_df3 es el primer DataFrame y concatenated_df2 es el segundo DataFrame

# Asegúrate de que la columna 'Año' sea del mismo tipo de datos en ambos DataFrames (por ejemplo, int64)
concatenated_df2['Año'] = concatenated_df2['Año'].astype(int)

# Fusiona el segundo DataFrame (concatenated_df2) con el primer DataFrame (concatenated_df3) en base a las columnas 'País' y 'Año'
result = pd.merge(concatenated_df2, concatenated_df3, on=['País', 'Año'])

# Si lo deseas, puedes renombrar la columna 'Indicador' para que coincida con el nombre original
result.rename(columns={'Indicador_y': 'Indicador', 'Valor_y': 'Valor'}, inplace=True)

# Muestra el DataFrame resultante
print(result)


                       País  \
0       Antigua and Barbuda   
1       Antigua and Barbuda   
2       Antigua and Barbuda   
3       Antigua and Barbuda   
4       Antigua and Barbuda   
...                     ...   
153361              Jamaica   
153362              Jamaica   
153363              Jamaica   
153364              Jamaica   
153365              Jamaica   

                                                Indicador   Año   Valor  \
0       Account ownership at a financial institution o...  2016     NaN   
1       Account ownership at a financial institution o...  2016     NaN   
2       Account ownership at a financial institution o...  2016     NaN   
3                    Birth rate, crude (per 1,000 people)  2016  11.659   
4                    Birth rate, crude (per 1,000 people)  2016  11.659   
...                                                   ...   ...     ...   
153361  Diabetes prevalence (% of population ages 20 t...  1990     NaN   
153362  Diabetes prevalence

In [91]:
import pandas as pd

# Suponiendo que concatenated_df3 es el primer DataFrame y concatenated_df2 es el segundo DataFrame

# Asegúrate de que la columna 'Año' sea del mismo tipo de datos en ambos DataFrames (por ejemplo, int64)
concatenated_df3['Año'] = concatenated_df2['Año'].astype(int)

# Fusiona el segundo DataFrame (concatenated_df2) con el primer DataFrame (concatenated_df3) en base a las columnas 'País' y 'Año'
result = pd.merge(concatenated_df2, concatenated_df3, on=['País', 'Año'])

# Si lo deseas, puedes renombrar la columna 'Indicador' para que coincida con el nombre original
result.rename(columns={'Indicador_y': 'Indicador', 'Valor_y': 'Valor'}, inplace=True)

# Muestra el DataFrame resultante
print(result)


                       País  \
0       Antigua and Barbuda   
1       Antigua and Barbuda   
2       Antigua and Barbuda   
3       Antigua and Barbuda   
4       Antigua and Barbuda   
...                     ...   
153361              Jamaica   
153362              Jamaica   
153363              Jamaica   
153364              Jamaica   
153365              Jamaica   

                                                Indicador   Año  Valor  \
0       Account ownership at a financial institution o...  2022    NaN   
1       Account ownership at a financial institution o...  2022    NaN   
2       Account ownership at a financial institution o...  2022    NaN   
3                    Birth rate, crude (per 1,000 people)  2022    NaN   
4                    Birth rate, crude (per 1,000 people)  2022    NaN   
...                                                   ...   ...    ...   
153361  Total alcohol consumption per capita (liters o...  1990    NaN   
153362  Total alcohol consumption p

In [92]:
# Asigna el DataFrame resultante a la variable concatenated_df3
concatenated_df2_df3 = result


In [93]:
concatenated_df2_df3

Unnamed: 0,País,Indicador,Año,Valor,"Proportion of adults who are obese, 20 years old and over ,by sex","Age-standardized prevalence of current tobacco use among persons aged 15 years and older, by sex",Gasto público social según la clasificación por funciones del gobierno
0,Antigua and Barbuda,Account ownership at a financial institution o...,2022,,8.6,,
1,Antigua and Barbuda,Account ownership at a financial institution o...,2022,,18.5,,
2,Antigua and Barbuda,Account ownership at a financial institution o...,2022,,10.8,,
3,Antigua and Barbuda,"Birth rate, crude (per 1,000 people)",2022,,8.6,,
4,Antigua and Barbuda,"Birth rate, crude (per 1,000 people)",2022,,18.5,,
...,...,...,...,...,...,...,...
153361,Jamaica,Total alcohol consumption per capita (liters o...,1990,,24.4,,
153362,Jamaica,Total alcohol consumption per capita (liters o...,1990,,18.7,,
153363,Jamaica,Total alcohol consumption per capita (liters o...,1990,,10.4,,
153364,Jamaica,Total alcohol consumption per capita (liters o...,1990,,,20.1,


In [94]:
# Elimina las últimas 3 columnas
columns_to_remove = [
    'Proportion of adults who are obese, 20 years old and over ,by sex',
    'Age-standardized prevalence of current tobacco use among persons aged 15 years and older, by sex',
    'Gasto público social según la clasificación por funciones del gobierno'
]

concatenated_df2_df3 = concatenated_df2_df3.drop(columns=columns_to_remove)


In [95]:
concatenated_df2_df3

Unnamed: 0,País,Indicador,Año,Valor
0,Antigua and Barbuda,Account ownership at a financial institution o...,2022,
1,Antigua and Barbuda,Account ownership at a financial institution o...,2022,
2,Antigua and Barbuda,Account ownership at a financial institution o...,2022,
3,Antigua and Barbuda,"Birth rate, crude (per 1,000 people)",2022,
4,Antigua and Barbuda,"Birth rate, crude (per 1,000 people)",2022,
...,...,...,...,...
153361,Jamaica,Total alcohol consumption per capita (liters o...,1990,
153362,Jamaica,Total alcohol consumption per capita (liters o...,1990,
153363,Jamaica,Total alcohol consumption per capita (liters o...,1990,
153364,Jamaica,Total alcohol consumption per capita (liters o...,1990,


In [96]:

#Vamos a agrupar los indicadores por países y por año 
# 1. Cambia el tipo de dato de la columna 'Año' a numérico (int)
concatenated_df2_df3['Año'] = pd.to_numeric(concatenated_df2_df3['Año'], errors='coerce')

# 2. Realiza una agregación (por ejemplo, suma) en caso de entradas duplicadas
df_aggregated_df2_df3 = concatenated_df2_df3.groupby(['País', 'Año', 'Indicador'])['Valor'].sum().unstack()

# Esto creará un nuevo DataFrame 'df_aggregated' con cada indicador como una columna individual y las entradas duplicadas se agregan mediante la suma (Ver concepto).


In [97]:
#Exportamos la tabla final de datos crudos a un csv 
nombre_del_archivo = 'df_aggregated_df2_df3.csv'
df_aggregated_df2_df3.to_csv(nombre_del_archivo, index=False)

In [102]:
#Tabla de indicadores por países y año

df_aggregated_df2_df3

Unnamed: 0_level_0,Indicador,Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+),"Birth rate, crude (per 1,000 people)",CO2 emissions (kg per PPP $ of GDP),Capital health expenditure (% of GDP),"Compulsory education, duration (years)",Control of Corruption: Estimate,"Current health expenditure per capita, PPP (current international $)","Death rate, crude (per 1,000 people)",Diabetes prevalence (% of population ages 20 to 79),Domestic general government health expenditure (% of GDP),...,Poverty gap at $6.85 a day (2017 PPP) (%),Pregnant women receiving prenatal care (%),Prevalence of overweight (% of adults),"Prevalence of overweight, weight for height (% of children under 5)",Prevalence of undernourishment (% of population),Rural population (% of total population),"Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)","Unemployment, total (% of total labor force) (modeled ILO estimate)",Urban population,Urban population (% of total population)
País,Año,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
Antigua and Barbuda,1990,0.00,41.832,0.602579,0.000000,0.0,0.000000,0.000000,15.332,0.0,0.000000,...,0.0,0.0,67.4,0.0,0.0,129.148,0.0000,0.000,44870.0,70.852
Antigua and Barbuda,1991,0.00,37.008,0.578818,0.000000,0.0,0.000000,0.000000,15.506,0.0,0.000000,...,0.0,0.0,68.8,0.0,0.0,129.070,0.0000,0.000,45136.0,70.930
Antigua and Barbuda,1992,0.00,38.456,0.552112,0.000000,0.0,0.000000,0.000000,15.490,0.0,0.000000,...,0.0,0.0,69.8,0.0,0.0,129.830,0.0000,0.000,45372.0,70.170
Antigua and Barbuda,1993,0.00,37.194,0.520486,0.000000,0.0,0.000000,0.000000,15.332,0.0,0.000000,...,0.0,0.0,71.0,0.0,0.0,130.582,0.0000,0.000,45700.0,69.418
Antigua and Barbuda,1994,0.00,37.980,0.484595,0.000000,0.0,0.000000,0.000000,15.066,0.0,0.000000,...,0.0,0.0,72.2,0.0,0.0,131.332,0.0000,0.000,46058.0,68.668
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Uruguay,2018,0.00,46.420,0.318995,0.806297,56.0,4.829370,8560.667916,39.452,0.0,24.923353,...,4.4,0.0,0.0,45.6,10.0,18.664,0.0000,33.360,13068544.0,381.336
Uruguay,2019,0.00,43.288,0.323461,0.837991,56.0,4.781612,9104.855617,40.228,0.0,25.015905,...,4.4,0.0,0.0,0.0,10.0,18.296,21.8106,34.920,13086376.0,381.704
Uruguay,2020,0.00,41.356,0.325995,0.619265,56.0,5.459737,8442.283732,37.728,0.0,26.269562,...,6.8,0.0,0.0,0.0,10.0,17.940,0.0000,41.320,13101164.0,382.060
Uruguay,2021,370.65,52.365,0.000000,0.000000,70.0,7.931268,0.000000,60.810,45.0,0.000000,...,8.0,0.0,0.0,0.0,0.0,21.985,0.0000,46.450,16378035.0,478.015


In [103]:
#Composición de la tabla, nuevas columnas de indicadores y filas con años/países
df_aggregated_df2_df3.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 891 entries, ('Antigua and Barbuda', 1990) to ('Uruguay', 2022)
Data columns (total 49 columns):
 #   Column                                                                                                           Non-Null Count  Dtype  
---  ------                                                                                                           --------------  -----  
 0   Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+)  891 non-null    float64
 1   Birth rate, crude (per 1,000 people)                                                                             891 non-null    float64
 2   CO2 emissions (kg per PPP $ of GDP)                                                                              891 non-null    float64
 3   Capital health expenditure (% of GDP)                                                                            891 non-null    float64
 

In [105]:
# Para contar tanto los valores nulos (NaN) como los ceros:
nulos_ceros = df_aggregated_df2_df3.isnull().sum() + (df_aggregated_df2_df3 == 0).sum()

print(nulos_ceros) 
print("(",(nulos_ceros/1122)*100,")%")



Indicador
Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+)    815
Birth rate, crude (per 1,000 people)                                                                                27
CO2 emissions (kg per PPP $ of GDP)                                                                                 85
Capital health expenditure (% of GDP)                                                                              401
Compulsory education, duration (years)                                                                             215
Control of Corruption: Estimate                                                                                    243
Current health expenditure per capita, PPP (current international $)                                               321
Death rate, crude (per 1,000 people)                                                                                27
Diabetes prevalence (% of population a

In [106]:
nulos_ceros.head(50)

Indicador
Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+)    815
Birth rate, crude (per 1,000 people)                                                                                27
CO2 emissions (kg per PPP $ of GDP)                                                                                 85
Capital health expenditure (% of GDP)                                                                              401
Compulsory education, duration (years)                                                                             215
Control of Corruption: Estimate                                                                                    243
Current health expenditure per capita, PPP (current international $)                                               321
Death rate, crude (per 1,000 people)                                                                                27
Diabetes prevalence (% of population a

In [108]:
# Eliminamos columnas que tienen más de 65% de nulos de la base del BM
#Tomamos nota de las variables qeu eliminamos, para tomarls de fuenes externas
#Dejamos aún algunas columnas que tiene muchos nulos evidentes, y dejamos aquellos que podrían mejorar luego de la limpieza de datos.
# Lista de palabras clave para las columnas a eliminar
palabras_a_eliminar = [
    "Account ownership",
    "Educational attainment,",
    "Educational attainment,",
    "Infant and young",
    "Literacy rate,",
    "pure alcohol"
]

# Encuentra y elimina las columnas que contienen alguna de las palabras clave
columnas_a_eliminar = [col for col in df_aggregated_df2_df3.columns if any(palabra in col for palabra in palabras_a_eliminar)]
df_filtrado_nulos = df_aggregated_df2_df3.drop(columns=columnas_a_eliminar)




In [None]:
#Eliminar Columna indicador Diabetes en niños

columna_a_eliminar = next((col for col in df_filtrado_nulos.columns if col.startswith("Diabetes prevalence") and "child" in col), None)

if columna_a_eliminar is not None:
    df_filtrado_nulos = df_filtrado_nulos.drop(columns=columna_a_eliminar)
     


In [109]:
df_filtrado_nulos.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 891 entries, ('Antigua and Barbuda', 1990) to ('Uruguay', 2022)
Data columns (total 43 columns):
 #   Column                                                                                        Non-Null Count  Dtype  
---  ------                                                                                        --------------  -----  
 0   Birth rate, crude (per 1,000 people)                                                          891 non-null    float64
 1   CO2 emissions (kg per PPP $ of GDP)                                                           891 non-null    float64
 2   Capital health expenditure (% of GDP)                                                         891 non-null    float64
 3   Compulsory education, duration (years)                                                        891 non-null    float64
 4   Control of Corruption: Estimate                                                               891 non-null    

In [110]:
#La base de datos mantiene aún muchos nulos en datos de países de las islas del Caribe.
#Para aumentar la calidad de los datos, vamos a quitar del análisis los países con una población menor a 2 millones de personas, que básicamente son las islas mencionadas más pequeñas.
#Umbral = 2000000





# Define un umbral de población (2 millones en este caso)
umbral_población = 2000000

# Filtra los países con población mayor al umbral
df_filtrado_nulos_Sin_Caribe = df_filtrado_nulos.groupby('País').filter(lambda x: x['Population, total'].max() > umbral_población)




In [None]:
df_filtrado_nulos_Sin_Caribe.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 759 entries, ('Argentina', 1990) to ('Venezuela, RB', 2022)
Data columns (total 44 columns):
 #   Column                                                                                                Non-Null Count  Dtype  
---  ------                                                                                                --------------  -----  
 0   Birth rate, crude (per 1,000 people)                                                                  759 non-null    float64
 1   CO2 emissions (kg per PPP $ of GDP)                                                                   759 non-null    float64
 2   Capital health expenditure (% of GDP)                                                                 759 non-null    float64
 3   Compulsory education, duration (years)                                                                759 non-null    float64
 4   Control of Corruption: Estimate                                   

In [111]:
df_filtrado_nulos_Sin_Caribe.head()

Unnamed: 0_level_0,Indicador,"Birth rate, crude (per 1,000 people)",CO2 emissions (kg per PPP $ of GDP),Capital health expenditure (% of GDP),"Compulsory education, duration (years)",Control of Corruption: Estimate,"Current health expenditure per capita, PPP (current international $)","Death rate, crude (per 1,000 people)",Diabetes prevalence (% of population ages 20 to 79),Domestic general government health expenditure (% of GDP),"Domestic general government health expenditure per capita, PPP (current international $)",...,"Population, total",Poverty gap at $6.85 a day (2017 PPP) (%),Pregnant women receiving prenatal care (%),Prevalence of overweight (% of adults),"Prevalence of overweight, weight for height (% of children under 5)",Prevalence of undernourishment (% of population),Rural population (% of total population),"Unemployment, total (% of total labor force) (modeled ILO estimate)",Urban population,Urban population (% of total population)
País,Año,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
Argentina,1990,87.956,1.711437,0.0,0.0,0.0,0.0,30.972,0.0,0.0,0.0,...,130550628.0,0.0,0.0,194.8,0.0,0.0,52.064,0.0,113558160.0,347.936
Argentina,1991,87.376,1.601483,0.0,0.0,0.0,0.0,30.144,0.0,0.0,0.0,...,132423052.0,18.0,0.0,197.2,0.0,0.0,50.688,21.76,115642404.0,349.312
Argentina,1992,86.732,1.478251,0.0,0.0,0.0,0.0,30.38,0.0,0.0,0.0,...,134273140.0,20.0,0.0,199.6,0.0,0.0,49.832,25.44,117545392.0,350.168
Argentina,1993,86.28,1.363379,0.0,0.0,0.0,0.0,30.524,0.0,0.0,0.0,...,136108960.0,22.0,380.0,202.0,0.0,0.0,48.992,40.4,119438336.0,351.008
Argentina,1994,85.676,1.280178,0.0,0.0,0.0,0.0,29.612,0.0,0.0,0.0,...,137954784.0,22.8,0.0,204.4,44.4,0.0,48.16,47.04,121345028.0,351.84


In [112]:
# Especifica el nombre del archivo CSV sin la ruta completa
nombre_archivo_csv = 'Tabla_sin_caribe.csv'

# Exporta el DataFrame a un archivo CSV en la misma carpeta
df_filtrado_nulos_Sin_Caribe.to_csv(nombre_archivo_csv)
