# Graphic Narrative
This dataset contains the rate of unintended pregnancies per 1,000 women, estimated over a five-year period (2015-2019).
We first download the dataset, which was extracted from the [World Health Organization (WHO)](https://www.who.int/data/gho/data/indicators/indicator-details/GHO/SRH_PREGNANCY_UNINTENDED_RATE) site.

In [1]:
!gdown https://drive.google.com/uc?id=1YrzlwaDBBBvqLckUiiolt-TVdV7wSgxo

Downloading...
From: https://drive.google.com/uc?id=1YrzlwaDBBBvqLckUiiolt-TVdV7wSgxo
To: /content/BASE DE DATOS NACIMIENTOS NO DESEADOS.csv
  0% 0.00/77.6k [00:00<?, ?B/s]100% 77.6k/77.6k [00:00<00:00, 17.3MB/s]


We import the libraries needed to inspect the data.

In [2]:
# pandas and NumPy
import pandas as pd
import numpy as np

We load the dataset into a DataFrame.

In [3]:
df = pd.read_csv('BASE DE DATOS NACIMIENTOS NO DESEADOS.csv', sep=',')
print("Number of rows and columns: " + str(df.shape))
df.head()

Number of rows and columns: (288, 34)


Unnamed: 0,IndicatorCode,Indicator,ValueType,ParentLocationCode,ParentLocation,Location type,SpatialDimValueCode,Location,Period type,Period,...,FactValueUoM,FactValueNumericLowPrefix,FactValueNumericLow,FactValueNumericHighPrefix,FactValueNumericHigh,Value,FactValueTranslationID,FactComments,Language,DateModified
0,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,EMR,Eastern Mediterranean,Country,SOM,Somalia,Year,2015-2019,...,,,65,,160,100 [65-160],,,EN,2023-04-26T04:00:00.000Z
1,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,AFR,Africa,Country,TGO,Togo,Year,2015-2019,...,,,74,,144,100 [74-144],,,EN,2023-04-26T04:00:00.000Z
2,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,EMR,Eastern Mediterranean,Country,SOM,Somalia,Year,2015-2019,...,,,76,,135,100 [76-135],,,EN,2023-04-26T04:00:00.000Z
3,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,AFR,Africa,Country,TGO,Togo,Year,2015-2019,...,,,81,,126,100 [81-126],,,EN,2023-04-26T04:00:00.000Z
4,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,AFR,Africa,Country,CPV,Cabo Verde,Year,2015-2019,...,,,62,,183,102 [62-183],,,EN,2023-04-26T04:00:00.000Z


The preview shows several columns that contain null values, so we drop every column with at least one null (`how='any'`).

In [4]:
df = df.dropna(axis=1, how='any')
df.head()

Unnamed: 0,IndicatorCode,Indicator,ValueType,ParentLocationCode,ParentLocation,Location type,SpatialDimValueCode,Location,Period type,Period,IsLatestYear,Dim1 type,Dim1,Dim1ValueCode,FactValueNumeric,FactValueNumericLow,FactValueNumericHigh,Value,Language,DateModified
0,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,EMR,Eastern Mediterranean,Country,SOM,Somalia,Year,2015-2019,True,Uncertainty interval,95% uncertainty,UNCERTAINTY_INTERVAL_UI95,99.65,65,160,100 [65-160],EN,2023-04-26T04:00:00.000Z
1,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,AFR,Africa,Country,TGO,Togo,Year,2015-2019,True,Uncertainty interval,95% uncertainty,UNCERTAINTY_INTERVAL_UI95,99.96,74,144,100 [74-144],EN,2023-04-26T04:00:00.000Z
2,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,EMR,Eastern Mediterranean,Country,SOM,Somalia,Year,2015-2019,True,Uncertainty interval,80% uncertainty,UNCERTAINTY_INTERVAL_UI80,99.65,76,135,100 [76-135],EN,2023-04-26T04:00:00.000Z
3,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,AFR,Africa,Country,TGO,Togo,Year,2015-2019,True,Uncertainty interval,80% uncertainty,UNCERTAINTY_INTERVAL_UI80,99.96,81,126,100 [81-126],EN,2023-04-26T04:00:00.000Z
4,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,AFR,Africa,Country,CPV,Cabo Verde,Year,2015-2019,True,Uncertainty interval,95% uncertainty,UNCERTAINTY_INTERVAL_UI95,101.8,62,183,102 [62-183],EN,2023-04-26T04:00:00.000Z
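
As a minimal sketch of what `dropna(axis=1, how='any')` does, the toy frame below contrasts it with `how='all'`. The three estimates are copied from the preview above; the `PartialNotes` column and its values are hypothetical and exist only to show a partly empty column.

```python
import numpy as np
import pandas as pd

# Toy frame: two complete columns, one fully null column, one partly null column.
toy = pd.DataFrame({
    "Location": ["Somalia", "Togo", "Cabo Verde"],
    "FactValueNumeric": [99.65, 99.96, 101.8],
    "FactComments": [np.nan, np.nan, np.nan],     # entirely null
    "PartialNotes": ["checked", np.nan, np.nan],  # hypothetical, partly null
})

# Count missing values per column before deciding what to drop.
null_counts = toy.isna().sum()
print(null_counts)

# how='any' drops every column that contains at least one null.
dropped_any = toy.dropna(axis=1, how="any")
# how='all' drops only the columns that are null in every row.
dropped_all = toy.dropna(axis=1, how="all")

print(list(dropped_any.columns))
print(list(dropped_all.columns))
```

With `how='any'` a column that is only partly empty is removed too, so it is worth checking the null counts first when a partly filled column might still be useful.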


We keep only the rows for the 95% uncertainty interval. Each country appears twice, once with an 80% interval and once with a 95% interval; the 95% interval is wider and therefore more likely to contain the true value.

In [5]:
df = df[df['Dim1'] == '95% uncertainty']
df.head()

Unnamed: 0,IndicatorCode,Indicator,ValueType,ParentLocationCode,ParentLocation,Location type,SpatialDimValueCode,Location,Period type,Period,IsLatestYear,Dim1 type,Dim1,Dim1ValueCode,FactValueNumeric,FactValueNumericLow,FactValueNumericHigh,Value,Language,DateModified
0,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,EMR,Eastern Mediterranean,Country,SOM,Somalia,Year,2015-2019,True,Uncertainty interval,95% uncertainty,UNCERTAINTY_INTERVAL_UI95,99.65,65,160,100 [65-160],EN,2023-04-26T04:00:00.000Z
1,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,AFR,Africa,Country,TGO,Togo,Year,2015-2019,True,Uncertainty interval,95% uncertainty,UNCERTAINTY_INTERVAL_UI95,99.96,74,144,100 [74-144],EN,2023-04-26T04:00:00.000Z
4,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,AFR,Africa,Country,CPV,Cabo Verde,Year,2015-2019,True,Uncertainty interval,95% uncertainty,UNCERTAINTY_INTERVAL_UI95,101.8,62,183,102 [62-183],EN,2023-04-26T04:00:00.000Z
6,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,AFR,Africa,Country,GHA,Ghana,Year,2015-2019,True,Uncertainty interval,95% uncertainty,UNCERTAINTY_INTERVAL_UI95,102.0,78,146,102 [78-146],EN,2023-04-26T04:00:00.000Z
8,SRH_PREGNANCY_UNINTENDED_RATE,Unintended pregnancy rate (model-estimated),numeric,AFR,Africa,Country,NAM,Namibia,Year,2015-2019,True,Uncertainty interval,95% uncertainty,UNCERTAINTY_INTERVAL_UI95,104.1,78,161,104 [78-161],EN,2023-04-26T04:00:00.000Z
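
The following sketch illustrates why this filter leaves one row per country. It rebuilds the Somalia and Togo rows from the preview above (both interval levels) and checks for duplicates after filtering; the variable names are illustrative and not part of the notebook.

```python
import pandas as pd

# Rows copied from the preview above: two countries, two interval levels each.
toy = pd.DataFrame({
    "Location": ["Somalia", "Togo", "Somalia", "Togo"],
    "Dim1": ["95% uncertainty", "95% uncertainty",
             "80% uncertainty", "80% uncertainty"],
    "FactValueNumeric": [99.65, 99.96, 99.65, 99.96],
    "FactValueNumericLow": [65, 74, 76, 81],
    "FactValueNumericHigh": [160, 144, 135, 126],
})

# Before filtering, every country has one row per interval level.
rows_per_country_before = toy.groupby("Location").size()

# Keep only the 95% uncertainty rows, as in the notebook cell.
only_95 = toy[toy["Dim1"] == "95% uncertainty"]

# After filtering, no country should appear more than once.
has_duplicates = bool(only_95["Location"].duplicated().any())

print(rows_per_country_before.to_dict())
print(has_duplicates)
```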


We first restrict the data to the WHO Europe (`EUR`) and Americas (`AMR`) regions, since the comparison uses developed European countries and Latin American countries.

In [6]:
df_continentes = df[df['ParentLocationCode'].isin(['AMR', 'EUR'])]
codigos_paises = df_continentes['SpatialDimValueCode'].unique()
paises = df_continentes['Location'].unique()
print('Distinct country codes:')
print(codigos_paises)
print('Distinct countries:')
print(paises)

Distinct country codes:
['BOL' 'HTI' 'MNE' 'ALB' 'NLD' 'BLR' 'BIH' 'ESP' 'CHE' 'BEL' 'SRB' 'PRT'
 'DEU' 'TKM' 'MKD' 'MDA' 'LTU' 'UKR' 'HUN' 'SVK' 'ITA' 'UZB' 'POL' 'FRA'
 'BGR' 'DNK' 'CAN' 'GRC' 'HRV' 'SVN' 'ISL' 'FIN' 'LVA' 'NOR' 'CZE' 'USA'
 'SWE' 'GBR' 'TJK' 'EST' 'KGZ' 'URY' 'ROU' 'CRI' 'ARM' 'PRI' 'NIC' 'KAZ'
 'SLV' 'MEX' 'COL' 'GTM' 'RUS' 'PRY' 'BRA' 'SUR' 'HND' 'LCA' 'ARG' 'CHL'
 'BLZ' 'CUB' 'PAN' 'GUY' 'BRB' 'ECU' 'TTO' 'DOM' 'JAM' 'PER' 'AZE' 'GEO']
Distinct countries:
['Bolivia (Plurinational State of)' 'Haiti' 'Montenegro' 'Albania'
 'Netherlands (Kingdom of the)' 'Belarus' 'Bosnia and Herzegovina' 'Spain'
 'Switzerland' 'Belgium' 'Serbia' 'Portugal' 'Germany' 'Turkmenistan'
 'North Macedonia' 'Republic of Moldova' 'Lithuania' 'Ukraine' 'Hungary'
 'Slovakia' 'Italy' 'Uzbekistan' 'Poland' 'France' 'Bulgaria' 'Denmark'
 'Canada' 'Greece' 'Croatia' 'Slovenia' 'Iceland' 'Finland' 'Latvia'
 'Norway' 'Czechia' 'United States of America' 'Sweden'
 'United Kingdom of Great Britain

Next we keep only the countries of interest:
EUROPE:
- France
- Germany
- Italy
- Switzerland
- Finland
- Netherlands
- United Kingdom
- Spain
- Portugal
- Denmark
- Greece
- Norway

LATIN AMERICA:
- Argentina
- Bolivia
- Chile
- Colombia
- Ecuador
- Uruguay
- Peru
- Dominican Republic
- Cuba

In [7]:
paises_filtro = ['FRA','DEU', 'ITA', 'CHE', 'FIN', 'NLD', 'GBR', 'ESP', 'PRT', 'DNK', 'GRC', 'NOR',
                 'ARG', 'BOL', 'CHL', 'COL', 'ECU', 'URY', 'PER', 'DOM', 'CUB']
df_paises = df_continentes[df_continentes['SpatialDimValueCode'].isin(paises_filtro)]
paises = df_paises['Location'].unique()
print('Filtered countries:')
print(paises)

Filtered countries:
['Bolivia (Plurinational State of)' 'Netherlands (Kingdom of the)' 'Spain'
 'Switzerland' 'Portugal' 'Germany' 'Italy' 'France' 'Denmark' 'Greece'
 'Finland' 'Norway' 'United Kingdom of Great Britain and Northern Ireland'
 'Uruguay' 'Colombia' 'Argentina' 'Chile' 'Cuba' 'Ecuador'
 'Dominican Republic' 'Peru']
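
Because `isin` silently skips codes that are not in the data, a mistyped code would go unnoticed. One possible check, sketched here on a small stand-in for `df_continentes`, compares the requested codes with the ones actually found. The code `'XXX'` is deliberately invalid and hypothetical; the other codes and names come from the output above.

```python
import pandas as pd

# Small stand-in for df_continentes.
toy = pd.DataFrame({
    "SpatialDimValueCode": ["BOL", "HTI", "NLD", "ESP", "CAN"],
    "Location": ["Bolivia (Plurinational State of)", "Haiti",
                 "Netherlands (Kingdom of the)", "Spain", "Canada"],
})

# 'XXX' is a hypothetical, invalid code used to demonstrate a miss.
wanted = ["BOL", "NLD", "ESP", "XXX"]

selected = toy[toy["SpatialDimValueCode"].isin(wanted)]

# Codes that were requested but do not occur in the data.
missing = sorted(set(wanted) - set(selected["SpatialDimValueCode"]))
print(missing)
```

In the notebook itself the 21 requested codes match the 21 countries printed above, so the same check on `paises_filtro` would be expected to return an empty list.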


We now keep only the most useful columns and rename them so the table is clearer and more concise.

In [8]:
columnas_seleccionadas = ['Indicator', 'Location', 'FactValueNumeric', 'FactValueNumericLow', 'FactValueNumericHigh']
df_final = df_paises.filter(items=columnas_seleccionadas)
df_final = df_final.rename(columns={
    "Indicator": "Indicador",
    "Location": "Pais",
    "FactValueNumeric": "Valor_Estimado",
    "FactValueNumericLow": "Limite_Inferior",
    "FactValueNumericHigh": "Limite_Superior",
})
df_final

Unnamed: 0,Indicador,Pais,Valor_Estimado,Limite_Inferior,Limite_Superior
10,Unintended pregnancy rate (model-estimated),Bolivia (Plurinational State of),104.7,73,158
44,Unintended pregnancy rate (model-estimated),Netherlands (Kingdom of the),17.53,14,24
52,Unintended pregnancy rate (model-estimated),Spain,20.26,17,26
53,Unintended pregnancy rate (model-estimated),Switzerland,20.31,17,27
62,Unintended pregnancy rate (model-estimated),Portugal,20.9,18,25
65,Unintended pregnancy rate (model-estimated),Germany,21.29,19,27
80,Unintended pregnancy rate (model-estimated),Italy,28.32,19,45
85,Unintended pregnancy rate (model-estimated),France,28.98,24,37
90,Unintended pregnancy rate (model-estimated),Denmark,29.8,25,41
96,Unintended pregnancy rate (model-estimated),Greece,32.97,23,54
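
As an optional sanity check on the cleaned table, the sketch below verifies that each estimate lies between its lower and upper limits. It uses the first three rows shown above; the check itself is an addition for illustration and is not part of the original notebook.

```python
import pandas as pd

# First rows of the cleaned table shown above.
toy = pd.DataFrame({
    "Pais": ["Bolivia (Plurinational State of)",
             "Netherlands (Kingdom of the)",
             "Spain"],
    "Valor_Estimado": [104.7, 17.53, 20.26],
    "Limite_Inferior": [73, 14, 17],
    "Limite_Superior": [158, 24, 26],
})

# True for rows where lower limit <= estimate <= upper limit.
in_bounds = (
    (toy["Limite_Inferior"] <= toy["Valor_Estimado"])
    & (toy["Valor_Estimado"] <= toy["Limite_Superior"])
)

all_consistent = bool(in_bounds.all())
print(all_consistent)
```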


Finally, we save the cleaned dataset to a CSV file so it can be downloaded.

In [10]:
from google.colab import files
df_final.to_csv('base_limpia.csv', index=False)
files.download('base_limpia.csv')
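
The `google.colab` module is only available inside a Colab runtime, so the cell above fails elsewhere. A minimal sketch of a fallback follows; the helper name `save_clean` is hypothetical, and the temporary directory is used only to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd


def save_clean(df, path):
    """Write df to CSV and, when running in Colab, trigger a browser download."""
    df.to_csv(path, index=False)  # index=False keeps the row index out of the file
    try:
        # google.colab can only be imported inside a Colab runtime.
        from google.colab import files
    except ImportError:
        # Outside Colab the file simply stays on disk at `path`.
        return path
    files.download(path)
    return path


# Toy table with the renamed columns (values taken from the output above).
toy = pd.DataFrame({
    "Pais": ["Spain"],
    "Valor_Estimado": [20.26],
    "Limite_Inferior": [17],
    "Limite_Superior": [26],
})

out_path = save_clean(toy, os.path.join(tempfile.mkdtemp(), "base_limpia.csv"))
reloaded = pd.read_csv(out_path)
print(reloaded.columns.tolist())
```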
