# Pair: Joining and Cleaning Data


### Data Joining Exercises

For this pair programming session you should use the world-data-2023-part1.csv and world-data-2023-part2.csv datasets.

Today we have a new dataset, which contains the following columns:

- GDP: Gross Domestic Product, the total value of goods and services produced in the country.

- Gross primary education enrollment (%): Gross enrollment rate in primary education.

- Gross tertiary education enrollment (%): Gross enrollment rate in tertiary education.

- Infant mortality: Number of deaths before one year of age per 1,000 live births.

- Largest city: Name of the largest city in the country.

- Life expectancy: Average number of years a newborn is expected to live.

- Maternal mortality ratio: Number of maternal deaths per 100,000 live births.

- Minimum wage: Minimum wage level in local currency.

- Official language: Official language(s) spoken in the country.

- Out of pocket health expenditure: Percentage of total health expenditure paid directly by individuals.

- Physicians per thousand: Number of physicians per thousand people.

- Population: Total population of the country.

- Population: Labor force participation (%): Percentage of the population that is part of the labor force.

- Tax revenue (%): Tax revenue as a percentage of GDP.

- Total tax rate: Total tax burden as a percentage of commercial profits.

- Unemployment rate: Percentage of the labor force that is unemployed.

- Urban_population: Number of people living in urban areas (the values in the data are absolute counts, not percentages).

- coordinates: Latitude and longitude coordinates of the country's location.

- country: Name of the country.

You have two datasets at your disposal, "world-data-2023-part1.csv" and "world-data-2023-part2.csv", which contain a series of indicators and data for different countries. Your task is to explore these datasets and determine what they have in common in terms of columns and data.

Then, create a new DataFrame that combines the information from both datasets into a single one. To do so, select the Pandas join method you consider most appropriate for this situation and justify in your report why you think it is the best choice.

Make sure you complete the following steps:

- Load both datasets into pandas DataFrames and explore them.

- Identify the columns the two datasets have in common.

- Use the Pandas join method you consider most suitable to combine the data from both files into a single DataFrame.

- Explain why you chose that join method and how the previous steps were carried out.

### Cleaning Exercises

- After joining the data, we have two "country" columns. Drop one of them.

- The column names are not homogeneous. Rename the columns so that:

  - They have no spaces.

  - They are lowercase.

  - They have no parentheses, that is, remove "(%)" and "(Km2)".

- Some column names contain "\n". Remove it from the column names.

- Some column names contain ":". Remove it from the column names.

- The coordinates column holds the latitude and the longitude in a single column. Create two new columns, one with the longitude and one with the latitude. Once done, drop the coordinates column.

- The columns unemployment_rate, total_tax_rate, tax_revenue, population_labor_force_participation, out_of_pocket_health_expenditure, gross_tertiary_education_enrollment, gross_primary_education_enrollment, forested_area, cpi_change, and agricultural_land contain "%". Remove the "%" from the values in those columns.

- Do the same for the gasoline_price, gdp, and minimum_wage columns, but removing "$".

- Save the DataFrame to use it in tomorrow's pair programming session.

### Filtering Exercises

- Find all countries whose infant mortality is between 40 and 50 deaths per 1,000 live births.

- Find the countries whose birth rate is greater than or equal to 20 and whose life expectancy is greater than 75 years.

- Find the cities whose countries contain the string "la" in their name.

- Find the countries whose number of physicians per 1,000 inhabitants (physicians_per_thousand) is greater than 5.

- Find the countries whose fertility rate is greater than 6.

- Find the countries whose currency is the euro (EUR) and whose birth rate is above average.

- Find the countries whose infant mortality rate is greater than 70.

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)

In [2]:
df1 = pd.read_csv("world-data-2023_part1.csv", index_col = 0)
df2 = pd.read_csv("world-data-2023_part2.csv", index_col = 0)

In [3]:
df1.head(2)

Unnamed: 0,Country,Density\n(P/Km2),Abbreviation,Agricultural Land( %),Land Area(Km2),Armed Forces size,Birth Rate,Calling Code,Capital/Major City,Co2-Emissions,CPI,CPI Change (%),Currency-Code,Fertility Rate,Forested Area (%),Gasoline Price
0,Afghanistan,60,AF,58.10%,652230,323000,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70
1,Albania,105,AL,43.10%,28748,9000,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36


In [4]:
df2.head(2)

Unnamed: 0,GDP,Gross primary education enrollment (%),Gross tertiary education enrollment (%),Infant mortality,Largest city,Life expectancy,Maternal mortality ratio,Minimum wage,Official language,Out of pocket health expenditure,Physicians per thousand,Population,Population: Labor force participation (%),Tax revenue (%),Total tax rate,Unemployment rate,Urban_population,country,coordinates
0,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,Afghanistan,"('33.93911 ', '67.709953')"
1,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,Albania,"('41.153332 ', '20.168331')"


In [5]:
df_final = pd.concat([df1,df2], axis=1)
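
The exercise asks for a justification of the join method. `pd.concat(..., axis=1)` places the frames side by side and aligns rows by index label, so it is a reasonable choice only if both files list the same countries under the same index. The following is a minimal sketch, on toy frames whose names (`left`, `right`) and contents are illustrative assumptions built from a few values in the previews above, of how that assumption could be checked and how a key-based `merge` would look as an alternative.

```python
import pandas as pd

# Toy frames mimicking the two files (hypothetical names, values from the previews).
left = pd.DataFrame({"Country": ["Afghanistan", "Albania"], "Birth Rate": [32.49, 11.78]})
right = pd.DataFrame({"country": ["Afghanistan", "Albania"], "Life expectancy": [64.5, 78.5]})

# Column names shared by both frames, compared case-insensitively.
common = {c.lower() for c in left.columns} & {c.lower() for c in right.columns}

# concat(axis=1) aligns rows by index label, so first check that both
# frames describe the same country on every row.
same_order = bool((left["Country"] == right["country"]).all())

by_position = pd.concat([left, right], axis=1)

# Key-based alternative: merge on the country name. validate raises
# MergeError if a country appears more than once on either side.
by_key = left.merge(right, left_on="Country", right_on="country",
                    how="outer", validate="one_to_one")
```

If `same_order` were false, the index-based `concat` would silently pair rows from different countries, whereas the `merge` would still match them by name.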

In [6]:
df_final.head()

Unnamed: 0,Country,Density\n(P/Km2),Abbreviation,Agricultural Land( %),Land Area(Km2),Armed Forces size,Birth Rate,Calling Code,Capital/Major City,Co2-Emissions,CPI,CPI Change (%),Currency-Code,Fertility Rate,Forested Area (%),Gasoline Price,GDP,Gross primary education enrollment (%),Gross tertiary education enrollment (%),Infant mortality,Largest city,Life expectancy,Maternal mortality ratio,Minimum wage,Official language,Out of pocket health expenditure,Physicians per thousand,Population,Population: Labor force participation (%),Tax revenue (%),Total tax rate,Unemployment rate,Urban_population,country,coordinates
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,Afghanistan,"('33.93911 ', '67.709953')"
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,Albania,"('41.153332 ', '20.168331')"
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,Algeria,"('28.033886 ', '1.659626')"
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,Andorra,"('42.506285 ', '1.521801')"
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,Angola,"('-11.202692 ', '17.873887')"


In [7]:
df_final.shape

(195, 35)

# Cleaning Exercises

In [10]:
# 1. After joining the data, we have two "country" columns. Drop one of them.
df_final.drop("country", axis = 1, inplace = True)
df_final.head()

Unnamed: 0,Country,Density\n(P/Km2),Abbreviation,Agricultural Land( %),Land Area(Km2),Armed Forces size,Birth Rate,Calling Code,Capital/Major City,Co2-Emissions,CPI,CPI Change (%),Currency-Code,Fertility Rate,Forested Area (%),Gasoline Price,GDP,Gross primary education enrollment (%),Gross tertiary education enrollment (%),Infant mortality,Largest city,Life expectancy,Maternal mortality ratio,Minimum wage,Official language,Out of pocket health expenditure,Physicians per thousand,Population,Population: Labor force participation (%),Tax revenue (%),Total tax rate,Unemployment rate,Urban_population,coordinates
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,"('33.93911 ', '67.709953')"
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,"('41.153332 ', '20.168331')"
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,"('28.033886 ', '1.659626')"
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,"('42.506285 ', '1.521801')"
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,"('-11.202692 ', '17.873887')"


In [11]:
# 2. The column names are not homogeneous. Rename them so that they have no spaces, are lowercase, and have no parentheses, "\n", or ":".

columnas = df_final.columns
nuevas_columnas = {col: col.split("(")[0].strip().replace(" ", "_").replace("\n", "").replace(":", "").lower() for col in columnas}
df_final.rename(columns=nuevas_columnas, inplace = True)
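
The dictionary comprehension above drops everything after the first "(" in each name. As an alternative sketch, the same rules can be written as a small helper that removes only the parenthesised part. The helper name `clean_column_name` is hypothetical, and the sample names are taken from the headers shown earlier.

```python
import re

def clean_column_name(name: str) -> str:
    """Return a lowercase, underscore-separated name without parentheses, newlines, or colons."""
    name = re.sub(r"\(.*?\)", "", name)        # drop parenthesised units such as "(%)" or "(Km2)"
    name = name.replace("\n", " ").replace(":", "")
    name = re.sub(r"\s+", "_", name.strip())   # collapse whitespace runs into one underscore
    return name.lower()

samples = [
    "Density\n(P/Km2)",
    "Agricultural Land( %)",
    "Population: Labor force participation (%)",
    "Urban_population",
]
cleaned = [clean_column_name(s) for s in samples]
```

Under this assumption, the helper could be applied with `df_final.columns = [clean_column_name(c) for c in df_final.columns]`.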

In [12]:
df_final.head()

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,coordinates
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,"('33.93911 ', '67.709953')"
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,"('41.153332 ', '20.168331')"
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,"('28.033886 ', '1.659626')"
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,"('42.506285 ', '1.521801')"
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,"('-11.202692 ', '17.873887')"


In [13]:
# Create two new columns, one with the longitude and one with the latitude. Once done, drop the coordinates column.
df_final[["latitude", "longitude"]] = df_final["coordinates"].str.split(",", expand = True)
df_final.head()

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,coordinates,latitude,longitude
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,"('33.93911 ', '67.709953')",('33.93911 ','67.709953')
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,"('41.153332 ', '20.168331')",('41.153332 ','20.168331')
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,"('28.033886 ', '1.659626')",('28.033886 ','1.659626')
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,"('42.506285 ', '1.521801')",('42.506285 ','1.521801')
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,"('-11.202692 ', '17.873887')",('-11.202692 ','17.873887')


In [14]:
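# Strip the parentheses, quotes, and stray whitespace left over from the split.
df_final["latitude"] = df_final["latitude"].str.replace("(", "", regex=False).str.replace("'", "", regex=False).str.strip()
df_final["longitude"] = df_final["longitude"].str.replace(")", "", regex=False).str.replace("'", "", regex=False).str.strip()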
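
The split-and-replace steps leave the coordinates as strings. A possible one-step alternative, sketched here on two values copied from the output above, is to capture both numbers with `Series.str.extract` and convert them to floats. The regular expression is an assumption about the format `('lat ', 'lon')` seen in the preview, not a general coordinate parser.

```python
import pandas as pd

coords = pd.Series(["('33.93911 ', '67.709953')", "('-11.202692 ', '17.873887')"])

# Capture the two quoted signed decimals, ignoring surrounding spaces.
pattern = r"'\s*(?P<latitude>-?[\d.]+)\s*',\s*'\s*(?P<longitude>-?[\d.]+)\s*'"

# Named groups become the column names of the resulting DataFrame.
parts = coords.str.extract(pattern).astype(float)
```

With float columns, later numeric filters on latitude or longitude would not need an extra conversion step.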

In [15]:
df_final.head()

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,coordinates,latitude,longitude
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,"('33.93911 ', '67.709953')",33.93911,67.709953
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,"('41.153332 ', '20.168331')",41.153332,20.168331
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,"('28.033886 ', '1.659626')",28.033886,1.659626
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,"('42.506285 ', '1.521801')",42.506285,1.521801
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,"('-11.202692 ', '17.873887')",-11.202692,17.873887


In [16]:
df_final.drop("coordinates", axis = 1, inplace = True)
df_final.head()

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,latitude,longitude
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,33.93911,67.709953
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,41.153332,20.168331
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,28.033886,1.659626
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,42.506285,1.521801
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,-11.202692,17.873887


In [17]:
# Remove the "%" from the values in these columns.
reemplazar = ["unemployment_rate", "total_tax_rate", "tax_revenue", "population_labor_force_participation", "out_of_pocket_health_expenditure", "gross_tertiary_education_enrollment", "gross_primary_education_enrollment", "forested_area", "cpi_change",   "agricultural_land"]

for col in reemplazar: 
    df_final[col] = df_final[col].str.replace("%", "")

In [18]:
df_final.head(1)

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,latitude,longitude
0,Afghanistan,60,AF,58.1,652230,323000,32.49,93.0,Kabul,8672,149.9,2.3,AFN,4.47,2.1,$0.70,"$19,101,353,833",104.0,9.7,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.4,0.28,38041754,48.9,9.3,71.4,11.12,9797273,33.93911,67.709953


In [19]:
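# 5. Do the same for the `gasoline_price`, `gdp`, and `minimum_wage` columns, but removing "$".
reemplazar2 = ["gasoline_price", "gdp", "minimum_wage"]

for col in reemplazar2:
    # Also drop the thousands separators so that "$19,101,353,833" becomes "19101353833".
    df_final[col] = df_final[col].str.replace("$", "", regex=False).str.replace(",", "", regex=False)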
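
Removing the symbols still leaves text values, and numeric comparisons need numeric columns. The sketch below, on a toy frame named `raw` (a hypothetical name, with values copied from the previews), shows one way to strip the symbols and convert in the same pass, assuming the columns are still strings at this point.

```python
import pandas as pd

raw = pd.DataFrame({
    "gdp": ["$19,101,353,833", "$15,278,077,447"],
    "cpi_change": ["2.30%", None],
})

clean = raw.copy()
for col in clean.columns:
    # Inside a character class "$" is literal, so this removes "$", "%", and ",".
    stripped = clean[col].str.replace(r"[$%,]", "", regex=True)
    # Convert to numbers; missing or unparseable values become NaN.
    clean[col] = pd.to_numeric(stripped, errors="coerce")
```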

In [20]:
df_final.head()

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,latitude,longitude
0,Afghanistan,60,AF,58.1,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.3,AFN,4.47,2.1,0.7,19101353833,104.0,9.7,47.9,Kabul,64.5,638.0,0.43,Pashto,78.4,0.28,38041754,48.9,9.3,71.4,11.12,9797273,33.93911,67.709953
1,Albania,105,AL,43.1,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.4,ALL,1.62,28.1,1.36,15278077447,107.0,55.0,7.8,Tirana,78.5,15.0,1.12,Albanian,56.9,1.2,2854191,55.7,18.6,36.6,12.33,1747593,41.153332,20.168331
2,Algeria,18,DZ,17.4,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.0,DZD,3.02,0.8,0.28,169988236398,109.9,51.4,20.1,Algiers,76.7,112.0,0.95,Arabic,28.1,1.72,43053054,41.2,37.2,66.1,11.7,31510100,28.033886,1.659626
3,Andorra,164,AD,40.0,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.0,1.51,3154057987,106.4,,2.7,Andorra la Vella,,,6.63,Catalan,36.4,3.33,77142,,,,,67873,42.506285,1.521801
4,Angola,26,AO,47.5,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.1,AOA,5.52,46.3,0.97,94635415870,113.5,9.3,51.6,Luanda,60.8,241.0,0.71,Portuguese,33.4,0.21,31825295,77.5,9.2,49.1,6.89,21061025,-11.202692,17.873887


In [21]:
df_final.to_csv("files/world_data_full.csv")
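
By default `to_csv` also writes the index as an unnamed first column, which is why the files are read with `index_col = 0` at the top of this notebook. As a sketch of the alternative, the round trip below uses an in-memory buffer in place of a file (an assumption made only to keep the example self-contained) and passes `index=False` so that no extra column is written.

```python
import io
import pandas as pd

df = pd.DataFrame({"country": ["Afghanistan", "Albania"], "gdp": [19101353833, 15278077447]})

buffer = io.StringIO()          # in-memory stand-in for a file on disk
df.to_csv(buffer, index=False)  # index=False avoids writing an extra unnamed column
buffer.seek(0)

restored = pd.read_csv(buffer)
```

Either convention works as long as writing and reading agree on it.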

# Filtering Exercises

In [22]:
# 1. Find all countries whose infant mortality is between 40 and 50 deaths per 1,000 live births.
df_final[df_final["infant_mortality"].between(40,50)]

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,latitude,longitude
0,Afghanistan,60,AF,58.1,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.3,AFN,4.47,2.1,0.7,19101353833,104.0,9.7,47.9,Kabul,64.5,638.0,0.43,Pashto,78.4,0.28,38041754,48.9,9.3,71.4,11.12,9797273,33.93911,67.709953
26,Burkina Faso,76,BF,44.2,274200,11000.0,37.93,226.0,Ouagadougou,3418,106.58,-3.2,XOF,5.19,19.3,0.98,15745810235,96.1,6.5,49.0,Ouagadougou,61.2,320.0,0.34,French,36.1,0.08,20321378,66.4,15.0,41.3,6.26,6092349,12.238333,-1.561593
27,Burundi,463,BI,79.2,27830,31000.0,39.01,257.0,Bujumbura,495,182.11,-0.7,BIF,5.41,10.9,1.21,3012334882,121.4,6.1,41.0,Bujumbura,61.2,548.0,,Kirundi,19.1,0.1,11530580,79.2,13.6,41.2,1.43,1541177,-3.373056,29.918886
47,Djibouti,43,DJ,73.4,23200,13000.0,21.47,253.0,Djibouti City,620,120.25,3.3,DJF,2.73,0.2,1.32,3318716359,75.3,5.3,49.8,Djibouti City,66.6,248.0,,French,20.4,0.22,973560,60.2,,37.9,10.3,758549,11.825138,42.590275
72,Haiti,414,HT,66.8,27750,0.0,24.35,509.0,Port-au-Prince,2978,179.29,12.5,HTG,2.94,3.5,0.81,8498981821,113.6,1.1,49.5,Port-au-Prince,63.7,480.0,0.25,French,36.3,0.23,11263077,67.2,,42.7,13.78,6328948,18.971187,-72.285215
89,Kiribati,147,KI,42.0,811,,27.89,686.0,South Tarawa,66,99.55,0.6,AUD,3.57,15.0,,194647202,101.3,,41.2,South Tarawa,68.1,92.0,,English,0.2,0.2,117606,,22.0,32.7,,64489,1.8368976,-157.3768317
125,Niger,19,NE,36.1,1267000,10000.0,46.08,227.0,Niamey,2017,109.32,-2.5,XOF,6.91,0.9,0.88,12928145120,74.7,4.4,48.0,Niamey,62.0,509.0,0.29,French,52.3,0.04,23310715,72.0,11.8,47.2,0.47,3850231,17.607789,8.081666
166,Sudan,25,SD,28.7,1861484,124000.0,32.18,249.0,Khartoum,20000,1344.19,51.0,SDG,4.41,8.1,0.95,18902284476,76.8,16.9,42.1,Omdurman,65.1,295.0,0.41,Arabic,63.2,0.26,42813238,48.4,8.0,45.4,16.53,14957233,12.862807,30.217636
175,Togo,152,TG,70.2,56785,10000.0,33.11,228.0,Lomé,3000,113.3,0.7,XOF,4.32,3.1,0.71,5459979417,123.8,14.5,47.4,Lomé,60.8,396.0,0.34,French,51.0,0.08,8082366,77.6,16.9,48.2,2.04,3414638,8.619543,0.824782
192,Yemen,56,YE,44.6,527968,40000.0,30.45,967.0,Sanaa,10609,157.58,8.1,YER,3.79,1.0,0.92,26914402224,93.6,10.2,42.9,Sanaa,66.1,164.0,,Arabic,81.0,0.31,29161922,38.0,,26.6,12.91,10869523,15.552727,48.516388


In [23]:
# 2. Find the countries whose birth rate is greater than or equal to 20 and whose life expectancy is greater than 75 years.
condicion1 = df_final["birth_rate"] >= 20
condicion2 = df_final["life_expectancy"] > 75
df_final[condicion1 & condicion2]

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,latitude,longitude
2,Algeria,18,DZ,17.4,2381741,317000,24.28,213.0,Algiers,150006,151.36,2.0,DZD,3.02,0.8,0.28,169988236398,109.9,51.4,20.1,Algiers,76.7,112.0,0.95,Arabic,28.1,1.72,43053054,41.2,37.2,66.1,11.7,31510100,28.033886,1.659626
74,Honduras,89,HN,28.9,112090,23000,21.6,504.0,Tegucigalpa,9813,150.34,4.4,HNL,2.46,40.0,0.98,25095395475,91.5,26.2,15.1,Tegucigalpa,75.1,65.0,1.01,Spanish,49.1,0.31,9746117,68.8,17.3,39.1,5.39,5626433,15.199999,-86.241905
82,Israel,400,IL,24.6,20770,178000,20.8,972.0,Jerusalem,65166,108.15,0.8,ILS,3.09,7.7,1.57,395098666122,104.9,63.4,3.0,Jerusalem,82.8,3.0,7.58,Hebrew,24.4,4.62,9053300,64.0,23.1,25.3,3.86,8374393,31.046051,34.851612
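
The two filters above rely on boolean masks. The sketch below uses a toy frame (a hypothetical `toy`, with a few values copied from the outputs) to show how masks combine and that `between` includes both endpoints by default.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["Algeria", "Honduras", "Niger"],
    "birth_rate": [24.28, 21.6, 46.08],
    "life_expectancy": [76.7, 75.1, 62.0],
})

# Each comparison yields a boolean Series; wrap each one in parentheses
# when combining inline, because & binds more tightly than >= or >.
mask = (toy["birth_rate"] >= 20) & (toy["life_expectancy"] > 75)
selected = toy.loc[mask, "country"].tolist()

# between() includes both endpoints by default.
in_range = toy["birth_rate"].between(21.6, 24.28).tolist()
```

Storing each condition in its own variable, as the notebook does, avoids the precedence issue altogether.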


In [24]:
# 3. Find the cities whose countries contain the string "la" in their name.
df_final[df_final["country"].str.contains("la")]

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,latitude,longitude
4,Angola,26,AO,47.5,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.1,AOA,5.52,46.3,0.97,94635415870,113.5,9.3,51.6,Luanda,60.8,241.0,0.71,Portuguese,33.4,0.21,31825295,77.5,9.2,49.1,6.89,21061025,-11.202692,17.873887
13,Bangladesh,1265,BD,70.6,148460,221000.0,18.18,880.0,Dhaka,84246,179.68,5.6,BDT,2.04,11.0,1.12,302571254131,116.5,20.6,25.1,Dhaka,72.3,173.0,0.51,Bengali,71.8,0.58,167310838,59.0,8.8,33.4,4.19,60987417,23.684994,90.356331
15,Belarus,47,BY,42.0,207600,155000.0,9.9,375.0,Minsk,58280,,5.6,BYN,1.45,42.6,0.6,63080457023,100.5,87.4,2.6,Minsk,74.2,2.0,1.49,Russian,34.5,5.19,9466856,64.1,14.7,53.3,4.59,7482982,53.709807,27.953389
59,Finland,18,FI,7.5,338145,25000.0,8.6,358.0,Helsinki,45871,112.33,1.0,EUR,1.41,73.1,1.45,268761201365,100.2,88.2,1.4,Helsinki,81.7,3.0,,Swedish,19.9,3.81,5520314,59.1,20.8,36.6,6.59,4716888,61.92411,25.748151
68,Guatemala,167,GT,36.0,108889,43000.0,24.56,502.0,Guatemala City,16777,142.92,3.7,GTQ,2.87,32.7,0.79,76710385880,101.9,21.8,22.1,Guatemala City,74.1,95.0,1.6,Spanish,55.8,0.35,16604026,62.3,10.6,35.2,2.46,8540945,15.783471,-90.230759
76,Iceland,3,IS,18.7,103000,0.0,12.0,354.0,Reykjavík,2065,129.0,3.0,ISK,1.71,0.5,1.69,24188035739,100.4,71.8,1.5,Reykjavík,82.7,4.0,,Icelandic,17.0,4.08,361313,75.0,23.3,31.9,2.84,339110,64.963051,-19.020835
81,Republic of Ireland,72,,64.5,70273,9000.0,12.5,353.0,Dublin,37711,106.58,0.9,EUR,1.75,11.0,1.37,388698711348,100.9,77.8,3.1,Connacht,82.3,5.0,10.79,Irish,15.2,3.31,5007069,62.1,18.3,26.1,4.93,3133123,53.41291,-8.24389
102,Malawi,203,MW,61.4,118484,15000.0,34.12,265.0,Lilongwe,1298,418.34,9.4,MWK,4.21,33.2,1.15,7666704427,142.5,0.8,35.3,Lilongwe,63.8,349.0,0.12,English,11.0,0.04,18628747,76.7,17.3,34.5,5.65,3199301,-13.254308,34.301525
103,Malaysia,99,MY,26.3,329847,136000.0,16.75,60.0,Kuala Lumpur,248289,121.46,0.7,MYR,2.0,67.6,0.45,364701517788,105.3,45.1,6.7,Johor Bahru,76.0,29.0,0.93,Malaysian language,36.7,1.51,32447385,64.3,12.0,38.7,3.32,24475766,4.210484,101.975766
107,Marshall Islands,329,MH,63.9,181,,29.03,692.0,Majuro,143,,,USD,4.05,70.2,1.44,221278000,84.7,23.7,27.4,Majuro,65.2,,2.0,Marshallese,10.0,0.42,58791,,17.8,65.9,,45514,7.131474,171.184478
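
The filter above returns full rows and matches "la" case-sensitively. Since the exercise asks for cities, a possible refinement is sketched below on a toy frame (hypothetical data, not taken from the files): it selects only the city column and shows the effect of `case=False`.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["Angola", "Laos", "Chile"],
    "largest_city": ["Luanda", "Vientiane", "Santiago"],
})

# Case-sensitive by default: "Laos" does not contain lowercase "la".
sensitive = toy["country"].str.contains("la").tolist()

# case=False also matches "La".
insensitive = toy["country"].str.contains("la", case=False)

# Keep only the city column, since the exercise asks for cities.
cities = toy.loc[insensitive, "largest_city"].tolist()
```

Whether the match should ignore case is a reading of the exercise statement, so both variants are shown.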


In [25]:
# 4. Find the countries whose number of physicians per 1,000 inhabitants (physicians_per_thousand) is greater than 5.
df_final[df_final["physicians_per_thousand"] > 5]

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,latitude,longitude
9,Austria,109,AT,32.4,83871,21000.0,9.7,43.0,Vienna,61448.0,118.06,1.5,EUR,1.47,46.9,1.2,446314739528,103.1,85.1,2.9,Vienna,81.6,5.0,,German,17.9,5.17,8877067,60.7,25.4,51.4,4.67,5194416,47.516231,14.550072
15,Belarus,47,BY,42.0,207600,155000.0,9.9,375.0,Minsk,58280.0,,5.6,BYN,1.45,42.6,0.6,63080457023,100.5,87.4,2.6,Minsk,74.2,2.0,1.49,Russian,34.5,5.19,9466856,64.1,14.7,53.3,4.59,7482982,53.709807,27.953389
42,Cuba,106,CU,59.9,110860,76000.0,10.17,53.0,Havana,28284.0,,,CUP,1.62,31.3,1.4,100023000000,101.9,41.4,3.7,Havana,78.7,36.0,0.05,Spanish,,8.42,11333483,53.6,,,1.64,8739135,21.521757,-77.781167
63,Georgia,57,GE,34.5,69700,26000.0,13.47,995.0,Tbilisi,10128.0,133.61,4.9,GEL,2.06,40.6,0.76,17743195770,98.6,63.9,8.7,Tbilisi,73.6,25.0,0.05,Georgian,57.3,7.12,3720382,68.3,21.7,9.9,14.4,2196476,42.315407,43.356892
66,Greece,81,GR,47.6,131957,146000.0,8.1,30.0,Athens,62434.0,101.87,0.2,EUR,1.35,31.7,1.54,209852761469,99.6,136.6,3.6,Macedonia,81.3,3.0,4.46,Greek,35.5,5.48,10716322,51.8,26.2,51.9,17.24,8507474,39.074208,21.824312
99,Lithuania,43,LT,47.2,65300,34000.0,10.0,370.0,Vilnius,12963.0,118.38,2.3,EUR,1.63,34.8,1.16,54219315600,103.9,72.4,3.3,Vilnius,75.7,8.0,2.41,Lithuanian,32.1,6.35,2786844,61.6,16.9,42.6,6.35,1891013,55.169438,23.881275
113,Monaco,26337,MC,,2,,5.9,377.0,Monaco City,,,,EUR,,,2.0,7184844193,,,2.6,Monaco City,,,11.72,French,6.1,6.56,38964,,,,,38964,43.7384176,7.4246158
140,Portugal,111,PT,39.5,92212,52000.0,8.5,351.0,Lisbon,48742.0,110.62,0.3,EUR,1.38,34.6,1.54,237686075635,106.2,63.9,3.1,Lisbon,81.3,8.0,3.78,Portuguese,27.7,5.12,10269417,58.8,22.8,39.8,6.33,6753579,39.399872,-8.224454
149,San Marino,566,SM,16.7,61,,6.8,378.0,City of San Marino,,110.63,1.0,EUR,1.26,0.0,,1637931034,108.1,42.5,1.7,City of San Marino,85.4,,,Italian,18.3,6.11,33860,,18.1,36.2,,32969,43.94236,12.457777
187,Uruguay,20,UY,82.6,176215,22000.0,13.86,598.0,Montevideo,6766.0,202.92,7.9,UYU,1.97,10.7,1.5,56045912952,108.5,63.1,6.4,Montevideo,77.8,17.0,1.66,Spanish,16.2,5.05,3461734,64.0,20.1,41.8,8.73,3303394,-32.522779,-55.765835


In [26]:
# 5. Find the countries whose fertility rate is greater than 6.
df_final[df_final["fertility_rate"] > 6]

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,latitude,longitude
125,Niger,19,NE,36.1,1267000,10000,46.08,227.0,Niamey,2017,109.32,-2.5,XOF,6.91,0.9,0.88,12928145120,74.7,4.4,48.0,Niamey,62.0,509.0,0.29,French,52.3,0.04,23310715,72.0,11.8,47.2,0.47,3850231,17.607789,8.081666
160,Somalia,25,SO,70.3,637657,20000,41.75,252.0,Mogadishu,645,,,SOS,6.07,10.0,1.41,4720727278,23.4,2.5,76.6,Bosaso,57.1,829.0,,Arabic,,0.02,15442905,47.4,0.0,,11.35,7034861,5.152149,46.199616


In [27]:
# 6. Find the countries whose currency is the euro (EUR) and whose birth rate is above average.
media_natalidad = df_final["birth_rate"].mean()
condicion_natalidad = df_final["birth_rate"] > media_natalidad
condicion_monedas = df_final["currency-code"] == "EUR"

df_final[condicion_natalidad & condicion_monedas]

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,latitude,longitude


In [28]:
# 7. Find the countries whose infant mortality rate is greater than 70.
df_final[df_final["infant_mortality"]> 70]

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population_labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,latitude,longitude
33,Central African Republic,8,CF,8.2,622984,8000,35.35,236.0,Bangui,297,186.86,37.1,,4.72,35.6,1.41,2220307369,102.0,3.0,84.5,Bangui,52.8,829.0,0.37,French,39.6,0.06,4745185,72.0,8.6,73.3,3.68,1982064,6.611111,20.939444
34,Chad,13,TD,39.7,1284000,35000,42.17,235.0,N'Djamena,1016,117.7,-1.0,XAF,5.75,3.8,0.78,11314951343,86.8,3.3,71.4,N'Djamena,54.0,1140.0,0.6,French,56.4,0.04,15946876,70.7,,63.5,1.89,3712273,15.454166,18.732207
126,Nigeria,226,NG,77.7,923768,215000,37.91,234.0,Abuja,120369,267.51,11.4,NGN,5.39,7.2,0.46,448120428859,84.7,10.2,75.7,Lagos,54.3,917.0,0.54,English,72.2,0.38,200963599,52.9,1.5,34.8,8.1,102806948,9.081999,8.675277
155,Sierra Leone,111,SL,54.7,71740,9000,33.41,232.0,Freetown,1093,234.16,14.8,SLL,4.26,43.1,1.08,3941474311,112.8,2.0,78.5,Freetown,54.3,1120.0,0.57,English,38.2,0.03,7813215,57.9,8.6,30.7,4.43,3319366,8.460555,-11.779889
160,Somalia,25,SO,70.3,637657,20000,41.75,252.0,Mogadishu,645,,,SOS,6.07,10.0,1.41,4720727278,23.4,2.5,76.6,Bosaso,57.1,829.0,,Arabic,,0.02,15442905,47.4,0.0,,11.35,7034861,5.152149,46.199616
