# Global Inflation

In this project I run an ETL process on a CSV file that contains each country's inflation rate from 1980 through 2024. The goal is to show global inflation from 2000 through 2023.

## Environment Setup

In [1]:
! pip install --upgrade pip
! pip install -r requirements.txt



## Working Dataset

In [2]:
# Load the dataset from the CSV file using pandas
import pandas as pd

# Load the CSV (absolute path on the author's machine; adjust it to your environment)

infl_dt = pd.read_csv('C:/Users/roble/OneDrive/Escritorio/pr/inflation_rate_project/global_inflation_data.csv')
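
The absolute path above only resolves on the author's machine. A minimal sketch of a more portable load, assuming the CSV sits in a hypothetical `data/` folder next to the notebook (the folder name and the `load_inflation` helper are illustrative assumptions, not part of the original project):

```python
from pathlib import Path

import pandas as pd

# Hypothetical location: a data/ folder next to the notebook.
DATA_DIR = Path("data")
CSV_NAME = "global_inflation_data.csv"


def load_inflation(data_dir: Path = DATA_DIR, name: str = CSV_NAME) -> pd.DataFrame:
    """Read the inflation CSV, failing early with a clear message if it is missing."""
    csv_path = data_dir / name
    if not csv_path.is_file():
        raise FileNotFoundError(f"Expected the dataset at {csv_path.resolve()}")
    return pd.read_csv(csv_path)
```

Calling `load_inflation()` with no arguments reads `data/global_inflation_data.csv` relative to the current working directory; passing another `Path` points it at a different folder.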



## Data Exploration

In [3]:
# What does our dataset look like?

infl_dt

Unnamed: 0,country_name,indicator_name,1980,1981,1982,1983,1984,1985,1986,1987,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Afghanistan,Annual average inflation (consumer prices) rate,13.4,22.2,18.2,15.9,20.4,8.7,-2.1,18.4,...,-0.66,4.38,4.98,0.63,2.3,5.44,5.06,13.71,9.1,
1,Albania,Annual average inflation (consumer prices) rate,,,,,,,,,...,1.90,1.30,2.00,2.00,1.4,1.60,2.00,6.70,4.8,4.0
2,Algeria,Annual average inflation (consumer prices) rate,9.7,14.6,6.6,7.8,6.3,10.4,14.0,5.9,...,4.80,6.40,5.60,4.30,2.0,2.40,7.20,9.30,9.0,6.8
3,Andorra,Annual average inflation (consumer prices) rate,,,,,,,,,...,-1.10,-0.40,2.60,1.00,0.5,0.10,1.70,6.20,5.2,3.5
4,Angola,Annual average inflation (consumer prices) rate,46.7,1.4,1.8,1.8,1.8,1.8,1.8,1.8,...,9.20,30.70,29.80,19.60,17.1,22.30,25.80,21.40,13.1,22.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
191,Vietnam,Annual average inflation (consumer prices) rate,25.2,69.6,95.4,49.5,64.9,91.6,453.5,360.4,...,0.60,2.70,3.50,3.50,2.8,3.20,1.80,3.20,3.4,3.4
192,West Bank and Gaza,Annual average inflation (consumer prices) rate,,,,,,,,,...,1.40,-0.20,0.20,-0.20,1.6,-0.70,1.20,3.70,3.4,2.7
193,Yemen,Annual average inflation (consumer prices) rate,,,,,,,,,...,22.00,21.30,30.40,33.60,15.7,21.70,31.50,29.50,14.9,17.3
194,Zambia,Annual average inflation (consumer prices) rate,11.7,14.0,12.5,19.7,20.0,37.4,48.0,43.0,...,10.10,17.90,6.60,7.50,9.2,15.70,22.00,11.00,10.6,9.6


In [4]:
# What data types do the dataset's variables have?

infl_dt.dtypes

country_name       object
indicator_name     object
1980              float64
1981              float64
1982              float64
1983              float64
1984              float64
1985              float64
1986              float64
1987              float64
1988              float64
1989              float64
1990              float64
1991              float64
1992              float64
1993              float64
1994              float64
1995              float64
1996              float64
1997              float64
1998              float64
1999              float64
2000              float64
2001              float64
2002              float64
2003              float64
2004              float64
2005              float64
2006              float64
2007              float64
2008              float64
2009              float64
2010              float64
2011              float64
2012              float64
2013              float64
2014              float64
2015              float64
2016        

In [5]:
# How many variables of each data type does the dataset have?

(
    infl_dt
    .dtypes
    .value_counts()
)


float64    45
object      2
Name: count, dtype: int64

In [6]:
# How many rows and columns does the dataset have?

infl_dt.shape

# So the DataFrame has 196 rows and 47 columns.

(196, 47)

In [7]:
# Are there explicit null values in the dataset?

(
    infl_dt
    .isnull()
    .any()
)

country_name      False
indicator_name    False
1980               True
1981               True
1982               True
1983               True
1984               True
1985               True
1986               True
1987               True
1988               True
1989               True
1990               True
1991               True
1992               True
1993               True
1994               True
1995               True
1996               True
1997               True
1998               True
1999               True
2000               True
2001               True
2002               True
2003               True
2004               True
2005               True
2006               True
2007               True
2008               True
2009               True
2010               True
2011               True
2012               True
2013               True
2014               True
2015               True
2016               True
2017               True
2018               True
2019            

In [8]:
# If some observations have null values, how many are there per variable?

(
    infl_dt
    .isnull()
    .sum()
)  

country_name       0
indicator_name     0
1980              56
1981              52
1982              51
1983              51
1984              51
1985              51
1986              51
1987              49
1988              49
1989              49
1990              46
1991              41
1992              38
1993              27
1994              25
1995              24
1996              20
1997              17
1998              15
1999              14
2000              13
2001               9
2002               7
2003               6
2004               5
2005               3
2006               3
2007               3
2008               3
2009               3
2010               3
2011               4
2012               3
2013               2
2014               2
2015               2
2016               2
2017               1
2018               1
2019               1
2020               2
2021               2
2022               2
2023               4
2024               5
dtype: int64
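
Raw counts are easier to compare once they are expressed as a share of the rows. A small sketch on a made-up toy frame (the `toy` data and the `null_pct` name are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-in for infl_dt: four countries, two year columns.
toy = pd.DataFrame(
    {
        "country_name": ["A", "B", "C", "D"],
        "1980": [1.0, np.nan, np.nan, 4.0],
        "2000": [1.0, 2.0, np.nan, 4.0],
    }
)

# isnull().mean() gives the fraction of missing values per column;
# multiplying by 100 turns it into a percentage.
null_pct = toy.isnull().mean().mul(100).round(1)
print(null_pct)
```

Applied to `infl_dt`, the same expression would show at a glance how much sparser the early years are than the recent ones.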

In [9]:
# Nature of the null values

infl_dt_null = infl_dt[infl_dt.isnull().any(axis=1)]

infl_dt_null

Unnamed: 0,country_name,indicator_name,1980,1981,1982,1983,1984,1985,1986,1987,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Afghanistan,Annual average inflation (consumer prices) rate,13.4,22.2,18.2,15.9,20.4,8.7,-2.1,18.4,...,-0.66,4.38,4.98,0.63,2.3,5.44,5.06,13.71,9.1,
1,Albania,Annual average inflation (consumer prices) rate,,,,,,,,,...,1.9,1.3,2.0,2.0,1.4,1.6,2.0,6.7,4.8,4.0
3,Andorra,Annual average inflation (consumer prices) rate,,,,,,,,,...,-1.1,-0.4,2.6,1.0,0.5,0.1,1.7,6.2,5.2,3.5
6,Argentina,Annual average inflation (consumer prices) rate,,,,,,,,,...,,,25.7,34.3,53.5,42.0,48.4,72.4,121.7,93.7
7,Armenia,Annual average inflation (consumer prices) rate,,,,,,,,,...,3.7,-1.4,1.2,2.5,1.4,1.2,7.2,8.6,3.5,4.0
8,Aruba,Annual average inflation (consumer prices) rate,,,,,,,,3.6,...,0.5,-0.9,-1.0,3.6,3.9,-1.3,0.7,5.5,4.5,2.3
11,Azerbaijan,Annual average inflation (consumer prices) rate,,,,,,,,,...,4.0,12.4,12.9,2.3,2.6,2.8,6.7,13.9,10.3,5.6
16,Belarus,Annual average inflation (consumer prices) rate,,,,,,,,,...,13.5,11.8,6.0,4.9,5.6,5.5,9.5,15.2,4.7,5.7
22,Bosnia and Herzegovina,Annual average inflation (consumer prices) rate,,,,,,,,,...,-1.0,-1.6,0.8,1.4,0.6,-1.1,2.0,14.0,5.5,3.0
25,Brunei Darussalam,Annual average inflation (consumer prices) rate,,,0.0,1.2,3.1,2.1,1.8,1.2,...,-0.5,-0.3,-1.3,1.0,-0.4,1.9,1.7,3.7,1.7,1.5


Many countries have null values, and the causes differ. Some follow from historical changes, as in the case of Ukraine, which became an independent state in 1991 with the dissolution of the Soviet Union. Others may come from later improvements in data collection, or from countries choosing not to share the information. For that reason I do not apply any treatment to these null values.
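
One way to make this pattern visible is to look up, for each country, the first year that has a value. A minimal sketch on toy data (the countries and numbers are made up; on the real data the year columns of `infl_dt` would take the place of `toy`):

```python
import numpy as np
import pandas as pd

# Toy wide table: "New" has no data before 1992.
toy = pd.DataFrame(
    {
        "country_name": ["Old", "New"],
        "1990": [5.0, np.nan],
        "1991": [6.0, np.nan],
        "1992": [7.0, 10.0],
        "1993": [8.0, 12.0],
    }
).set_index("country_name")

# For each row, first_valid_index returns the label of the first
# non-null column, or None when the whole row is empty.
first_year = toy.apply(pd.Series.first_valid_index, axis=1)
print(first_year)
```

Countries whose first year is late in the range are the ones whose nulls are most likely structural rather than accidental.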

## Transformation


Objectives:

- Keep the data from 2000 through 2023.
- Build separate datasets by continent.
- Add a new column indicating the continent each country belongs to.
- Combine those datasets into a new one.

### First objective: Data from 2000 to 2023

In [10]:
# New DataFrame containing the data from 2000 to 2023. To build it, I collapse the year columns into a single column.

# Build the list of year columns that melt() will unpivot
# Note: range() excludes its stop value, so range(2000, 2023) ends at 2022; use range(2000, 2024) to include 2023
years = [str(year) for year in range(2000, 2023)]

# Use 'melt()' to combine the year columns into a new column called 'Year'

infl_dt_1 = pd.melt(infl_dt, id_vars=['country_name', 'indicator_name'], value_vars=years, var_name='Year', value_name='Inflation')

# This creates a new DataFrame ('infl_dt_1') where each row holds a country, an inflation indicator, a year and that year's inflation

infl_dt_1

Unnamed: 0,country_name,indicator_name,Year,Inflation
0,Afghanistan,Annual average inflation (consumer prices) rate,2000,0.0
1,Albania,Annual average inflation (consumer prices) rate,2000,0.0
2,Algeria,Annual average inflation (consumer prices) rate,2000,0.3
3,Andorra,Annual average inflation (consumer prices) rate,2000,
4,Angola,Annual average inflation (consumer prices) rate,2000,325.0
...,...,...,...,...
4503,Vietnam,Annual average inflation (consumer prices) rate,2022,3.2
4504,West Bank and Gaza,Annual average inflation (consumer prices) rate,2022,3.7
4505,Yemen,Annual average inflation (consumer prices) rate,2022,29.5
4506,Zambia,Annual average inflation (consumer prices) rate,2022,11.0
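
The displayed rows end at 2022, which is what an exclusive `range` stop produces. A small sketch of the same reshaping on toy data, with the stop written as `last_year + 1` so the final year is included (the `wide` frame, its values and the `first_year`/`last_year` names are illustrative assumptions):

```python
import pandas as pd

# Toy wide table standing in for infl_dt (values are made up).
wide = pd.DataFrame(
    {
        "country_name": ["A", "B"],
        "indicator_name": ["rate", "rate"],
        "2021": [1.0, 2.0],
        "2022": [3.0, 4.0],
        "2023": [5.0, 6.0],
    }
)

# range() excludes its stop value, so the stop must be last_year + 1.
first_year, last_year = 2021, 2023
years = [str(year) for year in range(first_year, last_year + 1)]

long_df = pd.melt(
    wide,
    id_vars=["country_name", "indicator_name"],
    value_vars=years,
    var_name="Year",
    value_name="Inflation",
)

# One row per (country, year): 2 countries x 3 years = 6 rows.
print(long_df.shape)
print(sorted(long_df["Year"].unique()))
```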


In [11]:
# Now I rename the columns "country_name" and "indicator_name" to "Country" and "Indicator":

infl_dt_1 = infl_dt_1.rename(columns={'country_name':'Country', 'indicator_name':'Indicator'})

infl_dt_1

Unnamed: 0,Country,Indicator,Year,Inflation
0,Afghanistan,Annual average inflation (consumer prices) rate,2000,0.0
1,Albania,Annual average inflation (consumer prices) rate,2000,0.0
2,Algeria,Annual average inflation (consumer prices) rate,2000,0.3
3,Andorra,Annual average inflation (consumer prices) rate,2000,
4,Angola,Annual average inflation (consumer prices) rate,2000,325.0
...,...,...,...,...
4503,Vietnam,Annual average inflation (consumer prices) rate,2022,3.2
4504,West Bank and Gaza,Annual average inflation (consumer prices) rate,2022,3.7
4505,Yemen,Annual average inflation (consumer prices) rate,2022,29.5
4506,Zambia,Annual average inflation (consumer prices) rate,2022,11.0


### Second objective: Datasets by continent

#### Africa

In [12]:
country_afr = [ "Algeria", "Angola", "Benin", "Botswana", "Burkina Faso", "Burundi", "Cabo Verde",
    "Cameroon", "Chad", "Comoros", "Democratic Republic of the Congo", "Djibouti", "Egypt",
    "Equatorial Guinea", "Eritrea", "Eswatini", "Ethiopia", "Gabon", "Gambia", "Ghana",
    "Guinea", "Guinea-Bissau", "Ivory Coast", "Kenya", "Lesotho", "Liberia", "Libya",
    "Madagascar", "Malawi", "Mali", "Mauritania", "Mauritius", "Morocco", "Mozambique",
    "Namibia", "Niger", "Nigeria", "Rwanda", "Sao Tome and Principe", "Senegal", "Seychelles",
    "Sierra Leone", "Somalia", "South Africa", "South Sudan", "Sudan", "Tanzania", "Togo",
    "Tunisia", "Uganda", "Zambia", "Zimbabwe", "North Sudan", "Western Sahara"]

infl_dt_1_afr = infl_dt_1[infl_dt_1['Country'].isin(country_afr)]

infl_dt_1_afr

Unnamed: 0,Country,Indicator,Year,Inflation
2,Algeria,Annual average inflation (consumer prices) rate,2000,0.3
4,Angola,Annual average inflation (consumer prices) rate,2000,325.0
19,Benin,Annual average inflation (consumer prices) rate,2000,4.0
23,Botswana,Annual average inflation (consumer prices) rate,2000,8.5
27,Burkina Faso,Annual average inflation (consumer prices) rate,2000,-0.2
...,...,...,...,...
4487,Togo,Annual average inflation (consumer prices) rate,2022,7.6
4490,Tunisia,Annual average inflation (consumer prices) rate,2022,8.3
4494,Uganda,Annual average inflation (consumer prices) rate,2022,7.2
4506,Zambia,Annual average inflation (consumer prices) rate,2022,11.0


#### Americas

In [13]:
country_amer = ["Canada", "United States", "Mexico", "Greenland", "Bermuda",
    "Saint Pierre and Miquelon", "Belize", "Costa Rica", "El Salvador", "Guatemala",
    "Honduras", "Nicaragua", "Panama", "Antigua and Barbuda", "Bahamas", "Barbados",
    "Cuba", "Dominica", "Dominican Republic", "Grenada", "Haiti", "Jamaica",
    "Saint Kitts and Nevis", "Saint Lucia", "Saint Vincent and the Grenadines",
    "Trinidad and Tobago", "Puerto Rico", "Turks and Caicos Islands", "Cayman Islands",
    "British Virgin Islands", "United States Virgin Islands", "Anguilla", "Montserrat",
    "Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador", "Guyana",
    "Paraguay", "Peru", "Suriname", "Uruguay", "Venezuela", "French Guiana"]

infl_dt_1_amer = infl_dt_1[infl_dt_1['Country'].isin(country_amer)]

infl_dt_1_amer

Unnamed: 0,Country,Indicator,Year,Inflation
5,Antigua and Barbuda,Annual average inflation (consumer prices) rate,2000,-0.2
6,Argentina,Annual average inflation (consumer prices) rate,2000,-0.9
15,Barbados,Annual average inflation (consumer prices) rate,2000,2.4
18,Belize,Annual average inflation (consumer prices) rate,2000,0.6
21,Bolivia,Annual average inflation (consumer prices) rate,2000,4.6
...,...,...,...,...
4477,Suriname,Annual average inflation (consumer prices) rate,2022,52.4
4489,Trinidad and Tobago,Annual average inflation (consumer prices) rate,2022,5.8
4498,United States,Annual average inflation (consumer prices) rate,2022,8.0
4499,Uruguay,Annual average inflation (consumer prices) rate,2022,9.1


#### Asia

In [14]:
# Note: isin() needs the dataset's exact spelling. The dataset lists 'Brunei Darussalam',
# so 'Brunei' matches no rows; 'Palestine' likely appears as 'West Bank and Gaza'.
country_asia = ["Afghanistan", "Armenia", "Azerbaijan", "Bahrain", "Bangladesh", "Bhutan",
    "Brunei", "Cambodia", "China", "Cyprus", "Georgia", "India", "Indonesia",
    "Iran", "Iraq", "Israel", "Japan", "Jordan", "Kazakhstan", "Kuwait",
    "Kyrgyzstan", "Laos", "Lebanon", "Malaysia", "Maldives", "Mongolia",
    "Myanmar", "Nepal", "North Korea", "Oman", "Pakistan", "Palestine",
    "Philippines", "Qatar", "Saudi Arabia", "Singapore", "South Korea",
    "Sri Lanka", "Syria", "Taiwan", "Tajikistan", "Thailand", "Turkmenistan",
    "United Arab Emirates", "Uzbekistan", "Vietnam", "Yemen"]

infl_dt_1_as = infl_dt_1[infl_dt_1['Country'].isin(country_asia)]

infl_dt_1_as

Unnamed: 0,Country,Indicator,Year,Inflation
0,Afghanistan,Annual average inflation (consumer prices) rate,2000,0.0
7,Armenia,Annual average inflation (consumer prices) rate,2000,-0.8
11,Azerbaijan,Annual average inflation (consumer prices) rate,2000,1.8
13,Bahrain,Annual average inflation (consumer prices) rate,2000,-0.7
14,Bangladesh,Annual average inflation (consumer prices) rate,2000,3.6
...,...,...,...,...
4491,Turkmenistan,Annual average inflation (consumer prices) rate,2022,11.2
4496,United Arab Emirates,Annual average inflation (consumer prices) rate,2022,4.8
4500,Uzbekistan,Annual average inflation (consumer prices) rate,2022,11.4
4503,Vietnam,Annual average inflation (consumer prices) rate,2022,3.2


#### Europe

In [15]:
# Note: Armenia, Azerbaijan, Cyprus, Georgia and Kazakhstan are also in country_asia,
# so their rows end up in both continent datasets.
country_eu = ["Albania", "Andorra", "Armenia", "Austria", "Azerbaijan", "Belarus", "Belgium",
    "Bosnia and Herzegovina", "Bulgaria", "Croatia", "Cyprus", "Czech Republic",
    "Denmark", "Estonia", "Finland", "France", "Georgia", "Germany", "Greece",
    "Hungary", "Iceland", "Ireland", "Italy", "Kazakhstan", "Kosovo", "Latvia",
    "Liechtenstein", "Lithuania", "Luxembourg", "Malta", "Moldova", "Monaco",
    "Montenegro", "Netherlands", "North Macedonia", "Norway", "Poland", "Portugal",
    "Romania", "Russia", "San Marino", "Serbia", "Slovakia",
    "Slovenia", "Spain", "Sweden", "Switzerland", "Turkey", "Ukraine",
    "United Kingdom", "Vatican City", "Faroe Islands", "Gibraltar"]

infl_dt_1_eu = infl_dt_1[infl_dt_1['Country'].isin(country_eu)]

infl_dt_1_eu

Unnamed: 0,Country,Indicator,Year,Inflation
1,Albania,Annual average inflation (consumer prices) rate,2000,0.0
3,Andorra,Annual average inflation (consumer prices) rate,2000,
7,Armenia,Annual average inflation (consumer prices) rate,2000,-0.8
10,Austria,Annual average inflation (consumer prices) rate,2000,2.0
11,Azerbaijan,Annual average inflation (consumer prices) rate,2000,1.8
...,...,...,...,...
4474,Spain,Annual average inflation (consumer prices) rate,2022,8.3
4478,Sweden,Annual average inflation (consumer prices) rate,2022,8.1
4479,Switzerland,Annual average inflation (consumer prices) rate,2022,2.8
4495,Ukraine,Annual average inflation (consumer prices) rate,2022,20.2


#### Oceania

In [16]:
country_ocean = ["Australia", "Fiji", "Marshall Islands", "Solomon Islands", "Kiribati",
    "Micronesia", "Nauru", "New Zealand", "Palau", "Papua New Guinea",
    "Samoa", "Tonga", "Tuvalu", "Vanuatu"]

infl_dt_1_oc = infl_dt_1[infl_dt_1['Country'].isin(country_ocean)]

infl_dt_1_oc

Unnamed: 0,Country,Indicator,Year,Inflation
9,Australia,Annual average inflation (consumer prices) rate,2000,4.5
58,Fiji,Annual average inflation (consumer prices) rate,2000,1.1
89,Kiribati,Annual average inflation (consumer prices) rate,2000,0.4
109,Marshall Islands,Annual average inflation (consumer prices) rate,2000,
121,Nauru,Annual average inflation (consumer prices) rate,2000,
...,...,...,...,...
4460,Samoa,Annual average inflation (consumer prices) rate,2022,8.7
4470,Solomon Islands,Annual average inflation (consumer prices) rate,2022,5.5
4488,Tonga,Annual average inflation (consumer prices) rate,2022,8.5
4492,Tuvalu,Annual average inflation (consumer prices) rate,2022,11.5
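
Because `isin` silently ignores names it cannot find, hand-written continent lists are worth checking against the data. A minimal sketch on toy data (the names in `data_countries` and `continent_lists` are illustrative; in the notebook they would be `infl_dt_1['Country']` and the five `country_*` lists):

```python
import pandas as pd

# Toy stand-ins: the country names in the data and two hand-written lists.
data_countries = pd.Series(["Armenia", "Brunei Darussalam", "France", "Japan"])
continent_lists = {
    "AS": ["Armenia", "Brunei", "Japan"],
    "EU": ["Armenia", "France"],
}

listed = {name for names in continent_lists.values() for name in names}

# Countries present in the data that no list covers (dropped by isin).
unmatched = sorted(set(data_countries) - listed)

# List entries that do not exist in the data (typos or different spellings).
unknown = sorted(listed - set(data_countries))

# Countries assigned to more than one continent (duplicated after concat).
all_names = pd.Series([n for names in continent_lists.values() for n in names])
overlapping = sorted(all_names[all_names.duplicated()].unique())

print(unmatched, unknown, overlapping)
```

Empty `unmatched` and `overlapping` lists would mean every country in the data lands in exactly one continent dataset.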


### Third objective: Add a new column indicating each country's continent, and save each DataFrame as a CSV

In [17]:
# Note: assign() returns a new DataFrame, so the filtered frames are not modified in place
# and pandas does not raise SettingWithCopyWarning.

# AFRICA

df_infl_dt_afr = infl_dt_1_afr.assign(Continent='AFR')

df_infl_dt_afr.to_csv('./csv/infl_dt_afr.csv')

# AMERICAS

df_infl_dt_amer = infl_dt_1_amer.assign(Continent='AMER')

df_infl_dt_amer.to_csv('./csv/infl_dt_amer.csv')

# ASIA

df_infl_dt_as = infl_dt_1_as.assign(Continent='AS')

df_infl_dt_as.to_csv('./csv/infl_dt_as.csv')

# EUROPE

df_infl_dt_eu = infl_dt_1_eu.assign(Continent='EU')

df_infl_dt_eu.to_csv('./csv/infl_dt_eu.csv')

# OCEANIA

df_infl_dt_oc = infl_dt_1_oc.assign(Continent='OC')

df_infl_dt_oc.to_csv('./csv/infl_dt_oc.csv')
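
The five near-identical blocks in this step could also be written as a loop. A minimal sketch on toy data, assuming an output folder created on the fly (the `toy` frame, `continent_lists` and `out_dir` are illustrative names; `assign` returns a new DataFrame, which is one way to avoid pandas' SettingWithCopyWarning when adding a column to a filtered frame):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy long-format table standing in for infl_dt_1 (values are made up).
toy = pd.DataFrame(
    {
        "Country": ["Kenya", "Peru", "Japan"],
        "Year": ["2000", "2000", "2000"],
        "Inflation": [10.0, 3.8, -0.7],
    }
)
continent_lists = {"AFR": ["Kenya"], "AMER": ["Peru"], "AS": ["Japan"]}

# A temporary folder keeps the sketch self-contained; the notebook uses ./csv.
out_dir = Path(tempfile.mkdtemp()) / "csv"
out_dir.mkdir(parents=True, exist_ok=True)  # to_csv does not create folders

frames = {}
for code, names in continent_lists.items():
    # assign() returns a new DataFrame instead of modifying the filtered slice.
    frame = toy[toy["Country"].isin(names)].assign(Continent=code)
    frame.to_csv(out_dir / f"infl_dt_{code.lower()}.csv", index=False)
    frames[code] = frame
```

Keeping the frames in a dictionary means the later concatenation can take `frames.values()` directly instead of a hand-maintained list.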

### Fourth objective: Combine those datasets into a new one

In [18]:
# Create a list called data_frames that holds all the DataFrames built above

data_frames = [df_infl_dt_afr, df_infl_dt_amer, df_infl_dt_as, df_infl_dt_eu, df_infl_dt_oc]

# Combine all the DataFrames into a single one

inflation_data = pd.concat(data_frames, ignore_index=True)

inflation_data

Unnamed: 0,Country,Indicator,Year,Inflation,Continent
0,Algeria,Annual average inflation (consumer prices) rate,2000,0.3,AFR
1,Angola,Annual average inflation (consumer prices) rate,2000,325.0,AFR
2,Benin,Annual average inflation (consumer prices) rate,2000,4.0,AFR
3,Botswana,Annual average inflation (consumer prices) rate,2000,8.5,AFR
4,Burkina Faso,Annual average inflation (consumer prices) rate,2000,-0.2,AFR
...,...,...,...,...,...
4066,Samoa,Annual average inflation (consumer prices) rate,2022,8.7,OC
4067,Solomon Islands,Annual average inflation (consumer prices) rate,2022,5.5,OC
4068,Tonga,Annual average inflation (consumer prices) rate,2022,8.5,OC
4069,Tuvalu,Annual average inflation (consumer prices) rate,2022,11.5,OC
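
With the combined table in long format, a per-year summary is one `groupby` away. A sketch on toy data; the choice of the median (less sensitive to extreme values such as the 325.0 recorded for Angola in 2000) is my assumption, not something the project specifies, and the de-duplication step assumes a country may sit in more than one continent list:

```python
import numpy as np
import pandas as pd

# Toy combined table: country "A" appears under two continents.
toy = pd.DataFrame(
    {
        "Country": ["A", "A", "B", "C", "A", "A", "B", "C"],
        "Year": ["2000"] * 4 + ["2001"] * 4,
        "Inflation": [2.0, 2.0, 4.0, 325.0, 3.0, 3.0, np.nan, 6.0],
        "Continent": ["EU", "AS", "EU", "AFR", "EU", "AS", "EU", "AFR"],
    }
)

# Keep one row per (Country, Year) so no country is counted twice.
deduped = toy.drop_duplicates(subset=["Country", "Year"])

# median() skips NaN values by default.
global_median = deduped.groupby("Year")["Inflation"].median()
print(global_median)
```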


### Finally, I save the DataFrame as a CSV in my project folder.

In [19]:
df = pd.DataFrame(inflation_data)
df.to_csv('./csv/inflation_data.csv')
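
By default `to_csv` writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. A small sketch of the difference, using in-memory buffers instead of files (the `toy` frame is made up):

```python
import io

import pandas as pd

toy = pd.DataFrame({"Country": ["A", "B"], "Inflation": [1.5, 2.5]})

# Default: the index is written as an extra, unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)
print(cols_without)
```

Whether to keep the index is a matter of preference; dropping it only matters if the saved CSV is meant to be loaded again without an extra column.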