# **Inflation Prediction Pipeline with MLOps Focus**

This project develops an MLOps pipeline for predicting inflation rates in Argentina. The primary focus is on building a robust production pipeline rather than on maximizing the model's predictive accuracy. To that end, the project combines the historical inflation series with several macroeconomic variables known to influence inflation:

- Official exchange rate
- Informal (blue) exchange rate
- Inflation (historical series)
- Money supply (M2)
- Interest rate
- Commodity prices (e.g., crude oil)


The data for this analysis was sourced primarily from the **Central Bank of Argentina (BCRA)**, except for the informal exchange rate ("blue dollar"), which was obtained from **Ámbito Financiero**, a trusted financial news platform.  

By leveraging these variables, the pipeline demonstrates how machine learning can be effectively applied to track and predict inflation trends in real-world economic scenarios.


In [1]:
import pandas as pd

In [8]:
# Official exchange rate
er = pd.read_csv("../data/raw/exchange_rate.csv", sep=";")

# Mapping months from Spanish to English: 
month_map = {
    "ene": "Jan", "feb": "Feb", "mar": "Mar", "abr": "Apr", "may": "May", "jun": "Jun",
    "jul": "Jul", "ago": "Aug", "sep": "Sep", "oct": "Oct", "nov": "Nov", "dic": "Dec"
}

er["Mes"] = er["Mes"].replace(month_map, regex=True)

# Convert the Date Column to the correct format:
er["Mes"] = pd.to_datetime(er["Mes"], format="%b-%y", errors="coerce")

# Set Index:
er.set_index("Mes", inplace=True)

# Series declaration:
s1 = er["Tipo de cambio nominal promedio mensual"]

# Convert the datetime index to a monthly PeriodIndex:
s1.index = s1.index.to_period('M')

# Series check:
s1

Mes
2002-03      2.3989
2002-04      2.8551
2002-05      3.3287
2002-06      3.6213
2002-07      3.6071
             ...   
2024-06    903.7794
2024-07    923.7652
2024-08    942.9204
2024-09    961.8254
2024-10    981.5682
Freq: M, Name: Tipo de cambio nominal promedio mensual, Length: 272, dtype: float64
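
Because `errors="coerce"` turns any label that fails to parse into `NaT` without raising, a misspelled or unexpected month abbreviation can go unnoticed. The sketch below shows one way to make that failure visible. The helper name `parse_spanish_month` and the sample labels are hypothetical and are not part of the original notebook; it also assumes the default English/C locale, under which `%b` matches English month abbreviations.

```python
import pandas as pd

# Spanish month abbreviations mapped to the English ones that "%b" matches
# under an English/C locale (same mapping as in the cell above).
MONTH_MAP = {
    "ene": "Jan", "feb": "Feb", "mar": "Mar", "abr": "Apr", "may": "May", "jun": "Jun",
    "jul": "Jul", "ago": "Aug", "sep": "Sep", "oct": "Oct", "nov": "Nov", "dic": "Dec",
}


def parse_spanish_month(labels: pd.Series) -> pd.PeriodIndex:
    """Hypothetical helper: turn labels such as 'mar-02' into monthly periods."""
    # Normalise whitespace and case before translating the month abbreviation.
    translated = labels.str.strip().str.lower().replace(MONTH_MAP, regex=True)
    parsed = pd.to_datetime(translated, format="%b-%y", errors="coerce")

    # Fail loudly instead of silently carrying NaT into the index.
    bad = labels[parsed.isna()]
    if not bad.empty:
        raise ValueError(f"Unparseable month labels: {bad.tolist()}")

    return pd.DatetimeIndex(parsed).to_period("M")


# Made-up labels, only for illustration.
sample = pd.Series(["mar-02", "abr-02", "dic-24"])
periods = parse_spanish_month(sample)
print(periods)
```

The same helper could be reused for the `Período` column of the commodity price file, which uses the same label format.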

In [10]:
# Informal (blue) exchange rate
ier = pd.read_csv("../data/raw/informal_exchange_rate.csv", sep=";")

# Convert the Date Column to the correct format:
ier["Fecha"] = pd.to_datetime(ier["Fecha"], format="%d/%m/%Y", errors="coerce")

# Replace decimal commas with dots so the columns can be cast to float:
ier["Compra"] = ier["Compra"].str.replace(",", ".").astype(float)
ier["Venta"] = ier["Venta"].str.replace(",", ".").astype(float)

# Set index
ier.set_index("Fecha", inplace=True)

# Downsample the daily series to monthly data, using the median of each month:
s2 = ier["Venta"].resample("ME").median()

# Convert the month-end timestamps to a monthly PeriodIndex:
s2.index = s2.index.to_period('M')

s2


Fecha
2002-01       1.800
2002-02       2.060
2002-03       2.440
2002-04       2.930
2002-05       3.365
             ...   
2024-07    1437.500
2024-08    1355.000
2024-09    1252.500
2024-10    1215.000
2024-11    1135.000
Freq: M, Name: Venta, Length: 275, dtype: float64
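
The daily-to-monthly step above recurs for several series (informal exchange rate, M2, interest rate), so it could be factored into one function. The following is a minimal sketch under stated assumptions: the helper name `monthly_median` and the inline sample rows are invented for illustration. It groups by calendar month instead of calling `resample("ME")`, which avoids depending on the `"ME"` alias introduced in recent pandas versions.

```python
import io

import pandas as pd


def monthly_median(df: pd.DataFrame, date_col: str, value_col: str) -> pd.Series:
    """Hypothetical helper: collapse a daily series to one median per calendar month."""
    dates = pd.to_datetime(df[date_col], format="%d/%m/%Y", errors="coerce")
    # Decimal commas become dots; anything still non-numeric becomes NaN.
    values = pd.to_numeric(
        df[value_col].str.replace(",", ".", regex=False), errors="coerce"
    )
    series = pd.Series(values.to_numpy(), index=pd.DatetimeIndex(dates), name=value_col)
    # Drop rows whose date could not be parsed.
    series = series[series.index.notna()]
    # Group by calendar month; the result is indexed by monthly periods.
    return series.groupby(series.index.to_period("M")).median()


# Made-up rows in the same layout as the raw file (semicolon-separated, decimal commas).
raw = io.StringIO(
    "Fecha;Compra;Venta\n"
    "02/01/2002;1,60;1,70\n"
    "03/01/2002;1,70;1,80\n"
    "04/01/2002;1,80;2,00\n"
    "01/02/2002;1,95;2,06\n"
)
demo = pd.read_csv(raw, sep=";")
print(monthly_median(demo, "Fecha", "Venta"))
```

One difference from `resample`: grouping only produces rows for months that actually contain observations, whereas `resample` also emits empty months as `NaN`.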

In [11]:
# Inflation
inflation = pd.read_csv("../data/raw/inflation_data.csv", sep=";")

# Convert the Date Column to the correct format:
inflation["Fecha"] = pd.to_datetime(inflation["Fecha"], format="%d/%m/%Y", errors="coerce")

# Replace decimal commas with dots so the column can be cast to float:
inflation["Valor"] = inflation["Valor"].str.replace(",", ".").astype(float)

# Set index
inflation.set_index("Fecha", inplace=True)

# Series declaration:
s3 = inflation["Valor"]

# Convert the datetime index to a monthly PeriodIndex:
s3.index = s3.index.to_period('M')

s3

Fecha
1943-02   -0.6
1943-03    1.6
1943-04    0.7
1943-05   -0.9
1943-06    0.1
          ... 
2024-06    4.6
2024-07    4.0
2024-08    4.2
2024-09    3.5
2024-10    2.7
Freq: M, Name: Valor, Length: 981, dtype: float64
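
As an alternative to replacing commas by hand, `pd.read_csv` accepts a `decimal` argument, so the numeric column can be parsed as float at load time. The sketch below uses two made-up rows in the same layout as the inflation file; the day-of-month values in the sample dates are an assumption, since the raw file is not shown here.

```python
import io

import pandas as pd

# Made-up rows: semicolon-separated, decimal commas, dd/mm/YYYY dates.
raw = io.StringIO(
    "Fecha;Valor\n"
    "28/02/1943;-0,6\n"
    "31/03/1943;1,6\n"
)

# decimal="," lets the parser read "-0,6" directly as the float -0.6.
inflation_demo = pd.read_csv(raw, sep=";", decimal=",")

# Without errors="coerce", a malformed date raises instead of becoming NaT.
inflation_demo["Fecha"] = pd.to_datetime(inflation_demo["Fecha"], format="%d/%m/%Y")

s = inflation_demo.set_index("Fecha")["Valor"]
s.index = s.index.to_period("M")
print(s)
```

Whether to keep `errors="coerce"` is a design choice: coercing is tolerant of dirty rows, while raising surfaces data problems earlier in a production pipeline.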

In [13]:
# Monetary supply (M2)
m2 = pd.read_csv("../data/raw/M2_variation.csv", sep=";")

# Convert the Date Column to the correct format:
m2["Fecha"] = pd.to_datetime(m2["Fecha"], format="%d/%m/%Y", errors="coerce")

# Replace decimal commas with dots so the column can be cast to float:
m2["Valor"] = m2["Valor"].str.replace(",", ".").astype(float)

# Set index
m2.set_index("Fecha", inplace=True)

# Downsample the daily series to monthly data, using the median of each month:
s4 = m2["Valor"].resample("ME").median()

# Convert the month-end timestamps to a monthly PeriodIndex:
s4.index = s4.index.to_period('M')

s4

Fecha
2004-02     56.20
2004-03     59.50
2004-04     62.75
2004-05     62.30
2004-06     56.75
            ...  
2024-07    196.70
2024-08    179.00
2024-09    159.40
2024-10    126.00
2024-11    107.30
Freq: M, Name: Valor, Length: 250, dtype: float64

In [14]:
# Interest rate
interest = pd.read_csv("../data/raw/interest_rate.csv", sep=";")

# Convert the Date Column to the correct format:
interest["Fecha"] = pd.to_datetime(interest["Fecha"], format="%d/%m/%Y", errors="coerce")

# Replace decimal commas with dots so the column can be cast to float:
interest["Valor"] = interest["Valor"].str.replace(",", ".").astype(float)

# Set index
interest.set_index("Fecha", inplace=True)

# Downsample the daily series to monthly data, using the median of each month:
s5 = interest["Valor"].resample("ME").median()

# Convert the month-end timestamps to a monthly PeriodIndex:
s5.index = s5.index.to_period('M')

s5

Fecha
1992-01    17.28
1992-02    15.69
1992-03    15.09
1992-04    15.15
1992-05    15.33
           ...  
2024-07    35.63
2024-08    37.54
2024-09    38.21
2024-10    39.00
2024-11    35.93
Freq: M, Name: Valor, Length: 395, dtype: float64

In [19]:
# Commodity prices
ipmp = pd.read_csv("../data/raw/IPMP.csv", sep=";")

# Replace newlines in the column names with a space:
ipmp.columns = ipmp.columns.str.replace(r'\n', ' ', regex=True)

# Mapping months from Spanish to English: 
ipmp["Período"] = ipmp["Período"].replace(month_map, regex=True)

# Convert the Date Column to the correct format:
ipmp["Período"] = pd.to_datetime(ipmp["Período"], format="%b-%y", errors="coerce")

# Set Index:
ipmp.set_index("Período", inplace=True)

# Convert the datetime index to a monthly PeriodIndex:
ipmp.index = ipmp.index.to_period('M')

# Series Declaration:
s6 = ipmp['IPMP (dic-01=100)']
s7 = ipmp['IPMP Agropecuario (dic-01=100)']
s8 = ipmp['IPMP Metales (dic-01=100)']
s9 = ipmp['IPMP Petróleo (dic-01=100)']

s6, s7, s8, s9

(Período
 1997-01    154.6
 1997-02    150.8
 1997-03    153.3
 1997-04    149.9
 1997-05    149.3
            ...  
 2024-06    268.0
 2024-07    262.3
 2024-08    249.2
 2024-09    250.3
 2024-10    256.6
 Freq: M, Name: IPMP (dic-01=100), Length: 334, dtype: float64,
 Período
 1997-01    146.9
 1997-02    148.0
 1997-03    155.5
 1997-04    156.0
 1997-05    150.6
            ...  
 2024-06    218.4
 2024-07    210.9
 2024-08    198.8
 2024-09    202.1
 2024-10    206.7
 Freq: M, Name: IPMP Agropecuario (dic-01=100), Length: 334, dtype: float64,
 Período
 1997-01    134.4
 1997-02    133.5
 1997-03    136.8
 1997-04    135.1
 1997-05    136.4
            ...  
 2024-06    459.7
 2024-07    463.4
 2024-08    472.6
 2024-09    489.9
 2024-10    515.0
 Freq: M, Name: IPMP Metales (dic-01=100), Length: 334, dtype: float64,
 Período
 1997-01    126.2
 1997-02    112.0
 1997-03    103.3
 1997-04     93.9
 1997-05    102.9
            ...  
 2024-06    428.1
 2024-07    442.8
 2024-08    4

In [20]:
# From Series to DataFrame:
series = [s1, s2, s3, s4, s5, s6, s7, s8, s9]

data = pd.concat(series, axis=1)

# Column names:
data.columns = ['Official Exchange Rate', 'Informal Exchange Rate', 'Inflation', 'Monetary Supply (M2)', 'Interest Rate',
                'IPMP', 'IPMP Agropecuario', 'IPMP Metales', 'IPMP Petróleo']

# Drop the months in which any series is missing:
data = data.dropna()

# Checking the result:
print("Refined and Combined Dataset:")
print(data.head())

# From DataFrame to CSV file:
data.to_csv("../data/processed/combined_cleaned_data.csv")


Refined and Combined Dataset:
         Official Exchange Rate  Informal Exchange Rate  Inflation  \
2004-02                  2.9319                    2.94        0.1   
2004-03                  2.8976                    2.92        0.6   
2004-04                  2.8359                    2.84        0.9   
2004-05                  2.9197                    2.93        0.7   
2004-06                  2.9603                    2.97        0.6   

         Monetary Supply (M2)  Interest Rate   IPMP  IPMP Agropecuario  \
2004-02                 56.20          2.265  167.8              167.4   
2004-03                 59.50          2.400  183.7              184.7   
2004-04                 62.75          2.100  184.5              187.0   
2004-05                 62.30          2.420  181.9              180.8   
2004-06                 56.75          2.440  172.3              170.2   

         IPMP Metales  IPMP Petróleo  
2004-02         164.9          165.0  
2004-03         169.9       
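
Since `dropna()` keeps only the months in which every series has a value, the merged dataset starts at the latest first observation among the inputs (here 2004-02, the first month of the M2 series). A small coverage report makes that trimming explicit before rows are dropped. This is a sketch with toy data; the helper `coverage_report` and the series `a` and `b` are hypothetical stand-ins for the real inputs.

```python
import pandas as pd


def coverage_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: first/last valid period and missing count per column."""
    rows = {
        name: {
            "first_valid": col.first_valid_index(),
            "last_valid": col.last_valid_index(),
            "missing": int(col.isna().sum()),
        }
        for name, col in frame.items()
    }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy monthly series with partially overlapping date ranges.
a = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.period_range("2004-01", "2004-04", freq="M"),
    name="a",
)
b = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.period_range("2004-02", "2004-05", freq="M"),
    name="b",
)

combined = pd.concat([a, b], axis=1)  # outer join on the period index
report = coverage_report(combined)
cleaned = combined.dropna()           # keeps only the overlapping months

print(report)
print(f"Rows before: {len(combined)}, after dropna: {len(cleaned)}")
```

In a production pipeline, a report like this could be logged on every run so that a shortened or stale input file is noticed before training.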