# <font color= #99C8F5> **ARIMA Models** </font>

_by Isabel Valladolid, Sofía Maldonado & Vivienne Toledo_

15/02/2026.

This project analyzes the **_Real Median Household Income_** indicator (series `MEHOINUSA672N`), retrieved from FRED, the economic database of the Federal Reserve Bank of St. Louis.

# <font color= #99C8F5> **Libraries & Data** </font>

In [10]:
# General
import pandas as pd
import numpy as np
from dotenv import load_dotenv
import os

# Data
from fredapi import Fred

# Visualization
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# Preprocessing
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Models
from statsmodels.tsa.arima.model import ARIMA

# Metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Plot configuration
plt.style.use('ggplot')

In [8]:
# Get the API KEY
load_dotenv()
FRED_API_KEY = os.getenv('FRED_API_KEY')
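
`load_dotenv()` does not raise when no `.env` file is found, and `os.getenv` returns `None` for a missing variable, so a missing key only surfaces later when the API is called. A minimal sketch of an early check follows; the helper name `require_env` and the variable `DEMO_KEY` are hypothetical and used only for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value


# Demonstration with a hypothetical variable (not a real API key)
os.environ["DEMO_KEY"] = "abc123"
print(require_env("DEMO_KEY"))
```

In this notebook the same helper could be called as `require_env('FRED_API_KEY')` right after `load_dotenv()`.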

# <font color= #99C8F5> **Visualization** </font>

In [9]:
try:
    fred = Fred(api_key=FRED_API_KEY)

    # We use 'MEHOINUSA672N': Real Median Household Income
    series_id = 'MEHOINUSA672N'
    df = pd.DataFrame(fred.get_series(series_id), columns=['value'])
    df.index.name = 'date'

    # IMPORTANT: set the frequency explicitly. The series is annual
    # (one observation per year, dated January 1), so use year start ('YS').
    df.index = pd.DatetimeIndex(df.index)
    df.index.freq = 'YS'

except Exception as e:
    print(f"Error: {e}")

fig = px.line(df, x=df.index, y='value', title=f'Time Series: {series_id}')
fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Real median household income (USD)',
)
fig.show()


In [12]:
# Check whether the series is stationary
val_dickey_fuller = adfuller(df['value'])
kpss_test = kpss(df['value'])
print(f"Dickey-Fuller: {val_dickey_fuller[1]}")
print(f"KPSS: {kpss_test[1]}")

Dickey-Fuller: 0.9792411831307773
KPSS: 0.01



The test statistic is outside of the range of p-values available in the
look-up table. The actual p-value is smaller than the p-value returned.




The series is not stationary: the ADF p-value is far above 0.05, and the KPSS p-value is at most 0.01.
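
The two tests have opposite null hypotheses, which is why both p-values are printed: ADF assumes a unit root (non-stationary), while KPSS assumes stationarity. A minimal sketch of how the pair can be read together follows; the helper name `stationarity_verdict` and the 0.05 threshold are assumptions for illustration, not part of the notebook.

```python
def stationarity_verdict(adf_pvalue: float, kpss_pvalue: float, alpha: float = 0.05) -> str:
    """Combine ADF and KPSS p-values into a single reading.

    ADF null hypothesis:  the series has a unit root (non-stationary).
    KPSS null hypothesis: the series is stationary.
    """
    adf_rejects_unit_root = adf_pvalue < alpha
    kpss_rejects_stationarity = kpss_pvalue < alpha

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"
    # The tests disagree, so neither reading is supported on its own
    return "inconclusive"


# p-values printed above for the undifferenced series
print(stationarity_verdict(0.9792411831307773, 0.01))
```

When the two tests disagree, the helper returns `"inconclusive"` rather than picking a side, since neither null hypothesis is clearly favored.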

In [13]:
df['diff_1'] = df['value'].diff()

In [16]:
fig = px.line(df, x=df.index, y='diff_1', title=f'Time Series, First Difference: {series_id}')
fig.update_layout(
    xaxis_title='Date',
    yaxis_title='First difference (USD)',
)
fig.show()

The first difference still does not look stationary, so we difference again.
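
To illustrate why a second difference can help, here is a small pure-Python sketch on a toy series with a quadratic trend. The data and the helper name `difference` are hypothetical and not taken from the FRED series. pandas `.diff()` performs the same subtraction but keeps the index and leaves `NaN` in the first position, which is why `.dropna()` is needed before testing.

```python
def difference(values, d=1):
    """Apply first differencing d times; each pass shortens the list by one."""
    out = list(values)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# Toy series with a quadratic trend (hypothetical data)
y = [t * t for t in range(1, 7)]   # [1, 4, 9, 16, 25, 36]

first = difference(y, d=1)    # [3, 5, 7, 9, 11]: still trending upward
second = difference(y, d=2)   # [2, 2, 2, 2]: the trend is gone

# The second difference equals y_t - 2*y_{t-1} + y_{t-2}
direct = [y[t] - 2 * y[t - 1] + y[t - 2] for t in range(2, len(y))]
print(first, second, direct == second)
```

Each round of differencing costs one observation, which matters for a short annual series.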

In [19]:
df['diff_2'] = df['diff_1'].diff()

In [21]:
fig = px.line(df, x=df.index, y='diff_2', title=f'Time Series, Second Difference: {series_id}')
fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Second difference (USD)',
)
fig.show()

In [20]:
# Check whether the differenced series is stationary
val_dickey_fuller = adfuller(df['diff_2'].dropna())
kpss_test = kpss(df['diff_2'].dropna())
print(f"Dickey-Fuller: {val_dickey_fuller[1]}")
print(f"KPSS: {kpss_test[1]}")

Dickey-Fuller: 0.0002864865522887045
KPSS: 0.1



The test statistic is outside of the range of p-values available in the
look-up table. The actual p-value is greater than the p-value returned.


