### ARIMA Time-Series Forecasting for Portfolio Yield Projections

In financial analytics, particularly in the management of emerging market bond portfolios, time-series forecasting predicts future values from historical patterns and supports scenario analysis and strategic decision-making. The Autoregressive Integrated Moving Average (ARIMA) model is a widely used statistical method for this purpose, especially for univariate time series such as bond yields.

ARIMA combines three components: autoregression (dependence on past values), integration (differencing to achieve stationarity), and moving average (dependence on past errors). This model is suitable for forecasting non-seasonal data with trends.

The ARIMA model is mathematically defined as:

$$\Phi(B) \nabla^d y_t = \Theta(B) \epsilon_t$$
where:

- $y_t$: The observed value of the time series (e.g., yield) at time $t$.
- $\Phi(B)$: The autoregressive operator of order $p$, a polynomial in the backshift operator $B$ (where $B y_t = y_{t-1}$), capturing the relationship with previous observations.
- $\nabla^d$: The differencing operator of order $d$, applied to render the series stationary by removing trends (e.g., $\nabla y_t = y_t - y_{t-1}$).
- $\Theta(B)$: The moving average operator of order $q$, modeling the dependency on past forecast errors.
- $\epsilon_t$: The white noise error term, assumed to be independently and identically distributed with zero mean.
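
As a worked instance of the definition above, consider the ARIMA(1,1,1) case. This assumes the common sign convention $\Phi(B) = 1 - \phi_1 B$ and $\Theta(B) = 1 + \theta_1 B$; the coefficient symbols $\phi_1$ and $\theta_1$ are introduced here for illustration only. Under that assumption the model reads:

$$(1 - \phi_1 B)(1 - B)\, y_t = (1 + \theta_1 B)\, \epsilon_t$$

Multiplying out the left-hand side and applying the backshift operator gives the recursion:

$$y_t = (1 + \phi_1)\, y_{t-1} - \phi_1\, y_{t-2} + \epsilon_t + \theta_1\, \epsilon_{t-1}$$

Some texts write the moving average polynomial with a minus sign, which flips the sign of $\theta_1$ but leaves the model unchanged.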


This equation describes a process in which the differenced series depends linearly on its own lagged values and on lagged errors. The orders $p$, $d$, and $q$ are selected from the data: $d$ is typically chosen with a stationarity test such as the Augmented Dickey-Fuller (ADF) test, and $p$ and $q$ by inspecting the autocorrelation structure of the differenced series.
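
The effect of the differencing operator can be illustrated numerically. The following is a minimal sketch using NumPy and is not part of the original analysis; the helper name `difference` and the toy series are hypothetical.

```python
import numpy as np

def difference(y, d=1):
    """Apply the first-difference operator d times (hypothetical helper)."""
    out = np.asarray(y, dtype=float)
    for _ in range(d):
        # One pass computes y_t - y_{t-1}, shortening the series by one value.
        out = out[1:] - out[:-1]
    return out

# Toy series with a quadratic trend (illustrative values, not bond yields).
toy = [1.0, 4.0, 9.0, 16.0, 25.0]
first = difference(toy, d=1)   # a linear trend remains
second = difference(toy, d=2)  # constant after two differences
print(first, second)
```

Each pass removes one order of polynomial trend and drops one observation; `np.diff(toy, n=d)` produces the same values.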

To implement ARIMA on the yield series, the data are first cleaned and tested for stationarity. With an ADF test statistic of -1.838 and a p-value of 0.362, the null hypothesis of a unit root cannot be rejected at conventional significance levels, so the series is treated as non-stationary and $d = 1$ is used. An ARIMA(1,1,1) model is then fitted, balancing simplicity with explanatory power for this dataset. The forecasts for the five business days following November 21, 2025 (November 24 to 28) are approximately 4.957, 4.955, 4.954, 4.953, and 4.953, suggesting that yields stabilize near 4.95%. These projections can inform liquidity models by anticipating yield impacts on bond valuations.
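
The flattening of the forecast path follows from the structure of the model. The sketch below reproduces the point-forecast recursion of an ARIMA(1,1,1) by hand, assuming no drift term. The function name `arima111_forecast` and all input values are hypothetical; they are not the fitted estimates from this analysis.

```python
def arima111_forecast(y_last, y_prev, eps_last, phi, theta, steps):
    """Point forecasts from an ARIMA(1,1,1) without drift (illustrative helper)."""
    # Expected next change: phi * last change + theta * last residual.
    change = phi * (y_last - y_prev) + theta * eps_last
    level = y_last
    path = []
    for _ in range(steps):
        level += change
        path.append(level)
        # Future shocks have zero expectation, so the expected change decays by phi.
        change *= phi
    return path

# Hypothetical inputs, NOT the fitted estimates from this analysis.
example = arima111_forecast(y_last=10.0, y_prev=9.0, eps_last=0.0,
                            phi=0.5, theta=0.0, steps=3)
print(example)
```

When the autoregressive coefficient is smaller than one in absolute value, the expected increments shrink geometrically, so the forecast levels converge to a constant. This is the same pattern of progressively smaller steps seen in the five-day forecast.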

In [1]:
import pandas as pd  
from statsmodels.tsa.arima.model import ARIMA 
from statsmodels.tsa.stattools import adfuller

In [2]:
df = pd.read_csv('/Users/bekay/Documents/Studies/MSc Financial Engineering/Models/BAMLEM2BRRBBBCRPIEY.csv')

In [3]:
# Parse dates and convert yields to numeric.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y', errors='coerce')
df['Yield'] = pd.to_numeric(df['BAMLEM2BRRBBBCRPIEY'], errors='coerce')

In [4]:
# Drop rows with NaN in key columns and set index.
df = df.dropna(subset=['Date', 'Yield']).set_index('Date')

In [5]:
# Conform the index to business-day frequency ('B', weekdays only) so the model
# receives a date index with an explicit frequency, which resolves the
# missing-frequency warning. Dates absent from the data become NaN rows.
df = df.asfreq('B')
# Forward-fill those gaps (e.g., holidays) with the last observed yield.
df = df.ffill()

In [6]:
df.head()

| Date | BAMLEM2BRRBBBCRPIEY | Yield |
|---|---|---|
| 2020-11-23 | 2.68 | 2.68 |
| 2020-11-24 | 2.68 | 2.68 |
| 2020-11-25 | 2.68 | 2.68 |
| 2020-11-26 | 2.67 | 2.67 |
| 2020-11-27 | 2.65 | 2.65 |


In [7]:
# Test for stationarity using ADF.
adf_test = adfuller(df['Yield'])

In [8]:
# Fit ARIMA(1,1,1) model.
model = ARIMA(df['Yield'], order=(1,1,1)) 
fitted_model = model.fit()

In [9]:
# Forecast the next 5 business days.
forecast = fitted_model.forecast(steps=5)

# Display results.
print(f"ADF Statistic: {adf_test[0]:.3f}")
print(f"p-value: {adf_test[1]:.3f}")
print("Forecasted Yields:")
print(forecast.round(3))

ADF Statistic: -1.838
p-value: 0.362
Forecasted Yields:
2025-11-24    4.957
2025-11-25    4.955
2025-11-26    4.954
2025-11-27    4.953
2025-11-28    4.953
Freq: B, Name: predicted_mean, dtype: float64
