# Presentation - Final - AI & ML

## Introduction & Motivation

### **Enhancing Econometric QE Models with Machine Learning**

The goal of this approach is to improve how we forecast **Quantitative Easing (QE)** decisions by combining **econometric structure** with **machine learning flexibility**.

---

#### **1. The Econometric Baseline**

Traditionally, we model the probability that the Fed starts QE next quarter as a **logistic regression**:

$$
P(QE_{t+1}=1 \mid X_t) = \frac{1}{1 + e^{-(\beta_0 + \beta^\top X_t)}}.
$$

Here:

* $QE_{t+1}$ is 1 if the Fed expands its balance sheet by more than \$100 billion next quarter, 0 otherwise.
* $X_t$ is a vector of predictors (e.g., Fed balance sheet growth, VIX level, lagged changes).
* $\beta_0, \beta$ are estimated parameters.

This model assumes the relationship between the predictors and the **log-odds** of QE is **linear and additive**.
In other words, each variable shifts the log-odds by a constant amount that does not depend on the level of the others.
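
A minimal sketch of the baseline formula may help here. The coefficients below are hypothetical values chosen only for illustration (they are not estimates from this study), and the two-variable setup is an assumption. The sketch shows that a 10-point rise in VIX shifts the log-odds by the same amount in every state, while the implied change in probability depends on the starting point.

```python
import math

def qe_probability(x, beta0, beta):
    """Logistic probability P(QE_{t+1}=1 | X_t) for one observation."""
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for [Fed growth in %, VIX level]; not estimates.
beta0 = -2.0
beta = [0.5, 0.05]

# Raise VIX from 15 to 25 with Fed growth at 0% (calm) and at 6% (expansion).
calm = [qe_probability([0.0, v], beta0, beta) for v in (15.0, 25.0)]
expansion = [qe_probability([6.0, v], beta0, beta) for v in (15.0, 25.0)]

# The log-odds shift is 0.05 * 10 = 0.5 in both states.
shift_calm = logit(calm[1]) - logit(calm[0])
shift_expansion = logit(expansion[1]) - logit(expansion[0])

# The probability change is not the same in the two states.
dp_calm = calm[1] - calm[0]
dp_expansion = expansion[1] - expansion[0]
print(round(shift_calm, 3), round(shift_expansion, 3))
print(round(dp_calm, 3), round(dp_expansion, 3))
```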

---

#### **2. Moving Beyond Linearity**

Economic behavior is rarely linear — especially monetary policy, which can change sharply when markets reach stress thresholds.
To allow for this, we replace the linear term $(\beta_0 + \beta^\top X_t)$ with a **flexible, data-driven function** learned by a machine learning algorithm:

$$
P(QE_{t+1}=1 \mid X_t) = f_{\text{ML}}(X_t).
$$

Here, $f_{\text{ML}}$ can be:

* A **Random Forest**, which averages $B$ decision trees $T_b$:
  $$
  f_{\text{RF}}(X_t) = \frac{1}{B}\sum_{b=1}^{B} T_b(X_t),
  $$
  capturing patterns like “QE is likely if both volatility and balance sheet growth are high.”
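* Or an **XGBoost model**, which builds trees sequentially to improve predictions at each step, with the summed score mapped to a probability by the logistic function $\sigma(z) = 1/(1+e^{-z})$:
  $$
  f_{\text{XGB}}(X_t) = \sigma\!\left(\sum_{m=1}^{M} \gamma_m h_m(X_t)\right).
  $$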

Both methods can learn **nonlinear thresholds**, **variable interactions**, and **regime changes** that a simple regression cannot.
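
To make the averaging idea concrete, here is a small sketch in which three hand-written toy trees vote on a QE event and the forest output is the share of positive votes. The trees, their thresholds, and the variable names are hypothetical; in a real Random Forest each tree would be learned from a bootstrap sample rather than written by hand.

```python
def tree_joint_stress(x):
    """Toy tree: votes QE only when VIX and Fed growth are both high."""
    if x["vix"] > 30.0:
        return 1 if x["fed_growth"] > 2.0 else 0
    return 0

def tree_vix_only(x):
    """Toy tree: votes QE on extreme volatility alone."""
    return 1 if x["vix"] > 35.0 else 0

def tree_growth_only(x):
    """Toy tree: votes QE on very fast balance sheet growth alone."""
    return 1 if x["fed_growth"] > 5.0 else 0

# Hypothetical ensemble; thresholds are illustrative, not fitted.
TREES = [tree_joint_stress, tree_vix_only, tree_growth_only]

def forest_probability(x, trees=TREES):
    """Average of tree votes, i.e. (1/B) * sum of the B tree outputs."""
    return sum(tree(x) for tree in trees) / len(trees)

calm = {"vix": 15.0, "fed_growth": 0.5}
mixed = {"vix": 32.0, "fed_growth": 3.0}
joint = {"vix": 40.0, "fed_growth": 6.0}

for state in (calm, mixed, joint):
    print(state, forest_probability(state))
```

The first tree encodes an interaction: a high VIX matters only when Fed growth is also high, which is the kind of pattern an additive log-odds model has no term for.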

---

#### **3. Why It Matters**

This hybrid approach keeps the **probabilistic structure** of econometrics — still predicting $P(QE_{t+1}=1)$ —
but allows the data to reveal **how** variables combine in practice, without forcing linearity.

So instead of assuming:

$$
\text{Log-odds of QE} = \text{constant} + a \times \text{Fed Growth} + b \times \text{VIX},
$$

we estimate:

$$
\text{QE Probability} = f_{\text{ML}}(\text{Fed Growth}, \text{VIX}, \text{Lags}, \text{Rates}, \ldots),
$$

where $f_{\text{ML}}$ automatically adapts to the shape of the real relationship.

---

In short:

> We extend the classical econometric model by replacing its fixed linear formula with a flexible function learned by machine learning. This allows the QE prediction to depend on complex and realistic patterns in macro-financial data — such as nonlinear interactions, stress thresholds, and changing policy regimes.



## **Slide 2: Research Question**

We model the probability that the Federal Reserve will initiate or continue **Quantitative Easing (QE)** in the next quarter as a function of current macro-financial conditions.

---

### **1. Econometric Baseline**

$$
P(QE_{t+1} = 1 \mid X_t) = \frac{1}{1 + e^{-(\beta_0 + \beta^\top X_t)}}
$$

This is a **logistic regression**, where:

* $X_t$ contains indicators such as Fed balance sheet growth, VIX level, and their lags.
* $\beta$ measures how each variable shifts the log-odds of QE.

The model assumes each feature contributes **linearly and additively** on the log-odds scale, meaning:

* An increase in volatility (VIX) always shifts the log-odds of QE by the same amount, regardless of what other variables are doing.

This simplicity aids interpretation but limits flexibility — it cannot detect **thresholds** or **interaction effects** between stress and policy variables.

---

### **2. Machine Learning Extension**

$$
P(QE_{t+1} = 1 \mid X_t) = f_{\text{ML}}(X_t)
$$

Here, $f_{\text{ML}}$ is a **nonlinear function** learned from data (via Random Forests or XGBoost).
Instead of estimating constant coefficients, the model discovers complex patterns such as:

* QE becomes more likely when **volatility** and **Fed balance sheet expansion** rise **together**.
* A spike in VIX triggers QE only if **previous growth** is slowing.

This allows for **state-dependent responses**, where the same variable has different effects depending on the overall market regime.

---

In short, the econometric model provides structure and interpretability, while the machine learning version introduces flexibility to capture **nonlinear policy reactions** and **joint stress conditions** that better explain QE dynamics.

# Data and Methodology

To model the probability of future **Quantitative Easing (QE)** actions, we build a dataset from macro-financial indicators retrieved from the **Federal Reserve Economic Data (FRED)** database. The focus is on capturing both the **monetary policy stance** of the Federal Reserve and the **financial stress conditions** prevailing in markets.

---

### **1. Monetary Policy Stance – Federal Reserve Securities Holdings**

The first series, `WSHOSHO`, measures the **total securities held by the Federal Reserve** on its balance sheet.
An increase in these holdings typically reflects **asset purchase programs**—the main mechanism through which QE is implemented.

We denote the series as $\text{Fed\_Securities}_t$, representing the level of holdings at time $t$.
To quantify the **rate of policy expansion**, we compute the quarterly percentage growth:

$$
\text{Fed\_Growth}_t = \left( \frac{\text{Fed\_Securities}_t}{\text{Fed\_Securities}_{t-1}} - 1 \right) \times 100
$$

This variable measures how rapidly the Fed is expanding (or contracting) its securities portfolio.
High values of $\text{Fed\_Growth}_t$ signal a more **accommodative** stance, often associated with QE initiation or continuation.

To capture persistence, we also include its **lagged value**:

$$
\text{Fed\_Growth\_Lag1}_t = \text{Fed\_Growth}_{t-1}
$$

which represents the previous quarter's monetary expansion rate.
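
The two definitions above can be sketched in a few lines. The holdings values are hypothetical (not FRED data), and the sketch assumes the series has already been aggregated to quarterly frequency, a step the text does not specify.

```python
def pct_growth(levels):
    """Quarter-over-quarter percentage growth; None where no prior value exists."""
    out = [None]
    for prev, curr in zip(levels, levels[1:]):
        out.append((curr / prev - 1.0) * 100.0)
    return out

def lag1(series):
    """Shift a series back by one quarter, padding the start with None."""
    return [None] + series[:-1]

# Hypothetical quarterly holdings in billions of dollars (not FRED data).
fed_securities = [2000.0, 2100.0, 2310.0, 2287.0]

fed_growth = pct_growth(fed_securities)   # Fed_Growth_t
fed_growth_lag1 = lag1(fed_growth)        # Fed_Growth_Lag1_t = Fed_Growth_{t-1}

for t, (g, g_lag) in enumerate(zip(fed_growth, fed_growth_lag1)):
    print(t, g, g_lag)
```

The first quarter has no growth rate and the first two quarters have no lagged growth, so those rows would be dropped before estimation.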

---

### **2. Financial Market Stress – The VIX Index**

The second series, `VIXCLS`, represents the **CBOE Volatility Index (VIX)**—a forward-looking measure of expected stock market volatility.
Higher VIX values imply greater uncertainty and financial stress, often preceding central bank interventions.

We denote this as $\text{VIX}_t$ (entered in the feature vector as $\text{VIX\_Level}_t$), and compute both its **percentage change** and a **stress indicator**:

$$
\text{VIX\_Change}_t = \left( \frac{\text{VIX}_t}{\text{VIX}_{t-1}} - 1 \right) \times 100
$$

which captures the **percentage increase or decrease in market volatility**, and

$$
\text{VIX\_High}_t =
\begin{cases}
1, & \text{if } \text{VIX}_t > 30 \\
0, & \text{otherwise}
\end{cases}
$$

where the threshold of 30 marks episodes of **elevated stress** in financial markets.
Lagged changes ($\text{VIX\_Change\_Lag1}_t$) are also included to account for **delayed policy reactions**.
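
As a rough sketch, the VIX-based features could be built as follows. The VIX values are hypothetical, and the sketch assumes one observation per quarter (the daily series would first need to be aggregated, which the text does not describe).

```python
VIX_STRESS_THRESHOLD = 30.0  # threshold stated in the text

def vix_features(vix):
    """Build level, percentage change, lagged change, and stress flag per quarter."""
    rows = []
    for t, level in enumerate(vix):
        change = None if t == 0 else (level / vix[t - 1] - 1.0) * 100.0
        rows.append({
            "VIX_Level": level,
            "VIX_Change": change,
            "VIX_High": 1 if level > VIX_STRESS_THRESHOLD else 0,
        })
    # The lagged change is the previous quarter's change.
    for t, row in enumerate(rows):
        row["VIX_Change_Lag1"] = rows[t - 1]["VIX_Change"] if t > 0 else None
    return rows

# Hypothetical quarterly VIX values (not market data).
vix = [20.0, 25.0, 40.0, 30.0]
features = vix_features(vix)
for row in features:
    print(row)
```

Note that the inequality is strict, so a quarter with VIX exactly at 30 is not flagged as high stress.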

---

### **3. Target Variable – Future QE Indicator**

The dependent variable represents whether the Fed **expanded its securities holdings by more than \$100 billion** in the subsequent quarter.
This threshold is used to identify significant balance sheet expansions consistent with QE phases. In the definition that follows, $\text{Fed\_Securities}_t$ is expressed in billions of dollars.

$$
QE_{t+1} =
\begin{cases}
1, & \text{if } \text{Fed\_Securities}_{t+1} - \text{Fed\_Securities}_t > 100 \\
0, & \text{otherwise}
\end{cases}
$$

Thus, the target variable captures **the onset or continuation of QE** in the next quarter, conditional on information available today.
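
A minimal sketch of the target construction, assuming holdings are expressed in billions of dollars and using hypothetical values rather than FRED data:

```python
QE_THRESHOLD_BN = 100.0  # $100 billion threshold from the text

def qe_target(levels, threshold=QE_THRESHOLD_BN):
    """Return QE_{t+1} aligned to quarter t: 1 if holdings rise by more than
    the threshold over the next quarter, else 0. The last quarter gets None
    because its t+1 value is not yet observed."""
    target = []
    for curr, nxt in zip(levels, levels[1:]):
        target.append(1 if nxt - curr > threshold else 0)
    target.append(None)
    return target

# Hypothetical quarterly holdings in billions of dollars (not FRED data).
fed_securities = [2000.0, 2100.0, 2310.0, 2287.0]
qe_next = qe_target(fed_securities)
print(qe_next)
```

Because the label at $t$ uses the level at $t+1$, the final quarter of the sample cannot be labeled and would be excluded from training.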

---

### **4. Predictive Feature Set**

The feature vector used to predict future QE decisions is:

$$
X_t = \{ \text{Fed\_Growth}_t, \text{Fed\_Growth\_Lag1}_t, \text{VIX\_Level}_t, \text{VIX\_Change}_t, \text{VIX\_Change\_Lag1}_t, \text{VIX\_High}_t \}
$$

This vector combines information about **monetary policy trends** (growth and lags of Fed assets) with **financial stress indicators** (VIX levels, changes, and stress flags).
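
Putting the pieces together, the sketch below assembles rows of $X_t$ paired with $QE_{t+1}$ and drops quarters where a lag or the target is unavailable. All numbers are hypothetical, and the helper name `build_design_matrix` is introduced here for illustration only.

```python
FEATURES = [
    "Fed_Growth", "Fed_Growth_Lag1", "VIX_Level",
    "VIX_Change", "VIX_Change_Lag1", "VIX_High",
]

def build_design_matrix(columns, target):
    """Keep only quarters where every feature and the target are observed."""
    X, y = [], []
    for t, label in enumerate(target):
        row = [columns[name][t] for name in FEATURES]
        if label is None or any(value is None for value in row):
            continue
        X.append(row)
        y.append(label)
    return X, y

# Hypothetical feature columns for five quarters (not real data).
columns = {
    "Fed_Growth":      [None, 5.0, 10.0, -1.0, 2.0],
    "Fed_Growth_Lag1": [None, None, 5.0, 10.0, -1.0],
    "VIX_Level":       [20.0, 25.0, 40.0, 30.0, 22.0],
    "VIX_Change":      [None, 25.0, 60.0, -25.0, -26.7],
    "VIX_Change_Lag1": [None, None, 25.0, 60.0, -25.0],
    "VIX_High":        [0, 0, 1, 0, 0],
}
target = [0, 1, 0, 1, None]  # QE_{t+1}, aligned to quarter t

X, y = build_design_matrix(columns, target)
print(len(X), y)
```

In this toy sample only two of the five quarters survive: the first two lack lagged values and the last has no observed $QE_{t+1}$.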

---

### **5. Conceptual Framework**

The model's goal is to estimate the conditional probability:

$$
P(QE_{t+1} = 1 \mid X_t)
$$

That is, given the current state of financial and policy variables $X_t$, what is the likelihood that the Federal Reserve will engage in quantitative easing in the **next quarter**?

This framework provides a bridge between **macroeconomic reasoning** (Fed reaction functions) and **data-driven inference**, serving as the foundation for both the econometric (logistic) and machine learning (random forest, XGBoost) models that follow.



# Results

The Random Forest model was introduced to capture **nonlinear relationships** in the prediction of quantitative easing (QE) decisions that the traditional **logistic regression** model, which is linear in the log-odds, might overlook.

---

### 1. Logistic Regression (Baseline)

The baseline model assumes that the probability of a QE event in the next quarter depends linearly on a set of predictors $X_t$, through the logistic transformation:

$$
P(QE_{t+1} = 1 \mid X_t) = \frac{1}{1 + \exp(-(\beta_0 + \beta^\top X_t))}.
$$

Here, $X_t = [\text{Fed\_Growth}_t, \text{Fed\_Growth\_Lag1}_t, \text{VIX\_Level}_t, \text{VIX\_Change}_t, \text{VIX\_Change\_Lag1}_t, \text{VIX\_High}_t]$ summarizes both the monetary policy stance and financial market stress.
Each coefficient $\beta_i$ measures the **marginal log-odds impact** of its variable on the likelihood of QE.

Empirically, the logistic regression found only **Fed_Growth** significant ($p = 0.03$), implying that higher Fed balance sheet expansion increases the probability of QE continuation. The model achieved an accuracy of 0.73 and AUC around 0.75—adequate, but limited by its linear specification.
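
For readers less familiar with the two metrics, the sketch below computes accuracy and AUC from predicted probabilities. The labels and probabilities are hypothetical (they do not reproduce the reported 0.73 and 0.75), and the 0.5 classification cutoff is an assumption, since the text does not state the cutoff used.

```python
def accuracy(y_true, y_prob, cutoff=0.5):
    """Share of observations classified correctly at the given cutoff."""
    preds = [1 if p >= cutoff else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)

def auc(y_true, y_prob):
    """Probability that a random QE quarter gets a higher score than a random
    non-QE quarter, with ties counted as one half."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes and predicted probabilities (not the study's data).
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.2, 0.6, 0.7, 0.4, 0.1, 0.9]

acc = accuracy(y_true, y_prob)
area = auc(y_true, y_prob)
print(round(acc, 3), round(area, 3))
```

Accuracy depends on the chosen cutoff, whereas AUC summarizes ranking quality across all cutoffs, which is why the two metrics can move by different amounts between models.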

---

### 2. Random Forest (Nonlinear Extension)

To address this, we applied a **Random Forest (RF)** classifier, which models

$$
P(QE_{t+1} = 1 \mid X_t) = f_{\text{RF}}(X_t),
$$

where $f_{\text{RF}}$ is the ensemble average of $B$ decision trees $T_b(X_t)$:

$$
f_{\text{RF}}(X_t) = \frac{1}{B} \sum_{b=1}^{B} T_b(X_t).
$$

Each tree partitions the predictor space into regions $R_{b,j}$ and predicts the majority class within each region, so the ensemble average is the share of trees voting for QE. By averaging over many trees, Random Forest reduces variance and is less prone to overfitting than a single tree, while capturing nonlinear interactions such as:

* how the Fed's reaction to market volatility ($\text{VIX}_t$) might depend on current or past balance sheet growth ($\text{Fed\_Growth}_t$),
* or how persistent volatility shocks affect QE probability differently under high-stress vs. low-stress regimes.

---

### 3. Model Implementation

In practice:

* The RF model was trained using $500$ trees ($B = 500$),
* Two variables were randomly selected at each split to introduce feature diversity,
* Performance was tracked using **classification accuracy** and the **out-of-bag (OOB)** error.

The out-of-bag error stabilized at **26.6%** (an OOB accuracy of about 0.73), while accuracy on the test set was approximately **0.77**, an improvement over the logistic model.
The **AUC** increased to **0.85**, showing superior discriminative ability.
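
The out-of-bag idea can be illustrated with a short sketch: each tree is grown on a bootstrap sample, and the observations left out of that sample serve as its validation set. The sample size of 100 is hypothetical (the text does not report the number of quarters), and only the bootstrap bookkeeping is shown, not the tree fitting.

```python
import random

def bootstrap_oob(n_obs, n_trees, seed=0):
    """For each tree, draw n_obs indices with replacement and return the set
    of indices that were never drawn (the out-of-bag observations)."""
    rng = random.Random(seed)
    oob_sets = []
    for _ in range(n_trees):
        in_bag = {rng.randrange(n_obs) for _ in range(n_obs)}
        oob_sets.append(set(range(n_obs)) - in_bag)
    return oob_sets

N_OBS = 100    # hypothetical sample size
N_TREES = 500  # number of trees stated in the text

oob_sets = bootstrap_oob(N_OBS, N_TREES)
avg_oob_share = sum(len(s) for s in oob_sets) / (N_TREES * N_OBS)

# An OOB error of 26.6% is equivalent to an OOB accuracy of 1 - 0.266.
oob_accuracy = 1.0 - 0.266
print(round(avg_oob_share, 3), round(oob_accuracy, 3))
```

On average roughly a third of the observations are out of bag for any given tree, so every observation is scored by many trees that never saw it. The OOB accuracy and the test-set accuracy are computed on different observations and need not match.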

---

### 4. Comparative Insights

| Model               | Accuracy | AUC   | Key Insights                                                |
| ------------------- | -------- | ----- | ----------------------------------------------------------- |
| Logistic Regression | 0.73     | ~0.75 | Captures linear effects; only Fed_Growth significant        |
| Random Forest       | 0.77     | 0.85  | Captures nonlinearities between policy and market variables |

The Random Forest's higher AUC and accuracy suggest that QE decisions are influenced by **interacting and nonlinear dynamics** rather than simple additive effects. For instance, moderate volatility changes may trigger QE only when combined with already high Fed growth or lagged stress indicators.

---

### 5. Interpretation

Conceptually, logistic regression assumes a smooth, monotonic response curve, while Random Forest allows the probability surface $f_{\text{RF}}(X_t)$ to be **piecewise and adaptive** to the data.
This flexibility means that the model can approximate the conditional expectation of the binary outcome directly:

$$
f_{\text{RF}}(X_t) \approx \mathbb{E}[QE_{t+1} \mid X_t] = P(QE_{t+1} = 1 \mid X_t),
$$

without requiring a specific functional form for $f(\cdot)$.
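
Because the target is binary, the conditional probability and the conditional expectation coincide, which a one-line expansion makes explicit:

$$
\mathbb{E}[QE_{t+1} \mid X_t] = 1 \cdot P(QE_{t+1} = 1 \mid X_t) + 0 \cdot P(QE_{t+1} = 0 \mid X_t) = P(QE_{t+1} = 1 \mid X_t).
$$

Any method that estimates the conditional mean of the 0/1 outcome is therefore also estimating the QE probability.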

---

### 6. Economic Meaning

From an economic perspective:

* The **Fed_Growth** variable remains the strongest predictor—confirming that expansionary balance sheet movements are precursors to future QE.
* The **interaction with volatility** (captured implicitly by the RF) shows that the Fed's response to financial stress is **state-dependent**.
* The higher predictive power of RF suggests that QE actions are not triggered by single linear thresholds but by **conditional relationships** between market stress and prior policy expansion.

---

**In summary**, the Random Forest extends the logistic model by allowing a flexible, data-driven mapping from economic indicators to QE outcomes. Its superior performance (accuracy 0.77 vs. 0.73, AUC 0.85 vs. ~0.75) suggests that nonlinear dependencies play an important role in explaining the timing and likelihood of Federal Reserve quantitative easing actions.