# Part II — Regime Dynamics via Markov Chains

This notebook builds directly on **Part I — Gold Market Risk: Setup & Definitions**.

In Part I, we constructed a clean gold price dataset (using GLD as a proxy), engineered interpretable market features (returns, volatility, momentum, and price z-scores), formally defined a forward-looking risk event (a ≥10% drawdown within 90 trading days), and evaluated a probabilistic risk model using a time-respecting 80/20 split.

In this part, we do not introduce another supervised prediction model.
Instead, we shift perspective from event prediction to state-based market structure.

We model the gold market as a **discrete-time stochastic process**, in which each trading day is assigned to a finite set of regimes and regime evolution is governed by a **Markov chain**. The analysis proceeds by (i) formally defining regimes as functions of observable market features, (ii) imposing a first-order Markov assumption on regime transitions, and (iii) studying regime persistence, transition probabilities, and multi-step dynamics.

Risk is therefore interpreted indirectly, through the likelihood of entering or remaining in adverse regimes, rather than through direct price or event forecasts. This regime-based formulation complements Part I by emphasizing the structure, persistence, and dynamics of market conditions underlying observed risk outcomes.

## 1. Data continuity from Part I

As in Part I, our analysis is based on daily GLD data starting in 2006.

We reuse the same feature set constructed previously:
- daily log returns
- rolling volatility
- rolling momentum
- price z-scores

This ensures that:
- the Markov regime analysis is **fully consistent** with the earlier risk framework,
- results across Part I and Part II are directly comparable.

For reproducibility, this notebook rebuilds the dataset from raw data using the same logic.
(Alternatively, the same dataset could be loaded from a saved CSV produced in Part I.)

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf

## 2. Reconstructing the gold dataset

We download GLD prices and rebuild the feature set exactly as in Part I.
The object `g` will again represent our core time series, indexed by date.

This repetition is intentional:
- it keeps this notebook self-contained,
- and avoids hidden dependencies between notebooks.

In [2]:
symbol = "GLD"
start = "2006-01-01"

df = yf.download(symbol, start=start, auto_adjust=True, progress=False)

# Handle possible MultiIndex columns
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)

df.columns = [c.lower() for c in df.columns]

# Base price series
g = df[["close"]].copy()
g.index = pd.to_datetime(g.index)

print("Start:", g.index.min().date())
print("End:  ", g.index.max().date())
print("Rows:", len(g))

g.tail()

Start: 2006-01-03
End:   2026-01-27
Rows: 5048


Date,close
2026-01-21,443.600006
2026-01-22,451.790009
2026-01-23,458.0
2026-01-26,464.700012
2026-01-27,476.100006


## 3. Feature construction (same definitions as Part I)

We now reconstruct the same interpretable features used previously.

These features are not chosen to optimize prediction accuracy,
but to capture market state characteristics such as:
- turbulence,
- persistence,
- and overextension.

They will later be used to define regimes, not to fit a classifier.

In [4]:
# Daily log returns
g["ret_1d"] = np.log(g["close"]).diff()

# Rolling volatility (annualized)
g["vol_20d"] = g["ret_1d"].rolling(20).std() * np.sqrt(252)
g["vol_60d"] = g["ret_1d"].rolling(60).std() * np.sqrt(252)

# Rolling momentum
g["mom_20d"] = np.log(g["close"]).diff(20)
g["mom_60d"] = np.log(g["close"]).diff(60)

# Price z-score relative to 1-year history
roll = 252
mu = g["close"].rolling(roll).mean()
sd = g["close"].rolling(roll).std()
g["z_252"] = (g["close"] - mu) / sd

g.tail()

Date,close,ret_1d,vol_20d,vol_60d,mom_20d,mom_60d,z_252
2026-01-21,443.600006,0.014464,0.267546,0.203606,0.105912,0.157941,2.506406
2026-01-22,451.790009,0.018294,0.264075,0.205709,0.101387,0.179594,2.640009
2026-01-23,458.0,0.013652,0.264278,0.196244,0.101873,0.22148,2.725997
2026-01-26,464.700012,0.014523,0.263931,0.196146,0.120539,0.243195,2.81674
2026-01-27,476.100006,0.024236,0.271179,0.199735,0.133165,0.271225,2.993552


## 4. From features to regimes

In Part I, features were inputs to a probabilistic model.
Here, the same features are used to **categorize each day into a market regime**.

The idea is simple:
- markets tend to operate in recurring states (calm, trending, stressed),
- these states persist for some time,
- and transitions between states follow stable patterns rather than occurring arbitrarily.

A Markov chain provides a minimal mathematical framework to model this behavior.

## 5. Defining discrete market states

We define four regimes using volatility and momentum:

- **State 0 — Calm / Normal**
  - volatility not elevated (below the upper quartile)
  - momentum between the lower and upper quartiles

- **State 1 — Trend Up**
  - strong positive momentum
  - volatility not elevated

- **State 2 — Trend Down**
  - strong negative momentum (lowest quartile)
  - volatility not elevated

- **State 3 — Stress**
  - high volatility, regardless of direction

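Thresholds are set at historical quantiles (the 75th percentile of 20-day volatility, and the 25th and 75th percentiles of 60-day momentum),
so the classification is data-driven rather than based on hand-picked levels.
Because the quantiles are computed over the full sample, each day's label uses information from the entire history.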

In [5]:
mc = g.dropna(subset=["ret_1d","vol_20d","mom_60d","z_252"]).copy()

vol_hi = mc["vol_20d"].quantile(0.75)
mom_hi = mc["mom_60d"].quantile(0.75)
mom_lo = mc["mom_60d"].quantile(0.25)

vol_hi, mom_hi, mom_lo

(np.float64(0.18668631017090082),
 np.float64(0.0716311266520755),
 np.float64(-0.02929651871799077))

In [6]:
def assign_state(row):
    vol = row["vol_20d"]
    mom = row["mom_60d"]

    if vol >= vol_hi:
        return 3          # Stress
    if mom >= mom_hi:
        return 1          # Trend up
    if mom <= mom_lo:
        return 2          # Trend down
    return 0              # Calm / normal

mc["state"] = mc.apply(assign_state, axis=1).astype(int)

state_names = {
    0: "Calm / Normal",
    1: "Trend Up",
    2: "Trend Down",
    3: "Stress"
}

mc["state_name"] = mc["state"].map(state_names)
mc["state_name"].value_counts()

state_name,count
Calm / Normal,1973
Stress,1200
Trend Down,882
Trend Up,742


## 6. Markov assumption

We now treat the regime sequence as a stochastic process:

$
S_t \in \{0,1,2,3\}
$

and assume a first-order Markov property:

$
P(S_{t+1} \mid S_t, S_{t-1}, \dots) = P(S_{t+1} \mid S_t)
$

This is a modeling assumption, not a claim about true market behavior.
Its usefulness has to be judged empirically, for example by how well the fitted chain reproduces the observed persistence of regimes.

In [7]:
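s = mc["state"]
# Next-day regime; the last row has no successor and becomes NaN,
# which is why the column labels in the output appear as floats.
s_next = s.shift(-1)

# Transition counts: rows = today's regime, columns = tomorrow's regime.
# crosstab drops the final pair with the missing successor.
counts = pd.crosstab(s, s_next)
# Row-normalize the counts into estimated transition probabilities.
P = counts.div(counts.sum(axis=1), axis=0)

counts, P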

(state   0.0  1.0  2.0   3.0
 state                      
 0      1804   81   75    13
 1        72  643    0    27
 2        70    0  799    13
 3        27   17    8  1147,
 state       0.0       1.0       2.0       3.0
 state                                        
 0      0.914344  0.041054  0.038013  0.006589
 1      0.097035  0.866577  0.000000  0.036388
 2      0.079365  0.000000  0.905896  0.014739
 3      0.022519  0.014178  0.006672  0.956631)

## 7. Transition matrix interpretation

Each row of the matrix represents the current regime.
Each column represents the next-day regime.

Entry (i, j) is the estimated probability of transitioning
from regime i today to regime j tomorrow.

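High diagonal values indicate persistent regimes: here the diagonal ranges from about 0.87 (Trend Up) to about 0.96 (Stress).
Off-diagonal mass indicates regime switching. No direct transitions between Trend Up and Trend Down were observed; in the sample the market always passes through Calm / Normal or Stress first.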
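To make the persistence reading concrete, the sketch below converts the diagonal into an expected spell length. It assumes the chain is first-order and time-homogeneous, in which case the length of a stay in state i is geometric with mean 1 / (1 - P_ii). The counts are copied from the crosstab output above; the names `P_hat` and `expected_duration` are illustrative and not part of the original notebook.

```python
import numpy as np

# Transition counts copied from the crosstab output above
# (rows: current state, columns: next-day state).
counts = np.array([
    [1804,  81,  75,   13],
    [  72, 643,   0,   27],
    [  70,   0, 799,   13],
    [  27,  17,   8, 1147],
], dtype=float)

# Row-normalize to get the estimated transition matrix.
P_hat = counts / counts.sum(axis=1, keepdims=True)

# Under a time-homogeneous first-order chain, the length of a stay in
# state i is geometric with mean 1 / (1 - P_ii).
stay_prob = np.diag(P_hat)
expected_duration = 1.0 / (1.0 - stay_prob)

labels = ["Calm / Normal", "Trend Up", "Trend Down", "Stress"]
for name, p, d in zip(labels, stay_prob, expected_duration):
    print(f"{name:<14} stay prob = {p:.3f}, expected spell = {d:.1f} days")
```

Comparing these model-implied spell lengths with the empirical run lengths of each regime is one informal way to see whether the Markov assumption is reasonable.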

In [8]:
Pmat = P.values
states = list(P.index)
stress_idx = states.index(3)

for k in [5, 20, 60]:
    Pk = np.linalg.matrix_power(Pmat, k)
    print(f"\nProbability of being in Stress after {k} days:")
    for i, st in enumerate(states):
        print(f"  Start in {state_names[st]:<14} → {Pk[i, stress_idx]:.3f}")


Probability of being in Stress after 5 days:
  Start in Calm / Normal  → 0.042
  Start in Trend Up       → 0.133
  Start in Trend Down     → 0.061
  Start in Stress         → 0.808

Probability of being in Stress after 20 days:
  Start in Calm / Normal  → 0.159
  Start in Trend Up       → 0.233
  Start in Trend Down     → 0.162
  Start in Stress         → 0.484

Probability of being in Stress after 60 days:
  Start in Calm / Normal  → 0.244
  Start in Trend Up       → 0.252
  Start in Trend Down     → 0.243
  Start in Stress         → 0.276


## 8. Relation to the risk framework in Part I

In Part I, risk was defined explicitly as a future drawdown event.

Here, risk is implicit:
- some regimes (e.g. Stress) are historically associated with adverse outcomes,
- the Markov chain quantifies how likely the market is to *enter* or *remain* in such regimes.

Thus:
- Part I answers: *"What is the probability of a drawdown given today’s features?"*
- Part II answers: *"How does the market move between risk regimes over time?"*

Together, they provide complementary views of market risk.

## 9. Mathematical formulation of the regime process

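Let $\{P_t\}_{t=1}^T$ denote the adjusted daily closing price of GLD. (The subscripted $P_t$ is a price; the unsubscripted $P$ introduced in Section 12 is the transition matrix.)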

From Part I and Sections 2–3 of this notebook, we constructed a feature vector
$x_t = (\text{vol}_{20,t}, \text{mom}_{60,t},  z_{252,t}) \in \mathbb{R}^3$ for each trading day $t$.

We now define a discrete-valued regime (state) process $\{S_t\}_{t=1}^T$ such that
$S_t \in \{0,1,2,3\}$, where each value corresponds to a qualitatively distinct market condition.

## 10. State definition

States are defined deterministically as functions of $x_t$ using historical quantile thresholds.

Let $q^{\text{vol}}_{0.75}$ denote the $75\%$ quantile of $\text{vol}_{20,t}$,
and let $q^{\text{mom}}_{0.75}$ and $q^{\text{mom}}_{0.25}$ denote the corresponding quantiles of $\text{mom}_{60,t}$.

The regime assignment function $f : \mathbb{R}^3 \to \{0,1,2,3\}$ is defined as follows:

- $S_t = 3$ if $\text{vol}_{20,t} \ge q^{\text{vol}}_{0.75}$ (stress regime),
- $S_t = 1$ if $\text{vol}_{20,t} < q^{\text{vol}}_{0.75}$ and $\text{mom}_{60,t} \ge q^{\text{mom}}_{0.75}$ (upward trend),
- $S_t = 2$ if $\text{vol}_{20,t} < q^{\text{vol}}_{0.75}$ and $\text{mom}_{60,t} \le q^{\text{mom}}_{0.25}$ (downward trend),
- $S_t = 0$ otherwise (calm or neutral regime).

This construction induces a discrete stochastic process $\{S_t\}$ from continuous market features.
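The assignment rule above can also be written in vectorized form. The following is a minimal sketch, assuming `np.select` is used to evaluate the conditions in priority order (stress first, then the two trend regimes); the function name `assign_states` and the toy thresholds are illustrative and are not the notebook's fitted quantiles.

```python
import numpy as np

def assign_states(vol, mom, vol_hi, mom_hi, mom_lo):
    """Vectorized regime assignment; conditions are checked in order."""
    vol = np.asarray(vol, dtype=float)
    mom = np.asarray(mom, dtype=float)
    conditions = [
        vol >= vol_hi,   # 3: stress takes priority over momentum
        mom >= mom_hi,   # 1: upward trend (volatility not elevated)
        mom <= mom_lo,   # 2: downward trend (volatility not elevated)
    ]
    # np.select returns the choice for the first condition that holds.
    return np.select(conditions, [3, 1, 2], default=0)

# Toy inputs with illustrative thresholds (not the notebook's values).
vol = [0.10, 0.25, 0.12, 0.15, 0.30]
mom = [0.01, 0.10, 0.09, -0.05, -0.08]
states = assign_states(vol, mom, vol_hi=0.19, mom_hi=0.07, mom_lo=-0.03)
print(states)
```

Because the first matching condition wins, a day with high volatility is labeled as stress even if its momentum is also extreme, which matches the ordering of the cases in the definition.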

## 11. Markov assumption

We model the regime process $\{S_t\}$ as a first-order, time-homogeneous Markov chain.

Formally, we assume that for all $t$ and all states $i,j \in \{0,1,2,3\}$,
the conditional probability satisfies
$P(S_{t+1}=j \mid S_t=i, S_{t-1}, \dots, S_1) = P(S_{t+1}=j \mid S_t=i)$.

This assumption implies that the future regime depends on the present regime only,
and not on the full past history of the process.
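One informal way to probe this assumption is to check whether conditioning on the previous day's regime as well changes the estimated next-day probabilities. The sketch below is an illustrative diagnostic, not a formal statistical test: the helper `max_second_order_gap` is hypothetical, and it is demonstrated on a simulated two-state chain that is first-order by construction.

```python
import numpy as np

def max_second_order_gap(seq, n_states):
    """Largest |P(j | i, h) - P(j | i)| over all observed (h, i) pairs."""
    seq = np.asarray(seq, dtype=int)
    # triple_counts[h, i, j] = number of times h -> i -> j occurs.
    triple_counts = np.zeros((n_states, n_states, n_states))
    np.add.at(triple_counts, (seq[:-2], seq[1:-1], seq[2:]), 1)

    pair_counts = triple_counts.sum(axis=0)  # [i, j], pooled over h
    with np.errstate(invalid="ignore", divide="ignore"):
        p_first = pair_counts / pair_counts.sum(axis=1, keepdims=True)
        p_second = triple_counts / triple_counts.sum(axis=2, keepdims=True)
    # Unobserved (h, i) pairs produce NaN and are skipped by nanmax.
    return np.nanmax(np.abs(p_second - p_first[None, :, :]))

# Simulate a chain that really is first-order, so the gap should be small.
rng = np.random.default_rng(0)
P_true = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
n = 50_000
u = rng.random(n)
sim = np.zeros(n, dtype=int)
for t in range(1, n):
    # Move to state 0 with probability P_true[prev, 0], else to state 1.
    sim[t] = 0 if u[t] < P_true[sim[t - 1], 0] else 1

gap = max_second_order_gap(sim, n_states=2)
print(f"max second-order gap on a true first-order chain: {gap:.4f}")
```

Applied to the notebook's regime sequence (for instance `mc["state"].to_numpy()` with `n_states=4`), a large gap in well-populated cells would suggest memory beyond one day; sparsely populated cells should be read with caution.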

## 12. Transition matrix

Let $P \in \mathbb{R}^{4 \times 4}$ denote the transition matrix of the Markov chain.

Each entry is defined as
$P_{ij} = P(S_{t+1}=j \mid S_t=i)$ for $i,j \in \{0,1,2,3\}$.

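Empirically, $P_{ij}$ is estimated by relative frequencies over all days that have a successor:
$\hat{P}_{ij} = \frac{\#\{t < T : S_t=i \text{ and } S_{t+1}=j\}}{\#\{t < T : S_t=i\}}$.

By construction, each row of $\hat{P}$ sums to one, i.e. $\sum_{j=0}^3 \hat{P}_{ij} = 1$ for all $i$, provided state $i$ is observed at least once before time $T$.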
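The relative-frequency estimator can be sketched in plain NumPy as follows. The function name `estimate_transition_matrix` and the toy sequence are illustrative assumptions; the notebook itself obtains the same quantities with `pd.crosstab`.

```python
import numpy as np

def estimate_transition_matrix(seq, n_states):
    """Count one-step transitions and row-normalize them."""
    seq = np.asarray(seq, dtype=int)
    counts = np.zeros((n_states, n_states))
    # Pair each state with its successor; the last state has none.
    np.add.at(counts, (seq[:-1], seq[1:]), 1)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows with no observed departures are left as zeros.
    P_hat = np.divide(counts, row_totals,
                      out=np.zeros_like(counts), where=row_totals > 0)
    return counts, P_hat

# Toy sequence: transitions are 0->0, 0->1, 1->0, 0->1, 1->1.
toy = [0, 0, 1, 0, 1, 1]
counts, P_hat = estimate_transition_matrix(toy, n_states=2)
print(counts)
print(P_hat)
```

Only pairs of consecutive observations enter the counts, so the final observation contributes to no row total; this is why every populated row sums to exactly one.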

## 13. Multi-step regime dynamics

For an integer horizon $k \ge 1$, the $k$-step transition matrix is given by $P^k$.

The entry $(P^k)_{ij}$ represents the probability
$P(S_{t+k}=j \mid S_t=i)$.

In particular, if state $3$ is interpreted as a stress regime,
then $(P^k)_{i3}$ quantifies the probability that the market is in a stress state
after $k$ trading days, conditional on being in regime $i$ today.
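As $k$ grows, the rows of $P^k$ converge to a common stationary distribution $\pi$ satisfying $\pi P = \pi$, provided the chain is irreducible and aperiodic. This explains why the 60-day stress probabilities in Section 7 are nearly the same from every starting regime. The sketch below solves for $\pi$ from the transition counts reported in Section 6; the least-squares formulation and the variable names are illustrative choices, not part of the original notebook.

```python
import numpy as np

# Transition counts copied from the crosstab output in Section 6.
counts = np.array([
    [1804,  81,  75,   13],
    [  72, 643,   0,   27],
    [  70,   0, 799,   13],
    [  27,  17,   8, 1147],
], dtype=float)
P_hat = counts / counts.sum(axis=1, keepdims=True)

n = P_hat.shape[0]
# Solve pi (P - I) = 0 together with sum(pi) = 1 as a least-squares problem.
A = np.vstack([P_hat.T - np.eye(n), np.ones((1, n))])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Cross-check: every row of a high matrix power should approach pi.
P_long = np.linalg.matrix_power(P_hat, 500)

labels = ["Calm / Normal", "Trend Up", "Trend Down", "Stress"]
for name, p in zip(labels, pi):
    print(f"{name:<14} long-run share = {p:.3f}")
```

The stationary probability of the stress state can be read as the long-run fraction of days spent in stress under the fitted chain.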

## 14. Expected time to enter the stress regime

Define the hitting time of the stress regime as
$\tau = \inf\{t \ge 0 : S_t = 3\}$.

Let $h_i = \mathbb{E}[\tau \mid S_0 = i]$ denote the expected time to first reach state $3$
starting from state $i$.

The vector $h = (h_0,h_1,h_2,h_3)$ satisfies the linear system:
- $h_3 = 0$,
- $h_i = 1 + \sum_{j=0}^3 P_{ij} h_j$ for $i \ne 3$.

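Provided the stress regime is reachable from every other regime (as it is in the estimated matrix, where each row assigns positive probability to entering state $3$), this system has a unique finite solution, which gives the expected time to stress from each regime.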
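The linear system can be solved directly by restricting $P$ to the non-stress states. Writing $Q$ for that $3 \times 3$ block, the equations become $(I - Q)\,h = \mathbf{1}$. The sketch below uses the transition counts from Section 6; the names `Q` and `h` are illustrative, and the result inherits every assumption of the fitted chain.

```python
import numpy as np

# Transition counts copied from the crosstab output in Section 6.
counts = np.array([
    [1804,  81,  75,   13],
    [  72, 643,   0,   27],
    [  70,   0, 799,   13],
    [  27,  17,   8, 1147],
], dtype=float)
P_hat = counts / counts.sum(axis=1, keepdims=True)

target = 3                                   # stress regime
others = [i for i in range(4) if i != target]

# Q holds transitions among the non-stress states only.
Q = P_hat[np.ix_(others, others)]

# h_i = 1 + sum_j Q_ij h_j  <=>  (I - Q) h = 1, with h_3 = 0.
h_others = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))

h = np.zeros(4)
h[others] = h_others

labels = ["Calm / Normal", "Trend Up", "Trend Down", "Stress"]
for name, value in zip(labels, h):
    print(f"{name:<14} expected days to stress = {value:.1f}")
```

These are expected values under the model; the distribution of hitting times is heavily right-skewed, so individual episodes can differ substantially from the mean.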

## 15. Relation to the drawdown risk framework in Part I

In Part I, we defined a binary event variable
$Y_t = \mathbf{1}\{\min_{1 \le s \le 90} (P_{t+s}/P_t - 1) \le -0.10\}$,
representing the occurrence of a $10\%$ drawdown within the next $90$ trading days.

The Markov framework does not model $Y_t$ directly.
Instead, it characterizes the evolution of the regime $S_t$, which is observable because it is a deterministic function of the features.

The two approaches are linked via conditional probabilities of the form
$P(Y_t=1 \mid S_t=i)$,
which can be estimated empirically.

Thus:
- Part I models $P(Y_t=1 \mid x_t)$ directly,
- Part II models $P(S_{t+k}=j \mid S_t=i)$ and regime persistence.

Together, they decompose market risk into **state identification** and **state dynamics**.
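The linking probabilities $P(Y_t=1 \mid S_t=i)$ mentioned above can be estimated as the event frequency within each regime. The sketch below uses a small made-up sample purely for illustration; the helper `event_rate_by_state` is hypothetical, and with real data the final 90 trading days would have to be dropped because $Y_t$ is not yet observable for them.

```python
import numpy as np

def event_rate_by_state(states, events, n_states):
    """Empirical P(Y = 1 | S = i) for each state i (NaN if unobserved)."""
    states = np.asarray(states, dtype=int)
    events = np.asarray(events, dtype=float)
    rates = np.full(n_states, np.nan)
    for i in range(n_states):
        mask = states == i
        if mask.any():
            # Share of days in state i that were followed by the event.
            rates[i] = events[mask].mean()
    return rates

# Made-up regime labels and drawdown indicators (illustration only).
toy_states = [0, 0, 1, 3, 3, 3, 2, 0]
toy_events = [0, 0, 0, 1, 1, 0, 1, 0]

rates = event_rate_by_state(toy_states, toy_events, n_states=4)
for i, r in enumerate(rates):
    print(f"state {i}: estimated P(Y=1 | S={i}) = {r:.2f}")
```

Combining these conditional rates with the $k$-step transition probabilities gives a rough, regime-based view of how drawdown risk may evolve over a horizon.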

## 16. Interpretation

The Markov chain model provides a structural description of gold market regimes,
their persistence, and their transition dynamics.

All probabilities are historical estimates that assume time-homogeneous (stationary) transition probabilities over the full sample.
No causal interpretation or predictive guarantee is implied.

This regime-based view complements the probabilistic risk estimates developed in Part I,
offering an alternative lens on market risk dynamics.