<h1 style="text-align: center;">Hidden Markov Model Regime Detection and Backtesting</h1>

# Introduction

Financial markets are often described as noisy, chaotic systems where price movements seem random at first glance. Yet, beneath the surface, regimes and patterns can emerge, especially in highly volatile assets like Bitcoin. Traditional approaches such as “HODLing” (Reddit finance slang for buy and hold) have gained popularity for their simplicity, but they come with significant risks: full exposure to market drawdowns and limited adaptability to market shifts.

This project explores a **Hidden Markov Model (HMM)** framework for trading Bitcoin. HMMs are statistical models designed to uncover hidden states in time-series data, making them a natural fit for regime-switching strategies. Instead of committing to a static buy-and-hold approach, the model dynamically adjusts exposure between Bitcoin and cash, depending on the inferred market regime.

To evaluate performance, I ran **1,000 Monte Carlo simulations** on a test dataset covering January to August 2025, benchmarking against a simple HODL strategy. The goal is not just to achieve higher returns, but to measure **risk-adjusted performance, drawdown management, and consistency** across multiple simulation paths.

In [1]:
from src.data_loader import DataLoader
from src.feature_engineering import FeatureEngineer
from src.hmm_model import HMMModel
from src.backtester import Backtester
from src.plotting import Plotter
from utils.helpers import parse_metrics_str, output_performance_summary
from src.mc_backtester import MCBacktester

plotter = Plotter()

# 1. Setup & Assumptions

The proposed Hidden Markov Model trading strategy is applied to the Bitcoin spot market (`BTCUSDT`) over the period from **December 18, 2022**, to **August 22, 2025**. The historical dataset is divided into a **training** set spanning **December 18, 2022**, to **December 30, 2024**, and a **testing** set covering **January 1, 2025**, to **August 22, 2025**, with a short embargo between the two. Daily price observations are used to construct the input features for the model.

The HMM is configured with **six hidden states** to capture potential market regimes. The initial capital for backtesting is set at USD 10,000. Transaction costs are incorporated through a **commission of 0.1% per trade** on both the **buy** and **sell** sides, and a **slippage factor of 0.01%** is applied to all executed trades. These assumptions capture real-world frictions reasonably well. During the January–August 2025 backtest period, **stablecoin inflows** and **institutional demand** remained robust, and `BTCUSDT` was highly liquid on Binance with **deep order books**. Consequently, **execution slippage** for \$10k market orders was **generally negligible**, making Binance’s 0.1% commission the **primary transaction cost**. To reduce excessive portfolio **churn**, a **minimum holding period of one day** is imposed. **Short selling** is **not permitted** in this setup.
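
To make the cost assumptions concrete, here is a minimal sketch of how the commission and slippage could be applied to a single round trip. The function names and the choice to model slippage as a multiplicative adjustment to the fill price are assumptions for illustration, not the project's actual `Backtester` logic.

```python
COMMISSION = 0.001   # 0.1% per side, as stated above
SLIPPAGE = 0.0001    # 0.01% per executed trade, as stated above


def buy(cash: float, price: float) -> float:
    """Return the BTC quantity obtained by spending `cash` at `price`."""
    fill_price = price * (1 + SLIPPAGE)  # assumed: buyer fills slightly higher
    return cash * (1 - COMMISSION) / fill_price


def sell(qty: float, price: float) -> float:
    """Return the cash obtained by selling `qty` BTC at `price`."""
    fill_price = price * (1 - SLIPPAGE)  # assumed: seller fills slightly lower
    return qty * fill_price * (1 - COMMISSION)


# Round trip at an unchanged price: any loss is pure friction.
start_cash = 10_000.0
end_cash = sell(buy(start_cash, 100_000.0), 100_000.0)
round_trip_cost = 1 - end_cash / start_cash
print(f"Round-trip cost: {round_trip_cost:.4%}")
```

Under these assumptions a full buy-and-sell cycle costs roughly 0.22% of capital, which is why limiting the number of trades matters for a daily strategy.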

Because the fitting and state inference procedures are **computationally efficient**, trades can be **executed near the end of the trading day** (e.g., 23:55). The price observed at that point is a **reliable proxy for the daily closing value**, so it can stand in for the **close price** in the computations while limiting the risk of **lookahead bias**.

These assumptions are intended to reflect a realistic trading environment for Bitcoin spot trading, incorporating costs, execution constraints, and practical trading limitations.


# 2. Data Collection & Preprocessing

## 2.1. Loading Data

Historical price data for Bitcoin (`BTCUSDT`) is retrieved from the **Binance API** at a daily frequency over the period December 18, 2022, to August 22, 2025. The dataset includes the standard `OHLC` (`Open`, `High`, `Low`, `Close`) fields and serves as the foundation for feature construction. All calculations are performed on the closing prices, which are used to compute log returns, volatility, and the RSI momentum indicator.

In [2]:
data_loader = DataLoader()
raw_data = data_loader.get_data(focus="reproduction")

2025-09-03 16:34:02 - src.data_loader - INFO - Fetching data for BTCUSDT from Binance API...
2025-09-03 16:34:02 - src.data_loader - INFO - Data fetched successfully.


## 2.2. Feature Engineering

To train and evaluate the Hidden Markov Model, several features were engineered from the raw Bitcoin price data. These features capture different aspects of market behavior: returns, volatility, and momentum (via relative strength). The following table provides an overview of the engineered variables, the formulas used to compute them, and the intuition behind their role in capturing market dynamics:

<h3 style="text-align: center;">Feature Summary</h3>

| **Feature** | **Formula** | **Financial Intuition** |
|-------------|-------------|--------------------------|
| **Log Return** ($r_t$) | $\displaystyle r_t = \ln\left(\tfrac{P_t}{P_{t-1}}\right)$ | Captures the daily relative **close price ($P_t$) change**, expressed continuously. |
| **21-day Volatility** ($\sigma_{21,t}$) | $\displaystyle \sigma_{21,t} = \sqrt{\frac{365}{21} \sum_{i=0}^{20} \left( r_{t-i} - \bar{r}_{21,t} \right)^2} \\ \text{where } \bar{r}_{21,t} = \frac{1}{21} \sum_{i=0}^{20} r_{t-i}$ | Measures annualized **realized volatility**. It reflects market **risk** and **uncertainty**. |
| **RSI (14-day)** ($\text{RSI}_t$) | $\displaystyle \text{RSI}_t = 100 - \frac{100}{1 + RS_t} \\ \text{where } RS_t = \tfrac{\text{Avg Gain}_{14}}{\text{Avg Loss}_{14}}$ | A bounded oscillator (0–100) capturing **overbought** ($\text{RSI}_t>70$) or **oversold** ($\text{RSI}_t<30$) conditions. |
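
The formulas in the table can be sketched in pandas as follows. This is an illustrative version, not the project's `FeatureEngineer`: the function and column names are hypothetical, and the RSI here uses simple 14-day averages of gains and losses (Wilder's exponential smoothing is a common alternative that the table does not rule out).

```python
import numpy as np
import pandas as pd


def build_features_sketch(close: pd.Series) -> pd.DataFrame:
    """Compute log return, 21-day annualized volatility, and 14-day RSI."""
    log_ret = np.log(close / close.shift(1))

    # Population std (ddof=0) matches the 1/21 factor in the table;
    # sqrt(365) annualizes because crypto trades every calendar day.
    vol_21 = log_ret.rolling(21).std(ddof=0) * np.sqrt(365)

    # Simple-average RSI (assumption): mean gain over mean loss.
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(14).mean()
    avg_loss = (-delta).clip(lower=0).rolling(14).mean()
    rsi_14 = 100 - 100 / (1 + avg_gain / avg_loss)

    feats = pd.DataFrame(
        {"log_return": log_ret, "vol_21": vol_21, "rsi_14": rsi_14}
    )
    return feats.dropna()  # drops the warm-up rows


# Synthetic prices, only to exercise the function.
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.02, size=120))))
feats = build_features_sketch(close)
print(feats.tail())
```

The 21-day volatility window needs 21 returns, so the first 21 price observations produce no complete feature row and are dropped.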


In [3]:
feature_engineer = FeatureEngineer()
df, features = feature_engineer.build_features(raw_data)

2025-09-03 16:34:02 - src.feature_engineering - INFO - Engineering Features...
2025-09-03 16:34:02 - src.feature_engineering - INFO - Features ready.


The correlation between the selected features was also examined to ensure that the model does not include highly correlated variables, which could lead to redundancy and amplify noise in predictions. As shown in the correlation matrix, the off-diagonal correlations are low to moderate: 0.046 between Returns and 21-Day Volatility, 0.351 between Returns and RSI, and 0.056 between 21-Day Volatility and RSI. The Returns–RSI value is the highest, which is expected because RSI is built from recent gains and losses, but it remains well below the level at which two features would carry the same information. The features therefore provide largely independent information, reducing the risk of multicollinearity and allowing the model to learn meaningful relationships for the hidden states without echoing the same patterns across multiple variables.

In [4]:
plotter.plot_correlation_heatmap(features.corr())

![Feature Correlation](img/img1.png)

## 2.3. Train-Test Split

The dataset is partitioned into training and testing subsets to enable **model calibration** and **out-of-sample evaluation**. The training set spans **December 18, 2022**, to **December 30, 2024**, and is used to fit the Hidden Markov Model parameters and identify latent market regimes. Because the 21-day volatility window consumes the first observations, the first usable training day is January 8, 2023, which leaves 723 training days. The testing set covers **January 1, 2025**, to **August 22, 2025** (234 days), and is reserved for performance evaluation, including return and risk metrics. To prevent temporal overlap and potential information leakage, a **2-day embargo period** is enforced between the training and testing sets. This temporal split ensures that the model’s predictive performance is assessed on **unseen data**, thereby mitigating **overfitting** and providing a realistic assessment of potential trading outcomes.
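
A chronological split with an embargo can be sketched as below. The helper name and its parameters are hypothetical (the project's own `split_data_into_train_test` may work differently); the dates are the ones stated above.

```python
import pandas as pd


def split_with_embargo(df: pd.DataFrame, test_start: str, embargo_days: int = 2):
    """Split a date-indexed frame so that the last training date lies
    `embargo_days` before the first test date."""
    test_start_ts = pd.Timestamp(test_start)
    train_end_ts = test_start_ts - pd.Timedelta(days=embargo_days)
    train = df.loc[:train_end_ts]   # label slicing is inclusive on both ends
    test = df.loc[test_start_ts:]
    return train, test


# Daily index covering the usable sample (after the feature warm-up).
idx = pd.date_range("2023-01-08", "2025-08-22", freq="D")
frame = pd.DataFrame({"x": range(len(idx))}, index=idx)

train, test = split_with_embargo(frame, "2025-01-01", embargo_days=2)
print(len(train), train.index.max().date())  # 723 2024-12-30
print(len(test), test.index.min().date())    # 234 2025-01-01
```

The row counts match the split reported in the log output (723 training days, 234 test days), with December 31, 2024 falling inside the embargo.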

In [5]:
train_df, test_df, features_train, features_test = (
    feature_engineer.split_data_into_train_test(df, features)
)

2025-09-03 16:34:03 - src.feature_engineering - INFO - Train: 2023-01-08 00:00:00 → 2024-12-30 00:00:00 (723 days)
2025-09-03 16:34:03 - src.feature_engineering - INFO - Embargo: 2024-12-30 00:00:00 → 2025-01-01 00:00:00 (2 days)
2025-09-03 16:34:03 - src.feature_engineering - INFO - Test: 2025-01-01 00:00:00 → 2025-08-22 00:00:00 (234 days)


# 3. HMM-Based Methodology & Underlying Mechanism

## 3.1. Model Training: Learning Market's Latent States

The Hidden Markov Model is employed to capture latent market regimes from historical Bitcoin price dynamics. An HMM is a statistical model in which the system being modeled is assumed to follow a Markov process with unobserved (hidden) states. In this context, each hidden state corresponds to a distinct market regime.

A **Gaussian HMM** is used, which assumes that the observed features (in our case daily log returns, volatility, and RSI) are conditionally normally distributed given the hidden state. Formally, let $X_t$ denote the observed feature vector at time $t$ and $S_t \in \{1, \dots, N\}$ the hidden state. The model assumes:

\begin{equation*}
X_t \,|\, S_t = i \sim \mathcal{N}(\mu_i, \Sigma_i)
\end{equation*}

where $\mu_i$ and $\Sigma_i$ are the mean vector and covariance matrix associated with state $i$, and $N$ is the total number of hidden states (in our case 6). The transitions between hidden states follow a first-order Markov process:

\begin{equation*}
P(S_t = j \mid S_{t-1} = i) = A_{ij}
\end{equation*}

where $A$ is the state transition probability matrix. 

Model training involves estimating the parameters $\{\mu_i, \Sigma_i, A\}$ from the training data using the **Expectation-Maximization** (E-M) algorithm, known as the Baum-Welch algorithm in the HMM setting. The **E-M** algorithm iteratively increases the likelihood of the observed data by alternating between inferring the posterior probabilities of hidden states (E-step) and updating the model parameters (M-step) until convergence.
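
For reference, the standard E-M updates for a Gaussian HMM can be written as follows. The symbols $\gamma_t(i)$ and $\xi_t(i,j)$ for the E-step posteriors, and $T$ for the number of training observations, are notation introduced here rather than taken from the project code. The E-step computes

\begin{equation*}
\gamma_t(i) = P(S_t = i \mid X_{1:T}), \qquad \xi_t(i,j) = P(S_{t-1} = i,\, S_t = j \mid X_{1:T})
\end{equation*}

and the M-step re-estimates the parameters as posterior-weighted averages:

\begin{equation*}
\mu_i = \frac{\sum_{t=1}^{T} \gamma_t(i)\, X_t}{\sum_{t=1}^{T} \gamma_t(i)}, \qquad
\Sigma_i = \frac{\sum_{t=1}^{T} \gamma_t(i)\,(X_t - \mu_i)(X_t - \mu_i)^\top}{\sum_{t=1}^{T} \gamma_t(i)}, \qquad
A_{ij} = \frac{\sum_{t=2}^{T} \xi_t(i,j)}{\sum_{t=2}^{T} \gamma_{t-1}(i)}
\end{equation*}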

Once trained, the Gaussian HMM provides both the probabilities of being in each latent state at any given time and the most likely sequence of states (via the Viterbi algorithm). These inferred market regimes form the basis for the subsequent trading strategy, enabling regime-dependent position sizing and timing of Bitcoin exposure.
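
To illustrate the decoding step, here is a small NumPy sketch of the Viterbi algorithm in log space. It is a toy version under simplifying assumptions: the observations are one-dimensional, and the initial state distribution `log_pi` and all parameter values are made up for the example (the project itself uses multivariate features and fitted parameters).

```python
import numpy as np


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * np.log(2 * np.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def viterbi(obs, log_pi, log_A, means, stds):
    """Most likely state sequence for 1-D Gaussian emissions."""
    obs = np.asarray(obs, dtype=float)
    T, N = len(obs), len(means)
    log_b = gaussian_logpdf(obs[:, None], means[None, :], stds[None, :])  # (T, N)

    delta = np.empty((T, N))               # best log score of a path ending in each state
    backptr = np.zeros((T, N), dtype=int)  # best predecessor of each state
    delta[0] = log_pi + log_b[0]
    for t in range(1, T):
        # scores[i, j]: best path ending in state i at t-1, then moving to j
        scores = delta[t - 1][:, None] + log_A
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_b[t]

    # Backtrack from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path


# Two sticky regimes with well-separated means (illustrative values).
means = np.array([-1.0, 1.0])
stds = np.array([0.5, 0.5])
log_pi = np.log(np.array([0.5, 0.5]))
log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))

path = viterbi([-1.1, -0.9, -1.0, 1.2, 0.8, 1.1], log_pi, log_A, means, stds)
print(path.tolist())  # [0, 0, 0, 1, 1, 1]
```

Working in log space avoids numerical underflow, which becomes relevant once sequences span hundreds of daily observations.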


In [6]:
hmm_model = HMMModel(random_state=0)
hidden_states_train = hmm_model.fit(features_train)

2025-09-03 16:34:03 - src.hmm_model - INFO - Fitting HMM...
2025-09-03 16:34:06 - src.hmm_model - INFO - Fitting HMM Complete.


## 3.2. State Prediction & Signal Generation

Once the HMM is trained, the next step consists of predicting the latent regime associated with each observation in the testing set. This is achieved by applying the model’s inference procedure, which assigns every data point to the most likely hidden state given the observed features.

To translate these regime classifications into actionable trading signals, the subsequent methodology is employed. First, one-day-ahead returns are computed for each observation to establish a forward-looking perspective on profitability. Then, the mean of these future returns is calculated for every inferred regime, yielding a statistical characterization of the expected return conditional on the current state.

Based on this analysis, trading signals are generated as follows: regimes associated with a positive mean future return are classified as “long” signals, while those with a negative mean future return would be assigned “short” signals. Because short selling is not permitted in this setup, negative-return regimes are instead mapped to a neutral stance: “no position” when the portfolio is flat, or “exit” if the previous signal was long. This rule-based mapping ensures that signals are not derived from instantaneous noise but from the historical conditional expectation of returns under each regime.
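
The mapping described above can be sketched in a few lines of pandas. The function and column names are hypothetical and the data is a toy series. The sketch only shows the long/flat rule, and leaves open which sample the per-regime means are estimated on.

```python
import numpy as np
import pandas as pd


def regime_to_signal_sketch(close, states):
    """Map each regime to 1 (long) or 0 (flat) by its mean next-day return."""
    df = pd.DataFrame({"close": close, "state": states})
    # One-day-ahead log return; the last row has no future price (NaN).
    df["fwd_ret"] = np.log(df["close"].shift(-1) / df["close"])
    # Mean forward return per regime (NaN rows are skipped by mean()).
    state_mean = df.groupby("state")["fwd_ret"].mean()
    # Long-only rule: positive mean -> long, otherwise stay in cash.
    df["signal"] = df["state"].map(state_mean).gt(0).astype(int)
    return df, state_mean


close = [100, 102, 104, 102, 100, 103, 106]
states = [0, 0, 1, 1, 0, 0, 0]
signals_df, state_mean = regime_to_signal_sketch(close, states)
print(state_mean)
print(signals_df["signal"].tolist())  # [1, 1, 0, 0, 1, 1, 1]
```

In this toy series regime 0 is always followed by a price rise and regime 1 by a fall, so regime 0 maps to long and regime 1 to flat.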

In [7]:
hidden_states = hmm_model.predict(features_test)

df_with_signals, state_stats = hmm_model.regime_to_signal(
    test_df,
    hidden_states,
)

2025-09-03 16:34:06 - src.hmm_model - INFO - Predicting hidden states...
2025-09-03 16:34:06 - src.hmm_model - INFO - Prediction complete.
2025-09-03 16:34:06 - src.hmm_model - INFO - Computing signals...
2025-09-03 16:34:06 - src.hmm_model - INFO - Successfully computed signals.


## 3.3. Performance Evaluation & Benchmarking

In [8]:
backtester = Backtester()
backtest_results = backtester.backtest(df_with_signals)

hmm_metrics = backtester.metrics(backtest_results, "strategy_equity")
benchmark_metrics = backtester.metrics(backtest_results, "hodl_equity")

print("HMM Metrics (Test Set):")
print(parse_metrics_str(hmm_metrics))
print("HODL Metrics (Test Set):")
print(parse_metrics_str(benchmark_metrics))

HMM Metrics (Test Set):
Annualized Sharpe: 1.58 | P&L(%): 39.53% | Max DD: -18.59% | Number Of Trades: 36
HODL Metrics (Test Set):
Annualized Sharpe: 1.02 | P&L(%): 23.68% | Max DD: -28.10% | Number Of Trades: 1


In [9]:
plotter.plot_results(backtest_results)

![Dashboard](img/img2.png)

In this instance, a single simulated path of the HMM-based trading strategy was evaluated against a HODL benchmark. The HMM strategy achieved an annualized Sharpe ratio of 1.58, a cumulative P&L of 39.53%, a maximum drawdown of -18.59%, and executed 36 trades. In comparison, the HODL benchmark produced an annualized Sharpe ratio of 1.02, a P&L of 23.68%, a maximum drawdown of -28.10%, and only one trade. On this single path, the HMM-based approach delivered superior risk-adjusted performance, stronger returns, and lower downside exposure relative to the benchmark.
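
For readers who want to see how such metrics are typically derived from an equity curve, here is a minimal sketch. It assumes a zero risk-free rate and 365 periods per year (consistent with the annualization used for volatility); the project's `Backtester.metrics` may differ in these details.

```python
import numpy as np


def performance_metrics(equity, periods_per_year=365):
    """Annualized Sharpe, total P&L, and maximum drawdown of an equity curve."""
    equity = np.asarray(equity, dtype=float)
    rets = equity[1:] / equity[:-1] - 1  # simple per-period returns

    # Sharpe with a zero risk-free rate (assumption).
    sharpe = np.sqrt(periods_per_year) * rets.mean() / rets.std(ddof=1)

    # Drawdown: distance of each point from the running peak.
    running_max = np.maximum.accumulate(equity)
    max_dd = (equity / running_max - 1).min()

    pnl = equity[-1] / equity[0] - 1
    return {"sharpe": sharpe, "pnl": pnl, "max_dd": max_dd}


metrics = performance_metrics([10_000, 11_000, 9_900, 12_000])
print({k: round(float(v), 4) for k, v in metrics.items()})
```

In the toy curve the portfolio gains 20% overall, and its worst peak-to-trough decline is the 10% drop from 11,000 to 9,900.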

# 4. HMM-Based Strategy Backtesting 

## 4.1. Monte Carlo Simulations using HMM-Based Strategy

To evaluate the robustness of the HMM-based trading framework, I conducted 1,000 Monte Carlo simulations on the test set. In each run, the HMM was fitted to the training data, latent states were predicted for the test period, and trading signals were generated accordingly. Strategy performance was assessed by computing key metrics: total return, Sharpe ratio, and maximum drawdown.

Across simulations, I examined the average cumulative return path, as well as the distribution of total returns to capture variability in outcomes. Finally, I computed the mean and standard deviation of the performance metrics, providing a reliable measure of the strategy’s expected performance and associated risk.
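
The aggregation step can be sketched generically as below. The harness and the `toy_run` function are hypothetical stand-ins: `toy_run` simply draws a random equity path so the example is self-contained, whereas in the project each run would fit the HMM and backtest the resulting signals.

```python
import numpy as np


def monte_carlo(run_once, runs, base_seed=0):
    """Run `run_once(seed)` repeatedly and aggregate the equity paths."""
    paths = np.array([run_once(base_seed + k)["equity"] for k in range(runs)])
    total_returns = paths[:, -1] / paths[:, 0] - 1
    return {
        "avg_path": paths.mean(axis=0),          # average equity curve
        "mean_return": total_returns.mean(),
        "sd_return": total_returns.std(ddof=1),  # sample standard deviation
        "total_returns": total_returns,
    }


def toy_run(seed):
    """Stand-in for one fit/predict/backtest cycle (random equity path)."""
    rng = np.random.default_rng(seed)
    daily = rng.normal(0.001, 0.02, size=50)
    equity = 10_000 * np.cumprod(np.concatenate([[1.0], 1 + daily]))
    return {"equity": equity}


summary = monte_carlo(toy_run, runs=200)
print(f"mean return {summary['mean_return']:.2%}, SD {summary['sd_return']:.2%}")
```

Seeding each run from a base seed keeps the whole experiment reproducible while still varying the individual paths.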

In [10]:
runs = 1000

mc_backtester = MCBacktester(
    features_test=features_test,
    features_train=features_train,
    test_df=test_df,
    runs=runs,
)

In [11]:
returns, sharpe_ratios, max_drawdowns, n_trades, avg_df = mc_backtester.run(
    verbose=True, seeded=True
)

2025-09-03 16:34:07 - src.mc_backtester - INFO - Run 1/1000
2025-09-03 16:34:09 - src.mc_backtester - INFO - Run 2/1000
2025-09-03 16:34:11 - src.mc_backtester - INFO - Run 3/1000
2025-09-03 16:34:12 - src.mc_backtester - INFO - Run 4/1000
2025-09-03 16:34:13 - src.mc_backtester - INFO - Run 5/1000
2025-09-03 16:34:14 - src.mc_backtester - INFO - Run 6/1000
2025-09-03 16:34:16 - src.mc_backtester - INFO - Run 7/1000
2025-09-03 16:34:17 - src.mc_backtester - INFO - Run 8/1000
2025-09-03 16:34:18 - src.mc_backtester - INFO - Run 9/1000
2025-09-03 16:34:19 - src.mc_backtester - INFO - Run 10/1000
2025-09-03 16:34:20 - src.mc_backtester - INFO - Run 11/1000
2025-09-03 16:34:21 - src.mc_backtester - INFO - Run 12/1000
2025-09-03 16:34:23 - src.mc_backtester - INFO - Run 13/1000
2025-09-03 16:34:23 - src.mc_backtester - INFO - Run 14/1000
2025-09-03 16:34:25 - src.mc_backtester - INFO - Run 15/1000
2025-09-03 16:34:26 - src.mc_backtester - INFO - Run 16/1000

## 4.2. Results & Analysis

The performance of the HMM–based trading strategy was evaluated on the out-of-sample test period (January 1, 2025 – August 22, 2025) through 1,000 Monte Carlo simulation runs, with results compared against a simple buy-and-hold (HODL) benchmark of Bitcoin.

The benchmark delivered an annualized Sharpe ratio of 1.02, a total return of 23.68%, and a maximum drawdown of –28.10%. These figures represent the baseline risk–reward profile of passive Bitcoin exposure over the same horizon.

In [12]:
output_performance_summary(
    mc_backtester,
    benchmark_metrics,
)

 Monte Carlo Metrics Over 1000 Runs on Test Dataset (from 2025-01-01 to 2025-08-22)

--- Benchmark (HODLing BTC from 2025-01-01) ---
Annualized Sharpe: 1.02 | P&L(%): 23.68% | Max DD: -28.10%

--- Average HMM Strategy Path ---
Annualized Sharpe: 2.03 (SD: 0.44) | P&L(%): 51.25% (SD: 12.75%) | Max DD: -15.02% (SD: 3.74%)

--- Outperformance Probabilities ---
- Beating HODLing:              99%
- At least 2× HODLing returns:  59%
- At least 3× HODLing returns:  8%


By contrast, the HMM strategy nearly doubled the benchmark’s Sharpe ratio, achieving an average of 2.03 (SD 0.44), alongside a mean return of 51.25% (SD 12.75%) and a materially lower maximum drawdown of -15.02% (SD 3.74%). In other words, the strategy generated superior performance not only in absolute terms but also on a risk-adjusted basis, while simultaneously cutting downside risk nearly in half.

In [13]:
plotter.plot_mc_results(avg_df, runs)

![MC Dashboard Result](img/img3.png)


The portfolio value curves highlight this improvement: the HMM strategy consistently diverges upward from the benchmark, particularly during periods of drawdown, demonstrating its ability to sidestep adverse market regimes. Complementing this, the return distribution shows that the bulk of outcomes lie well above the benchmark’s 23.68%, indicating that strong outperformance is not an outlier but the central tendency.

In [14]:
plotter.plot_return_distribution(mc_backtester)

![Distribution](img/img4.png)

The probability metrics make this robustness explicit. Out of 1,000 runs, the strategy outperformed HODL in 99% of cases, delivered at least twice its return in 59%, and at least three times its return in 8%. Put differently, under the stated trading costs and slippage, the strategy underperformed the benchmark in only about 1% of runs.
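
These probabilities are empirical frequencies over the simulated runs. A minimal sketch of the calculation, with a hypothetical function name and made-up strategy returns, could look like this. It assumes a positive benchmark return, since "k times the benchmark" is not a meaningful threshold otherwise.

```python
import numpy as np


def outperformance_probabilities(strategy_returns, benchmark_return):
    """Share of runs that beat the benchmark, or reach 2x / 3x its return."""
    r = np.asarray(strategy_returns, dtype=float)
    return {
        "beat": float(np.mean(r > benchmark_return)),
        "at_least_2x": float(np.mean(r >= 2 * benchmark_return)),
        "at_least_3x": float(np.mean(r >= 3 * benchmark_return)),
    }


# Four illustrative run returns against the 23.68% benchmark return.
probs = outperformance_probabilities([0.10, 0.30, 0.50, 0.75], 0.2368)
print(probs)  # {'beat': 0.75, 'at_least_2x': 0.5, 'at_least_3x': 0.25}
```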

# Conclusion

The results demonstrate that regime-based strategies, such as those powered by Hidden Markov Models, can offer meaningful improvements over passive HODLing in volatile markets like Bitcoin. Across 1,000 Monte Carlo runs, the HMM strategy averaged roughly twice the return of HODLing (51.25% versus 23.68%), a Sharpe ratio close to 2, and a maximum drawdown about half as deep, highlighting its robustness and risk-awareness over the test period.

Of course, no model guarantees future success, and market dynamics evolve. Because the model trades on the spot market at most once per day to avoid churn, and transaction costs are included in the backtest, many practical trading frictions (such as fees, slippage, and liquidity) are already accounted for. What remains are the broader challenges of live deployment: unexpected regime shifts, structural changes in the Bitcoin market, and the discipline required to follow the model even during prolonged drawdowns.

These findings suggest that **probabilistic modeling of hidden market states can be a powerful tool for crypto trading**, especially when paired with rigorous validation methods like Monte Carlo simulation. Future work will focus on live forward testing, extending the framework to other assets, and exploring hybrid models that combine statistical regime detection with machine learning techniques. By doing so, we can better understand the practical viability of HMM-driven strategies and their potential role in modern quantitative trading.