A hybrid forecasting model for next-day Bitcoin closing price prediction, combining Hidden Markov Models (HMM) for market regime detection with LSTM networks for sequential learning. Benchmarked against a standalone LSTM baseline.
Bitcoin's price is highly non-linear and behaves differently across market conditions: bull runs, bear markets, and sideways consolidation each follow distinct patterns. A standard LSTM trained on price alone receives no explicit signal about which of these regimes the market is in, so it has to infer such structural shifts implicitly from the raw sequence.
This project tackles that with a two-stage hybrid approach:
- HMM detects hidden market regimes from historical price features and assigns a latent state label to each time step.
- HMM-LSTM feeds those regime labels alongside raw price data into an LSTM, giving the model structural market context when learning price sequences.
The hybrid model is compared against a standalone LSTM baseline to quantify the benefit of regime-aware inputs.
| Model | MAE (USD) | RMSE (USD) | MAPE |
|---|---|---|---|
| LSTM | 1471.92 | 1830.40 | 3.04% |
| HMM-LSTM | 1074.29 | 1585.33 | 2.03% |
The hybrid HMM-LSTM model outperforms the standalone LSTM on all three metrics: MAE falls by about 27% (1471.92 → 1074.29), RMSE by about 13% (1830.40 → 1585.33), and MAPE by about 1 percentage point (3.04% → 2.03%, roughly a one-third relative reduction). This suggests that incorporating hidden market regime information meaningfully improves prediction accuracy on this dataset.
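The metrics and percentage improvements in the table can be reproduced with a small helper like the one below. This is a sketch, not code from the notebooks: `evaluate` and `relative_reduction` are hypothetical names, and it assumes predictions have already been inverse-transformed back to USD before scoring, so that MAE and RMSE are in price units.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
)


def evaluate(y_true, y_pred):
    """Return MAE and RMSE (in price units) and MAPE (in percent)."""
    mae = mean_absolute_error(y_true, y_pred)
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    # scikit-learn returns MAPE as a fraction, so scale it to percent
    mape = mean_absolute_percentage_error(y_true, y_pred) * 100
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


def relative_reduction(baseline, candidate):
    """Percentage reduction of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline * 100


# Reproduce the headline comparison from the results table
print(round(relative_reduction(1471.92, 1074.29), 1))  # MAE  → 27.0
print(round(relative_reduction(1830.40, 1585.33), 1))  # RMSE → 13.4
print(round(3.04 - 2.03, 2))  # MAPE difference in percentage points → 1.01
```

Reporting the MAPE change in percentage points avoids confusing an absolute difference of about 1 point with a 1% relative improvement.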
Bitcoin-Prediction-Model/
│
├── 1_DATA.ipynb # Data collection, cleaning, and preprocessing
├── 2_HYPERTUNEHMM.ipynb # HMM hyperparameter tuning (hidden states, covariance)
├── 3_HMMLSTM.ipynb # Hybrid HMM + LSTM model training and evaluation
├── 4_LSTM.ipynb # Standalone LSTM baseline for benchmarking
│
├── Dataset_Raw.csv # Original raw BTC price data
├── Cleaned_Data.csv # Cleaned and preprocessed data
├── Dataset_Ready.csv # Final dataset with HMM states, ready for modelling
│
└── Image/ # Output plots and visualisations
- Source: Investing.com — daily BTC/USD historical data
- Date range: 2019 – 2024
- Input feature: Daily closing price
- Preprocessing includes cleaning, normalisation, and sequence windowing (lookback window: 30 days)
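The normalisation and 30-day windowing steps can be sketched as follows. The README does not name the scaler, so `MinMaxScaler` is an assumption here, and `scale_series` / `make_windows` are hypothetical helpers. Fitting the scaler only on the training span is a choice made in this sketch to avoid leaking future price ranges into training.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

LOOKBACK = 30  # days of history per input sequence


def scale_series(close, train_frac=0.7):
    """Min-max scale a 1-D price series, fitting the scaler on the training span only."""
    close = np.asarray(close, dtype=float).reshape(-1, 1)
    cut = int(len(close) * train_frac)
    scaler = MinMaxScaler()
    scaler.fit(close[:cut])  # later prices may fall outside [0, 1]
    return scaler.transform(close), scaler


def make_windows(features, lookback=LOOKBACK, target_col=0):
    """Turn a (T, F) array into (T - lookback, lookback, F) inputs and next-day targets."""
    X, y = [], []
    for t in range(lookback, len(features)):
        X.append(features[t - lookback:t])  # the `lookback` days before day t
        y.append(features[t, target_col])   # day t's (scaled) closing price
    return np.array(X), np.array(y)


# Example: 100 synthetic closing prices → 70 windows of shape (30, 1)
scaled, scaler = scale_series(np.linspace(20_000, 60_000, 100))
X, y = make_windows(scaled)
print(X.shape, y.shape)
```

Predictions made in the scaled space can be mapped back to USD with `scaler.inverse_transform` before computing MAE and RMSE.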
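A Gaussian HMM is trained on the BTC price series to identify distinct latent market regimes. HMM states are unlabelled, so names such as bullish, bearish, ranging, or volatile are interpretations of each state's behaviour rather than outputs of the model.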
Hyperparameter tuning was performed over the number of hidden states (2–4), with model selection based on the Bayesian Information Criterion (BIC):
| Hidden States | Covariance Type | Max Iterations | BIC |
|---|---|---|---|
| 2 | tied | 1000 | 39208.95 |
| 3 | tied | 1000 | 37963.15 |
| 4 | tied | 1000 | 37149.49 ✅ |
The 4-state model yielded the lowest BIC (lower is better) and was selected as the final configuration.
Each time step in the dataset is labelled with its corresponding HMM state (0–3), which is passed as an additional input feature to the LSTM.
The LSTM receives a sequence of 30 time steps, where each step includes the closing price and its HMM regime label. This enriched input allows the model to distinguish between structurally different market phases.
Architecture:
Input(shape=(30, features))
→ LSTM(200 units, return_sequences=True)
→ LSTM(200 units, return_sequences=False)
→ Dense(50, activation='relu')
→ Dense(1, activation='linear')
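The architecture above translates directly to the Keras `Sequential` API. This sketch assumes `n_features=2` (scaled close plus regime label) for the hybrid model and compiles with Adam and mean squared error as listed in the training configuration; no dropout layers are included because none appear in the listed architecture. `build_lstm` is a hypothetical name.

```python
from tensorflow import keras


def build_lstm(lookback=30, n_features=2):
    """Stacked LSTM regressor matching the architecture listed above."""
    model = keras.Sequential([
        keras.Input(shape=(lookback, n_features)),
        keras.layers.LSTM(200, return_sequences=True),   # pass the full sequence on
        keras.layers.LSTM(200, return_sequences=False),  # keep only the last hidden state
        keras.layers.Dense(50, activation="relu"),
        keras.layers.Dense(1, activation="linear"),      # next-day scaled close
    ])
    model.compile(optimizer=keras.optimizers.Adam(), loss="mse")
    return model


hybrid = build_lstm(n_features=2)    # close + HMM state
baseline = build_lstm(n_features=1)  # close only
```

Because only `n_features` differs, the hybrid and baseline models share every layer size, so any performance gap comes from the extra input column rather than from model capacity.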
Training configuration:
| Parameter | Value |
|---|---|
| Epochs | 50 |
| Batch size | 32 |
| Train/Val split | 70% / 30% |
| Optimiser | Adam |
| Loss | Mean Squared Error |
LSTM hyperparameters (layers, units, batch size, train split, dropout) were tuned experimentally. The configuration above represents the best-performing setup and is the version uploaded to this repository.
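A sketch of the training step under the configuration above, assuming a compiled Keras model and windowed arrays `X`, `y`. The split is done chronologically (the first 70% of windows for training, the last 30% for validation), which is an assumption about how the notebooks split the data; shuffling before splitting would leak future prices into training. `chronological_split` and `fit_with_config` are hypothetical helpers.

```python
def chronological_split(X, y, train_frac=0.7):
    """Split in time order so every validation window comes after every training window."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


def fit_with_config(model, X, y, epochs=50, batch_size=32, train_frac=0.7):
    """Train a compiled Keras model with the epochs / batch size / split from the table."""
    (X_tr, y_tr), (X_val, y_val) = chronological_split(X, y, train_frac)
    history = model.fit(
        X_tr, y_tr,
        validation_data=(X_val, y_val),
        epochs=epochs,
        batch_size=batch_size,
        verbose=0,
    )
    return history
```

Keras's built-in `validation_split=0.3` argument behaves similarly, since it takes the last 30% of the samples before any shuffling; the explicit split just makes the held-out period easy to reuse for evaluation.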
The baseline uses the identical LSTM architecture, trained on the closing price alone without HMM state inputs. It serves as a direct benchmark that isolates the contribution of regime detection.
- Python — core language
- Jupyter Notebook — development environment
- hmmlearn — Hidden Markov Model
- TensorFlow / Keras — LSTM network
- Pandas / NumPy — data manipulation
- Scikit-learn — preprocessing and evaluation
- Matplotlib / Seaborn — visualisation
Install the dependencies with `pip install numpy pandas matplotlib seaborn scikit-learn hmmlearn tensorflow`, then run the notebooks in order:
- 1_DATA.ipynb → Prepare and clean the dataset
- 2_HYPERTUNEHMM.ipynb → Tune and fit the HMM, generate regime labels
- 3_HMMLSTM.ipynb → Train and evaluate the hybrid HMM-LSTM model
- 4_LSTM.ipynb → Train and evaluate the standalone LSTM baseline
This project was developed as a final thesis/academic project and is intended purely for research and educational purposes.
This is NOT financial advice (NFA). Nothing in this repository should be interpreted as a recommendation to buy, sell, or trade Bitcoin or any other asset. Cryptocurrency markets are highly volatile and unpredictable.
Always Do Your Own Research (DYOR) before making any financial decisions. The authors take no responsibility for any financial losses incurred from the use of this model.
This project is open-source and available under the MIT License.