### Imports

In [1]:
# Import necessary libraries for data manipulation and visualization
import pandas as pd
import polars as pl
import numpy as np

from datetime import datetime

from hmmlearn.hmm import GaussianHMM
from pandas_datareader.data import DataReader

import matplotlib.pyplot as plt

# Verify all libraries are imported correctly
print("Libraries imported successfully")

Libraries imported successfully


### Data Management

In [2]:
# Import necessary libraries
import yfinance as yf

# Set start and end dates for the data extraction
start_date = "2017-01-01"
end_date = "2022-06-01"
symbol = "SPY"

# Fetch data using yfinance
data = yf.download(symbol, start=start_date, end=end_date)


[*********************100%%**********************]  1 of 1 completed


In [3]:
data.head()

Date,Open,High,Low,Close,Adj Close,Volume
2017-01-03,225.039993,225.830002,223.880005,225.240005,198.560043,91366500
2017-01-04,225.619995,226.75,225.610001,226.580002,199.741318,78744400
2017-01-05,226.270004,226.580002,225.479996,226.399994,199.582642,78379000
2017-01-06,226.529999,227.75,225.899994,227.210007,200.296707,71559900
2017-01-09,226.910004,227.070007,226.419998,226.460007,199.63559,46939700


In [4]:
# Convert to Polars DataFrame and include dates
import polars as pl

data.reset_index(inplace=True)  # move the Date index into a regular column; otherwise the dates are dropped during conversion
data = pl.DataFrame(data)

# Select relevant columns: Date, Open, High, Low, Adj Close, and Volume
data = data.select(["Date", "Open", "High", "Low", "Adj Close", "Volume"])

In [5]:

print(data.columns)
print(data.schema)

['Date', 'Open', 'High', 'Low', 'Adj Close', 'Volume']
OrderedDict({'Date': Datetime(time_unit='ns', time_zone=None), 'Open': Float64, 'High': Float64, 'Low': Float64, 'Adj Close': Float64, 'Volume': Int64})


In [6]:
data.head(3)

Date,Open,High,Low,Adj Close,Volume
datetime[ns],f64,f64,f64,f64,i64
2017-01-03 00:00:00,225.039993,225.830002,223.880005,198.560043,91366500
2017-01-04 00:00:00,225.619995,226.75,225.610001,199.741318,78744400
2017-01-05 00:00:00,226.270004,226.580002,225.479996,199.582642,78379000


In [7]:
# Create a copy of the dataframe and add returns and range columns
df = data.clone()

# Calculate daily returns
df = df.with_columns((pl.col("Adj Close") / pl.col("Adj Close").shift(1) - 1).alias("Returns"))  # subtracting 1 turns the price ratio into a decimal return

# Calculate daily range (a volatility proxy)
df = df.with_columns((pl.col("High") / pl.col("Low") - 1).alias("Range"))  # High / Low - 1 is the intraday range as a decimal fraction

# Drop the null in the first row created by shift(1) (there is no previous close for the first day)
# In Polars, drop_nulls has no inplace parameter; it always returns a new DataFrame.
df = df.drop_nulls()

# Display the updated dataframe with new columns
df.head()

Date,Open,High,Low,Adj Close,Volume,Returns,Range
datetime[ns],f64,f64,f64,f64,i64,f64,f64
2017-01-04 00:00:00,225.619995,226.75,225.610001,199.741318,78744400,0.005949,0.005053
2017-01-05 00:00:00,226.270004,226.580002,225.479996,199.582642,78379000,-0.000794,0.004879
2017-01-06 00:00:00,226.529999,227.75,225.899994,200.296707,71559900,0.003578,0.008189
2017-01-09 00:00:00,226.910004,227.070007,226.419998,199.63559,46939700,-0.003301,0.002871
2017-01-10 00:00:00,226.479996,227.449997,226.009995,199.63559,63771900,0.0,0.006371
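
The two derived columns can be checked by hand. The following is a minimal sketch in plain Python (no Polars), using the first three rows shown earlier; the variable names are illustrative and not part of the notebook.

```python
# First three rows of the data above: (Adj Close, High, Low)
rows = [
    (198.560043, 225.830002, 223.880005),  # 2017-01-03
    (199.741318, 226.750000, 225.610001),  # 2017-01-04
    (199.582642, 226.580002, 225.479996),  # 2017-01-05
]

adj_close = [r[0] for r in rows]

# Simple return: today's adjusted close over yesterday's, minus 1.
# The first day has no previous close, so it gets None (a null in Polars).
returns = [None] + [
    adj_close[i] / adj_close[i - 1] - 1 for i in range(1, len(adj_close))
]

# Intraday range: high over low, minus 1 (defined for every row).
ranges = [high / low - 1 for _, high, low in rows]

for ret, rng in zip(returns, ranges):
    print(ret, rng)
```

The `None` in the first position is the row that `drop_nulls()` removes, which is why the table above starts on 2017-01-04.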


In [8]:
df.tail()

Date,Open,High,Low,Adj Close,Volume,Returns,Range
datetime[ns],f64,f64,f64,f64,i64,f64,f64
2022-05-24 00:00:00,392.559998,395.149994,386.959991,380.571075,91448800,-0.007634,0.021165
2022-05-25 00:00:00,392.309998,399.450012,391.890015,383.933441,91472900,0.008835,0.019291
2022-05-26 00:00:00,398.670013,407.040009,398.450012,391.60495,82168300,0.019981,0.021559
2022-05-27 00:00:00,407.910004,415.380005,407.700012,401.218506,84768700,0.024549,0.018837
2022-05-31 00:00:00,413.549988,416.459991,410.029999,398.967285,95937000,-0.005611,0.015682


## Structure Data

In [9]:
# Structure Data
X_train = df.select(['Returns', 'Range']) # in polars we use .select

# Display the first few rows of the structured data
X_train.head()

Returns,Range
f64,f64
0.005949,0.005053
-0.000794,0.004879
0.003578,0.008189
-0.003301,0.002871
0.0,0.006371


## Hidden Markov Model (HMM) for Market Analysis

### Overview

This notebook demonstrates how to use a Hidden Markov Model (HMM) to analyze market states using financial data.

### Data Preparation

- We're using SPY (the SPDR S&P 500 ETF, which tracks the S&P 500 index) data from January 2017 through May 2022.
- The features are daily returns and the daily high/low range.

### Training the HMM

Import the required libraries, then fit a `GaussianHMM` with these parameters:

- `n_components=4`: we're looking for 4 hidden states
- `covariance_type="full"`: each state has its own full covariance matrix
- `n_iter=100`: maximum number of EM iterations
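
To get a feel for how these settings affect model size, here is a small sketch that counts the free parameters of a Gaussian HMM. The function name is hypothetical and the counting follows a common convention (probabilities that must sum to 1 lose one degree of freedom); it is not something hmmlearn reports itself.

```python
def gaussian_hmm_param_count(n_states, n_features, covariance_type="full"):
    """Count free parameters of a Gaussian HMM (a common counting convention)."""
    start = n_states - 1                 # start probabilities sum to 1
    trans = n_states * (n_states - 1)    # each transition row sums to 1
    means = n_states * n_features
    per_full = n_features * (n_features + 1) // 2  # symmetric covariance matrix
    covars = {
        "full": n_states * per_full,      # one full matrix per state
        "tied": per_full,                 # one full matrix shared by all states
        "diag": n_states * n_features,    # one variance per feature per state
        "spherical": n_states,            # one variance per state
    }[covariance_type]
    return start + trans + means + covars


print(gaussian_hmm_param_count(4, 2, "full"))  # 3 + 12 + 8 + 12
print(gaussian_hmm_param_count(4, 2, "diag"))  # 3 + 12 + 8 + 8
```

With only two features, the full covariance adds few parameters over a diagonal one, so the choice matters more as the feature count grows.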



## HMM Learning

In [18]:
#Data Preparation
import polars as pl
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Convert Polars DataFrame to NumPy array
X_train_np = X_train.to_numpy()
print("Shape of X_train_np:", X_train_np.shape)

# Display first few rows of the data
print("\nFirst few rows of X_train_np:")
print(X_train_np[:5])

# Determine the number of features
n_features = X_train_np.shape[1]
print(f"\nNumber of features detected: {n_features}")

# Comments:
# - We convert the Polars DataFrame to a NumPy array because hmmlearn works on array-like input of shape (n_samples, n_features).
# - The shape of X_train_np tells us how many trading days (rows) and features (columns) we have.
# - Here we use 2 features: returns and range (a proxy for volatility).
# - Checking the first few rows helps us verify that our data looks correct (e.g., reasonable values for returns and range).
# - The number of features determines the dimension of each state's mean vector and covariance matrix.

Shape of X_train_np: (1361, 2)

First few rows of X_train_np:
[[ 0.00594921  0.00505296]
 [-0.00079441  0.00487851]
 [ 0.00357779  0.00818949]
 [-0.00330069  0.00287081]
 [ 0.          0.00637141]]

Number of features detected: 2


In [22]:
# Fit Model
hmm_model = GaussianHMM(n_components=4, covariance_type="full", n_iter=100).fit(X_train_np)
print("Model Score:", hmm_model.score(X_train_np))

# Comments:
# - We use GaussianHMM rather than a GMM (Gaussian Mixture Model): a GMM treats each day as independent, while an HMM also models transitions between states over time.
# - n_components=4: We assume 4 hidden market states (e.g., bull, bear, sideways, volatile).
# - covariance_type="full": Each state gets its own full covariance matrix, so returns and range can be correlated within a state.
# - n_iter=100: Maximum number of EM iterations. Increase it if the fit does not converge (check hmm_model.monitor_.converged).
# - No random_state is set, so the state numbering and the score can vary between runs.
# - .fit(X_train_np): Trains the model on the NumPy array of historical features in one step.
# - The model score is the log-likelihood of the data given the model. Higher values indicate a better fit on the same data.
# - In trading, a well-fitted model may help identify the current market regime and the likelihood of transitions.

Model Score: 9825.275877197631
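
To make "log-likelihood of the data given the model" concrete, here is a sketch of the forward algorithm in log space, written with NumPy only. It is an illustration of the quantity that `score` reports, not hmmlearn's actual implementation; the function names and the synthetic data are assumptions made for this example.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    # Solve a linear system instead of inverting the covariance explicitly
    solved = np.linalg.solve(cov, diff.T).T
    maha = np.sum(diff * solved, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def hmm_log_likelihood(X, startprob, transmat, means, covars):
    """Forward algorithm in log space: returns log p(X | model)."""
    n_states = len(startprob)
    # log_b[t, k] = log density of observation t under state k
    log_b = np.column_stack(
        [log_gaussian(X, means[k], covars[k]) for k in range(n_states)]
    )
    log_trans = np.log(transmat)
    log_alpha = np.log(startprob) + log_b[0]
    for t in range(1, len(X)):
        # log-sum-exp over the previous state (rows) for each current state (columns)
        scores = log_alpha[:, None] + log_trans
        m = scores.max(axis=0)
        log_alpha = m + np.log(np.exp(scores - m).sum(axis=0)) + log_b[t]
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())


# Sanity check on synthetic data: if both states emit from the same Gaussian,
# the hidden state is irrelevant and the result must equal the plain sum of
# per-observation log densities.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
mean, cov = np.zeros(2), np.eye(2)
ll = hmm_log_likelihood(
    X_demo,
    np.array([0.5, 0.5]),
    np.array([[0.9, 0.1], [0.2, 0.8]]),
    [mean, mean],
    [cov, cov],
)
independent = log_gaussian(X_demo, mean, cov).sum()
print(np.isclose(ll, independent))
```

Assuming the fitted model has no zero probabilities, calling `hmm_log_likelihood(X_train_np, hmm_model.startprob_, hmm_model.transmat_, hmm_model.means_, hmm_model.covars_)` should land close to the score printed above.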


In [25]:
# Check Results
hidden_states = hmm_model.predict(X_train_np)
print("First 20 hidden states:", hidden_states[:20])
print("\nUnique states:", np.unique(hidden_states))
print("\nState frequencies:")
for i in range(hmm_model.n_components):
    print(f"State {i}: {np.sum(hidden_states == i) / len(hidden_states):.2%}")

# Comments:
# - hmm_model.predict() runs the Viterbi algorithm (the default) and returns the most likely sequence of hidden states for our trading time series.
# - In trading context, each state represents a different market regime (e.g., bullish, bearish, volatile, sideways).
# - The first 20 states give us a quick look at how the model interprets the beginning of our data.
# - Unique states confirm that all defined states (0 to 3) are being used.
# - State frequencies show how often each market regime occurs in our data.
# - This information can be used to:
#   1. Identify current market conditions.
#   2. Develop regime-specific trading strategies.
#   3. Analyze how different regimes affect trading performance.

First 20 hidden states: [3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3]

Unique states: [0 1 2 3]

State frequencies:
State 0: 31.81%
State 1: 1.91%
State 2: 1.25%
State 3: 65.03%
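
Frequencies alone do not show how long a regime lasts once it starts. The sketch below computes both state shares and consecutive runs with the standard library; the helper names and the short example sequence are hypothetical and are not model output.

```python
from collections import Counter
from itertools import groupby


def state_frequencies(states):
    """Share of observations assigned to each state label."""
    counts = Counter(states)
    total = len(states)
    return {state: counts[state] / total for state in sorted(counts)}


def run_lengths(states):
    """Collapse the sequence into (state, consecutive_days) runs."""
    return [(state, sum(1 for _ in group)) for state, group in groupby(states)]


# Hypothetical short sequence, for illustration only
example_states = [3, 3, 3, 0, 0, 1, 0, 3, 3, 3]

freqs = state_frequencies(example_states)
runs = run_lengths(example_states)
print(freqs)
print(runs)
```

For the real model, passing `hidden_states.tolist()` to both helpers would show whether the rare states appear as isolated days or as short clusters.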


In [28]:
# Regime state means for each feature
state_means = hmm_model.means_
state_means

array([[-0.00144613,  0.01793295],
       [-0.03152274,  0.04738416],
       [ 0.03835461,  0.04783927],
       [ 0.00188421,  0.00677365]])

In [29]:
# Regime state means for each feature
state_means = hmm_model.means_

print("Regime state means for each feature:")
for i, mean in enumerate(state_means):
    print(f"State {i}:")
    print(f"  Returns: {mean[0]:.6f}")
    print(f"  Volatility/Range: {mean[1]:.6f}")
    print()

# Comments:
# - hmm_model.means_ gives us the average feature values for each hidden state.
# - In trading context (columns follow the order used in X_train):
#   - The first value (index 0) is the state's average daily return.
#   - The second value (index 1) is the state's average daily range, our volatility proxy.
# - Interpreting the states:
#   - High positive returns + low volatility might indicate a bullish trend.
#   - Low or negative returns + high volatility might indicate a bearish or volatile market.
#   - Returns close to zero + low volatility might indicate a sideways market.
# - This information helps characterize each market regime, allowing for:
#   1. Better understanding of current market conditions.
#   2. Development of state-specific trading strategies.
#   3. Risk assessment based on the identified market regime.

Regime state means for each feature:
State 0:
  Returns: -0.001446
  Volatility/Range: 0.017933

State 1:
  Returns: -0.031523
  Volatility/Range: 0.047384

State 2:
  Returns: 0.038355
  Volatility/Range: 0.047839

State 3:
  Returns: 0.001884
  Volatility/Range: 0.006774
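
Because the state numbers are arbitrary labels, it can help to rank the states by a feature before interpreting them. A minimal sketch with NumPy, using the means printed above (the variable names other than `state_means` are illustrative):

```python
import numpy as np

# Means printed above: column 0 = Returns, column 1 = Range
state_means = np.array([
    [-0.00144613, 0.01793295],
    [-0.03152274, 0.04738416],
    [0.03835461, 0.04783927],
    [0.00188421, 0.00677365],
])

# State indices sorted from calmest to most volatile (by mean range)
order_by_range = np.argsort(state_means[:, 1])

# Map each raw state index to its rank: 0 = calmest
rank_of_state = np.empty_like(order_by_range)
rank_of_state[order_by_range] = np.arange(len(order_by_range))

print(order_by_range.tolist())
print(rank_of_state.tolist())
```

Indexing with `rank_of_state[hidden_states]` would then relabel a state sequence so that the labels mean the same thing from one fit to the next.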



In [30]:
# Regime state covariances for each feature
state_covars = hmm_model.covars_

print("Regime state covariances for each feature:")
for i, covar in enumerate(state_covars):
    print(f"State {i}:")
    print("  Covariance matrix:")
    print(f"    [{covar[0][0]:.6f}  {covar[0][1]:.6f}]")
    print(f"    [{covar[1][0]:.6f}  {covar[1][1]:.6f}]")
    
    # Calculate correlation
    correlation = covar[0][1] / np.sqrt(covar[0][0] * covar[1][1])
    print(f"  Correlation between returns and volatility: {correlation:.6f}")
    print()

# Comments:
# - hmm_model.covars_ gives us the covariance matrices for each hidden state.
# - In trading context:
#   - Diagonal elements [0][0] and [1][1] are the variances of returns and range respectively.
#   - Off-diagonal elements [0][1] and [1][0] are the covariance between returns and range.
# - Interpreting the covariances:
#   - Higher values on the diagonal indicate more variability in that feature for that state.
#   - The correlation helps understand the relationship between returns and range in each state.
# - Trading implications:
#   1. Risk assessment: Higher variances indicate more unpredictable market behavior.
#   2. Strategy development: Different strategies might be appropriate for high vs. low volatility states.
#   3. Regime characterization: The sign of the correlation shows whether wide-range days in a state tend to be up days or down days.

Regime state covariances for each feature:
State 0:
  Covariance matrix:
    [0.000218  -0.000004]
    [-0.000004  0.000085]
  Correlation between returns and volatility: -0.027007

State 1:
  Covariance matrix:
    [0.001288  0.000152]
    [0.000152  0.000662]
  Correlation between returns and volatility: 0.164224

State 2:
  Covariance matrix:
    [0.001395  0.000987]
    [0.000987  0.000966]
  Correlation between returns and volatility: 0.850243

State 3:
  Covariance matrix:
    [0.000037  0.000014]
    [0.000014  0.000020]
  Correlation between returns and volatility: 0.512806
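
The per-element correlation formula used in the cell generalizes to any number of features. Here is a sketch that converts a whole covariance matrix to a correlation matrix at once; `cov_to_corr` is a hypothetical helper, and the input is the rounded State 2 matrix from the output above, so the result only approximates the printed 0.850243.

```python
import numpy as np


def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation matrix."""
    std = np.sqrt(np.diag(cov))
    # Divide each entry cov[i, j] by std[i] * std[j]
    return cov / np.outer(std, std)


# State 2 covariance as printed above (rounded to 6 decimals)
covar_state_2 = np.array([
    [0.001395, 0.000987],
    [0.000987, 0.000966],
])

corr = cov_to_corr(covar_state_2)
print(corr)
```

The diagonal of the result is 1 by construction, and the off-diagonal entry is the return/range correlation for that state.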



State 2 (Best for trading):
- Highest mean daily return (3.8355%)
- High volatility/range (4.7839%)
- Strongest positive correlation between returns and volatility (0.850243)
- Occurs least frequently (1.25% of the time)

Why it's favorable:
1. A mean daily return of about 3.8% marks sharp up-moves.
2. High volatility provides opportunities for larger price movements.
3. The strong positive correlation means that, within this state, days with a wider range tend to have higher returns, which suits trend-following strategies.
4. Its rarity makes it a potentially high-reward opportunity when identified.

Other states:

State 0:
- Slightly negative mean daily return (-0.1446%)
- Moderate volatility (1.7933%)
- Very weak negative correlation (-0.027007)
- Occurs 31.81% of the time

Analysis: This could represent a mildly bearish or sideways market with some unpredictability.

State 1:
- Strongly negative mean daily return (-3.1523%)
- High volatility (4.7384%)
- Weak positive correlation (0.164224)
- Occurs 1.91% of the time

Analysis: This likely represents a bearish, possibly crisis-like market state. High risk, but potential opportunities for short-selling or put options.

State 3:
- Slightly positive mean daily return (0.1884%)
- Low volatility (0.6774%)
- Moderate positive correlation (0.512806)
- Most common state (65.03% of the time)

Analysis: This represents a calm, slightly bullish market. Low risk, but also limited opportunities for significant gains.

Trading implications:
1. State 2 is ideal for aggressive long positions or call options when identified.
2. State 3, being the most common, could be used as a baseline for "normal" market conditions.
3. State 0 might require more cautious, possibly range-bound trading strategies.
4. State 1, while rare, might signal a need for defensive strategies or shorting opportunities.

Strategy suggestions:
1. Develop a system to quickly identify transitions into State 2 for optimal entry points.
2. Use State 3 for accumulation or position building with lower risk.
3. Be prepared with hedging strategies when State 1 is identified.
4. Consider using options strategies that benefit from increased volatility when transitioning from State 3 to States 0, 1, or 2.

Remember, while State 2 appears most favorable, its rarity (1.25% occurrence) means it's crucial to have strategies for all market conditions, especially the more common States 3 and 0.
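
The point about rarity can be made with a little arithmetic: weighting each state's mean daily return by how often the state occurs shows how much each regime contributes to the average day. This is a rough in-sample sketch using the rounded figures printed above; the variable names are illustrative.

```python
# Values copied from the outputs above
state_freq = [0.3181, 0.0191, 0.0125, 0.6503]
mean_daily_return = [-0.00144613, -0.03152274, 0.03835461, 0.00188421]

# Contribution of each state to the average daily return
contribution = [f * m for f, m in zip(state_freq, mean_daily_return)]
total = sum(contribution)

for state, c in enumerate(contribution):
    print(f"State {state}: {c:+.6f}")
print(f"Frequency-weighted mean daily return: {total:+.6f}")
```

On these numbers, the common State 3 contributes more to the average day than the rare State 2, despite State 2's much larger mean return.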