
In [26]:
!pip install yfinance
!pip install talib-binary



In [27]:
import numpy as np
import pandas as pd
# import tensorflow as tf
import yfinance as yf
import talib as ta

Dataset used: prices of the SPY ETF, which tracks the S&P 500 index.

In [28]:
SPY= yf.download("SPY",start="2018-01-01",end="2018-12-31")

[*********************100%***********************]  1 of 1 completed


In [29]:
data=SPY[['Open','High','Low','Adj Close']].copy()
data.tail()

Date,Open,High,Low,Adj Close
2018-12-21,246.740005,249.710007,239.979996,228.137024
2018-12-24,239.039993,240.839996,234.270004,222.108963
2018-12-26,235.970001,246.179993,233.759995,233.330994
2018-12-27,242.570007,248.289993,238.960007,235.122375
2018-12-28,249.580002,251.399994,246.449997,234.819061


First step: identify the states we want to model and analyze.
We will simply consider whether the price moves up, moves down, or is unchanged.<br>
The three possible states are:
<li> Up: today's price is higher than yesterday's price.</li>
<li> Down: today's price is lower than yesterday's price.</li>
<li> Flat: today's price is unchanged from yesterday's price.</li>
<br>
To obtain the states in our data frame, we first calculate the daily return. We use the simple percentage return here, although logarithmic returns are usually closer to normally distributed.

In [30]:
data['pct_ret']=data['Adj Close'].pct_change()
data.tail()

Date,Open,High,Low,Adj Close,pct_ret
2018-12-21,246.740005,249.710007,239.979996,228.137024,-0.02049
2018-12-24,239.039993,240.839996,234.270004,222.108963,-0.026423
2018-12-26,235.970001,246.179993,233.759995,233.330994,0.050525
2018-12-27,242.570007,248.289993,238.960007,235.122375,0.007677
2018-12-28,249.580002,251.399994,246.449997,234.819061,-0.00129
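
As a small illustration of the difference between the two kinds of return mentioned above, the sketch below computes both on a short price series. The prices are hypothetical values chosen for the example, not SPY data, and the variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, used only for illustration (not SPY data).
prices = pd.Series([100.0, 102.0, 99.0, 99.0, 101.5], name="close")

# Simple (percentage) return: p_t / p_{t-1} - 1, which is what pct_change() computes.
simple_ret = prices.pct_change()

# Logarithmic return: ln(p_t / p_{t-1}).
log_ret = np.log(prices / prices.shift(1))

# The two are linked by log_ret = ln(1 + simple_ret); the first row is NaN in both.
returns = pd.DataFrame({"simple": simple_ret, "log": log_ret})
print(returns)
```

For small daily moves the two columns are close to each other. One practical difference is that log returns add up across days, so their sum equals the log of the total price ratio over the period.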


We then identify the possible states according to the return. The Flat state can be defined as a range, so that a move counts as Up or Down only if it exceeds a minimum size; here the band is ±0.1% (0.001).

In [31]:
data['state']=data['pct_ret'].apply(lambda x:'Up' if(x>0.001) 
else ('Down' if (x<-0.001)
else 'Flat'))

In [32]:
data.tail()

Date,Open,High,Low,Adj Close,pct_ret,state
2018-12-21,246.740005,249.710007,239.979996,228.137024,-0.02049,Down
2018-12-24,239.039993,240.839996,234.270004,222.108963,-0.026423,Down
2018-12-26,235.970001,246.179993,233.759995,233.330994,0.050525,Up
2018-12-27,242.570007,248.289993,238.960007,235.122375,0.007677,Up
2018-12-28,249.580002,251.399994,246.449997,234.819061,-0.00129,Down
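
One detail of the lambda above: the first row has no return (`NaN`), and since both comparisons are False for `NaN`, that row is labeled 'Flat'. The sketch below is one possible vectorized alternative that leaves such rows unlabeled. The return values are hypothetical, and `THRESHOLD` is a name introduced here for the 0.001 band.

```python
import numpy as np
import pandas as pd

THRESHOLD = 0.001  # same 0.1% band as in the cell above

# Hypothetical returns; the first value is NaN, as pct_change() produces.
pct_ret = pd.Series([np.nan, 0.0204, -0.0005, -0.0264, 0.0008, 0.0077])

# np.select picks the first matching condition; rows matching none get the default.
labels = np.select(
    [(pct_ret > THRESHOLD).to_numpy(), (pct_ret < -THRESHOLD).to_numpy()],
    ["Up", "Down"],
    default="Flat",
)

# Blank out rows whose return is missing instead of calling them 'Flat'.
state = pd.Series(labels, index=pct_ret.index).where(pct_ret.notna())
print(state)
```

Whether to drop or keep the first row is a modeling choice; with a year of daily data it affects at most one transition count.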


We are interested in the transitions from the prior day's state to today's state, so we add a new column that holds the prior state.

In [33]:
data['priorstate']=data['state'].shift()
data.tail()

Date,Open,High,Low,Adj Close,pct_ret,state,priorstate
2018-12-21,246.740005,249.710007,239.979996,228.137024,-0.02049,Down,Down
2018-12-24,239.039993,240.839996,234.270004,222.108963,-0.026423,Down,Down
2018-12-26,235.970001,246.179993,233.759995,233.330994,0.050525,Up,Down
2018-12-27,242.570007,248.289993,238.960007,235.122375,0.007677,Up,Up
2018-12-28,249.580002,251.399994,246.449997,234.819061,-0.00129,Down,Up


With the current state and the prior state, we can build the frequency distribution matrix.

In [34]:
#Frequency Distributions 
states= data[['priorstate','state']].dropna()
states_mat=states.groupby(['priorstate','state']).size().unstack()
states_mat

priorstate \ state,Down,Flat,Up
Down,46,6,50
Flat,18,3,11
Up,39,22,54
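
The same kind of count table can also be built with `pd.crosstab`. The sketch below uses a short hypothetical state sequence and reindexes the result so that every state appears in both axes; with `groupby(...).size().unstack()`, a transition that never occurs shows up as `NaN` rather than 0.

```python
import pandas as pd

# Hypothetical state sequence, for illustration only.
state = pd.Series(["Up", "Up", "Down", "Flat", "Up", "Down", "Down", "Up"])
prior = state.shift()  # first entry is NaN and is dropped by crosstab

order = ["Down", "Flat", "Up"]
counts = (
    pd.crosstab(prior, state, rownames=["priorstate"], colnames=["state"])
    # Force a full 3x3 table; transitions never observed get a count of 0.
    .reindex(index=order, columns=order, fill_value=0)
)
print(counts)
```

Keeping zero counts explicit matters for the next step, because a `NaN` in a row would otherwise propagate into the normalized probabilities.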


This gives the frequency distribution of the transitions. Dividing each row by its total turns it into the initial probability matrix, or transition matrix, at time t0.

In [35]:
#Initial transition matrix 
transition_matrix=states_mat.apply(lambda x: x/float(x.sum()), axis=1)
transition_matrix

priorstate \ state,Down,Flat,Up
Down,0.45098,0.058824,0.490196
Flat,0.5625,0.09375,0.34375
Up,0.33913,0.191304,0.469565
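
As a sanity check, the normalization can be reproduced from the counts in the frequency table and verified to give rows that sum to 1. This is a minimal sketch; the counts are copied from the table above, and `counts` and `transition` are names chosen for the example.

```python
import numpy as np
import pandas as pd

labels = ["Down", "Flat", "Up"]

# Transition counts copied from the frequency table above.
counts = pd.DataFrame(
    [[46, 6, 50], [18, 3, 11], [39, 22, 54]],
    index=pd.Index(labels, name="priorstate"),
    columns=pd.Index(labels, name="state"),
)

# Divide each row by its total so that every row is a probability distribution.
transition = counts.div(counts.sum(axis=1), axis=0)

# Each row of a transition matrix must sum to 1.
assert np.allclose(transition.sum(axis=1), 1.0)
print(transition.round(6))
```

`DataFrame.div(..., axis=0)` does the same job as the row-wise `apply` in the cell above without a Python-level lambda.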


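This is our transition matrix at t0. Each row gives the probabilities of the next day's state given the prior state, so it already supports one-day forecasts. Multiplying the matrix by itself extends the Markov chain by one more step: the result, which we call t1, gives the probabilities two transitions ahead.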

In [36]:
t0=transition_matrix.copy()
t1=round(t0.dot(t0),4)
t1

priorstate \ state,Down,Flat,Up
Down,0.4027,0.1258,0.4715
Flat,0.423,0.1076,0.4694
Up,0.4198,0.1277,0.4525
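
To make the forecasting step concrete, the sketch below propagates a state distribution forward, assuming we know that the current state is "Up". The matrix is rebuilt from the counts in the frequency table; `pi0`, `pi1`, and `pi2` are names introduced for the example.

```python
import numpy as np

labels = ["Down", "Flat", "Up"]

# Transition matrix rebuilt from the counts in the frequency table.
counts = np.array([[46, 6, 50], [18, 3, 11], [39, 22, 54]], dtype=float)
P = counts / counts.sum(axis=1, keepdims=True)

# Start from certainty that the current state is "Up" (a one-hot row vector).
pi0 = np.zeros(len(labels))
pi0[labels.index("Up")] = 1.0

pi1 = pi0 @ P  # distribution after one transition: the "Up" row of P
pi2 = pi1 @ P  # distribution after two transitions: the "Up" row of P @ P

for name, p1, p2 in zip(labels, pi1, pi2):
    print(f"{name:>4}: one step {p1:.4f}, two steps {p2:.4f}")
```

Each multiplication by `P` advances the distribution by one transition, so reading a row of a matrix power is equivalent to starting from a one-hot vector for that row's state.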


Multiplying the matrix obtained at t1 by the original transition matrix t0 gives the probabilities at time t2.

In [37]:
#Find the transition matrix at t2
t2=round(t0.dot(t1),4)
t2

priorstate \ state,Down,Flat,Up
Down,0.4123,0.1257,0.4621
Flat,0.4105,0.1247,0.4648
Up,0.4146,0.1232,0.4622


Multiplying the matrix obtained at t2 by the original transition matrix t0 gives the probabilities at time t3, and so on until we reach the equilibrium matrix, where the probabilities no longer change and the prediction cannot evolve any further.

In [38]:
#Find the transition matrix at t3 
# t3= round(t0.dot(t2),10)
t3=t0.dot(t2)
t3

priorstate \ state,Down,Flat,Up
Down,0.413322,0.124416,0.462308
Flat,0.412922,0.124747,0.462387
Up,0.413036,0.124335,0.462663


You can obtain the same result, up to the rounding applied to t1 and t2 above, by raising the initial transition matrix to a power with np.linalg.matrix_power: t3 corresponds to the fourth power of t0.

In [39]:
pd.DataFrame(np.linalg.matrix_power(t0,4))

,0,1,2
0,0.413316,0.124425,0.462259
1,0.412912,0.124752,0.462336
2,0.413031,0.124355,0.462615
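
The result above loses the state labels because `np.linalg.matrix_power` returns a plain array. A possible way to keep them is sketched below; `n_step` is a hypothetical helper written for this example, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

labels = ["Down", "Flat", "Up"]

# Transition matrix rebuilt from the counts in the frequency table.
counts = np.array([[46, 6, 50], [18, 3, 11], [39, 22, 54]], dtype=float)
P = counts / counts.sum(axis=1, keepdims=True)


def n_step(P, n, labels):
    """Return the n-th power of P as a DataFrame labeled with the state names."""
    return pd.DataFrame(np.linalg.matrix_power(P, n), index=labels, columns=labels)


p4 = n_step(P, 4, labels)
print(p4.round(6))

# Powers compose: the 4th power equals the 2nd power times the 2nd power.
p2 = np.linalg.matrix_power(P, 2)
assert np.allclose(p4.to_numpy(), p2 @ p2)
```

Because no intermediate rounding is applied here, the values match the unrounded `matrix_power` output rather than the `t3` table.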


To find the equilibrium matrix, we can iterate the process until the probabilities stop changing.

In [40]:
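#Find the equilibrium matrix 
i=1
a=t0.copy()      # matrix from the previous step
b=t0.dot(t0)     # matrix from the current step

prediction=[t0]  # history of matrices, starting with t0

# Keep multiplying by t0 until two consecutive matrices are exactly equal
while(not(a.equals(b))):
  print("Iteration number: "+str(i))
  i+=1
  a=b.copy()
  b=b.dot(t0)
  prediction.append(a)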

Iteration number: 1
Iteration number: 2
Iteration number: 3
Iteration number: 4
Iteration number: 5
Iteration number: 6
Iteration number: 7
Iteration number: 8
Iteration number: 9
Iteration number: 10
Iteration number: 11
Iteration number: 12
Iteration number: 13
Iteration number: 14
Iteration number: 15
Iteration number: 16
Iteration number: 17
Iteration number: 18
Iteration number: 19
Iteration number: 20
Iteration number: 21
Iteration number: 22
Iteration number: 23
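
The loop above stops only when two consecutive matrices are exactly equal in floating point, which can take many iterations and is not guaranteed in general. A minimal sketch of a tolerance-based variant follows, together with a cross-check against the left eigenvector of the transition matrix for eigenvalue 1. The function name `equilibrium` and the values of `tol` and `max_iter` are assumptions made for this example.

```python
import numpy as np

# Transition matrix rebuilt from the counts in the frequency table.
counts = np.array([[46, 6, 50], [18, 3, 11], [39, 22, 54]], dtype=float)
P = counts / counts.sum(axis=1, keepdims=True)


def equilibrium(P, tol=1e-12, max_iter=1000):
    """Multiply by P until consecutive matrices differ by less than tol."""
    current = P.copy()
    for step in range(1, max_iter + 1):
        nxt = current @ P
        if np.allclose(nxt, current, rtol=0.0, atol=tol):
            return nxt, step
        current = nxt
    raise RuntimeError("no convergence within max_iter")


limit, steps = equilibrium(P)
print("converged after", steps, "multiplications")
print(limit.round(6))

# Cross-check: the stationary distribution pi satisfies pi @ P = pi, i.e. it is
# a left eigenvector of P (an eigenvector of P.T) for the eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()  # normalize so the probabilities sum to 1
print(pi.round(6))
```

At equilibrium every row of the matrix is the same vector, which is why the starting state no longer influences the forecast.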


In [45]:
# Print every stored matrix together with its position in the list
for d, x in enumerate(prediction):
  print(d,')',x)

0 ) state          Down      Flat        Up
priorstate                             
Down        0.45098  0.058824  0.490196
Flat        0.56250  0.093750  0.343750
Up          0.33913  0.191304  0.469565
1 ) state           Down      Flat        Up
priorstate                              
Down        0.402712  0.125820  0.471468
Flat        0.422987  0.107638  0.469375
Up          0.419794  0.127713  0.452493
2 ) state           Down      Flat        Up
priorstate                              
Down        0.412278  0.125678  0.462043
Flat        0.410485  0.124766  0.464749
Up          0.414612  0.123231  0.462158
3 ) state           Down      Flat        Up
priorstate                              
Down        0.413316  0.124425  0.462259
Flat        0.412912  0.124752  0.462336
Up          0.413031  0.124355  0.462615
4 ) state           Down      Flat        Up
priorstate                              
Down        0.413153  0.124410  0.462438
Flat        0.413180  0.124431  0.462388
U