# Hidden Markov Models (continued)

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import quantecon as qe

%matplotlib inline

## Review:

**What is a hidden Markov model?**

A hidden Markov model is a model in which there is a hidden state, $x_t$, that follows a Markov process and an observed variable, $y_t$, that depends on $x_t$ and on random noise.

**Conditional probabilities everywhere!**

Because the state is never observed directly, every object of interest in an HMM is a conditional probability of the hidden state (or of future data) given the observations.

We repeatedly use several laws of probability to manipulate these probabilities into something we can compute:


_Bayes' law_

Bayes' law states,

$$P(A | B) = \frac{P(A) P(B | A)}{P(B)}$$

[Wikipedia](https://en.wikipedia.org/wiki/Bayes%27_theorem)

_Definition of a conditional probability_

The definition of conditional probability states:

$$P(A | B) = \frac{P(A, B)}{P(B)}$$

which implies

$$P(A, B) = P(A | B) P(B)$$

[Wikipedia](https://en.wikipedia.org/wiki/Conditional_probability)

_Law of total probability_

Let $A$ be an event in sample space $X$ and let $\{B_n\}$ be a finite partition of the sample space. Then,

$$P(A) = \sum_n P(A | B_n) P(B_n)$$

[Wikipedia](https://en.wikipedia.org/wiki/Law_of_total_probability)
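
The three laws above can be checked numerically on a small example. The sketch below uses a made-up joint distribution over two binary events (the numbers in `joint` are purely illustrative and are not part of the bear/bull model) and verifies Bayes' law and the law of total probability.

```python
import numpy as np

# Hypothetical joint distribution: joint[i, j] = P(A=i, B=j)
joint = np.array([[0.10, 0.30],
                  [0.20, 0.40]])

p_a = joint.sum(axis=1)               # P(A)
p_b = joint.sum(axis=0)               # P(B)
p_a_given_b = joint / p_b             # column j holds P(A | B=j)
p_b_given_a = joint / p_a[:, None]    # row i holds P(B | A=i)

# Bayes' law: P(A | B) = P(A) P(B | A) / P(B)
bayes = p_a[:, None] * p_b_given_a / p_b

# Law of total probability: P(A) = sum_n P(A | B_n) P(B_n)
total = p_a_given_b @ p_b

print(np.allclose(bayes, p_a_given_b), np.allclose(total, p_a))
```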

**Objects of interest**

1. ~~$P(x_t | y^t)$: Can we use the history of observed returns to identify whether we are currently in a bear or bull market -- This is known as the "filtering problem".~~
2. $P(x_\tau | y^t)$ where $\tau < t$: Can we use the history of observed returns to identify whether we were in a bear or bull market in the past -- This is known as the "smoothing problem"
3. $P(x_\tau | y^t)$ where $\tau > t$: Can we use the data we've observed until now to predict the state in the future -- This is known as the "forecasting (or prediction) problem"
4. $P(y^t)$: What is the likelihood of having observed the returns that we see -- This is known as the "likelihood problem"
5. $\hat{x}^t$: What is the most likely sequence of market conditions to have generated the data we see -- This is known as the "most likely hidden path"

## Discrete state HMMs

Recall the model we used last time.

The weekly returns for a particular stock alternate between bear and bull cycles according to a Markov chain. You have been told that the transition matrix that describes this Markov chain is given by:

\begin{align*}
  \begin{bmatrix} p_{\text{bear}} & 1 - p_{\text{bear}} \\ 1 - p_{\text{bull}} & p_{\text{bull}} \end{bmatrix}
\end{align*}

where $p_{\text{bear}} = 0.85$ and $p_{\text{bull}} = 0.7$.

Returns can either be negative ($N$), zero ($Z$), or positive ($P$).

The weekly returns that an individual earns are random and depend on whether the market is in a bear or bull cycle.

\begin{align*}
  r_{\text{bear}} = \begin{cases} N \text{ with probability } 0.2 \\ Z \text{ with probability } 0.75 \\ P \text{ with probability } 0.05 \end{cases} \\
  r_{\text{bull}} = \begin{cases} N \text{ with probability } 0.1 \\ Z \text{ with probability } 0.6 \\ P \text{ with probability } 0.3 \end{cases}
\end{align*}

**Simulate data**

We start by simulating the output of such a model.

In [None]:
# Two years of data
T = 104

p_bear = 0.85
p_bull = 0.7
P = np.array([[p_bear, 1 - p_bear], [1 - p_bull, p_bull]])

r_bear_probs = np.array([0.2, 0.75, 0.05])
r_bull_probs = np.array([0.1, 0.6, 0.3])

mc = qe.MarkovChain(P)


def simulate_bb_model(mc, r_bear_probs, r_bull_probs, T):
    # First simulate the bear/bull component
    bb_idx = mc.simulate_indices(T)

    realized_returns = np.zeros(T, dtype=int)
    for t, bb in enumerate(bb_idx):
        # Build the discrete random variable for each period
        if bb == 0:
            r_probs = qe.DiscreteRV(r_bear_probs)
        else:
            r_probs = qe.DiscreteRV(r_bull_probs)

        realized_returns[t] = r_probs.draw()[0]

    return bb_idx, realized_returns


**Examining the data**

In [None]:
bb_idx, realized_returns = simulate_bb_model(mc, r_bear_probs, r_bull_probs, T)

In [None]:
def plot_bb_model_output(bb_idx, realized_returns):
    # Relevant plotting stuff
    T = bb_idx.shape[0]
    tvalues = np.arange(T)

    fig, ax = plt.subplots(2, 1, figsize=(8, 10), sharex=True)
    ax0, ax1 = ax

    ax0.scatter(tvalues, bb_idx)
    ax0.set_yticks([0, 1])
    ax0.set_yticklabels(["Bear", "Bull"])
    ax0.spines["right"].set_visible(False)
    ax0.spines["top"].set_visible(False)

    ax1.scatter(tvalues, realized_returns)
    ax1.set_yticks([0, 1, 2])
    ax1.set_yticklabels(["Negative", "Zero", "Positive"])
    ax1.spines["right"].set_visible(False)
    ax1.spines["top"].set_visible(False)

    pass

plot_bb_model_output(bb_idx, realized_returns)

#### Filtering problem

The filtering problem is about using the history of observed data to identify the current hidden state, i.e. to compute $P(x_t | y^t)$.

The probabilities will be computed recursively.

Let

$$\alpha(x_t) \equiv P(x_t, y^{t})$$

then, $\alpha(x_0) = P(y_0 | x_0) P(x_0)$

Recursively, if we have $\alpha(x_{t-1})$ then

\begin{align*}
  \alpha(x_t) &= P(x_t, y^{t}) \\
  &= \sum_{x_{t-1}} P(x_t, x_{t-1}, y^{t}) \\
  &= \sum_{x_{t-1}} P(y_t | x_{t-1}, x_{t}) P(y^{t-1} | x_{t-1}, x_{t}) P(x_{t}, x_{t-1}) \\
  &= P(y_t | x_{t}) \sum_{x_{t-1}} P(y^{t-1} | x_{t-1}) P(x_{t} | x_{t-1}) P(x_{t-1}) \\
  &= P(y_t | x_{t}) \sum_{x_{t-1}} P(y^{t-1}, x_{t-1}) P(x_{t} | x_{t-1}) \\
  &= P(y_t | x_{t}) \sum_{x_{t-1}} \alpha(x_{t-1}) P(x_{t} | x_{t-1}) \\
\end{align*}

Now notice that

\begin{align*}
  P(x_t | y^t) &= \frac{P(x_t, y^t)}{P(y^t)} \\
  &\propto P(x_t, y^t) \\
  &= \alpha(x_t)
\end{align*}

Let's see whether we can compute the probability of being in a bear/bull market.

In [None]:
# Allocate memory for our alphas
t_of_interest = 104
alphas = np.zeros((t_of_interest, 2))

# Solve for period 0 -- Equal probability of starting
# in bear/bull market
alphas[0, 0] = r_bear_probs[realized_returns[0]] * 0.5
alphas[0, 1] = r_bull_probs[realized_returns[0]] * 0.5

for t in range(1, t_of_interest):

    # Sum over  x_{t-1}
    predictor_bear = 0.0
    predictor_bull = 0.0
    for j in range(2):
        #            alpha(x_{t-1}) P(x_t | x_{t-1})
        predictor_bear += alphas[t-1, j]*mc.P[j, 0]
        predictor_bull += alphas[t-1, j]*mc.P[j, 1]

    alphas[t, 0] = r_bear_probs[realized_returns[t]]*predictor_bear
    alphas[t, 1] = r_bull_probs[realized_returns[t]]*predictor_bull

# Convert with normalizing factor!
filtering_probs = np.divide(alphas, alphas.sum(axis=1)[:, None])

print(f"Probability of bear/bull is {filtering_probs[-1, :]}")
print(f"Actual state is {bb_idx[t_of_interest-1]}")
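
For long samples the raw $\alpha(x_t)$ values shrink toward zero, since each step multiplies by probabilities below one. A common remedy is to normalize at every step and accumulate the log of the normalizing constants. The sketch below is one way to do that; the function name `forward_filter`, the argument `pi0` (the initial state distribution) and the short return sequence `y` are assumptions made for illustration.

```python
import numpy as np


def forward_filter(P, emp, pi0, y):
    """Return filtered probabilities P(x_t | y^t) and log P(y^T)."""
    T = len(y)
    filt = np.zeros((T, P.shape[0]))
    loglik = 0.0
    pred = pi0                          # P(x_0)
    for t in range(T):
        unnorm = emp[:, y[t]] * pred    # P(y_t | x_t) P(x_t | y^{t-1})
        c = unnorm.sum()                # P(y_t | y^{t-1})
        filt[t] = unnorm / c
        loglik += np.log(c)
        pred = filt[t] @ P              # P(x_{t+1} | y^t)
    return filt, loglik


P = np.array([[0.85, 0.15], [0.30, 0.70]])
emp = np.array([[0.2, 0.75, 0.05], [0.1, 0.6, 0.3]])
y = np.array([1, 1, 2, 2, 0, 1])        # hypothetical returns: 0=N, 1=Z, 2=P

filt, loglik = forward_filter(P, emp, np.array([0.5, 0.5]), y)
print(filt[-1], loglik)
```

The filtered probabilities agree with the normalized $\alpha$ values from the loop above; only the intermediate scaling differs.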

In [None]:
tvalues = np.arange(bb_idx.shape[0])

fig, ax = plt.subplots(3, 1, sharex=True, figsize=(10, 8))

ax[0].scatter(tvalues, bb_idx)
ax[1].scatter(tvalues, realized_returns)
ax[2].plot(tvalues, filtering_probs[:, 1])
ax[2].set_ylim(0, 1)

#### Smoothing problem

The smoothing problem uses the full sample, including observations made after period $t$, to determine the probability of having been in a bear/bull market in period $t$, i.e. $P(x_t | y^T)$.

We again define a useful recursion

Let

$$\beta(x_t) \equiv P(y^{t+1:T} | x_t)$$

with $\beta(x_T) = 1$

Then,

\begin{align*}
  \beta(x_t) &= P(y^{t+1:T} | x_t) \\
  &= \sum_{x_{t+1}} P(y_{t+1}, y^{t+2:T}, x_{t+1} | x_{t}) \\
  &= \sum_{x_{t+1}} P(y_{t+1} | y^{t+2:T}, x_{t+1}, x_{t}) P(y^{t+2:T}, x_{t+1} | x_t) \\
  &= \sum_{x_{t+1}} P(y_{t+1} | x_{t+1}) P(y^{t+2:T} | x_{t+1}, x_{t}) P(x_{t+1} | x_t) \\
  &= \sum_{x_{t+1}} P(y_{t+1} | x_{t+1}) \beta(x_{t+1}) P(x_{t+1} | x_t) \\
\end{align*}

Now notice that

\begin{align*}
  P(x_t, y^T) &= P(x_t, y^t, y^{t+1:T}) \\
  &= P(y^{t+1:T} | x_t, y^t) P(x_t, y^t) \\
  &= \beta(x_t) \alpha(x_t)
\end{align*}

and thus,

\begin{align*}
  P(x_t | y^T) &= \frac{\beta(x_t) \alpha(x_t)}{\sum_{x} \beta(x_t=x) \alpha(x_t=x)} 
\end{align*}

In [None]:
# Allocate memory for our betas
betas = np.zeros((t_of_interest, 2))

# Solve for period T -- This is just defined as 1
betas[-1, 0] = 1
betas[-1, 1] = 1

for tp1 in range(t_of_interest-1, 0, -1):

    # Sum over x_{t+1} (the logs/exps below simply multiply the three terms)
    value_bear = 0.0
    value_bull = 0.0
    for j in range(2):
        _probs = r_bear_probs if j == 0 else r_bull_probs
        _val = (
            np.log(_probs[realized_returns[tp1]]) + 
            np.log(betas[tp1, j])
        )

        value_bear += np.exp(_val + np.log(mc.P[0, j]))
        value_bull += np.exp(_val + np.log(mc.P[1, j]))

    betas[tp1-1, 0] = value_bear
    betas[tp1-1, 1] = value_bull

smoothing_probs = alphas*betas / np.sum(alphas*betas, axis=1)[:, None]
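
A useful sanity check on the forward and backward recursions is that $\sum_{x_t} \alpha(x_t) \beta(x_t) = P(y^T)$ for every $t$, so the normalizing constant should not vary over time. The sketch below checks this on a short, hypothetical return sequence `y` with a vectorized version of both recursions (the uniform initial distribution matches the one assumed above).

```python
import numpy as np

P = np.array([[0.85, 0.15], [0.30, 0.70]])
emp = np.array([[0.2, 0.75, 0.05], [0.1, 0.6, 0.3]])
y = np.array([1, 1, 2, 2, 0, 1])        # hypothetical returns: 0=N, 1=Z, 2=P
T = len(y)

# Forward pass: alpha(x_t) = P(x_t, y^t)
alphas = np.zeros((T, 2))
alphas[0] = emp[:, y[0]] * 0.5
for t in range(1, T):
    alphas[t] = emp[:, y[t]] * (alphas[t - 1] @ P)

# Backward pass:
# beta(x_t) = sum_{x_{t+1}} P(x_{t+1} | x_t) P(y_{t+1} | x_{t+1}) beta(x_{t+1})
betas = np.ones((T, 2))
for t in range(T - 2, -1, -1):
    betas[t] = P @ (emp[:, y[t + 1]] * betas[t + 1])

evidence = (alphas * betas).sum(axis=1)  # should equal P(y^T) for every t
smoothed = alphas * betas / evidence[:, None]
print(evidence)
```

In the last period the smoothed and filtered probabilities coincide, because there are no later observations left to condition on.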

How do these differ from the filtering probabilities?

In [None]:
# Make cool graphs
fig, ax = plt.subplots(figsize=(10, 8))

ax.scatter(tvalues, bb_idx, color="k", alpha=0.5)
ax.annotate("Bull Market", (2, 1.05), color="k")
ax.annotate("Bear Market", (2, -0.05), color="k")

ax.scatter(tvalues, (1 + realized_returns)/4, color="DarkBlue", alpha=0.25)
ax.annotate("Positive Return", (99, 0.8), color="DarkBlue")
ax.annotate("Zero Return", (99, 0.55), color="DarkBlue")
ax.annotate("Negative Return", (99, 0.3), color="DarkBlue")

ax.plot(
    tvalues, filtering_probs[:, 1], color="DarkOrange",
    alpha=0.7, linestyle="--"
)
ax.annotate("Filtered Probabilities", (95, 0.9), color="DarkOrange")
ax.plot(
    tvalues, smoothing_probs[:, 1], color="DarkGreen",
    alpha=0.7, linestyle="--"
)
ax.annotate("Smoothed Probabilities", (95, 0.4), color="DarkGreen")

ax.set_xlim((0, 110))
ax.set_ylim((-0.1, 1.1))

ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)

#### Prediction

We might be interested in predicting the hidden state or observations in the future.

For now, we'll focus on the one-step prediction, but it can be generalized.

**Predicting the hidden state**

The one-step prediction probability would be $P(x_{t+1} | y^t)$

\begin{align*}
  P(x_{t+1} | y^t) &= \sum_{x_t} P(x_{t+1}, x_{t} | y^t) \\
  &= \sum_{x_t} P(x_{t+1} | x_{t}, y^t) P(x_{t} | y^t) \\
  &= \sum_{x_t} P(x_{t+1} | x_{t}) P(x_{t} | y^t) \\
\end{align*}

**Predicting the observation**

The one-step prediction probability would be $P(y_{t+1} | y^t)$

\begin{align*}
  P(y_{t+1} | y^t) &= \sum_{x_t} \sum_{x_{t+1}} P(y_{t+1}, x_{t+1}, x_{t} | y^t) \\
  &= \sum_{x_t} \sum_{x_{t+1}} P(y_{t+1} | y^t, x_{t+1}, x_{t}) P(x_{t+1}, x_{t} | y^t) \\
  &= \sum_{x_t} \sum_{x_{t+1}} P(y_{t+1} | x_{t+1}) P(x_{t+1}, x_{t} | y^t) \\
  &= \sum_{x_t} \sum_{x_{t+1}} P(y_{t+1} | x_{t+1}) P(x_{t+1} | x_{t}) \underbrace{P(x_t | y^t)}_{\text{filtering probability}} \\
\end{align*}
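
Both prediction formulas reduce to matrix products once the filtering probabilities are in hand. The sketch below assumes a hypothetical filtered distribution `filt_t` for the current period and uses the transition and emission probabilities of the model above; the horizon `k` is likewise an illustrative choice.

```python
import numpy as np

P = np.array([[0.85, 0.15], [0.30, 0.70]])
emp = np.array([[0.2, 0.75, 0.05], [0.1, 0.6, 0.3]])

filt_t = np.array([0.6, 0.4])    # hypothetical P(x_t | y^t)

# P(x_{t+1} | y^t) = sum_{x_t} P(x_{t+1} | x_t) P(x_t | y^t)
state_pred = filt_t @ P

# P(y_{t+1} | y^t) = sum_{x_{t+1}} P(y_{t+1} | x_{t+1}) P(x_{t+1} | y^t)
obs_pred = state_pred @ emp

# The k-step state prediction replaces P with its k-th power
k = 4
state_pred_k = filt_t @ np.linalg.matrix_power(P, k)

print(state_pred)   # [0.63 0.37]
print(obs_pred)     # [0.163  0.6945 0.1425]
```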

#### Likelihood

The likelihood of the observed data is obtained by summing the final forward probabilities over the hidden state:

\begin{align*}
  P(y^T) &= \sum_{x_T} P(x_T, y^T) \\
  &= \sum_{x_T} \alpha(x_T)
\end{align*}
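
To see that the forward recursion really delivers the likelihood, one can compare it with a brute-force sum of the joint probability $P(x^T, y^T)$ over every possible hidden path. This is only feasible for very short samples (there are $2^T$ paths). The sketch below does so for a hypothetical four-period sequence `y` with an assumed uniform initial distribution `pi0`.

```python
import itertools

import numpy as np

P = np.array([[0.85, 0.15], [0.30, 0.70]])
emp = np.array([[0.2, 0.75, 0.05], [0.1, 0.6, 0.3]])
pi0 = np.array([0.5, 0.5])              # assumed initial distribution
y = np.array([1, 2, 0, 1])              # hypothetical returns: 0=N, 1=Z, 2=P

# Forward recursion: P(y^T) = sum_{x_T} alpha(x_T)
alpha = pi0 * emp[:, y[0]]
for t in range(1, len(y)):
    alpha = emp[:, y[t]] * (alpha @ P)
lik_forward = alpha.sum()

# Brute force: add up P(x^T, y^T) over all 2^T hidden paths
lik_brute = 0.0
for path in itertools.product(range(2), repeat=len(y)):
    p = pi0[path[0]] * emp[path[0], y[0]]
    for t in range(1, len(y)):
        p *= P[path[t - 1], path[t]] * emp[path[t], y[t]]
    lik_brute += p

print(lik_forward, lik_brute)
```

The forward recursion costs a number of operations linear in $T$, whereas the brute-force sum grows exponentially.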


**Forward filter backward sample**

If we want to draw sample paths of the hidden states conditional on the observed data, we can use a "forward filter backward sample" procedure:

We want to draw a sample from $P(x^T | y^T)$

\begin{align*}
  P(x^T | y^T) &= P(x_0 | x^{1:T}, y^T) P(x^{1:T} | y^T) \\
  &= P(x_0 | x^{1:T}, y^T) P(x_1 | x^{2:T}, y^T) P(x^{2:T} | y^T) \\
  &= P(x_0 | x^{1:T}, y^T) P(x_1 | x^{2:T}, y^T) \dots P(x_T | y^T) \\
  &= P(x_0 | x_1, y^T) P(x_1 | x_2, y^T) \dots P(x_T | y^T)
\end{align*}

Now, note that we can sample from $P(x_T | y^T)$ -- It is just the filtered probability at period $T$.

Then for any $t < T$, once $x_{t+1}$ is known the later observations $y^{t+1:T}$ carry no additional information about $x_t$, so

\begin{align*}
  P(x_t | x_{t+1}, y^T) &= P(x_t | x_{t+1}, y^t) \\
  &=  \frac{P(x_{t+1} | x_t, y^t) P(x_t | y^t)}{P(x_{t+1} | y^t)} \\
  &\propto P(x_{t+1} | x_t) \alpha(x_t)
\end{align*}
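
The backward-sampling conditional can be verified by enumeration on a short sample. The sketch below computes $P(x_t | x_{t+1}, y^T)$ directly from the joint distribution over all hidden paths and compares it with $P(x_{t+1} | x_t) \alpha(x_t)$ normalized over $x_t$, where $\alpha$ is the forward quantity from the filtering section. The sequence `y`, the initial distribution `pi0` and the chosen period `t` are illustrative assumptions.

```python
import itertools

import numpy as np

P = np.array([[0.85, 0.15], [0.30, 0.70]])
emp = np.array([[0.2, 0.75, 0.05], [0.1, 0.6, 0.3]])
pi0 = np.array([0.5, 0.5])              # assumed initial distribution
y = np.array([1, 2, 0, 1])              # hypothetical returns: 0=N, 1=Z, 2=P
T = len(y)


def joint(path):
    """P(x^T = path, y^T) for one hidden path."""
    p = pi0[path[0]] * emp[path[0], y[0]]
    for s in range(1, T):
        p *= P[path[s - 1], path[s]] * emp[path[s], y[s]]
    return p


# Forward probabilities alpha(x_s) = P(x_s, y^s)
alphas = np.zeros((T, 2))
alphas[0] = pi0 * emp[:, y[0]]
for s in range(1, T):
    alphas[s] = emp[:, y[s]] * (alphas[s - 1] @ P)

t = 1
# pair[i, j] = P(x_t=i, x_{t+1}=j, y^T), by enumeration
pair = np.zeros((2, 2))
for path in itertools.product(range(2), repeat=T):
    pair[path[t], path[t + 1]] += joint(path)
brute = pair / pair.sum(axis=0)         # column j: P(x_t | x_{t+1}=j, y^T)

# Closed form: P(x_{t+1}=j | x_t=i) alpha(x_t=i), normalized over i
closed = P * alphas[t][:, None]
closed = closed / closed.sum(axis=0)

print(np.allclose(brute, closed))
```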

**Most likely hidden state sequence**

In addition to knowing the probabilities and being able to sample from our hidden states, we might be interested in what is the most likely sequence of hidden states that could have generated our data.

This would be the solution to

\begin{align*}
  \max_{x^T} P(x^T | y^T) &\propto \max_{x^{T}} P(x^T, y^T) \\
  &= \max_{x^T} P(y_T | x_T, x^{T-1}, y^{T-1}) P(x_T, x^{T-1}, y^{T-1}) \\
  &= \max_{x^T} P(y_T | x_T) P(x_T | x^{T-1}, y^{T-1}) P(x^{T-1}, y^{T-1}) \\
  &= \max_{x^T} P(y_T | x_T) P(x_T | x_{T-1}) P(x^{T-1}, y^{T-1}) \\
  &= \dots \\
  &= \max_{x^T} \prod_{t=0}^T P(y_t | x_t) P(x_t | x_{t-1}) \\
  &= \max_{x^{T-1}} \left( \prod_{t=0}^{T-1} P(y_t | x_t) P(x_t | x_{t-1}) \right) \max_{x_T} P(y_T | x_T) P(x_T | x_{T-1}) \\
\end{align*}

Let $\mu(x_T) = 1$ and, working backwards from the final period, define

\begin{align*}
  \mu(x_{t-1}) &= \max_{x_t} P(y_t | x_t) P(x_t | x_{t-1}) \mu(x_{t})
\end{align*}

Does this look familiar yet? What if we take logs and write

\begin{align*}
  \log(\mu(x_{t-1})) = \max_{x_t} \left[ \log(P(y_t | x_t)) + \log(P(x_{t} | x_{t-1})) + \log(\mu(x_t)) \right]
\end{align*}

Wait! This does look familiar. It is very similar to the "minimum distance problem" we discussed in the dynamic programming section!

The solution in this context is referred to as the Viterbi algorithm and it is very similar to value function iteration -- We will compute the $\{\mu(x_t)\}$ values first and then sequentially maximize our "Bellman equation"
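
As a minimal sketch of this two-pass idea, the code below runs the backward recursion for $\log \mu$ and then the forward maximization on a short, hypothetical sequence `y`, and compares the result against a brute-force search over all hidden paths. The uniform initial distribution `pi0` is an assumption, and the brute-force search is included only as a check; it would be infeasible for realistic sample sizes.

```python
import itertools

import numpy as np

P = np.array([[0.85, 0.15], [0.30, 0.70]])
emp = np.array([[0.2, 0.75, 0.05], [0.1, 0.6, 0.3]])
pi0 = np.array([0.5, 0.5])              # assumed initial distribution
y = np.array([1, 2, 2, 0, 1])           # hypothetical returns: 0=N, 1=Z, 2=P
T = len(y)


def log_joint(path):
    """log P(x^T = path, y^T) for one hidden path."""
    lp = np.log(pi0[path[0]]) + np.log(emp[path[0], y[0]])
    for t in range(1, T):
        lp += np.log(P[path[t - 1], path[t]]) + np.log(emp[path[t], y[t]])
    return lp


# Backward pass: log mu(x_{t-1}) = max over x_t of the bracketed sum
log_mu = np.zeros((T, 2))               # log mu(x_T) = log(1) = 0
for t in range(T - 1, 0, -1):
    # cand[i, j] corresponds to x_{t-1}=i, x_t=j
    cand = np.log(P) + np.log(emp[:, y[t]]) + log_mu[t]
    log_mu[t - 1] = cand.max(axis=1)

# Forward pass: pick the maximizing state period by period
path = np.zeros(T, dtype=int)
path[0] = np.argmax(np.log(pi0) + np.log(emp[:, y[0]]) + log_mu[0])
for t in range(1, T):
    path[t] = np.argmax(
        np.log(P[path[t - 1]]) + np.log(emp[:, y[t]]) + log_mu[t]
    )

# Brute-force check over all 2^T paths
best = max(itertools.product(range(2), repeat=T), key=log_joint)
print(path, best)
```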

**Building a class to automate the "hard" work**

In [None]:
class HMMBB(object):
    """
    Class to make it easy to compare various parameters
    """
    def __init__(self, p_bear, p_bull, r_bear_probs, r_bull_probs):
        # Build Markov chain for hidden state
        self.P = np.array([[p_bear, 1 - p_bear], [1 - p_bull, p_bull]])
        self.mc = qe.MarkovChain(self.P)

        # Build "emission probabilities"
        self.emp = np.vstack([r_bear_probs, r_bull_probs])

    def simulate(self, T):
        # First simulate the bear/bull component
        bb_idx = self.mc.simulate_indices(T)

        realized_returns = np.zeros(T, dtype=int)
        for t, bb in enumerate(bb_idx):
            # Build the discrete random variable of emission
            # probabilities for each period
            r_probs = qe.DiscreteRV(self.emp[bb, :])

            # Draw random returns for period t
            realized_returns[t] = r_probs.draw()[0]

        return bb_idx, realized_returns

    def forward(self, realized_returns):
        r"""
        Computes \alpha(x_t) := P(x_t, y^t)
        """
        T = realized_returns.size
        alphas = np.zeros((T, 2))

        # Solve for period 0 -- Equal probability of starting
        # in bear/bull market
        alphas[0, 0] = self.emp[0, :][realized_returns[0]] * 0.5
        alphas[0, 1] = self.emp[1, :][realized_returns[0]] * 0.5

        for t in range(1, T):
            # Sum over  x_{t-1}
            predictor_bear = 0.0
            predictor_bull = 0.0
            for j in range(2):
                #            alpha(x_{t-1}) P(x_t | x_{t-1})
                predictor_bear += alphas[t-1, j]*self.P[j, 0]
                predictor_bull += alphas[t-1, j]*self.P[j, 1]

            alphas[t, 0] = self.emp[0, :][realized_returns[t]]*predictor_bear
            alphas[t, 1] = self.emp[1, :][realized_returns[t]]*predictor_bull

        return alphas

    def filter_probabilities(self, alphas):
        """
        Normalizes the alpha(x_t) values to compute
        the filtered probabilities
        
        P(x_t | y^t)
        """
        return alphas / np.sum(alphas, axis=1)[:, None]

    def backward(self, realized_returns):
        r"""
        Computes \beta(x_t) := P(y^{t+1:T} | x_t)
        """
        # Allocate memory for our betas
        T = realized_returns.size
        betas = np.zeros((T, 2))

        # Solve for period T -- This is just defined as 1
        betas[-1, 0] = 1
        betas[-1, 1] = 1
        for tp1 in range(T-1, 0, -1):

            # Sum over x_{t+1} (the logs/exps below simply multiply the three terms)
            value_bear = 0.0
            value_bull = 0.0
            for j in range(2):
                _probs = self.emp[j, :]
                _val = (
                    np.log(_probs[realized_returns[tp1]]) + 
                    np.log(betas[tp1, j])
                )

                value_bear += np.exp(_val + np.log(self.P[0, j]))
                value_bull += np.exp(_val + np.log(self.P[1, j]))

            betas[tp1-1, 0] = value_bear
            betas[tp1-1, 1] = value_bull

        return betas

    def smooth_probabilities(self, alphas, betas):
        """
        Uses alpha(x_t) and beta(x_t) to compute the
        smoothed probabilities
        
        P(x_t | y^T)
        """
        return alphas*betas / np.sum(alphas*betas, axis=1)[:, None]

    def ffbs(self, realized_returns):
        # Allocate memory for output
        T = realized_returns.size
        x_sample = np.zeros(T, dtype=int)

        # Compute forward probabilities and filtered probabilities
        alphas = self.forward(realized_returns)
        filtered_probabilities = self.filter_probabilities(alphas)

        # Now sample going backwards, starting from P(x_T | y^T)
        sample_probs = filtered_probabilities[-1, :]
        for t in range(T-1, -1, -1):
            # Sample from current probabilities
            rv = qe.DiscreteRV(sample_probs)
            x_sample[t] = rv.draw()[0]

            # Update sampling probabilities:
            # P(x_{t-1} | x_t, y^T) is proportional to
            # P(x_t | x_{t-1}) P(x_{t-1} | y^{t-1})
            if t > 0:
                sample_probs = (
                    self.P[:, x_sample[t]]*filtered_probabilities[t-1, :]
                )
                sample_probs = sample_probs / sample_probs.sum()

        return x_sample

    def viterbi(self, realized_returns):
        # Allocate memory for the log mus -- log(mu(x_T)) = log(1) = 0
        T = realized_returns.shape[0]
        log_mus = np.zeros((T, 2))

        # Compute log mu values (use log for stability)
        for t in range(T-1, 0, -1):
            # Set mu value for each possible hidden state
            for xtm1 in range(2):
                # Take max over xts to fill in mu
                possible_values = []
                for xt in range(2):
                    possible_values.append(
                        np.log(self.emp[xt, realized_returns[t]]) +
                        np.log(self.P[xtm1, xt]) + 
                        log_mus[t, xt]
                    )

                log_mus[t-1, xtm1] = max(possible_values)

        xt_star = np.zeros(T, dtype=int)
        xt_star[0] = np.argmax(
            np.log(self.emp[:, realized_returns[0]]) +
            np.log(0.5) +
            log_mus[0, :]
        )
        for t in range(1, T):
            xt_star[t] = np.argmax(
                np.log(self.emp[:, realized_returns[t]]) +
                np.log(self.P[xt_star[t-1], :]) +
                log_mus[t, :]
            )

        return xt_star

    def killer_graph(self, bb_idx, realized_returns):
        # Size of data
        T = bb_idx.shape[0]
        tvalues = np.arange(T)

        # Compute alpha, beta, filtered, and smoothed
        alphas = self.forward(realized_returns)
        betas = self.backward(realized_returns)
        filtering_probs = self.filter_probabilities(alphas)
        smoothing_probs = self.smooth_probabilities(alphas, betas)
        
        # Make cool graphs
        fig, ax = plt.subplots(figsize=(10, 8))

        ax.scatter(tvalues, bb_idx, color="k", alpha=0.5)
        ax.annotate("Bull Market", (2, 1.05), color="k")
        ax.annotate("Bear Market", (2, -0.05), color="k")

        ax.scatter(tvalues, (1 + realized_returns)/4, color="DarkBlue", alpha=0.25)
        ax.annotate("Positive Return", (T-5, 0.8), color="DarkBlue")
        ax.annotate("Zero Return", (T-5, 0.55), color="DarkBlue")
        ax.annotate("Negative Return", (T-5, 0.3), color="DarkBlue")

        ax.plot(
            tvalues, filtering_probs[:, 1], color="DarkOrange",
            alpha=0.7, linestyle="--"
        )
        ax.annotate("Filtered Probabilities", (T-10, 0.9), color="DarkOrange")
        ax.plot(
            tvalues, smoothing_probs[:, 1], color="DarkGreen",
            alpha=0.7, linestyle="--"
        )
        ax.annotate("Smoothed Probabilities", (T-10, 0.4), color="DarkGreen")

        ax.set_xlim((0, 1.1*T))
        ax.set_ylim((-0.1, 1.1))

        ax.spines["right"].set_visible(False)
        ax.spines["top"].set_visible(False)

        return fig

    def killer_graph_2(self, bb_idx, realized_returns):
        # Size of data
        T = bb_idx.shape[0]
        tvalues = np.arange(T)

        # Compute n sample paths
        n = 50
        sample_paths = [self.ffbs(realized_returns) for i in range(n)]
        most_likely = self.viterbi(realized_returns)
        
        # Make cool graphs
        fig, ax = plt.subplots(figsize=(10, 8))

        ax.scatter(tvalues, bb_idx, color="k", alpha=0.5)
        ax.annotate("Bull Market", (2, 1.05), color="k")
        ax.annotate("Bear Market", (2, -0.05), color="k")

        for i in range(n):
            ax.plot(
                tvalues, sample_paths[i]/3 + 1/3,
                color="k", alpha=0.1
            )

        ax.scatter(tvalues, most_likely/2 + 0.25, color="Green")
        ax.annotate("Viterbi Path", (T, 0.75), color="Green")

        ax.set_xlim((0, 1.1*T))
        ax.set_ylim((-0.1, 1.1))

        ax.spines["right"].set_visible(False)
        ax.spines["top"].set_visible(False)

        return fig

_Filtering/smoothing probabilities and realized returns_

In [None]:
hmm = HMMBB(0.95, 0.95, np.array([0.7, 0.25, 0.05]), np.array([0.1, 0.3, 0.6]))

bb_idx, realized_returns = hmm.simulate(156)
hmm.killer_graph(bb_idx, realized_returns);

_Sample paths and Viterbi paths_

In [None]:
hmm = HMMBB(0.9, 0.9, np.array([0.75, 0.2, 0.05]), np.array([0.1, 0.3, 0.6]))

bb_idx, realized_returns = hmm.simulate(156)
hmm.killer_graph_2(bb_idx, realized_returns);

### Estimating an HMM!

Up until this point, we've assumed a particular set of parameters.

Is it possible to estimate the parameters of an HMM from the observed data alone? Yes!

We use an algorithm known as the Baum-Welch algorithm -- It is a special case of a set of algorithms called EM algorithms (expectation maximization algorithms).

Roughly the way it works is:

1. Guess parameter values
2. Compute the $\alpha$ and $\beta$ probabilities (Expectation)
3. Use these probabilities to update parameter values (Maximization)

[Wikipedia](https://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm)

#### More formally...

**Step 1**

We can guess any set of parameters that we'd like, but it often makes sense to use whatever information you might have...

In our case,

* we might reflect our belief that the bear/bull markets are at least somewhat persistent by guessing $p_{\text{bear}} = 0.7$ and $p_{\text{bull}} = 0.7$
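* we would also reflect what we know in our guesses of the emission probabilities, e.g. that negative returns are more likely in bear markets and positive returns are more likely in bull markets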

In [None]:
p_bear_0 = 0.7
p_bull_0 = 0.7

emp_0 = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])

**Step 2**

Compute the forward ($\alpha$) and backward ($\beta$) probabilities

**Step 3**

Use these probabilities to update our beliefs about the parameters. Let,

* $\gamma_i(t) \equiv P(x_t=i | y^T, \theta) = \frac{\alpha(x_t=i) \beta(x_t=i)}{\sum_{k=0}^1 \alpha(x_t=k) \beta(x_t=k)}$ (smoothed probabilities)
* $\xi_{ij}(t) \equiv P(x_t=i, x_{t+1}=j | y^T, \theta) = \frac{\alpha(x_t=i) P(x_{t+1}=j | x_t=i) \beta(x_{t+1}=j) P(y_{t+1} | x_{t+1}=j)}{\sum_{k=0}^1 \sum_{w=0}^1 \alpha(x_t=k) P(x_{t+1}=w | x_t=k) \beta(x_{t+1}=w) P(y_{t+1} | x_{t+1}=w)}$

then

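* $p_{\text{bear}} = \frac{\sum_{t=0}^{T-1} \xi_{00}(t)}{\sum_{t=0}^{T-1} \gamma_0(t)}$ (expected number of transitions from 0 to 0, relative to the expected number of periods spent in state 0)
* $p_{\text{bull}} = \frac{\sum_{t=0}^{T-1} \xi_{11}(t)}{\sum_{t=0}^{T-1} \gamma_1(t)}$ (expected number of transitions from 1 to 1, relative to the expected number of periods spent in state 1)
* $b_i(k) = \frac{\sum_{t=0}^T \mathbb{1}_{y_t = k} \gamma_i(t)}{\sum_{t=0}^T \gamma_i(t)}$ (expected fraction of the time spent in state $i$ during which observation $k$ was generated)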
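
One iteration of these updates can be sketched in a few lines of NumPy. The function name `baum_welch_step`, the fixed initial distribution `pi0` and the randomly generated data `y` are assumptions made for illustration. The sketch uses unscaled $\alpha$ and $\beta$ values, so it is only suitable for short samples; it is not how `hmmlearn` implements the algorithm internally.

```python
import numpy as np


def baum_welch_step(P, emp, pi0, y):
    """One EM update of the transition and emission probabilities."""
    T, n = len(y), P.shape[0]

    # E-step: forward and backward probabilities
    alphas = np.zeros((T, n))
    alphas[0] = pi0 * emp[:, y[0]]
    for t in range(1, T):
        alphas[t] = emp[:, y[t]] * (alphas[t - 1] @ P)
    betas = np.ones((T, n))
    for t in range(T - 2, -1, -1):
        betas[t] = P @ (emp[:, y[t + 1]] * betas[t + 1])

    # gamma[t, i] = P(x_t=i | y^T)
    gamma = alphas * betas
    gamma /= gamma.sum(axis=1, keepdims=True)

    # xi[t, i, j] = P(x_t=i, x_{t+1}=j | y^T)
    xi = (alphas[:-1, :, None] * P[None, :, :]
          * (emp[:, y[1:]].T * betas[1:])[:, None, :])
    xi /= xi.sum(axis=(1, 2), keepdims=True)

    # M-step: expected transition and emission frequencies
    P_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    emp_new = np.zeros_like(emp)
    for k in range(emp.shape[1]):
        emp_new[:, k] = gamma[y == k].sum(axis=0)
    emp_new /= gamma.sum(axis=0)[:, None]
    return P_new, emp_new


# Initial guesses (same spirit as Step 1) and hypothetical data
P0 = np.array([[0.7, 0.3], [0.3, 0.7]])
emp0 = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=100)

P1, emp1 = baum_welch_step(P0, emp0, np.array([0.5, 0.5]), y)
print(P1)
print(emp1)
```

Repeating the step until the parameters stop changing gives a local maximum of the likelihood, which is why the starting guess matters.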

In [None]:
import hmmlearn.hmm as hml

In [None]:
hmm = HMMBB(0.8, 0.8, np.array([0.75, 0.2, 0.05]), np.array([0.1, 0.3, 0.6]))
bb_idx, realized_returns = hmm.simulate(5000)

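# The model has two hidden states (bear/bull), so we fit two components.
# Note: in hmmlearn >= 0.3 the model for categorical observations is
# `CategoricalHMM`; `MultinomialHMM` there expects count data instead.
# The estimated states may also come back in the opposite order (bull first).
hmm_res = hml.MultinomialHMM(n_components=2).fit(realized_returns[:, None])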

In [None]:
print("Transition probabilities")
print(f"\tModel: {hmm.P}")
print(f"\tEstimate: {hmm_res.transmat_}")


In [None]:
print("Emission probabilities")
print(f"\tModel: {hmm.emp}")
print(f"\tEstimate: {hmm_res.emissionprob_}")


**Useful References**

* [Blog post by Jonathan Hui](https://jonathan-hui.medium.com/machine-learning-hidden-markov-model-hmm-31660d217a61)
* [Slides by Martin Haugh @ Columbia](http://www.columbia.edu/~mh2078/MachineLearningORFE/HMMs_MasterSlides.pdf)