### Hidden Markov Model
A Hidden Markov Model (HMM) is a statistical model of a system that makes **transitions** between a finite set of **hidden states** over time. The key feature of an HMM is that the states themselves are hidden and cannot be observed directly, but each state produces **observable outcomes**.

![image.png](attachment:33cbd362-b069-4c29-b949-e0fe84c9584d.png)

Here are the main components of a Hidden Markov Model:

- `States / Hidden States (S)`:
    - The possible conditions of the process being modeled (e.g. user interactions with an ad). The system can move from one state to another; such a move is called a **transition**. Hidden states directly affect the observed outcomes.
<br><br>
- `Outcomes / Observations / Emissions (O)`:
    - What we actually observe, and which outcomes are probable in each state (e.g. if it's rainy, I am more likely to be sad than happy).
<br><br>
- `Transition Probabilities`: 
    - Describe how likely the system is to move from one state to another.
<br><br>
- `Emission Probabilities`: 
    - Describe how likely each outcome is when the system is in a given state (e.g. if it's rainy today, how likely is it that I am sad?).
<br><br>
- `Initial State Probability`:
    - The model has an initial probability distribution over the hidden states, representing the likelihood of starting in each state.
    
For example, in the above picture we have:
- 2 hidden states: 'sunny' and 'rainy'
- 6 transitions: a state can also transition into itself
- 3 observations: 'walking', 'shopping' and 'cleaning'
- 6 emission probabilities
- 2 Initial/Starting Probabilities

### What is Transition and Emission Probability Matrix?
**Transition Probability Matrix**

The Transition Probability Matrix, often denoted as `A`, is a square matrix where `Aij` represents the probability of transitioning from hidden state `i` to hidden state `j`. 

Each row of the matrix corresponds to the probabilities of transitioning from a specific state, and the probabilities in each row should sum to 1.

![image.png](attachment:b48d2542-d108-4f60-8bd6-9b738e452e57.png)

- The main assumption is that the **new state depends only on the previous state (no long-term dependencies are captured)**
- Everything starts with the **starting probability S**, which is used for the first step, where there is no previous hidden state


**Emission Probability Matrix**

The Emission Probability Matrix, also called the Observation Probability Matrix and often denoted as `B`, is a matrix where `Bij` represents the probability of observing emission `j` when the system is in hidden state `i`. Each row of the matrix corresponds to a hidden state, and the probabilities in each row should sum to 1.
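
For reference, both matrices can be written in symbols. This is a sketch of the usual notation; the symbols $s_t$ (hidden state at time $t$) and $o_t$ (observation at time $t$) are assumptions introduced here and do not come from the figures.

$$
A_{ij} = P(s_{t+1} = j \mid s_t = i), \qquad \sum_j A_{ij} = 1
$$

$$
B_{ij} = P(o_t = j \mid s_t = i), \qquad \sum_j B_{ij} = 1
$$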

### HMM Example
Let's take a look at an example in which a person's mood depends on the weather. The mood (observed) can be either sad or happy, whereas the weather (hidden) can be either sunny or rainy.

![image.png](attachment:e60abf80-5dcb-42b1-a4b3-fc54803231e6.png)

Consider the following probabilities of being happy or sad based on weather:
- Sunny weather: 0.8 happy, 0.2 sad
- Rainy weather: 0.6 sad, 0.4 happy

Let's also consider the following probabilities for transitions between sunny and rainy:
- Sunny: 0.8 sunny, 0.2 rainy 
- Rainy: 0.6 rainy, 0.4 sunny

![image.png](attachment:bc030a4f-36f1-441a-9591-b60737e59e48.png)
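
The probabilities above can be written down as the two matrices described earlier. The following is a minimal sketch using NumPy; the variable names and the row/column ordering (sunny first, happy first) are choices made here for illustration.

```python
import numpy as np

states = ["sunny", "rainy"]   # hidden states (row order for A and B)
moods = ["happy", "sad"]      # observations (column order for B)

# Transition matrix: A[i, j] = P(tomorrow is state j | today is state i)
A = np.array([
    [0.8, 0.2],   # from sunny: stay sunny, turn rainy
    [0.4, 0.6],   # from rainy: turn sunny, stay rainy
])

# Emission matrix: B[i, j] = P(mood j | weather i)
B = np.array([
    [0.8, 0.2],   # sunny: happy, sad
    [0.4, 0.6],   # rainy: happy, sad
])

# Each row is a probability distribution, so it must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```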

### How to Find Transition or Emission Probabilities?
Instead of assuming these probabilities, we can estimate them from the data we have by counting how often each transition and each outcome occurs.

**Transition Probabilities**
![image.png](attachment:6660a092-006c-4cd8-b86e-1379cb68a156.png)

**Emission Probabilities**
- How many times a person was happy on a sunny day
- How many times a person was sad on a sunny day
- ...

![image.png](attachment:95e66c3c-73dd-45ca-baeb-7ce523402147.png)

- The quality of the estimate depends on how much data we have. The longer the sequences (and the more of them) we observe, the more precise the estimated probabilities are
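
The counting idea can be sketched in a few lines of Python. The `days` list below is made-up illustrative data, not the data from the figures, and the helper name `normalize` is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical history of consecutive days: (weather, mood)
days = [
    ("sunny", "happy"), ("sunny", "happy"), ("rainy", "sad"),
    ("rainy", "happy"), ("sunny", "sad"), ("sunny", "happy"),
]

def normalize(counts):
    """Turn raw counts into probabilities that sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}

transition_counts = defaultdict(Counter)
emission_counts = defaultdict(Counter)

# Emissions: how often each mood was seen under each weather
for weather, mood in days:
    emission_counts[weather][mood] += 1

# Transitions: how often each weather followed each weather
for (prev_weather, _), (next_weather, _) in zip(days, days[1:]):
    transition_counts[prev_weather][next_weather] += 1

transition_probs = {s: normalize(c) for s, c in transition_counts.items()}
emission_probs = {s: normalize(c) for s, c in emission_counts.items()}

print(transition_probs)
print(emission_probs)
```

With only six days the estimates are rough, which is exactly the point made above about needing enough data.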

### How to Predict the Weather Based on Outcome Variables
We can predict the weather based on outcome variables.

**Weather Prediction / No Outcome Variables**

Let's say we have to predict the weather but we don't know the person's mood. To find the probability distribution over sunny and rainy, we use the transition probabilities of the HMM and solve the resulting equations.

![image.png](attachment:1f8742cc-3bd6-4306-9b0d-83637c85a0b6.png)
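
One way to read "solving the equation" is as finding the long-run (stationary) distribution of the weather, i.e. the probabilities that stay unchanged after one more transition. The sketch below assumes that interpretation and uses NumPy's least-squares solver; the variable names are chosen here for illustration.

```python
import numpy as np

# Transition matrix from the example (rows: from sunny, from rainy)
A = np.array([
    [0.8, 0.2],
    [0.4, 0.6],
])

# We look for pi with pi @ A = pi and sum(pi) = 1.
# Rewritten as a linear system: (A^T - I) pi = 0, plus a row of ones for the sum.
n = A.shape[0]
M = np.vstack([A.T - np.eye(n), np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])

pi, *_ = np.linalg.lstsq(M, b, rcond=None)
print(dict(zip(["sunny", "rainy"], pi)))  # sunny is about 2/3, rainy about 1/3
```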

**Weather Prediction / Outcome Variables**

To calculate this kind of probability we need to apply `Bayes' Theorem`. The theorem provides a way to update the probability of an event based on new evidence. When we don't know the mood of a person, the probability of a rainy day is 1/3 and the probability of a sunny day is 2/3. But what do these probabilities become once we learn the mood (sad or happy)?

The probabilities 1/3 (rainy) and 2/3 (sunny) are called **prior probabilities**. On average, we would have 2 sunny days and 1 rainy day out of every 3 days. Besides, we also know the probability of being happy on both sunny and rainy days. If we had 5 days we would have the following picture:

![image.png](attachment:d20d044b-5c4b-438c-9059-04af6f5cb9e5.png)

We can calculate **posterior probabilities for sunny and rainy days based on the mood**

![image.png](attachment:8d138162-607e-45ce-9080-18e235d917db.png)


![image.png](attachment:6851a579-b6c0-4015-bbff-ab997073e6b4.png)
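
The same update can be checked numerically. This is a small sketch of Bayes' Theorem using the priors (2/3 sunny, 1/3 rainy) and the mood probabilities from the example; the function name `posterior` is chosen here for illustration.

```python
# Prior belief about the weather before knowing the mood
prior = {"sunny": 2 / 3, "rainy": 1 / 3}

# P(mood | weather) from the example
likelihood = {
    "sunny": {"happy": 0.8, "sad": 0.2},
    "rainy": {"happy": 0.4, "sad": 0.6},
}

def posterior(mood):
    """Return P(weather | mood) for every weather state."""
    # Numerator of Bayes' Theorem: P(mood | weather) * P(weather)
    joint = {w: prior[w] * likelihood[w][mood] for w in prior}
    # Denominator: total probability of observing this mood
    evidence = sum(joint.values())
    return {w: p / evidence for w, p in joint.items()}

print(posterior("happy"))  # sunny is about 0.8, rainy about 0.2
print(posterior("sad"))    # sunny is about 0.4, rainy about 0.6
```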

### Hidden Markov Model and Markov Chain. Are they Different?
Yes, they are. In a Markov Chain, the **focus is on the transitions between states, and the states themselves are directly observable**, so there is no separate layer of outcomes. A Markov Chain is a stochastic model that describes a sequence of states, where the probability of transitioning from one state to another depends only on the current state and not on the sequence of events that preceded it.

In contrast, an HMM has two layers. The hidden states still form a Markov Chain, but we cannot see them. What we see are the outcomes, and each outcome **depends only on the current hidden state**, not on the earlier ones. Because the states are hidden, we use the whole sequence of observed outcomes to infer them, which is how an HMM models temporal dependencies in the observed data.

**Markov Chain Example**
![image.png](attachment:72ff8032-f0f3-4893-8f0b-fdc109d6c7b5.png)
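
To make the relationship concrete, the sketch below samples from the weather/mood model of the earlier example. The `weather` list on its own is a Markov Chain; an HMM observer would only see the `moods` list. Using 2/3 and 1/3 as starting probabilities is an assumption made here, and the function names are hypothetical.

```python
import random

start = {"sunny": 2 / 3, "rainy": 1 / 3}   # assumed starting probabilities
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
emissions = {
    "sunny": {"happy": 0.8, "sad": 0.2},
    "rainy": {"happy": 0.4, "sad": 0.6},
}

def draw(distribution, rng):
    """Pick one key of the dict at random, weighted by its probability."""
    keys = list(distribution)
    weights = list(distribution.values())
    return rng.choices(keys, weights=weights, k=1)[0]

def sample(n_days, seed=0):
    """Generate a hidden weather sequence and the observed mood sequence."""
    rng = random.Random(seed)
    weather, moods = [], []
    state = draw(start, rng)
    for _ in range(n_days):
        weather.append(state)
        moods.append(draw(emissions[state], rng))   # mood depends only on today's weather
        state = draw(transitions[state], rng)       # tomorrow depends only on today
    return weather, moods

weather, moods = sample(7)
print(weather)  # hidden in an HMM
print(moods)    # what we actually observe
```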

### How to Predict the Weather Based on Mood Sequence
In this case we take a sequence of n observed moods and, based on that sequence, predict the weather on each of those n days.
![image.png](attachment:4feae2d1-92ee-47b5-ae02-e44543bd5c36.png)

Since we have only sunny or rainy weather, there are only two possible hidden states for each day (sunny or rainy).

![image.png](attachment:e7b38dfa-ec83-47dd-9283-63a7d0681d85.png)

Then we have to consider what the weather could have been on those days (consider all combinations).

![image.png](attachment:51c1e837-25f2-4e34-9258-a2ad95433a51.png)

Then we have to choose the combination that maximizes the probability of the weather sequence occurring together with the given mood sequence (Maximum Likelihood Method). **This combination tells us what the weather most likely was for the given mood sequence.**

![image.png](attachment:9e97d1e4-d7bf-4d4b-94d7-777b5753d628.png)
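
The exhaustive search described above can be sketched as follows. The three-day mood sequence and the starting probabilities (2/3 sunny, 1/3 rainy) are assumptions chosen for illustration, not the sequence shown in the figures, and the function names are hypothetical.

```python
from itertools import product

start = {"sunny": 2 / 3, "rainy": 1 / 3}   # assumed starting probabilities
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
emissions = {
    "sunny": {"happy": 0.8, "sad": 0.2},
    "rainy": {"happy": 0.4, "sad": 0.6},
}

def joint_probability(weather_seq, mood_seq):
    """P(weather sequence and mood sequence) under the model."""
    p = start[weather_seq[0]] * emissions[weather_seq[0]][mood_seq[0]]
    for prev, curr, mood in zip(weather_seq, weather_seq[1:], mood_seq[1:]):
        p *= transitions[prev][curr] * emissions[curr][mood]
    return p

def brute_force_decode(mood_seq):
    """Try every weather combination and keep the most probable one."""
    candidates = product(start, repeat=len(mood_seq))   # 2**n combinations
    return max(candidates, key=lambda w: joint_probability(w, mood_seq))

moods = ["happy", "sad", "sad"]   # hypothetical observations
best = brute_force_decode(moods)
print(best, joint_probability(best, moods))
```

For this input the search evaluates 8 combinations; every extra day doubles that number.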

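However, this method is computationally expensive: the number of combinations grows exponentially with the sequence length (2^n for n days with two weather states). The `Viterbi Algorithm` finds the same most likely sequence far more efficiently.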
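
Below is a minimal sketch of the Viterbi idea for the weather/mood model: for each day it keeps only the best path ending in each weather state, instead of every combination. The starting probabilities and the mood sequence are assumptions chosen for illustration, and this plain-probability version is not tuned for long sequences.

```python
start = {"sunny": 2 / 3, "rainy": 1 / 3}   # assumed starting probabilities
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}
emissions = {
    "sunny": {"happy": 0.8, "sad": 0.2},
    "rainy": {"happy": 0.4, "sad": 0.6},
}

def viterbi(mood_seq):
    """Return (probability, path) of the most likely weather sequence."""
    # best[s] = (probability, path) of the best path that ends in state s
    best = {s: (start[s] * emissions[s][mood_seq[0]], [s]) for s in start}

    for mood in mood_seq[1:]:
        new_best = {}
        for curr in start:
            # Best way to arrive in `curr` from any previous state
            prob, path = max(
                ((best[prev][0] * transitions[prev][curr], best[prev][1])
                 for prev in start),
                key=lambda item: item[0],
            )
            new_best[curr] = (prob * emissions[curr][mood], path + [curr])
        best = new_best

    return max(best.values(), key=lambda item: item[0])

prob, path = viterbi(["happy", "sad", "sad"])   # hypothetical observations
print(path, prob)
```

The work per day is proportional to the square of the number of states, so the total cost grows linearly with the sequence length. For long sequences, summing log-probabilities instead of multiplying probabilities avoids numerical underflow.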

### What is the Main Idea of HMM?
An HMM lets us find the sequence (combination) of hidden states that maximizes the probability of getting the observed outcome variables, i.e. the most likely explanation for the outcomes we have seen.


### HMM Implementation
**Part of Speech Tagging (NLP)**

Take a text and assign a part of speech (e.g. noun, verb, etc.) to each word in that text. In this case the words are the observed variables and the parts of speech are the hidden states. For example, for the sentence `I eat pizza`, an emission probability answers the question: given that the hidden state is a verb, what is the probability that the word is `eat`?
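
As a rough illustration, such an emission probability can be estimated by counting in a tagged corpus. The tiny corpus and the tag names below are made up for this sketch and are not taken from the linked notebook.

```python
from collections import Counter, defaultdict

# Hypothetical tagged sentences: (word, part of speech)
tagged_sentences = [
    [("I", "PRON"), ("eat", "VERB"), ("pizza", "NOUN")],
    [("I", "PRON"), ("like", "VERB"), ("pizza", "NOUN")],
    [("you", "PRON"), ("eat", "VERB"), ("pasta", "NOUN")],
]

# emission_counts[tag][word] = how often `word` appeared with `tag`
emission_counts = defaultdict(Counter)
for sentence in tagged_sentences:
    for word, tag in sentence:
        emission_counts[tag][word.lower()] += 1

def emission_prob(word, tag):
    """Estimate P(word | tag) from the counts; 0.0 for an unseen tag."""
    counts = emission_counts[tag]
    total = sum(counts.values())
    return counts[word.lower()] / total if total else 0.0

print(emission_prob("eat", "VERB"))  # 2 of the 3 verbs in the toy corpus are "eat"
```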

**Example**
- https://www.kaggle.com/code/nilaychauhan/part-of-speech-tagging-with-hidden-markov-models

### Model Pros and Cons 

**Pros**
- Sequential data can be modelled with some limitations (assumptions)

**Cons**
- The current state depends only on the previous state (this is not always true in practice)
- HMM requires a fixed set of hidden states. If the system changes quickly (new hidden states appear), an HMM may fail.
- Usually applied to discrete data
- Some optimization algorithms may not reach the global optimum (they can end up in a local optimum instead)

### Reference
- https://www.youtube.com/watch?v=kqSzLo9fenk&t=797s