This notebook implements the **solutions** presented in the [README](https://github.com/chauvinSimon/hmm_for_autonomous_driving/blob/master/README.md).

Some questions are addressed using the [hmmlearn](https://hmmlearn.readthedocs.io/en/latest/index.html#) python package.
- To install it on **Windows**, you may want to get an already [compiled version](https://www.lfd.uci.edu/~gohlke/pythonlibs/#hmmlearn).

In [1]:
from hmmlearn import hmm
import numpy as np
from collections import Counter

In [2]:
seed = 93
np.random.seed(seed)

In [3]:
states = ["left_lane", "right_lane"]
observations = ["low_speed", "high_speed"]

In [4]:
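# Note: recent hmmlearn releases (0.3 and later) expose this categorical-emission model as `hmm.CategoricalHMM`.
discrete_model = hmm.MultinomialHMM(n_components=2,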
                                    algorithm='viterbi',  # decoder algorithm.
                                    random_state=seed,
                                    n_iter=10,
                                    tol=0.01  # EM convergence threshold (gain in log-likelihood)
                                   )

# [Q1](https://github.com/chauvinSimon/hmm_for_autonomous_driving/blob/master/README.md#q1) - How to easily estimate the parameters of our HMM?
Here are the results of the estimation procedure based on **counting from the labelled data**. These estimates constitute the **MLE**.

This question gives the chance to mention the **stationary state distribution**, which can be obtained in two ways:
- Using **sampling**.
- Using the **transition matrix**.
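
The counting procedure mentioned above can be sketched as follows. This is only a minimal illustration, not the computation used in the README: the labelled sequence (`labelled_states`, `labelled_obs`) is hypothetical toy data, and all variable names are introduced here for the example.

```python
import numpy as np

# Hypothetical labelled data (NOT taken from the README): one entry per time step.
# States: 0 = left_lane, 1 = right_lane. Observations: 0 = low_speed, 1 = high_speed.
labelled_states = np.array([1, 1, 0, 0, 1, 1, 1, 0, 1, 1])
labelled_obs = np.array([0, 0, 1, 0, 0, 0, 1, 1, 0, 0])

n_states, n_obs = 2, 2

# Count transitions state_t -> state_t+1 and emissions state_t -> obs_t.
transition_counts = np.zeros((n_states, n_states))
emission_counts = np.zeros((n_states, n_obs))
for s, s_next in zip(labelled_states[:-1], labelled_states[1:]):
    transition_counts[s, s_next] += 1
for s, o in zip(labelled_states, labelled_obs):
    emission_counts[s, o] += 1

# Normalising each row turns the counts into maximum-likelihood estimates.
transmat_mle = transition_counts / transition_counts.sum(axis=1, keepdims=True)
emission_mle = emission_counts / emission_counts.sum(axis=1, keepdims=True)
# Relative state frequencies, used here as an estimate of the state prior.
state_freq = np.bincount(labelled_states, minlength=n_states) / len(labelled_states)

print(transmat_mle)
print(emission_mle)
print(state_freq)
```

Each row of the two estimated matrices sums to 1, which is what `hmmlearn` expects for `transmat_` and `emissionprob_`.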

In [5]:
discrete_model.startprob_ = np.array(
    [1/3, 2/3]
)
discrete_model.transmat_ = np.array([
    [0.6, 0.4],  # P(state_t+1|state_t=state_0)
    [0.2, 0.8]]  # P(state_t+1|state_t=state_1)
)
discrete_model.emissionprob_ = np.array(
    [[0.4, 0.6],  # P(obs|state_0)
     [0.8, 0.2]]  # P(obs|state_1)
)

### Stationary distributions from sampling
Generate random samples from the HMM model, using the derived parameters.

A **large sample** is drawn.
- Counting occurrences of each state should approximate the **stationary state distribution**.
- Expected: `1/3` for state `0` vs `2/3` for state `1`.

In [6]:
sample_obs, sample_states = discrete_model.sample(100000)
print(Counter(sample_states.flatten()))
print(Counter(sample_obs.flatten()))

Counter({1: 66134, 0: 33866})
Counter({0: 66520, 1: 33480})


### Stationary distribution from the transition matrix
The **stationary distribution** is a **left eigenvector** (as opposed to the usual right eigenvectors) of the **transition matrix**.

In [7]:
# Compute the stationary distribution of states.
eigvals, eigvecs = np.linalg.eig(discrete_model.transmat_.T)
eigvec = np.real_if_close(eigvecs[:, np.argmax(eigvals)])

# normalisation
stat_distr = eigvec / eigvec.sum()
print(stat_distr)

[0.33333333 0.66666667]
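
As a cross-check, the same distribution can be approached by repeatedly applying the transition matrix to an arbitrary starting distribution (power iteration). This is a minimal sketch that assumes only the transition matrix defined above; the variable names are illustrative.

```python
import numpy as np

# Transition matrix from the notebook: row i holds P(state_t+1 | state_t = i).
transmat = np.array([[0.6, 0.4],
                     [0.2, 0.8]])

# Start from an arbitrary distribution and apply pi <- pi @ A repeatedly.
pi = np.array([1.0, 0.0])
for _ in range(100):
    pi = pi @ transmat

print(pi)
# Fixed-point check: a stationary distribution satisfies pi = pi @ A.
print(np.allclose(pi, pi @ transmat))
```

Convergence is fast here because the second eigenvalue of the matrix (`0.4`) is well below 1.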


# [Q2](https://github.com/chauvinSimon/hmm_for_autonomous_driving/blob/master/README.md#q2) - Given a single speed observation, what is the probability for the car to be in each of the two lanes?

Here, two methods of `hmmlearn` are used:
- The built-in [prediction](https://hmmlearn.readthedocs.io/en/latest/api.html#hmmlearn.hmm.MultinomialHMM.predict) method.
- The built-in [decoding](https://hmmlearn.readthedocs.io/en/latest/api.html#hmmlearn.hmm.MultinomialHMM.decode) method.

### Posterior probabilities
In the [README](README.md), the **posterior conditional probabilities** were computed using Bayes' rule. For instance:
- p(`left_lane`|`low_speed`) = p(`low_speed`|`left_lane`) * p(`left_lane`) / p(`low_speed`)

**Posterior probabilities** for each state can be computed with `predict_proba()` and corresponding most likely states with `predict()`:
- P(`left lane`  | `high speed`) = `1/3` `*` `0.6` / `1/3` = `0.6`
- P(`left lane`  | `low speed`)  = `1/3` `*` `0.4` / `2/3` = `0.2`
- P(`right lane` | `high speed`) = `2/3` `*` `0.2` / `1/3` = `0.4`
- P(`right lane` | `low speed`)  = `2/3` `*` `0.8` / `2/3` = `0.8`
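
The four values above can be reproduced directly with Bayes' rule, without `hmmlearn`. The sketch below assumes the prior and emission matrix defined earlier in this notebook; the names `joint`, `evidence` and `posterior` are introduced here for illustration.

```python
import numpy as np

start_prob = np.array([1/3, 2/3])          # P(lane)
emit_prob = np.array([[0.4, 0.6],          # P(obs | left_lane)
                      [0.8, 0.2]])         # P(obs | right_lane)

# joint[i, o] = P(lane = i, obs = o) = P(obs = o | lane = i) * P(lane = i)
joint = emit_prob * start_prob[:, np.newaxis]
# evidence[o] = P(obs = o), obtained by summing the joint over the lanes
evidence = joint.sum(axis=0)
# posterior[i, o] = P(lane = i | obs = o)
posterior = joint / evidence

print(evidence)
print(posterior)
```

Column `o` of `posterior` holds the distribution over lanes given observation `o`, so each column sums to 1.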

In [8]:
for i in range(len(observations)):
    p = discrete_model.predict_proba(np.array([[i]]))
    most_likely = discrete_model.predict(np.array([[i]]))
    print("p(state|'{}') = {} => most likely state is '{}'".format(
        observations[i], p, states[int(most_likely[0])]))

p(state|'low_speed') = [[0.2 0.8]] => most likely state is 'right_lane'
p(state|'high_speed') = [[0.6 0.4]] => most likely state is 'left_lane'


### Joint probabilities
A second approach is to compute the **joint probabilities**, as used in the **Viterbi algorithm**:
- p(`left_lane`,`low_speed`) = p(`low_speed`|`left_lane`) * p(`left_lane`)
- This corresponds to **alpha(`left_lane`, `t=1`)** for observation [`low_speed`].

Joint probabilities:
- P(`left lane`  , `high speed`) = `1/3` `*` `0.6` = `3/15`  # *highest alpha for* `obs` = [`high_speed`] 
- P(`right lane` , `high speed`) = `2/3` `*` `0.2` = `2/15`
- P(`left lane`  , `low speed`)  = `1/3` `*` `0.4` = `2/15`
- P(`right lane` , `low speed`)  = `2/3` `*` `0.8` = `8/15`  # *highest alpha for* `obs` = [`low_speed`] 

In [9]:
# The Viterbi decoding algorithm uses the `alpha*` values.
# And the first `alpha*` are `alpha` values.
# Hence, the decoding method should return the state with the **highest joint probability**.
obs_0 = np.array([[0]]).T
obs_1 = np.array([[1]]).T
# Log probability of the maximum likelihood path through the HMM
logprob_0, state_0 = discrete_model.decode(obs_0)  # 8/15 = alpha[`low_speed`](`right_lane`)
logprob_1, state_1 = discrete_model.decode(obs_1)  # 3/15 = alpha[`high_speed`](`left_lane`)

# `.item()` extracts the Python scalar from a one-element array.
print("{} -> {}".format(states[state_0.item()], observations[obs_0.item()]))
print("prob = {}".format(np.exp(logprob_0)))
print("---")
print("{} -> {}".format(states[state_1.item()], observations[obs_1.item()]))
print("prob = {}".format(np.exp(logprob_1)))

right_lane -> low_speed
prob = 0.5333333333333333
---
left_lane -> high_speed
prob = 0.19999999999999998


# [Q3](https://github.com/chauvinSimon/hmm_for_autonomous_driving/blob/master/README.md#q3) - What is the probability to observe a particular sequence of speed measurements?

The presented solution uses the [scoring](https://hmmlearn.readthedocs.io/en/latest/api.html#hmmlearn.hmm.MultinomialHMM.predict) method of `hmmlearn`.

The **marginal probabilities of an observation sequence** can be found by:
- **The sum, at any time step `t`, of the product `alpha` \* `beta`** (done in the next question).
- In `hmmlearn`, this is given by the `score()` method.

### For a single observation:
- P(`low speed`) = P(`left lane`, `low speed`) + P(`right lane`, `low speed`) = `2/15` + `8/15` = `2/3`
- P(`high speed`) = P(`left lane`, `high speed`) + P(`right lane`, `high speed`) = `3/15` + `2/15` = `1/3`

In [10]:
# For a single observation
# Compute the log probability under the model.
for i in range(len(observations)):
    p = np.exp(discrete_model.score(np.array([[i]])))
    print("p({}) = {} ".format(observations[i], p))

print("---")
# Compute the log probability under the model and compute posteriors.
for i in range(len(observations)):
    p = (discrete_model.score_samples(np.array([[i]])))
    print("p({}) = {}\n   posterior p(state|{}) = {}".format(observations[i], np.exp(p[0]), observations[i],  p[1]))

p(low_speed) = 0.6666666666666667 
p(high_speed) = 0.33333333333333337 
---
p(low_speed) = 0.6666666666666667
   posterior p(state|low_speed) = [[0.2 0.8]]
p(high_speed) = 0.33333333333333337
   posterior p(state|high_speed) = [[0.6 0.4]]


### For the example used in the [README](https://github.com/chauvinSimon/hmm_for_autonomous_driving/blob/master/README.md): [`low_speed`, `high_speed`, `low_speed`]

In [11]:
obs_sequence = np.array([[0, 1, 0]]).T
p = np.exp(discrete_model.score(obs_sequence))
print("p({}) = {} ".format([observations[i] for i in obs_sequence.T[0]], p))

p(['low_speed', 'high_speed', 'low_speed']) = 0.13184 
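
For such a short sequence, the value returned by `score()` can be cross-checked by brute force: enumerate every possible lane sequence and sum the joint probabilities. This is a minimal sketch using the parameters set earlier; it is exponential in the sequence length and only meant as a sanity check.

```python
import itertools
import numpy as np

start_prob = np.array([1/3, 2/3])
transmat = np.array([[0.6, 0.4], [0.2, 0.8]])
emit_prob = np.array([[0.4, 0.6], [0.8, 0.2]])
obs_seq = [0, 1, 0]  # low_speed, high_speed, low_speed

total = 0.0
for path in itertools.product(range(2), repeat=len(obs_seq)):
    # Joint probability of this lane path and the observation sequence.
    p = start_prob[path[0]] * emit_prob[path[0], obs_seq[0]]
    for t in range(1, len(obs_seq)):
        p *= transmat[path[t - 1], path[t]] * emit_prob[path[t], obs_seq[t]]
    total += p

print(total)
```

The sum over the 8 paths should agree with the `0.13184` printed above, up to floating-point rounding.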


# [Q4](https://github.com/chauvinSimon/hmm_for_autonomous_driving/blob/master/README.md#q4) - Given a sequence of speed observations, what is the most likely current lane?

This question about **filtering** gives the chance to introduce the **Forward Algorithm** (and the **Backward Algorithm**).
- First, the **Forward Algorithm** is implemented to build the `alpha` table.
- Then, **filtering** can be completed by normalizing `alpha` values (using marginals).

From the `alpha` and `beta` tables, **marginal probabilities** are computed.
- For the **full observation sequence**.
- For **sub-sequences** of the observation sequence.

Finally, as in the [README](README.md#q4), multiple inference techniques are presented.
- **Filtering**
- **Smoothing**
- **Prediction**

In [12]:
n_state = discrete_model.n_components
start_prob = discrete_model.startprob_
emit_prob = discrete_model.emissionprob_
transmat_prob = discrete_model.transmat_

### Forward algorithm

Used to build the `alpha` table for an observation sequence.

Initialization:
- `alpha`(`i`, `t=0`) = p(`obs_0`, `lane_i`) = p(`obs_0`|`lane_i`) * p(`lane_i`)

Recursion:
- `alpha`(`i`,`t+1`) = [emission at `t+1`] * SUM[over `state_j`][transition `t`->`t+1`*`alpha`(`j`,`t`)]

In [13]:
def forward_algo(obs_seq):
    alpha = np.zeros((len(obs_seq), n_state))
    alpha[0] = np.transpose(emit_prob)[obs_seq[0]] * start_prob
    for t in range(alpha.shape[0]-1):
        alpha[t+1] = np.transpose(emit_prob)[obs_seq[t+1]] * np.dot(alpha[t], transmat_prob)
    return alpha

In [14]:
obs_sequence = np.array([[0, 1, 0]]).T
alpha = forward_algo(obs_sequence)
print(alpha)

[[0.13333333 0.53333333]
 [0.112      0.096     ]
 [0.03456    0.09728   ]]


### Backward Algorithm
Used to build the `beta` table for an observation sequence.

Initialization:
- `beta`(`i`, `t=T`) = `1`

Recursion:
- `beta`(`i`,`t`) = SUM[over `state_j`][transition `i`->`j` * emission of `obs_t+1` in `state_j` * `beta`(`j`,`t+1`)]

In [15]:
def backward_algo(obs_seq):
    beta = np.zeros((len(obs_seq), n_state))
    beta[len(obs_seq) - 1] = np.ones((n_state))
    for t in reversed(range(len(obs_seq)-1)):
        beta[t] = np.dot(beta[t + 1] * np.transpose(emit_prob)[obs_seq[t + 1]], np.transpose(transmat_prob))
    return beta

In [16]:
beta = backward_algo(obs_sequence)
print(beta)

[[0.2592 0.1824]
 [0.56   0.72  ]
 [1.     1.    ]]


### Marginals for observation sub-sequences
Summing over the `t`-th `alpha` row gives the probability of observing the sub-sequence `obs_sequence`[`:t+1`], i.e. the first `t+1` observations.

- `sub_seq_marginals`[`i`] = p(`obs_0` ... `obs_i`) = sum over `k` of `alpha`(`t=i`, `lane=k`)

In [17]:
sub_seq_marginals = np.sum(alpha, axis=1)
print(sub_seq_marginals)

[0.66666667 0.208      0.13184   ]


### Marginal of the full observation sequence
The marginal probability of the observation sequence can be obtained
by summing the product `alpha`*`beta` at any `t`.

In [18]:
prod = np.multiply(alpha, beta)
marginals = np.sum(prod, axis=1)
print(marginals)

[0.13184 0.13184 0.13184]


### Filtering
Filtering can be obtained by normalizing the last `alpha` values (using the marginal probability of the full observation sequence).

In [19]:
print(alpha[-1]/marginals[-1])

[0.26213592 0.73786408]
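
Filtered distributions for every time step, not only the last one, can be obtained by normalising inside the forward recursion. The sketch below is one possible way to do it, assuming the same parameters as above; `filter_all_steps` is a hypothetical helper name, not part of `hmmlearn`.

```python
import numpy as np

start_prob = np.array([1/3, 2/3])
transmat = np.array([[0.6, 0.4], [0.2, 0.8]])
emit_prob = np.array([[0.4, 0.6], [0.8, 0.2]])

def filter_all_steps(obs_seq):
    """Return P(lane_t | obs_0..obs_t) for every t, normalising at each step."""
    filtered = np.zeros((len(obs_seq), len(start_prob)))
    belief = start_prob
    for t, obs in enumerate(obs_seq):
        if t > 0:
            belief = belief @ transmat          # propagate one step through the transition model
        belief = belief * emit_prob[:, obs]     # weight by the emission likelihood
        belief = belief / belief.sum()          # renormalise to a distribution
        filtered[t] = belief
    return filtered

filtered = filter_all_steps([0, 1, 0])
print(filtered)
```

The last row should match the filtering result printed above. Normalising at every step also keeps the values from underflowing on long sequences.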


### Prediction
The `pi` variable is introduced:
- `pi`(`lane i`, `time t+k+1`) = SUM over `state` `j` of [transition `j`->`i` * `pi`(`lane j`, `time t+k`)]

In [20]:
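def pi_algo(k, alpha, beta):
    # pi[0] is the filtered distribution p(lane_t | obs_1..obs_t),
    # i.e. the last row of alpha normalised to sum to 1.
    # `beta` is not needed for prediction; it is kept so the call below is unchanged.
    pi = np.zeros((k, n_state))
    pi[0] = alpha[-1] / np.sum(alpha[-1])
    for t in range(pi.shape[0]-1):
        pi[t+1] = np.dot(pi[t], transmat_prob)
    return pi

In [21]:
pi = pi_algo(4, alpha, beta)
print(pi)

[[0.26213592 0.73786408]
 [0.30485437 0.69514563]
 [0.32194175 0.67805825]
 [0.3287767  0.6712233 ]]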


### Smoothing
The `gamma` variable is introduced:
- `gamma`(`k`, `t`) = p(`lane(t)`=`k` | `[observation sequence (1...T)]`)
- for `t` in [`1` ... `T`].

In [22]:
def gamma_algo(alpha, beta):
    prod = np.multiply(alpha, beta)
    marginals = np.sum(prod, axis=1)
    gamma = np.divide(prod, marginals[:, np.newaxis])
    return gamma

In [23]:
gamma = gamma_algo(alpha, beta)
print(gamma)

[[0.26213592 0.73786408]
 [0.47572816 0.52427184]
 [0.26213592 0.73786408]]


# [Q5](https://github.com/chauvinSimon/hmm_for_autonomous_driving/blob/master/README.md#q5) - Given a sequence of speed observations, what is the most likely underlying lane sequence?

### Decoding
The **inferred optimal hidden states** can be obtained by calling the [`decode()`](https://hmmlearn.readthedocs.io/en/latest/api.html#hmmlearn.hmm.MultinomialHMM.decode) method.

In [24]:
obs_sequence = np.array([[0, 1, 0]]).T
# Find most likely state sequence corresponding to obs_sequence
logprob, state_sequence = discrete_model.decode(obs_sequence)

# Print the decoded lane for each observation, then the (log) probability of the decoded path
for o, s in zip(obs_sequence.T[0], state_sequence):
    print("{} -> {}".format(states[int(s)], observations[int(o)]))
print("\nprob = {}\nlog_prob = {}".format(np.exp(logprob), logprob))

right_lane -> low_speed
right_lane -> high_speed
right_lane -> low_speed

prob = 0.05461333333333335
log_prob = -2.9074772257991035
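
To make explicit what the decoder computes, here is a minimal NumPy sketch of the Viterbi recursion for the same parameters. It is an illustration only and does not claim to mirror the internals of `hmmlearn`; the function name `viterbi` and the variables `delta` and `backpointer` are introduced here.

```python
import numpy as np

start_prob = np.array([1/3, 2/3])
transmat = np.array([[0.6, 0.4], [0.2, 0.8]])
emit_prob = np.array([[0.4, 0.6], [0.8, 0.2]])

def viterbi(obs_seq):
    """Return the most likely state path and its joint probability."""
    n_steps, n_states = len(obs_seq), len(start_prob)
    delta = np.zeros((n_steps, n_states))  # best path probability ending in each state
    backpointer = np.zeros((n_steps, n_states), dtype=int)
    delta[0] = start_prob * emit_prob[:, obs_seq[0]]
    for t in range(1, n_steps):
        # candidates[i, j] = delta[t-1, i] * P(state j | state i)
        candidates = delta[t - 1][:, np.newaxis] * transmat
        backpointer[t] = candidates.argmax(axis=0)
        delta[t] = candidates.max(axis=0) * emit_prob[:, obs_seq[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(backpointer[t, path[-1]]))
    path.reverse()
    return path, float(delta[-1].max())

path, prob = viterbi([0, 1, 0])
print(path, prob)
```

For [`low_speed`, `high_speed`, `low_speed`], this should recover the same lane sequence and path probability as `decode()` above.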


# [Q6](https://github.com/chauvinSimon/hmm_for_autonomous_driving/blob/master/README.md#q6) - How to estimate the parameters of our HMM when no state annotations are present in the training data?

### Unsupervised Learning
**The HMM parameters** are estimated based on some observation data using the [`fit()`](https://hmmlearn.readthedocs.io/en/latest/api.html#hmmlearn.hmm.MultinomialHMM.fit) method. It implements the **Baum-Welch algorithm**.
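
To give an idea of what one Baum-Welch (EM) iteration involves, here is a minimal NumPy sketch for a single observation sequence. It is an illustration under simplifying assumptions (no scaling, so it is only suitable for short sequences) and not the `hmmlearn` implementation; the helper names and the toy observation sequence are hypothetical.

```python
import numpy as np

def forward(obs, start, trans, emit):
    alpha = np.zeros((len(obs), len(start)))
    alpha[0] = start * emit[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
    return alpha

def backward(obs, trans, emit):
    beta = np.ones((len(obs), trans.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(obs, start, trans, emit):
    """One EM iteration. Returns updated parameters and the likelihood before the update."""
    obs = np.asarray(obs)
    alpha = forward(obs, start, trans, emit)
    beta = backward(obs, trans, emit)
    likelihood = alpha[-1].sum()
    # E-step: gamma[t, i] = P(state_t = i | obs)
    gamma = alpha * beta / likelihood
    # xi[t, i, j] = P(state_t = i, state_t+1 = j | obs)
    xi = (alpha[:-1, :, np.newaxis] * trans[np.newaxis, :, :]
          * (emit[:, obs[1:]].T * beta[1:])[:, np.newaxis, :]) / likelihood
    # M-step: re-estimate the parameters from expected counts.
    new_start = gamma[0]
    new_trans = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, np.newaxis]
    new_emit = np.zeros_like(emit)
    for k in range(emit.shape[1]):
        new_emit[:, k] = gamma[obs == k].sum(axis=0)
    new_emit /= gamma.sum(axis=0)[:, np.newaxis]
    return new_start, new_trans, new_emit, likelihood

# Usage on a hypothetical short observation sequence, starting from the notebook's parameters.
start = np.array([1/3, 2/3])
trans = np.array([[0.6, 0.4], [0.2, 0.8]])
emit = np.array([[0.4, 0.6], [0.8, 0.2]])
obs = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

likelihoods = []
for _ in range(5):
    start, trans, emit, likelihood = baum_welch_step(obs, start, trans, emit)
    likelihoods.append(likelihood)
print(likelihoods)
```

The recorded likelihoods should never decrease from one iteration to the next, which is the defining property of EM.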

In [25]:
# The `fit()` method updates the model in place, so copy the original parameters for comparison.
old_transmat = discrete_model.transmat_.copy()
old_emissionprob = discrete_model.emissionprob_.copy()
old_startprob = discrete_model.startprob_.copy()

#### Random unbalanced sampling

In [26]:
# On purpose, one could generate an unbalanced sampling.
biased_sampling = np.random.choice([0, 1], size=(100,), p=[1./10, 9./10])
print("biased_sampling = \n{}".format(biased_sampling))

obs_sequence = np.array([biased_sampling]).T
new_model = discrete_model.fit(obs_sequence)

biased_sampling = 
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0]


### Interpretation

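Because each observation is **sampled independently**, two consecutive observations are **independent**.
- Therefore the two rows of the **new transition matrix** are almost identical and close to **uniform**: the next state barely depends on the current one.

The significant **imbalance in the observation distribution** is captured by the **emission model**: both states now emit `high_speed` with a probability above `0.8`.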

In [27]:
print("---\nold_transmat = \n{}".format(old_transmat))
print("new_transmat = \n{}".format(new_model.transmat_))
print("---\nold_emissionprob = \n{}".format(old_emissionprob))
print("new_emissionprob = \n{}".format(new_model.emissionprob_))
print("---\nold_startprob = \n{}".format(old_startprob))
print("new_startprob = \n{}".format(new_model.startprob_))

---
old_transmat = 
[[0.6 0.4]
 [0.2 0.8]]
new_transmat = 
[[0.45266665 0.54733335]
 [0.43276709 0.56723291]]
---
old_emissionprob = 
[[0.4 0.6]
 [0.8 0.2]]
new_emissionprob = 
[[0.19693203 0.80306797]
 [0.07730514 0.92269486]]
---
old_startprob = 
[0.33333333 0.66666667]
new_startprob = 
[0.34208472 0.65791528]
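
One way to check that the fitted model has captured the imbalance is to compute the long-run observation distribution it implies and compare it with the sample, which contains 13 `low_speed` values out of 100. The sketch below hard-codes the fitted parameters as printed above (rounded to 8 decimals); the variable names are illustrative.

```python
import numpy as np

# Values copied from the printed output above (rounded to 8 decimals).
new_transmat = np.array([[0.45266665, 0.54733335],
                         [0.43276709, 0.56723291]])
new_emissionprob = np.array([[0.19693203, 0.80306797],
                             [0.07730514, 0.92269486]])

# Stationary state distribution: left eigenvector for the largest eigenvalue.
eigvals, eigvecs = np.linalg.eig(new_transmat.T)
stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stationary = stationary / stationary.sum()

# Long-run observation distribution implied by the fitted model.
obs_distr = stationary @ new_emissionprob
print(obs_distr)
```

The implied probability of `low_speed` should come out close to the empirical frequency of `0.13`.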


# Todo
Here are some **ideas for future work**:
- **Evaluate the trained model** on a **test set**.
    - After training, it would be interesting to **assess the new model** by **scoring observation sequences drawn from the same distribution**.
    - This is similar to the evaluation step of supervised learning approaches.
- Use **real data**.
    - The [NGSIM US-101 highway dataset](https://catalog.data.gov/dataset/next-generation-simulation-ngsim-vehicle-trajectories) contains a collection of detailed **vehicle trajectory data** recorded on a **5-lane highway**.
    - It could be interesting to **derive HMM parameters** based on this data.