# T-intersection QMDP
In this part, you will formulate the T-intersection problem as a POMDP and solve it with QMDP.

In [None]:
%%capture
# @markdown Run this cell to install dependencies.

%cd /content
!git clone https://github.com/buzi-princeton/MDP.git

In [None]:
from MDP.mdp import MDP
import numpy as np
import matplotlib.pyplot as plt
import os
from scipy.stats import beta
from scipy.stats import binom

## MDP formulation
First, use the **MDP** class to build the MDP that underlies the T-intersection POMDP.
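The internals of the **MDP** class are not shown here. Judging from the `np.einsum("jia,j->ia", P, V_star)` call used later, `get_mdp()` appears to return the transition tensor indexed as `P[next_state, current_state, action]` and the reward as `R[state, action]`; that layout is an assumption. The plain-NumPy sketch below builds arrays in that assumed layout from the same numbers as `TIntersection` and checks that every (state, action) pair defines a valid distribution over next states. `build_t_intersection` is a hypothetical helper.

```python
import numpy as np

# Hypothetical index conventions: state 0 = G1, 1 = G2 (the other car's goal);
# action 0 = forward, 1 = stop.
def build_t_intersection():
    """Return (P, R) with the assumed layout P[s_next, s, a] and R[s, a]."""
    P = np.zeros((2, 2, 2))
    # forward: P[:, s, 0] is the distribution over next states from state s
    P[:, 0, 0] = [0.8, 0.2]   # from G1
    P[:, 1, 0] = [0.1, 0.9]   # from G2
    # stop: P[:, s, 1]
    P[:, 0, 1] = [0.5, 0.5]   # from G1
    P[:, 1, 1] = [0.05, 0.95] # from G2
    R = np.array([[10.0, -1.0],     # G1: forward, stop
                  [-100.0, -1.0]])  # G2: forward, stop
    return P, R

P, R = build_t_intersection()
# Summing over the next-state axis should give 1 for every (state, action) pair.
print(P.sum(axis=0))
```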

In [None]:
# Create the T-intersection MDP
class TIntersection(MDP):
  def __init__(self):
    self.gam = 0.9
    self.goal = ["G1", "G2"]
    self.actions = ["forward", "stop"]
    super().__init__(states=[self.goal], actions=self.actions)

    self.populate_data()

  def populate_data(self):
    # use self.add_route(s, a, s', p=...) from MDP to add a transition with probability p
    # use self.add_reward(s, a, r) from MDP to add a reward
    ####
    # P(s' | s, forward)
    self.add_route(["G1"], "forward", ["G1"], p=0.8)
    self.add_route(["G1"], "forward", ["G2"], p=0.2)
    self.add_route(["G2"], "forward", ["G1"], p=0.1)
    self.add_route(["G2"], "forward", ["G2"], p=0.9)

    # P(s' | s, stop)
    self.add_route(["G1"], "stop", ["G1"], p=0.5)
    self.add_route(["G1"], "stop", ["G2"], p=0.5)
    self.add_route(["G2"], "stop", ["G1"], p=0.05)
    self.add_route(["G2"], "stop", ["G2"], p=0.95)

    # R(s, forward): +10 if the other car heads to G1, -100 (collision risk) if it heads to G2
    self.add_reward(["G1"], "forward", 10)
    self.add_reward(["G2"], "forward", -100)
    # R(s, stop): small cost for waiting
    self.add_reward(["G1"], "stop", -1)
    self.add_reward(["G2"], "stop", -1)
    ####

Next, reuse your value iteration code to solve the underlying MDP.
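For reference, here is a minimal, self-contained sketch of tabular value iteration on the same T-intersection numbers, written against plain NumPy arrays instead of the **MDP** class. It assumes the layout `P[s_next, s, a]` and `R[s, a]` implied by the `einsum` call in the cell below; `value_iteration_arrays` is a hypothetical name.

```python
import numpy as np

def value_iteration_arrays(P, R, gamma=0.9, threshold=1e-3):
    """Tabular value iteration with P[s_next, s, a] and R[s, a].

    Returns (V, greedy policy, number of iterations)."""
    V = np.zeros(R.shape[0])
    iterations = 0
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[s', s, a] * V[s']
        Q = R + gamma * np.einsum("jia,j->ia", P, V)
        V_new = Q.max(axis=1)
        iterations += 1
        if np.max(np.abs(V_new - V)) < threshold:
            return V_new, Q.argmax(axis=1), iterations
        V = V_new

# Same numbers as TIntersection (state 0 = G1, 1 = G2; action 0 = forward, 1 = stop).
P = np.zeros((2, 2, 2))
P[:, 0, 0], P[:, 1, 0] = [0.8, 0.2], [0.1, 0.9]
P[:, 0, 1], P[:, 1, 1] = [0.5, 0.5], [0.05, 0.95]
R = np.array([[10.0, -1.0], [-100.0, -1.0]])

V, pi, n_iter = value_iteration_arrays(P, R)
print(V, pi, n_iter)
```

Solving the Bellman equations by hand for these numbers gives the policy forward in $G_1$ and stop in $G_2$, with $V^*(G_1) = 508/13 \approx 39.08$ and $V^*(G_2) = 68/13 \approx 5.23$; the sketch should land within a few thousandths of those values.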

In [None]:
# value iteration
def value_iteration(threshold = .001, mdp=None):
  if mdp is None:
    raise ValueError("MDP cannot be None")
  numa, nums, R, P = mdp.get_mdp()
  V_star = np.zeros(nums)
  pi_star = np.zeros(nums)
  ####
  count = 0
  while True:
    V_old = V_star
    
    # compute quality matrix with Q.shape = [num_states, num_actions]
    Q = R + mdp.gam * np.einsum("jia,j->ia", P, V_star) # i = current state, j = next state, a = action
    # find action with highest quality for each state
    pi_star = np.argmax(Q, axis=1)
    V_star = Q[range(nums), pi_star]
    
    # update iteration counter
    count += 1
    # check convergence
    if np.max(np.abs(V_star - V_old)) < threshold:
      break
  
  print("Value iteration: {} iterations".format(count))
  ####
  return V_star, pi_star

In [None]:
# Test
t_intersection = TIntersection()
V_star, pi_star = value_iteration(mdp=t_intersection)
print("V_star: ", V_star)
print("pi_star: ", pi_star)

## Bayesian Inference
Next, we will write the Bayesian update for our belief using a beta-binomial model. Encode an observation as 0 when the other car is heading toward $G_1$ and as 1 when it is heading toward $G_2$. Write down your prior and your posterior update, choose the prior hyperparameters, and justify your choice.

We provide a **beta_dist** class. Because the beta distribution is a conjugate prior for the binomial likelihood, the posterior has the same form as the prior, so updating the belief amounts to adjusting the prior's hyperparameters to account for the new observations. Fill in the missing code for the functions **update_beta_params()** and **get_mean()**.
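Written out, with $\theta$ the probability of observing a 1 (the other car heading toward $G_2$) and $k$ ones among $n$ observations, the conjugate update performed by **update_beta_params()** and the mean returned by **get_mean()** are:

```latex
\begin{aligned}
\text{prior:}\quad & \theta \sim \mathrm{Beta}(a, b) \\
\text{likelihood:}\quad & k \mid \theta \sim \mathrm{Binomial}(n, \theta) \\
\text{posterior:}\quad & \theta \mid k \sim \mathrm{Beta}(a + k,\; b + n - k) \\
\text{posterior mean:}\quad & \mathbb{E}[\theta \mid k] = \frac{a + k}{a + b + n}
\end{aligned}
```

Choosing $a = b = 1$ makes the prior uniform on $[0, 1]$, i.e. no initial preference between $G_1$ and $G_2$.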

In [None]:
# supporting class for beta distribution

class beta_dist:
  def __init__(self, a=None, b=None):
    self.a = a
    self.b = b

  # Get the beta pdf evaluated on 1000 evenly spaced points in [0, 1]
  def get_pdf(self):
    x = np.linspace(0, 1, 1000)
    fx = beta.pdf(x, self.a, self.b)
    dens_dict = {'x': x, 'fx': fx}
    return dens_dict

  # Update parameters with n observations, of which num_successes are successes
  def update_beta_params(self, n, num_successes):
    ####
    self.a += num_successes
    self.b += n - num_successes
    ####

  # Mean of Beta(a, b)
  def get_mean(self):
    ####
    return self.a / (self.a + self.b)
    ####

In [None]:
# Test your beta_dist with a batch of observations
# create the prior using the chosen hyperparameters

# define a, b for the prior (a = b = 1 gives a uniform prior)
####
a = 1
b = 1
####

goal_dist = beta_dist(a, b)
prior = goal_dist.get_pdf()

# observations (1 = heading toward G2, 0 = heading toward G1)
observations = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

# update the distribution
goal_dist.update_beta_params(len(observations), sum(observations))
posterior = goal_dist.get_pdf()
print("The updated hyperparameters are:")
print(goal_dist.a, goal_dist.b)

# calculate the mean of posterior
post_mean = goal_dist.get_mean()

print("The mean value of our posterior is %s" %post_mean)

# Plot prior and posterior
plt.plot(prior['x'], prior['fx'])
plt.plot(posterior['x'], posterior['fx'])
plt.axvline(x = post_mean, color = "green")
plt.legend(['Prior Distribution', 'Posterior Distribution', 'Posterior mean'], loc='upper left')

plt.show()

Report the shift in the posterior mean for each of the following observations:
* A single **0**
* A series of 10 **0**
* A series of 10 **1**
* 5 **1** and 5 **0**
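Because the posterior mean has the closed form $(a + k)/(a + b + n)$, the expected answers can be checked exactly. The sketch below assumes the uniform prior Beta(1, 1) and, like the cell that follows, first applies the four sequences one after another, so each shift is measured from the previous posterior; it then also prints the means obtained when each sequence is applied to a fresh prior. `posterior_mean` is a hypothetical helper.

```python
from fractions import Fraction

def posterior_mean(a, b, obs):
    """Update a Beta(a, b) prior with binary observations; return (a', b', mean)."""
    k = sum(obs)
    a_post, b_post = a + k, b + len(obs) - k
    return a_post, b_post, Fraction(a_post, a_post + b_post)

sequences = {
    "single 0": [0],
    "ten 0s": [0] * 10,
    "ten 1s": [1] * 10,
    "five 1s, five 0s": [1] * 5 + [0] * 5,
}

# Sequential updates: each sequence continues from the previous posterior.
a, b = 1, 1
prev_mean = Fraction(a, a + b)
for name, obs in sequences.items():
    a, b, mean = posterior_mean(a, b, obs)
    print(f"{name:17s} Beta({a}, {b})  mean {float(mean):.4f}  shift {float(mean - prev_mean):+.4f}")
    prev_mean = mean

# Independent updates: each sequence applied to a fresh Beta(1, 1) prior.
for name, obs in sequences.items():
    _, _, mean = posterior_mean(1, 1, obs)
    print(f"{name:17s} mean {float(mean):.4f}  shift {float(mean - Fraction(1, 2)):+.4f}")
```

In the sequential case the posteriors are Beta(1, 2), Beta(1, 12), Beta(11, 12) and Beta(16, 17); with a fresh prior the means are 1/3, 1/12, 11/12 and 1/2.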

In [None]:
####
# Note: the observation sequences are applied one after another, so each shift
# is measured relative to the previous posterior, not to the original prior.
goal_dist = beta_dist(1, 1)
prior_mean = goal_dist.get_mean()
print("Prior:       ({:2}, {:2}), mean: {:.4f}".format(goal_dist.a, goal_dist.b, prior_mean))

obs1 = [0]
goal_dist.update_beta_params(len(obs1), sum(obs1))
post1_mean = goal_dist.get_mean()
print("Posterior 1: ({:2}, {:2}), mean: {:.4f} (shift: {:5.2f})".format(goal_dist.a, goal_dist.b, post1_mean, post1_mean - prior_mean))

obs2 = [0] * 10
goal_dist.update_beta_params(len(obs2), sum(obs2))
post2_mean = goal_dist.get_mean()
print("Posterior 2: ({:2}, {:2}), mean: {:.4f} (shift: {:5.2f})".format(goal_dist.a, goal_dist.b, post2_mean, post2_mean - post1_mean))

obs3 = [1] * 10
goal_dist.update_beta_params(len(obs3), sum(obs3))
post3_mean = goal_dist.get_mean()
print("Posterior 3: ({:2}, {:2}), mean: {:.4f} (shift: {:5.2f})".format(goal_dist.a, goal_dist.b, post3_mean, post3_mean - post2_mean))

obs4 = [1] * 5 + [0] * 5
goal_dist.update_beta_params(len(obs4), sum(obs4))
post4_mean = goal_dist.get_mean()
print("Posterior 4: ({:2}, {:2}), mean: {:.4f} (shift: {:5.2f})".format(goal_dist.a, goal_dist.b, post4_mean, post4_mean - post3_mean))
####

## QMDP
Let's now write the QMDP function, which extends the value function computed above to the belief space.

First, create an array of beliefs. Then write a function that takes the precomputed value function, the belief, and the MDP, and returns the single best next action.
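QMDP assumes that all state uncertainty disappears after the next step, so it reuses the MDP's state-action values and weights them by the current belief $b$. With $\gamma$ = `mdp.gam` and $V^*$ = `V_star`, the function below computes:

```latex
\begin{aligned}
Q(s, a) &= R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \\
a^*(b) &= \arg\max_{a} \sum_{s} b(s)\, Q(s, a)
\end{aligned}
```

The second line corresponds to `np.argmax(belief @ Q)`.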

In [None]:
def QMDP(V_star, belief, mdp=None):
  if mdp is None:
    raise ValueError("MDP cannot be None")
  
  numa, nums, R, P = mdp.get_mdp()

  # compute MDP-value for state-action pairs (Q)
  ####
  # compute quality values
  Q = R + mdp.gam * np.einsum("jia,j->ia", P, V_star) # i = current state, j = next state, a = action
  # find best action in expectation based on given belief
  action = mdp.a[np.argmax(belief @ Q)]
  ####
  return action

In [None]:
# Test
for p in np.linspace(0.0, 1.0, 11):
  belief = [p, 1 - p]
  print("Belief: [{:.2f}, {:.2f}]\tAction to take: ".format(belief[0], belief[1]), QMDP(V_star, belief, mdp=t_intersection))

The output shows that we do not choose the **forward** action unless we are fairly confident that the other car is heading toward $G_1$.
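Because $\sum_s b(s)\,Q(s, a)$ is linear in $p = b(G_1)$, the switch from **stop** to **forward** happens at a single belief threshold. The sketch below computes that threshold for the A0 parameters, assuming states ordered [G1, G2], actions ordered [forward, stop], and the array layout `P[s_next, s, a]`; `qmdp_threshold` is a hypothetical helper, not part of the **MDP** package.

```python
import numpy as np

# A0 parameters; layout P[s_next, s, a], R[s, a] (state 0 = G1, 1 = G2; action 0 = forward, 1 = stop).
P = np.zeros((2, 2, 2))
P[:, 0, 0], P[:, 1, 0] = [0.8, 0.2], [0.1, 0.9]
P[:, 0, 1], P[:, 1, 1] = [0.5, 0.5], [0.05, 0.95]
R = np.array([[10.0, -1.0], [-100.0, -1.0]])
gamma = 0.9

# Run value iteration to a tight tolerance, then form Q(s, a).
V = np.zeros(2)
for _ in range(1000):
    V_new = (R + gamma * np.einsum("jia,j->ia", P, V)).max(axis=1)
    converged = np.max(np.abs(V_new - V)) < 1e-10
    V = V_new
    if converged:
        break
Q = R + gamma * np.einsum("jia,j->ia", P, V)

def qmdp_threshold(Q):
    """Belief p = b(G1) at which forward and stop have equal expected Q."""
    # E[Q(., forward)] - E[Q(., stop)] = p * d1 + (1 - p) * d2, which is zero at p = -d2 / (d1 - d2)
    d1 = Q[0, 0] - Q[0, 1]  # advantage of forward when the other car goes to G1
    d2 = Q[1, 0] - Q[1, 1]  # advantage of forward when the other car goes to G2
    return -d2 / (d1 - d2)

p_star = qmdp_threshold(Q)
print(f"forward is chosen once b(G1) > {p_star:.3f}")
```

Under these assumptions the threshold is about 0.83, so on the belief grid above **forward** should appear only for $b(G_1) = 0.9$ and $1.0$.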

Now we will test QMDP in a simulated scenario. Our car is at the T-intersection, and the intent of the other car (either $G_1$ or $G_2$) is chosen at random every time the simulation is reset. You will see our car, running QMDP, either continue with its current plan or wait for the other car to finish its turn before continuing.

First, let's write some supporting functions for the visualizer class we will be using.

In [None]:
import random
from MDP.visualizer.t_intesection import TIntersectionSimulation

class QMDPTIntersectionVisualizer(TIntersectionSimulation):
  def __init__(self, mdp, Q, random_obs=False, seed=None):
    self.random_obs = random_obs
    self.seed = seed
    
    self.goal_dist = beta_dist(1, 1)
    super().__init__(mdp, Q)
  
    self.obs1_prob = { # probability for making observation 1 (G2)
        0: [0.5] * 20 + list(np.linspace(0.5, 0.0, 20)), # true_state = G1
        1: [0.5] * 4  + [0.6, 0.6, 0.7, 0.7, 0.8, 0.95], # true_state = G2
    }
    self.observations = []
    if self.random_obs and self.seed is not None:
      random.seed(self.seed)
  
  def reset(self):
    # reset our self.goal_dist to initial prior
    # reset our belief to initial probability

    ####
    self.goal_dist = beta_dist(1, 1) # uniform distribution
    self.belief = [self.goal_dist.get_mean(), 1 - self.goal_dist.get_mean()] # or [0.5, 0.5]
    
    self.observations = []
    if self.random_obs and self.seed is not None:
      random.seed(self.seed)
    ####

    # randomize the true state; unless it is overwritten later (e.g. by set_true_state),
    # it is also used as our observation
    self.true_state = np.random.choice(self.mdp.num_s)
    self.t = 0
    self.our_t = 0
  
  def update_belief(self, observation):
    # update the distribution
    # use the observation input to update our self.goal_dist, which is an object
    # of the beta_dist that we used before
    ####
    # randomize observations
    if self.random_obs:
      obs_count = len(self.observations)
      if obs_count < len(self.obs1_prob[observation]):
        observation = 1 if random.random() < self.obs1_prob[observation][obs_count] else 0
    
    self.observations.append(observation)
    self.goal_dist.update_beta_params(1, 1 - observation)  # success = observation 0 (i.e. other truck goes to G1)
    self.belief = [self.goal_dist.get_mean(), 1 - self.goal_dist.get_mean()]
    ####
  
  def get_next_action(self):
    # return a single next best action based on current belief and Q
    
    ####
    return np.argmax(self.belief @ self.Q)
    ####

Let's create two GIF files (the simulation and the belief over $G_2$) for each true state (0 for $G_1$, 1 for $G_2$); set `true_state` in the cell below to choose the case.

In [None]:
import imageio
from IPython.display import Image
from tqdm.notebook import tqdm

# Get the Q value to pass to the visualizer
####
numa, nums, R, P = t_intersection.get_mdp()
Q = R + t_intersection.gam * np.einsum("jia,j->ia", P, V_star)
####
t_intersection_simulation = QMDPTIntersectionVisualizer(t_intersection, Q, random_obs=False, seed=0)

# set the true state here (0 = G1, 1 = G2)
true_state = 0

folder = "figure"
sub_folder = "qmdp-{}".format(true_state)
sub_folder_pdf = "qmdp-{}-pdf".format(true_state)
sub_folder_results = "results"

fig_folder = os.path.join("/content", folder)
fig_prog_folder = os.path.join(fig_folder, sub_folder)
fig_prog_folder_pdf = os.path.join(fig_folder, sub_folder_pdf)
fig_results_folder = os.path.join(fig_folder, sub_folder_results)
os.makedirs(fig_prog_folder, exist_ok=True)
os.makedirs(fig_prog_folder_pdf, exist_ok=True)
os.makedirs(fig_results_folder, exist_ok=True)

t_intersection_simulation.reset()
t_intersection_simulation.set_true_state(true_state)

for i in tqdm(range(60)):
  t_intersection_simulation.step()
  t_intersection_simulation.plot()
  plt.text(10, 28, "Observation count: {}".format(len(t_intersection_simulation.observations)))
  plt.savefig(os.path.join(fig_prog_folder, "{}.png".format(i)), dpi=75)
  plt.clf()

  pdf = t_intersection_simulation.goal_dist.get_pdf()
  pdf_mean = t_intersection_simulation.goal_dist.get_mean()
  plt.plot(1 - pdf['x'], pdf['fx'])
  plt.axvline(x = 1 - pdf_mean, color = "green")
  plt.title("Belief for $G_2$")
  plt.legend(['Posterior Distribution', 'Posterior mean'], loc='upper left')
  plt.savefig(os.path.join(fig_prog_folder_pdf, "{}.png".format(i)), dpi=75)
  plt.clf()

print("Observations: {}".format(t_intersection_simulation.observations))

In [None]:
#@title { run: "auto" , form-width: "30%"}
#@markdown # Set experiment tag for snapshot

experiment_tag = "A0"  # @param {type:"string"}

In [None]:
gif_path = os.path.join(fig_results_folder, '{}_{}.gif'.format(experiment_tag, true_state))
length = len([i for i in os.listdir(os.path.join(fig_prog_folder)) if ".png" in i])

with imageio.get_writer(gif_path, mode='I') as writer:
  for i in tqdm(range(length)):
    print(i, end='\r')
    filename = os.path.join(fig_prog_folder, str(i)+".png")
    image = imageio.imread(filename)
    writer.append_data(image)

gif_pdf_path = os.path.join(fig_results_folder, '{}_{}_pdf.gif'.format(experiment_tag, true_state))
length = len([i for i in os.listdir(os.path.join(fig_prog_folder_pdf)) if ".png" in i])

with imageio.get_writer(gif_pdf_path, mode='I') as writer:
  for i in tqdm(range(length)):
    print(i, end='\r')
    filename = os.path.join(fig_prog_folder_pdf, str(i)+".png")
    image = imageio.imread(filename)
    writer.append_data(image)

display(Image(open(gif_path,'rb').read(), width=400))
display(Image(open(gif_pdf_path,'rb').read(), width=400))

In [None]:
# !zip /content/results.zip /content/figure/results/*

## Task 2: Simulation Results

### A. Results under ideal observations
In this experiment, our truck always observes the true state of the other truck, so the belief assigned to that true state increases monotonically.

Transition columns list [P(s' = G1), P(s' = G2)] for the given (s, a) pair; reward columns list [R(G1, a), R(G2, a)]. Bold entries differ from A0.

\#|P(s' \| G1, forward)|P(s' \| G2, forward)|P(s' \| G1, stop)|P(s' \| G2, stop)|R(·, forward)|R(·, stop)|Remark
-|-|-|-|-|-|-|-
A0|[0.8, 0.2]|[0.1, 0.9]|[0.5, 0.5]|[0.05, 0.95]|[10, -100]|[-1, -1]|
A1|**[0.9, 0.1]**|**[0.6, 0.4]**|[0.5, 0.5]|[0.05, 0.95]|[10, -100]|[-1, -1]|*less conservative (more probability on s' = G1)*
A2|**[0.9, 0.1]**|**[0.9, 0.1]**|[0.5, 0.5]|[0.05, 0.95]|[10, -100]|[-1, -1]|*less conservative (more probability on s' = G1)*
A3|**[0.9, 0.1]**|**[0.9, 0.1]**|[0.5, 0.5]|[0.05, 0.95]|**[10, -10]**|[-1, -1]|*collision (more probability on s' = G1, higher reward for forward)*
A4|**[0.9, 0.1]**|**[0.9, 0.1]**|[0.5, 0.5]|[0.05, 0.95]|[10, -100]|**[-14, -14]**|*almost collision (more probability on s' = G1, lower reward for stop)*
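With ideal observations and the Beta(1, 1) prior set in `reset()`, the belief in $G_1$ after $k$ consecutive $G_1$ observations is $(1 + k)/(2 + k)$, so the number of observations needed before QMDP switches to **forward** can be computed directly. The sketch below assumes a switching threshold of 0.8288, the belief $b(G_1)$ at which forward and stop have equal expected $Q$ for the A0 parameters (obtained by solving the MDP by hand); treat it as an assumption for other rows. `first_forward_step` is a hypothetical helper, and the sketch models only the belief, not the simulator's timing.

```python
def first_forward_step(threshold, a=1, b=1, max_obs=100):
    """Smallest k such that b(G1) = (a + k) / (a + b + k) exceeds threshold after k G1 observations."""
    for k in range(max_obs + 1):
        belief_g1 = (a + k) / (a + b + k)
        if belief_g1 > threshold:
            return k, belief_g1
    return None, None

# Assumed QMDP switching threshold for the A0 parameters.
A0_THRESHOLD = 0.8288

# The belief in the true goal grows monotonically: 1/2, 2/3, 3/4, 4/5, 5/6, ...
beliefs = [(1 + k) / (2 + k) for k in range(6)]
k, belief = first_forward_step(A0_THRESHOLD)
print(f"forward after {k} consecutive G1 observations (b(G1) = {belief:.3f})")
```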

### B. Results under more realistic observations
In this experiment, our truck's observations are randomized, and the probability of observing the true state increases over time (see `obs1_prob` in `QMDPTIntersectionVisualizer`). This makes the belief about the other truck's state more uncertain.

\#|P(s' \| G1, forward)|P(s' \| G2, forward)|P(s' \| G1, stop)|P(s' \| G2, stop)|R(·, forward)|R(·, stop)|Remark
-|-|-|-|-|-|-|-
B0|[0.8, 0.2]|[0.1, 0.9]|[0.5, 0.5]|[0.05, 0.95]|[10, -100]|[-1, -1]|*too conservative, does not move (higher uncertainty for s' = G1)*
B1|[0.8, 0.2]|[0.1, 0.9]|[0.5, 0.5]|[0.05, 0.95]|**[10, -50]**|[-1, -1]|*conservative, starts moving very late*
B2|[0.8, 0.2]|[0.1, 0.9]|[0.5, 0.5]|[0.05, 0.95]|**[10, -10]**|[-1, -1]|*almost collision*
B3|[0.8, 0.2]|[0.1, 0.9]|[0.5, 0.5]|[0.05, 0.95]|[10, -100]|**[-14, -14]**|*less conservative, starts moving earlier*
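The randomized observation model lives in `obs1_prob`: entry `t` gives the probability that the `t`-th observation is a 1 ($G_2$), and once the schedule runs out the true state is observed directly. The sketch below reproduces that logic outside the simulator, with the same Beta(1, 1) update as `update_belief()`, to show how noisy the belief in $G_1$ can be under the B settings. `simulate_beliefs` is a hypothetical name; it uses its own `random.Random` instance, so its draws will not match the GIFs.

```python
import random
import numpy as np

# Probability that the t-th observation is 1 (G2), copied from QMDPTIntersectionVisualizer.obs1_prob.
OBS1_PROB = {
    0: [0.5] * 20 + list(np.linspace(0.5, 0.0, 20)),  # true state G1
    1: [0.5] * 4 + [0.6, 0.6, 0.7, 0.7, 0.8, 0.95],   # true state G2
}

def simulate_beliefs(true_state, n_steps, seed=0):
    """Return b(G1) after each randomized observation."""
    rng = random.Random(seed)
    a, b = 1, 1  # Beta(1, 1) prior; a counts G1 observations, as in update_belief()
    schedule = OBS1_PROB[true_state]
    beliefs = []
    for t in range(n_steps):
        # Noisy observation while the schedule lasts, then the true state itself.
        if t < len(schedule):
            obs = 1 if rng.random() < schedule[t] else 0
        else:
            obs = true_state
        a += 1 - obs
        b += obs
        beliefs.append(a / (a + b))
    return beliefs

for true_state in (0, 1):
    beliefs = simulate_beliefs(true_state, n_steps=60)
    print(true_state, [round(x, 2) for x in beliefs[::10]])
```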