# T-intersection QMDP
In this part you will formulate the T-intersection problem as a POMDP and solve it approximately with QMDP.

In [1]:
%%capture
# @markdown Run this cell to install dependencies.
# Note: the %%capture cell magic must be the first line of the cell.

%cd /content
!git clone https://github.com/buzi-princeton/MDP.git

In [2]:
from MDP.mdp import MDP
import matplotlib.pyplot as plt  # needed by the plotting cells below
import numpy as np
import os
from scipy.stats import beta
from scipy.stats import binom

## MDP formulation
First, use the MDP class to build the MDP that underlies the T-intersection POMDP.

In [3]:
# Create the T-intersection MDP
class TIntersection(MDP):
  def __init__(self):
    self.gam = 0.9
    self.goal = ["G1", "G2"]
    self.actions = ["forward", "stop"]
    super().__init__(states=[self.goal], actions=self.actions)

    self.populate_data()

  def populate_data(self):
    # use self.add_route(s, a, s') from MDP to add route to MDP
    # use self.add_reward(s, a, r) from MDP to add reward
    ####
    ## YOUR CODE HERE
    raise NotImplementedError("Your TIntersection MDP is empty!")
    ####

Next, reuse your value iteration code to solve the underlying MDP.
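As a reminder of the update rule, here is a minimal sketch of value iteration on a small tabular MDP. It does not use the `MDP` class: the array layouts for `P` and `R`, the function name `value_iteration_sketch`, and the toy two-state problem are all assumptions made for illustration, and the arrays returned by `mdp.get_mdp()` may be laid out differently.

```python
import numpy as np

def value_iteration_sketch(P, R, gamma=0.9, threshold=1e-3):
    """Value iteration on a tabular MDP (illustrative layout, not the MDP class).

    P[a, s, t] is the probability of moving from state s to state t under
    action a; R[s, a] is the immediate reward. Both layouts are assumptions.
    """
    V = np.zeros(R.shape[0])
    while True:
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        # stop once the largest change in any state value is below threshold
        if np.max(np.abs(V_new - V)) < threshold:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Hypothetical toy MDP with 2 states and 2 actions (0 = stay, 1 = switch)
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # stay: remain in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # switch: move to the other state
])
R = np.array([
    [0.0, 1.0],  # state 0: staying pays 0, switching pays 1
    [2.0, 0.0],  # state 1: staying pays 2, switching pays 0
])
V_toy, pi_toy = value_iteration_sketch(P, R)
print(V_toy, pi_toy)
```

In this toy problem the greedy policy switches out of state 0 and then stays in state 1, so the values approach 19 and 20 respectively.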

In [4]:
# value iteration
def value_iteration(threshold = .001, mdp=None):
  if mdp is None:
    raise ValueError("MDP cannot be None")
  numa, nums, R, P = mdp.get_mdp()
  V_star = np.zeros(nums)
  pi_star = np.zeros(nums)
  ####
  ## YOUR CODE HERE
  raise NotImplementedError("Your value iteration is empty!")
  ####
  return V_star, pi_star

In [None]:
# Test
t_intersection = TIntersection()
V_star, pi_star = value_iteration(mdp=t_intersection)
print("V_star: ", V_star)
print("pi_star: ", pi_star)

## Bayesian Inference
Next, we will write the Bayesian update for our belief using a beta-binomial model. Assume the observation is 0 when the other car is heading toward $G_1$ and 1 when it is heading toward $G_2$. Write down your prior and your posterior update, choose your prior hyperparameters, and justify your choice.

We provide you with a **beta_dist** class. Because the beta prior is conjugate to the binomial likelihood, the posterior has the same form as the prior, so computing the posterior amounts to updating the prior's hyperparameters. Fill in the missing code for the functions **update_beta_params()** and **get_mean()**.
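The conjugate update can be sketched with two standalone functions. This is an illustration under stated assumptions rather than the required solution: the helper names `beta_binomial_posterior` and `beta_mean` are hypothetical, and the uniform Beta(1, 1) prior is just one possible choice. With $k$ ones among $n$ observations, a Beta$(a, b)$ prior becomes Beta$(a + k,\, b + n - k)$, and the mean of Beta$(a, b)$ is $a / (a + b)$.

```python
def beta_binomial_posterior(a, b, n, num_successes):
    """Return the posterior (a, b) after num_successes ones in n observations."""
    # successes add to a, failures add to b
    return a + num_successes, b + (n - num_successes)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Assumed uniform prior Beta(1, 1), chosen only for illustration
a0, b0 = 1, 1
sample_obs = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # eight ones, two zeros
a1, b1 = beta_binomial_posterior(a0, b0, len(sample_obs), sum(sample_obs))
print(a1, b1, beta_mean(a1, b1))  # prints: 9 3 0.75
```

Here the posterior mean is the estimated probability that an observation is 1, that is, that the other car is heading toward $G_2$.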

In [None]:
# supporting class for beta distribution

class beta_dist:
  def __init__(self, a = None, b = None):
    self.a = a
    self.b = b
      
  #Get the beta pdf
  def get_pdf(self):
    x = np.linspace(0, 1, 1000)
    fx = beta.pdf(x, self.a, self.b)
    dens_dict = {'x': x, 'fx': fx}
    return dens_dict
      
  #Update parameters:
  def update_beta_params(self, n, num_successes):
    ####
    ## YOUR CODE HERE
    raise NotImplementedError("Your posterior update is empty!")
    ####
  
  def get_mean(self):
    ####
    ## YOUR CODE HERE
    raise NotImplementedError("Your mean function is empty!")
    ####

In [None]:
# Test your beta_dist with a batch of observations
# create prior using the found hyperparams

# define a, b for prior
####
## YOUR CODE HERE
raise NotImplementedError("Missing a, b for prior")
a = None
b = None
####

goal_dist = beta_dist(a, b)
prior = goal_dist.get_pdf()

# likelihood observation
observations = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

# update the distribution
goal_dist.update_beta_params(len(observations), sum(observations))
posterior = goal_dist.get_pdf()
print("The updated hyperparameters are:")
print(goal_dist.a, goal_dist.b)

# calculate the mean of posterior
post_mean = goal_dist.get_mean()

print("The mean value of our posterior is %s" %post_mean)

#Plot prior and posterior
plt.plot(prior['x'], prior['fx'])
plt.plot(posterior['x'], posterior['fx'])
plt.axvline(x = post_mean, color = "green")
plt.legend(['Prior Distribution', 'Posterior Distribution', 'Posterior mean'], loc='upper left')

plt.show()

Report how your posterior mean shifts for each of the following sets of observations:
* A single **0**
* A series of ten **0**s
* A series of ten **1**s
* Five **1**s and five **0**s

In [None]:
####
## YOUR CODE HERE
raise NotImplementedError("Report the posterior mean for 4 cases")

####

## QMDP
Let's now write the QMDP function, which generalizes the value function computed earlier to belief space.

First, create an array of beliefs. Then write a function that takes the precomputed value function, the MDP, and a belief, and returns the single best next action.
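QMDP scores each action by the belief-weighted average of the MDP state-action values, $Q(b, a) = \sum_s b(s)\, Q(s, a)$, and picks the action with the highest score. The sketch below assumes the state-action values are already available as a matrix; the function name `qmdp_action_sketch` and the numbers in `Q_toy` are hypothetical and are not derived from the T-intersection MDP.

```python
import numpy as np

def qmdp_action_sketch(Q, belief):
    """Return the index of argmax_a sum_s belief[s] * Q[s, a]."""
    belief = np.asarray(belief, dtype=float)
    q_belief = belief @ Q  # one belief-weighted value per action
    return int(np.argmax(q_belief))

# Hypothetical values: rows are states (G1, G2), columns are actions
# (0 = forward, 1 = stop)
Q_toy = np.array([
    [10.0, 1.0],    # other car heads to G1: forward pays off
    [-100.0, 1.0],  # other car heads to G2: forward is heavily penalized
])

print(qmdp_action_sketch(Q_toy, [0.95, 0.05]))  # prints: 0
print(qmdp_action_sketch(Q_toy, [0.90, 0.10]))  # prints: 1
```

With these made-up values, forward is chosen only once the belief in $G_1$ exceeds roughly 0.92, because the large penalty in the $G_2$ row dominates the average for less certain beliefs.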

In [None]:
def QMDP(V_star, belief, mdp=None):
  if mdp is None:
    raise ValueError("MDP cannot be None")
  
  numa, nums, R, P = mdp.get_mdp()

  # compute MDP-value for state-action pairs (Q)
  ####
  ## YOUR CODE HERE
  raise NotImplementedError("Your QMDP function is empty")

  ####
  return None

In [None]:
# Test
for p in np.linspace(0.0, 1.0, 11):  # linspace avoids float drift past 1.0
  belief = [p, 1 - p]
  print("Belief: [{:.2f}, {:.2f}]\tAction to take: ".format(belief[0], belief[1]), QMDP(V_star, belief, mdp=t_intersection))

The output shows that we do not choose the **forward** action unless we are sufficiently sure that the other car is heading toward $G_1$.

Now we will test QMDP in a simulation. Our car waits at the T-intersection. The intent of the other car (either $G_1$ or $G_2$) is chosen at random every time the game is reset. You will see our car, running QMDP, either follow its current plan or wait for the other car to finish its turn before continuing.
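The interaction between belief updates and action selection can be sketched as a simple loop. Everything here is an assumption made for illustration: the function name `run_episode_sketch`, the Beta(1, 1) prior, the made-up `Q_demo` values, and the simplification that each step yields one observation equal to the other car's true intent. The actual simulator may behave differently.

```python
import numpy as np

def run_episode_sketch(Q, true_state, a=1.0, b=1.0, max_steps=50):
    """Alternate Beta belief updates and QMDP action choices.

    Returns the list of chosen actions (0 = forward, 1 = stop) and ends
    after the first forward action or after max_steps steps.
    """
    actions = []
    for _ in range(max_steps):
        # one observation per step: 0 if heading to G1, 1 if heading to G2
        a, b = a + true_state, b + (1 - true_state)
        p_g2 = a / (a + b)  # posterior mean = P(observation is 1)
        belief = np.array([1.0 - p_g2, p_g2])  # belief over (G1, G2)
        action = int(np.argmax(belief @ Q))
        actions.append(action)
        if action == 0:
            break
    return actions

# Hypothetical values: rows are states (G1, G2), columns are (forward, stop)
Q_demo = np.array([[10.0, 1.0], [-100.0, 1.0]])

print(run_episode_sketch(Q_demo, true_state=0))
print(set(run_episode_sketch(Q_demo, true_state=1)))
```

With these numbers the car keeps stopping until enough **0** observations have accumulated to make forward worthwhile, and it never goes forward while the evidence points to $G_2$.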

First, let's write some supporting functions for the visualizer class that we will be using.

In [None]:
from MDP.visualizer.t_intesection import TIntersectionSimulation

class QMDPTIntersectionVisualizer(TIntersectionSimulation):
  def __init__(self, mdp, Q):
    
    self.goal_dist = beta_dist(1, 1)
    super().__init__(mdp, Q)
  
  def reset(self):
    # reset our self.goal_dist to initial prior
    # reset our belief to initial probability

    ####
    ## YOUR CODE HERE
    raise NotImplementedError("reset is not done")
    """
    self.goal_dist = ...
    self.belief = ...
    """
    ####

    # randomize the true_state; this will be used as our observation unless it
    # is overwritten later
    self.true_state = np.random.choice(self.mdp.num_s)
    self.t = 0
    self.our_t = 0
  
  def update_belief(self, observation):
    # update the distribution
    # use the observation input to update our self.goal_dist, which is an object
    # of the beta_dist that we used before
    ####
    ## YOUR CODE HERE
    raise NotImplementedError("update_belief is empty")
    """
    self.goal_dist.update_beta_params(..., ...)
    self.belief = None
    """
    ####
  
  def get_next_action(self):
    # return a single next best action based on current belief and Q
    
    ####
    ## YOUR CODE HERE
    raise NotImplementedError("get_next_action cannot be none")

    ####

Let's create two GIF files, one for each case of the true state (0 for $G_1$ and 1 for $G_2$).

In [None]:
import imageio
from IPython.display import Image
from tqdm.notebook import tqdm

# Get the Q value to pass to the visualizer
####
## YOUR CODE HERE
raise NotImplementedError("Q cannot be none")
Q = None
####
t_intersection_simulation = QMDPTIntersectionVisualizer(t_intersection, Q)

# reset your true state here
true_state = 0

folder = "figure"
sub_folder = "qmdp-{}".format(true_state)

fig_folder = os.path.join("/content", folder)
fig_prog_folder = os.path.join(fig_folder, sub_folder)
os.makedirs(fig_prog_folder, exist_ok=True)

t_intersection_simulation.reset()
t_intersection_simulation.set_true_state(true_state)

for i in tqdm(range(100)):
  t_intersection_simulation.step()
  t_intersection_simulation.plot()
  plt.savefig(os.path.join(fig_prog_folder, "{}.png".format(i)), dpi=200)
  plt.clf()

gif_path = os.path.join(fig_prog_folder, 'result.gif')
length = len([f for f in os.listdir(fig_prog_folder) if f.endswith(".png")])

with imageio.get_writer(gif_path, mode='I') as writer:
  for i in tqdm(range(length)):
    print(i, end='\r')
    filename = os.path.join(fig_prog_folder, str(i)+".png")
    image = imageio.imread(filename)
    writer.append_data(image)
Image(open(gif_path,'rb').read(), width=400)