# Comparison of Solutions for Markov Decision Processes

This project compares classic dynamic programming and linear programming methods for solving Markov Decision Processes. From the dynamic programming realm, the solution methods are:
-  Value Iteration
-  Policy Iteration

and for linear programming methods:
-  First order methods
-  Interior point methods
-  Simplex methods

## Introduction

Markov Decision Processes (MDPs) provide a mathematical formulation for stochastic decision making.

In [2]:
import numpy as np
import numpy.linalg as la
import numpy.random as rn
import matplotlib.pyplot as plt

Below is an example of a candidate Grid World that will be solved using the above methods.

## Generating the MDP

## Value Iteration

## Policy Iteration

## Simplex Method

## First Order Methods

### Projection Function

#### Derivation of Projection Function
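
The formula used in `proj` can be recovered as a Euclidean projection onto the affine set $\{x : Ax = b\}$. The sketch below assumes that $A$ has full row rank, so that $AA^\top$ is invertible; the multiplier $\lambda$ is notation introduced here for the derivation and does not appear elsewhere in the notebook.

$$
\begin{aligned}
&\min_{x}\ \tfrac{1}{2}\lVert x - y\rVert_2^2 \quad \text{subject to} \quad Ax = b \\
&\mathcal{L}(x, \lambda) = \tfrac{1}{2}\lVert x - y\rVert_2^2 + \lambda^\top (Ax - b) \\
&\nabla_x \mathcal{L} = x - y + A^\top \lambda = 0 \ \Longrightarrow\ x = y - A^\top \lambda \\
&Ax = b \ \Longrightarrow\ Ay - AA^\top \lambda = b \ \Longrightarrow\ \lambda = (AA^\top)^{-1}(Ay - b) \\
&x = y - A^\top (AA^\top)^{-1}(Ay - b)
\end{aligned}
$$

The last line is the expression evaluated by `proj(y, A, b)`.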

In [3]:
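def proj(y, A, b):
    # Euclidean projection of y onto the affine set {x : A x = b}.
    # Assumes A has full row rank so that A A^T is invertible.
    # Solving the linear system avoids forming the inverse explicitly.
    lam = la.solve(np.dot(A, A.T), np.dot(A, y) - b)
    return y - np.dot(A.T, lam)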
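
A quick sanity check of the projection can be sketched as follows. The name `proj_affine` and the random test data are illustrative assumptions, not part of the notebook; the function mirrors `proj` but is redefined so the snippet runs on its own. It checks that the projected point is feasible and that projecting a second time changes nothing.

```python
import numpy as np

def proj_affine(y, A, b):
    """Project y onto {x : A x = b}, assuming A has full row rank."""
    # Solve (A A^T) lam = A y - b instead of forming the inverse explicitly.
    lam = np.linalg.solve(A @ A.T, A @ y - b)
    return y - A.T @ lam

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))  # hypothetical constraint matrix
b = rng.standard_normal(2)
y = rng.standard_normal(5)

x = proj_affine(y, A, b)

# Feasibility: the projected point should satisfy A x = b up to round-off.
residual = np.linalg.norm(A @ x - b)

# Idempotence: projecting a feasible point should leave it unchanged.
idempotent_gap = np.linalg.norm(proj_affine(x, A, b) - x)
```

Both `residual` and `idempotent_gap` should be at the level of floating-point round-off.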

### Projected Gradient Descent

In [1]:
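def graddes(x, A, b, t, eta0):
    # diminishing step size eta0/sqrt(t + 1)
    eta = eta0/np.sqrt(t + 1)
    # gradient is always a vector of ones of shape x.shape
    grad = np.ones(x.shape)
    y = x - eta*grad
    # project the gradient step back onto {x : A x = b} and return the new iterate
    x = proj(y, A, b)

    return x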
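
One way to drive the projected gradient step is a plain loop over `t`, sketched below under stated assumptions. The names `proj_affine` and `pgd_step`, the small matrix `A`, the vector `b`, and the choice `eta0=0.1` are hypothetical and chosen only for illustration. This toy problem has equality constraints only and is unbounded below, so the loop illustrates feasibility and descent of the objective `sum(x)` rather than convergence to a minimizer.

```python
import numpy as np

def proj_affine(y, A, b):
    # Projection onto {x : A x = b}, assuming A has full row rank.
    lam = np.linalg.solve(A @ A.T, A @ y - b)
    return y - A.T @ lam

def pgd_step(x, A, b, t, eta0):
    # One projected gradient step for the objective sum(x),
    # whose gradient is a vector of ones.
    eta = eta0 / np.sqrt(t + 1)
    grad = np.ones_like(x)
    return proj_affine(x - eta * grad, A, b)

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

x = proj_affine(np.zeros(3), A, b)  # feasible starting point
objectives = [x.sum()]
for t in range(20):
    x = pgd_step(x, A, b, t, eta0=0.1)
    objectives.append(x.sum())

# Every iterate stays on the constraint set up to round-off.
feasibility_residual = np.linalg.norm(A @ x - b)
```

Each new iterate is passed back in as `x`, and `t` only controls the step size schedule.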

### Projected Accelerated Gradient Descent

In [2]:
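def accelgrad(x, v, A, b, t, eta0, theta=1.0):
    # theta is the momentum parameter from the previous call; pass the returned
    # value back in on the next iteration (starting at theta=1.0 is an assumption)
    eta = eta0/np.sqrt(t+1)
    grad = np.ones(x.shape)

    # gradient step from the extrapolated point v
    v = v - eta*grad

    theta_old = theta
    theta = (1 + np.sqrt(1 + 4*theta**2))/2

    xprev = x
    x = proj(v, A, b)

    # extrapolate along the direction of the last move
    v = x + (theta_old - 1)/(theta)*(x - xprev)

    return x, v, theta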
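
The accelerated step carries three pieces of state between iterations: the iterate `x`, the extrapolated point `v`, and the momentum parameter `theta`. The sketch below shows one way to thread that state through a loop. The names `proj_affine` and `accel_step`, the toy `A` and `b`, the initial value `theta = 1.0`, and `eta0=0.1` are assumptions made for illustration and are not taken from the notebook.

```python
import numpy as np

def proj_affine(y, A, b):
    # Projection onto {x : A x = b}, assuming A has full row rank.
    lam = np.linalg.solve(A @ A.T, A @ y - b)
    return y - A.T @ lam

def accel_step(x, v, theta, A, b, t, eta0):
    # Gradient step from the extrapolated point v, then projection.
    eta = eta0 / np.sqrt(t + 1)
    x_new = proj_affine(v - eta * np.ones_like(v), A, b)
    # Momentum parameter update.
    theta_new = (1 + np.sqrt(1 + 4 * theta**2)) / 2
    # Extrapolate along the direction of the last move.
    v_new = x_new + (theta - 1) / theta_new * (x_new - x)
    return x_new, v_new, theta_new

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

x = proj_affine(np.zeros(3), A, b)  # feasible starting point
v, theta = x.copy(), 1.0
start_objective = x.sum()
for t in range(20):
    x, v, theta = accel_step(x, v, theta, A, b, t, eta0=0.1)

# The iterate remains feasible up to round-off.
accel_residual = np.linalg.norm(A @ x - b)
```

As with the plain projected gradient sketch, this toy problem is unbounded below, so it only illustrates how the state is updated and that the iterates remain feasible.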

## Interior Point Methods