Welcome!  If you are new to Google Colab/Jupyter notebooks, you might take a look at [this notebook](https://colab.research.google.com/notebooks/basic_features_overview.ipynb) first.

**I recommend you run the first code cell of this notebook immediately to start provisioning Drake on the cloud machine; you can then leave this window open while you [read the textbook](http://underactuated.csail.mit.edu/dp.html).**

# Notebook Setup

The following cell will:
- on Colab (only), install Drake to `/opt/drake`, install Drake's prerequisites via `apt`, and add pydrake to `sys.path`.  This will take approximately two minutes the first time it runs (to provision the machine), but should only need to reinstall once every 12 hours.  If you navigate between notebooks using Colab's "File->Open" menu, then you can avoid provisioning a separate machine for each notebook.
- import packages used throughout the notebook.

You will need to rerun this cell if you restart the kernel, but it should be fast (even on Colab) because the machine will already have Drake installed.

In [None]:
import importlib.util
import sys
from urllib.request import urlretrieve

# Install drake (and underactuated).
if 'google.colab' in sys.modules and importlib.util.find_spec('underactuated') is None:
    urlretrieve("http://underactuated.csail.mit.edu/scripts/setup/setup_underactuated_colab.py",
                "setup_underactuated_colab.py")
    from setup_underactuated_colab import setup_underactuated
    setup_underactuated(underactuated_sha='f422346ae0a8862ea8c3d7b44d30010599bfd1d1', drake_version='0.25.0', drake_build='releases')

# Imports.
import matplotlib.animation as animation
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
import pydrake.all
from IPython.display import display, HTML
from pydrake.all import DiagramBuilder, LinearSystem, Simulator
from pydrake.systems.controllers import (DynamicProgrammingOptions,
                                         FittedValueIteration)

import underactuated
from underactuated.jupyter import SetupMatplotlibBackend

plt.rcParams.update({"savefig.transparent": True})

# The Grid World

The setup here is *almost* identical to the simplest version described in the notes.  The only difference is that this agent is allowed to move diagonally in a single step; this is slightly easier to code since I can have two inputs (one for left/right, and another for up/down), and write the dynamics as the trivial linear system ${\bf x}[n+1] = {\bf x}[n] + {\bf u}[n].$  Only the value iteration code needs to know that the states and actions are actually restricted to the integers.
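
As a rough illustration of these integer dynamics, here is a minimal pure-Python sketch that matches the integrator plant used in the code (each unit-time step adds the action to the state). The names `GRID_MIN`, `GRID_MAX`, `ACTIONS`, and `step` are hypothetical and not part of Drake or this notebook, and clamping at the grid boundary is an assumption made for illustration rather than a statement about what `FittedValueIteration` does internally.

```python
# Hypothetical names for illustration only; not part of Drake or this notebook.
GRID_MIN, GRID_MAX = 0, 20  # same extent as range(0, 21)

# Two inputs, each in {-1, 0, 1}: nine actions, including diagonals and "stay".
ACTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def step(x, u):
    """Shift each coordinate of x by the matching component of u.

    Clamping to the grid is an assumption of this sketch.
    """
    return tuple(min(max(xi + ui, GRID_MIN), GRID_MAX) for xi, ui in zip(x, u))


print(len(ACTIONS))
print(step((2, 8), (1, -1)))    # a diagonal move
print(step((0, 0), (-1, -1)))   # a move that would leave the grid
```

Treating the two inputs independently is what makes the diagonal moves come for free: no action needs special handling.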

The obstacle (pit of despair) is provided by the method below.  Play around with it!  The rest of the code is mostly to support visualization.

TODO(russt): Pull a few more of the visualization frills from my (very) old [matlab code](https://github.com/RobotLocomotion/drake/blob/last_sha_with_original_matlab/drake/examples/GridWorld.m).  At the very least, I want to draw the vector field of the resulting policy.

In [None]:
def grid_world():
    time_step = 1
    # TODO(russt): Support discrete-time systems in the dynamic programming code, and use this properly.
    #plant = LinearSystem(A=np.eye(2), B=np.eye(2), C=np.eye(2), D=np.zeros((2,2)), time_period=time_step)
    # For now, just cheat: write the discrete system as a continuous-time one (xdot = u) that will be discretized.
    plant = LinearSystem(A=np.zeros((2,2)), B=np.eye(2), C=np.eye(2), D=np.zeros((2,2)))
    simulator = Simulator(plant)
    options = DynamicProgrammingOptions()

    xbins = range(0, 21)
    ybins = range(0, 21)
    state_grid = [set(xbins), set(ybins)]

    input_grid = [set([-1, 0, 1]), set([-1, 0, 1])]

    goal = [2, 8]

    def obstacle(x):
        return 6 <= x[0] <= 8 and 4 <= x[1] <= 7

    [X, Y] = np.meshgrid(xbins, ybins)

    frames=[]
    def draw(iteration, mesh, cost_to_go, policy):
        J = np.reshape(cost_to_go, X.shape)
        artists = [ax.imshow(J, cmap=cm.jet)]
        frames.append(artists)

    options.visualization_callback = draw

    def min_time_cost(context):
        x = context.get_continuous_state_vector().CopyToVector()
        x = np.round(x)
        if obstacle(x):
            return 10
        if np.array_equal(x, goal):
            return 0
        return 1
        
    cost_function = min_time_cost
    options.convergence_tol = 0.1

#    fig = plt.figure(figsize=(8, 8))
#    ax = fig.gca()
    (fig,ax) = plt.subplots()
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("Cost-to-Go")

    policy, cost_to_go = FittedValueIteration(simulator, cost_function, state_grid,
                                            input_grid, time_step, options)

    ax.invert_yaxis()
    plt.colorbar(frames[-1][0])

    # Create the animation from the frames collected by the draw() callback.
    ani = animation.ArtistAnimation(fig, frames, interval=200, blit=True, repeat=False)
    plt.close('all')

    display(HTML(ani.to_jshtml()))

grid_world()

Your turn.  Change the cost.  Change the obstacles.
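
If you want to sanity-check a modified cost or obstacle without Drake, the following is a minimal pure-Python sketch of undiscounted tabular value iteration on the same 21-by-21 grid. It is not Drake's `FittedValueIteration`; the names `value_iteration` and `pit` are hypothetical, the stage costs mirror `min_time_cost` above, and clamping at the grid boundary is an assumption of the sketch.

```python
def value_iteration(obstacle, goal=(2, 8), size=21, obstacle_cost=10.0,
                    tol=1e-9, max_iters=1000):
    """Undiscounted tabular value iteration; returns a dict state -> cost-to-go.

    Hypothetical helper for illustration only (not a Drake API).
    """
    actions = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    states = [(x, y) for x in range(size) for y in range(size)]

    def stage_cost(s):
        # Same ordering as min_time_cost: the obstacle check comes first.
        if obstacle(s):
            return obstacle_cost
        return 0.0 if s == goal else 1.0

    def step(s, u):
        # Integer move, clamped to the grid (an assumption of this sketch).
        return tuple(min(max(si + ui, 0), size - 1) for si, ui in zip(s, u))

    J = {s: 0.0 for s in states}
    for _ in range(max_iters):
        # Bellman backup: cost of the current state plus the best successor.
        J_new = {s: stage_cost(s) + min(J[step(s, u)] for u in actions)
                 for s in states}
        delta = max(abs(J_new[s] - J[s]) for s in states)
        J = J_new
        if delta < tol:
            break
    return J


def pit(s):
    """The same rectangular obstacle as in grid_world()."""
    return 6 <= s[0] <= 8 and 4 <= s[1] <= 7


J = value_iteration(pit)
print(J[(2, 8)], J[(5, 8)], J[(6, 5)])
```

Passing a different `obstacle` function, `goal`, or `obstacle_cost` gives a quick way to predict how the cost-to-go plot should change before rerunning the Drake version.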