# Gridworld

**Author:** ZHENG Wenjie

**Last Update:** 2021-08-19

This notebook works through Example 3.5 of the book and reproduces the state values shown in Figure 3.2.

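The plots use the 'notebook_connected' renderer, which loads plotly.js from a CDN: this keeps the file small but requires an internet connection to display the figures. For personal or offline use, replace it with 'notebook'.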

## Bellman equation for the state values

$$
\begin{align}
v_{11} &= 0.5(-1 + \gamma v_{11}) + 0.25 \gamma v_{12} + 0.25 \gamma v_{21} \\
v_{12} &= 10 + \gamma v_{52} \\
v_{13} &= 0.25(-1 + \gamma v_{13}) + 0.25 \gamma v_{12} + 0.25 \gamma v_{14} + 0.25 \gamma v_{23} \\
v_{14} &= 5 + \gamma v_{34} \\
v_{15} &= 0.5(-1 + \gamma v_{15}) + 0.25 \gamma v_{14} + 0.25 \gamma v_{25} \\
v_{21} &= 0.25(-1 + \gamma v_{21}) + 0.25 \gamma v_{11} + 0.25 \gamma v_{22} + 0.25 \gamma v_{31} \\
v_{22} &= 0.25 \gamma v_{12} + 0.25 \gamma v_{21} + 0.25 \gamma v_{23} + 0.25 \gamma v_{32} \\
v_{23} &= 0.25 \gamma v_{13} + 0.25 \gamma v_{22} + 0.25 \gamma v_{24} + 0.25 \gamma v_{33} \\
v_{24} &= 0.25 \gamma v_{14} + 0.25 \gamma v_{23} + 0.25 \gamma v_{25} + 0.25 \gamma v_{34} \\
v_{25} &= 0.25(-1 + \gamma v_{25}) + 0.25 \gamma v_{15} + 0.25 \gamma v_{24} + 0.25 \gamma v_{35} \\
v_{31} &= 0.25(-1 + \gamma v_{31}) + 0.25 \gamma v_{21} + 0.25 \gamma v_{32} + 0.25 \gamma v_{41} \\
v_{32} &= 0.25 \gamma v_{22} + 0.25 \gamma v_{31} + 0.25 \gamma v_{33} + 0.25 \gamma v_{42} \\
v_{33} &= 0.25 \gamma v_{23} + 0.25 \gamma v_{32} + 0.25 \gamma v_{34} + 0.25 \gamma v_{43} \\
v_{34} &= 0.25 \gamma v_{24} + 0.25 \gamma v_{33} + 0.25 \gamma v_{35} + 0.25 \gamma v_{44} \\
v_{35} &= 0.25(-1 + \gamma v_{35}) + 0.25 \gamma v_{25} + 0.25 \gamma v_{34} + 0.25 \gamma v_{45} \\
v_{41} &= 0.25(-1 + \gamma v_{41}) + 0.25 \gamma v_{31} + 0.25 \gamma v_{42} + 0.25 \gamma v_{51} \\
v_{42} &= 0.25 \gamma v_{32} + 0.25 \gamma v_{41} + 0.25 \gamma v_{43} + 0.25 \gamma v_{52} \\
v_{43} &= 0.25 \gamma v_{33} + 0.25 \gamma v_{42} + 0.25 \gamma v_{44} + 0.25 \gamma v_{53} \\
v_{44} &= 0.25 \gamma v_{34} + 0.25 \gamma v_{43} + 0.25 \gamma v_{45} + 0.25 \gamma v_{54} \\
v_{45} &= 0.25(-1 + \gamma v_{45}) + 0.25 \gamma v_{35} + 0.25 \gamma v_{44} + 0.25 \gamma v_{55} \\
v_{51} &= 0.5(-1 + \gamma v_{51}) + 0.25 \gamma v_{41} + 0.25 \gamma v_{52} \\
v_{52} &= 0.25(-1 + \gamma v_{52}) + 0.25 \gamma v_{42} + 0.25 \gamma v_{51} + 0.25 \gamma v_{53} \\
v_{53} &= 0.25(-1 + \gamma v_{53}) + 0.25 \gamma v_{43} + 0.25 \gamma v_{52} + 0.25 \gamma v_{54} \\
v_{54} &= 0.25(-1 + \gamma v_{54}) + 0.25 \gamma v_{44} + 0.25 \gamma v_{53} + 0.25 \gamma v_{55} \\
v_{55} &= 0.5(-1 + \gamma v_{55}) + 0.25 \gamma v_{45} + 0.25 \gamma v_{54} 
\end{align}
$$
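
For reference, the 25 equations above can be collected into a single linear system. The symbols $P$ (the 25×25 transition matrix under the equiprobable random policy) and $r$ (the vector of expected immediate rewards) are notation introduced here for illustration; they do not appear in the code. With the states stacked row by row into a vector $v$:

$$
\begin{align}
v &= r + \gamma P v \\
(I - \gamma P)\, v &= r
\end{align}
$$

In the cells that follow, `A` holds the entries of $4P$ (so that all entries are integers), `L` is $I - \gamma P$ written as `np.eye(25) - 0.25*γ*A`, and `b` plays the role of $r$.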

In [1]:
import numpy as np
import plotly.express as px
import plotly.figure_factory as ff
import plotly.io as pio
pio.renderers.default = 'notebook_connected' # or 'notebook' for personal use

In [2]:
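# A[s, t] = 4 * (probability of moving from state s to state t under the random policy).
# State v_{ij} (row i, column j, both 1-based) maps to the row-major index 5*(i-1) + (j-1).
A = np.zeros((25, 25))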

A[0,0], A[0,1], A[0,5] = 2, 1, 1
A[1,21] = 4
A[2,1], A[2,2], A[2,3], A[2,7] = 1, 1, 1, 1
A[3,13] = 4
A[4,3], A[4,4], A[4,9] = 1, 2, 1

A[5,0], A[5,5], A[5,6], A[5,10] = 1, 1, 1, 1
A[6,1], A[6,5], A[6,7], A[6,11] = 1, 1, 1, 1
A[7,2], A[7,6], A[7,8], A[7,12] = 1, 1, 1, 1
A[8,3], A[8,7], A[8,9], A[8,13] = 1, 1, 1, 1
A[9,4], A[9,8], A[9,9], A[9,14] = 1, 1, 1, 1

A[10,5], A[10,10], A[10,11], A[10,15] = 1, 1, 1, 1
A[11,6], A[11,10], A[11,12], A[11,16] = 1, 1, 1, 1
A[12,7], A[12,11], A[12,13], A[12,17] = 1, 1, 1, 1
A[13,8], A[13,12], A[13,14], A[13,18] = 1, 1, 1, 1
A[14,9], A[14,13], A[14,14], A[14,19] = 1, 1, 1, 1

A[15,10], A[15,15], A[15,16], A[15,20] = 1, 1, 1, 1
A[16,11], A[16,15], A[16,17], A[16,21] = 1, 1, 1, 1
A[17,12], A[17,16], A[17,18], A[17,22] = 1, 1, 1, 1
A[18,13], A[18,17], A[18,19], A[18,23] = 1, 1, 1, 1
A[19,14], A[19,18], A[19,19], A[19,24] = 1, 1, 1, 1

A[20,15], A[20,20], A[20,21] = 1, 2, 1
A[21,16], A[21,20], A[21,21], A[21,22] = 1, 1, 1, 1
A[22,17], A[22,21], A[22,22], A[22,23] = 1, 1, 1, 1
A[23,18], A[23,22], A[23,23], A[23,24] = 1, 1, 1, 1
A[24,19], A[24,23], A[24,24] = 1, 1, 2
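
The same matrix can also be generated from the rules of the gridworld instead of being typed entry by entry. The following is a minimal sketch, not the approach used in this notebook; the names `SPECIAL`, `MOVES`, `build_count_matrix` and `A_generated` are hypothetical, and the grid size and the two teleporting states are hard-coded to match Example 3.5.

```python
import numpy as np

# Teleporting states, as 0-based (row, col): A -> A' and B -> B'.
SPECIAL = {(0, 1): (4, 1),   # v_12 -> v_52
           (0, 3): (2, 3)}   # v_14 -> v_34
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # north, south, west, east


def build_count_matrix():
    """Return the 25x25 matrix whose entry [s, t] counts the actions taking s to t."""
    counts = np.zeros((25, 25))
    for i in range(5):
        for j in range(5):
            s = 5 * i + j  # row-major state index
            for di, dj in MOVES:
                if (i, j) in SPECIAL:
                    ni, nj = SPECIAL[(i, j)]
                else:
                    # Clamping keeps the agent in place when a move leaves the grid.
                    ni = min(max(i + di, 0), 4)
                    nj = min(max(j + dj, 0), 4)
                counts[s, 5 * ni + nj] += 1
    return counts


A_generated = build_count_matrix()
```

In the notebook, `np.array_equal(A, A_generated)` would be a quick way to compare it with the hand-written matrix.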

In [3]:
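γ = 0.90  # discount factor
L = np.eye(25) - 0.25*γ*A  # coefficient matrix of the Bellman system (A/4 holds the transition probabilities)
# Expected immediate reward of each state, in row-major order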
b = [-0.5, 10, -0.25, 5, -0.5,
     -0.25, 0, 0, 0, -0.25,
     -0.25, 0, 0, 0, -0.25,
     -0.25, 0, 0, 0, -0.25,
     -0.5, -0.25, -0.25, -0.25, -0.5
    ]

In [4]:
v = np.linalg.solve(L,b).reshape(5, 5)

In [5]:
v

array([[ 3.30899634,  8.78929186,  4.42761918,  5.32236759,  1.49217876],
       [ 1.52158807,  2.99231786,  2.25013995,  1.9075717 ,  0.54740271],
       [ 0.05082249,  0.73817059,  0.67311326,  0.35818621, -0.40314114],
       [-0.9735923 , -0.43549543, -0.35488227, -0.58560509, -1.18307508],
       [-1.85770055, -1.34523126, -1.22926726, -1.42291815, -1.97517905]])
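
As a cross-check on the direct solve, the same values can be approximated by iterative policy evaluation, which repeatedly applies the Bellman equation as an update rule. This is a different technique from the `np.linalg.solve` call above, shown here only as a sketch; the names `step`, `evaluate_random_policy` and `v_iterative`, and the tolerance `1e-10`, are assumptions made for illustration.

```python
import numpy as np

GAMMA = 0.9
# Teleporting states as 0-based (row, col): (next state, reward).
SPECIAL = {(0, 1): ((4, 1), 10.0),   # A -> A', reward +10
           (0, 3): ((2, 3), 5.0)}    # B -> B', reward +5
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # north, south, west, east


def step(i, j, di, dj):
    """Return ((next_row, next_col), reward) for one action in the 5x5 gridworld."""
    if (i, j) in SPECIAL:
        return SPECIAL[(i, j)]
    ni, nj = i + di, j + dj
    if 0 <= ni < 5 and 0 <= nj < 5:
        return (ni, nj), 0.0
    return (i, j), -1.0  # off-grid move: stay in place, reward -1


def evaluate_random_policy(gamma=GAMMA, tol=1e-10):
    """Sweep the Bellman update until the largest change falls below tol."""
    v = np.zeros((5, 5))
    while True:
        new_v = np.zeros_like(v)
        for i in range(5):
            for j in range(5):
                for di, dj in MOVES:
                    (ni, nj), reward = step(i, j, di, dj)
                    new_v[i, j] += 0.25 * (reward + gamma * v[ni, nj])
        if np.max(np.abs(new_v - v)) < tol:
            return new_v
        v = new_v


v_iterative = evaluate_random_policy()
print(np.round(v_iterative, 1))
```

Because $\gamma < 1$ makes the update a contraction, the sweeps converge, and the result should agree with `v` above up to the chosen tolerance.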

In [6]:
fig = px.imshow(v, color_continuous_scale='Blues')
for i in range(5):
    for j in range(5):
        # Pass the annotation text as a string with two decimals
        fig.add_annotation(x=j, y=i, text=f"{v[i, j]:.2f}", showarrow=False,
                           font_size=16, font_color='Red')
fig.show()

In [7]:
# Flip the rows: the annotated heatmap draws row 0 at the bottom of the y-axis
ff.create_annotated_heatmap(np.flipud(np.round(v, 1)))

## Discussion

Adding a constant $c$ to every reward raises every state value by $c/(1-\gamma)$. Equivalently, if all state values increase by 1, then all rewards increase by $1-\gamma$.
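
A short derivation of this, using matrix notation introduced here for illustration: $P$ is the transition matrix under the random policy, $r$ the vector of expected immediate rewards, and $\mathbf{1}$ the all-ones vector. Since every row of $P$ sums to 1, $P\mathbf{1} = \mathbf{1}$, and for any constant $k$:

$$
\begin{align}
(I - \gamma P)(v + k\mathbf{1}) &= (I - \gamma P)\,v + k\mathbf{1} - \gamma k P\mathbf{1} \\
&= r + k(1-\gamma)\mathbf{1}
\end{align}
$$

So $v + k\mathbf{1}$ solves the Bellman equation for the rewards $r + k(1-\gamma)\mathbf{1}$; taking $k = 1$ gives a reward shift of $1-\gamma$.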

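The condition $\gamma<1$ in the Bellman equation is necessary for the linear system to have a unique solution. With $\gamma=1$, the coefficient matrix is not of full rank: each row of $A$ sums to 4, so every row of $I - 0.25\gamma A$ sums to 0 and the all-ones vector lies in its null space.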
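
The rank claim can be checked numerically. The sketch below rebuilds the transition matrix from the gridworld rules so that it runs on its own; the names `P`, `ranks` and `max_row_sums` are hypothetical, and `np.linalg.matrix_rank` relies on a floating-point tolerance, so this is a sanity check rather than a proof.

```python
import numpy as np

SPECIAL = {(0, 1): (4, 1), (0, 3): (2, 3)}  # A -> A', B -> B' (0-based)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # north, south, west, east

# Transition matrix of the equiprobable random policy.
P = np.zeros((25, 25))
for i in range(5):
    for j in range(5):
        for di, dj in MOVES:
            if (i, j) in SPECIAL:
                ni, nj = SPECIAL[(i, j)]
            else:
                # Off-grid moves leave the state unchanged.
                ni = min(max(i + di, 0), 4)
                nj = min(max(j + dj, 0), 4)
            P[5 * i + j, 5 * ni + nj] += 0.25

ranks = {}
max_row_sums = {}
for gamma in (0.9, 1.0):
    L_gamma = np.eye(25) - gamma * P
    ranks[gamma] = np.linalg.matrix_rank(L_gamma)
    max_row_sums[gamma] = np.abs(L_gamma.sum(axis=1)).max()
    print(f"gamma={gamma}: rank={ranks[gamma]}, max |row sum|={max_row_sums[gamma]:.2e}")
```

With $\gamma = 0.9$ the matrix is strictly diagonally dominant and therefore invertible; with $\gamma = 1$ the row sums vanish and the reported rank drops below 25.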