# CS 640 - 2025 Fall - Homework 4

In this homework, you will define a Markov reward process describing a cat making random walk on a 32 x 32 grid looking for a fish.

## Instructions

1. Follow the instructions below to construct and analyze the Markov reward process.
2. Run all the cells so that all the check cells are updated.
3. Answer the question at the bottom.
4. Submit your notebook in Gradescope.

This problem is designed to be slow on a CPU but fast with a GPU.
Please plan accordingly for SCC access as needed.

In [2]:
import torch
from networkx.classes import neighbors
import numpy as np

## State Descriptions

You will construct a Markov reward problem based on the following specification.
1. The states are numbered from 0 to 1024 (inclusive).
2. The state 1024 is a special done state.
3. For state s from 0 to 1023, the state number encodes coordinates as follows.
  * x = s % 32
  * y = s / 32 (integer division)
  * x and y represent the location of a cat in a 32x32 grid.
4. At the state corresponding to x=16,y=16, there is a fish unless the cat is also there.

The following function `print_state` will visualize the state.

In [3]:
def print_state(s):
    assert 0 <= s <= 1024

    print("STATE", s)

    if s < 1024:
        # normal state indicating y, x coordinates
        output = ['🪨' for _ in range(1024)]
        output[16*32+16] = '🐟'
        output[s] = '🐱'
        for i in range(0, 1024, 32):
            print(''.join(output[i:i+32]))
    else:
        print("DONE")

    print("")

for s in (0, 2 * 32 + 5, 16*32+16, 1023, 1024):
    print_state(s)

STATE 0
🐱🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🐟🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨

## State Transitions

Construct a transition matrix P based on the following rules.

1. P should be 1025x1025.
2. P[i,j] should hold the probability of transitioning from state i to j in one step.
3. If the state is 1024 (the done state), the state stays the same with probability 1.
4. If the state is 528 (cat is on the fish), the state changes to 1024 with probability 1.
5. For any other state,
    * There are equal probabilities of transitioning to any states representing locations that are adjacent horizontally or vertically.
    * The probability of transitioning to a state representing a non-adjacent location is zero.

For example, state 0 (x=0,y=0) has a 50% chance of transitioning to state 1 (x=1,y=0) or state 32 (x=0,y=1) and 0% chance of transitioning to state 33 (x=1,y=1).

In [5]:
# YOUR CHANGES HERE

N = 32
S = N*N + 1
DONE = 1024
FISH = 528

rows,cols,vals = [],[],[]
for i in range(0,S):
    if i == FISH or i == DONE: continue
    x = i % N
    y = i // N
    neighbors = []
    if x > 0: neighbors += [i-1]
    if x < N-1: neighbors += [i+1]
    if y > 0: neighbors += [i-N]
    if y < N-1: neighbors += [i+N]
    p = 1.0 / len(neighbors)
    for j in neighbors:
        rows.append(i), cols.append(j), vals.append(p)
#special states
rows += [FISH]; cols += [DONE]; vals += [1.0]
rows += [DONE]; cols += [DONE]; vals += [1.0]

P = np.zeros((S, S), dtype=float)

for r, c, v in zip(rows, cols, vals):
    P[r, c] = v
P

array([[0.        , 0.5       , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.33333333, 0.        , 0.33333333, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.33333333, 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.33333333,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.5       , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        1.        ]], shape=(1025, 1025))

### Check that all the rows sum to one.

In [6]:
P_check = P.sum(axis=1)
P_check.min(), P_check.max()

(np.float64(1.0), np.float64(1.0))

## State Rewards

Construct a reward vector R based on the following rules.

1. R should be 1025x1.
2. R[i] should hold the reward after state i.
3. The reward after state 528 (cat is on fish) is 100.
4. The reward for all other states is 0.

In [15]:
# YOUR CHANGES HERE
N=32
S = N*N + 1
R = np.zeros((S,1),dtype=int)
R[16*32+16+1][0] = 100
R.shape

(1025, 1)

## Expected State Values

Compute the value function $v_*$ for each state using $\gamma=0.9$ and save it in $v$.

In [35]:
# YOUR CHANGES HERE
gamma = 0.9
S = P.shape[0]
I = np.eye(S)
v = np.linalg.solve(I-gamma*P, R)
v = v.reshape(-1,1)


### Check special states

In [19]:
# done state
v[1024]

array([0.])

In [20]:
# cat arrived at fish
v[528]

array([0.])

In [21]:
# cat next to fish
v[529]

array([127.82346031])

In [22]:
# cat farther from fish
v[530]

array([43.38715904])

## Steady State Distribution

Use repeated exponentiation to compute the steady state distribution of this Markov reward process.
That is, take $P$ to a high enough power that the probabilities in each row are identical.
Set `ss` to be a vector of 1025 probabilities for that steady state distribution.

In [32]:
# YOUR CHANGES HERE
S = P.shape[0]
ss = np.ones(S) / S

tol = 1e-12
max_iter = 100000

for _ in range(max_iter):
    ss_next = ss @ P
    if np.max(np.abs(ss_next - ss)) < tol:
        ss = ss_next
        break
    ss = ss_next

ss

array([1.35657931e-12, 2.03320058e-12, 2.03152081e-12, ...,
       1.93325797e-12, 1.28989623e-12, 9.99999998e-01], shape=(1025,))

## Steady State Explanation

In the cell below, describe the probabilities in `ss` as concisely as possible, and explain why they are that way.

YOUR ANSWER HERE