# CS 640 - 2025 Fall - Homework 4

In this homework, you will define a Markov reward process describing a cat making a random walk on a 32 x 32 grid while looking for a fish.

## Instructions

1. Follow the instructions below to construct and analyze the Markov reward process.
2. Run all the cells so that all the check cells are updated.
3. Answer the question at the bottom.
4. Submit your notebook in Gradescope.

This problem is designed to be slow on a CPU but fast with a GPU.
Please plan accordingly for SCC access as needed.

In [13]:
import torch

## State Descriptions

You will construct a Markov reward problem based on the following specification.
1. The states are numbered from 0 to 1024 (inclusive).
2. The state 1024 is a special done state.
3. For state s from 0 to 1023, the state number encodes coordinates as follows.
  * x = s % 32
  * y = s / 32 (integer division)
  * x and y represent the location of a cat in a 32x32 grid.
4. At the state corresponding to x=16,y=16, there is a fish unless the cat is also there.
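
The encoding in item 3 can be sketched with two small helpers. The names `state_to_xy` and `xy_to_state` are illustrative and not part of the assignment; the grid size of 32 is taken from the specification above.

```python
GRID = 32  # grid width and height, from the specification

def state_to_xy(s):
    """Decode a grid state number (0..1023) into (x, y) coordinates."""
    assert 0 <= s < GRID * GRID
    return s % GRID, s // GRID

def xy_to_state(x, y):
    """Encode (x, y) coordinates back into a state number."""
    assert 0 <= x < GRID and 0 <= y < GRID
    return y * GRID + x

print(state_to_xy(69))      # (5, 2), i.e. state 2 * 32 + 5
print(xy_to_state(16, 16))  # 528, the fish location
```

The done state 1024 has no coordinates, so both helpers deliberately reject it.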

The following function `print_state` will visualize the state.

In [14]:
def print_state(s):
    assert 0 <= s <= 1024

    print("STATE", s)

    if s < 1024:
        # normal state indicating y, x coordinates
        output = ['🪨' for _ in range(1024)]
        output[16*32+16] = '🐟'
        output[s] = '🐱'
        for i in range(0, 1024, 32):
            print(''.join(output[i:i+32]))
    else:
        print("DONE")

    print("")

for s in (0, 2 * 32 + 5, 16*32+16, 1023, 1024):
    print_state(s)

STATE 0
🐱🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🐟🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨🪨
🪨🪨

## State Transitions

Construct a transition matrix P based on the following rules.

1. P should be 1025x1025.
2. P[i,j] should hold the probability of transitioning from state i to j in one step.
3. If the state is 1024 (the done state), the state stays the same with probability 1.
4. If the state is 528 (cat is on the fish), the state changes to 1024 with probability 1.
5. For any other state,
    * There are equal probabilities of transitioning to any states representing locations that are adjacent horizontally or vertically.
    * The probability of transitioning to a state representing a non-adjacent location is zero.

For example, state 0 (x=0,y=0) has a 50% chance of transitioning to state 1 (x=1,y=0) or state 32 (x=0,y=1) and 0% chance of transitioning to state 33 (x=1,y=1).
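
As a rough sanity check on rule 5, the sketch below counts how many grid cells have 2, 3, or 4 adjacent cells, which determines whether each outgoing probability is 1/2, 1/3, or 1/4. The helper name `neighbor_states` is hypothetical, and the count ignores the special handling of state 528 from rule 4.

```python
from collections import Counter

GRID = 32  # grid width and height, from the specification

def neighbor_states(s):
    """Return the states horizontally or vertically adjacent to grid state s."""
    x, y = s % GRID, s // GRID
    result = []
    if y > 0:          # up
        result.append(s - GRID)
    if y < GRID - 1:   # down
        result.append(s + GRID)
    if x > 0:          # left
        result.append(s - 1)
    if x < GRID - 1:   # right
        result.append(s + 1)
    return result

degree_counts = Counter(len(neighbor_states(s)) for s in range(GRID * GRID))
print(sorted(degree_counts.items()))  # [(2, 4), (3, 120), (4, 900)]
print(neighbor_states(0))             # [32, 1]
```

The 4 corners split their probability two ways, the 120 edge cells three ways, and the 900 interior cells four ways.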

In [16]:
# Initialize the transition matrix P
P = torch.zeros(1025, 1025)

# Rule 3: If the state is 1024 (the done state), the state stays the same with probability 1.
P[1024, 1024] = 1.0

# Rule 4: If the state is 528 (cat is on the fish), the state changes to 1024 with probability 1.
P[528, 1024] = 1.0

# Rule 5: For any other state, transition to adjacent states with equal probability.
for s in range(1024):
    if s != 528:
        x = s % 32
        y = s // 32
        neighbors = []
        # Check adjacent states (up, down, left, right)
        if y > 0:  # Up
            neighbors.append(s - 32)
        if y < 31: # Down
            neighbors.append(s + 32)
        if x > 0:  # Left
            neighbors.append(s - 1)
        if x < 31: # Right
            neighbors.append(s + 1)

        prob = 1.0 / len(neighbors)
        for neighbor in neighbors:
            P[s, neighbor] = prob

### Check that all the rows sum to one.

In [17]:
P_check = P.sum(axis=1)
P_check.min(), P_check.max()

(tensor(1.), tensor(1.))

## State Rewards

Construct a reward vector R based on the following rules.

1. R should be 1025x1.
2. R[i] should hold the reward after state i.
3. The reward after state 528 (cat is on fish) is 100.
4. The reward for all other states is 0.

In [18]:
# Initialize the reward vector R
R = torch.zeros(1025)

# Rule 3: The reward after state 528 (cat is on fish) is 100.
R[528] = 100

## Expected State Values

Compute the value function $v_*$ for each state using $\gamma=0.9$ and save it in $v$.

In [19]:
gamma = 0.9
I = torch.eye(1025)
v = torch.linalg.solve(I - gamma * P, R)
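
The linear solve above follows from the Bellman equation for a Markov reward process, $v = R + \gamma P v$, which rearranges to $(I - \gamma P) v = R$. As a minimal sketch, assuming a hypothetical three-state chain (start, fish, done) rather than the full grid, the direct solve can be compared with repeated Bellman backups:

```python
import torch

gamma = 0.9

# Hypothetical toy chain: state 0 -> state 1 (fish) -> state 2 (done, absorbing).
P_toy = torch.tensor([[0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0],
                      [0.0, 0.0, 1.0]])
R_toy = torch.tensor([0.0, 100.0, 0.0])

# Direct solve of (I - gamma * P) v = R.
v_direct = torch.linalg.solve(torch.eye(3) - gamma * P_toy, R_toy)

# Iterative alternative: apply v <- R + gamma * P v until it stops changing.
v_iter = torch.zeros(3)
for _ in range(100):
    v_iter = R_toy + gamma * (P_toy @ v_iter)

print(v_direct)  # approximately [90, 100, 0]
print(v_iter)    # should agree with v_direct
```

In the toy chain, the fish state is worth exactly its reward of 100 because the done state has value 0, and the start state is worth 0.9 × 100 = 90 because it is one discounted step away.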

### Check special states

In [20]:
# done state
v[1024]

tensor(0.)

In [21]:
# cat arrived at fish
v[528]

tensor(100.)

In [22]:
# cat next to fish
v[529]

tensor(34.5800)

In [23]:
# cat farther from fish
v[530]

tensor(13.5796)

## Steady State Distribution

Use repeated exponentiation to compute the steady state distribution of this Markov reward process.
That is, take $P$ to a high enough power that the probabilities in each row are identical.
Set `ss` to be a vector of 1025 probabilities for that steady state distribution.

In [24]:
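# Compute the steady state distribution by raising P to a high power.
# Absorption is slow on this grid, so a power like 1000 is not enough for the
# rows to agree; 2**20 steps (20 squarings) leaves a comfortable margin.
P_steady = torch.matrix_power(P, 2**20)

# The steady state distribution is the same for all rows of P_steady
ss = P_steady[0]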
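
How high a power is enough depends on how quickly the chain is absorbed, so one option is to keep squaring until the rows agree. The sketch below does this on a hypothetical three-state chain; the function name `steady_state_by_squaring`, the tolerance, and the iteration cap are assumptions for illustration, not part of the assignment.

```python
import torch

def steady_state_by_squaring(P, tol=1e-9, max_squarings=64):
    """Square P repeatedly until every row matches row 0 within tol."""
    M = P.clone()
    for _ in range(max_squarings):
        # Broadcasting subtracts row 0 from every row of M.
        if (M - M[0]).abs().max() < tol:
            return M[0]
        M = M @ M
    raise RuntimeError("rows did not converge; the chain may be periodic or reducible")

# Hypothetical toy chain: states 0 and 1 are transient, state 2 is absorbing.
P_toy = torch.tensor([[0.5, 0.5, 0.0],
                      [0.5, 0.0, 0.5],
                      [0.0, 0.0, 1.0]], dtype=torch.float64)

ss_toy = steady_state_by_squaring(P_toy)
print(ss_toy)  # approximately [0, 0, 1]
```

Each squaring doubles the number of steps represented, so k squarings correspond to the power 2**k. Checking that the rows agree guards against picking a power that is too low.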

## Steady State Explanation

In the cell below, describe the probabilities in `ss` as concisely as possible, and explain why they are that way.

The steady state distribution `ss` has probability 1 for state 1024 (the done state) and probability 0 for every other state. State 1024 is absorbing: once the process enters it, it never leaves. Every grid state is transient, because the grid is finite and connected, so the random walk reaches the fish at state 528 with probability 1 and then moves to the done state, where it remains.