# Implementation: Value Iteration

**Goal**: Use value iteration to solve a hallway with three rooms (states 0 to 2) and a terminal goal (state 3).
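
Because every move in this hallway is deterministic, the update applied on each sweep can be written as a simplified Bellman optimality backup. The symbols $r(s, a)$ (reward for taking action $a$ in state $s$) and $s'(s, a)$ (the resulting state) are notation introduced here for illustration; the code itself only uses `gamma` and `values`.

$$
V_{k+1}(s) = \max_{a} \left[ r(s, a) + \gamma \, V_k\big(s'(s, a)\big) \right], \qquad V_k(3) = 0 \ \text{for all } k
$$

For example, with $\gamma = 0.9$, state 2 compares $10 + 0.9 \cdot V_k(3)$ (go right) against $0 + 0.9 \cdot V_k(1)$ (go left).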

In [None]:
import numpy as np

# 1. Environment: 0 -- 1 -- 2 -- [3](Goal)
# State 3 is terminal: its value is fixed at 0, and entering it gives reward 10.

values = np.zeros(4) # V(0), V(1), V(2), V(3)
gamma = 0.9

# 2. Iteration: 5 synchronous sweeps (each sweep reads the old values, writes new ones)
for i in range(5):
    print(f"Iter {i}: {values}")
    new_values = np.zeros(4)
    
    # State 0: Can go Right to 1
    # Rew = 0 (step cost simplified to 0), Next = V(1)
    new_values[0] = 0 + gamma * values[1]
    
    # State 1: Left to 0, Right to 2. Maximize.
    right = 0 + gamma * values[2]
    left = 0 + gamma * values[0]
    new_values[1] = max(right, left)
    
    # State 2: Left to 1, Right to 3 (Goal!)
    # If we go Right: Reward 10 + gamma * V(3), and V(3) is always 0
    right = 10 + gamma * values[3]
    left = 0 + gamma * values[1]
    new_values[2] = max(right, left)
    
    # State 3: Terminal. Value stays 0.
    new_values[3] = 0
    
    values = new_values

print(f"Final Values: {values}")

## Conclusion
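The values stop changing after three sweeps.
V(2) becomes 10, the reward for stepping into the goal.
V(1) becomes 9 (0.9 * 10): one step further away, so the reward is discounted once.
V(0) becomes 8.1 (0.9 * 9): two steps further away, so it is discounted twice.
The values propagate backwards from the goal, one state per sweep.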
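
The same computation can be sketched in a table-driven form that stops once the values settle, rather than after a fixed 5 sweeps, and then reads off the greedy action in each room. This is a minimal sketch, not part of the original notebook: the `TRANSITIONS` table, the function names `value_iteration` and `greedy_policy`, and the `tol` and `max_sweeps` parameters are all hypothetical names chosen for illustration.

```python
import numpy as np

# Hypothetical transition table: state -> {action: (next_state, reward)}.
# Entering state 3 (the goal) pays 10; every other move pays 0.
TRANSITIONS = {
    0: {"right": (1, 0.0)},
    1: {"left": (0, 0.0), "right": (2, 0.0)},
    2: {"left": (1, 0.0), "right": (3, 10.0)},
    3: {},  # terminal: no actions, value stays 0
}


def value_iteration(transitions, gamma=0.9, tol=1e-8, max_sweeps=1000):
    """Run synchronous sweeps until the largest value change drops below tol."""
    values = np.zeros(len(transitions))
    sweep = 0
    for sweep in range(1, max_sweeps + 1):
        new_values = np.zeros_like(values)
        for state, actions in transitions.items():
            if actions:  # terminal states have no actions and keep value 0
                new_values[state] = max(
                    reward + gamma * values[nxt]
                    for nxt, reward in actions.values()
                )
        delta = np.max(np.abs(new_values - values))
        values = new_values
        if delta < tol:
            break
    return values, sweep


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each non-terminal state, the action with the best one-step backup."""
    policy = {}
    for state, actions in transitions.items():
        if actions:
            policy[state] = max(
                actions,
                key=lambda a: actions[a][1] + gamma * values[actions[a][0]],
            )
    return policy


values, sweeps = value_iteration(TRANSITIONS)
policy = greedy_policy(TRANSITIONS, values)
print(f"Values: {values}")
print(f"Sweeps: {sweeps}")
print(f"Policy: {policy}")
```

The loop needs one sweep more than the number of sweeps in which values change, because the final sweep is the one that detects no further change. The greedy policy chooses "right" in every room, which matches the intuition that the values increase toward the goal.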