### Markov Reward Process (Lecture 2, up to the Bellman Expectation Equation)
As before, I will code the Student MRP from the lecture slides. Working through it makes the abstract parts of an MRP (states, transitions, and rewards) concrete, and it lets us solve for the state values with the Bellman equation.

#### Markov Reward Process
A Markov reward process is a tuple $(S,P,R,\gamma)$:
- S is a finite set of states.
- P is a state transition probability matrix 
- R is the reward function $R_{s} = E[R_{t+1} | S_{t} = s]$
- $\gamma$ is a discount factor, $\gamma \in [0,1]$
  - $\gamma$ close to 0 leads to "myopic" evaluation (immediate rewards dominate)
  - $\gamma$ close to 1 leads to "far-sighted" evaluation (later rewards count almost as much as immediate ones)

#### Return 
The return $G_{t}$ is the total discounted reward from time-step t.
$G_{t} = R_{t+1} + \gamma R_{t+2} + \dots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}$
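
To make the definition concrete, here is a minimal sketch that evaluates the return for one finite sample episode, C1 → C2 → C3 → Pass → Sleep, whose rewards are -2, -2, -2, 10. The names `discounted_return` and `episode_rewards` are illustrative and not part of the notebook cells.

```python
def discounted_return(rewards, gamma):
    """Return G_t = sum_k gamma**k * rewards[k] for a finite reward sequence."""
    return sum(gamma**k * r for k, r in enumerate(rewards))

# Rewards R_{t+1}, R_{t+2}, ... collected along C1 -> C2 -> C3 -> Pass -> Sleep
episode_rewards = [-2, -2, -2, 10]

# Compare a myopic, two intermediate, and a far-sighted discount factor
for g in (0.0, 0.5, 0.9, 1.0):
    print(f"gamma={g}: G = {discounted_return(episode_rewards, g):.2f}")
```

With $\gamma = 0.5$ this gives $-2 - 2 \cdot 0.5 - 2 \cdot 0.25 + 10 \cdot 0.125 = -2.25$. With $\gamma = 0$ only the first reward counts, and with $\gamma = 1$ the rewards are simply summed.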

In [1]:
import numpy as np 
states = {0: 'C1', 1: 'C2', 2: 'C3', 3: 'Pass', 4: 'Pub', 5: 'FB', 6: 'Sleep'} 
# Expected immediate reward R_s for each state s (collected on leaving s), in the same order as `states`
R = np.array([-2, -2, -2, 10, 1, -1, 0]) 

In [2]:
# Transition matrix: P[i, j] is the probability of moving from state i to state j
P = np.array([
    # C1   C2   C3   Pass  Pub   FB    Sleep
    [0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0],  # Transitions from C1
    [0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.2],  # Transitions from C2
    [0.0, 0.0, 0.0, 0.6, 0.4, 0.0, 0.0],  # Transitions from C3
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # Transitions from Pass
    [0.2, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0],  # Transitions from Pub
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0],  # Transitions from FB
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]   # Transitions from Sleep
])
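
Since each row of a transition matrix is a probability distribution over next states, a quick sanity check can catch typos in the entries. The sketch below is one possible check. The helper `is_valid_transition_matrix` and the toy matrix `P_toy` are hypothetical names used only to keep the example self-contained.

```python
import numpy as np

def is_valid_transition_matrix(P, tol=1e-12):
    """True if P is square, non-negative, and every row sums to 1."""
    P = np.asarray(P, dtype=float)
    return (
        P.ndim == 2
        and P.shape[0] == P.shape[1]
        and bool(np.all(P >= 0))
        and bool(np.allclose(P.sum(axis=1), 1.0, atol=tol))
    )

# Toy two-state chain (an assumption for illustration, not the Student MRP)
P_toy = np.array([[0.3, 0.7],
                  [0.0, 1.0]])
print(is_valid_transition_matrix(P_toy))
```

Calling the same helper on the `P` defined above should also return `True`, because every row of the Student MRP matrix sums to 1.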

In [3]:
#discount factor 
gamma = 0.9 

#### Solving the Bellman Equation
The Bellman equation for an MRP is linear in the value function $v$, so it can be solved directly in matrix form:

$v = (I-\gamma P)^{-1} R$
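
For reference, the closed form follows in a few steps from the Bellman equation written with vectors, where $v$ and $R$ are column vectors with one entry per state. The inverse exists whenever $\gamma < 1$, because $P$ is row-stochastic. A short derivation:

```latex
\begin{aligned}
v &= R + \gamma P v \\
v - \gamma P v &= R \\
(I - \gamma P)\, v &= R \\
v &= (I - \gamma P)^{-1} R
\end{aligned}
```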

In [4]:
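I = np.eye(len(states))  # identity matrix with the same shape as P
# Solve the linear system (I - gamma * P) v = R, which is equivalent to
# v = (I - gamma * P)^{-1} R but avoids forming the inverse explicitly
v = np.linalg.solve(I - gamma * P, R)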

In [5]:
print("Calculated State-Value Function (v):")
for i in range(len(states)): 
    print(f"v({states[i]}): {v[i]:.1f}")  

Calculated State-Value Function (v):
v(C1): -5.0
v(C2): 0.9
v(C3): 4.1
v(Pass): 10.0
v(Pub): 1.9
v(FB): -7.6
v(Sleep): 0.0
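
As a check on these numbers, the sketch below recomputes the values and confirms that they satisfy the Bellman equation $v = R + \gamma P v$, so that one more backup leaves them unchanged. It repeats the Student MRP definitions so that it runs on its own. The names `backup` and `c3_by_hand` are illustrative.

```python
import numpy as np

# Student MRP, state order: C1, C2, C3, Pass, Pub, FB, Sleep
R = np.array([-2, -2, -2, 10, 1, -1, 0], dtype=float)
P = np.array([
    [0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0],  # C1
    [0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.2],  # C2
    [0.0, 0.0, 0.0, 0.6, 0.4, 0.0, 0.0],  # C3
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # Pass
    [0.2, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0],  # Pub
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0],  # FB
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # Sleep
])
gamma = 0.9

# Direct solution of (I - gamma * P) v = R
v = np.linalg.solve(np.eye(len(R)) - gamma * P, R)

# One Bellman backup: immediate reward plus discounted expected next value
backup = R + gamma * (P @ v)
print(np.allclose(v, backup))

# Spot check for C3, which moves to Pass (0.6) or Pub (0.4)
c3_by_hand = R[2] + gamma * (0.6 * v[3] + 0.4 * v[4])
print(f"v(C3) = {v[2]:.3f}, by hand = {c3_by_hand:.3f}")
```

The spot check mirrors the reading of the equation for a single state: $v(\text{C3}) = -2 + 0.9\,(0.6 \cdot v(\text{Pass}) + 0.4 \cdot v(\text{Pub}))$, which rounds to the 4.1 printed above.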
