In [2]:
import numpy as np
import pandas as pd

### MP/MRP definitions and MRP Value Function definition

**Markov Process** = (finite) state set $S$ + transition probability matrix between states, where transitions satisfy the Markov property: $P(X_{t+1} = s \mid X_t = s') = P(X_{t+1} = s \mid X_t = s', X_{t-1} = s_{t-1}, \ldots, X_0 = s_0)$<br>
**Markov Reward Process** = Markov Process + Reward associated with each state + discount factor
- Reward: $R_s = \mathbb{E}[R_{t+1}|S_t = s]$ (per state) or $R(s,s')$ (per pair of consecutive states)
- Discount factor: $\gamma$<br>

**Return:** $G_t = R_{t+1} + \gamma R_{t+2} + ... = \sum^{\infty}_{i=t+1}\gamma^{i-t-1} R_i$ <br>
**Value function**: $v(s) = \mathbb{E}[G_{t}|S_t = s]$
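
As a quick numerical check of the return definition, the sketch below computes a truncated discounted return for a finite reward sequence. The function name `discounted_return` and the sample rewards are illustrative assumptions, not part of the class design that follows.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """Truncated return: sum over k of gamma^k * R_{t+1+k} for a finite reward list."""
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(len(rewards))  # 1, gamma, gamma^2, ...
    return float(np.dot(discounts, rewards))

# Hypothetical rewards R_{t+1}, R_{t+2}, R_{t+3} with gamma = 0.5
g = discounted_return([1.0, 2.0, 4.0], gamma=0.5)
print(g)  # 1 + 0.5*2 + 0.25*4 = 3.0
```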

### Class design for MP/MRP

In [3]:
# Define MP by state set and transition matrix
# State set for MP: map (label to index)
# Transition matrix: matrix (nparray)
"""
    E.g.,
    state = {"Rain": 0, "Sunny": 1, "Cloudy": 2, "Windy": 3}
    tran_mat = np.asarray([0.1,0.2,0.3,0.4,
            0.25,0.25,0.25,0.25,
            0.1,0.2,0.3,0.4,
            0.25,0.25,0.25,0.25]).reshape((4,4))
    # Row i = today's weather, column j = tomorrow's weather
"""
class MP:
    def __init__(self, state, tran_mat):
        self.state = state
        self.tran_mat = tran_mat
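
One possible way to exercise a process defined this way is to check that each row of the transition matrix sums to 1 and then sample a trajectory. The sketch below is standalone and does not modify `MP`; `simulate_chain`, the seed, and the step count are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_chain(state, tran_mat, start, n_steps, seed=0):
    """Sample a path of state labels from a row-stochastic transition matrix."""
    tran_mat = np.asarray(tran_mat, dtype=float)
    # Each row is a distribution over next states, so it must sum to 1
    if not np.allclose(tran_mat.sum(axis=1), 1.0):
        raise ValueError("each row of tran_mat must sum to 1")
    labels = {idx: label for label, idx in state.items()}  # index -> label
    rng = np.random.default_rng(seed)
    current = state[start]
    path = [start]
    for _ in range(n_steps):
        # Draw the next state index using the current state's row
        current = int(rng.choice(len(labels), p=tran_mat[current]))
        path.append(labels[current])
    return path

state = {"Rain": 0, "Sunny": 1, "Cloudy": 2, "Windy": 3}
tran_mat = np.array([[0.1, 0.2, 0.3, 0.4],
                     [0.25, 0.25, 0.25, 0.25],
                     [0.1, 0.2, 0.3, 0.4],
                     [0.25, 0.25, 0.25, 0.25]])
path = simulate_chain(state, tran_mat, start="Rain", n_steps=5)
print(path)
```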

In [5]:
# Define MRP by state set, transition matrix, and state rewards
# State set for MRP: map (label to index)
# State reward, R(s) definition: map (label to reward)
# Transition matrix: matrix (nparray)
"""
    E.g.,
    state = {"Rain": 0, "Sunny": 1, "Cloudy": 2, "Windy": 3}
    tran_mat = np.asarray([0.1,0.2,0.3,0.4,
            0.25,0.25,0.25,0.25,
            0.1,0.2,0.3,0.4,
            0.25,0.25,0.25,0.25]).reshape((4,4))
    state_reward = {"Rain": 1, "Sunny": 2, "Cloudy": 3, "Windy": 4}
    # Row i = today's weather, column j = tomorrow's weather
"""
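class MRP(MP):
    def __init__(self, state, tran_mat, state_reward):
        # Initialize the underlying MP so self.state and self.tran_mat exist
        super().__init__(state, tran_mat)
        self.reward = state_reward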
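
With the per-state reward definition, the value function $v(s) = \mathbb{E}[G_t|S_t = s]$ satisfies the standard linear relation $v = R + \gamma P v$, so for $\gamma < 1$ it can be obtained by solving $(I - \gamma P)v = R$. The following is a minimal standalone sketch of that computation; `mrp_value` and the choice $\gamma = 0.9$ are assumptions for illustration and are not part of the classes above.

```python
import numpy as np

def mrp_value(tran_mat, reward_vec, gamma):
    """Solve (I - gamma * P) v = R for the state values v, assuming gamma < 1."""
    P = np.asarray(tran_mat, dtype=float)
    R = np.asarray(reward_vec, dtype=float)
    return np.linalg.solve(np.eye(P.shape[0]) - gamma * P, R)

state = {"Rain": 0, "Sunny": 1, "Cloudy": 2, "Windy": 3}
state_reward = {"Rain": 1, "Sunny": 2, "Cloudy": 3, "Windy": 4}
tran_mat = np.array([[0.1, 0.2, 0.3, 0.4],
                     [0.25, 0.25, 0.25, 0.25],
                     [0.1, 0.2, 0.3, 0.4],
                     [0.25, 0.25, 0.25, 0.25]])

# Order the rewards by state index so they line up with the matrix rows
reward_vec = np.zeros(len(state))
for label, idx in state.items():
    reward_vec[idx] = state_reward[label]

v = mrp_value(tran_mat, reward_vec, gamma=0.9)  # gamma is an assumed value
print(dict(zip(state, v)))
```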

### Reward functions and distribution
- Separately implement the $r(s,s')$ and the $R(s) = \sum_{s'} p(s,s') \, r(s,s')$ definitions of MRP
- Write code to convert/cast the $r(s,s')$ definition of MRP to the $R(s)$ definition of MRP (put some thought into code design here)
- Write code to generate the stationary distribution for an MP
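
The last two tasks can be sketched as follows: the conversion applies $R(s) = \sum_{s'} p(s,s') \, r(s,s')$ row by row, and the stationary distribution is taken as the left eigenvector of the transition matrix for eigenvalue 1. The helper names `expected_reward` and `stationary_distribution` and the pairwise reward matrix are hypothetical, and the eigenvector approach assumes the chain has a unique stationary distribution (for example, an irreducible chain).

```python
import numpy as np

def expected_reward(tran_mat, pair_reward):
    """R(s) = sum over s' of p(s, s') * r(s, s'), computed for every row s."""
    P = np.asarray(tran_mat, dtype=float)
    r = np.asarray(pair_reward, dtype=float)
    return (P * r).sum(axis=1)

def stationary_distribution(tran_mat):
    """Left eigenvector of P for the eigenvalue closest to 1, scaled to sum to 1."""
    P = np.asarray(tran_mat, dtype=float)
    eigvals, eigvecs = np.linalg.eig(P.T)  # right eigenvectors of P.T = left of P
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()

tran_mat = np.array([[0.1, 0.2, 0.3, 0.4],
                     [0.25, 0.25, 0.25, 0.25],
                     [0.1, 0.2, 0.3, 0.4],
                     [0.25, 0.25, 0.25, 0.25]])
# Hypothetical pairwise rewards: r(s, s') depends only on the next state s'
pair_reward = np.tile([1.0, 2.0, 3.0, 4.0], (4, 1))

R = expected_reward(tran_mat, pair_reward)
pi = stationary_distribution(tran_mat)
print(R)
print(pi)
```

Keeping the conversion as a plain function of two matrices means a pairwise-reward MRP can be cast to the per-state form without either class needing to know about the other.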