# Lecture: Implementation of Markov decision processes

Suppose we have the student Markov decision process (MDP) depicted in the lecture slides.

In [None]:
# fetch the course repository and copy the MDP module next to this notebook
!git clone https://github.com/Fjoelsak/RL.git
!cp RL/02_MDP/markov_decision_process.py ./

In [1]:
import markov_decision_process as mdp

# state space = [FB, C1, C2, C3, Sleep]
states = list(range(5))

# action space = [FB, Study, Quit, Sleep, Pub]
# available actions per state: state index -> list of action indices
actions = {0: [0, 2],
           1: [0, 1],
           2: [1, 3],
           3: [1, 4],
           4: []
           }

# rewards (state, action) -> reward, taken from the process diagram in the lecture slides
rewards = {
    (0,0) : -1,    (0,2) : 0,
    (1,0) : -1,    (1,1) : -2,
    (2,1) : -2,    (2,3) : 0,
    (3,1) : 10,    (3,4) : 1
}

# transition probabilities (state, action) -> {next state: probability},
# taken from the process diagram in the lecture slides
transProbs = {
    (0,0) : {0: 1},
    (0,2) : {1: 1},
    (1,0) : {0: 1},
    (1,1) : {2: 1},
    (2,1) : {3: 1},
    (2,3) : {4: 1},
    (3,1) : {4: 1},
    (3,4) : {1: 0.2, 2: 0.4, 3: 0.4}
}

# presumably: [4] = terminal states (Sleep), 1 = discount factor gamma
Student_MDP = mdp.MarkovDecisionProcess(states, actions, rewards, transProbs, [4], 1)

{0: {0: 0.5, 2: 0.5}, 1: {0: 0.5, 1: 0.5}, 2: {1: 0.5, 3: 0.5}, 3: {1: 0.5, 4: 0.5}, 4: {}}
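The dictionary printed above appears to be the default policy set up by the constructor: in every state each available action gets probability $1/|\mathcal A(s)|$, and the terminal state Sleep has no actions at all. A minimal sketch of how such a uniform policy can be derived from `actions`; the helper `uniform_policy` is hypothetical and not necessarily how `markov_decision_process.py` builds it:

```python
# Available actions per state, copied from the cell above
# (states: 0=FB, 1=C1, 2=C2, 3=C3, 4=Sleep; actions: 0=FB, 1=Study, 2=Quit, 3=Sleep, 4=Pub).
actions = {0: [0, 2], 1: [0, 1], 2: [1, 3], 3: [1, 4], 4: []}


def uniform_policy(actions):
    """Hypothetical helper: pi[s][a] = 1 / |A(s)| for every action available in s.

    States without actions (terminal states) get an empty dict, so no
    division by zero happens for them.
    """
    return {s: {a: 1 / len(acts) for a in acts} for s, acts in actions.items()}


policy = uniform_policy(actions)
print(policy)
```

With two actions in every non-terminal state, each action ends up with probability 0.5, which is what the printed dictionary shows.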


# Exercise: Analytical solution of an MDP under a given uniformly distributed policy

## Task 1
Implement the `sample()` method of the `MarkovDecisionProcess` class in `markov_decision_process.py` so that it samples a trajectory from the MDP, starting in a given state, as a list of `(state, action, reward)` tuples.

In [2]:
Student_MDP.sample(0)

[(0, 0, -1), (0, 0, -1), (0, 2, 0), (1, 1, -2), (2, 1, -2), (3, 1, 10)]
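One possible way to implement the sampling step, written as a standalone sketch rather than as the class method: in each non-terminal state, draw an action from the policy, record `(state, action, reward)`, then draw the successor from the transition distribution, and stop once a terminal state is reached (the terminal state itself is not recorded, matching the output above). The names `sample_trajectory` and `discounted_return` are hypothetical; the second one computes the return $G = \sum_t \gamma^t r_{t+1}$ of an episode.

```python
import random

# MDP definition copied from the first code cell.
actions = {0: [0, 2], 1: [0, 1], 2: [1, 3], 3: [1, 4], 4: []}
rewards = {(0, 0): -1, (0, 2): 0, (1, 0): -1, (1, 1): -2,
           (2, 1): -2, (2, 3): 0, (3, 1): 10, (3, 4): 1}
trans_probs = {(0, 0): {0: 1}, (0, 2): {1: 1}, (1, 0): {0: 1}, (1, 1): {2: 1},
               (2, 1): {3: 1}, (2, 3): {4: 1}, (3, 1): {4: 1},
               (3, 4): {1: 0.2, 2: 0.4, 3: 0.4}}
terminal_states = [4]
# Uniform random policy: pi[s][a] = 1 / |A(s)|.
policy = {s: {a: 1 / len(acts) for a in acts} for s, acts in actions.items()}


def sample_trajectory(start, rng=random):
    """Sample one episode as a list of (state, action, reward) tuples."""
    trajectory = []
    state = start
    while state not in terminal_states:
        # Draw an action according to pi(a | state).
        acts = list(policy[state])
        action = rng.choices(acts, weights=[policy[state][a] for a in acts])[0]
        trajectory.append((state, action, rewards[(state, action)]))
        # Draw the successor state according to P(s' | state, action).
        successors = trans_probs[(state, action)]
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
    return trajectory


def discounted_return(trajectory, gamma=1.0):
    """G = sum_t gamma^t * r_{t+1} over the rewards of one trajectory."""
    return sum(gamma ** t * r for t, (_, _, r) in enumerate(trajectory))


rng = random.Random(0)  # seeded so the episode is reproducible
episode = sample_trajectory(0, rng)
print(episode, discounted_return(episode))
```

For the trajectory printed above and $\gamma = 1$, the return is $-1 - 1 + 0 - 2 - 2 + 10 = 4$. Averaging such returns over many episodes started in FB gives a Monte Carlo estimate of the same value that Task 2 computes analytically.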

## Task 2

The class `MarkovDecisionProcess` in `markov_decision_process.py` implements a default policy that chooses uniformly among the actions available in each state. Implement the analytical solution introduced in the lecture slides: compute $\mathcal R^{\pi}$ and $\mathcal P^{\pi}$ by averaging the rewards and transition probabilities over the actions, weighted by the policy probabilities, and then apply the method already implemented for the Markov reward process.
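Written out, the averaging step turns the MDP into a Markov reward process, whose Bellman expectation equation is linear and can be solved directly:

$$
\mathcal R^{\pi}_s = \sum_{a \in \mathcal A(s)} \pi(a \mid s)\, \mathcal R^a_s,
\qquad
\mathcal P^{\pi}_{ss'} = \sum_{a \in \mathcal A(s)} \pi(a \mid s)\, \mathcal P^a_{ss'},
$$

$$
v_\pi = \mathcal R^{\pi} + \gamma\, \mathcal P^{\pi} v_\pi
\quad\Longrightarrow\quad
v_\pi = \left(I - \gamma\, \mathcal P^{\pi}\right)^{-1} \mathcal R^{\pi}.
$$

For example, in C3 with $\pi(\text{Study} \mid \text{C3}) = \pi(\text{Pub} \mid \text{C3}) = 0.5$ this gives $\mathcal R^{\pi}_{\text{C3}} = 0.5 \cdot 10 + 0.5 \cdot 1 = 5.5$ and the row $\mathcal P^{\pi}_{\text{C3},\cdot}$ with Sleep $0.5$, C1 $0.1$, C2 $0.2$, C3 $0.2$. With $\gamma = 1$ the matrix $I - \mathcal P^{\pi}$ is still invertible here because the terminal state Sleep has no actions, so its row of $\mathcal P^{\pi}$ is all zeros.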


In [3]:
Student_MDP.analytical_sol()    # expected result (rounded): -2.3, -1.3, 2.7, 7.4, 0

array([-2.30769231, -1.30769231,  2.69230769,  7.38461538,  0.        ])
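A self-contained NumPy sketch of the same computation, independent of the `MarkovDecisionProcess` class: build $\mathcal R^{\pi}$ and $\mathcal P^{\pi}$ from the dictionaries of the first code cell, then solve the linear system. It assumes, as the output above suggests, a uniform policy and $\gamma = 1$; it should reproduce the array printed above (exactly $-30/13, -17/13, 35/13, 96/13, 0$).

```python
import numpy as np

# MDP definition copied from the first code cell (0=FB, 1=C1, 2=C2, 3=C3, 4=Sleep).
n_states = 5
gamma = 1.0
actions = {0: [0, 2], 1: [0, 1], 2: [1, 3], 3: [1, 4], 4: []}
rewards = {(0, 0): -1, (0, 2): 0, (1, 0): -1, (1, 1): -2,
           (2, 1): -2, (2, 3): 0, (3, 1): 10, (3, 4): 1}
trans_probs = {(0, 0): {0: 1}, (0, 2): {1: 1}, (1, 0): {0: 1}, (1, 1): {2: 1},
               (2, 1): {3: 1}, (2, 3): {4: 1}, (3, 1): {4: 1},
               (3, 4): {1: 0.2, 2: 0.4, 3: 0.4}}
# Uniform random policy: pi[s][a] = 1 / |A(s)|; the terminal state has no actions.
policy = {s: {a: 1 / len(acts) for a in acts} for s, acts in actions.items()}

# Average rewards and transitions over actions, weighted by pi(a | s).
R_pi = np.zeros(n_states)
P_pi = np.zeros((n_states, n_states))
for s, probs in policy.items():
    for a, p_a in probs.items():
        R_pi[s] += p_a * rewards[(s, a)]
        for s_next, p in trans_probs[(s, a)].items():
            P_pi[s, s_next] += p_a * p

# Solve (I - gamma * P_pi) v = R_pi rather than forming the inverse explicitly.
v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
print(v.round(2))
```

Using `np.linalg.solve` instead of `np.linalg.inv` gives the same $v_\pi = (I - \gamma \mathcal P^{\pi})^{-1} \mathcal R^{\pi}$ but avoids computing the inverse, which is cheaper and numerically more stable for larger state spaces.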