# Dungeon example

## Problem description
This example models a dungeon in which, from every room, the player can always go LEFT or RIGHT.

There are 4 states:
- EMPTY ROOM (reward 0)
- MONSTER ROOM (reward -50)
- TREASURE ROOM (reward 10)
- EXIT (out of game)

LEFT leads to EMPTY, MONSTER, TREASURE, or EXIT with probabilities alpha (an array with one entry per destination state).

RIGHT leads to EMPTY, MONSTER, TREASURE, or EXIT with probabilities beta (an array with one entry per destination state).
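As a rough illustration of this setup, the sketch below writes the two distributions out as plain dictionaries. The numbers are read off the transitions table in the Model section; the names `ALPHA`, `BETA`, `REWARD`, and `expected_reward` are hypothetical and are not part of the `dungeon` module. It assumes the reward is collected on arrival in the destination state and that the same distributions apply from every room.

```python
import math

# States: E = empty room, M = monster room, T = treasure room, O = exit (out of game).
# Assumption: the reward is collected on arrival in the destination state.
REWARD = {"E": 0, "M": -50, "T": 10, "O": 0}

# Destination-state distributions, read off the transitions table.
ALPHA = {"E": 0.5, "M": 0.1, "T": 0.3, "O": 0.1}  # action L (LEFT)
BETA = {"E": 0.5, "M": 0.3, "T": 0.1, "O": 0.1}   # action R (RIGHT)
P = {"L": ALPHA, "R": BETA}

# Each distribution must sum to 1.
for action, dist in P.items():
    assert math.isclose(sum(dist.values()), 1.0), action


def expected_reward(action):
    """Expected one-step reward of taking `action` from any room."""
    return sum(p * REWARD[s] for s, p in P[action].items())


expected = {a: expected_reward(a) for a in P}
print(expected)  # roughly -2 for L and -14 for R
```

Under these assumptions both actions lose reward on average in a single step, and LEFT loses less because it is less likely to hit the monster room.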


In [1]:
from dungeon import DungeonMDP

In [2]:
mdp = DungeonMDP()

## Model

In [3]:
from utils import transitions_table, mdp_to_graph, plot_mdp

In [4]:
T = transitions_table(mdp)

In [5]:
T

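| | from_state | action | to_state | reward | probability |
|---|---|---|---|---|---|
| 0 | E | L | E | 0 | 0.5 |
| 1 | E | L | T | 10 | 0.3 |
| 2 | E | L | M | -50 | 0.1 |
| 3 | E | L | O | 0 | 0.1 |
| 4 | E | R | E | 0 | 0.5 |
| 5 | E | R | T | 10 | 0.1 |
| 6 | E | R | M | -50 | 0.3 |
| 7 | E | R | O | 0 | 0.1 |
| 8 | T | L | E | 0 | 0.5 |
| 9 | T | L | T | 10 | 0.3 |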


In [6]:
net = plot_mdp(mdp_to_graph(mdp))
net.show('dungeon.html')

## Value iteration

In [7]:
from IPython.display import display, clear_output
from algorithms import value_iteration
from utils import show_value_iterations

In [8]:
optimal_value, optimal_policy, value_history, policy_history = value_iteration(mdp=mdp, epsilon=1e-10)
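For readers without access to the `algorithms` module, here is a minimal sketch of what a value iteration loop for this dungeon could look like. It is not the notebook's implementation. The discount factor `GAMMA = 0.8` is an assumption (the notebook does not state it; this value happens to reproduce the reported optimum), and `q_value` and `value_iteration_sketch` are hypothetical names. Rewards are assumed to be collected on arrival, with the same dynamics from every room.

```python
GAMMA = 0.8  # assumed discount factor; not stated in the notebook

REWARD = {"E": 0, "M": -50, "T": 10, "O": 0}
P = {
    "L": {"E": 0.5, "M": 0.1, "T": 0.3, "O": 0.1},
    "R": {"E": 0.5, "M": 0.3, "T": 0.1, "O": 0.1},
}
ROOMS = ["E", "M", "T"]  # non-terminal states; O is terminal with value 0


def q_value(V, action, gamma):
    """Expected return of `action` from any room, given value estimates V."""
    return sum(p * (REWARD[s2] + gamma * V[s2]) for s2, p in P[action].items())


def value_iteration_sketch(gamma=GAMMA, epsilon=1e-10):
    V = {s: 0.0 for s in REWARD}
    while True:
        new_V = dict(V)
        for s in ROOMS:
            # Bellman optimality backup: best action value under the current V.
            new_V[s] = max(q_value(V, a, gamma) for a in P)
        delta = max(abs(new_V[s] - V[s]) for s in V)
        V = new_V
        if delta < epsilon:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P, key=lambda a: q_value(V, a, gamma)) for s in ROOMS}
    policy["O"] = None  # nothing to choose in the terminal state
    return V, policy


V, policy = value_iteration_sketch()
print(V, policy)
```

The stopping rule here (largest change below `epsilon`) is one common choice; the actual `value_iteration` function may use a different criterion.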

In [9]:
optimal_value, optimal_policy

({'E': -7.1428571425822645,
  'T': -7.1428571425822645,
  'M': -7.1428571425822645,
  'O': 0.0},
 {'E': 'L', 'T': 'L', 'M': 'L', 'O': None})
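The three rooms share the same value because the table suggests the outcome of an action does not depend on the room it is taken from. As a sanity check, the value can be worked out by hand. This assumes a discount factor $\gamma = 0.8$ (inferred from the output above, not stated in the notebook), rewards collected on arrival, and $V(O) = 0$:

$$
\begin{aligned}
V &= 0.5\,(0 + \gamma V) + 0.3\,(10 + \gamma V) + 0.1\,(-50 + \gamma V) + 0.1\,(0 + 0) \\
  &= -2 + 0.9\,\gamma V \\
\Rightarrow\quad V &= \frac{-2}{1 - 0.9\,\gamma} = \frac{-2}{0.28} = -\frac{50}{7} \approx -7.142857
\end{aligned}
$$

Under the same assumptions, RIGHT has an expected one-step reward of $-14$ instead of $-2$ and the same continuation term, so LEFT is preferred in every room.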

In [10]:
show_value_iterations(value_history, policy_history)

| | S | V | A |
|---|---|---|---|
| 0 | E | -7.142337 | L |
| 1 | T | -7.142337 | L |
| 2 | M | -7.142337 | L |
| 3 | O | 0.0 | |




## Policy iteration

In [11]:
from algorithms import policy_iteration
from utils import show_policy_iterations

In [12]:
pi, pi_history = policy_iteration(mdp)

In [13]:
pi.actions

{'E': 'L', 'T': 'L', 'M': 'L', 'O': 'R'}

In [14]:
show_policy_iterations(pi_history)

| | S | V | A |
|---|---|---|---|
| 0 | E | -7.142857 | L |
| 1 | T | -7.142857 | L |
| 2 | M | -7.142857 | L |
| 3 | O | 0.0 | R |
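To make the alternation between evaluation and improvement concrete, the following is a minimal policy iteration sketch for the same dungeon. It is not the code in `algorithms.policy_iteration`. The names `evaluate` and `policy_iteration_sketch` are hypothetical, the discount factor of 0.8 is an assumption inferred from the reported values, and the starting policy (always RIGHT) is chosen only for illustration.

```python
REWARD = {"E": 0, "M": -50, "T": 10, "O": 0}  # assumed: collected on arrival
P = {
    "L": {"E": 0.5, "M": 0.1, "T": 0.3, "O": 0.1},
    "R": {"E": 0.5, "M": 0.3, "T": 0.1, "O": 0.1},
}
ROOMS = ["E", "M", "T"]  # non-terminal states; O is terminal


def q_value(V, action, gamma):
    """Expected return of `action` from any room, given value estimates V."""
    return sum(p * (REWARD[s2] + gamma * V[s2]) for s2, p in P[action].items())


def evaluate(policy, gamma, tol=1e-10):
    """Iterative policy evaluation: repeat backups until values stop changing."""
    V = {s: 0.0 for s in REWARD}
    while True:
        delta = 0.0
        for s in ROOMS:
            v = q_value(V, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration_sketch(gamma=0.8):  # gamma is an assumed value
    policy = {s: "R" for s in ROOMS}  # arbitrary starting policy
    history = []
    while True:
        V = evaluate(policy, gamma)
        history.append((dict(policy), dict(V)))
        # Greedy improvement with respect to the evaluated values.
        improved = {s: max(P, key=lambda a: q_value(V, a, gamma)) for s in ROOMS}
        if improved == policy:
            return policy, V, history
        policy = improved


pi_sketch, V_sketch, history = policy_iteration_sketch()
print(pi_sketch, V_sketch, len(history))
```

The sketch leaves the terminal state out of the policy altogether. Since O ends the game and has value 0, whichever action an implementation reports for it (`None` from value iteration, `R` here in the policy iteration output) should not affect the result.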



