# Frozen Lake: Agent

In reinforcement learning, an agent acts in an environment and learns by trial and error to maximize the cumulative reward it receives. This notebook discusses the Agent class.
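
The interaction described above can be sketched as a loop. The agent picks an action, the environment answers with a next state and a reward, and the rewards are summed into the cumulative reward. This is only an illustration of the loop, not of learning itself. The corridor environment and the names `env_step`, `run_episode`, `random_policy` and `go_right` are hypothetical and are not part of the ReinforcementLearning module.

```python
import random

GOAL = 4  # hypothetical corridor with cells 0..4; reaching cell 4 ends the episode

def env_step(state, action):
    """Toy environment: action -1 moves left, +1 moves right; walls at both ends."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def run_episode(policy, max_steps=100):
    """Let the agent act until the episode ends and return the cumulative reward."""
    state, total_reward = 0, 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env_step(state, action)
        total_reward += reward
        if done:
            break
    return total_reward

def random_policy(state):
    return random.choice([-1, 1])  # pure trial and error

def go_right(state):
    return 1  # a policy that already "knows" where the goal is

print(run_episode(go_right))       # 1.0
print(run_episode(random_policy))  # 0.0 or 1.0, depending on the random walk
```

A learning agent would use the rewards it observes to move from something like `random_policy` towards something like `go_right`.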

First, the "ReinforcementLearning" module is imported, which also imports the packages "numpy" as "np" and "matplotlib.pyplot" as "plt". Matplotlib is then set to the interactive "notebook" mode:

In [1]:
from ReinforcementLearning import *
%matplotlib notebook

A FrozenLake object is constructed. Passing is_slippery=False makes the environment deterministic: the same action in the same state always leads to the same next state.

In [2]:
env = FrozenLake.make(is_slippery=False)
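
With is_slippery=False, the next state depends only on the current state and the action. Below is a minimal sketch of such a deterministic transition on the 4x4 map. It assumes that states are numbered row by row (state = row * 4 + col, as the step from state 0 to state 4 below suggests), that actions follow Gym's FrozenLake encoding (0 = Left, 1 = Down, 2 = Right, 3 = Up), and that moving into an edge leaves the agent in place. `MAP`, `MOVES` and `move` are hypothetical names, not part of the ReinforcementLearning module.

```python
# The 4x4 map: S = start, F = frozen, H = hole, G = goal.
MAP = ["SFFF",
       "FHFH",
       "FFFH",
       "HFFG"]
N_ROWS, N_COLS = len(MAP), len(MAP[0])

# Assumed action encoding (as in Gym's FrozenLake): 0=Left, 1=Down, 2=Right, 3=Up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

def move(state, action):
    """Deterministic transition: the same (state, action) always gives the same next state."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    # Clamp to the grid so that moving into an edge leaves the agent where it is.
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    return row * N_COLS + col

print(move(0, 1))  # 4: Down from the start
print(move(0, 0))  # 0: Left from the start bumps into the edge
```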

The agent is an instance of class "Agent". The FrozenLake environment is passed to its constructor:

In [3]:
agent = Agent(env)  # env is stored in attribute agent.env

The agent is moved in the environment by calling its "step" method with an action. Here action 1 moves the agent down:

In [4]:
env.reset()
percept = agent.step(1)
env.render()

  (Down)
SFFF
FHFH
FFFH
HFFG

In the notebook, the rendered map highlights the agent's current cell (second row, first column) in red.

The "step" method returns a "Percept" object, which contains the state, the action, the next state, the reward, and the "done" flag as attributes:

In [5]:
print("state: " + str(percept.state))
print("action: " + str(percept.action))
print("next state: " + str(percept.next_state))
print("reward: " + str(percept.reward))
print("done? " + str(percept.done))

state: 0
action: 1
next state: 4
reward: 0.0
done? False
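
The relation between the agent, the environment and the Percept object can be sketched as follows. This is a hypothetical, simplified stand-in, not the actual ReinforcementLearning code, and the real "step" method may differ. It assumes the environment exposes its current state as `env.state` and that its `step(action)` returns `(next_state, reward, done)`. `CorridorEnv` is a made-up environment used only for this example.

```python
from dataclasses import dataclass

@dataclass
class Percept:
    """One experience tuple, with the attributes printed above."""
    state: int
    action: int
    next_state: int
    reward: float
    done: bool

class CorridorEnv:
    """Hypothetical 1x4 corridor; state 3 is the goal. Actions: 0=Left, 2=Right."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        delta = {0: -1, 2: 1}.get(action, 0)  # other actions leave the state unchanged
        self.state = min(max(self.state + delta, 0), 3)
        reward = 1.0 if self.state == 3 else 0.0
        return self.state, reward, self.state == 3

class Agent:
    def __init__(self, env):
        self.env = env  # the environment is stored, as in agent.env

    def step(self, action):
        """Act in the environment and wrap the transition in a Percept."""
        state = self.env.state
        next_state, reward, done = self.env.step(action)
        return Percept(state, action, next_state, reward, done)

env = CorridorEnv()
agent = Agent(env)
env.reset()
print(agent.step(2))
# Percept(state=0, action=2, next_state=1, reward=0.0, done=False)
```

Returning a single Percept keeps the whole transition together, which is convenient when the agent later learns from stored experience.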


In the code below, the agent is moved step by step to the goal G, and each position is plotted:

In [6]:
from time import sleep

env.reset()
env.plot(show_state=True)
sleep(1)  # wait one second

actions = [1, 1, 2, 2, 1, 2]  # Down, Down, Right, Right, Down, Right: a path to G
for action in actions:
    agent.step(action)
    env.plot(show_state=True, update=True)  # redraw the existing plot
    sleep(1)  # wait one second

The "plot" method accepts an optional argument "update", which is False by default. If it is True, "plot" redraws the current plot instead of creating a new one. The "sleep" function from the standard "time" module pauses execution for one second after each step, so that the movement can be followed.
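
Because the environment is deterministic, the effect of the action list can also be checked without running it. The self-contained sketch below replays the actions on the 4x4 map. It assumes states are numbered row by row, that 1 = Down and 2 = Right (the rendered "(Down)" step and the path to G support this), that 0 = Left and 3 = Up as in Gym's FrozenLake, and that the only reward is 1 for reaching G. `replay` is a hypothetical helper, not part of the ReinforcementLearning module.

```python
# Hypothetical replay of an action list on the 4x4 map (not the module's code).
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N = len(MAP)  # the map is N x N
# Assumed encoding: 0=Left, 1=Down, 2=Right, 3=Up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

def replay(actions, state=0):
    """Return the visited states and the cumulative reward (1 for reaching G)."""
    visited, total_reward = [state], 0.0
    for action in actions:
        row, col = divmod(state, N)
        d_row, d_col = MOVES[action]
        row = min(max(row + d_row, 0), N - 1)  # edges act as walls
        col = min(max(col + d_col, 0), N - 1)
        state = row * N + col
        visited.append(state)
        cell = MAP[row][col]
        if cell == "G":
            total_reward += 1.0  # assumed reward scheme: 1 at the goal, 0 elsewhere
        if cell in "HG":         # a hole or the goal ends the episode
            break
    return visited, total_reward

print(replay([1, 1, 2, 2, 1, 2]))  # ([0, 4, 8, 9, 10, 14, 15], 1.0)
```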

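The environment can also be stochastic, for example when FrozenLake is created with is_slippery=True. In that case the agent does not always move in the intended direction. The "Agent" class also has other methods, which will be discussed later.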
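
In a stochastic environment the same action can lead to different next states. The sketch below illustrates one slippery step under the dynamics of Gym's FrozenLake, where the intended direction and the two perpendicular directions are each taken with probability 1/3. Whether the FrozenLake class used here behaves exactly this way is an assumption. The action encoding (0 = Left, 1 = Down, 2 = Right, 3 = Up) and the names `move` and `slippery_move` are also assumptions made for illustration.

```python
import random

N = 4  # the map is 4x4
# Assumed action encoding: 0=Left, 1=Down, 2=Right, 3=Up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

def move(state, action):
    """Deterministic move on the grid; the edges act as walls."""
    row, col = divmod(state, N)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), N - 1)
    col = min(max(col + d_col, 0), N - 1)
    return row * N + col

def slippery_move(state, action, rng=random):
    """Move in the intended or one of the two perpendicular directions, each with probability 1/3."""
    # With this encoding, (action - 1) % 4 and (action + 1) % 4 are perpendicular to action.
    actual = rng.choice([(action - 1) % 4, action, (action + 1) % 4])
    return move(state, actual)

rng = random.Random(0)
outcomes = {slippery_move(0, 1, rng) for _ in range(1000)}
print(sorted(outcomes))  # [0, 1, 4]: Left hits the edge, Right gives 1, Down gives 4
```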