## Agent MDP interaction

The $\texttt{MDPLoop}$ class handles the interaction between an agent and an MDP. It can be used to display the cumulative regret and the cumulative reward of an agent implemented in $\texttt{Colosseum}$ alongside those of an agent that chooses actions at random, which serves as a simple baseline.
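
To make the two quantities concrete, here is a minimal, self-contained sketch of such an interaction loop. It does not use $\texttt{Colosseum}$. The one-state MDP with Bernoulli rewards (a multi-armed bandit), the `RandomAgent` and `EpsilonGreedyAgent` classes, and `run_loop` are all hypothetical. The regret computed here is the expected (pseudo-)regret, i.e. the gap between the best action's mean reward and the chosen action's mean reward, summed over time steps. This is one common definition and not necessarily the one $\texttt{MDPLoop}$ uses.

```python
import random
from typing import List, Protocol, Sequence, Tuple


class Agent(Protocol):
    def select_action(self) -> int: ...
    def update(self, action: int, reward: float) -> None: ...


class RandomAgent:
    """Baseline: picks an action uniformly at random and ignores all feedback."""

    def __init__(self, n_actions: int, rng: random.Random) -> None:
        self.n_actions = n_actions
        self.rng = rng

    def select_action(self) -> int:
        return self.rng.randrange(self.n_actions)

    def update(self, action: int, reward: float) -> None:
        pass  # a random agent does not learn


class EpsilonGreedyAgent:
    """Hypothetical learner: tries each action once, then exploits the best
    empirical mean, exploring uniformly at random with probability epsilon."""

    def __init__(self, n_actions: int, rng: random.Random, epsilon: float = 0.1) -> None:
        self.counts = [0] * n_actions
        self.sums = [0.0] * n_actions
        self.rng = rng
        self.epsilon = epsilon

    def select_action(self) -> int:
        for a, c in enumerate(self.counts):
            if c == 0:
                return a  # try every action at least once
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        means = [s / c for s, c in zip(self.sums, self.counts)]
        return max(range(len(means)), key=means.__getitem__)

    def update(self, action: int, reward: float) -> None:
        self.counts[action] += 1
        self.sums[action] += reward


def run_loop(agent: Agent, means: Sequence[float], n_steps: int,
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Run n_steps on a one-state MDP where action a pays a Bernoulli(means[a]) reward.

    Returns the cumulative reward and the cumulative expected (pseudo-)regret."""
    best = max(means)
    total_reward = total_regret = 0.0
    cum_reward: List[float] = []
    cum_regret: List[float] = []
    for _ in range(n_steps):
        a = agent.select_action()
        r = 1.0 if rng.random() < means[a] else 0.0
        agent.update(a, r)
        total_reward += r
        total_regret += best - means[a]  # gap to the optimal action's mean reward
        cum_reward.append(total_reward)
        cum_regret.append(total_regret)
    return cum_reward, cum_regret


means = [0.2, 0.5, 0.8]
n_steps = 2_000
_, agent_regret = run_loop(EpsilonGreedyAgent(3, random.Random(0)), means, n_steps, random.Random(1))
_, random_regret = run_loop(RandomAgent(3, random.Random(0)), means, n_steps, random.Random(2))
print(f"agent: {agent_regret[-1]:.1f}  random baseline: {random_regret[-1]:.1f}")
```

With these arbitrary settings, the random baseline pays on average $0.8 - 0.5 = 0.3$ regret per step, so its cumulative regret grows linearly, while the learner's grows much more slowly because, once it has identified the best action, it only pays the gap on its exploratory steps.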

### $\texttt{PSRL}$ agent on a $\texttt{FrozenLake}$ MDP instance

```python
import matplotlib.pyplot as plt

from colosseum.agents.bayes_tools.conjugate_rewards import RewardsConjugateModel
from colosseum.agents.bayes_tools.conjugate_transitions import TransitionsConjugateModel
from colosseum.agents.episodic.psrl import PSRLEpisodic
from colosseum.experiments.experiment import MDPLoop
from colosseum.mdps.frozen_lake import FrozenLakeEpisodic
from colosseum.utils.acme.in_memory_logger import InMemoryLogger
from colosseum.utils.acme.specs import make_environment_spec
```

```python
# Total number of agent/MDP interaction time steps.
T = 20_000
```

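```python
# Episodic FrozenLake MDP on a 4x4 grid.
mdp = FrozenLakeEpisodic(seed=42, size=4, p_frozen=0.8)

# PSRL agent with conjugate priors: N_NIG (Normal rewards, Normal-Inverse-Gamma prior)
# and M_DIR (multinomial transitions, Dirichlet prior).
agent = PSRLEpisodic(
    environment_spec=make_environment_spec(mdp),
    seed=42,
    H=mdp.H,  # episode horizon of the MDP
    r_max=mdp.r_max,  # maximum reward of the MDP
    T=T,
    reward_prior_model=RewardsConjugateModel.N_NIG,
    transitions_prior_model=TransitionsConjugateModel.M_DIR,
    rewards_prior_prms=[0.33, 1, 1, 1],
    transitions_prior_prms=[0.017],
)
```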
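
$\texttt{PSRL}$ (posterior sampling for reinforcement learning) keeps a posterior distribution over the MDP. At the start of each episode it draws one MDP from that posterior, computes the optimal policy of the sampled MDP by backward induction over the horizon $H$, and follows that policy until the episode ends. The NumPy sketch below shows one such sampling and planning step for a tabular MDP. It is not $\texttt{Colosseum}$'s implementation: `sample_mdp` and `backward_induction` are hypothetical helpers, and reading `transitions_prior_prms=[0.017]` as a symmetric Dirichlet concentration and `rewards_prior_prms=[0.33, 1, 1, 1]` as Normal-Inverse-Gamma parameters $(\mu_0, \lambda_0, \alpha_0, \beta_0)$ is an assumption based on the prior names.

```python
import numpy as np


def sample_mdp(trans_counts, rew_n, rew_sum, rew_sumsq, rng,
               dir_alpha=0.017, nig=(0.33, 1.0, 1.0, 1.0)):
    """Draw one tabular MDP (P, R) from the posterior (hypothetical helper).

    trans_counts: (S, A, S) observed transition counts.
    rew_n, rew_sum, rew_sumsq: (S, A) number, sum and sum of squares of observed rewards.
    dir_alpha: symmetric Dirichlet prior concentration (assumed meaning of [0.017]).
    nig: Normal-Inverse-Gamma prior (mu0, lambda0, alpha0, beta0) (assumed meaning of [0.33, 1, 1, 1]).
    """
    trans_counts = np.asarray(trans_counts, dtype=float)
    n = np.asarray(rew_n, dtype=float)
    rew_sum = np.asarray(rew_sum, dtype=float)
    rew_sumsq = np.asarray(rew_sumsq, dtype=float)
    S, A, _ = trans_counts.shape

    # Transitions: Dirichlet posterior = prior concentration + observed counts.
    P = np.empty((S, A, S))
    for s in range(S):
        for a in range(A):
            P[s, a] = rng.dirichlet(trans_counts[s, a] + dir_alpha)

    # Rewards: standard Normal-Inverse-Gamma conjugate update for each (s, a).
    mu0, lam0, alpha0, beta0 = nig
    mean = np.divide(rew_sum, n, out=np.zeros_like(rew_sum), where=n > 0)
    ss = np.maximum(rew_sumsq - n * mean**2, 0.0)  # sum of squared deviations
    lam_n = lam0 + n
    mu_n = (lam0 * mu0 + n * mean) / lam_n
    alpha_n = alpha0 + n / 2
    beta_n = beta0 + 0.5 * ss + lam0 * n * (mean - mu0) ** 2 / (2 * lam_n)
    # sigma^2 ~ InvGamma(alpha_n, beta_n), then mean reward ~ N(mu_n, sigma^2 / lam_n).
    sigma2 = 1.0 / rng.gamma(alpha_n, 1.0 / beta_n)
    R = rng.normal(mu_n, np.sqrt(sigma2 / lam_n))
    return P, R


def backward_induction(P, R, H):
    """Optimal non-stationary policy and initial value of a finite-horizon MDP."""
    S, A = R.shape
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V  # (S, A): mean reward plus expected value-to-go
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V


# One PSRL planning step with no data yet: sample an MDP from the prior, then plan.
rng = np.random.default_rng(0)
S, A, H = 4, 2, 5
P, R = sample_mdp(np.zeros((S, A, S)), np.zeros((S, A)), np.zeros((S, A)),
                  np.zeros((S, A)), rng)
policy, V = backward_induction(P, R, H)
print(policy[0], V)
```

After each episode, the observed transitions and rewards would be added to the counts before the next draw. Sampling once per episode, rather than at every step, lets the agent commit to a consistent exploration strategy for the whole horizon.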

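```python
# InMemoryLogger keeps the logs in memory; use CSVLogger() instead to save them in CSV files.
loop = MDPLoop(mdp, agent, logger=InMemoryLogger())
# Run the interaction for T time steps; log_every sets how often metrics are logged.
loop.run(T=T, verbose=True, log_every=10)

# Compare the PSRL agent with the random baseline: cumulative regret,
# normalized cumulative regret and cumulative return.
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(3 * 6, 6))
loop.plot(["cumulative_regret", "random_cumulative_regret"], ax1)
loop.plot(["normalized_cumulative_regret", "random_normalized_cumulative_regret"], ax2)
loop.plot(["cumulative_return", "random_cumulative_return"], ax3)
plt.tight_layout()
plt.show()
```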