## Hardness analysis

$\texttt{Colosseum}$ makes it possible to empirically investigate the measures of
hardness in the four scenarios (see the paper for additional details) automatically
for the MDP families that are implemented in the package.
Note, however, that computing the measures of hardness for custom MDPs is also
straightforward thanks to the $\texttt{Custom}$ class.

In the following, we show how to reproduce the empirical investigation for
the $\texttt{RiverSwim}$ MDP family and how to calculate the measures of hardness for a $\texttt{MiniGridEmpty}$ MDP and a custom MDP.

In [None]:
import numpy as np
import seaborn as sns
from scipy.stats import beta

from colosseum.experiments.increasing_size import moving_sizes
from colosseum.experiments.increasing_plazy import moving_plazy
from colosseum.experiments.increasing_prand import moving_prand
from colosseum.mdps.custom import CustomEpisodic
from colosseum.mdps.minigrid_empty import MiniGridEmptyContinuous

from colosseum.mdps.river_swim import RiverSwimEpisodic

sns.set_style()

**Scenario 1**. We vary the probability $\texttt{p_rand}$ that an MDP executes a random action instead of the action selected by an agent.

In [None]:
moving_prand(
    mdp_class=RiverSwimEpisodic,
    prandom=np.linspace(0.0001, 0.6, 10),
    mdp_kwargs=dict(size=10),
    # We investigate RiverSwim with different values of the p_rand parameter and
    # a fixed chain length of ten.
    n_seeds=1,
    # We do not need more than one seed for this MDP family since the MDP is a
    # deterministic function of the parameters. Note that this is not the case
    # for other families such as the MiniGrid ones or FrozenLake.
    approximate_regret=False,
    # We don't compute the cumulative regret of the tuned near-optimal agent in
    # this tutorial for brevity's sake.
    save_folder=None,
    # We don't need to save the results of the analysis.
);
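To make the role of $\texttt{p_rand}$ concrete, here is a minimal sketch of one common way to model it: with probability `p_rand` the selected action is replaced by an action drawn uniformly at random. The helper name `apply_p_rand` and the toy kernel are hypothetical, and this is an assumption about the mechanism rather than a description of how $\texttt{Colosseum}$ implements it internally.

```python
import numpy as np


def apply_p_rand(T, p_rand):
    """Mix every action's transition row with that of a uniformly random action.

    T has shape (num_states, num_actions, num_states).
    Hypothetical helper for illustration only; not part of Colosseum.
    """
    # Transition distribution induced by picking an action uniformly at random.
    T_uniform = T.mean(axis=1, keepdims=True)
    return (1 - p_rand) * T + p_rand * T_uniform


# Toy kernel with 2 states and 2 actions: action 0 stays, action 1 switches.
T = np.array(
    [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]],
    ]
)
T_rand = apply_p_rand(T, p_rand=0.2)
print(T_rand[0, 0])  # distribution over next states for state 0, action 0
```

Each row of the result is still a probability distribution, but the agent's chosen action now has a weaker influence on the next state.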

**Scenario 2**. We vary the probability $\texttt{p_lazy}$ that an MDP stays in the same state instead of executing the action selected by an agent.

In [None]:
moving_plazy(
    mdp_class=RiverSwimEpisodic,
    prandom=np.linspace(0.0001, 0.6, 10),
    mdp_kwargs=dict(size=10),
    # We investigate RiverSwim with different values of the p_lazy parameter and
    # a fixed chain length of ten.
    n_seeds=1,
    # We do not need more than one seed for this MDP family since the MDP is a
    # deterministic function of the parameters. Note that this is not the case
    # for other families such as the MiniGrid ones or FrozenLake.
    approximate_regret=False,
    # We don't compute the cumulative regret of the tuned near-optimal agent in
    # this tutorial for brevity's sake.
    save_folder=None,
    # We don't need to save the results of the analysis.
);
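The effect of $\texttt{p_lazy}$ can be sketched in a similar way: with probability `p_lazy` the MDP ignores the selected action and remains in the current state. The helper name `apply_p_lazy` and the toy kernel are hypothetical, and the sketch is an assumption about the mechanism, not $\texttt{Colosseum}$'s actual implementation.

```python
import numpy as np


def apply_p_lazy(T, p_lazy):
    """Mix every transition row with a self-loop on the current state.

    T has shape (num_states, num_actions, num_states).
    Hypothetical helper for illustration only; not part of Colosseum.
    """
    num_states = T.shape[0]
    # stay[s, 0, s'] is 1 when s' == s; it broadcasts over the action axis.
    stay = np.eye(num_states)[:, None, :]
    return (1 - p_lazy) * T + p_lazy * stay


# Toy kernel with 2 states and 2 actions: action 0 stays, action 1 switches.
T = np.array(
    [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]],
    ]
)
T_lazy = apply_p_lazy(T, p_lazy=0.3)
print(T_lazy[0, 1])  # the switching action now fails 30% of the time
```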

**Scenario 3**. We vary the number of states across MDPs from the same family.

In [None]:
moving_sizes(
    mdp_class=RiverSwimEpisodic,
    sizes=np.linspace(5, 20, 6).astype(int),
    p_rand=None,
    # We investigate RiverSwim with different chain lengths and p_rand left
    # unset (None).
    n_seeds=1,
    # We do not need more than one seed for this MDP family since the MDP is a
    # deterministic function of the parameters. Note that this is not the case
    # for other families such as the MiniGrid ones or FrozenLake.
    approximate_regret=False,
    # We don't compute the cumulative regret of the tuned near-optimal agent in
    # this tutorial for brevity's sake.
    save_folder=None
    # We don't need to save the results of the analysis.
);
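As a quick check of which chain lengths the `sizes` argument covers, the grid used in the call above can be inspected directly with NumPy. Nothing here depends on $\texttt{Colosseum}$.

```python
import numpy as np

# Six evenly spaced values between 5 and 20, cast to integers.
sizes = np.linspace(5, 20, 6).astype(int)
print(sizes.tolist())  # [5, 8, 11, 14, 17, 20]
```

Because `astype(int)` truncates, endpoints and counts that do not yield whole numbers would produce unevenly spaced sizes.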

**Scenario 4**. We vary the number of states across MDPs from the same family with $\texttt{p_rand}=0.1$.

In [None]:
moving_sizes(
    mdp_class=RiverSwimEpisodic,
    sizes=np.linspace(5, 20, 6).astype(int),
    p_rand=0.1,
    # We investigate RiverSwim with different chain lengths with fixed p_rand.
    n_seeds=1,
    # We do not need more than one seed for this MDP family since the MDP is a
    # deterministic function of the parameters. Note that this is not the case
    # for other families such as the MiniGrid ones or FrozenLake.
    approximate_regret=False,
    # We don't compute the cumulative regret of the tuned near-optimal agent in
    # this tutorial for brevity's sake.
    save_folder=None
    # We don't need to save the results of the analysis.
);

### $\texttt{MiniGridEmpty}$ MDP hardness

In [None]:
mdp = MiniGridEmptyContinuous(seed=0, size=5, lazy=0.1, random_action_p=None)
print(mdp.measures_of_hardness)
print(mdp.communication_type)

### $\texttt{Custom}$ MDP hardness

In [None]:
num_states = 4
num_actions = 2
T = [
    [[0.0, 1.00, 0.00, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[0.0, 0.00, 0.50, 0.5], [0.0, 0.8, 0.1, 0.1]],
    [[0.0, 0.50, 0.00, 0.5], [0.0, 0.1, 0.8, 0.1]],
    [[0.5, 0.25, 0.25, 0.0], [0.1, 0.1, 0.1, 0.7]],
]
np.random.seed(42)
R = {
    (s, a): beta(np.random.uniform(0, 30), np.random.uniform(0, 30))
    for s in range(num_states)
    for a in range(num_actions)
}
# R = np.random.randn(num_states, num_actions)  (FOR DETERMINISTIC REWARDS)
T_0 = {0: 1.0}
mdp = CustomEpisodic(
    seed=42,
    T_0=T_0,
    T=np.array(T),
    R=R,
    lazy=None,
    random_action_p=None,
    force_single_thread=True,
)

print(mdp.measures_of_hardness)
print(mdp.communication_type)
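When writing a custom transition tensor by hand, it can help to check it before building the MDP. The sketch below assumes the layout used in the example above, namely `T[s][a][s']` with shape `(num_states, num_actions, num_states)`. The helper `check_transition_tensor` is hypothetical and not part of $\texttt{Colosseum}$.

```python
import numpy as np


def check_transition_tensor(T, atol=1e-8):
    """Validate a (num_states, num_actions, num_states) transition tensor.

    Returns (num_states, num_actions) or raises ValueError.
    Hypothetical helper for illustration only.
    """
    T = np.asarray(T, dtype=float)
    if T.ndim != 3 or T.shape[0] != T.shape[2]:
        raise ValueError(f"expected shape (S, A, S), got {T.shape}")
    if (T < 0).any():
        raise ValueError("transition probabilities must be non-negative")
    row_sums = T.sum(axis=-1)
    bad = np.argwhere(~np.isclose(row_sums, 1.0, atol=atol))
    if len(bad) > 0:
        raise ValueError(f"rows not summing to one at (state, action): {bad.tolist()}")
    return T.shape[0], T.shape[1]


# The transition tensor from the custom MDP example.
T = [
    [[0.0, 1.00, 0.00, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[0.0, 0.00, 0.50, 0.5], [0.0, 0.8, 0.1, 0.1]],
    [[0.0, 0.50, 0.00, 0.5], [0.0, 0.1, 0.8, 0.1]],
    [[0.5, 0.25, 0.25, 0.0], [0.1, 0.1, 0.1, 0.7]],
]
num_states, num_actions = check_transition_tensor(T)
print(num_states, num_actions)
```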