# Frozen Lake ⛸️🏞️

![](https://gymnasium.farama.org/_images/frozen_lake.gif)

This training environment involves crossing a frozen lake without falling into the holes in the ice. Because the ice is slippery, the agent does not always move in the direction it chooses, so it must learn a policy that accounts for this.
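
To make the slippery dynamics concrete, here is a minimal standalone sketch that does not use the environment itself. It assumes the default behaviour described in the Gymnasium documentation for `is_slippery=True`: the intended direction and the two perpendicular directions are each taken with probability 1/3. The helper name `slippery_action` is hypothetical.

```python
import random

# Action encoding used by FrozenLake: 0=left, 1=down, 2=right, 3=up.
ACTIONS = {0: "left", 1: "down", 2: "right", 3: "up"}


def slippery_action(intended: int, rng: random.Random) -> int:
    """Return the action actually executed on slippery ice.

    ASSUMPTION: the intended direction and the two perpendicular
    directions are equally likely (probability 1/3 each).
    """
    return rng.choice([(intended - 1) % 4, intended, (intended + 1) % 4])


rng = random.Random(0)
n_samples = 30_000
counts = {action: 0 for action in ACTIONS}
for _ in range(n_samples):
    counts[slippery_action(2, rng)] += 1  # always *try* to go right

# Print the observed share of each executed direction.
for action, count in counts.items():
    print(f"{ACTIONS[action]:>5}: {count / n_samples:.2%}")
```

When the agent tries to go right, it never moves left, but it ends up going up or down about two thirds of the time. This is why a greedy "shortest path" policy fails on this map.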

In [1]:
import gymnasium as gym
import pandas as pd
import plotly.graph_objects as go

from lib.QLearning import QLearning
from lib.Sarsa import Sarsa

## Environments

In [2]:
env_4x4 = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=True).env
env_8x8 = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True).env

## Parameters and Hyperparameters

In [3]:
n_of_episodes = 30_000
rolling_avg_window = n_of_episodes // 10

Two sets of hyperparameters were chosen for training. `hyperparameters_1` relies more on the Q-table, with little randomization (a constant `epsilon` of 0.1) and smaller updates per step (`alpha` of 0.1). `hyperparameters_2` starts with much more random exploration and shifts towards the Q-table as the episodes go by (`epsilon` decays from 5.0 down to 0.01), and it gives more weight to each new update (`alpha` of 0.25).

In [4]:
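hyperparameters_1 = {
    "env": env_4x4,
    "alpha": 0.1,  # learning rate
    "gamma": 0.99,  # discount factor
    "epsilon": 0.1,  # initial exploration rate
    "epsilon_min": 0.1,  # lower bound for epsilon (equal to the start value)
    "epsilon_dec": 1,  # decay factor; epsilon stays constant here
    "episodes": n_of_episodes,
}

hyperparameters_2 = {
    "env": env_4x4,
    "alpha": 0.25,  # learning rate
    "gamma": 0.99,  # discount factor
    "epsilon": 5.0,  # initial exploration rate (starts above 1)
    "epsilon_min": 0.01,  # lower bound for epsilon
    "epsilon_dec": 0.99,  # decay factor
    "episodes": n_of_episodes,
}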
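
The difference between the two configurations above is easier to see by tracing the exploration rate over time. The sketch below is only an illustration: it assumes that `epsilon` is multiplied by `epsilon_dec` once per episode and clipped at `epsilon_min`, which may not match what `lib.QLearning` and `lib.Sarsa` actually do. The helpers `epsilon_schedule` and `first_episode_below` are hypothetical names.

```python
def epsilon_schedule(epsilon, epsilon_min, epsilon_dec, episodes):
    """Yield the exploration rate used in each episode.

    ASSUMPTION: epsilon is multiplied by epsilon_dec once per episode and
    clipped at epsilon_min. The lib classes may decay it differently.
    """
    for _ in range(episodes):
        yield epsilon
        epsilon = max(epsilon_min, epsilon * epsilon_dec)


def first_episode_below(schedule, threshold):
    """Index of the first episode whose epsilon is <= threshold, or None."""
    for episode, eps in enumerate(schedule):
        if eps <= threshold:
            return episode
    return None


config_1 = dict(epsilon=0.1, epsilon_min=0.1, epsilon_dec=1, episodes=30_000)
config_2 = dict(epsilon=5.0, epsilon_min=0.01, epsilon_dec=0.99, episodes=30_000)

# Configuration 1 never changes: a constant 10% of random actions.
constant_values = set(epsilon_schedule(**config_1))
print("distinct epsilon values in configuration 1:", constant_values)

# Configuration 2 starts above 1. Under an epsilon-greedy rule that explores
# when random() < epsilon, every action is random until epsilon drops to 1.
below_one = first_episode_below(epsilon_schedule(**config_2), 1.0)
at_floor = first_episode_below(epsilon_schedule(**config_2), 0.01)
print("first episode with epsilon <= 1:   ", below_one)
print("first episode with epsilon <= 0.01:", at_floor)
```

Under these assumptions, the second configuration spends only a small fraction of the 30,000 episodes exploring heavily and then acts almost greedily, while the first keeps exploring at the same rate throughout.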

## Training

### Q-Learning

In [None]:
q_learning_qtable_1, q_learning_rewards_1, q_learning_actions_1 = QLearning(**hyperparameters_1).train()
q_learning_qtable_2, q_learning_rewards_2, q_learning_actions_2 = QLearning(**hyperparameters_2).train()

### SARSA

In [None]:
sarsa_qtable_1, sarsa_rewards_1, sarsa_actions_1 = Sarsa(**hyperparameters_1).train()
sarsa_qtable_2, sarsa_rewards_2, sarsa_actions_2 = Sarsa(**hyperparameters_2).train()
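
For reference, the two algorithms differ only in the target used to update the Q-table. The sketch below shows the standard textbook update rules on a toy table. It is not the code in `lib.QLearning` or `lib.Sarsa`, whose internals are not shown here, and the function names are hypothetical. Terminal-state handling is left out for brevity.

```python
def q_learning_update(q, state, action, reward, next_state, alpha, gamma):
    """Off-policy target: bootstrap from the best action in next_state."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def sarsa_update(q, state, action, reward, next_state, next_action, alpha, gamma):
    """On-policy target: bootstrap from the action actually chosen next."""
    target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (target - q[state][action])


def fresh_q():
    """Toy Q-table: 2 states x 2 actions; state 1 has a good and a bad action."""
    return [[0.0, 0.0], [1.0, -1.0]]


q_ql, q_sarsa = fresh_q(), fresh_q()

q_learning_update(q_ql, 0, 0, reward=0.0, next_state=1, alpha=0.1, gamma=0.99)
# Suppose exploration picked the bad action (index 1) in the next state.
sarsa_update(q_sarsa, 0, 0, reward=0.0, next_state=1, next_action=1, alpha=0.1, gamma=0.99)

print("Q-Learning:", q_ql[0][0])  # positive: it assumes the best next action
print("Sarsa:     ", q_sarsa[0][0])  # negative: it uses the action really taken
```

Because Sarsa's target includes exploratory actions, it tends to learn values that reflect the behaviour policy, while Q-Learning learns values for the greedy policy regardless of how the agent explored.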

### Performance analysis

In [None]:
q_learning_rewards_1["Rewards_avg"] = q_learning_rewards_1["Rewards"].rolling(window=rolling_avg_window).mean()
q_learning_rewards_2["Rewards_avg"] = q_learning_rewards_2["Rewards"].rolling(window=rolling_avg_window).mean()

In [None]:
sarsa_rewards_1["Rewards_avg"] = sarsa_rewards_1["Rewards"].rolling(window=rolling_avg_window).mean()
sarsa_rewards_2["Rewards_avg"] = sarsa_rewards_2["Rewards"].rolling(window=rolling_avg_window).mean()
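
As a plain-Python illustration of what the rolling average computes, the sketch below mirrors the default behaviour of pandas' `rolling(window).mean()`, where the result is NaN until a full window is available. It assumes the `Rewards` column holds one reward per episode. The function `rolling_mean` is a hypothetical helper, not part of the notebook.

```python
import math


def rolling_mean(values, window):
    """Trailing mean over `window` items; NaN until a full window exists."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else math.nan)
    return out


# FrozenLake gives reward 1 for reaching the goal and 0 otherwise, so the
# rolling mean is the success rate over the last `window` episodes.
rewards = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
smoothed = rolling_mean(rewards, window=4)
print(smoothed)
```

With `rolling_avg_window` set to 3,000, the first 2,999 entries of `Rewards_avg` are NaN, so the smoothed curves only begin once 3,000 episodes have been played.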

### Plotting

In [None]:
fig = go.Figure()

fig.add_trace(
    go.Scatter(
        x=q_learning_rewards_1["Episodes"],
        y=q_learning_rewards_1["Rewards_avg"],
        name="<br>Q-Learning<br>hyperparameters 1<br>",
    )
)

fig.add_trace(
    go.Scatter(
        x=sarsa_rewards_1["Episodes"],
        y=sarsa_rewards_1["Rewards_avg"],
        name="<br>Sarsa<br>hyperparameters 1<br>",
    )
)

fig.add_trace(
    go.Scatter(
        x=q_learning_rewards_2["Episodes"],
        y=q_learning_rewards_2["Rewards_avg"],
        name="<br>Q-Learning<br>hyperparameters 2<br>",
    )
)

fig.add_trace(
    go.Scatter(
        x=sarsa_rewards_2["Episodes"],
        y=sarsa_rewards_2["Rewards_avg"],
        name="<br>Sarsa<br>hyperparameters 2<br>",
    )
)

fig.update_layout(
    title="Q-Learning vs. SARSA with different hyperparameters",
    xaxis_title="Episodes",
    yaxis_title="Rewards (rolling average)",
    template="presentation",
    showlegend=True,
    width=1_200,
    height=700,
)

fig.show()

### Saving outputs

In [None]:
q_learning_qtable_2.save_txt("results/best_q_learning.csv")
sarsa_qtable_2.save_txt("results/best_sarsa.csv")
fig.write_image("results/Q-Learning_x_SARSA-hiperparametros.svg")

## Tests

In [19]:
n_tests = 100
n_tests_2 = 100

In [30]:
best_q_learning = QLearning(**hyperparameters_2).load_txt("results/best_q_learning.csv")

tests_q_learning = []

for _ in range(n_tests):
    test_q_learning = pd.Series(
        [
            best_q_learning.test()
            for _ in range(n_tests_2)
        ]
    )

    counts = test_q_learning.value_counts(normalize=True)
    # .get avoids a KeyError if no episode in this batch succeeded
    hits = counts.get(True, 0.0) * 100

    tests_q_learning.append(hits)

tests_q_learning_describe = pd.Series(tests_q_learning).describe()
tests_q_learning_describe

count    100.000000
mean      77.950000
std        3.952687
min       68.000000
25%       75.000000
50%       78.000000
75%       81.000000
max       91.000000
dtype: float64

In [31]:
best_sarsa = Sarsa(**hyperparameters_2).load_txt("results/best_sarsa.csv")

tests_sarsa = []

for _ in range(n_tests):
    test_sarsa = pd.Series(
        [
            best_sarsa.test()
            for _ in range(n_tests_2)
        ]
    )

    counts = test_sarsa.value_counts(normalize=True)
    # .get avoids a KeyError if no episode in this batch succeeded
    hits = counts.get(True, 0.0) * 100

    tests_sarsa.append(hits)

tests_sarsa_describe = pd.Series(tests_sarsa).describe()
tests_sarsa_describe

count    100.000000
mean      82.860000
std        3.967087
min       72.000000
25%       80.000000
50%       83.000000
75%       85.250000
max       92.000000
dtype: float64

In [34]:
fig = go.Figure()

fig.add_trace(
    go.Histogram(
        x=tests_q_learning,
        xbins=dict(
            start=0,
            end=100,
            size=1,
        ),
        name="<br>Q-Learning<br>",
    )
)

fig.add_trace(
    go.Histogram(
        x=tests_sarsa,
        xbins=dict(
            start=0,
            end=100,
            size=1,
        ),
        name="<br>Sarsa<br>",
    )
)

fig.update_layout(
    title=f"Distribution of success rates over {n_tests} tests",
    xaxis_title=f"% of successes in {n_tests_2} episodes",
    yaxis_title="Number of occurrences",
    template="presentation",
    showlegend=True,
    width=1_200,
    height=700,
)

# Show the plot
fig.show()

In [35]:
fig.write_image("results/Q-Learning_x_SARSA-tests.svg")

## Conclusion

The Q-Learning and Sarsa agents performed similarly during training, with closely matching reward curves. In the test stage, however, across 100 tests of 100 episodes each, Sarsa achieved a higher mean success rate per test than Q-Learning (82.86% versus 77.95%).
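
One rough way to check whether this gap exceeds batch-to-batch noise is to compute Welch's t statistic from the summary statistics printed by `describe()` in the test cells. The sketch below assumes the 100 test batches of each agent are independent. It compares the two specific Q-tables saved in this run, not the algorithms in general.

```python
import math

# Summary statistics copied from the describe() outputs of the test cells.
q_learning = {"n": 100, "mean": 77.95, "std": 3.952687}
sarsa = {"n": 100, "mean": 82.86, "std": 3.967087}


def welch_t(a, b):
    """Welch's t statistic for the difference of two sample means (b - a)."""
    standard_error = math.sqrt(a["std"] ** 2 / a["n"] + b["std"] ** 2 / b["n"])
    return (b["mean"] - a["mean"]) / standard_error


gap = sarsa["mean"] - q_learning["mean"]
t_stat = welch_t(q_learning, sarsa)
print(f"mean gap: {gap:.2f} percentage points, t = {t_stat:.2f}")
```

A t statistic far above 2 indicates that the difference in mean success rate is unlikely to come from sampling noise in the test batches alone.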