# Mastering FrozenLake-v1 with Q-Learning ⛄

<img src='https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/thumbnail.jpg' alt='Unit 2 Thumbnail'>

Welcome to this hands-on tutorial where you'll code your first Reinforcement Learning agent from scratch! We will use the **Q-Learning** algorithm to train an agent to solve the `FrozenLake-v1` environment. By the end, you'll have a trained model, a video of it in action, and you'll even publish it to the Hugging Face Hub.

### 🎯 Project Objectives

- **Understand Q-Learning**: Grasp the theory behind Q-tables, actions, states, and rewards.
- **Implement from Scratch**: Code the core components of the Q-Learning algorithm using Python and NumPy.
- **Use Gymnasium**: Learn to interact with standard RL environments using the Gymnasium library.
- **Train & Evaluate**: Run the training loop and evaluate the agent's performance.
- **Share Your Work**: Push your trained agent to the Hugging Face Hub with a model card and video replay.

## Step 1: Setup and Installations

First, we need to install the necessary libraries. In a cloud environment like Google Colab, we also need to set up a virtual display to render the game and create a video.

In [None]:
!pip install numpy gymnasium pygame imageio tqdm pickle5 huggingface_hub pyvirtualdisplay > /dev/null 2>&1
!sudo apt-get update > /dev/null 2>&1
!sudo apt-get install -y python3-opengl > /dev/null 2>&1
!sudo apt-get install -y ffmpeg xvfb > /dev/null 2>&1

After installation, we start the virtual display. **If you are running this on Google Colab, you might need to restart the runtime after the installations.**

In [None]:
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

## Step 2: Import Libraries and Utilities

Now, let's import the libraries we'll use throughout the notebook. We also import our custom helper functions from `utils.py`.

In [None]:
import numpy as np
import gymnasium as gym
import random
from tqdm.notebook import tqdm

# Import helper functions
# If running locally without utils.py, you would define these functions here
from utils import evaluate_agent, push_to_hub

## Step 3: The Environment - FrozenLake-v1 ❄️

Let's create and understand the `FrozenLake-v1` environment.

👉 **Documentation**: [FrozenLake-v1](https://gymnasium.farama.org/environments/toy_text/frozen_lake/)

The agent's goal is to navigate from the start (S) to the goal (G) on a grid of frozen tiles (F), avoiding holes (H).

- `map_name="4x4"`: A 4x4 grid.
- `is_slippery=False`: The agent's movement is deterministic (it always moves in the intended direction).
- `render_mode="rgb_array"`: Required to capture frames for our video replay.

In [None]:
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False, render_mode="rgb_array")

### Understanding the State and Action Spaces

In [None]:
print("Observation Space (State Space):", env.observation_space)
state_space = env.observation_space.n
print(f"There are {state_space} possible states\n")

print("Action Space:", env.action_space)
action_space = env.action_space.n
print(f"There are {action_space} possible actions")

- **State Space**: `Discrete(16)` means there are 16 states, one for each tile on the 4x4 grid.
- **Action Space**: `Discrete(4)` means there are 4 possible actions:
  - 0: `LEFT`
  - 1: `DOWN`
  - 2: `RIGHT`
  - 3: `UP`
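
Gymnasium encodes each FrozenLake observation as a single integer, `current_row * ncols + current_col`. The sketch below uses plain Python (no Gymnasium import) to decode a state index back into grid coordinates and look up its tile on the default 4x4 map. The names `DEFAULT_MAP_4X4`, `state_to_position`, and `tile_at` are illustrative helpers, not part of the Gymnasium API.

```python
# Default 4x4 layout: S = start, F = frozen, H = hole, G = goal.
DEFAULT_MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]


def state_to_position(state, n_cols=4):
    """Decode a state index into (row, col), given row-major numbering."""
    return divmod(state, n_cols)


def tile_at(state, grid=DEFAULT_MAP_4X4):
    """Return the tile letter at a given state index (illustrative helper)."""
    row, col = state_to_position(state, len(grid[0]))
    return grid[row][col]


for state in (0, 5, 15):
    print(state, state_to_position(state), tile_at(state))
```

State 0 is the start tile in the top-left corner and state 15 is the goal in the bottom-right corner, which is why the agent needs at least six moves to finish an episode successfully.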

## Step 4: Building the Q-Learning Algorithm

<img src='https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit3/Q-learning-2.jpg' alt='Q-Learning Pseudocode' width='800'/>

### 4.1. Initialize the Q-Table
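The Q-Table stores, for each state-action pair, the agent's current estimate of the expected cumulative discounted reward (the Q-value). It has one row per state and one column per action, and we initialize it with all zeros.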

In [None]:
def initialize_q_table(state_space, action_space):
    Qtable = np.zeros((state_space, action_space))
    return Qtable

q_table = initialize_q_table(state_space, action_space)
print("Q-Table Shape:", q_table.shape)

### 4.2. Define the Epsilon-Greedy Policy
This policy decides the agent's action at each step. It balances between **exploitation** (choosing the best-known action) and **exploration** (choosing a random action) to discover new strategies.

In [None]:
def epsilon_greedy_policy(Qtable, state, epsilon):
    if random.uniform(0, 1) > epsilon:
        # Exploit: choose the action with the highest Q-value
        action = np.argmax(Qtable[state][:])
    else:
        # Explore: choose a random action
        action = env.action_space.sample()
    return action
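
One detail worth knowing: `np.argmax` returns the first index when several values tie for the maximum, so on an all-zero Q-table the exploit branch always picks action 0 (`LEFT`). The sketch below is a variant of the policy, not the function used in this notebook, that breaks ties at random. It works on a single Q-table row and takes its own random generator, so it does not depend on the global `env`. The name `epsilon_greedy_random_ties` is hypothetical.

```python
import random


def epsilon_greedy_random_ties(q_row, epsilon, rng=random):
    """Epsilon-greedy over one Q-table row, breaking ties at random."""
    n_actions = len(q_row)
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random
        return rng.randrange(n_actions)
    # Exploit: collect every action that attains the maximum, then pick one
    best = max(q_row)
    candidates = [a for a in range(n_actions) if q_row[a] == best]
    return rng.choice(candidates)


rng = random.Random(0)
untrained_row = [0.0, 0.0, 0.0, 0.0]
picks = {epsilon_greedy_random_ties(untrained_row, 0.0, rng) for _ in range(200)}
print(sorted(picks))  # more than one distinct action, even with epsilon = 0
```

With the notebook's settings this matters little, because epsilon starts at 1.0 and early episodes are almost entirely random anyway. It becomes more relevant if you lower `max_epsilon`.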

## Step 5: Define Hyperparameters and Training Loop

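Hyperparameters are the settings we choose before training, as opposed to the Q-values the agent learns. They control how fast the Q-values change (`learning_rate`), how much future rewards count (`gamma`), and how quickly the agent shifts from exploring to exploiting (the epsilon parameters).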

In [None]:
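# Training parameters
n_training_episodes = 10000  # Total number of training episodes
learning_rate = 0.7          # Step size (alpha) for each Q-value update

# Environment parameters
env_id = "FrozenLake-v1"     # Environment name
max_steps = 99               # Maximum number of steps per episode
gamma = 0.95                 # Discount factor applied to future rewards

# Exploration parameters: epsilon decays exponentially from max_epsilon
# toward min_epsilon at the rate set by decay_rate
max_epsilon = 1.0            # Exploration probability at the start of training
min_epsilon = 0.05           # Floor on the exploration probability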
decay_rate = 0.0005
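
To get a feel for the exploration parameters, it helps to evaluate the decay schedule that the training loop applies at the start of each episode: `min_epsilon + (max_epsilon - min_epsilon) * exp(-decay_rate * episode)`. The sketch below recomputes it with the standard `math` module; the helper name `epsilon_at` is illustrative.

```python
import math


def epsilon_at(episode, min_epsilon=0.05, max_epsilon=1.0, decay_rate=0.0005):
    """Exploration probability used during the given episode."""
    return min_epsilon + (max_epsilon - min_epsilon) * math.exp(-decay_rate * episode)


for episode in (0, 1000, 5000, 9999):
    print(f"episode {episode:>5}: epsilon = {epsilon_at(episode):.3f}")
```

With these values, epsilon is about 0.63 after 1,000 episodes and about 0.13 after 5,000, and it is still slightly above `min_epsilon` when training ends. A larger `decay_rate` moves the agent toward exploitation sooner.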

### 5.1. The Training Function
This function implements the main Q-Learning loop described in the pseudocode above.

In [None]:
def train(n_training_episodes, min_epsilon, max_epsilon, decay_rate, env, max_steps, Qtable):
    for episode in tqdm(range(n_training_episodes)):
        # Decay epsilon so the agent explores less as training progresses
        epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
        state, info = env.reset()
        terminated = False
        truncated = False

        for step in range(max_steps):
            action = epsilon_greedy_policy(Qtable, state, epsilon)
            new_state, reward, terminated, truncated, info = env.step(action)

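            # Q-Table update rule:
            # Q(s,a) <- Q(s,a) + lr * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            # (learning_rate and gamma are the globals defined in Step 5)
            Qtable[state][action] = Qtable[state][action] + learning_rate * (
                reward + gamma * np.max(Qtable[new_state]) - Qtable[state][action]
            )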

            if terminated or truncated:
                break

            state = new_state
    return Qtable
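
To make the update rule concrete, here is a small worked example in plain Python using the hyperparameters from Step 5. It applies the same formula to two transitions on the 4x4 map: stepping `RIGHT` from state 14 into the goal (reward 1), and later stepping `RIGHT` from state 13 into state 14 (reward 0). The helper `q_update` is illustrative and works on single numbers rather than the full table.

```python
def q_update(q_sa, reward, max_q_next, learning_rate=0.7, gamma=0.95):
    """One Q-Learning update for a single state-action value."""
    td_target = reward + gamma * max_q_next
    return q_sa + learning_rate * (td_target - q_sa)


# Entering the goal from state 14: the goal state's Q-values are never
# updated, so they stay at 0 and the target is just the reward.
q_14_right = q_update(q_sa=0.0, reward=1.0, max_q_next=0.0)

# A later visit to state 13: no immediate reward, but the update
# bootstraps from the value already learned for state 14.
q_13_right = q_update(q_sa=0.0, reward=0.0, max_q_next=q_14_right)

print(f"Q[14][RIGHT] = {q_14_right:.4f}")
print(f"Q[13][RIGHT] = {q_13_right:.4f}")
```

This is how reward information propagates backward from the goal: each state's value is learned from the discounted value of the state that follows it, one transition at a time.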

## Step 6: Train the Agent

Now, let's call the `train` function to start training our agent!

In [None]:
trained_q_table = train(n_training_episodes, min_epsilon, max_epsilon, decay_rate, env, max_steps, q_table)

## Step 7: Evaluate the Agent

After training, we evaluate the agent's performance. We use a greedy policy (always the action with the highest Q-value, with no exploration) and report the mean and standard deviation of the total reward per episode.

In [None]:
n_eval_episodes = 100
mean_reward, std_reward = evaluate_agent(env, max_steps, n_eval_episodes, trained_q_table, seed=None)
print(f"Mean Reward = {mean_reward:.2f} +/- {std_reward:.2f}")
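
The `evaluate_agent` helper lives in `utils.py` and is not shown in this notebook. As a rough idea of what a greedy evaluation loop can look like, here is a minimal sketch. It is an assumption about the general approach, not the actual helper: the name `evaluate_greedy_sketch` and its signature are hypothetical, and it uses only the standard library so that it works with any object exposing Gymnasium-style `reset()` and `step()` methods.

```python
import statistics


def evaluate_greedy_sketch(env, q_table, max_steps, n_episodes):
    """Run greedy episodes and return (mean, std) of total episode reward."""
    episode_rewards = []
    for _ in range(n_episodes):
        state, info = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            row = q_table[state]
            # Greedy action: index of the largest Q-value in this state's row
            action = max(range(len(row)), key=lambda a: row[a])
            state, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        episode_rewards.append(total_reward)
    # Population standard deviation, the same convention as NumPy's default
    return statistics.mean(episode_rewards), statistics.pstdev(episode_rewards)
```

On the non-slippery map, a Q-table that encodes a path to the goal should give a mean reward of 1.00 with a standard deviation of 0.00, because every greedy episode follows the same path.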

## Step 8: Publish to Hugging Face Hub 🔥

Finally, let's share our trained model with the community!

1. **Log in to your Hugging Face account.** You'll need a token with `write` permissions.

In [None]:
from huggingface_hub import notebook_login
notebook_login()

2. **Package the model and hyperparameters.**

In [None]:
model = {
    "env_id": env_id,
    "max_steps": max_steps,
    "n_training_episodes": n_training_episodes,
    "n_eval_episodes": n_eval_episodes,
    "eval_seed": [],
    "learning_rate": learning_rate,
    "gamma": gamma,
    "max_epsilon": max_epsilon,
    "min_epsilon": min_epsilon,
    "decay_rate": decay_rate,
    "qtable": trained_q_table
}

3. **Push to the Hub!**
   - Replace `<your-username>` with your Hugging Face username.
   - Create a unique repository name.

In [None]:
username = "<your-username>" # FILL THIS
repo_name = "q-FrozenLake-v1-4x4-noSlippery"
repo_id = f"{username}/{repo_name}"

push_to_hub(repo_id, model, env)
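
If you also want a local copy of the packaged model, one option is to serialize the dictionary with Python's built-in `pickle` module. How `push_to_hub` stores the model is not shown here, so treat this as an independent sketch: `example_model` is a small placeholder for the `model` dictionary, and the file name `q-learning.pkl` is hypothetical.

```python
import os
import pickle
import tempfile

# Placeholder standing in for the notebook's `model` dictionary
example_model = {
    "env_id": "FrozenLake-v1",
    "gamma": 0.95,
    "qtable": [[0.0] * 4 for _ in range(16)],
}

# Write to a temporary directory so the example leaves the working folder untouched
path = os.path.join(tempfile.mkdtemp(), "q-learning.pkl")  # hypothetical file name
with open(path, "wb") as f:
    pickle.dump(example_model, f)

# Load it back and confirm the round trip preserved the contents
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored["env_id"], len(restored["qtable"]))
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.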

## Conclusion and Next Steps

Congratulations! You have successfully built, trained, and published a Q-Learning agent. You can now try:

- **Experimenting with hyperparameters**: Change `learning_rate`, `gamma`, or the `epsilon` decay schedule.
- **Using the slippery version**: Set `is_slippery=True` to make the environment stochastic and more challenging.
- **Trying a larger map**: Use `map_name="8x8"`.

This project forms a solid foundation for tackling more complex problems in reinforcement learning.