# RFP: Maze Solvers

## Project Overview
You are invited to submit a proposal that answers the following question:

### What path will your elf take?

*Please submit your proposal by **2/11/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, use [Gymnasium](https://gymnasium.farama.org/) to set up a [Frozen Lake maze](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) for your project. When you are done with the setup, describe the reward system you plan to use.

*Note, a level 5 maze is at least 10 x 10 cells large and contains at least five lake cells.*
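
The level 5 size and lake-cell requirements can be checked before the map is handed to Gymnasium. The sketch below is one possible check, not part of the assignment; the function name `check_level5_map` is hypothetical, and it assumes the usual Frozen Lake letters (`S` start, `F` frozen, `H` lake/hole, `G` goal).

```python
def check_level5_map(desc):
    """Return (n_rows, n_cols, n_holes) for a Frozen Lake map.

    Raises ValueError if the map is not rectangular, is smaller than
    10 x 10, or has fewer than five lake ('H') cells.
    """
    n_rows = len(desc)
    n_cols = len(desc[0]) if desc else 0
    if any(len(row) != n_cols for row in desc):
        raise ValueError("all rows must have the same length")
    if n_rows < 10 or n_cols < 10:
        raise ValueError(f"map is {n_rows} x {n_cols}; level 5 needs at least 10 x 10")
    n_holes = sum(row.count("H") for row in desc)
    if n_holes < 5:
        raise ValueError(f"map has {n_holes} lake cells; level 5 needs at least 5")
    return n_rows, n_cols, n_holes


# The custom map used later in this notebook
example_map = [
    "SFFFHFFFFF",
    "FFFFFFFFFF",
    "FFFHFFFFFF",
    "FFFFFHFFFF",
    "FFFHFFFFFF",
    "FHHFFFHFFF",
    "FHFFHFHFFF",
    "FFFHFFFFFF",
    "FFFFFFFFFF",
    "FFFFHFFFFG",
]
print(check_level5_map(example_map))
```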

In [3]:
pip install gymnasium[toy-text]

Note: you may need to restart the kernel to use updated packages.
Collecting gymnasium[toy-text]
  Downloading gymnasium-1.1.0-py3-none-any.whl.metadata (9.4 kB)
Collecting farama-notifications>=0.0.1 (from gymnasium[toy-text])
  Downloading Farama_Notifications-0.0.4-py3-none-any.whl.metadata (558 bytes)
Collecting pygame>=2.1.3 (from gymnasium[toy-text])
  Downloading pygame-2.6.1-cp312-cp312-win_amd64.whl.metadata (13 kB)
Downloading Farama_Notifications-0.0.4-py3-none-any.whl (2.5 kB)
Downloading pygame-2.6.1-cp312-cp312-win_amd64.whl (10.6 MB)

In [4]:
import gymnasium as gym

In [5]:
# Make maze
env = gym.make('FrozenLake-v1', render_mode='human')
initial_state, info = env.reset()  # reset() returns (observation, info)

env.render()

# Take a step (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP)
action = 2
new_state, reward, terminated, truncated, info = env.step(action)

env.render()

In [6]:
env.close()

In [7]:
import pygame
print(pygame.__version__)


2.6.1


In [9]:
desc = [
    "SFFFHFFFFF",
    "FFFFFFFFFF",
    "FFFHFFFFFF",
    "FFFFFHFFFF",
    "FFFHFFFFFF",
    "FHHFFFHFFF",
    "FHFFHFHFFF",
    "FFFHFFFFFF",
    "FFFFFFFFFF",
    "FFFFHFFFFG"
]

In [10]:
import gymnasium as gym

# Define your custom map description
desc = [
    "SFFFHFFFFF",
    "FFFFFFFFFF",
    "FFFHFFFFFF",
    "FFFFFHFFFF",
    "FFFHFFFFFF",
    "FHHFFFHFFF",
    "FHFFHFHFFF",
    "FFFHFFFFFF",
    "FFFFFFFFFF",
    "FFFFHFFFFG"
]

# Create the environment using your custom map
env = gym.make('FrozenLake-v1', desc=desc, render_mode='human')

# Reset the environment
initial_state, info = env.reset()  # reset() returns (observation, info)

# Render the initial state
env.render()

# Take a step (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP)
action = 2  # Move RIGHT
new_state, reward, terminated, truncated, info = env.step(action)

# Render the new state after action
env.render()

# Close the environment
env.close()


###### Describe your reward system here.
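
One way to make a reward system concrete is to write it as a mapping from tile type to reward. By default, Frozen Lake gives a reward of 1 for reaching the goal and 0 otherwise. The sketch below shows what a custom scheme might look like; the values in `TILE_REWARDS` and the helper names are hypothetical, chosen only for illustration, and the code assumes states are numbered `row * n_cols + col` as in the Frozen Lake documentation.

```python
# Hypothetical reward values, for illustration only
TILE_REWARDS = {
    "G": 1.0,    # reaching the gift
    "H": -1.0,   # falling into the lake
    "F": -0.01,  # small step cost on frozen tiles
    "S": -0.01,  # stepping back onto the start tile
}


def tile_at(desc, state):
    """Look up the map letter for a state index (row * n_cols + col)."""
    n_cols = len(desc[0])
    row, col = divmod(state, n_cols)
    return desc[row][col]


def shaped_reward(desc, new_state):
    """Reward for arriving in new_state under the hypothetical scheme above."""
    return TILE_REWARDS[tile_at(desc, new_state)]
```

In a training loop, `shaped_reward(desc, new_state)` could replace the `reward` returned by `env.step(action)` when updating the Q-table.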

### 2. Training Your Model
In the cell below, write the code you need to train a Q-Learning model. Display your final Q-table once you are done training your model.

*Note, level 5 work uses only the Python standard library and Pandas to train your Q-Learning model. Level 4 work uses external libraries like Stable-Baselines3.*
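
For reference, the tabular Q-Learning update is Q(s, a) ← Q(s, a) + α · (r + γ · max Q(s', ·) − Q(s, a)). The sketch below applies that update with only the standard library, storing the Q-table as a list of lists. The function name `q_update` and the `terminated` flag are assumptions made for this illustration, not a required interface.

```python
def q_update(q_table, state, action, reward, new_state,
             alpha, gamma, terminated=False):
    """Apply one Q-Learning update in place and return the new Q-value.

    q_table is a list of rows, one row of action values per state.
    When the episode has terminated, no future value is bootstrapped.
    """
    best_next = 0.0 if terminated else max(q_table[new_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Tiny example: 3 states, 4 actions, all values start at zero
q = [[0.0] * 4 for _ in range(3)]
q[1][2] = 0.5  # pretend state 1 already has some learned value
updated = q_update(q, state=0, action=2, reward=0.0, new_state=1,
                   alpha=0.8, gamma=0.95)
print(updated)
```

A list-of-lists table like this can be passed to `pandas.DataFrame` for display once training is finished.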

In [None]:
import gymnasium as gym
import numpy as np
import pandas as pd
import random

# Create the environment with the custom map
desc = [
    "SFFFHFFFFF",
    "FFFFFFFFFF",
    "FFFHFFFFFF",
    "FFFFFHFFFF",
    "FFFHFFFFFF",
    "FHHFFFHFFF",
    "FHFFHFHFFF",
    "FFFHFFFFFF",
    "FFFFFFFFFF",
    "FFFFHFFFFG"
]
# Train without a window: render_mode='human' redraws every step,
# which makes 10,000 training episodes very slow
env = gym.make('FrozenLake-v1', desc=desc)

# Initialize parameters
alpha = 0.8       # Learning rate
gamma = 0.95      # Discount factor
epsilon = 0.1     # Exploration rate
episodes = 10000  # Number of training episodes
max_steps = 100   # Max steps per episode

# Initialize the Q-table (states x actions)
n_actions = env.action_space.n
n_states = env.observation_space.n
Q_table = np.zeros((n_states, n_actions))

# Function to choose an action using epsilon-greedy strategy
def choose_action(state):
    if random.uniform(0, 1) < epsilon:
        return env.action_space.sample()  # Exploration: choose random action
    else:
        return np.argmax(Q_table[state])  # Exploitation: choose best action from Q-table

# Training loop
for episode in range(episodes):
    state, _ = env.reset()  # Reset the environment at the start of each episode
    terminated, truncated = False, False

    for step in range(max_steps):
        action = choose_action(state)
        new_state, reward, terminated, truncated, info = env.step(action)

        # Update the Q-table using the Q-learning formula
        Q_table[state, action] = Q_table[state, action] + alpha * (reward + gamma * np.max(Q_table[new_state]) - Q_table[state, action])

        # Transition to the new state
        state = new_state

        # End the episode if terminated or truncated
        if terminated or truncated:
            break

# Display the final Q-table
Q_table_df = pd.DataFrame(Q_table)
print("Final Q-table after training:")
print(Q_table_df)


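`np.argmax` returns the first index when several values tie, so while a Q-table row is still all zeros the greedy branch always picks action 0 (LEFT). One common remedy is to break ties at random. The sketch below is a standard-library version of that idea; the helper name `argmax_random_ties` is hypothetical.

```python
import random


def argmax_random_ties(values, rng=random):
    """Return the index of the largest value, choosing at random among ties."""
    best = max(values)
    candidates = [i for i, v in enumerate(values) if v == best]
    return rng.choice(candidates)
```

Inside an epsilon-greedy function, `argmax_random_ties(Q_table[state])` could stand in for `np.argmax(Q_table[state])`.
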
### 3. Testing Your Model
In the cell seen below, write the code you need to test your Q-Learning model for **1000 episodes**. It is important to test your model for 1000 episodes so that we are all able to compare our results.

*Note, level 5 testing uses both a success rate and an average steps taken metric to evaluate your model. Level 4 uses one or the other.*
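
Both metrics can be computed from a simple per-episode record. The sketch below assumes each test episode is logged as a `(reached_goal, steps)` pair; that record format and the function name `summarize_episodes` are hypothetical choices for this illustration.

```python
def summarize_episodes(results):
    """Summarize test episodes given as (reached_goal, steps) pairs.

    Returns the success rate, the average steps over all episodes, and
    the average steps over successful episodes (None if there were none).
    """
    n = len(results)
    success_steps = [steps for reached, steps in results if reached]
    return {
        "success_rate": len(success_steps) / n if n else 0.0,
        "avg_steps_all": sum(steps for _, steps in results) / n if n else 0.0,
        "avg_steps_success": (
            sum(success_steps) / len(success_steps) if success_steps else None
        ),
    }
```

Reporting the average over successful episodes separately avoids mixing short failed episodes (an early fall into the lake) with genuine goal-reaching runs.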

In [None]:
import numpy as np

# Initialize testing parameters
test_episodes = 1000
max_steps = 100

# Function to evaluate the trained Q-table with a purely greedy policy
def test_model(env, Q_table, test_episodes, max_steps):
    total_rewards = []
    successful_episodes = 0
    for episode in range(test_episodes):
        state, _ = env.reset()
        total_reward = 0
        terminated, truncated = False, False

        for step in range(max_steps):
            action = np.argmax(Q_table[state])  # Choose the best action based on Q-table
            new_state, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            state = new_state

            if terminated or truncated:
                break

        total_rewards.append(total_reward)
        if total_reward > 0:  # Assuming reaching the goal gives a reward
            successful_episodes += 1
    return total_rewards, successful_episodes

# Run the test
total_rewards, successful_episodes = test_model(env, Q_table, test_episodes, max_steps)

# Calculate and display performance metrics
success_rate = successful_episodes / test_episodes
average_reward = np.mean(total_rewards)

print(f"Success Rate: {success_rate * 100:.2f}%")
print(f"Average Reward per Episode: {average_reward:.2f}")


### 4. Final Answer
In the first cell below, describe the path your elf takes to get to the gift. *Note, a level 5 answer includes a gif of the path your elf takes in order to reach the gift.*

In the second cell below, describe how well your Q-Learning model performed. Make sure that you explicitly name the **learning rate**, the **discount factor**, and the **reward system** that you used when training your final model. *Note, a level 5 description describes the model's performance using two types of quantitative evidence.*
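
To describe the path in words, the greedy policy can be read off the Q-table one move at a time. The sketch below is one way to do that under the assumption that moves are deterministic (for example, a map created with `is_slippery=False`); on a slippery lake the actual path varies between runs. The function name `greedy_path` is hypothetical, and the action numbering follows the notebook (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP).

```python
ACTION_NAMES = {0: "LEFT", 1: "DOWN", 2: "RIGHT", 3: "UP"}
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # (row change, col change)


def greedy_path(q_table, desc, max_steps=100):
    """Follow the best action in each state, assuming deterministic moves.

    Returns the list of action names taken and the final (row, col) cell.
    Stops at a goal ('G'), a lake cell ('H'), or after max_steps moves.
    """
    n_rows, n_cols = len(desc), len(desc[0])
    # Locate the start tile
    row, col = next((r, line.index("S")) for r, line in enumerate(desc) if "S" in line)
    path = []
    for _ in range(max_steps):
        if desc[row][col] in "GH":
            break
        state = row * n_cols + col
        action = max(range(4), key=lambda a: q_table[state][a])
        path.append(ACTION_NAMES[action])
        d_row, d_col = MOVES[action]
        # Moves off the edge of the map leave the elf in place
        row = min(max(row + d_row, 0), n_rows - 1)
        col = min(max(col + d_col, 0), n_cols - 1)
    return path, (row, col)
```

The returned list of moves can be pasted into the written description, and the final cell shows whether the greedy policy actually reaches the gift.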

![example image](https://gymnasium.farama.org/_images/frozen_lake.gif)

#### Takes a specific path based on Q-Learning

#### My code requires a lot of adjustment, so it did not perform well. 