# RFP: Maze Solvers

## Project Overview
You are invited to submit a proposal that answers the following question:

### What path will your elf take?

*Please submit your proposal by **2/11/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, use [Gymnasium](https://gymnasium.farama.org/) to set up a [Frozen Lake maze](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) for your project. When you are done with the setup, describe the reward system you plan to use.

*Note, a level 5 maze is at least 10 x 10 cells large and contains at least five lake cells.*
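Before training, you can check the level 5 size and lake requirements, and also confirm that the goal is reachable at all. Below is a minimal sketch in plain Python. `check_maze` is a hypothetical helper. Its reachability test is a breadth-first search that ignores slipping, so it only asks whether *some* hole-free path from `S` to `G` exists:

```python
from collections import deque

# Copy of the maze defined in the setup cell
maze = [
    "SFFFFHFFHH",
    "FFFFFFFFFF",
    "FFHFFFFFFH",
    "FHFFFHFFFF",
    "FFFFFHHFFF",
    "HFHFFFFFHF",
    "FFFFHFFFHF",
    "FHFFFFFFHF",
    "FFHFFFFFGH",
    "FFHFFHHFFH",
]

def check_maze(maze, min_size=10, min_holes=5):
    """Return (big_enough, enough_holes, goal_reachable) for a FrozenLake-style map."""
    nrow, ncol = len(maze), len(maze[0])
    big_enough = nrow >= min_size and ncol >= min_size
    enough_holes = sum(row.count("H") for row in maze) >= min_holes

    # Breadth-first search from S over non-hole tiles (slipping is ignored)
    start = next((r, c) for r in range(nrow) for c in range(ncol) if maze[r][c] == "S")
    seen = {start}
    queue = deque([start])
    goal_reachable = False
    while queue:
        r, c = queue.popleft()
        if maze[r][c] == "G":
            goal_reachable = True
            break
        for dr, dc in ((0, -1), (1, 0), (0, 1), (-1, 0)):  # Left, Down, Right, Up
            nr, nc = r + dr, c + dc
            if 0 <= nr < nrow and 0 <= nc < ncol and (nr, nc) not in seen and maze[nr][nc] != "H":
                seen.add((nr, nc))
                queue.append((nr, nc))
    return big_enough, enough_holes, goal_reachable

print(check_maze(maze))  # (True, True, True)
```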

In [None]:
!pip install "gymnasium[toy-text]"

In [1]:
import gymnasium as gym
from gymnasium.envs.toy_text.frozen_lake import generate_random_map
import pandas as pd
import random

In [2]:
# Make maze
maze = [
    "SFFFFHFFHH",
    "FFFFFFFFFF",
    "FFHFFFFFFH",
    "FHFFFHFFFF",
    "FFFFFHHFFF",
    "HFHFFFFFHF",
    "FFFFHFFFHF",
    "FHFFFFFFHF",
    "FFHFFFFFGH",
    "FFHFFHHFFH"
]

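# 'ansi' rendering returns the map as text. 'human' opens a pygame window and
# throttles every env.step() to the render frame rate, which makes training very slow.
env = gym.make('FrozenLake-v1', desc=maze, render_mode='ansi')
initial_state, info = env.reset()  # reset() returns (observation, info)

print(env.render())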

num_states = env.observation_space.n
num_actions = env.action_space.n
q_table = [[0 for _ in range(num_actions)] for _ in range(num_states)]

alpha = 0.5
gamma = 0.9
epsilon = 1.0
epsilon_decay = 0.999
min_epsilon = 0.01

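# Custom reward per tile type (replaces the environment's own 0/1 reward)
rewards = {
    'S': 5,   # start tile ("small gift")
    'F': -1,  # frozen tile ("empty space")
    'H': -2,  # hole ("ice"), ends the episode
    'G': 10   # goal ("big gift"), ends the episode
}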

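<h4>Ice (hole, <code>H</code>) - Lose two points, end game<br>
Empty space (frozen tile, <code>F</code>) - Lose one point<br>
Small gift (start tile, <code>S</code>) - Gain five points<br>
Big gift (goal, <code>G</code>) - Gain ten points, end game</h4>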
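FrozenLake numbers its states row by row. State `s` is at row `s // ncol` and column `s % ncol`, and the training loop uses this to find the tile the elf landed on. Below is a minimal sketch of that lookup. It assumes a hole penalty of -2, matching the list above, and `tile_reward` is a hypothetical helper name:

```python
# Copy of the maze and reward table from the setup cell (hole penalty assumed to be -2)
maze = [
    "SFFFFHFFHH",
    "FFFFFFFFFF",
    "FFHFFFFFFH",
    "FHFFFHFFFF",
    "FFFFFHHFFF",
    "HFHFFFFFHF",
    "FFFFHFFFHF",
    "FHFFFFFFHF",
    "FFHFFFFFGH",
    "FFHFFHHFFH",
]
rewards = {'S': 5, 'F': -1, 'H': -2, 'G': 10}

def tile_reward(maze, state, rewards):
    """Return (tile, custom reward) for the cell that a state index points to."""
    ncol = len(maze[0])
    row, col = divmod(state, ncol)  # states are numbered row by row
    tile = maze[row][col]
    return tile, rewards.get(tile, 0)

print(tile_reward(maze, 88, rewards))  # ('G', 10): row 8, column 8
print(tile_reward(maze, 5, rewards))   # ('H', -2): row 0, column 5
```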

### 2. Training Your Model
In the cell below, write the code you need to train a Q-Learning model. Display your final Q-table once you are done training your model.

*Note, level 5 work uses only the Python standard library and Pandas to train your Q-Learning model. Level 4 work uses external libraries like Stable-Baselines3.*
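The update inside the training loop is the tabular Q-learning rule. It is written here in the same form as the code's `new_value` line, with learning rate $\alpha$ (`alpha = 0.5`) and discount factor $\gamma$ (`gamma = 0.9`):

$$
Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left(r + \gamma \max_{a'} Q(s',a')\right)
$$

Consider the first update from a fresh table, where $Q(s,a) = 0$ and $\max_{a'} Q(s',a') = 0$. Stepping onto a frozen tile ($r = -1$) gives $Q(s,a) = 0.5 \cdot (-1) = -0.5$. Stepping onto the goal ($r = 10$) gives $0.5 \cdot 10 = 5$.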

In [None]:
# Train model here.
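ncol = len(maze[0])

for episode in range(1000):
    state = env.reset()[0]
    terminated = truncated = False

    # Stop on a hole/goal (terminated) or when the env's step limit is hit (truncated)
    while not (terminated or truncated):
        # Epsilon-greedy action selection
        if random.uniform(0, 1) < epsilon:
            action = env.action_space.sample()
        else:
            action = max(range(num_actions), key=lambda a: q_table[state][a])

        new_state, reward, terminated, truncated, info = env.step(action)

        # Replace the env's 0/1 reward with the custom reward for the tile we landed on
        row, col = divmod(new_state, ncol)
        tile_type = maze[row][col]
        reward = rewards.get(tile_type, 0)

        if tile_type in ['H', 'G']:
            terminated = True

        # Q-learning update; terminal tiles have no future value to bootstrap from
        old_value = q_table[state][action]
        next_max = 0 if terminated else max(q_table[new_state])
        new_value = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)
        q_table[state][action] = new_value

        state = new_state

    # Decay exploration once per episode, never going below min_epsilon
    epsilon = max(min_epsilon, epsilon * epsilon_decay)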
# Don't forget to display your final Q table!
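The training cell multiplies epsilon by `epsilon_decay` once per episode, so after $n$ episodes $\epsilon = \max(0.01,\ 0.999^n)$. The short calculation below reuses the hyperparameter values from the setup cell. It shows that epsilon is still roughly 0.37 when the 1000 training episodes end, and that it only reaches the 0.01 floor after about 4,600 episodes:

```python
import math

# Hyperparameters copied from the setup cell
epsilon, epsilon_decay, min_epsilon = 1.0, 0.999, 0.01

# Epsilon after 1000 training episodes (one decay per episode)
eps_after_1000 = max(min_epsilon, epsilon * epsilon_decay ** 1000)

# Smallest number of episodes n with epsilon * decay**n <= min_epsilon
episodes_to_floor = math.ceil(math.log(min_epsilon / epsilon) / math.log(epsilon_decay))

print(f"epsilon after 1000 episodes: {eps_after_1000:.3f}")                # 0.368
print(f"episodes until epsilon reaches {min_epsilon}: {episodes_to_floor}")  # 4603
```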

In [None]:
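# Show every state; df.head() would only display the first five rows
df = pd.DataFrame(q_table, columns=["Left", "Down", "Right", "Up"])
df.index.name = "State"
print(df.round(3).to_string())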
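A 100-row table of numbers is hard to read at a glance. One way to see what the model learned is to print the greedy action for each tile as an arrow. Below is a minimal sketch. `policy_grid` is a hypothetical helper that expects the same list-of-lists `q_table` used above (or `df.values.tolist()`). Ties go to the lowest action index (Left), just as they do in the `max(...)` call in the training loop:

```python
ARROWS = ["←", "↓", "→", "↑"]  # FrozenLake action order: 0=Left, 1=Down, 2=Right, 3=Up

def policy_grid(q_table, maze):
    """Return the greedy action for every non-terminal tile as a grid of arrows."""
    ncol = len(maze[0])
    lines = []
    for row, tiles in enumerate(maze):
        cells = []
        for col, tile in enumerate(tiles):
            if tile in "HG":
                cells.append(tile)  # holes and the goal end the episode, so no action
            else:
                q_values = q_table[row * ncol + col]
                best = max(range(len(q_values)), key=q_values.__getitem__)
                cells.append(ARROWS[best])
        lines.append(" ".join(cells))
    return "\n".join(lines)

# Tiny made-up example: go Right from S, then Down into G
demo_q = [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(policy_grid(demo_q, ["SF", "HG"]))
# → ↓
# H G
```

After training, `print(policy_grid(q_table, maze))` shows the learned policy on the real map.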

In [None]:
env.close()

### 3. Testing Your Model
In the cell below, write the code you need to test your Q-Learning model for **1000 episodes**. Testing every model for the same 1000 episodes lets us all compare our results.

*Note, level 5 testing uses both a success rate and an average steps taken metric to evaluate your model. Level 4 uses one or the other.*
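An average taken over all 1000 episodes mixes quick falls into holes with successful runs. Reporting the average over successful episodes as well makes the two level 5 metrics easier to interpret. Below is a minimal sketch. `summarize` is a hypothetical helper, and the sample episodes are made up for illustration only:

```python
import math

def summarize(episodes):
    """episodes: list of (reached_goal, steps) pairs, one per test episode.

    Returns (success rate in %, average steps over all episodes,
    average steps over successful episodes only, or NaN if none succeeded).
    """
    n = len(episodes)
    winning_steps = [steps for reached, steps in episodes if reached]
    success_rate = 100 * len(winning_steps) / n
    avg_steps_all = sum(steps for _, steps in episodes) / n
    avg_steps_success = sum(winning_steps) / len(winning_steps) if winning_steps else math.nan
    return success_rate, avg_steps_all, avg_steps_success

# Made-up sample: two wins (18 and 22 steps), one early fall, one 100-step timeout
print(summarize([(True, 18), (False, 6), (True, 22), (False, 100)]))  # (50.0, 36.5, 20.0)
```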

In [None]:
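# Test model here.
# Use a fresh, non-rendering env so testing runs fast and works even if `env` was closed
test_env = gym.make('FrozenLake-v1', desc=maze)
num_episodes = 1000
successes = 0
total_steps = 0

for episode in range(num_episodes):
    state = test_env.reset()[0]
    terminated = truncated = False
    steps = 0

    while not (terminated or truncated):
        # Greedy action: highest Q-value for the current state (no exploration)
        action = max(range(num_actions), key=lambda a: q_table[state][a])

        state, reward, terminated, truncated, info = test_env.step(action)
        steps += 1

    # The env's own reward (not the custom table) is 1 only when the elf reaches the goal
    if reward == 1:
        successes += 1
    total_steps += steps

test_env.close()

success_rate = (successes / num_episodes) * 100
average_steps = total_steps / num_episodes

print(f"Success rate: {success_rate}%")
print(f"Average steps per episode: {average_steps}")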

### 4. Final Answer
In the first cell below, describe the path your elf takes to get to the gift. *Note, a level 5 answer includes a GIF of the path your elf takes to reach the gift.*

In the second cell below, describe how well your Q-Learning model performed. Make sure you explicitly name the **learning rate**, the **discount factor**, and the **reward system** that you used when training your final model. *Note, a level 5 description supports the model's performance with two types of quantitative evidence.*

![example image](https://gymnasium.farama.org/_images/frozen_lake.gif)
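When describing the path, it can help to follow the greedy policy from the start tile. The sketch below assumes deterministic moves. The default `FrozenLake-v1` is slippery (`is_slippery=True`), so real episodes can drift off this path. `greedy_path` is a hypothetical helper. As in FrozenLake, a move into the border leaves the elf where it is. `max_steps` (assumed to be 100) stops the walk if the policy loops:

```python
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # Left, Down, Right, Up

def greedy_path(q_table, maze, max_steps=100):
    """Follow the greedy action from S, assuming deterministic (non-slippery) moves."""
    nrow, ncol = len(maze), len(maze[0])
    row, col = next((r, c) for r in range(nrow) for c in range(ncol) if maze[r][c] == "S")
    path = [(row, col)]
    for _ in range(max_steps):
        if maze[row][col] in "HG":
            break  # the episode would end here
        q_values = q_table[row * ncol + col]
        action = max(range(len(q_values)), key=q_values.__getitem__)
        dr, dc = MOVES[action]
        # Clamp to the board, mirroring how FrozenLake handles moves into a wall
        row = min(max(row + dr, 0), nrow - 1)
        col = min(max(col + dc, 0), ncol - 1)
        path.append((row, col))
    return path

# Tiny made-up example: Right from S, then Down into G
demo_q = [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(greedy_path(demo_q, ["SF", "HG"]))  # [(0, 0), (0, 1), (1, 1)]
```

On the real map, `greedy_path(q_table, maze)` returns the (row, column) cells visited and ends on `G` if the policy reaches the gift. For the GIF, one option is to create the environment with `render_mode='rgb_array'` and collect the frame that `env.render()` returns after each step. You can then save the frames with an image library such as imageio. That step is not shown here.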

#### Describe the path your elf takes here.

#### Describe how well your Q-Learning model performed here.