# RFP: Maze Solvers

## Project Overview
You are invited to submit a proposal that answers the following question:

### What path will your elf take?

*Please submit your proposal by **2/11/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, use [Gymnasium](https://gymnasium.farama.org/) to set up a [Frozen Lake maze](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) for your project. When you are done with the setup, describe the reward system you plan to use.

*Note: a level 5 maze is at least 10 x 10 cells and contains at least five lake cells.*
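
A quick way to confirm that a map meets the level 5 requirement before handing it to Gymnasium is a check like the sketch below. The helper name `check_maze`, its parameters, and the reading of "lake cells" as the `H` tiles are assumptions made for illustration; they are not part of the assignment or of Gymnasium.

```python
def check_maze(rows, min_size=10, min_lake_cells=5):
    """Return (ok, lake_cells) for a Frozen Lake map given as a list of rows."""
    height = len(rows)
    # Every row must have the same width for the map to be a valid grid
    widths = {len(row) for row in rows}
    rectangular = len(widths) == 1
    width = widths.pop() if rectangular else 0
    # Assumption: "lake cells" are the hole tiles marked "H"
    lake_cells = sum(row.count("H") for row in rows)
    has_start_and_goal = (
        sum(row.count("S") for row in rows) == 1
        and sum(row.count("G") for row in rows) >= 1
    )
    ok = (
        rectangular
        and height >= min_size
        and width >= min_size
        and lake_cells >= min_lake_cells
        and has_start_and_goal
    )
    return ok, lake_cells


maze_rows = [
    "SFFFHFHFFH", "FHFHFHFHFH", "FFFFFFFFHH", "HFHHFFFFHF", "FFFFFHFFHH",
    "HHFFHFFFHF", "FFFHHFHFHF", "HFFFHFHFHH", "FFFFFHFFFH", "HFFFFFGFHF",
]
ok, lake_cells = check_maze(maze_rows)
print(ok, lake_cells)
```

The check only looks at size and tile counts. It does not verify that the goal can actually be reached from the start.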

In [1]:
! pip install gymnasium
! pip install pygame
import random

import gymnasium as gym
import pandas as pd

Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable


In [52]:
# 10 x 10 map: S = start, F = frozen (safe) cell, H = hole (lake cell), G = goal (gift)
maze = ["SFFFHFHFFH", "FHFHFHFHFH", "FFFFFFFFHH", "HFHHFFFFHF", "FFFFFHFFHH", "HHFFHFFFHF", "FFFHHFHFHF", "HFFFHFHFHH", "FFFFFHFFFH", "HFFFFFGFHF"]
maze = [list(row) for row in maze]
env = gym.make('FrozenLake-v1', desc=maze, render_mode='rgb_array', is_slippery=False)
initial_state, info = env.reset()  # reset() returns (observation, info)
env.render()

array([[[180, 200, 230],
        [180, 200, 230],
        [180, 200, 230],
        ...,
        [180, 200, 230],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[180, 200, 230],
        [204, 230, 255],
        [204, 230, 255],
        ...,
        [180, 200, 230],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[180, 200, 230],
        [235, 245, 249],
        [204, 230, 255],
        ...,
        [180, 200, 230],
        [  0,   0,   0],
        [  0,   0,   0]],

       ...,

       [[180, 200, 230],
        [180, 200, 230],
        [180, 200, 230],
        ...,
        [180, 200, 230],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        ...,
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        ...,
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]]

In [60]:
env.close()

#### Describe your reward system here.
- Empty (frozen) cell: -1
- Lake cell (hole): a very large penalty, implemented as -100
- Goal (gift): a very large reward, implemented as +100

### 2. Training Your Model
In the cells below, write the code you need to train a Q-Learning model. Display your final Q-table once you are done training your model.

*Note: level 5 work uses only the Python standard library and pandas to train your Q-Learning model. Level 4 work uses external libraries like Stable-Baselines3.*

In [44]:
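num_states = env.observation_space.n  # one state per maze cell
num_actions = env.action_space.n      # 0 = Left, 1 = Down, 2 = Right, 3 = Up
# Q-table: one list of action values per state, initialized to zero
Q = {state: [0.0] * num_actions for state in range(num_states)}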

In [45]:
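def getReward(state):
    """Return the reward for entering `state`, based on the maze cell type."""
    # States are numbered row by row, so recover (row, col) from the index
    row = state // len(maze[0])
    col = state % len(maze[0])
    cell_type = maze[row][col]
    if cell_type == "G":
        return 100   # goal (gift)
    elif cell_type == "H":
        return -100  # hole (lake cell)
    else:
        return -1    # step cost on start and frozen cells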

In [46]:
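def updateQTable(q, alpha, gamma, current_state, next_state, action):
    """Apply one Q-learning update for the move current_state -> next_state."""
    current_q = q[current_state][action]
    reward = getReward(next_state)
    # Best value available from the next state (stays 0 for terminal cells)
    next_max_q = max(q[next_state])
    new_q = ((1 - alpha) * current_q) + (alpha * (reward + (gamma * next_max_q)))  # Q-learning update, derived from the Bellman equation
    q[current_state][action] = new_q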
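
Written as an equation, the update in `updateQTable` is the standard Q-learning rule, where $s$ is `current_state`, $a$ is `action`, $s'$ is `next_state`, $r$ is the value returned by `getReward(next_state)`, $\alpha$ is the learning rate, and $\gamma$ is the discount factor:

$$
Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') \right)
$$

As a worked example, with $\alpha = 0.4$, $\gamma = 0.6$, and a table that is still all zeros, a step onto a frozen cell ($r = -1$) gives $Q(s, a) \leftarrow 0.6 \cdot 0 + 0.4 \cdot (-1 + 0.6 \cdot 0) = -0.4$.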

In [59]:
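alpha = 0.4  # learning rate
gamma = 0.6  # discount factor
for episode in range(1000):
    state = env.reset()[0]
    terminated = truncated = False
    # Stop on truncation (the episode step limit) as well as on termination
    while not (terminated or truncated):
        action = random.randint(0, 3)  # explore with a uniformly random action
        new_state, reward, terminated, truncated, info = env.step(action)
        # The update uses getReward(new_state), not the reward returned by env.step
        updateQTable(Q, alpha, gamma, state, new_state, action)
        state = new_state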

KeyboardInterrupt: 
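
The training loop above picks every action uniformly at random. A common variation in Q-learning is epsilon-greedy selection, which explores with probability `epsilon` and otherwise takes the best action currently in the Q-table. The sketch below is one possible version using only the standard library; the function name `choose_action` and its parameters are hypothetical and are not part of the notebook's code.

```python
import random


def choose_action(q_row, epsilon, rng=random):
    """Pick an action index from one Q-table row using an epsilon-greedy rule."""
    if rng.random() < epsilon:
        # Explore: any action with equal probability
        return rng.randrange(len(q_row))
    # Exploit: best known action, breaking ties at random
    best = max(q_row)
    return rng.choice([a for a, value in enumerate(q_row) if value == best])


# With epsilon = 0 the choice is always the highest-valued action (index 1 here)
print(choose_action([-1.0, 2.5, 0.0, -3.0], epsilon=0.0))
```

Setting `epsilon=1.0` reproduces the purely random behavior of the loop above, so the two approaches can be compared by changing a single number.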

In [58]:
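env = gym.make('FrozenLake-v1', desc=maze, render_mode='human', is_slippery=False)
state = env.reset()[0]
terminated = truncated = False
total_reward = 0
# Follow the greedy policy; also stop when the environment truncates the episode
# at its step limit, so a policy that loops cannot run forever
while not (terminated or truncated):
    action = max(range(num_actions), key=lambda a: Q[state][a])
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    env.render()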

KeyboardInterrupt: 

In [None]:
df = pd.DataFrame(Q)
df = df.T
df.columns = ["Left", "Down", "Right", "Up"]
df

In [57]:
env.close()
df.to_csv('final_q_values_2.csv', index=False)

### 3. Testing Your Model
In the cell below, write the code you need to test your Q-Learning model for **1000 episodes**. It is important to test your model for 1000 episodes so that we are all able to compare our results.

*Note: level 5 testing uses both a success rate and an average-steps-taken metric to evaluate your model. Level 4 testing uses one or the other.*

In [42]:
# Test model here.
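
One way to compute both metrics is sketched below using only the standard library. It is an illustration under stated assumptions, not the notebook's test code: `step` is a hypothetical stand-in that mimics non-slippery Frozen Lake moves on the map, the start is assumed to be the top-left cell (state 0), episodes are capped at an assumed `max_steps=100`, and average steps are counted over successful episodes only.

```python
def step(maze, state, action):
    """Deterministic move on the grid: 0 = Left, 1 = Down, 2 = Right, 3 = Up."""
    n_rows, n_cols = len(maze), len(maze[0])
    row, col = divmod(state, n_cols)
    if action == 0:
        col = max(col - 1, 0)
    elif action == 1:
        row = min(row + 1, n_rows - 1)
    elif action == 2:
        col = min(col + 1, n_cols - 1)
    elif action == 3:
        row = max(row - 1, 0)
    # The episode ends on the goal (G) or in a hole (H)
    done = maze[row][col] in "GH"
    return row * n_cols + col, done


def evaluate(q, maze, episodes=1000, max_steps=100):
    """Run the greedy policy; return (success_rate, average_steps_per_success)."""
    n_cols = len(maze[0])
    successes, steps_in_successes = 0, 0
    for _ in range(episodes):
        state, done, steps = 0, False, 0  # assumption: start is state 0
        while not done and steps < max_steps:
            action = max(range(len(q[state])), key=lambda a: q[state][a])
            state, done = step(maze, state, action)
            steps += 1
        if done and maze[state // n_cols][state % n_cols] == "G":
            successes += 1
            steps_in_successes += steps
    success_rate = successes / episodes
    average_steps = steps_in_successes / successes if successes else float("nan")
    return success_rate, average_steps


# Tiny 2 x 2 demo map and a hand-written Q-table that goes Right, then Down
tiny_maze = ["SF", "HG"]
tiny_q = {0: [0, 0, 1, 0], 1: [0, 1, 0, 0], 2: [0, 0, 0, 0], 3: [0, 0, 0, 0]}
print(evaluate(tiny_q, tiny_maze, episodes=10))
```

Because the maze is not slippery and the greedy policy is deterministic, every test episode follows the same path, so the success rate will be either 0 or 1. To test against the real environment, the `step` helper would be replaced by `env.step(action)`, stopping when either `terminated` or `truncated` is true.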

### 4. Final Answer
In the first cell below, describe the path your elf takes to get to the gift. *Note: a level 5 answer includes a GIF of the path your elf takes to reach the gift.*

In the second cell below, describe how well your Q-Learning model performed. Make sure that you explicitly name the **learning rate**, the **discount factor**, and the **reward system** that you used when training your final model. *Note: a level 5 description describes the model's performance using two types of quantitative evidence.*
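
To help put the path into words, the greedy route stored in a Q-table can be read out as a list of moves and cell coordinates. The sketch below assumes a non-slippery map, a start in the top-left cell, and the Frozen Lake action order 0 = Left, 1 = Down, 2 = Right, 3 = Up; the name `greedy_path` and the cap of 100 steps are hypothetical choices for illustration.

```python
ACTION_NAMES = ["Left", "Down", "Right", "Up"]
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # (row change, column change) per action


def greedy_path(q, maze, max_steps=100):
    """Follow the highest-valued action from the start; return (moves, cells)."""
    n_rows, n_cols = len(maze), len(maze[0])
    row, col = 0, 0  # assumption: the start cell is the top-left corner
    cells = [(row, col)]
    moves = []
    for _ in range(max_steps):
        if maze[row][col] in "GH":
            break  # reached the goal or fell into a hole
        state = row * n_cols + col
        action = max(range(len(MOVES)), key=lambda a: q[state][a])
        d_row, d_col = MOVES[action]
        # Moving into a wall leaves the elf in place
        row = min(max(row + d_row, 0), n_rows - 1)
        col = min(max(col + d_col, 0), n_cols - 1)
        moves.append(ACTION_NAMES[action])
        cells.append((row, col))
    return moves, cells


# Tiny 2 x 2 demo map and a hand-written Q-table that goes Right, then Down
demo_maze = ["SF", "HG"]
demo_q = {0: [0, 0, 1, 0], 1: [0, 1, 0, 0], 2: [0, 0, 0, 0], 3: [0, 0, 0, 0]}
moves, cells = greedy_path(demo_q, demo_maze)
print(moves)  # ['Right', 'Down']
print(cells)  # [(0, 0), (0, 1), (1, 1)]
```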

![example image](https://gymnasium.farama.org/_images/frozen_lake.gif)

#### Describe the path your elf takes here.

#### Describe how well your Q-Learning model performed here.