# RFP: Maze Solvers

## Project Overview
You are invited to submit a proposal that answers the following question:

### What path will your elf take?

*Please submit your proposal by **2/11/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, use [Gymnasium](https://gymnasium.farama.org/) to set up a [Frozen Lake maze](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) for your project. When you are done with the setup, describe the reward system you plan to use.

*Note, a level 5 maze is at least 10 x 10 cells large and contains at least five lake cells.*
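The size and lake-cell requirements in the note above can be checked before the environment is built. The sketch below is one way to do that with plain Python. The helper name `check_level5_maze` is hypothetical, "lake cells" are assumed to be the `H` tiles, and the extra check for exactly one `S` and one `G` is an assumption rather than part of the stated requirements.

```python
def check_level5_maze(desc, min_size=10, min_holes=5):
    """Return True if the map meets the level 5 size and lake-cell requirements."""
    rows = len(desc)
    # Every row must have the same length so the grid is rectangular
    if rows == 0 or any(len(row) != len(desc[0]) for row in desc):
        return False
    cols = len(desc[0])
    flat = "".join(desc)
    holes = flat.count("H")  # assumption: "lake cells" are the H tiles
    # Assumption: a usable map has exactly one start and one goal
    one_start_and_goal = flat.count("S") == 1 and flat.count("G") == 1
    return (rows >= min_size and cols >= min_size
            and holes >= min_holes and one_start_and_goal)


example_maze = [
    "SFFFFFFFHF",
    "FFFFFFFFHF",
    "FFFHFFFFHF",
    "FHFFFFFHFF",
    "FFFFHFHFFF",
    "FFFHFFHFFF",
    "FFFFFFFHFF",
    "FHFFFFFHFF",
    "FFHFFFFHFF",
    "FFFFFFFGFF",
]
print(check_level5_maze(example_maze))
```

A map that is too small, such as a 4 x 4 grid, fails the same check regardless of how many lake cells it contains.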

In [6]:
!pip install gymnasium

Defaulting to user installation because normal site-packages is not writeable
Collecting pygame>=2.1.3
  Using cached pygame-2.6.1-cp39-cp39-win_amd64.whl (10.6 MB)
Installing collected packages: pygame
Successfully installed pygame-2.6.1


In [95]:
import gymnasium as gym
import pandas as pd
import random
maze = [
    "SFFFFFFFHF",
    "FFFFFFFFHF",
    "FFFHFFFFHF",
    "FHFFFFFHFF",
    "FFFFHFHFFF",
    "FFFHFFHFFF",
    "FFFFFFFHFF",
    "FHFFFFFHFF",
    "FFHFFFFHFF",
    "FFFFFFFGFF"
]
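env = gym.make('FrozenLake-v1', desc=maze, render_mode='ansi')
initial_state, info = env.reset()
# Take one random step to confirm that the environment responds
action = env.action_space.sample()
new_state, reward, terminated, truncated, info = env.step(action)
# 'ansi' mode returns the grid as text, so it has to be printed
print(env.render())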

In [96]:
env.reset()

(0, {'prob': 1})

In [101]:
env.close()

#### The goal is worth +50, a hole is worth -50, and an empty space is worth -1.
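The reward system described above could be written as a small lookup, sketched here under a few assumptions: the names `REWARDS` and `custom_reward` are hypothetical, the start tile `S` is treated as an empty space, and states are numbered row by row the way Frozen Lake numbers them.

```python
# Hypothetical reward table matching the description above
REWARDS = {"G": 50, "H": -50, "F": -1, "S": -1}


def custom_reward(desc, state):
    """Look up the reward for landing on a flattened state index."""
    n_cols = len(desc[0])
    # Convert the flat state number back into a (row, column) position
    row, col = divmod(state, n_cols)
    return REWARDS[desc[row][col]]


tiny_map = ["SFH", "FFG"]
print([custom_reward(tiny_map, s) for s in range(6)])  # [-1, -1, -50, -1, -1, 50]
```

Keeping the values in one dictionary makes it easier to report the exact reward system alongside the final results.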

### 2. Training Your Model
In the cell seen below, write the code you need to train a Q-Learning model. Display your final Q-table once you are done training your model.

*Note, level 5 work uses only the standard Python library and Pandas to train your Q-Learning model. Level 4 work uses external libraries such as Stable-Baselines3.*
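As a rough illustration of what standard-library-only Q-Learning looks like, the sketch below trains on a made-up one-dimensional corridor rather than on Frozen Lake. Everything in it is an assumption for illustration: the function name `train_corridor`, the corridor itself, the +10 goal reward, and the -1 step cost.

```python
import random


def train_corridor(n_cells=5, episodes=500, alpha=0.5, gamma=0.5, seed=0):
    """Tabular Q-learning on a 1-D corridor: start at cell 0, goal at the last cell."""
    rng = random.Random(seed)
    # q[state][action]; action 0 = LEFT, action 1 = RIGHT
    q = [[0.0, 0.0] for _ in range(n_cells)]
    for _ in range(episodes):
        state = 0
        while state != n_cells - 1:
            action = rng.randint(0, 1)  # purely random exploration
            # Moving left from cell 0 leaves the agent where it is
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 10 if next_state == n_cells - 1 else -1
            # Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))
            target = reward + gamma * max(q[next_state])
            q[state][action] = (1 - alpha) * q[state][action] + alpha * target
            state = next_state
    return q


q_table = train_corridor()
# Read off the preferred action in every non-goal cell
greedy = ["LEFT" if left > right else "RIGHT" for left, right in q_table[:-1]]
print(greedy)
```

The key detail is that the update changes the entry for the state the agent left, while the `max` is taken over the state it arrived in.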

In [97]:
import pandas as pd
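# Q-table stored as one column per action, with one entry per maze cell (100 cells)
q = {
    3: [0] * 100,  # UP
    1: [0] * 100,  # DOWN
    0: [0] * 100,  # LEFT
    2: [0] * 100,  # RIGHT
}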
cells = ["S", "F", "F", "F", "F", "F", "F", "F", "H", "F",
         "F", "F", "F", "F", "F", "F", "F", "F", "H", "F",
         "F", "F", "F", "H", "F", "F", "F", "F", "H", "F",
         "F", "H", "F", "F", "F", "F", "F", "H", "F", "F",
         "F", "F", "F", "F", "H", "F", "H", "F", "F", "F",
         "F", "F", "F", "H", "F", "F", "H", "F", "F", "F",
         "F", "F", "F", "F", "F", "F", "F", "H", "F", "F",
         "F", "H", "F", "F", "F", "F", "F", "H", "F", "F",
         "F", "F", "H", "F", "F", "F", "F", "H", "F", "F",
         "F", "F", "F", "F", "F", "F", "F", "G", "F", "F"]

In [98]:
def updateQ(q, alpha, gamma, step, cell, reward, next_cell):
    # Best value available from the cell the elf landed on
    best_next = max(q[a][next_cell] for a in q)
    # Bellman update for the (cell, step) pair that was just taken
    q[step][cell] = (1 - alpha) * q[step][cell] + alpha * (reward + gamma * best_next)


In [99]:
# Train Q-Model
# Learning Rate - 0.5, Discount Rate - 0.5
# Reward: +100 gift, -1 empty space, -100 lake
# Bellman Equation: Q(s,a) = (1-alpha)Q(s,a) + alpha(R + gamma * max_a' Q(s',a'))

for episode in range(10000):
    # Start every episode from the start cell
    state, info = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        # Take a random step (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP)
        action = random.randint(0, 3)
        new_state, reward, terminated, truncated, info = env.step(action)
        # Replace the environment's reward with the custom reward system
        if cells[new_state] == "H":
            reward = -100
        elif cells[new_state] == "G":
            reward = 100
        else:
            reward = -1
        # Update the cell the elf left, using the value of the cell it reached
        updateQ(q, 0.5, 0.5, action, state, reward, new_state)
        state = new_state
    

In [100]:
df = pd.DataFrame(q)
df.head()

| | 3 | 1 | 0 | 2 |
|---|---|---|---|---|
| 0 | -2.0 | -2.0 | -2.0 | -2.0 |
| 1 | -2.0 | -2.0 | -2.0 | -2.0 |
| 2 | -2.0 | -2.0 | -2.0 | -2.0 |
| 3 | -2.0 | -2.0 | -2.0 | -2.0 |
| 4 | -2.0 | -2.0 | -2.0 | -2.0 |
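One possible reading of the uniform -2.0 entries, offered as an interpretation rather than something the notebook states: with a learning rate of 0.5, a discount rate of 0.5, and a step reward of -1, an entry that is repeatedly updated from a "next" value equal to itself settles at the fixed point of q = -1 + 0.5q, which is -2. The short sketch below iterates that update to show the convergence.

```python
alpha, gamma, step_reward = 0.5, 0.5, -1

value = 0.0
for _ in range(100):
    # Same form as the update rule, but the "next" value is the entry itself
    value = (1 - alpha) * value + alpha * (step_reward + gamma * value)

print(round(value, 6))  # -2.0
```

Rows that all sit at this value carry no information about which direction is better, so it may be worth inspecting rows next to a lake cell or the gift as well as the first five.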


### 3. Testing Your Model
In the cell seen below, write the code you need to test your Q-Learning model for **1000 episodes**. It is important to test your model for 1000 episodes so that we are all able to compare our results.

*Note, level 5 testing uses both a success rate and an average steps taken metric to evaluate your model. Level 4 uses one or the other.*
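Both metrics named in the note above can be computed from a simple per-episode record. The sketch below assumes each test episode is logged as a `(reached_goal, steps)` tuple. The helper name `summarize` and the sample numbers are made up for illustration and are not results from this project.

```python
def summarize(episodes):
    """episodes: list of (reached_goal: bool, steps: int) tuples, one per test episode."""
    n = len(episodes)
    # Fraction of episodes that ended on the goal
    success_rate = sum(1 for reached, _ in episodes if reached) / n
    # Mean number of steps across all episodes, successful or not
    average_steps = sum(steps for _, steps in episodes) / n
    return success_rate, average_steps


sample = [(True, 18), (False, 7), (True, 20), (False, 100)]
print(summarize(sample))  # (0.5, 36.25)
```

Averaging steps over all episodes mixes quick failures with slow successes, so reporting the average for successful episodes separately may also be informative.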

In [102]:
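# Test the greedy policy for 1000 episodes (no rendering, so the loop runs quickly)
env = gym.make('FrozenLake-v1', desc=maze)
episodes = 1000
successes = 0
total_steps = 0
for episode in range(episodes):
    state, info = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        # Choose the best action from the Q-table for the current state
        action = max(q, key=lambda a: q[a][state])
        state, reward, terminated, truncated, info = env.step(action)
        total_steps += 1
    # The environment's own reward is 1 only when the goal is reached
    if reward == 1:
        successes += 1
env.close()
print("Success rate:", successes / episodes)
print("Average steps per episode:", total_steps / episodes)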

### 4. Final Answer
In the first cell below, describe the path your elf takes to get to the gift. *Note, a level 5 answer includes a gif of the path your elf takes in order to reach the gift.*
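A written description of the path can be drafted by following the highest-valued action from the start cell. The sketch below assumes a Q-table laid out as `q[action][state]`, the action numbering used in this notebook (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP), and a start cell in the top-left corner. The helper `greedy_path`, the tiny map, and the hand-made Q-table are hypothetical, and the trace ignores slipping on the ice.

```python
ACTION_NAMES = {0: "LEFT", 1: "DOWN", 2: "RIGHT", 3: "UP"}
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def greedy_path(q, desc, max_steps=100):
    """Follow the highest-valued action from the start cell, ignoring slipperiness."""
    n_rows, n_cols = len(desc), len(desc[0])
    row, col = 0, 0  # assumption: the start cell is the top-left corner
    path = []
    for _ in range(max_steps):
        # Stop once the elf stands on the gift or in a lake cell
        if desc[row][col] in "GH":
            break
        state = row * n_cols + col
        action = max(q, key=lambda a: q[a][state])
        path.append(ACTION_NAMES[action])
        d_row, d_col = MOVES[action]
        # Moves off the edge leave the elf in place, as in Frozen Lake
        row = min(max(row + d_row, 0), n_rows - 1)
        col = min(max(col + d_col, 0), n_cols - 1)
    return path


tiny_map = ["SF", "HG"]
# Hand-made Q-table in the notebook's layout: q[action][state]
tiny_q = {0: [0, 0, 0, 0], 1: [-5, 9, 0, 0], 2: [7, 0, 0, 0], 3: [0, 0, 0, 0]}
print(greedy_path(tiny_q, tiny_map))  # ['RIGHT', 'DOWN']
```

The `max_steps` cap keeps the trace from looping forever when the Q-table sends the elf back and forth between two cells.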

In the second cell seen below, describe how well your Q-Learning model performed. Make sure that you explicitly name the **learning rate**, **the discount factor**, and the **reward system** that you used when training your final model. *Note, a level 5 description describes the model's performance using two types of quantitative evidence.*

![example image](https://gymnasium.farama.org/_images/frozen_lake.gif)

#### Describe the path your elf takes here.

#### Describe how well your Q-Learning model performed here.