# RFP: Maze Solvers

## Project Overview
You are invited to submit a proposal that answers the following question:

### What path will your elf take?

*Please submit your proposal by **2/11/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, use [Gymnasium](https://gymnasium.farama.org/) to set up a [Frozen Lake maze](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) for your project. When you are done with the setup, describe the reward system you plan to use.

*Note, a level 5 maze is at least 10 x 10 cells large and contains at least five lake cells.*
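
A randomly generated maze may satisfy the size and lake counts in the note above and still have no safe route to the goal. The following is a minimal sketch of a check, assuming the maze is a list of row strings with `S` in the top-left corner; the helper name `check_maze` and the tiny 3 x 3 mazes are hypothetical and only for illustration.

```python
from collections import deque

def check_maze(maze, min_size=10, min_lakes=5):
    """Return (meets_size, meets_lakes, solvable) for a list of row strings."""
    rows, cols = len(maze), len(maze[0])
    meets_size = rows >= min_size and cols >= min_size
    meets_lakes = sum(row.count("H") for row in maze) >= min_lakes

    # Breadth-first search from S (assumed top-left), never stepping onto a hole.
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    solvable = False
    while queue:
        r, c = queue.popleft()
        if maze[r][c] == "G":
            solvable = True
            break
        for dr, dc in ((0, -1), (1, 0), (0, 1), (-1, 0)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and maze[nr][nc] != "H":
                seen.add((nr, nc))
                queue.append((nr, nc))
    return meets_size, meets_lakes, solvable

# Hypothetical 3 x 3 mazes, far smaller than a level 5 maze
open_maze = ["SFF", "FHF", "FFG"]
blocked_maze = ["SHF", "HFF", "FFG"]  # the start is walled in by holes
print(check_maze(open_maze, min_size=3, min_lakes=1))
print(check_maze(blocked_maze, min_size=3, min_lakes=1))
```

If the third value is `False`, regenerating the maze before training avoids spending episodes on a layout the elf cannot solve.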

In [1]:
import gymnasium as gym
import random
import pandas as pd
import pickle

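def save_pickle(file, value):
    # Use a context manager so the file is closed after writing
    with open(file, "wb") as f:
        pickle.dump(value, f)
def load_pickle(file):
    # Use a context manager so the file is closed after reading
    with open(file, "rb") as f:
        return pickle.load(f)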

In [2]:
# The Q-table is basically the instruction set: it stores the "goodness" of each move in each square


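def generateMaze(size, lakes):
    """Build a random size[0] x size[1] Frozen Lake map and a zeroed Q-table.

    Note: the layout is random, so a safe path from S to G is not guaranteed.
    """
    ## GENERATE MAZE
    # Every cell except the start and the goal is either a hole ("H") or frozen ("F").
    mazePart = ["H"] * lakes + ["F"] * ((size[0] * size[1]) - lakes - 2)
    random.shuffle(mazePart)

    ## FINALIZE MAZE
    mazePart.insert(0, "S")  # start in the top-left corner
    mazePart.append("G")     # goal in the bottom-right corner

    ## MAKE MAZE
    # Cut the flat list into size[1] row strings of size[0] characters each.
    maze = []
    for y in range(size[1]):
        row = mazePart[y * size[0]:(y + 1) * size[0]]
        maze.append("".join(row))

    ## MAKE Q TABLE
    # One row per cell, one column per action (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP).
    qTable = [[0, 0, 0, 0] for _ in range(size[0] * size[1])]

    return maze, qTable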



maze, qTable = generateMaze([10, 10], 5)


# Flatten the rows into one string so a state number indexes its cell directly
mazeStr = "".join(maze)

save_pickle("maze.p", maze)
save_pickle("qTable.p", qTable)
save_pickle("mazeString.p", mazeStr)


In [3]:
maze = load_pickle("maze.p")
qTable = load_pickle("qTable.p")
mazeStr = load_pickle("mazeString.p")

# Make maze (render_mode='ansi' lets env.render() return the map as text)
env = gym.make('FrozenLake-v1', desc=maze, is_slippery=False, render_mode='ansi')
initial_state, info = env.reset()

print(env.render())

# Take a step (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP)
#action = 2
#new_state, reward, terminated, truncated, info = env.step(action)




#### Describe your reward system here.

### 2. Training Your Model
In the cell seen below, write the code you need to train a Q-Learning model. Display your final Q-table once you are done training your model.

*Note, level 5 work uses only the Python standard library and Pandas to train your Q-Learning model. Level 4 work uses external libraries like Stable-Baselines3.*
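
The Q-Learning update at the heart of training can be illustrated with a few hand-computed steps. This is a small sketch using only plain Python; the helper name `q_update` is hypothetical, the values `alpha = 0.2` and `gamma = 0.8` are the ones set in the training cell below, and the rewards are example inputs.

```python
def q_update(current_q, reward, next_max_q, alpha, gamma):
    """One Q-learning update: blend the old estimate with the new sample."""
    return (1 - alpha) * current_q + alpha * (reward + gamma * next_max_q)

alpha, gamma = 0.2, 0.8  # learning rate and discount factor

# Stepping onto a frozen cell (reward -1) from an untrained state
frozen_step = q_update(0.0, -1, 0.0, alpha, gamma)

# Stepping onto the goal (reward 100) for the first time
first_goal = q_update(0.0, 100, 0.0, alpha, gamma)

# Taking the same goal step again moves the estimate further toward 100
second_goal = q_update(first_goal, 100, 0.0, alpha, gamma)

print(frozen_step, first_goal, second_goal)  # roughly -0.2, 20 and 36
```

Each repetition closes 20% of the remaining gap to the target, which is why a state next to the goal needs many visits before its best Q-value approaches 100.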

In [4]:
# This function uses the Bellman equation to update the Q-table:
#     new_q = (1 - alpha) * q(s, a) + alpha * (R + gamma * max(q(s', a')))
def updateQTable(q, alpha, gamma, current_state, next_state, action):
    current_q = q[current_state][action]
    reward = getReward(next_state)
    if reward == 100:
        print("GOAL")
    # Best value currently stored for any action in the next state
    next_max_q = max(q[next_state])
    new_q = ((1-alpha) * current_q) + (alpha * (reward + (gamma * next_max_q)))
    q[current_state][action] = new_q
    
# Reward for entering a cell: -1 per step on start/frozen cells, -100 for a hole, +100 for the goal
def getReward(state):
    cell = mazeStr[state]
    if cell == "S" or cell == "F":
        return -1
    elif cell == "H":
        return -100
    elif cell == "G":
        return 100

In [5]:
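current_state = 0
alpha = 0.2 # Learning rate: how strongly each new sample overwrites the old Q-value
gamma = 0.8 # Discount factor: how much future rewards count
terminated = False
for x in range(5000): # Episodes
    # truncated is not checked, so an episode only ends in a hole or at the goal
    while not terminated:
        action = random.randint(0, 3)  # purely random exploration
        new_state, reward, terminated, truncated, info = env.step(action)
        updateQTable(qTable, alpha, gamma, current_state, new_state, action)
        current_state = new_state
    # Restart from the start cell. Without resetting current_state, the first
    # update of the next episode would be written to the terminal state's row.
    current_state, info = env.reset()
    terminated = False
env.close()
display(pd.DataFrame(qTable))
save_pickle("qTable.p", qTable)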

GOAL
... ("GOAL" printed 63 times in total)


| State | 0 (LEFT) | 1 (DOWN) | 2 (RIGHT) | 3 (UP) |
|---|---|---|---|---|
| 0 | -3.165774 | -2.714551 | -2.706933 | -3.165839 |
| 1 | -3.166314 | -2.143289 | -2.133245 | -2.707253 |
| 2 | -2.708193 | -102.167405 | -1.416220 | -2.133632 |
| 3 | -2.135635 | -0.528474 | -0.518897 | -1.417278 |
| 4 | -1.419766 | 0.600147 | 0.604216 | -0.522222 |
| ... | ... | ... | ... | ... |
| 95 | 15.143799 | 19.588863 | -102.360089 | 26.825205 |
| 96 | -3.236101 | -2.913426 | -2.810666 | -3.239916 |
| 97 | -101.907221 | 45.217985 | 72.133230 | 40.709842 |
| 98 | 44.277369 | 68.588146 | 97.361268 | 56.582096 |


### 3. Testing Your Model
In the cell seen below, write the code you need to test your Q-Learning model for **1000 episodes**. It is important to test your model for 1000 episodes so that we are all able to compare our results.

*Note, level 5 testing uses both a success rate and an average steps taken metric to evaluate your model. Level 4 uses one or the other.*
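
Both metrics named in the note above can be gathered in one loop over the test episodes. The sketch below is self-contained and does not use Gymnasium: `step` is a hypothetical stand-in for a non-slippery Frozen Lake move, `evaluate` is a hypothetical helper, the 2 x 2 maze and hand-written Q-tables are made up for illustration, and the 100-step cap per episode is an assumption.

```python
def step(maze, state, action):
    """Deterministic grid move (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP)."""
    rows, cols = len(maze), len(maze[0])
    r, c = divmod(state, cols)
    if action == 0:
        c = max(c - 1, 0)
    elif action == 1:
        r = min(r + 1, rows - 1)
    elif action == 2:
        c = min(c + 1, cols - 1)
    elif action == 3:
        r = max(r - 1, 0)
    cell = maze[r][c]
    # Returns (new_state, episode_over, reached_goal)
    return r * cols + c, cell in "HG", cell == "G"

def evaluate(maze, q_table, episodes=1000, max_steps=100):
    """Return (success_rate, average steps over successful episodes)."""
    successes = 0
    total_steps = 0  # summed over successful episodes only
    for _ in range(episodes):
        state, won = 0, False
        for steps in range(1, max_steps + 1):
            # Greedy policy: take the action with the highest Q-value
            action = q_table[state].index(max(q_table[state]))
            state, done, won = step(maze, state, action)
            if done:
                break
        if won:
            successes += 1
            total_steps += steps
    success_rate = successes / episodes
    avg_steps = total_steps / successes if successes else None
    return success_rate, avg_steps

tiny_maze = ["SF", "HG"]  # hypothetical 2 x 2 maze
good_q = [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # RIGHT, then DOWN
bad_q = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]   # DOWN into the hole
print(evaluate(tiny_maze, good_q))
print(evaluate(tiny_maze, bad_q))
```

With a greedy policy on a non-slippery lake every episode follows the same path, so the success rate comes out as either 0 or 1; the metrics become more informative when the environment or the policy includes randomness.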

In [6]:
# Test model here.

qTable = load_pickle("qTable.p")
# Make maze (with render_mode='human' the window updates on every reset and step)
env = gym.make('FrozenLake-v1', desc=maze, render_mode='human', is_slippery=False)
current_state, info = env.reset()

# Take a step (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP)
#action = 2
#new_state, reward, terminated, truncated, info = env.step(action)

# Return the index of the first entry in `values` that equals `value`
def findKey(values, value):
    for key in range(len(values)):
        if values[key] == value:
            return key


terminated = False
truncated = False
# Follow the greedy policy; also stop on truncation so a looping policy cannot run forever
while not (terminated or truncated):
    action = findKey(qTable[current_state], max(qTable[current_state]))
    print(action)
    new_state, reward, terminated, truncated, info = env.step(action)
    current_state = new_state
env.close()


2
2
2
2
2
2
1
2
1
2
1
1
1
2
1
1
1
1


### 4. Final Answer
In the first cell below, describe the path your elf takes to get to the gift. *Note, a level 5 answer includes a gif of the path your elf takes in order to reach the gift.*

In the second cell seen below, describe how well your Q-Learning model performed. Make sure that you explicitly name the **learning rate**, **the discount factor**, and the **reward system** that you used when training your final model. *Note, a level 5 description describes the model's performance using two types of quantitative evidence.*

![example image](https://gymnasium.farama.org/_images/frozen_lake.gif)

#### Describe the path your elf takes here.
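
One way to turn the action numbers printed by the test cell into a readable path is to replay them on a grid. This is a minimal sketch, assuming a 10 x 10 maze with the start at row 0, column 0 and the action numbering from the notebook comments (0: LEFT, 1: DOWN, 2: RIGHT, 3: UP); the helper name `trace_path` is hypothetical, and the action list is copied from the test output above.

```python
ACTION_NAMES = {0: "LEFT", 1: "DOWN", 2: "RIGHT", 3: "UP"}
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # (row change, column change)

def trace_path(actions, size=10):
    """Replay actions from the top-left corner and return the visited (row, col) cells."""
    row, col = 0, 0
    path = [(row, col)]
    for action in actions:
        d_row, d_col = MOVES[action]
        # Moves that would leave the grid keep the elf in place
        row = min(max(row + d_row, 0), size - 1)
        col = min(max(col + d_col, 0), size - 1)
        path.append((row, col))
    return path

# Greedy actions printed by the test cell
actions = [2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1]
path = trace_path(actions)
print(" ".join(ACTION_NAMES[a] for a in actions))
print(len(actions), "steps, ending at", path[-1])
```

The replay ends at row 9, column 9 after 18 steps (nine moves right and nine moves down), which is the bottom-right corner where the maze generator places the goal.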

#### Describe how well your Q-Learning model performed here.