# RFP: Maze Solvers

## Project Overview
You are invited to submit a proposal that answers the following question:

### What path will your elf take?

*Please submit your proposal by **2/11/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, use [Gymnasium](https://gymnasium.farama.org/) to set up a [Frozen Lake maze](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) for your project. When you are done with the set up, describe the reward system you plan on using.

*Note, a level 5 maze is at least 10 x 10 cells large and contains at least five lake cells.*

In [1]:
import gymnasium as gym
import pandas as pd
import random

In [10]:
# Make small test maze
maze=["SF", "FH", "FG"]
env = gym.make('FrozenLake-v1', desc=maze, render_mode='human', is_slippery=False)
initial_state = env.reset()
env.render()

# My Reward System:
- +100 for gift
- -1 for empty space
- -100 for lake

### 2. Training Your Model
In the cell seen below, write the code you need to train a Q-Learning model. Display your final Q-table once you are done training your model.

*Note, level 5 work uses only the standard Python library and Pandas to train your Q-Learning model. A level 4 uses external libraries like Baseline3.*

In [3]:
# Set up q-table
    # key is the state of cell
    # index of list is the action
        # Left: 0, Down: 1, 2: Right, 3: Up

q = {
    0: [0,0,0,0], 
    1: [0,0,0,0], 
    2: [0,0,0,0], 
    3: [0,0,0,0],
    4: [0,0,0,0],
    5: [0,0,0,0]
}

In [4]:
# Create my own reward system
cell_types =["S", "F", "F", "H", "F", "G"]

def getReward(state):
    if cell_types[state] == "G":
        return 100
    elif cell_types[state] == "H":
        return -100
    else:
        return -1

In [5]:
# This function uses the Bellman Equation to update the q-table: 
    # new_q = (1-alpha) * q(s, a) + alpha * (R + gamma(max(q(s`, a`))))
def updateQTable(q, alpha, gamma, current_state, next_state, action):
    current_q = q[current_state][action]
    reward = getReward(next_state)
    next_state_row = [q[next_state][0], q[next_state][1], q[next_state][2], q[next_state][3]] 
    next_max_q = max(next_state_row)
    new_q = ((1-alpha) * current_q) + (alpha * (reward + (gamma * next_max_q)))
    q[current_state][action] = new_q

In [7]:
# Train Q-Model for 1000 episodes
# All actions are random
current_state = 0
alpha = 0.2
gamma = 0.8
terminated = False
for episode in range(100):
    while not terminated:
        action = random.randint(0, 3)
        new_state, reward, terminated, truncated, info = env.step(action)
        updateQTable(q, alpha, gamma, current_state, new_state, action)
        current_state = new_state
    inital = env.reset()
    terminated = False

In [8]:
df = pd.DataFrame(q)
df = df.T
df.columns = ["Left", "Down", "Right", "Up"]
df.head(6)

Unnamed: 0,Left,Down,Right,Up
0,37.032517,49.566938,26.506268,35.258347
1,35.730817,-66.13131,23.401581,24.40921
2,48.03087,66.608144,-72.419408,36.657258
3,27.90889,45.655355,20.622002,32.056474
4,46.85948,57.989579,95.995095,41.15941
5,13.667236,9.96576,6.488395,7.416808


In [9]:
env.close()
# Save final q table as a csv file
df.to_csv('final_q_values.csv', index=False)

### 3. Testing Your Model
In the cell seen below, write the code you need to test your Q-Learning model for **1000 episodes**. It is important to test your model for 1000 episodes so that we are all able to compare our results.

*Note, level 5 testing uses both a success rate and an average steps taken metric to evaluate your model. Level 4 uses one or the other.*

In [57]:
# Test your model here
maze=["SF", "FH", "FG"]
env = gym.make('FrozenLake-v1', desc=maze, render_mode='human', is_slippery=False)
initial_state = env.reset()
env.render()

In [None]:
current_state = 0
terminated = False
success_rate = 0
for i in range(1000):
    while not terminated:
        max_q = df.iloc[current_state].max()
        column_name = df.columns[(df == max_q).any()].tolist()
        current_state, terminated = takeAction(column_name[0])
    cell_types =["S", "F", "F", "H", "F", "G"]
    terminated = False
    initial_state = env.reset()
    if cell_types[current_state] == "G":
        success_rate += 1
print((success_rate/1000) * 100)

In [56]:
print(current_state)

5


In [32]:
max_q = df.iloc[current_state].max()
column_name = df.columns[(df == max_q).any()].tolist()
current_state = takeAction(column_name[0])

In [45]:
def takeAction(description):
    if description == "Down":
        new_state, reward, terminated, truncated, info = env.step(1)
    elif description == "Up":
        new_state, reward, terminated, truncated, info = env.step(3)
    elif description == "Left":
        new_state, reward, terminated, truncated, info = env.step(0)
    elif description == "Right":
        new_state, reward, terminated, truncated, info = env.step(2)
    return new_state, terminated

### 4. Final Answer
In the first cell below, describe the path your elf takes to get to the gift. *Note, a level 5 answer includes a gif of the path your elf takes in order to reach the gift.*

In the second cell seen below, describe how well your Q-Learning model performed. Make sure that you explicitly name the **learning rate**, **the discount factor**, and the **reward system** that you used when training your final model. *Note, a level 5 description describes the model's performance using two types of quantitative evidence.*

![example image](https://raw.githubusercontent.com/SSpindt/ML/refs/heads/main/Mazes/FrozenLake2025-02-1111-34-15-ezgif.com-video-to-gif-converter.gif)

#### Describe the path your elf takes here.

#### Describe how well your Q-Learning model performed here.