# RFP: Maze Solvers

## Project Overview
You are invited to submit a proposal that answers the following question:

### What path will your elf take?

*Please submit your proposal by **2/11/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, use [Gymnasium](https://gymnasium.farama.org/) to set up a [Frozen Lake maze](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) for your project. When you are done with the setup, describe the reward system you plan to use.

*Note, a level 5 maze is at least 10 x 10 cells large and contains at least five lake cells.*

In [10]:
import gymnasium as gym
import pandas as pd
import random
import numpy as np

In [2]:
# Make maze
env = gym.make('FrozenLake-v1', desc=[
        "SFFFFHHFHF",
        "FFFFFFHFFF",
        "FHFHFFFFFH",
        "FHFFHHFFFF",
        "FFFHFFFHHH",
        "FHHFFFHFFH",
        "FHFFHFHFHH",
        "FFFHFFFFHF",
        "FFFHFHFFFF",
        "HFFHFFFFHG",
    ], render_mode='human')
initial_state, info = env.reset()  # reset() returns (observation, info)

env.render()
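
The level 5 note above sets two requirements for the maze: at least 10 x 10 cells and at least five lake cells. A minimal sketch of how that could be checked before training is shown here. The helper name `check_maze` and its parameters are hypothetical and not part of Gymnasium; the rows are copied from the `desc` argument above.

```python
# Hypothetical helper (not a Gymnasium function): checks a maze description
# against the level 5 note (at least 10 x 10 cells, at least five lake cells).
def check_maze(desc, min_size=10, min_lakes=5):
    n_rows = len(desc)
    n_cols = len(desc[0])
    # Every row must have the same width for the grid to be rectangular.
    rectangular = all(len(row) == n_cols for row in desc)
    n_lakes = sum(row.count("H") for row in desc)
    ok = (rectangular and n_rows >= min_size
          and n_cols >= min_size and n_lakes >= min_lakes)
    return {"rows": n_rows, "cols": n_cols, "lakes": n_lakes, "ok": ok}

maze = [
    "SFFFFHHFHF",
    "FFFFFFHFFF",
    "FHFHFFFFFH",
    "FHFFHHFFFF",
    "FFFHFFFHHH",
    "FHHFFFHFFH",
    "FHFFHFHFHH",
    "FFFHFFFFHF",
    "FFFHFHFFFF",
    "HFFHFFFFHG",
]

report = check_maze(maze)
print(report)

# Joining the rows gives one character per state index (row * 10 + column),
# which is the layout a flat lookup string needs.
flat = "".join(maze)
print(len(flat))
```

Building the flat string with `"".join(maze)` instead of typing it by hand keeps the state lookup in step with the maze if a row is edited later.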


#### +1 for gift
#### 0 for basic land
#### -1 for falling in lake

### 2. Training Your Model
In the cell seen below, write the code you need to train a Q-Learning model. Display your final Q-table once you are done training your model.

*Note, level 5 work uses only the standard Python library and Pandas to train your Q-Learning model. A level 4 uses external libraries like Stable-Baselines3.*

In [3]:
# One row per maze cell (10 x 10 = 100 states), one value per action
q = {state: [0, 0, 0, 0] for state in range(100)}

In [4]:
cell_types = "SFFFFHHFHFFFFFFFHFFFFHFHFFFFFHFHFFHHFFFFFFFHFFFHHHFHHFFFHFFHFHFFHFHFHHFFFHFFFFHFFFFHFHFFFFHFFHFFFFHG"

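def getReward(state):
    # The reward depends on the cell the elf lands on.
    if cell_types[state] == "G":
        return 1    # gift
    elif cell_types[state] == "H":
        return -1   # lake
    else:
        # Basic land (and the start cell). Note this is -10, not the 0 listed
        # in the Data Description, so an ordinary step costs more than a lake.
        return -10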

In [5]:
def updateQTable(q, alpha, gamma, current_state, next_state, action):
    current_q = q[current_state][action]
    reward = getReward(next_state)
    next_max_q = max(q[next_state])  # best value available from the next state
    new_q = ((1-alpha) * current_q) + (alpha * (reward + (gamma * next_max_q)))
    q[current_state][action] = new_q
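
To make the update rule above concrete, here is a small worked example of the same arithmetic on plain numbers. The function name `q_update` and the input values are illustrative assumptions chosen for this sketch; only the formula itself comes from `updateQTable`.

```python
def q_update(current_q, reward, next_max_q, alpha, gamma):
    """Blend the old estimate (weight 1 - alpha) with the new target
    reward + gamma * next_max_q (weight alpha)."""
    return (1 - alpha) * current_q + alpha * (reward + gamma * next_max_q)

# Illustrative numbers only: an untouched table entry (0), a reward of -1,
# and a next state whose values are all still 0.
first = q_update(current_q=0.0, reward=-1, next_max_q=0.0, alpha=0.2, gamma=0.8)

# Taking the same step again moves the estimate further toward the target.
second = q_update(current_q=first, reward=-1, next_max_q=0.0, alpha=0.2, gamma=0.8)

print(first, second)
```

With `alpha = 0.2`, each update moves the stored value one fifth of the way toward the target, so a single bad step never overwrites everything learned so far.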

In [6]:
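alpha = 0.2  # learning rate
gamma = 0.8  # discount factor
for episode in range(100):
    # Start every episode from the state that reset() returns
    current_state, info = env.reset()
    terminated = truncated = False
    # Stop when the elf reaches the gift, falls in a lake, or the episode is truncated
    while not (terminated or truncated):
        action = random.randint(0, 3)  # explore with a uniformly random action
        new_state, reward, terminated, truncated, info = env.step(action)
        updateQTable(q, alpha, gamma, current_state, new_state, action)
        current_state = new_state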
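
The training loop above picks every action uniformly at random. A common alternative, which is a different technique from the one used in this notebook, is epsilon-greedy selection: explore with probability `epsilon` and otherwise take the action with the highest Q-value. The sketch below is only an illustration; `choose_action` and `epsilon` are hypothetical names that do not appear in the original code.

```python
import random

def choose_action(q_row, epsilon, rng=random):
    """Epsilon-greedy selection over one row of the Q-table.

    q_row   -- list of Q-values, one per action
    epsilon -- probability of exploring with a random action
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore: any action
    best = max(q_row)
    # Break ties at random so an all-zero row does not always pick action 0.
    return rng.choice([a for a, value in enumerate(q_row) if value == best])

# With epsilon = 0 the choice is always the highest-valued action.
print(choose_action([0.0, 5.0, 1.0, 2.0], epsilon=0.0))
```

In the loop, `random.randint(0, 3)` could be swapped for `choose_action(q[current_state], epsilon)` if this approach were adopted; whether it helps on this maze is untested here.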

In [7]:
df = pd.DataFrame(q)
df = df.T
df.columns = ["Left", "Down", "Right", "Up"]
df.head(6)

| State | Left | Down | Right | Up |
|---|---|---|---|---|
| 0 | -33.702409 | -31.065389 | -31.986499 | -33.538711 |
| 1 | -32.473492 | -31.911449 | -26.80779 | -28.614133 |
| 2 | -24.254346 | -20.967512 | -21.768512 | -24.075442 |
| 3 | -17.753784 | -20.551406 | -16.704526 | -20.942146 |
| 4 | -15.826799 | -10.291169 | -13.147941 | -12.766901 |
| 5 | -14.44142 | -6.152298 | -11.114435 | -11.214485 |


### 3. Testing Your Model
In the cell seen below, write the code you need to test your Q-Learning model for **1000 episodes**. It is important to test your model for 1000 episodes so that we are all able to compare our results.

*Note, level 5 testing uses both a success rate and an average steps taken metric to evaluate your model. Level 4 uses one or the other.*

In [31]:
initial_state = env.reset()
env.render()

In [12]:
num_episodes = 1000
total_rewards = []
total_steps = []

for episode in range(num_episodes):
    state, _ = env.reset()
    done = False
    episode_reward = 0
    step_count = 0  # Track steps per episode

    while not done:
        action = np.argmax(q[state])  # Choose best action
        next_state, reward, terminated, truncated, _ = env.step(action)  # Step
        done = terminated or truncated  # Stop on a lake, the gift, or truncation
        episode_reward += reward
        state = next_state  # Move to next state
        step_count += 1  # Increment step count

    total_rewards.append(episode_reward)
    total_steps.append(step_count)  # Store steps per episode

# Evaluation results
average_reward = np.mean(total_rewards)
success_rate = np.sum(total_rewards) / num_episodes  # Assuming reward 1 = success
average_steps = np.mean(total_steps)  # Calculate average steps per episode

print(f"Tested over {num_episodes} episodes.")
print(f"Average Reward: {average_reward:.3f}")
print(f"Success Rate: {success_rate:.2%}")
print(f"Average Steps per Episode: {average_steps:.2f}")

Tested over 1000 episodes.
Average Reward: 0.000
Success Rate: 0.00%
Average Steps per Episode: 8.15


### 4. Final Answer
In the first cell below, describe the path your elf takes to get to the gift. *Note, a level 5 answer includes a gif of the path your elf takes in order to reach the gift.*

In the second cell seen below, describe how well your Q-Learning model performed. Make sure that you explicitly name the **learning rate**, **the discount factor**, and the **reward system** that you used when training your final model. *Note, a level 5 description describes the model's performance using two types of quantitative evidence.*

[example image](https://github.com/applepot437/ml/blob/main/maze/FrozenLake2025-02-2510-12-32-ezgif.com-video-to-gif-converter%20(1).gif)

#### My elf changes direction on almost every move, and it usually falls into a lake within the first few steps.

#### This is because of the flawed learning rate (0.2) and how small the discount factor (0.8) is. My Q-Learning model also has rather small values for its reward system, which would explain its poor performance: over 1000 test episodes it had a 0.00% success rate and averaged 8.15 steps per episode. Another reason is that my maze is rather complicated, which could make my model perform poorly.
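
One way to describe the elf's path in words is to trace the highest-valued action from the start cell through the Q-table. The sketch below does that under two stated assumptions: every move succeeds (no slipping on the ice), and actions are numbered 0 = Left, 1 = Down, 2 = Right, 3 = Up, matching the Q-table columns. The function `greedy_path` and the tiny 2 x 2 maze are hypothetical and exist only for illustration.

```python
# Assumed action numbering: 0=Left, 1=Down, 2=Right, 3=Up (row change, column change).
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

def greedy_path(q, desc, max_steps=200):
    """Follow the highest-valued action from state 0, assuming each move
    succeeds. Returns the list of visited state indices."""
    n_rows, n_cols = len(desc), len(desc[0])
    state, path = 0, [0]
    for _ in range(max_steps):
        row, col = divmod(state, n_cols)
        if desc[row][col] in "HG":  # a lake or the gift ends the walk
            break
        values = q[state]
        action = values.index(max(values))
        d_row, d_col = MOVES[action]
        # A move off the grid leaves the elf where it is.
        row = min(max(row + d_row, 0), n_rows - 1)
        col = min(max(col + d_col, 0), n_cols - 1)
        state = row * n_cols + col
        path.append(state)
    return path

# Hypothetical 2 x 2 maze and hand-written Q-table, for illustration only.
toy_desc = ["SF", "HG"]
toy_q = {0: [0, -1, 1, 0], 1: [0, 1, 0, 0], 2: [0, 0, 0, 0], 3: [0, 0, 0, 0]}
print(greedy_path(toy_q, toy_desc))  # [0, 1, 3]
```

The `max_steps` cap matters because a greedy policy can loop forever between two cells. A traced path that ends on an `H` cell or hits the cap is itself evidence that the table has not learned a route to the gift.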