# RFP: Maze Solvers

## Project Overview
You are invited to submit a proposal that answers the following question:

### What path will your elf take?

*Please submit your proposal by **2/11/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, use [Gymnasium](https://gymnasium.farama.org/) to set up a [Frozen Lake maze](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) for your project. When you are done with the setup, describe the reward system you plan to use.

*Note, a level 5 maze is at least 10 x 10 cells large and contains at least five lake cells.*
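One quick way to check a generated map against the level 5 rule is to inspect the map description directly. `generate_random_map` returns the map as a list of strings, one per row, using `S` (start), `F` (frozen), `H` (hole) and `G` (goal). The sketch below assumes that "lake cells" means holes (`H`). The helper name `meets_level5` and the example map are hypothetical.

```python
def meets_level5(desc, min_size=10, min_holes=5):
    """Check a FrozenLake map (list of row strings) against the level 5 rule.

    Assumes "lake cells" means holes, marked 'H' in the map.
    """
    rows = len(desc)
    cols = len(desc[0]) if rows else 0
    holes = sum(row.count("H") for row in desc)
    return rows >= min_size and cols >= min_size and holes >= min_holes


# Illustrative hand-written 10 x 10 map with 6 holes (not the project's map)
example_map = [
    "SFFFFFFFFF",
    "FFFFHFFFFF",
    "FFFFFFFHFF",
    "FFHFFFFFFF",
    "FFFFFFFFFF",
    "FFFFFHFFFF",
    "FHFFFFFFFF",
    "FFFFFFFFHF",
    "FFFFFFFFFF",
    "FFFFFFFFFG",
]
print(meets_level5(example_map))
```

To check the notebook's own maze, the map would first need to be stored in a variable (for example `desc = map_type(size=map_size)`) before it is passed to `gym.make`.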

In [1]:
import gymnasium as gym
from gymnasium.envs.toy_text.frozen_lake import generate_random_map
import matplotlib.pyplot as plt
import pandas as pd
import random


In [14]:
map_size = 10  # 10 x 10 grid, the level 5 minimum
map_type = generate_random_map

# Make the maze: a random map, drawn in a window, with slippery ice
env = gym.make('FrozenLake-v1', desc=map_type(size=map_size), render_mode='human', is_slippery=True)


In [3]:
# Learning rate (alpha) = 0.5, discount rate (gamma) = 0.5
# Bellman equation: Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a'))

# Q-table layout: one row per state, one column per action,
# in Gymnasium's FrozenLake action order (0 = Left, 1 = Down, 2 = Right, 3 = Up)
#       Left    Down    Right    Up
# 0
# 1
# 2
# 3
# 4 ...

actions = ["Left", "Down", "Right", "Up"]
# qtable[state][action] holds Q(s, a); a 10 x 10 map has 100 states
qtable = [[0.0] * len(actions) for _ in range(map_size * map_size)]
df = pd.DataFrame(qtable, columns=actions)
df.head()


|   | Left | Down | Right | Up  |
|---|------|------|-------|-----|
| 0 | 0.0  | 0.0  | 0.0   | 0.0 |
| 1 | 0.0  | 0.0  | 0.0   | 0.0 |
| 2 | 0.0  | 0.0  | 0.0   | 0.0 |
| 3 | 0.0  | 0.0  | 0.0   | 0.0 |
| 4 | 0.0  | 0.0  | 0.0   | 0.0 |


In [15]:
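episodes = 10

for _ in range(episodes):
    state, _ = env.reset()
    done = False

    while not done:
        action = env.action_space.sample()  # random action: 0=Left, 1=Down, 2=Right, 3=Up
        print(action)
        next_state, reward, terminated, truncated, _ = env.step(action)
        print(next_state, reward, terminated, truncated)
        done = terminated or truncated  # stop on a hole or the goal, or when the time limit is hit
        state = next_state


env.close()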


1
0 0.0 False False
3
1 0.0 True False
2
1 0.0 True False
0
10 0.0 False False
1
20 0.0 False False
0
10 0.0 False False
1
20 0.0 False False
1
20 0.0 False False
0
10 0.0 False False
3
10 0.0 False False
3
11 0.0 False False
2
21 0.0 False False
0
31 0.0 False False
2
21 0.0 False False
2
22 0.0 False False
2
32 0.0 False False
0
31 0.0 False False
2
21 0.0 False False
2
11 0.0 False False
3
10 0.0 False False
3
11 0.0 False False
1
12 0.0 True False
1
10 0.0 False False
2
11 0.0 False False
2
1 0.0 True False
0
0 0.0 False False
0
0 0.0 False False
1
0 0.0 False False
1
0 0.0 False False
3
1 0.0 True False
3
1 0.0 True False
1
10 0.0 False False
1
10 0.0 False False
2
11 0.0 False False
2
21 0.0 False False
0
20 0.0 False False
3
21 0.0 False False
0
20 0.0 False False
3
20 0.0 False False
0
30 0.0 False False
2
40 0.0 False False
2
30 0.0 False False
3
20 0.0 False False
0
10 0.0 False False
1
10 0.0 False False
1
20 0.0 False False
0
30 0.0 False False
3
20 0.0 False False
1
21 0.0
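The log above prints flat state indices. FrozenLake numbers cells row by row, so on this 10 x 10 map state `s` sits at row `s // 10` and column `s % 10`. The sketch below decodes part of the third episode and compares the intended action with the direction the elf actually moved. With `is_slippery=True`, Gymnasium's documentation says the elf moves in the intended direction only one third of the time and otherwise slides in one of the two perpendicular directions. The helpers `to_row_col` and `moved` are hypothetical names used for illustration.

```python
ACTIONS = ["Left", "Down", "Right", "Up"]  # Gymnasium FrozenLake action order
DELTAS = {(0, -1): "Left", (1, 0): "Down", (0, 1): "Right", (-1, 0): "Up", (0, 0): "Stay"}


def to_row_col(state, ncol=10):
    """Convert a flat FrozenLake state index into (row, column) on an ncol-wide map."""
    return divmod(state, ncol)


def moved(prev_state, next_state, ncol=10):
    """Name the direction the elf actually moved ('Stay' if it bumped into an edge)."""
    r0, c0 = to_row_col(prev_state, ncol)
    r1, c1 = to_row_col(next_state, ncol)
    return DELTAS[(r1 - r0, c1 - c0)]


# (action, next_state) pairs copied from the third episode in the log, which starts at state 0
steps = [(0, 10), (1, 20), (0, 10), (1, 20), (1, 20), (0, 10), (3, 10), (3, 11), (2, 21)]
prev = 0
for action, state in steps:
    print(f"intended {ACTIONS[action]:<5} moved {moved(prev, state):<5} {to_row_col(prev)} -> {to_row_col(state)}")
    prev = state
```

The first line already shows the ice at work: the intended move is Left, but the elf slides Down from (0, 0) to (1, 0).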

In [12]:
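alpha = 0.9  # learning rate: how much a new estimate overrides the old Q-value
gamma = 0.1  # discount rate: how much we value future rewards

def qvalue(state, action, reward, next_state, done):
    # No future reward exists after a terminal state (hole or goal)
    future = 0.0 if done else max(qtable[next_state])
    # Bellman update, stored back in the Q-table
    qtable[state][action] = (1 - alpha) * qtable[state][action] + alpha * (reward + gamma * future)
    return qtable[state][action]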
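To see what a single update does with alpha = 0.9 and gamma = 0.1, here is the same Bellman update restated as a standalone function (the name `bellman_update` is hypothetical) and applied by hand to two illustrative transitions: a step onto a frozen cell under the proposed -1 reward, first from an all-zero Q-table and then when the next state is already worth 50.

```python
def bellman_update(q_sa, reward, max_next_q, alpha=0.9, gamma=0.1, done=False):
    """Return the new Q(s, a) after one Q-learning update.

    Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a'))
    The future term is dropped when the transition ends the episode.
    """
    future = 0.0 if done else max_next_q
    return (1 - alpha) * q_sa + alpha * (reward + gamma * future)


# Frozen-cell step from an all-zero table: 0.1 * 0 + 0.9 * (-1 + 0.1 * 0)
print(bellman_update(0.0, -1, 0.0))
# Same step when the best next action is already worth 50: 0.9 * (-1 + 0.1 * 50)
print(bellman_update(0.0, -1, 50.0))
```

With a learning rate of 0.9, each new sample replaces 90% of the old estimate, so values move quickly but also react strongly to single noisy (slippery) transitions.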

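#### The reward system

The reward system is as follows: reaching the goal = +100,000, stepping onto a frozen cell = -1, and falling into a hole = -100,000,000. The -1 cost on every step favors shorter routes, which pushes the agent toward the most efficient path to the goal. An agent that enters a hole is effectively nuked from existence: the episode ends, and the penalty dwarfs every other reward.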
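Gymnasium's FrozenLake only returns +1 for reaching the goal and 0 everywhere else, so the reward system above has to be computed in the training loop from the tile the elf lands on. A minimal sketch, assuming the map is available as the list of row strings produced by `generate_random_map`; the names `TILE_REWARDS` and `shaped_reward` are hypothetical, and treating the start tile like a frozen cell is an assumption. The last loop prints what the +100,000 goal reward is worth after being discounted k times by gamma = 0.1.

```python
# Proposed reward system, keyed by FrozenLake tile letter ('S' treated like a frozen cell, an assumption)
TILE_REWARDS = {"G": 100_000, "F": -1, "S": -1, "H": -100_000_000}


def shaped_reward(desc, state):
    """Reward for landing on `state`, given the map as a list of row strings."""
    row, col = divmod(state, len(desc[0]))
    return TILE_REWARDS[desc[row][col]]


demo_map = ["SFFF", "FHFH", "FFFH", "HFFG"]  # Gymnasium's default 4 x 4 layout
print(shaped_reward(demo_map, 5))   # state 5 is a hole
print(shaped_reward(demo_map, 15))  # state 15 is the goal

# The +100,000 goal reward after being discounted k times with gamma = 0.1
gamma = 0.1
for k in range(1, 7):
    print(k, TILE_REWARDS["G"] * gamma ** k)
```

With gamma = 0.1 the goal reward shrinks by a factor of ten per step, so after about five steps it is no larger than a single -1 step penalty. Since `generate_random_map` places the start in the top-left corner and the goal in the bottom-right, the goal on a 10 x 10 map is at least 18 moves away, so a larger discount factor may be worth testing.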

### 2. Training Your Model
In the cell seen below, write the code you need to train a Q-Learning model. Display your final Q-table once you are done training your model.

*Note, level 5 work uses only the standard Python library and Pandas to train your Q-Learning model. A level 4 uses external libraries like Stable-Baselines3.*
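A full training run repeats the Bellman update over many episodes. Below is a stdlib-only sketch under stated assumptions: it uses Gymnasium's default 4 x 4 layout instead of the project's 10 x 10 map, assumes non-slippery moves, and replaces `env.step` with a hypothetical `step` helper that applies the proposed reward system directly. Like the random-agent cell, it explores with uniformly random actions, which is valid for Q-learning because Q-learning is off-policy. alpha = 0.9 and gamma = 0.1 match the notebook; `train` and `greedy_path` are hypothetical names.

```python
import random

# Hypothetical 4 x 4 map (Gymnasium's default layout); moves are assumed deterministic (no slipping)
DESC = ["SFFF", "FHFH", "FFFH", "HFFG"]
NROW, NCOL = len(DESC), len(DESC[0])
N_STATES, N_ACTIONS = NROW * NCOL, 4  # actions: 0=Left, 1=Down, 2=Right, 3=Up
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}
REWARDS = {"G": 100_000, "F": -1, "S": -1, "H": -100_000_000}  # proposed reward system


def step(state, action):
    """Hypothetical deterministic FrozenLake step: edges keep the elf in place."""
    row, col = divmod(state, NCOL)
    dr, dc = MOVES[action]
    row = min(max(row + dr, 0), NROW - 1)
    col = min(max(col + dc, 0), NCOL - 1)
    tile = DESC[row][col]
    return row * NCOL + col, REWARDS[tile], tile in "GH"


def train(episodes=5000, alpha=0.9, gamma=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    qtable = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0  # the start cell 'S'
        for _ in range(max_steps):
            action = rng.randrange(N_ACTIONS)  # explore with uniformly random actions
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(qtable[next_state])
            qtable[state][action] = (1 - alpha) * qtable[state][action] + alpha * (reward + gamma * future)
            if done:
                break
            state = next_state
    return qtable


def greedy_path(qtable, max_steps=50):
    """Follow the highest-valued action from the start; return the visited states."""
    state, path = 0, [0]
    for _ in range(max_steps):
        action = max(range(N_ACTIONS), key=lambda a: qtable[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


qtable = train()
print(greedy_path(qtable))
```

With Gymnasium, the loop would call `env.reset()` and `env.step(action)` instead of the hypothetical `step`, and compute the shaped reward from the tile the elf lands on.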

In [20]:
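# Bellman equation: update Q(s, a) for the most recent transition (state, action, reward, next_state)
alpha = 0.9  # learning rate
gamma = 0.1  # discount rate
q_value = (1 - alpha) * qtable[state][action] + alpha * (reward + gamma * max(qtable[next_state]))
qtable[state][action] = q_value  # store the update in the Q-table (indexed [state][action])
pd.DataFrame(qtable, columns=["Left", "Down", "Right", "Up"])  # display the Q-table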

### 3. Testing Your Model
In the cell seen below, write the code you need to test your Q-Learning model for **1000 episodes**. It is important to test your model for 1000 episodes so that we are all able to compare our results.

*Note, level 5 testing uses both a success rate and an average steps taken metric to evaluate your model. Level 4 uses one or the other.*
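A minimal sketch of a testing loop that reports both metrics, assuming the trained Q-table is indexed `qtable[state][action]` and the environment follows Gymnasium's `reset()`/`step()` return signatures. It acts greedily (no exploration), counts an episode as a success when it terminates with a positive reward (FrozenLake's built-in reward is +1 at the goal and 0 elsewhere), and averages steps over successful episodes only, which is one reasonable choice among several. `evaluate` and `CorridorEnv` are hypothetical; `CorridorEnv` is a tiny stand-in so the sketch runs without Gymnasium.

```python
def evaluate(env, qtable, episodes=1000):
    """Run the greedy policy; return (success_rate, average_steps).

    average_steps counts only episodes that reached the goal (None if none did).
    """
    successes, success_steps = 0, 0
    for _ in range(episodes):
        state, _ = env.reset()
        steps, done = 0, False
        while not done:
            row = qtable[state]
            action = max(range(len(row)), key=row.__getitem__)  # greedy action
            state, reward, terminated, truncated, _ = env.step(action)
            steps += 1
            done = terminated or truncated
        if terminated and reward > 0:  # reached the goal
            successes += 1
            success_steps += steps
    average_steps = success_steps / successes if successes else None
    return successes / episodes, average_steps


class CorridorEnv:
    """Hypothetical stand-in with Gymnasium-style reset/step: a 4-cell corridor, goal at cell 3."""

    def __init__(self, max_steps=100):
        self.max_steps = max_steps

    def reset(self):
        self.state, self.t = 0, 0
        return self.state, {}

    def step(self, action):
        self.t += 1
        if action == 2:  # Right, using FrozenLake's action numbering
            self.state = min(self.state + 1, 3)
        terminated = self.state == 3
        truncated = self.t >= self.max_steps and not terminated
        return self.state, float(terminated), terminated, truncated, {}


go_right = [[0.0, 0.0, 1.0, 0.0] for _ in range(4)]
print(evaluate(CorridorEnv(), go_right))  # (1.0, 3.0)
```

For the real test, the environment would be created with the same `desc` used for training; leaving out `render_mode='human'` avoids drawing a window for all 1000 episodes.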

In [None]:
# Test model here.

### 4. Final Answer
In the first cell below, describe the path your elf takes to get to the gift. *Note, a level 5 answer includes a gif of the path your elf takes in order to reach the gift.*
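To turn a trained Q-table into a written description, one option is to follow the greedy action from the start cell and group repeated moves. This sketch assumes non-slippery moves (with `is_slippery=True` the real trajectory differs from run to run) and takes the policy as a list mapping each state to an action, for example `[max(range(4), key=row.__getitem__) for row in qtable]`. The function `describe_path` and the 4 x 4 example map and policy are illustrative only.

```python
from itertools import groupby

ACTION_NAMES = ["Left", "Down", "Right", "Up"]  # Gymnasium FrozenLake action order
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def describe_path(desc, policy, max_steps=200):
    """Follow `policy` (state -> action) from 'S', assuming no slipping, and summarize the moves."""
    ncol = len(desc[0])
    flat = "".join(desc)
    state = flat.index("S")
    moves = []
    for _ in range(max_steps):
        if flat[state] in "GH":  # stop on the goal or a hole
            break
        action = policy[state]
        row, col = divmod(state, ncol)
        dr, dc = MOVES[action]
        row = min(max(row + dr, 0), len(desc) - 1)
        col = min(max(col + dc, 0), ncol - 1)
        state = row * ncol + col
        moves.append(ACTION_NAMES[action])
    outcome = {"G": "reaches the gift", "H": "falls into a hole"}.get(flat[state], "does not finish")
    summary = ", ".join(f"{name} x{len(list(group))}" for name, group in groupby(moves))
    return f"{summary} ({len(moves)} moves); the elf {outcome}"


demo_map = ["SFFF", "FHFH", "FFFH", "HFFG"]
policy = [0] * 16
for s, a in {0: 1, 4: 1, 8: 2, 9: 2, 10: 1, 14: 2}.items():
    policy[s] = a
print(describe_path(demo_map, policy))
# Down x2, Right x2, Down x1, Right x1 (6 moves); the elf reaches the gift
```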

In the second cell seen below, describe how well your Q-Learning model performed. Make sure that you explicitly name the **learning rate**, **the discount factor**, and the **reward system** that you used when training your final model. *Note, a level 5 description describes the model's performance using two types of quantitative evidence.*

![example image](https://gymnasium.farama.org/_images/frozen_lake.gif)

#### Describe the path your elf takes here.

#### Describe how well your Q-Learning model performed here.