# RFP: Maze Solvers

## Project Overview
You are invited to submit a proposal that answers the following question:

### What path will your elf take?

*Please submit your proposal by **2/11/25 at 11:59 PM**.*

## Required Proposal Components

### 1. Data Description
In the code cell below, use [Gymnasium](https://gymnasium.farama.org/) to set up a [Frozen Lake maze](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) for your project. When the setup is complete, describe the reward system you plan to use.

*Note, a level 5 maze is at least 10 x 10 cells large and contains at least five lake cells.*

In [22]:
import gymnasium as gym
import time
import random as rand
from gymnasium.envs.toy_text.frozen_lake import generate_random_map

In [41]:
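# Make maze
size = 10

# Random size x size map; p is the probability that each tile is frozen (walkable)
custom_map = generate_random_map(size=size, p=.8)
# is_slippery=False makes every move deterministic
env = gym.make("FrozenLake-v1", render_mode="human", desc=custom_map, is_slippery=False)

obs, info = env.reset()

env.render()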
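To confirm that a generated map meets the level 5 size and lake requirements, a small helper can count the rows, columns, and hole tiles. This is a sketch that assumes the map is the list of strings returned by `generate_random_map`, where `'H'` marks a hole (lake) tile; the function name `meets_level5` is hypothetical.

```python
def meets_level5(desc, min_size=10, min_holes=5):
    """Return True if a FrozenLake map is at least min_size x min_size with at least min_holes holes."""
    rows = len(desc)
    cols = min((len(row) for row in desc), default=0)  # narrowest row, 0 for an empty map
    holes = sum(row.count("H") for row in desc)         # 'H' tiles are the lake cells
    return rows >= min_size and cols >= min_size and holes >= min_holes


# Usage with the generated map: meets_level5(custom_map)
example = ["SFFFFFFFFF"] + ["FFFFFFFFFH"] * 5 + ["FFFFFFFFFF"] * 3 + ["FFFFFFFFFG"]
print(meets_level5(example))  # True: 10 x 10 with five holes
```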

In [44]:
env.close()

#### Describe your reward system here.
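One way to apply custom reward values such as `Step`, `Hole`, and `Treasure` is to translate each FrozenLake step result into them. By default FrozenLake returns a reward of 1 for reaching the goal and 0 otherwise, and an episode terminates on either a hole or the goal, so a terminated step with zero reward means the elf fell in a hole. The sketch below assumes that convention; `shaped_reward` is a hypothetical helper name.

```python
REWARDS = {"Step": -1, "Hole": -15, "Treasure": 15}


def shaped_reward(env_reward, terminated, rewards=REWARDS):
    """Map a FrozenLake (reward, terminated) pair onto the custom reward system."""
    if terminated and env_reward > 0:
        return rewards["Treasure"]  # reached the gift
    if terminated:
        return rewards["Hole"]      # episode ended with no reward: fell in a hole
    return rewards["Step"]          # ordinary move on frozen ice


# Usage inside a training loop:
# obs, reward, terminated, truncated, info = env.step(action)
# r = shaped_reward(reward, terminated)
print(shaped_reward(1.0, True), shaped_reward(0.0, True), shaped_reward(0.0, False))  # 15 -15 -1
```

The per-step penalty makes longer routes cost more, which nudges the elf toward shorter paths to the gift.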

### 2. Training Your Model
In the cell below, write the code needed to train a Q-Learning model. Display your final Q-table once training is complete.

*Note: level 5 work trains the Q-Learning model using only the Python standard library and Pandas. Level 4 work uses external libraries such as Stable-Baselines3.*

In [49]:
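# Train model here.
# Custom reward system: a small penalty for each step, a large penalty for
# falling in a hole, and a large reward for reaching the treasure (gift)
rewards = {
    "Step": -1,
    "Hole": -15,
    "Treasure": 15
}

# FrozenLake action encoding: 0 is Left, 1 is Down, 2 is Right, 3 is Up
actions = {
    'Left': 0,
    'Right': 2,
    'Up': 3,
    'Down': 1
}

learning_rate = .5  # alpha
discount = .5       # gamma

# Q-learning update rule (derived from the Bellman equation):
# Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (R + gamma * max_a' Q(s', a'))

# One row per tile (size * size states), one column per move direction
qtable = {state: {direction: 0.0 for direction in actions} for state in range(size * size)}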

qtable

{0: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 1: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 2: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 3: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 4: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 5: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 6: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 7: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 8: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 9: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 10: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 11: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 12: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 13: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 14: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 15: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 16: {'Left': 0.0, 'Right': 0.0, 'Up': 0.0, 'Down': 0.0},
 17: {'Left': 0.0, 'Righ
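The update rule in the training cell's comment can be written as a small function over the nested-dictionary Q-table. This sketch assumes the same table layout (`qtable[state][direction]`) and treats a terminal transition as having no future value, which is the usual Q-learning convention; `q_update` is a hypothetical name.

```python
def q_update(qtable, state, action, reward, next_state, alpha, gamma, terminated=False):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    # Best value reachable from the next state (0 if the episode ended there)
    future = 0.0 if terminated else max(qtable[next_state].values())
    old = qtable[state][action]
    qtable[state][action] = (1 - alpha) * old + alpha * (reward + gamma * future)
    return qtable[state][action]


# Worked example with alpha = gamma = 0.5
table = {0: {"Left": 0.0, "Right": 2.0}, 1: {"Left": 4.0, "Right": 0.0}}
print(q_update(table, 0, "Right", -1, 1, alpha=0.5, gamma=0.5))  # 0.5*2 + 0.5*(-1 + 0.5*4) = 1.5
```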

In [None]:
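# Exploration only: take 300 random actions to watch how the environment responds.
# (This cell does not update the Q-table yet.)
for _ in range(300):
    time.sleep(.25)  # slow the loop down so the "human" render is easy to follow

    action_name, action_value = rand.choice(list(actions.items()))
    # Gymnasium's step() returns (observation, reward, terminated, truncated, info)
    obs, reward, terminated, truncated, info = env.step(action_value)
    env.render()

    print(f"Action: {action_value}, New State: {obs}, Reward: {reward}, Done: {terminated}")

    # Start a new episode after a hole, the gift, or the time limit
    if terminated or truncated:
        obs, info = env.reset()

# Don't forget to display your final Q table!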

Action: 0, New State: 0, Reward: 0.0, Done: False
Action: 3, New State: 0, Reward: 0.0, Done: False
Action: 2, New State: 1, Reward: 0.0, Done: False
Action: 0, New State: 0, Reward: 0.0, Done: False
Action: 1, New State: 10, Reward: 0.0, Done: False
Action: 3, New State: 0, Reward: 0.0, Done: False
Action: 2, New State: 1, Reward: 0.0, Done: False
Action: 0, New State: 0, Reward: 0.0, Done: False
Action: 2, New State: 1, Reward: 0.0, Done: False
Action: 2, New State: 2, Reward: 0.0, Done: False
Action: 3, New State: 2, Reward: 0.0, Done: False
Action: 0, New State: 1, Reward: 0.0, Done: False
Action: 2, New State: 2, Reward: 0.0, Done: False
Action: 1, New State: 12, Reward: 0.0, Done: False
Action: 0, New State: 11, Reward: 0.0, Done: False
Action: 0, New State: 10, Reward: 0.0, Done: False
Action: 3, New State: 0, Reward: 0.0, Done: False
Action: 1, New State: 10, Reward: 0.0, Done: False
Action: 1, New State: 20, Reward: 0.0, Done: False
Action: 1, New State: 30, Reward: 0.0, Done:

KeyboardInterrupt: 
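The cell above takes random actions but never updates the Q-table. A full training loop adds epsilon-greedy action selection and the Q-learning update. So that the sketch runs without a display or Gymnasium installed, it trains on `TinyLake`, a hypothetical stand-in that follows non-slippery FrozenLake movement rules and Gymnasium's `reset()`/`step()` return shapes; the same `train` function should also accept an environment created with `gym.make(...)` (ideally without `render_mode="human"`, so rendering does not slow training). The epsilon value, episode count, and step limit are illustrative assumptions, not tuned settings.

```python
import random

ACTIONS = {"Left": 0, "Down": 1, "Right": 2, "Up": 3}  # FrozenLake encoding
REWARDS = {"Step": -1, "Hole": -15, "Treasure": 15}


class TinyLake:
    """Hypothetical stand-in for non-slippery FrozenLake (S start, F ice, H hole, G gift)."""

    def __init__(self, desc):
        self.desc, self.nrow, self.ncol = desc, len(desc), len(desc[0])
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state, {}

    def step(self, action):
        row, col = divmod(self.state, self.ncol)
        if action == 0:
            col = max(col - 1, 0)              # Left
        elif action == 1:
            row = min(row + 1, self.nrow - 1)  # Down
        elif action == 2:
            col = min(col + 1, self.ncol - 1)  # Right
        elif action == 3:
            row = max(row - 1, 0)              # Up
        self.state = row * self.ncol + col
        tile = self.desc[row][col]
        # Same shape as Gymnasium: (obs, reward, terminated, truncated, info)
        return self.state, float(tile == "G"), tile in "HG", False, {}


def train(env, n_states, episodes=2000, alpha=0.5, gamma=0.5, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    qtable = {s: {name: 0.0 for name in ACTIONS} for s in range(n_states)}
    for _ in range(episodes):
        state, _ = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit the table
            if rng.random() < epsilon:
                name = rng.choice(list(ACTIONS))
            else:
                name = max(qtable[state], key=qtable[state].get)
            next_state, env_reward, terminated, truncated, _ = env.step(ACTIONS[name])
            # Translate FrozenLake's 0/1 reward into the custom reward system
            if terminated:
                reward = REWARDS["Treasure"] if env_reward > 0 else REWARDS["Hole"]
            else:
                reward = REWARDS["Step"]
            future = 0.0 if terminated else max(qtable[next_state].values())
            qtable[state][name] = (1 - alpha) * qtable[state][name] + alpha * (reward + gamma * future)
            state = next_state
            if terminated or truncated:
                break
    return qtable


lake = ["SFFF", "FHFH", "FFFH", "HFFG"]  # standard 4 x 4 layout
qtable = train(TinyLake(lake), n_states=16)
```

With `gamma = 0.5` and a step penalty of -1, the values of tiles far from the gift approach -1 / (1 - 0.5) = -2, so on a 10 x 10 map a larger discount factor may be needed for the elf to tell directions apart.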

### 3. Testing Your Model
In the cell below, write the code needed to test your Q-Learning model for **1000 episodes**. Testing for exactly 1000 episodes lets everyone compare results on equal footing.

*Note: level 5 testing evaluates the model with both a success rate and an average-steps-taken metric. Level 4 uses one or the other.*
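A 1000-episode test can report both metrics by following the greedy policy (no exploration) and counting successful episodes and their step counts. This sketch assumes the nested-dictionary Q-table layout used in the training cell and any environment with Gymnasium-style `reset()`/`step()`; `evaluate` and the one-row `Corridor` stand-in are hypothetical, and averaging steps over successful episodes only is one reasonable choice among several.

```python
ACTIONS = {"Left": 0, "Down": 1, "Right": 2, "Up": 3}


def evaluate(env, qtable, episodes=1000, max_steps=100):
    """Run the greedy policy; return (success_rate, average steps in successful episodes)."""
    successes, success_steps = 0, 0
    for _ in range(episodes):
        state, _ = env.reset()
        for step in range(1, max_steps + 1):
            best = max(qtable[state], key=qtable[state].get)  # greedy action name
            state, reward, terminated, truncated, _ = env.step(ACTIONS[best])
            if terminated or truncated:
                if reward > 0:  # FrozenLake pays 1 only for reaching the gift
                    successes += 1
                    success_steps += step
                break
    avg_steps = success_steps / successes if successes else float("nan")
    return successes / episodes, avg_steps


class Corridor:
    """Hypothetical 1 x 3 stand-in: moving Right twice from state 0 reaches the gift at state 2."""

    def reset(self):
        self.state = 0
        return self.state, {}

    def step(self, action):
        if action == 2:
            self.state = min(self.state + 1, 2)
        done = self.state == 2
        return self.state, float(done), done, False, {}


table = {s: {"Left": 0.0, "Down": 0.0, "Right": 1.0, "Up": 0.0} for s in range(3)}
print(evaluate(Corridor(), table))  # (1.0, 2.0)
```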

In [None]:
# Test model here.

### 4. Final Answer
In the first cell below, describe the path your elf takes to reach the gift. *Note: a level 5 answer includes a GIF of the path your elf takes to reach the gift.*
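To turn a path into a written description, the visited states can be converted back into moves, since FrozenLake numbers tiles row by row (`state = row * ncol + col`). `describe_path` is a hypothetical helper; it assumes a list of consecutive states, such as those visited during a greedy test episode.

```python
def describe_path(states, ncol):
    """Convert consecutive FrozenLake states into move names with (row, col) positions."""
    moves = []
    for current, nxt in zip(states, states[1:]):
        r0, c0 = divmod(current, ncol)
        r1, c1 = divmod(nxt, ncol)
        if r1 > r0:
            name = "Down"
        elif r1 < r0:
            name = "Up"
        elif c1 > c0:
            name = "Right"
        elif c1 < c0:
            name = "Left"
        else:
            name = "Stay"  # bumped into an edge of the map
        moves.append(f"{name} to ({r1}, {c1})")
    return moves


print(describe_path([0, 1, 11, 21], ncol=10))
# ['Right to (0, 1)', 'Down to (1, 1)', 'Down to (2, 1)']
```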

In the second cell below, describe how well your Q-Learning model performed. Explicitly name the **learning rate**, the **discount factor**, and the **reward system** used to train your final model. *Note: a level 5 answer supports its description with two types of quantitative evidence.*

![Example of an elf crossing a Frozen Lake maze](https://gymnasium.farama.org/_images/frozen_lake.gif)

#### Describe the path your elf takes here.

#### Describe how well your Q-Learning model performed here.