# 👩‍💻 Simulate Your First RL Environment with an Agent in GridWorld

## 📋 Overview
This lab walks you through the core mechanics of reinforcement learning (RL) by simulating a simple environment: GridWorld. You will implement the basic RL loop, in which an agent repeatedly observes its state, takes an action, and receives a reward. By completing this activity, you will understand how agents, actions, and rewards interact, and why these components largely determine how well an RL system performs.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:

- ✅ Set up and simulate a basic RL environment using GridWorld
- ✅ Define agent actions and rewards in the environment
- ✅ Implement and evaluate a simple random policy
- ✅ Analyze agent performance and explore ways to improve policies

## Task 1: Set Up the GridWorld Environment

**Context:** Setting up the GridWorld environment with start and goal states is the first step.

**Steps:**

1. Create a GridWorld environment with defined start and goal states.
2. Model the space the agent navigates as a simple grid: the agent starts in one corner and must reach a designated goal cell.

```python
# Task 1: Set Up the GridWorld Environment
import numpy as np

# Your code here
```

💡 **Tip:** Use a grid size of 5x5 for simplicity.

⚙️ **Test Your Work:**
- Print the initial state of the environment.

**Expected output:** The starting position of the agent on the grid.
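Here is a minimal sketch of Task 1. It assumes a 5x5 grid with the agent starting in the top-left corner (0, 0) and the goal in the bottom-right corner (4, 4), which matches the exemplar's defaults. Positions are (x, y), with x growing to the right and y growing downward. The `render` method and its `A`/`G` markers are illustrative additions that are not part of the original lab.

```python
import numpy as np

class GridWorld:
    """Minimal GridWorld. Positions are (x, y): x grows to the right, y grows downward."""

    def __init__(self, size=5, start=(0, 0), goal=(4, 4)):
        self.size = size
        self.start = start
        self.goal = goal
        self.state = start

    def reset(self):
        # Put the agent back on the start cell and return that state
        self.state = self.start
        return self.state

    def render(self):
        # Hypothetical helper: '.' empty cell, 'G' goal, 'A' agent (row index = y, column index = x)
        grid = np.full((self.size, self.size), ".", dtype="<U1")
        gx, gy = self.goal
        grid[gy, gx] = "G"
        ax, ay = self.state
        grid[ay, ax] = "A"
        return "\n".join(" ".join(row) for row in grid)

env = GridWorld()
print("Initial state:", env.reset())
print(env.render())
```

With these defaults, the first line printed is `Initial state: (0, 0)`. The grid that follows shows `A` in the top-left corner and `G` in the bottom-right corner.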

## Task 2: Define Agent Actions and Rewards

**Context:** Implementing actions and rewards allows the agent to interact meaningfully with the environment.

**Steps:**

1. Implement the possible actions for the agent (e.g., moving right or down).
2. Incorporate a reward system in which the agent receives a positive reward upon reaching the goal and a small penalty for every move that does not reach the goal.

```python
# Task 2: Define Agent Actions and Rewards
```

💡 **Tip:** Define small penalties to encourage strategic pathfinding.

⚙️ **Test Your Work:**
- Print the state and reward after a few actions.

**Expected output:** The new state and reward based on the agent's actions.
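The following is one possible way to implement Task 2, offered as a sketch under several assumptions:

- The four moves up, down, left, and right are stored in a hypothetical `ACTIONS` table of (dx, dy) offsets.
- A move into a wall leaves the agent where it is.
- Rewards are +1 for reaching the goal and -0.04 for every other step, mirroring the exemplar.
- Unlike the exemplar, `step` also returns a `done` flag. This is an illustrative design choice.

```python
# Hypothetical action table: each action maps to an (dx, dy) offset.
# x grows to the right, y grows downward, matching the exemplar solution.
ACTIONS = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}

GOAL_REWARD = 1.0
STEP_PENALTY = -0.04  # same small per-move penalty the exemplar uses

class GridWorld:
    def __init__(self, size=5, start=(0, 0), goal=(4, 4)):
        self.size = size
        self.start = start
        self.goal = goal
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        """Apply one action; return (new_state, reward, done)."""
        dx, dy = ACTIONS[action]
        x, y = self.state
        # Clip to the grid so a move into a wall leaves the agent in place
        x = min(max(x + dx, 0), self.size - 1)
        y = min(max(y + dy, 0), self.size - 1)
        self.state = (x, y)
        done = self.state == self.goal
        reward = GOAL_REWARD if done else STEP_PENALTY
        return self.state, reward, done

env = GridWorld()
env.reset()
for action in ["right", "right", "down", "up", "left"]:
    state, reward, done = env.step(action)
    print(f"{action:>5} -> state={state}, reward={reward:+.2f}, done={done}")
```

Each of these five moves stays on the grid and misses the goal, so every line reports a reward of -0.04. The agent finishes at (1, 0).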

## Task 3: Create a Random Policy for Exploration

**Context:** Developing a random policy enables the agent to explore the environment.

**Steps:**

1. Develop a basic policy that allows the agent to make decisions at random from the available actions.

```python
# Task 3: Create a Random Policy for Exploration
```

💡 **Tip:** Use `np.random.choice` to select random actions.

⚙️ **Test Your Work:**
- Print the actions selected by the random policy.

**Expected output:** A series of randomly selected actions.
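Below is a minimal sketch of a random policy. It assumes the four-move action set `["up", "down", "left", "right"]`. In place of the legacy `np.random.choice` mentioned in the tip, it uses NumPy's `Generator.choice` (created with `np.random.default_rng`), which behaves the same way for this purpose and can be seeded for reproducible runs. The `make_random_policy` factory is a hypothetical name chosen for illustration.

```python
import numpy as np

ACTIONS = ["up", "down", "left", "right"]  # assumed four-move action set

def make_random_policy(actions, seed=None):
    """Return a policy that ignores the state and picks an action uniformly at random."""
    rng = np.random.default_rng(seed)

    def policy(state):
        # The state is unused: a random policy does not look at where the agent is
        return str(rng.choice(actions))

    return policy

policy = make_random_policy(ACTIONS, seed=0)
sampled = [policy((0, 0)) for _ in range(10)]
print(sampled)
```

Passing the state to `policy` even though it is ignored keeps the same call signature as a smarter policy. You can later swap in an improved policy without changing the RL loop.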

## Task 4: Implement the Reinforcement Learning Loop

**Context:** The RL loop simulates the agent's interaction with the environment over multiple episodes.

**Steps:**

1. Set up the RL loop where the agent continually interacts with the environment, executing actions, updating its state, and receiving rewards.
2. Simulate multiple episodes in which the agent tries to reach the goal. Note that a purely random policy does not learn from its rewards, so its performance will not improve from one episode to the next.

```python
# Task 4: Implement the Reinforcement Learning Loop
```

💡 **Tip:** Use a loop to simulate multiple episodes and track the total rewards and steps taken.

⚙️ **Test Your Work:**
- Print the total rewards and steps taken for each episode.

**Expected output:** The performance metrics for each episode.
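Here is a sketch of the full loop for Task 4. It assumes the same four-move, wall-clipped GridWorld and -0.04 / +1 rewards as the sketches above, plus a seeded random policy. The `max_steps` cap and the `run_episodes` helper are illustrative assumptions. The cap guarantees that every episode terminates even if the policy never finds the goal.

```python
import numpy as np

ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

class GridWorld:
    def __init__(self, size=5, start=(0, 0), goal=(4, 4)):
        self.size, self.start, self.goal = size, start, goal
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        # Move one cell, clipping at the walls; +1 on reaching the goal, -0.04 otherwise
        dx, dy = ACTIONS[action]
        x, y = self.state
        x = min(max(x + dx, 0), self.size - 1)
        y = min(max(y + dy, 0), self.size - 1)
        self.state = (x, y)
        done = self.state == self.goal
        return self.state, (1.0 if done else -0.04), done

def run_episodes(env, policy, num_episodes=10, max_steps=200):
    """Run the agent-environment loop; return a list of (total_reward, steps, reached_goal)."""
    results = []
    for episode in range(num_episodes):
        state = env.reset()
        total_reward, steps, done = 0.0, 0, False
        # Cap the episode length so a policy that never finds the goal cannot loop forever
        while not done and steps < max_steps:
            action = policy(state)
            state, reward, done = env.step(action)
            total_reward += reward
            steps += 1
        results.append((total_reward, steps, done))
        print(f"Episode {episode + 1}: total reward = {total_reward:.2f}, "
              f"steps = {steps}, reached goal = {done}")
    return results

rng = np.random.default_rng(42)
action_names = list(ACTIONS)

def random_policy(state):
    # Uniformly random action; the state is ignored
    return str(rng.choice(action_names))

results = run_episodes(GridWorld(), random_policy, num_episodes=10)
```

Returning the per-episode results, rather than only printing them, lets you compute averages afterward or compare policies.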

### ✅ Success Checklist

- Successfully set up the GridWorld environment with defined start and goal states
- Implemented agent actions and rewards
- Developed and evaluated a random policy
- Simulated the RL loop and analyzed agent performance
- Explored ways to improve the agent's policy

### 🔍 Common Issues & Solutions

**Problem:** Agent actions not updating the state correctly.   
**Solution:** Check that each action changes the intended coordinate in the intended direction, and that a move off the edge of the grid leaves the agent in place.

**Problem:** Rewards not being calculated properly.   
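**Solution:** Check that the goal reward is returned only on the step that reaches the goal and that every other step returns the small penalty. Printing the reward after each step makes this easy to confirm.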

**Problem:** Agent not reaching the goal.   
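**Solution:** A random policy can wander for a long time before reaching the goal. Confirm that the goal lies inside the grid and is reachable with the available actions, cap the number of steps per episode so every run terminates, and run more episodes to see the spread of outcomes.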
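One way to catch the first two problems early is to test `step` on a few hand-picked cases before running full episodes. The sketch below assumes the exemplar's two-action `GridWorld`, with `reset` returning to the configured start, and its -0.04 / +1 reward scheme. The `check_step` helper and the specific cases are hypothetical and chosen only for illustration.

```python
class GridWorld:
    """Two-action GridWorld matching the exemplar: 'right' increases x, 'down' increases y."""

    def __init__(self, size=5, start=(0, 0), goal=(4, 4)):
        self.size, self.start, self.goal = size, start, goal
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        x, y = self.state
        if action == "right" and x < self.size - 1:
            x += 1
        elif action == "down" and y < self.size - 1:
            y += 1
        self.state = (x, y)
        reward = 1 if self.state == self.goal else -0.04
        return self.state, reward

def check_step(env, start, action, expected_state, expected_reward):
    """Place the agent at `start`, take one action, and compare against expectations."""
    env.state = start
    state, reward = env.step(action)
    assert state == expected_state, f"{action} from {start}: got {state}, expected {expected_state}"
    assert reward == expected_reward, f"{action} from {start}: got reward {reward}, expected {expected_reward}"

env = GridWorld()
check_step(env, (0, 0), "right", (1, 0), -0.04)  # ordinary move
check_step(env, (4, 0), "right", (4, 0), -0.04)  # blocked by the right wall: agent stays put
check_step(env, (4, 3), "down", (4, 4), 1)       # final move onto the goal
assert env.reset() == (0, 0), "reset should return the start position"
print("All step checks passed")
```

Each failed check names the action, the starting cell, and the mismatched value. This usually points straight at the faulty branch in `step`.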

### 🔑 Key Points

- Setting up and simulating an RL environment helps you understand the core mechanics of reinforcement learning.
- Defining clear actions and rewards is crucial for meaningful agent interactions.
- Analyzing performance and exploring improvements are key steps in optimizing RL models.
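To make "analyzing performance" concrete, here is a sketch that runs many episodes per policy and reports mean steps, mean total reward, and success rate. It compares the random policy against a hand-coded heuristic, `toward_goal_policy`, which picks randomly among moves that bring the agent closer to the goal. That heuristic is a hypothetical baseline, not a learned policy. The sketch assumes the four-move, wall-clipped grid, a start at (0, 0), a goal at (4, 4) (down and to the right of the start), and the -0.04 / +1 rewards. The `step` and `evaluate` helpers are illustrative names.

```python
import numpy as np

ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
START, GOAL, SIZE = (0, 0), (4, 4), 5

def step(state, action):
    # Move one cell, clipping at the walls; +1 on reaching the goal, -0.04 otherwise
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), SIZE - 1)
    y = min(max(state[1] + dy, 0), SIZE - 1)
    new_state = (x, y)
    done = new_state == GOAL
    return new_state, (1.0 if done else -0.04), done

def evaluate(policy, num_episodes=100, max_steps=200):
    """Return (mean steps, mean total reward, success rate) over many episodes."""
    steps_log, reward_log, successes = [], [], 0
    for _ in range(num_episodes):
        state, total, done, steps = START, 0.0, False, 0
        while not done and steps < max_steps:
            state, reward, done = step(state, policy(state))
            total += reward
            steps += 1
        steps_log.append(steps)
        reward_log.append(total)
        successes += done
    return np.mean(steps_log), np.mean(reward_log), successes / num_episodes

rng = np.random.default_rng(0)

def random_policy(state):
    return str(rng.choice(list(ACTIONS)))

def toward_goal_policy(state):
    # Hand-coded heuristic (assumes GOAL is down-right of the agent): pick randomly
    # among the moves that reduce the distance to GOAL
    x, y = state
    moves = []
    if x < GOAL[0]:
        moves.append("right")
    if y < GOAL[1]:
        moves.append("down")
    return str(rng.choice(moves))

for name, policy in [("random", random_policy), ("toward goal", toward_goal_policy)]:
    mean_steps, mean_reward, success = evaluate(policy)
    print(f"{name:>11}: mean steps = {mean_steps:.1f}, "
          f"mean reward = {mean_reward:.2f}, success rate = {success:.0%}")
```

By construction, the heuristic always takes exactly 8 steps, the Manhattan distance from (0, 0) to (4, 4). Its total reward is therefore 1 - 7 × 0.04 = 0.72. The random policy's figures depend on the seed and the number of episodes.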

## 💻 Exemplar Solution

<details>    
<summary><strong>Click HERE to see an exemplar solution</strong></summary>    

```python
import numpy as np

class GridWorld:
    def __init__(self, size=5, start=(0, 0), goal=(4, 4)):
        self.size = size
        self.start = start  # Remember the start so reset() can return to it
        self.goal = goal
        self.state = start

    def reset(self):
        # Return the agent to the configured start instead of a hard-coded (0, 0)
        self.state = self.start
        return self.state

    def is_goal_reached(self):
        return self.state == self.goal

    def step(self, action):
        # "right" increases x, "down" increases y; a move into a wall leaves the agent in place
        x, y = self.state
        if action == "right" and x < self.size - 1:
            x += 1
        elif action == "down" and y < self.size - 1:
            y += 1
        self.state = (x, y)
        reward = 1 if self.is_goal_reached() else -0.04  # Small penalty for each move
        return self.state, reward

# Define agent's possible actions
actions = ["right", "down"]

def random_policy():
    return np.random.choice(actions)

# Simulation of the Reinforcement Learning Loop
env = GridWorld()
num_episodes = 10

for episode in range(num_episodes):
    state = env.reset()
    total_reward = 0
    step_count = 0
    while not env.is_goal_reached():
        action = random_policy()  # Select action based on random policy
        state, reward = env.step(action)
        total_reward += reward
        step_count += 1
    print(f"Episode {episode + 1}: Total Reward: {total_reward:.2f}, Steps Taken: {step_count}")
```
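As a quick check on the exemplar's printed totals: every step that does not reach the goal earns -0.04, and the final step onto the goal earns +1. An episode of $n$ steps therefore has a total reward of

$$
G = 1 + (n - 1)(-0.04) = 1.04 - 0.04\,n
$$

With only `right` and `down` available, reaching (4, 4) from (0, 0) takes at least 8 steps, so the best possible total is $1.04 - 0.32 = 0.72$. Each move into a wall adds one step and lowers the total by 0.04; for example, $n = 10$ gives $0.64$.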