# Cleaning Agent Navigation in a Grid World

### Introduction

In this notebook, we'll simulate a simple **cleaning agent** that navigates a 2D grid environment to reach a goal while avoiding obstacles (dirt spots). The agent can move up, down, left, or right. The environment tracks the agent's position and returns a reward of 100 when the goal is reached and -1 for every other step.

This example covers:

- Environment modeling
- State validation
- Agent actions and policy
- Visualization using emojis in the console

The notebook is based on: https://towardsdatascience.com/reinforcement-learning-101-building-a-rl-agent-0431984ba178/


In [None]:
import numpy as np
import logging
logging.basicConfig(level=logging.INFO)

### 1. Environment Setup: The GridWorld Class

The `GridWorld` class models the environment:

- Defines the grid size (`width` × `height`).
- Tracks the agent's start and goal positions, given as `(row, column)` pairs.
- Marks obstacles (dirt spots) the agent must avoid.
- Supports resetting, checking whether a state is valid, and taking a step for a given action.


In [None]:
class GridWorld:
    """
    GridWorld environment for navigation.

    Args:
    - width: Width of the grid
    - height: Height of the grid
    - start: Start position of the agent (tuple)
    - goal: Goal position of the agent (tuple)
    - obstacles: List of obstacle positions (list of tuples)

    Methods:
    - reset: Reset environment to start state
    - is_valid_state: Check if a state is inside bounds and not blocked by an obstacle
    - step: Move the agent in the environment based on an action
    """
    def __init__(self, width: int = 5, height: int = 5, start: tuple = (0, 0), goal: tuple = (4, 4), obstacles: list = None):
        self.width = width
        self.height = height
        self.start = np.array(start)
        self.goal = np.array(goal)
        self.obstacles = [np.array(obstacle) for obstacle in obstacles] if obstacles else []
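        # Copy so that in-place changes to the state can never alter the stored start position
        self.state = self.start.copy()
        # Offsets are (row, column): 'up' decreases the row index
        self.actions = {
            'up': np.array([-1, 0]),
            'down': np.array([1, 0]),
            'left': np.array([0, -1]),
            'right': np.array([0, 1])
        }

    def reset(self):
        """Reset the environment to the start state."""
        self.state = self.start.copy()
        return self.state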

    def is_valid_state(self, state):
        """
        Check if a state is valid (inside grid and not an obstacle)
        """
        inside_grid = (0 <= state[0] < self.height) and (0 <= state[1] < self.width)
        not_obstacle = all((state != obstacle).any() for obstacle in self.obstacles)
        return inside_grid and not_obstacle

    def step(self, action: str):
        """
        Take a step in the environment based on action.
        Returns:
        - next_state: the agent's new position
        - reward: 100 if goal reached, else -1
        - done: True if goal reached, else False
        """
        next_state = self.state + self.actions[action]
        if self.is_valid_state(next_state):
            self.state = next_state
        # Evaluate the goal test once and derive the reward from it
        done = bool((self.state == self.goal).all())
        reward = 100 if done else -1
        return self.state, reward, done
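
The obstacle test in `is_valid_state` relies on NumPy's element-wise comparison, which is easy to misread. Here is a small sketch of that idiom in isolation; the names `state_a` and `state_b` are illustrative and do not appear in the notebook.

```python
import numpy as np

obstacle = np.array([1, 1])
state_a = np.array([1, 2])  # same row as the obstacle, different column
state_b = np.array([1, 1])  # exactly on the obstacle

# `!=` compares coordinate by coordinate and yields a boolean array
print(state_a != obstacle)          # [False  True]

# .any() is True when at least one coordinate differs, i.e. the cell is free
print((state_a != obstacle).any())  # True
print((state_b != obstacle).any())  # False
```

Wrapping this test in `all(...)` over every obstacle, as the class does, means a state is accepted only if it differs from each obstacle in at least one coordinate.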


### 2. Defining the Agent's Navigation Policy

The agent needs a way to decide which action to take at each step. Here, we use a simple heuristic policy:

- From the current position, the agent checks which of the four moves lead to a valid cell.
- It selects the valid move that minimizes the Manhattan distance to the goal; ties go to the first action in the order up, down, left, right.
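
As a worked example of this selection rule, the sketch below scores the candidate moves from the start cell (0, 0) toward the goal (4, 4). It assumes an empty 5×5 grid; the `moves` table mirrors the notebook's (row, column) offsets, while `candidates` and the inline bounds check are illustrative names rather than the notebook's own code.

```python
import numpy as np

# (row, column) offsets, matching the convention used by GridWorld.actions
moves = {
    'up': np.array([-1, 0]),
    'down': np.array([1, 0]),
    'left': np.array([0, -1]),
    'right': np.array([0, 1]),
}

state = np.array([0, 0])
goal = np.array([4, 4])
height, width = 5, 5  # assumed grid size, no obstacles

candidates = {}
for name, offset in moves.items():
    nxt = state + offset
    # Keep only moves that stay inside the grid
    if 0 <= nxt[0] < height and 0 <= nxt[1] < width:
        # Manhattan distance: |row difference| + |column difference|
        candidates[name] = int(np.abs(nxt - goal).sum())

print(candidates)                           # {'down': 7, 'right': 7}
print(min(candidates, key=candidates.get))  # down
```

Both valid moves score 7 here. Python's `min` returns the first minimal key in insertion order, which is why `down` wins.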


In [None]:
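def navigation_policy(state: np.ndarray, goal: np.ndarray, obstacles: list):
    """
    Greedy policy that moves the agent towards the goal.

    Note: this function reads the module-level `env` (created in section 4)
    for the action offsets and the validity check, so `env` must exist
    before the policy is called.

    Args:
    - state: current position of the agent (np.ndarray)
    - goal: goal position (np.ndarray)
    - obstacles: list of obstacles (unused; `env.is_valid_state` already rejects obstacle cells)

    Returns:
    - action (str): one of 'up', 'down', 'left', or 'right', or None if no move is valid
    """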
    actions = ['up', 'down', 'left', 'right']
    valid_actions = {}

    for action in actions:
        next_state = state + env.actions[action]
        if env.is_valid_state(next_state):
            # Calculate Manhattan distance to goal
            distance = np.sum(np.abs(next_state - goal))
            valid_actions[action] = distance

    # Choose the action that minimizes the distance to the goal
    if valid_actions:
        return min(valid_actions, key=valid_actions.get)
    else:
        return None  # No valid moves
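
One limitation worth keeping in mind: a greedy rule like this has no memory, so it can bounce between two cells when obstacles block the direct route. The sketch below applies the same selection rule to plain tuples. Both `greedy_action` and the 3×3 layout are hypothetical, chosen only to trigger that behaviour.

```python
MOVES = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

def greedy_action(state, goal, blocked, height, width):
    """Hypothetical stand-in for navigation_policy that takes the grid explicitly."""
    scores = {}
    for name, (dr, dc) in MOVES.items():
        nxt = (state[0] + dr, state[1] + dc)
        inside = 0 <= nxt[0] < height and 0 <= nxt[1] < width
        if inside and nxt not in blocked:
            # Manhattan distance from the candidate cell to the goal
            scores[name] = abs(nxt[0] - goal[0]) + abs(nxt[1] - goal[1])
    return min(scores, key=scores.get) if scores else None

# Assumed 3x3 layout: the goal is reachable, but only via the bottom row
blocked = {(0, 1), (1, 1)}
state, goal = (1, 0), (1, 2)

visited = [state]
for _ in range(6):  # cap the number of steps; the agent never reaches the goal
    action = greedy_action(state, goal, blocked, height=3, width=3)
    dr, dc = MOVES[action]
    state = (state[0] + dr, state[1] + dc)
    visited.append(state)

print(visited)
# [(1, 0), (0, 0), (1, 0), (0, 0), (1, 0), (0, 0), (1, 0)]
```

From (1, 0), moving up and moving down are equally close to the goal, and the tie goes to `up`. From (0, 0) the only valid move is back down, so the agent alternates between the two cells even though a path through the bottom row exists.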


### 3. Running the Simulation with Visualization

We simulate the agent's moves step-by-step, printing the grid to the console using emojis:

- `⬜` — Empty space
- `⛔` — Obstacle (dirt)
- `🟫` — Goal (cleaning target)
- `🤖` — Agent (cleaning robot)

The grid is printed at each step to show the agent’s progress.
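
The drawing order matters: obstacles go in first, then the goal, then the agent, so the agent is always visible. Here is a minimal sketch of that layering, using nested lists instead of a NumPy array; `render_grid` is a hypothetical helper and not part of the notebook.

```python
def render_grid(height, width, agent, goal, obstacles):
    """Return the grid as a multi-line string of emojis (hypothetical helper)."""
    rows = [['⬜'] * width for _ in range(height)]
    for r, c in obstacles:
        rows[r][c] = '⛔'
    rows[goal[0]][goal[1]] = '🟫'
    # The agent is drawn last so it stays visible even on the goal cell
    rows[agent[0]][agent[1]] = '🤖'
    return '\n'.join(' '.join(row) for row in rows)

print(render_grid(3, 3, agent=(0, 0), goal=(2, 2), obstacles=[(1, 1)]))
# 🤖 ⬜ ⬜
# ⬜ ⛔ ⬜
# ⬜ ⬜ 🟫
```

Returning a string instead of printing directly makes the rendering easy to check in a test or to write to a log.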


In [None]:
def run_simulation_with_policy(env: GridWorld, policy, max_steps: int = 100):
    """
    Run the simulation until the agent reaches the goal, gets stuck,
    or has taken max_steps moves (a guard against a policy that oscillates forever).
    Prints the grid with emojis at each step.
    """
    state = env.reset()
    done = False
    steps = 0

    logging.info(f"Start State: {state}, Goal: {env.goal}, Obstacles: {env.obstacles}")

    while not done and steps < max_steps:
        # Create grid filled with empty space emojis
        grid_emoji = np.full((env.height, env.width), '⬜')

        # Place obstacles
        for obstacle in env.obstacles:
            grid_emoji[tuple(obstacle)] = '⛔'

        # Place goal
        grid_emoji[tuple(env.goal)] = '🟫'

        # Place agent
        grid_emoji[tuple(state)] = '🤖'

        # Print grid row by row
        for row in grid_emoji:
            print(' '.join(row))
        print()  # Empty line between steps

        action = policy(state, env.goal, env.obstacles)
        if action is None:
            logging.info("No valid actions available, agent is stuck.")
            break

        next_state, reward, done = env.step(action)
        steps += 1
        logging.info(f"State: {state} -> Action: {action} -> Next State: {next_state}, Reward: {reward}")
        state = next_state

        if done:
            logging.info("Goal reached!")

    if not done and steps >= max_steps:
        logging.info(f"Stopped after {max_steps} steps without reaching the goal.")


### 4. Setting Up and Running the Cleaning Agent

Now let's define some dirt spots as obstacles and run our cleaning agent simulation:


In [None]:
# Define dirt spots (obstacles) on the grid
obstacles = [(1, 1), (1, 2), (2, 1), (3, 3)]

# Initialize the environment with obstacles
env = GridWorld(obstacles=obstacles)

# Run the simulation using the navigation policy
run_simulation_with_policy(env, navigation_policy)
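
To sanity-check the logged output, the sketch below replays a hand-traced action sequence for this layout on plain tuples and totals the rewards. The sequence is an assumption derived by stepping through the greedy rule manually, so treat it as something to compare against the notebook's own log, not as recorded output.

```python
MOVES = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}
obstacles = {(1, 1), (1, 2), (2, 1), (3, 3)}
goal = (4, 4)

# Hand-traced sequence for the default 5x5 grid (assumption, see above)
actions = ['down'] * 4 + ['right'] * 4

state, total_reward = (0, 0), 0
for action in actions:
    dr, dc = MOVES[action]
    state = (state[0] + dr, state[1] + dc)
    assert state not in obstacles  # the traced path never touches a dirt spot
    total_reward += 100 if state == goal else -1

print(state, total_reward)  # (4, 4) 93
```

Seven intermediate steps at -1 each plus the final reward of 100 give a total of 93 for this eight-move path.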


### Conclusion

- We modeled a grid environment with obstacles and a goal.
- The cleaning agent moves step-by-step using a simple heuristic policy.
- We visualized the agent’s progress in the console with emojis.
- This simple example can be extended to include more advanced policies, dynamic obstacles, or other features.

Feel free to experiment by changing the grid size, obstacles, or policy!
