# Warehouse Robot Navigation Using Q-Learning

## Overview

This notebook implements a reinforcement learning solution to the warehouse robot navigation problem using Q-learning. The robot must navigate from a loading bay to a target shelf on a slippery floor while avoiding hazards (holes).

## Problem Statement

We use the FrozenLake-v1 environment from Gymnasium, which models:

- **Environment**: A slippery warehouse floor represented as a grid
- **Agent**: A warehouse robot that can move in 4 directions (Left, Down, Right, Up)
- **Goal**: Navigate from start (S) to goal (G) while avoiding holes (H)
- **Challenge**: Stochastic transitions: on the slippery surface the robot may slide perpendicular to the intended direction

## Tasks

1. Understanding the Environment
2. Setting Up the Q-Learning Agent
3. Training the Agent
4. Evaluation & Comparison with Baselines
5. Hyperparameter Optimization
6. Testing on Larger Maps (8×8)


## 1. Setup and Imports


In [1]:
# Import required libraries
import gymnasium as gym
import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict
import time

# Seed NumPy's global RNG for reproducibility
# (note: this does not seed the Gymnasium environment's own RNG)
np.random.seed(42)

print("All libraries imported successfully!")
print(f"Gymnasium version: {gym.__version__}")
print(f"NumPy version: {np.__version__}")


All libraries imported successfully!
Gymnasium version: 1.2.1
NumPy version: 1.26.4


## 2. Understanding the Environment

In this section, we explore the FrozenLake-v1 environment to understand:

- State space (observation space)
- Action space
- Grid layout (Start, Goal, Holes, Frozen tiles)
- Reward structure
- Effect of `is_slippery=True` (stochastic transitions)


### 2.1 Create FrozenLake Environment


In [2]:
# Create FrozenLake environment with slippery surface
# Starting with 4x4 map (default)
env = gym.make('FrozenLake-v1', map_name="4x4", is_slippery=True, render_mode='ansi')

print("=" * 60)
print("FROZENLAKE ENVIRONMENT CREATED")
print("=" * 60)
print(f"Environment: {env.spec.id}")
print(f"Map: 4x4")
print(f"Slippery: True (stochastic transitions)")
# print("=" * 60)


FROZENLAKE ENVIRONMENT CREATED
Environment: FrozenLake-v1
Map: 4x4
Slippery: True (stochastic transitions)


### 2.2 Inspect State and Action Spaces


In [3]:
# Print observation and action spaces
print("\n STATE SPACE (Observation Space):")
print(f"   Type: {env.observation_space}")
print(f"   Number of states: {env.observation_space.n}")
print(f"   Description: Each tile on the grid is a discrete state (0 to {env.observation_space.n - 1})")

print("\n ACTION SPACE:")
print(f"   Type: {env.action_space}")
print(f"   Number of actions: {env.action_space.n}")
print(f"   Actions mapping:")
print(f"      0 = LEFT")
print(f"      1 = DOWN")
print(f"      2 = RIGHT")
print(f"      3 = UP")

print("\n" + "=" * 60)



 STATE SPACE (Observation Space):
   Type: Discrete(16)
   Number of states: 16
   Description: Each tile on the grid is a discrete state (0 to 15)

 ACTION SPACE:
   Type: Discrete(4)
   Number of actions: 4
   Actions mapping:
      0 = LEFT
      1 = DOWN
      2 = RIGHT
      3 = UP
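
The 16 state indices are row-major positions on the grid: state 0 is the top-left tile and state 15 the bottom-right. The sketch below shows that conversion under the assumption of a 4-column map. `state_to_pos` and `pos_to_state` are hypothetical helper names introduced here for illustration; they are not part of Gymnasium.

```python
N_COLS = 4  # assumption: the 4x4 map used in this notebook


def state_to_pos(state: int, n_cols: int = N_COLS) -> tuple[int, int]:
    """Convert a discrete state index to (row, col) in row-major order."""
    return divmod(state, n_cols)


def pos_to_state(row: int, col: int, n_cols: int = N_COLS) -> int:
    """Convert a (row, col) grid position back to a state index."""
    return row * n_cols + col


print(state_to_pos(0))     # (0, 0) -> top-left tile
print(state_to_pos(15))    # (3, 3) -> bottom-right tile
print(pos_to_state(1, 1))  # 5
```

Thinking in (row, col) terms makes it easier to read a learned Q-table or policy against the rendered grid.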



### 2.3 Visualize the Grid


In [7]:
# Reset environment and visualize the grid
state, info = env.reset()
grid_render = env.render()

print("\n GRID LAYOUT (4x4 Map):")
print("=" * 60)
print(grid_render)
print("=" * 60)
print("\n Legend:")
print("   S = Start (loading bay) - Initial position")
print("   F = Frozen (safe tile) - Can walk on")
print("   H = Hole (hazard/spill) - Episode ends, reward = 0")
print("   G = Goal (target shelf) - Episode ends, reward = +1")
print("\n   The robot starts at 'S' and must reach 'G' while avoiding 'H'")
print("=" * 60)



 GRID LAYOUT (4x4 Map):

SFFF
FHFH
FFFH
HFFG


 Legend:
   S = Start (loading bay) - Initial position
   F = Frozen (safe tile) - Can walk on
   H = Hole (hazard/spill) - Episode ends, reward = 0
   G = Goal (target shelf) - Episode ends, reward = +1

   The robot starts at 'S' and must reach 'G' while avoiding 'H'
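
To connect the rendered layout with the state indices, the sketch below scans the four map rows and lists which states hold each tile type. The rows are copied by hand from the rendering above rather than read from the environment, and `tiles_of` is a hypothetical helper name.

```python
# Map rows copied from the rendered 4x4 layout above
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]


def tiles_of(kind: str, grid: list[str] = GRID) -> list[int]:
    """Return the row-major state indices of every tile equal to `kind`."""
    n_cols = len(grid[0])
    return [
        r * n_cols + c
        for r, row in enumerate(grid)
        for c, tile in enumerate(row)
        if tile == kind
    ]


print("Start:", tiles_of("S"))  # Start: [0]
print("Holes:", tiles_of("H"))  # Holes: [5, 7, 11, 12]
print("Goal:", tiles_of("G"))   # Goal: [15]
```

On this map, 4 of the 16 tiles are holes, so a robot that slips at random has many ways to end an episode before reaching state 15.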


### 2.4 Reward Structure


In [5]:
print("\n REWARD STRUCTURE:")
print("=" * 60)
print("   Reaching Goal (G):      +1.0  (episode terminates)")
print("   Falling into Hole (H):   0.0  (episode terminates)")
print("   Safe tile (F or S):      0.0  (continue episode)")
print("=" * 60)
print("\n  SPARSE REWARD CHALLENGE:")
print("   - Agent only gets reward when reaching the goal")
print("   - No intermediate feedback during navigation")
print("   - Must explore extensively to discover successful paths")
print("=" * 60)



 REWARD STRUCTURE:
   Reaching Goal (G):      +1.0  (episode terminates)
   Falling into Hole (H):   0.0  (episode terminates)
   Safe tile (F or S):      0.0  (continue episode)

  SPARSE REWARD CHALLENGE:
   - Agent only gets reward when reaching the goal
   - No intermediate feedback during navigation
   - Must explore extensively to discover successful paths
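
To make the sparsity concrete, the sketch below computes the discounted return of a reward sequence. The discount factor `GAMMA = 0.99` is an assumption chosen for illustration and is not set anywhere above. The six-step sequence corresponds to the shortest hole-free route from S to G on the 4x4 map; with slipping, real episodes are usually longer.

```python
GAMMA = 0.99  # assumed discount factor, for illustration only


def discounted_return(rewards: list[float], gamma: float = GAMMA) -> float:
    """Sum of gamma**t * r_t over one episode, accumulated from the last step back."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


success = [0.0] * 5 + [1.0]  # five safe steps, then the goal
failure = [0.0] * 3          # three steps, then a hole (reward stays 0)

print(discounted_return(success))  # equals GAMMA ** 5
print(discounted_return(failure))  # 0.0
```

Every episode that ends in a hole looks identical to the agent (return 0), which is why the only learning signal comes from the rare episodes that reach the goal.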


### 2.5 Demonstration: Effect of Slippery Floor (Stochastic Transitions)


In [6]:
print("\n SLIPPERY FLOOR EFFECT (is_slippery=True):")
print("=" * 60)
print("When the robot attempts an action, the floor is slippery!")
print("The actual movement has stochastic (random) transitions:\n")
print("   Intended direction:  33.3% chance")
print("   Perpendicular left:  33.3% chance")
print("   Perpendicular right: 33.3% chance")
print("\nExample: If robot tries to move RIGHT:")
print("   → 33% moves RIGHT (intended)")
print("   → 33% moves UP (perpendicular)")
print("   → 33% moves DOWN (perpendicular)")
print("\n REAL-WORLD ANALOGY:")
print("   - Slippery warehouse floor with water/oil spills")
print("   - Wheels may slip in unexpected directions")
print("   - Must learn robust policy that handles uncertainty")
print("=" * 60)

# Demonstrate with a simple test
print("\n DEMONSTRATION: Trying to move RIGHT 10 times from start")
print("=" * 60)
action_right = 2  # RIGHT
outcomes = []

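for _ in range(10):
    # Always restart from the start tile so every trial is comparable
    state, info = env.reset()
    # Take a single step; on a slippery floor the resulting state varies
    next_state, reward, terminated, truncated, info = env.step(action_right)
    outcomes.append(next_state)
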
print(f"Starting state: {state} (always starts at same position)")
print(f"Action taken: RIGHT (action={action_right})")
print(f"\nResulting states after action: {outcomes}")
print(f"Unique states reached: {set(outcomes)}")
print("\n Notice: Even with the same action, we reach different states!")
print("=" * 60)



 SLIPPERY FLOOR EFFECT (is_slippery=True):
When the robot attempts an action, the floor is slippery!
The actual movement has stochastic (random) transitions:

   Intended direction:  33.3% chance
   Perpendicular left:  33.3% chance
   Perpendicular right: 33.3% chance

Example: If robot tries to move RIGHT:
   → 33% moves RIGHT (intended)
   → 33% moves UP (perpendicular)
   → 33% moves DOWN (perpendicular)

 REAL-WORLD ANALOGY:
   - Slippery warehouse floor with water/oil spills
   - Wheels may slip in unexpected directions
   - Must learn robust policy that handles uncertainty

 DEMONSTRATION: Trying to move RIGHT 10 times from start
Starting state: 0 (always starts at same position)
Action taken: RIGHT (action=2)

Resulting states after action: [4, 0, 1, 4, 4, 1, 1, 0, 1, 4]
Unique states reached: {0, 1, 4}

 Notice: Even with the same action, we reach different states!
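
The three states observed above can also be derived by hand. The sketch below is a simplified pure-Python model of the slip rule as described in this section: the intended action and its two perpendicular neighbours each occur with probability 1/3, and moves into a wall leave the robot in place. It assumes the 4x4 grid and ignores terminal tiles; `move` and `slippery_distribution` are hypothetical names, not Gymnasium functions.

```python
from collections import Counter
from fractions import Fraction

N_ROWS, N_COLS = 4, 4  # assumption: the 4x4 map
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3


def move(state: int, action: int) -> int:
    """Deterministic move on the grid; walls clamp the position."""
    row, col = divmod(state, N_COLS)
    if action == LEFT:
        col = max(col - 1, 0)
    elif action == DOWN:
        row = min(row + 1, N_ROWS - 1)
    elif action == RIGHT:
        col = min(col + 1, N_COLS - 1)
    elif action == UP:
        row = max(row - 1, 0)
    return row * N_COLS + col


def slippery_distribution(state: int, action: int) -> dict[int, Fraction]:
    """Next-state probabilities when each of the 3 candidate moves has chance 1/3."""
    dist: Counter = Counter()
    # With actions ordered LEFT, DOWN, RIGHT, UP, the neighbours (a - 1) % 4
    # and (a + 1) % 4 are exactly the two perpendicular directions.
    for actual in ((action - 1) % 4, action, (action + 1) % 4):
        dist[move(state, actual)] += Fraction(1, 3)
    return dict(dist)


print(slippery_distribution(0, RIGHT))  # states 4, 1 and 0, each with probability 1/3
print(slippery_distribution(0, LEFT))   # state 0 with 2/3, state 4 with 1/3
```

For RIGHT from state 0 the model yields states 0, 1 and 4, matching the unique states in the demonstration. It also shows why walls matter: choosing LEFT in the corner keeps the robot at state 0 two thirds of the time, because both LEFT and UP are blocked.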
