<a href="https://colab.research.google.com/github/Sakinat-Folorunso/OOU_CSC309_Artificial_Intelligence/blob/main/notebooks/CSC309_Week02_Intelligent_Agents_Student_Centred.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CSC309 – Artificial Intelligence  
**Week 2 Lab:** Intelligent Agents (Random vs Reflex vs Model-based)

**Instructor:** Dr Sakinat Folorunso

**Title:** Associate Professor of AI Systems and FAIR Data

**Department:** Computer Sciences, Olabisi Onabanjo University, Ago-Iwoye, Ogun State, Nigeria

**Course Code:** CSC 309

**Mode:** Student-centred, hands-on in Google Colab

> Every code cell is commented line-by-line so you can follow the logic precisely.

## How to use this notebook
1. Start with the **Group Log** and **Do Now**.  
2. Run the **Setup** cell once.  
3. Work through **Tasks**. Edit only cells marked **`# TODO(Student)`**.  
4. Use **Quick Checks** to test your understanding.  
5. Finish with the **Reflection**. If you finish early, try the **Extensions**.

In [None]:
#@title 🧑🏽‍🤝‍🧑🏾 Group Log (fill in before you start)
# The '#@param' annotations create form fields in Colab for easy input.

group_members = "Abdulquadri Adekunle, Toheeb Mustapha, Aduraseyi Osilaja"  #@param {type:"string"}  # Teammates' names as ONE comma-separated string (three separate strings would make a tuple)
roles_notes = "Intelligent-Agent-System"  #@param {type:"string"}  # Short working notes

print("👥 Group:", group_members)        # Echo the group list for confirmation
print("📝 Notes:", roles_notes)          # Echo the notes so they're preserved in output

### Learning Objectives
- Define **PEAS**, **rationality**, and **performance measures**.  
- Implement a small environment and three agent policies.  
- Compare policies using score distributions.
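As a worked example of PEAS, here is one reading of the GridWorld environment implemented later in this lab, written as a plain Python dictionary. Each entry is our own summary of the `GridWorld` class and the policies: a 5x5 grid, `dirt_prob=0.3`, and a percept that reports only whether the current cell is dirty. The name `peas` is illustrative and is not part of the lab's code.

```python
# Illustrative PEAS summary of this lab's GridWorld (our own wording, read off the code below).
peas = {
    "Performance": "+10 per dirty cell cleaned; -1 per move, wasted CLEAN, or unknown action",
    "Environment": "5x5 grid, each cell dirty with probability 0.3 at the start; agent starts at (0, 0)",
    "Actuators": ["UP", "DOWN", "LEFT", "RIGHT", "CLEAN"],
    "Sensors": ["dirt sensor for the current cell only (the agent is not told its position)"],
}

for part, description in peas.items():   # Print one PEAS component per line
    print(f"{part:<12}: {description}")
```

Writing the Sensors entry down makes one design fact explicit: the environment is only partially observable, because the agent never perceives its own position.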

In [None]:
#@title 🔧 Setup (run once)
# This lab uses only common scientific Python libraries.
# Each import line is commented to explain its role.

import sys                  # sys.executable lets pip install into this notebook's own interpreter
import subprocess           # Allows us to call 'pip' if a package is missing
def pip_install(pkgs):      # Helper to install packages only if missing
    for p in pkgs:
        try:
            __import__(p.split("==")[0])   # Try to import the package by name (ignoring any "==version" pin)
        except ImportError:
            subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", p])  # Quiet install

pip_install(["numpy", "matplotlib"])       # We need NumPy for arrays and Matplotlib for plots

import numpy as np           # Numerical arrays and random sampling
import random                # Simple random choices for agent actions
import matplotlib.pyplot as plt  # Basic plotting for histograms
print("✅ Setup complete for Week 2.")

### Do Now
Sketch a quick **PEAS** for a campus cleaning robot (Performance, Environment, Actuators, Sensors).

In [None]:
#@title 🧪 Environment + Policies (fully commented)
# We implement a tiny "vacuum-world" style grid with dirt.
# The agent gets +10 for cleaning a dirty cell and -1 for moving, cleaning a clean cell, or issuing an unknown action.

class GridWorld:
    def __init__(self, n=5, dirt_prob=0.3, seed=0):
        random.seed(seed)                      # Fix Python's random seed for reproducibility
        np.random.seed(seed)                   # Fix NumPy's random seed for reproducibility
        self.n = n                             # Grid size (n x n)
        self.agent_pos = (0, 0)                # Start position in the top-left corner
        self.dirt = (np.random.rand(n, n) < dirt_prob).astype(int)  # 1 indicates dirt; 0 is clean
        self.score = 0                         # Cumulative score earned by the agent

    def perceive(self):
        x, y = self.agent_pos                  # Unpack the current coordinates
        return {"dirty": bool(self.dirt[x, y])}  # Observation: is the current cell dirty?

    def step(self, action):
        x, y = self.agent_pos                  # Current position of the agent
        if action == "CLEAN":                  # If the agent chooses to clean
            if self.dirt[x, y] == 1:          # Check if the current cell actually has dirt
                self.dirt[x, y] = 0           # Remove the dirt
                self.score += 10              # Reward for cleaning dirt
            else:
                self.score -= 1               # Penalty for cleaning when there is no dirt
        elif action in ["UP", "DOWN", "LEFT", "RIGHT"]:  # If the agent chooses to move
            nx, ny = x, y                     # Start with the current position
            if action == "UP":   nx = max(0, x - 1)              # Move up, staying inside the grid
            if action == "DOWN": nx = min(self.n - 1, x + 1)     # Move down, staying inside the grid
            if action == "LEFT": ny = max(0, y - 1)              # Move left, staying inside the grid
            if action == "RIGHT": ny = min(self.n - 1, y + 1)    # Move right, staying inside the grid
            self.agent_pos = (nx, ny)           # Update the agent's position
            self.score -= 1                     # Small movement penalty
        else:
            self.score -= 1                     # Penalize unknown actions to keep policy sensible
        return self.perceive()                  # Return the new observation

# --- Policies ---------------------------------------------------------------

def random_agent(obs):
    """Return a random action, ignoring the observation (baseline)."""
    return random.choice(["UP", "DOWN", "LEFT", "RIGHT", "CLEAN"])  # Uniform random choice

def reflex_agent(obs):
    """Clean if dirty; otherwise move randomly (simple reflex)."""
    if obs["dirty"]:                      # If the sensor says current cell is dirty
        return "CLEAN"                    # Then clean it
    return random.choice(["UP", "DOWN", "LEFT", "RIGHT"])  # Else move randomly

def run(agent_fn, steps=100, seed=0):
    """Simulate an agent for a fixed number of steps and return the final score."""
    env = GridWorld(seed=seed)            # Create a fresh environment per run
    for _ in range(steps):                # Repeat for the given number of steps
        obs = env.perceive()              # Read the current observation
        action = agent_fn(obs)            # Choose an action using the policy
        env.step(action)                  # Apply the action to the environment
    return env.score                      # Return total score as performance measure

# Quick experiment: average scores over 5 seeds for the two base policies
for fn in [random_agent, reflex_agent]:             # Iterate over the two policy functions
    scores = [run(fn, seed=s) for s in range(5)]    # Run each policy with seeds 0..4
    print(fn.__name__, "avg score:", sum(scores)/len(scores))  # Print the average score
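The scoring rule in `GridWorld.step` has a useful consequence. Every step either removes dirt (+10) or costs exactly 1 point. So after $T$ steps with $C$ successful cleans, the score is $10C - (T - C) = 11C - T$. With $T = 100$, for example, a score of $-12$ means exactly 8 dirty cells were cleaned.

The sketch below restates the reward rule as a pure function and checks this identity on a short hand-made trace. The helpers `step_reward` and `episode_score` are hypothetical and are not part of the lab.

```python
def step_reward(action, cell_was_dirty):
    """Reward for one action under the lab's scoring rule (restated from GridWorld.step)."""
    if action == "CLEAN":
        return 10 if cell_was_dirty else -1   # +10 for removing dirt, -1 for a wasted clean
    return -1                                  # Moves (even into a wall) and unknown actions cost 1 point

def episode_score(trace):
    """Sum the rewards over a list of (action, cell_was_dirty) pairs."""
    return sum(step_reward(action, dirty) for action, dirty in trace)

# Hand-made trace: clean dirt, move, wasted clean, move, clean dirt (T = 5 steps, C = 2 cleans)
trace = [("CLEAN", True), ("RIGHT", False), ("CLEAN", False), ("DOWN", False), ("CLEAN", True)]
T = len(trace)                                                    # Number of steps taken
C = sum(1 for action, dirty in trace if action == "CLEAN" and dirty)  # Successful cleans

print(episode_score(trace))   # 10 - 1 - 1 - 1 + 10 = 17
print(11 * C - T)             # 11 * 2 - 5 = 17, the same value
```

The score is therefore a linear function of the number of dirty cells cleaned. For a fixed number of steps, ranking agents by score is the same as ranking them by dirt removed.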

In [None]:
# TODO(Student): Model-based agent + comparison plot (with line-by-line comments)
def model_based_agent_factory(n=5):
    """
    Factory that returns a fresh, stateful agent function.
    The agent keeps an internal model (memory) of:
      - Which cells it has already visited
      - Which cells it has cleaned
      - Its current believed position (the percept reports dirt only, never position)
    It uses this model to explore the grid systematically instead of moving randomly:
    it always heads for the nearest cell it has not visited yet.
    `n` must match the GridWorld size (run() uses the default n=5).
    """
    # ---------- Persistent state (lives between calls) ----------
    visited = set()                    # Set of (x, y) positions the agent has been to
    cleaned = set()                    # Set of (x, y) positions the agent cleaned (kept for inspection)
    believed_pos = [0, 0]              # Agent's belief about its own position (GridWorld starts at (0, 0))
    all_moves = ["UP", "DOWN", "LEFT", "RIGHT"]

    def update_belief(move):
        """Apply a move to the believed position, clamping at the walls exactly like GridWorld.step."""
        x, y = believed_pos
        if move == "UP":    x = max(0, x - 1)
        if move == "DOWN":  x = min(n - 1, x + 1)
        if move == "LEFT":  y = max(0, y - 1)
        if move == "RIGHT": y = min(n - 1, y + 1)
        believed_pos[0], believed_pos[1] = x, y   # Mutate the list in place, so no `nonlocal` is needed

    def model_based_agent(obs):
        x, y = believed_pos                # Current believed coordinates
        visited.add((x, y))                # Remember that we have been here

        # Step 1: Always clean if the current square is dirty (CLEAN does not move the agent)
        if obs["dirty"]:
            cleaned.add((x, y))            # Remember we cleaned this cell
            return "CLEAN"

        # Step 2: Pick the nearest unvisited cell (Manhattan distance; ties go to row-major order)
        unvisited = [(i, j) for i in range(n) for j in range(n) if (i, j) not in visited]
        if unvisited:
            tx, ty = min(unvisited, key=lambda c: abs(c[0] - x) + abs(c[1] - y))
            # Step 3: Take one step towards that target, rows first, then columns
            if tx > x:   move = "DOWN"
            elif tx < x: move = "UP"
            elif ty > y: move = "RIGHT"
            else:        move = "LEFT"
        else:
            # Every cell has been seen (so all dirt is gone); every action now costs -1 anyway
            move = random.choice(all_moves)

        update_belief(move)                # Keep the internal model in sync with the real position
        return move

    return model_based_agent       # Return the stateful function


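# --- Evaluation helper -------------------------------------------------------
def evaluate(make_agent, trials=30, steps=100):
    """Run a freshly built agent on seeds 0..trials-1 and collect the final scores.

    `make_agent` is a zero-argument callable that returns a policy. Building a new
    policy for every trial means a stateful agent never carries memory from one
    grid into the next.
    """
    results = []
    for i in range(trials):
        agent_fn = make_agent()                               # Fresh agent (fresh memory) for this trial
        results.append(run(agent_fn, steps=steps, seed=i))    # Seed i gives every agent the same dirt map
    return results


# === Run comparison ===
random_scores = evaluate(lambda: random_agent)       # Stateless policy: the "factory" just returns it
reflex_scores = evaluate(lambda: reflex_agent)       # Stateless policy: same pattern
model_scores  = evaluate(model_based_agent_factory)  # The factory is called once per trial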

# === Plot results ===
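plt.figure(figsize=(10, 6))
all_scores = random_scores + reflex_scores + model_scores      # Pool scores so all histograms share bin edges
bins = np.linspace(min(all_scores), max(all_scores), 16)       # 15 equal-width bins over the full range
plt.hist(random_scores, alpha=0.6, bins=bins, label=f"Random (μ={np.mean(random_scores):.1f})")
plt.hist(reflex_scores, alpha=0.6, bins=bins, label=f"Reflex (μ={np.mean(reflex_scores):.1f})")
plt.hist(model_scores,  alpha=0.6, bins=bins, label=f"Model-based (μ={np.mean(model_scores):.1f})")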

plt.xlabel("Total Score after 100 steps")
plt.ylabel("Frequency (out of 30 trials)")
plt.title("Performance Comparison of Three Agents\n(higher score = better cleaning efficiency)")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Print numeric summary
print("=== Average scores over 30 trials ===")
print(f"Random agent     : {np.mean(random_scores):.2f} ± {np.std(random_scores):.2f}")
print(f"Reflex agent     : {np.mean(reflex_scores):.2f} ± {np.std(reflex_scores):.2f}")
print(f"Model-based agent: {np.mean(model_scores):.2f} ± {np.std(model_scores):.2f}")
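`evaluate` uses seeds 0 to 29 for every agent, so on trial i all three agents face the same dirt map. Their scores can therefore be compared seed by seed, not only through averages and histograms.

The sketch below reports two numbers: the mean per-seed difference, and the fraction of seeds on which the first agent scored strictly higher. `paired_comparison` is a hypothetical helper. The scores are toy values chosen to fit the $11C - 100$ pattern, not lab results.

```python
def paired_comparison(scores_a, scores_b):
    """Compare two agents run on the same seeds: mean per-seed difference and win rate of A."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Score lists must come from the same set of seeds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]   # Positive entries: A beat B on that seed
    mean_diff = sum(diffs) / len(diffs)                    # Average advantage of A over B
    win_rate = sum(d > 0 for d in diffs) / len(diffs)      # Fraction of seeds where A scored strictly higher
    return mean_diff, win_rate

# Toy scores for four seeds (illustrative only, not output of this notebook)
agent_a = [-12, -1, -34, 10]
agent_b = [-23, -12, -34, -12]
print(paired_comparison(agent_a, agent_b))   # (11.0, 0.75)
```

In the notebook, `paired_comparison(model_scores, reflex_scores)` would apply the same idea to the real results. Pairing removes the seed-to-seed variation in how much dirt there is, which otherwise widens every histogram.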

### Reflection
- What **performance measure** did we implicitly design with our scoring?  
- Which policy is most **rational** under this measure? Why?