<a href="https://colab.research.google.com/github/Dulyaaa/Wumpus-World/blob/main/Dynamic_Wumpus_World.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Problem Description (Read Carefully)

Start and Exit locations are the same as Part 1: Start at `(1,1)` and Exit at `(n,n)`. There is one Gold somewhere in the grid.

- The **Wumpus moves** over time by a simple random-walk rule on the 4-connected grid, **staying put** with some probability at each step. The agent does not observe the Wumpus's position directly.
- **Pits are static** but **unknown**, and stepping onto a pit is terminal.
- The agent’s sensors are **noisy** (parameters known to you):  
  - **Breeze**: higher chance to report 1 if there is an adjacent pit; otherwise a small false-positive rate.  
  - **Stench**: higher chance to report 1 if the Wumpus is adjacent (in its current position); otherwise a small false-positive rate.  
  Use the following nominal values in your implementation:  
  - Breeze true-positive ≈ **0.90**, false-positive ≈ **0.05**  
  - Stench true-positive ≈ **0.90**, false-positive ≈ **0.05**  
  - Wumpus “stay” probability per step ≈ **0.20**
- The agent perceives only the local noisy Breeze/Stench at its current cell (glitter is still detectable on the gold cell). Hazards, gold, and Wumpus positions are **not** directly observed.
- The agent must **navigate to the Gold, pick it up, and reach the Exit**, maximizing overall performance while avoiding death. You may assume terminal reward for Exit-with-Gold and large negative utility for death; you may optionally include small per-step costs to discourage dithering.
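
The noisy sensor model described above can be read as a Bernoulli draw whose success probability depends on whether the hazard is really adjacent. The snippet below is a minimal sketch of that reading, not part of the assignment code: `noisy_reading`, `TP`, and `FP` are hypothetical names, and the rates are the nominal values listed above.

```python
import random

TP = 0.90  # nominal true-positive rate (same for Breeze and Stench)
FP = 0.05  # nominal false-positive rate (same for Breeze and Stench)

def noisy_reading(hazard_adjacent: bool, rng: random.Random) -> int:
    """Return 1 with probability TP if the hazard is adjacent, otherwise with probability FP."""
    p_report = TP if hazard_adjacent else FP
    return 1 if rng.random() < p_report else 0

# Empirical check: the reported-1 frequency should approach TP or FP.
rng = random.Random(0)
trials = 20_000
rate_adjacent = sum(noisy_reading(True, rng) for _ in range(trials)) / trials
rate_clear = sum(noisy_reading(False, rng) for _ in range(trials)) / trials
print(f"reported-1 rate, hazard adjacent: {rate_adjacent:.3f}")
print(f"reported-1 rate, no hazard:       {rate_clear:.3f}")
```

A single reading is therefore only evidence, never proof: a 1 can be a false alarm and a 0 can be a miss, which is why the agent keeps probabilities rather than hard safe/unsafe labels.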


------------------------------------------------------------------------------

#### Implement Wumpus Motion Model

In [16]:
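from typing import List, Tuple  # for the annotations below, in case an earlier cell has not imported them

def coord_to_idx(r: int, c: int, n: int) -> int: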
    """Converts (r,c) 1-indexed coordinates to 0-indexed linear index."""
    return (r - 1) * n + (c - 1)

def idx_to_coord(idx: int, n: int) -> Tuple[int, int]:
    """Converts 0-indexed linear index to (r,c) 1-indexed coordinates."""
    r = (idx // n) + 1
    c = (idx % n) + 1
    return r, c

def calculate_wumpus_transition_matrix(n: int) -> List[List[float]]:
    """
    Calculates the Wumpus transition matrix for an n x n grid.
    T[i][j] is the probability that Wumpus moves from cell i to cell j.
    """
    num_cells = n * n
    transition_matrix = [[0.0 for _ in range(num_cells)] for _ in range(num_cells)]

    stay_prob = 0.20  # nominal Wumpus "stay" probability from the problem description

    for r_from in range(1, n + 1):
        for c_from in range(1, n + 1):
            from_idx = coord_to_idx(r_from, c_from, n)

            # Get valid neighbors for the current cell
            valid_neighbors = list(neighbors4(r_from, c_from, n))

            # Probability mass (1 - stay_prob) is split evenly among the in-grid neighbors.
            # A cell with no neighbors (only possible when n == 1) gets no move probability;
            # its row then holds only stay_prob and is renormalized to 1 below.
            if len(valid_neighbors) == 0:
                move_prob_per_neighbor = 0.0
            else:
                move_prob_per_neighbor = (1.0 - stay_prob) / len(valid_neighbors)

            # Distribute probabilities
            for (r_to, c_to) in valid_neighbors:
                to_idx = coord_to_idx(r_to, c_to, n)
                transition_matrix[from_idx][to_idx] = move_prob_per_neighbor

            # Add the probability of staying in the same cell
            transition_matrix[from_idx][from_idx] += stay_prob

            # Each row should already sum to 1. Renormalize if it does not: this absorbs
            # floating-point drift and also covers n == 1, where the row holds only stay_prob.
            current_row_sum = sum(transition_matrix[from_idx])
            if current_row_sum != 0 and abs(current_row_sum - 1.0) > 1e-9:
                for j in range(num_cells):
                    transition_matrix[from_idx][j] /= current_row_sum

    return transition_matrix

# Example usage:
# n = 3
# transition_matrix_3x3 = calculate_wumpus_transition_matrix(n)
# print(f"Transition matrix for {n}x{n} grid (first 3 rows):\n{transition_matrix_3x3[:3]}")

# Test coordinate conversion
# print(f"Coord (1,1) -> Index {coord_to_idx(1,1,n)}")
# print(f"Index {coord_to_idx(1,1,n)} -> Coord {idx_to_coord(coord_to_idx(1,1,n), n)}")
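
As a sanity check on the motion model, the sketch below rebuilds the same kind of matrix in plain Python and applies one prediction step to a belief that places the Wumpus in the corner cell `(1,1)`. It is an illustration under stated assumptions rather than the notebook's own code: `grid_neighbors`, `transition_matrix`, and `predict` are hypothetical stand-ins (with `grid_neighbors` assumed to behave like `neighbors4`), and the stay probability is the nominal 0.20.

```python
from typing import List, Tuple

STAY_PROB = 0.20  # nominal Wumpus "stay" probability

def grid_neighbors(r: int, c: int, n: int) -> List[Tuple[int, int]]:
    """In-grid 4-connected neighbors of the 1-indexed cell (r, c)."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(rr, cc) for rr, cc in candidates if 1 <= rr <= n and 1 <= cc <= n]

def transition_matrix(n: int) -> List[List[float]]:
    """Row-stochastic matrix: T[i][j] = P(Wumpus in cell j next step | in cell i now)."""
    size = n * n
    T = [[0.0] * size for _ in range(size)]
    for r in range(1, n + 1):
        for c in range(1, n + 1):
            i = (r - 1) * n + (c - 1)
            nbrs = grid_neighbors(r, c, n)
            # With no neighbors (n == 1) the Wumpus can only stay.
            T[i][i] = STAY_PROB if nbrs else 1.0
            for rr, cc in nbrs:
                T[i][(rr - 1) * n + (cc - 1)] = (1.0 - STAY_PROB) / len(nbrs)
    return T

def predict(belief: List[float], T: List[List[float]]) -> List[float]:
    """One prediction step: new_belief[j] = sum_i belief[i] * T[i][j]."""
    size = len(belief)
    return [sum(belief[i] * T[i][j] for i in range(size)) for j in range(size)]

n = 3
T = transition_matrix(n)
row_sums = [sum(row) for row in T]

# Wumpus known to be in corner (1,1): after one step it stays with probability 0.2
# or moves to (1,2) or (2,1) with probability 0.4 each.
belief = [0.0] * (n * n)
belief[0] = 1.0
after_one_step = predict(belief, T)
print("row sums:", [round(s, 6) for s in row_sums])
print("belief after one step:", [round(p, 2) for p in after_one_step])
```

The row-vector-times-matrix form used in `predict` matches the convention above that `T[i][j]` is the probability of moving from cell `i` to cell `j`.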


#### Implement Percept Update Rules

In [17]:
import numpy as np

def update_wumpus_belief(
    current_wumpus_prob_grid: np.ndarray,
    observed_stench: bool,
    agent_pos: Coord,
    n: int,
    transition_matrix: List[List[float]],
    visited_cells: Set[Coord]
) -> np.ndarray:
    """
    Updates the Wumpus belief state using prediction (movement) and correction (stench percept).
    """
    num_cells = n * n

    # 0. Ensure current_wumpus_prob_grid is a numpy array
    current_wumpus_prob_grid = np.array(current_wumpus_prob_grid)

    # 1. Flatten current_wumpus_prob_grid to a 1D vector
    current_wumpus_prob_vector = current_wumpus_prob_grid.flatten()

    # 2. Prediction Step: Wumpus movement
    # Convert transition_matrix to numpy array for matrix multiplication
    np_transition_matrix = np.array(transition_matrix)
    predicted_wumpus_prob_vector = np.dot(current_wumpus_prob_vector, np_transition_matrix)
    predicted_wumpus_prob_grid = predicted_wumpus_prob_vector.reshape((n, n))

    # 3. Correction Step: Bayes' Rule with Stench percept
    likelihood_grid = np.zeros((n, n))
    tp_stench = 0.90  # True Positive for Stench
    fp_stench = 0.05  # False Positive for Stench

    ar, ac = agent_pos
    agent_adj_cells = set(neighbors4(ar, ac, n))

    for r_wumpus in range(1, n + 1):
        for c_wumpus in range(1, n + 1):
            wumpus_pos = (r_wumpus, c_wumpus)

            # Visited cells are not special-cased here; they are zeroed after the
            # likelihood is applied (step 4 below).

            # Determine likelihood P(Observed_Stench | Wumpus at (r_wumpus, c_wumpus))
            if observed_stench:
                if wumpus_pos in agent_adj_cells:
                    likelihood = tp_stench  # Stench is true, Wumpus is adjacent (True Positive)
                else:
                    likelihood = fp_stench  # Stench is true, Wumpus is NOT adjacent (False Positive)
            else: # Not observed_stench (no stench)
                if wumpus_pos in agent_adj_cells:
                    likelihood = (1 - tp_stench) # No stench, Wumpus is adjacent (False Negative)
                else:
                    likelihood = (1 - fp_stench) # No stench, Wumpus is NOT adjacent (True Negative)

            likelihood_grid[r_wumpus - 1, c_wumpus - 1] = likelihood

    # Apply likelihood to predicted probabilities
    updated_wumpus_prob_grid = predicted_wumpus_prob_grid * likelihood_grid

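    # 4. Zero out visited cells (the agent's current and previously visited cells).
    #    Note: because the Wumpus moves, only the agent's current cell is certain to be
    #    Wumpus-free; zeroing every visited cell is a simplifying assumption.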
    for r_vis, c_vis in visited_cells:
        # Convert to 0-indexed for numpy array
        updated_wumpus_prob_grid[r_vis - 1, c_vis - 1] = 0.0

    # 5. Normalization
    total_prob = np.sum(updated_wumpus_prob_grid)
    if total_prob > 1e-9: # Avoid division by zero if all probabilities are effectively zero
        updated_wumpus_prob_grid /= total_prob
    else:
        # If all probabilities are zero after update (e.g., strong evidence against all cells),
        # re-initialize to a uniform distribution over non-visited cells.
        # This is a fallback and indicates a potential issue or highly conflicting evidence.
        print("Warning: Wumpus probabilities collapsed to zero. Re-initializing uniformly over unvisited cells.")
        num_unvisited = (n * n) - len(visited_cells)
        if num_unvisited > 0:
            uniform_prob = 1.0 / num_unvisited
            for r in range(n):
                for c in range(n):
                    if (r + 1, c + 1) not in visited_cells:
                        updated_wumpus_prob_grid[r, c] = uniform_prob
            # Re-normalize to guard against rounding error
            updated_wumpus_prob_grid /= np.sum(updated_wumpus_prob_grid)
        else:
            # Every cell has been visited, so no cell can hold probability mass under the
            # zeroing rule above. The grid is left as all zeros.
            pass


    return updated_wumpus_prob_grid

def update_pit_belief(
    current_pit_prob_grid: np.ndarray,
    observed_breeze: bool,
    agent_pos: Coord,
    n: int,
    visited_cells: Set[Coord]
) -> np.ndarray:
    """
    Updates the Pit belief state using correction (breeze percept) for adjacent cells.
    """
    new_pit_prob_grid = np.copy(current_pit_prob_grid) # Create a copy to modify

    # 1. Handle known safe cells (agent's current and previously visited cells cannot have pits)
    for r_vis, c_vis in visited_cells:
        # Convert to 0-indexed for numpy array
        new_pit_prob_grid[r_vis - 1, c_vis - 1] = 0.0

    # 2. Correction Step: Bayes' Rule with Breeze percept for neighbors of agent_pos
    tp_breeze = 0.90  # True Positive for Breeze
    fp_breeze = 0.05  # False Positive for Breeze

    ar, ac = agent_pos
    agent_neighbors = list(neighbors4(ar, ac, n))

    # Approximation: each unvisited neighbor is updated independently, as if it were the
    # only possible source of the breeze. An exact update would sum over the joint pit
    # configurations of all neighbors.

    for nr, nc in agent_neighbors:
        if (nr, nc) not in visited_cells:
            # Get current prior probability for this neighbor
            P_old_pit = new_pit_prob_grid[nr - 1, nc - 1]

            # Likelihoods for P(Breeze | Pit) and P(Breeze | ~Pit) at this neighbor
            P_breeze_given_pit = tp_breeze
            P_breeze_given_not_pit = fp_breeze

            if observed_breeze:
                # P(Pit | Breeze) = P(Breeze | Pit) * P(Pit) / P(Breeze)
                # P(Breeze) = P(Breeze | Pit) * P(Pit) + P(Breeze | ~Pit) * P(~Pit)
                numerator = P_breeze_given_pit * P_old_pit
                denominator = P_breeze_given_pit * P_old_pit + P_breeze_given_not_pit * (1 - P_old_pit)
            else: # Observed no breeze
                # P(Pit | ~Breeze) = P(~Breeze | Pit) * P(Pit) / P(~Breeze)
                # P(~Breeze) = P(~Breeze | Pit) * P(Pit) + P(~Breeze | ~Pit) * P(~Pit)
                P_not_breeze_given_pit = (1 - P_breeze_given_pit)
                P_not_breeze_given_not_pit = (1 - P_breeze_given_not_pit)

                numerator = P_not_breeze_given_pit * P_old_pit
                denominator = P_not_breeze_given_pit * P_old_pit + P_not_breeze_given_not_pit * (1 - P_old_pit)

            if denominator > 1e-9: # Avoid division by zero
                P_new_pit = numerator / denominator
                new_pit_prob_grid[nr - 1, nc - 1] = P_new_pit
            else:
                # Guard only: with the nominal sensor values the denominator is at least
                # 0.05, so this branch is not expected to run.
                new_pit_prob_grid[nr - 1, nc - 1] = 0.0

    return new_pit_prob_grid

print("Numpy imported and percept update functions for Wumpus and Pit beliefs defined.")

Numpy imported and percept update functions for Wumpus and Pit beliefs defined.
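
The pit update above treats each neighbor independently. To show what the "more rigorous" joint update would look like, the sketch below enumerates every pit configuration of the agent's unvisited neighbors and compares the result with the per-cell update. It assumes independent pit priors and a single breeze reading that fires with the true-positive rate when at least one neighbor holds a pit; `exact_pit_posteriors` and `independent_pit_posterior` are hypothetical names introduced only for this comparison.

```python
from itertools import product
from typing import List

TP_BREEZE = 0.90  # nominal true-positive rate
FP_BREEZE = 0.05  # nominal false-positive rate

def exact_pit_posteriors(priors: List[float], observed_breeze: bool) -> List[float]:
    """P(pit_i | reading) for each neighbor, by enumerating all pit configurations."""
    k = len(priors)
    joint_with_pit = [0.0] * k  # sum of P(config, reading) over configs where cell i has a pit
    evidence = 0.0              # P(reading)
    for config in product([0, 1], repeat=k):
        p_config = 1.0
        for has_pit, prior in zip(config, priors):
            p_config *= prior if has_pit else (1.0 - prior)
        p_breeze = TP_BREEZE if any(config) else FP_BREEZE
        p_reading = p_breeze if observed_breeze else (1.0 - p_breeze)
        weight = p_config * p_reading
        evidence += weight
        for i, has_pit in enumerate(config):
            if has_pit:
                joint_with_pit[i] += weight
    return [j / evidence for j in joint_with_pit]

def independent_pit_posterior(prior: float, observed_breeze: bool) -> float:
    """Per-cell Bayes update that ignores the other neighbors."""
    like_pit = TP_BREEZE if observed_breeze else 1.0 - TP_BREEZE
    like_clear = FP_BREEZE if observed_breeze else 1.0 - FP_BREEZE
    return like_pit * prior / (like_pit * prior + like_clear * (1.0 - prior))

priors = [0.2, 0.2]  # two unvisited neighbors, each with the 0.2 prior
exact_breeze = exact_pit_posteriors(priors, observed_breeze=True)
approx_breeze = [independent_pit_posterior(p, True) for p in priors]
exact_clear = exact_pit_posteriors(priors, observed_breeze=False)
approx_clear = [independent_pit_posterior(p, False) for p in priors]

print("breeze:    exact", [round(p, 3) for p in exact_breeze],
      "independent", [round(p, 3) for p in approx_breeze])
print("no breeze: exact", [round(p, 3) for p in exact_clear],
      "independent", [round(p, 3) for p in approx_clear])
```

In this two-neighbor example the independent update assigns each neighbor a noticeably higher pit probability after a breeze than the joint update does, because it lets every neighbor take full credit for the same reading. After a no-breeze reading the two stay close. The enumeration costs 2^k terms, which is cheap for at most four neighbors.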


#### Action-Selection Strategy of Agent

In [18]:
import numpy as np

# 1. Define constants for utility values and risk threshold
U_death = -1000.0          # Utility for dying (falling into a pit or being eaten by wumpus)
U_grab_gold = 100.0        # Utility for grabbing gold
U_exit_with_gold = 1000.0  # Utility for exiting with gold
U_move_cost = -1.0         # Small penalty for each move
risk_threshold = 0.1       # Maximum acceptable probability of hazard for a cell to be considered 'safe'

def online_probabilistic_agent_noisy_sensors(
    world: Dict,
    verbose: bool = True,
    max_steps: int = 500
) -> Dict:
    """
    Probabilistic agent for Wumpus World with moving Wumpus and noisy sensors.
    """
    n = world["rows"]
    start = tuple(world["start"])
    exit_cell = tuple(world["exit"])
    gold = tuple(world["gold"])

    # Agent state initialization
    pos: Coord = start
    visited: Set[Coord] = {pos}
    have_gold = False
    plan_actions: List[str] = []
    trace: List[Coord] = [pos]
    stopped_reason = None

    # 2. Initialize Wumpus belief state (uniform over all cells, excluding start)
    # Renormalize after setting start to 0
    wumpus_prob_grid = np.full((n, n), 1.0)
    wumpus_prob_grid[start[0]-1, start[1]-1] = 0.0
    wumpus_prob_grid /= np.sum(wumpus_prob_grid) # Normalize

    # 3. Initialize Pit belief state (constant prior for every cell except start).
    # If num_pits were known, num_pits / (n*n - 1) would be a natural prior; here a
    # constant 0.2 is used instead. Note that 0.2 is above risk_threshold (0.1), so a
    # cell only counts as safe after evidence lowers its pit probability.
    initial_pit_prob_unknown = 0.2
    pit_prob_grid = np.full((n, n), initial_pit_prob_unknown)
    pit_prob_grid[start[0]-1, start[1]-1] = 0.0 # Agent is safe at start

    # As in Part 1, if there is no breeze at start, mark the neighbors of start as pit-free.
    # This hard zero overrides the prior and treats the start reading as reliable.
    initial_percepts = truth_percepts(world, start)
    if not initial_percepts["breeze"]:
        for nr, nc in neighbors4(start[0], start[1], n):
            if (nr, nc) not in visited: # Only update for unvisited cells
                pit_prob_grid[nr-1, nc-1] = 0.0

    # 4. Calculate Wumpus transition matrix
    wumpus_transition_matrix = calculate_wumpus_transition_matrix(n)

    # Agent loop starts here
    steps = 0
    while steps < max_steps:
        steps += 1

        if verbose:
            print(f"\n--- Step {steps} ---")
            print(f"Current position: {pos}")
            print(f"Have gold: {have_gold}")

        # 5. Perceive
        obs = truth_percepts(world, pos)
        if verbose:
            print(f"Percepts at {pos}: Breeze={obs['breeze']}, Stench={obs['stench']}, Glitter={obs['glitter']}")

        # 6. Update beliefs
        # Ensure visited cells are not considered for wumpus or pit probabilities
        # This happens within update_wumpus_belief and update_pit_belief now.
        wumpus_prob_grid = update_wumpus_belief(
            wumpus_prob_grid,
            obs["stench"],
            pos,
            n,
            wumpus_transition_matrix,
            visited
        )
        pit_prob_grid = update_pit_belief(
            pit_prob_grid,
            obs["breeze"],
            pos,
            n,
            visited
        )

        # 7. Grab gold if available and not already grabbed
        if pos == gold and not have_gold and obs["glitter"]:
            have_gold = True
            plan_actions.append("Grab")
            if verbose:
                print(f"Action=Grab at {pos}.")
            continue # Re-evaluate for next move after grabbing

        # 8. Check for exit condition
        if pos == exit_cell and have_gold:
            if verbose:
                print("Reached exit with gold. Done.")
            break

        # 9. Action Selection Logic
        valid_next_moves_candidates = []
        for nr, nc in neighbors4(pos[0], pos[1], n):
            next_pos = (nr, nc)
            # Calculate combined hazard probability for the candidate cell
            p_pit_at_next = pit_prob_grid[nr-1, nc-1]
            p_wumpus_at_next = wumpus_prob_grid[nr-1, nc-1]

            # More accurate combined probability P(Hazard) = 1 - P(~Pit and ~Wumpus)
            p_hazard_at_next = 1.0 - (1.0 - p_pit_at_next) * (1.0 - p_wumpus_at_next)

            if p_hazard_at_next <= risk_threshold:
                valid_next_moves_candidates.append((next_pos, p_hazard_at_next))

        if not valid_next_moves_candidates:
            stopped_reason = "No safe moves available."
            if verbose: print(stopped_reason)
            break

        best_next_step = None
        if have_gold: # Phase 2: Go to Exit
            # Find shortest path to exit through safe cells
            safe_cells_for_pathfinding = set()
            for r in range(n):
                for c in range(n):
                    cell = (r+1, c+1)
                    p_pit_at_cell = pit_prob_grid[r, c]
                    p_wumpus_at_cell = wumpus_prob_grid[r, c]
                    p_hazard_at_cell = 1.0 - (1.0 - p_pit_at_cell) * (1.0 - p_wumpus_at_cell)
                    if p_hazard_at_cell <= risk_threshold:
                        safe_cells_for_pathfinding.add(cell)

            path_to_exit = bfs_path_in_set(n, pos, exit_cell, safe_cells_for_pathfinding)
            if path_to_exit and len(path_to_exit) > 1:
                best_next_step = path_to_exit[1]
            else:
                stopped_reason = "No safe path to exit found while carrying gold."
                if verbose: print(stopped_reason)
                break

        else: # Phase 1: Explore to find Gold
            # Target the nearest unvisited frontier cell that is safe enough
            min_path_len = float('inf')
            min_hazard_prob = float('inf')

            # Collect all safe cells for pathfinding
            safe_cells_for_pathfinding = set()
            for r in range(n):
                for c in range(n):
                    cell = (r+1, c+1)
                    p_pit_at_cell = pit_prob_grid[r, c]
                    p_wumpus_at_cell = wumpus_prob_grid[r, c]
                    p_hazard_at_cell = 1.0 - (1.0 - p_pit_at_cell) * (1.0 - p_wumpus_at_cell)
                    if p_hazard_at_cell <= risk_threshold:
                        safe_cells_for_pathfinding.add(cell)

            # Consider neighbors of visited cells that are unvisited and safe
            current_frontier = frontier_neighbors(n, visited)
            potential_targets = []
            for (r_fr, c_fr) in current_frontier:
                p_pit_at_fr = pit_prob_grid[r_fr-1, c_fr-1]
                p_wumpus_at_fr = wumpus_prob_grid[r_fr-1, c_fr-1]
                p_hazard_at_fr = 1.0 - (1.0 - p_pit_at_fr) * (1.0 - p_wumpus_at_fr)

                if p_hazard_at_fr <= risk_threshold:
                    potential_targets.append(((r_fr, c_fr), p_hazard_at_fr))

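            # Add the gold cell as a zero-hazard target if it is unvisited and safe.
            # Note: this reads the gold location from `world`, which the problem statement
            # says the agent does not observe directly.
            if gold not in visited and gold in safe_cells_for_pathfinding:
                potential_targets.append((gold, 0.0))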

            if not potential_targets:
                stopped_reason = "No safe frontier or gold to explore."
                if verbose: print(stopped_reason)
                break

            # Find the best target among potential targets
            best_path_to_target = None
            for target_cell, target_hazard_prob in potential_targets:
                path_to_target = bfs_path_in_set(n, pos, target_cell, safe_cells_for_pathfinding)
                if path_to_target:
                    current_path_len = len(path_to_target)

                    # Tie-breaking: shorter path, then lower hazard, then deterministic order
                    if best_path_to_target is None or \
                       current_path_len < min_path_len or \
                       (current_path_len == min_path_len and target_hazard_prob < min_hazard_prob) or \
                       (current_path_len == min_path_len and target_hazard_prob == min_hazard_prob and target_cell < best_path_to_target[-1]): # Lexicographical for tie-breaking coord

                        min_path_len = current_path_len
                        min_hazard_prob = target_hazard_prob
                        best_path_to_target = path_to_target

            if best_path_to_target and len(best_path_to_target) > 1:
                best_next_step = best_path_to_target[1]
            elif best_path_to_target and len(best_path_to_target) == 1: # Already at target
                best_next_step = best_path_to_target[0] # Stay put, or effectively a no-op until next cycle
            else:
                stopped_reason = "No reachable safe frontier or gold."
                if verbose: print(stopped_reason)
                break

        # Defensive check: stop if no next step was chosen above.
        if best_next_step is None:
            stopped_reason = "No effective next step could be determined."
            if verbose: print(stopped_reason)
            break

        # 10. Execute chosen move
        act = action_from_step(pos, best_next_step)

        # If agent decides to stay put (e.g., if best_next_step was current pos)
        if act == "NoOp":
            plan_actions.append(act)
            # No change to pos, visited, or trace for NoOp
            if verbose:
                print(f"Step {steps}: Action={act} (at {pos})")
                pretty_print_percepts(world, agent_pos=pos)
                print(f"Percepts: Breeze={obs['breeze']}, Stench={obs['stench']}\n")
            continue

        plan_actions.append(act)
        pos = best_next_step
        visited.add(pos)
        trace.append(pos)

        if verbose:
            print(f"Step {steps}: Action={act} -> pos={pos}")
            # We don't print percepts again immediately; they will be printed at the start of the next step

    return {
        "plan": plan_actions,
        "trace": trace,
        "have_gold": have_gold,
        "success": (pos == exit_cell and have_gold),
        "stopped_reason": stopped_reason
    }
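
To see how `risk_threshold` interacts with the priors used by the agent, the sketch below computes the combined hazard for a cell that has never been next to an observation and for a cell next to one no-breeze reading. It is a rough illustration only: it assumes a 4x4 grid, the initial uniform Wumpus belief (no stench updates applied), and the per-cell pit update; `combined_hazard` and `pit_posterior_no_breeze` are hypothetical helper names.

```python
RISK_THRESHOLD = 0.1               # same value as risk_threshold above
PIT_PRIOR = 0.2                    # same value as initial_pit_prob_unknown above
TP_BREEZE, FP_BREEZE = 0.90, 0.05  # nominal breeze sensor rates

def combined_hazard(p_pit: float, p_wumpus: float) -> float:
    """P(pit or Wumpus in the cell), treating the two hazards as independent."""
    return 1.0 - (1.0 - p_pit) * (1.0 - p_wumpus)

def pit_posterior_no_breeze(prior: float) -> float:
    """Per-cell Bayes update after one no-breeze reading taken next to the cell."""
    miss = 1.0 - TP_BREEZE      # P(no breeze | pit)
    true_neg = 1.0 - FP_BREEZE  # P(no breeze | no pit)
    return miss * prior / (miss * prior + true_neg * (1.0 - prior))

n = 4
p_wumpus_uniform = 1.0 / (n * n - 1)  # uniform over every cell except start

unexplored = combined_hazard(PIT_PRIOR, p_wumpus_uniform)
after_no_breeze = combined_hazard(pit_posterior_no_breeze(PIT_PRIOR), p_wumpus_uniform)

print(f"unexplored cell hazard:       {unexplored:.3f} (safe: {unexplored <= RISK_THRESHOLD})")
print(f"after one no-breeze reading:  {after_no_breeze:.3f} (safe: {after_no_breeze <= RISK_THRESHOLD})")
```

Under these assumptions a cell with no evidence sits well above the threshold, and one no-breeze reading brings it only just below. Because both planning phases run `bfs_path_in_set` over cells at or below the threshold, a route to the exit that crosses unexplored cells will not be found, which is one plausible contributor to early stops. Raising the threshold or lowering the prior would change this trade-off.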

#### Experiment and Evaluate

In [19]:
import random  # used below to derive per-trial world seeds

import numpy as np
import pandas as pd
from tqdm.notebook import tqdm  # progress bars in notebooks

def pretty_print_belief_grid(grid: np.ndarray, title: str):
    """Pretty prints a belief grid with rounded values."""
    print(f"\n{title}:")
    for r_idx in range(grid.shape[0]):
        row_str = " ".join([f"{val:.3f}" for val in grid[r_idx, :]])
        print(row_str)

In [20]:
world_configs = [
    # Small grid configurations
    {"n": 4, "num_pits": 1, "num_wumpus": 1, "description": "Small grid, low hazards"},
    {"n": 4, "num_pits": 2, "num_wumpus": 1, "description": "Small grid, moderate hazards"},

    # Medium grid configurations
    {"n": 6, "num_pits": 3, "num_wumpus": 1, "description": "Medium grid, moderate hazards"},
    {"n": 6, "num_pits": 5, "num_wumpus": 2, "description": "Medium grid, high hazards"},
    {"n": 7, "num_pits": 4, "num_wumpus": 1, "description": "Medium grid, moderate hazards (7x7)"},
    {"n": 7, "num_pits": 6, "num_wumpus": 2, "description": "Medium grid, high hazards (7x7)"}
]

num_trials_per_config = 15
all_results = []

for config in tqdm(world_configs, desc="Running Configurations"):
    n = config["n"]
    num_pits = config["num_pits"]
    num_wumpus = config["num_wumpus"]
    config_description = config["description"]

    for i in tqdm(range(num_trials_per_config), desc=f"  Trials for {n}x{n}, {num_pits}P, {num_wumpus}W"):
        current_seed = i + 1000 # Use a distinct seed for each trial
        # Generate world. Use a local random.Random instance for `generate_world_percept_friendly`
        # to ensure reproducibility per trial, while the outer loop iterates seeds.
        rng_world_gen = random.Random(current_seed)
        try:
            world_data = generate_world_percept_friendly(
                n=n, num_pits=num_pits, num_wumpus=num_wumpus,
                seed=rng_world_gen.randint(0, 10_000_000), # Generate a specific world seed
                require_start_clear=True,
                min_percept_hits_on_safe_path=1,
                max_tries=2000 # Increased tries for world generation
            )

            # Run the agent on the generated world
            agent_result = online_probabilistic_agent_noisy_sensors(
                world_data, verbose=False, max_steps=1000 # Increased max_steps for agent
            )

            all_results.append({
                "n": n,
                "num_pits": num_pits,
                "num_wumpus": num_wumpus,
                "config_description": config_description,
                "trial": i,
                "seed": current_seed,
                "success": agent_result["success"],
                "steps": len(agent_result["trace"]) - 1 if agent_result["trace"] else 0, # -1 because trace includes start
                "stopped_reason": agent_result["stopped_reason"]
            })
        except RuntimeError as e:
            all_results.append({
                "n": n,
                "num_pits": num_pits,
                "num_wumpus": num_wumpus,
                "config_description": config_description,
                "trial": i,
                "seed": current_seed,
                "success": False,
                "steps": None,
                "stopped_reason": f"World generation failed: {e}"
            })

results_df = pd.DataFrame(all_results)

print("\n--- Raw Results Sample ---")
print(results_df.head())



--- Raw Results Sample ---
   n  num_pits  num_wumpus       config_description  trial  seed  success  steps                                   stopped_reason
0  4         1           1  Small grid, low hazards      0  1000    False      3             No safe frontier or gold to explore.
1  4         1           1  Small grid, low hazards      1  1001    False      5             No safe frontier or gold to explore.
2  4         1           1  Small grid, low hazards      2  1002    False      9             No safe frontier or gold to explore.
3  4         1           1  Small grid, low hazards      3  1003    False      3             No safe frontier or gold to explore.
4  4         1           1  Small grid, low hazards      4  1004    False      2  No safe path to exit found while carrying gold.


#### Step-by-step Agent Execution (Verbose Mode)

In [21]:
# Define a small world configuration for verbose output
n_verbose = 4
num_pits_verbose = 1
num_wumpus_verbose = 1
seed_verbose = 42 # A fixed seed for reproducibility

print(f"Generating a {n_verbose}x{n_verbose} world with {num_pits_verbose} pit(s) and {num_wumpus_verbose} wumpus(es) (seed: {seed_verbose}).")

# Generate a world for verbose demonstration
world_verbose = generate_world_percept_friendly(
    n=n_verbose,
    num_pits=num_pits_verbose,
    num_wumpus=num_wumpus_verbose,
    seed=seed_verbose,
    require_start_clear=True,
    min_percept_hits_on_safe_path=1,
    max_tries=1000
)

print("Base world (instructor view):")
pretty_print_world(world_verbose)
print("\nInitial world with Breeze/Stench flags (glitter omitted):")
pretty_print_percepts(world_verbose, agent_pos=tuple(world_verbose["start"]))
print("\nRunning agent in verbose mode...")

# Run the agent in verbose mode
agent_verbose_result = online_probabilistic_agent_noisy_sensors(
    world_verbose, verbose=True, max_steps=100
)

print("\n--- Final Summary (Verbose Run) ---")
print("Success:", agent_verbose_result["success"])
print("Have Gold:", agent_verbose_result["have_gold"])
print("Total Steps:", len(agent_verbose_result["trace"]) - 1)
print("Stopped Reason:", agent_verbose_result["stopped_reason"])


Generating a 4x4 world with 1 pit(s) and 1 wumpus(es) (seed: 42).
Base world (instructor view):
4x4 Base World
 S  .  .  P
 G  .  .  .
 .  .  .  .
 .  W  .  X

Initial world with Breeze/Stench flags (glitter omitted):
4x4 World (B=Breeze, S=Stench; Agent=A)
    A     .  .(B)     .
    .     .     .  .(B)
    .  .(S)     .     .
 .(S)     .  .(S)     .

Running agent in verbose mode...

--- Step 1 ---
Current position: (1, 1)
Have gold: False
Percepts at (1, 1): Breeze=False, Stench=False, Glitter=False
Step 1: Action=MoveS -> pos=(2, 1)

--- Step 2 ---
Current position: (2, 1)
Have gold: False
Percepts at (2, 1): Breeze=False, Stench=False, Glitter=True
Action=Grab at (2, 1).

--- Step 3 ---
Current position: (2, 1)
Have gold: True
Percepts at (2, 1): Breeze=False, Stench=False, Glitter=True
No safe path to exit found while carrying gold.

--- Final Summary (Verbose Run) ---
Success: False
Have Gold: True
Total Steps: 1
Stopped Reason: No safe path to exit found while carrying gold.


In [22]:
print("\n--- Analysis Summary ---")
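# success_rate: fraction of trials that succeeded.
# avg_steps: mean step count over successful trials only (NaN when a configuration has none).
summary_results = results_df.groupby(["n", "num_pits", "num_wumpus", "config_description"]).agg(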
    success_rate=('success', lambda x: (x == True).sum() / len(x)),
    avg_steps=('steps', lambda x: x[x.notnull() & (results_df.loc[x.index, 'success'] == True)].mean()),
    num_trials=('success', 'size')
).reset_index()

print(summary_results)



--- Analysis Summary ---
   n  num_pits  num_wumpus                   config_description  success_rate  avg_steps  num_trials
0  4         1           1              Small grid, low hazards      0.000000        NaN          15
1  4         2           1         Small grid, moderate hazards      0.000000        NaN          15
2  6         3           1        Medium grid, moderate hazards      0.000000        NaN          15
3  6         5           2            Medium grid, high hazards      0.000000        NaN          15
4  7         4           1  Medium grid, moderate hazards (7x7)      0.000000        NaN          15
5  7         6           2      Medium grid, high hazards (7x7)      0.066667       56.0          15
