**This project is due Wednesday, October 22, 2025 at 11:59 pm. Please plan ahead and submit your work on time.**

<center><h1>Grad Project #1</h1></center>

In this first project for 16.413 / 6.4132, we will look at the search and rescue domain using symbolic representations. We will cover both planning using PDDL and inference using propositional logic. This project does not focus on implementing specific algorithms, however; we will use two libraries: `pyperplan` and `sympy`. (You'll remember them from problem sets 3 and 4.)

The project is structured as follows: 

0. [Credit for Contributors (required)](#contributors)
1. [Planning in Search and Rescue (15 points)](#planning_in_sar)
    1. [Search and Rescue Warmup 1 (5 points)](#warmup_1)
    2. [Search and Rescue Warmup 2 (5 points)](#warmup_2)
    3. [Search and Rescue PDDL Planner (5 points)](#sar_pddl)
2. [Inference from Observations (20 points)](#inference)
    1. [Inferring unknown values (10 points)](#infer_unknowns)
    2. [Belief update (10 points)](#belief_update)
3. [Putting It Together (35 points)](#integrated)
    1. [Safe but not so smart (7 points)](#safe_no_smart)
    2. [Safe and smart (7 points)](#safe_smart)
    3. [Reckless (7 points)](#reckless)
    4. [Safe and smart if possible, else reckless (7 points)](#safe_smart_reckless)
    5. [Looks before it leaps (7 points)](#look_before)
4. [Analysis (30 points)](#analysis)
5. [Feedback](#feedback)

## <a id="contributors"></a>0. Credit for Contributors

List the various students, lecture notes, or online resources that helped you complete this project:

Ex: I worked with Bob on the inference.

<div class="alert alert-info">
Write your answer in the cell below this one.
</div>

--> *(double click on this cell to delete this text and type your answer here)*

## Imports and Utilities

In [None]:
import time
import numpy as np
from sympy import Symbol, And, Or, satisfiable
from utils import *

%load_ext autoreload
%autoreload 2
from principles_of_autonomy.grader import Grader
from principles_of_autonomy.notebook_tests.proj_1 import TestProj1

In [None]:
def test_policy(belief_map, true_map, problem, policy):
    """Test a policy on a SearchAndRescue problem.

    Args:
        belief_map: A numpy array specifying the belief map
        true_map:   A numpy array specifying the state map
        problem:    A SearchAndRescueProblem instance
        policy:     A policy returned by a policy making fn.
                    e.g. make_planner_policy(problem, planner)
    """
    height, width = true_map.shape
    bottom, right = height - 1, width - 1
    robot = (0, right)
    hospital = (bottom, right)
    people = {'pp': (bottom, right - 1)}  # Peter Parker
    carrying = None
    # Environment state
    env_state = State(robot=robot,
                      hospital=hospital,
                      people=people,
                      carrying=carrying,
                      state_map=true_map)
    # Initial belief: built from the given belief map (may contain unknown cells)
    b0 = BeliefState(robot=robot,
                     hospital=hospital,
                     people=people,
                     carrying=carrying,
                     state_map=belief_map)
    # Do it
    return agent_loop(problem, env_state, policy, b0)

## <a id="planning_in_sar"></a> 1. Planning in Search and Rescue (15 points)

Let's say that we have a "search and rescue" robot who is charged with navigating a sometimes dangerous grid to find and help people in need. We will restrict ourselves to a deterministic problem domain; we will however look at both fully observed and partially observed cases.

Let us first focus on a single planning problem in the search-and-rescue domain, illustrated below:

![search and rescue problem](sar_problem.png)

This problem features four "people" (bears) with names p1, p2, p3, and p4. A robot, initialized in the top left corner, should navigate to each person, pick them up one by one, and deliver them to the hospital (bottom right).

* We always know the locations of the people, the robot, and the hospital. 
* Some locations may have walls in them.  **You know about all the walls in advance, as well.**
* There is also fire!  And smoke!  But, initially, you may not know which locations have fire and/or smoke. Therefore, you need to move around in the domain, make observations, update your belief (e.g., infer whether there may be fire or smoke in the cells you haven't observed yet), make plans, and rescue bears.
* Whenever the robot enters a grid cell, it observes the true environment state of all the neighboring cells.  Each environment cell contains exactly one of:  wall ('W'), fire ('F'), smoke ('S') or nothing (clear, 'C').
* The robot may safely enter any cell that is clear ('C') or contains smoke ('S').

We will first consider planning to navigate to, pick up, and drop off people at a hospital, and we are going to ask you to put this all together into a planning and execution system!

Take a look at the code.  `State` is a class with the following attributes:

* "state_map": a 2D numpy array of characters 'W', 'F', 'S', 'C'.
* "robot": A (row, col) representing the robot's loc.
* "hospital": A (row, col) representing the hospital's loc.
* "carrying": The str name of a person being carried, or None, if no person is being carried.
* "people": A dict mapping str people names to (row, col) locs. If a person is being carried, they do not appear in this dict.

States have a couple of useful methods: `render` returns a string representation of the state (which can be printed) and `copy` does what you would expect. `get_safe_grid` returns a boolean numpy array where True represents a safe space (not fire or wall).
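
To make the idea of a safe grid concrete, here is a minimal sketch of how such a boolean mask could be computed from a character map with NumPy. The helper name `safe_mask` and the small map are made up for illustration; the actual `get_safe_grid` in `utils` may be implemented differently.

```python
import numpy as np

def safe_mask(state_map):
    """Hypothetical helper: True where a cell is clear ('C') or smoke ('S')."""
    return np.isin(state_map, ["C", "S"])

# A made-up 2x3 map, not the project map.
example_map = np.array([
    ["C", "S", "F"],
    ["W", "C", "U"],
], dtype=str)

mask = safe_mask(example_map)
print(mask)
```

Note that in this sketch fire, walls, and unknown cells all come out as `False`, which corresponds to the careful reading of a belief map.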

Actions are strings:
* "up" / "down" / "left" / "right" : Moves the robot. The robot cannot move into obstacles or off the map.
* "pickup-{person}": If the robot is at the person, and if the robot is not already carrying someone, pick up this person.
* "dropoff": If the robot is carrying a person, they are dropped off at the robot's current location.  *Allow for there being multiple dropoff locations, even though we will only dropoff at hospitals in this example.*

Please now take a moment to read the docstring for `SearchAndRescueProblem` to make sure that you understand the state and action spaces.
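
As a rough illustration of the move actions, the sketch below maps each direction to a (row, col) offset and applies it with a bounds check. The helper `next_location` and its `blocked` argument are hypothetical, and leaving the robot in place on an illegal move is an assumption; check `SearchAndRescueProblem` for the real behavior.

```python
# Direction offsets as (row, col) deltas; row 0 is the top of the grid.
DELTAS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def next_location(loc, action, height, width, blocked=frozenset()):
    """Hypothetical helper: where a move action would take the robot.

    Returns the unchanged location when the move would leave the grid or
    enter a blocked cell (an assumption made for this sketch).
    """
    dr, dc = DELTAS[action]
    r, c = loc[0] + dr, loc[1] + dc
    if not (0 <= r < height and 0 <= c < width) or (r, c) in blocked:
        return loc
    return (r, c)

print(next_location((0, 0), "right", 3, 3))
print(next_location((0, 0), "up", 3, 3))
print(next_location((0, 0), "down", 3, 3, blocked={(1, 0)}))
```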

Finally, we're going to use a Python PDDL planner called `pyperplan` [(more info here)](https://github.com/aibasel/pyperplan) to find our plans. 

Let's familiarize ourselves with `State` and `SearchAndRescueProblem`.

### <a id="warmup_1"></a> 1A. Search and Rescue Warmup 1 (5 points)

Let's make sure we can access fields of the SearchAndRescue State. Please write a function to check if a row and column have an obstacle in a SearchAndRescue State.

For reference, our solution is **1** line(s) of code.

In [None]:
def sar_warmup1(sar_state: State, row: int, col: int) -> bool:
    """Check if a row and col have an obstacle in a SearchAndRescueProblem
    state.

    Args:
      sar_state: A SearchAndRescue State.
      row: An int.
      col: An int.

    Returns:
      has_obstacle: True if (row, col) has an obstacle (wall) in sar_state.
    """
    raise NotImplementedError() 

In [None]:
# Test 1
Grader.run_single_test_inline(TestProj1, "test_01_warmup_1", locals())

### <a id="warmup_2"></a> 1B. Search and Rescue Warmup 2 (5 points)

Let's make sure we know how to encode a plan. Please write a function that returns a hard-coded list of actions that will deliver person 'p1' (in the image above) to the hospital location. (You'll need to work out the plan for yourself -- don't use `pyperplan` yet!)

For reference, our solution is **1** line(s) of code.

In [None]:
def sar_warmup2() -> list[str]:
    """Hand-code a list of actions that will deliver person 'p1' to the
    hospital location.

    Returns:
      actions: A list of str actions that will take person p1 to the hospital location.
    """
    raise NotImplementedError()

In [None]:
# Test 2
Grader.run_single_test_inline(TestProj1, "test_02_warmup_2", locals())

### <a id="sar_pddl"></a> 1C. Search and Rescue PDDL Planner (5 points)


Now that you're warmed up, let's try making a planner to solve a SearchAndRescueProblem!

The core function in this planner class is 'get_plan'. This function needs to do the following:
1. Create PDDL domain and problem strings for search and rescue. The operators should work for any grid size, obstacles, people locations, and hospital location.
2. Invoke `run_planning` using the given `search_algo` search algorithm with the `heuristic` heuristic.
3. Convert the output of `run_planning` (pyperplan Operators) into actions that can be executed, via `execute_plan`.

We have given you most of the structure of the needed functions, but you will need to look (carefully!) through the provided Python code to find the `TODO` sections that you need to complete.

For reference, 'get_plan' takes ~1-2 seconds to run with our implementation if using 'gbf' search and 'hff' heuristic. To get credit on Gradescope, make sure that your function finishes in <10 seconds.

**Notes**:
* In this problem, you will need to construct somewhat complicated strings.  We *strongly* encourage you to read about [Python-3 f-strings](https://www.digitalocean.com/community/tutorials/how-to-use-f-strings-to-create-strings-in-python-3) which make this process much easier than the alternatives.
* You may find `state.render()` useful for debugging.
* We also highly recommend printing out the domain and problem after they have been created, and copying them into [editor.planning.domains](http://editor.planning.domains) to check whether it's possible to find a plan. This editor can be helpful for syntax checking.
* We also recommend writing careful test cases for yourself --- it's really easy to forget preconditions or effects. When we were debugging this, we forgot to make sure that the robot was at the location of the person that it picked up, so the plans were (confusingly) super short! 
* The image above with the robot and the bears is a faithful depiction of the initial state. For example, the initial locations of the people are: `"p1": (4, 0), "p2": (6, 0), "p3": (0, 6), "p4": (3, 3)`.
* One part of this problem that may be initially counterintuitive is the way that we'll represent locations in PDDL. In the problem, a location is a tuple of integers. PDDL does not support such representations -- everything needs to be just an object with a string name.
So to represent a location like (3, 5), we will make a string `"l3-5"` (where the first character there is a lowercase L), and we'll create an object with that name, of type "location". We will also need a way to encode the fact that the robot can only move between adjacent locations in the grid.
In Python, we can compare the numeric values of locations like (3, 5) and (3, 6) to see if they are neighbors. But in PDDL, all we have are the objects with string names, and we need to encode everything in terms of predicates. So, we will create a predicate `(conn ?v0 - location ?v1 - location ?v2 - direction)`, which says that location `?v0` is connected to location `?v1` in direction `?v2`. For example, `(conn l3-5 l3-6 right)` might appear in the initial state. We can then use these `conn` predicates in the preconditions of a `move` operator to encode the fact that the robot can only move between adjacent locations.
* We do not recommend modelling the hospital explicitly with special objects / types / predicates. Instead, the goal should be to deliver all people to the hospital, that is, `l6-6`.
In words, the goal should be `"person1 is at l6-6 and person2 is at l6-6 and person3 is at l6-6 and person4 is at l6-6"`.
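
The location-naming convention described in the notes can be sketched with f-strings as follows. The helper names `loc_name` and `conn_atoms` are made up for this illustration; the loop mirrors the one already provided in `get_init_strs`, here run on a tiny 2x2 grid.

```python
def loc_name(r, c):
    """PDDL object name for grid location (r, c), e.g. (3, 5) -> 'l3-5'."""
    return f"l{r}-{c}"

DELTAS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def conn_atoms(height, width):
    """All (conn from to direction) atoms for a height x width grid."""
    atoms = []
    for r in range(height):
        for c in range(width):
            for direction, (dr, dc) in DELTAS.items():
                nr, nc = r + dr, c + dc
                # Skip neighbors that fall off the grid.
                if 0 <= nr < height and 0 <= nc < width:
                    atoms.append(
                        f"(conn {loc_name(r, c)} {loc_name(nr, nc)} {direction})")
    return atoms

atoms = conn_atoms(2, 2)
print(loc_name(3, 5))
print(len(atoms))
print("\n".join(atoms))
```

Each cell of a 2x2 grid has exactly two in-bounds neighbors, so printing the atoms is a quick way to check that every connection appears once per direction.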

In [None]:
class SearchAndRescuePlanner:
    """A planner for a search and rescue problem.

    The core function in this class is 'get_plan'
    This function does the following:
        1. Create PDDL domain and problem strings for search and rescue. The operators should work for any grid size, obstacles, people locations, and hospital location.
        2. Invoke `run_planning` using the given `search_algo` search algorithm with the `heuristic` heuristic.
        3. Convert the output of run_planning (pyperplan Operators) into actions
           that can be given to the SearchAndRescueProblem.

    Example Usage:
        problem = SearchAndRescueProblem()
        state = State()

        planner = SearchAndRescuePlanner(search_algo='astar', heuristic='lmcut')
        plan, plan_time = planner.get_plan(state)
        state = execute_plan(problem, plan, state)

    'get_plan' Returns:
        plan: A list of actions; each action is a str, see SearchAndRescueProblem.
        plan_time: Total planning time(sec) used for plan searching.

    For reference, 'get_plan' takes ~1-2 seconds to run with our implementation if using 'gbf' search and 'hff' heuristic.
    """

    def __init__(self, search_algo='astar', heuristic='lmcut'):
        self.search_algo = search_algo
        self.heuristic = heuristic

    def generate_domain_pddl(self,
                             domain_name,
                             added_operators='',
                             added_predicates=''):
        # <<< TODO: fill in missing parts in the PDDL domain below >>>
        predicates_str = """(conn ?v0 - location ?v1 - location ?v2 - direction)
        (is-clear ?v0 - location)
        ; TODO: write more here
        ; 
        """
        
        # <<< TODO: fill in missing parts in the PDDL domain below >>>
        operators_str = """(:action move-robot
    :parameters (?from - location ?to - location ?dir - direction)
    :precondition (and
      (conn ?from ?to ?dir)
      ; TODO: write more here
      ; 
    )
    :effect (and
      ; TODO: write more here
      ; 
    )
  )
  (:action pickup-person
    :parameters (?person - person ?loc - location)
    :precondition (and
      ; TODO: write more here
      ; 
    )
    :effect (and
      ; TODO: write more here
      ; 
    )
  )
  (:action dropoff-person
    :parameters (?person - person ?loc - location)
    :precondition (and
      ; TODO: write more here
      ; 
    )
    :effect (and
      ; TODO: write more here
      ; 
    )
  )"""

        domain_pddl = f"""(define (domain {domain_name})
    (:requirements :typing)
    (:types person location direction)
    (:constants
      down - direction
      left - direction
      right - direction
      up - direction
    )
    (:predicates
      {predicates_str}
      {added_predicates}
    )
    {operators_str}
    {added_operators}
)"""
        return domain_pddl

    def get_plan(self, state):
        search_algo, heuristic = self.search_algo, self.heuristic
        domain_name, added_predicate, added_operator = self.update_pddl_domain()
        domain_pddl = self.generate_domain_pddl(
            domain_name,
            added_operators=added_operator,
            added_predicates=added_predicate)
        # Create objects str
        obj_str = self.get_obj_strs(state)

        # Create init str
        init_str = self.get_init_strs(state)

        # Create goal str
        goal_str = self.get_goal_strs(state)

        problem_pddl = f"""(define (problem searchandrescue) (:domain {domain_name})
      (:objects
      {obj_str}
      )
      (:init
      {init_str}
      )
      (:goal (and {goal_str}))
    )"""

        start_time = time.time()
        plan = run_planning(domain_pddl, problem_pddl, search_algo, heuristic)
        time_elapsed = time.time() - start_time
        if plan is None:
            print("Failed to find a plan.")
            return None, time_elapsed

        # Convert operators to actions
        actions = self.parse_plan(plan)
        return actions, time_elapsed

    def get_obj_strs(self, state):
        height, width = state.state_map.shape
        objects_strs = [f"{person} - person" for person in state.people]
        # <<< TODO: add object strs for locations >>>
        raise NotImplementedError()
    
        if state.carrying is not None:
            objects_strs.append(f"{state.carrying} - person")
        objects_str = " ".join(objects_strs)
        return objects_str

    def get_init_strs(self, state):
        height, width = state.state_map.shape
        robot_r, robot_c = state.robot
        init_strs = [f"(robot-at l{robot_r}-{robot_c})"]
        for person, (r, c) in state.people.items():
            init_strs.append(f"(person-at {person} l{r}-{c})")
        if state.carrying is not None:
            init_strs.append(f"(carrying {state.carrying})")
        else:
            init_strs.append("(handsfree)")
            
        deltas = {
            "up": (-1, 0),
            "down": (1, 0),
            "left": (0, -1),
            "right": (0, 1),
        }
        
        safe_grid = state.get_safe_grid()
        for r in range(height):
            for c in range(width):
                # Here we're going to add one (conn ...) atom for every pair
                # of adjacent locations.
                for direction, (dr, dc) in deltas.items():
                    if not (0 <= r + dr < height and 0 <= c + dc < width):
                        continue
                    # For example, if r == 0, c == 0, dr == 0, dc == 1, then
                    # this line adds the atom (conn l0-0 l0-1 right).
                    init_strs.append(
                        f"(conn l{r}-{c} l{r + dr}-{c + dc} {direction})")
                # <<< TODO: add more init strs >>>
                raise NotImplementedError()

        init_str = " ".join(init_strs)
        return init_str

    def get_goal_strs(self, state):
        goal_strs = []
        hospital_r, hospital_c = state.hospital
        # <<< TODO: add goal strs >>>
        raise NotImplementedError()
        

        if state.carrying is not None:
            # <<< TODO: add goal strs >>>
            raise NotImplementedError()
            pass
        goal_str = " ".join(goal_strs)
        return goal_str

    def update_pddl_domain(self):
        domain_name = 'searchandrescue'
        added_predicate = ''
        added_operator = ''
        return domain_name, added_predicate, added_operator

    def parse_plan(self, plan):
        actions = []
        for op in plan:
            if "move-robot" in op.name:
                _, direction = op.name[:-1].rsplit(" ", 1)
                action = direction
            elif "pickup-person" in op.name:
                _, person, _ = op.name.split(" ")
                action = f"pickup-{person}"
            else:
                assert "dropoff-person" in op.name
                action = "dropoff"
            actions.append(action)
        return actions

Visualize the result to make sure your planner does what you expect.

In [None]:
sar_pddl_problem = SearchAndRescueProblem()
sar_pddl_planner = SearchAndRescuePlanner(search_algo="gbf", heuristic="hff")
sar_pddl_state = State()
sar_pddl_plan, plan_time = sar_pddl_planner.get_plan(sar_pddl_state)
result = execute_plan(problem=sar_pddl_problem, state=sar_pddl_state, plan=sar_pddl_plan)

In [None]:
# Test 3
Grader.run_single_test_inline(TestProj1, "test_03_sar_pddl", locals())

## <a id="inference"></a> 2. Inference from Observations (20 points)


The real world is **partially observable**: the robot can only see part of the world at a time. To handle this, we introduce the idea of a **belief**. A belief is simply the robot’s current knowledge about the world. It may include some unknowns, and as the robot moves and makes observations, the belief gets updated. In this project, a belief will look like the grid maps you’ve seen already, except with `"U"` entries for unknown cells. As the robot observes cells, `"U"` entries will be replaced with `"C"`, `"F"`, `"S"`, or `"W"`. We will also use logical rules (inference) to fill in what must be true.

Now, let's look at the inference problem.  This is similar to the problem we saw in PSet 3. We will consider several problems with varying grid sizes and different sets of observations. For example, consider the grid below:
```
# Fire, Unknown, Clear, Smoke, Wall
GRID0 = np.array([
  ["F", "U", "C"],
  ["W", "C", "U"],
  ["U", "U", "C"]
], dtype=object)
```
This grid has 9 locations and 5 observations: there is fire in the top left, a wall below it, and the center, top right, and bottom right locations are all known to be clear of smoke and fire.

We will assume the following axioms:
1. Each location has exactly one of {smoke, fire, clear, wall}.

2. There is smoke at a location only if there is a fire in at least one of the adjacent (above, below, left, right) locations. Diagonals are not adjacent!

3. There is smoke _or_ fire at a location if there is a fire in at least one of the adjacent locations, unless that location is known to be a wall ('W').

Take a moment to run your human inference engine: which unknown values in the grid above can be determined?
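
As a minimal sketch of how the axioms above might be encoded with `sympy`, consider the 1x2 grid `[["U", "F"]]`. The symbol names and the helpers `exactly_one` and `entails` are assumptions made for this illustration, not part of the provided utilities. Because walls are known in advance, the unknown cell is assumed not to be a wall.

```python
from sympy import Symbol, And, Or, Not, Implies, satisfiable

# Hypothetical symbol names for the 1x2 grid [["U", "F"]].
F0, S0, C0 = Symbol("F_0_0"), Symbol("S_0_0"), Symbol("C_0_0")
F1 = Symbol("F_0_1")

def exactly_one(*props):
    """True when exactly one of the given propositions holds."""
    at_least_one = Or(*props)
    at_most_one = And(*[
        Or(Not(a), Not(b))
        for i, a in enumerate(props)
        for b in props[i + 1:]
    ])
    return And(at_least_one, at_most_one)

kb = And(
    F1,                        # observation: fire at (0, 1)
    exactly_one(F0, S0, C0),   # axiom 1 for the unknown cell (no wall)
    Implies(S0, F1),           # axiom 2: smoke only next to a fire
    Implies(F1, Or(S0, F0)),   # axiom 3: next to a fire means smoke or fire
)

def entails(kb, query):
    """kb entails query iff (kb AND NOT query) is unsatisfiable."""
    return not satisfiable(And(kb, Not(query)))

print(entails(kb, Or(S0, F0)))  # is the cell certainly smoke or fire?
print(entails(kb, S0))          # is it certainly smoke?
print(entails(kb, F0))          # is it certainly fire?
```

A cell's value counts as determined only when one specific value is entailed. Here the disjunction is entailed but neither disjunct is, so the cell stays `"U"`.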


### <a id="infer_unknowns"></a> 2A. Inferring unknown values (10 points)

Please write a program that takes a grid as input and infers unknown values.

Your program should output a new grid with all determinable unknown values replaced with the inferred value. If an unknown value cannot be determined, it should be left unknown.

**Your program should use sympy.**

For reference, our solution is **63** line(s) of code.

In [None]:
def infer_unknown_values(grid):
    """Fill in any unknown values in the grid that can be inferred.

    Args:
      grid: A list of lists of "F", "U", "S", "W", or "C".

    Returns:
      inferred_grid: A copy of grid with some unknown values replaced.

    Example:
      >> grid = [
      >>   ["F", "U", "C"],
      >>   ["W", "C", "U"],
      >>   ["U", "U", "C"]
      >> ]
      >> infer_unknown_values(grid)
      [["F", "S", "C"],
       ["W", "C", "C"],
       ["U", "U", "C"]]
    """
    raise NotImplementedError() 

In [None]:
# Note: we're providing these unit tests to help you. 
# The grading tests are not just these tests.

assert infer_unknown_values([["U", "F"]]) == [["U", "F"]]
assert infer_unknown_values([["F", "U", "C"], ["S", "C", "U"], ["U", "U", "C"]]) == [["F", "S", "C"], ["S", "C", "C"], ["U", "U", "C"]]
assert infer_unknown_values([["U", "C", "C"], ["S", "C", "U"], ["U", "U", "C"]]) == [["C", "C", "C"], ["S", "C", "C"], ["F", "S", "C"]]
assert infer_unknown_values([["U", "S", "C", "U"], ["U", "U", "C", "U"], ["U", "S", "C", "U"]]) == [["F", "S", "C", "C"], ["U", "U", "C", "C"], ["F", "S", "C", "C"]]
assert infer_unknown_values([["U", "U", "C", "U", "U", "U", "U", "U"], ["C", "U", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "C", "C"], ["U", "U", "U", "U", "U", "U", "C", "C"], ["U", "C", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "F", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "U", "U"]]) == [["C", "C", "C", "U", "U", "U", "U", "U"], ["C", "U", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "C", "C"], ["U", "U", "U", "U", "U", "U", "C", "C"], ["U", "C", "U", "U", "U", "U", "U", "U"], ["U", "U", "U", "F", "U", "U", "U", "U"], ["U", "U", "U", "U", "U", "U", "U", "U"]]
assert infer_unknown_values([["C", "U", "C", "U", "U", "C", "U"], ["U", "W", "W", "U", "C", "W", "W"], ["U", "F", "U", "U", "U", "F", "U"], ["C", "S", "W", "C", "U", "U", "U"], ["U", "U", "W", "U", "W", "U", "U"], ["C", "C", "U", "C", "U", "W", "U"], ["U", "W", "C", "U", "W", "U", "C"]]) == [["C", "C", "C", "C", "C", "C", "C"], ["C", "W", "W", "C", "C", "W", "W"], ["S", "F", "U", "U", "S", "F", "U"], ["C", "S", "W", "C", "U", "U", "U"], ["C", "C", "W", "C", "W", "U", "U"], ["C", "C", "C", "C", "C", "W", "U"], ["C", "W", "C", "C", "W", "C", "C"]]
assert infer_unknown_values([["C", "U", "C", "U", "U", "C", "U"], ["U", "W", "W", "U", "C", "W", "W"], ["U", "F", "U", "U", "U", "F", "U"], ["C", "S", "W", "C", "U", "F", "U"], ["U", "U", "W", "U", "W", "U", "U"], ["C", "C", "U", "C", "U", "W", "F"], ["U", "W", "C", "U", "W", "U", "U"]]) == [["C", "C", "C", "C", "C", "C", "C"], ["C", "W", "W", "C", "C", "W", "W"], ["S", "F", "U", "U", "S", "F", "U"], ["C", "S", "W", "C", "S", "F", "U"], ["C", "C", "W", "C", "W", "U", "U"], ["C", "C", "C", "C", "C", "W", "F"], ["C", "W", "C", "C", "W", "U", "U"]]
print('Tests passed.')

In [None]:
# Test 4
Grader.run_single_test_inline(TestProj1, "test_04_infer_unknown", locals())

### <a id="belief_update"></a> 2B. Belief update (10 points)

When the robot acts in the environment, its knowledge must change in three ways:
1. **Transition step** – move the robot’s position forward based on the chosen action
   (e.g., if the action is `down`, the robot moves one square down).
2. **Observation step** – incorporate any new sensor information about nearby cells.
3. **Inference step** – apply logical rules to replace `"U"` with `"C"`, `"F"`, or `"S"`
   wherever those values can be deduced.

This process is called a **belief update**. It is nothing more than keeping track of what the robot now knows after acting and looking around.
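
The observation step on its own can be sketched on plain lists as below. The helper `apply_observation` is hypothetical, and the observation is assumed to be a dict from (row, col) locations to cell entries; the transition and inference steps are deliberately left out.

```python
import copy

def apply_observation(belief_grid, obs):
    """Hypothetical helper: overwrite belief entries with observed values.

    belief_grid: list of lists of 'U', 'C', 'F', 'S', 'W'.
    obs: dict mapping (row, col) to the observed entry (assumed format).
    """
    updated = copy.deepcopy(belief_grid)  # leave the input belief untouched
    for (r, c), entry in obs.items():
        updated[r][c] = entry
    return updated

belief = [["U", "U", "U"],
          ["U", "U", "U"]]
obs = {(0, 0): "C", (0, 1): "S", (1, 0): "S"}
new_belief = apply_observation(belief, obs)
print(new_belief)
```

Copying before writing keeps the old belief available for comparison, which can help when debugging an update.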

Now let us use our ability to infer unknown values to finish the implementation of the update method for BeliefState.

Make sure to look at the utilities defined at the top of this notebook, although you may not need to use all of them.

For reference, our solution is **12** line(s) of code.

In [None]:
class BeliefState(State):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        if "state_map" not in kwargs:
            self.state_map = np.array([['U', 'U', 'U', 'U', 'U', 'U', 'U'],
                                       ['U', 'W', 'W', 'U', 'U', 'W', 'W'],
                                       ['U', 'U', 'U', 'U', 'U', 'U', 'U'],
                                       ['U', 'U', 'W', 'U', 'U', 'U', 'U'],
                                       ['U', 'U', 'W', 'U', 'W', 'U', 'U'],
                                       ['U', 'U', 'U', 'U', 'U', 'W', 'U'],
                                       ['U', 'W', 'U', 'U', 'W', 'U', 'U']],
                                      dtype=str)

    def update(self, problem, obs, action=None):
        """
        problem: SearchAndRescueProblem instance
        obs: {loc: entry, loc: entry,...}
        action: string or None

        # <<< TODO: >>>
            1. Do transition from action (if any)
            2. Update from observation
            3. Do inference
        """
        raise NotImplementedError()

    def get_optimistic_state(self):
        """Returns a copy of the belief with a completed map in which Unknowns
        are assumed to be Clear."""
        new_state = self.copy()
        new_state.state_map[self.state_map == 'U'] = 'C'
        return new_state

    def get_careful_state(self):
        """Returns a copy of the belief.

        Unknown states will not be treated as safe, see get_safe_grid.
        """
        return self.copy()

### Tests

In [None]:
def beliefupdate_test1():
    state_map = np.array([["C", "S", "C", "C", "C"], ["S", "F", "S", "C", "C"],
                          ["S", "F", "S", "S", "S"], ["S", "F", "F", "F", "F"],
                          ["C", "S", "S", "S", "S"], ["C", "C", "C", "C", "C"]])
    beliefstate_map = np.array([["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"]])
    problem = SearchAndRescueProblem()
    state = State(state_map=state_map)
    bel = BeliefState(state_map=beliefstate_map)
    observation = problem.get_observation(state)
    new_bel = bel.update(problem, observation)
    assert new_bel.robot == (0, 0)
    assert new_bel.state_map.tolist() == [['C', 'S', 'U', 'U', 'U'],
                                          ['S', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U']]
beliefupdate_test1()


def beliefupdate_test2():
    state_map = np.array([["C", "S", "C", "C", "C"], ["S", "F", "S", "C", "C"],
                          ["S", "F", "S", "S", "S"], ["S", "F", "F", "F", "F"],
                          ["C", "S", "S", "S", "S"], ["C", "C", "C", "C", "C"]])
    beliefstate_map = np.array([["U", "U", "U", "U", "U"],
                                ["S", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"],
                                ["U", "U", "U", "U", "U"]])
    problem = SearchAndRescueProblem()
    state = State(state_map=state_map)
    bel = BeliefState(state_map=beliefstate_map)

    new_state, _ = problem.get_next_state(state, 'down')
    observation = problem.get_observation(new_state)
    new_bel = bel.update(problem, observation, 'down')
    assert new_bel.robot == (1, 0)
    assert new_bel.state_map.tolist() == [['C', 'S', 'U', 'U', 'U'],
                                          ['S', 'F', 'U', 'U', 'U'],
                                          ['S', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U'],
                                          ['U', 'U', 'U', 'U', 'U']]
beliefupdate_test2()

In [None]:
# Test 5
Grader.run_single_test_inline(TestProj1, "test_05_belief_update", locals())

## <a id="integrated"></a> 3. Putting It Together (35 points)

It's time to bring the planning and the inference together! Look at the procedure `agent_loop` that takes as input:

* "initial_state": a true state of the environment, which includes a state map, as well as the current locations of the hospital, the robot, and the people, and whether the robot is carrying someone.
* "initial_belief": an initial belief state, which is like an environment state, except that the state map may include 'U' characters indicating that the contents of a location are unknown.
* "policy": a procedure that takes in a belief state and returns the next action to take. The action can be one of the original actions, `*Success*` (which means the goal has been achieved and all the people are at the hospital), or `*Failure*` (which means that the agent is certain that the goal is impossible to achieve).
* "max_steps": the maximum number of steps to run the simulation, to avoid infinite loops.

We are going to ask you to write five different policies for this domain.

1. Safe but not so smart
1. Safe and smart
1. Reckless
1. Safe and smart if possible, else reckless
1. Looks before it leaps

### <a id="safe_no_smart"></a> 3A. Safe but not so smart (7 points)

The agent is scared and just running directly to the hospital. However, it at least takes observations into account as it moves.

Let's make an agent that:

* Updates the belief state based on every observation using propositional inference (implement the `belief.update` method that gets called in `agent_loop`).

* Executes a policy that, given the currently updated belief, checks to see whether it can move safely into a square that is closer (in Manhattan distance) to the hospital.  If so, it returns the action that would move it closer.  If it reaches the hospital, it returns `*Success*`.  If it cannot safely make a move (due to walls or fire) closer to the hospital, it waves its arms in anguish and returns `*Failure*`.
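
The Manhattan-distance test described above can be sketched on a plain boolean grid as follows. The helper names and the `safe` grid format are assumptions for illustration only and do not use the `BeliefState` API.

```python
DELTAS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def manhattan(a, b):
    """Manhattan distance between two (row, col) locations."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def closer_safe_moves(robot, target, safe):
    """Hypothetical helper: actions that enter a safe cell nearer to target.

    safe: list of lists of bool, True where the cell is known to be safe.
    """
    height, width = len(safe), len(safe[0])
    moves = []
    for action, (dr, dc) in DELTAS.items():
        r, c = robot[0] + dr, robot[1] + dc
        if not (0 <= r < height and 0 <= c < width):
            continue  # off the map
        if safe[r][c] and manhattan((r, c), target) < manhattan(robot, target):
            moves.append(action)
    return moves

safe = [[True, True],
        [False, True]]
print(closer_safe_moves((0, 0), (1, 1), safe))
print(closer_safe_moves((1, 1), (1, 1), safe))
```

An empty list corresponds to the case where no safe move brings the robot closer, either because it has arrived or because it is stuck.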

For reference, our solution is **18** line(s) of code.


In [None]:
def make_greedy_policy(problem):
    def policy(belief):
        """Returns an action, '*Success*', or '*Failure*'."""
        raise NotImplementedError() 

    # return the policy function
    return policy

In a completely clear map, the robot should be able to make it to the hospital.

In [None]:
def greedy_in_empty_map():
    problem = SearchAndRescueProblem()
    policy = make_greedy_policy(problem)

    # Empty map
    state = State()
    state.state_map[:, :] = 'C'
    bel = BeliefState()
    bel.state_map[:, :] = 'C'

    s_or_f, final_state, final_bel = agent_loop(problem, state, policy, bel)
    print('Final robot location', final_state.robot)
    return s_or_f, final_state, final_bel

greedy_in_empty_result = greedy_in_empty_map()

Using our default map, the robot should at least make it a bit closer to the hospital.

In [None]:
def greedy_in_default():
    problem = SearchAndRescueProblem()

    # Use default map
    state = State()
    bel = BeliefState()

    policy = make_greedy_policy(problem)
    s_or_f, final_state, final_bel = agent_loop(problem, state, policy, bel)
    r, c = final_state.robot
    hr, hc = final_state.hospital
    distance = abs(hr - r) + abs(hc - c)
    print('Final robot location', final_state.robot)
    print('Final distance (should be < initial distance) =', distance)
    return distance 

greedy_in_default_distance = greedy_in_default()

In [None]:
# Test 6
Grader.run_single_test_inline(TestProj1, "test_06_safe_not_smart", locals())

### <a id="safe_smart"></a> 3B. Safe and smart (7 points)

Let's try to make an agent that is both safe and a little smarter (although probably still a bit too conservative).

We will continue to run the belief update method that you implemented above for the rest of these cases, so we only need to worry about the policy.  

* The first time the policy is called to generate an action, it should formulate, in PDDL, a planning problem to find a complete plan for moving people to the hospital that only traverses squares that are *known* not to have fire or walls (i.e., unknown cells are not traversable either), and returns the first step.

* On subsequent calls to the policy, it should just return the next step of the plan.
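
The plan-once, replay-afterwards pattern in these two bullets can be sketched with a closure. This is an illustration only: the `planner` callable is assumed to return a list of actions (or `None` when no plan exists), the `problem` argument is omitted, and treating an exhausted plan as `'*Success*'` assumes the plan really achieves the goal.

```python
def make_cached_plan_policy(planner):
    """planner(belief) is assumed to return a list of actions, or None."""
    status = {'plan': None, 'step': 0}

    def policy(belief):
        if status['plan'] is None:  # first call: plan once
            status['plan'] = planner(belief)
            status['step'] = 0
            if status['plan'] is None:
                return '*Failure*'  # no plan through known-safe squares
        if status['step'] >= len(status['plan']):
            return '*Success*'  # plan exhausted (assumed to reach the goal)
        action = status['plan'][status['step']]
        status['step'] += 1
        return action

    return policy


planner_calls = []


def stub_planner(belief):
    """Hypothetical planner that records each call and returns a fixed plan."""
    planner_calls.append(belief)
    return ['a', 'b']


cached_policy = make_cached_plan_policy(stub_planner)
trace = [cached_policy('belief') for _ in range(3)]
print(trace, len(planner_calls))
```

The planner is invoked exactly once no matter how many times the policy is called, which is what makes this agent cheap to run but unable to react to new observations.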

Just as we did in Section 1, we want you to use `pyperplan`.

For reference, our solution is **15** line(s) of code.

In [None]:
def make_planner_policy(problem, planner):
    # Keep memory of plan and which step we're on
    status = {'plan': None, 'step': None}

    def policy(belief):
        """Returns an action string or '*Failure*' or '*Success*'."""
        raise NotImplementedError() 
    
    # return the policy function
    return policy

Our safe-and-smart agent should be able to successfully rescue all 4 people and deliver them to the hospital.

In [None]:
def sar_policy_test():
    problem = SearchAndRescueProblem()
    base_planner = SearchAndRescuePlanner(search_algo="gbf", heuristic="hff")

    def safe_smart_planner(state):
        plan, _ = base_planner.get_plan(state)  # discard planning time; avoids shadowing the time module
        return plan

    policy = make_planner_policy(problem, safe_smart_planner)
    state = State()
    # Observable
    bel = BeliefState(state_map=state.state_map)
    return agent_loop(problem, state, policy, bel)
sar_policy_results = sar_policy_test()

In [None]:
# Test 7
Grader.run_single_test_inline(TestProj1, "test_07_safe_smart", locals())

### <a id="reckless"></a> 3C. Reckless (7 points)

For this and the remaining questions, we're not going to autograde your solutions, and we're not providing test cases. We've given you boxes to hold your implementation, but you will need to define your own test cases, and in Section 4, we'll ask you to do some analysis.

First, let's try to make our agent more aggressive but still not step into the fire! As we saw in 3A, even if we make a plan that might cause us to move into the fire, just before we are about to take an action, we can know for sure whether there is fire in the square we are about to move into. So, let's make an (almost) reckless replanning agent.

* The first time the policy is called to generate an action, it should formulate in PDDL a planning problem to find a very optimistic plan for moving people to the hospital that only traverses squares that are **not known to have** fire or walls (i.e., unknowns are fair game), and returns the first step.

* On subsequent calls to the policy, if the next step in the plan is safe to execute given the updated belief, it should return that action.  Otherwise, it should make a new plan!
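
The check-then-replan behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the expected solution: `planner(belief)` is assumed to return a list of actions or `None`, `is_safe(belief, action)` is a hypothetical predicate, and the stub routes and action names exist only for the demo.

```python
def make_replanning_policy(planner, is_safe):
    """planner and is_safe are hypothetical stand-ins supplied by the caller."""
    status = {'plan': None, 'step': 0}

    def policy(belief):
        plan, step = status['plan'], status['step']
        # Replan on the first call, or when the next planned step is unsafe.
        if plan is None or (step < len(plan) and not is_safe(belief, plan[step])):
            plan, step = planner(belief), 0
            status['plan'], status['step'] = plan, step
            if plan is None:
                return '*Failure*'
        if step >= len(plan):
            return '*Success*'  # assumes a finished plan reached the goal
        if not is_safe(belief, plan[step]):
            return '*Failure*'  # even the fresh plan starts with an unsafe step
        status['step'] = step + 1
        return plan[step]

    return policy


# Demo: the "belief" is simply the set of actions currently known to be unsafe.
routes = [['enter-hall', 'cross-hall'], ['enter-yard', 'cross-yard']]


def stub_planner(blocked):
    """Return the first route that avoids every blocked action, else None."""
    return next((r for r in routes if not blocked & set(r)), None)


def stub_is_safe(blocked, action):
    return action not in blocked


replanning_policy = make_replanning_policy(stub_planner, stub_is_safe)
beliefs = [set(), {'cross-hall'}, {'cross-hall'}, {'cross-hall'}]
replan_trace = [replanning_policy(b) for b in beliefs]
print(replan_trace)
```

In the demo the agent commits to the hall route, learns that `cross-hall` is unsafe just before executing it, and switches to the yard route. The step already spent entering the hall is the price of optimism.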

In [None]:
def make_reckless_policy(problem, planner):
    # Keep memory of plan and which step we're on
    status = {'plan': None, 'step': None}

    def policy(belief):
        """Returns an action string or '*Failure*' or '*Success*'."""
        raise NotImplementedError() 
    
    # return the policy function
    return policy

Use the code below to run your policy and provide an execution trace. 

In [None]:
def reckless_policy_test():
    problem = SearchAndRescueProblem()
    base_planner = SearchAndRescuePlanner(search_algo="gbf", heuristic="hff")

    def reckless_planner(state):
        plan, _ = base_planner.get_plan(state)  # discard planning time; avoids shadowing the time module
        return plan

    policy = make_reckless_policy(problem, reckless_planner)
    state = State()
    # Observable
    bel = BeliefState(state_map=state.state_map)
    s_or_f, final_state, final_bel = agent_loop(problem, state, policy, bel)
reckless_policy_test() 

### <a id="safe_smart_reckless"></a> 3D. Safe and smart if possible, else reckless (7 points)

Try to construct a new policy that is a useful combination of "safe and smart" and "reckless" strategies, which combines the best aspects of each.

In [None]:
def make_hybrid_policy(problem, planner):
    # Keep memory of plan and which step we're on
    status = {'plan': None, 'step': None}

    def policy(belief):
        """Returns an action string or '*Failure*' or '*Success*'."""
        raise NotImplementedError() 
    
    # return the policy function
    return policy

Use the code below to run your policy and provide an execution trace. 

In [None]:
def hybrid_policy_test():
    problem = SearchAndRescueProblem()
    base_planner = SearchAndRescuePlanner(search_algo="gbf", heuristic="hff")

    def hybrid_planner(state):
        plan, _ = base_planner.get_plan(state)  # discard planning time; avoids shadowing the time module
        return plan

    policy = make_hybrid_policy(problem, hybrid_planner)
    state = State()
    # Observable
    bel = BeliefState(state_map=state.state_map)
    s_or_f, final_state, final_bel = agent_loop(problem, state, policy, bel)
hybrid_policy_test()

### <a id="look_before"></a> 3E. Looks before it leaps (7 points)

Another way to approach this problem is to plan in belief space. That sounds fancy, but it can actually be relatively simple to do:

* In your PDDL formulation, instead of just having a fluent `(is-clear ?loc)`, we'll have two fluents:  `(is-clear ?loc)` and `(is-unknown ?loc)`.

* The precondition for moving into a square should still be `(is-clear ?loc)`.

* You can add an operator that explicitly "looks" at a neighboring square;  we'll assume that it's optimistic and so if you look at a square that was previously unknown, it is now known to be clear.

* Note that you can define a [subclass](https://www.geeksforgeeks.org/python/create-a-python-subclass/) of `SearchAndRescuePlanner` and redefine the `update_pddl_domain` method to add to the previous PDDL domain definition.  You'll also need to change the `parse_plan` method to handle the new action and the `get_init_strs` method to handle the new facts.

* Make a replanning policy that plans in this belief-space formulation and executes its plan as long as it's safe, and replans otherwise.
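
The effect of the optimistic "look" operator is easiest to see on a set of ground facts. The sketch below models STRIPS-style actions in plain Python rather than PDDL. Only `is-clear` and `is-unknown` come from the text above; the `robot-at` predicate, the location names, and the helper functions are hypothetical, and the adjacency precondition a real "look" at a neighboring square would need is left out for brevity.

```python
def apply_action(facts, preconditions, add_effects, delete_effects):
    """Apply a STRIPS-style action to a frozenset of facts, or return None."""
    if not preconditions <= facts:
        return None  # some precondition does not hold
    return (facts - delete_effects) | add_effects


def look(loc):
    """Optimistic 'look': an unknown location becomes known to be clear."""
    return dict(
        preconditions=frozenset({('is-unknown', loc)}),
        add_effects=frozenset({('is-clear', loc)}),
        delete_effects=frozenset({('is-unknown', loc)}),
    )


def move(src, dst):
    """Moving still requires the destination to be known clear."""
    return dict(
        preconditions=frozenset({('robot-at', src), ('is-clear', dst)}),
        add_effects=frozenset({('robot-at', dst)}),
        delete_effects=frozenset({('robot-at', src)}),
    )


init = frozenset({('robot-at', 'a'), ('is-unknown', 'b')})
blocked = apply_action(init, **move('a', 'b'))  # not applicable yet
after_look = apply_action(init, **look('b'))
after_move = apply_action(after_look, **move('a', 'b'))
print(blocked, sorted(after_move))
```

Any plan that crosses an unknown square must therefore contain an explicit look before the move. At execution time the real observation may contradict the optimistic effect, which is when the policy has to replan.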

In [None]:
def make_belief_space_policy(problem, planner):
    # Keep memory of plan and which step we're on
    status = {'plan': None, 'step': None}

    def policy(belief):
        """Returns an action string or '*Failure*' or '*Success*'."""
        raise NotImplementedError() 
    
    # return the policy function
    return policy

Use the code below to run your policy and provide an execution trace. 

In [None]:
def belief_space_policy_test():
    problem = SearchAndRescueProblem()
    base_planner = SearchAndRescuePlanner(search_algo="gbf", heuristic="hff")

    def belief_space_planner(state):
        plan, _ = base_planner.get_plan(state)  # discard planning time; avoids shadowing the time module
        return plan

    policy = make_belief_space_policy(problem, belief_space_planner)
    state = State()
    # Observable
    bel = BeliefState(state_map=state.state_map)
    s_or_f, final_state, final_bel = agent_loop(problem, state, policy, bel)
belief_space_policy_test()

## <a id="analysis"></a> 4. Analysis (30 points)

In the following questions, please provide textual answers in the provided boxes. 

**First-order Logic:** In what way would having access to first-order logic have been helpful in this problem?

<div class="alert alert-info">
Write your answer in the cell below this one.
</div>

--> *(double click on this cell to delete this text and type your answer here)*

**Heuristics:** The heuristics used in `pyperplan` are *domain-independent*; we can use them for Search and Rescue, for blocks world, etc.  An alternative strategy would be to hand-specify a *domain-specific* heuristic. What would a good domain-specific heuristic for search and rescue look like?

<div class="alert alert-info">
Write your answer in the cell below this one.
</div>

--> *(double click on this cell to delete this text and type your answer here)*

**Look first:** We do one observation before choosing our first action.  Give an example scenario where omitting this step and using the reckless or two-phase planner would make an error.

<div class="alert alert-info">
Write your answer in the cell below this one.
</div>

--> *(double click on this cell to delete this text and type your answer here)*

**Replanning:** Currently, for the policies in 3C-3D, we replan whenever executing the next step would be unsafe. That might not be the best replanning strategy. Describe another strategy, give a concrete example of where it would behave differently from the current strategy, and say what the general trade-offs would be between that one and the current one.

<div class="alert alert-info">
Write your answer in the cell below this one.
</div>

--> *(double click on this cell to delete this text and type your answer here)*

**Planner Analysis:** Using the `test_policy` function, run each of the following three planners:
* The safe-and-smart planner
* The reckless planner
* The belief-space (looks before it leaps) planner

in each of the following three scenarios:
* belief_map = P1_B0, true_map = P1_G0
* belief_map = P1_B1, true_map = P1_G0
* belief_map = P2_B1, true_map = P2_G0

(These belief maps and true maps are all defined in the utility code at the top of this notebook.) 

In your answer below, don't count "look" as a step.
* How many steps does each policy take to solve each problem?
* Which method takes the fewest steps summed over all three problems?
* Which one would you choose if planning is very expensive compared to execution?
* Which one would you choose if execution is very expensive compared to planning? (In this case, what additional modifications might you make to your method?)

<div class="alert alert-info">
Write your answer in the cell below this one.
</div>

--> *(double click on this cell to delete this text and type your answer here)*

# Submission 

Your final submission to Gradescope should include this notebook, with all cells run to display your example runs and your text answers.

In [None]:
# Run all tests
Grader.grade_output([TestProj1], [locals()], "results.json")
Grader.print_test_results("results.json")

In [None]:
# Package submission
# Make sure you save the notebook before running this cell so that the most updated version is zipped!
Grader.prepare_submission("Project01_release")

## Feedback <a id="feedback"></a>

If you have any feedback for us, please complete [this form](https://forms.gle/auNaHZ9sJyKcGJ4s9)!