# AI Agents

In this assignment, your task is to implement ... The algorithms are a recap from your earlier Algorithms and Data Structures class, but the input is slightly different: the graph is not materialized. Instead, it is generated dynamically and you can access a single node of the graph at a time.

## Agent

We begin by considering a generic problem-solving agent. The agent receives *percepts* from its *environment* through *sensors* and uses *actuators* to perform *actions* to influence the environment. The process of converting the percepts to the actions (denoted by the question mark in the picture below) is the core function of the agent.

![](aima-images/fig2_1.png)

This picture is very general and covers a broad range of agents, both software (e.g., a chatbot) and hardware (e.g., a robot vacuum cleaner). For now, we will simplify: we will ignore sensors and actuators and concentrate only on the core. We will implement it according to the following contract, represented as the class `Agent` (see the cell below) with a single method, `next_action`. The sole argument of the method is the percepts of the agent, and the method should return the action to be executed next.

In [1]:
class Agent:
    def next_action(self, percepts):
        ...
        return action
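
As a minimal illustration of this contract, the sketch below implements a reflex agent for a two-room vacuum world. The class name `ReflexVacuumAgent`, the percept format `(room, is_dirty)` and the action names are assumptions made for this example only; the contract itself does not fix the shape of the percepts.

```python
class ReflexVacuumAgent:
    """Hypothetical agent; in the notebook it would subclass `Agent`."""

    def next_action(self, percepts):
        # Assumed percept format: (room index, whether that room is dirty).
        room, is_dirty = percepts
        if is_dirty:
            return "Suck"
        # Otherwise move to the other of the two rooms.
        return "Right" if room == 0 else "Left"


agent = ReflexVacuumAgent()
print(agent.next_action((0, True)))   # Suck
print(agent.next_action((0, False)))  # Right
print(agent.next_action((1, False)))  # Left
```

Such an agent reacts to the current percept only and plans nothing ahead.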

## Problem

The environment above represents a real environment, meaning that any action in it will be visible to the user. This usually makes it unsuitable for planning what action to take. For example, if one plans a car trip from Poznań to Berlin, one does not start driving blindly, hoping to find a route to Berlin. Instead, one takes a map and searches for a route on this model of the environment. We will combine this model with the definition of the problem to be solved, i.e., a starting position and a destination.

This is represented by the class `Problem` below. It has a single property `initial_state` that returns the starting position in the problem, e.g., the city of Poznań in the example about the trip. There are also three methods:
* `available_actions` returns, for a given `state` (e.g., a currently considered city), the actions to be considered (e.g., the neighbouring cities one could go to directly from the currently considered city).
* `do_action` returns the new state reached from the state `state` by executing the action `action` (e.g., the state representing that, being in Poznań and taking the highway to Warsaw, one ends up in Warsaw).
* `is_goal` returns `True` if `state` is (one of) the destination(s) for the agent in this particular problem.

In [2]:
class Problem:
    @property
    def initial_state(self):
        ...
        
    def available_actions(self, state):
        ...        
        
    def do_action(self, state, action):
        ...
        return new_state
    
    def is_goal(self, state) -> bool:
        ...
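
To make the trip example tangible, here is a sketch of how it could be expressed through the same four members. The class name `RouteProblem` and the `ROADS` map are hypothetical, and the road connections are illustrative rather than real road data. The class is written standalone so that the snippet runs on its own; in the notebook it would subclass `Problem`.

```python
class RouteProblem:
    """Hypothetical route-finding problem from Poznań to Berlin."""

    # Illustrative road map only, not real road data.
    ROADS = {
        "Poznań": ["Warsaw", "Szczecin", "Berlin"],
        "Warsaw": ["Poznań"],
        "Szczecin": ["Poznań", "Berlin"],
        "Berlin": ["Poznań", "Szczecin"],
    }

    @property
    def initial_state(self):
        return "Poznań"

    def available_actions(self, state):
        # An action is the name of a neighbouring city to drive to.
        return list(self.ROADS[state])

    def do_action(self, state, action):
        if action not in self.ROADS[state]:
            raise ValueError(f"No direct road from {state} to {action}")
        # Driving to a city means ending up in that city.
        return action

    def is_goal(self, state) -> bool:
        return state == "Berlin"


problem = RouteProblem()
state = problem.initial_state
print(problem.available_actions(state))
state = problem.do_action(state, "Szczecin")
print(state, problem.is_goal(state))
```

Here a state is simply a city name, and an action is identified by the city it leads to.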

To ground the problem definition in something more concrete, consider a small flat with two dirty rooms and a robotic vacuum cleaner, as depicted in the following picture:
![](aima-images/fig2_2.png)

In the model, there are two rooms (A, B), each of them is either dirty or clean and the robot can always go to the left or to the right (maybe bumping into a wall), or suck the dirt (even if there's no dirt).

This is a model. Maybe the real environment is similar: two very small, rectangular rooms such that a single "suck" of the robot cleans the whole room at once. Maybe it is very dissimilar: there are more rooms, they are of varying shapes, and actually cleaning them is a tedious task. The agent doesn't know and must thus assume that the `Problem` represents the environment well enough.

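Assume the initial state looks like the picture above, and there are two goal states: both rooms clean and the robot in either room A or room B. Let's formalize all this into the class `VacuumProblem`. A state is a pair `(robot, dirty)`: the index of the room the robot is in (0 for A, 1 for B) and a tuple of flags telling whether each room is dirty.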

In [3]:
class VacuumProblem(Problem):
    @property
    def initial_state(self):
        return (0, (True, True))
    
    def available_actions(self, state):
        return ["Left", "Suck", "Right"]
        
    def do_action(self, state, action):
        robot, dirty = state
        if action == "Left":
            return (max(robot-1, 0), dirty)
        elif action == "Suck":
            new_dirty = list(dirty)
            new_dirty[robot] = False
            return (robot, tuple(new_dirty))
        elif action == "Right":
            return (min(robot+1, len(dirty)-1), dirty)        
        raise Exception('Invalid action')
    
    def is_goal(self, state) -> bool:
        return not any(state[1])

Let's test it a bit:

In [4]:
problem = VacuumProblem()
state = problem.initial_state
state = problem.do_action(state, 'Right')

print("State after going right from the initial state", state)
print("Are we there yet?", problem.is_goal(state))

State after going right from the initial state (1, (True, True))
Are we there yet? False


In [5]:
state = problem.do_action(state, 'Right')

print("State after going right again", state)
print("Are we there yet?", problem.is_goal(state))

State after going right again (1, (True, True))
Are we there yet? False


In [6]:
state = problem.do_action(state, 'Suck')

print("State after sucking", state)
print("Are we there yet?", problem.is_goal(state))

State after sucking (1, (True, False))
Are we there yet? False


In [7]:
state = problem.do_action(state, 'Left')
state = problem.do_action(state, 'Left')
state = problem.do_action(state, 'Suck')

print("State after going to the left twice and sucking", state)
print("Are we there yet?", problem.is_goal(state))

State after going to the left twice and sucking (0, (False, False))
Are we there yet? True


## Oracular agent

Consider the following oracular agent. It is initialized with a problem and has an embedded oracle, which provides it with a sequence of actions to be executed to reach a goal in the problem.

In [8]:
class OracularAgent(Agent):
    def __init__(self, problem):
        self.problem = problem
        self.plan = self.oracle()
        
    def next_action(self, percepts):
        return self.plan.pop(0)
    
    def oracle(self):
        return ['Suck', 'Right', 'Suck']

Now it is time to test this agent. We create an instance of `VacuumProblem` and then keep asking the agent for its `next_action` until a goal is reached.

In [9]:
problem = VacuumProblem()
agent = OracularAgent(problem)

state = problem.initial_state
while not problem.is_goal(state):
    action = agent.next_action(None)
    state = problem.do_action(state, action)
    
print("Reached state:", state)
print("Is it goal?", problem.is_goal(state))

Reached state: (1, (False, False))
Is it goal? True


Observe that the agent works remarkably well and is able to reach a goal state with no problem whatsoever. It is fast and the code is short. There is only one obvious drawback: the solution is hard-coded and completely unsuitable not only for other types of problems, but also for other variants of the vacuum problem, e.g., one in which the robot must return to room A. We must therefore look for a more general solution: a searching agent.
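
The drawback can be made concrete with a short sketch. It restates the transition model of the two-room vacuum world as a plain function to stay self-contained, and introduces a hypothetical goal test, `is_goal_return_to_a`, for the variant in which the robot must end up back in room A.

```python
def do_action(state, action):
    """Transition model of the two-room vacuum world."""
    robot, dirty = state
    if action == "Left":
        return (max(robot - 1, 0), dirty)
    if action == "Right":
        return (min(robot + 1, len(dirty) - 1), dirty)
    if action == "Suck":
        # Clean only the room the robot is currently in.
        return (robot, tuple(False if i == robot else d for i, d in enumerate(dirty)))
    raise ValueError(f"Invalid action: {action}")


def is_goal_return_to_a(state):
    # Hypothetical variant: both rooms clean AND the robot back in room A (index 0).
    robot, dirty = state
    return robot == 0 and not any(dirty)


state = (0, (True, True))
for action in ["Suck", "Right", "Suck"]:  # the oracle's hard-coded plan
    state = do_action(state, action)

print(state)                       # (1, (False, False))
print(is_goal_return_to_a(state))  # False
```

The hard-coded plan runs out before the variant goal is reached, so a loop that keeps asking for actions would then fail with an `IndexError` when the agent calls `pop(0)` on an empty list.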

## Task 1: Implement a breadth-first searching agent

Complete the method `bfs` in the class `BFSAgent` in the next cell. It should implement breadth-first search, an algorithm you should be familiar with after the *Algorithms and data structures* classes. The main difference is that the graph is neither explicitly specified nor available as a whole. Instead, you must look at the `Problem` as a graph generator: a state corresponds to a node in the graph, and each action available in the state corresponds to an arc from the current state to the state reached by executing this action. The following picture should help you to understand the concept.

![](aima-images/fig3_3.png)
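
The graph-generator view can be sketched in a few lines. The helper `successors` and the toy `CounterProblem` are hypothetical names introduced for this illustration; the helper relies only on the `available_actions` and `do_action` methods of the `Problem` contract.

```python
def successors(problem, state):
    """Yield the outgoing arcs of `state` as (action, next_state) pairs."""
    for action in problem.available_actions(state):
        yield action, problem.do_action(state, action)


class CounterProblem:
    """Hypothetical toy problem: reach 3 from 0 by adding 1 or 2."""

    initial_state = 0

    def available_actions(self, state):
        return ["+1", "+2"]

    def do_action(self, state, action):
        return state + int(action)

    def is_goal(self, state) -> bool:
        return state == 3


problem = CounterProblem()
print(list(successors(problem, problem.initial_state)))
# [('+1', 1), ('+2', 2)]
```

The graph of this toy problem is infinite, yet this causes no difficulty: only the arcs of the nodes a search actually visits are ever generated.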

In [10]:
from collections import deque

In [11]:
class BFSAgent(Agent):
    def __init__(self, problem):
        self.problem = problem
        self.plan = self.bfs()
        
    def next_action(self, percepts):
        return self.plan.pop(0)
    
    def bfs(self):
        initial_state = self.problem.initial_state
        # Each queue entry pairs a state with the sequence of actions that reaches it.
        queue_of_paths = deque([(initial_state, ())])
        # To save memory, a state already reached anywhere in the graph is never enqueued again.
        visited_states = {initial_state}

        while queue_of_paths:
            current_state, current_path = queue_of_paths.popleft()
            if self.problem.is_goal(current_state):
                return list(current_path)

            for move in self.problem.available_actions(current_state):
                new_state = self.problem.do_action(current_state, move)
                if new_state not in visited_states:
                    visited_states.add(new_state)
                    queue_of_paths.append((new_state, current_path + (move,)))

        # The reachable part of the graph is exhausted and contains no goal state.
        return []

Let's test the solution in the same framework as the oracular agent.

In [12]:
problem = VacuumProblem()
agent = BFSAgent(problem)

print("Plan to be executed", agent.plan)

state = problem.initial_state
while not problem.is_goal(state):
    action = agent.next_action(None)
    state = problem.do_action(state, action)
    
print("Reached state:", state)
print("Is it goal?", problem.is_goal(state))

Plan to be executed ['Suck', 'Right', 'Suck']
Reached state: (1, (False, False))
Is it goal? True


## Task 2: Implement a depth-first searching agent

Complete the method `dfs` in the class `DFSAgent` in the next cell. This time the task is to implement depth-first search (DFS). Be careful: DFS is vulnerable to infinite loops. Implement some sort of protection against them.

In [30]:
class DFSAgent(Agent):
    def __init__(self, problem):
        self.problem = problem
        self.plan = self.dfs()
        
    def next_action(self, percepts):
        return self.plan.pop(0)
    
    def dfs(self): 
        list_of_occured_states = [self.problem.initial_state]
        visited_states = {list_of_occured_states[-1]}
        path = []
        
        self.dfs_recurrence(list_of_occured_states, path, visited_states)
        return path
    
    def dfs_recurrence(self, list_of_occured_states, path, visited_states):
        for move in self.problem.available_actions(list_of_occured_states[-1]):
            
            if self.problem.is_goal(list_of_occured_states[-1]) or len(path) > 100:
                # Without the path-length limit, the recursion gets deep enough for the kernel to raise a RecursionError.
                return
            
            new_state = self.problem.do_action(list_of_occured_states[-1], move)
            
#             print("\ncurrent path: ", path,
#                   "\ncurrent move: ", move,
#                   "\ncurrent state: ", new_state)

            if new_state not in visited_states:
                path.append(move)
                list_of_occured_states.append(new_state)
                visited_states.add(new_state)
                
                self.dfs_recurrence(list_of_occured_states, path, visited_states)
                
                if not self.problem.is_goal(list_of_occured_states[-1]):
                    path.pop()
                    list_of_occured_states.pop()
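
As an alternative to the recursive formulation, DFS can be written with an explicit stack, which sidesteps Python's recursion limit altogether. The sketch below is not the solution above but a different variant under stated assumptions: the function `iterative_dfs` and the toy `RingProblem` are hypothetical names, and the set of visited states is the protection against infinite loops.

```python
def iterative_dfs(problem):
    """Depth-first search with an explicit stack; returns a list of actions or None."""
    stack = [(problem.initial_state, [])]  # (state, actions leading to it)
    visited = set()
    while stack:
        state, path = stack.pop()  # LIFO order gives depth-first behaviour
        if state in visited:
            continue
        visited.add(state)
        if problem.is_goal(state):
            return path
        # Push in reverse so that the first available action is explored first.
        for action in reversed(problem.available_actions(state)):
            new_state = problem.do_action(state, action)
            if new_state not in visited:
                stack.append((new_state, path + [action]))
    return None


class RingProblem:
    """Hypothetical toy problem: five states arranged in a cycle, goal is state 3."""

    initial_state = 0

    def available_actions(self, state):
        return ["next", "prev"]

    def do_action(self, state, action):
        return (state + (1 if action == "next" else -1)) % 5

    def is_goal(self, state) -> bool:
        return state == 3


print(iterative_dfs(RingProblem()))  # ['next', 'next', 'next']
```

Without the `visited` set, this search could walk around the ring forever. Note that the plan found is not the shortest one (`prev`, `prev` would do), which is typical of DFS.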

In [14]:
problem = VacuumProblem()
agent = DFSAgent(problem)

print("Plan to be executed", agent.plan)

state = problem.initial_state
while not problem.is_goal(state):
    action = agent.next_action(None)
    state = problem.do_action(state, action)
    
print("Reached state:", state)
print("Is it goal?", problem.is_goal(state))

Plan to be executed ['Suck', 'Right', 'Suck']
Reached state: (1, (False, False))
Is it goal? True


## Task 3: Test it on 8-puzzle

The 8-puzzle, an instance of which is shown in the figure, consists of a 3×3 board with eight numbered tiles and a blank space. A tile adjacent to the blank space can slide into the space. The object is to reach a specified goal state, such as the one shown on the right of the figure. The standard formulation is as follows:

* *States*: A state description specifies the location of each of the eight tiles and the blank in one of the nine squares.
* *Initial state*: Any state can be designated as the initial state. Note that any given goal can be reached from exactly half of the possible initial states.
* *Actions*: The simplest formulation defines the actions as movements of the blank space Left, Right, Up, or Down. Different subsets of these are possible depending on where the blank is.
* *Transition model*: Given a state and action, this returns the resulting state; for example, if we apply Left to the start state in the figure, the resulting state has the 5 and the blank switched.
* *Goal test*: This checks whether the state matches the goal configuration shown in the figure. (Other goal configurations are possible.)

![](aima-images/fig3_4.png)
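
The remark that a goal can be reached from exactly half of the initial states follows from a parity argument: on a 3×3 board, no move changes the parity of the number of inversions among the tiles. The sketch below checks this invariant. The function names and the board encoding (a tuple of rows with 0 standing for the blank) are assumptions made for this example.

```python
def inversions(board):
    """Count pairs of tiles that appear in the wrong relative order (blank ignored)."""
    tiles = [tile for row in board for tile in row if tile != 0]
    return sum(
        1
        for i in range(len(tiles))
        for j in range(i + 1, len(tiles))
        if tiles[i] > tiles[j]
    )


def same_parity_class(board, goal):
    # A horizontal move leaves the tile order unchanged; a vertical move carries
    # one tile past exactly two others, so the inversion parity never changes.
    return inversions(board) % 2 == inversions(goal) % 2


goal = ((0, 1, 2), (3, 4, 5), (6, 7, 8))
start = ((7, 2, 4), (5, 0, 6), (8, 3, 1))
swapped = ((0, 2, 1), (3, 4, 5), (6, 7, 8))  # goal with tiles 1 and 2 exchanged

print(same_parity_class(start, goal))    # True
print(same_parity_class(swapped, goal))  # False
```

Running such a check before a search avoids exhausting the whole reachable state space only to discover that no plan exists.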

Your task is to implement `PuzzleProblem` as a subclass of the class `Problem` to provide a formal problem description of the 8-puzzle problem described above. Then test your `DFSAgent` and `BFSAgent` on this new problem. 

It's fine if your agents are incapable of solving the problem for the presented start state and goal due to the excessive length of the necessary plan. If this is the case, simplify the problem by changing the start state and/or the goal. The idea of this task is to show that your implementations are correct and the agents are capable, in principle, of solving this problem, not to wait for hours on end for computations to complete.
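
One possible way to obtain an easier start state is to begin at the goal and apply a small number of random legal moves. Because every move can be undone, the goal is then reachable in at most that many steps. The helper `scramble`, its parameters and the returned state layout `(blank position, board)` are assumptions made for this sketch.

```python
import random

MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}


def scramble(board, steps, seed=0):
    """Apply `steps` random legal blank moves to `board` (tuple of rows, 0 = blank)."""
    rng = random.Random(seed)  # seeded, so the result is reproducible
    grid = [list(row) for row in board]
    row, col = next((r, c) for r in range(3) for c in range(3) if grid[r][c] == 0)
    for _ in range(steps):
        legal = [(dr, dc) for dr, dc in MOVES.values()
                 if 0 <= row + dr < 3 and 0 <= col + dc < 3]
        dr, dc = rng.choice(legal)
        # Slide the neighbouring tile into the blank.
        grid[row][col], grid[row + dr][col + dc] = grid[row + dr][col + dc], 0
        row, col = row + dr, col + dc
    return (row, col), tuple(tuple(r) for r in grid)


goal = ((0, 1, 2), (3, 4, 5), (6, 7, 8))
blank, board = scramble(goal, steps=10)
print(blank, board)
```

Raising `steps` gradually gives a simple way to see how plan length and running time grow for each agent.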

In [15]:
class PuzzleProblem(Problem):
    @property
    def initial_state(self):
        """1st component of state:
        - the (row, col) of where the blank tile (the zero tile) is
        2nd component:
        -3x3 matrix representing the board and the arrangement of tiles"""
        return ((1,1),((7,2,4),(5,0,6),(8,3,1)))
        
    def available_actions(self, state):
        position, board = state
        row_of_blank, col_of_blank = position
        actions = []
        
        if row_of_blank - 1 >= 0:
            actions.append("Up")
        if row_of_blank + 1 <= 2:
            actions.append("Down")
        if col_of_blank - 1 >= 0:
            actions.append("Left")
        if col_of_blank + 1 <= 2:
            actions.append("Right")

        return actions
        
    def do_action(self, state, action):
        blank_coords, matrix_of_board = state
#         print(blank_coords, action, state)
        table_of_action_encoding = {'Left':(0, -1),
                                 'Right':(0, 1),
                                 'Up':(-1,0),
                                 'Down':(1,0)}
        action_encoding = table_of_action_encoding[action]
        
        new_blank_coords = tuple([blank_coords[i] + action_encoding[i] for i in range(len(blank_coords))])
#         print(new_blank_coords, " new blank")
        mutable_board = [list(matrix_of_board[i]) for i in range(len(matrix_of_board))]
    
        value_at_new_blank_position = matrix_of_board[new_blank_coords[0]][new_blank_coords[1]]
        
        mutable_board[new_blank_coords[0]][new_blank_coords[1]] = 0
        
        mutable_board[blank_coords[0]][blank_coords[1]] = value_at_new_blank_position
        
        modified_board = tuple([tuple(mutable_board[i]) for i in range(len(mutable_board))])
        
        new_state = (new_blank_coords, modified_board)
        return new_state
    
    def is_goal(self, state) -> bool:
        return ((0,0),((0,1,2),(3,4,5),(6,7,8))) == state

In [21]:
problem = PuzzleProblem()
# print(problem.initial_state[0][])
# %timeit agent = BFSAgent(problem)
agent = BFSAgent(problem)

print("Plan length: ", len(agent.plan))

print("Plan to be executed", agent.plan)

state = problem.initial_state

while not problem.is_goal(state) and agent.plan:
    action = agent.next_action(None)
    state = problem.do_action(state, action)
    
print("Reached state:", state)
print("Is it goal?", problem.is_goal(state))

Plan length:  26
Plan to be executed ['Left', 'Up', 'Right', 'Down', 'Down', 'Left', 'Up', 'Right', 'Right', 'Up', 'Left', 'Left', 'Down', 'Right', 'Right', 'Down', 'Left', 'Up', 'Right', 'Up', 'Left', 'Down', 'Down', 'Left', 'Up', 'Up']
Reached state: ((0, 0), ((0, 1, 2), (3, 4, 5), (6, 7, 8)))
Is it goal? True


In [32]:
problem = PuzzleProblem()
%timeit agent = DFSAgent(problem)
agent = DFSAgent(problem)

print("Plan length: ", len(agent.plan))

print("Plan to be executed", agent.plan)

state = problem.initial_state
while not problem.is_goal(state) and agent.plan:
    action = agent.next_action(None)
    state = problem.do_action(state, action)
    
print("Reached state:", state)
print("Is it goal?", problem.is_goal(state))

233 ms ± 6.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Plan length:  96
Plan to be executed ['Up', 'Left', 'Down', 'Down', 'Right', 'Up', 'Up', 'Left', 'Down', 'Down', 'Right', 'Up', 'Up', 'Left', 'Down', 'Down', 'Right', 'Up', 'Up', 'Left', 'Down', 'Down', 'Right', 'Up', 'Up', 'Left', 'Down', 'Down', 'Right', 'Right', 'Up', 'Up', 'Left', 'Down', 'Down', 'Left', 'Up', 'Up', 'Right', 'Down', 'Down', 'Left', 'Up', 'Up', 'Right', 'Down', 'Down', 'Left', 'Up', 'Up', 'Right', 'Down', 'Down', 'Left', 'Up', 'Up', 'Right', 'Down', 'Down', 'Left', 'Up', 'Right', 'Up', 'Left', 'Down', 'Down', 'Right', 'Up', 'Up', 'Left', 'Down', 'Down', 'Right', 'Up', 'Left', 'Up', 'Right', 'Down', 'Right', 'Down', 'Left', 'Up', 'Left', 'Down', 'Right', 'Up', 'Right', 'Down', 'Left', 'Left', 'Up', 'Right', 'Down', 'Left', 'Up', 'Up']
Reached state: ((0, 0), ((0, 1, 2), (3, 4, 5), (6, 7, 8)))
Is it goal? True


------------
The pictures and the description of the 8-puzzle are from "Artificial Intelligence: A Modern Approach", 3rd ed.