Exercise 1. Suppose that the performance measure is concerned with just the first T 
   time steps of the environment and ignores everything thereafter. Show that 
   a rational agent’s action may depend not just on the state of the environment 
   but also on the time step it has reached.

 When the performance measure counts only the first T time steps, the rational action depends on the current time step as well as the current state. The number of steps remaining determines which future rewards are still reachable, so an action whose payoff arrives later is worthwhile early on but pointless near the horizon. For example, in a vacuum world where each movement costs a point, an agent on a clean square next to a dirty one should move if at least two steps remain (one to move, one to suck) but should do nothing if only one step remains, because the move would cost a point with no time left to collect the reward.
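
 The argument above can be checked with a small finite-horizon search. This is a minimal sketch, not part of the exercise: it assumes a two-square world with the scoring used later in this notebook (each move costs 1, Suck costs 1, cleaning a dirty square earns 10, NoOp is free), and the names `step` and `best` are made up for illustration.

```python
from functools import lru_cache

ACTIONS = ("Left", "Right", "Suck", "NoOp")

def step(state, action):
    """Return (next_state, reward) for a two-square world (assumed scoring)."""
    loc, dirty_a, dirty_b = state
    if action == "Left":
        return ("A", dirty_a, dirty_b), -1
    if action == "Right":
        return ("B", dirty_a, dirty_b), -1
    if action == "Suck":
        here_dirty = dirty_a if loc == "A" else dirty_b
        reward = 9 if here_dirty else -1  # +10 for cleaning, -1 for the action
        if loc == "A":
            return (loc, False, dirty_b), reward
        return (loc, dirty_a, False), reward
    return state, 0  # NoOp

@lru_cache(maxsize=None)
def best(state, steps_left):
    """Return (value, action) maximizing total reward over steps_left steps."""
    if steps_left == 0:
        return 0, None
    options = []
    for action in ACTIONS:
        nxt, reward = step(state, action)
        future, _ = best(nxt, steps_left - 1)
        options.append((reward + future, action))
    return max(options, key=lambda option: option[0])

state = ("A", False, True)   # agent on clean A, B is dirty
print(best(state, 1))        # (0, 'NoOp')
print(best(state, 2))        # (8, 'Right')
```

 The state is identical in both calls; only the number of remaining steps differs, and so does the best action.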


Exercise 2. Let us examine the rationality of various vacuum-cleaner agent functions.
1. Show that the simple vacuum-cleaner agent function described in Figure 2.3 is indeed rational under the assumptions listed on page 
2. Describe a rational agent function for the case in which each movement costs one point. Does the corresponding agent program require internal state?
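Yes, the agent program requires internal state. A percept reports only the current square and whether it is dirty, so a pure reflex agent cannot tell whether the other square has already been checked and would keep moving back and forth, losing a point per move. A rational agent sucks when the current square is dirty, moves to the other square once to check it, and does nothing once it knows both squares are clean. Remembering which squares have been checked is the internal state.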
3. Discuss possible agent designs for the cases in which clean squares can become dirty and the geography of the environment is unknown. Does it make sense for the agent to learn from its experience in these cases? If so, what should it learn? If not, why not?
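The agent needs to keep exploring rather than stop, because any square may become dirty again and the map is not known in advance. Movement should still be minimized, so it makes sense to learn. A map of the environment (which squares exist and where the obstacles are) lets the agent plan efficient routes, and an estimate of how often each square becomes dirty tells it how long to wait before revisiting.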
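
For part 2, the role of internal state can be sketched as follows. This is an illustrative agent, not the textbook's program: it assumes percepts of the form (location, dirty) on squares 'A' and 'B', and the class name `StatefulVacuumAgent` is hypothetical.

```python
class StatefulVacuumAgent:
    """Two-square agent that remembers which squares it knows to be clean."""

    def __init__(self):
        self.known_clean = set()  # internal state: squares already dealt with

    def __call__(self, percept):
        location, dirty = percept
        # After this step the current square is clean either way:
        # it was clean already, or the Suck action cleans it.
        self.known_clean.add(location)
        if dirty:
            return "Suck"
        if self.known_clean >= {"A", "B"}:
            return "NoOp"  # both squares checked: moving would only cost points
        return "Right" if location == "A" else "Left"


agent = StatefulVacuumAgent()
print(agent(("A", True)))    # Suck
print(agent(("A", False)))   # Right
print(agent(("B", False)))   # NoOp
```

Without `known_clean`, the last percept would be indistinguishable from the first visit to a clean square, and the agent would have to keep moving.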

In [1]:
''' Implement a performance-measuring environment simulator for the vacuum-cleaner world 
depicted in Figure 2.8 and specified on page . Your implementation should be modular so 
that the sensors, actuators, and environment characteristics (size, shape, dirt placement, 
etc.) can be changed easily. (Note: for some choices of programming language and operating 
system there are already implementations in the online code repository.) 
Give the agent a score: every movement costs 1, Suck costs 1, NoOp costs 0, cleaning a dirty patch 
gives 10, and the agent starts with 5. Implement reflex agents and a simple model-based agent, and 
score both agents for all possible states. ''' 

class Environment: 
    def __init__(self, size, dirt_locations): 
        self.size = size 
        self.dirt = set(dirt_locations)

    def is_dirty(self, location): 
        return location in self.dirt 

    def remove_dirt(self, location): 
        if location in self.dirt: 
            self.dirt.remove(location)

    def add_dirt(self, location): 
        self.dirt.add(location)

class Agent: 
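    def __init__(self, initial_score=5): 
        self.location = 0             # index of the square the agent starts on
        self.score = initial_score    # starting score from the task description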


In [2]:
# simple agent code 
class Environment:
    def __init__(self, layout):
        self.layout = layout  # layout is a dictionary with positions as keys and dirt status as values

    def is_dirty(self, position):
        return self.layout.get(position, False)

    def clean(self, position):
        if self.is_dirty(position):
            self.layout[position] = False

    def add_dirt(self, position):
        self.layout[position] = True


class Agent:
    def __init__(self, environment):
        self.environment = environment
        self.position = 'A'  # initial position
        self.score = 5       # initial score

    def perceive(self):
        return self.environment.is_dirty(self.position)

    def move(self, target):
        self.position = target
        self.score -= 1

    def suck(self):
        if self.perceive():
            self.environment.clean(self.position)
            self.score += 9  # +10 for cleaning, -1 for the action: net gain of 9

    def no_op(self):
        pass  # no operation, does nothing


class ReflexAgent(Agent):
    def choose_action(self):
        if self.perceive():
            self.suck()
        else:
            self.move('B' if self.position == 'A' else 'A')


class ModelBasedAgent(Agent):
    def __init__(self, environment):
        super().__init__(environment)
        self.model = {'A': False, 'B': False}  # Simplified model of the world

    def update_model(self):
        self.model[self.position] = self.perceive()

    def choose_action(self):
        self.update_model()
        if self.model[self.position]:
            self.suck()
        else:
            self.move('B' if self.position == 'A' else 'A')


class Simulation:
    def __init__(self, agent, steps=10):
        self.agent = agent
        self.steps = steps

    def run(self):
        for _ in range(self.steps):
            self.agent.choose_action()

        return self.agent.score


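# Example usage:
# Note: both agents share one Environment object, so the model-based agent
# runs in the world that the reflex agent has already cleaned.
environment = Environment({'A': True, 'B': False})
reflex_agent = ReflexAgent(environment)
model_agent = ModelBasedAgent(environment)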

simulation_reflex = Simulation(reflex_agent, 5)
simulation_model = Simulation(model_agent, 5)

print("Reflex Agent Score:", simulation_reflex.run())
print("Model Based Agent Score:", simulation_model.run())


Reflex Agent Score: 10
Model Based Agent Score: 0
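
The gap between the two scores above comes from the shared environment rather than from the agents themselves: the second run starts in a world the first run already cleaned. A minimal self-contained sketch of the effect follows. It assumes the same scoring as the cell above, and `run_reflex` is a hypothetical helper that collapses the agent and environment into one function.

```python
import copy

def run_reflex(world, steps, position="A", score=5):
    """Run the reflex policy on `world` (dict: square -> dirty?), mutating it."""
    for _ in range(steps):
        if world[position]:
            world[position] = False
            score += 9  # +10 for cleaning, -1 for the Suck action
        else:
            position = "B" if position == "A" else "A"
            score -= 1  # each move costs one point
    return score

initial = {"A": True, "B": False}

shared = copy.deepcopy(initial)
print(run_reflex(shared, 5))                  # 10
print(run_reflex(shared, 5))                  # 0, the world is already clean
print(run_reflex(copy.deepcopy(initial), 5))  # 10, fresh copy of the world
```

Giving every agent its own copy of the initial world makes the scores comparable.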


In [3]:
# ex 12 

class Environment:
    def __init__(self, dirt_a, dirt_b, initial_position):
        self.positions = {'A': dirt_a, 'B': dirt_b}
        self.agent_position = initial_position

    def is_dirty(self):
        return self.positions[self.agent_position]

    def clean(self):
        if self.is_dirty():
            self.positions[self.agent_position] = False

    def move(self):
        self.agent_position = 'B' if self.agent_position == 'A' else 'A'


class ReflexAgent:
    def __init__(self, environment):
        self.environment = environment
        self.score = 5

    def act(self):
        if self.environment.is_dirty():
            self.environment.clean()
            self.score += 9  # 10 for cleaning, -1 for action
        else:
            self.environment.move()
            self.score -= 1  # -1 for movement


def simulate_environment(dirt_a, dirt_b, initial_position):
    environment = Environment(dirt_a, dirt_b, initial_position)
    agent = ReflexAgent(environment)
    
    # Act at least once, then stop when no dirt remains or after 4 actions (avoids infinite loops)
    for _ in range(4):
        agent.act()
        if not any(environment.positions.values()):
            break
    
    return agent.score


# Possible states are combinations of dirt at A and B and initial position
dirt_configurations = [(False, False), (False, True), (True, False), (True, True)]
initial_positions = ['A', 'B']
results = {}

for config in dirt_configurations:
    for position in initial_positions:
        score = simulate_environment(config[0], config[1], position)
        results[(config, position)] = score

# Calculate the average score
average_score = sum(results.values()) / len(results)

# Display results
for k, v in results.items():
    print(f"Config: Dirt at A={k[0][0]}, Dirt at B={k[0][1]}, Initial Position={k[1]} -> Score: {v}")

print("Average Score:", average_score)


Config: Dirt at A=False, Dirt at B=False, Initial Position=A -> Score: 4
Config: Dirt at A=False, Dirt at B=False, Initial Position=B -> Score: 4
Config: Dirt at A=False, Dirt at B=True, Initial Position=A -> Score: 13
Config: Dirt at A=False, Dirt at B=True, Initial Position=B -> Score: 14
Config: Dirt at A=True, Dirt at B=False, Initial Position=A -> Score: 14
Config: Dirt at A=True, Dirt at B=False, Initial Position=B -> Score: 13
Config: Dirt at A=True, Dirt at B=True, Initial Position=A -> Score: 22
Config: Dirt at A=True, Dirt at B=True, Initial Position=B -> Score: 22
Average Score: 13.25


In [5]:
import random

class GridEnvironment:
    def __init__(self, n, m):
        self.n = n + 2  # account for walls
        self.m = m + 2  # account for walls
        # Initialize grid with walls (-1) around the periphery and random dirt or clean spaces inside
        self.grid = [[-1 if i == 0 or i == n+1 or j == 0 or j == m+1 else random.choice([True, False]) 
                      for j in range(self.m)] for i in range(self.n)]
        # Set the agent's initial position to (1, 1) (the first cell inside the wall)
        self.agent_position = (1, 1)

    def is_dirty(self, position):
        x, y = position
        return self.grid[x][y] if self.grid[x][y] != -1 else False

    def clean(self, position):
        if self.is_dirty(position):
            self.grid[position[0]][position[1]] = False

    def display(self):
        for row in self.grid:
            print(' '.join(str(x) for x in row))
        print(f"Agent Position: {self.agent_position}")


class Agent:
    def __init__(self, environment):
        self.environment = environment
        self.score = 5

    def move(self, direction):
        x, y = self.environment.agent_position
        if direction == 'up':
            new_x = x - 1
        elif direction == 'down':
            new_x = x + 1
        else:
            new_x = x

        if direction == 'left':
            new_y = y - 1
        elif direction == 'right':
            new_y = y + 1
        else:
            new_y = y

        # Check for walls before moving
        if self.environment.grid[new_x][new_y] != -1:
            self.environment.agent_position = (new_x, new_y)
            self.score -= 1

    def suck(self):
        if self.environment.is_dirty(self.environment.agent_position):
            self.environment.clean(self.environment.agent_position)
            self.score += 9  # 10 for cleaning, -1 for the action

    def no_op(self):
        pass  # no operation, does nothing


class RandomAgent(Agent):
    def act(self):
        actions = ['up', 'down', 'left', 'right', 'suck', 'no_op']
        action = random.choice(actions)
        if action == 'suck':
            self.suck()
        elif action in ['up', 'down', 'left', 'right']:
            self.move(action)
        elif action == 'no_op':
            self.no_op()


class ReflexAgent(Agent):
    def act(self):
        if self.environment.is_dirty(self.environment.agent_position):
            self.suck()
        else:
            direction = random.choice(['up', 'down', 'left', 'right'])
            self.move(direction)


class ModelBasedAgent(Agent):
    def __init__(self, environment):
        super().__init__(environment)
        self.visited = set()
        self.visited.add(environment.agent_position)

    def act(self):
        if self.environment.is_dirty(self.environment.agent_position):
            self.suck()
        else:
            possible_moves = ['up', 'down', 'left', 'right']
            best_move = None
            x, y = self.environment.agent_position
            for move in possible_moves:
                new_pos = {
                    'up': (x-1, y),
                    'down': (x+1, y),
                    'left': (x, y-1),
                    'right': (x, y+1)
                }[move]
                if self.environment.grid[new_pos[0]][new_pos[1]] != -1 and new_pos not in self.visited:
                    best_move = move
                    break
            if best_move:
                self.move(best_move)
                self.visited.add(self.environment.agent_position)
            else:
                self.no_op()  # No unvisited neighbor and no dirt here: do nothing (the agent may stop before covering the grid)


def run_simulation(agent_class, steps=100, n=5, m=5):
    environment = GridEnvironment(n, m)
    agent = agent_class(environment)
    for _ in range(steps):
        agent.act()
    environment.display()
    return agent.score

# Running the simulations
print("Random Agent Score:", run_simulation(RandomAgent))
print("Reflex Agent Score:", run_simulation(ReflexAgent))
print("Model-Based Agent Score:", run_simulation(ModelBasedAgent))


-1 -1 -1 -1 -1 -1 -1
-1 False True False True True -1
-1 False False False True False -1
-1 False False False False True -1
-1 False False False True False -1
-1 False True True True True -1
-1 -1 -1 -1 -1 -1 -1
Agent Position: (3, 3)
Random Agent Score: -6
-1 -1 -1 -1 -1 -1 -1
-1 False False False False False -1
-1 True False False False False -1
-1 False False False False False -1
-1 False False False False False -1
-1 True False False False False -1
-1 -1 -1 -1 -1 -1 -1
Agent Position: (3, 5)
Reflex Agent Score: 6
-1 -1 -1 -1 -1 -1 -1
-1 False False False False False -1
-1 False False False False False -1
-1 False False False False False -1
-1 False False False False False -1
-1 False False False False False -1
-1 -1 -1 -1 -1 -1 -1
Agent Position: (5, 5)
Model-Based Agent Score: 107
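
The model-based agent above only looks at adjacent cells, so it stops as soon as every neighbor has been visited, even if unvisited cells remain elsewhere. One possible refinement, shown here as a sketch and not as part of the original solution, is a breadth-first search for the nearest unvisited cell. It assumes the same grid encoding as above (-1 for walls, a full wall border), and the function name `path_to_nearest_unvisited` is hypothetical.

```python
from collections import deque

WALL = -1  # same wall marker as the grid above

def path_to_nearest_unvisited(grid, start, visited):
    """Breadth-first search from `start`.

    Returns the list of moves leading to the closest non-wall cell that is
    not in `visited`, or None if every reachable cell has been visited.
    Assumes the grid is surrounded by walls, so indices stay in range.
    """
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (x, y), path = queue.popleft()
        if (x, y) not in visited:
            return path
        for name, (dx, dy) in moves.items():
            nxt = (x + dx, y + dy)
            if nxt in seen or grid[nxt[0]][nxt[1]] == WALL:
                continue
            seen.add(nxt)
            queue.append((nxt, path + [name]))
    return None

# 3x3 interior surrounded by walls
grid = [[WALL] * 5] + [[WALL, False, False, False, WALL] for _ in range(3)] + [[WALL] * 5]
print(path_to_nearest_unvisited(grid, (1, 1), {(1, 1), (1, 2), (2, 1)}))  # ['down', 'down']
```

An agent could follow the returned moves one per step instead of calling `no_op()` when its neighbors are exhausted.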


In [8]:
import random
import logging
import matplotlib.pyplot as plt
from statistics import mean

# Setup basic configuration for logging 
logging.basicConfig(filename='simulation_results.log', level=logging.INFO,
                    format='%(asctime)s - %(message)s', datefmt='%Y-%m-%d %H:%M:%S')

class GridEnvironment:
    def __init__(self, n, m):
        self.n = n + 2
        self.m = m + 2
        self.grid = [[-1 if i == 0 or i == n + 1 or j == 0 or j == m + 1 else random.choice([True, False])
                      for j in range(self.m)] for i in range(self.n)]
        self.agent_position = (1, 1)

    def is_dirty(self, position):
        x, y = position
        return self.grid[x][y] if self.grid[x][y] != -1 else False

    def clean(self, position):
        if self.is_dirty(position):
            self.grid[position[0]][position[1]] = False


class Agent:
    def __init__(self, environment):
        self.environment = environment
        self.score = 5

    def move(self, direction):
        x, y = self.environment.agent_position
        if direction == 'up':
            new_x = x - 1
        elif direction == 'down':
            new_x = x + 1
        else:
            new_x = x

        if direction == 'left':
            new_y = y - 1
        elif direction == 'right':
            new_y = y + 1
        else:
            new_y = y

        if self.environment.grid[new_x][new_y] != -1:
            self.environment.agent_position = (new_x, new_y)
            self.score -= 1

    def suck(self):
        if self.environment.is_dirty(self.environment.agent_position):
            self.environment.clean(self.environment.agent_position)
            self.score += 9

    def no_op(self):
        pass


class RandomAgent(Agent):
    def act(self):
        actions = ['up', 'down', 'left', 'right', 'suck', 'no_op']
        action = random.choice(actions)
        if action == 'suck':
            self.suck()
        elif action in ['up', 'down', 'left', 'right']:
            self.move(action)
        elif action == 'no_op':
            self.no_op()


class ReflexAgent(Agent):
    def act(self):
        if self.environment.is_dirty(self.environment.agent_position):
            self.suck()
        else:
            direction = random.choice(['up', 'down', 'left', 'right'])
            self.move(direction)


class ModelBasedAgent(Agent):
    def __init__(self, environment):
        super().__init__(environment)
        self.visited = set()
        self.visited.add(environment.agent_position)

    def act(self):
        if self.environment.is_dirty(self.environment.agent_position):
            self.suck()
        else:
            possible_moves = ['up', 'down', 'left', 'right']
            best_move = None
            x, y = self.environment.agent_position
            for move in possible_moves:
                new_pos = {
                    'up': (x - 1, y),
                    'down': (x + 1, y),
                    'left': (x, y - 1),
                    'right': (x, y + 1)
                }[move]
                if self.environment.grid[new_pos[0]][new_pos[1]] != -1 and new_pos not in self.visited:
                    best_move = move
                    break
            if best_move:
                self.move(best_move)
                self.visited.add(self.environment.agent_position)
            else:
                self.no_op()


def run_simulation(agent_class, steps=100, n=5, m=5):
    scores = []
    for _ in range(100):  # run 100 simulations for each agent
        environment = GridEnvironment(n, m)
        agent = agent_class(environment)
        for __ in range(steps):
            agent.act()
        scores.append(agent.score)
    average_score = mean(scores)
    logging.info(f"{agent_class.__name__} - Average Score: {average_score}")
    return scores, average_score

# Running simulations for each agent
agents = [RandomAgent, ReflexAgent, ModelBasedAgent]
results = {}
for agent in agents:
    scores, avg_score = run_simulation(agent)
    results[agent.__name__] = scores
    print(f"{agent.__name__} - Average Score: {avg_score}")

# Plotting the scores using box plots
fig, ax = plt.subplots()
score_lists = [results[agent.__name__] for agent in agents]
ax.boxplot(score_lists, vert=True, patch_artist=True, labels=[agent.__name__ for agent in agents])
ax.set_title('Box Plot of Scores for Each Agent')
ax.set_xlabel('Agent Type')
ax.set_ylabel('Scores')
plt.grid(True)
plt.show()


ImportError: Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python interpreter from there.