Exercise 1. Suppose that the performance measure is concerned with just the first T 
   time steps of the environment and ignores everything thereafter. Show that 
   a rational agent’s action may depend not just on the state of the environment 
   but also on the time step it has reached.

When the performance measure counts only the first T time steps, the rational action depends on the time step as well as on the current state, because the number of remaining steps determines which future rewards are still reachable. An action that pays off only after several steps (for example, moving toward a dirty square in order to clean it) is worthwhile early on but wasteful when too few steps remain to collect the reward. A rational agent therefore conditions its choice on both the state and the time it has left.
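The argument can be checked on a tiny example. The sketch below is an illustration only: it assumes a two-square world and borrows the scoring used in the cells further down (a move costs 1, Suck costs 1, cleaning a dirty square earns 10, NoOp is free). The names `step` and `best` are hypothetical helpers, not part of the exercise.

```python
from functools import lru_cache

ACTIONS = ("Suck", "Move", "NoOp")


def step(state, action):
    """Return (next_state, reward); state is (location, dirt_at_A, dirt_at_B)."""
    loc, dirt_a, dirt_b = state
    if action == "NoOp":
        return state, 0
    if action == "Move":
        return ("B" if loc == "A" else "A", dirt_a, dirt_b), -1
    # Suck: costs 1, earns 10 only if the current square is dirty.
    dirty_here = dirt_a if loc == "A" else dirt_b
    reward = -1 + (10 if dirty_here else 0)
    if loc == "A":
        return (loc, False, dirt_b), reward
    return (loc, dirt_a, False), reward


@lru_cache(maxsize=None)
def best(state, remaining):
    """Return (total reward, first action) of an optimal plan for the remaining steps."""
    if remaining == 0:
        return 0, None
    options = []
    for action in ACTIONS:
        next_state, reward = step(state, action)
        options.append((reward + best(next_state, remaining - 1)[0], action))
    return max(options, key=lambda option: option[0])


# Same state (agent at A, A clean, B dirty), different time left:
state = ("A", False, True)
print(best(state, 1))  # (0, 'NoOp'): no time to reach and clean B
print(best(state, 2))  # (8, 'Move'): move, then clean B
```

The state is identical in both calls; only the number of remaining steps changes the optimal action.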


Exercise 2. Let us examine the rationality of various vacuum-cleaner agent functions.
1. Show that the simple vacuum-cleaner agent function described in Figure 2.3 is indeed rational under the assumptions listed on page 
2. Describe a rational agent function for the case in which each movement costs one point. Does the corresponding agent program require internal state?
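When each movement costs a point, a rational agent should clean both squares and then stop moving, since any further move only loses points. A simple reflex agent cannot do this: its percept covers only the current square, so it cannot tell whether the other square has already been checked or cleaned. The agent program therefore requires internal state, at minimum a record of which squares it has already visited and left clean.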
3. Discuss possible agent designs for the cases in which clean squares can become dirty and the geography of the environment is unknown. Does it make sense for the agent to learn from its experience in these cases? If so, what should it learn? If not, why not?
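The agent needs to keep exploring in a loop rather than stop, since clean squares can become dirty again and the layout is not known in advance. Learning makes sense in both cases. The agent can learn a map of the environment so that it moves efficiently, and it can learn how many steps it typically takes for dirt to come back so that it revisits squares only as often as needed, keeping the number of movements low.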
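One way to picture "learning how often dirt comes back" is sketched below. It is a rough illustration under stated assumptions, not a prescribed design: dirt is assumed to reappear independently with a small, fixed chance per time step, the estimate is a crude ratio of dirty findings to unattended steps, and the scoring (move cost 1, Suck cost 1, reward 10) is borrowed from the simulator cells below. `DirtRateEstimator` and `worth_checking` are hypothetical names.

```python
class DirtRateEstimator:
    """Crude estimate of the per-step chance that a clean square becomes dirty."""

    def __init__(self):
        self.steps_observed = 0  # total steps squares were left unattended
        self.dirt_events = 0     # revisits on which the square was found dirty

    def record(self, steps_away, found_dirty):
        """Log one revisit: how long the square was left alone and what was found."""
        self.steps_observed += steps_away
        if found_dirty:
            self.dirt_events += 1

    def rate(self):
        """Dirty findings per unattended step (0.0 before any observation)."""
        if self.steps_observed == 0:
            return 0.0
        return self.dirt_events / self.steps_observed


def worth_checking(rate, steps_away, clean_reward=10, move_cost=1, suck_cost=1):
    """True if the expected gain from revisiting a square exceeds the move cost."""
    p_dirty = 1 - (1 - rate) ** steps_away  # chance of dirt after steps_away steps
    return p_dirty * (clean_reward - suck_cost) > move_cost


estimator = DirtRateEstimator()
estimator.record(steps_away=10, found_dirty=True)
estimator.record(steps_away=10, found_dirty=False)
print(estimator.rate())
print(worth_checking(estimator.rate(), steps_away=1))
print(worth_checking(estimator.rate(), steps_away=5))
```

Under these assumptions the agent waits before revisiting: going back after one step is not expected to pay for the move, while going back after five steps is.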

In [1]:
''' Implement a performance-measuring environment simulator for the vacuum-cleaner world 
depicted in Figure 2.8 and specified on page . Your implementation should be modular so 
that the sensors, actuators, and environment characteristics (size, shape, dirt placement, 
etc.) can be changed easily. (Note: for some choices of programming language and operating 
system there are already implementations in the online code repository.) 
Give the agent a score: it starts with 5 points, every movement costs 1, Suck costs 1, 
NoOp costs 0, and cleaning a dirty square earns 10. Implement a reflex agent and a simple 
model-based agent, and score both agents for all possible initial states. ''' 

class Environment: 
    def __init__(self, size, dirt_locations): 
        self.size = size 
        self.dirt = set(dirt_locations)

    def is_dirty(self, location): 
        return location in self.dirt 

    def remove_dirt(self, location): 
        if location in self.dirt: 
            self.dirt.remove(location)

    def add_dirt(self, location): 
        self.dirt.add(location)

class Agent: 
    def __init__(self, initial_score=5): 
        self.location = 0
        self.score = initial_score  # starting score from the task description



In [2]:
# simple agent code 
class Environment:
    def __init__(self, layout):
        self.layout = layout  # layout is a dictionary with positions as keys and dirt status as values

    def is_dirty(self, position):
        return self.layout.get(position, False)

    def clean(self, position):
        if self.is_dirty(position):
            self.layout[position] = False

    def add_dirt(self, position):
        self.layout[position] = True


class Agent:
    def __init__(self, environment):
        self.environment = environment
        self.position = 'A'  # initial position
        self.score = 5       # initial score

    def perceive(self):
        return self.environment.is_dirty(self.position)

    def move(self, target):
        self.position = target
        self.score -= 1

    def suck(self):
        self.score -= 1  # every Suck action costs 1 point
        if self.perceive():
            self.environment.clean(self.position)
            self.score += 10  # reward for cleaning a dirty square

    def no_op(self):
        pass  # no operation, does nothing


class ReflexAgent(Agent):
    def choose_action(self):
        if self.perceive():
            self.suck()
        else:
            self.move('B' if self.position == 'A' else 'A')


class ModelBasedAgent(Agent):
    def __init__(self, environment):
        super().__init__(environment)
        self.model = {'A': False, 'B': False}  # Simplified model of the world

    def update_model(self):
        self.model[self.position] = self.perceive()

    def choose_action(self):
        self.update_model()
        if self.model[self.position]:
            self.suck()
        else:
            self.move('B' if self.position == 'A' else 'A')


class Simulation:
    def __init__(self, agent, steps=10):
        self.agent = agent
        self.steps = steps

    def run(self):
        for _ in range(self.steps):
            self.agent.choose_action()

        return self.agent.score


# Example usage: each agent gets its own environment so that the first
# run does not clean up the dirt before the second agent starts.
reflex_agent = ReflexAgent(Environment({'A': True, 'B': False}))
model_agent = ModelBasedAgent(Environment({'A': True, 'B': False}))

simulation_reflex = Simulation(reflex_agent, 5)
simulation_model = Simulation(model_agent, 5)

print("Reflex Agent Score:", simulation_reflex.run())
print("Model Based Agent Score:", simulation_model.run())


Reflex Agent Score: 10
Model Based Agent Score: 10


In [3]:
# Exercise 12: score the reflex agent over all possible initial states

class Environment:
    def __init__(self, dirt_a, dirt_b, initial_position):
        self.positions = {'A': dirt_a, 'B': dirt_b}
        self.agent_position = initial_position

    def is_dirty(self):
        return self.positions[self.agent_position]

    def clean(self):
        if self.is_dirty():
            self.positions[self.agent_position] = False

    def move(self):
        self.agent_position = 'B' if self.agent_position == 'A' else 'A'


class ReflexAgent:
    def __init__(self, environment):
        self.environment = environment
        self.score = 5

    def act(self):
        if self.environment.is_dirty():
            self.environment.clean()
            self.score += 9  # 10 for cleaning, -1 for action
        else:
            self.environment.move()
            self.score -= 1  # -1 for movement


def simulate_environment(dirt_a, dirt_b, initial_position):
    environment = Environment(dirt_a, dirt_b, initial_position)
    agent = ReflexAgent(environment)
    
    # Run until no more dirt is present or a maximum of 4 actions to avoid infinite loops
    for _ in range(4):
        agent.act()
        if not any(environment.positions.values()):
            break
    
    return agent.score


# Possible states are combinations of dirt at A and B and initial position
dirt_configurations = [(False, False), (False, True), (True, False), (True, True)]
initial_positions = ['A', 'B']
results = {}

for config in dirt_configurations:
    for position in initial_positions:
        score = simulate_environment(config[0], config[1], position)
        results[(config, position)] = score

# Calculate the average score
average_score = sum(results.values()) / len(results)

# Display results
for k, v in results.items():
    print(f"Config: Dirt at A={k[0][0]}, Dirt at B={k[0][1]}, Initial Position={k[1]} -> Score: {v}")

print("Average Score:", average_score)


Config: Dirt at A=False, Dirt at B=False, Initial Position=A -> Score: 4
Config: Dirt at A=False, Dirt at B=False, Initial Position=B -> Score: 4
Config: Dirt at A=False, Dirt at B=True, Initial Position=A -> Score: 13
Config: Dirt at A=False, Dirt at B=True, Initial Position=B -> Score: 14
Config: Dirt at A=True, Dirt at B=False, Initial Position=A -> Score: 14
Config: Dirt at A=True, Dirt at B=False, Initial Position=B -> Score: 13
Config: Dirt at A=True, Dirt at B=True, Initial Position=A -> Score: 22
Config: Dirt at A=True, Dirt at B=True, Initial Position=B -> Score: 22
Average Score: 13.25
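
The early `break` in `simulate_environment` ends a run as soon as the world is clean, which is information the reflex agent itself never has. As a rough comparison, the sketch below runs a reflex policy and a hypothetical stateful policy for a fixed four steps with no early stop, under the same scoring (start at 5, move and Suck cost 1, cleaning earns 10, NoOp is free). The names `reflex_policy`, `stateful_policy`, `score`, and `average` are illustrative and not part of the cells above, and the stateful policy assumes that clean squares stay clean.

```python
from itertools import product


def reflex_policy(location, dirty, memory):
    """Act on the current percept only: clean if dirty, otherwise move."""
    return "Suck" if dirty else "Move"


def stateful_policy(location, dirty, memory):
    """Remember squares known to be clean; stop once both are accounted for."""
    memory.add(location)  # clean already, or clean after this Suck
    if dirty:
        return "Suck"
    return "NoOp" if memory == {"A", "B"} else "Move"


def score(policy, dirt_a, dirt_b, start, steps=4):
    """Run a policy for a fixed number of steps and return its final score."""
    dirt = {"A": dirt_a, "B": dirt_b}
    location, points, memory = start, 5, set()
    for _ in range(steps):
        action = policy(location, dirt[location], memory)
        if action == "Suck":
            points -= 1
            if dirt[location]:
                dirt[location] = False
                points += 10
        elif action == "Move":
            location = "B" if location == "A" else "A"
            points -= 1
        # NoOp costs nothing and changes nothing.
    return points


def average(policy):
    """Average score over all eight dirt/position combinations."""
    states = list(product([False, True], [False, True], ["A", "B"]))
    return sum(score(policy, a, b, s) for a, b, s in states) / len(states)


print("Reflex average:", average(reflex_policy))
print("Stateful average:", average(stateful_policy))
```

Without the early stop, the reflex policy keeps paying for moves after everything is clean, while the stateful policy switches to NoOp once it knows both squares are clean.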
