[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/OlivierGeorgeon/Developmental-AI-Lab/blob/master/docs/agent2.ipynb)

# THE AGENT WHO THRIVED ON GOOD VIBES

# Learning objectives

Upon completing this lab, you will be able to implement agents driven by a type of intrinsic motivation called 'interactional motivation.' This refers to the drive to engage in sensorimotor interactions that have a positive valence while avoiding those that have a negative valence.

# Setup
## Define the Agent class

In [13]:
class Agent:
    def __init__(self, _valences):
        """ Creating our agent """
        self._valences = _valences
        self._action = None
        self._predicted_outcome = None

    def action(self, _outcome):
        """ tracing the previous cycle """
        if self._action is not None:
            print(f"Action: {self._action}, Prediction: {self._predicted_outcome}, Outcome: {_outcome}, " 
                  f"Prediction: {self._predicted_outcome == _outcome}, Valence: {self._valences[self._action][_outcome]}")

        """ Computing the next action to enact """
        # TODO: Implement the agent's decision mechanism
        self._action = 0
        # TODO: Implement the agent's anticipation mechanism
        self._predicted_outcome = 0
        return self._action


## Environment1 class

In [14]:
class Environment1:
    """ In Environment 1, action 0 yields outcome 0, action 1 yields outcome 1 """
    def outcome(self, _action):
        # return int(input("Enter 0, 1 or 2: "))
        if _action == 0:
            return 0
        else:
            return 1

## Environment2 class

In [15]:
class Environment2:
    """ In Environment 2, action 0 yields outcome 1, action 1 yields outcome 0 """
    def outcome(self, _action):
        if _action == 0:
            return 1
        else:
            return 0

## Define the valence of interactions

In [16]:
valences = [[-1, 1], 
            [1, -1]]

The valence table specifies the valence of each interaction. An interaction is a tuple (action, outcome):

| | outcome 0 | outcome 1 |
|---|---|---|
| action 0 | -1 | 1 |
| action 1 | 1 | -1 |
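
As a small illustration of how the table is read, the sketch below looks up the valence of an interaction by indexing the nested list first by action, then by outcome. The helper name `valence_of` is hypothetical and is not part of the lab code.

```python
# Valence table from the lab: rows are actions, columns are outcomes
valences = [[-1, 1],
            [1, -1]]


def valence_of(action, outcome):
    """Return the valence of the interaction (action, outcome).

    Hypothetical helper used only for this illustration.
    """
    return valences[action][outcome]


# Enumerate the four possible interactions and their valences
for action in (0, 1):
    for outcome in (0, 1):
        print(f"interaction ({action}, {outcome}) -> valence {valence_of(action, outcome)}")
```

The `Agent` class performs the same lookup with `self._valences[self._action][_outcome]` when it prints the trace.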

## Instantiate the agent

In [17]:
a = Agent(valences)

## Instantiate the environment 

In [18]:
e = Environment1()

## Test run the simulation

In [19]:
outcome = 0
for i in range(10):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1


Observe that, on each interaction cycle, the agent is only mildly satisfied: on the one hand it predicts the outcome correctly, but on the other hand it experiences a negative valence.

# PRELIMINARY EXERCISE

Execute the agent in Environment2. Observe that it obtains a positive valence.

Modify the valence table to give a positive valence when the agent selects action `0` and obtains outcome `0`.
Observe that this agent obtains a positive valence in Environment1. 
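
The second part of this exercise can be sketched as follows. This is only one possible modification, assuming that a single cell of the table is changed; the names `original_valences`, `modified_valences`, and `environment1_outcome` are hypothetical and restate the lab's definitions so that the snippet runs on its own.

```python
import copy

# Valence table as defined in the setup
original_valences = [[-1, 1],
                     [1, -1]]

# Change only the cell for the interaction (action 0, outcome 0).
# This is one possible choice; other tables also satisfy the exercise.
modified_valences = copy.deepcopy(original_valences)
modified_valences[0][0] = 1


def environment1_outcome(action):
    """Same rule as Environment1: the outcome equals the action."""
    return 0 if action == 0 else 1


# The base Agent always selects action 0
action = 0
outcome = environment1_outcome(action)
print("Valence with the original table:", original_valences[action][outcome])
print("Valence with the modified table:", modified_valences[action][outcome])
```

Copying the table with `copy.deepcopy` keeps the original intact, so the two results can be compared side by side.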

# ASSIGNMENT

Implement Agent2, which selects actions that it predicts will result in an interaction with a positive valence.

Only when the agent gets bored does it select an action that it predicts will result in an interaction with a negative valence.

In the trace, you should see that the agent learns to obtain a positive valence during several interaction cycles.
When the agent gets bored, it occasionally selects an action that may result in a negative valence.

## Create Agent2 by overriding the class Agent

In [20]:
class Agent2(Agent):
    def __init__(self, _valences):
        """Creating our hedonist agent"""
        super().__init__(_valences)
        # Memory: stores the last observed outcome for each action
        self.memory = {}
        # Counter for consecutive correct predictions
        self.correct_count = 0
        # Boredom threshold
        self.boredom_threshold = 4
        
    def action(self, _outcome):
        """Tracing the previous cycle"""
        if self._action is not None:
            # Update memory with the observed outcome
            self.memory[self._action] = _outcome
            
            # Check if prediction was correct
            satisfied = (self._predicted_outcome == _outcome)
            
            # Update correct prediction counter
            if satisfied:
                self.correct_count += 1
            else:
                self.correct_count = 0
            
            # Calculate valence
            valence = self._valences[self._action][_outcome]
            
            # Check for boredom
            bored = (self.correct_count >= self.boredom_threshold)
            
            print(f"Action: {self._action}, Prediction: {self._predicted_outcome}, "
                  f"Outcome: {_outcome}, Prediction: {satisfied}, Valence: {valence}, Bored: {bored}")
        
        """Computing the next action to enact"""
        # TODO: Implement the agent's decision mechanism
        # ✅ IMPLEMENTATION OF THE VALENCE-BASED DECISION MECHANISM
        if self.correct_count >= self.boredom_threshold:
            # Bored: try a different action
            self._action = 1 - self._action
            self.correct_count = 0
        else:
            # Not bored: choose the action with the best anticipated valence
            best_action = None
            best_valence = -float('inf')
            
            for action in [0, 1]:
                if action in self.memory:
                    predicted_outcome = self.memory[action]
                    predicted_valence = self._valences[action][predicted_outcome]
                    
                    if predicted_valence > best_valence:
                        best_valence = predicted_valence
                        best_action = action
            
            if best_action is not None:
                self._action = best_action
            elif self._action is None:
                self._action = 0
        
        # TODO: Implement the agent's anticipation mechanism
        # ✅ IMPLEMENTATION OF THE ANTICIPATION MECHANISM
        if self._action in self.memory:
            self._predicted_outcome = self.memory[self._action]
        else:
            self._predicted_outcome = 0
        
        return self._action

## Test your Agent2 in Environment1

In [21]:
a = Agent2(valences)
e = Environment1()
outcome = 0
for i in range(20):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: True
Action: 1, Prediction: 0, Outcome: 1, Prediction: False, Valence: -1, Bored: False
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: True
Action: 1, Prediction: 1, Outcome: 1, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: False
Action: 0, Predic

## Test your Agent2 in Environment2

In [22]:
a = Agent2(valences)
e = Environment2()
outcome = 0
for i in range(20):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 0, Prediction: 0, Outcome: 1, Prediction: False, Valence: 1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1, Bored: True
Action: 1, Prediction: 0, Outcome: 0, Prediction: True, Valence: 1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1, Bored: True
Action: 1, Prediction: 0, Outcome: 0, Prediction: True, Valence: 1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1, Bored: False
Action: 0, Prediction: 1, Out

## Test your agent with a different valence table

Note that, depending on the valences that you define, it may be impossible for the agent to obtain a positive valence in some environments.
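
One way to check this before running the agent is to enumerate the interactions that each environment can actually produce and look for a positive valence among them. The sketch below assumes the deterministic rules of Environment1 and Environment2 described in the setup; the dictionary `ENVIRONMENT_RULES` and both function names are hypothetical.

```python
# Deterministic action -> outcome rules, restated from the two environment classes
ENVIRONMENT_RULES = {
    "Environment1": {0: 0, 1: 1},
    "Environment2": {0: 1, 1: 0},
}


def reachable_valences(valences, rule):
    """Map each action to the valence of the interaction it produces."""
    return {action: valences[action][outcome] for action, outcome in rule.items()}


def positive_valence_possible(valences, rule):
    """Return True if at least one action leads to a positive valence."""
    return any(v > 0 for v in reachable_valences(valences, rule).values())


# Example table in which matching action and outcome is pleasant
table = [[1, -1],
         [-1, 1]]

for name, rule in ENVIRONMENT_RULES.items():
    print(name, reachable_valences(table, rule), positive_valence_possible(table, rule))
```

With this example table, every interaction that Environment2 can produce has a negative valence, so no decision mechanism can obtain a positive valence there.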

In [23]:
# Choose different valences
valences = [[1, -1], 
            [-1, 1]]
# Run the agent
a = Agent2(valences)
e = Environment2()
outcome = 0
for i in range(20):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 0, Prediction: 0, Outcome: 1, Prediction: False, Valence: -1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: -1, Bored: True
Action: 1, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: -1, Bored: True
Action: 1, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: -1, Bored: False
Action: 0, Prediction: 1, Outcome: 1, Prediction: True, Valence: -1, Bored: False
Action: 0, Predic

## Report 

Explain what you programmed and what results you observed. Export this document as PDF including your code, the traces you obtained, and your explanations below (no more than a few paragraphs):