[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PetiteIA/schema_mechanism/blob/master/notebooks/agent9.ipynb)

# THE AGENT THAT HOPED TO WIN ON THE NEXT MOVE

# Objectives

This tutorial shows how to implement an agent that makes its decision based on the expected valence of the next two interaction cycles.

# Let's prepare the Interaction and CompositeInteraction classes

In [None]:
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install ipywidgets
!pip install ipython

These are the same `Interaction` and `CompositeInteraction` classes as in Agent7 and Agent8, with the addition of the `get_length()` method.

In [27]:
class Interaction:
    """An interaction is a tuple (action, outcome) with a valence"""
    def __init__(self, _action, _outcome, _valence):
        self._action = _action
        self._outcome = _outcome
        self._valence = _valence

    def get_action(self):
        """Return the action"""
        return self._action

    def get_decision(self):
        """Return the decision key"""
        return f"a{self._action}"

    def get_primitive_action(self):
        """Return the action for compatibility with CompositeInteraction"""
        return self._action

    def get_outcome(self):
        """Return the outcome"""
        return self._outcome

    def get_valence(self):
        """Return the valence"""
        return self._valence

    def key(self):
        """ The key to find this interaction in the dictionary is the string '<action><outcome>'. """
        return f"{self._action}{self._outcome}"

    def pre_key(self):
        """Return the key. Used for compatibility with CompositeInteraction"""
        return self.key()

    def __str__(self):
        """ Print interaction in the form '<action><outcome>:<valence>' for debug."""
        return f"{self._action}{self._outcome}:{self._valence}"

    def __eq__(self, other):
        """ Interactions are equal if they have the same key """
        if isinstance(other, self.__class__):
            return self.key() == other.key()
        else:
            return False

    def get_length(self):
        """The length of the sequence of this interaction"""
        return 1

In [28]:
class CompositeInteraction:
    """A composite interaction is a tuple (pre_interaction, post_interaction) and a weight"""
    def __init__(self, pre_interaction, post_interaction):
        self.pre_interaction = pre_interaction
        self.post_interaction = post_interaction
        self.weight = 1
        self.isActivated = False

    def get_decision(self):
        """Return the sequence of decisions"""
        # return self.key()
        return f"{self.pre_interaction.key()}{self.post_interaction.get_decision()}"

    def get_primitive_action(self):
        """Return the primitive action, i.e. the action of the first primitive interaction"""
        return self.pre_interaction.get_primitive_action()

    def get_valence(self):
        """Return the valence of the pre_interaction plus the valence of the post_interaction"""
        return self.pre_interaction.get_valence() + self.post_interaction.get_valence()

    def reinforce(self):
        """Increment the composite interaction's weight"""
        self.weight += 1

    def key(self):
        """ The key to find this interaction in the dictionary is the string '<pre_interaction><post_interaction>'. """
        return f"({self.pre_interaction.key()},{self.post_interaction.key()})"

    def pre_key(self):
        """Return the key of the pre_interaction"""
        return self.pre_interaction.pre_key()

    def __str__(self):
        """ Print the interaction in the Newick tree format (pre_interaction, post_interaction: weight) """
        return f"({self.pre_interaction}, {self.post_interaction}: {self.weight})"

    def __eq__(self, other):
        """ Interactions are equal if they have the same pre and post interactions """
        if isinstance(other, self.__class__):
            return (self.pre_interaction == other.pre_interaction) and (self.post_interaction == other.post_interaction)
        else:
            return False

    def get_length(self):
        """The length of this interaction, i.e. the number of primitive interactions it spans"""
        return self.pre_interaction.get_length() + self.post_interaction.get_length()
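
As a quick check of how `key()`, `get_valence()` and `get_length()` compose through nesting, the sketch below builds a second-level composite out of two first-level composites. `Primitive` and `Composite` are trimmed, hypothetical stand-ins for `Interaction` and `CompositeInteraction` (only the three methods used here), so that the snippet runs on its own. The valences are the ones used in this notebook's second experiment.

```python
class Primitive:
    """Hypothetical stand-in for Interaction: (action, outcome) with a valence."""
    def __init__(self, action, outcome, valence):
        self.action, self.outcome, self.valence = action, outcome, valence

    def key(self):
        return f"{self.action}{self.outcome}"

    def get_valence(self):
        return self.valence

    def get_length(self):
        return 1  # a primitive interaction spans one interaction cycle


class Composite:
    """Hypothetical stand-in for CompositeInteraction: (pre, post)."""
    def __init__(self, pre, post):
        self.pre, self.post = pre, post

    def key(self):
        return f"({self.pre.key()},{self.post.key()})"

    def get_valence(self):
        return self.pre.get_valence() + self.post.get_valence()

    def get_length(self):
        # Lengths add up recursively, whatever the nesting depth
        return self.pre.get_length() + self.post.get_length()


i00, i31, i10 = Primitive(0, 0, 5), Primitive(3, 1, -1), Primitive(1, 0, -3)
second_level = Composite(Composite(i00, i31), Composite(i10, i00))
summary = (second_level.key(), second_level.get_length(), second_level.get_valence())
print(summary)
```

The second-level composite spans four interaction cycles, and its valence is the sum of the four primitive valences.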
        

# Let's define the agent

Let's implement an agent that computes the expected valence of decisions that span the next two time steps.

We also add the learning of a second-level composite interaction that records the chaining of two first-level composite interactions: $((i_{t-4}, i_{t-3}), (i_{t-2}, i_{t-1}))$

Figure 1 illustrates this mechanism.

![Agent9](img/Figure_1_Agent9.svg)

Figure 1: The agent computes an expected valence for each proposed interaction, then an aggregated expected valence for each decision.

The expected valence $\mathbb{E}(V_d)$ is computed from the valence $v_i$ of each interaction $i$ that can result from decision $d$ and from its probability $p_i$ of being enacted.
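
To make the definition concrete, here is a minimal sketch that computes $\mathbb{E}(V_d)$ for a single decision. It assumes that the probability $p_i$ is estimated as the weight of interaction $i$ divided by the total weight of the interactions proposed for that decision. This normalization and the function name `expected_valence` are illustrative assumptions, not part of the agent's code.

```python
def expected_valence(outcomes):
    """Return sum(v_i * p_i) for a list of (valence, weight) pairs.

    Assumption: p_i is estimated as weight_i / sum of all weights.
    """
    total_weight = sum(weight for _, weight in outcomes)
    if total_weight == 0:
        return 0.0  # no experience yet, hence no expectation
    return sum(valence * weight / total_weight for valence, weight in outcomes)


# Decision "a0": interaction 00 (valence 5) was observed 3 times,
# interaction 01 (valence -10) was observed once.
ev_forward = expected_valence([(5, 3), (-10, 1)])
print(ev_forward)
```

Note that the agent in this notebook works with the unnormalized product `weight * valence` (the proclivity), which ranks decisions in a similar spirit without dividing by the total weight.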

# Let's implement the agent

The `learn()` method is modified to learn the composite interaction made of two lower-level composite interactions.
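
The following sketch lists which composite interactions get recorded or reinforced on one cycle, working only on interaction keys. The helper `composites_learned` is hypothetical and only mirrors the order of the calls in `learn()`. It does not manage weights or the agent's memory.

```python
def composites_learned(history):
    """Return the keys of the composites formed at the latest step.

    `history` is the list of enacted primitive interaction keys, oldest first.
    """
    learned = []
    if len(history) >= 2:
        # First level: (previous, last)
        last = f"({history[-2]},{history[-1]})"
        learned.append(last)
    if len(history) >= 3:
        # Second level: (previous composite, last) and (penultimate, last composite)
        previous = f"({history[-3]},{history[-2]})"
        learned.append(f"({previous},{history[-1]})")
        learned.append(f"({history[-3]},{last})")
    if len(history) >= 4:
        # Higher level: two chained first-level composites
        penultimate = f"({history[-4]},{history[-3]})"
        learned.append(f"({penultimate},{last})")
    return learned


keys = composites_learned(["30", "00", "31", "10"])
print(keys)
```

With the four most recent interactions `30`, `00`, `31`, `10`, the last key is the composite made of two first-level composites, `((30,00),(31,10))`.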

The `create_proposed_df()` and `aggregate_propositions()` methods are modified so that the `primitive` field contains the first primitive interaction of the decision.

In [341]:
import numpy as np
import pandas as pd

class Agent:
    def __init__(self, _interactions):
        """ Initialize our agent """
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._composite_interactions = {}
        self._intended_interaction = self._interactions["00"]
        self._last_interaction = None
        self._previous_interaction = None
        self._penultimate_interaction = None
        self._last_composite_interaction = None
        self._previous_composite_interaction = None
        self._penultimate_composite_interaction = None
        # Create a dataframe of default primitive interactions
        default_interactions = [interaction for interaction in _interactions if interaction.get_outcome() == 0]
        data = {'activated': [np.nan] * len(default_interactions),
                'weight': [0] * len(default_interactions),
                'action': [i.get_primitive_action() for i in default_interactions],
                'intention': [i.key() for i in default_interactions],
                'valence': [i.get_valence() for i in default_interactions],
                'decision': [i.get_decision() for i in default_interactions],
                'proclivity': [0] * len(default_interactions), 
                'length': [1] * len(default_interactions),
                'primitive': [i.key() for i in default_interactions]}
        self.primitive_df = pd.DataFrame(data)
        # Store the selection dataframe as a class attribute so we can display it in the notebook
        self.proposed_df = None

        # Manage composite intended interactions
        self.primitive_intended_interaction = self._intended_interaction
        self.interaction_step = 0

    def action(self, _outcome):
        """Implement the agent's policy"""
        # Memorize the context
        self._penultimate_composite_interaction = self._previous_composite_interaction
        self._previous_composite_interaction = self._last_composite_interaction
        self._penultimate_interaction = self._previous_interaction
        self._previous_interaction = self._last_interaction
        self._last_interaction = self._interactions[f"{self.primitive_intended_interaction.get_action()}{_outcome}"]

        # tracing the previous cycle
        print(
            f"Action: {self.primitive_intended_interaction.get_action()}, Prediction: {self.primitive_intended_interaction.get_outcome()}, "
            f"Outcome: {_outcome}, Prediction_correct: {self.primitive_intended_interaction.get_outcome() == _outcome}, "
            f"Valence: {self._last_interaction.get_valence()}")

        # Call the learning mechanism
        self.learn()

        if self.interaction_step == 1 and self.primitive_intended_interaction == self._last_interaction:
            # Continue the composite interaction
            self.interaction_step = 0
            self.primitive_intended_interaction = self._intended_interaction.post_interaction
            return self._intended_interaction.post_interaction.get_action()

        else:
            # Create the proposed dataframe
            self.create_proposed_df()
            self.aggregate_propositions()
    
            # Select the intended primitive interaction
            self.decide()
    
            if isinstance(self._intended_interaction, Interaction):
                self.interaction_step = 0
                self.primitive_intended_interaction = self._intended_interaction
                return self._intended_interaction.get_action()
            else:
                self.interaction_step = 1
                self.primitive_intended_interaction = self._intended_interaction.pre_interaction
                return self._intended_interaction.pre_interaction.get_action()

    def learn(self):
        """Learn the composite interactions"""
        # First level of composite interactions
        self._last_composite_interaction = self.learn_composite_interaction(self._previous_interaction,
                                                                            self._last_interaction)
        # Second level of composite interactions
        self.learn_composite_interaction(self._previous_composite_interaction, self._last_interaction)
        self.learn_composite_interaction(self._penultimate_interaction, self._last_composite_interaction)

        # Higher level composite interaction made of two composite interactions
        if self._last_composite_interaction is not None:
            self.learn_composite_interaction(self._penultimate_composite_interaction, self._last_composite_interaction)

    
    def learn_composite_interaction(self, pre_interaction, post_interaction):
        """Record or reinforce the composite interaction made of (pre_interaction, post_interaction)"""
        if pre_interaction is None:
            return None
        else:
            # The pre-interaction exists: build the candidate composite interaction
            composite_interaction = CompositeInteraction(pre_interaction, post_interaction)
            if composite_interaction.key() not in self._composite_interactions:
                # Add the composite interaction to memory
                self._composite_interactions[composite_interaction.key()] = composite_interaction
                print(f"Learning {composite_interaction}")
                return composite_interaction
            else:
                # Reinforce the existing composite interaction and return it
                self._composite_interactions[composite_interaction.key()].reinforce()
                print(f"Reinforcing {self._composite_interactions[composite_interaction.key()]}")
                return self._composite_interactions[composite_interaction.key()]

    def create_proposed_df(self):
        """Create the proposed dataframe from the activated interactions"""
        # The list of activated interactions that match the current context
        activated_keys = [composite_interaction.key() for composite_interaction in
                          self._composite_interactions.values()
                          if composite_interaction.pre_interaction == self._last_interaction or
                          composite_interaction.pre_interaction == self._last_composite_interaction]
        data = {'activated': activated_keys,
                'weight': [self._composite_interactions[k].weight for k in activated_keys],
                'action': [self._composite_interactions[k].post_interaction.get_primitive_action() for k in activated_keys],
                'intention': [self._composite_interactions[k].post_interaction.key() for k in activated_keys],
                'valence': [self._composite_interactions[k].post_interaction.get_valence() for k in activated_keys],
                'decision': [self._composite_interactions[k].post_interaction.get_decision() for k in activated_keys],
                'primitive': [self._composite_interactions[k].post_interaction.pre_key() for k in activated_keys],  # <-- MODIFIED
                'length': [self._composite_interactions[k].post_interaction.get_length() for k in activated_keys],
                }
        activated_df = pd.DataFrame(data)

        # Create the selection dataframe from the primitive and the activated dataframes
        self.proposed_df = pd.concat([self.primitive_df, activated_df], ignore_index=True)

        # Compute the proclivity for each proposition
        self.proposed_df['proclivity'] = self.proposed_df['weight'] * self.proposed_df['valence']

    def aggregate_propositions(self):
        """Aggregate the proclivity"""
        # Compute the proclivity for each action
        grouped_df = self.proposed_df.groupby('decision').agg({'proclivity': 'sum'}).reset_index()
        self.proposed_df = self.proposed_df.merge(grouped_df, on='decision', suffixes=('', '_agg'))
        # Sort by descending order of proclivity
        self.proposed_df = self.proposed_df.sort_values(by=['proclivity_agg', 'decision'], ascending=[False, False])

        # Find the most probable primitive interaction for each action <-- MODIFIED
        #max_weight_df = self.proposed_df.loc[self.proposed_df.groupby('decision')['weight'].idxmax(), ['decision', 'interaction']].reset_index(
        #    drop=True)
        #max_weight_df.columns = ['decision', 'intended']
        #self.proposed_df = self.proposed_df.merge(max_weight_df, on='decision')
        
    def decide(self):
        """Select the intended primitive or composite interaction from the proposed dataframe"""
        # Find the row that has the highest proclivity
        max_index = self.proposed_df['proclivity_agg'].idxmax()
        # Find the intended interaction in the row that has the highest proclivity
        intended_interaction_key = self.proposed_df.loc[max_index, ['intention']].values[0]
        if intended_interaction_key in self._interactions:
            self._intended_interaction = self._interactions[intended_interaction_key]
        else:
            self._intended_interaction = self._composite_interactions[intended_interaction_key]
        print("Intention:", self._intended_interaction)

## Let's implement the SmallLoop environment

We create the Small Loop environment.
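
Before reading the full class, it may help to see the geometry it encodes: `poX` is the row, `poY` is the column, and the heading is an integer from 0 (left) to 3 (up). The sketch below restates that geometry with an offset table and modular arithmetic. The names `OFFSETS`, `turn_left`, `turn_right`, `cell_towards` and `feel` are illustrative and do not appear in the notebook.

```python
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3

# (row offset, column offset) of the neighboring cell, indexed by heading
OFFSETS = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}

GRID = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]


def turn_left(direction):
    """LEFT -> DOWN -> RIGHT -> UP -> LEFT"""
    return (direction + 1) % 4


def turn_right(direction):
    """LEFT -> UP -> RIGHT -> DOWN -> LEFT"""
    return (direction - 1) % 4


def cell_towards(row, col, direction):
    """Return the coordinates of the neighboring cell in the given direction."""
    d_row, d_col = OFFSETS[direction]
    return row + d_row, col + d_col


def feel(row, col, direction):
    """Return 1 if the neighboring cell in that direction is a wall, else 0."""
    r, c = cell_towards(row, col, direction)
    return 1 if GRID[r][c] != 0 else 0


# Agent at row 1, column 1, facing LEFT (the starting pose used in this notebook)
front = feel(1, 1, LEFT)
left = feel(1, 1, turn_left(LEFT))
right = feel(1, 1, turn_right(LEFT))
print(front, left, right)
```

From the starting pose the agent feels a wall in front and on its right, and an empty cell on its left.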

In [342]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, BoundaryNorm
from ipywidgets import Button, HBox,VBox, Output
from IPython.display import display

FORWARD = 0
TURN_LEFT = 1
TURN_RIGHT = 2
FEEL_FRONT = 3
FEEL_LEFT = 4
FEEL_RIGHT = 5

LEFT = 0
DOWN = 1
RIGHT = 2
UP = 3
FEELING = 2
BUMPING = 3

class SmallLoop():
    def __init__(self, poX, poY, direction):
        self.grid = np.array([
            [1, 1, 1, 1, 1], 
            [1, 0, 0, 0, 1],
            [1, 0, 1, 0, 1],
            [1, 0, 0, 0, 1],
            [1, 1, 1, 1, 1]
        ])
        self.maze = self.grid.copy()
        self.poX = poX
        self.poY = poY
        self.direction = direction
        self.cmap = ListedColormap(['white', 'green', 'yellow', 'red'])
        self.norm = BoundaryNorm([-0.5, 0.5, 1.5, 2.5, 3.5], self.cmap.N)

    def outcome(self, action):
        # print('before:', self.agent_position.strPosition(), action_dcit[action])
        self.maze[:,:] = self.grid
        result = 0
        
        if action == FORWARD:  # move forward
            # print('the action is move forward')
            # print(str(self.position.pointX)+': '+ str(self.position.pointY)+ ' ' +self.direction, action)
        
            if self.direction == LEFT:
                if self.maze[self.poX][self.poY - 1] == 0:
                    self.poY -= 1
                else:
                    self.maze[self.poX][self.poY - 1] = BUMPING
                    result = 1
            elif self.direction == DOWN:
                if self.maze[self.poX + 1][self.poY] == 0:
                    self.poX += 1
                else:
                    self.maze[self.poX + 1][self.poY] = BUMPING
                    result = 1
            elif self.direction == RIGHT:
                if self.maze[self.poX][self.poY + 1] == 0:
                    self.poY += 1
                else:
                    self.maze[self.poX][self.poY + 1] = BUMPING
                    result = 1
            elif self.direction == UP:
                if self.maze[self.poX - 1][self.poY] == 0:
                    self.poX -= 1
                else:
                    self.maze[self.poX - 1][self.poY] = BUMPING
                    result = 1
            # print(str(self.position.pointX)+': '+ str(self.position.pointY)+ ' ' +self.direction, action)
        elif action == TURN_RIGHT:
            if self.direction == LEFT:
                self.direction = UP
            elif self.direction == DOWN:
                self.direction = LEFT
            elif self.direction == RIGHT:
                self.direction = DOWN
            elif self.direction == UP:
                self.direction = RIGHT
        elif action == TURN_LEFT:
            if self.direction == LEFT:
                self.direction = DOWN  # RIGHT  # DOWN
            elif self.direction == DOWN:
                self.direction = RIGHT
            elif self.direction == RIGHT:
                self.direction = UP  # LEFT  # UP
            elif self.direction == UP:
                self.direction = LEFT
        elif action == FEEL_FRONT:
            if self.direction == LEFT:
                if self.maze[self.poX][self.poY - 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY - 1] = FEELING
            elif self.direction == DOWN:
                if self.maze[self.poX + 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX + 1][self.poY] = FEELING
            elif self.direction == RIGHT:
                if self.maze[self.poX][self.poY + 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY + 1] = FEELING
            elif self.direction == UP:
                if self.maze[self.poX - 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX - 1][self.poY] = FEELING
        elif action == FEEL_LEFT:
            if self.direction == LEFT:
                if self.maze[self.poX + 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX + 1][self.poY] = FEELING
            elif self.direction == DOWN:
                if self.maze[self.poX][self.poY + 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY + 1] = FEELING
            elif self.direction == RIGHT:
                if self.maze[self.poX - 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX - 1][self.poY] = FEELING
            elif self.direction == UP:
                if self.maze[self.poX][self.poY - 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY - 1] = FEELING
        elif action == FEEL_RIGHT:
            if self.direction == LEFT:
                if self.maze[self.poX - 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX - 1][self.poY] = FEELING
            elif self.direction == DOWN:
                if self.maze[self.poX][self.poY - 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY - 1] = FEELING
            elif self.direction == RIGHT:
                if self.maze[self.poX + 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX + 1][self.poY] = FEELING
            elif self.direction == UP:
                if self.maze[self.poX][self.poY + 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY + 1] = FEELING
        print(f"Line: {self.poX}, Column: {self.poY}, direction: {self.direction}")
        # return self.position,

        return result  
    
    def display(self):
        out.clear_output(wait=True)
        with out:
            fig, ax = plt.subplots()
            # ax.set_xticks([])
            # ax.set_yticks([])
            # ax.axis('off')
            # ax.imshow(self.maze, cmap='Greens', vmin=0, vmax=2)
            ax.imshow(self.maze, cmap=self.cmap, norm=self.norm)
            if self.direction == LEFT:
                # Y is column and X is line
                plt.scatter(self.poY, self.poX, s=400, marker='<')
            elif self.direction == DOWN:
                plt.scatter(self.poY, self.poX, s=400, marker='v')
            elif self.direction == RIGHT:
                plt.scatter(self.poY, self.poX, s=400, marker='>')
            else: # UP
                plt.scatter(self.poY, self.poX, s=400, marker='^')
            plt.show()

We instantiate the agent and the environment.

In [343]:
# Instantiate the small loop environment
e = SmallLoop(1, 1, 0)

# Instantiate the agent
interactions = [
    Interaction(0,0,5),
    Interaction(0,1,-10),
    Interaction(1,0,-6),
    Interaction(1,1,-6),
    Interaction(2,0,-6),
    Interaction(2,1,-6),
    Interaction(3,0,-1),
    Interaction(3,1,-1)
]
a = Agent(interactions)

# Run the interaction loop
step = 0
outcome = 0

# Display
out = Output()
e.display()
display(out)

Output()

In [345]:
print(f"Step: {step}")
step += 1
action = a.action(outcome)
e.display()
outcome = e.outcome(action)
a.proposed_df

Step: 1
Action: 3, Prediction: 0, Outcome: 1, Prediction_correct: False, Valence: -1
Learning (00:5, 31:-1: 1)
Intention: 30:-1
Line: 1, Column: 1, direction: 0


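| | activated | weight | action | intention | valence | decision | proclivity | length | primitive | proclivity_agg |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | | 0.0 | 3.0 | 30 | -1.0 | a3 | -0.0 | 1.0 | 30 | 0.0 |
| 2 | | 0.0 | 2.0 | 20 | -6.0 | a2 | -0.0 | 1.0 | 20 | 0.0 |
| 1 | | 0.0 | 1.0 | 10 | -6.0 | a1 | -0.0 | 1.0 | 10 | 0.0 |
| 0 | | 0.0 | 0.0 | 00 | 5.0 | a0 | 0.0 | 1.0 | 00 | 0.0 |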


At step 156, the decision 00a3 is selected because its proclivity is not counterbalanced by the possibility that interaction 00 might fail.

# Computing the expected valence

The expected valence $\mathbb{E}(V_d)$ of a decision $d$ is the sum of the valences of the interactions that can result from this decision, each multiplied by the probability of enacting that interaction:

$\displaystyle \mathbb{E}(V_d) = \sum_{i \in I_d} v_{i} \cdot p_{i} $

For composite decisions, we take into account:
* Success: the intended composite interaction is enacted.
* Failure on the first interaction: another primitive interaction is enacted instead of the intended one.

We account for the possibility of failure when computing the proclivity.
To do so, we add the proclivities of the interactions that correspond to failures to the proclivity of the decision.
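
The sketch below illustrates this correction in plain Python on a hand-made list of propositions. The weights are invented for illustration, and `aggregate_with_failures` is a hypothetical helper that follows the same rule: a composite decision receives the proclivity of every proposed primitive interaction that shares its first action but differs from its first primitive interaction.

```python
def aggregate_with_failures(propositions):
    """Sum proclivities per decision, then charge composite decisions for failures."""
    totals, first_row = {}, {}
    for p in propositions:
        totals[p["decision"]] = totals.get(p["decision"], 0) + p["proclivity"]
        first_row.setdefault(p["decision"], p)
    for decision, ref in first_row.items():
        if ref["length"] > 1:  # composite decision
            for p in propositions:
                same_action = p["action"] == ref["action"]
                other_outcome = p["primitive"] != ref["primitive"]
                if p["length"] == 1 and same_action and other_outcome:
                    totals[decision] += p["proclivity"]
    return totals


# Illustrative weights: proclivity = weight * valence
propositions = [
    # Composite (00,30): move forward, then feel an empty cell; valence 5 + (-1)
    {"decision": "00a3", "action": 0, "primitive": "00", "length": 2, "proclivity": 3 * 4},
    # Primitive 00: moving forward succeeds
    {"decision": "a0", "action": 0, "primitive": "00", "length": 1, "proclivity": 2 * 5},
    # Primitive 01: moving forward bumps into a wall
    {"decision": "a0", "action": 0, "primitive": "01", "length": 1, "proclivity": 2 * -10},
]
totals = aggregate_with_failures(propositions)
print(totals)
```

In this example the composite decision `00a3` drops from 12 to -8 once the risk of bumping (`01`) is charged to it, so it no longer looks attractive merely because its failure case was ignored.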

# Let's implement Agent9

Let's modify the aggregation of composite decisions to account for the possibility that they fail.

In [357]:
class Agent9(Agent):
    def aggregate_propositions(self):
        """Aggregate the proclivity"""
        # Aggregate the proclivity for each decision
        grouped_df = self.proposed_df.groupby('decision').agg({'proclivity': 'sum', 'action': 'first', 'length': 'first', 'intention': 'first', 'primitive': 'first'}).reset_index()

        # For each composite decision, find the proposed primitive interactions that have the same action but a different outcome 
        for index, proposition in grouped_df[grouped_df['length'] > 1].iterrows():
            # print(f"Index {index}, action {proposition['action']}, intention {proposition['intention']}")
            for _, primitive in self.proposed_df[(self.proposed_df['action'] == proposition['action']) 
                                                & (self.proposed_df['primitive'] != proposition['primitive'])
                                                & (self.proposed_df['length'] == 1)].iterrows():
                grouped_df.loc[index, 'proclivity'] += primitive['proclivity']
                # print(f"Decision {proposition['decision']} receives {primitive['proclivity']} from failing {primitive['primitive']}")
        
        # Sort by descending proclivity
        self.proposed_df = grouped_df.sort_values(by=['proclivity', 'decision'], ascending=[False, True]).reset_index(drop=True)

    def decide(self):
        """Select the intended primitive or composite interaction from the proposed dataframe"""
        # The intended interaction is in the first row because it has been sorted by descending proclivity
        intended_interaction_key = self.proposed_df.loc[0, 'intention']
        if intended_interaction_key in self._interactions:
            self._intended_interaction = self._interactions[intended_interaction_key]
        else:
            self._intended_interaction = self._composite_interactions[intended_interaction_key]
        print("Intention:", self._intended_interaction)

# Let's test it in the small loop

In [358]:
e = SmallLoop(1, 1, 0)
# Instantiate the agent
interactions = [
    Interaction(0,0,5),
    Interaction(0,1,-10),
    Interaction(1,0,-3),
    Interaction(1,1,-3),
    Interaction(2,0,-3),
    Interaction(2,1,-3),
    Interaction(3,0,-1),
    Interaction(3,1,-1)
]
a = Agent9(interactions)

# Run the interaction loop
step = 0
outcome = 0

# Display
out = Output()
e.display()
display(out)

Output()

In [406]:
print(f"Step: {step}")
step += 1
action = a.action(outcome)
e.display()
outcome = e.outcome(action)
a.proposed_df

Step: 47
Action: 1, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -3
Reinforcing (31:-1, 10:-3: 5)
Reinforcing ((00:5, 31:-1: 6), 10:-3: 5)
Reinforcing (00:5, (31:-1, 10:-3: 5): 5)
Reinforcing ((30:-1, 00:5: 5), (31:-1, 10:-3: 5): 5)
Line: 2, Column: 1, direction: 1


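| | decision | proclivity | action | length | intention | primitive |
|---|---|---|---|---|---|---|
| 0 | 10a0 | 16 | 1 | 2 | (10,00) | 10 |
| 1 | a2 | 0 | 2 | 1 | 20 | 20 |
| 2 | a3 | 0 | 3 | 1 | 30 | 30 |
| 3 | a0 | -20 | 0 | 1 | 00 | 00 |
| 4 | a1 | -24 | 1 | 1 | 10 | 10 |
| 5 | 01a1 | -26 | 0 | 2 | (01,10) | 01 |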


From step 25 onward, the agent learns to feel in front of itself and not to move forward if it feels a wall.