[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PetiteIA/schema_mechanism/blob/master/notebooks/agent10.ipynb)

# THE AGENT THAT CLIMBED THE HIERARCHY (UNDER CONSTRUCTION)

# Objectives

This tutorial shows how to implement an agent that recursively learns hierarchical sequences of increasingly higher level.

In [None]:
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install ipywidgets
!pip install ipython
!pip install imageio

# Let's prepare the CompositeInteraction and Interaction classes

The `CompositeInteraction` class is modified to handle the enaction of composite interactions of any hierarchical level.

We add a `_step` attribute that stores the current step of the ongoing enaction.

The `current()` method returns the primitive interaction that corresponds to the current step.

The `increment()` method moves to the next step and returns the enacted interaction if the enaction is over, or `None` while it is still ongoing.

In [1414]:
class CompositeInteraction:
    """A composite interaction is a tuple (pre_interaction, post_interaction) and a weight"""
    def __init__(self, pre_interaction, post_interaction):
        self.pre_interaction = pre_interaction
        self.post_interaction = post_interaction
        self.weight = 1
        self._step = 1

    def get_decision(self):
        """Return the sequence of decisions"""
        # return self.key()
        return f"{self.pre_interaction.key()}{self.post_interaction.get_decision()}"

    def get_primitive_action(self):
        """Return the primitive action"""
        return self.pre_interaction.get_primitive_action()

    def get_valence(self):
        """Return the valence of the pre_interaction plus the valence of the post_interaction"""
        return self.pre_interaction.get_valence() + self.post_interaction.get_valence()

    def reinforce(self):
        """Increment the composite interaction's weight"""
        self.weight += 1

    def key(self):
        """ The key to find this interaction in the dictionary is the string '(<pre_interaction>,<post_interaction>)'. """
        return f"({self.pre_interaction.key()},{self.post_interaction.key()})"

    def pre_key(self):
        """Return the key of the pre_interaction"""
        return self.pre_interaction.pre_key()

    def __str__(self):
        """ Print the interaction in the Newick tree format (pre_interaction, post_interaction: weight) """
        return f"({self.pre_interaction}, {self.post_interaction}: {self.weight})"

    def __eq__(self, other):
        """ Interactions are equal if they have the same pre and post interactions """
        if isinstance(other, self.__class__):
            return (self.pre_interaction == other.pre_interaction) and (self.post_interaction == other.post_interaction)
        else:
            return False

    def get_length(self):
        """Return the number of primitive interactions in this composite interaction"""
        return self.pre_interaction.get_length() + self.post_interaction.get_length()

    def increment(self, interaction, interactions):
        """Increment the step of the appropriate sub-interaction. Return the enacted interaction if it is over, or None if it is ongoing."""
        # First step 
        if self._step == 1:
            interaction = self.pre_interaction.increment(interaction, interactions)
            # Ongoing pre-interaction. Return None
            if interaction is None:
                return None
            # Pre-interaction succeeded. Increment the step and return None
            elif interaction == self.pre_interaction:
                self._step = 2
                return None
            # Pre-interaction failed. Reset the step and return the enacted interaction
            else:
                self._step = 1
                return interaction
        # Second step
        else:
            interaction = self.post_interaction.increment(interaction, interactions)
            # Ongoing post-interaction. Return None
            if interaction is None:
                return None
            # Post-interaction succeeded. Reset the step and return this interaction
            elif interaction == self.post_interaction:
                self._step = 1
                return self
            # Post-interaction failed. Reset the step and return the enacted interaction
            else:
                self._step = 1
                composite_interaction = CompositeInteraction(self.pre_interaction, interaction)
                if composite_interaction.key() not in interactions:
                    # Add the enacted composite interaction to memory
                    interactions[composite_interaction.key()] = composite_interaction
                    print(f"Learning {composite_interaction}")
                    return composite_interaction
                else:
                    # Reinforce the existing composite interaction and return it
                    interactions[composite_interaction.key()].reinforce()
                    print(f"Reinforcing {interactions[composite_interaction.key()]}")
                    return interactions[composite_interaction.key()]

    def current(self):
        """Return the current intended primitive interaction"""
        # Step 1: the current primitive interaction of the pre-interaction
        if self._step == 1:
            return self.pre_interaction.current()
        # Step 2: The current primitive interaction of the post-interaction
        else:
            return self.post_interaction.current()


We also add the `current()` and `increment()` methods to the `Interaction` class to keep it compatible with the `CompositeInteraction` class.

In [1255]:
class Interaction:
    """An interaction is a tuple (action, outcome) with a valence"""
    def __init__(self, _action, _outcome, _valence):
        self._action = _action
        self._outcome = _outcome
        self._valence = _valence

    def get_action(self):
        """Return the action"""
        return self._action

    def get_decision(self):
        """Return the decision key"""
        return f"a{self._action}"

    def get_primitive_action(self):
        """Return the action for compatibility with CompositeInteraction"""
        return self._action

    def get_outcome(self):
        """Return the outcome"""
        return self._outcome

    def get_valence(self):
        """Return the valence"""
        return self._valence

    def key(self):
        """ The key to find this interaction in the dictionary is the string '<action><outcome>'. """
        return f"{self._action}{self._outcome}"

    def pre_key(self):
        """Return the key. Used for compatibility with CompositeInteraction"""
        return self.key()

    def __str__(self):
        """ Print interaction in the form '<action><outcome>:<valence>' for debug."""
        return f"{self._action}{self._outcome}:{self._valence}"

    def __eq__(self, other):
        """ Interactions are equal if they have the same key """
        if isinstance(other, self.__class__):
            return self.key() == other.key()
        else:
            return False

    def get_length(self):
        """The length of the sequence of this interaction"""
        return 1

    def increment(self, interaction, interactions):
        """Return the enacted interaction for compatibility with composite interactions"""
        return interaction

    def current(self):
        """Return itself for compatibility with composite interactions"""
        return self
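
To see how the two classes cooperate, the following sketch replays the enaction of a nested composite interaction one primitive step at a time. `Prim` and `Seq` are hypothetical, stripped-down stand-ins for `Interaction` and `CompositeInteraction`: they keep only the key, the decision string, the length, `current()` and `increment()`, and they leave out the memory dictionary (no recording or reinforcement). It is meant as an illustration of the protocol, not as a replacement for the classes above.

```python
class Prim:
    """Hypothetical stand-in for Interaction: an (action, outcome) pair."""
    def __init__(self, action, outcome):
        self.action, self.outcome = action, outcome

    def key(self):
        return f"{self.action}{self.outcome}"

    def get_decision(self):
        return f"a{self.action}"

    def get_length(self):
        return 1

    def current(self):
        return self

    def increment(self, enacted):
        # A primitive is over after one step: report what was enacted
        return enacted

    def __eq__(self, other):
        return isinstance(other, Prim) and self.key() == other.key()


class Seq:
    """Hypothetical stand-in for CompositeInteraction (no memory update)."""
    def __init__(self, pre, post):
        self.pre, self.post, self._step = pre, post, 1

    def key(self):
        return f"({self.pre.key()},{self.post.key()})"

    def get_decision(self):
        return f"{self.pre.key()}{self.post.get_decision()}"

    def get_length(self):
        return self.pre.get_length() + self.post.get_length()

    def current(self):
        return self.pre.current() if self._step == 1 else self.post.current()

    def increment(self, enacted):
        if self._step == 1:
            result = self.pre.increment(enacted)
            if result is None:
                return None          # pre-interaction still ongoing
            if result == self.pre:
                self._step = 2       # pre-interaction succeeded
                return None
            return result            # pre-interaction failed
        result = self.post.increment(enacted)
        if result is None:
            return None              # post-interaction still ongoing
        self._step = 1               # over (success or failure): reset
        return self if result == self.post else Seq(self.pre, result)

    def __eq__(self, other):
        return isinstance(other, Seq) and self.pre == other.pre and self.post == other.post


# (turn left, (feel front empty, move forward)) with the notebook's action codes
turn_feel_forward = Seq(Prim(1, 0), Seq(Prim(3, 0), Prim(0, 0)))
print(turn_feel_forward.key())           # (10,(30,00))
print(turn_feel_forward.get_decision())  # 1030a0
print(turn_feel_forward.get_length())    # 3

# Success case: the environment returns exactly the predicted outcomes
trace = []
for _ in range(3):
    intended = turn_feel_forward.current()
    result = turn_feel_forward.increment(intended)
    trace.append((intended.key(), None if result is None else result.key()))
print(trace)  # [('10', None), ('30', None), ('00', '(10,(30,00))')]

# Failure case: the agent feels a wall (outcome 1) instead of an empty cell
turn_feel_forward.increment(Prim(1, 0))
failed = turn_feel_forward.increment(Prim(3, 1))
print(failed.key())  # (10,31)
```

The composite only reports a result once its last primitive step has been enacted or one of its steps has failed. In the failure case, what comes back is the sequence that was actually enacted, here `(10,31)`.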

# Hierarchical learning

When the agent successfully enacts a composite interaction, it builds higher-level composite interactions in the same way as it does from primitive interactions. Figure 1 illustrates this mechanism.

![Agent5](img/Figure_1_Agent10.svg)

Figure 1: The "decision time" arrow represents the primitive or composite interactions that are enacted each time the agent makes a decision.
Enacting the composite interaction $i_{d-1}$ consists of enacting the two primitive interactions $i_{t-2}$ and $i_{t-1}$.

The interactions shown in gray are those included in the context. The context activates composite interactions, which in turn propose the next interactions to enact.
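
The way the context gives rise to higher-level composites can be sketched with plain strings. In this minimal sketch, interactions are represented only by their keys, `memory` maps each composite key to its weight, and `record()` and `learn()` are hypothetical helper names. The order of operations follows the agent's `action()` and `learn()` methods as read from the code further down, under the simplifying assumption that every enacted interaction is passed in as a key.

```python
def record(memory, pre, post):
    """Create or reinforce the composite (pre, post); return its key, or None if a part is missing."""
    if pre is None or post is None:
        return None
    key = f"({pre},{post})"
    memory[key] = memory.get(key, 0) + 1
    return key


def learn(memory, ctx, enacted):
    """One learning cycle: shift the context, then record composites at several levels."""
    # Shift the context, as the agent does before calling its learning mechanism
    ctx["penultimate_composite"] = ctx["previous_composite"]
    ctx["previous_composite"] = ctx["last_composite"]
    ctx["penultimate"], ctx["previous"], ctx["last"] = ctx["previous"], ctx["last"], enacted
    # First level: (previous, enacted)
    ctx["last_composite"] = record(memory, ctx["previous"], enacted)
    # Second level: ((penultimate, previous), enacted) and (penultimate, (previous, enacted))
    record(memory, ctx["previous_composite"], enacted)
    record(memory, ctx["penultimate"], ctx["last_composite"])
    # Higher level: a composite followed by a composite
    record(memory, ctx["penultimate_composite"], ctx["last_composite"])


memory = {}
ctx = dict.fromkeys(["penultimate", "previous", "last",
                     "penultimate_composite", "previous_composite", "last_composite"])
for enacted in ["10", "30", "00", "10"]:
    learn(memory, ctx, enacted)

for key, weight in memory.items():
    print(key, weight)
```

After four primitive interactions the memory already holds two-step, three-step and four-step sequences, for example `(10,(30,00))` and `((10,30),(00,10))`. Passing a composite key as `enacted` would build the next level in the same way.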

# Let's implement the agent

Composite interactions are now stored together with the primitive interactions in the same dictionary `self._interactions`.

The `action()` method is modified to handle the enaction of composite interactions.

The `decide()` method is modified to select the next intended interaction, which may be primitive or composite.

In [1526]:
import numpy as np  # np.nan is used in Agent.__init__
import pandas as pd

class Agent:
    def __init__(self, _interactions):
        """ Initialize our agent """
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._primitive_intended_interaction = self._interactions["00"]
        # self._enaction = None
        self._intended_interaction = None
        # self._enacted_interaction = None

        # The context
        self._penultimate_interaction = None
        self._previous_interaction = None
        self._last_interaction = None
        # self._primitive_enacted_interaction = None
        self._penultimate_composite_interaction = None
        self._previous_composite_interaction = None
        self._last_composite_interaction = None
        
        # Prepare the dataframe of proposed interactions
        default_interactions = [interaction for interaction in _interactions if interaction.get_outcome() == 0]
        data = {'activated': [np.nan] * len(default_interactions),
                'weight': [0] * len(default_interactions),
                'action': [i.get_primitive_action() for i in default_interactions],
                'intention': [i.key() for i in default_interactions],
                'valence': [i.get_valence() for i in default_interactions],
                'decision': [i.get_decision() for i in default_interactions],
                'proclivity': [0] * len(default_interactions), 
                'length': [1] * len(default_interactions),
                'primitive': [i.key() for i in default_interactions]}
        self._default_df = pd.DataFrame(data)
        self.proposed_df = None

    def action(self, _outcome):
        """Implement the agent's policy"""

        # trace the previous cycle
        primitive_enacted_interaction = self._interactions[f"{self._primitive_intended_interaction.get_action()}{_outcome}"]
        print(
            f"Action: {self._primitive_intended_interaction.get_action()}, Prediction: {self._primitive_intended_interaction.get_outcome()}, "
            f"Outcome: {_outcome}, Prediction_correct: {self._primitive_intended_interaction.get_outcome() == _outcome}, "
            f"Valence: {primitive_enacted_interaction.get_valence()}")

        # Follow up the enaction
        if self._intended_interaction is None: # First interaction cycle
            enacted_interaction = primitive_enacted_interaction
        else:
            enacted_interaction = self._intended_interaction.increment(primitive_enacted_interaction, self._interactions)

        # If the intended interaction has been completely enacted or has been aborted
        if enacted_interaction is not None:
            # Memorize the context
            self._penultimate_composite_interaction = self._previous_composite_interaction
            self._previous_composite_interaction = self._last_composite_interaction
            self._penultimate_interaction = self._previous_interaction
            self._previous_interaction = self._last_interaction
            self._last_interaction = enacted_interaction
            # Call the learning mechanism
            self.learn(enacted_interaction)
            # Create the proposed dataframe
            self.create_proposed_df()
            self.aggregate_propositions()
            # Decide the next enaction
            self.decide()

        # Return the next primitive action
        self._primitive_intended_interaction = self._intended_interaction.current()
        return self._primitive_intended_interaction.get_action()
        
    def learn(self, enacted_interaction):
        """Learn the composite interactions"""
        # First level of composite interactions
        self._last_composite_interaction = self.learn_composite_interaction(self._previous_interaction, enacted_interaction)
        # Second level of composite interactions
        self.learn_composite_interaction(self._previous_composite_interaction, enacted_interaction)
        self.learn_composite_interaction(self._penultimate_interaction, self._last_composite_interaction)

        # Higher level composite interaction made of two composite interactions
        if self._last_composite_interaction is not None:
            self.learn_composite_interaction(self._penultimate_composite_interaction, self._last_composite_interaction)

    
    def learn_composite_interaction(self, pre_interaction, post_interaction):
        """Record or reinforce the composite interaction made of (pre_interaction, post_interaction)"""
        if pre_interaction is None:
            return None
        else:
            # If the pre-interaction exists
            composite_interaction = CompositeInteraction(pre_interaction, post_interaction)
            if composite_interaction.key() not in self._interactions:
                # Add the composite interaction to memory
                self._interactions[composite_interaction.key()] = composite_interaction
                print(f"Learning {composite_interaction}")
                return composite_interaction
            else:
                # Reinforce the existing composite interaction and return it
                self._interactions[composite_interaction.key()].reinforce()
                print(f"Reinforcing {self._interactions[composite_interaction.key()]}")
                return self._interactions[composite_interaction.key()]

    def create_proposed_df(self):
        """Create the proposed dataframe from the activated interactions"""
        # The list of activated interaction that match the current context
        activated_keys = [composite_interaction.key() for composite_interaction in
                          self._interactions.values()
                          if composite_interaction.get_length() > 1 and
                          (composite_interaction.pre_interaction == self._last_interaction or
                          composite_interaction.pre_interaction == self._last_composite_interaction)]
                          # and composite_interaction.post_interaction.get_length() < 3]
        data = {'activated': activated_keys,
                'weight': [self._interactions[k].weight for k in activated_keys],
                'action': [self._interactions[k].post_interaction.get_primitive_action() for k in activated_keys],
                'intention': [self._interactions[k].post_interaction.key() for k in activated_keys],
                'valence': [self._interactions[k].post_interaction.get_valence() for k in activated_keys],
                'decision': [self._interactions[k].post_interaction.get_decision() for k in activated_keys],
                'primitive': [self._interactions[k].post_interaction.pre_key() for k in activated_keys],  # <-- MODIFIED
                'length': [self._interactions[k].post_interaction.get_length() for k in activated_keys],
                }
        activated_df = pd.DataFrame(data)

        # Create the selection dataframe from the primitive and the activated dataframes
        self.proposed_df = pd.concat([self._default_df, activated_df], ignore_index=True)

        # Compute the proclivity for each proposition
        self.proposed_df['proclivity'] = self.proposed_df['weight'] * self.proposed_df['valence']

    def aggregate_propositions(self):
        """Aggregate the proclivity"""
        # Aggregate the proclivity for each decision
        grouped_df = self.proposed_df.groupby('decision').agg({'proclivity': 'sum', 'action': 'first', 'length': 'first', 
                                                               'intention': 'first', 'primitive': 'first'}).reset_index()

        # For each composite decision, find the proposed primitive interactions that have the same action but a different outcome 
        for index, proposition in grouped_df[grouped_df['length'] > 1].iterrows():
            # print(f"Index {index}, action {proposition['action']}, intention {proposition['intention']}")
            for _, primitive in self.proposed_df[(self.proposed_df['action'] == proposition['action']) 
                                                & (self.proposed_df['primitive'] != proposition['primitive'])
                                                & (self.proposed_df['length'] == 1)].iterrows():
                grouped_df.loc[index, 'proclivity'] += primitive['proclivity']
                # print(f"Decision {proposition['decision']} receives {primitive['proclivity']} from failing {primitive['intention']}")
        
        # Sort by descending proclivity
        self.proposed_df = grouped_df.sort_values(by=['proclivity', 'decision'], ascending=[False, True]).reset_index(drop=True)

    def decide(self):
        """Select the intended primitive or composite interaction from the proposed dataframe"""
        # The intended interaction is in the first row because it has been sorted by descending proclivity
        intended_interaction_key = self.proposed_df.loc[0, 'intention']
        print("Intention:", intended_interaction_key)
        self._intended_interaction = self._interactions[intended_interaction_key]
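
As a rough illustration of how `create_proposed_df()` and `aggregate_propositions()` rank the candidates, the sketch below applies the same pandas operations to a small hand-made table. The rows, weights and valences are invented for the example, and `proposed`, `grouped` and `ranked` are hypothetical variable names. Only the column names and the sequence of operations are taken from the class above.

```python
import pandas as pd

# Two default rows (weight 0) followed by four activated rows
proposed = pd.DataFrame({
    "decision":  ["a0", "a1", "a0", "a0", "a1", "00a0"],
    "intention": ["00", "10", "00", "01", "10", "(00,00)"],
    "primitive": ["00", "10", "00", "01", "10", "00"],
    "action":    [0, 1, 0, 0, 1, 0],
    "length":    [1, 1, 1, 1, 1, 2],
    "weight":    [0, 0, 3, 1, 1, 2],
    "valence":   [5, -3, 5, -10, -3, 10],
})

# Proclivity of each proposition: weight times valence
proposed["proclivity"] = proposed["weight"] * proposed["valence"]

# Sum the proclivity of all propositions that lead to the same decision
grouped = proposed.groupby("decision").agg(
    {"proclivity": "sum", "action": "first", "length": "first",
     "intention": "first", "primitive": "first"}).reset_index()

# A composite decision also receives the proclivity of the primitive
# propositions with the same action but a different first interaction
for index, row in grouped[grouped["length"] > 1].iterrows():
    failing = proposed[(proposed["action"] == row["action"])
                       & (proposed["primitive"] != row["primitive"])
                       & (proposed["length"] == 1)]
    grouped.loc[index, "proclivity"] += failing["proclivity"].sum()

# Highest proclivity first; ties are broken by the decision string
ranked = grouped.sort_values(by=["proclivity", "decision"],
                             ascending=[False, True]).reset_index(drop=True)
print(ranked[["decision", "proclivity", "intention"]])
print("Intention:", ranked.loc[0, "intention"])
```

With these made-up numbers the composite decision `00a0` starts at 20, receives -10 from the proposition that predicts a bump (`01`), and still ranks first with 10, ahead of `a0` (5) and `a1` (-3).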

# Let's implement the SmallLoop environment

In [1527]:
import os

save_dir = "sav"
os.makedirs(save_dir, exist_ok=True)  # Make sure the folder for saved frames exists

FORWARD = 0
TURN_LEFT = 1
TURN_RIGHT = 2
FEEL_FRONT = 3
FEEL_LEFT = 4
FEEL_RIGHT = 5

We create the Small Loop environment

In [1530]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, BoundaryNorm
from ipywidgets import Button, HBox,VBox, Output
from IPython.display import display

LEFT = 0
DOWN = 1
RIGHT = 2
UP = 3
FEELING = 2
BUMPING = 3

class SmallLoop():
    def __init__(self, poX, poY, direction):
        self.grid = np.array([
            [1, 1, 1, 1, 1, 1], 
            [1, 0, 0, 0, 1, 1],
            [1, 0, 1, 0, 0, 1],
            [1, 0, 1, 1, 0, 1],
            [1, 0, 0, 0, 0, 1],
            [1, 1, 1, 1, 1, 1]
        ])
        self.maze = self.grid.copy()
        self.poX = poX
        self.poY = poY
        self.direction = direction
        self.cmap = ListedColormap(['white', 'green', 'yellow', 'red'])
        self.norm = BoundaryNorm([-0.5, 0.5, 1.5, 2.5, 3.5], self.cmap.N)

    def outcome(self, action):
        # print('before:', self.agent_position.strPosition(), action_dcit[action])
        self.maze[:,:] = self.grid
        result = 0
        
        if action == FORWARD:  # move forward
            # print('the action is move forward')
            # print(str(self.position.pointX)+': '+ str(self.position.pointY)+ ' ' +self.direction, action)
        
            if self.direction == LEFT:
                if self.maze[self.poX][self.poY - 1] == 0:
                    self.poY -= 1
                else:
                    self.maze[self.poX][self.poY - 1] = BUMPING
                    result = 1
            elif self.direction == DOWN:
                if self.maze[self.poX + 1][self.poY] == 0:
                    self.poX += 1
                else:
                    self.maze[self.poX + 1][self.poY] = BUMPING
                    result = 1
            elif self.direction == RIGHT:
                if self.maze[self.poX][self.poY + 1] == 0:
                    self.poY += 1
                else:
                    self.maze[self.poX][self.poY + 1] = BUMPING
                    result = 1
            elif self.direction == UP:
                if self.maze[self.poX - 1][self.poY] == 0:
                    self.poX -= 1
                else:
                    self.maze[self.poX - 1][self.poY] = BUMPING
                    result = 1
            # print(str(self.position.pointX)+': '+ str(self.position.pointY)+ ' ' +self.direction, action)
        elif action == TURN_RIGHT:
            if self.direction == LEFT:
                self.direction = UP
            elif self.direction == DOWN:
                self.direction = LEFT
            elif self.direction == RIGHT:
                self.direction = DOWN
            elif self.direction == UP:
                self.direction = RIGHT
        elif action == TURN_LEFT:
            if self.direction == LEFT:
                self.direction = DOWN  # RIGHT  # DOWN
            elif self.direction == DOWN:
                self.direction = RIGHT
            elif self.direction == RIGHT:
                self.direction = UP  # LEFT  # UP
            elif self.direction == UP:
                self.direction = LEFT
        elif action == FEEL_FRONT:
            if self.direction == LEFT:
                if self.maze[self.poX][self.poY - 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY - 1] = FEELING
            elif self.direction == DOWN:
                if self.maze[self.poX + 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX + 1][self.poY] = FEELING
            elif self.direction == RIGHT:
                if self.maze[self.poX][self.poY + 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY + 1] = FEELING
            elif self.direction == UP:
                if self.maze[self.poX - 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX - 1][self.poY] = FEELING
        elif action == FEEL_LEFT:
            if self.direction == LEFT:
                if self.maze[self.poX + 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX + 1][self.poY] = FEELING
            elif self.direction == DOWN:
                if self.maze[self.poX][self.poY + 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY + 1] = FEELING
            elif self.direction == RIGHT:
                if self.maze[self.poX - 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX - 1][self.poY] = FEELING
            elif self.direction == UP:
                if self.maze[self.poX][self.poY - 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY - 1] = FEELING
        elif action == FEEL_RIGHT:
            if self.direction == LEFT:
                if self.maze[self.poX - 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX - 1][self.poY] = FEELING
            elif self.direction == DOWN:
                if self.maze[self.poX][self.poY - 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY - 1] = FEELING
            elif self.direction == RIGHT:
                if self.maze[self.poX + 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX + 1][self.poY] = FEELING
            elif self.direction == UP:
                if self.maze[self.poX][self.poY + 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY + 1] = FEELING
        print(f"Line: {self.poX}, Column: {self.poY}, direction: {self.direction}")
        # return self.position,

        return result  
    
    def display(self):
        out.clear_output(wait=True)
        with out:
            fig, ax = plt.subplots()
            # ax.set_xticks([])
            # ax.set_yticks([])
            # ax.axis('off')
            # ax.imshow(self.maze, cmap='Greens', vmin=0, vmax=2)
            ax.imshow(self.maze, cmap=self.cmap, norm=self.norm)
            if self.direction == LEFT:
                # Y is column and X is line
                plt.scatter(self.poY, self.poX, s=400, marker='<')
            elif self.direction == DOWN:
                plt.scatter(self.poY, self.poX, s=400, marker='v')
            elif self.direction == RIGHT:
                plt.scatter(self.poY, self.poX, s=400, marker='>')
            else: # UP
                plt.scatter(self.poY, self.poX, s=400, marker='^')
            plt.show()

    def save(self, step):
        """
        save the display as png file
        """
        fig, ax = plt.subplots()
        ax.set_xticks([])
        ax.set_yticks([])
        ax.axis('off')
        ax.imshow(self.maze, cmap=self.cmap, norm=self.norm)
        if self.direction == LEFT:
            # Y is column and X is line
            plt.scatter(self.poY, self.poX, s=400, marker='<')
        elif self.direction == DOWN:
            plt.scatter(self.poY, self.poX, s=400, marker='v')
        elif self.direction == RIGHT:
            plt.scatter(self.poY, self.poX, s=400, marker='>')
        else: # UP
            plt.scatter(self.poY, self.poX, s=400, marker='^')

        # Add number in plot
        ax.text(4, 0, f"{step:>3}", fontsize=12, color='White')
        plt.savefig(f"{save_dir}/{step:03}.png", bbox_inches='tight', pad_inches=0, transparent=True)
        plt.close(fig)
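
The long `if`/`elif` chains in `outcome()` follow a regular pattern that can be expressed with modular arithmetic on the direction codes. The sketch below is one possible compact formulation, not the notebook's implementation: `OFFSETS`, `turn_left()`, `turn_right()` and `feel()` are hypothetical names, and the grid is copied from `SmallLoop` as a plain list of lists so that the example runs on its own.

```python
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3

# (line, column) offset of the cell in front of the agent for each direction
OFFSETS = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}

GRID = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]


def turn_left(direction):
    """LEFT -> DOWN -> RIGHT -> UP -> LEFT, as in the TURN_LEFT branch."""
    return (direction + 1) % 4


def turn_right(direction):
    """LEFT -> UP -> RIGHT -> DOWN -> LEFT, as in the TURN_RIGHT branch."""
    return (direction - 1) % 4


def feel(line, column, direction):
    """Return 1 if the neighboring cell in the given direction is a wall, else 0."""
    d_line, d_column = OFFSETS[direction]
    return int(GRID[line + d_line][column + d_column] != 0)


# Agent at line 1, column 1, facing LEFT (the starting pose used below)
line, column, direction = 1, 1, LEFT
print(feel(line, column, direction))              # feel front: wall -> 1
print(feel(line, column, turn_left(direction)))   # feel left (DOWN): empty -> 0
print(feel(line, column, turn_right(direction)))  # feel right (UP): wall -> 1
```

Feeling left or right amounts to feeling in front after a virtual turn, which is why the three `FEEL_*` branches of `outcome()` differ only in the offsets they use.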


# Let's test the agent in the Small Loop with 6 actions

In [1531]:
# Instantiate the small loop environment
e = SmallLoop(1, 1, 0)

# Instantiate the agent
interactions = [
    Interaction(FORWARD,0,5),
    Interaction(FORWARD,1,-10),
    Interaction(TURN_LEFT,0,-3),
    Interaction(TURN_LEFT,1,-3),
    Interaction(TURN_RIGHT,0,-3),
    Interaction(TURN_RIGHT,1,-3),
    Interaction(FEEL_FRONT,0,-1),
    Interaction(FEEL_FRONT,1,-1),
    Interaction(FEEL_LEFT,0,-1),
    Interaction(FEEL_LEFT,1,-1),
    Interaction(FEEL_RIGHT,0,-1),
    Interaction(FEEL_RIGHT,1,-1)
]
a = Agent(interactions)

# Run the interaction loop
step = 0
outcome = 0

# Display
out = Output()
e.display()
display(out)

Output()

In [1630]:
print(f"Step: {step}")
step += 1
action = a.action(outcome)
e.display()
# e.save(step)  # Save the image file that will be used for the gif
outcome = e.outcome(action)
a.proposed_df

Step: 98
Action: 1, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -3
Line: 2, Column: 3, direction: 1


| | decision | proclivity | action | length | intention | primitive |
|---|---|---|---|---|---|---|
| 0 | (10,(30,00))2030a0 | 6 | 1 | 6 | ((10,(30,00)),(20,(30,00))) | 10 |
| 1 | 1030a0 | 5 | 1 | 3 | (10,(30,00)) | 10 |
| 2 | a2 | 0 | 2 | 1 | 20 | 20 |
| 3 | a3 | 0 | 3 | 1 | 30 | 30 |
| 4 | a4 | 0 | 4 | 1 | 40 | 40 |
| 5 | a5 | 0 | 5 | 1 | 50 | 50 |
| 6 | (10,(30,00))20a3 | -3 | 1 | 5 | ((10,(30,00)),(20,31)) | 10 |
| 7 | a1 | -3 | 1 | 1 | 10 | 10 |
| 8 | 10a3 | -4 | 1 | 2 | (10,31) | 10 |
| 9 | (10,31)20a3 | -8 | 1 | 4 | ((10,31),(20,31)) | 10 |


From step 98 onward, the agent remains stuck in an unsatisfying behavior.
This is because it does not take into account that the composite interaction (turn, feel empty in front) can fail.

We will implement Agent10, which is able to do so.

# Let's implement Agent 10

# Let's test it in the Small Loop

# Let's create the GIF movie

In [1162]:
import imageio.v2 as imageio
import os

img_dir = f"./{save_dir}"
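# os.listdir() returns files in arbitrary order: sort so that frames 000.png, 001.png, ... play in sequence
all_files = sorted(os.path.join(img_dir, f) for f in os.listdir(img_dir) if f.endswith('.png'))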
images = [imageio.imread(f) for f in all_files]
imageio.mimsave("movie.gif", images, fps=3)