[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PetiteIA/schema_mechanism/blob/master/notebooks/agent7.ipynb)

# THE AGENT WHO PROGRAMMED ITSELF

# Learning objectives

Upon completing this lab, you will be able to implement a developmental agent that can re-enact a whole sequence of interactions.

## Define the necessary classes

Let's improve the Interaction class again.

In [1]:
class Interaction:
    """An interaction is a tuple (action, outcome) with a valence"""
    def __init__(self, action, outcome, valence):
        self._action = action
        self._outcome = outcome
        self._valence = valence

    def get_action(self):
        """Return the action"""
        return self._action

    def get_outcome(self):
        """Return the outcome"""
        return self._outcome

    def get_valence(self):
        """Return the valence"""
        return self._valence

    def key(self):
        """ The key to find this interaction in the dictionary is the string '<action><outcome>'. """
        return f"{self._action}{self._outcome}"

    def pre_key(self):
        """Return the key. Used for compatibility with CompositeInteraction"""
        return self.key()

    def __str__(self):
        """ Print interaction in the form '<action><outcome>:<valence>' for debug."""
        return f"{self._action}{self._outcome}:{self._valence}"

    def __eq__(self, other):
        """ Interactions are equal if they have the same key """
        if isinstance(other, self.__class__):
            return self.key() == other.key()
        else:
            return False            

In [2]:
class CompositeInteraction:
    """A composite interaction is a tuple (pre_interaction, post_interaction) and a weight"""
    def __init__(self, pre_interaction, post_interaction):
        self.pre_interaction = pre_interaction
        self.post_interaction = post_interaction
        self.weight = 1
        self.isActivated = False

    def get_action(self):
        """Return the action of the pre interaction"""
        return self.pre_interaction.get_action()
    
    def get_valence(self):
        """Return the valence of the pre_interaction plus the valence of the post_interaction"""
        return self.pre_interaction.get_valence() + self.post_interaction.get_valence()

    def reinforce(self):
        """Increment the composite interaction's weight"""
        self.weight += 1

    def key(self):
        """ The key to find this interaction in the dictionary is the string '<pre_interaction><post_interaction>'. """
        return f"({self.pre_interaction.key()},{self.post_interaction.key()})"

    def pre_key(self):
        """Return the key of the pre_interaction"""
        return self.pre_interaction.pre_key()

    def __str__(self):
        """ Print the interaction in the Newick tree format (pre_interaction, post_interaction: weight) """
        return f"({self.pre_interaction}, {self.post_interaction}: {self.weight})"

    def __eq__(self, other):
        """ Interactions are equal if they have the same pre and post interactions """
        if isinstance(other, self.__class__):
            return (self.pre_interaction == other.pre_interaction) and (self.post_interaction == other.post_interaction)
        else:
            return False
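
As a quick illustration of how keys nest, the sketch below rebuilds the key strings with two stand-alone helper functions. `primitive_key` and `composite_key` are hypothetical names introduced here for illustration only; they mirror the f-strings used in `Interaction.key()` and `CompositeInteraction.key()` above.

```python
def primitive_key(action, outcome):
    """Mirror Interaction.key(): '<action><outcome>' (hypothetical helper)."""
    return f"{action}{outcome}"


def composite_key(pre_key, post_key):
    """Mirror CompositeInteraction.key(): '(<pre_key>,<post_key>)' (hypothetical helper)."""
    return f"({pre_key},{post_key})"


i01 = primitive_key(0, 1)
i11 = primitive_key(1, 1)

# A first-level composite is made of two primitive interactions
first_level = composite_key(i01, i11)
# A second-level composite can have a composite as its post interaction
second_level = composite_key(i01, first_level)

print(first_level)   # (01,11)
print(second_level)  # (01,(01,11))
```

Because the key of a composite embeds the keys of its parts, two composites built from the same sequence of interactions in the same nesting share the same key and therefore the same dictionary entry.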

## Define the Agent class

We will use a Pandas DataFrame to compute the selection of the next intended interaction and to predict its most likely outcome.

In [None]:
!pip install pandas

Let's implement a base Agent that has the functionalities of Agent6.

In [4]:
import pandas as pd

class Agent:
    def __init__(self, _interactions):
        """ Initialize our agent """
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._composite_interactions = {}
        self._intended_interaction = self._interactions["00"]
        self._last_interaction = None
        self._previous_interaction = None
        self._penultimate_interaction = None
        self._last_composite_interaction = None
        self._previous_composite_interaction = None
        # Create a dataframe of default primitive interactions 
        default_interactions = [interaction for interaction in _interactions if interaction.get_outcome() == 0]
        data = {'interaction': [i.key() for i in default_interactions],
                'action': [i.get_action() for i in default_interactions],
                'weight': [0] * len(default_interactions),
                'proclivity': [0] * len(default_interactions)}
        self.primitive_df = pd.DataFrame(data)
        # Store the selection dataframe as a class attribute so we can display it in the notebook
        self.selection_df = None

    def action(self, _outcome):
        """Implement the agent's policy"""
        # tracing the previous cycle
        self._previous_composite_interaction = self._last_composite_interaction
        self._penultimate_interaction = self._previous_interaction
        self._previous_interaction = self._last_interaction
        self._last_interaction = self._interactions[f"{self._intended_interaction.get_action()}{_outcome}"]
        print(f"Action: {self._intended_interaction.get_action()}, Prediction: {self._intended_interaction.get_outcome()}, "
              f"Outcome: {_outcome}, Prediction_correct: {self._intended_interaction.get_outcome() == _outcome}, "
              f"Valence: {self._last_interaction.get_valence()}")

        # Call the learning mechanism
        self.learn()
        
        # Create a dataframe from the activated composite interaction 
        activated_keys = [composite_interaction.key() for composite_interaction in self._composite_interactions.values() 
                          if composite_interaction.pre_interaction == self._last_interaction or 
                          composite_interaction.pre_interaction == self._last_composite_interaction]
        data = {'composite': activated_keys,
                'weight': [self._composite_interactions[k].weight for k in activated_keys],
                'post_valence': [self._composite_interactions[k].post_interaction.get_valence() for k in activated_keys],
                'action': [self._composite_interactions[k].post_interaction.get_action() for k in activated_keys],
                'interaction': [self._composite_interactions[k].post_interaction.pre_key() for k in activated_keys]
                }
        activated_df = pd.DataFrame(data)

        # Create the selection dataframe from the primitive and the activated dataframes
        df = pd.concat([self.primitive_df, activated_df], ignore_index=True)

        # Compute the proclivity for each action
        df['proclivity'] = df['weight'] * df['post_valence']
        grouped_df = df.groupby('action').agg({'proclivity': 'sum'}).reset_index()
        df = df.merge(grouped_df, on='action', suffixes=('', '_sum'))

        # Find the most probable outcome for each action
        max_weight_df = df.loc[df.groupby('action')['weight'].idxmax(), ['action', 'interaction']].reset_index(drop=True)
        max_weight_df.columns = ['action', 'intended']
        df = df.merge(max_weight_df, on='action')

        # Find the first row that has the highest proclivity
        max_index = df['proclivity_sum'].idxmax()
        intended_interaction_key = df.loc[max_index, ['intended']].values[0]
        self._intended_interaction = self._interactions[intended_interaction_key]
        print("Intended", self._intended_interaction)

        # Store the selection dataframe for printing
        self.selection_df = df.copy()
        
        return self._intended_interaction.get_action()

    def learn(self):
        """Record or reinforce the composite interactions made of the most recent interactions"""
        # Recording previous composite interaction
        if self._previous_interaction is not None:
            # Record or reinforce the first level composite interaction
            composite_interaction = CompositeInteraction(self._previous_interaction, self._last_interaction)
            if composite_interaction.key() not in self._composite_interactions:
                self._composite_interactions[composite_interaction.key()] = composite_interaction
                print(f"Learning {composite_interaction}")
                self._last_composite_interaction = composite_interaction
            else:
                self._composite_interactions[composite_interaction.key()].reinforce()
                print(f"Reinforcing {self._composite_interactions[composite_interaction.key()]}")
                self._last_composite_interaction = self._composite_interactions[composite_interaction.key()]
                # Retrieve the existing composite interaction
                composite_interaction = self._composite_interactions[composite_interaction.key()]
            
            # Record or reinforce the second level composite interaction
            if self._previous_composite_interaction is not None:
                composite_interaction_2 = CompositeInteraction(self._previous_composite_interaction, self._last_interaction)
                if composite_interaction_2.key() not in self._composite_interactions:
                    self._composite_interactions[composite_interaction_2.key()] = composite_interaction_2
                    print(f"Learning {composite_interaction_2}")
                else:
                    self._composite_interactions[composite_interaction_2.key()].reinforce()
                    print(f"Reinforcing {self._composite_interactions[composite_interaction_2.key()]}")
        
            if self._penultimate_interaction is not None:
                composite_interaction_3 = CompositeInteraction(self._penultimate_interaction, composite_interaction)
                if composite_interaction_3.key() not in self._composite_interactions:
                    self._composite_interactions[composite_interaction_3.key()] = composite_interaction_3
                    print(f"Learning {composite_interaction_3}")
                else:
                    self._composite_interactions[composite_interaction_3.key()].reinforce()
                    print(f"Reinforcing {self._composite_interactions[composite_interaction_3.key()]}")


# PRELIMINARY EXERCISE

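Let's test this agent in Environment5, which returns outcome 0 when the agent takes the same action three times in a row and outcome 1 otherwise.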

In [1021]:
class Environment5:
    """ Environment5 """
    def __init__(self):
        """ Initializing Environment5 """
        self._previous_action = 0
        self._last_action = 0

    def outcome(self, _action):
        """Take the action and generate the next outcome """
        # Use the _action parameter rather than the notebook's global variable
        if _action == self._last_action and _action == self._previous_action:
            # If same action during the last 3 steps then outcome 0
            outcome = 0
        else:
            # If different action then outcome 1
            outcome = 1
        self._previous_action = self._last_action
        self._last_action = _action
        return outcome

In [1023]:
# Instantiate the agent in Environment5
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent(interactions)
e = Environment5()

# Run the interaction loop
step = 0
outcome = 0

Run the simulation step by step to see the Selection DataFrame. Use `Ctrl+Enter` to run the cell below and stay on it.

In [1046]:
print(f"Step {step}")
step += 1
action = a.action(outcome)
outcome = e.outcome(action)
a.selection_df[['composite', 'weight', 'post_valence', 'action', 'proclivity', 'proclivity_sum', 'intended']]

Step 21
Action: 0, Prediction: 1, Outcome: 1, Prediction_correct: True, Valence: 1
Reinforcing (01:1, 01:1: 6)
Reinforcing ((11:1, 01:1: 6), 01:1: 6)
Reinforcing (11:1, (01:1, 01:1: 6): 6)
Intended 11:1


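| | composite | weight | post_valence | action | proclivity | proclivity_sum | intended |
|---|---|---|---|---|---|---|---|
| 0 | | 0 | | 0 | | 8.0 | 1 |
| 1 | | 0 | | 1 | | 12.0 | 11 |
| 2 | (01,01) | 6 | 1.0 | 0 | 6.0 | 8.0 | 1 |
| 3 | (01,00) | 2 | -1.0 | 0 | -2.0 | 8.0 | 1 |
| 4 | ((01,01),00) | 2 | -1.0 | 0 | -2.0 | 8.0 | 1 |
| 5 | (01,(01,00)) | 2 | 0.0 | 0 | 0.0 | 8.0 | 1 |
| 6 | (01,(00,11)) | 2 | 0.0 | 0 | 0.0 | 8.0 | 1 |
| 7 | (01,11) | 3 | 1.0 | 1 | 3.0 | 12.0 | 11 |
| 8 | ((01,01),11) | 3 | 1.0 | 1 | 3.0 | 12.0 | 11 |
| 9 | (01,(01,11)) | 3 | 2.0 | 0 | 6.0 | 8.0 | 1 |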


Observe the Selection DataFrame above as you run the agent step by step. 
Each activated composite interaction proposes the action of its post interaction, with a proclivity equal to the composite interaction's weight multiplied by the post interaction's valence. 

The proclivities are summed for each action. The action with the highest summed proclivity is selected.
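
The selection rule can also be sketched without pandas. This is a simplified illustration, not the notebook's implementation: the `select_action` function and the `activated` list are hypothetical, and each tuple stands for one activated composite interaction as (proposed action, weight, post-interaction valence).

```python
def select_action(activated, default_actions=(0, 1)):
    """Sum weight * valence per proposed action and return the best action.

    Hypothetical helper: every default action starts with a proclivity of 0,
    like the primitive rows of the selection DataFrame.
    """
    sums = {action: 0 for action in default_actions}
    for action, weight, valence in activated:
        sums[action] = sums.get(action, 0) + weight * valence
    # On a tie, max() keeps the first action in insertion order
    best_action = max(sums, key=sums.get)
    return best_action, sums


# Made-up example: three activated composite interactions
activated = [
    (0, 6, 1),   # proposes action 0 with proclivity 6 * 1 = 6
    (0, 2, -1),  # proposes action 0 with proclivity 2 * -1 = -2
    (1, 3, 1),   # proposes action 1 with proclivity 3 * 1 = 3
]
best, sums = select_action(activated)
print(best, sums)  # 0 {0: 4, 1: 3}
```

Action 0 wins here because its positive proposals outweigh its negative one once summed, even though a single row taken alone would not tell the whole story.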

Let's test this agent in Environment6, which returns 1 if and only if the agent has taken the same action exactly twice in a row. 

In [1048]:
class Environment6:
    """ Environment6 """
    def __init__(self):
        """ Initializing Environment6 """
        self._previous_action = 0
        self._last_action = 0

    def outcome(self, _action):
        """Take the action and generate the next outcome """
        # Use the _action parameter rather than the notebook's global variable
        if self._last_action == _action and self._previous_action != _action:
            # If same action twice only
            outcome = 1
        else:
            # If different action or same action more than twice then outcome 0
            outcome = 0
        self._previous_action = self._last_action
        self._last_action = _action
        return outcome

In [1050]:
# Instantiate the agent in Environment6
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent(interactions)
e = Environment6()

# Run the interaction loop
step = 0
outcome = 0

In [1099]:
print(f"Step {step}")
step += 1
action = a.action(outcome)
outcome = e.outcome(action)
a.selection_df[['composite', 'weight', 'post_valence', 'action', 'proclivity', 'proclivity_sum', 'intended']]

Step 48
Action: 1, Prediction: 1, Outcome: 0, Prediction_correct: False, Valence: -1
Reinforcing (10:-1, 10:-1: 5)
Reinforcing ((11:1, 10:-1: 4), 10:-1: 3)
Reinforcing (11:1, (10:-1, 10:-1: 5): 3)
Intended 11:1


| | composite | weight | post_valence | action | proclivity | proclivity_sum | intended |
|---|---|---|---|---|---|---|---|
| 0 | | 0 | | 0 | | -9.0 | 0 |
| 1 | | 0 | | 1 | | -7.0 | 11 |
| 2 | (10,00) | 5 | -1.0 | 0 | -5.0 | -9.0 | 0 |
| 3 | (10,(00,01)) | 4 | 0.0 | 0 | 0.0 | -9.0 | 0 |
| 4 | (10,11) | 8 | 1.0 | 1 | 8.0 | -7.0 | 11 |
| 5 | (10,(11,00)) | 4 | 0.0 | 1 | 0.0 | -7.0 | 11 |
| 6 | (10,(11,10)) | 4 | 0.0 | 1 | 0.0 | -7.0 | 11 |
| 7 | (10,10) | 5 | -1.0 | 1 | -5.0 | -7.0 | 11 |
| 8 | ((10,10),10) | 2 | -1.0 | 1 | -2.0 | -7.0 | 11 |
| 9 | (10,(10,10)) | 2 | -2.0 | 1 | -4.0 | -7.0 | 11 |


The agent cannot get the optimum valence in Environment6. We are going to design Agent7 that can.
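
To see what a better behaviour could look like, the sketch below replays the rule of Environment6 on a hand-written action sequence. The function `replay_environment6` is a hypothetical helper written for this illustration; it assumes the same initial state as the Environment6 class (both remembered actions set to 0).

```python
def replay_environment6(actions):
    """Return the outcomes that the Environment6 rule yields for a list of actions.

    Hypothetical helper: outcome 1 only when the action equals the last action
    but differs from the one before it.
    """
    previous_action, last_action = 0, 0
    outcomes = []
    for action in actions:
        if last_action == action and previous_action != action:
            outcomes.append(1)
        else:
            outcomes.append(0)
        previous_action, last_action = last_action, action
    return outcomes


# Repeat each action exactly twice before switching
outcomes = replay_environment6([1, 1, 0, 0, 1, 1, 0, 0])
print(outcomes)  # [0, 1, 0, 1, 0, 1, 0, 1]
```

With the valences used in this notebook, each pair costs -1 on its first step and earns +1 on its second. An agent that only looks one step ahead sees no reason to accept the first step, which is why anticipating a whole two-step sequence matters here.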

# ASSIGNMENT

We want Agent7 to be able to select the next action based on the possibility of enacting a full composite interaction. This is illustrated in Figure 1.

![Agent7](img/Figure_1_Agent7.svg)

Figure 1: Agent7 records and reinforces two levels of composite interactions:
* First-level composite interaction $c_{t-1} = (i_{t-2}, i_{t-1}: weight)$, 
* Second-level composite interactions $((i_{t-3}, i_{t-2}), i_{t-1}: weight)$, $(i_{t-3}, (i_{t-2}, i_{t-1}): weight)$, and $((i_{t-4}, i_{t-3}), (i_{t-2}, i_{t-1}): weight)$. 
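
The structures named in the caption can be sketched as nested tuples built from the four most recent primitive interactions. This is an illustration under assumptions, not the expected solution: the helper `composites_from_history` is hypothetical, plain key strings stand in for `Interaction` objects, weights are left out, and the last form is assumed to pair the composite $(i_{t-4}, i_{t-3})$ with the composite $(i_{t-2}, i_{t-1})$.

```python
def composites_from_history(history):
    """Build the composite structures from the four most recent interactions.

    Hypothetical helper: `history` lists primitive interaction keys, oldest first.
    """
    i_t4, i_t3, i_t2, i_t1 = history[-4:]
    first_level = (i_t2, i_t1)
    return {
        "first_level": first_level,
        "composite_then_primitive": ((i_t3, i_t2), i_t1),
        "primitive_then_composite": (i_t3, first_level),
        "composite_then_composite": ((i_t4, i_t3), first_level),
    }


learned = composites_from_history(["10", "11", "00", "01"])
for name, composite in learned.items():
    print(name, composite)
```

In the notebook's classes, each of these tuples would correspond to a `CompositeInteraction` whose `pre_interaction` and `post_interaction` are either primitive or composite.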

The last enacted primitive interaction $i_{t-1}$ and the last enacted composite interaction $c_{t-1}$ activate previously learned composite interactions, which propose their post interactions. 

Now we want this post interaction to possibly be a composite interaction.
This means that Agent7 is now capable of making a decision based on an anticipation of two steps ahead.
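
One possible reading of this paragraph, given as a sketch rather than as the expected solution: when the proposed post interaction is composite, its valence is the sum of the valences of its two steps, so an unpleasant first step can be offset by a pleasant second one. The `valence` and `proclivity` helpers, the `VALENCES` table, and the tuple encoding below are assumptions made for this illustration.

```python
# Valences of the primitive interactions used in this notebook, indexed by key
VALENCES = {"00": -1, "01": 1, "10": -1, "11": 1}


def valence(interaction):
    """Valence of a primitive key, or summed valence of a nested (pre, post) tuple."""
    if isinstance(interaction, tuple):
        pre, post = interaction
        return valence(pre) + valence(post)
    return VALENCES[interaction]


def proclivity(weight, post_interaction):
    """Proclivity proposed by an activated composite: weight * expected valence."""
    return weight * valence(post_interaction)


# Post interaction (01,11): both steps are pleasant
print(proclivity(3, ("01", "11")))  # 6
# Post interaction (11,00): the two steps cancel out
print(proclivity(4, ("11", "00")))  # 0
# Primitive post interaction 10
print(proclivity(5, "10"))          # -5
```

This mirrors `CompositeInteraction.get_valence()`, which already returns the valence of the pre interaction plus the valence of the post interaction.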

In [1101]:
class Agent7(Agent):
    def learn(self):
        # Implement the learning mechanism of Agent7 here
        pass

## Test your Agent7 in Environment6

In [1102]:
# Instantiate a new agent
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1),
    Interaction(2,0,-1),
    Interaction(2,1,1)
]
a = Agent7(interactions)
e = Environment6()

# Run the interaction loop
step = 0
outcome = 0

In [92]:
print(f"Step {step}")
step += 1
action = a.action(outcome)
outcome = e.outcome(action)
a.selection_df[['composite', 'weight', 'expected_valence', 'action', 'proclivity', 'proclivity_sum', 'intended']]

Step 0
Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1
Intended 00:-1


| | composite | weight | expected_valence | action | proclivity | proclivity_sum | intended |
|---|---|---|---|---|---|---|---|
| 0 | | 0.0 | | 0.0 | | 0.0 | 0 |
| 1 | | 0.0 | | 1.0 | | 0.0 | 10 |
| 2 | | 0.0 | | 2.0 | | 0.0 | 20 |


## Test your Agent7 in the grid world

## Report 

Explain what you programmed and what results you observed. Export this document as PDF including your code, the traces you obtained, and your explanations below (no more than a few paragraphs):