[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PetiteIA/schema_mechanism/blob/master/notebooks/agent7.ipynb)

# THE AGENT WHO LIVED THE NEXT DAY (Under development)

# Learning objectives

Upon completing this lab, you will be able to implement a developmental agent that selects an action by anticipating its next two steps.

# Define the necessary classes

Ensure the required packages are installed if they aren't already.

In [None]:
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install ipywidgets

We keep improving the Interaction class.

In [2]:
class Interaction:
    """An interaction is a tuple (action, outcome) with a valence"""
    def __init__(self, _action, _outcome, _valence):
        self._action = _action
        self._outcome = _outcome
        self._valence = _valence

    def get_action(self):
        """Return the action"""
        return self._action

    def get_decision(self):
        """Return the decision key"""
        return f"a{self._action}"

    def get_primitive_action(self):
        """Return the action for compatibility with CompositeInteraction"""
        return self._action

    def get_outcome(self):
        """Return the outcome"""
        return self._outcome

    def get_valence(self):
        """Return the valence"""
        return self._valence

    def key(self):
        """ The key to find this interaction in the dictionary is the string '<action><outcome>'. """
        return f"{self._action}{self._outcome}"

    def pre_key(self):
        """Return the key. Used for compatibility with CompositeInteraction"""
        return self.key()

    def __str__(self):
        """ Print interaction in the form '<action><outcome>:<valence>' for debug."""
        return f"{self._action}{self._outcome}:{self._valence}"

    def __eq__(self, other):
        """ Interactions are equal if they have the same key """
        if isinstance(other, self.__class__):
            return self.key() == other.key()
        else:
            return False

We keep improving the CompositeInteraction class.

In [3]:
class CompositeInteraction:
    """A composite interaction is a tuple (pre_interaction, post_interaction) and a weight"""
    def __init__(self, pre_interaction, post_interaction):
        self.pre_interaction = pre_interaction
        self.post_interaction = post_interaction
        self.weight = 1
        self.isActivated = False

    def get_decision(self):
        """Return the sequence of decisions"""
        # return self.key()
        return f"{self.pre_interaction.key()}{self.post_interaction.get_decision()}"

    def get_primitive_action(self):
        """Return the primitive action"""
        return self.pre_interaction.get_primitive_action()

    def get_valence(self):
        """Return the valence of the pre_interaction plus the valence of the post_interaction"""
        return self.pre_interaction.get_valence() + self.post_interaction.get_valence()

    def reinforce(self):
        """Increment the composite interaction's weight"""
        self.weight += 1

    def key(self):
        """ The key to find this interaction in the dictionary is the string '(<pre_interaction key>,<post_interaction key>)'. """
        return f"({self.pre_interaction.key()},{self.post_interaction.key()})"

    def pre_key(self):
        """Return the key of the pre_interaction"""
        return self.pre_interaction.pre_key()

    def __str__(self):
        """ Print the interaction in the Newick tree format (pre_interaction, post_interaction: weight) """
        return f"({self.pre_interaction}, {self.post_interaction}: {self.weight})"

    def __eq__(self, other):
        """ Interactions are equal if they have the same pre and post interactions """
        if isinstance(other, self.__class__):
            return (self.pre_interaction == other.pre_interaction) and (self.post_interaction == other.post_interaction)
        else:
            return False


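Let's use the Agent7 that we designed previously.

The new `aggregate_propositions()` method sums the proclivities of the propositions that share the same decision, then records, for each decision, the proposed interaction with the highest weight as the intended interaction.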
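
The aggregation can be tried in isolation on a small hand-made DataFrame. The sketch below keeps only a subset of the columns and a few rows taken from the step-35 table further down; the `groupby`/`merge`/`idxmax` calls mirror the ones in `aggregate_propositions()`, but run outside of the `Agent` class.

```python
import pandas as pd

# A few propositions copied from the step-35 table (subset of columns)
proposed_df = pd.DataFrame({
    'decision': ['a1', 'a1', 'a1', 'a1', 'a0', 'a0', '11a1'],
    'interaction': ['10', '11', '10', '11', '00', '00', '11'],
    'weight': [0, 7, 2, 5, 0, 1, 1],
    'valence': [-1, 1, -1, 1, -1, -1, 0],
})

# Proclivity of each proposition: weight times valence of the proposed interaction
proposed_df['proclivity'] = proposed_df['weight'] * proposed_df['valence']

# Sum the proclivities of the propositions that share the same decision
grouped_df = proposed_df.groupby('decision').agg({'proclivity': 'sum'}).reset_index()
proposed_df = proposed_df.merge(grouped_df, on='decision', suffixes=('', '_agg'))
proposed_df = proposed_df.sort_values(by=['proclivity_agg', 'decision'], ascending=[False, False])

# For each decision, keep the interaction of the proposition with the highest weight
max_weight_df = proposed_df.loc[proposed_df.groupby('decision')['weight'].idxmax(),
                                ['decision', 'interaction']].reset_index(drop=True)
max_weight_df.columns = ['decision', 'intended']
proposed_df = proposed_df.merge(max_weight_df, on='decision')
print(proposed_df)
```

Decision `a1` gets an aggregated proclivity of 10 and intends `11`, as in the table at step 35.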

In [5]:
import numpy as np  # used for np.nan in Agent.__init__
import pandas as pd

class Agent:
    def __init__(self, _interactions):
        """ Initialize our agent """
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._composite_interactions = {}
        self._intended_interaction = self._interactions["00"]
        self._last_interaction = None
        self._previous_interaction = None
        self._penultimate_interaction = None
        self._last_composite_interaction = None
        self._previous_composite_interaction = None
        # Create a dataframe of default primitive interactions
        default_interactions = [interaction for interaction in _interactions if interaction.get_outcome() == 0]
        data = {'activated': [np.nan] * len(default_interactions),
                'weight': [0] * len(default_interactions),
                'action': [i.get_primitive_action() for i in default_interactions],
                'interaction': [i.key() for i in default_interactions],
                'valence': [i.get_valence() for i in default_interactions],
                'decision': [i.get_decision() for i in default_interactions],
                'proclivity': [0] * len(default_interactions)}
        self.primitive_df = pd.DataFrame(data)
        # Store the selection dataframe as an instance attribute so we can display it in the notebook
        self.proposed_df = None

    def action(self, _outcome):
        """Implement the agent's policy"""
        # Memorize the context
        self._previous_composite_interaction = self._last_composite_interaction
        self._penultimate_interaction = self._previous_interaction
        self._previous_interaction = self._last_interaction
        self._last_interaction = self._interactions[f"{self._intended_interaction.get_action()}{_outcome}"]

        # tracing the previous cycle
        print(
            f"Action: {self._intended_interaction.get_action()}, Prediction: {self._intended_interaction.get_outcome()}, "
            f"Outcome: {_outcome}, Prediction_correct: {self._intended_interaction.get_outcome() == _outcome}, "
            f"Valence: {self._last_interaction.get_valence()}")

        # Call the learning mechanism
        self.learn()

        # Create the proposed dataframe
        self.create_proposed_df()
        self.aggregate_propositions()

        # Select the intended primitive interaction
        self.decide()

        return self._intended_interaction.get_action()

    def learn(self):
        """Learn the composite interactions"""
        # First level of composite interactions
        self._last_composite_interaction = self.learn_composite_interaction(self._previous_interaction,
                                                                            self._last_interaction)
        # Second level of composite interactions
        self.learn_composite_interaction(self._previous_composite_interaction, self._last_interaction)
        self.learn_composite_interaction(self._penultimate_interaction, self._last_composite_interaction)

    def learn_composite_interaction(self, pre_interaction, post_interaction):
        """Record or reinforce the composite interaction made of (pre_interaction, post_interaction)"""
        if pre_interaction is None:
            return None
        else:
            # The pre_interaction exists
            composite_interaction = CompositeInteraction(pre_interaction, post_interaction)
            if composite_interaction.key() not in self._composite_interactions:
                # Add the composite interaction to memory
                self._composite_interactions[composite_interaction.key()] = composite_interaction
                print(f"Learning {composite_interaction}")
                return composite_interaction
            else:
                # Reinforce the existing composite interaction and return it
                self._composite_interactions[composite_interaction.key()].reinforce()
                print(f"Reinforcing {self._composite_interactions[composite_interaction.key()]}")
                return self._composite_interactions[composite_interaction.key()]

    def create_proposed_df(self):
        """Create the proposed dataframe from the activated interactions"""
        # Keys of the activated composite interactions: those whose pre_interaction matches the current context
        activated_keys = [composite_interaction.key() for composite_interaction in
                          self._composite_interactions.values()
                          if composite_interaction.pre_interaction == self._last_interaction or
                          composite_interaction.pre_interaction == self._last_composite_interaction]
        data = {'activated': activated_keys,
                'weight': [self._composite_interactions[k].weight for k in activated_keys],
                'action': [self._composite_interactions[k].post_interaction.get_primitive_action() for k in activated_keys],
                'interaction': [self._composite_interactions[k].post_interaction.pre_key() for k in activated_keys],
                'valence': [self._composite_interactions[k].post_interaction.get_valence() for k in activated_keys],
                'decision': [self._composite_interactions[k].post_interaction.get_decision() for k in activated_keys],
                }
        activated_df = pd.DataFrame(data)

        # Create the selection dataframe from the primitive and the activated dataframes
        self.proposed_df = pd.concat([self.primitive_df, activated_df], ignore_index=True)

        # Compute the proclivity for each proposition
        self.proposed_df['proclivity'] = self.proposed_df['weight'] * self.proposed_df['valence']

    def aggregate_propositions(self):
        """Aggregate the proclivity of the propositions by decision"""
        # Compute the aggregated proclivity of each decision
        grouped_df = self.proposed_df.groupby('decision').agg({'proclivity': 'sum'}).reset_index()
        self.proposed_df = self.proposed_df.merge(grouped_df, on='decision', suffixes=('', '_agg'))
        # Sort by descending order of proclivity
        self.proposed_df = self.proposed_df.sort_values(by=['proclivity_agg', 'decision'], ascending=[False, False])

        # For each decision, find the proposed interaction that has the highest weight
        max_weight_df = self.proposed_df.loc[self.proposed_df.groupby('decision')['weight'].idxmax(), ['decision', 'interaction']].reset_index(
            drop=True)
        max_weight_df.columns = ['decision', 'intended']
        self.proposed_df = self.proposed_df.merge(max_weight_df, on='decision')

    def decide(self):
        """Selects the intended interaction from the proposed dataframe"""
        # Find the row that has the highest proclivity
        max_index = self.proposed_df['proclivity_agg'].idxmax()
        # Find the intended interaction in the row that has the highest proclivity
        intended_interaction_key = self.proposed_df.loc[max_index, ['intended']].values[0]
        self._intended_interaction = self._interactions[intended_interaction_key]
        print("Intended", self._intended_interaction)


Let's test this agent in Environment6 as in the previous notebook.

In [6]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import Output
from IPython.display import display

class Environment6:
    """ The grid """
    def __init__(self):
        """ Initialize the grid """
        self.grid = np.array([[1, 0, 0, 1]])
        self.position = 1

    def outcome(self, _action):
        """Take the action and generate the next outcome """
        if _action == 0:
            # Move left
            if self.position > 1:
                # No bump
                self.position -= 1
                self.grid[0, 3] = 1
                _outcome = 0
            elif self.grid[0, 0] == 1:
                # First bump
                _outcome = 1
                self.grid[0, 0] = 2
            else:
                # Subsequent bumps
                _outcome = 0
        else:
            # Move right
            if self.position < 2:
                # No bump
                self.position += 1
                self.grid[0, 0] = 1
                _outcome = 0
            elif self.grid[0, 3] == 1:
                # First bump
                _outcome = 1
                self.grid[0, 3] = 2
            else:
                # Subsequent bumps
                _outcome = 0
        return _outcome
        
    def display(self):
        """Display the grid in the Output widget `out` (a global created in the next cell)"""
        out.clear_output(wait=True)
        with out:
            fig, ax = plt.subplots()
            # Hide the ticks
            ax.set_xticks([])
            ax.set_yticks([])
            # Display the grid
            ax.imshow(self.grid, cmap='Greens', vmin=0, vmax=2)
            plt.scatter(self.position, 0, s=1000)
            plt.show()

## Run the agent in Environment6

In [7]:
# Instantiate the agent in Environment6
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent(interactions)
e = Environment6()

# Output widget for displaying the plot
out = Output()

# Initialize the interaction loop
step = 0
outcome = 0

Run the simulation step by step to see the environment and the proposed DataFrame. Use `Ctrl+Enter` to run the cell below and stay on it.

In [43]:
print(f"Step {step}")
step += 1
e.display()
display(out)
action = a.action(outcome)
outcome = e.outcome(action)
a.proposed_df

Step 35



Action: 1, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1
Reinforcing (01:1, 10:-1: 6)
Reinforcing ((00:-1, 01:1: 7), 10:-1: 6)
Reinforcing (00:-1, (01:1, 10:-1: 6): 6)
Intended 11:1


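| | activated | weight | action | interaction | valence | decision | proclivity | proclivity_agg | intended |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | 0 | 1 | 10 | -1 | a1 | 0 | 10 | 11 |
| 1 | (10,11) | 7 | 1 | 11 | 1 | a1 | 7 | 10 | 11 |
| 2 | (10,10) | 2 | 1 | 10 | -1 | a1 | -2 | 10 | 11 |
| 3 | ((01,10),11) | 5 | 1 | 11 | 1 | a1 | 5 | 10 | 11 |
| 4 | (10,(11,10)) | 1 | 1 | 11 | 0 | 11a1 | 0 | 0 | 11 |
| 5 | (10,(11,00)) | 6 | 1 | 11 | 0 | 11a0 | 0 | 0 | 11 |
| 6 | (10,(00,01)) | 1 | 0 | 00 | 0 | 00a0 | 0 | 0 | 00 |
| 7 | | 0 | 0 | 00 | -1 | a0 | 0 | -1 | 00 |
| 8 | (10,00) | 1 | 0 | 00 | -1 | a0 | -1 | -1 | 00 |
| 9 | (10,(10,10)) | 1 | 1 | 10 | -2 | 10a1 | -2 | -2 | 10 |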
9,"(10,(10,10))",1,1,10,-2,10a1,-2,-2,10


Observe that the agent manages to obtain a positive valence every second step.
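
This alternation can be reproduced without the agent. The sketch below copies the outcome rule of `Environment6` into a plain function (`replay_outcome` is a hypothetical helper, without plotting) and replays a fixed policy that alternates a move and a bump in each direction: right, right, left, left. Assuming the valences defined when instantiating the agent (-1 for outcome 0, 1 for outcome 1), every second step yields a positive valence.

```python
import numpy as np

def replay_outcome(state, action):
    """Same rule as Environment6.outcome(), applied to a plain dict (no plotting)."""
    grid = state['grid']
    if action == 0:
        # Move left
        if state['position'] > 1:
            state['position'] -= 1   # No bump
            grid[0, 3] = 1
            return 0
        if grid[0, 0] == 1:
            grid[0, 0] = 2           # First bump
            return 1
        return 0                     # Subsequent bumps
    # Move right
    if state['position'] < 2:
        state['position'] += 1       # No bump
        grid[0, 0] = 1
        return 0
    if grid[0, 3] == 1:
        grid[0, 3] = 2               # First bump
        return 1
    return 0                         # Subsequent bumps

# Valences of the four primitive interactions, keyed by (action, outcome)
valence_of = {(0, 0): -1, (0, 1): 1, (1, 0): -1, (1, 1): 1}

env_state = {'grid': np.array([[1, 0, 0, 1]]), 'position': 1}
fixed_policy = [1, 1, 0, 0] * 3      # right, right (bump), left, left (bump), repeated
valence_trace = [valence_of[(act, replay_outcome(env_state, act))] for act in fixed_policy]
print(valence_trace)
```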

# ASSIGNMENT

Let's examine how the proclivity $proclivity_d$ is calculated for each decision $d$. 
As explained in Agent5, it is the sum, over the activated composite interactions that propose $d$, of the weight of each activated interaction multiplied by the valence of its proposed (post) interaction:


$\displaystyle proclivity_d = \sum_{c \in A_d} w_c \cdot v_{post(c)}$

in which $A_d$ is the set of activated composite interactions that propose decision $d$, $w_c$ is the weight of composite interaction $c$, and $v_{post(c)}$ is the valence of the post interaction of $c$. 
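
As a check, the sketch below applies this formula to the rows of the step-35 table for decisions `a1` and `a0` (weights and valences copied from it, including the default primitive interactions, whose weight is 0). It reproduces the `proclivity_agg` column: 10 for `a1` and -1 for `a0`. The function name `proclivities` is illustrative, not part of the agent.

```python
from collections import defaultdict

# (decision d, weight w_c, valence v_post(c)) copied from the step-35 table
propositions = [
    ('a1', 0, -1),  # default primitive interaction 10
    ('a1', 7, 1),   # (10,11)
    ('a1', 2, -1),  # (10,10)
    ('a1', 5, 1),   # ((01,10),11)
    ('a0', 0, -1),  # default primitive interaction 00
    ('a0', 1, -1),  # (10,00)
]

def proclivities(rows):
    """proclivity_d = sum over c in A_d of w_c * v_post(c)"""
    result = defaultdict(int)
    for decision, weight, valence in rows:
        result[decision] += weight * valence
    return dict(result)

print(proclivities(propositions))  # {'a1': 10, 'a0': -1}
```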

As explained in Agent6, we expect this proclivity to reflect the expected valence when making this decision and to incorporate a "proposal weight" to favor habits:

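$\displaystyle proclivity_d = W_d \cdot \hat{v}_d$

in which $W_d$ is the proposal weight of decision $d$ and $\hat{v}_d$ is the estimated expected valence of $d$.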
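
The text does not define $W_d$ explicitly. Assuming $W_d = \sum_{c \in A_d} w_c$ (the denominator of the Agent6 probability formula) and taking $\hat{v}_d$ as the weight-averaged valence of the proposed interactions, the two expressions of $proclivity_d$ coincide whenever $W_d > 0$, because $W_d \cdot \frac{\sum_{c \in A_d} w_c v_{post(c)}}{W_d} = \sum_{c \in A_d} w_c v_{post(c)}$. The sketch below checks this with exact fractions on decision `a1` at step 35.

```python
from fractions import Fraction

# (w_c, v_post(c)) for the propositions of decision a1 in the step-35 table
a1_propositions = [(0, -1), (7, 1), (2, -1), (5, 1)]

# Assumption: the proposal weight is the total weight of the propositions
W_d = sum(w for w, _ in a1_propositions)
# Weight-averaged valence of the proposed interactions
v_hat_d = Fraction(sum(w * v for w, v in a1_propositions), W_d)
proclivity_d = W_d * v_hat_d

print(W_d, v_hat_d, proclivity_d)  # 14 5/7 10
```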

Now that we have composite decisions, however, the estimated expected valence $\hat{v}_d$ involves the estimated probability of successfully enacting each step of the decision:

$\displaystyle \hat{v}_d = \sum_{i \in I_d} \hat{p}_i \cdot v_i$

in which $I_d$ is the set of interactions that may result from $d$.
Interactions in $I_d$ may be primitive or composite, but they must be mutually exclusive: a sub-sequence must not be counted a second time as part of a longer sequence.

For each of the possible interactions $i$ resulting from $d$, the estimated expected valence is the sum of the valences of its primitive interactions, each multiplied by the probability of reaching that primitive interaction:

$\displaystyle \hat{v}_i = \sum_{j=1}^{\mathrm{len}(i)} v_{ij} \cdot \prod_{k=1}^{j} \hat{p}_{ik}$

in which $v_{ij}$ is the valence of the $j$-th primitive interaction of $i$, and $\hat{p}_{ik}$ is the estimated probability that the $k$-th interaction of $i$ is enacted given that the $(k-1)$-th one was enacted.
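
The formula translates into a running product. The sketch below assumes that a sequence $i$ is given as a list of valences $v_{ij}$ and a list of step probabilities $\hat{p}_{ik}$ (the function name `expected_valence` is illustrative); the probabilities in the usage line are hypothetical, chosen only to show the arithmetic $-1 \cdot 1 + 1 \cdot (1 \cdot 0.5) = -0.5$.

```python
def expected_valence(valences, step_probabilities):
    """Estimated expected valence of a sequence i of primitive interactions.

    valences[j] is v_ij, the valence of the j-th primitive interaction of i.
    step_probabilities[k] is p_hat_ik, the estimated probability that the k-th
    interaction of i is enacted given that the (k-1)-th one was enacted.
    """
    if len(valences) != len(step_probabilities):
        raise ValueError("one probability is needed per primitive interaction")
    total = 0.0
    reach = 1.0  # running product p_hat_i1 * ... * p_hat_ij
    for v, p in zip(valences, step_probabilities):
        reach *= p
        total += v * reach
    return total

# Hypothetical sequence (10, 11): the move is certain, the bump succeeds half of the time
print(expected_valence([-1, 1], [1.0, 0.5]))  # -0.5
```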

Now that we have three-step composite interactions, it is not obvious how to compute the estimated probability of each step.

Let's look at the formula introduced in Agent6:

$\displaystyle \hat{p}_{id} = \frac{\sum_{c \in A_{id}} w_c }{\sum_{c \in A_d} w_c }$ 

in which $A_d$ is the set of activated composite interactions proposing $d$, and $A_{id} \subset A_d$ is the set of activated composite interactions proposing interaction $i$ whose action corresponds to decision $d$. 
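
For decision `a1` in the step-35 table, every proposition has a primitive post interaction, so this formula applies directly. The sketch below copies the weights from that table; it is an illustration on one snapshot, not a solution to the three-step case. Grouping by proposed interaction gives $\hat{p}_{11,a1} = 12/14$ and $\hat{p}_{10,a1} = 2/14$, and the resulting $\hat{v}_{a1} = 5/7$ equals `proclivity_agg` divided by the total weight (10/14).

```python
from collections import defaultdict
from fractions import Fraction

# (proposed interaction i, weight w_c) for the propositions of decision a1 at step 35
a1_propositions = [('10', 0), ('11', 7), ('10', 2), ('11', 5)]
valence = {'10': -1, '11': 1}

total_weight = sum(w for _, w in a1_propositions)  # denominator: sum over A_d
weight_by_interaction = defaultdict(int)            # numerators: sum over A_id
for interaction, w in a1_propositions:
    weight_by_interaction[interaction] += w

p_hat = {i: Fraction(w, total_weight) for i, w in weight_by_interaction.items()}
v_hat_d = sum(p * valence[i] for i, p in p_hat.items())
print(p_hat, v_hat_d)
```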