[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PetiteIA/schema_mechanism/blob/master/notebooks/agent7.ipynb)

# THE AGENT THAT EXPECTED TO WIN ON THE NEXT MOVE

# Objectives

This tutorial shows how to implement an agent that makes its decisions based on the expected valence of the next two interaction cycles.

# Let's prepare the Interaction and CompositeInteraction classes

In [None]:
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install ipywidgets
!pip install ipython

Same Interaction class as for Agent7

In [797]:
class Interaction:
    """An interaction is a tuple (action, outcome) with a valence"""
    def __init__(self, _action, _outcome, _valence):
        self._action = _action
        self._outcome = _outcome
        self._valence = _valence

    def get_action(self):
        """Return the action"""
        return self._action

    def get_decision(self):
        """Return the decision key"""
        return f"a{self._action}"

    def get_primitive_action(self):
        """Return the action for compatibility with CompositeInteraction"""
        return self._action

    def get_outcome(self):
        """Return the outcome"""
        return self._outcome

    def get_valence(self):
        """Return the valence"""
        return self._valence

    def key(self):
        """ The key to find this interaction in the dictionary is the string '<action><outcome>'. """
        return f"{self._action}{self._outcome}"

    def pre_key(self):
        """Return the key. Used for compatibility with CompositeInteraction"""
        return self.key()

    def __str__(self):
        """ Print interaction in the form '<action><outcome>:<valence>' for debug."""
        return f"{self._action}{self._outcome}:{self._valence}"

    def __eq__(self, other):
        """ Interactions are equal if they have the same key """
        if isinstance(other, self.__class__):
            return self.key() == other.key()
        else:
            return False

Same CompositeInteraction class

In [798]:
class CompositeInteraction:
    """A composite interaction is a tuple (pre_interaction, post_interaction) and a weight"""
    def __init__(self, pre_interaction, post_interaction):
        self.pre_interaction = pre_interaction
        self.post_interaction = post_interaction
        self.weight = 1
        self.isActivated = False

    def get_decision(self):
        """Return the sequence of decisions"""
        # return self.key()
        return f"{self.pre_interaction.key()}{self.post_interaction.get_decision()}"

    def get_primitive_action(self):
        """Return the primitive action"""
        return self.pre_interaction.get_primitive_action()

    def get_valence(self):
        """Return the valence of the pre_interaction plus the valence of the post_interaction"""
        return self.pre_interaction.get_valence() + self.post_interaction.get_valence()

    def reinforce(self):
        """Increment the composite interaction's weight"""
        self.weight += 1

    def key(self):
        """ The key to find this interaction in the dictionary is the string '(<pre_interaction>,<post_interaction>)'. """
        return f"({self.pre_interaction.key()},{self.post_interaction.key()})"

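    def pre_key(self):
        """Return the key of the pre_interaction"""
        return self.pre_interaction.pre_key()

    def get_primitive_series(self):
        """Return the list of keys of the primitive interactions that compose this interaction, in order"""
        series = []
        for interaction in (self.pre_interaction, self.post_interaction):
            if isinstance(interaction, CompositeInteraction):
                series += interaction.get_primitive_series()
            else:
                series.append(interaction.key())
        return series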

    def __str__(self):
        """ Print the interaction in the Newick tree format (pre_interaction, post_interaction: weight) """
        return f"({self.pre_interaction}, {self.post_interaction}: {self.weight})"

    def __eq__(self, other):
        """ Interactions are equal if they have the same pre and post interactions """
        if isinstance(other, self.__class__):
            return (self.pre_interaction == other.pre_interaction) and (self.post_interaction == other.post_interaction)
        else:
            return False
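To see how the keys nest when the agent records two levels of composite interactions (as the `learn` method of the agent does), here is a minimal standalone sketch. It represents interactions only by their string keys and reuses the `'(<pre>,<post>)'` format of `CompositeInteraction.key()`; the helper `composite_key` and the three-step history are hypothetical.

```python
def composite_key(pre_key, post_key):
    """Same key format as CompositeInteraction.key(): '(<pre>,<post>)'"""
    return f"({pre_key},{post_key})"


# Hypothetical history of enacted primitive interactions (oldest first)
penultimate, previous, last = "00", "00", "01"

# First level: the current composite (previous, last)
# and the composite learned at the previous step (penultimate, previous)
last_composite = composite_key(previous, last)
previous_composite = composite_key(penultimate, previous)

# Second level: the two groupings of the same three-step sequence
second_level = [
    composite_key(previous_composite, last),     # ((00,00),01)
    composite_key(penultimate, last_composite),  # (00,(00,01))
]
print(last_composite, second_level)
```

Both second-level keys describe the same sequence `00`, `00`, `01`; they differ only in which pair is grouped first.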

# Let's define the agent

Let's implement an agent that computes the expected valence based on an anticipation of two time steps.

Figure 1 illustrates this mechanism.

![Agent7](img/Figure_1_Agent7_3.svg)

Figure 1: The agent computes an expected valence for each proposed interaction before aggregating them.

The proposed interactions are aggregated by their decision $d$.
The expected valence $\mathbb{E}(V_d)$ is computed from the valence of each interaction that may result from this decision and its probability of occurring.

## Computing the expected valence

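The expected valence $\mathbb{E}(V_d)$ is the sum of the valences of the interactions that may result from decision $d$, each multiplied by the probability of enacting that interaction:

$\displaystyle \mathbb{E}(V_d) = \sum_{i \in I_d} v_{i} \cdot \hat{p}_{i} $

where $I_d$ is the set of interactions that may result from decision $d$, $v_i$ is the valence of interaction $i$, and $\hat{p}_i$ is the estimated probability of enacting $i$ when the agent makes decision $d$.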

Here we simplify this computation by considering only two interactions that may result from decision $d$:
* The interaction that consists in fully enacting decision $d$.
* The interaction that results from the failure of the first primitive interaction of decision $d$.
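The simplified computation above can be sketched in a few lines of plain Python. This is an illustration, not the agent's code: it assumes that probabilities are estimated as relative weights (close to what the test agent further down does with `P(it|at)`), and the weights 3 and 1 are hypothetical. The valences come from the Small Loop interactions used in this notebook (`00` → 5, `01` → -10), and the valence of the full enactment is the sum of its two primitive valences, as in `CompositeInteraction.get_valence()`.

```python
def estimate_probabilities(weights):
    """Turn the weights of the interactions that may result from a decision into probabilities"""
    total = sum(weights.values())
    return {key: weight / total for key, weight in weights.items()}


def expected_valence(valences, probabilities):
    """E(V_d) = sum over i in I_d of v_i * p_i"""
    return sum(valences[key] * probabilities[key] for key in valences)


# Decision d: move forward twice
valences = {
    "(00,00)": 5 + 5,  # d is fully enacted: valence of the pre plus the post interaction
    "01": -10,         # the first primitive interaction fails: the agent bumps
}
weights = {"(00,00)": 3, "01": 1}  # hypothetical weights learned from experience

probabilities = estimate_probabilities(weights)
print(probabilities)                              # {'(00,00)': 0.75, '01': 0.25}
print(expected_valence(valences, probabilities))  # 5.0
```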

# Let's examine the agent

In [902]:
import numpy as np
import pandas as pd

class Agent:
    def __init__(self, _interactions):
        """ Initialize our agent """
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._composite_interactions = {}
        self._intended_interaction = self._interactions["00"]
        self._last_interaction = None
        self._previous_interaction = None
        self._penultimate_interaction = None
        self._last_composite_interaction = None
        self._previous_composite_interaction = None
        # Create a dataframe of default primitive interactions
        default_interactions = [interaction for interaction in _interactions if interaction.get_outcome() == 0]
        data = {'activated': [np.nan] * len(default_interactions),
                'weight': [0] * len(default_interactions),
                'action': [i.get_primitive_action() for i in default_interactions],
                'interaction': [i.key() for i in default_interactions],
                'valence': [i.get_valence() for i in default_interactions],
                'decision': [i.get_decision() for i in default_interactions],
                'proclivity': [0] * len(default_interactions)}
        self.primitive_df = pd.DataFrame(data)
        # Store the selection dataframe as a class attribute so we can display it in the notebook
        self.proposed_df = None

    def action(self, _outcome):
        """Implement the agent's policy"""
        # Memorize the context
        self._previous_composite_interaction = self._last_composite_interaction
        self._penultimate_interaction = self._previous_interaction
        self._previous_interaction = self._last_interaction
        self._last_interaction = self._interactions[f"{self._intended_interaction.get_action()}{_outcome}"]

        # tracing the previous cycle
        print(
            f"Action: {self._intended_interaction.get_action()}, Prediction: {self._intended_interaction.get_outcome()}, "
            f"Outcome: {_outcome}, Prediction_correct: {self._intended_interaction.get_outcome() == _outcome}, "
            f"Valence: {self._last_interaction.get_valence()}")

        # Call the learning mechanism
        self.learn()

        # Create the proposed dataframe
        self.create_proposed_df()
        self.aggregate_propositions()

        # Select the intended primitive interaction
        self.decide()

        return self._intended_interaction.get_action()

    def learn(self):
        """Learn the composite interactions"""
        # First level of composite interactions
        self._last_composite_interaction = self.learn_composite_interaction(self._previous_interaction,
                                                                            self._last_interaction)
        # Second level of composite interactions
        self.learn_composite_interaction(self._previous_composite_interaction, self._last_interaction)
        self.learn_composite_interaction(self._penultimate_interaction, self._last_composite_interaction)

    def learn_composite_interaction(self, pre_interaction, post_interaction):
        """Record or reinforce the composite interaction made of (pre_interaction, post_interaction)"""
        if pre_interaction is None:
            return None
        else:
            # If the pre interaction exists
            composite_interaction = CompositeInteraction(pre_interaction, post_interaction)
            if composite_interaction.key() not in self._composite_interactions:
                # Add the composite interaction to memory
                self._composite_interactions[composite_interaction.key()] = composite_interaction
                print(f"Learning {composite_interaction}")
                return composite_interaction
            else:
                # Reinforce the existing composite interaction and return it
                self._composite_interactions[composite_interaction.key()].reinforce()
                print(f"Reinforcing {self._composite_interactions[composite_interaction.key()]}")
                return self._composite_interactions[composite_interaction.key()]

    def create_proposed_df(self):
        """Create the proposed dataframe from the activated interactions"""
        # The list of activated interactions that match the current context
        activated_keys = [composite_interaction.key() for composite_interaction in
                          self._composite_interactions.values()
                          if composite_interaction.pre_interaction == self._last_interaction or
                          composite_interaction.pre_interaction == self._last_composite_interaction]
        data = {'activated': activated_keys,
                'weight': [self._composite_interactions[k].weight for k in activated_keys],
                'action': [self._composite_interactions[k].post_interaction.get_primitive_action() for k in activated_keys],
                'interaction': [self._composite_interactions[k].post_interaction.pre_key() for k in activated_keys],
                'valence': [self._composite_interactions[k].post_interaction.get_valence() for k in activated_keys],
                'decision': [self._composite_interactions[k].post_interaction.get_decision() for k in activated_keys],
                }
        activated_df = pd.DataFrame(data)

        # Create the selection dataframe from the primitive and the activated dataframes
        self.proposed_df = pd.concat([self.primitive_df, activated_df], ignore_index=True)

        # Compute the proclivity for each proposition
        self.proposed_df['proclivity'] = self.proposed_df['weight'] * self.proposed_df['valence']

    def aggregate_propositions(self):
        """Aggregate the proclivity"""
        # Compute the aggregated proclivity for each decision
        grouped_df = self.proposed_df.groupby('decision').agg({'proclivity': 'sum'}).reset_index()
        self.proposed_df = self.proposed_df.merge(grouped_df, on='decision', suffixes=('', '_agg'))
        # Sort by descending order of proclivity
        self.proposed_df = self.proposed_df.sort_values(by=['proclivity_agg', 'decision'], ascending=[False, False])

        # Find the most probable primitive interaction for each decision
        max_weight_df = self.proposed_df.loc[self.proposed_df.groupby('decision')['weight'].idxmax(), ['decision', 'interaction']].reset_index(
            drop=True)
        max_weight_df.columns = ['decision', 'intended']
        self.proposed_df = self.proposed_df.merge(max_weight_df, on='decision')
        
    def decide(self):
        """Selects the intended interaction from the proposed dataframe"""
        # Find the row that has the highest proclivity
        max_index = self.proposed_df['proclivity_agg'].idxmax()
        # Find the intended interaction in the row that has the highest proclivity
        intended_interaction_key = self.proposed_df.loc[max_index, ['intended']].values[0]
        self._intended_interaction = self._interactions[intended_interaction_key]
        print("Intended", self._intended_interaction)
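The aggregation performed by `create_proposed_df` and `aggregate_propositions` can be reproduced on a toy table. This sketch copies a few rows (decisions `a0` and `a1`) from the proposed dataframe printed at step 62 below; it uses a plain `groupby(...).sum()` rather than the merge used in the class, which gives the same per-decision totals.

```python
import pandas as pd

# A few rows of the proposed dataframe printed at step 62 (decisions a0 and a1)
proposed = pd.DataFrame({
    "decision": ["a0", "a0", "a1", "a1", "a1"],
    "interaction": ["00", "01", "10", "10", "10"],
    "weight": [0, 1, 0, 1, 1],
    "valence": [5, -10, -6, -6, -6],
})

# Proclivity of each proposition: weight x valence
proposed["proclivity"] = proposed["weight"] * proposed["valence"]

# Aggregated proclivity per decision
proclivity_agg = proposed.groupby("decision")["proclivity"].sum()
print(proclivity_agg)

# Among these two decisions, the agent would pick the one with the highest aggregated proclivity
print(proclivity_agg.idxmax())  # a0
```

The default rows (weight 0) contribute nothing, so `a0` totals -10 and `a1` totals -12, matching the `proclivity_agg` column of the printed table.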


## Let's implement the Small Loop environment

We create the Small Loop environment

In [903]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, BoundaryNorm
from ipywidgets import Button, HBox,VBox, Output
from IPython.display import display

FORWARD = 0
TURN_LEFT = 1
TURN_RIGHT = 2
FEEL_FRONT = 3
FEEL_LEFT = 4
FEEL_RIGHT = 5

LEFT = 0
DOWN = 1
RIGHT = 2
UP = 3
FEELING = 2
BUMPING = 3

class SmallLoop():
    def __init__(self, poX, poY, direction):
        self.grid = np.array([
            [1, 0, 0, 0, 0, 1]
        ])
        self.maze = self.grid.copy()
        self.poX = poX
        self.poY = poY
        self.direction = direction
        self.cmap = ListedColormap(['white', 'green', 'yellow', 'red'])
        self.norm = BoundaryNorm([-0.5, 0.5, 1.5, 2.5, 3.5], self.cmap.N)

    def outcome(self, action):
        """Enact the action and return the outcome: 1 if the agent bumps into or feels a wall, 0 otherwise"""
        # Erase the feel/bump marks of the previous step
        self.maze[:,:] = self.grid
        result = 0

        if action == FORWARD:  # move forward
            if self.direction == LEFT:
                if self.maze[self.poX][self.poY - 1] == 0:
                    self.poY -= 1
                else:
                    self.maze[self.poX][self.poY - 1] = BUMPING
                    result = 1
            elif self.direction == DOWN:
                if self.maze[self.poX + 1][self.poY] == 0:
                    self.poX += 1
                else:
                    self.maze[self.poX + 1][self.poY] = BUMPING
                    result = 1
            elif self.direction == RIGHT:
                if self.maze[self.poX][self.poY + 1] == 0:
                    self.poY += 1
                else:
                    self.maze[self.poX][self.poY + 1] = BUMPING
                    result = 1
            elif self.direction == UP:
                if self.maze[self.poX - 1][self.poY] == 0:
                    self.poX -= 1
                else:
                    self.maze[self.poX - 1][self.poY] = BUMPING
                    result = 1
            # print(str(self.position.pointX)+': '+ str(self.position.pointY)+ ' ' +self.direction, action)
        elif action == TURN_RIGHT:
            if self.direction == LEFT:
                self.direction = UP
            elif self.direction == DOWN:
                self.direction = LEFT
            elif self.direction == RIGHT:
                self.direction = DOWN
            elif self.direction == UP:
                self.direction = RIGHT
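        elif action == TURN_LEFT:
            # The Small Loop grid has a single line, so TURN_LEFT is used as a U-turn
            # between LEFT and RIGHT (the commented values are the regular quarter turns)
            if self.direction == LEFT:
                self.direction = RIGHT  # DOWN
            elif self.direction == DOWN:
                self.direction = RIGHT
            elif self.direction == RIGHT:
                self.direction = LEFT  # UP
            elif self.direction == UP:
                self.direction = LEFT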
        elif action == FEEL_FRONT:
            if self.direction == LEFT:
                if self.maze[self.poX][self.poY - 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY - 1] = FEELING
            elif self.direction == DOWN:
                if self.maze[self.poX + 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX + 1][self.poY] = FEELING
            elif self.direction == RIGHT:
                if self.maze[self.poX][self.poY + 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY + 1] = FEELING
            elif self.direction == UP:
                if self.maze[self.poX - 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX - 1][self.poY] = FEELING
        elif action == FEEL_LEFT:
            if self.direction == LEFT:
                if self.maze[self.poX + 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX + 1][self.poY] = FEELING
            elif self.direction == DOWN:
                if self.maze[self.poX][self.poY + 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY + 1] = FEELING
            elif self.direction == RIGHT:
                if self.maze[self.poX - 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX - 1][self.poY] = FEELING
            elif self.direction == UP:
                if self.maze[self.poX][self.poY - 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY - 1] = FEELING
        elif action == FEEL_RIGHT:
            if self.direction == LEFT:
                if self.maze[self.poX - 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX - 1][self.poY] = FEELING
            elif self.direction == DOWN:
                if self.maze[self.poX][self.poY - 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY - 1] = FEELING
            elif self.direction == RIGHT:
                if self.maze[self.poX + 1][self.poY] != 0:
                    result = 1
                self.maze[self.poX + 1][self.poY] = FEELING
            elif self.direction == UP:
                if self.maze[self.poX][self.poY + 1] != 0:
                    result = 1
                self.maze[self.poX][self.poY + 1] = FEELING
        print(f"Line: {self.poX}, Column: {self.poY}, direction: {self.direction}")
        # return self.position,

        return result  
    
    def display(self):
        out.clear_output(wait=True)
        with out:
            fig, ax = plt.subplots()
            # ax.set_xticks([])
            # ax.set_yticks([])
            # ax.axis('off')
            # ax.imshow(self.maze, cmap='Greens', vmin=0, vmax=2)
            ax.imshow(self.maze, cmap=self.cmap, norm=self.norm)
            if self.direction == LEFT:
                # Y is column and X is line
                plt.scatter(self.poY, self.poX, s=400, marker='<')
            elif self.direction == DOWN:
                plt.scatter(self.poY, self.poX, s=400, marker='v')
            elif self.direction == RIGHT:
                plt.scatter(self.poY, self.poX, s=400, marker='>')
            else: # UP
                plt.scatter(self.poY, self.poX, s=400, marker='^')
            plt.show()
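The twelve `FEEL_*` branches of `outcome` follow one pattern: each heading has a neighbouring-cell offset, and feeling left or right means looking in the heading a quarter turn away. The sketch below is a hypothetical table-driven reformulation (`OFFSETS`, `left_of`, `right_of` and `feel` are illustrative names, not part of the notebook). One deliberate difference: in the original class, a target outside the one-line grid either raises an `IndexError` (line 1) or silently wraps around through negative indexing (line -1); the sketch assumes such cells count as walls.

```python
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3

# (line, column) offset of the neighbouring cell for each heading
OFFSETS = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def left_of(direction):
    """Heading after a quarter turn to the left: LEFT -> DOWN -> RIGHT -> UP -> LEFT"""
    return (direction + 1) % 4


def right_of(direction):
    """Heading after a quarter turn to the right (the TURN_RIGHT mapping)"""
    return (direction - 1) % 4


def feel(grid, line, column, heading):
    """Return 1 if the cell next to (line, column) in `heading` is not empty, 0 otherwise.

    Assumption: cells outside the grid are treated as walls.
    """
    d_line, d_column = OFFSETS[heading]
    target_line, target_column = line + d_line, column + d_column
    if not (0 <= target_line < len(grid) and 0 <= target_column < len(grid[0])):
        return 1
    return 1 if grid[target_line][target_column] != 0 else 0


grid = [[1, 0, 0, 0, 0, 1]]
direction = LEFT  # agent at line 0, column 1, facing left, as in SmallLoop(0, 1, 0)
print(feel(grid, 0, 1, direction))           # FEEL_FRONT: 1 (wall)
print(feel(grid, 0, 1, left_of(direction)))  # FEEL_LEFT: 1 (outside the grid)
print(feel(grid, 0, 1, RIGHT))               # FEEL_FRONT when facing right: 0
```

With this layout, `FEEL_FRONT`, `FEEL_LEFT` and `FEEL_RIGHT` only differ by the heading passed to `feel`: `d`, `left_of(d)` or `right_of(d)`.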

We instantiate the agent and the environment

In [1011]:
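# Instantiate the small loop environment: line 0, column 1, facing left
e = SmallLoop(0, 1, LEFT)

# Instantiate the agent
interactions = [
    Interaction(0,0,5),     # FORWARD, no bump
    Interaction(0,1,-10),   # FORWARD, bump
    Interaction(1,0,-6),    # TURN_LEFT
    Interaction(1,1,-6),
    # Interaction(2,0,-2),  # TURN_RIGHT (disabled)
    # Interaction(2,1,-2),
    Interaction(3,0,-1),    # FEEL_FRONT, empty cell
    Interaction(3,1,-1)     # FEEL_FRONT, wall
]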
a = Agent(interactions)

# Run the interaction loop
step = 0
outcome = 0

# Display
out = Output()
e.display()
display(out)


In [1074]:
print(f"Step {step}")
step += 1
action = a.action(outcome)
e.display()
outcome = e.outcome(action)
a.proposed_df

Step 62
Action: 3, Prediction: 1, Outcome: 1, Prediction_correct: True, Valence: -1
Reinforcing (31:-1, 31:-1: 43)
Reinforcing ((31:-1, 31:-1: 43), 31:-1: 41)
Reinforcing (31:-1, (31:-1, 31:-1: 43): 41)
Intended 31:-1
Line: 0, Column: 4, direction: 2


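|   | activated | weight | action | interaction | valence | decision | proclivity | proclivity_agg | intended |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (31,(31,10)) | 1 | 3 | 31 | -7 | 31a1 | -7 | -7 | 31 |
| 1 | (31,(10,30)) | 1 | 1 | 10 | -7 | 10a3 | -7 | -7 | 10 |
| 2 | NaN | 0 | 0 | 00 | 5 | a0 | 0 | -10 | 01 |
| 3 | (31,01) | 1 | 0 | 01 | -10 | a0 | -10 | -10 | 01 |
| 4 | NaN | 0 | 1 | 10 | -6 | a1 | 0 | -12 | 10 |
| 5 | (31,10) | 1 | 1 | 10 | -6 | a1 | -6 | -12 | 10 |
| 6 | ((31,31),10) | 1 | 1 | 10 | -6 | a1 | -6 | -12 | 10 |
| 7 | (31,(01,10)) | 1 | 0 | 01 | -16 | 01a1 | -16 | -16 | 01 |
| 8 | (31,(31,31)) | 41 | 3 | 31 | -2 | 31a3 | -82 | -82 | 31 |
| 9 | NaN | 0 | 3 | 30 | -1 | a3 | 0 | -84 | 31 |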


# Test

## Let's implement the agent

In [433]:
import pandas as pd

class Agent:
    def __init__(self, _interactions):
        """ Initialize our agent """
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._composite_interactions = {}
        self._intended_interaction = self._interactions["00"]
        self._last_interaction = None
        self._previous_interaction = None
        self._penultimate_interaction = None
        self._last_composite_interaction = None
        self._previous_composite_interaction = None
        # Create a dataframe of default primitive interactions
        default_interactions = [interaction for interaction in _interactions if interaction.get_outcome() == 0]
        data = {'proposed': [i.key() for i in default_interactions],
                'E(Vi)': [0.] * len(default_interactions),
                'action': [i.get_action() for i in default_interactions],
                'E(Va)': [0.] * len(default_interactions),
                'interaction': [i.key() for i in default_interactions],
                'weight': [0] * len(default_interactions)}
        self.primitive_df = pd.DataFrame(data)
        # Store the selection dataframe as a class attribute so we can display it in the notebook
        self.selection_df = None

    def action(self, _outcome):
        """Implement the agent's policy"""
        # Memorize the context
        self._previous_composite_interaction = self._last_composite_interaction
        self._penultimate_interaction = self._previous_interaction
        self._previous_interaction = self._last_interaction
        self._last_interaction = self._interactions[f"{self._intended_interaction.get_action()}{_outcome}"]

        # tracing the previous cycle
        print(
            f"Action: {self._intended_interaction.get_action()}, Prediction: {self._intended_interaction.get_outcome()}, "
            f"Outcome: {_outcome}, Prediction_correct: {self._intended_interaction.get_outcome() == _outcome}, "
            f"Valence: {self._last_interaction.get_valence()}")

        # Call the learning mechanism
        self.learn()

        # Calculate the proposed dataframe
        self.calculate_proposed_df()

        # Select the intended primitive interaction
        self.select_intended_interaction()

        return self._intended_interaction.get_action()

    def learn(self):
        """Learn the composite interactions"""
        # First level of composite interactions
        self._last_composite_interaction = self.learn_composite_interaction(self._previous_interaction,
                                                                            self._last_interaction)
        # Second level of composite interactions
        self.learn_composite_interaction(self._previous_composite_interaction, self._last_interaction)
        self.learn_composite_interaction(self._penultimate_interaction, self._last_composite_interaction)

    def learn_composite_interaction(self, pre_interaction, post_interaction):
        """Record or reinforce the composite interaction made of (pre_interaction, post_interaction)"""
        if pre_interaction is None:
            return None
        else:
            # Record or reinforce the composite interaction
            composite_interaction = CompositeInteraction(pre_interaction, post_interaction)
            if composite_interaction.key() not in self._composite_interactions:
                self._composite_interactions[composite_interaction.key()] = composite_interaction
                print(f"Learning {composite_interaction}")
                return composite_interaction
            else:
                self._composite_interactions[composite_interaction.key()].reinforce()
                print(f"Reinforcing {self._composite_interactions[composite_interaction.key()]}")
                # Retrieve the existing composite interaction
                return self._composite_interactions[composite_interaction.key()]

    def calculate_proposed_df(self):
        """Compute the expected valence of each proposed interaction and of each action"""

        # The activated composite interactions
        activated_keys = [composite_interaction.key() for composite_interaction in self._composite_interactions.values()
                          if composite_interaction.pre_interaction == self._last_interaction or
                          composite_interaction.pre_interaction == self._last_composite_interaction]

        # Create the dataframe of sequences (one row per activated composite interaction)
        rows = []
        for k in activated_keys:
            post_interaction = self._composite_interactions[k].post_interaction
            new_row = {'proposed': post_interaction.key(),
                       'weight': self._composite_interactions[k].weight,
                       'a_t': post_interaction.get_primitive_action()}
            if isinstance(post_interaction, Interaction):
                new_row['i_t'] = post_interaction.key()
            else:
                new_row['i_t'] = post_interaction.pre_interaction.key()
                new_row['a_t+1'] = post_interaction.post_interaction.get_primitive_action()
                new_row['i_t+1'] = post_interaction.post_interaction.key()
            rows.append(new_row)
        # Missing 'a_t+1' and 'i_t+1' values are filled with NaN
        series_df = pd.DataFrame(rows, columns=['proposed', 'weight', 'a_t', 'i_t', 'a_t+1', 'i_t+1'])
        # print(series_df)

        # The probability P(it|at)
        total_by_i = series_df.groupby(["a_t", "i_t"], as_index=False)["weight"].sum().rename(columns={"weight": "i_weight"})
        total_by_a = series_df.groupby("a_t", as_index=False)["weight"].sum().rename(columns={"weight": "a_weight"})
        p_t_df = pd.merge(total_by_i, total_by_a, on="a_t")
        p_t_df['P(it|at)'] = p_t_df['i_weight'] / p_t_df['a_weight']
        # print(p_t_df[['a_t', 'i_t', 'P(it|at)']])

        # The probability P(it+1|it, at+1)
        series_filtered = series_df.dropna(subset=["i_t+1"])
        total_by_i = series_filtered.groupby(["i_t", "a_t+1", "i_t+1"], as_index=False)["weight"].sum().rename(columns={"weight": "t+1_weight"})
        total_by_a = series_filtered.groupby(["i_t", "a_t+1"], as_index=False)["weight"].sum().rename(columns={"weight": "t_weight"})
        p_t1_df = pd.merge(total_by_i, total_by_a, on=["i_t", "a_t+1"])
        p_t1_df['P(it+1|it, at+1)'] = p_t1_df['t+1_weight'] / p_t1_df['t_weight']
        # print(p_t1_df[['i_t', 'a_t+1', 'i_t+1', 'P(it+1|it, at+1)']])  # [['i_t', 'a_t+1', 'i_t+1', 'P(it+1|it, at+1)']]

        # Create the dataframe of proposed interactions
        data = {'proposed': [self._composite_interactions[k].post_interaction.key() for k in activated_keys],
                'E(Vi)': [0.] * len(activated_keys),
                'action': [self._composite_interactions[k].post_interaction.get_primitive_action() for k in
                           activated_keys],
                'E(Va)': [0.] * len(activated_keys),
                'weight': [self._composite_interactions[k].weight for k in activated_keys],
                'interaction': [self._composite_interactions[k].post_interaction.pre_key() for k in activated_keys]
                }
        expected_df = pd.DataFrame(data)
        # Add default interactions
        expected_df = pd.concat([self.primitive_df, expected_df], ignore_index=True)

        # Truncate the proposed sequences whose post_interaction has a negative valence: keep only their pre_interaction
        for i, k in expected_df["proposed"].items():
            if k in self._composite_interactions and self._composite_interactions[k].post_interaction.get_valence() < 0:
                expected_df.at[i, "proposed"] = self._composite_interactions[k].pre_interaction.key()

        # Remove the interactions that are the beginning of a longer interaction
        to_remove = set()
        for i, k1 in expected_df["proposed"].items():
            for j, k2 in expected_df["proposed"].items():
                if i not in to_remove and j not in to_remove and i != j:
                    if k1 in self._composite_interactions:
                        s1 = pd.Series(self._composite_interactions[k1].get_primitive_series())
                    else:
                        s1 = pd.Series(k1)
                    if k2 in self._composite_interactions:
                        s2 = pd.Series(self._composite_interactions[k2].get_primitive_series())
                    else:
                        s2 = pd.Series(k2)
                    if len(s1) <= len(s2):
                        if s1.equals(s2.iloc[:len(s1)]):
                            # print(f"Remove {s1.tolist()} from {i} to {j} weight {expected_df.at[i, 'weight']}")
                            to_remove.add(i)  # Mark the sequence to be removed
                            expected_df.at[j, "weight"] += expected_df.at[i, "weight"]
                        elif s2.equals(s1.iloc[:len(s2)]):
                            # print(f"Remove {s2.tolist()} from {j} to {i} weight {expected_df.at[j, 'weight']}")
                            to_remove.add(j)  # Mark the sequence to be removed
                            expected_df.at[i, "weight"] += expected_df.at[j, "weight"]
        expected_df = expected_df.drop(index=to_remove)
        # Compute the expected valence of interactions
        for i, k in expected_df["proposed"].items():
            if k in self._interactions:
                first_row = p_t_df.loc[p_t_df["i_t"] == k].head(1)  # ["P(it|at)"].values[0]
                p = first_row["P(it|at)"].values[0] if not first_row.empty else 0
                # print(f"E(vi) of {k} probability {p}")
                expected_df.at[i, "E(Vi)"] = p * self._interactions[k].get_valence()
            else:
                k1 = self._composite_interactions[k].pre_interaction.key()
                p1_row = p_t_df.loc[p_t_df["i_t"] == k1].head(1)
                p1 = p1_row["P(it|at)"].values[0] if not p1_row.empty else 0
                k2 = self._composite_interactions[k].post_interaction.key()
                # P(it+1|it, at+1) must be conditioned on the first interaction k1 as well
                p2_row = p_t1_df.loc[(p_t1_df["i_t"] == k1) & (p_t1_df["i_t+1"] == k2)].head(1)
                p2 = p2_row["P(it+1|it, at+1)"].values[0] if not p2_row.empty else 0
                expected_df.at[i, "E(Vi)"] = p1 * (self._interactions[k1].get_valence()
                                                   + p2 * self._interactions[k2].get_valence())
        # The sum expected valence per action
        expected_df["E(Va)"] = expected_df.groupby("action")["E(Vi)"].transform("sum")
        # The sum weight per action
        # expected_df["weight"] = expected_df.groupby("action")["weight"].transform("sum")

        # Find the most probable outcome for each action
        max_weight_df = expected_df.loc[expected_df.groupby('action')['weight'].idxmax(), ['action', 'interaction']].reset_index(
            drop=True)
        max_weight_df.columns = ['action', 'intended']
        expected_df = expected_df.merge(max_weight_df, on='action')

        # Store the dataframe for printing
        self.selection_df = expected_df.copy()

    def select_intended_interaction(self):
        """Selects the intended interaction from the proposed dataframe"""
        # Find the first row that has the highest expected valence E(Va)
        max_index = self.selection_df['E(Va)'].idxmax()
        intended_interaction_key = self.selection_df.loc[max_index, ['intended']].values[0]
        print(f"Intended Max E(Va) {intended_interaction_key}")
        self._intended_interaction = self._interactions[intended_interaction_key]
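The estimation of `P(it|at)` in `calculate_proposed_df` can be tried on a small hypothetical `series_df`. The sketch below reuses the same `groupby`/`merge` steps, then computes an expected valence per action with the Small Loop valences; the rows and weights are made up for illustration, and the one-step sum `P(it|at) * v` is a simplification of the two-step formula used by the agent.

```python
import pandas as pd

# Hypothetical activated sequences: action a_t, enacted interaction i_t, weight
series_df = pd.DataFrame({
    "a_t": [0, 0, 3, 3],
    "i_t": ["00", "01", "30", "31"],
    "weight": [3, 1, 2, 2],
})

# P(it|at): weight of each (a_t, i_t) pair divided by the total weight of a_t
total_by_i = series_df.groupby(["a_t", "i_t"], as_index=False)["weight"].sum().rename(columns={"weight": "i_weight"})
total_by_a = series_df.groupby("a_t", as_index=False)["weight"].sum().rename(columns={"weight": "a_weight"})
p_t_df = pd.merge(total_by_i, total_by_a, on="a_t")
p_t_df["P(it|at)"] = p_t_df["i_weight"] / p_t_df["a_weight"]

# One-step expected valence per action, with the Small Loop valences
valences = {"00": 5, "01": -10, "30": -1, "31": -1}
p_t_df["E(Vi)"] = p_t_df["P(it|at)"] * p_t_df["i_t"].map(valences)
expected_by_action = p_t_df.groupby("a_t")["E(Vi)"].sum()
print(expected_by_action)
print("Best action:", expected_by_action.idxmax())  # Best action: 0
```

Here action 0 gets 0.75 × 5 + 0.25 × (-10) = 1.25 and action 3 gets -1, so moving forward is preferred despite the risk of bumping.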


# PRELIMINARY EXERCISE

## Let's create Environment6

The agent has two possible actions: move to the left or move to the right. 
The environment returns outcome 1 when the agent bumps into a light green wall, and then the wall turns dark green until the agent moves away.

In [823]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import Output
from IPython.display import display

class Environment6:
    """ The grid """
    def __init__(self):
        """ Initialize the grid """
        self.grid = np.array([[1, 0, 0, 1]])
        self.position = 1

    def outcome(self, action):
        """Take the action and generate the next outcome """
        if action == 0:
            # Move left
            if self.position > 1:
                # No bump
                self.position -= 1
                self.grid[0, 3] = 1
                outcome = 0
            elif self.grid[0, 0] == 1:
                # First bump
                outcome = 1
                self.grid[0, 0] = 2
            else:
                # Subsequent bumps
                outcome = 0
        else:
            # Move right
            if self.position < 2:
                # No bump
                self.position += 1
                self.grid[0, 0] = 1
                outcome = 0
            elif self.grid[0, 3] == 1:
                # First bump
                outcome = 1
                self.grid[0, 3] = 2
            else:
                # Subsequent bumps
                outcome = 0
        return outcome  

    def display(self):
        """Display the grid"""
        out.clear_output(wait=True)
        with out:
            fig, ax = plt.subplots()
            # Hide the ticks
            ax.set_xticks([])
            ax.set_yticks([])
            # Display the grid
            ax.imshow(self.grid, cmap='Greens', vmin=0, vmax=2)
            plt.scatter(self.position, 0, s=1000)
            plt.show()

## Run the agent in Environment6

In [824]:
# Instantiate the agent in Environment6
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent(interactions)
e = Environment6()

# Output widget for displaying the plot
out = Output()

# Run the interaction loop
step = 0
outcome = 0

Run the simulation step by step to see the proposed DataFrame. Use `Ctrl+Enter` to run the cell below and stay on it.

In [979]:
print(f"Step {step}")
step += 1
e.display()
display(out)
action = a.action(outcome)
outcome = e.outcome(action)
a.selection_df

Step 14



Action: 1, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1
Reinforcing (10:-1, 10:-1: 2)
Learning ((00:-1, 10:-1: 2), 10:-1: 1)
Learning (00:-1, (10:-1, 10:-1: 2): 1)
Intended Max proclivity 00


| | proposed | E(Vi) | action | E(Va) | interaction | weight | intended | weight_sum | proclivity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (00,01) | -0.5 | 0 | -0.5 | 00 | 5 | 00 | 5 | -2.5 |
| 1 | 10 | -1.0 | 1 | -1.0 | 10 | 3 | 10 | 3 | -3.0 |


Observe that, on step 8, the composite interaction (00, 01) is proposed but its expected valence equals 0 and it is not selected.

After step 18, however, the agent alternates between the sequences (00, 01) and (10, 11), which give the best average valence the agent can obtain in this environment.
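To check the claim about the best average valence, here is a minimal sketch that replays fixed action sequences in a headless copy of Environment6's `outcome` logic. The class `HeadlessEnvironment6` and the function `average_valence` are hypothetical helpers, not part of the notebook. The sketch assumes the same valences as the cells above: -1 for 00 and 10, and +1 for 01 and 11.

```python
import numpy as np

# Valences of the primitive interactions, keyed '<action><outcome>' (0 = left, 1 = right)
VALENCES = {"00": -1, "01": 1, "10": -1, "11": 1}


class HeadlessEnvironment6:
    """Hypothetical copy of Environment6.outcome without the plotting code."""

    def __init__(self):
        self.grid = np.array([[1, 0, 0, 1]])
        self.position = 1

    def outcome(self, action):
        if action == 0:
            # Move left
            if self.position > 1:
                self.position -= 1
                self.grid[0, 3] = 1  # moving away relights the right wall
                return 0
            if self.grid[0, 0] == 1:
                self.grid[0, 0] = 2  # first bump darkens the left wall
                return 1
            return 0
        # Move right
        if self.position < 2:
            self.position += 1
            self.grid[0, 0] = 1  # moving away relights the left wall
            return 0
        if self.grid[0, 3] == 1:
            self.grid[0, 3] = 2  # first bump darkens the right wall
            return 1
        return 0


def average_valence(policy, steps=40):
    """Replay a cyclic action sequence and return the mean valence per step."""
    env = HeadlessEnvironment6()
    total = 0
    for t in range(steps):
        action = policy[t % len(policy)]
        total += VALENCES[f"{action}{env.outcome(action)}"]
    return total / steps


print(average_valence([0, 0, 1, 1]))  # alternate (00, 01) and (10, 11): 0.0
print(average_valence([0]))           # always move left: -0.95
```

The action sequence 0, 0, 1, 1 yields one +1 for every -1, so its average is 0. Always moving left only scores the very first bump.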

## Let's create Environment7

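In Environment7, the agent has two possible actions: move forward or turn 180°.
Like Environment6, Environment7 returns outcome 1 only the first time the agent bumps into a wall; subsequent bumps return 0 until the agent moves away from that wall. Turning always returns outcome 0.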
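A headless sketch of the same logic shows what the optimum looks like here. It copies Environment7's `outcome` method into a hypothetical `HeadlessEnvironment7` class and replays the cyclic action sequence forward, turn, forward. Starting from the initial pose, this produces bump (01), turn (10), move (00), bump (01), and so on. By our count, no policy can do better than one bump every three steps: after each bump, the agent has to turn and step forward at least once before it can bump a light green wall again. The helper `average_valence` is also hypothetical.

```python
import numpy as np

# Valences of the primitive interactions, keyed '<action><outcome>' (0 = forward, 1 = turn)
VALENCES = {"00": -1, "01": 1, "10": -1, "11": 1}


class HeadlessEnvironment7:
    """Hypothetical copy of Environment7.outcome without the plotting code."""

    def __init__(self):
        self.grid = np.array([[1, 0, 0, 1]])
        self.position = 1
        self.direction = 0  # 0: facing left, 1: facing right

    def outcome(self, action):
        if action == 1:
            # Turn 180°: always outcome 0
            self.direction = 1 - self.direction
            return 0
        if self.direction == 0:
            # Forward to the left
            if self.position > 1:
                self.position -= 1
                self.grid[0, 3] = 1  # moving away relights the right wall
                return 0
            if self.grid[0, 0] == 1:
                self.grid[0, 0] = 2  # first bump darkens the left wall
                return 1
            return 0
        # Forward to the right
        if self.position < 2:
            self.position += 1
            self.grid[0, 0] = 1  # moving away relights the left wall
            return 0
        if self.grid[0, 3] == 1:
            self.grid[0, 3] = 2  # first bump darkens the right wall
            return 1
        return 0


def average_valence(policy, steps=30):
    """Replay a cyclic action sequence and return the mean valence per step."""
    env = HeadlessEnvironment7()
    total = 0
    for t in range(steps):
        action = policy[t % len(policy)]
        total += VALENCES[f"{action}{env.outcome(action)}"]
    return total / steps


print(average_valence([0, 1, 0]))  # forward, turn, forward: about -0.333
print(average_valence([0]))        # always forward: about -0.933
```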

In [491]:
class Environment7:
    """ The grid """
    def __init__(self):
        """ Initialize the grid and the agent's pose """
        self.grid = np.array([[1, 0, 0, 1]])
        self.position = 1
        self.direction = 0

    def outcome(self, action):
        """Take the action and generate the next outcome """
        if action == 0:
            # Move forward
            if self.direction == 0:
                # Move to the left
                if self.position > 1:
                    # No bump
                    self.position -= 1
                    self.grid[0, 3] = 1
                    outcome = 0
                elif self.grid[0, 0] == 1:
                    # First bump
                    outcome = 1
                    self.grid[0, 0] = 2
                else:
                    # Subsequent bumps
                    outcome = 0
            else:
                # Move to the right
                if self.position < 2:
                    # No bump
                    self.position += 1
                    self.grid[0, 0] = 1
                    outcome = 0
                elif self.grid[0, 3] == 1:
                    # First bump
                    outcome = 1
                    self.grid[0, 3] = 2
                else:
                    # Subsequent bumps
                    outcome = 0
        else:
            # Turn 180°
            outcome = 0
            if self.direction == 0:
                self.direction = 1
            else:
                self.direction = 0
        return outcome  

    def display(self):
        """Display the grid"""
        out.clear_output(wait=True)
        with out:
            fig, ax = plt.subplots()
            # Hide the ticks
            ax.set_xticks([])
            ax.set_yticks([])
            # Display the grid
            ax.imshow(self.grid, cmap='Greens', vmin=0, vmax=2)
            if self.direction == 0:
                # Display agent to the left
                plt.scatter(self.position, 0, s=1000, marker='<')
            else:
                # Display agent to the right
                plt.scatter(self.position, 0, s=1000, marker='>')
            plt.show()

## Test the Agent in Environment7

In [346]:
# Instantiate a new agent
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent(interactions)
e = Environment7()

# Output widget for displaying the plot
out = Output()

# Run the interaction loop
step = 0
outcome = 0

In [976]:
print(f"Step {step}")
step += 1
e.display()
display(out)
action = a.action(outcome)
outcome = e.outcome(action)
a.selection_df

Step 11



Action: 0, Prediction: 1, Outcome: 0, Prediction_correct: False, Valence: -1
Reinforcing (00:-1, 00:-1: 3)
Learning ((10:-1, 00:-1: 2), 00:-1: 1)
Learning (10:-1, (00:-1, 00:-1: 3): 1)
Intended Max proclivity 00


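| | proposed | E(Vi) | action | E(Va) | interaction | weight | intended | weight_sum | proclivity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 00 | -0.6 | 0 | -0.2 | 00 | 6 | 00 | 10 | -2.0 |
| 1 | 10 | -1.0 | 1 | -1.0 | 10 | 3 | 10 | 3 | -3.0 |
| 2 | 01 | 0.4 | 0 | -0.2 | 01 | 4 | 00 | 10 | -2.0 |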


Observe that this agent does not learn to obtain the optimum average valence in Environment7.

This is because it selects the next action that has the highest expected valence without taking into account the weight of the proposals.
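A toy example makes the difference concrete. The per-action numbers below are made up for illustration and do not come from a run. Action 0 has a slightly higher expected valence but rests on a single weak proposal, while action 1 is backed by proposals with a total weight of 10. Selecting on `E(Va)` alone, as `Agent.select_intended_interaction` does, picks action 0. Scaling `E(Va)` by the summed proposal weight, as the assignment below asks, picks action 1.

```python
import pandas as pd

# Hypothetical per-action summary (illustrative numbers only)
df = pd.DataFrame({
    "action": [0, 1],
    "E(Va)": [0.5, 0.4],
    "weight_sum": [1, 10],
})
# Proclivity: expected valence scaled by the total weight of the proposals
df["proclivity"] = df["weight_sum"] * df["E(Va)"]

by_expected_valence = df.loc[df["E(Va)"].idxmax(), "action"]
by_proclivity = df.loc[df["proclivity"].idxmax(), "action"]
print(by_expected_valence, by_proclivity)  # 0 1
```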

# ASSIGNMENT

Create Agent7 that computes the proclivity of the proposed interactions by multiplying the expected valence by the weight, and selects the proposed interaction that has the highest proclivity.
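Here are the quantities used by the solution cell below, written as formulas. Let $P_a$ denote the proposed rows of `selection_df` whose `action` equals $a$, let $E(V_i)$ be their two-step expected valence, and let $w_i$ be their weight. This notation is ours, not the notebook's:

$$
E(V_a) = \sum_{i \in P_a} E(V_i), \qquad
W_a = \sum_{i \in P_a} w_i, \qquad
\text{proclivity}(a) = W_a \cdot E(V_a)
$$

The intended interaction is then the `intended` entry of the row with the largest proclivity.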

In [492]:
class Agent7(Agent):
    def select_intended_interaction(self):
        """Selects the intended interaction from the proposed dataframe"""
        # Modify to compute the proclivity and select the action that has the highest proclivity
        # Currently finds the first row that has the highest expected valence E(Va)
        max_index = self.selection_df['E(Va)'].idxmax()
        intended_interaction_key = self.selection_df.loc[max_index, ['intended']].values[0]
        print(f"Intended Max E(Va) {intended_interaction_key}")
        self._intended_interaction = self._interactions[intended_interaction_key]

In [868]:
class Agent7(Agent):
    def select_intended_interaction(self):
        """Selects the intended interaction of the action that has the highest proclivity"""
        # The sum weight per action
        grouped_df = self.selection_df.groupby('action').agg({'weight': 'sum'}).reset_index()
        self.selection_df = self.selection_df.merge(grouped_df, on='action', suffixes=('', '_sum'))
        # self.selection_df["sum_weight"] = self.selection_df.groupby("action")["weight"].transform("sum")

        # Compute the proclivity
        self.selection_df['proclivity'] = self.selection_df["weight_sum"] * self.selection_df['E(Va)']

        # Select the action that has the highest proclivity
        max_index = self.selection_df['proclivity'].idxmax()
        intended_interaction_key = self.selection_df.loc[max_index, ['intended']].values[0]
        print(f"Intended Max proclivity {intended_interaction_key}")
        self._intended_interaction = self._interactions[intended_interaction_key]    

## Test your Agent7 in Environment7

In [964]:
# Instantiate a new agent
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent7(interactions)
e = Environment7()

# Output widget for displaying the plot
out = Output()

# Run the interaction loop
step = 0
outcome = 0

In [972]:
print(f"Step {step}")
step += 1
e.display()
display(out)
action = a.action(outcome)
outcome = e.outcome(action)
a.selection_df

Step 7



Action: 0, Prediction: 0, Outcome: 1, Prediction_correct: False, Valence: 1
Reinforcing (00:-1, 01:1: 2)
Learning ((10:-1, 00:-1: 1), 01:1: 1)
Learning (10:-1, (00:-1, 01:1: 2): 1)
Intended Max proclivity 10


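| | proposed | E(Vi) | action | E(Va) | interaction | weight | intended | weight_sum | proclivity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 0.0 | 1 | 0.0 | 10 | 0 | 10 | 0 | 0.0 |
| 1 | 00 | -1.0 | 0 | -1.0 | 00 | 3 | 00 | 3 | -3.0 |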
