[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PetiteIA/schema_mechanism/blob/master/notebooks/agent7.ipynb)

# THE AGENT WHO SAW TOMORROW

# Learning objectives

Upon completing this lab, you will be able to implement a developmental agent that selects an action based on a two-step-ahead anticipation.

# Define the necessary classes

Ensure the required packages are installed if they aren't already.

In [None]:
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install ipywidgets
!pip install ipython

We keep improving the Interaction class.

In [485]:
class Interaction:
    """An interaction is a tuple (action, outcome) with a valence"""
    def __init__(self, action, outcome, valence):
        self._action = action
        self._outcome = outcome
        self._valence = valence

    def get_action(self):
        """Return the action"""
        return self._action

    def get_primitive_action(self):
        """Return the action for compatibility with CompositeInteraction"""
        return self._action

    def get_outcome(self):
        """Return the outcome"""
        return self._outcome

    def get_valence(self):
        """Return the valence"""
        return self._valence

    def key(self):
        """ The key to find this interaction in the dictionary is the string '<action><outcome>'. """
        return f"{self._action}{self._outcome}"

    def pre_key(self):
        """Return the key. Used for compatibility with CompositeInteraction"""
        return self.key()

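    def get_primitive_series(self):
        """Return the key in a list"""
        return [self.key()]

    def get_valence_series(self):
        """Return the valence in a list. Used for compatibility with CompositeInteraction"""
        return [self._valence]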

    def __str__(self):
        """ Print interaction in the form '<action><outcome>:<valence>' for debug."""
        return f"{self._action}{self._outcome}:{self._valence}"

    def __eq__(self, other):
        """ Interactions are equal if they have the same key """
        if isinstance(other, self.__class__):
            return self.key() == other.key()
        else:
            return False


We keep improving the CompositeInteraction class.

In [486]:
class CompositeInteraction:
    """A composite interaction is a tuple (pre_interaction, post_interaction) and a weight"""

    def __init__(self, pre_interaction, post_interaction):
        self.pre_interaction = pre_interaction
        self.post_interaction = post_interaction
        self.weight = 1
        self.isActivated = False

    def get_action(self):
        """Return the key of the composite interaction, which identifies it as an action"""
        return self.key()

    def get_primitive_action(self):
        """Return the primitive action of the pre_interaction"""
        return self.pre_interaction.get_primitive_action()

    def get_valence(self):
        """Return the valence of the pre_interaction plus the valence of the post_interaction"""
        return self.pre_interaction.get_valence() + self.post_interaction.get_valence()

    def reinforce(self):
        """Increment the composite interaction's weight"""
        self.weight += 1

    def key(self):
        """ The key to find this interaction in the dictionary is the string '<pre_interaction><post_interaction>'. """
        return f"({self.pre_interaction.key()},{self.post_interaction.key()})"

    def pre_key(self):
        """Return the key of the pre_interaction"""
        return self.pre_interaction.pre_key()

    def get_primitive_series(self):
        """Return the list of primitive keys"""
        return self.pre_interaction.get_primitive_series() + self.post_interaction.get_primitive_series()

    def __str__(self):
        """ Print the interaction in the Newick tree format (pre_interaction, post_interaction: weight) """
        return f"({self.pre_interaction}, {self.post_interaction}: {self.weight})"

    def __eq__(self, other):
        """ Interactions are equal if they have the same pre and post interactions """
        if isinstance(other, self.__class__):
            return (self.pre_interaction == other.pre_interaction) and (self.post_interaction == other.post_interaction)
        else:
            return False

    def get_valence_series(self):
        """Return the list of valences of primitive interactions"""
        return self.pre_interaction.get_valence_series() + self.post_interaction.get_valence_series()


# Prepare the Agent class

Let's implement Agent7, which computes the expected valence based on a two-step anticipation. 
Figure 1 illustrates the learning mechanism and the interaction selection mechanism.

![Agent7](img/Figure_1_Agent7_3.svg)

Figure 1: Agent7 records and reinforces two levels of composite interactions:
* First-level composite interaction $c_{t-1} = (i_{t-2}, i_{t-1})$, 
* Second-level composite interaction $((i_{t-3}, i_{t-2}), i_{t-1})$, and $(i_{t-3}, (i_{t-2}, i_{t-1}))$. 
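A minimal sketch of this learning mechanism in plain Python, assuming primitive interactions are represented by their key strings (e.g. `"01"`) and composite interactions by nested tuples. The names `learn_step`, `memory`, and `history` are hypothetical and not part of the notebook. Each call records a composite interaction with weight 1 or reinforces it by 1, mirroring the three `learn_composite_interaction` calls in `Agent.learn()`.

```python
from collections import Counter


def learn_step(history, memory):
    """Record (weight 1) or reinforce (+1) the composites that end with history[-1].

    history: primitive interaction keys, most recent last (hypothetical representation)
    memory: Counter mapping nested tuples to weights
    """
    if len(history) >= 2:
        i_tm2, i_tm1 = history[-2], history[-1]  # i_{t-2}, i_{t-1}
        memory[(i_tm2, i_tm1)] += 1  # first level: (i_{t-2}, i_{t-1})
        if len(history) >= 3:
            i_tm3 = history[-3]  # i_{t-3}
            memory[((i_tm3, i_tm2), i_tm1)] += 1  # second level: ((i_{t-3}, i_{t-2}), i_{t-1})
            memory[(i_tm3, (i_tm2, i_tm1))] += 1  # second level: (i_{t-3}, (i_{t-2}, i_{t-1}))


memory = Counter()
history = []
for key in ["00", "01", "10", "11", "00", "01"]:
    history.append(key)
    learn_step(history, memory)

print(memory[("00", "01")])  # → 2: the pair (00, 01) occurred twice
```

In this sketch the second-level composites are built directly from the last three keys; the Agent instead reuses the stored first-level objects `_previous_composite_interaction` and `_last_composite_interaction`, which yields the same nesting.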

The last enacted primitive interaction $i_{t-1}$ and the last enacted composite interaction $c_{t-1}$ activate previously learned composite interactions, which propose their post interaction. 
The post interaction may now be a composite interaction.

Post interactions are aggregated by their first action. The expected valence $\mathbb{E}(V_a)$ is computed for each action, and the action that has the highest expected valence is selected. 
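A sketch of this selection step, assuming each proposal has already been reduced to a sequence of primitive keys and its contribution $\mathbb{E}(V_i)$. The `proposals` list and the `select_action` helper are hypothetical, and the numbers are made up. Contributions are summed per first action, as `groupby("action")["E(Vi)"].transform("sum")` does in the Agent, and the action with the highest sum is selected.

```python
from collections import defaultdict

# Hypothetical proposals: (sequence of primitive keys, expected valence contribution E(V_i))
proposals = [
    (("00", "01"), -0.5),
    (("10",), -1.0),
    (("01",), 0.4),
]


def select_action(proposals):
    """Sum the contributions per first action; return the best action and the per-action totals."""
    totals = defaultdict(float)
    for sequence, expected_valence in proposals:
        # Keys follow the '<action><outcome>' convention, so the action is the first character
        first_action = sequence[0][0]
        totals[first_action] += expected_valence
    best = max(totals, key=totals.get)
    return best, dict(totals)


best, totals = select_action(proposals)
print(best)  # → 0
```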

## Calculate the expected Valence

An action's expected valence $\mathbb{E}(V_a)$ is calculated as follows.

We find the set $A$ of activated composite interactions that propose $a$. 

We find the set $S$ of the longest sequences proposed by the activated composite interactions in $A$.
Sequences that are the beginning of a longer sequence are not included in $S$. 
Final interactions that have a negative valence are removed from the sequences. 
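The construction of $S$ can be sketched as follows, assuming proposals are given as (sequence of primitive keys, weight) pairs and using the valences of the interactions defined later in this notebook. `trim_negative_tail` and `longest_sequences` are hypothetical helpers. Like the Agent's `calculate_proposed_df`, the sketch adds the weight of a discarded prefix to a longer sequence that begins with it; when several longer sequences match, it simply picks the first one.

```python
# Valences of the primitive interactions used in this notebook's experiments
KEY_VALENCE = {"00": -1, "01": 1, "10": -1, "11": 1}


def trim_negative_tail(sequence):
    """Drop the final interaction of a multi-step sequence when its valence is negative."""
    if len(sequence) > 1 and KEY_VALENCE[sequence[-1]] < 0:
        return sequence[:-1]
    return sequence


def longest_sequences(proposals):
    """Return {sequence: weight}, keeping only sequences that do not begin a longer one."""
    merged = {}
    for sequence, weight in proposals:
        sequence = trim_negative_tail(tuple(sequence))
        merged[sequence] = merged.get(sequence, 0) + weight  # identical sequences are merged
    # Process shorter sequences first and fold each prefix into a longer sequence it begins
    for sequence in sorted(merged, key=len):
        longer = [other for other in merged
                  if len(other) > len(sequence) and other[:len(sequence)] == sequence]
        if longer:
            merged[longer[0]] += merged.pop(sequence)
    return merged


candidate_sequences = [(("00",), 2), (("00", "01"), 3), (("10", "10"), 1), (("10",), 1)]
print(longest_sequences(candidate_sequences))  # → {('00', '01'): 5, ('10',): 2}
```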

We compute the expected valence of action $a$ by incorporating each step of the sequences into the formula from Agent6:

$\displaystyle \mathbb{E}(V_a) = \sum_{s \in S} \sum_{j=1}^{n_s} v_{sj} \cdot \prod_{k=1}^{j} p_{sk} $

in which $n_s$ is the length of $s$, i.e., the number of primitive interactions in $s$, $v_{sj}$ is the valence of the $j^{th}$ primitive interaction of $s$, and $p_{sk}$ is the probability of successfully enacting the $k^{th}$ primitive interaction of $s$.
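A direct transcription of this double sum, assuming each sequence $s \in S$ is given as a list of $(v_{sj}, p_{sj})$ pairs; `expected_valence` is a hypothetical helper. The running product `reach` holds $\prod_{k=1}^{j} p_{sk}$, so a later step only contributes in proportion to the probability of reaching it.

```python
def expected_valence(sequences):
    """E(V_a) = sum over s in S of sum over j of v_sj * prod_{k<=j} p_sk.

    sequences: list of sequences, each a list of (valence v_sj, probability p_sj) pairs
    """
    total = 0.0
    for steps in sequences:
        reach = 1.0  # probability of successfully enacting steps 1..j
        for valence, probability in steps:
            reach *= probability
            total += valence * reach
    return total


# Hypothetical action 0 proposing (00, 01) with p = 0.5 then 1.0, and (01) with p = 0.5:
# -1*0.5 + 1*0.5*1.0 + 1*0.5
print(expected_valence([[(-1, 0.5), (1, 1.0)], [(1, 0.5)]]))  # → 0.5
```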

The probability estimation $\hat{p}_{s1}$ of engaging in sequence $s$ is computed as in Agent6: 

$\displaystyle \hat{p}_{s1} = \frac{\sum_{c \in A_s} w_c }{\sum_{c \in A_1} w_c } = \hat{p}(s|a_{s1}) $ 

$A_1$ is the set of activated composite interactions proposing the first action $a_{s1}$ that begins sequence $s$. $A_s \subset A_1$ is the set of activated composite interactions that propose $s$. 
In other words, $\hat{p}_{s1}$ is the ratio of the number of times $s$ was enacted to the number of times its first action $a_{s1}$ was selected.
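A sketch of this ratio with hypothetical weights. Following the Agent's `P(it|at)` column, the numerator groups the activated composite interactions by the first primitive interaction they propose, and the denominator by that interaction's action (the first character of the key, under the notebook's `'<action><outcome>'` convention). `activated` and `first_step_probabilities` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical activated composite interactions, summarized as (proposed sequence, weight w_c)
activated = [
    (("00", "01"), 3),
    (("00",), 1),
    (("01",), 4),
    (("10",), 2),
]


def first_step_probabilities(activated):
    """Estimate p_s1: weight proposing the first interaction / weight proposing its action."""
    by_interaction = defaultdict(int)
    by_action = defaultdict(int)
    for sequence, weight in activated:
        first = sequence[0]
        by_interaction[first] += weight
        by_action[first[0]] += weight
    return {i: w / by_action[i[0]] for i, w in by_interaction.items()}


print(first_step_probabilities(activated))  # → {'00': 0.5, '01': 0.5, '10': 1.0}
```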

The probability estimation $\hat{p}_{sk}$ of enacting the step $k \gt 1$ of a sequence is computed similarly:

$\displaystyle \hat{p}_{sk} = \frac{\sum_{c \in A_{sk}} w_c }{\sum_{c \in A_{k}} w_c} = \hat{p}(s_k|s_{k-1}, a_{sk}) $ 

$A_{k}$ is the set of activated composite interactions proposing the $k^{th}$ action $a_{k}$ of sequence $s$. 
$A_{sk} \subset A_k$ is the set of activated composite interactions proposing the $k^{th}$ interaction $s_{k}$ of sequence $s$. 
In other words, $\hat{p}_{sk}$ is the ratio of the number of times the primitive interaction $s_k$ was enacted at step $k$ to the number of times its action $a_{sk}$ was selected when trying to enact sequence $s$.
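The second-step estimate can be sketched the same way, mirroring the `P(it+1|it, at+1)` computation in `calculate_proposed_df`: the numerator sums the weights of proposals with the same pair $(s_1, s_2)$, and the denominator sums those with the same $s_1$ and the same action for step 2. The triples and the `second_step_probabilities` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-step proposals: (first interaction s_1, second interaction s_2, weight)
two_step = [
    ("00", "01", 3),
    ("00", "00", 1),
    ("10", "11", 2),
]


def second_step_probabilities(two_step):
    """Estimate p_s2 = weight of (s_1, s_2) / weight of (s_1, action of s_2)."""
    numerator = defaultdict(int)
    denominator = defaultdict(int)
    for s1, s2, weight in two_step:
        numerator[(s1, s2)] += weight
        denominator[(s1, s2[0])] += weight  # s2[0] is the action of the second step
    return {(s1, s2): w / denominator[(s1, s2[0])] for (s1, s2), w in numerator.items()}


print(second_step_probabilities(two_step))
# → {('00', '01'): 0.75, ('00', '00'): 0.25, ('10', '11'): 1.0}
```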


## Implement the agent

In [822]:
import pandas as pd

class Agent:
    def __init__(self, _interactions):
        """ Initialize our agent """
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._composite_interactions = {}
        self._intended_interaction = self._interactions["00"]
        self._last_interaction = None
        self._previous_interaction = None
        self._penultimate_interaction = None
        self._last_composite_interaction = None
        self._previous_composite_interaction = None
        # Create a dataframe of default primitive interactions
        default_interactions = [interaction for interaction in _interactions if interaction.get_outcome() == 0]
        data = {'proposed': [i.key() for i in default_interactions],
                'E(Vi)': [0.] * len(default_interactions),
                'action': [i.get_action() for i in default_interactions],
                'E(Va)': [0.] * len(default_interactions),
                'interaction': [i.key() for i in default_interactions],
                'weight': [0] * len(default_interactions)}
        self.primitive_df = pd.DataFrame(data)
        # Store the selection dataframe as a class attribute so we can display it in the notebook
        self.selection_df = None

    def action(self, _outcome):
        """Implement the agent's policy"""
        # Memorize the context
        self._previous_composite_interaction = self._last_composite_interaction
        self._penultimate_interaction = self._previous_interaction
        self._previous_interaction = self._last_interaction
        self._last_interaction = self._interactions[f"{self._intended_interaction.get_action()}{_outcome}"]

        # tracing the previous cycle
        print(
            f"Action: {self._intended_interaction.get_action()}, Prediction: {self._intended_interaction.get_outcome()}, "
            f"Outcome: {_outcome}, Prediction_correct: {self._intended_interaction.get_outcome() == _outcome}, "
            f"Valence: {self._last_interaction.get_valence()}")

        # Call the learning mechanism
        self.learn()

        # Calculate the proposed dataframe
        self.calculate_proposed_df()

        # Select the intended primitive interaction
        self.select_intended_interaction()

        return self._intended_interaction.get_action()

    def learn(self):
        """Learn the composite interactions"""
        # First level of composite interactions
        self._last_composite_interaction = self.learn_composite_interaction(self._previous_interaction,
                                                                            self._last_interaction)
        # Second level of composite interactions
        self.learn_composite_interaction(self._previous_composite_interaction, self._last_interaction)
        self.learn_composite_interaction(self._penultimate_interaction, self._last_composite_interaction)

    def learn_composite_interaction(self, pre_interaction, post_interaction):
        if pre_interaction is None:
            return None
        else:
            # Record a new composite interaction or reinforce the existing one
            composite_interaction = CompositeInteraction(pre_interaction, post_interaction)
            if composite_interaction.key() not in self._composite_interactions:
                self._composite_interactions[composite_interaction.key()] = composite_interaction
                print(f"Learning {composite_interaction}")
                return composite_interaction
            else:
                self._composite_interactions[composite_interaction.key()].reinforce()
                print(f"Reinforcing {self._composite_interactions[composite_interaction.key()]}")
                # Retrieve the existing composite interaction
                return self._composite_interactions[composite_interaction.key()]

    def calculate_proposed_df(self):
        """Compute the dataframe of proposed interactions and their expected valence"""

        # The activated composite interactions
        activated_keys = [composite_interaction.key() for composite_interaction in self._composite_interactions.values()
                          if composite_interaction.pre_interaction == self._last_interaction or
                          composite_interaction.pre_interaction == self._last_composite_interaction]

        # Create the dataframe of sequences
        series_df = pd.DataFrame(columns=['proposed', 'weight', 'a_t', 'i_t', 'a_t+1', 'i_t+1'])
        for k in activated_keys:
            new_row = {'proposed': self._composite_interactions[k].post_interaction.key(),
                       'weight': self._composite_interactions[k].weight,
                       'a_t': self._composite_interactions[k].post_interaction.get_primitive_action()}
            if type(self._composite_interactions[k].post_interaction) == Interaction:
                new_row['i_t'] = self._composite_interactions[k].post_interaction.key()
            else:
                new_row['i_t'] = self._composite_interactions[k].post_interaction.pre_interaction.key()
                new_row['a_t+1'] = self._composite_interactions[k].post_interaction.post_interaction.get_primitive_action()
                new_row['i_t+1'] = self._composite_interactions[k].post_interaction.post_interaction.key()
            series_df = pd.concat([series_df, pd.DataFrame([new_row])], ignore_index=True)
        # print(series_df)

        # The probability P(it|at)
        total_by_i = series_df.groupby(["a_t", "i_t"], as_index=False)["weight"].sum().rename(columns={"weight": "i_weight"})
        total_by_a = series_df.groupby("a_t", as_index=False)["weight"].sum().rename(columns={"weight": "a_weight"})
        p_t_df = pd.merge(total_by_i, total_by_a, on="a_t")
        p_t_df['P(it|at)'] = p_t_df['i_weight'] / p_t_df['a_weight']
        # print(p_t_df[['a_t', 'i_t', 'P(it|at)']])

        # The probability P(it+1|it, at+1)
        series_filtered = series_df.dropna(subset=["i_t+1"])
        total_by_i = series_filtered.groupby(["i_t", "a_t+1", "i_t+1"], as_index=False)["weight"].sum().rename(columns={"weight": "t+1_weight"})
        total_by_a = series_filtered.groupby(["i_t", "a_t+1"], as_index=False)["weight"].sum().rename(columns={"weight": "t_weight"})
        p_t1_df = pd.merge(total_by_i, total_by_a, on=["i_t", "a_t+1"])
        p_t1_df['P(it+1|it, at+1)'] = p_t1_df['t+1_weight'] / p_t1_df['t_weight']
        # print(p_t1_df[['i_t', 'a_t+1', 'i_t+1', 'P(it+1|it, at+1)']])  # [['i_t', 'a_t+1', 'i_t+1', 'P(it+1|it, at+1)']]

        # Create the dataframe of proposed interactions
        data = {'proposed': [self._composite_interactions[k].post_interaction.key() for k in activated_keys],
                'E(Vi)': [0.] * len(activated_keys),
                'action': [self._composite_interactions[k].post_interaction.get_primitive_action() for k in
                           activated_keys],
                'E(Va)': [0.] * len(activated_keys),
                'weight': [self._composite_interactions[k].weight for k in activated_keys],
                'interaction': [self._composite_interactions[k].post_interaction.pre_key() for k in activated_keys]
                }
        expected_df = pd.DataFrame(data)
        # Add default interactions
        expected_df = pd.concat([self.primitive_df, expected_df], ignore_index=True)

        # Remove the post_interaction that have a negative valence
        for i, k in expected_df["proposed"].items():
            if k in self._composite_interactions and self._composite_interactions[k].post_interaction.get_valence() < 0:
                expected_df.at[i, "proposed"] = self._composite_interactions[k].pre_interaction.key()

        # Remove the interactions that are the beginning of a longer interaction
        to_remove = set()
        for i, k1 in expected_df["proposed"].items():
            for j, k2 in expected_df["proposed"].items():
                if i not in to_remove and j not in to_remove and i != j:
                    if k1 in self._composite_interactions:
                        s1 = pd.Series(self._composite_interactions[k1].get_primitive_series())
                    else:
                        s1 = pd.Series(k1)
                    if k2 in self._composite_interactions:
                        s2 = pd.Series(self._composite_interactions[k2].get_primitive_series())
                    else:
                        s2 = pd.Series(k2)
                    if len(s1) <= len(s2):
                        if s1.equals(s2.iloc[:len(s1)]):
                            # print(f"Remove {s1.tolist()} from {i} to {j} weight {expected_df.at[i, 'weight']}")
                            to_remove.add(i)  # Mark the sequence to be removed
                            expected_df.at[j, "weight"] += expected_df.at[i, "weight"]
                        elif s2.equals(s1.iloc[:len(s2)]):
                            # print(f"Remove {s2.tolist()} from {j} to {i} weight {expected_df.at[j, 'weight']}")
                            to_remove.add(j)  # Mark the sequence to be removed
                            expected_df.at[i, "weight"] += expected_df.at[j, "weight"]
        expected_df = expected_df.drop(index=to_remove)
        # Compute the expected valence of interactions
        for i, k in expected_df["proposed"].items():
            if k in self._interactions:
                first_row = p_t_df.loc[p_t_df["i_t"] == k].head(1)  # ["P(it|at)"].values[0]
                p = first_row["P(it|at)"].values[0] if not first_row.empty else 0
                # print(f"E(vi) of {k} probability {p}")
                expected_df.at[i, "E(Vi)"] = p * self._interactions[k].get_valence()
            else:
                k1 = self._composite_interactions[k].pre_interaction.key()
                p1 = p_t_df.loc[p_t_df["i_t"] == k1].head(1)["P(it|at)"].values[0]
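                k2 = self._composite_interactions[k].post_interaction.key()
                # Condition P(it+1|it, at+1) on the first interaction k1, not only on k2
                p2_df = p_t1_df.loc[(p_t1_df["i_t"] == k1) & (p_t1_df["i_t+1"] == k2)]
                p2 = p2_df.head(1)["P(it+1|it, at+1)"].values[0]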
                expected_df.at[i, "E(Vi)"] = p1 * (self._interactions[k1].get_valence()
                                                   + p2 * self._interactions[k2].get_valence())
        # The sum expected valence per action
        expected_df["E(Va)"] = expected_df.groupby("action")["E(Vi)"].transform("sum")
        # The sum weight per action
        # expected_df["weight"] = expected_df.groupby("action")["weight"].transform("sum")

        # Find the most probable outcome for each action
        max_weight_df = expected_df.loc[expected_df.groupby('action')['weight'].idxmax(), ['action', 'interaction']].reset_index(
            drop=True)
        max_weight_df.columns = ['action', 'intended']
        expected_df = expected_df.merge(max_weight_df, on='action')

        # Store the dataframe for printing
        self.selection_df = expected_df.copy()

    def select_intended_interaction(self):
        """Selects the intended interaction from the proposed dataframe"""
        # Find the first row that has the highest expected valence
        max_index = self.selection_df['E(Va)'].idxmax()
        intended_interaction_key = self.selection_df.loc[max_index, ['intended']].values[0]
        print(f"Intended Max E(Va) {intended_interaction_key}")
        self._intended_interaction = self._interactions[intended_interaction_key]


# PRELIMINARY EXERCISE

## Let's create Environment6

The agent has two possible actions: move to the left or move to the right. 
The environment returns outcome 1 when the agent bumps into a light green wall. The wall then turns dark green, and further bumps return outcome 0 until the agent moves away.

In [823]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import Output
from IPython.display import display

class Environment6:
    """ The grid """
    def __init__(self):
        """ Initialize the grid """
        self.grid = np.array([[1, 0, 0, 1]])
        self.position = 1

    def outcome(self, action):
        """Take the action and generate the next outcome """
        if action == 0:
            # Move left
            if self.position > 1:
                # No bump
                self.position -= 1
                self.grid[0, 3] = 1
                outcome = 0
            elif self.grid[0, 0] == 1:
                # First bump
                outcome = 1
                self.grid[0, 0] = 2
            else:
                # Subsequent bumps
                outcome = 0
        else:
            # Move right
            if self.position < 2:
                # No bump
                self.position += 1
                self.grid[0, 0] = 1
                outcome = 0
            elif self.grid[0, 3] == 1:
                # First bump
                outcome = 1
                self.grid[0, 3] = 2
            else:
                # Subsequent bumps
                outcome = 0
        return outcome  

    def display(self):
        """Display the grid"""
        out.clear_output(wait=True)
        with out:
            fig, ax = plt.subplots()
            # Hide the ticks
            ax.set_xticks([])
            ax.set_yticks([])
            # Display the grid
            ax.imshow(self.grid, cmap='Greens', vmin=0, vmax=2)
            plt.scatter(self.position, 0, s=1000)
            plt.show()

## Run the agent in Environment6

In [824]:
# Instantiate the agent in Environment6
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent(interactions)
e = Environment6()

# Output widget for displaying the plot
out = Output()

# Run the interaction loop
step = 0
outcome = 0

Run the simulation step by step to see the proposed DataFrame. Use `Ctrl+Enter` to run the cell below and stay on it.

In [979]:
print(f"Step {step}")
step += 1
e.display()
display(out)
action = a.action(outcome)
outcome = e.outcome(action)
a.selection_df

Step 14


Action: 1, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1
Reinforcing (10:-1, 10:-1: 2)
Learning ((00:-1, 10:-1: 2), 10:-1: 1)
Learning (00:-1, (10:-1, 10:-1: 2): 1)
Intended Max proclivity 00


|   | proposed | E(Vi) | action | E(Va) | interaction | weight | intended | weight_sum | proclivity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (00,01) | -0.5 | 0 | -0.5 | 00 | 5 | 00 | 5 | -2.5 |
| 1 | 10 | -1.0 | 1 | -1.0 | 10 | 3 | 10 | 3 | -3.0 |


Observe that on step 8 the composite interaction (00, 01) is proposed, but its expected valence equals 0, so it is not selected.

After step 18, however, the agent alternates between the sequences (00, 01) and (10, 11), which yield the best average valence the agent can obtain in this environment. 
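To check this average-valence claim, here is a compact, standalone restatement of `Environment6.outcome` (our own rewrite, with the hypothetical helpers `env6_outcome` and `average_valence`) that replays fixed action cycles with the valences used in this notebook. It only compares a few hand-picked policies; it does not by itself prove that no policy beats an average of 0.

```python
# Valence of each (action, outcome) pair, as in the `interactions` list of this notebook
PAIR_VALENCE = {(0, 0): -1, (0, 1): 1, (1, 0): -1, (1, 1): 1}


def env6_outcome(state, action):
    """Restatement of Environment6.outcome; walls are 1 (light green) or 2 (dark green)."""
    if action == 0:  # move left
        if state["position"] > 1:
            state["position"] -= 1
            state["right_wall"] = 1
            return 0
        if state["left_wall"] == 1:
            state["left_wall"] = 2
            return 1
        return 0
    if state["position"] < 2:  # move right
        state["position"] += 1
        state["left_wall"] = 1
        return 0
    if state["right_wall"] == 1:
        state["right_wall"] = 2
        return 1
    return 0


def average_valence(cycle, steps):
    """Repeat a fixed cycle of actions from the initial state and average the valences."""
    state = {"position": 1, "left_wall": 1, "right_wall": 1}
    total = 0
    for t in range(steps):
        action = cycle[t % len(cycle)]
        total += PAIR_VALENCE[(action, env6_outcome(state, action))]
    return total / steps


print(average_valence([1, 1, 0, 0], 8))  # (10, 11) then (00, 01), repeated → 0.0
print(average_valence([0], 8))           # always move left → -0.75
```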

## Let's create Environment7

In Environment7, the agent has two possible actions: move forward (action 0) or turn 180° (action 1).
Like Environment6, Environment7 returns outcome 1 only the first time the agent bumps into a light green wall; turning always returns outcome 0. 

In [491]:
class Environment7:
    """ The grid """
    def __init__(self):
        """ Initialize the grid and the agent's pose """
        self.grid = np.array([[1, 0, 0, 1]])
        self.position = 1
        self.direction = 0

    def outcome(self, action):
        """Take the action and generate the next outcome """
        if action == 0:
            # Move forward
            if self.direction == 0:
                # Move to the left
                if self.position > 1:
                    # No bump
                    self.position -= 1
                    self.grid[0, 3] = 1
                    outcome = 0
                elif self.grid[0, 0] == 1:
                    # First bump
                    outcome = 1
                    self.grid[0, 0] = 2
                else:
                    # Subsequent bumps
                    outcome = 0
            else:
                # Move to the right
                if self.position < 2:
                    # No bump
                    self.position += 1
                    self.grid[0, 0] = 1
                    outcome = 0
                elif self.grid[0, 3] == 1:
                    # First bump
                    outcome = 1
                    self.grid[0, 3] = 2
                else:
                    # Subsequent bumps
                    outcome = 0
        else:
            # Turn 180°
            outcome = 0
            if self.direction == 0:
                self.direction = 1
            else:
                self.direction = 0
        return outcome  

    def display(self):
        """Display the grid"""
        out.clear_output(wait=True)
        with out:
            fig, ax = plt.subplots()
            # Hide the ticks
            ax.set_xticks([])
            ax.set_yticks([])
            # Display the grid
            ax.imshow(self.grid, cmap='Greens', vmin=0, vmax=2)
            if self.direction == 0:
                # Display agent to the left
                plt.scatter(self.position, 0, s=1000, marker='<')
            else:
                # Display agent to the right
                plt.scatter(self.position, 0, s=1000, marker='>')
            plt.show()

## Test the Agent in Environment7

In [346]:
# Instantiate a new agent
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent(interactions)
e = Environment7()

# Output widget for displaying the plot
out = Output()

# Run the interaction loop
step = 0
outcome = 0

In [976]:
print(f"Step {step}")
step += 1
e.display()
display(out)
action = a.action(outcome)
outcome = e.outcome(action)
a.selection_df

Step 11


Action: 0, Prediction: 1, Outcome: 0, Prediction_correct: False, Valence: -1
Reinforcing (00:-1, 00:-1: 3)
Learning ((10:-1, 00:-1: 2), 00:-1: 1)
Learning (10:-1, (00:-1, 00:-1: 3): 1)
Intended Max proclivity 00


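|   | proposed | E(Vi) | action | E(Va) | interaction | weight | intended | weight_sum | proclivity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 00 | -0.6 | 0 | -0.2 | 00 | 6 | 00 | 10 | -2.0 |
| 1 | 10 | -1.0 | 1 | -1.0 | 10 | 3 | 10 | 3 | -3.0 |
| 2 | 01 | 0.4 | 0 | -0.2 | 01 | 4 | 00 | 10 | -2.0 |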


Observe that this agent does not manage to obtain the optimum average valence in Environment7.

This is because it selects the next action that has the highest expected valence without taking into account the weight of the proposals.
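For reference, the sketch below restates `Environment7.outcome` in compact form (our own rewrite; `env7_outcome` and `average_valence7` are hypothetical names) and replays two fixed policies: always moving forward, and a hand-coded cycle that bumps, turns, and moves. We believe no policy does better than this cycle, because after each rewarded bump the agent must turn and move once before it can bump into a light green wall again, but that is our own reasoning, not a result stated in this notebook.

```python
# Valence of each (action, outcome) pair, as in the `interactions` list of this notebook
PAIR_VALENCE = {(0, 0): -1, (0, 1): 1, (1, 0): -1, (1, 1): 1}


def env7_outcome(state, action):
    """Restatement of Environment7.outcome: action 0 moves forward, action 1 turns 180°."""
    if action == 1:
        state["direction"] = 1 - state["direction"]
        return 0
    if state["direction"] == 0:  # facing left
        if state["position"] > 1:
            state["position"] -= 1
            state["right_wall"] = 1
            return 0
        if state["left_wall"] == 1:
            state["left_wall"] = 2
            return 1
        return 0
    if state["position"] < 2:  # facing right
        state["position"] += 1
        state["left_wall"] = 1
        return 0
    if state["right_wall"] == 1:
        state["right_wall"] = 2
        return 1
    return 0


def average_valence7(cycle, steps):
    """Repeat a fixed cycle of actions from the initial pose and average the valences."""
    state = {"position": 1, "direction": 0, "left_wall": 1, "right_wall": 1}
    total = 0
    for t in range(steps):
        action = cycle[t % len(cycle)]
        total += PAIR_VALENCE[(action, env7_outcome(state, action))]
    return total / steps


print(average_valence7([0, 1, 0], 9))  # bump, turn, move: 01, 10, 00, ... → -0.3333333333333333
print(average_valence7([0], 9))        # keep moving forward → -0.7777777777777778
```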

# ASSIGNMENT

Create Agent7, which computes the proclivity of the proposed interactions by multiplying their expected valence by their weight, and selects the proposed interaction that has the highest proclivity.
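As a worked example, here is the proclivity computation applied with pandas to values copied by hand from the selection table printed above for Environment7 (step 11). `groupby(...).transform("sum")` broadcasts each action's total weight back to its rows; the column names match the notebook's, but the three-row `selection` frame is our own reconstruction. In this particular snapshot, the expected valence and the proclivity both select action 0.

```python
import pandas as pd

# Values from the Environment7 selection table at step 11 (copied by hand)
selection = pd.DataFrame({
    "action": [0, 1, 0],
    "E(Va)": [-0.2, -1.0, -0.2],
    "weight": [6, 3, 4],
})

# Total proposal weight per action, broadcast back to each row
selection["weight_sum"] = selection.groupby("action")["weight"].transform("sum")
# Proclivity = total weight of the action's proposals times its expected valence
selection["proclivity"] = selection["weight_sum"] * selection["E(Va)"]

best_action = selection.loc[selection["proclivity"].idxmax(), "action"]
print(best_action)  # → 0
```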

In [492]:
class Agent7(Agent):
    def select_intended_interaction(self):
        """Selects the intended interaction from the proposed dataframe"""
        # Modify to compute the proclivity and select the action that has the highest proclivity
        # Find the first row that has the highest expected valence
        max_index = self.selection_df['E(Va)'].idxmax()
        intended_interaction_key = self.selection_df.loc[max_index, ['intended']].values[0]
        print(f"Intended Max E(Va) {intended_interaction_key}")
        self._intended_interaction = self._interactions[intended_interaction_key]

In [868]:
class Agent7(Agent):
    def select_intended_interaction(self):
        """Select the intended interaction that has the highest proclivity"""
        # The sum of the weights per action
        grouped_df = self.selection_df.groupby('action').agg({'weight': 'sum'}).reset_index()
        self.selection_df = self.selection_df.merge(grouped_df, on='action', suffixes=('', '_sum'))
        # self.selection_df["sum_weight"] = self.selection_df.groupby("action")["weight"].transform("sum")

        # Compute the proclivity
        self.selection_df['proclivity'] = self.selection_df["weight_sum"] * self.selection_df['E(Va)']

        # Select the action that has the highest proclivity
        max_index = self.selection_df['proclivity'].idxmax()
        intended_interaction_key = self.selection_df.loc[max_index, ['intended']].values[0]
        print(f"Intended Max proclivity {intended_interaction_key}")
        self._intended_interaction = self._interactions[intended_interaction_key]    

## Test your Agent7 in Environment7

In [964]:
# Instantiate a new agent
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent7(interactions)
e = Environment7()

# Output widget for displaying the plot
out = Output()

# Run the interaction loop
step = 0
outcome = 0

In [972]:
print(f"Step {step}")
step += 1
e.display()
display(out)
action = a.action(outcome)
outcome = e.outcome(action)
a.selection_df

Step 7


Action: 0, Prediction: 0, Outcome: 1, Prediction_correct: False, Valence: 1
Reinforcing (00:-1, 01:1: 2)
Learning ((10:-1, 00:-1: 1), 01:1: 1)
Learning (10:-1, (00:-1, 01:1: 2): 1)
Intended Max proclivity 10


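|   | proposed | E(Vi) | action | E(Va) | interaction | weight | intended | weight_sum | proclivity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 0.0 | 1 | 0.0 | 10 | 0 | 10 | 0 | 0.0 |
| 1 | 00 | -1.0 | 0 | -1.0 | 00 | 3 | 00 | 3 | -3.0 |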
