[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PetiteIA/schema_mechanism/blob/master/notebooks/agent8.ipynb)

# THE AGENT WHO REVERSED HIS DECISION

# Learning objectives

Upon completing this lab, you will be able to implement a developmental agent that selects an action based on the anticipation of the next two steps.

# Adapt the decision mechanism

Let's include the decision in the composite interaction. 
A composite interaction becomes a tuple (pre-interaction, decision, post-interaction).

At the end of time step $t$, the agent records or reinforces the interactions:

* $(i_{t-2}, d_{t-1}, i_{t-1})$
* $((i_{t-3}, d_{t-2}, i_{t-2}), d_{t-1}, i_{t-1})$
* $(i_{t-3}, d^2, (i_{t-2}, d_{t-1}, i_{t-1}))$
* $((i_{t-4}, d_{t-3}, i_{t-3}), d^2, (i_{t-2}, d_{t-1}, i_{t-1}))$
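The four tuples above can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation. It assumes that interactions are plain key strings such as `"00"`, that the history is a hypothetical list of `(decision, interaction)` pairs (oldest first), and that $d^2$ is built by appending the decision of $i_{t-1}$ to the key of $i_{t-2}$, as `Agent.learn()` does. The helper name `recorded_tuples` is hypothetical.

```python
def recorded_tuples(history):
    """Return the four composite tuples recorded at the end of time step t.

    history: hypothetical list of (decision, interaction_key) pairs, oldest first.
    The last four entries are (d_{t-4}, i_{t-4}), ..., (d_{t-1}, i_{t-1}).
    """
    (_, i_t4), (d_t3, i_t3), (d_t2, i_t2), (d_t1, i_t1) = history[-4:]
    # d^2: enact i_{t-2}, then perform the action of i_{t-1} (first character of its key)
    d2 = f"{i_t2}a{i_t1[0]}"
    first = (i_t2, d_t1, i_t1)                  # (i_{t-2}, d_{t-1}, i_{t-1})
    second = ((i_t3, d_t2, i_t2), d_t1, i_t1)   # ((i_{t-3}, d_{t-2}, i_{t-2}), d_{t-1}, i_{t-1})
    third = (i_t3, d2, first)                   # (i_{t-3}, d^2, first)
    fourth = ((i_t4, d_t3, i_t3), d2, first)    # ((i_{t-4}, d_{t-3}, i_{t-3}), d^2, first)
    return [first, second, third, fourth]


history = [("a1", "10"), ("00a0", "00"), ("a0", "00"), ("a0", "01")]
for t in recorded_tuples(history):
    print(t)
```

The example history was chosen so that the four tuples match the keys printed at Step 41 of the Agent8 run in Environment7.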

If it does not yet exist, a new decision $d^2$ is constructed; it differs from the decision $d_{t-2}$ that was actually made at time $t-2$.
For example, if the agent made decision $d_{t-2} = a0$ and enacted interaction $i_{t-2}=i00$, and then made decision $d_{t-1} = a0$ and enacted interaction $i_{t-1}=i01$, the agent learns the new decision $d^2=i00a0$, which consists of trying to enact interaction $i_t=i00$ and then performing action $a_{t+1}=a0$.
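The string form of such a decision can be sketched as follows. The key format (`<interaction key>a<action>`) follows `Agent.learn()`; the helpers `make_composite_decision` and `parse_decision` are hypothetical and only illustrate how a key like `00a0` reads as "enact `00`, then perform action `0`". In the notebook itself, the interaction to intend is taken from the `intended` column rather than parsed from the key.

```python
def make_composite_decision(pre_key: str, post_action: int) -> str:
    """Build d^2: try to enact the interaction pre_key, then perform post_action."""
    return f"{pre_key}a{post_action}"


def parse_decision(decision: str):
    """Split a decision key into (interaction to enact first or None, action).

    'a0'   -> (None, 0): a primitive decision, just perform action 0
    '00a0' -> ('00', 0): enact interaction 00, then perform action 0
    """
    intended, _, action = decision.rpartition("a")
    return (intended or None), int(action)


d2 = make_composite_decision("00", 0)
print(d2, parse_decision(d2), parse_decision("a1"))  # 00a0 ('00', 0) (None, 1)
```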

![Agent8](img/Figure_1_Agent8.svg)

The purpose of including the decision in the composite interaction is to let the agent learn that it may fail to enact a decision.

For example, in the context of having enacted interaction $i01$, the agent may select decision $i10a0$ but fail to enact $i10$ and enact $i11$ instead. In this case, the agent learns the composite interaction $(i01, i10a0, i11)$.

When the agent encounters the context $i01$ again, it will have a better assessment of the expected valence of decision $i10a0$ by taking its probabilities of success and failure into account. 
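A small worked example of this assessment, with made-up weights and valences: suppose that, in a given context, a decision was fully enacted once (post-interaction valence 2) and failed three times, each failure yielding an interaction of valence -1. The notebook's `create_proposed_df()` and `aggregate_propositions()` sum `weight * valence` per decision; dividing that sum by the total weight gives the probability-weighted expected valence. The function name `decision_assessment` is hypothetical.

```python
def decision_assessment(outcomes):
    """Assess one decision in one context.

    outcomes: list of (weight, valence) pairs, one per composite interaction
    recorded for this decision (successes and failures alike).
    Returns (proclivity, expected_valence).
    """
    total_weight = sum(w for w, _ in outcomes)
    # Sum of weight * valence, as aggregated per decision in the notebook
    proclivity = sum(w * v for w, v in outcomes)
    # Weights normalized into probabilities of each outcome
    expected_valence = proclivity / total_weight
    return proclivity, expected_valence


# Hypothetical record: enacted as intended once (valence 2), failed 3 times (valence -1)
print(decision_assessment([(1, 2), (3, -1)]))  # (-1, -0.25)
```

Without the failure records, the same decision would score 2 and look attractive.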

# Define the necessary classes

Ensure the required packages are installed if they aren't already.

In [None]:
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install ipywidgets
!pip install ipython

## The primitive interaction class

A primitive interaction's decision is its action; `get_decision()` returns it as the key `a<action>`, for example `a0`.

In [1075]:
class Interaction:
    """An interaction is a tuple (action, outcome) with a valence"""
    def __init__(self, _action, _outcome, _valence):
        self._action = _action
        self._outcome = _outcome
        self._valence = _valence

    def get_action(self):
        """Return the action"""
        return self._action

    def get_decision(self):
        """Return the decision key"""
        return f"a{self._action}"

    def get_primitive_action(self):
        """Return the action for compatibility with CompositeInteraction"""
        return self._action

    def get_outcome(self):
        """Return the outcome"""
        return self._outcome

    def get_valence(self):
        """Return the valence"""
        return self._valence

    def key(self):
        """ The key to find this interaction in the dictionary is the string '<action><outcome>'. """
        return f"{self._action}{self._outcome}"

    def pre_key(self):
        """Return the key. Used for compatibility with CompositeInteraction"""
        return self.key()

    def __str__(self):
        """ Print interaction in the form '<action><outcome>:<valence>' for debug."""
        return f"{self._action}{self._outcome}:{self._valence}"

    def __eq__(self, other):
        """ Interactions are equal if they have the same key """
        if isinstance(other, self.__class__):
            return self.key() == other.key()
        else:
            return False

## The composite interaction class

A composite interaction is initialized with a tuple (pre_interaction, decision, post_interaction).

In [1076]:
class CompositeInteraction:
    """A composite interaction is a tuple (pre_interaction, decision, post_interaction) and a weight"""
    def __init__(self, pre_interaction, decision, post_interaction):
        self.pre_interaction = pre_interaction
        self.decision = decision
        self.post_interaction = post_interaction
        self.weight = 1
        self.isActivated = False

    def get_decision(self):
        """Return the decision key (e.g. 'a0', or a composite decision such as '00a0')"""
        return self.decision

    def get_primitive_action(self):
        """Return the primitive action"""
        return self.pre_interaction.get_primitive_action()

    def get_valence(self):
        """Return the valence of the pre_interaction plus the valence of the post_interaction"""
        return self.pre_interaction.get_valence() + self.post_interaction.get_valence()

    def reinforce(self):
        """Increment the composite interaction's weight"""
        self.weight += 1

    def key(self):
        """ The key to find this interaction in the dictionary is the string
        '<pre_interaction>,<decision>,<post_interaction>'. """
        # return f"({self.pre_interaction.key()},{self.post_interaction.key()})"
        return f"({self.pre_interaction.key()},{self.decision},{self.post_interaction.key()})"

    def pre_key(self):
        """Return the key of the pre_interaction"""
        return self.pre_interaction.pre_key()
        
    def __str__(self):
        """ Print the interaction in the form (pre_interaction, decision, post_interaction: weight) """
        return f"({self.pre_interaction}, {self.decision}, {self.post_interaction}: {self.weight})"

    def __eq__(self, other):
        """ Interactions are equal if they have the same keys """
        if isinstance(other, self.__class__):
            return self.key() == other.key()
        else:
            return False
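Because pre- and post-interactions can themselves be composite, keys nest recursively. The following standalone sketch reproduces the `(pre,decision,post)` format of `CompositeInteraction.key()` and the leftmost-primitive lookup of `pre_key()` on plain nested tuples; the function names and the tuple representation are assumptions made for illustration.

```python
def composite_key(node):
    """Key in the '(pre,decision,post)' format used by CompositeInteraction.key().

    node is either a primitive key string such as '01' or a
    (pre, decision, post) tuple whose pre and post are nodes themselves.
    """
    if isinstance(node, str):
        return node
    pre, decision, post = node
    return f"({composite_key(pre)},{decision},{composite_key(post)})"


def leftmost_primitive_key(node):
    """Follow pre-interactions down to the first primitive, like pre_key()."""
    while not isinstance(node, str):
        node = node[0]
    return node


nested = (("00", "a0", "01"), "a1", "10")
print(composite_key(nested))           # ((00,a0,01),a1,10)
print(leftmost_primitive_key(nested))  # 00
```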

## Define the Agent class

In [1077]:
import pandas as pd
import numpy as np

class Agent:
    def __init__(self, _interactions):
        """ Initialize our agent """
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._composite_interactions = {}
        self._intended_interaction = self._interactions["00"]
        self._decision = None  # "0"
        self._last_interaction = None
        self._previous_interaction = None
        self._penultimate_interaction = None
        self._last_composite_interaction = None
        self._previous_composite_interaction = None
        self._penultimate_composite_interaction = None
        # Create a dataframe of default primitive interactions
        default_interactions = [interaction for interaction in _interactions if interaction.get_outcome() == 0]
        data = {'activated': [np.nan] * len(default_interactions),
                'weight': [0] * len(default_interactions),
                # 'action': [i.get_primitive_action() for i in default_interactions],
                'post_interaction': [i.key() for i in default_interactions],
                'valence': [i.get_valence() for i in default_interactions],
                'decision': [i.get_decision() for i in default_interactions],
                'proclivity': [0] * len(default_interactions),
                'primitive': [i.key() for i in default_interactions]
               }
        self.primitive_df = pd.DataFrame(data)
        # Store the selection dataframe as an instance attribute so we can display it in the notebook
        self.proposed_df = None

    def action(self, _outcome):
        """Implement the agent's policy"""
        # Memorize the context
        self._penultimate_composite_interaction = self._previous_composite_interaction
        self._previous_composite_interaction = self._last_composite_interaction
        self._penultimate_interaction = self._previous_interaction
        self._previous_interaction = self._last_interaction
        self._last_interaction = self._interactions[f"{self._intended_interaction.get_action()}{_outcome}"]

        # tracing the previous cycle
        print(
            f"Action: {self._intended_interaction.get_action()}, Prediction: {self._intended_interaction.get_outcome()}, "
            f"Outcome: {_outcome}, Prediction_correct: {self._intended_interaction.get_outcome() == _outcome}, "
            f"Valence: {self._last_interaction.get_valence()}")

        # Call the learning mechanism
        self.learn()

        # Create the proposed dataframe
        self.create_proposed_df()
        self.aggregate_propositions()

        # Select the intended primitive interaction
        self.decide()

        return self._intended_interaction.get_action()

    def learn(self):
        """Learn the composite interactions"""
        # First level of composite interactions using the last primitive decision
        self._last_composite_interaction = self.learn_composite_interaction(
            self._previous_interaction, self._last_interaction.get_decision(), self._last_interaction)
        # self._last_composite_interaction = self.learn_composite_interaction(
        #     self._previous_interaction, self._decision, self._last_interaction)

        # Second level of composite interactions 
        self.learn_composite_interaction(self._previous_composite_interaction, self._last_interaction.get_decision(),
                                         self._last_interaction)
        # self.learn_composite_interaction(
        #     self._previous_composite_interaction, self._decision, self._last_interaction)

        if self._last_composite_interaction is not None:
            # Possibly create a new composite decision
            decision = f"{self._last_composite_interaction.pre_interaction.key()}{self._last_composite_interaction.post_interaction.get_decision()}"
            self.learn_composite_interaction(self._penultimate_interaction, decision, self._last_composite_interaction)
            # self.learn_composite_interaction(self._penultimate_interaction, self._decision, self._last_composite_interaction)
            self.learn_composite_interaction(self._penultimate_composite_interaction, decision, self._last_composite_interaction)

    def learn_composite_interaction(self, pre_interaction, decision, post_interaction):
        """Record or reinforce the composite interaction made of (pre_interaction, decision, post_interaction)"""
        if pre_interaction is None:
            return None
        else:
            # The pre_interaction exists
            composite_interaction = CompositeInteraction(pre_interaction, decision, post_interaction)
            if composite_interaction.key() not in self._composite_interactions:
                # Add the composite interaction to memory
                self._composite_interactions[composite_interaction.key()] = composite_interaction
                print(f"Learning {composite_interaction}")
                return composite_interaction
            else:
                # Reinforce the existing composite interaction and return it
                self._composite_interactions[composite_interaction.key()].reinforce()
                print(f"Reinforcing {self._composite_interactions[composite_interaction.key()]}")
                return self._composite_interactions[composite_interaction.key()]

    def create_proposed_df(self):
        """Create the proposed dataframe from the activated interactions"""
        # The list of activated interactions that match the current context
        activated_keys = [composite_interaction.key() for composite_interaction in
                          self._composite_interactions.values()
                          if composite_interaction.pre_interaction == self._last_interaction or
                          composite_interaction.pre_interaction == self._last_composite_interaction]
        data = {'activated': activated_keys,
                'weight': [self._composite_interactions[k].weight for k in activated_keys],
                # 'action': [self._composite_interactions[k].post_interaction.get_primitive_action() for k in
                #            activated_keys],
                'post_interaction': [self._composite_interactions[k].post_interaction.key() for k in activated_keys],
                'valence': [self._composite_interactions[k].post_interaction.get_valence() for k in activated_keys],
                'decision': [self._composite_interactions[k].get_decision() for k in activated_keys],
                'primitive': [self._composite_interactions[k].post_interaction.pre_key() for k in activated_keys]
                }
        activated_df = pd.DataFrame(data)

        # Create the selection dataframe from the primitive and the activated dataframes
        self.proposed_df = pd.concat([self.primitive_df, activated_df], ignore_index=True)

        # Compute the proclivity for each proposition
        self.proposed_df['proclivity'] = self.proposed_df['weight'] * self.proposed_df['valence']

    def aggregate_propositions(self):
        """Aggregate the proclivity"""
        # Compute the proclivity for each action
        grouped_df = self.proposed_df.groupby('decision').agg({'proclivity': 'sum'}).reset_index()
        self.proposed_df = self.proposed_df.merge(grouped_df, on='decision', suffixes=('', '_agg'))
        # Sort by descending order of proclivity
        self.proposed_df = self.proposed_df.sort_values(by=['proclivity_agg', 'decision'], ascending=[False, False])

        # Find the most probable primitive interaction for each action
        max_weight_df = self.proposed_df.loc[
            self.proposed_df.groupby('decision')['weight'].idxmax(), ['decision', 'primitive']].reset_index(
            drop=True)
        max_weight_df.columns = ['decision', 'intended']
        self.proposed_df = self.proposed_df.merge(max_weight_df, on='decision')

    def decide(self):
        """Selects the intended interaction from the proposed dataframe"""
        # Find the first row that has the highest proclivity
        max_index = self.proposed_df['proclivity_agg'].idxmax()
        self._decision = self.proposed_df.loc[max_index, ['decision']].values[0]

        # Find the intended interaction corresponding to the action that has the highest proclivity
        intended_interaction_key = self.proposed_df.loc[max_index, ['intended']].values[0]
        self._intended_interaction = self._interactions[intended_interaction_key]
        print(f"Decision {self._decision}, Intended {self._intended_interaction}")
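The selection performed by `aggregate_propositions()` and `decide()` can be summarized without pandas. This sketch assumes each proposition is a dict with the same fields as the `decision`, `weight`, `valence` and `primitive` columns. Ties on aggregated proclivity go to the larger decision string, which is what sorting by `['proclivity_agg', 'decision']` in descending order and taking the first row amounts to; weight ties here go to the first proposition in the list. The function name `select` is hypothetical.

```python
from collections import defaultdict


def select(propositions):
    """Return (decision, intended primitive key) from a list of proposition dicts."""
    # Aggregate proclivity = weight * valence, summed per decision
    proclivity = defaultdict(int)
    for p in propositions:
        proclivity[p["decision"]] += p["weight"] * p["valence"]
    # Highest aggregated proclivity; ties go to the larger decision string
    decision = max(proclivity, key=lambda d: (proclivity[d], d))
    # Intended primitive: the most heavily weighted proposition for that decision
    candidates = [p for p in propositions if p["decision"] == decision]
    intended = max(candidates, key=lambda p: p["weight"])["primitive"]
    return decision, intended


# The two default propositions of a fresh agent (weight 0, valence -1)
defaults = [
    {"decision": "a0", "weight": 0, "valence": -1, "primitive": "00"},
    {"decision": "a1", "weight": 0, "valence": -1, "primitive": "10"},
]
print(select(defaults))  # ('a1', '10')
```

This matches the first decision printed in the Environment 6 run (`Decision a1, Intended 10:-1`).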

# PRELIMINARY EXERCISE

## Test the agent in Environment 6

In [1078]:
import matplotlib.pyplot as plt
from ipywidgets import Output
from IPython.display import display

class Environment6:
    """ The grid """
    def __init__(self):
        """ Initialize the grid """
        self.grid = np.array([[1, 0, 0, 1]])
        self.position = 1

    def outcome(self, _action):
        """Take the action and generate the next outcome """
        if _action == 0:
            # Move left
            if self.position > 1:
                # No bump
                self.position -= 1
                self.grid[0, -1] = 1
                _outcome = 0
            elif self.grid[0, 0] == 1:
                # First bump
                _outcome = 1
                self.grid[0, 0] = 2
            else:
                # Subsequent bumps
                _outcome = 0
        else:
            # Move right
            if self.position < self.grid.shape[1] - 2:
                # No bump
                self.position += 1
                self.grid[0, 0] = 1
                _outcome = 0
            elif self.grid[0, -1] == 1:
                # First bump
                _outcome = 1
                self.grid[0, -1] = 2
            else:
                # Subsequent bumps
                _outcome = 0
        return _outcome
        
    def display(self):
        """Display the grid"""
        out.clear_output(wait=True)
        with out:
            fig, ax = plt.subplots()
            # Hide the ticks
            ax.set_xticks([])
            ax.set_yticks([])
            # Display the grid
            ax.imshow(self.grid, cmap='Greens', vmin=0, vmax=2)
            plt.scatter(self.position, 0, s=1000)
            plt.show()

In [1079]:
# Instantiate a new agent
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent(interactions)
e = Environment6()

# Output widget for displaying the plot
out = Output()

# Run the interaction loop
step = 0
outcome = 0

Run the simulation step by step to see the environment and the proposed DataFrame. Use `Ctrl+Enter` to run the cell below and stay on it.

In [1080]:
print(f"Step {step}")
step += 1
e.display()
display(out)
action = a.action(outcome)
outcome = e.outcome(action)
a.proposed_df

Step 0



Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1
Decision a1, Intended 10:-1


| | activated | weight | post_interaction | valence | decision | proclivity | primitive | proclivity_agg | intended |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | 0.0 | 10 | -1.0 | a1 | -0.0 | 10 | 0.0 | 10 |
| 1 | | 0.0 | 0 | -1.0 | a0 | -0.0 | 0 | 0.0 | 0 |


## Let's Create Environment7

Environment7 is similar to Environment6 but one cell wider.

In [226]:
class Environment7(Environment6):
    def __init__(self):
        """ Initialize the grid """
        self.grid = np.array([[1, 0, 0, 0, 1]])
        self.position = 1 

In [293]:
# Instantiate a new agent
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent(interactions)
e = Environment7()

# Output widget for displaying the plot
out = Output()

# Run the interaction loop
step = 0
outcome = 0

In [443]:
print(f"Step {step}")
step += 1
e.display()
display(out)
action = a.action(outcome)
outcome = e.outcome(action)
a.proposed_df

Step 149



Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1
Reinforcing (00:-1, a0, 00:-1: 34)
Reinforcing (00:-1, 01a1, 00:-1: 18)
Reinforcing ((00:-1, 01a1, 00:-1: 18), a0, 00:-1: 9)
Reinforcing ((00:-1, 01a1, 00:-1: 18), 01a1, 00:-1: 6)
Reinforcing (00:-1, 00a0, (00:-1, 01a1, 00:-1: 18): 8)
Learning ((01:1, 00a1, 00:-1: 1), 00a0, (00:-1, 01a1, 00:-1: 18): 1)
Decision 01a1, Intended 00:-1


| | activated | weight | action | interaction | valence | decision | proclivity | proclivity_agg | intended |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (00,01a1,(01,a1,10)) | 4 | 0 | 1 | 0 | 01a1 | 0 | -10 | 0 |
| 1 | (00,01a1,00) | 18 | 0 | 0 | -1 | 01a1 | -18 | -10 | 0 |
| 2 | ((00,01a1,00),01a1,00) | 6 | 0 | 0 | -1 | 01a1 | -6 | -10 | 0 |
| 3 | (00,01a1,01) | 9 | 0 | 1 | 1 | 01a1 | 9 | -10 | 0 |
| 4 | ((00,01a1,00),01a1,01) | 5 | 0 | 1 | 1 | 01a1 | 5 | -10 | 0 |
| 5 | (00,01a1,(01,10a0,10)) | 3 | 0 | 1 | 0 | 01a1 | 0 | -10 | 0 |
| 6 | ((00,01a1,00),01a1,(01,10a0,10)) | 2 | 0 | 1 | 0 | 01a1 | 0 | -10 | 0 |
| 7 | (00,01a0,(01,a0,00)) | 5 | 0 | 1 | 0 | 01a0 | 0 | -11 | 0 |
| 8 | ((00,01a1,00),01a0,(01,a0,00)) | 3 | 0 | 1 | 0 | 01a0 | 0 | -11 | 0 |
| 9 | (00,01a0,00) | 10 | 0 | 0 | -1 | 01a0 | -10 | -11 | 0 |


# Assignment 

# Create Agent 8

Modify the `learn()` method to record composite interactions from previous composite decisions.

It must keep track of decisions that the agent failed to enact and take them into account when making the next decision.

When the post_interaction is a primitive interaction, the decision is always this post_interaction's action. When the post_interaction is a composite interaction, however, the decision may be a sequence.

In [1081]:
class Agent8(Agent):
    def learn(self):
        """Learn the composite interactions"""
        # First level of composite interactions using the last primitive decision
        # self._last_composite_interaction = self.learn_composite_interaction(
        #     self._previous_interaction, self._last_interaction.get_decision(), self._last_interaction)
        self._last_composite_interaction = self.learn_composite_interaction(
            self._previous_interaction, self._decision, self._last_interaction)

        # Second level of composite interactions 
        # self.learn_composite_interaction(self._previous_composite_interaction, self._last_interaction.get_decision(),
        #                                  self._last_interaction)
        self.learn_composite_interaction(
            self._previous_composite_interaction, self._decision, self._last_interaction)

        if self._last_composite_interaction is not None:
            # Possibly create a new composite decision
            decision = f"{self._last_composite_interaction.pre_interaction.key()}{self._last_composite_interaction.post_interaction.get_decision()}"
            self.learn_composite_interaction(self._penultimate_interaction, decision, self._last_composite_interaction)
            # self.learn_composite_interaction(self._penultimate_interaction, self._decision, self._last_composite_interaction)
            self.learn_composite_interaction(self._penultimate_composite_interaction, decision, self._last_composite_interaction)

    def decide(self):
        """Selects the intended interaction from the proposed dataframe"""

        # Remove the proposed rows whose composite post_interaction has a low weight (<= 4)
        to_remove = set()
        for i, k in self.proposed_df["post_interaction"].items():
            if k in self._composite_interactions and self._composite_interactions[k].weight <= 4:
                to_remove.add(i)
        self.proposed_df = self.proposed_df.drop(index=to_remove)

        # Find the first row that has the highest proclivity
        max_index = self.proposed_df['proclivity_agg'].idxmax()
        self._decision = self.proposed_df.loc[max_index, ['decision']].values[0]

        # Find the intended interaction corresponding to the action that has the highest proclivity
        intended_interaction_key = self.proposed_df.loc[max_index, ['intended']].values[0]
        self._intended_interaction = self._interactions[intended_interaction_key]
        print(f"Decision {self._decision}, Intended {self._intended_interaction}")
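The pruning step at the start of `Agent8.decide()` can be read as the following filter: a proposition is dropped when its post-interaction is a composite interaction whose weight is 4 or less, so a composite decision is only considered once its sequence has been recorded more than four times. Primitive post-interactions are never keys of the composite dictionary, so they are always kept. As in the notebook, the filter runs after aggregation, so the `proclivity_agg` of the remaining rows is not recomputed. The function `prune` and its arguments are hypothetical names.

```python
def prune(propositions, composite_weights, threshold=4):
    """Keep propositions whose post_interaction is primitive or a composite with weight > threshold.

    composite_weights: hypothetical dict mapping composite keys to their weights.
    """
    return [
        p for p in propositions
        if not (p["post_interaction"] in composite_weights
                and composite_weights[p["post_interaction"]] <= threshold)
    ]


props = [{"post_interaction": "00"}, {"post_interaction": "(10,a1,10)"}]
print(len(prune(props, {"(10,a1,10)": 3})))  # 1: the weak composite is dropped
print(len(prune(props, {"(10,a1,10)": 5})))  # 2: both are kept
```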

## Test Agent8 in Environment7

In [1082]:
# Instantiate a new agent
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent8(interactions)
e = Environment7()

# Output widget for displaying the plot
out = Output()

# Run the interaction loop
step = 0
outcome = 0

In [1124]:
print(f"Step {step}")
step += 1
e.display()
display(out)
action = a.action(outcome)
outcome = e.outcome(action)
a.proposed_df

Step 41



Action: 0, Prediction: 0, Outcome: 1, Prediction_correct: False, Valence: 1
Reinforcing (00:-1, a0, 01:1: 4)
Reinforcing ((00:-1, a0, 00:-1: 7), a0, 01:1: 3)
Reinforcing (00:-1, 00a0, (00:-1, a0, 01:1: 4): 3)
Learning ((10:-1, 00a0, 00:-1: 1), 00a0, (00:-1, a0, 01:1: 4): 1)
Decision a0, Intended 00:-1


| | activated | weight | post_interaction | valence | decision | proclivity | primitive | proclivity_agg | intended |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | 0 | 00 | -1 | a0 | 0 | 0 | -2 | 0 |
| 1 | (01,a0,00) | 1 | 00 | -1 | a0 | -1 | 0 | -2 | 0 |
| 2 | ((00,a0,01),a0,00) | 1 | 00 | -1 | a0 | -1 | 0 | -2 | 0 |
| 3 | | 0 | 10 | -1 | a1 | 0 | 10 | -4 | 10 |
| 4 | (01,a1,10) | 2 | 10 | -1 | a1 | -2 | 10 | -4 | 10 |
| 5 | ((00,a0,01),a1,10) | 2 | 10 | -1 | a1 | -2 | 10 | -4 | 10 |
| 6 | (01,10a1,(10,a1,10)) | 1 | (10,a1,10) | -2 | 10a1 | -2 | 10 | -4 | 10 |
| 7 | ((00,a0,01),10a1,(10,a1,10)) | 1 | (10,a1,10) | -2 | 10a1 | -2 | 10 | -4 | 10 |
| 8 | (01,10a0,(10,a0,00)) | 1 | (10,a0,00) | -2 | 10a0 | -2 | 10 | -4 | 10 |
| 9 | ((00,a0,01),10a0,(10,a0,00)) | 1 | (10,a0,00) | -2 | 10a0 | -2 | 10 | -4 | 10 |


## Environment8

In [682]:
import matplotlib.pyplot as plt
from ipywidgets import Output
from IPython.display import display

class Environment8:
    """ The grid """
    def __init__(self):
        """ Initialize the grid and the agent's pose """
        self.grid = np.array([[1, 0, 0, 1]])
        self.position = 1
        self.direction = 0

    def outcome(self, action):
        """Take the action and generate the next outcome """
        if action == 0:
            # Move forward
            if self.direction == 0:
                # Move to the left
                if self.position > 1:
                    # No bump
                    self.position -= 1
                    self.grid[0, 3] = 1
                    outcome = 0
                elif self.grid[0, 0] == 1:
                    # First bump
                    outcome = 1
                    self.grid[0, 0] = 2
                else:
                    # Subsequent bumps
                    outcome = 0
            else:
                # Move to the right
                if self.position < 2:
                    # No bump
                    self.position += 1
                    self.grid[0, 0] = 1
                    outcome = 0
                elif self.grid[0, 3] == 1:
                    # First bump
                    outcome = 1
                    self.grid[0, 3] = 2
                else:
                    # Subsequent bumps
                    outcome = 0
        else:
            # Turn 180°
            outcome = 0
            if self.direction == 0:
                self.direction = 1
            else:
                self.direction = 0
        return outcome  

    def display(self):
        """Display the grid"""
        out.clear_output(wait=True)
        with out:
            fig, ax = plt.subplots()
            # Hide the ticks
            ax.set_xticks([])
            ax.set_yticks([])
            # Display the grid
            ax.imshow(self.grid, cmap='Greens', vmin=0, vmax=2)
            if self.direction == 0:
                # Display agent to the left
                plt.scatter(self.position, 0, s=1000, marker='<')
            else:
                # Display agent to the right
                plt.scatter(self.position, 0, s=1000, marker='>')
            plt.show()

## Run the agent in Environment8

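In Environment8, the agent has two possible actions: move forward (`0`) or turn 180° (`1`).
Like Environment6, Environment8 returns outcome 1 only the first time the agent bumps into a wall; further bumps into the same wall return 0 until the agent has moved away from it.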

In [683]:
# Instantiate the agent in Environment8
interactions = [
    Interaction(0,0,-1),
    Interaction(0,1,1),
    Interaction(1,0,-1),
    Interaction(1,1,1)
]
a = Agent8(interactions)
e = Environment8()

# Output widget for displaying the plot
out = Output()

# Run the interaction loop
step = 0
outcome = 0

In [684]:
print(f"Step {step}")
step += 1
e.display()
display(out)
action = a.action(outcome)
outcome = e.outcome(action)
a.proposed_df

Step 0



Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1
Decision a1, Intended 10:-1


| | activated | weight | action | interaction | valence | decision | proclivity | proclivity_agg | intended |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | 0.0 | 1.0 | 10 | -1.0 | a1 | -0.0 | 0.0 | 10 |
| 1 | | 0.0 | 0.0 | 0 | -1.0 | a0 | -0.0 | 0.0 | 0 |


Observe that the agent gets stuck and keeps bumping into the right wall from Step 11 on.

This is because the proposed decision 01a0 is never counterbalanced by the fact that its first action `0` returns outcome `0`.
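A worked sketch of why the failure goes unnoticed, with made-up weights and valences. The base `Agent.learn()` records the first-level composite under the primitive decision key (`self._last_interaction.get_decision()`, e.g. `a0`), whereas Agent8 records it under the decision actually made (`self._decision`, e.g. `01a0`). Only in the second case does a failed attempt lower the proclivity of `01a0`. The record layout `(decision, valence, weight)` and the helper `proclivity_of` are hypothetical.

```python
def proclivity_of(decision, records):
    """Sum weight * valence over the (decision, valence, weight) records of one decision."""
    return sum(w * v for d, v, w in records if d == decision)


# Hypothetical records in one context: 01a0 was fully enacted twice with valence 2
success = ("01a0", 2, 2)
# The agent intended 01 five times but enacted 00 instead (valence -1)
base_records = [success, ("a0", -1, 5)]      # Agent: failure stored under action key 'a0'
agent8_records = [success, ("01a0", -1, 5)]  # Agent8: failure stored under decision '01a0'

print(proclivity_of("01a0", base_records))    # 4
print(proclivity_of("01a0", agent8_records))  # -1
```

With the base agent, the failures only penalize `a0`, so `01a0` keeps looking attractive.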

Let's implement Agent8, an agent that can reverse its decision!

# ASSIGNMENT

We will create Agent8, which keeps track of decisions that it failed to enact and takes them into account when making the next decision.

We modify the definition of composite interaction to become a triple (pre_interaction, decision, post_interaction).

When the post_interaction is a primitive interaction, the decision is always this post_interaction's action.
When the post_interaction is a composite interaction, however, the decision may be a sequence.

In [571]:
class Agent8(Agent):
    # *** Modify to aggregate by decision rather than by action***
    pass
