# AGENT1: the agent that simulated

This notebook presents an agent that makes its decisions based on an internal simulation of its actions.

It is the first Sartrean agent, with a binary memory: being or nothingness :-)

# Environment

* The `set` action sets the environment's state to `1` and returns outcome `0`
* The `reset` action sets the environment's state to `0` and returns outcome `0`
* The `feel` action returns an outcome equal to the environment's state

The environment provides a `clone()` method that returns an independent copy in the same state.

In [55]:
ACTION_SET = 0
ACTION_RESET = 1
ACTION_FEEL = 2
OUTCOME_EMPTY = 0
OUTCOME_FULL = 1

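class Environment1:
    """Environment with a single binary state."""
    def __init__(self, state=0):
        self.state = state

    def outcome(self, action):
        """Apply the action and return its outcome."""
        if action == ACTION_FEEL:
            # feel reports the current state without changing it
            outcome = self.state
        else:
            # set and reset change the state and always return outcome 0
            if action == ACTION_SET:
                self.state = 1
            else:
                self.state = 0
            outcome = OUTCOME_EMPTY
        return outcome

    def clone(self):
        """Return an independent copy in the same state."""
        return Environment1(self.state)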
        

# The agent

In [56]:
class Interaction:
    """An interaction is a tuple (action, outcome) with a valence"""
    def __init__(self, action, outcome, valence):
        self.action = action
        self.outcome = outcome
        self.valence = valence

    def key(self):
        """ The key to find this interaction in the dictionary is the string '<action><outcome>'. """
        return f"{self.action}{self.outcome}"

    def __str__(self):
        """ Print interaction in the form '<action><outcome>:<valence>' for debug."""
        return f"{self.action}{self.outcome}:{self.valence}"

    def __eq__(self, other):
        """ Interactions are equal if they have the same key """
        if not isinstance(other, Interaction):
            return NotImplemented
        return self.key() == other.key()

    def __hash__(self):
        """ Hash on the key, consistent with __eq__ """
        return hash(self.key())
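
As a minimal sketch of how this key is used for lookup, the dictionary below maps `'<action><outcome>'` strings to valences. It uses plain tuples instead of `Interaction` objects, and the names `triples`, `valence_by_key` and `lookup` are illustrative, not part of the notebook.

```python
# Illustrative stand-in: plain tuples instead of Interaction objects.
ACTION_FEEL = 2
OUTCOME_EMPTY, OUTCOME_FULL = 0, 1

# (action, outcome, valence), with the same values as the notebook's feel interactions
triples = [(ACTION_FEEL, OUTCOME_EMPTY, -1), (ACTION_FEEL, OUTCOME_FULL, 1)]

# Same key scheme as Interaction.key(): the string '<action><outcome>'
valence_by_key = {f"{action}{outcome}": valence for action, outcome, valence in triples}

def lookup(action, outcome):
    """Return the valence of the interaction (action, outcome)."""
    return valence_by_key[f"{action}{outcome}"]

print(lookup(ACTION_FEEL, OUTCOME_FULL))  # prints: 1
```

The concatenated key is unambiguous here because every action and outcome is a single digit; with multi-digit codes a separator would be needed.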

The agent is instantiated with an internal simulator that is passed to it as an argument.

In [57]:
import pandas as pd

class Agent:
    """Creating our agent"""
    def __init__(self, _interactions, simulator):
        """ Initialize the dictionary of interactions"""
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._intended_interaction = self._interactions["00"]
        self.memory = 0
        self.action_df = pd.DataFrame({"action": [i.action for i in _interactions if i.outcome == OUTCOME_EMPTY]})
        self.simulator = simulator

    def select_action(self):
        """Select the next action"""
        # Roll the actions
        self.action_df = pd.concat([self.action_df.tail(1), self.action_df.head(len(self.action_df) - 1)], ignore_index=True)
        # self.action_df = pd.concat([self.action_df.iloc[1:], self.action_df.iloc[[0]]], ignore_index=True)
        # Select the first action in the action_df
        return self.action_df.loc[0, "action"]

    def action(self, _outcome):
        """Trace the previous cycle and return the next action to enact"""
        # Trace the previous cycle
        previous_interaction = self._interactions[f"{self._intended_interaction.action}{_outcome}"]
        print(f"Action: {self._intended_interaction.action}, Prediction: {self._intended_interaction.outcome}, Outcome: {_outcome}, "
              f"Prediction: {self._intended_interaction.outcome == _outcome}, Valence: {previous_interaction.valence})")

        # Compute the next interaction to try to enact
        # Select the next action
        intended_action = self.select_action()
        
        # Predict the outcome based on simulation
        intended_outcome = self.simulator.outcome(intended_action)
        # Memorize the intended interaction
        self._intended_interaction = self._interactions[f"{intended_action}{intended_outcome}"]
        return intended_action
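
The rotation in `select_action` moves the last row of `action_df` to the front, so the base agent cycles through its actions. A minimal sketch on a stand-alone DataFrame, where the variable `selected` is illustrative:

```python
import pandas as pd

# Actions in their initial order: set (0), reset (1), feel (2)
action_df = pd.DataFrame({"action": [0, 1, 2]})

selected = []
for _ in range(4):
    # Move the last row to the front, as select_action does
    action_df = pd.concat(
        [action_df.tail(1), action_df.head(len(action_df) - 1)],
        ignore_index=True,
    )
    # The first row is the action that would be selected
    selected.append(int(action_df.loc[0, "action"]))

print(selected)  # prints: [2, 1, 0, 2]
```

This is the same feel, reset, set cycle that appears in the trace of the run, after the initial action `0`.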

# Run the simulation

We use an instance of the environment itself as the simulator passed to the agent.

In [58]:
interactions = [
    Interaction(ACTION_SET,OUTCOME_EMPTY,-1),
    Interaction(ACTION_SET,OUTCOME_FULL,1),
    Interaction(ACTION_RESET,OUTCOME_EMPTY,-1),
    Interaction(ACTION_RESET,OUTCOME_FULL,1),
    Interaction(ACTION_FEEL,OUTCOME_EMPTY,-1),
    Interaction(ACTION_FEEL,OUTCOME_FULL,1),
]

simulator = Environment1()
a = Agent(interactions, simulator)
e = Environment1()

outcome = 0
for i in range(10):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
Action: 2, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
Action: 1, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
Action: 2, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1)
Action: 1, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
Action: 2, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1)
Action: 1, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)


# Agent1

Let us implement Agent1 so that it chooses the action leading to the interaction with the highest valence.

Note that each action must be simulated in a separate clone of the simulator, because simulating an action can change the simulator's state.
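
To illustrate why a fresh clone is needed for each trial, here is a pandas-free sketch. `ToyEnvironment` is a hypothetical compact restatement of `Environment1`, and `VALENCE`, `predict_shared` and `predict_cloned` are illustrative names that do not appear in the notebook.

```python
SET, RESET, FEEL = 0, 1, 2

class ToyEnvironment:
    """Hypothetical compact restatement of Environment1."""
    def __init__(self, state=0):
        self.state = state

    def outcome(self, action):
        if action == FEEL:
            return self.state                   # feel reports the state
        self.state = 1 if action == SET else 0  # set and reset change it
        return 0

    def clone(self):
        return ToyEnvironment(self.state)

# Valence of each (action, outcome) pair: +1 for outcome 1, -1 for outcome 0
VALENCE = {(a, o): (1 if o == 1 else -1) for a in (SET, RESET, FEEL) for o in (0, 1)}

def predict_shared(simulator, actions):
    """Flawed: every trial runs on the same simulator, so trials leak state."""
    return {a: simulator.outcome(a) for a in actions}

def predict_cloned(simulator, actions):
    """Each trial runs on a fresh clone; the simulator itself is untouched."""
    return {a: simulator.clone().outcome(a) for a in actions}

actions = [SET, FEEL, RESET]
shared = predict_shared(ToyEnvironment(0), actions)
cloned = predict_cloned(ToyEnvironment(0), actions)
print(shared[FEEL], cloned[FEEL])  # prints: 1 0

# Greedy choice: the action whose predicted interaction has the highest valence
simulator = ToyEnvironment(1)
predictions = predict_cloned(simulator, actions)
best = max(actions, key=lambda a: VALENCE[(a, predictions[a])])
print(best)  # prints: 2
```

With the shared simulator, the trial of `set` runs first and makes the later trial of `feel` predict `1`, although the simulated state was `0` when the trials started.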

In [61]:
class Agent1(Agent):
    def select_action(self):
        """Select the action that yields the highest valence"""
        # Roll the actions to try different actions when all outcomes are equal
        self.action_df = pd.concat([self.action_df.tail(1), self.action_df.head(len(self.action_df) - 1)], ignore_index=True)
        # self.action_df = pd.concat([self.action_df.iloc[1:], self.action_df.iloc[[0]]], ignore_index=True)
        # Try every action in a clone of the simulator
        self.action_df["outcome"] = self.action_df.apply(lambda row: self.simulator.clone().outcome(row["action"]), axis=1)
        # Record the expected valence for each resulting interaction
        # (single quotes inside the f-string keep this line valid before Python 3.12)
        self.action_df["valence"] = self.action_df.apply(lambda row: self._interactions[f"{row['action']}{row['outcome']}"].valence, axis=1)
        # Sort by descending valence
        self.action_df = self.action_df.sort_values(by=['valence'], ascending=[False]).reset_index(drop=True)
        print(self.action_df)
        # Return the action that yields the highest valence
        return self.action_df.loc[0, "action"]
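
The roll only varies the chosen action if the sort keeps tied rows in their rolled order. `sort_values` defaults to `kind="quicksort"`, which pandas does not document as stable, so the sketch below passes `kind="stable"` to make that assumption explicit. The DataFrames `tied` and `mixed` are illustrative data, not output from the notebook.

```python
import pandas as pd

# Rolled order with all valences tied: the rolled order should survive the sort
tied = pd.DataFrame({"action": [2, 0, 1], "valence": [-1, -1, -1]})
tied_sorted = tied.sort_values(by="valence", ascending=False, kind="stable").reset_index(drop=True)
print(tied_sorted["action"].tolist())  # prints: [2, 0, 1]

# When one action has a higher valence, it moves to the front
mixed = pd.DataFrame({"action": [1, 2, 0], "valence": [-1, 1, -1]})
mixed_sorted = mixed.sort_values(by="valence", ascending=False, kind="stable").reset_index(drop=True)
print(mixed_sorted["action"].tolist())  # prints: [2, 1, 0]
```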


## Testing Agent1

In [62]:
simulator = Environment1()
a = Agent1(interactions, simulator)
e = Environment1()

outcome = 0
for i in range(10):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
   action  outcome  valence
0       2        0       -1
1       0        0       -1
2       1        0       -1
Action: 2, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
   action  outcome  valence
0       1        0       -1
1       2        0       -1
2       0        0       -1
Action: 1, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
   action  outcome  valence
0       0        0       -1
1       1        0       -1
2       2        0       -1
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
   action  outcome  valence
0       2        1        1
1       0        0       -1
2       1        0       -1
Action: 2, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1)
   action  outcome  valence
0       2        1        1
1       1        0       -1
2       0        0       -1
Action: 2, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1)
   action  outcome  valence


Note that if the simulator is initialized in a different state from the environment, it synchronizes by itself as soon as a `set` or `reset` action is enacted.
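
The synchronization can be sketched in isolation by applying the same actions to a simulator and an environment that start in different states. `ToyEnvironment` is again a hypothetical compact restatement of `Environment1`, and `trace` is an illustrative name.

```python
SET, RESET, FEEL = 0, 1, 2

class ToyEnvironment:
    """Hypothetical compact restatement of Environment1."""
    def __init__(self, state=0):
        self.state = state

    def outcome(self, action):
        if action == FEEL:
            return self.state                   # feel reports the state
        self.state = 1 if action == SET else 0  # set and reset overwrite it
        return 0

environment = ToyEnvironment(1)  # the real environment starts full
simulator = ToyEnvironment(0)    # the internal model wrongly starts empty

trace = []
for action in (FEEL, RESET, FEEL):
    predicted = simulator.outcome(action)  # what the agent expects
    actual = environment.outcome(action)   # what really happens
    trace.append((action, predicted, actual))

print(trace)  # prints: [(2, 0, 1), (1, 0, 0), (2, 0, 0)]
```

The first `feel` is mispredicted, and `feel` alone never repairs the mismatch because it changes neither state. Only `set` or `reset` overwrites both states with the same value.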

In [68]:
simulator = Environment1()
a = Agent1(interactions, simulator)
e = Environment1(1)

outcome = 0
for i in range(10):
    print("Step", i)
    action = a.action(outcome)
    outcome = e.outcome(action)

Step 0
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
   action  outcome  valence
0       2        0       -1
1       0        0       -1
2       1        0       -1
Step 1
Action: 2, Prediction: 0, Outcome: 1, Prediction: False, Valence: 1)
   action  outcome  valence
0       1        0       -1
1       2        0       -1
2       0        0       -1
Step 2
Action: 1, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
   action  outcome  valence
0       0        0       -1
1       1        0       -1
2       2        0       -1
Step 3
Action: 0, Prediction: 0, Outcome: 0, Prediction: True, Valence: -1)
   action  outcome  valence
0       2        1        1
1       0        0       -1
2       1        0       -1
Step 4
Action: 2, Prediction: 1, Outcome: 1, Prediction: True, Valence: 1)
   action  outcome  valence
0       2        1        1
1       1        0       -1
2       0        0       -1
Step 5
Action: 2, Prediction: 1, Outcome: 1, Prediction: True