[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PetiteIA/schema_mechanism/blob/master/notebooks/agent5-DNN.ipynb)

# USING AN LSTM TO ESTIMATE THE EXPECTED VALENCE OF EACH ACTION

This notebook presents our third agent equipped with a DNN.
We train an LSTM at every interaction cycle on all the memorized sequences.

## La classe Interaction

We create an integer token for each interaction: `key = action * BASE_ACTION + outcome`.
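
As an illustration, the sketch below enumerates the tokens this formula produces. It assumes the action codes 0 and 2 and the outcomes 0 and 1 that are defined further down in this notebook; the names `interaction_key`, `ACTIONS` and `OUTCOMES` are hypothetical helpers introduced only for this example.

```python
# Sketch: enumerate the integer token given by key = action * BASE_ACTION + outcome.
BASE_ACTION = 1      # same value as in the notebook
ACTIONS = [0, 2]     # assumed action codes (ACTION1 and ACTION2 below)
OUTCOMES = [0, 1]    # assumed outcome codes (OUTCOME1 and OUTCOME2 below)

def interaction_key(action, outcome):
    """Return the integer token of the interaction (action, outcome)."""
    return action * BASE_ACTION + outcome

# Map every (action, outcome) pair to its token
keys = {(a, o): interaction_key(a, o) for a in ACTIONS for o in OUTCOMES}
print(keys)
```

With these codes the four interactions map to four distinct tokens, 0 through 3, which is consistent with the vocabulary size of 4 used by the model later on.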

In [1]:
BASE_ACTION = 1 
class Interaction:
    """An interaction is a tuple (action, outcome) with a valence"""
    def __init__(self, action, outcome, valence):
        self.action = action
        self.outcome = outcome
        self.valence = valence

    def key(self):
        """ The key to find this interaction in the dictionary. """
        return self.action * BASE_ACTION + self.outcome 
        # return f"{self.action}{self.outcome}"

    def __str__(self):
        """ Print interaction in the form '<action><outcome>:<valence>' for debug."""
        return f"{self.action}{self.outcome}:{self.valence}"

    def __eq__(self, other):
        """ Interactions are equal if they have the same key """
        return self.key() == other.key()

In [2]:
ACTION1 = 0
ACTION2 = 2
OUTCOME1 = 0
OUTCOME2 = 1

## Environment1 class

In [3]:
class Environment1:
    """ In Environment 1, action 0 yields outcome 0, action 2 yields outcome 1 """
    def outcome(self, _action):
        if _action == ACTION1:
            return OUTCOME1
        else:
            return OUTCOME2

## Environment2 class

In [4]:
class Environment2:
    """ In Environment 2, action 0 yields outcome 1, action 2 yields outcome 0 """
    def outcome(self, _action):
        if _action == ACTION1:
            return OUTCOME2
        else:
            return OUTCOME1

## Environment3 class

Environment 3 yields outcome 1 only when the agent alternates between actions 0 and 2, that is, when the current action differs from the previous one.
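
The alternation rule can be sketched as a plain function over a list of actions. This is only an illustration under the assumption that the previous action starts at 0, as in the class below; `alternation_outcomes` is a hypothetical name that does not appear elsewhere in the notebook.

```python
def alternation_outcomes(actions, initial_action=0):
    """Return the outcome of each action under the alternation rule.

    The outcome is 1 when the action differs from the previous one, else 0.
    """
    outcomes = []
    previous_action = initial_action
    for action in actions:
        outcomes.append(0 if action == previous_action else 1)
        previous_action = action
    return outcomes

# Repeating an action yields 0, switching yields 1
print(alternation_outcomes([0, 0, 2, 2, 0, 2]))  # [0, 0, 1, 0, 1, 1]
```

Because the outcome depends on the previous action, an agent has to take at least one past interaction into account to predict it, which is what the sequence model is meant to capture.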

In [5]:
class Environment3:
    """ Environment 3 yields outcome 1 only when the agent alternates actions 0 and 2 """
    def __init__(self):
        """ Initializing Environment3 """
        self.previous_action = 0

    def outcome(self, _action):
        if _action == self.previous_action:
            _outcome = OUTCOME1
        else:
            _outcome = OUTCOME2
        self.previous_action = _action
        return _outcome

## Environment4 class

Environment4 behaves like Environment1 during the first 10 cycles and then like Environment 2

In [6]:
class Environment4:
    """ Environment 4 behaves like Environment1 during the first 10 steps, then like Environment2 """
    def __init__(self):
        """ Initializing Environment4 """
        self.step = 0

    def outcome(self, _action):
        """Take the action and generate the next outcome """
        self.step += 1
        # Behave like Environment1 during the first 10 steps
        if self.step <= 10:
            if _action == ACTION1:
                return OUTCOME1
            else:
                return OUTCOME2            
        # Behave like Environment2 after the first 10 steps
        else: 
            if _action == ACTION1:
                return OUTCOME2
            else:
                return OUTCOME1            

## Initialize the interactions 

In [7]:
interactions = [
    Interaction(ACTION1,OUTCOME1,-1),
    Interaction(ACTION1,OUTCOME2,1),
    Interaction(ACTION2,OUTCOME1,-1),
    Interaction(ACTION2,OUTCOME2,1),
    # Interaction(4,0,-1),
    # Interaction(5,1,1)
]

# LSTM AGENT

Let us implement Agent3, which predicts the probability of the next tokens in a sequence.

## Creating the LSTM model

The model takes two inputs: previous_interaction and last_interaction.
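
To make the input format concrete, here is a minimal sketch of how training examples of this shape could be derived from a history of interaction tokens: each distinct run of three consecutive tokens gives a two-token context and a target. The function name `build_training_pairs` is hypothetical; the agent below stores the same information in a pandas DataFrame instead.

```python
def build_training_pairs(history):
    """Split a list of interaction tokens into (context, target) examples.

    Each distinct run of three consecutive tokens (i1, i2, i3) gives one
    example: the context [i1, i2] and the target i3.
    """
    inputs, targets = [], []
    seen = set()
    for i1, i2, i3 in zip(history, history[1:], history[2:]):
        if (i1, i2, i3) not in seen:   # keep each distinct triple only once
            seen.add((i1, i2, i3))
            inputs.append([i1, i2])
            targets.append(i3)
    return inputs, targets

x, y = build_training_pairs([0, 0, 3, 3, 3, 3])
print(x)  # [[0, 0], [0, 3], [3, 3]]
print(y)  # [3, 3, 3]
```

The first example, context `[0, 0]` with target `3`, is the one that appears in the first training call of the Environment1 run further down.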

In [20]:
import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self):
        super(LSTM, self).__init__()
        self.len_vocab = 4
        self.num_layers = 1
        self.hidden_size = 64

        embedding_dim = self.len_vocab 
        # Create an embedding layer to convert token indices to dense vectors
        self.embedding = nn.Embedding(self.len_vocab, embedding_dim )
        
        # Define the LSTM layer
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=self.hidden_size, num_layers=self.num_layers, batch_first=True) # , dropout=0.5)
        
        # Define the output fully connected layer
        self.fc_out = nn.Linear(self.hidden_size, self.len_vocab)

        self._optimizer = torch.optim.Adam(self.parameters(), lr=0.001, weight_decay=0.0001)
        self._loss_func = nn.CrossEntropyLoss()

    
    def forward(self, input_seq, hidden_in, mem_in):
        # Convert token indices to dense vectors
        input_embs = self.embedding(input_seq)

        # Pass the embeddings through the LSTM layer
        output, (hidden_out, mem_out) = self.lstm(input_embs, (hidden_in, mem_in))
                
        # Pass the LSTM output through the fully connected layer to get the final output
        return self.fc_out(output), hidden_out, mem_out

    def fit(self, inputs, targets):

        input_tensor = torch.tensor(inputs) # , dtype=torch.int)
        print("input tensor", input_tensor)
        labels = torch.tensor(targets)
        print("label tensor", labels)
        
        # Loop through each epoch
        for epoch in range(20):    
            # Set model to training mode
            self.train()
            train_acc = 0
    
            # Initialize hidden and memory states
            hidden = torch.zeros(self.num_layers, input_tensor.shape[0], self.hidden_size, device="cpu")
            memory = torch.zeros(self.num_layers, input_tensor.shape[0], self.hidden_size, device="cpu")
    
            # Forward pass through the model
            pred, hidden, memory = self(input_tensor, hidden, memory)

            # Calculate the loss
            loss = self._loss_func(pred[:, -1, :], labels)
        
            # Backpropagation and optimization
            self._optimizer.zero_grad()
            loss.backward()
            self._optimizer.step()
    
            # Append training loss to logger
            # training_loss_logger.append(loss.item())
    
            # Calculate training accuracy
            train_acc += (pred[:, -1, :].argmax(1) == labels).sum()
            print(f"acc : {train_acc/len(labels):.3f} = {train_acc}/{len(labels)} for epoch {epoch}")

    def predict(self, sequence):
        # Construct the context sequence
        sequence = torch.tensor(sequence, dtype=torch.int)

        h = torch.zeros(self.num_layers, sequence.shape[0], self.hidden_size, device="cpu")
        cell = torch.zeros(self.num_layers, sequence.shape[0], self.hidden_size, device="cpu")
        
        with torch.no_grad():  # No gradient computation in prediction mode
            logits, _, _ = self(sequence, h, cell)
        ## probabilities = nn.functional.softmax(logits[0, -1, :], dim=0).tolist()
        # Compute the probability of each outcome for each action
        pairwise_logits = logits[0, -1, :].reshape(-1, 2)
        probabilities = nn.functional.softmax(pairwise_logits, dim=1) # .flatten().tolist()
        # print("probabilities", probabilities)
        return probabilities
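
The `predict` method reshapes the four logits of the last time step into one row per action and applies a softmax along each row, so that the two outcome probabilities of a given action sum to 1. The sketch below reproduces that computation in plain Python, without torch, on made-up logits; `pairwise_softmax` is a hypothetical name used only here.

```python
import math

def pairwise_softmax(logits):
    """Apply a softmax to each consecutive pair of logits.

    The logits are ordered by token, so each pair holds the two outcomes
    of one action. Returns one [p(outcome 0), p(outcome 1)] row per action.
    """
    rows = []
    for i in range(0, len(logits), 2):
        pair = logits[i:i + 2]
        peak = max(pair)                           # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in pair]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

# Illustrative logits only, not values produced by the model
probs = pairwise_softmax([0.0, 0.0, 1.0, 0.0])
print(probs)
```

Normalizing per action rather than over all four tokens means the result reads as the probability of each outcome given that the action is chosen, which is what the expected valence computation needs.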
    

# Defining the agent

In [21]:
import torch.optim as optim
import pandas as pd

class Agent:
    """Creating our agent"""
    def __init__(self, _interactions):
        """ Initialize the dictionary of interactions"""
        # Initialize the neural network
        self._model = LSTM()
        
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._intended_interaction = list(self._interactions.values())[0]
        self._last_interaction = None
        self._previous_interaction = None
        self._penultimate_interaction = None
        # The dataframe used to memorize the sequences of interactions
        self.sequences_df = pd.DataFrame({
            'i1': pd.Series(dtype='int'),
            'i2': pd.Series(dtype='int'),
            'i3': pd.Series(dtype='int'),
            'action': pd.Series(dtype='int'),
            'valence': pd.Series(dtype='float'),
            'count': pd.Series(dtype='int'),
            'proclivity': pd.Series(dtype='int'),
        })
    
    def action(self, _outcome):
        """ Tracing the previous cycle """
        self._penultimate_interaction = self._previous_interaction 
        self._previous_interaction = self._last_interaction
        self._last_interaction = self._interactions[self._intended_interaction.action * BASE_ACTION + _outcome ]
        print(f"Action: {self._intended_interaction.action}, Prediction: {self._intended_interaction.outcome}, "
              f"Outcome: {_outcome}, Prediction_correct: {self._intended_interaction.outcome == _outcome}, "
              f"Valence: {self._last_interaction.valence})")

        """ Computing the next interaction to try to enact """
        if self._previous_interaction is not None and self._last_interaction is not None:
            if self._penultimate_interaction is not None:
                # Record or increment the last sequence
                condition = ((self.sequences_df['i1'] == self._penultimate_interaction.key()) & 
                            (self.sequences_df['i2'] == self._previous_interaction.key()) & 
                            (self.sequences_df['i3'] == self._last_interaction.key()))
                if self.sequences_df[condition].empty:
                    new_sequence = pd.DataFrame({
                        'i1': [self._penultimate_interaction.key()], 
                        'i2': [self._previous_interaction.key()], 
                        'i3': [self._last_interaction.key()], 
                        'action': [self._last_interaction.action], 
                        'valence': [self._last_interaction.valence],
                        'count': [1], 
                        'proclivity': [0]
                    })
                    self.sequences_df = pd.concat([self.sequences_df, new_sequence], ignore_index=True)
                else:
                    self.sequences_df.loc[condition, 'count'] += 1
                
                # Train the neural network on all the memorized sequences,
                # including the one recorded during the last interaction cycle
                x = self.sequences_df[['i1', 'i2']].values.tolist()
                y = self.sequences_df['i3'].tolist()
                # print("x", x)
                # print("y", y)
                self._model.fit(x, y)

            # Compute the proclivity from the occurrence count
            self.sequences_df['proclivity'] = self.sequences_df['valence'] * self.sequences_df['count']
            filtered_df = self.sequences_df[(self.sequences_df['i1'] == self._previous_interaction.key()) & (self.sequences_df['i2'] == self._last_interaction.key())]
            grouped_df = filtered_df.groupby('action').agg({'proclivity': 'sum'}).reset_index()
    
            # Predict the probabilities of the next interactions
            probabilities = self._model.predict([[self._previous_interaction.key(), self._last_interaction.key()]])
            # The dataframe used to find the best expected valence
            probability_df = pd.DataFrame({
                'interaction': [i.key() for i in self._interactions.values()],
                'action': [i.action for i in self._interactions.values()],
                'outcome': [i.outcome for i in self._interactions.values()],
                'valence': [i.valence for i in self._interactions.values()],
                'probability': probabilities.flatten().tolist()})
            probability_df['expected_valence'] = probability_df['valence'] * probability_df['probability']
            print(probability_df)
            # Aggregate by action
            grouped_probability_df = probability_df.groupby('action').agg({'expected_valence': 'sum'}).reset_index()
    
            # Merge the proclivity dataframe with the expected valence dataframe
            merged_df = pd.merge(grouped_df, grouped_probability_df, on='action', how='outer')
            # merged_df = merged_df.sort_values(by=['proclivity'], ascending=[False]).reset_index(drop=True)
            merged_df = merged_df.sort_values(by=['expected_valence'], ascending=[False]).reset_index(drop=True)
            print(merged_df)
            intended_action = merged_df.loc[0, 'action']
            if intended_action == ACTION1:
                intended_outcome = torch.argmax(probabilities[0]).item()
            else:    
                intended_outcome = torch.argmax(probabilities[1]).item()
        else: 
            intended_action = ACTION1
            intended_outcome = OUTCOME1

        # Memorize the intended interaction
        self._intended_interaction = self._interactions[intended_action * BASE_ACTION + intended_outcome]
        return intended_action
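
The action selection above amounts to summing `valence * probability` over the outcomes of each action and picking the action with the highest total. Here is a minimal sketch of that arithmetic in plain Python, using the probabilities printed at Step 1 of the Environment1 run below; `expected_valence_by_action` and `interaction_table` are hypothetical names, and the agent itself performs this step with pandas `groupby`.

```python
def expected_valence_by_action(interaction_table, probabilities):
    """Sum valence * probability over the outcomes of each action.

    interaction_table: list of (action, outcome, valence) tuples
    probabilities: one probability per interaction, in the same order
    """
    totals = {}
    for (action, _outcome, valence), p in zip(interaction_table, probabilities):
        totals[action] = totals.get(action, 0.0) + valence * p
    return totals

interaction_table = [(0, 0, -1), (0, 1, 1), (2, 0, -1), (2, 1, 1)]
# Rounded probabilities copied from the Step 1 printout of the Environment1 run
probabilities = [0.566167, 0.433833, 0.482110, 0.517890]

totals = expected_valence_by_action(interaction_table, probabilities)
best_action = max(totals, key=totals.get)
print(totals, best_action)
```

With these inputs action 0 gets roughly -0.132 and action 2 roughly 0.036, so action 2 is selected, in line with the merged table printed at that step.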


# Testing the agent in Environment1

In [22]:
torch.manual_seed(42)

a = Agent(interactions)
e = Environment1()
outcome = 0
for i in range(20):
    print(f"Step {i} ----- ")
    action = a.action(outcome)
    outcome = e.outcome(action)

Step 0 ----- 
Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
Step 1 ----- 
Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
   interaction  action  outcome  valence  probability  expected_valence
0            0       0        0       -1     0.566167         -0.566167
1            1       0        1        1     0.433833          0.433833
2            2       2        0       -1     0.482110         -0.482110
3            3       2        1        1     0.517890          0.517890
   action  proclivity  expected_valence
0       2         NaN          0.035780
1       0         NaN         -0.132335
Step 2 ----- 
Action: 2, Prediction: 1, Outcome: 1, Prediction_correct: True, Valence: 1)
input tensor tensor([[0, 0]])
label tensor tensor([3])
acc : 0.000 = 0/1 for epoch 0
acc : 0.000 = 0/1 for epoch 1
acc : 0.000 = 0/1 for epoch 2
acc : 0.000 = 0/1 for epoch 3
acc : 0.000 = 0/1 for epoch 4
acc : 0.000 = 0/1 for epoch 5
acc : 0.000 =

## Agent2 in Environment2

In [23]:
torch.manual_seed(42)
a = Agent(interactions)
e = Environment2()
outcome = 0
for i in range(20):
    print(f"Step {i} ----- ")
    action = a.action(outcome)
    outcome = e.outcome(action)

Step 0 ----- 
Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
Step 1 ----- 
Action: 0, Prediction: 0, Outcome: 1, Prediction_correct: False, Valence: 1)
   interaction  action  outcome  valence  probability  expected_valence
0            0       0        0       -1     0.492448         -0.492448
1            1       0        1        1     0.507552          0.507552
2            2       2        0       -1     0.465165         -0.465165
3            3       2        1        1     0.534836          0.534836
   action  proclivity  expected_valence
0       2         NaN          0.069671
1       0         NaN          0.015104
Step 2 ----- 
Action: 2, Prediction: 1, Outcome: 0, Prediction_correct: False, Valence: -1)
input tensor tensor([[0, 1]])
label tensor tensor([2])
acc : 0.000 = 0/1 for epoch 0
acc : 0.000 = 0/1 for epoch 1
acc : 0.000 = 0/1 for epoch 2
acc : 0.000 = 0/1 for epoch 3
acc : 0.000 = 0/1 for epoch 4
acc : 0.000 = 0/1 for epoch 5
acc : 0.000

## In Environment3

In [25]:
torch.manual_seed(42)
a = Agent(interactions)
e = Environment3()
outcome = 0
for i in range(100):
    print(f"Step {i} ----- ")
    action = a.action(outcome)
    outcome = e.outcome(action)

Step 0 ----- 
Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
Step 1 ----- 
Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
   interaction  action  outcome  valence  probability  expected_valence
0            0       0        0       -1     0.566167         -0.566167
1            1       0        1        1     0.433833          0.433833
2            2       2        0       -1     0.482110         -0.482110
3            3       2        1        1     0.517890          0.517890
   action  proclivity  expected_valence
0       2         NaN          0.035780
1       0         NaN         -0.132335
Step 2 ----- 
Action: 2, Prediction: 1, Outcome: 1, Prediction_correct: True, Valence: 1)
input tensor tensor([[0, 0]])
label tensor tensor([3])
acc : 0.000 = 0/1 for epoch 0
acc : 0.000 = 0/1 for epoch 1
acc : 0.000 = 0/1 for epoch 2
acc : 0.000 = 0/1 for epoch 3
acc : 0.000 = 0/1 for epoch 4
acc : 0.000 = 0/1 for epoch 5
acc : 0.000 =

The agent learns to alternate actions from step 11 onward and makes correct predictions from step 14 onward.

## Agent5 in Environment4

In [26]:
torch.manual_seed(42)
a = Agent(interactions)
e = Environment4()
outcome = 0
for i in range(20):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
   interaction  action  outcome  valence  probability  expected_valence
0            0       0        0       -1     0.566167         -0.566167
1            1       0        1        1     0.433833          0.433833
2            2       2        0       -1     0.482110         -0.482110
3            3       2        1        1     0.517890          0.517890
   action  proclivity  expected_valence
0       2         NaN          0.035780
1       0         NaN         -0.132335
Action: 2, Prediction: 1, Outcome: 1, Prediction_correct: True, Valence: 1)
input tensor tensor([[0, 0]])
label tensor tensor([3])
acc : 0.000 = 0/1 for epoch 0
acc : 0.000 = 0/1 for epoch 1
acc : 0.000 = 0/1 for epoch 2
acc : 0.000 = 0/1 for epoch 3
acc : 0.000 = 0/1 for epoch 4
acc : 0.000 = 0/1 for epoch 5
acc : 0.000 = 0/1 for epoch 6
acc : 0.000 = 0/1 for epo

# Analysis

It works!