[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PetiteIA/schema_mechanism/blob/master/notebooks/agent5-DNN.ipynb)

# USING A DNN TO ESTIMATE THE EXPECTED VALENCE OF EACH ACTION

This notebook presents our first agent equipped with a DNN.
The DNN is trained on the new datapoint at each interaction cycle.
It is used to estimate the probability of each outcome for each action.
It illustrates the effect of "catastrophic forgetting": each new datapoint "erases the memory" of the previous datapoints.

## The Interaction class

We create an integer token (the key) for each interaction: `key = action * 10 + outcome * 5`.
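A minimal sketch of this encoding, assuming actions are non-negative integers and outcomes are 0 or 1, as in this notebook. The helpers `encode_key` and `decode_key` are hypothetical (the notebook only defines `Interaction.key()`). Since the action fills the tens and the outcome adds only 0 or 5 to the units, every (action, outcome) pair gets a distinct key and can be recovered from it.

```python
def encode_key(action: int, outcome: int) -> int:
    """Hypothetical helper: same formula as Interaction.key()."""
    return action * 10 + outcome * 5


def decode_key(key: int) -> tuple[int, int]:
    """Hypothetical inverse: recover (action, outcome) from a key."""
    # The action occupies the tens, the outcome adds 0 or 5 to the units
    return key // 10, (key % 10) // 5


# Keys of the interactions used in this notebook
keys = {(a, o): encode_key(a, o) for a in (2, 3, 4) for o in (0, 1)}
print(keys)
# {(2, 0): 20, (2, 1): 25, (3, 0): 30, (3, 1): 35, (4, 0): 40, (4, 1): 45}
```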

In [1]:
class Interaction:
    """An interaction is a tuple (action, outcome) with a valence"""
    def __init__(self, action, outcome, valence):
        self.action = action
        self.outcome = outcome
        self.valence = valence

    def key(self):
        """ The key to find this interaction in the dictionary. """
        return self.action * 10 + self.outcome * 5
        # return f"{self.action}{self.outcome}"

    def __str__(self):
        """ Print interaction in the form '<action><outcome>:<valence>' for debugging."""
        return f"{self.action}{self.outcome}:{self.valence}"

    def __eq__(self, other):
        """ Interactions are equal if they have the same key """
        return self.key() == other.key()

## Environment1 class

In [2]:
class Environment1:
    """ In Environment 1, action 2 yields outcome 0, action 3 yields outcome 1 """
    def outcome(self, _action):
        if _action == 2:
            return 0
        else:
            return 1

## Environment2 class

In [3]:
class Environment2:
    """ In Environment 2, action 2 yields outcome 1, action 3 yields outcome 0 """
    def outcome(self, _action):
        if _action == 2:
            return 1
        else:
            return 0

## Environment3 class

Environment 3 yields outcome 1 only when the agent's action differs from its previous action, i.e., when the agent alternates between its actions (2 and 3 for the agent below).
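A minimal sketch of Environment 3's rule, written as a standalone function so it can be traced on a sequence of actions. The name `environment3_outcomes` is hypothetical; like the class below, it assumes the previous action starts at 0, so the agent's first action (2 or 3) always yields outcome 1.

```python
def environment3_outcomes(actions, previous_action=0):
    """Hypothetical helper: outcome 1 when the action differs from the previous one."""
    outcomes = []
    for action in actions:
        outcomes.append(0 if action == previous_action else 1)
        previous_action = action
    return outcomes


print(environment3_outcomes([2, 2, 3, 2, 3, 3]))  # [1, 0, 1, 1, 1, 0]
```

Only strict alternation (2, 3, 2, 3, ...) yields outcome 1 at every cycle.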

In [4]:
class Environment3:
    """ Environment 3 yields outcome 1 only when the agent alternates actions 0 and 1 """
    def __init__(self):
        """ Initializing Environment3 """
        self.previous_action = 0

    def outcome(self, _action):
        if _action == self.previous_action:
            _outcome = 0
        else:
            _outcome = 1
        self.previous_action = _action
        return _outcome

## Environment4 class

Environment4 behaves like Environment1 during the first 10 cycles and then like Environment 2
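A minimal sketch of this schedule, assuming steps are counted from 1 and that "the first 10 cycles" means steps 1 to 10 inclusive. The function `environment4_outcome` and its `switch_step` parameter are hypothetical.

```python
def environment4_outcome(step, action, switch_step=10):
    """Hypothetical helper: Environment 1 rules up to switch_step, Environment 2 rules after."""
    if step <= switch_step:
        return 0 if action == 2 else 1  # Environment 1
    return 1 if action == 2 else 0  # Environment 2


# Outcome of action 2 at steps 1 to 12
print([environment4_outcome(step, 2) for step in range(1, 13)])
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```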

In [5]:
class Environment4:
    """ Environm4 """
    def __init__(self):
        """ Initializing Environment4 """
        self.step = 0

    def outcome(self, _action):
        """Take the action and generate the next outcome """
        self.step += 1
        # Behave like environment1 during the first 10 steps
        if self.step < 10:
            if _action == 2:
                return 0
            else:
                return 1            
        # Behave like Environment2 after the first 10 steps
        else: 
            if _action == 2:
                return 1
            else:
                return 0            

## Initialize the interactions 

In [6]:
interactions = [
    Interaction(2,0,-1),
    Interaction(2,1,1),
    Interaction(3,0,-1),
    Interaction(3,1,1),
    Interaction(4,0,-1),
    Interaction(4,1,1)
]

Interactions are initialized with their action, their outcome, and their valence:

| | outcome 0 | outcome 1 |
|---|---|---|
| action 2| -1 | 1 |
| action 3 | -1 | 1 |
| action 4 | -1 | 1 |

# AGENT1 DNN

Let's implement Agent1, which predicts the probability of each outcome for each possible action.
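Once the model gives a probability for each outcome of an action, the expected valence of that action is the probability-weighted sum of the valences of its interactions; this is what the `expected_valence` column computes in the agent below. A minimal sketch in plain Python; the function name is hypothetical and the probabilities are made up for illustration, not produced by the model.

```python
# Valence of each (action, outcome) interaction, from the table above
valences = {(2, 0): -1, (2, 1): 1, (3, 0): -1, (3, 1): 1}


def expected_valence(action, probabilities):
    """Sum over outcomes of P(outcome | action) * valence(action, outcome)."""
    return sum(p * valences[(action, outcome)] for outcome, p in enumerate(probabilities))


# Illustrative probabilities for outcomes 0 and 1
print(expected_valence(2, [0.25, 0.75]))  # 0.5
```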

## Creating the DNN model

The model has two inputs, the key of the previous interaction and the action, and two outputs: one logit per outcome (0 and 1).
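In equation form, a sketch of what the `Model` class below computes for one input, writing $k$ for the key of the previous interaction, $a$ for the action and $o \in \{0, 1\}$ for the outcome (these symbols are ours, not the notebook's); $W_1$ is the 6×2 weight matrix of `fc1` and $W_2$ the 2×6 weight matrix of `fc2`:

$$
\hat{p}(o \mid k, a) = \operatorname{softmax}\big(W_2\,\mathrm{ReLU}(W_1 x + b_1) + b_2\big)_o, \qquad x = \begin{pmatrix} k \\ a \end{pmatrix}
$$

Each cycle performs one SGD step (learning rate 0.1) on the cross-entropy of the observed outcome $o$:

$$
\mathcal{L} = -\log \hat{p}(o \mid k, a)
$$

Note that $k$ (20 to 35) and $a$ (2 or 3) are fed to the network unnormalized.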

In [7]:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(2, 6)
        # Apply He Initialization recommended for ReLU
        nn.init.kaiming_normal_(self.fc1.weight, mode='fan_in', nonlinearity='relu')
        
        self.fc2 = nn.Linear(6, 2)
        # Apply Xavier initialisation recommended for linear activation
        nn.init.xavier_uniform_(self.fc2.weight)
        nn.init.zeros_(self.fc2.bias)  # Biases are usually set to zero

        self._criterion = nn.CrossEntropyLoss()  # Cross-entropy for classification
        self._optimizer = optim.SGD(self.parameters(), lr=0.1)  # SGD optimizer
        self._optimizer.zero_grad()  # Reset gradients (not sure it is needed)


    def forward(self, x):
        x = torch.nn.functional.relu(self.fc1(x))  # Apply non-linearity
        return self.fc2(x)  # Logits (CrossEntropyLoss handles softmax)

    def fit(self, inputs, targets):
        """La fonction d'apprentissage"""
        input_tensor = torch.tensor(inputs, dtype=torch.float)
        # input_tensor = torch.randn_like(input_tensor) * 0.01 (voir si le modèle apprend des tendances)
        target_tensor = torch.tensor(targets, dtype=torch.long)
        labels = torch.nn.functional.one_hot(target_tensor, num_classes=2).to(torch.float)
        #labels = torch.argmax(target_tensor, dim=1)  # Convert one-hot to class indices

        outputs = self(input_tensor)  # Forward pass
        loss = self._criterion(outputs, labels)  # Compute loss
        loss.backward()  # accumulation of backpropagation
        # Udpate weights and reset gradients (may be accumulated over several steps)
        self._optimizer.step()  # Update weights
        self._optimizer.zero_grad()  # Reset gradients

        # Check accuracy (we expect 100% accuracy)
        predictions = torch.argmax(outputs, dim=1)
        accuracy = (predictions == target_tensor).float().mean().item()

        print(f"Loss: {loss.item():.6f}, Accuracy: {accuracy * 100:.0f}%")
    
    def predict(self, inputs):
        """La fonction de prediction"""
        input_tensor = torch.tensor(inputs, dtype=torch.float)
        outputs = self(input_tensor)
        print("prediction", torch.argmax(outputs, dim=1))
        return torch.softmax(outputs, dim=1) 


## Defining the agent

In [9]:
import torch.optim as optim
import pandas as pd

class Agent1:
    """Creating our agent"""
    def __init__(self, _interactions):
        """ Initialize the dictionary of interactions"""
        # Initialize the neural network
        self._model = Model()
        
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._intended_interaction = self._interactions[20]
        self._last_interaction = None
        self._previous_interaction = None
        # Dataframe counting the outcomes of each action in the context of the previous interaction
        self.count_df = pd.DataFrame({
            'interaction': [20, 20, 20, 20, 25, 25, 25, 25, 30, 30, 30, 30, 35, 35, 35, 35],
            'action':  [2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 3, 3], 
            'outcome': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
            'valence': [self._interactions[i].valence for i in [20, 25, 30, 35, 20, 25, 30, 35, 20, 25, 30, 35, 20, 25, 30, 35]],
            'count': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            'proclivity': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        })
    
    def action(self, _outcome):
        """Record the outcome of the last cycle, learn from it, and return the next action"""
        # Trace the previous cycle
        self._previous_interaction = self._last_interaction
        self._last_interaction = self._interactions[self._intended_interaction.action * 10 + _outcome * 5]
        print(f"Action: {self._intended_interaction.action}, Prediction: {self._intended_interaction.outcome}, "
              f"Outcome: {_outcome}, Prediction_correct: {self._intended_interaction.outcome == _outcome}, "
              f"Valence: {self._last_interaction.valence})")

        # Compute the next interaction to try to enact
        # Train the neural network with the data from the last interaction cycle
        if self._previous_interaction is not None:
            self._model.fit([[self._previous_interaction.key(), self._intended_interaction.action]], [_outcome])
            # Count the number of occurrences of previous_interaction followed by last_interaction
            self.count_df.loc[(self.count_df['interaction'] == self._previous_interaction.key()) & 
                              (self.count_df['action'] == self._intended_interaction.action) & 
                              (self.count_df['outcome'] == _outcome), 'count'] += 1

        # Compute the proclivity from the counts: valence * count
        self.count_df['proclivity'] = self.count_df['valence'] * self.count_df['count']
        filtered_df = self.count_df[self.count_df['interaction'] == self._last_interaction.key()]
        grouped_df = filtered_df.groupby('action').agg({'proclivity': 'sum'}).reset_index()

        # Predict the outcome probabilities for each possible action
        probabilities = self._model.predict([[self._last_interaction.key(), 2], [self._last_interaction.key(), 3]])
        # Dataframe used to compute the expected valence of each action
        probability_df = pd.DataFrame({'action': [2, 2, 3, 3],
                'outcome': [0, 1, 0, 1],
                'valence': [self._interactions[i].valence for i in [20, 25, 30, 35]],
                'probability': probabilities.flatten().tolist()})
        probability_df['expected_valence'] = probability_df['valence'] * probability_df['probability']
        # Aggregate by action
        grouped_probability_df = probability_df.groupby('action').agg({'expected_valence': 'sum'}).reset_index()

        # Merge the proclivity dataframe with the expected valence dataframe
        merged_df = pd.merge(grouped_df, grouped_probability_df, on='action', how='inner')
        # Sort by proclivity: the expected valence is displayed but not used for the choice
        merged_df = merged_df.sort_values(by=['proclivity'], ascending=[False]).reset_index(drop=True)
        print(merged_df)
        intended_action = int(merged_df.loc[0, 'action'])

        # Predicted outcome of the intended action: the most probable one
        predictions = torch.argmax(probabilities, dim=1)
        intended_outcome = predictions.tolist()[intended_action - 2]
        
        # Memorize the intended interaction
        self._intended_interaction = self._interactions[intended_action * 10 + intended_outcome * 5]
        return intended_action
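The decision rule in `action()` can be sketched without pandas: in the current context (the last interaction), the proclivity of each action is the sum over its outcomes of valence × count, and the agent picks the action with the highest proclivity. A minimal sketch, assuming counts are stored in a plain dictionary keyed by `(context_key, action, outcome)`; the names `VALENCES`, `proclivities` and `choose_action` are hypothetical, and breaking ties by the smaller action number is an arbitrary choice of this sketch.

```python
from collections import defaultdict

# Valence of each interaction key (action * 10 + outcome * 5)
VALENCES = {20: -1, 25: 1, 30: -1, 35: 1}


def proclivities(counts, context_key, actions=(2, 3)):
    """Proclivity of each action in a context: sum over outcomes of valence * count."""
    return {
        action: sum(
            VALENCES[action * 10 + outcome * 5] * counts[(context_key, action, outcome)]
            for outcome in (0, 1)
        )
        for action in actions
    }


def choose_action(counts, context_key):
    """Pick the action with the highest proclivity (ties: smallest action)."""
    scores = proclivities(counts, context_key)
    return max(sorted(scores), key=lambda action: scores[action])


counts = defaultdict(int)
counts[(25, 2, 0)] += 1  # after interaction 25, action 2 once yielded outcome 0
counts[(25, 3, 1)] += 2  # after interaction 25, action 3 twice yielded outcome 1
print(proclivities(counts, 25))  # {2: -1, 3: 2}
print(choose_action(counts, 25))  # 3
```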


## Testing the agent in Environment1

In [10]:
torch.manual_seed(42)

a = Agent1(interactions)
e = Environment1()
outcome = 0
for i in range(20):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 2, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
prediction tensor([0, 0])
   action  proclivity  expected_valence
0       2           0         -0.988310
1       3           0         -0.984425
Action: 2, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
Loss: 0.005862, Accuracy: 100%
prediction tensor([0, 0])
   action  proclivity  expected_valence
0       3           0         -0.988604
1       2          -1         -0.991496
Action: 3, Prediction: 0, Outcome: 1, Prediction_correct: False, Valence: 1)
Loss: 5.167597, Accuracy: 0%
prediction tensor([1, 1])
   action  proclivity  expected_valence
0       2           0               1.0
1       3           0               1.0
Action: 2, Prediction: 1, Outcome: 0, Prediction_correct: False, Valence: -1)
Loss: 26.726799, Accuracy: 0%
prediction tensor([0, 0])
   action  proclivity  expected_valence
0       3           1         -0.001154
1       2          -1         -0.001154
Action: 3, Predic

## Agent5 in Environment2

In [11]:
a = Agent1(interactions)
e = Environment2()
outcome = 0
for i in range(20):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 2, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
prediction tensor([0, 0])
   action  proclivity  expected_valence
0       2           0         -0.975083
1       3           0         -0.990816
Action: 2, Prediction: 0, Outcome: 1, Prediction_correct: False, Valence: 1)
Loss: 4.385372, Accuracy: 0%
prediction tensor([1, 1])
   action  proclivity  expected_valence
0       2           0               1.0
1       3           0               1.0
Action: 2, Prediction: 1, Outcome: 1, Prediction_correct: True, Valence: 1)
Loss: -0.000000, Accuracy: 100%
prediction tensor([1, 1])
   action  proclivity  expected_valence
0       2           1               1.0
1       3           0               1.0
Action: 2, Prediction: 1, Outcome: 1, Prediction_correct: True, Valence: 1)
Loss: -0.000000, Accuracy: 100%
prediction tensor([1, 1])
   action  proclivity  expected_valence
0       2           2               1.0
1       3           0               1.0
Action: 2, Predic

## Agent5 in Environment3

In [12]:
a = Agent1(interactions)
e = Environment3()
outcome = 0
for i in range(100):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 2, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
prediction tensor([1, 1])
   action  proclivity  expected_valence
0       2           0          0.998288
1       3           0          0.997665
Action: 2, Prediction: 1, Outcome: 1, Prediction_correct: True, Valence: 1)
Loss: 0.000856, Accuracy: 100%
prediction tensor([1, 1])
   action  proclivity  expected_valence
0       2           0          0.999793
1       3           0          0.999720
Action: 2, Prediction: 1, Outcome: 0, Prediction_correct: False, Valence: -1)
Loss: 9.175957, Accuracy: 0%
prediction tensor([0, 0])
   action  proclivity  expected_valence
0       2           1              -1.0
1       3           0              -1.0
Action: 2, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
Loss: -0.000000, Accuracy: 100%
prediction tensor([0, 0])
   action  proclivity  expected_valence
0       2           0              -1.0
1       3           0              -1.0
Action: 2, Predi

## Agent5 in Environment4

In [13]:
a = Agent1(interactions)
e = Environment4()
outcome = 0
for i in range(20):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 2, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
prediction tensor([0, 0])
   action  proclivity  expected_valence
0       2           0              -1.0
1       3           0              -1.0
Action: 2, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
Loss: -0.000000, Accuracy: 100%
prediction tensor([0, 0])
   action  proclivity  expected_valence
0       3           0              -1.0
1       2          -1              -1.0
Action: 3, Prediction: 0, Outcome: 1, Prediction_correct: False, Valence: 1)
Loss: 20.821493, Accuracy: 0%
prediction tensor([1, 1])
   action  proclivity  expected_valence
0       2           0               1.0
1       3           0               1.0
Action: 2, Prediction: 1, Outcome: 0, Prediction_correct: False, Valence: -1)
Loss: 719.306396, Accuracy: 0%
prediction tensor([0, 0])
   action  proclivity  expected_valence
0       3           1               0.0
1       2          -1               0.0
Action: 3, Pre

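# Analysis

In the outputs above, the predictions for the two possible actions are identical (`prediction tensor([0, 0])` or `prediction tensor([1, 1])`), and after each training step both flip to the outcome observed in the most recent datapoint. The two actions therefore quickly end up with equal expected valences.

This is the "catastrophic forgetting" phenomenon: when the DNN is retrained on a new datapoint, it quickly forgets what it had previously learned.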
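The effect can be reproduced with an even smaller model. A minimal sketch, assuming a single logistic unit instead of the notebook's two-layer network (a deliberate simplification), the same raw inputs `[previous_interaction_key, action]`, plain SGD with learning rate 0.1, and one update per datapoint. The two datapoints are hypothetical: after interaction 25, action 2 yields outcome 1 and action 3 yields outcome 0.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of outcome 1 for input x = [previous_interaction_key, action]."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sgd_step(weights, bias, x, y, lr=0.1):
    """One SGD step on a single datapoint (gradient of the binary cross-entropy)."""
    error = predict(weights, bias, x) - y
    weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights, bias - lr * error


x_a, y_a = [25, 2], 1  # hypothetical datapoint A
x_b, y_b = [25, 3], 0  # hypothetical datapoint B
weights, bias = [0.0, 0.0], 0.0

weights, bias = sgd_step(weights, bias, x_a, y_a)
after_a = (predict(weights, bias, x_a), predict(weights, bias, x_b))

weights, bias = sgd_step(weights, bias, x_b, y_b)
after_b = (predict(weights, bias, x_a), predict(weights, bias, x_b))

print([round(p, 3) for p in after_a])  # [1.0, 1.0]
print([round(p, 3) for p in after_b])  # [0.0, 0.0]
```

A single update on B moves both predictions from about 1 to about 0: the two inputs differ only slightly compared with the magnitude of the key, so learning B overwrites what was learned from A, the same pattern as the flipping `prediction tensor` lines in the outputs above.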