[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PetiteIA/schema_mechanism/blob/master/notebooks/agent5-DNN.ipynb)

# UTILISATION D'UN DNN POUR ESTIMER LA VALENCE ATTENDUE DE CHAQUE ACTION

Ce notrebook présente notre second agent dotté d'un DNN. 
Le DNN est réentrainé à chaque cycle d'interaction avec tous les datapoints distincts.  
Il est utilisé pour estimer la probabilité des différents outcomes pour chaque action.  


## La classe Interaction

On crée un token entier pour chaque interaction: `key = action * BASE_ACTION + outcome`.

In [201]:
BASE_ACTION = 1 
class Interaction:
    """An interaction is a tuple (action, outcome) with a valence"""
    def __init__(self, action, outcome, valence):
        self.action = action
        self.outcome = outcome
        self.valence = valence

    def key(self):
        """ The key to find this interaction in the dictinary. """
        return self.action * BASE_ACTION + self.outcome 
        # return f"{self.action}{self.outcome}"

    def __str__(self):
        """ Print interaction in the form '<action><outcome:<valence>' for debug."""
        return f"{self.action}{self.outcome}:{self.valence}"

    def __eq__(self, other):
        """ Interactions are equal if they have the same key """
        return self.key() == other.key()

In [214]:
ACTION1 = 2
ACTION2 = 4
OUTCOME1 = 0
OUTCOME2 = 1

## Environment1 class

In [215]:
class Environment1:
    """ In Environment 1, action 2 yields outcome 0, action 3 yields outcome 1 """
    def outcome(self, _action):
        if _action == ACTION1:
            return OUTCOME1
        else:
            return OUTCOME2

## Environment2 class

In [216]:
class Environment2:
    """ In Environment 2, action 2 yields outcome 1, action 3 yields outcome 0 """
    def outcome(self, _action):
        if _action == ACTION1:
            return OUTCOME2
        else:
            return OUTCOME1

## Environment3 class

Environment 3 yields outcome 1 only when the agent alternates actions 0 and 1

In [217]:
class Environment3:
    """ Environment 3 yields outcome 1 only when the agent alternates actions 0 and 1 """
    def __init__(self):
        """ Initializing Environment3 """
        self.previous_action = 0

    def outcome(self, _action):
        if _action == self.previous_action:
            _outcome = OUTCOME1
        else:
            _outcome = OUTCOME2
        self.previous_action = _action
        return _outcome

## Environment4 class

Environment4 behaves like Environment1 during the first 10 cycles and then like Environment 2

In [218]:
class Environment4:
    """ Environm4 """
    def __init__(self):
        """ Initializing Environment4 """
        self.step = 0

    def outcome(self, _action):
        """Take the action and generate the next outcome """
        self.step += 1
        # Behave like environment1 during the first 10 steps
        if self.step < 10:
            if _action == ACTION1:
                return OUTCOME1
            else:
                return OUTCOME2            
        # Behave like Environment2 after the first 10 steps
        else: 
            if _action == ACTION1:
                return OUTCOME2
            else:
                return OUTCOME1            

## Initialize the interactions 

In [219]:
interactions = [
    Interaction(ACTION1,OUTCOME1,-1),
    Interaction(ACTION1,OUTCOME2,1),
    Interaction(ACTION2,OUTCOME1,-1),
    Interaction(ACTION2,OUTCOME2,1),
    # Interaction(4,0,-1),
    # Interaction(5,1,1)
]

# AGENT2 DNN

Implémentons l'Agent2 qui va prédire la probabilité de chaque outcome pour chaque action possible

## Créons le modèle de DNN

Le modèle a deux entrées: previous_interaction, action. 

In [220]:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(2, 6)
        # Apply He Initialization recommended for ReLU
        nn.init.kaiming_normal_(self.fc1.weight, mode='fan_in', nonlinearity='relu')
        
        self.fc2 = nn.Linear(6, 2)
        # Apply Xavier initialisation recommended for linear activation
        nn.init.xavier_uniform_(self.fc2.weight)
        nn.init.zeros_(self.fc2.bias)  # Biases are usually set to zero

        self._criterion = nn.CrossEntropyLoss()  # Cross-entropy for classification
        self._optimizer = optim.SGD(self.parameters(), lr=0.01)  # SGD optimizer


    def forward(self, x):
        x = torch.nn.functional.relu(self.fc1(x))  # Apply non-linearity
        return self.fc2(x)  # Logits (CrossEntropyLoss handles softmax)

    def fit(self, inputs, targets):
        """La fonction d'apprentissage"""
        input_tensor = torch.tensor(inputs, dtype=torch.float)
        # input_tensor = torch.randn_like(input_tensor) * 0.01 (voir si le modèle apprend des tendances)
        target_tensor = torch.tensor(targets, dtype=torch.long)
        labels = torch.nn.functional.one_hot(target_tensor, num_classes=2).to(torch.float)
        #labels = torch.argmax(target_tensor, dim=1)  # Convert one-hot to class indices

        for epoch in range (50):
            outputs = self(input_tensor)  # Forward pass
            loss = self._criterion(outputs, labels)  # Compute loss
            self._optimizer.zero_grad()  # Reset gradients
            loss.backward()  # accumulation of backpropagation
            self._optimizer.step()  # Update weights

            # Check accuracy (we expect 100% accuracy)
            predictions = torch.argmax(outputs, dim=1)
            accuracy = (predictions == target_tensor).float().mean().item()
            print(f"Loss: {loss.item():.6f}, Accuracy: {accuracy * 100:.0f}%")
    
    def predict(self, inputs):
        """La fonction de prediction"""
        input_tensor = torch.tensor(inputs, dtype=torch.float)
        outputs = self(input_tensor)
        print("prediction", torch.argmax(outputs, dim=1))
        return torch.softmax(outputs, dim=1) 


# Définisson l'agent

In [226]:
import torch.optim as optim
import pandas as pd

class Agent:
    """Creating our agent"""
    def __init__(self, _interactions):
        """ Initialize the dictionary of interactions"""
        # Initialise le réseau de neurone
        self._model = Model()
        
        self._interactions = {interaction.key(): interaction for interaction in _interactions}
        self._actions = [i.action for i in self._interactions.values()]
        self._outcomes = [i.outcome for i in self._interactions.values()]
        self._intended_interaction = list(self._interactions.values())[0]
        self._last_interaction = None
        self._previous_interaction = None
        # Le dataframe pour compter les actions et les outcomes dans le contexte de la previous_interaction
        self.count_df = pd.DataFrame({
            'interaction': [i.key() for i in self._interactions.values() for _ in range(4)], # [20, 20, 20, 20, 21, 21, 21, 21, 30, 30, 30, 30, 31, 31, 31, 31],
            'action': [i.action for i in self._interactions.values()] * 4, # [2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 3, 3], 
            'outcome': [i.outcome for i in self._interactions.values()] * 4, # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
            'valence': [self._interactions[i].valence for i in self._interactions] * 4, # [20, 21, 30, 31, 20, 21, 30, 31, 20, 21, 30, 31, 20, 21, 30, 31]],
            'count': [0] * 16, # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            'proclivity': [0] * 16, # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        })
    
    def action(self, _outcome):
        """ Tracing the previous cycle """
        self._previous_interaction = self._last_interaction
        self._last_interaction = self._interactions[self._intended_interaction.action * BASE_ACTION + _outcome ]
        print(f"Action: {self._intended_interaction.action}, Prediction: {self._intended_interaction.outcome}, "
              f"Outcome: {_outcome}, Prediction_correct: {self._intended_interaction.outcome == _outcome}, "
              f"Valence: {self._last_interaction.valence})")

        """ Computing the next interaction to try to enact """
        # Entraine le réseau de neurone avec les informations du dernier cycle d'interaction
        if self._previous_interaction is not None:
            # Count the number of occurrences of previous_interaction followed by last_interaction
            self.count_df.loc[(self.count_df['interaction'] == self._previous_interaction.key()) & 
                              (self.count_df['action'] == self._intended_interaction.action) & 
                              (self.count_df['outcome'] == _outcome), 'count'] += 1
            # Create the dataset to train the model
            df_filtered = self.count_df[self.count_df['count'] > 0]
            x = df_filtered[['interaction', 'action']].values.tolist()
            y = df_filtered['outcome'].tolist()
            print("x", x)
            print("y", y)
            self._model.fit(x, y)

        # Calcul de la proclivité basée sur le décompte
        self.count_df['proclivity'] = self.count_df['valence'] * self.count_df['count']
        filtered_df = self.count_df[self.count_df['interaction'] == self._last_interaction.key()]
        grouped_df = filtered_df.groupby('action').agg({'proclivity': 'sum'}).reset_index()

        # Prédit les résultats pour les différentes actions
        probabilities = self._model.predict([[self._last_interaction.key(), 2], [self._last_interaction.key(), 3]])        
        # Le dataframe pour trouver la meilleure expected valence
        probability_df = pd.DataFrame({'action': [i.action for i in self._interactions.values()],
                'outcome': [i.outcome for i in self._interactions.values()],
                'valence': [i.valence for i in self._interactions.values()],
                'probability': probabilities.flatten().tolist()})
        probability_df['expected_valence'] = probability_df['valence'] * probability_df['probability']
        print(probability_df)
        # Aggregate by action
        grouped_probability_df = probability_df.groupby('action').agg({'expected_valence': 'sum'}).reset_index()

        # On merge le dataframe des proclivity et celui des expected valences
        merged_df = pd.merge(grouped_df, grouped_probability_df, on='action', how='inner')
        # merged_df = merged_df.sort_values(by=['proclivity'], ascending=[False]).reset_index(drop=True)
        merged_df = merged_df.sort_values(by=['expected_valence'], ascending=[False]).reset_index(drop=True)
        print(merged_df)
        intended_action = merged_df.loc[0, 'action']

        # TODO: Implement the agent's prediction mechanism
        predictions = torch.argmax(probabilities, dim=1)
        if intended_action == ACTION1:
            intended_outcome = predictions.tolist()[0]
        else:
            intended_outcome = predictions.tolist()[1]
        
        # Memorize the intended interaction
        self._intended_interaction = self._interactions[intended_action * BASE_ACTION + intended_outcome]
        return intended_action


# Testons l'agent dans Environment1

In [227]:
torch.manual_seed(42)

a = Agent(interactions)
e = Environment1()
outcome = 0
for i in range(20):
    print(f"Step {i} ----- ")
    action = a.action(outcome)
    outcome = e.outcome(action)

Step 0 ----- 
Action: 2, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
prediction tensor([0, 0])
   action  outcome  valence  probability  expected_valence
0       2        0       -1     0.536675         -0.536675
1       2        1        1     0.463325          0.463325
2       4        0       -1     0.604807         -0.604807
3       4        1        1     0.395193          0.395193
   action  proclivity  expected_valence
0       2           0         -0.073350
1       4           0         -0.209614
Step 1 ----- 
Action: 2, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
x [[2, 2]]
y [0]
Loss: 0.622362, Accuracy: 100%
Loss: 0.593102, Accuracy: 100%
Loss: 0.565706, Accuracy: 100%
Loss: 0.540023, Accuracy: 100%
Loss: 0.515916, Accuracy: 100%
Loss: 0.493265, Accuracy: 100%
Loss: 0.471959, Accuracy: 100%
Loss: 0.451900, Accuracy: 100%
Loss: 0.432997, Accuracy: 100%
Loss: 0.415170, Accuracy: 100%
Loss: 0.398342, Accuracy: 100%
Loss: 0.382446, Acc

## Agent2 dans Environment2

In [223]:
a = Agent(interactions)
e = Environment2()
outcome = 0
for i in range(20):
    print(f"Step {i} ----- ")
    action = a.action(outcome)
    outcome = e.outcome(action)

Step 0 ----- 
Action: 2, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
prediction tensor([1, 1])
   action  proclivity  expected_valence
0       2           0          0.913637
1       4           0          0.996666
Step 1 ----- 
Action: 2, Prediction: 1, Outcome: 1, Prediction_correct: True, Valence: 1)
x [[2, 2]]
y [1]
Loss: 0.044142, Accuracy: 100%
Loss: 0.042517, Accuracy: 100%
Loss: 0.041004, Accuracy: 100%
Loss: 0.039593, Accuracy: 100%
Loss: 0.038273, Accuracy: 100%
Loss: 0.037036, Accuracy: 100%
Loss: 0.035875, Accuracy: 100%
Loss: 0.034783, Accuracy: 100%
Loss: 0.033753, Accuracy: 100%
Loss: 0.032782, Accuracy: 100%
Loss: 0.031863, Accuracy: 100%
Loss: 0.030994, Accuracy: 100%
Loss: 0.030169, Accuracy: 100%
Loss: 0.029386, Accuracy: 100%
Loss: 0.028642, Accuracy: 100%
Loss: 0.027934, Accuracy: 100%
Loss: 0.027259, Accuracy: 100%
Loss: 0.026616, Accuracy: 100%
Loss: 0.026001, Accuracy: 100%
Loss: 0.025413, Accuracy: 100%
Loss: 0.024851, Accuracy: 100%
Loss:

## Dans Environment3

In [213]:
a = Agent(interactions)
e = Environment3()
outcome = 0
for i in range(100):
    print(f"Step {i} ----- ")
    action = a.action(outcome)
    outcome = e.outcome(action)

Step 0 ----- 
Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
prediction tensor([0, 0])
   action  proclivity  expected_valence
0       0           0         -0.634419
1       2           0         -0.834576
Step 1 ----- 
Action: 0, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
x [[0, 0]]
y [0]
Loss: 0.534547, Accuracy: 100%
Loss: 0.528560, Accuracy: 100%
Loss: 0.522656, Accuracy: 100%
Loss: 0.516832, Accuracy: 100%
Loss: 0.511088, Accuracy: 100%
Loss: 0.505422, Accuracy: 100%
Loss: 0.499834, Accuracy: 100%
Loss: 0.494322, Accuracy: 100%
Loss: 0.488885, Accuracy: 100%
Loss: 0.483522, Accuracy: 100%
Loss: 0.478232, Accuracy: 100%
Loss: 0.473014, Accuracy: 100%
Loss: 0.467867, Accuracy: 100%
Loss: 0.462790, Accuracy: 100%
Loss: 0.457781, Accuracy: 100%
Loss: 0.452841, Accuracy: 100%
Loss: 0.447968, Accuracy: 100%
Loss: 0.443161, Accuracy: 100%
Loss: 0.438419, Accuracy: 100%
Loss: 0.433741, Accuracy: 100%
Loss: 0.429126, Accuracy: 100%
Loss

## Agent5 dans Environment4

In [228]:
a = Agent(interactions)
e = Environment4()
outcome = 0
for i in range(20):
    action = a.action(outcome)
    outcome = e.outcome(action)

Action: 2, Prediction: 0, Outcome: 0, Prediction_correct: True, Valence: -1)
prediction tensor([1, 1])
   action  outcome  valence  probability  expected_valence
0       2        0       -1     0.043182         -0.043182
1       2        1        1     0.956818          0.956818
2       4        0       -1     0.001667         -0.001667
3       4        1        1     0.998333          0.998333
   action  proclivity  expected_valence
0       4           0          0.996666
1       2           0          0.913637
Action: 4, Prediction: 1, Outcome: 1, Prediction_correct: True, Valence: 1)
x [[2, 4]]
y [1]
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, Accuracy: 100%
Loss: 0.000062, A

# Analyse

On voit que les deux actions possibles convergent rapidement vers des valeurs de probabilité d'outcome égales. 

C'est le phénomène de "catastrophic forgetting" qui fait que le DNN oublie rapidement ce qu'il a appris précédement quand on le ré-entraine. 