# <b> active-pynference </b> : T-maze demo

Hello there! This is a quick demo of the <b>active-pynference</b> / <b>actynf</b> package for simulating MDPs using Sophisticated Inference!
Buckle up, buckaroo!


## 1. Introducing the task

To demonstrate how the Sophisticated Inference algorithm can predict various behaviours in an explore/exploit environment, we will introduce an environment well known within the Active Inference community: the T-maze.

Let's picture a mouse in a simple maze:

![starting_situation.png](local_resources/tmaze/starting_situation.png)

This maze contains two objects (here: a cheese and a mousetrap), one in each of the two side branches (left and right). The initial position of these objects is set by the experimenter (you!). Formally, we write $s_1$ for the state describing the position of the cheese (left or right). The initial position of the cheese is drawn from the probability distribution $D_1 = [p_{init}, 1-p_{init}]$ and does not change during a trial.

The mouse can be in four different places: its starting position (0), the bottom of the maze (1), the left branch (2) or the right branch (3). We call this second state factor $s_2$. Initially, $s_2$ is always 0.
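Because the two state factors are independent at the start of a trial, the joint initial distribution over the $2 \times 4 = 8$ combined states is simply the outer product of $D_1$ and the initial distribution of $s_2$. A minimal NumPy sketch, using an arbitrary example value $p_{init} = 0.7$ and the position ordering given above:

```python
import numpy as np

p_init = 0.7                             # example value for P(cheese on the left)
D1 = np.array([p_init, 1 - p_init])      # s_1: cheese left / right
D2 = np.array([1.0, 0.0, 0.0, 0.0])      # s_2: start, bottom (clue), left, right
joint = np.outer(D1, D2)                 # joint[i, j] = P(s_1 = i, s_2 = j), shape (2, 4)
print(joint)
```

All the probability mass sits in the "start" column, split between the two cheese positions according to $D_1$.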

The mouse wants to observe the cheese as quickly as possible and to avoid observing the mousetrap. How strongly the mouse seeks the reward and how much it fears the trap are set by the experimenter through two preference parameters: *reward seeking* (rs) and *loss aversion* (la).
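In Active Inference, preferences are commonly written as log-probabilities over outcomes, so that a softmax turns them into a distribution of preferred outcomes. Assuming actynf follows this usual convention (this tutorial does not confirm it), the sketch below shows how rs and la shape the mouse's preferences over the three outcomes (null, cheese, mousetrap), with the example values rs = 3 and la = 2:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

rs, la = 3.0, 2.0
C_outcome = np.array([0.0, rs, -la])   # log-preferences: null, cheese (win), mousetrap (loss)
preferred = softmax(C_outcome)         # preferred outcome distribution
print(preferred.round(3))
```

Raising la pushes the preferred probability of the mousetrap further toward zero, while raising rs concentrates more mass on the cheese.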

**Note:** To discourage greedy mice, once the mouse has picked either left or right, it cannot reach the other branch for the remainder of the trial. Therefore, the mouse only has one chance to guess where the cheese is.

This wouldn't be a very interesting setup without one more ingredient: the clue. If the mouse goes to the bottom of the maze, it receives a clue. Although this clue carries no extrinsic reward, it may (or may not) contain relevant information about the position of the cheese. If the clue is reliable, it consistently shows one value when the cheese is on the left and another when the cheese is on the right. If it is not, the observation it provides is uncorrelated with the position of the cheese, making it useless. We can picture these clue values as arrows pointing left or right:

Reliable clue                           | Unreliable clue                     
:--------------------------------------:|:------------------------------------:
![](local_resources/tmaze/goodclue.gif) |![](local_resources/tmaze/badclue.gif)
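How much a clue tells the mouse follows from Bayes' rule. In the sketch below, `p_ha` plays the role of the clue reliability (the probability that the arrow points to the correct side, called `pHA` in the environment code later on); the function name is illustrative only. A fully reliable clue gives certainty, while a clue with reliability 0.5 leaves the prior unchanged:

```python
def posterior_cheese_left(p_init, p_ha, clue):
    """P(cheese on the left | clue) by Bayes' rule; clue is "left" or "right"."""
    # Likelihood of the observed arrow under each cheese position
    like_left = p_ha if clue == "left" else 1 - p_ha
    like_right = 1 - p_ha if clue == "left" else p_ha
    evidence = like_left * p_init + like_right * (1 - p_init)
    if evidence == 0:
        raise ValueError("this clue is impossible under the given prior and reliability")
    return like_left * p_init / evidence

print(posterior_cheese_left(0.5, 1.0, "left"))   # reliable clue: certainty
print(posterior_cheese_left(0.5, 0.5, "left"))   # unreliable clue: prior unchanged
```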

The point of this task is to explore how various parameters, such as the mouse's initial beliefs about the task or the environmental dynamics, may affect its behaviour: *Should I get the clue, resolving uncertainty but deferring my reward? Should I risk going for the cheese even if I'm not sure where it is? How good is the clue?*
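The first question can be made concrete with a back-of-the-envelope comparison of expected extrinsic value. This is not Sophisticated Inference (it ignores the information gain that the algorithm also weighs); it assumes a win probability of 1 on the correct side, and borrows the convention of the mouse model defined later in this tutorial, where a win at the last timestep is worth rs/2 instead of rs:

```python
def p_correct_guess(pinit, pHA=None):
    """Probability of picking the cheese side, guessing directly (pHA=None) or after one clue."""
    if pHA is None:
        return max(pinit, 1 - pinit)
    # For each arrow value, pick the side with the highest joint probability P(arrow, side)
    p_left_arrow = max(pHA * pinit, (1 - pHA) * (1 - pinit))
    p_right_arrow = max((1 - pHA) * pinit, pHA * (1 - pinit))
    return p_left_arrow + p_right_arrow

def expected_preference(p_win, reward, la):
    """Expected extrinsic value: +reward if the cheese is found, -la if the trap is hit."""
    return p_win * reward - (1 - p_win) * la

rs, la, pinit = 3.0, 2.0, 0.5
direct = expected_preference(p_correct_guess(pinit), rs, la)                   # choose right away
with_clue = expected_preference(p_correct_guess(pinit, pHA=1.0), rs / 2, la)   # clue first, then choose
print(direct, with_clue)  # 0.5 1.5
```

With a perfectly reliable clue, waiting pays off despite the halved reward; with `pHA=0.5` the clue route drops to -0.25 and guessing directly becomes the better option.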

In the next part of this tutorial, we'll see how to simulate various mouse behaviours using *active-pynference*.

## 2. Using the package

### a. Install and import the package
<sup><sub><b> active-pynference </b> requires Python 3 and has been tested with Python 3.11+, but it will probably work with slightly older versions as well.</sub></sup>

You can install the package by running:

```
pip install active-pynference
```

More complete installation instructions are available in the installation_instructions.ipynb file.
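To check which version is installed without importing the package, the standard library's `importlib.metadata` can be queried with the distribution name used by `pip` above. A small sketch; the helper name `installed_version` is hypothetical:

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name="active-pynference"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

if sys.version_info < (3, 11):
    print("Note: active-pynference has been tested with Python 3.11+")
print("active-pynference:", installed_version() or "not installed")
```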

Now that the package is successfully installed, let's explore what we can do with it !

```python
# First, let's import stuff!
# Python "classics":
import numpy as np

# Active Inference based packages:
import actynf  # import the active-pynference package
print("Imported active-pynference - version " + actynf.__version__)
```

```
Imported active-pynference - version 0.1.2
```


### b. Set up the environment and the mouse model

The active-pynference package relies on a single generic component to build both subject environments and models: the <i>layer</i>.
Let's import it:

```python
from actynf.layer.model_layer import mdp_layer
```

In *actynf*, the **mdp_layer** is a generic Python class that can either compute observations from states and actions (a generative process) or infer states and actions from observations and model parameters (a generative model). To choose between these two roles, the user simply passes `"process"` or `"model"` to the constructor.
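To make the "process" role concrete, here is a hypothetical NumPy sketch of a single generative-process step (it is not actynf's implementation): each outcome modality is sampled from the column of its A matrix selected by the current states, and each state factor moves according to the column of its B matrix selected by the current state and action. The toy model has one factor with two states, observations that reveal the state, and an action 1 that swaps states:

```python
import numpy as np

def process_step(A_list, B_list, states, actions, rng):
    """One generative-process step (illustrative sketch).

    A_list[m] has shape (n_outcomes_m, Ns[0], Ns[1], ...): P(o_m | states)
    B_list[f] has shape (Ns[f], Ns[f], n_actions_f):        P(s_f' | s_f, u_f)
    """
    # Sample one outcome per modality from the column picked by the current states
    obs = [int(rng.choice(A.shape[0], p=A[(slice(None), *states)])) for A in A_list]
    # Sample each factor's next state from the transition column picked by its action
    next_states = [int(rng.choice(B.shape[0], p=B[:, s, u]))
                   for B, s, u in zip(B_list, states, actions)]
    return obs, next_states

A = [np.eye(2)]                                                       # observation = state
B = [np.stack([np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])], axis=-1)]  # action 0: stay, 1: swap
rng = np.random.default_rng(0)
obs, next_states = process_step(A, B, states=[0], actions=[1], rng=rng)
print(obs, next_states)  # [0] [1]
```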

Let's build the environment for our T-maze example:

```python
def build_tmaze_process(pinit,pHA,pWin):
    """
    pinit : probability of the cheese initially being on the left (1 - pinit : on the right)
    pHA : probability of the clue indicating the correct side
    pWin : probability of winning if the mouse picks the correct side

    This function returns a mdp_layer representing the T-maze environment.
    """
    print("T-maze gen. process set-up ...  ",end='')

    T = 3  # The trials are made of 3 timesteps (starting step + 2 others)

    # Initial situation
    d = [np.array([pinit,1-pinit])    ,np.array([1,0,0,0])]
    #  on which side is the cheese | where is the mouse
    Ns = [arr.shape[0] for arr in d] # Number of states

    # Transition matrices between hidden states:
    # a. Transitions between cheese states --> the cheese doesn't move during the trial, and the mouse can't make it move:
    B_context_states = np.array([[[1],[0]],
                                 [[0],[1]]])
    # b. Transitions between mouse position states --> 4 actions possible for the mouse
    B_behav_states = np.zeros((Ns[1],Ns[1],Ns[1]))

    # - 0 --> Move to start from any state
    B_behav_states[0,:,0] = 1
    # - 1 --> Move to clue from start, else go to start
    B_behav_states[:,:,1] = np.array([[0,1,1,1],
                                      [1,0,0,0],
                                      [0,0,0,0],
                                      [0,0,0,0]])
    # - 2 --> Move to choose left from start or hint, else go to start
    B_behav_states[:,:,2] = np.array([[0,0,1,1],
                                      [0,0,0,0],
                                      [1,1,0,0],
                                      [0,0,0,0]])

    # - 3 --> Move to choose right from start or hint, else go to start
    B_behav_states[:,:,3] = np.array([[0,0,1,1],
                                      [0,0,0,0],
                                      [0,0,0,0],
                                      [1,1,0,0]])
    b = [B_context_states, B_behav_states]
    # Note: as you can see, the mouse can't go right then left or left then right: every trial, it has to choose between the two.

    # Active Inference also revolves around a state-observation correspondence that we describe here:

    # 1. Mapping from states to observed hints, depending on cheese & mouse states
    #
    # [ .  . ]  No hint
    # [ .  . ]  Left Hint            Rows = observations
    # [ .  . ]  Right Hint
    # Left Right
    # Columns = cheese state
    A_obs_hints = np.zeros((3,Ns[0],Ns[1]))
    A_obs_hints[0,:,:] = 1
    A_obs_hints[:,:,1] = np.array([[0,0],
                             [pHA, 1-pHA],
                             [1-pHA,pHA]]) # We only get the clue if the mouse moves to state 1

    # 2. Mapping from states to outcome (null / win / loss), depending on cheese & mouse states
    #
    # [ .  . ]  Null
    # [ .  . ]  Win           Rows = observations
    # [ .  . ]  Loss
    # Left Right
    # Columns = cheese state
    A_obs_outcome = np.zeros((3,Ns[0],Ns[1]))
    A_obs_outcome[0,:,:2] = 1
    A_obs_outcome[:,:,2] = np.array([[0,0],           # If we choose left, what is the probability of a win / loss?
                                     [pWin, 1-pWin],  # Columns: cheese on the left, cheese on the right
                                     [1-pWin,pWin]])  # Choice gives an observable outcome
    A_obs_outcome[:,:,3] = np.array([[0,0],           # If we choose right, what is the probability of a win / loss?
                                     [1-pWin, pWin],  # Columns: cheese on the left, cheese on the right
                                     [pWin,1-pWin]])  # Choice gives an observable outcome

    # 3. Mapping from mouse position states to observed mouse position
    #
    # [ .  .  .  .] start
    # [ .  .  .  .] hint
    # [ .  .  .  .] choose left         Rows = observed position
    # [ .  .  .  .] choose right
    #  s   h  l  r
    #                                   Columns = mouse position state (3rd dimension)
    # The 2nd dimension (cheese state) has no influence on this modality
    A_obs_behaviour = np.zeros((Ns[1],Ns[0],Ns[1]))
    for i in range (Ns[1]) :
        A_obs_behaviour[i,:,i] = np.array([1,1])
    a = [A_obs_hints,A_obs_outcome,A_obs_behaviour]

    No = [ai.shape[0] for ai in a] # Number of outcomes

    # Finally, we set up the preferences of the environment (an environment has no preferences, so these are all zero) ...
    c = [np.zeros((No[0],T)),np.zeros((No[1],T)),np.zeros((No[2],T))]
    # ... as well as the allowable actions: each row gives one transition index per state factor
    # (the cheese factor only has transition 0, the mouse position factor has 4)
    u = np.array([[0,0],[0,1],[0,2],[0,3]]).astype(int)

    # Habits
    e = np.ones((u.shape[0],))

    # The environment has been well defined and we may now build a mdp_layer using the following constructor:
    layer = mdp_layer("T-maze_environment","process",a,b,c,d,e,u,T)
    #     mdp_layer(name of the layer,process or model, a,b,c,d,e,u,T)
    print("Done.")
    return layer

# We can test that the layer is well defined by building it and printing a summary:
tmaze_environment = build_tmaze_process(0.5,1.0,1.0)
print(tmaze_environment)
```

```
T-maze gen. process set-up ...  Done.
LAYER T-maze_environment : 
 -------------------------------------
LAYER DIMENSION REPORT (T-maze_environment): 

Observation modalities : 3
    Modality 0 : 3 outcomes.
    Modality 1 : 3 outcomes.
    Modality 2 : 4 outcomes.
Hidden states factors : 2
    Model factor 0 : 2 possible states. 
    Model factor 1 : 4 possible states. 
Number of potential actions : 4
    Factor 0 : 1 possible transitions. 
    Factor 1 : 4 possible transitions. 
-------------------------------------

##################################################
Layer weights :
   Matrix a :
     Modality 0 :
[[[1. 0. 1. 1.]
  [1. 0. 1. 1.]]

 [[0. 1. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 1. 0. 0.]]]
     Modality 1 :
[[[1. 1. 0. 0.]
  [1. 1. 0. 0.]]

 [[0. 0. 1. 0.]
  [0. 0. 0. 1.]]

 [[0. 0. 0. 1.]
  [0. 0. 1. 0.]]]
     Modality 2 :
[[[1. 0. 0. 0.]
  [1. 0. 0. 0.]]

 [[0. 1. 0. 0.]
  [0. 1. 0. 0.]]

 [[0. 0. 1. 0.]
  [0. 0. 1. 0.]]

 [[0. 0. 0. 1.]
  [0. 0. 0. 1.]]]
```
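Beyond the printed summary, a quick sanity check is that every column of every A and B array is a probability distribution over its first axis. The helper `columns_normalized` below is hypothetical (not part of actynf); it is applied to copies of the two transition arrays from `build_tmaze_process`:

```python
import numpy as np

def columns_normalized(mats, atol=1e-9):
    """True if, for every array, the entries along axis 0 sum to 1 everywhere."""
    return all(np.allclose(np.asarray(m).sum(axis=0), 1.0, atol=atol) for m in mats)

# Copies of the transition arrays defined in build_tmaze_process
B_context = np.array([[[1], [0]],
                      [[0], [1]]])
B_behav = np.zeros((4, 4, 4))
B_behav[0, :, 0] = 1
B_behav[:, :, 1] = [[0, 1, 1, 1], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
B_behav[:, :, 2] = [[0, 0, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
B_behav[:, :, 3] = [[0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0]]

print(columns_normalized([B_context, B_behav]))  # True
```

The same check can be run on the A arrays (for any `pHA` and `pWin` between 0 and 1).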


Printing the layer gives an overview of the dimensions and weights we just defined. One advantage of using the same class for processes and models is that the mouse model can be built directly from the environment: below, we start from copies of the environment's matrices. Let's now define the model our mouse is going to entertain (don't worry, it'll be much quicker :) ):

```python
def build_mouse_model(true_process_layer,la,rs,T_horizon,initial_clue_confidence = 0.1):
    """
    true_process_layer : the mdp_layer object where the T-maze environment has been defined
    la : how much the mouse is afraid of adverse outcomes (>0)
    rs : how much the mouse wants to observe cheese (>0)
    T_horizon : how far into the future the mouse plans before picking its next action
    initial_clue_confidence : how strongly the mouse holds its initial, uninformative belief about the clue reliability
    """
    print("T-maze gen. model set-up ...  ",end='')
    T = 3

    # The mouse knows where it stands in the maze initially, but it doesn't know where the cheese will spawn:
    # this is something it will need to learn!
    d = [np.array([0.25,0.25]),np.array([1,0,0,0])]

    # Transition matrices between hidden states (= control states)
    b=[]
    for b_fac_proc in (true_process_layer.b):
        b.append(np.copy(b_fac_proc)*200)
    # The mouse knows how its actions affect the situation and does not need to learn
    # this element. Be aware that too much uncertainty in some situations may prove hard
    # to resolve for our artificial subjects.

    a = []
    for a_mod_proc in (true_process_layer.a):
        a.append(np.copy(a_mod_proc)*200)
    a[0][:,:,1] = initial_clue_confidence*np.array([[0,0],
                                                    [0.25,0.25],
                                                    [0.25,0.25]])
    # The mouse already knows how the cheese position and its own position in the
    # maze relate to its probability of observing cheese. It also knows where
    # it is in the maze at all times. It knows this because it knows where it isn't ;)
    # However, the mouse still has to learn the reliability of the clue.

    # Finally, the preferences of the mouse are governed by the experimenter through the rs/la weights.
    No = [ai.shape[0] for ai in a]

    C_hints = np.zeros((No[0],T))
    C_win_loss = np.array([[0,0,0],        # null
                           [0,rs,rs/2.0],  # win: the mouse would much rather find the cheese at timestep 2 than at timestep 3. Feel free to play with this factor.
                           [0,-la,-la]])   # loss
    C_observed_behaviour = np.zeros((No[2],T))
    c = [C_hints,C_win_loss,C_observed_behaviour]
    # The mouse has no preference towards seeing a clue or being in a given position. However, it does
    # have a preference regarding the outcome of the trial (i.e. seeing the cheese or the mousetrap)

    # The allowable actions have been defined earlier
    u = true_process_layer.U
    # u = np.array([[0,0],[0,1],[0,2],[0,3]]).astype(int)

    # Habits
    e = np.ones((u.shape[0],))

    layer = mdp_layer("mouse_model","model",a,b,c,d,e,u,T,T_horiz=T_horizon)
    # This time, we define our layer as a "model"

    # Here, we give a few hyperparameters guiding the behaviour of our agent:
    layer.hyperparams.alpha = 32 # action precision:
        # for high values the mouse will always perform the action it perceives as optimal, with very little exploration
        # towards actions with similar but slightly lower interest

    layer.learn_options.eta = 1 # learning rate (shared by all channels: a,b,c,d,e)
    layer.learn_options.learn_a = True  # The agent learns the reliability of the clue
    layer.learn_options.learn_b = False # The agent does not learn transitions
    layer.learn_options.learn_d = True  # The agent has to learn the initial position of the cheese
    layer.learn_options.backwards_pass = True  # When learning, the agent performs a backward pass, using its perception of
                                               # states at later timesteps (e.g. I saw that the cheese was on the right at t=3)
                                               # as well as the actions it performed (e.g. and I know that the cheese position has
                                               # not changed between timesteps) to learn more reliable weights (therefore, if my clue was
                                               # a right arrow at t=2, I should memorize that cheese on the right may correlate with
                                               # a right arrow in general)
    print("Done.")
    return layer

mouse_model = build_mouse_model(tmaze_environment,2,3,3,1.0)
print(mouse_model)
```

```
T-maze gen. model set-up ...  Done.
LAYER mouse_model : 
 -------------------------------------
LAYER DIMENSION REPORT (mouse_model): 

Observation modalities : 3
    Modality 0 : 3 outcomes.
    Modality 1 : 3 outcomes.
    Modality 2 : 4 outcomes.
Hidden states factors : 2
    Model factor 0 : 2 possible states. 
    Model factor 1 : 4 possible states. 
Number of potential actions : 4
    Factor 0 : 1 possible transitions. 
    Factor 1 : 4 possible transitions. 
-------------------------------------

##################################################
Layer weights :
   Matrix a :
     Modality 0 :
[[[200.     0.   200.   200.  ]
  [200.     0.   200.   200.  ]]

 [[  0.     0.25   0.     0.  ]
  [  0.     0.25   0.     0.  ]]

 [[  0.     0.25   0.     0.  ]
  [  0.     0.25   0.     0.  ]]]
     Modality 1 :
[[[200. 200.   0.   0.]
  [200. 200.   0.   0.]]

 [[  0.   0. 200.   0.]
  [  0.   0.   0. 200.]]

 [[  0.   0.   0. 200.]
  [  0.   0. 200.   0.]]]
     Modality 2 :
[[[200.
```
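In Active Inference, model weights such as these are usually read as Dirichlet concentration parameters: normalizing each column gives the expected likelihood, and learning adds (learning rate × observation × state posterior) counts. Assuming actynf follows this common scheme (not confirmed by this tutorial), the hypothetical sketch below shows why the 0.25 clue counts are easy to revise while the ×200 weights barely move:

```python
import numpy as np

# Clue counts at the bottom of the maze, as set by build_mouse_model with initial_clue_confidence = 1.0
# Rows = clue observation (none, left arrow, right arrow); columns = cheese (left, right)
clue_counts = 1.0 * np.array([[0.0, 0.0],
                              [0.25, 0.25],
                              [0.25, 0.25]])

def expected_likelihood(counts):
    """Normalize each column of Dirichlet counts into a probability distribution."""
    return counts / counts.sum(axis=0, keepdims=True)

def dirichlet_update(counts, obs_index, state_posterior, eta=1.0):
    """Add eta * outer(one-hot observation, state posterior) to the counts."""
    obs = np.zeros(counts.shape[0])
    obs[obs_index] = 1.0
    return counts + eta * np.outer(obs, state_posterior)

# Three trials where the mouse saw a left arrow and later inferred the cheese was on the left
counts = clue_counts
for _ in range(3):
    counts = dirichlet_update(counts, obs_index=1, state_posterior=np.array([1.0, 0.0]))
print(expected_likelihood(counts))
```

After three such trials, P(left arrow | cheese left) rises from 0.5 to 3.25 / 3.5 ≈ 0.93, whereas adding three counts to a column holding 200 would shift it by less than 2%.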

Now that we have defined our environment (generative process) and the model our mouse will entertain (generative model), we need to describe how the two will interact to form a system. 

To create interactions between layers, we need to establish *links* between some of their inputs and outputs:
- The environment outputs (outcomes) are forwarded to the mouse sensory states (observations)
- The mouse actions (active states) lead to changes in the environment

The resulting interconnected system will then form a dedicated *actynf* object called a *network*.
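Conceptually, these two links close a perception-action loop. The sketch below uses hypothetical toy classes (not actynf's network code) to show the direction of each link over the 3 timesteps of a trial: outcomes flow from environment to agent, and actions flow from agent to environment:

```python
class ToyEnvironment:
    """Hypothetical stand-in for a generative process."""
    def __init__(self):
        self.position = 0          # the mouse starts at position 0
    def observe(self):
        return self.position       # outcome: the current position
    def step(self, action):
        self.position = action     # toy dynamics: the action index is the next position

class ToyAgent:
    """Hypothetical stand-in for a generative model: visits the clue, then goes left."""
    def act(self, observation):
        return 1 if observation == 0 else 2

env, agent = ToyEnvironment(), ToyAgent()
trace = []
for t in range(3):
    o = env.observe()   # link "o" -> "o": environment outcome becomes the agent's observation
    u = agent.act(o)    # the agent picks an action
    env.step(u)         # link "u" -> "u": the agent's action drives the environment transition
    trace.append((o, u))
print(trace)  # [(0, 1), (1, 2), (2, 2)]
```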

```python
from actynf.layer.layer_link import establish_layerLink # the function we use to establish links between layers

# 1. Create a link from observations generated by the environment to the mouse sensory states:
link_obs = establish_layerLink(tmaze_environment,mouse_model,["o","o"])
# 2. Create a link from the mouse actions to the environment transitions:
link_act = establish_layerLink(mouse_model,tmaze_environment,["u","u"])
```

```
Established layerLink between T-maze_environment and mouse_model.
```
