# N Back Task

Theoretical description of the modelling framework: N-Back task.

------------



## Define Statespace

Define all possible variables in the model, formalising the task as a probabilistic mathematical construct.

----------

## Action Space: Output Space

The action space $\mathcal{A}$ is the set of possible actions an agent can take. Each action in the *N-Back* task can be encoded in binary:

$$
\begin{equation}
    \mathcal{A} := 
    \begin{cases}
      1, & \text{if the agent signals an n-back match} \\
      0, & \text{otherwise}
    \end{cases}
\end{equation}
$$

----------

## State Space: Input Space

The state space $\mathcal{S}$ represents the encoding of the environment. It should be rich enough to capture the complexity of the problem, but not so rich that a functional approximation can no longer be learnt from the available data.

In our task, there are $15$ unique letters, encoded:


$$\{A, B, C, D, ..., O\}  \; \rightarrow \; \{1,2,3,...,15\}$$
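The letter encoding above can be sketched in a few lines. The names `LETTER_TO_INT` and `encode_letters` are hypothetical helpers introduced for illustration; they are not part of the project's `process_data` module.

```python
import string

# The 15 task letters A..O, mapped to the integers 1..15.
LETTERS = string.ascii_uppercase[:15]
LETTER_TO_INT = {letter: code for code, letter in enumerate(LETTERS, start=1)}


def encode_letters(sequence):
    """Encode an iterable of task letters as their integer codes."""
    return [LETTER_TO_INT[letter] for letter in sequence]
```

For example, `encode_letters("ABO")` maps the first, second and last task letters to their integer codes.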


#### Covariates

With this encoding, there are $5$ covariates:

$$
\begin{eqnarray} 
\phi_0 &:=& \text{current card} \\
\phi_1 &:=& \text{1 card back} \\
\phi_2 &:=& \text{2 cards back} \\
\phi_3 &:=& \text{3 cards back} \\
\tau &:=& \text{number of occurrences of the current card} 
\end{eqnarray}
$$

where

$$\phi_j \in \{1, 2, \dots, 15\} \quad \text{for } j \in \{0, 1, 2, 3\}$$


----------


## Reduced Space

The full encoding may add undesirable complexity for little explanatory benefit. Instead, we encode the trailing $3$ cards in binary:

$$
\begin{equation}
    \phi_j := 
    \begin{cases}
      1, & \text{if card } j \text{ equals the current card} \\
      0, & \text{otherwise}
    \end{cases}
\end{equation}
$$



for $j \in \{1,2,3\}$. Note that it is no longer necessary to keep track of the current card's value. By using this reduced encoding we assume that individuals would perform similarly if the same experimental instance were presented with different cards, which is a plausible assumption.
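A minimal sketch of this binary encoding follows. The function name `reduced_encoding` is hypothetical, and coding positions that fall before the start of the sequence as `0` is an assumption, since the text does not say how early trials are handled.

```python
def reduced_encoding(sequence, t, k=3):
    """Return [phi_1, ..., phi_k] for the card at position t.

    phi_j is 1 if the card j steps back equals the current card, else 0.
    Assumption: cards that would fall before the start of the sequence
    are treated as non-matching (coded 0).
    """
    current = sequence[t]
    features = []
    for j in range(1, k + 1):
        is_match = t - j >= 0 and sequence[t - j] == current
        features.append(1 if is_match else 0)
    return features
```

With the sequence `[3, 7, 3, 3]` and `t = 3`, the cards one and three steps back equal the current card while the card two steps back does not.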


----------
## Final State Space

The reduced encoding may lack some necessary complexity. Suppose:


> The previous cards do not match the current card, but do match each other.

This could easily confuse the participant. However, under the reduced representation the model cannot distinguish this case from the one where all cards are different. It is still not necessary to capture the actual letter identities, so to add this capability we simply need $4$ encoding possibilities:

$$
\begin{equation}
    \phi_j := 
    \begin{cases}
      \mathcal{a}, & \text{if card } j \text{ matches the current card} \\
      \mathcal{b}, & \text{if card } j \text{ is the first unique card} \\
      \mathcal{c}, & \text{if card } j \text{ is the second unique card} \\
      \mathcal{d}, & \text{if card } j \text{ is the third unique card} 
    \end{cases}
\end{equation}
$$
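One possible reading of this four-way encoding is sketched below. It assumes that "first", "second" and "third unique card" refer to distinct non-matching values in order of appearance when scanning from one card back to three cards back; the text does not fix this ordering, and the function name `categorical_encoding` is hypothetical.

```python
def categorical_encoding(sequence, t):
    """Label the three cards preceding position t with 'a', 'b', 'c' or 'd'.

    'a' marks a card equal to the current card. Non-matching cards are
    labelled 'b', 'c', 'd' by distinct value, in order of first appearance
    when scanning 1, 2, 3 cards back (an assumed ordering).
    """
    if t < 3:
        raise ValueError("at least three previous cards are required")
    current = sequence[t]
    labels = {}                # non-matching card value -> assigned label
    unused = iter("bcd")       # at most three distinct non-matching values
    encoded = []
    for j in (1, 2, 3):
        card = sequence[t - j]
        if card == current:
            encoded.append("a")
        else:
            if card not in labels:
                labels[card] = next(unused)
            encoded.append(labels[card])
    return encoded
```

Under this reading, previous cards that match each other but not the current card share a label, so that case is distinguishable from the one where all previous cards differ.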



----------

## Final Covariates 

Leveraging this reduced form, our final model has $4$ explanatory variables:

$$
\begin{eqnarray} 
\phi_1 &:=& \text{1 card back} \\
\phi_2 &:=& \text{2 cards back} \\
\phi_3 &:=& \text{3 cards back} \\
\tau &:=& \text{number of occurrences of the current card} 
\end{eqnarray}
$$


----------

## Choice Probability

These covariates need to capture the probability of taking an action (signalling an $n$-back match). The data are encoded in the design matrix $X$, with columns:


$$
\begin{eqnarray} 
\{x_1, x_2, x_3\} &  \rightarrow & \{\mathcal{a,b,c,d}\} \\
x_4     &:=& \text{number of occurrences of the current card} 
\end{eqnarray}
$$

#### Interaction Terms

Certain sequence pairs may confuse participants, so one might expect interactions between terms. The coefficients $\{\phi_4, \phi_5, \phi_6\}$ are added for the pairwise interaction terms.


#### Choice Probability

Our choice is binary, so we denote the probability of signalling a match, $P[a=1]$, as $p \in [0,1]$:


$$
\begin{eqnarray} 
  ZX                &=& \phi_0 + \phi_1 x_1 + \phi_2 x_2 + \phi_3 x_3 + 
                        \phi_4 x_1 x_2 + \phi_5 x_1 x_3 + \phi_6 x_2 x_3 + \tau x_4 \\
  \log\frac{p}{1-p}  &=& ZX \\
  \frac{p}{1-p}     &=& \exp\{ ZX \} \\
  p                 &=& \frac{\exp\{ ZX \}}{1 + \exp\{ ZX \}} \\
  p                 &=& \frac{1}{1 + \exp\{ -ZX \}} \\
  p                 &=& \sigma(ZX) 
\end{eqnarray}
$$

This represents the model for a single individual. More flexible function approximators may be tested, but are probably not necessary.
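The logistic form of the choice probability can be sketched as follows. The sketch assumes that $x_1, x_2, x_3$ have already been coded numerically (for instance as binary match indicators), because the products in $ZX$ are not defined for the raw categorical labels; the function names and the coefficient values in the usage note are illustrative only, not fitted estimates.

```python
import math


def linear_predictor(x, phi, tau):
    """Compute ZX for covariates x = (x1, x2, x3, x4).

    phi = (phi_0, ..., phi_6): intercept, three main effects and three
    pairwise interaction coefficients. tau multiplies the occurrence count x4.
    """
    x1, x2, x3, x4 = x
    return (phi[0] + phi[1] * x1 + phi[2] * x2 + phi[3] * x3
            + phi[4] * x1 * x2 + phi[5] * x1 * x3 + phi[6] * x2 * x3
            + tau * x4)


def choice_probability(x, phi, tau):
    """Return p = sigma(ZX), the probability of signalling a match."""
    zx = linear_predictor(x, phi, tau)
    if zx >= 0:
        return 1.0 / (1.0 + math.exp(-zx))   # p = 1 / (1 + exp(-ZX))
    ez = math.exp(zx)                        # equivalent form, avoids overflow
    return ez / (1.0 + ez)                   # p = exp(ZX) / (1 + exp(ZX))
```

As a usage example with made-up coefficients, `choice_probability((0, 1, 0, 2), (-2.0, 0.5, 3.0, 0.5, 0.0, 0.0, 0.0), 0.1)` evaluates $\sigma(1.2)$. The two branches are the two algebraically equivalent expressions for $p$ in the derivation, chosen by sign for numerical stability.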

**_Interaction terms should be added to the $ZX$ formulation._**


---------- 

## Individual Models: Bayesian Model

Each participant should have an individual model (unique parameters) that will be regularised in a Bayesian fashion. Thus each parameter needs to be indexed by:

$$
\begin{eqnarray} 
i &:=& \text{participant i} \\
t &:=& \text{time t} 
\end{eqnarray}
$$

----------


## Additions

A number of additions are planned:
- Bayesian hierarchical framework to regulate variation across individuals
- Fitts' law parameterisation
- Corsi parameterisation
- Navon parameterisation
- WCST parameterisation (possibly)


```
author: Zach Wolpe
email:  zachcolinwolpe@gmail.com
date:   22 June 2021
```



## Optimal Behaviour

#### Deterministic Task

The task is deterministic, so programming the best possible behaviour is trivial:
- Store the sequence of stimuli
- Signal a match whenever the current stimulus equals the one _N_ steps back
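The two steps above can be sketched as a small function. It uses `1` for "signal a match", following the $P[a=1]$ convention of the choice probability; the function name `optimal_responses` is hypothetical.

```python
def optimal_responses(sequence, n):
    """Return the optimal action for every trial of an n-back task.

    The action is 1 when the stimulus equals the one n steps earlier and
    0 otherwise. The first n trials have no n-back stimulus, so they are 0.
    """
    return [
        int(t >= n and sequence[t] == sequence[t - n])
        for t in range(len(sequence))
    ]
```

For instance, `optimal_responses(list("ABABCB"), 2)` flags the third, fourth and sixth stimuli, each of which repeats the stimulus two steps earlier.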


--------

# Formalisation

Here we provide a mathematical formalism to represent the _N-back_ task. The _optimal agent_ will assume the known correct probabilities, whilst the data will be used to infer parameters that capture these probabilities, allowing us to measure the deviation from optimal performance across individuals.


-------

## State Space

The _N Back_ task can be considered a dichotomous state space, with two mutually exclusive states representing whether or not the current stimulus (letter) matches the stimulus _N_ steps prior. That is:


$$State: S = \{Y, N\}$$

Where: 

$$
\begin{eqnarray}
 Y &:=& \text{Yes: there is a match} \\
 N &:=& \text{No: there is not a match} 
\end{eqnarray}
$$

A particular instance is denoted by the lowercase $y$ or $n$ respectively. Further, there is no need to learn transition dynamics, as the states are entirely independent.

> The function of the state space is only to constrain the action space, by conditioning the action space on the state space.


-----------
## Action Space

Each participant has independent parameters quantifying the probability of the possible actions. (Note: the individual parameters can later incorporate joint distributional information by incorporating Bayesian priors over all participants).

The action space is dichotomous and defined as:


- *C* - _*Correct*_:      accurately signalling whether or not there is a match.
- *I* - _*Incorrect*_:    inaccurately signalling whether or not there is a match (a false alarm or a miss).


Thus:

$$Action \ Space: \{C, I\}$$


We can represent the likelihood of taking a specific action as non-stationary probabilities (non-stationary because they change over the course of the task, and are thus indexed by time $t$). An individual probability should be learnt for each participant $p$:


\begin{equation}
\delta_t^p(j) \quad \text{for} \quad j \in \{c, i\}
\end{equation}

That is:


\begin{eqnarray}
\delta_t^p(c) &=& \text{probability that participant } p \text{ takes the correct action at time } t \\
\delta_t^p(i) &=& \text{probability that participant } p \text{ takes the incorrect action at time } t
\end{eqnarray}


#### Dichotomous Actions

Because the actions are binary, we have:

$$\delta_t^p(c) = 1 - \delta_t^p(i)$$

Thus we only need to model:

$$\delta_t^p(a)$$

where $a$ is the dichotomous action taken (signal a match, or do not signal a match).

#### Dependence on State

The action space is conditioned on the state $s$. Intuitively, one can expect the probability of taking the correct action to be higher when the correct action is to do nothing than when the correct action is to signal a match. 

Therefore the parameters ought to capture this dependence on state:

\begin{equation}
\delta_t^p(a|s) \quad \text{where} \quad s \in \{y,n\}
\end{equation}

Thus we arrive at:

\begin{equation}
\delta_t^p(a|s) = \text{probability that participant } p \text{ takes the correct action } c \text{ at time } t \text{, given state } s
\end{equation}


## Transition probability? - Temporal Dependence

In this particular task, we can expect the samples to be fairly independent. We know there are no transition dynamics, as there is no dependence between states. 

_*However, is there a need to capture the effects of time? Or does the parameter updating process account for this?*_


## Parameter Updating


------ 

## Optimal model
The optimal model simply results in 

$$\delta_t^p(a|s) = 1$$

_Our primary interest is to assess how different individuals deviate from this optimality by learning the probabilities as a parameter across individuals._



------

# Instantiate Model

We define the model class with learnable parameters:

$$\delta^p(a|s) \quad \text{for} \quad a \in \{0,1\}, \quad s \in \{0,1\}$$

$t$ is dropped as it does not index a parameter, but rather indicates the parameter value at a given time period $t$ (during training).

Recall that:
- $\delta^p(a|s)$: probability of action $a$ given state $s$
- $a \in \{\text{signal}, \text{no-signal}\}$: whether or not the participant signals the event
- $s \in \{\text{match}, \text{no-match}\}$: whether or not there is actually a match

The conditional is used because we expect the probabilities to differ significantly between states.

## Without Data 

Before training, the parameters can be initialised to arbitrary values. The optimal values are:

- $\delta^p(1|1) = 1$
- $\delta^p(0|1) = 0$
- $\delta^p(1|0) = 0$
- $\delta^p(0|0) = 1$
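As an illustration, these four values can be held in a small lookup table and compared against observed behaviour. The relative-frequency estimate below is only one simple possibility and is an assumption; it is not necessarily the fitting procedure intended for this model, and `OPTIMAL` and `estimate_delta` are hypothetical names.

```python
# Optimal parameters, keyed by (action, state): delta(a|s).
OPTIMAL = {(1, 1): 1.0, (0, 1): 0.0, (1, 0): 0.0, (0, 0): 1.0}


def estimate_delta(actions, states):
    """Estimate delta(a|s) as the share of trials in state s with action a.

    actions and states are equal-length sequences of 0/1 values.
    A state that never occurs yields NaN for both of its entries.
    """
    counts = {s: {a: 0 for a in (0, 1)} for s in (0, 1)}
    for a, s in zip(actions, states):
        counts[s][a] += 1
    delta = {}
    for s in (0, 1):
        total = counts[s][0] + counts[s][1]
        for a in (0, 1):
            delta[(a, s)] = counts[s][a] / total if total else float("nan")
    return delta
```

The gap between an estimated entry and the matching entry of `OPTIMAL` gives a per-participant measure of deviation from optimal behaviour in each state.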

## Fitting Data

We are able to fit the data to these parameters by best 



----------

# Next Steps


- Temporal dependence in the simplest case (N Back task)
- Extend temporal dependence to the more complicated models (Corsi Block Span etc.)
- NBack == above
- Navon == almost identical to this model
- Corsi == greater temporal dependence
- Fitts == known formula; how should this be handled? How do we quantify the deviation and capture the variation over time?
- WCST  == Q-learning multi-arm bandit task 


## RL - J.Shock

RL is generally the _*optimisation technique*_, requiring *many, many* runs. In our case, RL is not the optimisation technique, but rather the framework? How would we optimise this? Over many samples (Bayes)?

## Time Series models
## Gaussian Processes (Cool)
## Markov Models

# Objective
## 1. Specify statistical framework to represent the task

    - *Action* probabilities
    - *State* possible 
    - *Transition* probabilities
    - *Hierarchical* structure

    - timing parameter
    - transition dynamics (dependency on sequence)
        - additional variable capturing history (repeated letters)
        - state space: encoding of last _k_ letters (k=4) 
            --> binary encoding of current letter or not
            --> encoding current & previous letter


# Bayesian Inference
- assumed DGP per participant _p_ (distributional assumptions)
- each participant has a unique parameter/s that draws from the (common) distribution
- Bayes regularises via the assumed prior

- different distribution 

## Hidden Markov Models

True model: underlying process
Participant: behaviour 

Statistics: model is an assumption 


In [1]:
# !conda activate dynocog
# !conda init
# !conda install pandas -y
# !conda install -c pytorch pytorch -y
# !conda install -c conda-forge numpyro
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
import torch
import sys
from tqdm import tqdm
import pickle
import plotly.express as px
import plotly.graph_objects as go

# ---- load data module ----x
sys.path.append('../')  # make the project root importable
import process_data.process_raw_data as prd
from process_data.process_raw_data import batch_processing 

In [6]:
# ---- reprocess raw data ----x
path  = '../data/data_sample/'
path2 = '../data/data_samples_pandas/'
bp    = prd.batch_processing(path)

bp.create_wcst_data()
bp.create_navon_data()
bp.create_nback_data()
bp.create_corsi_data()
bp.create_fitts_data()
bp.convert_data_to_int()
bp.write_to_pickle(path2)
bp.read_from_pickle(path2)
bp.write_class_to_pickle(path2)



        WCST data created
        Navon data created
        N back data created
        Corsi data created
        Fitts data created

In [7]:
# ---- fetch data object ----x
with open('../data/data_samples_pandas/batch_processing_object.pkl', 'rb') as file2:
    bp = pickle.load(file2)
bp.__dict__.keys()

dict_keys(['path', 'mapping', 'data_times', 'participants', 'parti_code', 'n', 'wcst_paths', 'nback_paths', 'corsi_paths', 'fitts_paths', 'navon_paths', 'wcst_data', 'nback_data', 'corsi_data', 'fitts_data', 'navon_data'])

In [15]:
# ---- select the N-back trials of a single participant ----x
sub = bp.nback_data.loc[bp.nback_data.participant == 851366.0]

In [17]:
sub.head()

Unnamed: 0,participant,participant_code,block_number,score,status,miss,false_alarm,reaction_time_ms,match,stimuli,stimuli_n_1,stimuli_n_2
0,851366.0,s.32ff642a-efe0-436f-8075-fa703d677fed.txt,1,1,0,1,0,0,0,3000,1,14
1,851366.0,s.32ff642a-efe0-436f-8075-fa703d677fed.txt,1,2,0,1,0,0,0,3000,2,13
2,851366.0,s.32ff642a-efe0-436f-8075-fa703d677fed.txt,1,3,0,1,0,0,0,3000,3,10
3,851366.0,s.32ff642a-efe0-436f-8075-fa703d677fed.txt,1,4,1,0,0,1,0,3000,1,13
4,851366.0,s.32ff642a-efe0-436f-8075-fa703d677fed.txt,1,5,0,0,0,0,1,867,2,12
