# Computationally rational gaze-based interaction

Andrew Howes & Xiuli Chen

University of Birmingham<br>
Aalto University<br>

The purpose of this tutorial is to introduce an approach to building computationally rational models.

It does the following:

* imports libraries,
* defines a cognitive POMDP for computational rationality,
* defines a theory of gaze-based interaction as a cognitive POMDP,
* defines the external environment (the task),
* combines the theory and external environment into a machine learning problem (a model) that can be solved with Stable-Baselines3,
* trains the model,
* examines the learning curve to ensure that we are generating an approximately optimal policy,
* animates the model behaviour to develop our intuitions about its adaptation,
* compares the model to human data.

Prerequisites:

* foveated vision
* Bayesian integration
* POMDP


In [None]:
# Install Stable-Baselines3
# Only needs to be run once

#!pip install stable_baselines3

In [None]:
# This cell is only for users of Google Colab.
# There is no need to run it if you are using Jupyter notebooks.
# Mount Google drive and change directory into the project folder
# Only needs to be run once

#from google.colab import drive
#drive.mount('/content/drive')

#%cd '/content/drive/MyDrive/CHI22CMT/CHI22_CogMod_Tutorial/03-Reinforcement-Learning/034_Gaze_based_Interaction'


In [None]:
# load required standard modules and configure matplotlib

import numpy as np
import math
import matplotlib.pyplot as plt
import sys

import gym
from gym import spaces

import matplotlib as mpl
%matplotlib inline
mpl.style.use('ggplot')

In [None]:
# Load local modules
# gazetools is a module that contains functions for modeling gaze-based interaction.

from gazetools import *
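
The `gazetools` module is not listed in this notebook, so the helpers it provides (such as `get_distance`) are used below without a visible definition. As a reading aid, the sketch here assumes that `get_distance` returns the Euclidean distance between two 2D points. This is an assumption about its behaviour, not the module's actual code, and the name `euclidean_distance` is introduced only for illustration.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two (x, y) points.

    Assumed stand-in for gazetools.get_distance; not the module's own code.
    """
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Distance from the start fixation (-1, -1) to the centre of the display (0, 0).
start_to_centre = euclidean_distance((-1, -1), (0, 0))
```

On the -1 to 1 scale used throughout, the start fixation is therefore about 1.41 units from the centre of the display.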

### A Cognitive POMDP

<img src="image/cognitive_POMDP.png" alt="Box diagram of a cognitive model." width="300" height="400">

A cognitive POMDP is a framework for specifying cognitive models.

The first step in formalising this framework is to define the architecture of the model. We do this by specifying a class of cognitive theories; instances of this class are defined later.

The class has only a single method, which defines a step through the processes defined in the figure. 

All processes and variables are defined except the agent and its policy! We will show later how to learn the policy with reinforcement learning.

In [None]:
class CognitivePOMDP():

    def __init__(self):
        self.internal_state = {}
        
    def step(self, ext, action):
        ''' Define the cognitive POMDP.'''
        self._update_state_with_action(action)
        response = self._get_response()
        external_state, done = ext.external_env(response)
        stimulus, stimulus_std = self._get_stimulus(ext.external_state)
        self._update_state_with_stimulus(stimulus, stimulus_std)
        obs = self._get_obs()
        reward = self._get_reward()
        return obs, reward, done

### A theory of gaze-based interaction

Each of the entities in CognitivePOMDP must be defined so as to state our theory of gaze-based interaction. The theory makes the following assumptions:

* Target location stimuli are corrupted by Gaussian noise in human vision.
* The standard deviation of noise increases linearly with eccentricity from the fovea.
* Sequences of stimuli are noisily perceived and optimally integrated.
* Intended eye movements (oculomotor actions) are corrupted by signal-dependent Gaussian noise to generate responses.

These assumptions are further described in Chen et al. (2021).
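
To build intuition for the second and third assumptions, here is a minimal one-dimensional sketch using only the standard library. It assumes a hypothetical target position and a gaze that stays at the start location, draws noisy stimuli whose standard deviation grows linearly with eccentricity, and combines each stimulus with the current belief by precision weighting. The function name `integrate` and the values of `target` and `fixation` are illustrative assumptions; the noise weight and the prior match the values used in the notebook's code.

```python
import math
import random

STIMULUS_NOISE_WEIGHT = 0.09  # same value as stimulus_noise_weight in the theory

def integrate(prior_mean, prior_std, obs, obs_std):
    """Precision-weighted combination of a Gaussian prior and one Gaussian observation."""
    w_obs = prior_std**2 / (prior_std**2 + obs_std**2)
    mean = w_obs * obs + (1 - w_obs) * prior_mean
    std = math.sqrt((prior_std**2 * obs_std**2) / (prior_std**2 + obs_std**2))
    return mean, std

rng = random.Random(0)   # seeded so the sketch is repeatable
target = 0.4             # hypothetical true target position (1D)
fixation = -1.0          # gaze stays at the start position in this sketch
mean, std = 0.0, 0.1     # prior belief about the target and its uncertainty

stds = [std]
for _ in range(5):
    eccentricity = abs(target - fixation)
    obs_std = STIMULUS_NOISE_WEIGHT * eccentricity  # noise grows with eccentricity
    obs = rng.gauss(target, obs_std)                # noisy stimulus
    mean, std = integrate(mean, std, obs, obs_std)
    stds.append(std)
```

The belief's standard deviation in `stds` shrinks with every stimulus, because it depends only on the two variances and not on the sampled values. Moving the gaze closer to the target would reduce `obs_std` and make each new stimulus more informative, which is what gives the controller a reason to make intermediate saccades.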

In [None]:


class GazeTheory(CognitivePOMDP):

    def __init__(self):
        ''' Initialise the theoretically motivated parameters.'''
        # weight eye movement noise with distance of saccade
        self.oculamotor_noise_weight = 0.01
        # weight noise with eccentricity
        self.stimulus_noise_weight = 0.09
        # step_cost for the reward function
        self.step_cost = -1
        # initialise the (empty) internal state defined by CognitivePOMDP
        super().__init__()

    def reset_internal_env(self, external_state):
        ''' The internal state includes the fixation location, the latest estimate of 
        the target location and the target uncertainty. Assumes that there is no 
        uncertainty in the fixation location.
        Assumes that width is known. All numbers are on scale -1 to 1.
        The target_std represents the strength of the prior.'''
        self.internal_state = {'fixation': np.array([-1,-1]),  
                               'target': np.array([0,0]), 
                               'target_std': 0.1,
                               'width': external_state['width'],
                               'action': np.array([-1,-1])} 
        return self._get_obs()    

    def _update_state_with_action(self, action):
        self.internal_state['action'] = action
        
    def _get_response(self):
        ''' Take an action and add noise.'''
        # TODO: this method should take internal_state as a parameter
        move_distance = get_distance( self.internal_state['fixation'], 
                                     self.internal_state['action'] )
        
        oculomotor_noise = np.random.normal(0, self.oculamotor_noise_weight * move_distance,
                                            self.internal_state['action'].shape)
        # response is action plus noise
        response = self.internal_state['action'] + oculomotor_noise

        # update the oculomotor state (internal)
        self.internal_state['fixation'] = response
        
        # make an adjustment if response is out of range. 
        response = np.clip(response,-1,1)
        return response
    
    def _get_stimulus(self, external_state):
        ''' define a psychologically plausible stimulus function in which acuity 
        falls off with eccentricity.''' 
        eccentricity = get_distance( external_state['target'], external_state['fixation'] )
        stm_std = self.stimulus_noise_weight * eccentricity
        stimulus_noise = np.random.normal(0, stm_std, 
                                         external_state['target'].shape)
        # stimulus is the external target location plus noise
        stm = external_state['target'] + stimulus_noise
        return stm, stm_std

    
    def _update_state_with_stimulus(self, stimulus, stimulus_std):
        posterior, posterior_std = self.bayes_update(stimulus, 
                                                     stimulus_std, 
                                                     self.internal_state['target'],
                                                     self.internal_state['target_std'])
        self.internal_state['target'] = posterior
        self.internal_state['target_std'] = posterior_std

    def bayes_update(self, stimulus, stimulus_std, belief, belief_std):
        ''' A Bayes-optimal function that integrates multiple stimuli.
        The belief is the prior.'''
        z1, sigma1 = stimulus, stimulus_std
        z2, sigma2 = belief, belief_std
        w1 = sigma2**2 / (sigma1**2 + sigma2**2)
        w2 = sigma1**2 / (sigma1**2 + sigma2**2)
        posterior = w1*z1 + w2*z2
        posterior_std = np.sqrt( (sigma1**2 * sigma2**2)/(sigma1**2 + sigma2**2) )
        return posterior, posterior_std
    
    def _get_obs(self):
        # the Bayesian posterior has already been calculated so just return it.
        # could also return the target_std so that the controller knows the uncertainty 
        # of the observation.
        #return self.internal_state['target']
        return np.array([self.internal_state['target'][0],
                        self.internal_state['target'][1],
                        self.internal_state['target_std']])
    
    def _get_reward(self):
        distance = get_distance(self.internal_state['fixation'], 
                                self.internal_state['target'])
        
        if distance < self.internal_state['width'] / 2:
            reward = 0
        else:
            reward = -distance # a much better model of the psychological reward function is possible.
            
        return reward


### External environment

In order to test the theory we need to define the external environment. 

The external environment allows us to make predictions from the theory for a particular task. The theory also makes predictions for many other tasks, for example adaptation to mixed target widths and distances.

Note that the external environment is a type of auxiliary assumption. Auxiliary assumptions must not be "auxiliary hypotheses". See Gershman (2019): https://link.springer.com/article/10.3758/s13423-018-1488-8

In [None]:
class GazeTask():
    
    def __init__(self):
        self.target_width = 0.15
        self.target_loc_std = 0.3

    def reset_external_env(self):
        ''' The external_state includes the fixation and target location.
        Choose a new target location and reset to the first fixation location.'''
        
        def _get_new_target():
            x_target = np.clip(np.random.normal(0, self.target_loc_std), -1, 1)
            y_target = np.clip(np.random.normal(0, self.target_loc_std), -1, 1)
            return np.array( [x_target, y_target] )
    
        fx = np.array([-1,-1])
        tg = _get_new_target()
        self.external_state = {'fixation':fx, 'target':tg, 'width':self.target_width }
    
    def external_env(self, action):
        self.external_state['fixation'] = action
        
        # determine when the goal has been achieved.
        distance = get_distance(self.external_state['fixation'], 
                                self.external_state['target'])
        # the goal is achieved when the fixation lands within the target
        done = bool(distance < self.external_state['width'] / 2)
        
        return self.external_state, done
    

### Gym environment

In order to find an optimal policy we use the theory and external environment to define a machine learning problem, here, making use of the framework defined by one specific library called gym.

For further information see: https://gym.openai.com/

gym.Env is a class provided by this library. Note that Env here refers to all of the components of the model, including both the internal and external environments, with the exception of the controller.

In [None]:
class GazeModel(gym.Env):
    
    def __init__(self):
        
        def default_box(x):
            return spaces.Box(low=-1, high=1, shape=(x, ), dtype=np.float64)
        
        self.GT = GazeTheory()
        self.TX = GazeTask()        
        
        # Required by gym. These define the range of each variable.
        # Each action has an x,y coordinate therefore the box size is 2.
        # Each obs has an x,y and an uncertainty therefore the box size is 3.
        self.action_space = default_box(2)
        self.observation_space = default_box(3)
        
        # max_fixations per episode. Used to curtail exploration early in training.
        self.max_steps = 500
        
    def reset(self):
        self.n_step = 0
        self.TX.reset_external_env()
        obs = self.GT.reset_internal_env(self.TX.external_state)
        return obs
    
    def step(self, action):
        obs, reward, done = self.GT.step( self.TX, action )
        self.n_step+=1

        # give up if the model has been looking for too long
        if self.n_step > self.max_steps:
            done = True
        
        info = self.get_info()
        return obs, reward, done, info
    
    def get_info(self):
        return {'step': self.n_step,
                'target_width': self.TX.target_width,
                'target_x': self.TX.external_state['target'][0],
                'target_y': self.TX.external_state['target'][1],
                'fixate_x':self.TX.external_state['fixation'][0],
                'fixate_y':self.TX.external_state['fixation'][1] }

### Test the model

Step through the untrained model to check for simple bugs. More comprehensive tests are needed.

In [None]:
model = GazeModel()

model.reset()

i=0
done = False
while not done:
    # make a step with a randomly sampled action
    obs, reward, done, info = model.step(model.action_space.sample())
    i+=1

print(i)

### Train the model

We can train the model to generate a controller.

By plotting the learning curve we can see whether performance improves with training and whether the model approaches optimal performance. We are interested in approximately optimal performance, so if the training curve is not approaching an asymptote then we need to train for more timesteps or revise the model.

We can see that at first the model uses hundreds of fixations to find the target. This is because it has not yet learned to move the gaze in a way that is informed by the observation. As it learns to do this, it takes fewer steps to gaze at the target and its performance improves.

If our problem definition is correct then the model will get more 'human-like' the more that it is trained. In other words, training makes it a better model of interaction.

If we assume that people are computationally rational then the optimal solution to a cognitive problem predicts human behavior.
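
One simple way to judge whether a learning curve is approaching an asymptote is to smooth the per-episode fixation counts and compare the most recent windows. The sketch below does this with made-up episode lengths. The helper `moving_average` and the plateau threshold are illustrative assumptions; it is not the `plot_learning_curve` function from `gazetools`.

```python
def moving_average(values, window):
    """Mean of each consecutive run of `window` values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Made-up episode lengths (fixations per episode), for illustration only.
episode_lengths = [480, 350, 200, 120, 60, 30, 12, 6, 4, 3, 3, 3]

smoothed = moving_average(episode_lengths, window=3)

# Crude plateau check: the last two smoothed values differ by less than one step.
has_plateaued = abs(smoothed[-1] - smoothed[-2]) < 1.0
```

With real training data the window would be much larger (hundreds of episodes), and a curve that is still falling at the end of training suggests that more timesteps are needed.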

In [None]:
timesteps = 200000
#timesteps = 50000

controller = train(model, timesteps)
plot_learning_curve()

### Run the model
Run the trained model and save a trace of each episode to a CSV file.
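
The `run_model` function comes from `gazetools` and is not shown here. As a rough illustration of what a behaviour trace might contain, the sketch below writes per-step records with the same keys that `get_info` returns, using the standard `csv` module. The row values are made up, and the in-memory buffer stands in for a real file; the actual layout of the trace file is an assumption.

```python
import csv
import io

# Two made-up rows with the same keys that GazeModel.get_info() returns.
rows = [
    {'step': 1, 'target_width': 0.15, 'target_x': 0.2, 'target_y': -0.1,
     'fixate_x': 0.35, 'fixate_y': -0.3},
    {'step': 2, 'target_width': 0.15, 'target_x': 0.2, 'target_y': -0.1,
     'fixate_x': 0.21, 'fixate_y': -0.12},
]

buffer = io.StringIO()  # stands in for an open file on disk
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()    # first line: column names
writer.writerows(rows)  # one line per fixation

trace_text = buffer.getvalue()
```

Storing one row per fixation keeps the trace easy to load for later analysis, for example to count fixations per episode or to plot gaze paths.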

In [None]:
run_model( model, controller, 100, 'behaviour_trace.csv' )

### Next

Go to the 'visualise' notebook.

### References
Chen, X., Acharya, A., Oulasvirta, A., & Howes, A. (2021, May). An adaptive model of gaze-based selection. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (pp. 1-11).