# Introduction to post processing and logging with the Empowered Herding simulation
This notebook is intended to provide a high-level overview of the logging from the Empowered Herding simulation. It is not exhaustive documentation(!).

Some terms used in this document are:
- **User / Participant** - the person undertaking the experiment
- **Session** - a single instance of the experiment.  A session starts when the program is first loaded and ends when the program closes.  A session contains multiple trials.
- **Trial** - a single episode of the herding "game".


Original author: Chris Bennett (christopher.bennett@bristol.ac.uk)

Last update: 14/06/2022

## Files

The simulation logs are stored in a directory named with the session id which generated them. The format of the session id is "yyyymmddThhmmss", e.g. "20220604T151812". This is the date and time (in ISO 8601 format) when the data was collected.
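Because the session id encodes a date and time, the standard library can turn it back into a `datetime`. This is a minimal sketch. The helper name `session_start_time` is hypothetical and not part of the simulation code.

```python
from datetime import datetime

def session_start_time(session_id: str) -> datetime:
    """Convert a session id such as "20220604T151812" into a datetime."""
    # yyyymmddThhmmss: date, a literal "T", then the time of day
    return datetime.strptime(session_id, "%Y%m%dT%H%M%S")

print(session_start_time("20220604T151812"))  # 2022-06-04 15:18:12
```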

The following .pkl files should be present in the directory:

 - **user_details**: stores the participant's responses to questions about themselves, e.g. whether English is their first language, date of birth etc.
 - **post_test_responses**: stores the participant's responses to the questions asked after each trial, e.g. "the sliders".
 - **config_fam_X_simlog**: the simulation data (sim_data) from the familiarisation trials.
     - X is numbered 1-4
 - **config_exp_X_Y_Z**: the simulation data (sim_data) from the experimental trials.
     - X is numbered 1-12
     - Y is either
         - absent (the simulation did not show the state of the dogs' empowerment to the participant during the trial)
         - "empshown" (empowerment information was shown)
     - Z is either
         - absent (empowerment was calculated using the "vanilla" method, where all states in which the dog changes the state of the flock are counted)
         - "taskweighted" (empowerment was calculated using a method which only uses states in which the dog effected a change in the flock that moved the flock closer to the goal)
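The naming scheme above can be decoded mechanically. Below is a sketch of that, under two assumptions. First, the optional parts always appear in the order `_empshown` then `_taskweighted`. Second, the name may carry a trailing `_simlog` and/or `.pkl`. `parse_config_name` and `CONFIG_RE` are hypothetical names. The returned keys mirror the `meta_data` keys described later.

```python
import re

# Hypothetical pattern for config_<fam|exp>_<X>[_empshown][_taskweighted][_simlog][.pkl]
# (the empshown-before-taskweighted order is an assumption)
CONFIG_RE = re.compile(
    r"^config_(?P<kind>fam|exp)_(?P<number>\d+)"
    r"(?P<empshown>_empshown)?(?P<taskweighted>_taskweighted)?"
    r"(?:_simlog)?(?:\.pkl)?$"
)

def parse_config_name(name: str) -> dict:
    """Split a config or log file name into its experimental conditions."""
    match = CONFIG_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised config name: {name!r}")
    return {
        "kind": "familiarisation" if match["kind"] == "fam" else "experimental",
        "number": int(match["number"]),
        "empowerment_shown": match["empshown"] is not None,
        "taskweighted_empowerment": match["taskweighted"] is not None,
    }

print(parse_config_name("config_exp_3_empshown_taskweighted_simlog.pkl"))
```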

## Data structures

This notebook gives examples for loading the raw data logged by the empowered herding experiment and turning it into (slightly) more user friendly formats.

After running the cells below, the following view of the data should be available:

 - **user_details_df**: a pandas dataframe storing the participant's answers to screening and consent questions
 - **user_responses_df**: a pandas dataframe storing the participant's answers to each post-test questionnaire (i.e. the sliders).
     - *user_responses_df.attrs* is a dictionary with two keys:
         - 'session_id' (the unique identifier for the session)
         - 'test_order' (the names of the config files in the order they were presented to the participant)
 - **events_df** : a pandas data frame which stores everything the user did during the simulation
 - **sim_data['sheep_logs']** : a dictionary keyed by the agent's id which stores when each sheep agent was created and removed, and its positions, in a structure of type SimLog.AgentState
 - **sim_data['dog_logs']** : a dictionary keyed by the agent's id which stores when each dog agent was created and removed, and its positions, in a structure of type SimLog.AgentState
 
See the section ["Simulation Log Files"](#simulation_log_files) for a walk through for how the simulation stores the actions and state of the simulation at each time step.  


         
         
 




# Setting up the Environment

You will need to edit the paths to match where the simulation logs and the empowered_herding program are located.

Set session_id to the unique identifier of the session you want to analyse.

In [1]:
import os
import numpy as np
import pickle
import glob
import pandas as pd
user_home_dir = os.path.expanduser('~')

# change this to where the empowered herding model is, e.g.
# 'C:\\TB_Phase_Code\\Empowerment_Integration\\Python Model'
os.chdir("C:\\Users\\matth\\TB_Phase_Code\\Empowerment_Integration\\Python Model")

#change this to the name of the session to be analysed
session_id = "20220811T105821"

#change this to match where the logs are stored
# base_path = "C:\\Documents\\simulations\\Empowerment_Integration\\Python Model\\Empowerment Results"
base_path = "C:\\Users\\matth\\TB_Phase_Code\\Empowerment_Integration\\Python Model\\Empowerment Results"
#base_path = os.path.join(user_home_dir, "OneDrive - University of Bristol\\Empowerment Results")

path = os.path.join(base_path, session_id)

#change this to match the name of a single log file of the form config_XXXXXX_simlog.pkl 
log_file_name = "config_exp_1_simlog.pkl"


In [5]:
# This is needed as a workaround to handle simulation logs created during the beta testing.
# The beta logs were pickled against model.SimLog, but the matching class definitions
# are in model.SimLogBetaVersion, so redirect that module name while unpickling.
class RenameUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        renamed_module = module
        if module == "model.SimLog":
            renamed_module = "model.SimLogBetaVersion"

        return super().find_class(renamed_module, name)

def renamed_load(file_obj):
    """Unpickle file_obj using the beta-version SimLog classes."""
    return RenameUnpickler(file_obj).load()

# Retrieving the Participant's Details
These are the details the participant entered in response to the screening questions and consent forms.

In [6]:
#Load the participant's responses
with open(os.path.join(path,"user_details.pkl"), "rb") as input_file:
    user_details = pickle.load(input_file)

In [7]:
print("These are the participant's details")
print("---------------------------")
print(f"Session ID was {user_details[0]}")
print(f"User details were {user_details[1]}")

These are the participant's details
---------------------------
Session ID was 20220808T150232
User details were {'participantnumber': '123', 'english': (('Yes', 0), 0), 'vision': (('Yes', 0), 0), 'colour': (('No', 1), 1), 'age': '2', 'gender': (('Female', 0), 0), 'games': '1'}


And as a pandas data frame

In [8]:
# copy the dict so that re-running this cell doesn't overwrite the loaded answers
user_details_simple = dict(user_details[1])
# multiple-choice answers are stored as nested tuples, e.g. (('Yes', 0), 0); keep just the text
for question in ['english', 'colour', 'vision', 'gender']:
    user_details_simple[question] = user_details_simple[question][0][0]
user_details_df = pd.DataFrame(user_details_simple, index=[0])
user_details_df

| | participantnumber | english | vision | colour | age | gender | games |
|---|---|---|---|---|---|---|---|
| 0 | 123 | Yes | Yes | No | 2 | Female | 1 |


# Retrieving the Participant's Responses to the Post-Trial Questions
These are the responses the participant made with the sliders after each trial.

In [9]:
with open(os.path.join(path,"post_test_responses.pkl"), "rb") as input_file:
    test_responses = pickle.load(input_file)

In [10]:
print(test_responses)

['20220808T150232', ['config_exp_1_empshown', 'config_exp_1'], {'config_exp_1_empshown': {'time': '1', 'engaged': 3, 'part_of_team': 3}, 'config_exp_1': {'time': '4', 'engaged': 3, 'part_of_team': 3}}]


and as a pandas dataframe

In [11]:
#convert all the user responses into a pandas dataframe table
user_responses_df = pd.DataFrame.from_dict(test_responses[2], orient='index')
user_responses_df.attrs = {'session_id': test_responses[0], 'test_order' :  test_responses[1]}

In [12]:
user_responses_df

| | time | engaged | part_of_team |
|---|---|---|---|
| config_exp_1_empshown | 1 | 3 | 3 |
| config_exp_1 | 4 | 3 | 3 |


In [13]:
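# Play it safe and put the rows in the order the trials were run (they should already be in this order...)
# .loc with the list of config names reorders the rows; attrs are re-attached explicitly to be safe
test_order = user_responses_df.attrs['test_order']
saved_attrs = user_responses_df.attrs
user_responses_df = user_responses_df.loc[test_order]
user_responses_df.attrs = saved_attrs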

In [14]:
print(f"Session ID: {user_responses_df.attrs['session_id']}")
#print(f"test order was {user_responses_df.attrs['test_order']}")
print("in the order tested, responses were")
user_responses_df

Session ID: 20220808T150232
in the order tested, responses were


| | time | engaged | part_of_team |
|---|---|---|---|
| config_exp_1_empshown | 1 | 3 | 3 |
| config_exp_1 | 4 | 3 | 3 |


<a id='simulation_log_files'></a>
# Simulation Log Files

Each trial creates a single file named config_XXXXXX_simlog.pkl, where XXXXXX describes the parameters used during the trial.

There should be multiple files with this format inside the directory for a single session.

This section follows an example of opening and reading a single log file.

The commands and approach of this section can be extended to open, process and examine multiple files e.g. if we want to compare the performance of different participants.

The first step is to load the file.

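Note, pickle re-imports the classes it restores, so the module *"model/SimLog.py"* (and, for beta-test logs, *"model/SimLogBetaVersion.py"*) must be importable when the file is loaded, e.g. by working from the program directory set with os.chdir above.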

In [15]:
try:
    with open(os.path.join(path, log_file_name), "rb") as input_file:
        sim_data = pickle.load(input_file)
except (AttributeError, ImportError):
    # The beta sessions created logs using a slightly different version of the data structures
    # (the only difference was userInputs instead of UserInputs!!), so pickle cannot find the class.
    # Unfortunately, this means we need to explicitly tell pickle to use a different version of the SimLog.
    # This workaround was taken from https://stackoverflow.com/questions/2121874/python-pickling-after-changing-a-modules-directory
    with open(os.path.join(path, log_file_name), "rb") as input_file:
        sim_data = renamed_load(input_file)
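The single-file load above extends naturally to every trial in a session directory. Here is a minimal sketch. It assumes all trial logs match `config_*_simlog.pkl` and keys each result by the file name without the `_simlog.pkl` suffix (which, for the example log, equals its `meta_data['config_name']`). `load_session_logs` is a hypothetical helper. For beta-test logs, pass `loader=renamed_load` (defined earlier in this notebook).

```python
import glob
import os
import pickle

def load_session_logs(session_dir, loader=pickle.load):
    """Load every config_*_simlog.pkl file in a session directory.

    Returns a dict keyed by file name minus the "_simlog.pkl" suffix,
    e.g. {"config_exp_1": sim_data, ...}.
    """
    logs = {}
    pattern = os.path.join(session_dir, "config_*_simlog.pkl")
    for file_path in sorted(glob.glob(pattern)):
        key = os.path.basename(file_path)[: -len("_simlog.pkl")]
        with open(file_path, "rb") as input_file:
            logs[key] = loader(input_file)
    return logs

# e.g. all_sim_data = load_session_logs(path)
```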

sim_data is a dictionary which makes use of two data structures, called userInputs* and AgentState, to store all the activity which occurred during a trial.
*Bad naming here: this should be capitalised for consistency with AgentState.

It has the following five keys:

- **dog_logs** : a dictionary keyed by agent id which stores a log of the state of each dog, i.e.,
                  { 1 : AgentState, 2 : AgentState,...dog_id:AgentState,...#Dogs:AgentState}
- **sheep_logs** : a dictionary keyed by agent id which stores a log of the state of each sheep, i.e.,
                  { 10000 : AgentState, 10001 : AgentState,...sheep_id:AgentState,...9999+#Sheep:AgentState}
    
- **world_at_t** : a dictionary keyed by simulation step which stores the ids and positions of all the agents present in the world at each simulation step, i.e.,
                 t : {  'ids' : 1d numpy array, 
                        'positions' : 2d numpy array where the ith row is the position of the ith id}
                        
- **user_log** : a data structure which stores all the user interactions which occurred during the simulation. It has two parts:
    - *user_log.events_at_t* : the agent events which were caused by user input
    - *user_log.input_at_t* : the user inputs (button presses, times and locations)
                    
                    
- **meta_data**: a dictionary which stores information about the simulation.
    - **config_name**: the name of the config which set the simulation parameters
    - **session_id**: the identifier for the **session** and also the time at which the **session** was started. Note, the session id is **not** the start time of the trial.
    - **start_time**: the time at which the **trial** started
    - **end_time**: the time at which the **trial** ended
    - **taskweighted_empowerment**: True if the empowerment was calculated using a task-weighted approach
    - **empowerment_shown**: True if empowerment information (via colour of the dog) was shown to the participant
                    
    

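Note, the dog ids start at 1 and the sheep ids start at 10,000. This is a legacy from the simulation's more humble beginnings. If more than 9,999 dogs are used then the logging will break, because the next dog id would collide with the first sheep id (although this is practically impossible with a time limit of roughly 2 minutes and a limit of 10 concurrent dogs). The example log loaded in this notebook numbers its agents differently (dogs 0-9, sheep 99 and 100), so check which ids are present in a log rather than assuming them.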
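Given the world_at_t layout described above, you can pull out a single agent's path by looking up its id at every tick. This is a sketch. `agent_trajectory` is a hypothetical helper, and the toy dictionary only imitates the structure of `sim_data['world_at_t']` (its values are made up).

```python
import numpy as np

def agent_trajectory(world_at_t, agent_id):
    """Return (ticks, positions) for one agent from a world_at_t dictionary.

    Only ticks at which the agent is present are returned; positions has
    one (x, y) row per returned tick.
    """
    ticks, positions = [], []
    for t in sorted(world_at_t):
        ids = np.asarray(world_at_t[t]["ids"])
        rows = np.flatnonzero(ids == agent_id)  # ids are stored as floats
        if rows.size:
            ticks.append(t)
            positions.append(world_at_t[t]["positions"][rows[0]])
    return np.array(ticks), np.array(positions)

# Toy data with the same layout as sim_data['world_at_t'] (values made up)
world_at_t = {
    0: {"ids": np.array([1.0, 10000.0]), "positions": np.array([[80.0, 80.0], [200.0, 200.0]])},
    1: {"ids": np.array([1.0, 10000.0]), "positions": np.array([[80.0, 78.0], [200.0, 200.0]])},
    2: {"ids": np.array([10000.0]), "positions": np.array([[201.0, 200.0]])},
}
ticks, xy = agent_trajectory(world_at_t, 1)
print(ticks)  # [0 1]
```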
                        
                 
    




In [16]:
print(f"The complete simulation data file for session {session_id}, log file {log_file_name}")
sim_data

The complete simulation data file for session 20220808T150232, log file config_exp_1_simlog.pkl


{'dog_logs': {0: <model.SimLog.AgentState at 0x1c6026f6d30>,
  1: <model.SimLog.AgentState at 0x1c602ee1610>,
  2: <model.SimLog.AgentState at 0x1c60269a520>,
  3: <model.SimLog.AgentState at 0x1c6026985b0>,
  4: <model.SimLog.AgentState at 0x1c6026984c0>,
  5: <model.SimLog.AgentState at 0x1c602698ee0>,
  6: <model.SimLog.AgentState at 0x1c602698760>,
  7: <model.SimLog.AgentState at 0x1c6026987c0>,
  8: <model.SimLog.AgentState at 0x1c6026988b0>,
  9: <model.SimLog.AgentState at 0x1c6729a2cd0>},
 'sheep_logs': {99: <model.SimLog.AgentState at 0x1c6729a2ca0>,
  100: <model.SimLog.AgentState at 0x1c6729a2c70>},
 'world_at_t': {0: {'ids': array([  0.,   1.,   2.,  99., 100.]),
   'positions': array([[ 80.,  80.],
          [ 80., 140.],
          [140.,  80.],
          [200., 200.],
          [220., 190.]])},
  1: {'ids': array([  0.,   1.,   2.,  99., 100.]),
   'positions': array([[ 80.        ,  78.        ],
          [ 80.        , 138.        ],
          [140.        ,  78.     

## Accessing the meta data

In [17]:
print("Meta data for loaded log file")
print("-----------------------------")
sim_data['meta_data']

Meta data for loaded log file
-----------------------------


{'config_name': 'config_exp_1',
 'session_id': '20220808T150232',
 'start_time': datetime.datetime(2022, 8, 8, 15, 5, 35, 996057),
 'end_time': datetime.datetime(2022, 8, 8, 15, 6, 9, 70222),
 'taskweighted_empowerment': False,
 'empowerment_shown': False}
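The start and end times are `datetime` objects, so subtracting them gives the length of the trial. The sketch below copies the two values from the meta data printed above. With a loaded log you would use `sim_data['meta_data']` directly.

```python
import datetime

# Values copied from the meta data printed above
meta_data = {
    "start_time": datetime.datetime(2022, 8, 8, 15, 5, 35, 996057),
    "end_time": datetime.datetime(2022, 8, 8, 15, 6, 9, 70222),
}

# Subtracting two datetimes gives a timedelta
trial_duration = meta_data["end_time"] - meta_data["start_time"]
print(trial_duration.total_seconds())  # 33.074165
```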

## Converting the positions of each agent at each time step into a big numpy table

It's useful to convert the world data into a big table to make calculating metrics easier.

The table is organised into the following dimensions:
- **rows (dimension 0)** are time (simulation ticks)
- **columns (dimension 1)** are the agent ids
- **depth (dimension 2)** holds the cell contents, i.e. the agent's (x, y) position

In [18]:
sim_data['dog_logs']

{0: <model.SimLog.AgentState at 0x1c6026f6d30>,
 1: <model.SimLog.AgentState at 0x1c602ee1610>,
 2: <model.SimLog.AgentState at 0x1c60269a520>,
 3: <model.SimLog.AgentState at 0x1c6026985b0>,
 4: <model.SimLog.AgentState at 0x1c6026984c0>,
 5: <model.SimLog.AgentState at 0x1c602698ee0>,
 6: <model.SimLog.AgentState at 0x1c602698760>,
 7: <model.SimLog.AgentState at 0x1c6026987c0>,
 8: <model.SimLog.AgentState at 0x1c6026988b0>,
 9: <model.SimLog.AgentState at 0x1c6729a2cd0>}

In [19]:
dog_ids = list(sim_data['dog_logs'].keys())
sheep_ids = list(sim_data['sheep_logs'].keys())
col_names = dog_ids + sheep_ids
row_names = list(sim_data['world_at_t'].keys())
# map each simulation tick to a row and each agent id to a column
row_of_tick = {t: i for i, t in enumerate(row_names)}
col_of_id = {agent_id: j for j, agent_id in enumerate(col_names)}

# create a big table with 3 dimensions (time, agents, xy-position)
# and initialise it with nan (nan = agent not present)
world_ts = np.full((len(row_names), len(col_names), 2), np.nan)

# copy the data from the dictionary world_at_t into the big table
for t, world in sim_data['world_at_t'].items():
    # ticks start from 0 (see world_at_t above), so look up the row rather than using t-1
    row_idx = row_of_tick[t]
    positions_t = world['positions']
    # ids are stored as floats, so convert them to ints before looking up the column
    for i, iid in enumerate(world['ids']):
        world_ts[row_idx, col_of_id[int(iid)], :] = positions_t[i]

In [20]:
# we can now access the positions of agents at time t using a simple numpy reference
t = 25
positions_at_t = world_ts[row_of_tick[t], :, :]

print(f"At time {t}:")
for i, agent_id in enumerate(col_names):
    print(f"agent {agent_id} was at position {positions_at_t[i, :]}")
print("\n")    
print(f"Note, a value of [nan, nan] means the agent was not present in the world at time {t}")

At time 25:
agent 0 was at position [114.23916308  74.97341791]
agent 1 was at position [126.11711305 140.04770521]
agent 2 was at position [103.17612294  49.37769886]
agent 3 was at position [nan nan]
agent 4 was at position [nan nan]
agent 5 was at position [nan nan]
agent 6 was at position [nan nan]
agent 7 was at position [nan nan]
agent 8 was at position [nan nan]
agent 9 was at position [nan nan]
agent 99 was at position [199.99207484 200.00419699]
agent 100 was at position [220.00792332 187.99580396]


Note, a value of [nan, nan] means the agent was not present in the world at time 25
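As an example of a metric that this table makes easy, the sketch below computes the flock centre at each tick. It also computes one agent's distance from that centre. `flock_centroid` and `distance_to_centroid` are hypothetical helpers. They assume a `world_ts` array of shape (ticks, agents, 2) and a `col_names` list as built above. The toy values are made up.

```python
import numpy as np

def flock_centroid(world_ts, col_names, sheep_ids):
    """Mean (x, y) position of the sheep present at each tick."""
    sheep_cols = [col_names.index(s) for s in sheep_ids]
    # nanmean skips absent (NaN) agents; a tick with no sheep gives NaN
    # (numpy emits a RuntimeWarning for such ticks)
    return np.nanmean(world_ts[:, sheep_cols, :], axis=1)

def distance_to_centroid(world_ts, col_names, agent_id, centroid):
    """Euclidean distance from one agent to the flock centroid at each tick."""
    col = col_names.index(agent_id)
    return np.linalg.norm(world_ts[:, col, :] - centroid, axis=1)

# Toy table: one dog (id 1) and two sheep, over two ticks; the dog is absent at tick 1
col_names = [1, 10000, 10001]
world_ts = np.array([
    [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]],
    [[np.nan, np.nan], [2.0, 2.0], [4.0, 4.0]],
])
centroid = flock_centroid(world_ts, col_names, [10000, 10001])
print(centroid)  # [[3. 0.] [3. 3.]]
print(distance_to_centroid(world_ts, col_names, 1, centroid))  # [ 3. nan]
```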


## Converting the positions of each agent at each time step into a pandas dataframe
**This isn't working because pandas initialises its cells to NaN (size (1,1)) and we then want to overwrite the cells one by one with a position of size (2,1).**

The numpy code is almost certainly faster but the pandas approach might be easier to read if it worked!

In [21]:
# import pandas as pd
# import numpy as np
# dog_ids = list(sim_data['dog_logs'].keys())
# sheep_ids = list(sim_data['sheep_logs'].keys())
# # col_names = []
# # for iid in dog_ids+sheep_ids:
# #     col_names.append(str(iid) +'x')
# #     col_names.append(str(iid) +'y')
# # print(col_names)

# col_names = dog_ids + sheep_ids
# #create the empty data frame with the columns are the agent ids and the row index is the time step
# world_data_df = pd.DataFrame(index = sim_data['world_at_t'].keys(), columns = col_names, dtype='object')

In [22]:
#world_data_df

In [23]:
# #now need to populate the cells.  There's probably a fast way to do this but for simplicity, just loop through all the world_at_t entries
# for t in sim_data['world_at_t'].keys():
#     row_idx = world_data_df.index==t
#     ids = sim_data['world_at_t'][t]['ids']
#     positions_t = sim_data['world_at_t'][t]['positions']
#     for i,iid in enumerate(ids):
#         col_idx = world_data_df.columns == iid
#         world_data_df.loc[row_idx, col_idx] = [positions_t[i,:]]
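One way round the one-value-per-cell problem is to give x and y their own columns instead of storing a position array in each cell. This is a sketch of that alternative, not the approach attempted above. It first builds a long-format table (one row per tick and agent), then pivots it into a wide view. `world_to_dataframe` is a hypothetical helper, and the toy dictionary only imitates `sim_data['world_at_t']`.

```python
import numpy as np
import pandas as pd

def world_to_dataframe(world_at_t):
    """Flatten world_at_t into a long-format table: one row per (tick, agent)."""
    frames = []
    for t, frame in world_at_t.items():
        positions = np.asarray(frame["positions"])
        frames.append(pd.DataFrame({
            "tick": t,
            "agent_id": np.asarray(frame["ids"]).astype(int),
            "x": positions[:, 0],
            "y": positions[:, 1],
        }))
    return pd.concat(frames, ignore_index=True)

# Toy data with the same layout as sim_data['world_at_t'] (values made up)
world_at_t = {
    0: {"ids": np.array([1.0, 10000.0]), "positions": np.array([[80.0, 80.0], [200.0, 200.0]])},
    1: {"ids": np.array([1.0, 10000.0]), "positions": np.array([[80.0, 78.0], [200.0, 200.0]])},
    2: {"ids": np.array([10000.0]), "positions": np.array([[201.0, 200.0]])},
}
world_long_df = world_to_dataframe(world_at_t)
# Wide view: one row per tick, columns are (coordinate, agent_id); absent agents are NaN
world_wide_df = world_long_df.pivot(index="tick", columns="agent_id", values=["x", "y"])
print(world_wide_df)
```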

## The 'dog_logs' and 'sheep_logs' structures
The logger contains the state information for each agent as a SimLog.AgentState class.

To access the log of a single agent's state:

* **for dogs**: sim_data["dog_logs"][*agent_number*]
* **for sheep**: sim_data["sheep_logs"][*agent_number*]

Note, the states are only recorded for the time when the agent was present in the simulation.  I.e. if a dog is removed then the state record for that dog ends.  Each new dog is added with a new id and starts a new state record.

In [24]:
sim_data['dog_logs'][1]

<model.SimLog.AgentState at 0x1c602ee1610>

The following properties are available:

'id' : an integer

'state' : a dictionary with the following entries

        {
        'id': integer
        'time_created': simulation tick #
        'time_destroyed': simulation tick # (-1 if the agent was still present at the end of the simulation)
        'positions': np 2d array where dimension 0 is time and dimension 1 is x,y position
        'empowerment': np 1d array where cell n is the empowerment of the agent at time ['time'][n]
        'time': the simulation ticks when the agent state was updated; the ith cell relates to the ith row of 'positions' and the ith cell of 'empowerment'
        }
 
 For example, for a dog agent the state information can be accessed as follows:

In [25]:
idog = 1
id = sim_data['dog_logs'][idog].id
tc = sim_data['dog_logs'][idog].state['time_created']
td = sim_data['dog_logs'][idog].state['time_destroyed']
p = sim_data['dog_logs'][idog].state['positions']
time = sim_data['dog_logs'][idog].state['time']
empowerment = sim_data['dog_logs'][idog].state['empowerment']

print(f"Dog with Id {id}")
print(f'at simulation step {tc} the dog was created')
n=4
print(f'at simulation step {time[n]} the dog was at position {p[n,:]}')
print(f'at simulation step {time[n]} the dog had empowerment {empowerment[n]}')
print(f'at simulation step {td} the dog was removed (-1 indicates it was present at the end of the simulation)')

Dog with Id 1
at simulation step 0 the dog was created
at simulation step 3 the dog was at position [ 80.71482967 134.07170678]
at simulation step 3 the dog had empowerment 0.25
at simulation step -1 the dog was removed (-1 indicates it was present at the end of the simulation)
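The `time_created` and `time_destroyed` entries are enough to work out how long each agent was in play. In this sketch, -1 means "still present at the end", so the final tick of the trial is used as the end point instead. `agent_lifetime` is a hypothetical helper, and the result counts elapsed ticks between creation and removal.

```python
def agent_lifetime(state, final_tick):
    """Elapsed ticks between an agent's creation and its removal.

    state is an AgentState.state dict; time_destroyed == -1 means the agent
    was still present at the end, so final_tick is used instead.
    """
    end = state["time_destroyed"]
    if end == -1:
        end = final_tick
    return end - state["time_created"]

# e.g. for every dog in the loaded log:
# final_tick = max(sim_data['world_at_t'])
# lifetimes = {dog_id: agent_lifetime(log.state, final_tick)
#              for dog_id, log in sim_data['dog_logs'].items()}
```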


And to see a single agent's complete state, type:
sim_data['*type*_logs'][*agent_number*].state

e.g., to see the state of the sheep with id=10000:

In [26]:
try:
    print(sim_data['sheep_logs'][10000].state)
except KeyError:
    # the beta tests started the sheep ids from 100, so if it's an older log try this instead
    print(sim_data['sheep_logs'][100].state)

{'id': 100, 'time_created': 0, 'time_destroyed': -1, 'positions': array([[220.        , 190.        ],
       [220.        , 190.        ],
       [220.00039983, 189.99980008],
       ...,
       [603.18285427, 482.9256587 ],
       [604.7798285 , 484.1296825 ],
       [605.85240174, 485.38312124]]), 'time': array([   0,    0,    1, ..., 1551, 1552, 1553]), 'empowerment': array([-1, -1, -1, ..., -1, -1, -1])}


To plot the empowerment of a dog over the course of a run, pair each logged empowerment value with the time at which it was logged.

In [27]:
#e.g. for dog with id 1.  The first column is simulation tick and the second column is empowerment
for i in zip(sim_data['dog_logs'][1].state['time'], sim_data['dog_logs'][1].state['empowerment']):
    print(i)

(0, -1.0)
(0, 0.0)
(1, 0.25)
(2, 0.25)
(3, 0.25)
(4, 0.25)
(5, 0.25)
(6, 0.25)
(7, 0.25)
(8, 0.25)
(9, 0.25)
(10, 0.25)
(11, 0.25)
(12, 0.25)
(13, 0.25)
(14, 0.25)
(15, 0.25)
(16, 0.25)
(17, 0.25)
(18, 0.25)
(19, 0.25)
(20, 0.25)
(21, 0.25)
(22, 0.25)
(23, 0.25)
(24, 0.25)
(25, 0.25)
(26, 0.25)
(27, 0.25)
(28, 0.25)
(29, 0.25)
(30, 0.25)
(31, 0.25)
(32, 0.25)
(33, 0.25)
(34, 0.25)
(35, 0.3783976025999559)
(36, 0.407266248137244)
(37, 0.5479243801998422)
(38, 0.6063067000261433)
(39, 0.6673891994940552)
(40, 0.7232208874852512)
(41, 0.7528964097928593)
(42, 0.8139476779647178)
(43, 0.8695502688467219)
(44, 0.92506294787172)
(45, 0.9801972300253203)
(46, 1.0209048203094246)
(47, 1.0601382053603072)
(48, 1.097455321093267)
(49, 1.136284600756318)
(50, 1.169709264628788)
(51, 1.1973874222331804)
(52, 1.2191612530279807)
(53, 1.2350683691271835)
(54, 1.2453152293739014)
(55, 1.2502086128547956)
(56, 1.2500616491318162)
(57, 1.2451066356053437)
(58, 1.2354424783716813)
(59, 1.221025729822127
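The pairs above can also be wrapped in a pandas Series indexed by tick, which can then be plotted with `.plot()` (this needs matplotlib). Tick 0 appears twice in the output above, so this sketch keeps only the last value logged for each tick. Keeping the last value is an assumption about which reading is meaningful. `empowerment_series` is a hypothetical helper.

```python
import pandas as pd

def empowerment_series(state):
    """Empowerment of one agent as a pandas Series indexed by simulation tick.

    If a tick was logged more than once, the last value is kept (an assumption).
    """
    series = pd.Series(state["empowerment"], index=state["time"], name="empowerment")
    series.index.name = "tick"
    return series[~series.index.duplicated(keep="last")]

# e.g. empowerment_series(sim_data['dog_logs'][1].state).plot()
```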

Using numpy, we can combine the empowerment time series for all dogs into a big table and call it empowerment_ts.  

In [28]:
import numpy as np
dog_ids = list(sim_data['dog_logs'].keys())
col_names = dog_ids
row_names = list(sim_data['world_at_t'].keys())
# create a big table with 2 dimensions (time, agents)
# and initialise it with nan
empowerment_ts = np.full((len(row_names), len(col_names)), np.nan)

# copy each dog's logged empowerment from dog_logs into its column of the big table
for col_idx, dog_id in enumerate(dog_ids):
    dog_state = sim_data['dog_logs'][dog_id].state
    # the logged ticks are used directly as row indices (ticks start from 0).
    # Tick 0 can appear twice in 'time' (see above); numpy does not guarantee
    # which of the duplicate assignments ends up in the table
    row_idx = dog_state['time']
    empowerment_ts[row_idx, col_idx] = dog_state['empowerment']

print("The full empowerment table is...")
print("(A value of nan means the agent was not present at that time step)")
print("\n")
empowerment_ts


The full empowerment table is...
(A value of nan means the agent was not present at that time step)




array([[ 0.        ,  0.        ,  0.        , ..., -1.        ,
        -1.        , -1.        ],
       [ 0.        ,  0.25      ,  0.25      , ...,         nan,
                nan,         nan],
       [ 0.        ,  0.25      ,  0.25      , ...,         nan,
                nan,         nan],
       ...,
       [ 0.        ,  0.        ,  0.73030345, ...,  0.61383566,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.73644694, ...,  0.62299359,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.73111624, ...,  0.61215404,
         0.        ,  0.        ]])

In [29]:
print("Or if you really want to handle it as a pandas dataframe")
# label each column with the id of the dog it holds (the columns follow the order of dog_ids)
df = pd.DataFrame(empowerment_ts, columns=["dog" + str(dog_id) for dog_id in dog_ids])
df.index.name = 'tick'
df

Or if you really want to handle it as a pandas dataframe


| tick | dog0 | dog1 | dog2 | dog3 | dog4 | dog5 | dog6 | dog7 | dog8 | dog9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.00 | 0.000000 | -1.0 | -1.0 | -1.000000 | -1.0 | -1.000000 | -1.0 | -1.0 |
| 1 | 0.0 | 0.25 | 0.250000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 0.0 | 0.25 | 0.250000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 0.0 | 0.25 | 0.250000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 0.0 | 0.25 | 0.250000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1549 | 0.0 | 0.00 | 0.724769 | 0.0 | 0.0 | 0.196565 | 0.0 | 0.615772 | 0.0 | 0.0 |
| 1550 | 0.0 | 0.00 | 0.722457 | 0.0 | 0.0 | 0.193422 | 0.0 | 0.610673 | 0.0 | 0.0 |
| 1551 | 0.0 | 0.00 | 0.730303 | 0.0 | 0.0 | 0.191586 | 0.0 | 0.613836 | 0.0 | 0.0 |
| 1552 | 0.0 | 0.00 | 0.736447 | 0.0 | 0.0 | 0.191113 | 0.0 | 0.622994 | 0.0 | 0.0 |


Dimension 0 is time and dimension 1 is the dog. The [i,j] cell is therefore the empowerment at simulation step i of the jth dog in dog_ids (i.e. the dog with id dog_ids[j]).

In [30]:
t = 3
dog_id = 1
# the columns follow the order of dog_ids, so look up the column for this id
# (dog ids don't necessarily start at 0)
col_idx = dog_ids.index(dog_id)
print("the dimensions of the table are:" + " " + str(empowerment_ts.shape))
print(f"the empowerment of dog {dog_id} at t={t} was {empowerment_ts[t, col_idx]}")
print(f"Note, a value of nan means the agent was not present in the world at time {t}")


the dimensions of the table are: (1554, 10)
the empowerment of dog 1 at t=3 was 0.25
Note, a value of nan means the agent was not present in the world at time 3
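A natural summary is each dog's mean empowerment over the ticks it was present. This sketch uses `np.nanmean` so that absent ticks (NaN) are skipped. It also treats the -1 values seen at tick 0 as a "not yet calculated" placeholder and drops them. That reading of -1 is an assumption, not something the logs document. `mean_empowerment_per_dog` is a hypothetical helper.

```python
import numpy as np

def mean_empowerment_per_dog(empowerment_ts, placeholder=-1.0):
    """Mean empowerment of each dog (column), ignoring absent ticks.

    NaN marks ticks where a dog was not present; values equal to
    `placeholder` are assumed to mean "not calculated" and are ignored too.
    """
    values = np.where(empowerment_ts == placeholder, np.nan, empowerment_ts)
    # a column with no valid values gives NaN (with a RuntimeWarning)
    return np.nanmean(values, axis=0)

# e.g. dict(zip(dog_ids, mean_empowerment_per_dog(empowerment_ts)))
```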


## Accessing the logs of what the participant did
sim_data['user_log'] is a data structure which stores all the user interactions which occurred during the simulation.

It has two parts
- **sim_data['user_log'].events_at_t** : the agent events which were caused by user input
- **sim_data['user_log'].input_at_t** : the user inputs (button presses, times and locations)

### sim_data['user_log'].events_at_t 
When a user interacts with the simulation, a data "frame" is created in the form of a dictionary with the following structure:

    {
        'id' : ordered list of the agent ids affected by the user
        'event' : ordered list of the action taken on each id (either add or remove)
        'grid_position' : the grid square the mouse click occurred in (this is **not** the same as the screen coordinates)
    }

The frames are stored in a further dictionary that can be indexed by the simulation timestep at which the event took place. (In the example log loaded in this notebook, the position key is called 'position' rather than 'grid_position'.)

e.g. *sim_data['user_log'].events_at_t[t]* accesses the "frame" at time t.

and *sim_data['user_log'].events_at_t[5]['id']* will return the list of affected ids at t=5.

To see **all** time steps the participant interacted on:

In [31]:
sim_data['user_log'].events_at_t

{0: {'id': [0, 1, 2, 99, 100, 3, 4, 5, 6, 7, 8, 9],
  'event': ['add',
   'add',
   'add',
   'add',
   'add',
   'add',
   'add',
   'add',
   'add',
   'add',
   'add',
   'add'],
  'position': [array([80, 80]),
   array([ 80, 140]),
   array([140,  80]),
   array([200, 200]),
   array([220, 190]),
   array([226, 181]),
   array([266, 201]),
   array([266, 201]),
   array([269, 205]),
   array([272, 207]),
   array([278, 213]),
   array([292, 222])]}}

and to see what the user did at time t...

In [32]:
t=10
if t not in sim_data['user_log'].events_at_t.keys():
    print(f"no participant interaction at t={t}")
else:
    print(sim_data['user_log'].events_at_t[t])

no participant interaction at t=10
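Because 'id' and 'event' are parallel ordered lists, you can flatten events_at_t into one record per agent event. This is a sketch. `flatten_events` is a hypothetical helper, and the toy dictionary only imitates the structure described above.

```python
def flatten_events(events_at_t):
    """One (tick, agent_id, event) tuple per agent event, in tick order."""
    rows = []
    for t in sorted(events_at_t):
        frame = events_at_t[t]
        # 'id' and 'event' are parallel, ordered lists
        for agent_id, event in zip(frame["id"], frame["event"]):
            rows.append((t, agent_id, event))
    return rows

# Toy data with the same layout as sim_data['user_log'].events_at_t (values made up)
events_at_t = {
    0: {"id": [10000, 1], "event": ["add", "add"]},
    20: {"id": [2], "event": ["remove"]},
    12: {"id": [2], "event": ["add"]},
}
print(flatten_events(events_at_t))
# [(0, 10000, 'add'), (0, 1, 'add'), (12, 2, 'add'), (20, 2, 'remove')]
```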


### sim_data['user_log'].input_at_t
The actual inputs the participant made are recorded and stored in sim_data['user_log'].input_at_t.

The indexing and structure is very similar to sim_data['user_log'].events_at_t except the frame created when a user interacts with the simulation has different fields:

    {
        'type': the name of the (mouse) button pressed
        'screen_position': the position of the mouse pointer on the **screen** when the button was pressed
        'grid_position': the position of the mouse pointer on the **window** when the button was pressed
        'realtime': the real-world time of the event, in the format datetime.datetime(yyyy, mm, dd, hh, mm, ss, microsecond)

    }
    
**Important note**, because in principle more than one event can occur at a single time step, each of the fields is a python list.

So to see **all** the inputs recorded during the simulation...

In [33]:
sim_data['user_log'].input_at_t

{137: {'type': ['MB:DOWN:LEFT'],
  'screen_position': [(226, 181)],
  'realtime': [datetime.datetime(2022, 8, 8, 15, 5, 38, 961704)]},
 201: {'type': ['MB:DOWN:LEFT'],
  'screen_position': [(266, 201)],
  'realtime': [datetime.datetime(2022, 8, 8, 15, 5, 40, 316283)]},
 210: {'type': ['MB:DOWN:LEFT'],
  'screen_position': [(266, 201)],
  'realtime': [datetime.datetime(2022, 8, 8, 15, 5, 40, 509059)]},
 224: {'type': ['MB:DOWN:LEFT'],
  'screen_position': [(269, 205)],
  'realtime': [datetime.datetime(2022, 8, 8, 15, 5, 40, 801325)]},
 236: {'type': ['MB:DOWN:LEFT'],
  'screen_position': [(272, 207)],
  'realtime': [datetime.datetime(2022, 8, 8, 15, 5, 41, 58367)]},
 243: {'type': ['MB:DOWN:LEFT'],
  'screen_position': [(278, 213)],
  'realtime': [datetime.datetime(2022, 8, 8, 15, 5, 41, 203520)]},
 268: {'type': ['MB:DOWN:LEFT'],
  'screen_position': [(292, 222)],
  'realtime': [datetime.datetime(2022, 8, 8, 15, 5, 41, 735848)]}}

We can see the specific input the user made on time step t by typing sim_data['user_log'].input_at_t[t]

In [34]:
t=12
if t not in sim_data['user_log'].input_at_t.keys():
    print(f"no participant interaction at t={t}")
else:
    print(sim_data['user_log'].input_at_t[t])

no participant interaction at t=12


and to access the time of the input at timestep t

In [35]:
t=12
if t not in sim_data['user_log'].input_at_t.keys():
    print(f"no participant interaction at t={t}")
else:
    print(sim_data['user_log'].input_at_t[t]['realtime'][0])

no participant interaction at t=12
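Each input carries a real-world timestamp, so you can measure the time between consecutive clicks. This is a sketch. `inter_input_intervals` is a hypothetical helper, and the two timestamps are copied from the input_at_t output above.

```python
import datetime

def inter_input_intervals(input_at_t):
    """Seconds between consecutive participant inputs, in tick order."""
    times = [when for t in sorted(input_at_t) for when in input_at_t[t]["realtime"]]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]

# Two of the input times printed above (ticks 137 and 201)
input_at_t = {
    137: {"realtime": [datetime.datetime(2022, 8, 8, 15, 5, 38, 961704)]},
    201: {"realtime": [datetime.datetime(2022, 8, 8, 15, 5, 40, 316283)]},
}
print(inter_input_intervals(input_at_t))  # [1.354579]
```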


### Create a pandas data frame of the user events
Instead of accessing events using the dictionary structures of the previous section, we can combine events_at_t and input_at_t into a single pandas data frame holding all the events for a single trial.

The approach can be extended to create a list of dataframes (or your favourite flavour of array-like structure) to hold the results from multiple trials and experiments. 



First load the log file...

In [36]:
# we did this at the start of the simulation section

Next, load the "input_at_t" dictionary into a data frame indexed by the times at which events took place.  

In [37]:
df_user_inputs = pd.DataFrame.from_dict(sim_data['user_log'].input_at_t, orient='index')
df_user_inputs.index.name = 'sim_time'

In [38]:
df_user_inputs

| sim_time | type | screen_position | realtime |
|---|---|---|---|
| 137 | [MB:DOWN:LEFT] | [(226, 181)] | [2022-08-08 15:05:38.961704] |
| 201 | [MB:DOWN:LEFT] | [(266, 201)] | [2022-08-08 15:05:40.316283] |
| 210 | [MB:DOWN:LEFT] | [(266, 201)] | [2022-08-08 15:05:40.509059] |
| 224 | [MB:DOWN:LEFT] | [(269, 205)] | [2022-08-08 15:05:40.801325] |
| 236 | [MB:DOWN:LEFT] | [(272, 207)] | [2022-08-08 15:05:41.058367] |
| 243 | [MB:DOWN:LEFT] | [(278, 213)] | [2022-08-08 15:05:41.203520] |
| 268 | [MB:DOWN:LEFT] | [(292, 222)] | [2022-08-08 15:05:41.735848] |


Now load the events_at_t into a data frame indexed by the times at which events took place.

In [39]:
events_df = pd.DataFrame.from_dict(sim_data['user_log'].events_at_t, orient='index')
events_df.index.name = 'sim_time'

In [40]:
events_df

| sim_time | id | event | position |
|---|---|---|---|
| 0 | [0, 1, 2, 99, 100, 3, 4, 5, 6, 7, 8, 9] | [add, add, add, add, add, add, add, add, add, ... | [[80, 80], [80, 140], [140, 80], [200, 200], [... |


Finally, combine df_user_inputs and events_df into a single data frame called events_df.

In [42]:
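# Combine the user inputs and the agent events into a single table.
# An outer join on sim_time keeps ticks that appear in only one of the two tables
# (assigning the columns directly would silently drop inputs at ticks with no agent event).
inputs_renamed = df_user_inputs.rename(columns={'screen_position': 'screen_click_coords',
                                                'grid_position': 'grid_click_coords',
                                                'type': 'user_input',
                                                'realtime': 'time_of_input'})
events_df = events_df.join(inputs_renamed, how='outer')
# rename returns a new data frame, so the result has to be assigned
events_df = events_df.rename(columns={'id': 'agent_id'})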

In [None]:
events_df

| sim_time | id | event | grid_position | screen_click_coords | user_input | grid_click_coords | time_of_input |
|---|---|---|---|---|---|---|---|
| 0 | [10000, 10001, 10002, 10003, 10004, 1] | [add, add, add, add, add, add] | [(20, 20), (22, 19), (19, 22), (23, 21), (21, ... | NaN | NaN | NaN | NaN |
| 12 | [2] | [add] | [[6, 38]] | [(381, 71)] | [MB:DOWN:LEFT] | [[7, 38]] | [2022-06-14 16:41:40.300029] |
| 15 | [3] | [add] | [[14, 19]] | [(188, 163)] | [MB:DOWN:LEFT] | [[16, 19]] | [2022-06-14 16:41:40.502485] |
| 17 | [4] | [add] | [[26, 12]] | [(143, 276)] | [MB:DOWN:LEFT] | [[28, 14]] | [2022-06-14 16:41:40.634133] |


Side note: the simulation uses two coordinate systems for position.

The simulation uses a grid world to track the position of the agents.  grid_position is where the participant clicked on that grid.  

The simulation displays the graphical elements at a given resolution. screen_click_coords is where the participant clicked on the screen, relative to the screen's top-left corner (TODO: check it's definitely top left!).

