# Behavioural Pre-processing Eyetracking Grief
This script preprocesses the second half of the eyetracking grief experiment.
1. It reads the .csv output file from OS.
2. It isolates the behavioural responses (RT) of the participants.
3. It provides two useful columns: the stimulus and the participant's response time.

## Define WD
- check current working directory
- define working directory 

In [1]:
# Step 1
import os
print(os.getcwd())

# Step 2
os.chdir("C:\\Users\\katec\\Documents\\Eyetracking_grief\\Analysis\\Behavioural") 
print(os.getcwd())

C:\Users\katec\Documents\Eyetracking_grief\Analysis\Behavioural
C:\Users\katec\Documents\Eyetracking_grief\Analysis\Behavioural
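
The hard-coded path with doubled backslashes only works on one Windows machine. A minimal sketch of a more portable alternative, assuming the same folder layout under the user's home directory (`project_dir` is a hypothetical helper, not part of the original script):

```python
from pathlib import Path

def project_dir(home=None):
    """Build the analysis folder path from its parts (hypothetical helper)."""
    # Assumption: the project lives under <home>/Documents, as in the path above.
    home = Path(home) if home is not None else Path.home()
    return home / "Documents" / "Eyetracking_grief" / "Analysis" / "Behavioural"

wd = project_dir()
print(wd)
```

If the folder exists, `os.chdir(wd)` could then replace the hard-coded string.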


## Read the data
- import necessary package to read & transform the data 
- read the data
- transform it to datamatrix

In [2]:
from datamatrix import io, DataMatrix
from datamatrix import operations as ops
from datamatrix import functional as fnc

To run one participant at a time, use the cell below instead. Note that you then need to change the datamatrix name in the later cells from dm_all to dm.

In [3]:
# dm = io.readtxt('subject-1.csv')
# print(dm)

Otherwise, combine all participants' data into one datamatrix.

In [4]:
import os

files = os.listdir("C:\\Users\\katec\\Documents\\Eyetracking_grief\\Analysis\\Behavioural")
print(files)

# define the files that are in the same folder but are not data
unwanted = ['.ipynb_checkpoints', 'BG_prepro_v01.ipynb', 'output']

# remove these files from the inputlist
files = [ele for ele in files if ele not in unwanted]

dm_all = DataMatrix()
for file in files:    
    dm = io.readtxt(file)
    dm_all <<= dm

print(dm_all)
print(dm_all.column_names)

['.ipynb_checkpoints', 'BG_prepro_v01.ipynb', 'output', 'subject-1.csv', 'subject-2.csv', 'subject-3.csv', 'subject-4.csv', 'subject-5.csv', 'subject-6.csv']
+----+-----------+-----------+-----------------------+-----------+------------+------+
| #  |    acc    |  accuracy | average_response_time |   avg_rt  | background | bidi |
+----+-----------+-----------+-----------------------+-----------+------------+------+
| 0  | undefined | undefined |       undefined       | undefined |   black    | yes  |
| 1  | undefined | undefined |       undefined       | undefined |   black    | yes  |
| 2  | undefined | undefined |       undefined       | undefined |   black    | yes  |
| 3  | undefined | undefined |       undefined       | undefined |   black    | yes  |
| 4  | undefined | undefined |       undefined       | undefined |   black    | yes  |
| 5  | undefined | undefined |       undefined       | undefined |   black    | yes  |
| 6  | undefined | undefined |       undefined       | unde
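
Listing unwanted names by hand breaks as soon as another non-data file lands in the folder. As a hedged alternative, the sketch below selects files by their `subject-<n>.csv` name pattern instead. `find_subject_files` is a hypothetical helper, and the pattern is an assumption based on the file names printed above.

```python
import re
import tempfile
from pathlib import Path

def find_subject_files(folder):
    """Return the subject-<n>.csv files in `folder`, sorted by subject number."""
    pattern = re.compile(r"subject-(\d+)\.csv$")
    matches = []
    for path in Path(folder).iterdir():
        m = pattern.match(path.name)
        if m and path.is_file():
            matches.append((int(m.group(1)), path))
    # Sort numerically so subject-10 comes after subject-9, not after subject-1.
    return [path for _, path in sorted(matches)]

# Demo on a throwaway folder that mimics the listing above.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["subject-10.csv", "subject-2.csv", "BG_prepro_v01.ipynb", "subject-1.csv"]:
        (Path(tmp) / name).touch()
    (Path(tmp) / "output").mkdir()
    found_names = [p.name for p in find_subject_files(tmp)]

print(found_names)  # ['subject-1.csv', 'subject-2.csv', 'subject-10.csv']
```

Each returned path, converted with `str(path)`, could then be read in the loop above, which would also remove the dependence on the current working directory.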

## Explore the data
- print first two rows from the dm to see your variables and their values
- choose what you need
- create your subset by dropping any unnecessary variables

What do we need here? 
1. Subject Number
2. Only the behavioural trials: "free_choice" & "free_choice_words"
3. The stimulus (portraits_a)
4. The response time:
    - **response_time**: interval in ms between the display of the image and the keypress.
    - **time_wait_keypress**: timestamp of a keypress since the start of the experiment.

For now, we can additionally include the variables below for a quick sanity check:

- **time_fc_portraits_trial**: timestamp of when a fc (i.e. free choice) trial ended.
- **time_portraits_fc**: timestamp of the last time a fc portrait sketchpad was shown.
- **time_fcw_trial**: timestamp of when a fcw (i.e. free choice words) trial ended.
- **time_fcw_portraits**: timestamp of the last time a fcw portrait sketchpad was shown.
    

In [5]:
# Drop any unnecessary columns

dm_all = ops.keep_only(dm_all, dm_all.subject_nr, dm_all.portraits_a, dm_all.words_list, dm_all.response_time, dm_all.time_wait_keypress,  dm_all.time_portraits_fc, dm_all.time_fc_portraits_trial, 
                   dm_all.time_fcw_trial, dm_all.time_fcw_portraits)
print(dm_all)

+----+-----------------+---------------+------------+-------------------------+--------------------+----------------+
| #  |   portraits_a   | response_time | subject_nr | time_fc_portraits_trial | time_fcw_portraits | time_fcw_trial |
+----+-----------------+---------------+------------+-------------------------+--------------------+----------------+
| 0  |        NA       |  1.154900E+04 |     1      |            NA           |         NA         |       NA       |
| 1  |        NA       |  1.154900E+04 |     1      |            NA           |         NA         |       NA       |
| 2  |        NA       |  1.154900E+04 |     1      |            NA           |         NA         |       NA       |
| 3  |        NA       |  1.154900E+04 |     1      |            NA           |         NA         |       NA       |
| 4  |        NA       |  1.154900E+04 |     1      |            NA           |         NA         |       NA       |
| 5  |        NA       |  1.154900E+04 |     1      |   

In [6]:
# Drop rows from the first task; keep only the free_choice(fc) & free_choice_words(fcw) trials

dm_all = (dm_all.portraits_a != "NA")
print(dm_all)

+----+-----------------+---------------+------------+-------------------------+--------------------+----------------+
| #  |   portraits_a   | response_time | subject_nr | time_fc_portraits_trial | time_fcw_portraits | time_fcw_trial |
+----+-----------------+---------------+------------+-------------------------+--------------------+----------------+
| 15 | portrait_s1.jpg |  2.488000E+03 |     1      |       6.890200E+04      |         NA         |       NA       |
| 16 | portrait_d1.jpg |  1.959000E+03 |     1      |       7.147900E+04      |         NA         |       NA       |
| 17 |     la1.jpg     |  1.047000E+03 |     1      |       7.353200E+04      |         NA         |       NA       |
| 18 | portrait_d2.jpg |  1.406000E+03 |     1      |       7.466900E+04      |         NA         |       NA       |
| 19 |     la2.jpg     |  1.775000E+03 |     1      |       7.617400E+04      |         NA         |       NA       |
| 20 | portrait_s2.jpg |  3.834000E+03 |     1      |   

In [7]:
# Add a column with a running trial number across all participants (for convenience)

dm_all.trial = range(len(dm_all))
dm_all.trial += 1

print(dm_all.trial)
print(dm_all.column_names)

col[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
['portraits_a', 'response_time', 'subject_nr', 'time_fc_portraits_trial', 'time_fcw_portraits', 'time_fcw_trial', 'time_portraits_fc', 'time_wait_keypress', 'trial', 'words_list']
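
Note that this counter runs across all participants (1 to 60 in the output above) rather than restarting for each subject. If a within-participant trial number is preferred, the idea can be sketched in plain Python as follows. `within_subject_trials` is a hypothetical helper, and assigning its result back to a datamatrix column is left out here.

```python
from collections import defaultdict

def within_subject_trials(subject_numbers):
    """Number trials 1, 2, 3, ... separately for each subject, in row order."""
    counts = defaultdict(int)
    trials = []
    for subject in subject_numbers:
        counts[subject] += 1
        trials.append(counts[subject])
    return trials

# Toy input: three rows for subject 1 followed by two rows for subject 2.
print(within_subject_trials([1, 1, 1, 2, 2]))  # [1, 2, 3, 1, 2]
```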


In [8]:
# Quick check of each trial

for row in dm_all:
    print(row)

+-------------------------+-----------------+
|           Name          |      Value      |
+-------------------------+-----------------+
|       portraits_a       | portrait_s1.jpg |
|      response_time      |       2488      |
|        subject_nr       |        1        |
| time_fc_portraits_trial |      68902      |
|    time_fcw_portraits   |        NA       |
|      time_fcw_trial     |        NA       |
|    time_portraits_fc    |      68905      |
|    time_wait_keypress   |      68905      |
|          trial          |        1        |
|        words_list       |        NA       |
+-------------------------+-----------------+
+-------------------------+-----------------+
|           Name          |      Value      |
+-------------------------+-----------------+
|       portraits_a       | portrait_d1.jpg |
|      response_time      |       1959      |
|        subject_nr       |        1        |
| time_fc_portraits_trial |      71479      |
|    time_fcw_portraits   |       

+-------------------------+-----------------+
+-------------------------+-----------------+
|           Name          |      Value      |
+-------------------------+-----------------+
|       portraits_a       | portrait_d1.jpg |
|      response_time      |       1972      |
|        subject_nr       |        2        |
| time_fc_portraits_trial |      69961      |
|    time_fcw_portraits   |      76806      |
|      time_fcw_trial     |      76803      |
|    time_portraits_fc    |      69963      |
|    time_wait_keypress   |      76806      |
|          trial          |        20       |
|        words_list       |   [fc_d_word]   |
+-------------------------+-----------------+
+-------------------------+---------+
|           Name          |  Value  |
+-------------------------+---------+
|       portraits_a       | la1.jpg |
|      response_time      |   2237  |
|        subject_nr       |    2    |
| time_fc_portraits_trial |  69961  |
|    time_fcw_portraits   |  78893  |
|     

+-------------------------+-----------------+
+-------------------------+---------+
|           Name          |  Value  |
+-------------------------+---------+
|       portraits_a       | la2.jpg |
|      response_time      |    61   |
|        subject_nr       |    4    |
| time_fc_portraits_trial |  161533 |
|    time_fcw_portraits   |  162847 |
|      time_fcw_trial     |  162844 |
|    time_portraits_fc    |  161537 |
|    time_wait_keypress   |  162847 |
|          trial          |    47   |
|        words_list       |         |
+-------------------------+---------+
+-------------------------+-----------------+
|           Name          |      Value      |
+-------------------------+-----------------+
|       portraits_a       | portrait_s2.jpg |
|      response_time      |        83       |
|        subject_nr       |        4        |
| time_fc_portraits_trial |      161533     |
|    time_fcw_portraits   |      163039     |
|      time_fcw_trial     |      163035     |
|    tim

###### Notes
Sanity check: *time_wait_keypress* and *time_portraits_fc* / *time_fcw_portraits* coincide, which means that the portrait did indeed disappear when the key was pressed. *time_fc_portraits_trial* and *time_fcw_trial* give only a broad idea of the overall timing of the experiment.
Thus, we can omit these variables.

We will only keep *time_fcw_portraits*, which helps us split the data into easily readable output for fc and fcw.
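
The overlap check described in these notes can also be automated instead of read off the printed rows. Below is a minimal sketch on plain dictionaries. The `keypress_matches_sketchpad` helper and the tolerance value are assumptions for illustration, and the two example rows copy values from the output above.

```python
def keypress_matches_sketchpad(row, tolerance_ms=5):
    """Check that the keypress timestamp coincides with the relevant sketchpad timestamp."""
    # fcw trials have a time_fcw_portraits value; fc trials have "NA" there.
    if row["time_fcw_portraits"] != "NA":
        reference = row["time_fcw_portraits"]
    else:
        reference = row["time_portraits_fc"]
    return abs(row["time_wait_keypress"] - reference) <= tolerance_ms

fc_row = {"time_portraits_fc": 68905, "time_fcw_portraits": "NA", "time_wait_keypress": 68905}
fcw_row = {"time_portraits_fc": 69963, "time_fcw_portraits": 76806, "time_wait_keypress": 76806}
print(keypress_matches_sketchpad(fc_row), keypress_matches_sketchpad(fcw_row))  # True True
```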


In [9]:
dm_all = ops.keep_only(dm_all, dm_all.subject_nr, dm_all.trial, dm_all.portraits_a, dm_all.words_list, dm_all.response_time,
                   dm_all.time_wait_keypress, dm_all.time_fcw_portraits)

In [10]:
# To keep only the essential columns, we append the string "_word" to the stimulus name in the fcw trials,
# so that we can tell which part of the experiment a trial belongs to.

# Careful, run only once!

for row in dm_all:
    if row.time_fcw_portraits != "NA":
        if "portrait_" in row.portraits_a: 
            row.portraits_a = row.portraits_a + "_word"
        print(row)
        
        
# However, if you also want to recode landscape images ("la") as being in the fcw block, run:

# for row in dm_all:
#     if row.time_fcw_portraits != "NA":
#         row.portraits_a = row.portraits_a + "_word"
#         print(row)

+--------------------+----------------------+
|        Name        |        Value         |
+--------------------+----------------------+
|    portraits_a     | portrait_s1.jpg_word |
|   response_time    |         1208         |
|     subject_nr     |          1           |
| time_fcw_portraits |        82024         |
| time_wait_keypress |        82024         |
|       trial        |          7           |
|     words_list     |     [fc_s_word]      |
+--------------------+----------------------+
+--------------------+----------------------+
|        Name        |        Value         |
+--------------------+----------------------+
|    portraits_a     | portrait_d1.jpg_word |
|   response_time    |         1103         |
|     subject_nr     |          1           |
| time_fcw_portraits |        83333         |
| time_wait_keypress |        83333         |
|       trial        |          8           |
|     words_list     |     [fc_d_word]      |
+--------------------+------------

+--------------------+---------+
+--------------------+----------------------+
|        Name        |        Value         |
+--------------------+----------------------+
|    portraits_a     | portrait_s2.jpg_word |
|   response_time    |          83          |
|     subject_nr     |          4           |
| time_fcw_portraits |        163039        |
| time_wait_keypress |        163039        |
|       trial        |          48          |
|     words_list     |     [fc_s_word]      |
+--------------------+----------------------+
+--------------------+----------------------+
|        Name        |        Value         |
+--------------------+----------------------+
|    portraits_a     | portrait_s1.jpg_word |
|   response_time    |         251          |
|     subject_nr     |          5           |
| time_fcw_portraits |        187688        |
| time_wait_keypress |        187688        |
|       trial        |          55          |
|     words_list     |     [fc_s_word]      |
+
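
The "run only once" warning exists because rerunning the cell appends `_word` a second time. One way around that, sketched here under the assumption that a stimulus name never ends in `_word` before tagging, is a small helper that leaves already-tagged names alone (`tag_fcw` is a hypothetical name):

```python
def tag_fcw(stimulus, suffix="_word"):
    """Append the fcw suffix to a portrait name unless it is already there."""
    if "portrait_" in stimulus and not stimulus.endswith(suffix):
        return stimulus + suffix
    return stimulus

once = tag_fcw("portrait_s1.jpg")
twice = tag_fcw(once)
print(once, twice)  # portrait_s1.jpg_word portrait_s1.jpg_word
print(tag_fcw("la1.jpg"))  # la1.jpg (landscapes are left unchanged)
```

Inside the loop, `row.portraits_a = tag_fcw(row.portraits_a)` should then be safe to rerun.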

## Drop some extra columns

1. The *words_list* column does not provide any further information, as we know from the experimental design that strangers are matched with neutral words and the deceased with loss words
    <span style="color:#606060">*(we do not get any specific info about which word appears when, only about its class)*</span>.
    Therefore, we can drop this.

2. *time_fcw_portraits* is no longer of use. We can drop this as well.

In [11]:
del dm_all.words_list
del dm_all.time_fcw_portraits
print(dm_all)

+----+----------------------+---------------+------------+--------------------+-------+
| #  |     portraits_a      | response_time | subject_nr | time_wait_keypress | trial |
+----+----------------------+---------------+------------+--------------------+-------+
| 15 |   portrait_s1.jpg    |  2.488000E+03 |     1      |    6.890500E+04    |   1   |
| 16 |   portrait_d1.jpg    |  1.959000E+03 |     1      |    7.148100E+04    |   2   |
| 17 |       la1.jpg        |  1.047000E+03 |     1      |    7.353400E+04    |   3   |
| 18 |   portrait_d2.jpg    |  1.406000E+03 |     1      |    7.467200E+04    |   4   |
| 19 |       la2.jpg        |  1.775000E+03 |     1      |    7.617700E+04    |   5   |
| 20 |   portrait_s2.jpg    |  3.834000E+03 |     1      |    7.804100E+04    |   6   |
| 21 | portrait_s1.jpg_word |  1.208000E+03 |     1      |    8.202400E+04    |   7   |
| 22 | portrait_d1.jpg_word |  1.103000E+03 |     1      |    8.333300E+04    |   8   |
| 23 |       la1.jpg        |  1

## Write the dm to a csv/xlsx file

In [12]:
# Change output directory 
os.chdir("C:\\Users\\katec\\Documents\\Eyetracking_grief\\Analysis\\Behavioural\\output")

# Save output file
# io.writetxt(dm, 'data_subj{}.csv'.format(1)) #example for subj1

#io.writexlsx(dm, 'data_subj1.xlsx') #or if you prefer xlsx

io.writetxt(dm_all, 'data.csv') #for all subjs
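
Changing the working directory a second time makes later cells depend on where the notebook currently "is". As an alternative sketch, the output path can be built explicitly. `output_path` is a hypothetical helper, and the folder and file names simply mirror those used above.

```python
import tempfile
from pathlib import Path

def output_path(base_dir, subject=None):
    """Return the target CSV path inside <base_dir>/output, creating the folder if needed."""
    out_dir = Path(base_dir) / "output"
    out_dir.mkdir(parents=True, exist_ok=True)
    # One file per subject when a subject number is given, otherwise the combined file.
    name = "data.csv" if subject is None else "data_subj{}.csv".format(subject)
    return out_dir / name

# Demo in a throwaway folder standing in for the Behavioural directory.
with tempfile.TemporaryDirectory() as tmp:
    combined = output_path(tmp)
    single = output_path(tmp, subject=1)
    print(combined.name, single.name)  # data.csv data_subj1.csv
```

The resulting path, converted with `str(...)`, could then be passed as the file name when writing the datamatrix.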