<!--
IRdisplay::display_html(file='code_hiding.html')
if the line above generates an error, it could be due to this:
https://github.com/IRkernel/IRdisplay/issues/41
In the meantime, the code below is enough; it works on nbviewer but not on the notebook directly
-->
<script>
  code_show=true;
  function code_toggle() {
    if (code_show){
      $('div.input').hide();
    } else {
      $('div.input').show();
    }
    code_show = !code_show
  } 
  $( document ).ready(code_toggle);
</script>
<font size=4>
<a href="javascript:code_toggle()">Toggle ON/OFF</a>
code cells.
</font>

**Author**: Adrian Ernesto Radillo  
**Date**: 05 May 2019

The aim of this notebook is to:  
1. classify trials into invalid and valid ones
2. assign a unique id to each valid trial, for all subjects and all tasks
3. use this unique id as a foreign key in the dots table
4. perform an inner JOIN between dots and trials
5. write the result of the JOIN into a .csv file

In [1]:
# load packages 

# Note:
# if one of the packages below is not installed, type, once, in another cell
# install.packages("<package_name>", lib="<path_to_installation_folder>")
# note that if you don't put the lib arg above, it will default to first item in .libPaths()
# ref:https://www.rdocumentation.org/packages/utils/versions/3.5.2/topics/install.packages

# I can't load the conflicted package here :(
# library(conflicted)
# https://github.com/r-lib/conflicted/issues/26

library(data.table)     # see https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html for reference
library(ggplot2)        # for plots
library(repr)           # for resizing figures
# library(OneR)           # to use the function 'bin'
library(gridExtra)      # to use grid.arrange()

source("../../R/R_functions.r") # custom functions

In [2]:
# DEFINE CONSTANTS
# folder/file-specific constants
PILOT_NUMBERS <- list(15, 16, 17, 18, 19)
PILOT_NUMBER <- paste(PILOT_NUMBERS, collapse = '-')

DATA_FOLDER <- "../../data/"
FIRA_TAG <- "FIRA"

# data-related constants
FIRST_TRIALS_TO_DISCARD <- 4

# plot-specific constants
PLOT_TITLE_FONT_SIZE <- 18
PLOT_SUBTITLE_FONT_SIZE <- 13 
AXES_LABEL_FONT <- 14
AXES_FONT <- 13

ERROR_WIDTH <- 4
SMALL_ERROR_WIDTH <- .01*ERROR_WIDTH

LINE_WIDTH <- 1.4
POINT_SIZE <- 2
SMALL_DOT_SIZE <- 1

# other variables
FRAME_RATE_ESTIMATE <- 60 # Hz
FRAME_DURATION <- (1 / FRAME_RATE_ESTIMATE) # sec

In [3]:
# load csv file into data.table
TRIALS <- fread(file="../../data/Pilot15-19/fixed_FIRA_TRIALS.csv", header=TRUE, sep=',')

In [4]:
NODES <- unique(TRIALS[,taskID])
NUM_NODES <- length(NODES)

In [5]:
NUM_SUBJECTS <- length(PILOT_NUMBERS)

# Data pre-processing
<a id='preproc'></a>
## Summary of `TRIALS` dataset (`*FIRA.csv` file)

In [6]:
# set some variables to "factor"
TRIALS[,`:=`(
            choice=as.factor(choice), 
            correct=as.logical(correct), # probably a bad idea to have this be a factor variable
            initDirection=as.factor(initDirection),
            endDirection=as.factor(endDirection),
            presenceCP=as.logical(presenceCP))]
# put back in missing values
TRIALS[choice == 'NaN' | correct == 'NaN', `:=`(choice = NA, correct = NA)] 

# display summary for reference
# str(TRIALS)

## Classify trials
We classify trials as follows:
- `valid` means that the trial is kept in the analysis
- `skipped` means that no answer was recorded (this could be linked to a fixation break, for instance)
- `bug` means the code itself renders the trial unusable (for instance the 1st trial, because of our timing bug)
- `early` means that an answer was provided before the end of the viewing duration

Currently, I don't check whether a single trial falls into more than one non-valid category: the labels are assigned in sequence, so a later assignment overwrites an earlier one. I only make sure that all `valid` trials are indeed valid.
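
The classification rules above can be illustrated on a toy table. This is only a sketch: the table `toy`, its values, and the smaller `FIRST_TRIALS_TO_DISCARD` are made up for illustration, and the order of the assignments mirrors the one used in this notebook, so a trial that meets several criteria keeps the last label assigned.

```r
library(data.table)

# Hypothetical toy data: 6 trials from a single session, sorted by start time
toy <- data.table(
  trialInSession = 1:6,
  choice = c(NA, 1, 0, 1, NA, 0),
  RT     = c(NA, 0.45, -0.10, 0.52, NA, 0.38)
)
FIRST_TRIALS_TO_DISCARD <- 2  # smaller than in the notebook, for illustration only

toy[, trialClass := "valid"]
toy[is.na(choice), trialClass := "skipped"]
toy[trialInSession <= FIRST_TRIALS_TO_DISCARD, trialClass := "bug"]
toy[RT <= 0, trialClass := "early"]  # rows where RT is NA are not selected

print(toy)
```

In this toy example, trial 1 is both skipped and among the first trials, and it ends up labelled `bug` because that assignment comes later.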

In [7]:
setkey(TRIALS, 'trialStart')
for (subject in PILOT_NUMBERS) {
    for (node in NODES) {
        TRIALS[pilotID==subject & taskID==node, trialInSession:=.I]
    }
}

TRIALS[,`:=`(trialClass='valid')]

In [8]:
TRIALS[is.na(choice), trialClass:='skipped']
TRIALS[trialInSession <= FIRST_TRIALS_TO_DISCARD, trialClass:='bug']
TRIALS[RT <= 0, trialClass:='early']
TRIALS[,trialClass:=as.factor(trialClass)]

Let's introduce a `validTrialCount` column in `TRIALS` that numbers the valid trials within each subject and task; invalid trials are flagged with `validTrialCount=NA_integer_`.
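
As a minimal sketch of what this counter looks like, the toy example below numbers valid trials within each subject and task using a grouped assignment (`by`) instead of explicit loops. The table `toy` and its values are hypothetical; the grouped form is an alternative way of writing the same idea, not the code used in this notebook.

```r
library(data.table)

# Hypothetical toy data: two subjects, one task, mixed trial classes
toy <- data.table(
  pilotID    = c(15, 15, 15, 16, 16, 16),
  taskID     = c(1, 1, 1, 1, 1, 1),
  trialClass = c("bug", "valid", "valid", "valid", "skipped", "valid")
)

# number the valid trials 1..N within each (pilotID, taskID) group;
# rows that are not selected by the filter are left as NA
toy[trialClass == "valid", validTrialCount := seq_len(.N), by = .(pilotID, taskID)]

print(toy)
```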

In [9]:
for (node in NODES) {    
    for (subj in PILOT_NUMBERS) {
        TRIALS[trialClass == 'valid' & taskID==node & pilotID==subj, validTrialCount:=.I]
    }
}

In [10]:
TRIALS[trialClass != 'valid', validTrialCount := NA_integer_]

In [11]:
# create a specific column signedCoherence
TRIALS[, `:=`(
    taskID=as.factor(taskID),
    pilotID=as.factor(pilotID),
    signedCoherence=coherence)]
TRIALS[endDirection==180, signedCoherence := -coherence]
# set values to NA when there is a change point or when trial is not valid
TRIALS[presenceCP | is.na(validTrialCount), signedCoherence := NA_real_]
# str(TRIALS)

In [12]:
str(TRIALS)

Classes ‘data.table’ and 'data.frame':	2360 obs. of  29 variables:
 $ taskID         : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ trialIndex     : int  5 25 40 2 37 38 14 18 26 50 ...
 $ trialStart     : num  333 339 345 349 355 ...
 $ trialEnd       : num  339 345 349 355 360 ...
 $ RT             : num  NA 0.476 0.639 NA 0.489 ...
 $ choice         : Factor w/ 2 levels "0","1": NA 2 1 NA 2 1 1 1 1 1 ...
 $ correct        : logi  NA TRUE TRUE NA TRUE TRUE ...
 $ initDirection  : Factor w/ 2 levels "0","180": 1 1 2 2 1 2 2 2 2 2 ...
 $ endDirection   : Factor w/ 2 levels "0","180": 1 1 2 2 1 2 2 2 2 2 ...
 $ presenceCP     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ coherence      : num  23 23 21 18 23 16 14 11 9 7 ...
 $ viewingDuration: num  0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 ...
 $ probCP         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ timeCP         : num  0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 ...
 $ randSeedBase   : int  573 2013 1425 9522 7635 5039 5448 21

**Here we prepare the table of valid trials that will be dumped to a .csv file for psychophysical data analysis**

In [14]:
valid_trials <- TRIALS[!is.na(validTrialCount)]

valid_trials[,choice:=droplevels(choice)]        # drop unused level "NA" for choice variable

# rename remaining levels: "0" becomes 'left' and "1" becomes 'right'
levels(valid_trials$choice) <- c('left','right')

# treat presenceCP as factor and rename the labels: FALSE becomes 'no' and TRUE becomes 'yes'
valid_trials[,presenceCP:=as.factor(presenceCP)]
levels(valid_trials$presenceCP) <- c('no','yes')

valid_trials[,`:=`(targetOff=NULL, fixationOff=NULL, feedbackOn=NULL, seqDumpTime=NA_real_)]

In [15]:
str(valid_trials)

Classes ‘data.table’ and 'data.frame':	2193 obs. of  27 variables:
 $ taskID         : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ trialIndex     : int  37 38 14 18 26 50 24 36 49 46 ...
 $ trialStart     : num  355 360 364 369 372 ...
 $ trialEnd       : num  360 364 369 372 376 ...
 $ RT             : num  0.489 0.321 0.647 0.528 0.481 ...
 $ choice         : Factor w/ 2 levels "left","right": 2 1 1 1 1 1 2 1 2 1 ...
 $ correct        : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ initDirection  : Factor w/ 2 levels "0","180": 1 2 2 2 2 2 2 2 1 2 ...
 $ endDirection   : Factor w/ 2 levels "0","180": 1 2 2 2 2 2 2 2 1 2 ...
 $ presenceCP     : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ coherence      : num  23 16 14 11 9 7 6 16 14 13 ...
 $ viewingDuration: num  0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 ...
 $ probCP         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ timeCP         : num  0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 ...
 $ randSeedBase   : int  7635 5039 5

At this point, `valid_trials` contains all the relevant data for the valid trials, and each trial is uniquely identified by the combination `(validTrialCount, pilotID, taskID)`. The JOIN with the dots table, however, will be performed on the column `seqDumpTime` (filled in below), which acts as a foreign key.
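
To make the foreign-key idea concrete, here is a small sketch of a keyed inner join in `data.table`. The tables `trials` and `dots` and their values are hypothetical stand-ins for `valid_trials` and `DOTS`; only the join pattern is meant to carry over.

```r
library(data.table)

# Hypothetical toy tables sharing the foreign key seqDumpTime
trials <- data.table(
  seqDumpTime = c(10.5, 20.5, 30.5),
  choice      = c("left", "right", "left")
)
dots <- data.table(
  seqDumpTime = c(10.5, 10.5, 20.5, 99.0),
  xpos        = c(0.1, 0.2, 0.3, 0.4)
)

setkey(trials, seqDumpTime)
setkey(dots, seqDumpTime)

# for each row of `trials`, pull the matching rows of `dots`;
# nomatch = 0 drops trials that have no dots, which makes this an inner join
joined <- dots[trials, nomatch = 0]

print(joined)
```

Each dot row is repeated with the columns of the trial it belongs to, while the trial without dots and the dots without a trial are both dropped.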

# Load dots files

In [18]:
list_of_dots <- list()
for (subj in PILOT_NUMBERS) {
    pilot <- toString(subj)
    dots <- fread(file=paste(DATA_FOLDER,"Pilot",pilot,"/pilot",pilot,"_dotsPositions.csv", sep=''), header=TRUE, sep=',')
    list_of_dots <- c(list_of_dots, list(dots))
}
DOTS <- rbindlist(list_of_dots)

In [19]:
str(DOTS)

Classes ‘data.table’ and 'data.frame':	1938026 obs. of  8 variables:
 $ xpos       : num  0.5997 0.0315 0.4435 0.2796 0.0924 ...
 $ ypos       : num  0.496 0.352 0.5 0.151 0.933 ...
 $ isActive   : int  1 0 0 1 0 0 1 0 0 1 ...
 $ isCoherent : int  1 0 0 1 0 0 0 0 0 0 ...
 $ frameIdx   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ seqDumpTime: num  6049 6049 6049 6049 6049 ...
 $ pilotID    : int  15 15 15 15 15 15 15 15 15 15 ...
 $ taskID     : int  1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, ".internal.selfref")=<externalptr> 


In [24]:
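# For each valid trial, find the dots dump whose timestamp falls strictly
# between trialStart and trialEnd (same subject and task), and store that
# timestamp in valid_trials as the foreign key seqDumpTime.
for (subject in PILOT_NUMBERS) {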
    for (node in NODES) {
        startEndTimes <- valid_trials[pilotID==subject & taskID==node, .(trialStart, trialEnd)]
        for (row in seq(startEndTimes[,.N])) {
            tstart <- startEndTimes[row,trialStart]
            tend <- startEndTimes[row,trialEnd]
            valid_trials[pilotID==subject & taskID==node & trialStart==tstart & trialEnd==tend, 
                     seqDumpTime:=unique(
                         DOTS[pilotID==subject & taskID==node & seqDumpTime < tend & seqDumpTime > tstart,
                              seqDumpTime])]
        }
    }
}
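
The same interval matching can probably be expressed as a non-equi update join, which avoids the row-by-row loop. The sketch below uses hypothetical toy tables (`trials`, `dots`) and a helper column `dumpTime` introduced only for the join condition. It assumes that `pilotID` and `taskID` have the same type in both tables, which is not the case for `valid_trials` (factor) and `DOTS` (integer) in this notebook, so the types would need to be aligned first.

```r
library(data.table)

# Hypothetical toy tables standing in for valid_trials and DOTS
trials <- data.table(
  pilotID    = c(15L, 15L, 16L),
  taskID     = c(1L, 1L, 1L),
  trialStart = c(100, 200, 100),
  trialEnd   = c(150, 250, 150)
)
dots <- data.table(
  pilotID     = c(15L, 15L, 15L, 16L),
  taskID      = c(1L, 1L, 1L, 1L),
  seqDumpTime = c(120, 120, 230, 140)
)

# one row per distinct dump; dumpTime is a copy used only in the join condition
dump_times <- unique(dots[, .(pilotID, taskID, seqDumpTime)])
dump_times[, dumpTime := seqDumpTime]

# update trials by reference: a dump matches a trial when
# trialStart < dumpTime < trialEnd for the same subject and task
trials[dump_times,
       on = .(pilotID, taskID, trialStart < dumpTime, trialEnd > dumpTime),
       seqDumpTime := i.seqDumpTime]

print(trials)
```

Like the loop above, this sketch relies on each trial interval containing exactly one dump timestamp.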

In [25]:
str(valid_trials)

Classes ‘data.table’ and 'data.frame':	2193 obs. of  27 variables:
 $ taskID         : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ trialIndex     : int  37 38 14 18 26 50 24 36 49 46 ...
 $ trialStart     : num  355 360 364 369 372 ...
 $ trialEnd       : num  360 364 369 372 376 ...
 $ RT             : num  0.489 0.321 0.647 0.528 0.481 ...
 $ choice         : Factor w/ 2 levels "left","right": 2 1 1 1 1 1 2 1 2 1 ...
 $ correct        : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ initDirection  : Factor w/ 2 levels "0","180": 1 2 2 2 2 2 2 2 1 2 ...
 $ endDirection   : Factor w/ 2 levels "0","180": 1 2 2 2 2 2 2 2 1 2 ...
 $ presenceCP     : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ coherence      : num  23 16 14 11 9 7 6 16 14 13 ...
 $ viewingDuration: num  0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 ...
 $ probCP         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ timeCP         : num  0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 ...
 $ randSeedBase   : int  7635 5039 5

In [30]:
setkey(valid_trials, seqDumpTime)
setkey(DOTS, seqDumpTime)
tables()

            NAME      NROW NCOL MB
1:          dots   391,092    8 16
2:          DOTS 1,938,026    8 81
3: startEndTimes       196    2  0
4:        TRIALS     2,360   29  0
5:  valid_trials     2,193   27  0
                                                     COLS         KEY
1: xpos,ypos,isActive,isCoherent,frameIdx,seqDumpTime,...            
2: xpos,ypos,isActive,isCoherent,frameIdx,seqDumpTime,... seqDumpTime
3:                                    trialStart,trialEnd  trialStart
4:    taskID,trialIndex,trialStart,trialEnd,RT,choice,...  trialStart
5:    taskID,trialIndex,trialStart,trialEnd,RT,choice,... seqDumpTime
Total: 97MB


In [31]:
# inner join
dots_join_trials <- DOTS[valid_trials, nomatch=0]
str(dots_join_trials)

Classes ‘data.table’ and 'data.frame':	1886506 obs. of  34 variables:
 $ xpos           : num  0.479 0.831 0.405 0.79 0.752 ...
 $ ypos           : num  0.452 0.533 0.918 0.382 0.628 ...
 $ isActive       : int  0 0 1 0 0 1 0 0 1 0 ...
 $ isCoherent     : int  0 0 1 0 0 1 0 0 1 0 ...
 $ frameIdx       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ seqDumpTime    : num  360 360 360 360 360 ...
 $ pilotID        : int  18 18 18 18 18 18 18 18 18 18 ...
 $ taskID         : int  1 1 1 1 1 1 1 1 1 1 ...
 $ i.taskID       : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ trialIndex     : int  37 37 37 37 37 37 37 37 37 37 ...
 $ trialStart     : num  355 355 355 355 355 ...
 $ trialEnd       : num  360 360 360 360 360 ...
 $ RT             : num  0.489 0.489 0.489 0.489 0.489 ...
 $ choice         : Factor w/ 2 levels "left","right": 2 2 2 2 2 2 2 2 2 2 ...
 $ correct        : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ initDirection  : Factor w/ 2 levels "0","180": 1 1 1 1 1 1 1 1 1 1 ...
 $ e

In [32]:
fwrite(dots_join_trials, file = paste(DATA_FOLDER,"Pilot15-19/dots_join_valid_trials.csv", sep=''), na="NA")