## Visualization of gaze data


In [1]:
import os
import pandas as pd
from alabeye.etdata import ETdata

# Main directory for experiment data.
expdata_dir = os.path.abspath('./sample_input')

# HDF file of gaze data (see ../preprocess_rawdata about how to prepare this file)
hdf_filename = 'office_sample_etdata_compressed.hdf5'

hdf_file = os.path.join(expdata_dir, hdf_filename)

# The 'ETdata' class from the 'alabeye.etdata' module handles most of the interface with the data.
# Take an initial look at the HDF file to see which tasks it contains.
print(ETdata(data_file=hdf_file).available_tasks)


['office_sample']


In [2]:
# load data for a specific task:
task2load = 'office_sample'

# the video file name associated with this particular experiment task
stimname_map = {'office_sample':'office_sample_vid.mp4'}

# in addition we add some optional info about the participants/subjects
subj_info_file = os.path.join(expdata_dir, 'participant_info.csv')


The file 'participant_info.csv' includes, at a minimum, two columns. The first column contains the subject IDs under which eye-tracking data are saved in the HDF file, and the second column contains the group each subject belongs to, such as 'ASD' or 'TD'. If this file is not provided, subject IDs are read directly from the HDF file and all subjects are assumed to belong to the same group.
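As a minimal sketch of the layout described above, the following builds a small two-column table and counts subjects per group. The column names `ID` and `Group` follow the printed example in this notebook; the subject IDs below are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical file content: first column = subject ID, second column = group.
csv_text = """ID,Group
S001,ASD
S002,ASD
S003,TD
"""

subj_df = pd.read_csv(io.StringIO(csv_text))

# Count how many subjects belong to each group.
group_counts = subj_df['Group'].value_counts().to_dict()
print(group_counts)
```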

In [3]:
# load the file just to see its content; this step is not normally needed.
subj_df = pd.read_csv(subj_info_file)
print(subj_df)

         ID Group
0    A00067   ASD
1    A00071   ASD
2    A00082   ASD
3    A00083   ASD
4    A00093   ASD
..      ...   ...
148  RA0999    TD
149  RA1001    TD
150  RA1003    TD
151  RA1004    TD
152  RA1005    TD

[153 rows x 2 columns]


In [4]:
# use the 'ETdata' class for loading various information about the experiment
sample_etdata = ETdata(taskname=task2load, data_file=hdf_file, 
                       subj_info_file=subj_info_file,
                       use_subj_info=True, # use the group info for the subjects 
                       stim_dir=expdata_dir, # directory where the video stimulus is
                       stimname_map=stimname_map
                       )

Let's now explore some attributes and methods of the ETdata class.

In [5]:
# subject IDs
subjs = sample_etdata.subjs
print(subjs[:5])

['A00067', 'A00071', 'A00082', 'A00083', 'A00093']


In [6]:
sample_etdata.stim_mediainfo

{'duration': 25.010421008753646,
 'nframes': 600,
 'fps': 23.99,
 'frame_width': 720,
 'frame_height': 480}
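
The values above are consistent with `duration = nframes / fps`, and the duration of a single frame is `1 / fps`. A quick check using only the numbers printed above (a sketch, not necessarily how ETdata computes them internally):

```python
# Values copied from the stim_mediainfo output above.
mediainfo = {'duration': 25.010421008753646, 'nframes': 600, 'fps': 23.99}

# Duration implied by frame count and frame rate, in seconds.
implied_duration = mediainfo['nframes'] / mediainfo['fps']

# Duration of one video frame, in seconds.
frame_duration = 1.0 / mediainfo['fps']

print(round(implied_duration, 4), round(frame_duration, 4))
```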

In [7]:
sample_etdata.hdf_datakeys[:5]

['/RA1005/office_sample',
 '/RA1004/office_sample',
 '/RA1003/office_sample',
 '/RA1002/office_sample',
 '/RA1001/office_sample']

In [8]:
# Load ET data from the HDF file for a sample subject                           
sample_etdata.load_rawdata(subjs_use=subjs[0])

# note that the ET data is a dictionary in which
# each key is a subject ID and each value is a pandas.DataFrame
subj0_rawdata = sample_etdata.rawdata

subj0_rawdata

Loading raw data...


100%|██████████| 1/1 [00:00<00:00, 117.38it/s]


{'A00067':         RecTime       GazeX       GazeY  PupilDiameter
 0      0.002554  412.861168  189.484155       3.287816
 1      0.005807  412.075109  188.519504       3.290534
 2      0.009169  412.557714  191.450572       3.292244
 3      0.012507  413.389542  188.727479       3.287925
 4      0.015879  413.669093  192.624907       3.304099
 ...         ...         ...         ...            ...
 7496  24.986201  249.658202   90.128785       3.631838
 7497  24.989552  249.601103   91.402781       3.630762
 7498  24.992909  249.772453   88.924327       3.627809
 7499  24.996273  249.687696   90.123242       3.621965
 7500  24.999531  248.554462   89.683546       3.625768
 
 [7501 rows x 4 columns]}
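
The `RecTime` column can be used to estimate the sampling rate of a recording. The sketch below uses synthetic timestamps (not the real data), spaced roughly like those above with 7501 samples over about 25 seconds; it assumes `RecTime` is in seconds.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one subject's raw data: 7501 evenly spaced samples over 25 s.
rec_time = np.linspace(0.0, 25.0, 7501)
raw_df = pd.DataFrame({'RecTime': rec_time})

# The median inter-sample interval is robust to occasional dropped samples.
median_dt = raw_df['RecTime'].diff().median()
sampling_rate_hz = 1.0 / median_dt
print(f"Estimated sampling rate: {sampling_rate_hz:.1f} Hz")
```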

In [9]:
# Load ET data from the HDF file for all subjects
sample_etdata.load_rawdata()

# see the raw ET data, which is a dictionary with
# keys = subject IDs and values = pandas DataFrames
all_subjs_rawdata = sample_etdata.rawdata

print(f"'all_subjs_rawdata' is a {type(all_subjs_rawdata)} with size of {len(all_subjs_rawdata)}." )

Loading raw data...


100%|██████████| 153/153 [00:00<00:00, 159.26it/s]

'all_subjs_rawdata' is a <class 'dict'> with size of 153.





Note that with a large number of participants or long-duration experiments, loading all the data at once can consume a large amount of memory. A better approach is to use ETdata.get_timebinned_data(), which loads the eye-tracking data for each subject individually and downsamples it by averaging the gaze data points within each time bin.
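
The idea of time binning can be sketched with plain pandas: assign each sample to a bin by its timestamp, then average the gaze coordinates within each bin. This is an illustration on synthetic data, not the implementation used by ETdata.get_timebinned_data(); the column names follow the raw data shown earlier, and the bin width is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Synthetic gaze samples: 8 samples, one every 0.025 s (offset by half a step).
raw_df = pd.DataFrame({
    'RecTime': 0.0125 + 0.025 * np.arange(8),
    'GazeX': [100.0, 102.0, 104.0, 106.0, 200.0, 202.0, 204.0, 206.0],
    'GazeY': [50.0, 50.0, 52.0, 52.0, 80.0, 80.0, 82.0, 82.0],
})

timebin_sec = 0.1  # hypothetical bin width in seconds

# Index of the time bin each sample falls into.
bin_index = np.floor(raw_df['RecTime'] / timebin_sec).astype(int)

# Average the gaze coordinates of all samples within each bin.
binned_df = raw_df.groupby(bin_index)[['GazeX', 'GazeY']].mean()
binned_df.index.name = 'timebin'
print(binned_df)
```

Here the eight raw samples collapse into two rows, one per 0.1 s bin, which is the same kind of reduction that keeps memory use low when many subjects are processed one at a time.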

In [10]:

# Loading ET data using a time bin. For analyses, we usually need to time bin the data.
# enter a time bin duration in seconds or set timebin_sec='frame_duration' [default]
sample_etdata.get_timebinned_data(timebin_sec='frame_duration', 
                                  split_groups=True, # splits ASDs and TDs into two groups in returned data
                                  bin_operation='mean', 
                                  fix_length=True, # use the same number of time bins across subjects
                                  save_output=True, # save downsampled data
                                  output_dir='sample_pkldata', # a directory to save the output file
                                  )

tbinned_data = sample_etdata.data

Loading raw data and applying timebins...


100%|██████████| 153/153 [00:04<00:00, 36.04it/s]


Note that once the ETdata.get_timebinned_data() method is run with the `save_output=True` option, the downsampled data is saved in the user-provided `output_dir`. The downsampled data can then be loaded directly, without loading the raw data again.
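
The general save-once, reload-later pattern can be sketched with the standard `pickle` module. This is only an illustration: the file name and the dictionary layout below are hypothetical, and the structure of the file actually written by ETdata may differ.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical downsampled data: subject ID -> DataFrame of binned gaze values.
binned = {'S001': pd.DataFrame({'GazeX': [103.0, 203.0], 'GazeY': [51.0, 81.0]})}

with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, 'timebinned_example.pkl')

    # Save the processed data once.
    with open(cache_file, 'wb') as f:
        pickle.dump(binned, f)

    # Later, reload it without repeating the expensive processing step.
    with open(cache_file, 'rb') as f:
        reloaded = pickle.load(f)

print(reloaded['S001'].equals(binned['S001']))
```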

In [11]:
# making a new instance of the ETdata class while quickly loading the data from the saved downsampled data
sampledata_pickle_file = './sample_pkldata/timebinned_data_office_sample_frame_duration.pkl'
sample_etdata_re = ETdata(data_file=sampledata_pickle_file,
                          subj_info_file=subj_info_file,
                          use_subj_info=True, # use the group info for the subjects 
                          stim_dir=expdata_dir, # directory where the video stimulus is
                          stimname_map=stimname_map)


Loaded data for taskname: office_sample


In [12]:
sample_etdata_re.stim_mediainfo

{'duration': 25.010421008753646,
 'nframes': 600,
 'fps': 23.99,
 'frame_width': 720,
 'frame_height': 480}

In [None]:
# A quick visualization of gaze data
sample_etdata_re.visualize_gaze(save_viz=True,
                                show_viz=True, # if you get an openCV related error, set this to False
                                )


This command will generate the following visualization:

![gaze scatters](./gaze_visualization/office_sample_vid_ETgaze.gif)


In [None]:
# Visualize two groups separately with heatmaps.
sample_etdata_re.visualize_2groups(save_viz=True,
                                   show_viz=True, # if you get an openCV related error, set this to False
                                  )



This command will generate the following visualization:

![heatmaps viz](./gaze_visualization/office_sample_vid_compare_grps.gif)
