## Extracting Deep Cognitive Features from Bridge Inspectors' Eye-Tracking Data

Prior to this study, we collected data from five PhD engineering students performing a bridge inspection task in a virtual environment. Each student had 3 minutes to inspect a bridge inside a Unity platform that we developed and to take images of any defects, cracks, or areas of concern they found. During the inspection we recorded both the movement of their in-game character and their eye movements. Below is an example of the data recorded at a single time step:

```xml
<?xml version="1.0" encoding="utf-8"?>
<Data>
  <GazeData>
    <DisplayDimensions Width="1920" Height="1080" />
    <Timestamp>902049965729</Timestamp>
    <GazeOrigin>
      <CombinedGazeRayScreen Origin="(7656.03600000, 927.89240000, -1049.60500000)" Direction="(-0.95211820, -0.28659020, 0.10647600)" Valid="True" />
    </GazeOrigin>
    <pupil average_pupildiameter="4.157799" />
    <IntersectionPoint X="4420.004" Y="-46.16226" Z="-687.7166" />
    <HitObject Name="RW_Bridge_Vologda_II_track_LOD0">
      <ObjectPosition X="0" Y="0" Z="0" />
    </HitObject>
    <PositionOnDisplayArea X="0.5182106" Y="0.5221348" />
  </GazeData>
  <CameraData>
    <Timestamp>902049965729</Timestamp>
    <CameraOrigin X="7656.323" Y="927.9785" Z="-1049.637" />
    <CameraDirection X="-0.9624248" Y="-0.2623872" Z="0.06994079" />
  </CameraData>
</Data>
```

Each record contains the timestamp, the inspector's location and viewing direction at that time, and the location their eyes were looking at. The data was collected at a rate of 60 fps, so a full 3-minute session yields about 10,800 data points (60 samples/s × 180 s).
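As a rough illustration of how such a record could be read into Python, the sketch below parses one record with the standard library's `xml.etree.ElementTree`. It assumes each record is well-formed XML (in particular, that the root element is closed with `</Data>`). The `SAMPLE` string is an abridged copy of the record above, and the helper names `parse_tuple`, `xyz`, and `parse_record` are hypothetical, not part of the Unity export.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record shown above, with a closing </Data> tag.
SAMPLE = """<Data>
  <GazeData>
    <Timestamp>902049965729</Timestamp>
    <GazeOrigin>
      <CombinedGazeRayScreen Origin="(7656.03600000, 927.89240000, -1049.60500000)" Direction="(-0.95211820, -0.28659020, 0.10647600)" Valid="True" />
    </GazeOrigin>
    <pupil average_pupildiameter="4.157799" />
    <IntersectionPoint X="4420.004" Y="-46.16226" Z="-687.7166" />
    <HitObject Name="RW_Bridge_Vologda_II_track_LOD0" />
  </GazeData>
  <CameraData>
    <CameraOrigin X="7656.323" Y="927.9785" Z="-1049.637" />
    <CameraDirection X="-0.9624248" Y="-0.2623872" Z="0.06994079" />
  </CameraData>
</Data>"""


def parse_tuple(text):
    """Turn a string like "(1.0, 2.0, 3.0)" into a tuple of floats."""
    return tuple(float(value) for value in text.strip("()").split(","))


def xyz(element):
    """Read the X, Y, Z attributes of an element as a tuple of floats."""
    return tuple(float(element.get(axis)) for axis in ("X", "Y", "Z"))


def parse_record(xml_text):
    """Flatten one <Data> record into a plain dictionary."""
    root = ET.fromstring(xml_text)
    gaze = root.find("GazeData")
    ray = gaze.find("GazeOrigin/CombinedGazeRayScreen")
    camera = root.find("CameraData")
    return {
        "timestamp": int(gaze.findtext("Timestamp")),
        "gaze_origin": parse_tuple(ray.get("Origin")),
        "gaze_direction": parse_tuple(ray.get("Direction")),
        "gaze_valid": ray.get("Valid") == "True",
        "pupil_diameter": float(gaze.find("pupil").get("average_pupildiameter")),
        "hit_point": xyz(gaze.find("IntersectionPoint")),
        "hit_object": gaze.find("HitObject").get("Name"),
        "camera_origin": xyz(camera.find("CameraOrigin")),
        "camera_direction": xyz(camera.find("CameraDirection")),
    }


record = parse_record(SAMPLE)
print(record["timestamp"], record["hit_object"])
```

A flat dictionary per time step is only one possible layout; it is convenient here because a list of such dictionaries converts directly into a table with one row per sample.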

### Goal

Our goal in this experiment is to determine whether machine learning can yield insights into the inspectors' cognitive behavior from this eye-tracking data. A successful outcome would be, for example, correctly identifying whether a person is planning, searching, or deciding at each point in the inspection. An inspector might first plan where on the bridge to look, then actively search, and, on finding something of interest such as a crack, decide whether it warrants a picture.

In addition to these fine-grained behavioral patterns, we would also like to see whether machine learning methods can identify more "big picture" patterns, such as classifying search styles based on a larger set of data.

We believe that the insights gathered from these methods can be used as tools to make concrete data-driven decisions for designing training procedures for new inspectors, or comparing the efficiency of different inspection patterns.

### Methodology

To extract deep features from our data, we propose two methods.

The first method applies dimensionality reduction and clustering techniques such as PCA, t-SNE, and k-means to manually extracted feature vectors, for example velocity, acceleration, and the ratio of fixations to saccades. We then inspect the reduced representations to determine which combination of features yields the most meaningful feature space.
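The first method could be sketched roughly as follows. Everything here is an assumption for illustration: the feature set, the 30 deg/s velocity threshold used to label a sample as a saccade, and the synthetic gaze windows that stand in for real recordings. PCA and k-means are written out in plain NumPy so the sketch has no other dependency; in practice scikit-learn's `PCA`, `TSNE`, and `KMeans` would be the natural choice.

```python
import numpy as np


def gaze_features(directions, rate_hz=60.0, saccade_deg_per_s=30.0):
    """Summarize one window of gaze directions, shape (n, 3), as 4 features."""
    d = np.asarray(directions, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    # Angle between consecutive gaze rays, converted to degrees per second.
    cos = np.clip(np.sum(d[:-1] * d[1:], axis=1), -1.0, 1.0)
    velocity = np.degrees(np.arccos(cos)) * rate_hz
    acceleration = np.diff(velocity) * rate_hz
    saccade = velocity > saccade_deg_per_s  # assumed threshold
    return np.array([
        velocity.mean(),
        velocity.max(),
        np.abs(acceleration).mean(),
        saccade.mean(),  # fraction of samples labeled as saccade
    ])


def pca(X, n_components=2):
    """Project standardized rows of X onto the top principal components."""
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:n_components].T


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with randomly chosen initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers


# Synthetic stand-in data: 20 "steady" and 20 "scanning" one-second windows.
rng = np.random.default_rng(0)


def synthetic_window(jitter, n=60):
    base = np.array([-0.95, -0.29, 0.11])
    return base + rng.normal(scale=jitter, size=(n, 3))


windows = [synthetic_window(0.0005) for _ in range(20)]
windows += [synthetic_window(0.05) for _ in range(20)]

X = np.stack([gaze_features(w) for w in windows])
embedding = pca(X)
labels, _ = kmeans(embedding, k=2)
print(labels)
```

Features are standardized before PCA because they live on very different scales (degrees per second versus a fraction between 0 and 1); without that step the velocity terms would dominate the projection.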

The second method applies similar dimensionality reduction techniques to feature vectors learned by unsupervised models such as deep autoencoders, variational autoencoders, and LSTM autoencoders.
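To make the idea of learned features concrete, here is a deliberately small sketch: an autoencoder with a single hidden layer, trained by gradient descent in plain NumPy. It is a stand-in for the deeper models named above, which would normally be built in a framework such as PyTorch or Keras. The input is synthetic data with two underlying factors, and the function name `train_autoencoder` and all hyperparameters are assumptions for illustration.

```python
import numpy as np


def train_autoencoder(X, n_hidden=2, lr=0.05, n_epochs=2000, seed=0):
    """Fit a one-hidden-layer autoencoder; return the encoder and loss history."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(n_epochs):
        hidden = np.tanh(X @ W1 + b1)   # encoder
        X_hat = hidden @ W2 + b2        # linear decoder
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_hidden = (g_out @ W2.T) * (1.0 - hidden ** 2)
        W2 -= lr * (hidden.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hidden)
        b1 -= lr * g_hidden.sum(axis=0)

    def encode(A):
        """Map rows of A to the learned low-dimensional code."""
        return np.tanh(A @ W1 + b1)

    return encode, losses


# Synthetic stand-in: 200 samples of 8 features driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(200, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)

encode, losses = train_autoencoder(X)
codes = encode(X)  # "deep" features to cluster or visualize
print(codes.shape, losses[0], losses[-1])
```

The rows of `codes` play the same role as the hand-built feature vectors in the first method: they are what the clustering and visualization steps would consume.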

We will then visualize the output of both methods, extract clusters of points, and interpret whether the clusters correspond to particular phases of the inspection process or to different search patterns. Both methods can be applied at a global as well as a local level, which yields different types of clustering.

For example, if we apply a method "locally", meaning that we divide the dataset into 1-second intervals and cluster those intervals, we expect the clusters to represent local, short-term cognitive behavior, such as whether a person is searching or planning.
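Splitting a recording into 1-second intervals might look like the sketch below. It assumes the samples are already in a NumPy array with one row per time step and that the sampling rate is the 60 fps stated earlier; the function name `split_into_windows` is hypothetical, and dropping the incomplete trailing window is a design choice rather than a requirement.

```python
import numpy as np


def split_into_windows(samples, rate_hz=60, window_s=1.0):
    """Reshape (n_samples, ...) into (n_windows, samples_per_window, ...)."""
    samples = np.asarray(samples)
    size = int(round(rate_hz * window_s))
    n_windows = len(samples) // size
    # Drop trailing samples that do not fill a whole window.
    trimmed = samples[: n_windows * size]
    return trimmed.reshape(n_windows, size, *samples.shape[1:])


# 150 dummy samples with 3 values each give two full 60-sample windows.
samples = np.arange(150 * 3).reshape(150, 3)
windows = split_into_windows(samples)
print(windows.shape)
```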

On the other hand, if we cluster "globally", meaning that we use each entire data record, we expect the clusters to represent more global characteristics of the search, such as the search strategy or the identity of the inspector.

One thing to keep in mind is that we want to anonymize the data records as much as possible. This avoids clustering on trivial factors, such as the exact locations that inspectors looked at, or on other features that would not generalize well and would not truly represent an inspector's cognitive behavior.
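One possible way to reduce the dependence on absolute location is to work with relative quantities instead of raw coordinates. The sketch below shows two such quantities: the angle between the gaze ray and the camera direction, and the distance the gaze hit point moves between consecutive samples. Both function names are hypothetical, and whether these particular features are appropriate for this dataset is an assumption, not something established here.

```python
import numpy as np


def gaze_offset_deg(gaze_dir, camera_dir):
    """Angle in degrees between the gaze ray and the camera's forward axis."""
    g = np.asarray(gaze_dir, dtype=float)
    c = np.asarray(camera_dir, dtype=float)
    cos = np.dot(g, c) / (np.linalg.norm(g) * np.linalg.norm(c))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def hit_point_steps(hit_points):
    """Distance the gaze hit point moves between consecutive samples."""
    p = np.asarray(hit_points, dtype=float)
    return np.linalg.norm(np.diff(p, axis=0), axis=1)


# Directions taken from the sample record shown earlier.
angle = gaze_offset_deg(
    (-0.95211820, -0.28659020, 0.10647600),
    (-0.9624248, -0.2623872, 0.06994079),
)

# Step lengths are unchanged when every point is shifted by the same offset.
points = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [3.0, 4.0, 12.0]])
steps = hit_point_steps(points)
shifted_steps = hit_point_steps(points + np.array([100.0, -50.0, 7.0]))
print(angle, steps)
```

Because both quantities are invariant to where on the bridge the inspector is standing, two inspectors who behave the same way at different locations produce the same feature values.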

### Table of Contents (TO-DO)


<ul>
    <li>Loading Data</li>
    <li>Local clustering
        <ul>
            <li>Breaking up the data</li>
            <li>Method 1
                <ul>
                    <li>Extracting manual features</li>
                    <li>Trying different clustering techniques</li>
                    <li>Visualizing clusters</li>
                </ul>
            </li>
            <li>Method 2
                <ul>
                    <li>Training Deep AE, VAE, and LSTM AE</li>
                    <li>Extracting deep features</li>
                    <li>Clustering / dimensionality reduction of deep features</li>
                    <li>Visualizing clusters</li>
                </ul>
            </li>
            <li>Testing generality of clusters</li>
            <li>Findings and conclusions</li>
        </ul>
    </li>
    <li>Global clustering
        <ul>
            <li>Extracting manual global features</li>
            <li>Extracting deep features (Training extractor model)</li>
            <li>Clustering techniques</li>
            <li>Visualizing clusters</li>
            <li>Testing generality of clusters</li>
            <li>Findings and conclusions</li>
        </ul>
    </li>
</ul>

### Loading Data <a name="loading-data"></a>

In this section we will create a 