# Density-based clustering of webcam-based eyetracking data into fixations and saccades using Machine Learning


### Authored by: Taimur Khan, Benjamin Nava Höer
**Final Project for TU Berlin WU'20 course: Machine Learning using Python - Theory and Application**

**Source-code: [Github](https://github.com/thisistaimur/TUB_WU_FinalProject)**

___**Licensed under:**___ [Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)](https://creativecommons.org/licenses/by-nc-nd/4.0/)

### 1. Abstract

Machine learning (ML) methods have shown promising results in the classification of eyetracking data into fixations and saccades. However, existing ML models for such classification are trained with data from eyetracking hardware and hence do not perform well on webcam-based eyetracking data. Additionally, no labeled (fixations and saccades) dataset exists for webcam-based eyetracking data.

Here, an unlabeled dataset of an eyetracking timeseries was clustered using the spatial clustering algorithms DBSCAN and OPTICS, as well as the spatio-temporal clustering algorithms ST-DBSCAN and ST-OPTICS. The silhouette score was found not to be an appropriate evaluation metric for the obtained clusterings. A second, heuristically hand-labeled dataset was used to evaluate the accuracy of the most promising algorithms, ST-DBSCAN and ST-OPTICS. 75.39% of the labels predicted by ST-OPTICS and 85.02% of those predicted by ST-DBSCAN matched the provided labels, making these algorithms valuable tools for the labeling of webcam-based eyetracking data.

Although ST-DBSCAN shows the higher accuracy, it must be noted that the hand-labeled dataset used to measure model accuracy contains evenly spread-out gaze points, whereas in "wild" eyetracking datasets the data is not as evenly spaced. The resulting variance in cluster densities might cause an accuracy loss for ST-DBSCAN on "wild" data. A further analysis of hand-labeled datasets is suggested to better understand the performance difference between ST-OPTICS and ST-DBSCAN.

![SegmentLocal](resources/3d-DBSCAN.gif "segment")


### 2. Introduction

**2.1. The Problem** 

Human gaze is classified into two natural events: fixations and saccades. Event detection, however, is a challenging stage in eyetracking data analysis. A major drawback of current event detection methods is that their parameters have to be adjusted to the quality of the eye movement data. This problem is exacerbated in data gathered by Adsata's webcam-based eyetracking system (Webgazer), owing to its low sampling frequency and high noise. Here we show that a fully automated clustering of raw gaze samples, using a machine-learning approach, can help to create labeled datasets in which each sample belongs either to a fixation cluster or to noise. Any features of the dataset that have already been detected manually or algorithmically can then be used to train a classifier that separates the saccades from the noise, without the need for a user to set parameters. In this study, we explore the application of the following machine learning clustering methods for the detection of fixations:

1. DBSCAN : Density-Based Spatial Clustering of Applications with Noise
2. ST-DBSCAN : Spatio-Temporal Density-Based Spatial Clustering of Applications with Noise
3. OPTICS: Ordering Points To Identify the Clustering Structure
4. ST-OPTICS: Spatio-Temporal Ordering Points To Identify the Clustering Structure


To show the practical utility of the proposed methods for applications that employ eye movement classification, we provide a benchmark of which methods perform best.
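The labeling idea described in Section 2.1 can be sketched in a few lines: a density-based clusterer assigns each gaze sample a cluster id, with -1 marking noise, and every clustered sample is treated as part of a fixation. The function name `clusters_to_events` and the example ids are illustrative assumptions, not part of the project code.

```python
import numpy as np

def clusters_to_events(cluster_labels):
    """Map cluster ids to event names: any id >= 0 is a fixation, -1 is noise."""
    cluster_labels = np.asarray(cluster_labels)
    return np.where(cluster_labels >= 0, "fixation", "noise")

# Hypothetical cluster ids for seven consecutive gaze samples
example_ids = [0, 0, -1, 1, 1, -1, 2]
events = clusters_to_events(example_ids)
print(events)
```

Samples left in the noise group are the candidates from which saccades would later be separated.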

___Fig 1:___ Data comparison between Webgazer and Tobii-Pro X3-120, showing the difference in data quality and noise.


<img src="resources/problem.png" alt="Drawing" style="width: 700px;"/>

**2.2. About the datasets**

___Dataset 1___

Dataset 1 is the primary dataset of the Webgazer vs. Tobii-Pro X3-120 study collected for Schreiber, 2020 [5], hosted by Adsata and Martin-Luther-University. The study evaluated the behavior of Webgazer when the hardware detects a fixation or a saccade. However, the discrepancy between the sampling frequencies of the Tobii-Pro and Webgazer did not allow Webgazer samples to be labeled by direct correspondence with the Tobii-Pro data.
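The frequency discrepancy can be illustrated with a rough estimate of the Webgazer sampling interval. The sketch below uses the first 17 timestamps of Dataset 1 as printed in Section 4.1 and assumes that they are given in milliseconds, which the source does not state explicitly.

```python
import numpy as np

# First Webgazer timestamps of Dataset 1, copied from the dataframe output in Section 4.1.
# Assumption: the timestamps are in milliseconds.
timestamps_ms = np.array([
    102708, 102781, 102842, 102893, 102943, 103020, 103071, 103137, 103205,
    103270, 103320, 103403, 103454, 103520, 103593, 103653, 103704,
])

intervals_ms = np.diff(timestamps_ms)        # time between consecutive samples
mean_interval_ms = intervals_ms.mean()
approx_rate_hz = 1000.0 / mean_interval_ms   # samples per second under the ms assumption

print(f"mean interval: {mean_interval_ms:.2f} ms, approx. rate: {approx_rate_hz:.1f} Hz")
print(f"interval range: {intervals_ms.min()} to {intervals_ms.max()} ms")
```

Under this assumption the samples arrive irregularly, roughly every 62 ms on average, which is well below the nominal 120 Hz of the Tobii-Pro X3-120.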


___Fig 2:___ Set-up for Schreiber 2020 [5]


<img src="resources/ds1-setup.png" alt="Drawing" style="width: 700px;"/>


___Fig 3:___ Sample results for Schreiber 2020 [5]

<img src="resources/ds1-results.png" alt="Drawing" style="width: 700px;"/>

___Dataset 2___

This dataset uses a heuristic approach to "hand label" the data into fixations and saccades taken from Schreiber, 2020 [5]


Heuristic:

- Drag & Drop --> Saccade
- Click --> Fixations


___Fig 4:___ Heuristic method for "hand labeling" the data

<img src="resources/ds2.png" alt="Drawing" style="width: 700px;"/>


___Fig 5:___ Comparison of 2 samples collected for Dataset 2

<img src="resources/ds2-samples.png" alt="Drawing" style="width: 700px;"/>



### 3. Theoretical Rationalization

#### 3.1 DBSCAN : Density-Based Spatial Clustering of Applications with Noise [3]
DBSCAN discovers arbitrarily shaped clusters in a dataset using a radius value $\epsilon$ based on a user-defined distance metric, e.g. Euclidean. Additionally, a MinSamples value defines the minimum number of points that must lie within the $\epsilon$ radius of a point. Given the neighborhood of $p$ as $N(p) := \{q \in D: d(p,q) \leq \epsilon\}$, with $D$ the dataset and $p$ and $q$ points therein, this leads to the following three kinds of points:
- Core points: $\mid N(p)\mid \geq MinSamples$
- Border points: $\mid N(p)\mid < MinSamples$, but $p$ lies in the neighborhood of a core point
- Noise: all remaining points

- Parameters: 
    1. Epsilon ($\epsilon$)
    2. MinSamples

Source code: https://github.com/scikit-learn/scikit-learn/blob/b3ea3ed6a/sklearn/cluster/_dbscan.py#L148
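The three kinds of points can be made concrete with a brute-force sketch. It is not the scikit-learn implementation; the function name `classify_points` and the toy coordinates are assumptions chosen for illustration, and the neighborhood $N(p)$ includes $p$ itself.

```python
import numpy as np

def classify_points(X, eps, min_samples):
    """Label each point as 'core', 'border' or 'noise' (brute force, Euclidean)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = dists <= eps                      # N(p), including p itself
    is_core = neighbors.sum(axis=1) >= min_samples
    # Border: not core, but at least one core point lies in its neighborhood
    is_border = ~is_core & (neighbors & is_core[None, :]).any(axis=1)
    kinds = np.full(len(X), "noise", dtype=object)
    kinds[is_border] = "border"
    kinds[is_core] = "core"
    return kinds

# Toy data: a dense square, one point at its edge, one isolated point
points = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2.4, 1], [10, 10]])
kinds = classify_points(points, eps=1.5, min_samples=4)
print(list(kinds))
```

The four corner points are core points, the point at (2.4, 1) has too few neighbors but touches a core point and is a border point, and the isolated point is noise.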


#### 3.2 ST-DBSCAN : Spatio-Temporal Density-Based Spatial Clustering of Applications with Noise [2]
ST-DBSCAN builds on DBSCAN by adding a second, temporal radius value $\epsilon_{2}$ to the spatial radius, now written $\epsilon_{1}$. The same kinds of distance metrics as for $\epsilon_{1}$ can be used, e.g. Euclidean. The neighborhood of a point is now described by both $\epsilon_{1}$ and $\epsilon_{2}$:

$N(p) := \{q \in D: d_{1}(p,q) \leq \epsilon_{1},d_{2}(p,q) \leq \epsilon_{2}\}$

The points in the dataset are then classified into the same three categories as in DBSCAN.

- Parameters: 
    1. Epsilon ($\epsilon_{1}$)
    2. Epsilon ($\epsilon_{2}$) 
    3. MinSamples

Source Code: https://github.com/eren-ck/st_dbscan
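The spatio-temporal neighborhood defined above can be sketched directly. This is an illustration rather than the `st_dbscan` package's code; the function name `st_neighborhood` and the four example samples are assumptions, with rows ordered as (timestamp, x, y) like the dataframes in this notebook, and the thresholds taken from the ST-DBSCAN configuration in Section 4.3.

```python
import numpy as np

def st_neighborhood(data, index, eps1, eps2):
    """Indices of samples within eps1 in space AND eps2 in time of data[index]."""
    data = np.asarray(data, dtype=float)
    spatial = np.linalg.norm(data[:, 1:] - data[index, 1:], axis=1)  # d1: distance in (x, y)
    temporal = np.abs(data[:, 0] - data[index, 0])                   # d2: distance in time
    return np.flatnonzero((spatial <= eps1) & (temporal <= eps2))

# Hypothetical samples: (timestamp, x, y)
samples = [
    [0, 100, 100],
    [50, 110, 105],    # close in space and time
    [100, 300, 300],   # close in time, far in space
    [5000, 102, 101],  # close in space, far in time
]
neighbors = st_neighborhood(samples, index=0, eps1=70, eps2=250)
print(neighbors)
```

Only the first two samples fall into the neighborhood: a gaze point that returns to the same screen location much later is not merged into the earlier fixation.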

#### 3.3 OPTICS: Ordering Points To Identify the Clustering Structure [1]

"The OPTICS algorithm shares many similarities with the DBSCAN algorithm, and can be considered a generalization of DBSCAN that relaxes the eps requirement from a single value to a value range. The key difference between DBSCAN and OPTICS is that the OPTICS algorithm builds a reachability graph, which assigns each sample both a reachability_ distance, and a spot within the cluster ordering_ attribute; these two attributes are assigned when the model is fitted, and are used to determine cluster membership. If OPTICS is run with the default value of inf set for max_eps, then DBSCAN style cluster extraction can be performed repeatedly in linear time for any given eps value using the cluster_optics_dbscan method. Setting max_eps to a lower value will result in shorter run times, and can be thought of as the maximum neighborhood radius from each point to find other potential reachable points." - SK Learn Docs

Like DBSCAN, OPTICS requires two parameters: $\epsilon$, which describes the maximum distance (radius) to consider, and $MinPts$ (MinSamples in scikit-learn), which describes the number of points required to form a cluster. A point $p$ is a ''core point'' if at least $MinPts$ points are found within its $\epsilon$-neighborhood $N_\varepsilon(p)$ (including point $p$ itself). In contrast to DBSCAN, OPTICS also considers points that are part of a more densely packed cluster, so each point is assigned a ''core distance'' that describes the distance to the $MinPts$-th closest point:

$$
\text{core-dist}_{\varepsilon,\mathit{MinPts}}(p)=
\begin{cases}
\text{UNDEFINED} & \text{if } |N_\varepsilon(p)| < \mathit{MinPts}\\
\mathit{MinPts}\text{-th smallest distance in } N_\varepsilon(p) & \text{otherwise}
\end{cases}
$$

The ''reachability-distance'' of another point $o$ from a point $p$ is either the distance between $o$ and $p$, or the core distance of $p$, whichever is bigger:

$$
\text{reachability-dist}_{\varepsilon,\mathit{MinPts}}(o, p)=
\begin{cases}
\text{UNDEFINED} & \text{if } |N_\varepsilon(p)| < \mathit{MinPts}\\
\max(\text{core-dist}_{\varepsilon,\mathit{MinPts}}(p), \text{dist}(p,o)) & \text{otherwise}
\end{cases}
$$

If $p$ and $o$ are nearest neighbors, this is the $\epsilon' < \epsilon $ we need to assume to have $p$ and $o$ belong to the same cluster.
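Both definitions can be checked on a small example. The sketch below is a brute-force illustration, not the scikit-learn implementation; the function names and the four points on a line are assumptions, and `None` stands in for UNDEFINED.

```python
import numpy as np

def core_distance(X, p, eps, min_pts):
    """Distance from X[p] to its min_pts-th closest point (p itself counts), or None."""
    dists = np.sort(np.linalg.norm(X - X[p], axis=1))
    within = dists[dists <= eps]          # the eps-neighborhood of p
    if within.size < min_pts:
        return None                       # UNDEFINED: p is not a core point
    return float(within[min_pts - 1])

def reachability_distance(X, o, p, eps, min_pts):
    """Reachability distance of X[o] from X[p], or None if p is not a core point."""
    core = core_distance(X, p, eps, min_pts)
    if core is None:
        return None
    return max(core, float(np.linalg.norm(X[o] - X[p])))

# Four hypothetical points on a line
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [5.0, 0.0]])

core_p0 = core_distance(X, p=0, eps=3.0, min_pts=3)
reach_1_from_0 = reachability_distance(X, o=1, p=0, eps=3.0, min_pts=3)
core_p3 = core_distance(X, p=3, eps=3.0, min_pts=3)
print(core_p0, reach_1_from_0, core_p3)
```

Point 1 lies closer to point 0 than the core distance of point 0, so its reachability distance is lifted to that core distance; point 3 has only two points in its neighborhood, so its core distance is undefined.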

- Parameters: 
    1. Xi ($\xi$), the minimum steepness on the reachability plot that marks a cluster boundary
    2. MinSamples

Source code: https://github.com/scikit-learn/scikit-learn/blob/b3ea3ed6a/sklearn/cluster/_optics.py#L24
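The scikit-learn documentation quoted above mentions that a single OPTICS fit can be reused to extract DBSCAN-style clusterings for several `eps` values through `cluster_optics_dbscan`. The following sketch shows this on synthetic data; the data, the helper `dbscan_labels` and the chosen `eps` values are assumptions for illustration and are unrelated to the eyetracking datasets.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

rng = np.random.default_rng(0)
# Synthetic data: two tight groups and one far outlier
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(20, 2)),
    [[20.0, 20.0]],
])

# Fit once with the default max_eps=inf so that any eps can be extracted later
optics = OPTICS(min_samples=5).fit(X)

def dbscan_labels(eps):
    """DBSCAN-style labels for a given eps, reusing the fitted reachability data."""
    return cluster_optics_dbscan(
        reachability=optics.reachability_,
        core_distances=optics.core_distances_,
        ordering=optics.ordering_,
        eps=eps,
    )

labels_tight = dbscan_labels(2.0)    # small radius: groups stay separate
labels_loose = dbscan_labels(10.0)   # large radius: groups merge

print("clusters at eps=2: ", len(set(labels_tight) - {-1}))
print("clusters at eps=10:", len(set(labels_loose) - {-1}))
```

The expensive neighborhood search happens only in `fit`; each call to `dbscan_labels` is a cheap pass over the stored ordering.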

#### 3.4 ST-OPTICS: Spatio-Temporal Ordering Points To Identify the Clustering Structure [4]

ST-OPTICS extends OPTICS by also taking a temporal radius into the calculation.

- Parameters: 
    1. Xi ($\xi$)
    2. Epsilon ($\epsilon_{2}$) 
    3. MinSamples


    



Source code: https://github.com/eren-ck/st_optics


### 4. Implementation

**4.0. Setup Environment**

In [None]:
 # OPTIONAL: Python package installations
    
!pip install st-dbscan #https://github.com/eren-ck/st_dbscan
!pip install ipympl
!pip install st_optics #https://github.com/eren-ck/st_optics

In [23]:
# Import project dependencies

import json
import urllib.request

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import animation
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
from sklearn.cluster import DBSCAN, OPTICS, cluster_optics_dbscan
from sklearn.metrics import accuracy_score, confusion_matrix, silhouette_score
from st_dbscan import ST_DBSCAN
from st_optics import ST_OPTICS


**4.1. Load and explore data**

In [24]:
# Loading Dataset 1 and store in dataframe 'df1'
url1 = urllib.request.urlopen("http://dschr.de/api/resultCombineData")
data1 = json.loads(url1.read().decode())
df1 = pd.DataFrame(data1[0]["data"])

# Loading Dataset 2 and store in dataframe 'df2'
url2 = urllib.request.urlopen("http://dschr.de/api/handLabeled")
data2 = json.loads(url2.read().decode())
df2 = pd.DataFrame(data2[0]["data"])


df1, df2

(     timestamp            x           y         label
 0       102708   986.288075  508.004755      Fixation
 1       102781  1005.492167  495.522600      Fixation
 2       102842   942.353008  492.123891      Fixation
 3       102893   948.193646  474.714589      Fixation
 4       102943   938.728917  481.875697      Fixation
 5       103020   917.912747  506.359529       Saccade
 6       103071   884.942194  542.620793       Saccade
 7       103137   886.420614  545.920014  Unclassified
 8       103205   824.196518  553.490247  Unclassified
 9       103270   917.523887  578.276431       Saccade
 10      103320  1023.332480  635.139971      Fixation
 11      103403  1076.643626  654.778095      Fixation
 12      103454   918.386355  599.253187      Fixation
 13      103520   815.464265  525.277457       Saccade
 14      103593   773.356921  431.572552      Fixation
 15      103653   691.419014  313.519792      Fixation
 16      103704   629.121041  226.476961      Fixation
 17      1

**4.2. Pre-process data**

In [25]:
# Shift timestamps in both dataframes so that each recording starts at 0
df1['timestamp'] = df1['timestamp'] - df1['timestamp'].iloc[0]
df2['timestamp'] = df2['timestamp'] - df2['timestamp'].iloc[0]

#Convert dataframes to numpy arrays
array1 = df1.to_numpy()
array2 = df2.to_numpy()

array1, array2

(array([[0, 986.2880749379, 508.0047550332, 'Fixation'],
        [73, 1005.4921671685, 495.5226000186, 'Fixation'],
        [134, 942.3530079831, 492.1238913635, 'Fixation'],
        ...,
        [60447, 732.960154917, 276.0373709741, 'Fixation'],
        [60531, 635.0750563979, 295.3095717554, 'Saccade'],
        [60581, 618.0751626144, 313.9583669145, 'Fixation']], dtype=object),
 array([[0, 739.0239530773961, 417.4759015313773, 'fixation'],
        [38, 707.4810489651718, 444.2107367829889, 'fixation'],
        [82, 713.9262252699162, 445.593757589654, 'fixation'],
        ...,
        [36541, 674.5909449373224, 369.15619639448806, 'saccade'],
        [36582, 671.6680822153079, 416.0326625889876, 'fixation'],
        [36619, 684.2615176944689, 428.94640490405294, 'fixation']],
       dtype=object))

**4.3. Configuring Models & Hyperparameters**

In [26]:
# Setup DBSCAN classifier
eps_dbscan=150
min_samples_dbscan=5

clf_dbscan = DBSCAN(eps=eps_dbscan, min_samples=min_samples_dbscan, metric='euclidean', algorithm='auto', leaf_size=30, p=2, n_jobs=1)

# Setup ST-DBSCAN classifier
eps_stdbscan=70
eps2_stdbscan=250
min_samples_stdbscan=5

clf_st_dbscan = ST_DBSCAN(eps1=eps_stdbscan, eps2=eps2_stdbscan, min_samples=min_samples_stdbscan) 

# Setup OPTICS classifier
xi_optics = 0.05
max_eps_optics = 180
min_cluster_size_optics = 5
min_samples_optics = 4

clf_optics = OPTICS(cluster_method = 'xi', xi= xi_optics, max_eps= max_eps_optics, min_cluster_size = min_cluster_size_optics, min_samples=min_samples_optics, metric='euclidean', algorithm='auto', p=2)

# Setup ST-OPTICS classifier
xi_stoptics = 0.08
eps2_stoptics = 250
min_samples_stoptics = 5

clf_st_optics = ST_OPTICS(xi=xi_stoptics, eps2=eps2_stoptics, min_samples=min_samples_stoptics)



**4.4. Train model with Dataset 1 and predict labels**

##### 4.4.1 DBSCAN

In [38]:
clf_dbscan.fit(df1.iloc[:,:3])

# Collapse all cluster ids into one label: 1 = in a cluster (fixation), -1 = noise
labels_pred_dbscan = np.where(clf_dbscan.labels_ >= 0, 1, -1)

          
%matplotlib widget
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df1.iloc[:,1],df1.iloc[:,2],df1.iloc[:,0], c=labels_pred_dbscan)
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Timestamp')



##### 4.4.2 ST-DBSCAN

In [37]:
clf_st_dbscan.fit(df1.iloc[:,:3])

# Collapse all cluster ids into one label: 1 = in a cluster (fixation), -1 = noise
labels_pred_stdbscan = np.where(np.asarray(clf_st_dbscan.labels) >= 0, 1, -1)

          
%matplotlib widget
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df1.iloc[:,1],df1.iloc[:,2],df1.iloc[:,0], c=labels_pred_stdbscan)
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Timestamp')


##### 4.4.3 OPTICS

In [39]:

labels_pred_optics = clf_optics.fit_predict(df1.iloc[:,:3])

# Collapse all cluster ids into one label: 1 = in a cluster (fixation), -1 = noise
labels_pred_optics = np.where(labels_pred_optics >= 0, 1, -1)

%matplotlib widget
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df1.iloc[:,1],df1.iloc[:,2],df1.iloc[:,0], c=labels_pred_optics)
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Timestamp')


##### 4.4.4 ST-OPTICS

In [40]:
clf_st_optics.fit(df1.iloc[:,:3])

# Collapse all cluster ids into one label: 1 = in a cluster (fixation), -1 = noise
labels_pred_stoptics = np.where(np.asarray(clf_st_optics.labels) >= 0, 1, -1)

%matplotlib widget
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df1.iloc[:,1],df1.iloc[:,2],df1.iloc[:,0], c=labels_pred_stoptics)
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Timestamp')


### 5. Evaluate models

   **5.1 Silhouette Scores**

In [31]:
# Silhouette score DBSCAN
ss_dbscan = silhouette_score(df1.iloc[:,:3],labels_pred_dbscan)

# Silhouette score ST-DBSCAN
ss_stdbscan = silhouette_score(df1.iloc[:,:3],labels_pred_stdbscan)

# Silhouette score OPTICS
ss_optics = silhouette_score(df1.iloc[:,:3],labels_pred_optics)

# Silhouette score ST-OPTICS
ss_stoptics = silhouette_score(df1.iloc[:,:3],labels_pred_stoptics)
print(f"Silhouette scores:\nDBSCAN: {ss_dbscan}\nST-DBSCAN: {ss_stdbscan}\nOPTICS: {ss_optics}\nST-OPTICS: {ss_stoptics}")

Silhouette scores:
DBSCAN: 0.004875167353651087
ST-DBSCAN: 0.017654084873375348
OPTICS: 0.007203188751228375
ST-OPTICS: -0.01743249133026968


   **5.2 Accuracy with respect to hand-labeled data from Dataset 2**

In [32]:
# Preprocessing Dataset 2
# Encode the hand labels as binary values: fixation -> 1, saccade -> 0
y_true = (df2['label'] == 'fixation').astype(int).to_numpy()

# Store the encoded labels in the dataframe so the annex table shows them
df2['label'] = y_true

# Fit each model to Dataset 2 and encode its output as binary labels:
# any cluster id (>= 0) -> 1 (fixation), noise (-1) -> 0 (saccade)

# DBSCAN
clf_dbscan.fit(df2.iloc[:,:3])
y_pred_dbscan = (clf_dbscan.labels_ >= 0).astype(int)

# ST-DBSCAN
clf_st_dbscan.fit(df2.iloc[:,:3])
y_pred_stdbscan = (np.asarray(clf_st_dbscan.labels) >= 0).astype(int)

# OPTICS
clf_optics.fit(df2.iloc[:,:3])
y_pred_optics = (clf_optics.labels_ >= 0).astype(int)

# ST-OPTICS
clf_st_optics.fit(df2.iloc[:,:3])
y_pred_stoptics = (np.asarray(clf_st_optics.labels) >= 0).astype(int)



# Calculating and printing accuracy scores for each model type
acc_dbscan = accuracy_score(y_true, y_pred_dbscan)
acc_stdbscan = accuracy_score(y_true, y_pred_stdbscan)
acc_optics = accuracy_score(y_true, y_pred_optics)
acc_stoptics = accuracy_score(y_true, y_pred_stoptics)


print("Accuracy of Clustering with DBSCAN:", acc_dbscan*100)
print("Accuracy of Clustering with ST-DBSCAN:", acc_stdbscan*100)
print("Accuracy of Clustering with OPTICS:", acc_optics*100)
print("Accuracy of Clustering with ST-OPTICS:", acc_stoptics*100)
print("\n")

# Confusion matrices for each model
cm_dbscan = confusion_matrix(y_true, y_pred_dbscan)
cm_stdbscan = confusion_matrix(y_true, y_pred_stdbscan)
cm_optics = confusion_matrix(y_true, y_pred_optics)
cm_stoptics = confusion_matrix(y_true, y_pred_stoptics)

print("CM with DBSCAN:\n", cm_dbscan)
print("CM with ST-DBSCAN:\n", cm_stdbscan)
print("CM with OPTICS:\n", cm_optics)
print("CM with ST-OPTICS:\n", cm_stoptics)


# Layout of each matrix (rows: true label, columns: predicted label; 0 = saccade, 1 = fixation):
# [[tn, fp],
#  [fn, tp]]

Accuracy of Clustering with DBSCAN: 78.95362663495838
Accuracy of Clustering with ST-DBSCAN: 85.01783590963139
Accuracy of Clustering with OPTICS: 71.70035671819262
Accuracy of Clustering with ST-OPTICS: 75.38644470868014


CM with DBSCAN:
 [[ 97 174]
 [  3 567]]
CM with ST-DBSCAN:
 [[158 113]
 [ 13 557]]
CM with OPTICS:
 [[190  81]
 [157 413]]
CM with ST-OPTICS:
 [[186  85]
 [122 448]]
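The accuracy values can be recomputed from the confusion matrices, which also shows where the errors come from. The sketch below does this for the ST-DBSCAN matrix printed above, treating fixation (label 1) as the positive class; the variable names are illustrative.

```python
# ST-DBSCAN confusion matrix from the output above:
# rows = true label, columns = predicted label, 0 = saccade, 1 = fixation
cm_stdbscan_values = [[158, 113],
                      [13, 557]]

(tn, fp), (fn, tp) = cm_stdbscan_values
total = tn + fp + fn + tp

accuracy = (tn + tp) / total            # share of all samples labeled correctly
fixation_recall = tp / (tp + fn)        # share of true fixations that were found
saccade_recall = tn / (tn + fp)         # share of true saccades that were found
fixation_precision = tp / (tp + fp)     # share of predicted fixations that are correct

print(f"accuracy:           {accuracy:.4f}")
print(f"fixation recall:    {fixation_recall:.4f}")
print(f"saccade recall:     {saccade_recall:.4f}")
print(f"fixation precision: {fixation_precision:.4f}")
```

Almost all hand-labeled fixations are recovered, while a large share of the hand-labeled saccades end up inside a cluster, so most of the remaining error consists of saccade samples labeled as fixations.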


### 6. Conclusions 

Silhouette scores are not a good metric for evaluating these clusterings. The silhouette score compares, for each sample, the mean distance to the other samples of its own cluster with the mean distance to the samples of the nearest other cluster. Fixations, however, often appear in the immediate vicinity of each other, so even well-detected fixations yield low silhouette scores; here, all four scores are close to zero. 
Applying all four models to the hand-labeled Dataset 2 shows that the labels produced by ST-DBSCAN reach an accuracy of 85.02%, followed by DBSCAN (78.95%), ST-OPTICS (75.39%) and OPTICS (71.70%). This performance is not reflected in the silhouette scores. 
It is worth noting that ST-DBSCAN uses fixed distance thresholds for both the spatial and the temporal data, whereas ST-OPTICS uses a range of spatial distance values in combination with a fixed temporal distance threshold. In addition, the data from Dataset 2 shows less spatial variability than "wild" data. 
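Why spatially adjacent groups depress the silhouette score can be seen in a hand-written version of the metric. The sketch below is a simplified re-implementation for illustration, not the scikit-learn function used in Section 5.1, and the two toy datasets are assumptions.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Mean of s = (b - a) / max(a, b) over all samples (Euclidean distances)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                       # exclude the sample itself
        if not same.any():
            scores.append(0.0)                # convention for single-sample clusters
            continue
        a = dists[i, same].mean()             # mean distance within the own cluster
        b = min(dists[i, labels == other].mean()   # mean distance to the nearest other cluster
                for other in np.unique(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two groups far apart in space
separated = mean_silhouette([[0, 0], [0, 1], [10, 0], [10, 1]], [0, 0, 1, 1])
# Two groups occupying the same region, alternating along a line
interleaved = mean_silhouette([[0, 0], [1, 0], [2, 0], [3, 0]], [0, 1, 0, 1])

print(f"separated groups:   {separated:.3f}")
print(f"interleaved groups: {interleaved:.3f}")
```

The score is high only when groups are far apart relative to their own spread, which is not a property that fixation and noise samples on a screen can be expected to have.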

In [33]:
%matplotlib widget
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df2.iloc[:,1],df2.iloc[:,2],df2.iloc[:,0], c=y_pred_stdbscan)
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Timestamp')


### 7. References

1. Ankerst, M., Breunig, M. M., Kriegel, H. P., & Sander, J. (1999). OPTICS: ordering points to identify the clustering structure. ACM Sigmod record, 28(2), 49-60.
2. Birant, Derya, and Alp Kut. (2007) "ST-DBSCAN: An algorithm for clustering spatial–temporal data." Data & Knowledge Engineering 60.1: 208-221.
3. Ester, M., H. P. Kriegel, J. Sander, and X. Xu, (1996) "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise". In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226-231.
4. Peca, I., Fuchs, G., Vrotsou, K., Andrienko, N. V., & Andrienko, G. L. (2012). Scalable Cluster Analysis of Spatial Events. In EuroVA@ EuroVis.
5. Schreiber, D. (2020) "Klassifizierung von okulomotorischen Ereignissen (Fixierungen und Sakkaden) in webcam-basierten Eye-Tracking Daten". Martin-Luther-University Halle-Wittenberg.
6. Zemblys, Raimondas, et al. (2018) "Using machine learning to detect events in eye-tracking data." Behavior Research Methods 50.1: 160-181.

### 8. Annex

In [34]:
# Recreating Dataframe with labels from all models
pd.set_option("display.max_rows", None, "display.max_columns", None)


df2["DBSCAN-labels"] = y_pred_dbscan
df2["STDBSCAN-labels"] = y_pred_stdbscan
df2["OPTICS-labels"] = y_pred_optics
df2["STOPTICS-labels"] = y_pred_stoptics

In [35]:
df2

Unnamed: 0,timestamp,x,y,label,DBSCAN-labels,STDBSCAN-labels,OPTICS-labels,STOPTICS-labels
0,0,739.023953,417.475902,1,1,1,1,0
1,38,707.481049,444.210737,1,1,1,1,0
2,82,713.926225,445.593758,1,1,1,1,1
3,127,704.775582,469.488595,1,1,1,1,1
4,165,704.159921,476.988842,1,1,1,1,1
5,204,747.280255,447.978626,1,1,1,1,1
6,251,752.142984,492.622463,1,1,1,1,1
7,292,767.401839,512.817108,1,1,1,1,1
8,333,777.341754,508.626762,1,1,1,1,1
9,373,727.863724,502.614338,1,1,1,1,1
