# Working with a Local Dataset

In this tutorial, we show how to use your own local dataset with the `Dataset` class, which helps you manage and process your eye-tracking data.

For demonstration purposes, we will use the raw data provided by the Toy dataset, a sample dataset that comes with pymovements.

In [2]:
import pymovements as pm

toy_dataset = pm.datasets.ToyDataset(
    root='data/',
    download=True,
    extract=True,
    remove_finished=True,
)

Downloading http://github.com/aeye-lab/pymovements-toy-dataset/zipball/6cb5d663317bf418cec0c9abe1dde5085a8a8ebd/ to data/ToyDataset/downloads/pymovements-toy-dataset.zip



## Define your Experiment

To use the Dataset class, we first need to create an Experiment instance. This class represents the properties of the experiment, such as the screen dimensions and sampling rate.

In [3]:
experiment = pm.gaze.Experiment(
    screen_width_px=1280,
    screen_height_px=1024,
    screen_width_cm=38,
    screen_height_cm=30.2,
    distance_cm=68,
    origin='lower left',
    sampling_rate=1000,
)

## Parameters for File Parsing

We also define a `filename_regex`, a regular expression used to match the filenames of the data files in the dataset and to extract values from them. For example, `r'trial_(?P<text_id>\d+)_(?P<page_id>\d+)\.csv'` matches filenames that follow the pattern `trial_{text_id}_{page_id}.csv` and extracts the values of `text_id` and `page_id` for each file. The dot before `csv` is escaped so that it matches only a literal period rather than any character.

In [4]:
filename_regex = r'trial_(?P<text_id>\d+)_(?P<page_id>\d+)\.csv'

Both `text_id` and `page_id` are numeric, but a regular expression captures them as strings. We use a dictionary to define the type each value is cast to.

In [5]:
filename_regex_dtypes = {
    'text_id': int,
    'page_id': int,
}
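To make these two settings concrete, here is a minimal sketch of the match-then-cast step using only Python's standard `re` module. It does not reproduce the internals of pymovements: `parse_filename` is a hypothetical helper introduced for illustration, and the pattern escapes the dot before `csv` so that only a literal period matches.

```python
import re

filename_regex = r'trial_(?P<text_id>\d+)_(?P<page_id>\d+)\.csv'
filename_regex_dtypes = {'text_id': int, 'page_id': int}


def parse_filename(filename):
    """Return the named groups of a matching filename, cast to the mapped types.

    Hypothetical helper for illustration only, not part of pymovements.
    """
    match = re.fullmatch(filename_regex, filename)
    if match is None:
        return None  # the filename does not follow the expected pattern
    # Named groups are always captured as strings, so apply the configured cast.
    return {
        key: filename_regex_dtypes.get(key, str)(value)
        for key, value in match.groupdict().items()
    }


print(parse_filename('trial_1_2.csv'))  # {'text_id': 1, 'page_id': 2}
print(parse_filename('notes.txt'))      # None
```

Files that do not match the pattern yield `None` in this sketch, so they can be skipped when collecting the files of a dataset.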

We can also adjust how the CSV files are read.

The `column_map` dictionary maps the original column names in the CSV files to the desired column names. Here the original column names are `timestamp`, `x`, and `y`, and the desired column names are `time`, `x_right_pix`, and `y_right_pix`, respectively.

In [6]:
column_map = {
    'timestamp': 'time',
    'x': 'x_right_pix',
    'y': 'y_right_pix',
}

Here, we specify that the separator in the CSV files is a tab (`'\t'`), and we pass the lists of original and desired column names as the `columns` and `new_columns` parameters, respectively.

In [7]:
read_csv_kwargs = {
    'sep': '\t',
    'columns': list(column_map.keys()),
    'new_columns': list(column_map.values()),
}
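The effect of these settings (tab separator, column selection, renaming) can be illustrated with the standard library alone. The sketch below is not the reader used by pymovements. The file content in `raw_text`, including the `extra` column, is made up for the example.

```python
import csv
import io

column_map = {
    'timestamp': 'time',
    'x': 'x_right_pix',
    'y': 'y_right_pix',
}

# Hypothetical tab-separated content with one column that we do not need.
raw_text = (
    'timestamp\tx\ty\textra\n'
    '2415266\t176.8\t140.2\ta\n'
    '2415267\t176.7\t139.8\tb\n'
)

reader = csv.DictReader(io.StringIO(raw_text), delimiter='\t')

# Keep only the columns named in column_map and rename them on the fly.
rows = [
    {new_name: row[old_name] for old_name, new_name in column_map.items()}
    for row in reader
]

print(rows[0])
# {'time': '2415266', 'x_right_pix': '176.8', 'y_right_pix': '140.2'}
```

Note that the `csv` module leaves every value as a string. A dataframe-based reader would typically infer numeric types as well.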

## Define and load the Dataset

Finally, we create a `Dataset` instance by passing in the root directory, the `Experiment` instance, and optional parameters such as the filename regular expression and the custom CSV reading parameters. The `dataset_dirname`, `raw_dirname`, `preprocessed_dirname`, and `events_dirname` parameters define the names of the directories for the dataset, raw data, preprocessed data, and events data, respectively.

In [8]:
# Define the path to the dataset directory
dataset_dir = './data/ToyDataset/'

# Set up the Dataset object
dataset = pm.datasets.Dataset(
    root=dataset_dir,
    experiment=experiment,
    filename_regex=filename_regex,
    filename_regex_dtypes=filename_regex_dtypes,
    custom_read_kwargs=read_csv_kwargs,
    dataset_dirname='.',
    raw_dirname='raw',
    preprocessed_dirname='preprocessed',
    events_dirname='events',
)
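As a rough illustration of how these directory names could combine, the following sketch joins them with `pathlib`. The layout `<root>/<dataset_dirname>/<subdirectory>` is an assumption made for this example and should be checked against the pymovements documentation.

```python
from pathlib import PurePosixPath

root = PurePosixPath('./data/ToyDataset/')
dataset_dirname = '.'

# Assumed layout: <root>/<dataset_dirname>/<subdirectory>
dataset_path = root / dataset_dirname
raw_path = dataset_path / 'raw'
preprocessed_path = dataset_path / 'preprocessed'
events_path = dataset_path / 'events'

print(raw_path)           # data/ToyDataset/raw
print(preprocessed_path)  # data/ToyDataset/preprocessed
print(events_path)        # data/ToyDataset/events
```

Under this assumption, setting `dataset_dirname='.'` means the data directories sit directly inside `root`, because `pathlib` collapses the single dot.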

Now we can load the dataset. Here we select a subset containing only the first page of the texts with IDs 1 and 2.

In [9]:
subset = {
    'text_id': [1, 2],
    'page_id': 1,
}

dataset.load(subset=subset)
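Conceptually, a subset like this acts as a filter on the values parsed from the filenames. The sketch below shows one way such a filter could work. `matches_subset` and the `files` list are hypothetical and only mirror the idea, not the library's implementation.

```python
def matches_subset(file_info, subset):
    """Check whether every subset key of a file has an allowed value.

    Hypothetical helper: a subset value may be a single value or a list.
    """
    for key, allowed in subset.items():
        if not isinstance(allowed, (list, tuple, set)):
            allowed = [allowed]  # treat a single value like a one-item list
        if file_info[key] not in allowed:
            return False
    return True


subset = {'text_id': [1, 2], 'page_id': 1}

# Made-up file metadata: three texts with two pages each.
files = [
    {'text_id': text_id, 'page_id': page_id}
    for text_id in (0, 1, 2)
    for page_id in (1, 2)
]

selected = [info for info in files if matches_subset(info, subset)]
print(selected)
# [{'text_id': 1, 'page_id': 1}, {'text_id': 2, 'page_id': 1}]
```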


## Use the Dataset

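Once we have created the `Dataset` instance, we can use its methods to preprocess and analyze the data in our local dataset. Each loaded file is available as an entry of `dataset.gaze`, and its data can be inspected through the `frame` attribute. The frames displayed in this section were captured after all preprocessing steps had been run, so each of them already contains the position (`*_pos`) and velocity (`*_vel`) columns.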

In [18]:
dataset.gaze[0].frame

text_id,page_id,time,x_right_pix,y_right_pix,x_right_pos,y_right_pos,x_right_vel,y_right_vel
i64,i64,f64,f64,f64,f64,f64,f64,f64
1,1,2.415266e6,176.8,140.2,-11.420403,-9.148145,-5.235971,-13.666945
1,1,2.415267e6,176.7,139.8,-11.422806,-9.157834,-3.004237,-9.630308
1,1,2.415268e6,176.7,139.3,-11.422806,-9.169943,-0.772503,-5.59367
1,1,2.415269e6,176.6,139.3,-11.42521,-9.169943,1.459231,-1.557032
1,1,2.41527e6,176.7,139.3,-11.422806,-9.169943,4.034446,1.556983
1,1,2.415271e6,176.8,139.5,-11.420403,-9.1651,6.695697,3.459956
1,1,2.415272e6,177.3,139.8,-11.408386,-9.157834,7.983442,3.20046
1,1,2.415273e6,177.8,140.0,-11.396367,-9.15299,6.78167,3.200507
1,1,2.415274e6,178.3,140.0,-11.384348,-9.15299,3.948804,2.941092
1,1,2.415275e6,178.3,139.9,-11.384348,-9.155412,0.343335,3.460254


Here we use the `pix2deg` method to convert the pixel coordinates to degrees of visual angle. The results are stored in the `x_right_pos` and `y_right_pos` columns.

In [14]:
dataset.pix2deg()

dataset.gaze[0].frame


text_id,page_id,time,x_right_pix,y_right_pix,x_right_pos,y_right_pos,x_right_vel,y_right_vel
i64,i64,f64,f64,f64,f64,f64,f64,f64
1,1,2.415266e6,176.8,140.2,-11.420403,-9.148145,-5.235971,-13.666945
1,1,2.415267e6,176.7,139.8,-11.422806,-9.157834,-3.004237,-9.630308
1,1,2.415268e6,176.7,139.3,-11.422806,-9.169943,-0.772503,-5.59367
1,1,2.415269e6,176.6,139.3,-11.42521,-9.169943,1.459231,-1.557032
1,1,2.41527e6,176.7,139.3,-11.422806,-9.169943,4.034446,1.556983
1,1,2.415271e6,176.8,139.5,-11.420403,-9.1651,6.695697,3.459956
1,1,2.415272e6,177.3,139.8,-11.408386,-9.157834,7.983442,3.20046
1,1,2.415273e6,177.8,140.0,-11.396367,-9.15299,6.78167,3.200507
1,1,2.415274e6,178.3,140.0,-11.384348,-9.15299,3.948804,2.941092
1,1,2.415275e6,178.3,139.9,-11.384348,-9.155412,0.343335,3.460254
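To give an intuition for the conversion, here is a minimal sketch of a pixel-to-degree formula using the experiment parameters defined earlier. It assumes that pixel coordinates are first centered on the screen midpoint and then converted with an arctangent. `pix2deg_1d` is a hypothetical helper, not the library function, although it reproduces the first `x_right_pos` and `y_right_pos` values shown above to about four decimal places.

```python
import math


def pix2deg_1d(pixel, screen_px, screen_cm, distance_cm):
    """Convert one pixel coordinate to degrees of visual angle (sketch)."""
    # Shift the coordinate so that 0 lies at the center of the screen.
    centered_px = pixel - (screen_px - 1) / 2
    # Express the eye-to-screen distance in pixel units.
    distance_px = distance_cm / screen_cm * screen_px
    # The visual angle is the arctangent of offset over distance.
    return math.degrees(math.atan2(centered_px, distance_px))


# First sample of the frame above: x = 176.8 px, y = 140.2 px.
x_deg = pix2deg_1d(176.8, screen_px=1280, screen_cm=38, distance_cm=68)
y_deg = pix2deg_1d(140.2, screen_px=1024, screen_cm=30.2, distance_cm=68)

print(round(x_deg, 4), round(y_deg, 4))  # roughly -11.4204 -9.1481
```

Negative angles mean that the sample lies to the left of and below the screen center, given the `'lower left'` origin of the experiment.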


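We can use the `pos2vel` method to compute the gaze velocity from the position data, here with the Savitzky-Golay method using a window length of 7 samples and a polynomial order of 2. The results are stored in the `x_right_vel` and `y_right_vel` columns.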

In [15]:
dataset.pos2vel(method='savitzky_golay', window_length=7, polyorder=2)

dataset.gaze[0].frame


text_id,page_id,time,x_right_pix,y_right_pix,x_right_pos,y_right_pos,x_right_vel,y_right_vel
i64,i64,f64,f64,f64,f64,f64,f64,f64
1,1,2.415266e6,176.8,140.2,-11.420403,-9.148145,-5.235971,-13.666945
1,1,2.415267e6,176.7,139.8,-11.422806,-9.157834,-3.004237,-9.630308
1,1,2.415268e6,176.7,139.3,-11.422806,-9.169943,-0.772503,-5.59367
1,1,2.415269e6,176.6,139.3,-11.42521,-9.169943,1.459231,-1.557032
1,1,2.41527e6,176.7,139.3,-11.422806,-9.169943,4.034446,1.556983
1,1,2.415271e6,176.8,139.5,-11.420403,-9.1651,6.695697,3.459956
1,1,2.415272e6,177.3,139.8,-11.408386,-9.157834,7.983442,3.20046
1,1,2.415273e6,177.8,140.0,-11.396367,-9.15299,6.78167,3.200507
1,1,2.415274e6,178.3,140.0,-11.384348,-9.15299,3.948804,2.941092
1,1,2.415275e6,178.3,139.9,-11.384348,-9.155412,0.343335,3.460254
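The idea behind a Savitzky-Golay velocity estimate can be sketched in a few lines of plain Python. For a symmetric window, the slope of a least-squares quadratic fit at the center sample reduces to a weighted sum with weights `i / sum(i**2)`. The hypothetical helper `savgol_velocity_at` below covers interior samples only and is not the routine pymovements calls. Samples near the edges of a recording need separate handling.

```python
def savgol_velocity_at(positions, index, sampling_rate, half_window=3):
    """Estimate the velocity at an interior sample from a local quadratic fit.

    Hypothetical helper: a half_window of 3 corresponds to window_length=7.
    """
    offsets = range(-half_window, half_window + 1)
    norm = sum(i * i for i in offsets)  # 28 for a window of 7 samples
    # Slope of the least-squares fit, in degrees per sample.
    slope = sum(i * positions[index + i] for i in offsets) / norm
    # Multiply by samples per second to get degrees per second.
    return slope * sampling_rate


# First seven x_right_pos values from the frame above.
x_pos = [
    -11.420403, -11.422806, -11.422806, -11.42521,
    -11.422806, -11.420403, -11.408386,
]

velocity = savgol_velocity_at(x_pos, index=3, sampling_rate=1000)
print(round(velocity, 3))  # 1.459
```

The result agrees with the `x_right_vel` value of the fourth row (`1.459231`) up to the rounding of the displayed positions.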
