# Saving and Loading Preprocessed Data

## What you will learn in this tutorial:

* how to save your preprocessed data
* how to load your preprocessed data

## Preparations

We import `pymovements` as the alias `pm` for convenience.

In [None]:
import pymovements as pm

Let's start by downloading our `ToyDataset` and loading in its data:

In [None]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()
dataset.load()

Now let's do some preprocessing by converting pixel coordinates to degrees of visual angle and computing velocities:

In [None]:
dataset.pix2deg()
dataset.pos2vel()

dataset.gaze[0].frame.head()

The gaze data frame now contains additional columns for position in degrees of visual angle and for velocity.

## Saving

Saving your preprocessed data is as simple as:

In [None]:
dataset.save_preprocessed()

All of the preprocessed data is saved into this directory:

In [None]:
dataset.paths.preprocessed

Let's confirm it by printing all the new files in this directory:

In [None]:
print(list(dataset.paths.preprocessed.glob('*/*/*')))

All of the files have been saved into the `Dataset.paths.preprocessed` directory as `feather` files.
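If the directory holds many files, printing every path gets noisy. The sketch below counts files per extension instead, using only the standard library. The helper name `count_files_by_extension` and the demo directory layout are made up for illustration; they are not part of `pymovements`.

```python
from collections import Counter
from pathlib import Path
import tempfile


def count_files_by_extension(root: Path) -> Counter:
    """Count all files below ``root``, grouped by their file suffix."""
    return Counter(p.suffix for p in root.rglob('*') if p.is_file())


# Demonstration on a throwaway directory tree (hypothetical layout, no real data).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / 'preprocessed'
    for name in ('trial_1.feather', 'trial_2.feather'):
        target = root / 'subject_0' / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()  # empty placeholder file
    counts = count_files_by_extension(root)

print(counts)
```

With the dataset from this tutorial, the equivalent call would be `count_files_by_extension(dataset.paths.preprocessed)`.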

If we want to save the data into an alternative directory and use a different file format such as `csv`, we can use the following:

In [None]:
dataset.save_preprocessed(preprocessed_dirname='preprocessed_csv', extension='csv')

Let's confirm again by printing all the new files in this alternative directory:

In [None]:
alternative_dirpath = dataset.path / 'preprocessed_csv'
print(list(alternative_dirpath.glob('*/*/*')))
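To go one step beyond eyeballing the printed paths, one could check that every file in the default directory has a counterpart in the alternative one. The following is a minimal sketch under the assumption that both directories share the same relative layout and differ only in file extension. The helper `relative_stems` and the demo files are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
import tempfile


def relative_stems(root: Path, extension: str) -> set[str]:
    """Return the relative paths, without suffix, of all files with ``extension``."""
    return {
        p.relative_to(root).with_suffix('').as_posix()
        for p in root.rglob(f'*.{extension}')
    }


# Demonstration on two throwaway directory trees (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    feather_root = Path(tmp) / 'preprocessed'
    csv_root = Path(tmp) / 'preprocessed_csv'
    for root, extension in ((feather_root, 'feather'), (csv_root, 'csv')):
        for trial in ('trial_1', 'trial_2'):
            target = root / 'subject_0' / f'{trial}.{extension}'
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()  # empty placeholder file

    # Files present as feather but missing as csv.
    missing = relative_stems(feather_root, 'feather') - relative_stems(csv_root, 'csv')

print(missing)
```

An empty set means that each `feather` file has a matching `csv` file.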

## Loading

Now let's imagine that the preprocessing and saving was done in another script or notebook and we only want to load the preprocessed data.

We simulate this by initializing a new dataset object. We don't need to download any additional data.

In [None]:
events_dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')

The preprocessed data can now simply be loaded by setting `preprocessed` to `True`:

In [None]:
events_dataset.load(preprocessed=True)

events_dataset.gaze[0].frame.head()

By default, the `preprocessed` directory and the `feather` extension will be chosen.

If you saved the data under an alternative directory name or in another file format, pass the same arguments when loading:

In [None]:
events_dataset.load(
    preprocessed=True,
    preprocessed_dirname='preprocessed_csv',
    extension='csv',
)
events_dataset.gaze[0].frame.head()
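In a longer pipeline it can be useful to check whether preprocessed files exist before trying to load them. The sketch below assumes that the preprocessed directory sits directly below the dataset path, as in the `dataset.path / 'preprocessed_csv'` example above. The helper `has_preprocessed_files` is hypothetical and not part of `pymovements`.

```python
from pathlib import Path
import tempfile


def has_preprocessed_files(
        dataset_path: str,
        dirname: str = 'preprocessed',
        extension: str = 'feather',
) -> bool:
    """Return True if at least one file with ``extension`` exists below the directory."""
    root = Path(dataset_path) / dirname
    return root.is_dir() and any(root.rglob(f'*.{extension}'))


# Demonstration on a throwaway dataset directory (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    found_before = has_preprocessed_files(tmp)

    target = Path(tmp) / 'preprocessed' / 'subject_0' / 'trial_1.feather'
    target.parent.mkdir(parents=True)
    target.touch()  # empty placeholder file

    found_after = has_preprocessed_files(tmp)
    found_csv = has_preprocessed_files(tmp, dirname='preprocessed_csv', extension='csv')

print(found_before, found_after, found_csv)
```

If the check fails for `'data/ToyDataset'`, one option is to fall back to loading the raw data, rerunning the preprocessing, and calling `Dataset.save_preprocessed()` again.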

## What you have learned in this tutorial:

* saving your preprocessed data using `Dataset.save_preprocessed()`
* loading your preprocessed data using `Dataset.load(preprocessed=True)`
* using custom directory names by specifying `preprocessed_dirname`
* using other file formats than the default `feather` format by specifying `extension`