# Deep Imputation for SKeleton Data (DISK) tutorial

**Author: France ROSE, @Measuring Behavior 2024, Aberdeen**

DISK addresses the problem of missing data in recordings of skeleton data, coming from video pose estimation or motion capture in 2D and 3D.
It relies on a *totally unsupervised* training framework and has been tested on 7 datasets that differ in species, number of keypoints, and behavioral task.

**Training principle**

<center><img src="images/imputation_method_summary_wskeleton.png" width=700></center>


**Comparison of imputation error on the different datasets**
    
<center><img src="images/barplot_newmissing_compare_networks.png" width=800></center>

- Link to the preprint:
- Link to the GitHub repo: https://github.com/bozeklab/DISK.git

--- 
In this tutorial we will:
   - install DISK (takes about 15 minutes)
   - see the configuration files system used in DISK
   - launch a training of a model
   - use a pretrained network to visualize imputed samples
   - visualize the representations learned by DISK via UMAP
   - use a pretrained network to impute real gaps
   - discuss how to apply it on new data

---
*For Google Colab:* Go to "Runtime" -> "Change runtime type" -> select "Python3", and then select "GPU"

In [None]:
!git clone https://github.com/bozeklab/DISK.git cloned-DISK-repo
%cd cloned-DISK-repo
!ls

In [None]:
!pip install -r DISK/requirements.txt -e .

*For Google Colab:* When this is done, you may need to restart the Colab runtime to complete the installation. Go to Runtime > Restart runtime

In [1]:
import DISK
import hydra
from hydra import compose, initialize
from omegaconf import OmegaConf
import os

%load_ext autoreload
%autoreload 2

basedir = '/home/france/Documents/cloned-DISK-repo'
%cd $basedir

/home/france/Documents/cloned-DISK-repo


---
# Train a DISK model

**Download test data**

Human motion capture data from the CMU MoCap dataset (see http://mocap.cs.cmu.edu/), downloaded from https://ericguo5513.github.io/action-to-motion/##data

<center><img src="notebooks/images/fig1_human_mocap_presentation.png" width=200></center>

In [25]:
!mkdir datasets
%cd datasets
!gdown https://drive.google.com/uc?id=1PXECUljc5qr8kz9H2LxT4LhS6P4uN4ck
!unzip Human_DISK_dataset.zip
%cd ../

mkdir: cannot create directory ‘datasets’: File exists
/home/france/Documents/cloned-DISK-repo/datasets
Downloading...
From (original): https://drive.google.com/uc?id=1PXECUljc5qr8kz9H2LxT4LhS6P4uN4ck
From (redirected): https://drive.google.com/uc?id=1PXECUljc5qr8kz9H2LxT4LhS6P4uN4ck&confirm=t&uuid=47656a83-4908-4838-b85a-bf5d79ca63e2
To: /home/france/Documents/cloned-DISK-repo/datasets/Human_DISK_dataset.zip
100%|██████████████████████████████████████| 1.26G/1.26G [01:20<00:00, 15.7MB/s]
Archive:  Human_DISK_dataset.zip
replace Mocap_keypoints_60_stride30_new/test_dataset_w-0-nans.npz? [y]es, [n]o, [A]ll, [N]one, [r]ename: ^C
/home/france/Documents/cloned-DISK-repo


**Change in `conf_missing.yaml`:**
- defaults.network=gru
- training.epochs=10
- training.print_every=3
- dataset.name=Mocap_keypoints_60_stride30_new
- dataset.skeleton_file=Mocap_keypoints_60_stride30_new/skeleton.py
- feed_data.transforms.add_missing.files=[Mocap_keypoints_60_stride30_new/proba_missing_uniform.csv, \
                                          Mocap_keypoints_60_stride30_new/proba_missing_length_uniform.csv]

**Comment on lists in the `config` files**

You can use either of the two following syntaxes:

- Dash + new line
```
files:
  - file1
  - file2
```

- Python-like list, items separated by a comma
```
files: [file1, file2]
```

**Run Training**

In [15]:
%run DISK/main_fillmissing.py

[2024-04-18 15:00:48,884][root][INFO] - basedir: /home/france/Documents/cloned-DISK-repo
[2024-04-18 15:00:48,885][root][INFO] - {'network': {'num_layers': 3, 'dropout': 0, 'type': 'GRU', 'size_layer': 512}, 'dataset': {'name': 'Mocap_keypoints_60_stride30_new', 'skeleton_file': 'Mocap_keypoints_60_stride30_new/skeleton.py'}, 'training': {'epochs': 10, 'batch_size': 32, 'learning_rate': 0.001, 'seed': False, 'load': None, 'n_cpus': 8, 'loss': {'type': 'l1', 'mask': True, 'factor': 100}, 'model_scheduler': {'type': 'lambdalr', 'steps_epoch': 500, 'rate': 0.95}, 'print_every': 3, 'mu_sigma': False, 'beta_mu_sigma': 0.5}, 'feed_data': {'mask': True, 'transforms': {'add_missing': {'pad': [1, 0], 'indep_keypoints': True, 'files': ['Mocap_keypoints_60_stride30_new/proba_missing_uniform.csv', 'Mocap_keypoints_60_stride30_new/proba_missing_length_uniform.csv']}, 'viewinvariant': False, 'normalize': False, 'normalizecube': True}, 'verbose': 0}}
[2024-04-18 15:00:48,925][root][INFO] - Device: cu

---
# Download a pretrained model

In [3]:
!mkdir models
%cd models
!gdown https://drive.google.com/uc?id=1b8Px-lbTddOrMZW9dozJPVLjxh0PjnTp
!unzip Human_transformer_proba.zip
!gdown https://drive.google.com/uc?id=1tGL8eyafpwJS7wdNGB5o_tuABd8qOBHB
!unzip Human_GRU.zip
%cd ../

mkdir: cannot create directory ‘models’: File exists
/home/france/Documents/cloned-DISK-repo/models
Downloading...
From: https://drive.google.com/uc?id=1b8Px-lbTddOrMZW9dozJPVLjxh0PjnTp
To: /home/france/Documents/cloned-DISK-repo/models/Human_transformer_proba.zip
100%|██████████████████████████████████████| 4.54M/4.54M [00:00<00:00, 24.7MB/s]
Archive:  Human_transformer_proba.zip
   creating: 03-10-24_transformer_NLL/
  inflating: 03-10-24_transformer_NLL/training_losses.txt  
  inflating: 03-10-24_transformer_NLL/main_fillmissing.log  
  inflating: 03-10-24_transformer_NLL/loss.svg  
  inflating: 03-10-24_transformer_NLL/model_epoch1470  
   creating: 03-10-24_transformer_NLL/.hydra/
  inflating: 03-10-24_transformer_NLL/.hydra/config.yaml  
  inflating: 03-10-24_transformer_NLL/.hydra/overrides.yaml  
  inflating: 03-10-24_transformer_NLL/.hydra/hydra.yaml  
  inflating: 03-10-24_transformer_NLL/loss_dark.svg  
Downloading...
From (original): https://drive.google.com/uc?id=1tGL8eyaf

# Use a pretrained model to test imputation and generate plots

**Change in `conf_test.yaml`:**
- hydra.run.dir=models/test_Human
- dataset.name=Mocap_keypoints_60_stride30_new
- dataset.skeleton_file=Mocap_keypoints_60_stride30_new/skeleton.py
- feed_data.transforms.add_missing.files=[Mocap_keypoints_60_stride30_new/proba_missing_uniform.csv, \
                                          Mocap_keypoints_60_stride30_new/proba_missing_length_uniform.csv]
- evaluate.checkpoints=[models/03-07-06_GRU, models/03-10-24_transformer_NLL]
- evaluate.n_plots=10


In [2]:
%run DISK/test_fillmissing.py

[DEBUG][19-Apr-24 16:16:53] Setting JobRuntime:name=UNKNOWN_NAME
[DEBUG][19-Apr-24 16:16:53] Setting JobRuntime:name=test_fillmissing


[2024-04-19 16:16:53,569][root][INFO] - [BASEDIR] /home/france/Documents/cloned-DISK-repo
[2024-04-19 16:16:53,569][root][INFO] - [OUTPUT DIR] /home/france/Documents/cloned-DISK-repo/models/test_Human
[2024-04-19 16:16:53,570][root][INFO] - {'dataset': {'name': 'Mocap_keypoints_60_stride30_new', 'stride': 30, 'skeleton_file': 'Mocap_keypoints_60_stride30_new/skeleton.py'}, 'feed_data': {'mask': True, 'transforms': {'add_missing': {'pad': [1, 1], 'files': ['Mocap_keypoints_60_stride30_new/proba_missing_uniform.csv', 'Mocap_keypoints_60_stride30_new/proba_missing_length_uniform.csv']}, 'viewinvariant': True, 'normalize': False, 'normalizecube': True}, 'verbose': 0}, 'evaluate': {'n_cpus': 6, 'batch_size': 8, 'checkpoints': ['models/03-07-06_GRU', 'models/03-10-24_transformer_NLL'], 'n_plots': 10, 'azim': 60, 'size': 2.5, 'only_holes': True, 'original_coordinates': False, 'suffix': '', 'name_items': [['network', 'type'], ['training', 'mu_sigma']], 'merge': True, 'merge_sets_file': '', 'n_

Iterating on batch:   0%|                               | 0/116 [00:00<?, ?it/s]

[2024-04-19 16:17:00,315][root][INFO] - Starting sample plots
[2024-04-19 16:18:12,694][root][INFO] - Done with sample plots


Iterating on batch:   1%|▏                    | 1/116 [01:12<2:19:47, 72.93s/it]

[2024-04-19 16:18:13,092][root][INFO] - Starting sample plots
[2024-04-19 16:19:25,869][root][INFO] - Done with sample plots


Iterating on batch: 100%|█████████████████████| 116/116 [03:01<00:00,  1.57s/it]

[2024-04-19 16:20:01,634][root][INFO] - Finished with iterating the dataset
[2024-04-19 16:20:01,659][root][INFO] - n lines in result df: 13447
[2024-04-19 16:20:01,665][root][INFO] - RMSE per sample averaged: 
method_param
linear_interp                     0.232473
type-GRU_mu_sigma-False           0.029953
type-transformer_mu_sigma-True    0.021518
Name: RMSE, dtype: Float64





[2024-04-19 16:20:14,096][root][INFO] - Model type-transformer_mu_sigma-True: PEARSONR COEFF 0.9178670134396143, PVAL 0.0


---

In the output log `test_fillmissing.log`, you can find two useful pieces of information:
- Number of tested samples: `n lines in result df: 13447`
- Averaged normalized RMSE for each tested method on these N samples:
```
RMSE per sample averaged: 
method_param
linear_interp                     0.232473
type-GRU_mu_sigma-False           0.029953
type-transformer_mu_sigma-True    0.021518
Name: RMSE, dtype: Float64
```
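To make the `linear_interp` baseline and the RMSE figures more concrete, here is a minimal sketch in plain Python. It assumes the error is measured only at positions that were masked out artificially, and it works on a single 1D coordinate. The function names are hypothetical, the normalization used by DISK is left out, and this is not the code from the DISK repository.

```python
import math


def linear_interpolate(values):
    """Fill None entries of a 1D sequence by linear interpolation.

    Gaps touching the start or end are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one known value is required")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


def rmse_on_gaps(truth, observed, imputed):
    """RMSE restricted to the positions that are missing in `observed`."""
    gaps = [i for i, v in enumerate(observed) if v is None]
    if not gaps:
        return 0.0
    squared = sum((imputed[i] - truth[i]) ** 2 for i in gaps)
    return math.sqrt(squared / len(gaps))


# Toy coordinate of one keypoint; two frames are masked out artificially.
truth = [0.0, 1.0, 4.0, 9.0, 16.0]
observed = [0.0, None, None, 9.0, 16.0]
imputed = linear_interpolate(observed)
error = rmse_on_gaps(truth, observed, imputed)
print(imputed, error)
```

Linear interpolation draws a straight line across the gap, so it does poorly whenever the true trajectory curves, which is consistent with its much higher RMSE in the table above.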

Running the test script created a few files:
- In `models/test_Human`
  - `barplot_comparison_RMSE_...png` gives the RMSE per keypoint for the compared methods
  - `comparison_length_hole_[all/kp]_vs_RMSE_...png` gives the RMSE with respect to the length of the gap for the compared methods. The two plots differ when a sample has multiple gaps: "all" averages the RMSE per sample, while "kp" averages the RMSE per hole
  - `mean_RMSE.csv` reports what was in the log, i.e. the "RMSE per sample averaged"
  - `total_RMSE_...csv` reports the error on every gap of every sample, and can be used to further analyze the performance of the models
  - if a model with a proba head was tested, then `corrplot...png` shows the correlation between the estimated error per sample and the real error made by the imputation, and `thresholding_curve...png` shows how the RMSE in the imputed samples varies with the number of remaining samples
- In `models/03-07-06_GRU/test/visualize_prediction_val`, plots with examples of imputations
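The idea behind the thresholding curve can be sketched in a few lines: samples whose estimated error exceeds a threshold are discarded, and the RMSE is averaged over the samples that remain. This is one plausible reading of the plot, written in plain Python with made-up numbers. The function name and the inputs are hypothetical and do not come from the DISK code.

```python
def thresholding_curve(estimated_errors, true_rmses, thresholds):
    """For each threshold, keep the samples whose estimated error is at or
    below it and return (threshold, n_kept, mean RMSE of the kept samples)."""
    curve = []
    for thr in thresholds:
        kept = [r for e, r in zip(estimated_errors, true_rmses) if e <= thr]
        mean_rmse = sum(kept) / len(kept) if kept else None
        curve.append((thr, len(kept), mean_rmse))
    return curve


# Made-up values, for illustration only.
estimated = [0.01, 0.02, 0.05, 0.20]  # error predicted by the model
actual = [0.02, 0.02, 0.04, 0.30]     # RMSE against the ground truth
curve = thresholding_curve(estimated, actual, thresholds=[0.02, 0.05, 1.0])
for thr, n_kept, mean_rmse in curve:
    print(thr, n_kept, mean_rmse)
```

When the estimated error correlates well with the real error, a stricter threshold keeps fewer samples but lowers their mean RMSE.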

**Comparison of RMSE**

<center><img src="notebooks/images/human_comparison_length_hole_all_vs_RMSE_repeat-0.png" width=500></center>

**Example of imputation**

<center><img src="notebooks/images/human_reconstruction_xyz_0_repeat-0.png" width=500></center>

# UMAP of the sequences

In [4]:
%run DISK/embedding_umap.py --batch_size 1 --checkpoint_folder models/03-10-24_transformer_NLL --stride 60 --dataset_path /home/france/Documents/cloned-DISK-repo

[INFO][02-May-24 11:37:05] Loaded skeleton with links [(0, 1), (1, 2), (2, 3), (0, 16), (16, 17), (17, 18), (18, 19), (0, 12), (12, 13), (13, 14), (14, 15), (1, 8), (8, 9), (9, 10), (10, 11), (1, 4), (4, 5), (5, 6), (6, 7)] and colors ['orange', 'orange', 'orange', 'gold', 'gold', 'gold', 'gold', 'grey', 'grey', 'grey', 'grey', 'cornflowerblue', 'cornflowerblue', 'cornflowerblue', 'cornflowerblue', 'turquoise', 'turquoise', 'turquoise', 'turquoise']
[INFO][02-May-24 11:37:05] Loading datasets...
[INFO][02-May-24 11:37:07] Device: cuda
[INFO][02-May-24 11:37:07] Loading transformer model...
[INFO][02-May-24 11:37:07] Network constructed
[INFO][02-May-24 11:37:07] Loading with epoch = 1470
[INFO][02-May-24 11:37:07] Loading with ave_loss_train = -92.11919800145033
[INFO][02-May-24 11:37:07] Loading with ave_rmse_train = 0.024657995028068144
[INFO][02-May-24 11:37:07] Loading with ave_loss_eval = -90.12528375478891
[INFO][02-May-24 11:37:07] Loading with ave_rmse_eval = 0.0245952966551368

(2000, 2)


[INFO][02-May-24 11:37:35] Done with train hidden representation...
Extract hidden: 100%|#########################| 434/434 [00:07<00:00, 61.20it/s]
[INFO][02-May-24 11:37:42] Done with val hidden representation...
[INFO][02-May-24 11:37:42] hidden eval vectors (434, 153600)
[INFO][02-May-24 11:37:42] hidden train vectors (2000, 153600)
[INFO][02-May-24 11:37:42] columns: ['file', 'action', 'action_str', 'movement', 'upside_down', 'speed_xy', 'speed_z', 'average_height', 'back_length', 'dist_barycenter_shoulders', 'height_shoulders', 'angleXY_shoulders', 'dist_bw_knees', 'dist_knees_shoulders', 'angle_back_base']
[INFO][02-May-24 11:37:42] Computing the umap projection


(434, 2)


[INFO][02-May-24 11:39:14] Finished projecting
[INFO][02-May-24 11:39:16] Finished projecting on the train
[INFO][02-May-24 11:39:54] Finished projecting on the eval
[INFO][02-May-24 11:39:54] Apply k-means...
[INFO][02-May-24 11:42:17] drawing umap with colors = file
[INFO][02-May-24 11:42:17] drawing umap with colors = action
[INFO][02-May-24 11:42:17] drawing umap with colors = action_str
[INFO][02-May-24 11:42:17] drawing umap with colors = movement
[INFO][02-May-24 11:42:17] drawing umap with colors = upside_down
[INFO][02-May-24 11:42:17] drawing umap with colors = speed_xy
[INFO][02-May-24 11:42:17] drawing umap with colors = speed_z
[INFO][02-May-24 11:42:18] drawing umap with colors = average_height
[INFO][02-May-24 11:42:18] drawing umap with colors = back_length
[INFO][02-May-24 11:42:18] drawing umap with colors = dist_barycenter_shoulders
[INFO][02-May-24 11:42:18] drawing umap with colors = height_shoulders
[INFO][02-May-24 11:42:18] drawing umap with colors = angleXY_sho

# Impute the dataset

The Human dataset has no missing data, so to demonstrate the imputation, we will use the Rat dataset from https://www.nature.com/articles/s41592-021-01106-6

The Rat dataset consists of five 3D motion capture recordings from rats placed in a transparent circular arena for minutes to hours.

Figure from Dunn et al. 2021:
<center><img src="notebooks/images/dunn_fig2.webp" width=700></center>

In [10]:
%cd datasets
!gdown https://drive.google.com/uc?id=1dpgBqqdwHWN4fcUzaeVt_Sq1wH-e4lhK
!unzip Rat7M_seq_DISK_dataset.zip
%cd ../

/home/france/Documents/cloned-DISK-repo/datasets
Downloading...
From (original): https://drive.google.com/uc?id=1dpgBqqdwHWN4fcUzaeVt_Sq1wH-e4lhK
From (redirected): https://drive.google.com/uc?id=1dpgBqqdwHWN4fcUzaeVt_Sq1wH-e4lhK&confirm=t&uuid=75b8541c-48f8-4bbb-a38b-2675ac682796
To: /home/france/Documents/cloned-DISK-repo/datasets/Rat7M_seq_DISK_dataset.zip
100%|████████████████████████████████████████| 341M/341M [00:22<00:00, 14.9MB/s]
Archive:  Rat7M_seq_DISK_dataset.zip
   creating: DANNCE_seq_keypoints_60_stride30_fill10/
  inflating: DANNCE_seq_keypoints_60_stride30_fill10/count_vs_keypoint_DANNCE_seq_keypoints_60_stride30_fill10.svg  
  inflating: DANNCE_seq_keypoints_60_stride30_fill10/hist_length_per_keypoint_DANNCE_seq_keypoints_60_stride30_fill10.svg  
  inflating: DANNCE_seq_keypoints_60_stride30_fill10/hist_length_original_vs_fake_DANNCE_seq_keypoints_60_stride30_fill10.png  
  inflating: DANNCE_seq_keypoints_60_stride30_fill10/val_dataset_w-0-nans.npz  
  inflating: DANN

In [1]:
%cd models
!gdown https://drive.google.com/uc?id=1hbEkwTI2ir0T54UywVv4r9GbzfgveCaC
!unzip Rat7M_transformer_proba.zip
%cd ../

/home/france/Documents/cloned-DISK-repo/models
Downloading...
From: https://drive.google.com/uc?id=1hbEkwTI2ir0T54UywVv4r9GbzfgveCaC
To: /home/france/Documents/cloned-DISK-repo/models/Rat7M_transformer_proba.zip
100%|██████████████████████████████████████| 9.04M/9.04M [00:00<00:00, 14.8MB/s]
Archive:  Rat7M_transformer_proba.zip
   creating: 05-12-23_transformer_NLL/
  inflating: 05-12-23_transformer_NLL/training_losses.txt  
  inflating: 05-12-23_transformer_NLL/model_last_epoch1500  
  inflating: 05-12-23_transformer_NLL/main_fillmissing.log  
  inflating: 05-12-23_transformer_NLL/loss.svg  
   creating: 05-12-23_transformer_NLL/.hydra/
  inflating: 05-12-23_transformer_NLL/.hydra/config.yaml  
  inflating: 05-12-23_transformer_NLL/.hydra/overrides.yaml  
  inflating: 05-12-23_transformer_NLL/.hydra/hydra.yaml  
  inflating: 05-12-23_transformer_NLL/model_epoch1340  
  inflating: 05-12-23_transformer_NLL/loss_dark.svg  
/home/france/Documents/cloned-DISK-repo


---
**Change in `conf_impute.yaml`:**
- hydra.run.dir=models/impute_Rat7M
- dataset.name=DANNCE_seq_keypoints_60_stride30_fill10
- dataset.skeleton_file=DANNCE_seq_keypoints_60_stride30_fill10/skeleton.py
- evaluate.checkpoint=models/05-12-23_transformer_NLL
- evaluate.n_cpus=1


In [24]:
%run DISK/impute_dataset.py

[2024-05-03 14:22:31,946][root][INFO] - [BASEDIR] /home/france/Documents/cloned-DISK-repo
[2024-05-03 14:22:31,947][root][INFO] - [OUTPUT DIR] /home/france/Documents/cloned-DISK-repo/models/impute_Rat7M
[2024-05-03 14:22:31,947][root][INFO] - {'dataset': {'name': 'DANNCE_seq_keypoints_60_stride30_fill10', 'skeleton_file': 'DANNCE_seq_keypoints_60_stride30_fill10/skeleton.py'}, 'feed_data': {'verbose': 0, 'pad': [1, 1], 'batch_size': 8}, 'evaluate': {'checkpoint': 'models/05-12-23_transformer_NLL', 'n_cpus': 1, 'n_plots': 20, 'save': True, 'save_dataset': True, 'only_holes': True, 'threshold_error_score': 0.1, 'suffix': None, 'name_items': [['network', 'type']]}}
[2024-05-03 14:22:31,948][root][INFO] - Loaded skeleton with links [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (4, 5), (3, 6), (4, 6), (4, 7), (6, 7), (5, 7), (5, 8), (5, 9), (3, 12), (12, 10), (11, 10), (3, 13), (13, 14), (14, 15), (9, 16), (16, 19), (8, 17), (17, 18)] and colors ['orange', 'orange', 'orange', 'gold', 'gold', 'go

100%|█████████████████████████████████████████████| 4/4 [00:19<00:00,  4.96s/it]

[2024-05-03 14:22:52,534][root][INFO] - Found 28311 imputable timepoints over the 271707 total missing timepoints (10.4 %)
[2024-05-03 14:22:52,535][root][INFO] - Lengths of imputable segments (25th, 50th, 75th percentiles): [40. 59. 59.]





[2024-05-03 14:22:52,854][root][INFO] - Loaded skeleton with links [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (4, 5), (3, 6), (4, 6), (4, 7), (6, 7), (5, 7), (5, 8), (5, 9), (3, 12), (12, 10), (11, 10), (3, 13), (13, 14), (14, 15), (9, 16), (16, 19), (8, 17), (17, 18)] and colors ['orange', 'orange', 'orange', 'gold', 'gold', 'gold', 'grey', 'grey', 'grey', 'grey', 'grey', 'gold', 'gold', 'cornflowerblue', 'cornflowerblue', 'cornflowerblue', 'turquoise', 'turquoise', 'turquoise', 'hotpink', 'hotpink', 'purple', 'purple']


100%|█████████████████████████████████████████████| 2/2 [00:10<00:00,  5.27s/it]

[2024-05-03 14:23:03,394][root][INFO] - Found 33672 imputable timepoints over the 126823 total missing timepoints (26.6 %)
[2024-05-03 14:23:03,395][root][INFO] - Lengths of imputable segments (25th, 50th, 75th percentiles): [39. 52. 59.]





[2024-05-03 14:23:03,618][root][INFO] - Loaded skeleton with links [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (4, 5), (3, 6), (4, 6), (4, 7), (6, 7), (5, 7), (5, 8), (5, 9), (3, 12), (12, 10), (11, 10), (3, 13), (13, 14), (14, 15), (9, 16), (16, 19), (8, 17), (17, 18)] and colors ['orange', 'orange', 'orange', 'gold', 'gold', 'gold', 'grey', 'grey', 'grey', 'grey', 'grey', 'gold', 'gold', 'cornflowerblue', 'cornflowerblue', 'cornflowerblue', 'turquoise', 'turquoise', 'turquoise', 'hotpink', 'hotpink', 'purple', 'purple']


100%|█████████████████████████████████████████████| 1/1 [00:04<00:00,  4.45s/it]

[2024-05-03 14:23:08,070][root][INFO] - Found 16810 imputable timepoints over the 59928 total missing timepoints (28.1 %)
[2024-05-03 14:23:08,071][root][INFO] - Lengths of imputable segments (25th, 50th, 75th percentiles): [38. 46. 59.]



Iterating on batch: 100%|█████████████████████| 178/178 [03:10<00:00,  1.07s/it]

[2024-05-03 14:26:18,496][root][INFO] - test, dataset_path = /home/france/Documents/cloned-DISK-repo/datasets/DANNCE_seq_keypoints_60_stride30_fill10



Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/france/Documents/cloned-DISK-repo/DISK/impute_dataset.py", line 344, in evaluate
    print(dataset.input_files)
AttributeError: 'ImputeDataset' object has no attribute 'input_files'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.


AttributeError: 'tuple' object has no attribute 'tb_frame'
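
The log above reports how many missing timepoints are imputable and the percentiles of the gap lengths. To inspect the gaps of your own recordings in a similar way, the sketch below finds the runs of consecutive NaN values in one coordinate track. How DISK decides that a gap is imputable is not shown in this notebook. The `max_len` cut-off used here is a hypothetical stand-in, not a DISK parameter.

```python
import math


def find_gaps(track):
    """Return (start, length) for every run of consecutive NaN values."""
    gaps, start = [], None
    for i, value in enumerate(track):
        if math.isnan(value):
            if start is None:
                start = i  # a new gap begins here
        elif start is not None:
            gaps.append((start, i - start))
            start = None
    if start is not None:  # the track ends inside a gap
        gaps.append((start, len(track) - start))
    return gaps


nan = float("nan")
track = [0.1, nan, nan, 0.4, nan, 0.6, nan, nan, nan, nan]
gaps = find_gaps(track)

max_len = 3  # hypothetical cut-off, chosen only for this example
imputable = [(s, n) for s, n in gaps if n <= max_len]
n_missing = sum(n for _, n in gaps)
n_imputable = sum(n for _, n in imputable)
print(f"{n_imputable} imputable timepoints over {n_missing} missing "
      f"({100 * n_imputable / n_missing:.1f} %)")
```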

---
# How to prepare your own dataset

## Create a *DISK dataset* from your data

You need to use the `create_dataset.py` script with the companion `conf_create_dataset.yaml` file.
The first step is to load your own data.
Currently supported formats are:
- .h5 extracted from SLEAP (see https://sleap.ai/develop/tutorials/analysis.html)
- .csv with column names 'keypoint1_x', 'keypoint1_y', ...
<center><img src="notebooks/images/csv_input_format.png" width=300></center>
- .npy matrix of shape (but no keypoint names)
- .mat exported from Qualisys software
- .pkl data (used for Drosophila DF3D dataset)
- others? You can write a small open function (as here https://github.com/bozeklab/DISK/blob/main/DISK/create_dataset.py#L76) or **open a GitHub issue with a link to an example file so we can write it for you**
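
As an illustration of what a small open function could look like, the sketch below reads a CSV whose columns follow the `keypoint1_x`, `keypoint1_y`, ... naming with the standard `csv` module. The function name, its signature, and the returned structure are hypothetical. The open functions in `create_dataset.py` may expect something different, so check the linked code before adapting it.

```python
import csv
import io


def open_keypoint_csv(file_obj):
    """Read a CSV with columns such as 'nose_x', 'nose_y', 'tail_x', ...

    Returns (keypoint_names, frames) where frames[t][k] is the tuple of
    coordinates of keypoint k at time t. Empty cells become NaN.
    """
    reader = csv.DictReader(file_obj)
    # Group the columns by keypoint name, in order of first appearance.
    columns = {}
    for col in reader.fieldnames:
        name, _, axis = col.rpartition("_")
        if name and axis in ("x", "y", "z"):
            columns.setdefault(name, []).append(col)
    names = list(columns)
    frames = []
    for row in reader:
        frames.append([
            tuple(
                float(row[c]) if row[c] not in ("", None) else float("nan")
                for c in columns[name]
            )
            for name in names
        ])
    return names, frames


# In-memory example file with one missing value (empty cell).
sample = io.StringIO(
    "nose_x,nose_y,tail_x,tail_y\n"
    "1.0,2.0,3.0,4.0\n"
    "1.5,,3.5,4.5\n"
)
names, frames = open_keypoint_csv(sample)
print(names, frames)
```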

## How to choose the parameters to create your own dataset
- length of sequence / stride
- fill
- Link to FAQ (https://github.com/bozeklab/DISK/blob/main/FAQ.md)
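
To see how the sequence length and the stride interact, here is a minimal sketch of cutting a recording into overlapping windows. It assumes that the `60` in dataset names such as `Mocap_keypoints_60_stride30_new` is the sequence length in frames and `30` the stride. The function is hypothetical and is not taken from `create_dataset.py`.

```python
def sliding_windows(n_frames, length, stride):
    """Return the (start, end) frame indices of every full window.

    `end` is exclusive. Frames left over after the last full window are
    not covered.
    """
    return [(s, s + length) for s in range(0, n_frames - length + 1, stride)]


windows = sliding_windows(n_frames=150, length=60, stride=30)
for start, end in windows:
    print(start, end)
```

With a stride of half the sequence length, consecutive windows overlap by 50 %, so most frames appear in two sequences. A smaller stride yields more training sequences, but they are more redundant.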