# Deep Imputation for SKeleton Data (DISK) tutorial

**Author: France ROSE, @Measuring Behavior 2024, Aberdeen**

DISK addresses the problem of missing data in skeleton recordings obtained from video pose estimation or motion capture, in 2D and 3D.
It relies on a *totally unsupervised* training framework and has been tested on 7 datasets covering different species, numbers of keypoints, and behavioral tasks.

**Training principle**

<center><img src="https://raw.githubusercontent.com/bozeklab/DISK//main/images/imputation_method_summary_wskeleton.png" width=700></center>


**Comparison of imputation error on the different datasets**
    
<center><img src="https://raw.githubusercontent.com/bozeklab/DISK//main/images/barplot_newmissing_compare_networks.png" width=800></center>

- Link to the preprint: https://www.biorxiv.org/content/10.1101/2024.05.03.592173v1
- Link to the GitHub repo: https://github.com/bozeklab/DISK.git

---
In this tutorial we will:
   - install DISK (takes about 15 minutes)
   - look at the configuration file system used in DISK
   - launch the training of a model
   - use a pretrained network to visualize imputed samples
   - visualize the representations learned by DISK via UMAP
   - use a pretrained network to impute real gaps
   - discuss how to apply it to new data

---
*For Google Colab:* go to "Runtime" -> "Change runtime type", select "Python 3", and then select "GPU".
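
Before installing anything, it can help to confirm that the runtime actually exposes a GPU. The sketch below is not part of DISK: it assumes an NVIDIA runtime where the `nvidia-smi` tool is on the `PATH` (usually the case on Colab GPU runtimes) and uses only the Python standard library. `gpu_visible` is an illustrative helper name.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA driver tools found on this runtime
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `nvidia-smi -L` prints one "GPU <index>: ..." line per device
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU available:", gpu_visible())
```

If this prints `False`, the runtime type is probably still set to CPU.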

In [2]:
!git clone https://github.com/bozeklab/DISK.git cloned-DISK-repo
!ls cloned-DISK-repo

Cloning into 'cloned-DISK-repo'...
remote: Enumerating objects: 2341, done.
remote: Counting objects: 100% (80/80), done.
remote: Compressing objects: 100% (60/60), done.
remote: Total 2341 (delta 52), reused 40 (delta 20), pack-reused 2261 (from 2)
Receiving objects: 100% (2341/2341), 5.27 MiB | 28.24 MiB/s, done.
Resolving deltas: 100% (1735/1735), done.
DISK	       FAQ.md  LICENSE.txt  notebooks  setup.py
DISK.egg-info  images  METADATA.in  README.md  tests


In [3]:
!python3 --version

Python 3.11.11


In [4]:
## from: https://gist.github.com/kargaranamir/e0b7910fed0a3189563d9254c7a2c439
## Python 3.9 needs to be installed because, as of 2025-03, the default Colab version is 3.11
## run this cell to install Python 3.9
!wget -O mini.sh https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh
!chmod +x mini.sh
!bash ./mini.sh -b -f -p /usr/local
!conda install -q -y jupyter
!conda install -q -y google-colab -c conda-forge
!python -m ipykernel install --name "py39" --user

--2025-03-27 15:31:26--  https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.32.241, 104.16.191.158, 2606:4700::6810:20f1, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.32.241|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61451533 (59M) [application/x-sh]
Saving to: ‘mini.sh’


2025-03-27 15:31:27 (102 MB/s) - ‘mini.sh’ saved [61451533/61451533]

PREFIX=/usr/local
Unpacking payload ...
Collecting package metadata (current_repodata.json): - \ | done
Solving environment: - \ | / done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - _libgcc_mutex==0.1=main
    - brotlipy==0.7.0=py39h27cfd23_1003
    - ca-certificates==2020.12.8=h06a4308_0
    - certifi==2020.12.5=py39h06a4308_0
    - cffi==1.14.4=py39h261ae71_0
    - chardet==3.0.4=py39h06a4308_1003
    - conda-package-handling==1.7.2=py39h27cfd23_1
    - conda==4

In [5]:
# verify python version
!python3 --version

Python 3.9.21


In [6]:
%cd cloned-DISK-repo
!python3 -m pip install -r DISK/requirements.txt -e .
%cd ..

/content/cloned-DISK-repo
Looking in links: https://download.pytorch.org/whl/cu113/torch_stable.html
Obtaining file:///content/cloned-DISK-repo
Collecting einops==0.6.1
  Downloading einops-0.6.1-py3-none-any.whl (42 kB)
Collecting h5py==3.7.0
  Downloading h5py-3.7.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB)
Collecting hydra-core==1.2.0
  Downloading hydra_core-1.2.0-py3-none-any.whl (151 kB)
Collecting imageio<2.10,>=2.3
  Downloading imageio-2.9.0-py3-none-any.whl (3.3 MB)
Collecting matplotlib<3.8,>=3.1
  Downloading matplotlib-3.7.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.6 MB)
Collecting pandas==1.4.4
  Downloading pandas-1.4.4-cp39

In [7]:
%%writefile test_imports.py
# After the Python version change, only "!python3" uses the interpreter we installed,
# so the code has to be written to a script and run with that interpreter.

import DISK
import hydra
from hydra import compose, initialize
from omegaconf import OmegaConf
import os

print('Test imports successful')

Writing test_imports.py


In [8]:
%load_ext autoreload
%autoreload 2
!python3 test_imports.py


Test imports successful


---
# Train a DISK model

**Download test data**

Human motion capture data from the CMU MoCap dataset (see http://mocap.cs.cmu.edu/), downloaded from https://ericguo5513.github.io/action-to-motion/##data

<center><img src="https://raw.githubusercontent.com/bozeklab/DISK//main/notebooks/images/fig1_human_mocap_presentation.png" width=200></center>

In [9]:
import os
if not os.path.exists('/content/datasets'):
  os.mkdir('/content/datasets')

%cd datasets
!gdown https://drive.google.com/uc?id=1PXECUljc5qr8kz9H2LxT4LhS6P4uN4ck
!unzip Human_DISK_dataset.zip
%cd ../

/content/datasets
Downloading...
From (original): https://drive.google.com/uc?id=1PXECUljc5qr8kz9H2LxT4LhS6P4uN4ck
From (redirected): https://drive.google.com/uc?id=1PXECUljc5qr8kz9H2LxT4LhS6P4uN4ck&confirm=t&uuid=d41637fe-83c8-4d82-92ab-03e7c0884daa
To: /content/datasets/Human_DISK_dataset.zip
100% 1.26G/1.26G [00:17<00:00, 73.1MB/s]
Archive:  Human_DISK_dataset.zip
   creating: Mocap_keypoints_60_stride30_new/
  inflating: Mocap_keypoints_60_stride30_new/test_dataset_w-0-nans.npz  
  inflating: Mocap_keypoints_60_stride30_new/train_dataset_w-0-nans.npz  
  inflating: Mocap_keypoints_60_stride30_new/val_dataset_w-0-nans.npz  
  inflating: Mocap_keypoints_60_stride30_new/test_fulllength_dataset_w-0-nans.npz  
  inflating: Mocap_keypoints_60_stride30_new/train_fulllength_dataset_w-0-nans.npz  
  inflating: Mocap_keypoints_60_stride30_new/val_fulllength_dataset_w-0-nans.npz  
  inflating: Mocap_keypoints_60_stride30_new/hist_length_original_vs_fake_uniform.png  
  inflating: Mocap_keypoi

You can manually change the following parameters (recommended if you want to get familiar with the structure of the config files), or directly copy the content of the file [here](https://github.com/bozeklab/DISK/blob/main/notebooks/conf_missing.yaml)

**Change in `conf_missing.yaml` from `cloned-DISK-repo/DISK/conf`:**
- defaults.network=gru
- hydra.run.dir: models/${now:%d-%m-%y}_gru_human
- dataset.name=Mocap_keypoints_60_stride30_new
- dataset.skeleton_file=Mocap_keypoints_60_stride30_new/skeleton.py
- training.epochs=5
- training.n_cpus=1
- training.print_every=1
- feed_data.transforms.add_missing.files=[Mocap_keypoints_60_stride30_new/proba_missing_uniform.csv,Mocap_keypoints_60_stride30_new/proba_missing_length_uniform.csv]
- feed_data.transforms.viewinvariant=true
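
DISK reads its settings through Hydra (note the `hydra.run.dir` key), and Hydra applications generally accept `key=value` overrides on the command line as an alternative to editing the YAML file. The sketch below only builds and prints such a command for a few of the values listed above. The helper names `flatten` and `build_command` are illustrative, and whether `main_fillmissing.py` accepts every override this way is an assumption to check against the repository.

```python
def flatten(config, prefix=""):
    """Turn a nested dict into dotted `key=value` override strings."""
    overrides = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(flatten(value, dotted))
        else:
            overrides.append(f"{dotted}={value}")
    return overrides


def build_command(script, config):
    """Build an argument list usable with subprocess.run (no shell quoting needed)."""
    return ["python3", script] + flatten(config)


# A subset of the values from the list above
changes = {
    "dataset": {"name": "Mocap_keypoints_60_stride30_new"},
    "training": {"epochs": 5, "n_cpus": 1, "print_every": 1},
}
command = build_command("cloned-DISK-repo/DISK/main_fillmissing.py", changes)
print(" ".join(command))
```

Editing the YAML file remains the approach used in this tutorial; overrides are mainly convenient for quick experiments.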

In [10]:
%cp cloned-DISK-repo/notebooks/*.yaml cloned-DISK-repo/DISK/conf

**Comment on lists in the `config` files**

You can use either of the two following syntaxes:

- Dash + new line
```
files:
  - file1
  - file2
```

- Python-like list, items separated by a comma
```
files: [file1, file2]
```

**Run Training**

In [11]:
# verify you are using GPUs/TPUs for this one (takes about 12 minutes vs 3 hours on CPUs)
%cd /content
!python3 cloned-DISK-repo/DISK/main_fillmissing.py

/content
[2025-03-27 15:40:06,229][root][INFO] - basedir: /content
[2025-03-27 15:40:06,229][root][INFO] - {'network': {'num_layers': 3, 'dropout': 0, 'type': 'GRU', 'size_layer': 512}, 'dataset': {'name': 'Mocap_keypoints_60_stride30_new', 'skeleton_file': 'Mocap_keypoints_60_stride30_new/skeleton.py'}, 'training': {'epochs': 5, 'batch_size': 32, 'learning_rate': 0.001, 'seed': False, 'load': None, 'n_cpus': 1, 'loss': {'type': 'l1', 'mask': True, 'factor': 100}, 'model_scheduler': {'type': 'lambdalr', 'steps_epoch': 500, 'rate': 0.95}, 'print_every': 1, 'mu_sigma': False, 'beta_mu_sigma': 0.5}, 'feed_data': {'mask': True, 'transforms': {'add_missing': {'pad': [1, 0], 'indep_keypoints': True, 'files': ['Mocap_keypoints_60_stride30_new/proba_missing_uniform.csv', 'Mocap_keypoints_60_stride30_new/proba_missing_length_uniform.csv']}, 'viewinvariant': True, 'normalize': False, 'normalizecube': True, 'swap': 0.1}, 'verbose': 0}}
[2025-03-27 15:40:06,263][root][INFO] - Device: cuda
[2025-03

---
# Download a pretrained model

In [12]:
if not os.path.exists('/content/models'):
  os.mkdir('/content/models')

%cd /content/models
!gdown https://drive.google.com/uc?id=1b8Px-lbTddOrMZW9dozJPVLjxh0PjnTp
!unzip Human_transformer_proba.zip
!gdown https://drive.google.com/uc?id=1tGL8eyafpwJS7wdNGB5o_tuABd8qOBHB
!unzip Human_GRU.zip
%cd /content

/content/models
Downloading...
From: https://drive.google.com/uc?id=1b8Px-lbTddOrMZW9dozJPVLjxh0PjnTp
To: /content/models/Human_transformer_proba.zip
100% 4.54M/4.54M [00:00<00:00, 35.3MB/s]
Archive:  Human_transformer_proba.zip
   creating: 03-10-24_transformer_NLL/
  inflating: 03-10-24_transformer_NLL/training_losses.txt  
  inflating: 03-10-24_transformer_NLL/main_fillmissing.log  
  inflating: 03-10-24_transformer_NLL/loss.svg  
  inflating: 03-10-24_transformer_NLL/model_epoch1470  
   creating: 03-10-24_transformer_NLL/.hydra/
  inflating: 03-10-24_transformer_NLL/.hydra/config.yaml  
  inflating: 03-10-24_transformer_NLL/.hydra/overrides.yaml  
  inflating: 03-10-24_transformer_NLL/.hydra/hydra.yaml  
  inflating: 03-10-24_transformer_NLL/loss_dark.svg  
Downloading...
From (original): https://drive.google.com/uc?id=1tGL8eyafpwJS7wdNGB5o_tuABd8qOBHB
From (redirected): https://drive.google.com/uc?id=1tGL8eyafpwJS7wdNGB5o_tuABd8qOBHB&confirm=t&uuid=0d8cc76c-024b-4c08-abe1-ae7ddd8

# Use a pretrained model to test imputation and generate plots

You can manually change the following parameters (recommended if you want to get familiar with the structure of the config files), or directly copy the content of the file [here](https://github.com/bozeklab/DISK/blob/main/notebooks/conf_test.yaml)

**Change in `conf_test.yaml`:**
- hydra.run.dir: models/test_Human
- dataset.name=Mocap_keypoints_60_stride30_new
- dataset.skeleton_file=Mocap_keypoints_60_stride30_new/skeleton.py
- feed_data.transforms.add_missing.files=[Mocap_keypoints_60_stride30_new/proba_missing_uniform.csv, Mocap_keypoints_60_stride30_new/proba_missing_length_uniform.csv]
- evaluate.n_cpus=1
- evaluate.checkpoints=[models/03-07-06_GRU, models/03-10-24_transformer_NLL]
- evaluate.n_plots=4


In [13]:
%cd /content
!python3 cloned-DISK-repo/DISK/test_fillmissing.py

/content
[DEBUG][27-Mar-25 15:47:55] Setting JobRuntime:name=UNKNOWN_NAME
[DEBUG][27-Mar-25 15:47:55] Setting JobRuntime:name=test_fillmissing
[2025-03-27 15:47:55,546][root][INFO] - [BASEDIR] /content
[2025-03-27 15:47:55,546][root][INFO] - [OUTPUT DIR] /content/models/test_Human
[2025-03-27 15:47:55,546][root][INFO] - {'dataset': {'name': 'Mocap_keypoints_60_stride30_new', 'stride': 30, 'skeleton_file': 'Mocap_keypoints_60_stride30_new/skeleton.py'}, 'feed_data': {'mask': True, 'transforms': {'add_missing': {'pad': [1, 1], 'files': ['Mocap_keypoints_60_stride30_new/proba_missing_uniform.csv', 'Mocap_keypoints_60_stride30_new/proba_missing_length_uniform.csv']}, 'viewinvariant': True, 'normalize': False, 'normalizecube': True, 'swap': 0}, 'verbose': 0}, 'evaluate': {'n_cpus': 1, 'batch_size': 8, 'checkpoints': ['models/03-07-06_GRU', 'models/03-10-24_transformer_NLL'], 'n_plots': 4, 'threshold_pck': 0.01, 'azim': 60, 'size': 2.5, 'only_holes': True, 'original_coordinates': False, 'suf

---

In the output log `test_fillmissing.log`, you can find two useful pieces of information:
- Number of tested samples: `n lines in result df: 4626`
- Normalized RMSE / MPJPE / PCK for each tested method, averaged over these samples:
```
RMSE per sample averaged:
method_param
linear_interp                     0.232473
type-GRU_mu_sigma-False           0.029953
type-transformer_mu_sigma-True    0.021518
Name: RMSE, dtype: Float64
```
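
To make the `linear_interp` baseline and the RMSE values more concrete, here is a minimal, self-contained sketch on a single 1D coordinate. It is not DISK's implementation: the function names are illustrative, the data are synthetic, and computing the error only on the artificially removed frames is an assumption (suggested by `only_holes: True` in the config).

```python
import math


def linear_interpolate(values):
    """Fill interior NaN runs by linear interpolation between valid neighbours."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if math.isnan(filled[i]):
            start = i
            while i < n and math.isnan(filled[i]):
                i += 1
            left, right = start - 1, i
            if left < 0 or right >= n:
                continue  # gap touches an edge: left unfilled in this sketch
            step = (filled[right] - filled[left]) / (right - left)
            for j in range(start, right):
                filled[j] = filled[left] + step * (j - left)
        else:
            i += 1
    return filled


def rmse_on_gap(truth, imputed, gap_mask):
    """Root mean squared error restricted to the masked (removed) frames."""
    errors = [(t - p) ** 2 for t, p, m in zip(truth, imputed, gap_mask) if m]
    return math.sqrt(sum(errors) / len(errors))


# Synthetic trajectory with an artificial two-frame gap
truth = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
gap_mask = [False, False, True, True, False, False]
observed = [float("nan") if m else t for t, m in zip(truth, gap_mask)]

imputed = linear_interpolate(observed)
print(imputed)                                # [0.0, 1.0, 6.0, 11.0, 16.0, 25.0]
print(rmse_on_gap(truth, imputed, gap_mask))  # 2.0
```

Because the ground truth is known under the artificial gap, any imputation method can be scored the same way.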

Running the test script creates several files:
- In `models/test_Human`
  - `barplot_comparison_RMSE_...png` gives the RMSE per keypoint for the compared methods
  - `comparison_length_hole_[all/kp]_vs_RMSE_...png` gives the RMSE as a function of gap length for the compared methods. The two plots differ when a sample has multiple gaps: "all" averages the RMSE per sample, while "kp" averages the RMSE per hole
  - `mean_metrics.csv` reports what was in the log, i.e. the "RMSE per sample averaged"
  - `total_metrics_...csv` reports the error on every gap of every sample, and can be used to further analyze the performance of the models
  - if a model with a proba head was tested, `corrplot...png` shows the correlation between the estimated error per sample and the real imputation error, and `thresholding_curve...png` shows how the RMSE of the imputed samples varies with the number of remaining samples
- In `models/03-07-06_GRU/test/visualize_prediction_val`, plots with examples of imputations
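
As a starting point for analyzing the per-gap metrics file, the sketch below groups rows by method and averages the error using only the standard library. The column names `method_param` and `RMSE` are assumptions inferred from the log output above, and the rows are synthetic placeholders, not real results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Synthetic stand-in for a `total_metrics_...csv` file; the values are made up
fake_csv = io.StringIO(
    "method_param,RMSE\n"
    "linear_interp,0.20\n"
    "linear_interp,0.30\n"
    "type-GRU_mu_sigma-False,0.02\n"
    "type-GRU_mu_sigma-False,0.04\n"
)


def mean_rmse_per_method(handle):
    """Group rows by method and average the RMSE column."""
    per_method = defaultdict(list)
    for row in csv.DictReader(handle):
        per_method[row["method_param"]].append(float(row["RMSE"]))
    return {method: mean(values) for method, values in per_method.items()}


summary = mean_rmse_per_method(fake_csv)
for method, value in summary.items():
    print(f"{method}: {value:.3f}")
```

To run it on a real file, pass `open(path, newline="")` instead of the in-memory buffer and adjust the column names to the actual header.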

**Comparison of RMSE**

<center><img src="https://raw.githubusercontent.com/bozeklab/DISK//main/notebooks/images/human_comparison_length_hole_all_vs_RMSE_repeat-0.png" width=500></center>

**Example of imputation**

<center><img src="https://raw.githubusercontent.com/bozeklab/DISK//main/notebooks/images/human_reconstruction_xyz_0_repeat-0.png" width=500></center>

# UMAP of the sequences

In [14]:
%cd /content
!python3 cloned-DISK-repo/DISK/embedding_umap.py --batch_size 1 --checkpoint_folder models/03-10-24_transformer_NLL --stride 60 --dataset_path .

/content
[INFO][27-Mar-25 15:51:07] Loaded skeleton with links [(0, 1), (1, 2), (2, 3), (0, 16), (16, 17), (17, 18), (18, 19), (0, 12), (12, 13), (13, 14), (14, 15), (1, 8), (8, 9), (9, 10), (10, 11), (1, 4), (4, 5), (5, 6), (6, 7)] and colors ['orange', 'orange', 'orange', 'gold', 'gold', 'gold', 'gold', 'grey', 'grey', 'grey', 'grey', 'cornflowerblue', 'cornflowerblue', 'cornflowerblue', 'cornflowerblue', 'turquoise', 'turquoise', 'turquoise', 'turquoise']
[INFO][27-Mar-25 15:51:07] Loading datasets...
[INFO][27-Mar-25 15:51:09] Device: cuda
[INFO][27-Mar-25 15:51:09] Loading transformer model...
[INFO][27-Mar-25 15:51:12] Network constructed
[INFO][27-Mar-25 15:51:12] Loading with epoch = 1470
[INFO][27-Mar-25 15:51:12] Loading with ave_loss_train = -92.11919800145033
[INFO][27-Mar-25 15:51:12] Loading with ave_rmse_train = 0.024657995028068144
[INFO][27-Mar-25 15:51:12] Loading with ave_loss_eval = -90.12528375478891
[INFO][27-Mar-25 15:51:12] Loading with ave_rmse_eval = 0.0245952

In [15]:
# to visualize UMAPs with a different coloring, change the file name below
import IPython
IPython.display.HTML(filename="/content/models/03-10-24_transformer_NLL/Mocap_keypoints_60_stride30_new_normed_umap_colors-action_str_latent.html")

Output hidden; open in https://colab.research.google.com to view.
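
To see which UMAP pages are available for a given model, you can list the HTML files in its folder. This is a small sketch using only the standard library; the `*umap*.html` pattern is an assumption based on the single file name used above, and `list_umap_pages` is an illustrative helper.

```python
from pathlib import Path


def list_umap_pages(model_dir):
    """Return the sorted names of UMAP HTML files found in `model_dir`."""
    return sorted(p.name for p in Path(model_dir).glob("*umap*.html"))


for name in list_umap_pages("/content/models/03-10-24_transformer_NLL"):
    print(name)
```

Any of the printed names can then be passed to `IPython.display.HTML(filename=...)`.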

# Impute a dataset

The Human dataset has no missing data, so to demonstrate the imputation we will use the Rat dataset from https://www.nature.com/articles/s41592-021-01106-6

The Rat dataset consists of five 3D motion capture recordings of rats placed in a transparent circular arena for minutes to hours.

Figure from Dunn et al. 2021:
<center><img src="https://raw.githubusercontent.com/bozeklab/DISK/main/notebooks/images/dunn_fig2.webp" width=700></center>

In [16]:
import os
if not os.path.exists('/content/datasets'):
  os.mkdir('/content/datasets')
%cd /content/datasets
!gdown https://drive.google.com/uc?id=14Yjpj_8Gy7i4-Gc2LKhQpchW_B8pykfd
!unzip Rat7M_seq_DISK_dataset.zip
!gdown https://drive.google.com/uc?id=1t_tPwyzNCDK_YUJzzbAJwn3YbtHmDi_z
!unzip rat7M_raw_data.zip
%cd /content

/content/datasets
Downloading...
From (original): https://drive.google.com/uc?id=14Yjpj_8Gy7i4-Gc2LKhQpchW_B8pykfd
From (redirected): https://drive.google.com/uc?id=14Yjpj_8Gy7i4-Gc2LKhQpchW_B8pykfd&confirm=t&uuid=5a14fa50-2940-4df0-b47e-ba2fe551db9f
To: /content/datasets/Rat7M_seq_DISK_dataset.zip
100% 3.66G/3.66G [00:37<00:00, 96.7MB/s]
Archive:  Rat7M_seq_DISK_dataset.zip
   creating: DANNCE_seq_keypoints_60_stride30_fill10/
   creating: DANNCE_seq_keypoints_60_stride30_fill10/.hydra/
  inflating: DANNCE_seq_keypoints_60_stride30_fill10/.hydra/hydra.yaml  
  inflating: DANNCE_seq_keypoints_60_stride30_fill10/.hydra/config.yaml  
  inflating: DANNCE_seq_keypoints_60_stride30_fill10/.hydra/config_create_dataset.yaml  
 extracting: DANNCE_seq_keypoints_60_stride30_fill10/.hydra/overrides.yaml  
  inflating: DANNCE_seq_keypoints_60_stride30_fill10/hyperparameters.txt  
  inflating: DANNCE_seq_keypoints_60_stride30_fill10/train_dataset_w-1-nans.npz  
  inflating: DANNCE_seq_keypoints_60_

In [17]:
if not os.path.exists('/content/models'):
  os.mkdir('/content/models')
%cd /content/models
!gdown https://drive.google.com/uc?id=1hbEkwTI2ir0T54UywVv4r9GbzfgveCaC
!unzip Rat7M_transformer_proba.zip
%cd /content/

/content/models
Downloading...
From: https://drive.google.com/uc?id=1hbEkwTI2ir0T54UywVv4r9GbzfgveCaC
To: /content/models/Rat7M_transformer_proba.zip
100% 9.04M/9.04M [00:00<00:00, 40.1MB/s]
Archive:  Rat7M_transformer_proba.zip
   creating: 05-12-23_transformer_NLL/
  inflating: 05-12-23_transformer_NLL/training_losses.txt  
  inflating: 05-12-23_transformer_NLL/model_last_epoch1500  
  inflating: 05-12-23_transformer_NLL/main_fillmissing.log  
  inflating: 05-12-23_transformer_NLL/loss.svg  
  inflating: 05-12-23_transformer_NLL/model_epoch1340  
  inflating: 05-12-23_transformer_NLL/loss_dark.svg  
   creating: 05-12-23_transformer_NLL/.hydra/
  inflating: 05-12-23_transformer_NLL/.hydra/config.yaml  
  inflating: 05-12-23_transformer_NLL/.hydra/overrides.yaml  
  inflating: 05-12-23_transformer_NLL/.hydra/hydra.yaml  
/content


---
You can manually change the following parameters (recommended if you want to get familiar with the structure of the config files), or directly copy the content of the file [here](https://github.com/bozeklab/DISK/blob/main/notebooks/conf_impute.yaml)

**Change in `conf_impute.yaml`:**

  - hydra.run.dir: models/impute_Rat7M
  - dataset.name=DANNCE_seq_keypoints_60_stride30_fill10
  - dataset.skeleton_file=DANNCE_seq_keypoints_60_stride30_fill10/skeleton.py
  - evaluate.checkpoint=models/05-12-23_transformer_NLL
  - evaluate.n_cpus=1
  - evaluate.save_dataset=false


In [18]:
%cd /content/
!python3 cloned-DISK-repo/DISK/impute_dataset.py

/content
[DEBUG][27-Mar-25 16:04:15] Setting JobRuntime:name=UNKNOWN_NAME
[DEBUG][27-Mar-25 16:04:15] Setting JobRuntime:name=impute_dataset
[2025-03-27 16:04:15,774][root][INFO] - [BASEDIR] /content
[2025-03-27 16:04:15,775][root][INFO] - [OUTPUT DIR] /content/models/impute_Rat7M
[2025-03-27 16:04:15,775][root][INFO] - {'dataset': {'name': 'DANNCE_seq_keypoints_60_stride30_fill10', 'skeleton_file': 'DANNCE_seq_keypoints_60_stride30_fill10/skeleton.py'}, 'feed_data': {'verbose': 0, 'pad': [1, 1], 'batch_size': 1}, 'evaluate': {'checkpoint': 'models/05-12-23_transformer_NLL', 'threshold_error_score': 0.1, 'threshold_pck': 0.1, 'n_cpus': 1, 'n_plots': 5, 'save': False, 'save_dataset': True, 'path_to_original_files': 'datasets', 'only_holes': True, 'suffix': None, 'name_items': [['network', 'type']]}}
[2025-03-27 16:04:15,777][root][INFO] - Loaded skeleton with links [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (4, 5), (3, 6), (4, 6), (4, 7), (6, 7), (5, 7), (5, 8), (5, 9), (3, 12), (12, 10),

---
# How to prepare your own dataset

## Create a *DISK dataset* from your data

You need to use the `create_dataset.py` script with the companion `conf_create_dataset.yaml` file.
The first step is to load your own data.
Currently supported formats are:
- .h5 and .csv from DeepLabCut
- .h5 extracted from SLEAP (see https://sleap.ai/develop/tutorials/analysis.html)
- .csv with column names 'keypoint1_x', 'keypoint1_y', ...
<center><img src="https://raw.githubusercontent.com/bozeklab/DISK/main/notebooks/images/csv_input_format.png" width=300></center>
- .npy matrix of shape (but no keypoint names)
- .mat exported from Qualisys software
- .pkl data (used for Drosophila DF3D dataset)
- others? You can write a small function to open your files (as here https://github.com/bozeklab/DISK/blob/main/DISK/create_dataset.py#L76) or **open a GitHub issue with a link to an example file so we can write it for you**
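
If your data are in a custom format, one option is to export them to the plain .csv layout listed above, with one `<keypoint>_<coordinate>` column per value. The sketch below shows that layout for two keypoints in 2D. The helper name `write_keypoint_csv` is illustrative, and writing missing coordinates as empty cells is an assumption, so check how your files are read before relying on it.

```python
import csv
import io


def write_keypoint_csv(handle, keypoint_names, frames, dims=("x", "y")):
    """Write frames as rows with one `<keypoint>_<dim>` column per coordinate.

    `frames` is a list of frames; each frame is a list of per-keypoint
    coordinate tuples, in the same order as `keypoint_names`.
    """
    header = [f"{name}_{dim}" for name in keypoint_names for dim in dims]
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    for frame in frames:
        # Assumption: missing coordinates (None) are written as empty cells
        writer.writerow(["" if v is None else v for point in frame for v in point])
    return header


buffer = io.StringIO()
header = write_keypoint_csv(
    buffer,
    ["keypoint1", "keypoint2"],
    [[(0.0, 1.0), (2.0, 3.0)], [(0.5, 1.5), (None, None)]],
)
print(buffer.getvalue())
```

For 3D data, pass `dims=("x", "y", "z")` and three values per keypoint.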

## How to choose the parameters to create your own dataset
- length of sequence / stride
- fill
- Link to FAQ (https://github.com/bozeklab/DISK/blob/main/FAQ.md)
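
To get a feeling for how sequence length and stride interact, the sketch below computes where fixed-length windows would start in a recording. It assumes that a name such as `Mocap_keypoints_60_stride30_new` means sequences of 60 frames taken every 30 frames; this reading of the folder name, and the helper `window_starts`, are illustrative and not taken from the DISK code.

```python
def window_starts(n_frames, length, stride):
    """Start indices of all full windows of `length` frames, `stride` frames apart."""
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    return list(range(0, n_frames - length + 1, stride))


starts = window_starts(n_frames=200, length=60, stride=30)
print(starts)                  # [0, 30, 60, 90, 120]
print(len(starts), "windows")  # 5 windows
```

A smaller stride gives more, heavily overlapping sequences from the same recording, while a stride equal to the length gives non-overlapping ones.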