Project | Paper | Hardware | Dataset
OpenTouch is an egocentric in-the-wild dataset and cross-modal learning framework for visual (RGB), tactile (pressure), and hand-pose modalities.
The OpenTouch data is organized as synchronized multimodal recordings:
- egocentric RGB video streams
- full hand tactile pressure maps
- hand pose
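The three streams are captured at different native rates, so downstream code has to align them by timestamp. The sketch below shows one common way to do this, nearest-timestamp alignment; the rates and array names here are illustrative assumptions, not the dataset's actual layout:

```python
import numpy as np

def nearest_indices(ref_ts, other_ts):
    """For each reference timestamp, index of the closest timestamp in other_ts."""
    pos = np.searchsorted(other_ts, ref_ts)      # insertion points in sorted other_ts
    pos = np.clip(pos, 1, len(other_ts) - 1)
    left, right = other_ts[pos - 1], other_ts[pos]
    # pick whichever neighbor is closer in time to the reference frame
    return np.where(ref_ts - left <= right - ref_ts, pos - 1, pos)

# Illustrative streams: 30 Hz RGB, 100 Hz tactile
rgb_ts = np.arange(0.0, 1.0, 1 / 30)
tactile_ts = np.arange(0.0, 1.0, 1 / 100)
idx = nearest_indices(rgb_ts, tactile_ts)
# tactile_frames[idx] would now be frame-aligned with the RGB stream
```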
The dataset is hosted on Google Drive. We use `gdown` to download all files:

```bash
pip install gdown
bash scripts/download_data.sh
cd data && unzip final_annotations.zip && cd ..
```

See `scripts/download_data.sh` for the full list of Google Drive file IDs.
```bash
conda create -n opentouch python=3.10
conda activate opentouch
pip install -e .
```

The rendering scripts require extra dependencies:

```bash
git submodule update --init --recursive
pip install -e ".[rendering]"
cd EasyMocap && pip install -e . && cd ..
```

You also need the MANO hand model:

- Download `MANO_RIGHT.pkl` from the MANO project
- Place it in `preprocess/scratch/MANO_RIGHT.pkl`
```bash
# Generate a synchronized visualization from an HDF5 recording:
python preprocess/build_demo.py \
--hdf5 data/fablab_ml_p1.hdf5 \
--demo-id demo_05 \
--fps 30
```

Example outputs: a simple RGB+tactile view, and a tri-view with MANO/pose rendering.
Output path: `data/<dataset_name>/<demo_id>/combined.mp4`
```bash
# Retrieval dataset
python build_retrieval_data.py \
--input-dir data \
--output-dir preprocessed_data/train_dataset

# Classification dataset
python build_label_data.py \
--input-dir data \
--output-dir preprocessed_data/classification_peak \
--label-mapping-path final_annotations \
--label-column action \
--frame-index-column peak_idx \
--temporal-radius 10
```

The default visual backbone is DINOv3 ViT-B/16 (`facebook/dinov3-vitb16-pretrain-lvd1689m`).
Access to this model may require approval from Meta. Please refer to DINOv3 for more details.
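The `--frame-index-column peak_idx` and `--temporal-radius 10` flags above suggest that each classification sample is a window of frames centered on an annotated peak frame. A hypothetical sketch of that windowing (the boundary-clamping behavior is an assumption):

```python
def peak_window(num_frames: int, peak_idx: int, radius: int = 10) -> list[int]:
    """Frame indices in [peak_idx - radius, peak_idx + radius], clipped to the clip."""
    start = max(0, peak_idx - radius)
    end = min(num_frames, peak_idx + radius + 1)
    return list(range(start, end))

print(peak_window(num_frames=300, peak_idx=150))  # 21 indices: 140..160
print(peak_window(num_frames=300, peak_idx=3))    # truncated at the start: 0..13
```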
```bash
bash scripts/train.sh
```

Or run directly:

```bash
CUDA_VISIBLE_DEVICES=0 python -m opentouch_train.main \
--train-data preprocessed_data/train_dataset \
--model OpenTouch-DINOv3-B16-Retrieval \
--task-type v2t \
--batch-size 128 \
--lr 1e-4 \
--epochs 300 \
--precision amp \
--workers 8 \
--sequence-length 20
```

To train with multiple GPUs, use distributed data parallel (DDP); see `scripts/train_multigpu.sh` for the full reference configuration.
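Since the codebase builds on OpenCLIP, the retrieval objective is presumably a CLIP-style symmetric contrastive (InfoNCE) loss between paired embeddings of the two modalities. A numpy sketch under that assumption (batch size, dimensions, and temperature are illustrative):

```python
import numpy as np

def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE: matched pairs a[i] <-> b[i] are the positives."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature             # cosine similarities, scaled
    i = np.arange(len(a))
    loss_a2b = -log_softmax(logits, axis=1)[i, i].mean()  # e.g. visual -> tactile
    loss_b2a = -log_softmax(logits, axis=0)[i, i].mean()  # e.g. tactile -> visual
    return (loss_a2b + loss_b2a) / 2

rng = np.random.default_rng(0)
vis = rng.normal(size=(128, 512))
tac = rng.normal(size=(128, 512))
print(clip_loss(vis, tac))  # close to ln(128) ~ 4.85 for unaligned random embeddings
```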
Set `--task-type` to choose the retrieval task:

| Task Type | Description |
|---|---|
| `v2t` | Visual → tactile |
| `p2t` | Pose → tactile |
| `v2p` | Visual → pose |
| `vp2t` | Visual + pose → tactile |
| `tp2v` | Tactile + pose → visual |
| `vt2p` | Visual + tactile → pose |
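The task-type strings follow a simple `<inputs>2<target>` convention (v = visual, t = tactile, p = pose). A small helper like the following, written here for illustration rather than taken from the codebase, makes the mapping explicit:

```python
MODALITIES = {"v": "visual", "t": "tactile", "p": "pose"}

def parse_task_type(task: str) -> tuple[list[str], list[str]]:
    """Split e.g. 'vp2t' into (['visual', 'pose'], ['tactile'])."""
    src, dst = task.split("2")
    return [MODALITIES[c] for c in src], [MODALITIES[c] for c in dst]

print(parse_task_type("v2t"))   # (['visual'], ['tactile'])
print(parse_task_type("vp2t"))  # (['visual', 'pose'], ['tactile'])
```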
Train action or grip classifiers on top of the same encoders:
```bash
bash scripts/train_classifier.sh
```

Or run directly:

```bash
CUDA_VISIBLE_DEVICES=0 python -m opentouch_train.classification_main \
--train-data preprocessed_data/classification_peak \
--model OpenTouch-DINOv3-B16-Classify \
--task action \
--modalities visual tactile \
--batch-size 64 \
--lr 3e-3 \
--epochs 500 \
--precision amp
```

| Flag | Description |
|---|---|
| `--task` | Classification task: `action` or `grip` |
| `--modalities` | Input modalities: `visual`, `tactile`, `pose` (any combination) |
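With multiple `--modalities`, one natural design is to concatenate the per-modality embeddings and apply a linear classification head on top. The sketch below illustrates that shape bookkeeping only; it is an assumption about the architecture, not a description of the actual classifier, and all dimensions are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim, num_actions = 64, 512, 10

# Illustrative per-modality embeddings from the upstream encoders
visual_emb = rng.normal(size=(batch, dim))
tactile_emb = rng.normal(size=(batch, dim))

fused = np.concatenate([visual_emb, tactile_emb], axis=1)  # (64, 1024)
W = rng.normal(size=(2 * dim, num_actions)) * 0.01         # linear head weights
logits = fused @ W
pred = logits.argmax(axis=1)                               # predicted action ids
print(fused.shape, logits.shape)
```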
Model name, task type, and modalities are auto-detected from the checkpoint or `params.txt`.

```bash
bash scripts/eval.sh logs/<run_name>/checkpoints/epoch_<N>.pt
bash scripts/eval_classifier.sh logs/<run_name>/checkpoints/epoch_<N>.pt
```

If you find this work helpful, please consider citing:
```bibtex
@article{song2025opentouch,
  title={OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction},
  author={Song, Yuxin Ray and Li, Jinzhou and Fu, Rao and Murphy, Devin and Zhou, Kaichen and Shiv, Rishi and Li, Yaqi and Xiong, Haoyu and Owens, Crystal Elaine and Du, Yilun and others},
  journal={arXiv preprint arXiv:2512.16842},
  year={2025}
}
```

This codebase builds on OpenCLIP.
