Trained solely on real-world human grasping data, HUG generates diverse human hand grasps for any user-selected object in a single RGB-D image captured from a stereo camera. HUG works with any stereo camera, anywhere, out of the box.
- Paper and website
- Inference + visualization code
- 1M-HUGs dataset (planned 2026/06/29)
- HUG-Bench benchmark, assets + sim eval (planned 2026/06/29)
- Training code (planned 2026/06/29)
Tested on Ubuntu 22.04/24.04, CUDA 12.8, PyTorch 2.9.1, Python 3.10.
# 1) Environment
conda env create -f environment.yaml && conda activate hug
pip install torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu128
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.9.1+cu128.html
pip install --no-build-isolation git+https://github.com/mattloper/chumpy.git@580566e
pip install -e .
# 2) Download required assets listed below

- MANO: Register → download and unzip the MANO models → copy contents of mano_v*_*/ to assets/mano/
- DINOv2: Auto-downloads on first use
- HUG weights:
hf download kevinywu/hug hug_full.safetensors --local-dir checkpoints/
HUG predicts human grasps in MANO form for selected objects in the camera frame. Currently, only inference is supported. We provide sample inputs of one image from each scene in HUG-Bench.
CKPT=checkpoints/hug_full.safetensors
DATA=data/hug_bench/
# App: click an object to predict a grasp
# --save-pred saves each clicked prediction to $DATA/grasp_pred/
python -m hug.app --checkpoint-path "$CKPT" --dataset-path "$DATA" --save-pred

If predictions are saved with --save-pred, you can visualize them with:

python -m hug.visualize_predictions --dataset-path "$DATA"

You can also run inference on your own captures. Put three files in one folder; we provide an example in data/custom/ for a ZED 2i output.
- RGB: 8-bit image ("rgb.png"/"rgb.jpg"), any H×W, grayscale is also supported.
- Depth: 16-bit single-channel PNG ("depth.png" in uint16), millimeter units, same H×W as RGB and registered to it. Use S2M2 to estimate depth for best results.
- Intrinsics: text file ("intrinsics.txt") at the RGB resolution: either four numbers fx fy cx cy or a 3×3 K matrix. .npy/.json also accepted.
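
To see how the three files relate, here is a minimal sketch that reads intrinsics in either text layout and back-projects one depth pixel into the camera frame with the standard pinhole model. It is not HUG's own loader: `parse_intrinsics` and `backproject` are hypothetical helper names, whitespace-separated numbers and a row-major K matrix are assumptions, and returning meters is an illustrative choice.

```python
def parse_intrinsics(text):
    """Return (fx, fy, cx, cy) from 'fx fy cx cy' or a 3x3 K matrix.

    Assumes whitespace-separated numbers; a 3x3 K is read row-major.
    """
    values = [float(tok) for tok in text.split()]
    if len(values) == 4:
        fx, fy, cx, cy = values
    elif len(values) == 9:
        # K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
        fx, fy, cx, cy = values[0], values[4], values[2], values[5]
    else:
        raise ValueError(f"expected 4 or 9 numbers, got {len(values)}")
    return fx, fy, cx, cy


def backproject(u, v, depth_mm, intrinsics):
    """Map pixel (u, v) with depth in millimeters to a camera-frame point in meters."""
    fx, fy, cx, cy = intrinsics
    z = depth_mm / 1000.0  # depth.png stores millimeters
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


# Example with made-up intrinsics for a 640x480 image.
k_text = "600.0 0.0 320.0\n0.0 600.0 240.0\n0.0 0.0 1.0"
intr = parse_intrinsics(k_text)
print(intr)                                # (600.0, 600.0, 320.0, 240.0)
print(backproject(620, 240, 1500, intr))   # (0.75, 0.0, 1.5)
```

Because the depth map must be registered to the RGB image and the intrinsics must match the RGB resolution, the same (u, v) indexes all three inputs. If you resize the image, fx, fy, cx, and cy must be scaled by the same factor.
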
# Prepare inputs: writes <stem>.pkl into the folder
python -m hug.prepare_inputs --dataset-path data/custom
python -m hug.app --checkpoint-path "$CKPT" --dataset-path data/custom --save-pred

If you find our work useful, please consider citing our paper:

@article{wu2026hug,
title={Human Universal Grasping},
author={Kevin Yuanbo Wu and Tianxing Zhou and Isaac Tu and Billy Yan and Irmak Guzey and David Fouhey and Dandan Shan and Lerrel Pinto},
journal={arXiv preprint arXiv:2606.17054},
year={2026}
}
