chen-si-cs/AnyHand

AnyHand: A Large-Scale Synthetic Dataset for RGB(-D) Hand Pose Estimation

Chen Si¹   Yulin Liu¹   Bo Ai¹   Jianwen Xie²   Rolandos Alexandros Potamias³   Chuanxia Zheng⁴   Hao Su¹

¹UC San Diego   ²Lambda, Inc   ³Imperial College London   ⁴Nanyang Technological University



Overview

AnyHand is a large-scale synthetic RGB-D dataset for 3D hand pose estimation, containing 2.5M single-hand and 4.1M hand-object interaction images with full geometric annotations (RGB, depth, mask, 3D pose/shape, camera intrinsics).

This repository releases fine-tuned checkpoints of HaMeR and WiLoR co-trained with AnyHand, which achieve consistent improvements on standard benchmarks (FreiHAND, HO-3D) and better generalization to out-of-domain scenes. More components are coming; see the roadmap below.


πŸ—ΊοΈ Roadmap

Component Status
Fine-tuned HaMeR & WiLoR checkpoints + unified AnyHandPredictor βœ… Released
AnyHandNet-D πŸ”œ Coming soon
AnyHand generation pipeline πŸ”œ Coming soon
AnyHand dataset πŸ”œ Coming soon

Part 1: RGB Hand Pose Estimation (HaMeR & WiLoR + AnyHand)

We release improved checkpoints for both HaMeR and WiLoR, co-trained with AnyHand. These are drop-in replacements for the original checkpoints; no architecture changes are needed.

1.1 Clone This Repo (with Submodules)

Both WiLoR and HaMeR are included as git submodules.

git clone --recurse-submodules https://github.com/chen-si-cs/AnyHand.git
cd AnyHand

If you already cloned without --recurse-submodules:

git submodule update --init --recursive

This populates WiLoR/ (WiLoR codebase) and third_party/hamer/ (HaMeR codebase).
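
A quick way to confirm that both submodules were actually checked out is to look for non-empty directories. The sketch below assumes it is run from the repository root; the helper name `empty_submodules` is hypothetical and not part of the AnyHand package.

```python
from pathlib import Path

# Submodule locations named in this README.
SUBMODULE_DIRS = ("WiLoR", "third_party/hamer")


def empty_submodules(repo_root="."):
    """Return the submodule directories that are missing or empty."""
    root = Path(repo_root)
    missing = []
    for rel in SUBMODULE_DIRS:
        path = root / rel
        # An uninitialised submodule is usually left as an empty directory.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = empty_submodules()
    if missing:
        print("Not populated:", ", ".join(missing))
        print("Run: git submodule update --init --recursive")
    else:
        print("Both submodules are populated.")
```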

1.2 Install Dependencies

conda create -n anyhand python=3.10 -y
conda activate anyhand

Install PyTorch (adjust the CUDA version to match your system; see pytorch.org):

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
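
Before running the preparation scripts, it can help to confirm that PyTorch imports and whether it sees a GPU. This is a minimal sketch using only standard PyTorch attributes; the function name `torch_summary` is illustrative.

```python
import importlib.util


def torch_summary():
    """Describe the installed PyTorch build, or report that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."
    import torch

    cuda = torch.cuda.is_available()
    # torch.version.cuda is None for CPU-only builds.
    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"GPU available: {cuda}"
    )


print(torch_summary())
```

If the build reports a CUDA version but no GPU is available, the wheel's CUDA version may not match the installed driver, and a different `--index-url` may be needed.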

Then run the preparation script(s) for whichever backend(s) you need.

WiLoR only (recommended for most users):

bash scripts/prepare_wilor.sh

HaMeR only:

bash scripts/prepare_hamer.sh

Both:

bash scripts/prepare_wilor.sh
bash scripts/prepare_hamer.sh

Each script installs the corresponding Python package, downloads the AnyHand checkpoint, and prints a checklist of remaining manual steps.

1.3 Set Up MANO

WiLoR requires the MANO hand model, which must be downloaded manually due to its license.

  1. Register and download from the MANO website
  2. Unzip and place the right hand model at:
AnyHand/
└── mano_data/
    └── MANO_RIGHT.pkl

Note: By using MANO, you agree to the MANO license terms.

1.4 Download Checkpoints

The prepare scripts above handle this automatically once you fill in your HuggingFace username. See the scripts for manual wget alternatives. After running, your layout will be:

pretrained_models/
├── anyhand_wilor.ckpt          ← AnyHand fine-tuned WiLoR
├── model_config_wilor.yaml     ← WiLoR config
├── detector.pt                 ← YOLO hand detector (shared by both)
└── hamer_ckpts/
    └── checkpoints/
        ├── anyhand_hamer.ckpt  ← AnyHand fine-tuned HaMeR
        └── model_config.yaml   ← HaMeR config
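
To check the layout after the scripts finish, a small script can compare the files on disk against the tree above, together with the MANO file from section 1.3. This is a sketch that assumes it runs from the repository root; `missing_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Paths taken from the layouts shown in this README, relative to the repo root.
COMMON = ["pretrained_models/detector.pt"]
PER_BACKEND = {
    "wilor": [
        "mano_data/MANO_RIGHT.pkl",
        "pretrained_models/anyhand_wilor.ckpt",
        "pretrained_models/model_config_wilor.yaml",
    ],
    "hamer": [
        "pretrained_models/hamer_ckpts/checkpoints/anyhand_hamer.ckpt",
        "pretrained_models/hamer_ckpts/checkpoints/model_config.yaml",
    ],
}


def missing_files(backend="wilor", repo_root="."):
    """Return the expected files for `backend` ('wilor', 'hamer' or 'both') that are absent."""
    backends = ["wilor", "hamer"] if backend == "both" else [backend]
    expected = list(COMMON)
    for name in backends:
        expected += PER_BACKEND[name]
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_files("both"):
        print("missing:", rel)
```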

1.5 Run Inference: Unified Predictor

We provide AnyHandPredictor, a single class that wraps both models behind one consistent API. It always uses WiLoR's YOLO hand detector for bbox detection, then dispatches to whichever reconstruction backbone you choose.

Predict

from anyhand import AnyHandPredictor

# WiLoR backend (default)
predictor = AnyHandPredictor(backend='wilor')
hands = predictor.predict('path/to/image.jpg')

for hand in hands:
    print(f"{'Right' if hand.is_right else 'Left'} hand  score={hand.score:.2f}")
    print(f"  MANO pose  : {hand.mano_pose.shape}")    # (48,)
    print(f"  MANO shape : {hand.mano_shape.shape}")   # (10,)
    print(f"  Vertices   : {hand.vertices.shape}")     # (778, 3)
    print(f"  Keypoints3D: {hand.keypoints_3d.shape}") # (21, 3)
    print(f"  Keypoints2D: {hand.keypoints_2d.shape}") # (21, 2)
    print(f"  Cam translation : {hand.cam_t}")              # (3,)

# HaMeR backend
predictor = AnyHandPredictor(backend='hamer')
hands = predictor.predict('path/to/image.jpg')

# Both at once: same bboxes, two sets of predictions
# hand.backend == 'wilor' or 'hamer' tells you which is which
predictor = AnyHandPredictor(backend='both')
hands = predictor.predict('path/to/image.jpg')

# Batch of images
import cv2
imgs = [cv2.imread(p) for p in ['img1.jpg', 'img2.jpg']]
batch_results = predictor.predict(imgs)  # List[List[HandPrediction]]

# Override backend per call
hands = predictor.predict('photo.jpg', backend='wilor')
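
With `backend='both'`, predictions from the two models come back in one list. A minimal sketch of separating them and picking the most confident detection per backend follows. It relies only on the `backend`, `is_right` and `score` attributes shown above; the `Hand` dataclass is a stand-in so the snippet runs without the models, and `split_by_backend` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hand:
    """Stand-in carrying a subset of the HandPrediction fields used in this README."""
    backend: str
    is_right: bool
    score: float


def split_by_backend(hands):
    """Group predictions by their `backend` attribute."""
    groups = defaultdict(list)
    for hand in hands:
        groups[hand.backend].append(hand)
    return dict(groups)


# Fake predictions standing in for predictor.predict(..., backend='both').
hands = [
    Hand("wilor", True, 0.91),
    Hand("hamer", True, 0.91),
    Hand("wilor", False, 0.62),
    Hand("hamer", False, 0.62),
]

for backend, preds in split_by_backend(hands).items():
    best = max(preds, key=lambda h: h.score)
    side = "Right" if best.is_right else "Left"
    print(f"{backend}: {len(preds)} hands, best = {side} ({best.score:.2f})")
```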

Render mesh overlay

# Renders all detected hand meshes overlaid on the image.
# Returns a BGR uint8 numpy array ready for cv2.imwrite().
overlay = predictor.render_overlay('photo.jpg', hands)
cv2.imwrite('out.jpg', overlay)

# Custom mesh colour (float RGB in [0, 1])
overlay = predictor.render_overlay('photo.jpg', hands, mesh_color=(0.9, 0.4, 0.2))

Save per-hand .obj meshes

# Saves <prefix>_0.obj, <prefix>_1.obj, … into out_dir/.
# Returns the list of absolute paths written.
paths = predictor.save_meshes(hands, out_dir='out/meshes', prefix='frame0042')
print(paths)
# ['…/out/meshes/frame0042_0.obj', '…/out/meshes/frame0042_1.obj']
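
To sanity-check a saved mesh without a 3D viewer, the .obj file can be parsed directly: in the Wavefront OBJ format, vertex lines start with `v` and face lines start with `f`. The sketch below is a minimal reader under that format, not a utility from this repository; `count_obj_elements` is a hypothetical name.

```python
def count_obj_elements(path):
    """Count vertex ('v') and face ('f') records in a Wavefront OBJ file."""
    n_vertices = n_faces = 0
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            parts = line.split()
            if not parts:
                continue
            # Only exact 'v' and 'f' tags count; 'vn', 'vt' and comments are skipped.
            if parts[0] == "v":
                n_vertices += 1
            elif parts[0] == "f":
                n_faces += 1
    return n_vertices, n_faces
```

Calling it on one of the returned paths, for example `count_obj_elements(paths[0])`, should normally give a vertex count that matches the (778, 3) vertex array reported by the predictor.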

Project 3D points to 2D pixels

import cv2

img = cv2.imread('photo.jpg')

for hand in hands:
    # Works on vertices (778, 3) or keypoints (21, 3), or any other (N, 3) array
    kpts_2d = AnyHandPredictor.project_3d_to_2d(
        hand.keypoints_3d,
        hand.cam_t,
        hand.focal_length,
        img_size=(img.shape[1], img.shape[0]),
    )  # (21, 2) float32 pixel coordinates

    for x, y in kpts_2d.astype(int):
        cv2.circle(img, (x, y), 4, (0, 255, 0), -1)

cv2.imwrite('keypoints.jpg', img)
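
For reference, a projection of this kind can be written in a few lines of NumPy. The sketch below assumes a pinhole camera with the principal point at the image centre, a single focal length in pixels, and points that are first translated by `cam_t`. These are assumptions for illustration and may differ from what `project_3d_to_2d` actually does; `project_points` is a hypothetical name.

```python
import numpy as np


def project_points(points_3d, cam_t, focal_length, img_size):
    """Pinhole projection of (N, 3) points to (N, 2) pixel coordinates.

    Assumes the principal point is the image centre and `img_size` is
    (width, height), matching the call shown above.
    """
    pts = np.asarray(points_3d, dtype=np.float32) + np.asarray(cam_t, dtype=np.float32)
    width, height = img_size
    # Perspective divide: x' = f * X / Z, y' = f * Y / Z.
    uv = focal_length * pts[:, :2] / pts[:, 2:3]
    # Shift the origin from the optical axis to the top-left pixel.
    uv = uv + np.array([width / 2.0, height / 2.0], dtype=np.float32)
    return uv.astype(np.float32)


# A point on the optical axis lands on the image centre.
centre = project_points([[0.0, 0.0, 0.0]], cam_t=[0.0, 0.0, 1.0],
                        focal_length=1000.0, img_size=(640, 480))
print(centre)
```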

Custom checkpoint paths

predictor = AnyHandPredictor(
    backend        = 'wilor',
    wilor_ckpt     = '/path/to/anyhand_wilor.ckpt',
    wilor_cfg      = '/path/to/model_config_wilor.yaml',
    detector_pt    = '/path/to/detector.pt',
    device         = 'cuda:0',
    det_conf       = 0.4,   # YOLO detection confidence threshold
    rescale_factor = 2.0,   # bbox padding factor
    batch_size     = 32,
)
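
The `rescale_factor` comment describes it as a bbox padding factor. One common interpretation, shown here purely as an assumption about its meaning, is to enlarge each detected box about its centre by that factor before cropping. `pad_bbox` is a hypothetical helper and does not come from the repository.

```python
def pad_bbox(bbox, rescale_factor=2.0):
    """Scale an (x1, y1, x2, y2) box about its centre by `rescale_factor`."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) / 2.0 * rescale_factor
    half_h = (y2 - y1) / 2.0 * rescale_factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# A 100 x 50 box centred at (150, 125) becomes 200 x 100 around the same centre.
print(pad_bbox((100, 100, 200, 150), rescale_factor=2.0))
```

Under this reading, a larger factor gives the reconstruction model more context around the hand, while a factor of 1.0 leaves the detector's box unchanged.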

Command-line demo (WiLoR's original script, with AnyHand checkpoint)

python WiLoR/demo.py \
    --img_folder demo_img \
    --out_folder demo_out \
    --checkpoint pretrained_models/anyhand_wilor.ckpt \
    --cfg        pretrained_models/model_config_wilor.yaml \
    --save_mesh

Part 2: RGB-D Hand Pose Estimation (AnyHandNet-D) 🔜

Coming soon.


Part 3: AnyHand Generation Pipeline 🔜

Coming soon.


Part 4: AnyHand Dataset 🔜

Coming soon.



License


Citation

If you find AnyHand useful, please cite our work and the baselines:

@misc{si2026anyhand,
  title         = {AnyHand: A Large-Scale Synthetic Dataset for RGB(-D) Hand Pose Estimation},
  author        = {Si, Chen and Liu, Yulin and Ai, Bo and Xie, Jianwen and
                   Potamias, Rolandos Alexandros and Zheng, Chuanxia and Su, Hao},
  year          = {2026},
  eprint        = {XXXX.XXXXX},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}

@misc{potamias2024wilor,
    title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
    author={Rolandos Alexandros Potamias and Jinglei Zhang and Jiankang Deng and Stefanos Zafeiriou},
    year={2024},
    eprint={2409.12259},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

@inproceedings{pavlakos2024reconstructing,
    title={Reconstructing Hands in 3D with Transformers},
    author={Pavlakos, Georgios and Shan, Dandan and Radosavovic, Ilija and Kanazawa, Angjoo and Fouhey, David and Malik, Jitendra},
    booktitle={CVPR},
    year={2024}
}
