The repository contains the official implementation for the paper "TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction".
- demo/
- demo.py — 2D/3D keypoint inference (MLLM)
- eval/
- eval.py — evaluation script
- eval_wrapper/ — wrappers, parsers, task definitions, visualization
- demo_script.sh / eval_script.sh — runnable examples
- Linux
- Python 3.10+
- CUDA-capable GPU (recommended for vLLM / FlashAttention)
pip install torch==2.8.0 torchdata==0.11.0 torchvision==0.23.0
pip install -r requirements.txt

You may download the released checkpoints from Hugging Face and place them under ./checkpoints.
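As a sketch, the checkpoints could be fetched with the Hugging Face CLI; the repository id below is a placeholder, not the actual release location — substitute the repo named in the release notes.

```shell
# Install the Hugging Face CLI, then download checkpoints into ./checkpoints.
# NOTE: <org>/<checkpoint-repo> is a placeholder, not the real repo id.
pip install -U "huggingface_hub[cli]"
huggingface-cli download <org>/<checkpoint-repo> --local-dir ./checkpoints
```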
Note: This repository references external modules (e.g., model weights, optional packages). Make sure they are available in your environment.
Run the provided script:
bash demo_script.sh

Run the provided evaluation script:
bash eval_script.sh

Key options:
- `--model_path`: local path or Hugging Face repo
- `--backend`: `transformers` or `vllm`
- `--focal_length`, `--princpt_x`, `--princpt_y`: camera intrinsics (focal length and principal point)
- `--input_path`, `--output_path`: input/output paths
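For reference, a direct invocation with these options might look like the following sketch; every value (model path, intrinsics, input/output paths) is an illustrative placeholder, not a default from the repository.

```shell
# Illustrative evaluation run; all values below are placeholders.
# --backend accepts "transformers" or "vllm".
python eval/eval.py \
  --model_path ./checkpoints \
  --backend vllm \
  --focal_length 1500.0 \
  --princpt_x 960.0 \
  --princpt_y 540.0 \
  --input_path ./data/test \
  --output_path ./outputs
```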
This project builds on and is inspired by several excellent open-source projects and tools, including:
- Rex-Omni for the Qwen-VL SFT code base.
- Qwen3-VL for the Qwen3-VL fine-tuning and inference code.
- vLLM for efficient inference acceleration.
- SAM-3D-Body for 3D body mesh recovery and visualization in the demo pipeline.
We also thank the community contributors and dataset providers who make research and evaluation possible.
See the LICENSE file for details.