A toolkit for robotic tasks
- Zero-shot classification using OpenAI CLIP.
- Zero-shot text-to-bbox approach for object detection using GroundingDINO.
- Zero-shot bbox-to-mask approach for object segmentation using Segment Anything (MobileSAM).
- Zero-shot image-to-depth approach for depth estimation using Depth Anything.
- Zero-shot feature upsampling using FeatUp.
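To illustrate the zero-shot classification idea behind the CLIP component, the sketch below scores an image embedding against text-prompt embeddings with cosine similarity and a softmax. The embeddings are random stand-ins, not real CLIP outputs, and the function is illustrative rather than part of this toolkit's API.

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs, temperature=100.0):
    """CLIP-style zero-shot scoring: cosine similarity + softmax over labels."""
    # L2-normalize so dot products become cosine similarities
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)   # one logit per candidate label
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Toy usage: 3 candidate labels, 8-dim embeddings (stand-ins for CLIP's 512-dim)
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 8))
image_emb = text_embs[1] + 0.1 * rng.normal(size=8)  # image "close to" label 1
probs = zero_shot_scores(image_emb, text_embs)
print(probs.argmax())  # the closest label wins
```

In the real pipeline, the image embedding comes from CLIP's image encoder and each text embedding from prompts such as "a photo of a mug"; only the scoring step is shown here.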
TODO

Requirements:
- Python 3.7 or higher (tested with 3.9)
- torch (tested with 2.0)
- torchvision
```sh
git clone https://github.com/IRVLUTD/robokit.git && cd robokit
pip install -r requirements.txt
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```
```sh
python setup.py install
```

Note: check the GroundingDINO installation if you encounter the following error:

```
NameError: name '_C' is not defined
```

Note: All test scripts are located in the `test` directory. Place the respective test script in the root directory to run it.

- SAM: `test_sam.py`
- GroundingDINO + SAM: `test_gdino_sam.py`
- GroundingDINO + SAM + CLIP: `test_gdino_sam_clip.py`
- Depth Anything: `test_depth_anything.py`
- FeatUp: `test_featup.py`
- Test datasets: `test_dataset.py`

```sh
python test_dataset.py --gpu 0 --dataset <ocid_object_test/osd_object_test>
```
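The combined test scripts above chain the models in stages (text prompt → boxes → masks → labels). The sketch below shows only that chaining pattern, with stub stages standing in for GroundingDINO, SAM, and CLIP; the function names and return shapes are illustrative assumptions, not this toolkit's actual API.

```python
import numpy as np

# Stub stages (illustrative only) standing in for the real models.
def detect_boxes(image, prompt):
    # A real detector (GroundingDINO) returns (x0, y0, x1, y1) boxes
    # scored against the text prompt; here we return one centered box.
    h, w = image.shape[:2]
    return [(w // 4, h // 4, 3 * w // 4, 3 * h // 4)]

def box_to_mask(image, box):
    # A real segmenter (SAM) refines the box into a pixel mask;
    # here we simply fill the box region.
    mask = np.zeros(image.shape[:2], dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = True
    return mask

def classify_crop(image, mask, labels):
    # A real classifier (CLIP) scores the masked crop against each label;
    # here we just return the first candidate.
    return labels[0]

def text_to_labeled_masks(image, prompt, labels):
    """GroundingDINO + SAM + CLIP style pipeline: prompt -> boxes -> masks -> labels."""
    results = []
    for box in detect_boxes(image, prompt):
        mask = box_to_mask(image, box)
        results.append((box, mask, classify_crop(image, mask, labels)))
    return results

image = np.zeros((64, 64, 3), dtype=np.uint8)
results = text_to_labeled_masks(image, "a mug", ["mug", "bowl"])
box, mask, label = results[0]
print(label, int(mask.sum()))  # label plus the masked area of the 32x32 box
```

The design point is that each stage consumes the previous stage's output, so any stage can be swapped (e.g. MobileSAM for SAM) without changing the pipeline shape.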
Future goals for this project include:
- Add a config to set the pretrained checkpoints dynamically
- More: TODO
This project is based on the following repositories (checking their licenses is mandatory):
This project is licensed under the MIT License. However, before using this toolkit, please check the respective upstream works for their specific licenses.