initial code release

NVlabs · Jan 28, 2024 · 3fa41a7
1 parent a9c35f6
commit 3fa41a7
Showing 20 changed files with 3,217 additions and 9 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,12 @@
+colmap.db*
+colmap_sparse
+colmap_text
+
+data/
+**__pycache__**
+imgui.ini
+
+commands.sh
+
+*.jpg
+*.png
diff --git a/LICENSE.txt b/LICENSE.txt
@@ -0,0 +1,94 @@
+Copyright (c) 2024, NVIDIA Corporation & affiliates. All rights reserved.
+
+
+=======================================================================
+
+1. Definitions
+
+"Licensor" means any person or entity that distributes its Work.
+
+"Software" means the original work of authorship made available under
+this License.
+
+"Work" means the Software and any additions to or derivative works of
+the Software that are made available under this License.
+
+The terms "reproduce," "reproduction," "derivative works," and
+"distribution" have the meaning as provided under U.S. copyright law;
+provided, however, that for the purposes of this License, derivative
+works shall not include works that remain separable from, or merely
+link (or bind by name) to the interfaces of, the Work.
+
+Works, including the Software, are "made available" under this License
+by including in or with the Work either (a) a copyright notice
+referencing the applicability of this License to the Work, or (b) a
+copy of this License.
+
+2. License Grants
+
+    2.1 Copyright Grant. Subject to the terms and conditions of this
+    License, each Licensor grants to you a perpetual, worldwide,
+    non-exclusive, royalty-free, copyright license to reproduce,
+    prepare derivative works of, publicly display, publicly perform,
+    sublicense and distribute its Work and any resulting derivative
+    works in any form.
+
+3. Limitations
+
+    3.1 Redistribution. You may reproduce or distribute the Work only
+    if (a) you do so under this License, (b) you include a complete
+    copy of this License with your distribution, and (c) you retain
+    without modification any copyright, patent, trademark, or
+    attribution notices that are present in the Work.
+
+    3.2 Derivative Works. You may specify that additional or different
+    terms apply to the use, reproduction, and distribution of your
+    derivative works of the Work ("Your Terms") only if (a) Your Terms
+    provide that the use limitation in Section 3.3 applies to your
+    derivative works, and (b) you identify the specific derivative
+    works that are subject to Your Terms. Notwithstanding Your Terms,
+    this License (including the redistribution requirements in Section
+    3.1) will continue to apply to the Work itself.
+
+    3.3 Use Limitation. The Work and any derivative works thereof only
+    may be used or intended for use non-commercially. Notwithstanding
+    the foregoing, NVIDIA and its affiliates may use the Work and any
+    derivative works commercially. As used herein, "non-commercially"
+    means for research or evaluation purposes only.
+
+    3.4 Patent Claims. If you bring or threaten to bring a patent claim
+    against any Licensor (including any claim, cross-claim or
+    counterclaim in a lawsuit) to enforce any patents that you allege
+    are infringed by any Work, then your rights under this License from
+    such Licensor (including the grant in Section 2.1) will terminate
+    immediately.
+
+    3.5 Trademarks. This License does not grant any rights to use any
+    Licensor's or its affiliates' names, logos, or trademarks, except
+    as necessary to reproduce the notices described in this License.
+
+    3.6 Termination. If you violate any term of this License, then your
+    rights under this License (including the grant in Section 2.1) will
+    terminate immediately.
+
+4. Disclaimer of Warranty.
+
+THE WORK IS PROVIDED "AS IS" WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR
+NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER
+THIS LICENSE.
+
+5. Limitation of Liability.
+
+EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL
+THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE
+SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT,
+INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF
+OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK
+(INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION,
+LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER
+COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF
+THE POSSIBILITY OF SUCH DAMAGES.
+
+=======================================================================
diff --git a/README.md b/README.md
@@ -1,19 +1,122 @@
-[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
+# HANDAL
 
-# HANDAL Dataset
+<img src="intro.jpeg">
 
-Dataset is now available for manual [download](https://drive.google.com/drive/folders/1ANTZAXDbZbszyxa7ZO9_DcFHe8tbdCIN?usp=drive_link).  Automated tools are forthcoming...
+## HANDAL Dataset
+
+The HANDAL dataset is a category-level dataset for object pose estimation and affordance prediction, with a focus on hardware and kitchen tool objects that are of the proper size and shape for functional robotic grasping (e.g. pliers, utensils, and screwdrivers). The dataset consists of 308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories.
+
+Dataset is available for download from [Google Drive](https://drive.google.com/drive/folders/1ANTZAXDbZbszyxa7ZO9_DcFHe8tbdCIN?usp=drive_link). Dataset is released under [CC-BY-NC-SA-4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/).
 
 Dataset follows the [BOP format](https://github.com/thodan/bop_toolkit/blob/master/docs/bop_datasets_format.md), with some minor modifications:
-- Image size is provided in the `scene_camera.json` info file (since multiple camera sensors were used to capture data)
-- Depth is from NeRF ([Instant NGP](https://github.com/NVlabs/instant-ngp)) rather than from a depth sensor
+- Image size is provided in `scene_camera.json` for each scene, since multiple camera sensors were used to capture data
+- Depth is rendered from NeRF reconstructions of the scene ([Instant NGP](https://github.com/NVlabs/instant-ngp)) rather than from a depth sensor
 - RGB images are compressed using JPEG rather than PNG
 
-*(Note:  Not all categories are available yet.  Some are uploading, others are still processing.  Stay tuned over the next several days...)*
+For more details, see the [project page](https://nvlabs.github.io/HANDAL) and our 2023 [IROS paper](https://arxiv.org/abs/2308.01477).
+
+
+## HANDAL Toolkit
+
+Our annotation process is streamlined, requiring only a single off-the-shelf camera and semi-automated processing, allowing us to produce high-quality 3D annotations without crowd-sourcing. This section covers how to install the HANDAL toolkit and use it to annotate static scenes.
+
+
+### Setup 
+
+Our toolkit requires an NVIDIA GPU capable of running [Instant-NGP](https://github.com/NVlabs/instant-ngp/blob/master/README.md#requirements). We have only tested the toolkit on Ubuntu.
+
+#### Installing the HANDAL toolkit
+We recommend installing the HANDAL pipeline in a separate Python environment, using Python 3.10 or greater.
+```
+conda create -n handal-toolkit python=3.10
+conda activate handal-toolkit
+git clone https://github.com/NVlabs/HANDAL.git
+cd HANDAL
+pip install -r requirements.txt 
+```
+
+#### Installing Instant-NGP
+Our pipeline leverages the [Instant-NGP](https://github.com/NVlabs/instant-ngp) Python bindings for scene reconstruction, depth estimation, and mesh construction.
+Follow the instructions to build [Instant-NGP](https://github.com/NVlabs/instant-ngp#building-instant-ngp-windows--linux) and its [Python bindings](https://github.com/NVlabs/instant-ngp#python-bindings).
+Some potential headaches may be avoided by building the Python bindings in the same Python environment as the one used to install the HANDAL toolkit.
+Finally, set the `CONFIG_NGP_PATH` variable in `config/default_config.sh` to the root directory of the Instant-NGP repository.
+
+The following commands worked for our local build of Instant-NGP. We share them in case they are helpful, but we recommend following the official instructions if any problems are encountered:
+```
+conda activate handal-toolkit
+cd HANDAL
+git clone --recursive https://github.com/nvlabs/instant-ngp submodules/instant-ngp
+cd submodules/instant-ngp
+conda install -c conda-forge libgcc-ng libstdcxx-ng cmake  # fix the missing GLIBCXX 3.4.30 error
+pip install -r requirements.txt
+cmake . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo
+cmake --build build --config RelWithDebInfo -j
+python ./scripts/run.py --help
+```
+
+#### Installing a video segmentation tool
+To initialize the pipeline, we require rough segmentation masks of the object to be annotated. We recommend considering the video segmentation tool [Cutie](https://github.com/hkchengrex/Cutie), a follow-up work to [XMem](https://github.com/hkchengrex/XMem). To avoid conflicting with the other requirements, we recommend installing it in a separate Python environment.
+
+
+### Running the example
+
+To test the installation, download the example scenes from [Google Drive](https://drive.google.com/drive/folders/1znqQgfNfe5yoDd3SVy7JJfkAy4CXRTpy?usp=sharing) and extract them to `example/`. Object segmentation masks are already provided for these examples in the `input_masks/` directory.
+
+#### Process reference scene
+To generate the initial reconstruction and annotation of the reference scan, run the following command:
+```
+bash run.sh example/drill_reference
+```
+If the command runs successfully, you will find initial annotations and results in `example/drill_reference/`, including raw BOP annotations `bop_raw/scene_*_initial.json`, a mesh with vertex colors `meshes/colored.ply`, and rendered depth maps `depth_scene/`. The initial annotations lack ground truth scale because we use RGB-only inputs. The next steps will interactively place the reference object in a canonical pose and estimate the correct scale.
+
+First, ensure that `example/drill_reference/config.sh` contains the following line to indicate that this is a reference scene:
+```
+CONFIG_IS_REFERENCE=1
+```
+Then, run the following command:
+```
+bash run.sh example/drill_reference --interactive
+```
+First, an interactive tool will place the object in a canonical pose. The canonical pose is automatically initialized using an oriented bounding cuboid fit to the object mesh.
+Use the keys `r`, `g`, and `b` to rotate the object in 1 degree increments around the `x`, `y`, and `z` axes, respectively.
+(The `shift` key can be used to rotate in the opposite direction, the `ctrl` key can be used to make 90 degree rotations, and the `alt` key can be used to make 0.1 degree rotations.)
+
+Use the `v` key to cycle through canonical viewing directions (corresponding to the [BOP](https://github.com/thodan/bop_toolkit/blob/master/docs/bop_datasets_format.md#coordinate-systems) documentation): front view (red x-axis pointing out of screen), side view (green y-axis pointing out of screen), and top view (blue z-axis pointing out of screen).
+
+For this example, in the front view, we recommend rotating such that the top of the drill is aligned with the positive `z` axis (blue arrow) and the front of the drill is pointing in the negative `y` direction (green arrow).
+
+Once the object is in the desired pose, press `q` to quit the interactive tool and write the pose to `example/drill_reference/canonical_pose.json`.
+
+Next, you will be prompted to enter the measured dimensions of the object in millimeters. For the drill example, the dimensions are 230mm x 130mm x 75mm, corresponding to the height, width, and depth of the object in BOP canonical pose. (Or, equivalently, the size of the bounding cuboid along the z-, y-, and x-axes, respectively.)
+The estimated scale will be written to `example/drill_reference/canonical_scale.json`.
+
+Finally, the pipeline will apply the canonical pose and scale and finalize the BOP annotations in `example/drill_reference/bop/`.
+
+#### Process and align additional scene
+
+To process an additional scene (e.g. `example/drill_scene1`) and align it to the reference scene, first ensure that `example/drill_scene1/config.sh` contains the following line to indicate the reference scene (relative to the scene's directory):
+```
+CONFIG_REFERENCE_SCENE=../drill_reference
+```
+Then run the following command:
+```
+bash run.sh example/drill_scene1 --interactive
+```
+After the initial reconstruction, the interactive GUI will first prompt for the canonical pose, followed by a prompt for scale. (Be careful when scaling as some dimensions may not be accurate if the object was occluded.) Then the interactive GUI will guide the alignment of the object to the reference scan. In addition to the rotation keys used to set the canonical pose, the `x`, `y`, and `z` keys may be used to translate the object in 1mm increments along the `x`, `y`, and `z` axes. (As before, the `shift`, `ctrl`, and `alt` keys may be used to translate in the opposite direction, make 50mm adjustments, and make 0.2mm adjustments, respectively.)
+
+Finally, the pipeline will use this transformation and the mesh copied from the reference scene to generate final annotation files in `example/drill_scene1/bop/`.
+
 
-For more details (including the IROS '23 paper describing the work), see the [project page](https://nvlabs.github.io/HANDAL)
+## Citation and License
 
-Code for re-creating the dataset will be coming in a few weeks...
+```bibtex
+@InProceedings{handaliros23,
+  title={{HANDAL}: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions},
+  author={Andrew Guo and Bowen Wen and Jianhe Yuan and Jonathan Tremblay and Stephen Tyree and Jeffrey Smith and Stan Birchfield},
+  booktitle={IROS},
+  year={2023}
+}
+```
 
-Dataset is released under [CC-BY-NC-SA-4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/)
+The dataset is released under [CC-BY-NC-SA-4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/). The code is released under the [NVIDIA Source Code License](LICENSE.txt).
 
+Copyright © 2023-2024, NVIDIA Corporation. All rights reserved.
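Note on the interactive rotation keys described in the README above: the key bindings (`r`, `g`, `b` for the `x`, `y`, `z` axes; 1 degree by default, `shift` to reverse, `ctrl` for 90 degrees, `alt` for 0.1 degrees) can be sketched as a small pure-Python example. This is only an illustration, not the toolkit's code. The function names are hypothetical, and two behaviors are assumptions: that `ctrl` takes precedence when both `ctrl` and `alt` are held, and that each key press left-multiplies the current rotation (rotation about fixed axes).

```python
import math

# Key-to-axis mapping as described in the README.
AXIS_FOR_KEY = {"r": "x", "g": "y", "b": "z"}


def step_degrees(shift=False, ctrl=False, alt=False):
    """Signed rotation step in degrees for one key press.

    Assumption: ctrl wins if ctrl and alt are both held.
    """
    magnitude = 90.0 if ctrl else 0.1 if alt else 1.0
    return -magnitude if shift else magnitude


def axis_rotation(axis, degrees):
    """3x3 rotation matrix (nested lists) about a principal axis."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    if axis == "z":
        return [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    raise ValueError(f"unknown axis: {axis}")


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_key(pose, key, **modifiers):
    """Apply one key press to a 3x3 rotation.

    Assumption: the increment is left-multiplied onto the current pose.
    """
    increment = axis_rotation(AXIS_FOR_KEY[key], step_degrees(**modifiers))
    return matmul(increment, pose)


pose = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pose = apply_key(pose, "b", ctrl=True)   # 90 degrees about z
pose = apply_key(pose, "r", shift=True)  # -1 degree about x
```

Whether the real tool composes increments in the object frame or the world frame is not stated in the README; swapping the operands of `matmul` in `apply_key` would model the other convention.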
diff --git a/config/default_config.sh b/config/default_config.sh
@@ -0,0 +1,35 @@
+# Scene scale for NGP reconstruction
+CONFIG_AABB_SCALE=4
+
+# Target number of frames to extract from video
+CONFIG_FRAME_COUNT=125 
+
+# Path to directory containing object masks (relative to scene path)
+CONFIG_MASK_DIR=input_masks
+
+# Number of NGP optimization steps
+CONFIG_N_STEPS=5000 
+
+# Path to NGP repository (use $SOURCE_PATH to refer to toolkit repository root dir)
+CONFIG_NGP_PATH=$SOURCE_PATH/submodules/instant-ngp
+
+# Filename of scene video (relative to scene path)
+CONFIG_VIDEO_NAME=input.mp4
+
+# Depth tolerance (in ~mm) for generating visibility masks
+CONFIG_VISIBILITY_TOLERANCE=25
+
+# Flag to indicate whether this is a reference mesh
+CONFIG_IS_REFERENCE=0
+
+# Scale factor for the scene
+CONFIG_SCALE_FACTOR=0
+
+# Reference mesh for aligning/scaling across scenes
+CONFIG_REFERENCE_SCENE=
+
+# BOP object/mesh ID
+CONFIG_BOP_ID=0
+
+# Use Poisson mesh reconstruction from NGP depth maps
+CONFIG_USE_POISSON_MESH=0
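Note on the config file above: the README describes per-scene `config.sh` files that set values such as `CONFIG_IS_REFERENCE=1` on top of these defaults. The following Python sketch shows one way to inspect the effective settings for a scene. It is an assumption that scene values simply override defaults key by key, and the helper names `parse_config_text` and `effective_config` are hypothetical. The sketch reads plain `KEY=VALUE` lines only and does not perform shell expansion, so `$SOURCE_PATH` stays literal.

```python
def parse_config_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def effective_config(default_text, scene_text=""):
    """Overlay scene settings on the defaults (assumed override order)."""
    config = parse_config_text(default_text)
    config.update(parse_config_text(scene_text))
    return config


defaults = """
# Number of NGP optimization steps
CONFIG_N_STEPS=5000
# Flag to indicate whether this is a reference mesh
CONFIG_IS_REFERENCE=0
# Reference mesh for aligning/scaling across scenes
CONFIG_REFERENCE_SCENE=
"""
scene = "CONFIG_IS_REFERENCE=1\n"
config = effective_config(defaults, scene)
```

All values are kept as strings, which mirrors how a shell would see them; converting `CONFIG_N_STEPS` or `CONFIG_VISIBILITY_TOLERANCE` to numbers is left to the caller.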
diff --git a/example/README.md b/example/README.md
@@ -0,0 +1 @@
+Three example scenes (a reference scan of a drill, plus two scans of the same drill in different scenes) may be downloaded from [Google Drive](https://drive.google.com/drive/folders/1znqQgfNfe5yoDd3SVy7JJfkAy4CXRTpy?usp=sharing) and extracted here. Object segmentation masks are already provided for these examples in the `input_masks/` directory.
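Note on the example layout: before running `run.sh` on an extracted scene, it may help to confirm that the expected inputs are present. The sketch below checks for a video file and a mask directory, using the default names from `config/default_config.sh` (`input.mp4` and `input_masks`). That the extracted example scenes use exactly these names is an assumption, since a scene's own `config.sh` could override them, and `missing_inputs` is a hypothetical helper rather than part of the toolkit.

```python
from pathlib import Path


def missing_inputs(scene_dir, video_name="input.mp4", mask_dir="input_masks"):
    """Return the names of expected scene inputs that were not found."""
    scene = Path(scene_dir)
    missing = []
    if not (scene / video_name).is_file():
        missing.append(video_name)
    if not (scene / mask_dir).is_dir():
        missing.append(mask_dir)
    return missing


# An empty list means both the video and the mask directory were found.
print(missing_inputs("example/drill_reference"))
```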
diff --git a/intro.jpeg b/intro.jpeg
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,10 @@
+git+https://github.com/thodan/bop_toolkit.git
+decord
+flask>=2.2.5
+numpy
+open3d>=0.18.0
+opencv-python 
+rich
+scipy
+tqdm 
+trimesh[easy]