From 0569df4a4097c175f7d1b17decdef053646ee9e3 Mon Sep 17 00:00:00 2001 From: rbler1234 Date: Fri, 24 Jan 2025 16:01:59 +0800 Subject: [PATCH 1/4] edit readme --- README.md | 93 +++++++++++++++++++++++++++++++++--------------- models/README.md | 21 ++++++++++- 2 files changed, 84 insertions(+), 30 deletions(-) diff --git a/README.md b/README.md index a852d5b..12cfef0 100644 --- a/README.md +++ b/README.md @@ -21,12 +21,14 @@ ## πŸ“‹ Contents -1. [About](#-about) -2. [Getting Started](#-getting-started) -3. [Model and Benchmark](#-model-and-benchmark) -4. [TODO List](#-todo-list) +1. [About](#topic1) +2. [Getting Started](#topic2) +3. [MMScan API Tutorial](#topic3) +4. [MMScan Benchmark](#topic4) +5. [TODO List](#topic5) ## 🏠 About + @@ -55,7 +57,8 @@ Furthermore, we use this high-quality dataset to train state-of-the-art 3D visua grounding and LLMs and obtain remarkable performance improvement both on existing benchmarks and in-the-wild evaluation. -## πŸš€ Getting Started: +## πŸš€ Getting Started + ### Installation @@ -98,6 +101,7 @@ existing benchmarks and in-the-wild evaluation. Please refer to the [guide](data_preparation/README.md) here. ## πŸ‘“ MMScan API Tutorial + The **MMScan Toolkit** provides comprehensive tools for dataset handling and model evaluation in tasks. @@ -137,39 +141,41 @@ Each dataset item is a dictionary containing key elements: (1) 3D Modality -- **"ori_pcds"** (tuple\[tensor\]): Raw point cloud data from the `.pth` file. -- **"pcds"** (np.ndarray): Point cloud data, dimensions (\[n_points, 6(xyz+rgb)\]). -- **"instance_labels"** (np.ndarray): Instance IDs for each point. -- **"class_labels"** (np.ndarray): Class IDs for each point. -- **"bboxes"** (dict): Bounding boxes in the scan. +- **"ori_pcds"** (tuple\[tensor\]): Original point cloud data extracted from the .pth file. +- **"pcds"** (np.ndarray): Point cloud data with dimensions [n_points, 6(xyz+rgb)], representing the coordinates and color of each point. +- **"instance_labels"** (np.ndarray): Instance ID assigned to each point in the point cloud. +- **"class_labels"** (np.ndarray): Class IDs assigned to each point in the point cloud. +- **"bboxes"** (dict): Information about bounding boxes within the scan. (2) Language Modality -- **"sub_class"**: Sample category. -- **"ID"**: Unique sample ID. -- **"scan_id"**: Corresponding scan ID. -- **--------------For Visual Grounding Task** -- **"target_id"** (list\[int\]): IDs of target objects. -- **"text"** (str): Grounding text. +- **"sub_class"**: The sample category of the sample. +- **"ID"**: A unique identifier for the sample. +- **"scan_id"**:Identifier corresponding to the related scan. + + *For Visual Grounding Task* +- **"target_id"** (list\[int\]): IDs of target objects. +- **"text"** (str): Text used for grounding. - **"target"** (list\[str\]): Types of target objects. - **"anchors"** (list\[str\]): Types of anchor objects. - **"anchor_ids"** (list\[int\]): IDs of anchor objects. -- **"tokens_positive"** (dict): Position indices of mentioned objects in the text. -- **--------------ForQuestion Answering Task** -- **"question"** (str): The question text. +- **"tokens_positive"** (dict): Indices of positions where mentioned objects appear in the text. + + *For Question Answering Task* +- **"question"** (str): The text of the question. - **"answers"** (list\[str\]): List of possible answers. - **"object_ids"** (list\[int\]): Object IDs referenced in the question. - **"object_names"** (list\[str\]): Types of referenced objects. 
- **"input_bboxes_id"** (list\[int\]): IDs of input bounding boxes. -- **"input_bboxes"** (list\[np.ndarray\]): Input bounding boxes, 9 DoF. +- **"input_bboxes"** (list\[np.ndarray\]): Input bounding box data, with 9 degrees of freedom. (3) 2D Modality -- **'img_path'** (str): Path to RGB image. -- **'depth_img_path'** (str): Path to depth image. -- **'intrinsic'** (np.ndarray): Camera intrinsic parameters for RGB images. -- **'depth_intrinsic'** (np.ndarray): Camera intrinsic parameters for depth images. -- **'extrinsic'** (np.ndarray): Camera extrinsic parameters. +- **'img_path'** (str): File path to the RGB image. +- **'depth_img_path'** (str): File path to the depth image. +- **'intrinsic'** (np.ndarray): Intrinsic parameters of the camera for RGB images. +- **'depth_intrinsic'** (np.ndarray): Intrinsic parameters of the camera for Depth images. +- **'extrinsic'** (np.ndarray): Extrinsic parameters of the camera. - **'visible_instance_id'** (list): IDs of visible objects in the image. ### MMScan Evaluator @@ -182,7 +188,9 @@ For the visual grounding task, our evaluator computes multiple metrics including - **AP and AR**: These metrics calculate the precision and recall by considering each sample as an individual category. - **AP_C and AR_C**: These versions categorize samples belonging to the same subclass and calculate them together. -- **gtop-k**: An expanded metric that generalizes the traditional top-k metric, offering insights into broader performance aspects. +- **gTop-k**: An expanded metric that generalizes the traditional Top-k metric, offering insights into broader performance aspects. + +*Note:* Here, AP corresponds to APsample in the paper, and AP_C corresponds to APbox in the paper. Below is an example of how to utilize the Visual Grounding Evaluator: @@ -301,11 +309,38 @@ The input structure remains the same as for the question answering evaluator: ] ``` -### Models +## πŸ† MMScan Benchmark + + + +### MMScan Visual Grounding Benchmark -We have adapted the MMScan API for some [models](./models/README.md). 
+| Methods | gTop-1 | gTop-3 | APsample | APbox | AR | Release | Download | +|---------|--------|--------|---------------------|------------------|----|-------|----| +| ScanRefer | 4.74 | 9.19 | 9.49 | 2.28 | 47.68 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/Scanrefer) | [model](https://drive.google.com/file/d/1C0-AJweXEc-cHTe9tLJ3Shgqyd44tXqY/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1ENOS2FE7fkLPWjIf9J76VgiPrn6dGKvi/view?usp=drive_link) | +| MVT | 7.94 | 13.07 | 13.67 | 2.50 | 86.86 | ~ | ~ | +| BUTD-DETR | 15.24 | 20.68 | 18.58 | 9.27 | 66.62 | ~ | ~ | +| ReGround3D | 16.35 | 26.13 | 22.89 | 5.25 | 43.24 | ~ | ~ | +| EmbodiedScan | 19.66 | 34.00 | 29.30 | **15.18** | 59.96 | [code](https://github.com/OpenRobotLab/EmbodiedScan/tree/mmscan/models/EmbodiedScan) | [model](https://drive.google.com/file/d/1F6cHY6-JVzAk6xg5s61aTT-vD-eu_4DD/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1Ua_-Z2G3g0CthbeBkrR1a7_sqg_Spd9s/view?usp=drive_link) | +| 3D-VisTA | 25.38 | 35.41 | 33.47 | 6.67 | 87.52 | ~ | ~ | +| ViL3DRef | **26.34** | **37.58** | **35.09** | 6.65 | 86.86 | ~ | ~ | + +### MMScan Question Answering Benchmark +| Methods | Overall | ST-attr | ST-space | OO-attr | OO-space | OR| Advanced | Release | Download | +|---|--------|--------|--------|--------|--------|--------|-------|----|----| +| LL3DA | 45.7 | 39.1 | 58.5 | 43.6 | 55.9 | 37.1 | 24.0| [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LL3DA) | [model](https://drive.google.com/file/d/1mcWNHdfrhdbtySBtmG-QRH1Y1y5U3PDQ/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1VHpcnO0QmAvMa0HuZa83TEjU6AiFrP42/view?usp=drive_link) | +| LEO |54.6 | 48.9 | 62.7 | 50.8 | 64.7 | 50.4 | 45.9 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LEO) | [model](https://drive.google.com/drive/folders/1HZ38LwRe-1Q_VxlWy8vqvImFjtQ_b9iA?usp=drive_link)| +| LLaVA-3D |**61.6** | 58.5 | 63.5 | 56.8 | 75.6 | 58.0 | 38.5|~ | ~ | + +*Note:* These two tables only show the results for main metrics; see the paper for complete results. + +We have released the codes of some models under [./models](./models/README.md). ## πŸ“ TODO List -- \[ \] More Visual Grounding baselines and Question Answering baselines. + + +- \[ \] MMScan annotation and samples for ARKitScenes. +- \[ \] Online evaluation platform for the MMScan benchmark. +- \[ \] Codes of more MMScan Visual Grounding baselines and Question Answering baselines. - \[ \] Full release and further updates. diff --git a/models/README.md b/models/README.md index 5309e7b..86aabf9 100644 --- a/models/README.md +++ b/models/README.md @@ -21,7 +21,11 @@ These are 3D visual grounding models adapted for the mmscan-devkit. 
Currently, t ```bash python -u scripts/train.py --use_color --eval_only --use_checkpoint "path/to/pth" ``` +#### ckpts & Logs +| Epoch | gTop-1 @ 0.25/0.50 | Config | Download | +| :-------: | :---------: | :--------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| 50 | 4.74 / 2.52 | [config](https://drive.google.com/file/d/1iJtsjt4K8qhNikY8UmIfiQy1CzIaSgyU/view?usp=drive_link) | [model](https://drive.google.com/file/d/1C0-AJweXEc-cHTe9tLJ3Shgqyd44tXqY/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1ENOS2FE7fkLPWjIf9J76VgiPrn6dGKvi/view?usp=drive_link) ### EmbodiedScan 1. Follow the [EmbodiedScan](https://github.com/OpenRobotLab/EmbodiedScan/blob/main/README.md) to setup the Env. Download the [Multi-View 3D Detection model's weights](https://download.openmmlab.com/mim-example/embodiedscan/mv-3ddet.pth) and change the "load_from" path in the config file under `configs/grounding` to the path where the weights are saved. @@ -47,6 +51,11 @@ These are 3D visual grounding models adapted for the mmscan-devkit. Currently, t # Multiple GPU testing python tools/test.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py path/to/load_pth --launcher="pytorch" ``` +#### ckpts & Logs + +| Input modality | Load pretrain | Epoch | gTop-1 @ 0.25/0.50 | Config | Download | +| :-------: | :----: | :----: | :---------: | :--------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Point cloud | True | 12 | 19.66 / 8.82 | [config](https://github.com/rbler1234/EmbodiedScan/blob/mmscan-devkit/models/EmbodiedScan/configs/grounding/pcd_4xb24_mmscan_vg_num256.py) | [model](https://drive.google.com/file/d/1F6cHY6-JVzAk6xg5s61aTT-vD-eu_4DD/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1Ua_-Z2G3g0CthbeBkrR1a7_sqg_Spd9s/view?usp=drive_link) ## 3D Question Answering Models @@ -84,6 +93,13 @@ These are 3D question answering models adapted for the mmscan-devkit. Currently, --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 --nproc 4 ``` +#### ckpts & Logs + +| Detector | Captioner | Iters | GPT score overall | Download | +| :-------: | :----: | :----: | :---------: |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Vote2Cap-DETR | ll3da | 100k | 45.7 | [model](https://drive.google.com/file/d/1mcWNHdfrhdbtySBtmG-QRH1Y1y5U3PDQ/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1VHpcnO0QmAvMa0HuZa83TEjU6AiFrP42/view?usp=drive_link) | + + ### LEO @@ -117,5 +133,8 @@ These are 3D question answering models adapted for the mmscan-devkit. 
Currently, --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 --nproc 4 ``` +#### ckpts & Logs -PS : It is possible that LEO may encounter an "NaN" error in the MultiHeadAttentionSpatial module due to the training setup when training more epoches. ( no problem for 4GPU one epoch) +| LLM | 2d/3d backbones | epoch | GPT score overall | Config | Download | +| :-------: | :----: | :----: | :---------: | :--------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Vicuna7b | ConvNeXt / PointNet++ | 1 | 54.6 | [config](https://drive.google.com/file/d/1CJccZd4TOaT_JdHj073UKwdA5PWUDtja/view?usp=drive_link) | [model](https://drive.google.com/drive/folders/1HZ38LwRe-1Q_VxlWy8vqvImFjtQ_b9iA?usp=drive_link) | From 8c0344a1fd4ca19a8465154e68ab253908a90d9d Mon Sep 17 00:00:00 2001 From: rbler1234 Date: Sat, 25 Jan 2025 23:42:23 +0800 Subject: [PATCH 2/4] edit readme --- README.md | 95 +++++++++++++++++++++++------------------------- models/README.md | 22 +++++------ 2 files changed, 57 insertions(+), 60 deletions(-) diff --git a/README.md b/README.md index 12cfef0..1fce8ef 100644 --- a/README.md +++ b/README.md @@ -22,10 +22,9 @@ ## πŸ“‹ Contents 1. [About](#topic1) -2. [Getting Started](#topic2) +2. [MMScan Benchmark](#topic2) 3. [MMScan API Tutorial](#topic3) -4. [MMScan Benchmark](#topic4) -5. [TODO List](#topic5) +4. [TODO List](#topic4) ## 🏠 About @@ -57,10 +56,43 @@ Furthermore, we use this high-quality dataset to train state-of-the-art 3D visua grounding and LLMs and obtain remarkable performance improvement both on existing benchmarks and in-the-wild evaluation. 
-## πŸš€ Getting Started + +## πŸ† MMScan Benchmark + -### Installation +### MMScan Visual Grounding Benchmark + +| Methods | gTop-1 | gTop-3 | APsample | APbox | AR | Release | Download | +|---------|--------|--------|---------------------|------------------|----|-------|----| +| ScanRefer | 4.74 | 9.19 | 9.49 | 2.28 | 47.68 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/Scanrefer) | [model](https://drive.google.com/file/d/1C0-AJweXEc-cHTe9tLJ3Shgqyd44tXqY/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1ENOS2FE7fkLPWjIf9J76VgiPrn6dGKvi/view?usp=drive_link) | +| MVT | 7.94 | 13.07 | 13.67 | 2.50 | 86.86 | - | - | +| BUTD-DETR | 15.24 | 20.68 | 18.58 | 9.27 | 66.62 | - | - | +| ReGround3D | 16.35 | 26.13 | 22.89 | 5.25 | 43.24 | - | - | +| EmbodiedScan | 19.66 | 34.00 | 29.30 | **15.18** | 59.96 | [code](https://github.com/OpenRobotLab/EmbodiedScan/tree/mmscan/models/EmbodiedScan) | [model](https://drive.google.com/file/d/1F6cHY6-JVzAk6xg5s61aTT-vD-eu_4DD/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1Ua_-Z2G3g0CthbeBkrR1a7_sqg_Spd9s/view?usp=drive_link) | +| 3D-VisTA | 25.38 | 35.41 | 33.47 | 6.67 | 87.52 | - | - | +| ViL3DRef | **26.34** | **37.58** | **35.09** | 6.65 | 86.86 | - | - | + +### MMScan Question Answering Benchmark +| Methods | Overall | ST-attr | ST-space | OO-attr | OO-space | OR| Advanced | Release | Download | +|---|--------|--------|--------|--------|--------|--------|-------|----|----| +| LL3DA | 45.7 | 39.1 | 58.5 | 43.6 | 55.9 | 37.1 | 24.0| [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LL3DA) | [model](https://drive.google.com/file/d/1mcWNHdfrhdbtySBtmG-QRH1Y1y5U3PDQ/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1VHpcnO0QmAvMa0HuZa83TEjU6AiFrP42/view?usp=drive_link) | +| LEO |54.6 | 48.9 | 62.7 | 50.8 | 64.7 | 50.4 | 45.9 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LEO) | [model](https://drive.google.com/drive/folders/1HZ38LwRe-1Q_VxlWy8vqvImFjtQ_b9iA?usp=drive_link)| +| LLaVA-3D |**61.6** | 58.5 | 63.5 | 56.8 | 75.6 | 58.0 | 38.5|- | - | + +*Note:* These two tables only show the results for main metrics; see the paper for complete results. + +We have released the codes of some models under [./models](./models/README.md). + + + +## πŸš€ MMScan API Tutorial + + +The **MMScan Toolkit** provides comprehensive tools for dataset handling and model evaluation in tasks. + +### Getting Started + 1. Clone Github repo. @@ -80,13 +112,13 @@ existing benchmarks and in-the-wild evaluation. Use `"all"` to install all components and specify `"VG"` or `"QA"` if you only need to install the components for Visual Grounding or Question Answering, respectively. -### Data Preparation +3. Download and prepare the dataset. -1. Download the Embodiedscan and MMScan annotation. (Fill in the [form](https://docs.google.com/forms/d/e/1FAIpQLScUXEDTksGiqHZp31j7Zp7zlCNV7p_08uViwP_Nbzfn3g6hhw/viewform) to apply for downloading) + a. Download the Embodiedscan and MMScan annotation. (Fill in the [form](https://docs.google.com/forms/d/e/1FAIpQLScUXEDTksGiqHZp31j7Zp7zlCNV7p_08uViwP_Nbzfn3g6hhw/viewform) to apply for downloading) - Create a folder `mmscan_data/` and then unzip the files. For the first zip file, put `embodiedscan` under `mmscan_data/embodiedscan_split` and rename it to `embodiedscan-v1`. For the second zip file, put `MMScan-beta-release` under `mmscan_data/MMScan-beta-release` and `embodiedscan-v2` under `mmscan_data/embodiedscan_split`. + b. 
Create a folder `mmscan_data/` and then unzip the files. For the first zip file, put `embodiedscan` under `mmscan_data/embodiedscan_split` and rename it to `embodiedscan-v1`. For the second zip file, put `MMScan-beta-release` under `mmscan_data/MMScan-beta-release` and `embodiedscan-v2` under `mmscan_data/embodiedscan_split`.

   The directory structure should be as below, after then, refer to the [guide](data_preparation/README.md) here.

   ```
   mmscan_data
   β”œβ”€β”€ embodiedscan_split
   β”‚   β”œβ”€β”€embodiedscan-v1/   # EmbodiedScan v1 data in 'embodiedscan.zip'
   β”‚   β”œβ”€β”€embodiedscan-v2/   # EmbodiedScan v2 data in 'embodiedscan-v2-beta.zip'
   β”œβ”€β”€ MMScan-beta-release   # MMScan veta data in 'embodiedscan-v2-beta.zip'
   ```

-2. Prepare the point clouds files.
-
-   Please refer to the [guide](data_preparation/README.md) here.
-
-## πŸ‘“ MMScan API Tutorial
-
-
-The **MMScan Toolkit** provides comprehensive tools for dataset handling and model evaluation in tasks.

To import the MMScan API, you can use the following commands:

@@ -121,7 +145,7 @@ import mmscan.QuestionAnsweringEvaluator as MMScan_QA_evaluator

import mmscan.GPTEvaluator as MMScan_GPT_evaluator
```

-### MMScan Dataset
+### MMScan Dataset Tool

The dataset tool in MMScan allows seamless access to data required for various tasks within MMScan.

@@ -152,16 +176,14 @@ Each dataset item is a dictionary containing key elements:
- **"sub_class"**: The sample category of the sample.
- **"ID"**: A unique identifier for the sample.
- **"scan_id"**:Identifier corresponding to the related scan.
-
-  *For Visual Grounding Task*
+- *For Visual Grounding task*
- **"target_id"** (list\[int\]): IDs of target objects.
- **"text"** (str): Text used for grounding.
- **"target"** (list\[str\]): Types of target objects.
- **"anchors"** (list\[str\]): Types of anchor objects.
- **"anchor_ids"** (list\[int\]): IDs of anchor objects.
- **"tokens_positive"** (dict): Indices of positions where mentioned objects appear in the text.
-
-  *For Question Answering Task*
+- *For Question Answering task*
- **"question"** (str): The text of the question.
- **"answers"** (list\[str\]): List of possible answers.
- **"object_ids"** (list\[int\]): Object IDs referenced in the question.
- **"object_names"** (list\[str\]): Types of referenced objects.

@@ -178,7 +200,7 @@ Each dataset item is a dictionary containing key elements:
- **'extrinsic'** (np.ndarray): Extrinsic parameters of the camera.
- **'visible_instance_id'** (list): IDs of visible objects in the image.

-### MMScan Evaluator
+### MMScan Evaluator Tool

Our evaluation tool is designed to streamline the assessment of model outputs for the MMScan task, providing essential metrics to gauge model performance effectively.
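As a quick orientation, below is a minimal sketch of how predictions might be handed to the Visual Grounding evaluator imported above. The constructor flag, the `update`/`start_evaluation` method names, and the per-sample field names are assumptions inferred from this tutorial rather than a verified interface; check the devkit source for the exact schema.

```python
# Hedged sketch: driving the Visual Grounding evaluator.
# All method and field names below are assumptions, not verified API.
import numpy as np
from mmscan import VisualGroundingEvaluator  # same class the tutorial aliases

evaluator = VisualGroundingEvaluator(show_results=True)  # flag is assumed

batch = [{
    "index": 0,                             # position of the sample in the dataset
    "ID": "VG_sample_0",                    # hypothetical sample ID
    "subclass": "space_OO",                 # sub-category used by AP_C / AR_C
    "pred_scores": np.random.rand(256),     # one confidence per predicted box
    "pred_bboxes": np.random.rand(256, 9),  # predicted 9-DoF boxes
    "gt_bboxes": np.random.rand(2, 9),      # ground-truth 9-DoF boxes
}]

evaluator.update(batch)                  # accumulate results batch by batch
metrics = evaluator.start_evaluation()   # AP/AR, AP_C/AR_C, gTop-k
print(metrics)
```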
@@ -309,36 +331,11 @@ The input structure remains the same as for the question answering evaluator: ] ``` -## πŸ† MMScan Benchmark - - - -### MMScan Visual Grounding Benchmark - -| Methods | gTop-1 | gTop-3 | APsample | APbox | AR | Release | Download | -|---------|--------|--------|---------------------|------------------|----|-------|----| -| ScanRefer | 4.74 | 9.19 | 9.49 | 2.28 | 47.68 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/Scanrefer) | [model](https://drive.google.com/file/d/1C0-AJweXEc-cHTe9tLJ3Shgqyd44tXqY/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1ENOS2FE7fkLPWjIf9J76VgiPrn6dGKvi/view?usp=drive_link) | -| MVT | 7.94 | 13.07 | 13.67 | 2.50 | 86.86 | ~ | ~ | -| BUTD-DETR | 15.24 | 20.68 | 18.58 | 9.27 | 66.62 | ~ | ~ | -| ReGround3D | 16.35 | 26.13 | 22.89 | 5.25 | 43.24 | ~ | ~ | -| EmbodiedScan | 19.66 | 34.00 | 29.30 | **15.18** | 59.96 | [code](https://github.com/OpenRobotLab/EmbodiedScan/tree/mmscan/models/EmbodiedScan) | [model](https://drive.google.com/file/d/1F6cHY6-JVzAk6xg5s61aTT-vD-eu_4DD/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1Ua_-Z2G3g0CthbeBkrR1a7_sqg_Spd9s/view?usp=drive_link) | -| 3D-VisTA | 25.38 | 35.41 | 33.47 | 6.67 | 87.52 | ~ | ~ | -| ViL3DRef | **26.34** | **37.58** | **35.09** | 6.65 | 86.86 | ~ | ~ | - -### MMScan Question Answering Benchmark -| Methods | Overall | ST-attr | ST-space | OO-attr | OO-space | OR| Advanced | Release | Download | -|---|--------|--------|--------|--------|--------|--------|-------|----|----| -| LL3DA | 45.7 | 39.1 | 58.5 | 43.6 | 55.9 | 37.1 | 24.0| [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LL3DA) | [model](https://drive.google.com/file/d/1mcWNHdfrhdbtySBtmG-QRH1Y1y5U3PDQ/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1VHpcnO0QmAvMa0HuZa83TEjU6AiFrP42/view?usp=drive_link) | -| LEO |54.6 | 48.9 | 62.7 | 50.8 | 64.7 | 50.4 | 45.9 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LEO) | [model](https://drive.google.com/drive/folders/1HZ38LwRe-1Q_VxlWy8vqvImFjtQ_b9iA?usp=drive_link)| -| LLaVA-3D |**61.6** | 58.5 | 63.5 | 56.8 | 75.6 | 58.0 | 38.5|~ | ~ | - -*Note:* These two tables only show the results for main metrics; see the paper for complete results. -We have released the codes of some models under [./models](./models/README.md). ## πŸ“ TODO List - + - \[ \] MMScan annotation and samples for ARKitScenes. - \[ \] Online evaluation platform for the MMScan benchmark. diff --git a/models/README.md b/models/README.md index 86aabf9..f5dd0a1 100644 --- a/models/README.md +++ b/models/README.md @@ -23,9 +23,9 @@ These are 3D visual grounding models adapted for the mmscan-devkit. 
Currently, t ``` #### ckpts & Logs -| Epoch | gTop-1 @ 0.25/0.50 | Config | Download | -| :-------: | :---------: | :--------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| 50 | 4.74 / 2.52 | [config](https://drive.google.com/file/d/1iJtsjt4K8qhNikY8UmIfiQy1CzIaSgyU/view?usp=drive_link) | [model](https://drive.google.com/file/d/1C0-AJweXEc-cHTe9tLJ3Shgqyd44tXqY/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1ENOS2FE7fkLPWjIf9J76VgiPrn6dGKvi/view?usp=drive_link) +| Epoch | gTop-1 @ 0.25|gTop-1 @0.50 | Config | Download | +| :-------: | :---------:| :---------: | :--------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| 50 | 4.74 | 2.52 | [config](https://drive.google.com/file/d/1iJtsjt4K8qhNikY8UmIfiQy1CzIaSgyU/view?usp=drive_link) | [model](https://drive.google.com/file/d/1C0-AJweXEc-cHTe9tLJ3Shgqyd44tXqY/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1ENOS2FE7fkLPWjIf9J76VgiPrn6dGKvi/view?usp=drive_link) ### EmbodiedScan 1. Follow the [EmbodiedScan](https://github.com/OpenRobotLab/EmbodiedScan/blob/main/README.md) to setup the Env. Download the [Multi-View 3D Detection model's weights](https://download.openmmlab.com/mim-example/embodiedscan/mv-3ddet.pth) and change the "load_from" path in the config file under `configs/grounding` to the path where the weights are saved. @@ -53,9 +53,9 @@ These are 3D visual grounding models adapted for the mmscan-devkit. 
Currently, t ``` #### ckpts & Logs -| Input modality | Load pretrain | Epoch | gTop-1 @ 0.25/0.50 | Config | Download | -| :-------: | :----: | :----: | :---------: | :--------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| Point cloud | True | 12 | 19.66 / 8.82 | [config](https://github.com/rbler1234/EmbodiedScan/blob/mmscan-devkit/models/EmbodiedScan/configs/grounding/pcd_4xb24_mmscan_vg_num256.py) | [model](https://drive.google.com/file/d/1F6cHY6-JVzAk6xg5s61aTT-vD-eu_4DD/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1Ua_-Z2G3g0CthbeBkrR1a7_sqg_Spd9s/view?usp=drive_link) +| Input Modality | Det Pretrain | Epoch | gTop-1 @ 0.25 | gTop-1 @ 0.50 | Config | Download | +| :-------: | :----: | :----:| :----: | :---------: | :--------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Point Cloud | ✔ | 12 | 19.66 | 8.82 | [config](https://github.com/rbler1234/EmbodiedScan/blob/mmscan-devkit/models/EmbodiedScan/configs/grounding/pcd_4xb24_mmscan_vg_num256.py) | [model](https://drive.google.com/file/d/1F6cHY6-JVzAk6xg5s61aTT-vD-eu_4DD/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1Ua_-Z2G3g0CthbeBkrR1a7_sqg_Spd9s/view?usp=drive_link) ## 3D Question Answering Models @@ -95,9 +95,9 @@ These are 3D question answering models adapted for the mmscan-devkit. Currently, ``` #### ckpts & Logs -| Detector | Captioner | Iters | GPT score overall | Download | +| Detector | Captioner | Iters | Overall GPT Score | Download | | :-------: | :----: | :----: | :---------: |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| Vote2Cap-DETR | ll3da | 100k | 45.7 | [model](https://drive.google.com/file/d/1mcWNHdfrhdbtySBtmG-QRH1Y1y5U3PDQ/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1VHpcnO0QmAvMa0HuZa83TEjU6AiFrP42/view?usp=drive_link) | +| Vote2Cap-DETR | LL3DA | 100k | 45.7 | [model](https://drive.google.com/file/d/1mcWNHdfrhdbtySBtmG-QRH1Y1y5U3PDQ/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1VHpcnO0QmAvMa0HuZa83TEjU6AiFrP42/view?usp=drive_link) | @@ -135,6 +135,6 @@ These are 3D question answering models adapted for the mmscan-devkit. 
Currently, ``` #### ckpts & Logs -| LLM | 2d/3d backbones | epoch | GPT score overall | Config | Download | -| :-------: | :----: | :----: | :---------: | :--------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | -| Vicuna7b | ConvNeXt / PointNet++ | 1 | 54.6 | [config](https://drive.google.com/file/d/1CJccZd4TOaT_JdHj073UKwdA5PWUDtja/view?usp=drive_link) | [model](https://drive.google.com/drive/folders/1HZ38LwRe-1Q_VxlWy8vqvImFjtQ_b9iA?usp=drive_link) | +| LLM | 2D Backbone | 3D Backbone | Epoch | Overall GPT Score | Config | Download | +| :-------: | :----: | :----: | :----: |:---------: | :--------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| Vicuna7b | ConvNeXt | PointNet++ | 1 | 54.6 | [config](https://drive.google.com/file/d/1CJccZd4TOaT_JdHj073UKwdA5PWUDtja/view?usp=drive_link) | [model](https://drive.google.com/drive/folders/1HZ38LwRe-1Q_VxlWy8vqvImFjtQ_b9iA?usp=drive_link) | From 5677efa629c32c4e29c0a6348ea11d3e98db3c08 Mon Sep 17 00:00:00 2001 From: rbler1234 Date: Sun, 26 Jan 2025 14:09:05 +0800 Subject: [PATCH 3/4] edit readme --- README.md | 95 ++++++++++++++++++++++++++++--------------------------- 1 file changed, 48 insertions(+), 47 deletions(-) diff --git a/README.md b/README.md index 1fce8ef..ec22624 100644 --- a/README.md +++ b/README.md @@ -21,13 +21,14 @@ ## πŸ“‹ Contents -1. [About](#topic1) -2. [MMScan Benchmark](#topic2) -3. [MMScan API Tutorial](#topic3) -4. [TODO List](#topic4) +1. [About](#-about) +2. [Getting Started](#-getting-started) +3. [MMScan API Tutorial](#-mmscan-api-tutorial) +4. [MMScan Benchmark](#-mmscan-benchmark) +5. [TODO List](#-todo-list) ## 🏠 About - + @@ -56,43 +57,10 @@ Furthermore, we use this high-quality dataset to train state-of-the-art 3D visua grounding and LLMs and obtain remarkable performance improvement both on existing benchmarks and in-the-wild evaluation. 
+## πŸš€ Getting Started -## πŸ† MMScan Benchmark - - - -### MMScan Visual Grounding Benchmark - -| Methods | gTop-1 | gTop-3 | APsample | APbox | AR | Release | Download | -|---------|--------|--------|---------------------|------------------|----|-------|----| -| ScanRefer | 4.74 | 9.19 | 9.49 | 2.28 | 47.68 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/Scanrefer) | [model](https://drive.google.com/file/d/1C0-AJweXEc-cHTe9tLJ3Shgqyd44tXqY/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1ENOS2FE7fkLPWjIf9J76VgiPrn6dGKvi/view?usp=drive_link) | -| MVT | 7.94 | 13.07 | 13.67 | 2.50 | 86.86 | - | - | -| BUTD-DETR | 15.24 | 20.68 | 18.58 | 9.27 | 66.62 | - | - | -| ReGround3D | 16.35 | 26.13 | 22.89 | 5.25 | 43.24 | - | - | -| EmbodiedScan | 19.66 | 34.00 | 29.30 | **15.18** | 59.96 | [code](https://github.com/OpenRobotLab/EmbodiedScan/tree/mmscan/models/EmbodiedScan) | [model](https://drive.google.com/file/d/1F6cHY6-JVzAk6xg5s61aTT-vD-eu_4DD/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1Ua_-Z2G3g0CthbeBkrR1a7_sqg_Spd9s/view?usp=drive_link) | -| 3D-VisTA | 25.38 | 35.41 | 33.47 | 6.67 | 87.52 | - | - | -| ViL3DRef | **26.34** | **37.58** | **35.09** | 6.65 | 86.86 | - | - | - -### MMScan Question Answering Benchmark -| Methods | Overall | ST-attr | ST-space | OO-attr | OO-space | OR| Advanced | Release | Download | -|---|--------|--------|--------|--------|--------|--------|-------|----|----| -| LL3DA | 45.7 | 39.1 | 58.5 | 43.6 | 55.9 | 37.1 | 24.0| [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LL3DA) | [model](https://drive.google.com/file/d/1mcWNHdfrhdbtySBtmG-QRH1Y1y5U3PDQ/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1VHpcnO0QmAvMa0HuZa83TEjU6AiFrP42/view?usp=drive_link) | -| LEO |54.6 | 48.9 | 62.7 | 50.8 | 64.7 | 50.4 | 45.9 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LEO) | [model](https://drive.google.com/drive/folders/1HZ38LwRe-1Q_VxlWy8vqvImFjtQ_b9iA?usp=drive_link)| -| LLaVA-3D |**61.6** | 58.5 | 63.5 | 56.8 | 75.6 | 58.0 | 38.5|- | - | - -*Note:* These two tables only show the results for main metrics; see the paper for complete results. - -We have released the codes of some models under [./models](./models/README.md). - - - -## πŸš€ MMScan API Tutorial - - -The **MMScan Toolkit** provides comprehensive tools for dataset handling and model evaluation in tasks. - -### Getting Started +### Installation 1. Clone Github repo. @@ -112,13 +80,13 @@ The **MMScan Toolkit** provides comprehensive tools for dataset handling and mod Use `"all"` to install all components and specify `"VG"` or `"QA"` if you only need to install the components for Visual Grounding or Question Answering, respectively. -3. Download and prepare the dataset. +### Data Preparation - a. Download the Embodiedscan and MMScan annotation. (Fill in the [form](https://docs.google.com/forms/d/e/1FAIpQLScUXEDTksGiqHZp31j7Zp7zlCNV7p_08uViwP_Nbzfn3g6hhw/viewform) to apply for downloading) +1. Download the Embodiedscan and MMScan annotation. (Fill in the [form](https://docs.google.com/forms/d/e/1FAIpQLScUXEDTksGiqHZp31j7Zp7zlCNV7p_08uViwP_Nbzfn3g6hhw/viewform) to apply for downloading) - b. Create a folder `mmscan_data/` and then unzip the files. For the first zip file, put `embodiedscan` under `mmscan_data/embodiedscan_split` and rename it to `embodiedscan-v1`. 
For the second zip file, put `MMScan-beta-release` under `mmscan_data/MMScan-beta-release` and `embodiedscan-v2` under `mmscan_data/embodiedscan_split`. + Create a folder `mmscan_data/` and then unzip the files. For the first zip file, put `embodiedscan` under `mmscan_data/embodiedscan_split` and rename it to `embodiedscan-v1`. For the second zip file, put `MMScan-beta-release` under `mmscan_data/MMScan-beta-release` and `embodiedscan-v2` under `mmscan_data/embodiedscan_split`. - The directory structure should be as below, after then, refer to the [guide](data_preparation/README.md) here. + The directory structure should be as below: ``` mmscan_data @@ -128,6 +96,14 @@ The **MMScan Toolkit** provides comprehensive tools for dataset handling and mod β”œβ”€β”€ MMScan-beta-release # MMScan veta data in 'embodiedscan-v2-beta.zip' ``` +2. Prepare the point clouds files. + + Please refer to the [guide](data_preparation/README.md) here. + +## πŸ‘“ MMScan API Tutorial + + +The **MMScan Toolkit** provides comprehensive tools for dataset handling and model evaluation in tasks. To import the MMScan API, you can use the following commands: @@ -145,7 +121,7 @@ import mmscan.QuestionAnsweringEvaluator as MMScan_QA_evaluator import mmscan.GPTEvaluator as MMScan_GPT_evaluator ``` -### MMScan Dataset Tool +### MMScan Dataset The dataset tool in MMScan allows seamless access to data required for various tasks within MMScan. @@ -200,7 +176,7 @@ Each dataset item is a dictionary containing key elements: - **'extrinsic'** (np.ndarray): Extrinsic parameters of the camera. - **'visible_instance_id'** (list): IDs of visible objects in the image. -### MMScan Evaluator Tool +### MMScan Evaluator Our evaluation tool is designed to streamline the assessment of model outputs for the MMScan task, providing essential metrics to gauge model performance effectively. 
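For the question answering side, a similarly hedged sketch is shown below. The `pred`/`gt` fields mirror the input structure this tutorial describes for the QA and GPT evaluators, but the exact constructor arguments and method names are assumptions to be checked against the devkit.

```python
# Hedged sketch: scoring QA predictions with the Question Answering evaluator.
# Constructor arguments and method names are assumptions, not verified API.
from mmscan import QuestionAnsweringEvaluator

evaluator = QuestionAnsweringEvaluator(show_results=True)

evaluator.update([{
    "index": 0,
    "ID": "QA_sample_0",                    # hypothetical sample ID
    "question": "How many chairs surround the table?",
    "pred": ["four"],                       # model answer(s)
    "gt": ["four", "4"],                    # reference answers
}])

print(evaluator.start_evaluation())  # classical QA metrics over all samples
```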
@@ -331,11 +307,36 @@ The input structure remains the same as for the question answering evaluator: ] ``` +## πŸ† MMScan Benchmark + + + +### MMScan Visual Grounding Benchmark + +| Methods | gTop-1 | gTop-3 | APsample | APbox | AR | Release | Download | +|---------|--------|--------|---------------------|------------------|----|-------|----| +| ScanRefer | 4.74 | 9.19 | 9.49 | 2.28 | 47.68 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/Scanrefer) | [model](https://drive.google.com/file/d/1C0-AJweXEc-cHTe9tLJ3Shgqyd44tXqY/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1ENOS2FE7fkLPWjIf9J76VgiPrn6dGKvi/view?usp=drive_link) | +| MVT | 7.94 | 13.07 | 13.67 | 2.50 | 86.86 | - | - | +| BUTD-DETR | 15.24 | 20.68 | 18.58 | 9.27 | 66.62 | - | - | +| ReGround3D | 16.35 | 26.13 | 22.89 | 5.25 | 43.24 | - | - | +| EmbodiedScan | 19.66 | 34.00 | 29.30 | **15.18** | 59.96 | [code](https://github.com/OpenRobotLab/EmbodiedScan/tree/mmscan/models/EmbodiedScan) | [model](https://drive.google.com/file/d/1F6cHY6-JVzAk6xg5s61aTT-vD-eu_4DD/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1Ua_-Z2G3g0CthbeBkrR1a7_sqg_Spd9s/view?usp=drive_link) | +| 3D-VisTA | 25.38 | 35.41 | 33.47 | 6.67 | 87.52 | - | - | +| ViL3DRef | **26.34** | **37.58** | **35.09** | 6.65 | 86.86 | - | - | + +### MMScan Question Answering Benchmark +| Methods | Overall | ST-attr | ST-space | OO-attr | OO-space | OR| Advanced | Release | Download | +|---|--------|--------|--------|--------|--------|--------|-------|----|----| +| LL3DA | 45.7 | 39.1 | 58.5 | 43.6 | 55.9 | 37.1 | 24.0| [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LL3DA) | [model](https://drive.google.com/file/d/1mcWNHdfrhdbtySBtmG-QRH1Y1y5U3PDQ/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1VHpcnO0QmAvMa0HuZa83TEjU6AiFrP42/view?usp=drive_link) | +| LEO |54.6 | 48.9 | 62.7 | 50.8 | 64.7 | 50.4 | 45.9 | [code](https://github.com/rbler1234/EmbodiedScan/tree/mmscan-devkit/models/LEO) | [model](https://drive.google.com/drive/folders/1HZ38LwRe-1Q_VxlWy8vqvImFjtQ_b9iA?usp=drive_link)| +| LLaVA-3D |**61.6** | 58.5 | 63.5 | 56.8 | 75.6 | 58.0 | 38.5|- | - | +*Note:* These two tables only show the results for main metrics; see the paper for complete results. + +We have released the codes of some models under [./models](./models/README.md). ## πŸ“ TODO List - + - \[ \] MMScan annotation and samples for ARKitScenes. - \[ \] Online evaluation platform for the MMScan benchmark. From 0ed18db9d9e794f608bd477d98075afd1c845180 Mon Sep 17 00:00:00 2001 From: rbler1234 Date: Mon, 24 Feb 2025 10:52:32 +0800 Subject: [PATCH 4/4] update readme --- README.md | 26 ++++++++++++++------------ models/README.md | 36 ++++++++++++++++++------------------ 2 files changed, 32 insertions(+), 30 deletions(-) diff --git a/README.md b/README.md index ec22624..9870dfd 100644 --- a/README.md +++ b/README.md @@ -93,7 +93,7 @@ existing benchmarks and in-the-wild evaluation. β”œβ”€β”€ embodiedscan_split β”‚ β”œβ”€β”€embodiedscan-v1/ # EmbodiedScan v1 data in 'embodiedscan.zip' β”‚ β”œβ”€β”€embodiedscan-v2/ # EmbodiedScan v2 data in 'embodiedscan-v2-beta.zip' - β”œβ”€β”€ MMScan-beta-release # MMScan veta data in 'embodiedscan-v2-beta.zip' + β”œβ”€β”€ MMScan-beta-release # MMScan data in 'embodiedscan-v2-beta.zip' ``` 2. Prepare the point clouds files. 
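Once the data is in place, a hedged sketch of loading the dataset and inspecting one sample is given below. The constructor arguments (`version`, `split`, `task`) are illustrative guesses, while the item keys follow the fields documented in the next section.

```python
# Hedged sketch: iterating MMScan after data preparation.
# Constructor argument names/values are assumptions; item keys follow the docs.
from mmscan import MMScan

dataset = MMScan(version="v1", split="train", task="MMScan-VG")

item = dataset[0]
print(item["scan_id"])     # which scan this sample comes from
print(item["pcds"].shape)  # (n_points, 6): xyz coordinates + rgb color
print(item["text"])        # grounding text for a visual grounding sample
print(item["target_id"])   # IDs of the objects the text refers to
```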
@@ -145,17 +145,21 @@ Each dataset item is a dictionary containing key elements:
- **"pcds"** (np.ndarray): Point cloud data with dimensions [n_points, 6(xyz+rgb)], representing the coordinates and color of each point.
- **"instance_labels"** (np.ndarray): Instance ID assigned to each point in the point cloud.
- **"class_labels"** (np.ndarray): Class IDs assigned to each point in the point cloud.
-- **"bboxes"** (dict): Information about bounding boxes within the scan.
+- **"bboxes"** (dict): Information about bounding boxes within the scan, structured as { object ID:
+  {
+  "type": object type (str),
+  "bbox": 9-DoF box (np.ndarray)
+  }}

(2) Language Modality

-- **"sub_class"**: The sample category of the sample.
-- **"ID"**: A unique identifier for the sample.
-- **"scan_id"**:Identifier corresponding to the related scan.
+- **"sub_class"**: The category of the sample.
+- **"ID"**: The sample's ID.
+- **"scan_id"**: The scan's ID.

- *For Visual Grounding task*
-- **"target_id"** (list\[int\]): IDs of target objects.
+- **"target_id"** (list\[int\]): IDs of target objects.
- **"text"** (str): Text used for grounding.
-- **"target"** (list\[str\]): Types of target objects.
+- **"target"** (list\[str\]): Text prompt to specify the target grounding object.
- **"anchors"** (list\[str\]): Types of anchor objects.
- **"anchor_ids"** (list\[int\]): IDs of anchor objects.
- **"tokens_positive"** (dict): Indices of positions where mentioned objects appear in the text.

@@ -165,14 +169,14 @@ Each dataset item is a dictionary containing key elements:
- **"object_ids"** (list\[int\]): Object IDs referenced in the question.
- **"object_names"** (list\[str\]): Types of referenced objects.
- **"input_bboxes_id"** (list\[int\]): IDs of input bounding boxes.
-- **"input_bboxes"** (list\[np.ndarray\]): Input bounding box data, with 9 degrees of freedom.
+- **"input_bboxes"** (list\[np.ndarray\]): Input 9-DoF bounding boxes.

(3) 2D Modality

- **'img_path'** (str): File path to the RGB image.
- **'depth_img_path'** (str): File path to the depth image.
- **'intrinsic'** (np.ndarray): Intrinsic parameters of the camera for RGB images.
-- **'depth_intrinsic'** (np.ndarray): Intrinsic parameters of the camera for Depth images.
+- **'depth_intrinsic'** (np.ndarray): Intrinsic parameters of the camera for depth images.
- **'extrinsic'** (np.ndarray): Extrinsic parameters of the camera.
- **'visible_instance_id'** (list): IDs of visible objects in the image.

@@ -186,7 +190,7 @@ For the visual grounding task, our evaluator computes multiple metrics including

- **AP and AR**: These metrics calculate the precision and recall by considering each sample as an individual category.
- **AP_C and AR_C**: These versions categorize samples belonging to the same subclass and calculate them together.
-- **gTop-k**: An expanded metric that generalizes the traditional Top-k metric, offering insights into broader performance aspects.
+- **gTop-k**: An expanded metric that generalizes the traditional Top-k metric, offering greater flexibility and interpretability when a sample contains multiple target objects.

*Note:* Here, AP corresponds to APsample in the paper, and AP_C corresponds to APbox in the paper.
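Since AP, AR, and gTop-k all rest on IoU between 9-DoF boxes, the sketch below shows one plausible way to expand a 9-DoF box (center, size, Euler angles) into its eight corners as a first step toward such an overlap test. The `"zxy"` Euler order is an assumption about the box convention; verify it against the devkit before relying on it.

```python
# Hedged sketch: corners of a 9-DoF box = [cx, cy, cz, dx, dy, dz, a, b, c].
# The Euler-angle order "zxy" is assumed, not verified.
import numpy as np
from scipy.spatial.transform import Rotation

def box_9dof_to_corners(box: np.ndarray) -> np.ndarray:
    center, size, euler = box[:3], box[3:6], box[6:9]
    # eight sign combinations scaled by the half-extents of the box
    signs = np.array([[x, y, z] for x in (-1, 1)
                      for y in (-1, 1) for z in (-1, 1)], dtype=float)
    corners = signs * (size / 2.0)
    rot = Rotation.from_euler("zxy", euler).as_matrix()
    return corners @ rot.T + center  # rotate, then translate to the center

corners = box_9dof_to_corners(np.array([0., 0., 1., 2., 1., 1., 0.3, 0., 0.]))
print(corners.shape)  # (8, 3)
```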
@@ -310,7 +314,6 @@ The input structure remains the same as for the question answering evaluator: ## πŸ† MMScan Benchmark - ### MMScan Visual Grounding Benchmark | Methods | gTop-1 | gTop-3 | APsample | APbox | AR | Release | Download | @@ -337,7 +340,6 @@ We have released the codes of some models under [./models](./models/README.md). ## πŸ“ TODO List - - \[ \] MMScan annotation and samples for ARKitScenes. - \[ \] Online evaluation platform for the MMScan benchmark. - \[ \] Codes of more MMScan Visual Grounding baselines and Question Answering baselines. diff --git a/models/README.md b/models/README.md index f5dd0a1..db3bc96 100644 --- a/models/README.md +++ b/models/README.md @@ -2,56 +2,56 @@ These are 3D visual grounding models adapted for the mmscan-devkit. Currently, two models have been released: EmbodiedScan and ScanRefer. -### Scanrefer +### ScanRefer -1. Follow the [Scanrefer](https://github.com/daveredrum/ScanRefer/blob/master/README.md) to setup the Env. For data preparation, you need not load the datasets, only need to download the [preprocessed GLoVE embeddings](https://kaldir.vc.in.tum.de/glove.p) (~990MB) and put them under `data/` +1. Follow the [ScanRefer](https://github.com/daveredrum/ScanRefer/blob/master/README.md) to setup the environment. For data preparation, you need not load the datasets, only need to download the [preprocessed GLoVE embeddings](https://kaldir.vc.in.tum.de/glove.p) (~990MB) and put them under `data/` 2. Install MMScan API. 3. Overwrite the `lib/config.py/CONF.PATH.OUTPUT` to your desired output directory. -4. Run the following command to train Scanrefer (one GPU): +4. Run the following command to train ScanRefer (one GPU): ```bash python -u scripts/train.py --use_color --epoch {10/25/50} ``` -5. Run the following command to evaluate Scanrefer (one GPU): +5. Run the following command to evaluate ScanRefer (one GPU): ```bash python -u scripts/train.py --use_color --eval_only --use_checkpoint "path/to/pth" ``` -#### ckpts & Logs +#### Results and Models | Epoch | gTop-1 @ 0.25|gTop-1 @0.50 | Config | Download | | :-------: | :---------:| :---------: | :--------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | | 50 | 4.74 | 2.52 | [config](https://drive.google.com/file/d/1iJtsjt4K8qhNikY8UmIfiQy1CzIaSgyU/view?usp=drive_link) | [model](https://drive.google.com/file/d/1C0-AJweXEc-cHTe9tLJ3Shgqyd44tXqY/view?usp=drive_link) \| [log](https://drive.google.com/file/d/1ENOS2FE7fkLPWjIf9J76VgiPrn6dGKvi/view?usp=drive_link) ### EmbodiedScan -1. Follow the [EmbodiedScan](https://github.com/OpenRobotLab/EmbodiedScan/blob/main/README.md) to setup the Env. Download the [Multi-View 3D Detection model's weights](https://download.openmmlab.com/mim-example/embodiedscan/mv-3ddet.pth) and change the "load_from" path in the config file under `configs/grounding` to the path where the weights are saved. +1. Follow the [EmbodiedScan](https://github.com/OpenRobotLab/EmbodiedScan/blob/main/README.md) to setup the environment. 
Download the [Multi-View 3D Detection model's weights](https://download.openmmlab.com/mim-example/embodiedscan/mv-3ddet.pth) and change the "load_from" path in the config file under `configs/grounding` to the path where the weights are saved. 2. Install MMScan API. -3. Run the following command to train EmbodiedScan (multiple GPU): +3. Run the following command to train EmbodiedScan (multiple GPUs): ```bash # Single GPU training python tools/train.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py --work-dir=path/to/save - # Multiple GPU training + # Multiple GPUs training python tools/train.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py --work-dir=path/to/save --launcher="pytorch" ``` -4. Run the following command to evaluate EmbodiedScan (multiple GPU): +4. Run the following command to evaluate EmbodiedScan (multiple GPUs): ```bash # Single GPU testing python tools/test.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py path/to/load_pth - # Multiple GPU testing + # Multiple GPUs testing python tools/test.py configs/grounding/pcd_4xb24_mmscan_vg_num256.py path/to/load_pth --launcher="pytorch" ``` -#### ckpts & Logs +#### Results and Models | Input Modality | Det Pretrain | Epoch | gTop-1 @ 0.25 | gTop-1 @ 0.50 | Config | Download | | :-------: | :----: | :----:| :----: | :---------: | :--------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | @@ -63,7 +63,7 @@ These are 3D question answering models adapted for the mmscan-devkit. Currently, ### LL3DA -1. Follow the [LL3DA](https://github.com/Open3DA/LL3DA/blob/main/README.md) to setup the Env. For data preparation, you need not load the datasets, only need to: +1. Follow the [LL3DA](https://github.com/Open3DA/LL3DA/blob/main/README.md) to setup the environment. For data preparation, you need not load the datasets, only need to: (1) download the [release pre-trained weights.](https://huggingface.co/CH3COOK/LL3DA-weight-release/blob/main/ll3da-opt-1.3b.pth) and put them under `./pretrained` @@ -73,13 +73,13 @@ These are 3D question answering models adapted for the mmscan-devkit. Currently, 3. Edit the config under `./scripts/opt-1.3b/eval.mmscanqa.sh` and `./scripts/opt-1.3b/tuning.mmscanqa.sh` -4. Run the following command to train LL3DA (4 GPU): +4. Run the following command to train LL3DA (4 GPUs): ```bash bash scripts/opt-1.3b/tuning.mmscanqa.sh ``` -5. Run the following command to evaluate LL3DA (4 GPU): +5. Run the following command to evaluate LL3DA (4 GPUs): ```bash bash scripts/opt-1.3b/eval.mmscanqa.sh @@ -93,7 +93,7 @@ These are 3D question answering models adapted for the mmscan-devkit. 
Currently, --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 --nproc 4 ``` -#### ckpts & Logs +#### Results and Models | Detector | Captioner | Iters | Overall GPT Score | Download | | :-------: | :----: | :----: | :---------: |:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | @@ -103,7 +103,7 @@ These are 3D question answering models adapted for the mmscan-devkit. Currently, ### LEO -1. Follow the [LEO](https://github.com/embodied-generalist/embodied-generalist/blob/main/README.md) to setup the Env. For data preparation, you need not load the datasets, only need to: +1. Follow the [LEO](https://github.com/embodied-generalist/embodied-generalist/blob/main/README.md) to setup the environment. For data preparation, you need not load the datasets, only need to: (1) Download [Vicuna-7B](https://huggingface.co/huangjy-pku/vicuna-7b/tree/main) and update cfg_path in configs/llm/\*.yaml @@ -113,13 +113,13 @@ These are 3D question answering models adapted for the mmscan-devkit. Currently, 3. Edit the config under `scripts/train_tuning_mmscan.sh` and `scripts/test_tuning_mmscan.sh` -4. Run the following command to train LEO (4 GPU): +4. Run the following command to train LEO (4 GPUs): ```bash bash scripts/train_tuning_mmscan.sh ``` -5. Run the following command to evaluate LEO (4 GPU): +5. Run the following command to evaluate LEO (4 GPUs): ```bash bash scripts/test_tuning_mmscan.sh
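Both LL3DA and LEO are scored with the GPT evaluator through `eval_utils/evaluate_gpt.py`, as shown in the commands above. For reference, a hedged sketch of what the underlying Python call might look like is given below; the constructor argument and the `load_and_eval` entry point are assumptions about the devkit's `GPTEvaluator` and should be checked against its source.

```python
# Hedged sketch: GPT-based scoring of saved QA predictions.
# Constructor and method names below are assumptions, not verified API.
from mmscan import GPTEvaluator

evaluator = GPTEvaluator(API_key="your_api_key")  # argument name assumed

qa_results = [{
    "ID": "QA_sample_0",               # hypothetical sample ID
    "question": "What lies between the sofa and the window?",
    "pred": ["a small wooden table"],  # model answer
    "gt": ["a coffee table"],          # reference answers
}]

# Mirrors the CLI above: evaluate every sample (eval_size=-1 there) and fan
# the requests out over several workers (--nproc 4).
print(evaluator.load_and_eval(qa_results, num_threads=4))
```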