Adding pose estimation models, adding new models to previous tasks
thomasehuang committed Nov 29, 2021
1 parent 10311ad commit f6ef24d
Showing 107 changed files with 2,575 additions and 121 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -6,7 +6,7 @@

In this repository, we provide popular models for each task in the [BDD100K dataset](https://www.bdd100k.com/). For each task in the dataset, we make publicly available the model weights, evaluation results, predictions, and visualizations, as well as scripts for evaluation and visualization. The goal is to provide a set of competitive baselines that facilitate research and serve as a common benchmark for comparison.

The number of pre-trained models in this zoo is :one::one::five:. **You can include your models in this repo as well!** See [contribution](./doc/CONTRIBUTING.md) instructions.
The number of pre-trained models in this zoo is :one::seven::nine:. **You can include your models in this repo as well!** See [contribution](./doc/CONTRIBUTING.md) instructions.

This repository currently supports the tasks listed below. For more information about each task, click on the task name. We plan to support all tasks in the BDD100K dataset eventually; see the [roadmap](#roadmap) for our plan and progress.

@@ -17,14 +17,15 @@ This repository currently supports the tasks listed below. For more information
- [**Drivable Area**](./drivable)
- [**Multiple Object Tracking (MOT)**](./mot)
- [**Multiple Object Tracking and Segmentation (MOTS)**](./mots)
- [**Pose Estimation**](./pose)

If you have any questions, please go to the BDD100K [discussions](https://github.com/bdd100k/bdd100k/discussions).

## Roadmap

- [x] Pose estimation
- [ ] Lane marking
- [ ] Panoptic segmentation
- [ ] Pose estimation

## Dataset

107 changes: 77 additions & 30 deletions det/README.md

Large diffs are not rendered by default.

15 changes: 15 additions & 0 deletions det/configs/det/faster_rcnn_hrnetv2p_w18_1x_det_bdd100k.py
@@ -0,0 +1,15 @@
"""HRNet18, 1x schedule."""

_base_ = "./faster_rcnn_hrnetv2p_w32_1x_det_bdd100k.py"
model = dict(
pretrained="open-mmlab://msra/hrnetv2_w18",
backbone=dict(
extra=dict(
stage2=dict(num_channels=(18, 36)),
stage3=dict(num_channels=(18, 36, 72)),
stage4=dict(num_channels=(18, 36, 72, 144)),
),
),
neck=dict(type="HRFPN", in_channels=[18, 36, 72, 144], out_channels=256),
)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/faster_rcnn_hrnetv2p_w18_1x_det_bdd100k.pth"
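The w18 variants above override only the channel widths; the training schedule, dataset settings, and detection heads are inherited from the w32 config through the `_base_` chain. A minimal sketch of how mmcv resolves that inheritance (assuming mmcv is installed and the path is given relative to the repo root):

```python
from mmcv import Config

# mmcv follows the _base_ chain (w18 -> w32 -> r50_fpn) and merges overrides.
cfg = Config.fromfile("det/configs/det/faster_rcnn_hrnetv2p_w18_1x_det_bdd100k.py")
print(cfg.model.backbone.extra.stage2.num_channels)  # (18, 36), from this file
print(cfg.model.neck.out_channels)                   # 256, from the neck override
```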
15 changes: 15 additions & 0 deletions det/configs/det/faster_rcnn_hrnetv2p_w18_3x_det_bdd100k.py
@@ -0,0 +1,15 @@
"""HRNet18, 3x schedule, MS training."""

_base_ = "./faster_rcnn_hrnetv2p_w32_3x_det_bdd100k.py"
model = dict(
pretrained="open-mmlab://msra/hrnetv2_w18",
backbone=dict(
extra=dict(
stage2=dict(num_channels=(18, 36)),
stage3=dict(num_channels=(18, 36, 72)),
stage4=dict(num_channels=(18, 36, 72, 144)),
),
),
neck=dict(type="HRFPN", in_channels=[18, 36, 72, 144], out_channels=256),
)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/faster_rcnn_hrnetv2p_w18_3x_det_bdd100k.pth"
49 changes: 49 additions & 0 deletions det/configs/det/faster_rcnn_hrnetv2p_w32_1x_det_bdd100k.py
@@ -0,0 +1,49 @@
"""HRNet32, 1x schedule."""

_base_ = "./faster_rcnn_r50_fpn_1x_det_bdd100k.py"
model = dict(
pretrained="open-mmlab://msra/hrnetv2_w32",
backbone=dict(
_delete_=True,
type="HRNet",
extra=dict(
stage1=dict(
num_modules=1,
num_branches=1,
block="BOTTLENECK",
num_blocks=(4,),
num_channels=(64,),
),
stage2=dict(
num_modules=1,
num_branches=2,
block="BASIC",
num_blocks=(4, 4),
num_channels=(32, 64),
),
stage3=dict(
num_modules=4,
num_branches=3,
block="BASIC",
num_blocks=(4, 4, 4),
num_channels=(32, 64, 128),
),
stage4=dict(
num_modules=3,
num_branches=4,
block="BASIC",
num_blocks=(4, 4, 4, 4),
num_channels=(32, 64, 128, 256),
),
),
),
neck=dict(
_delete_=True,
type="HRFPN",
in_channels=[32, 64, 128, 256],
out_channels=256,
),
)
data = dict(samples_per_gpu=2, workers_per_gpu=2)
optimizer = dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/faster_rcnn_hrnetv2p_w32_1x_det_bdd100k.pth"
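The `_delete_=True` markers matter here: in mmcv's config merging they replace the corresponding base dict outright instead of merging into it, so none of the ResNet backbone keys from the base config leak into the HRNet definition. A simplified sketch of the semantics (mmcv's actual merge handles more cases):

```python
def merge(base: dict, override: dict) -> dict:
    """Simplified sketch of mmcv-style config merging with _delete_."""
    override = dict(override)          # avoid mutating the caller's dict
    if override.pop("_delete_", False):
        return override                # drop all base keys entirely
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)  # recurse into nested dicts
        else:
            merged[key] = value
    return merged

# Because the backbone dict above carries _delete_=True, base keys such as
# the ResNet "depth" do not survive the merge with the HRNet settings.
```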
49 changes: 49 additions & 0 deletions det/configs/det/faster_rcnn_hrnetv2p_w32_3x_det_bdd100k.py
@@ -0,0 +1,49 @@
"""HRNet32, 3x schedule, MS training."""

_base_ = "./faster_rcnn_r50_fpn_3x_det_bdd100k.py"
model = dict(
pretrained="open-mmlab://msra/hrnetv2_w32",
backbone=dict(
_delete_=True,
type="HRNet",
extra=dict(
stage1=dict(
num_modules=1,
num_branches=1,
block="BOTTLENECK",
num_blocks=(4,),
num_channels=(64,),
),
stage2=dict(
num_modules=1,
num_branches=2,
block="BASIC",
num_blocks=(4, 4),
num_channels=(32, 64),
),
stage3=dict(
num_modules=4,
num_branches=3,
block="BASIC",
num_blocks=(4, 4, 4),
num_channels=(32, 64, 128),
),
stage4=dict(
num_modules=3,
num_branches=4,
block="BASIC",
num_blocks=(4, 4, 4, 4),
num_channels=(32, 64, 128, 256),
),
),
),
neck=dict(
_delete_=True,
type="HRFPN",
in_channels=[32, 64, 128, 256],
out_channels=256,
),
)
data = dict(samples_per_gpu=2, workers_per_gpu=2)
optimizer = dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/faster_rcnn_hrnetv2p_w32_3x_det_bdd100k.pth"
10 changes: 10 additions & 0 deletions det/configs/det/libra_faster_rcnn_r101_fpn_3x_det_bdd100k.py
@@ -0,0 +1,10 @@
"""Libra R-CNN with ResNet101-FPN, 3x schedule, MS training."""

_base_ = "./libra_faster_rcnn_r50_fpn_3x_det_bdd100k.py"
model = dict(
backbone=dict(
depth=101,
init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"),
)
)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/libra_faster_rcnn_r101_fpn_3x_det_bdd100k.pth"
53 changes: 53 additions & 0 deletions det/configs/det/libra_faster_rcnn_r50_fpn_1x_det_bdd100k.py
@@ -0,0 +1,53 @@
"""Libra R-CNN with ResNet50-FPN, 1x schedule."""

_base_ = "./faster_rcnn_r50_fpn_1x_det_bdd100k.py"
model = dict(
neck=[
dict(
type="FPN",
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5,
),
dict(
type="BFP",
in_channels=256,
num_levels=5,
refine_level=2,
refine_type="non_local",
),
],
roi_head=dict(
bbox_head=dict(
loss_bbox=dict(
_delete_=True,
type="BalancedL1Loss",
alpha=0.5,
gamma=1.5,
beta=1.0,
loss_weight=1.0,
)
)
),
# model training and testing settings
train_cfg=dict(
rpn=dict(sampler=dict(neg_pos_ub=5), allowed_border=-1),
rcnn=dict(
sampler=dict(
_delete_=True,
type="CombinedSampler",
num=512,
pos_fraction=0.25,
add_gt_as_proposals=True,
pos_sampler=dict(type="InstanceBalancedPosSampler"),
neg_sampler=dict(
type="IoUBalancedNegSampler",
floor_thr=-1,
floor_fraction=0,
num_bins=3,
),
)
),
),
)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/libra_faster_rcnn_r50_fpn_1x_det_bdd100k.pth"
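Note the list-valued `neck`: mmdetection builds each entry and chains them in order, so the BFP consumes the FPN's pyramid outputs and refines them before the detection heads see the features. A simplified sketch of that composition (not mmdetection's actual builder code):

```python
import torch.nn as nn

class ChainedNeck(nn.Module):
    """Sketch: a list-valued neck behaves like a sequential chain of modules."""

    def __init__(self, necks):
        super().__init__()
        self.necks = nn.ModuleList(necks)

    def forward(self, feats):
        for neck in self.necks:   # FPN first, then BFP refinement
            feats = neck(feats)
        return feats
```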
53 changes: 53 additions & 0 deletions det/configs/det/libra_faster_rcnn_r50_fpn_3x_det_bdd100k.py
@@ -0,0 +1,53 @@
"""Libra R-CNN with ResNet50-FPN, 3x schedule, MS training."""

_base_ = "./faster_rcnn_r50_fpn_3x_det_bdd100k.py"
model = dict(
neck=[
dict(
type="FPN",
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5,
),
dict(
type="BFP",
in_channels=256,
num_levels=5,
refine_level=2,
refine_type="non_local",
),
],
roi_head=dict(
bbox_head=dict(
loss_bbox=dict(
_delete_=True,
type="BalancedL1Loss",
alpha=0.5,
gamma=1.5,
beta=1.0,
loss_weight=1.0,
)
)
),
# model training and testing settings
train_cfg=dict(
rpn=dict(sampler=dict(neg_pos_ub=5), allowed_border=-1),
rcnn=dict(
sampler=dict(
_delete_=True,
type="CombinedSampler",
num=512,
pos_fraction=0.25,
add_gt_as_proposals=True,
pos_sampler=dict(type="InstanceBalancedPosSampler"),
neg_sampler=dict(
type="IoUBalancedNegSampler",
floor_thr=-1,
floor_fraction=0,
num_bins=3,
),
)
),
),
)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/libra_faster_rcnn_r50_fpn_3x_det_bdd100k.pth"
4 changes: 1 addition & 3 deletions det/test.py
@@ -107,9 +107,7 @@ def main() -> None:

cfg = Config.fromfile(args.config)
if cfg.load_from is None:
cfg_name = os.path.split(args.config)[-1].replace(
"_bdd100k.py", ".pth"
)
cfg_name = os.path.split(args.config)[-1].replace(".py", ".pth")
cfg.load_from = MODEL_SERVER + cfg_name
if args.cfg_options is not None:
cfg.merge_from_dict(args.cfg_options)
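This change widens the filename mapping: previously only configs ending in `_bdd100k.py` produced a usable name, and the replacement also stripped the `_bdd100k` suffix that the hosted weights actually carry. A quick sketch of the new derivation (the `MODEL_SERVER` value is an assumption inferred from the `load_from` URLs above, not quoted from test.py):

```python
import os

MODEL_SERVER = "https://dl.cv.ethz.ch/bdd100k/det/models/"  # assumed base URL

config_path = "configs/det/faster_rcnn_hrnetv2p_w18_1x_det_bdd100k.py"
# Old: .replace("_bdd100k.py", ".pth") -> "..._1x_det.pth" (suffix dropped).
# New: .replace(".py", ".pth") keeps the full stem, matching the hosted files.
cfg_name = os.path.split(config_path)[-1].replace(".py", ".pth")
print(MODEL_SERVER + cfg_name)
# https://dl.cv.ethz.ch/bdd100k/det/models/faster_rcnn_hrnetv2p_w18_1x_det_bdd100k.pth
```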
56 changes: 56 additions & 0 deletions doc/CONTRIBUTING.md
@@ -59,6 +59,7 @@ Each task in BDD100K has its own template and guidelines. Click the links below
- [**Drivable Area**](#semantic-segmentation-and-drivable-area)
- [**Multiple Object Tracking (MOT)**](#mot)
- [**Multiple Object Tracking and Segmentation (MOTS)**](#mots)
- [**Pose Estimation**](#pose-estimation)

## Tagging

@@ -379,3 +380,58 @@ Multiple object tracking and segmentation requires detecting, tracking, and segm
| ResNet-50 | 28.1 | 45.4 | 874 | [scores](https://dl.cv.ethz.ch/bdd100k/mots/scores-val/pcan-frcnn_r50_fpn_12e_mots_bdd100k.json) | 31.9 | 50.4 | 845 | [scores](https://dl.cv.ethz.ch/bdd100k/mots/scores-test/pcan-frcnn_r50_fpn_12e_mots_bdd100k.json) | [config](https://github.com/SysCV/pcan/blob/main/configs/segtrack-frcnn_r50_fpn_12e_bdd10k.py) | [model](https://dl.cv.ethz.ch/bdd100k/mots/models/pcan-frcnn_r50_fpn_12e_mots_bdd100k.pth) \| [MD5](https://dl.cv.ethz.ch/bdd100k/mots/models/pcan-frcnn_r50_fpn_12e_mots_bdd100k.md5) | [preds](https://dl.cv.ethz.ch/bdd100k/mots/preds/pcan-frcnn_r50_fpn_12e_mots_bdd100k.zip) | [visuals](https://dl.cv.ethz.ch/bdd100k/mots/visuals/pcan-frcnn_r50_fpn_12e_mots_bdd100k.zip) |

[[Code](https://github.com/SysCV/pcan)] [[Usage Instructions](https://github.com/SysCV/pcan/blob/main/docs/GET_STARTED.md)]

## Pose Estimation

Template and guidelines below:

### Method Name

[Paper name]() [Venue and Year]

Authors: Author list

<details>
<summary>Abstract</summary>
Put your abstract here.
</details>

#### Results

| Backbone | Input Size | Pose AP-val | Scores-val | Pose AP-test | Scores-test | Config | Weights | Preds | Visuals |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| | | | [scores]() | | [scores]() | [config]() | [model]() \| [MD5]() | [preds]() | [visuals]() |

[[Code]()] [[Usage Instructions]()]

Other information.

### Guidelines

- The scores file should be a JSON file with evaluation results for all the BDD100K pose estimation [metrics](https://doc.bdd100k.com/evaluate.html#pose-estimation).
- The predictions should be a JSON file containing model predictions for the entire validation set.
- The visuals should be a zip file with pose visualizations on the validation set.
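As a rough sanity check of a submission's layout, the three artifacts can be inspected like this (the file names below are hypothetical placeholders, and the actual JSON schema is defined by the BDD100K evaluation tools, not shown here):

```python
import json
import zipfile

# Hypothetical local file names; use whatever your results table links to.
with open("hrnet_pose_scores_val.json") as f:
    scores = json.load(f)   # dict of pose-estimation metric values

with open("hrnet_pose_preds_val.json") as f:
    preds = json.load(f)    # predictions for the full validation set

with zipfile.ZipFile("hrnet_pose_visuals_val.zip") as zf:
    assert zf.namelist(), "visuals archive should contain rendered frames"
```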

Example below:

### HRNet

[Deep High-Resolution Representation Learning for Visual Recognition](https://arxiv.org/abs/1908.07919) [CVPR 2019 / TPAMI 2020]

Authors: [Jingdong Wang](https://jingdongwang2017.github.io/), [Ke Sun](https://github.com/sunke123), [Tianheng Cheng](https://scholar.google.com/citations?user=PH8rJHYAAAAJ), Borui Jiang, Chaorui Deng, [Yang Zhao](https://yangyangkiki.github.io/), Dong Liu, [Yadong Mu](http://www.muyadong.com/), Mingkui Tan, [Xinggang Wang](https://xinggangw.info/), [Wenyu Liu](http://eic.hust.edu.cn/professor/liuwenyu/), [Bin Xiao](https://www.microsoft.com/en-us/research/people/bixi/)

<details>
<summary>Abstract</summary>
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named as High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) Connect the high-to-low resolution convolution streams in parallel; (ii) Repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. All the codes are available at [this https URL](https://github.com/HRNet).
</details>

#### Results

| Backbone | Input Size | Pose AP-val | Scores-val | Pose AP-test | Scores-test | Config | Weights | Preds | Visuals |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| HRNet-w32 | 256 * 192 | 48.83 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-val/hrnet_w32_256x192_pose_bdd100k.json) | 46.13 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-test/hrnet_w32_256x192_pose_bdd100k.json) | [config](./configs/hrnet_w32_256x192_pose_bdd100k.py) | [model](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w32_256x192_pose_bdd100k.pth) \| [MD5](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w32_256x192_pose_bdd100k.md5) | [preds](https://dl.cv.ethz.ch/bdd100k/pose/preds/hrnet_w32_256x192_pose_bdd100k.json) | [visuals](https://dl.cv.ethz.ch/bdd100k/pose/visuals/hrnet_w32_256x192_pose_bdd100k.zip) |
| HRNet-w48 | 256 * 192 | 50.32 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-val/hrnet_w48_256x192_pose_bdd100k.json) | 47.36 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-test/hrnet_w48_256x192_pose_bdd100k.json) | [config](./configs/hrnet_w48_256x192_pose_bdd100k.py) | [model](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w48_256x192_pose_bdd100k.pth) \| [MD5](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w48_256x192_pose_bdd100k.md5) | [preds](https://dl.cv.ethz.ch/bdd100k/pose/preds/hrnet_w48_256x192_pose_bdd100k.json) | [visuals](https://dl.cv.ethz.ch/bdd100k/pose/visuals/hrnet_w48_256x192_pose_bdd100k.zip) |
| HRNet-w32 | 320 * 256 | 49.86 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-val/hrnet_w32_320x256_pose_bdd100k.json) | 46.90 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-test/hrnet_w32_320x256_pose_bdd100k.json) | [config](./configs/hrnet_w32_320x256_pose_bdd100k.py) | [model](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w32_320x256_pose_bdd100k.pth) \| [MD5](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w32_320x256_pose_bdd100k.md5) | [preds](https://dl.cv.ethz.ch/bdd100k/pose/preds/hrnet_w32_320x256_pose_bdd100k.json) | [visuals](https://dl.cv.ethz.ch/bdd100k/pose/visuals/hrnet_w32_320x256_pose_bdd100k.zip) |
| HRNet-w48 | 320 * 256 | 50.16 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-val/hrnet_w48_320x256_pose_bdd100k.json) | 47.32 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-test/hrnet_w48_320x256_pose_bdd100k.json) | [config](./configs/hrnet_w48_320x256_pose_bdd100k.py) | [model](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w48_320x256_pose_bdd100k.pth) \| [MD5](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w48_320x256_pose_bdd100k.md5) | [preds](https://dl.cv.ethz.ch/bdd100k/pose/preds/hrnet_w48_320x256_pose_bdd100k.json) | [visuals](https://dl.cv.ethz.ch/bdd100k/pose/visuals/hrnet_w48_320x256_pose_bdd100k.zip) |

[[Code](https://github.com/HRNet)] [[Usage Instructions](https://github.com/SysCV/bdd100k-models/tree/main/pose#usage)]
16 changes: 16 additions & 0 deletions doc/PREPARE_DATASET.md
@@ -12,6 +12,9 @@ On the official download page, the required data and annotations for each task a
- `object detection` set:
- images: `100K Images`
- annotations: `Detection 2020 Labels`
- `pose estimation` set:
- images: `100K Images`
- annotations: `Pose Estimation Labels`
- `instance segmentation` set:
- images: `10K Images`
- annotations: `Instance Segmentation`
@@ -44,6 +47,14 @@ python -m bdd100k.label.to_coco -m det \
-o bdd100k/jsons/det_${SET_NAME}_cocofmt.json
```

To convert the pose estimation set, you can run:
```bash
mkdir bdd100k/jsons
python -m bdd100k.label.to_coco -m pose \
-i bdd100k/labels/pose_21/pose_${SET_NAME}.json \
-o bdd100k/jsons/pose_${SET_NAME}_cocofmt.json
```

To convert the instance segmentation set, you can run:
```bash
mkdir bdd100k/jsons
@@ -103,6 +114,9 @@ bdd100k-models
│ ├── det_20
| | ├── det_train.json
| | └── det_val.json
│ ├── pose_21
| | ├── pose_train.json
| | └── pose_val.json
│ ├── ins_seg
| | ├── bitmasks
| | | ├── train
@@ -137,6 +151,8 @@ bdd100k-models
└── jsons
├── det_train_cocofmt.json
├── det_val_cocofmt.json
├── pose_train_cocofmt.json
├── pose_val_cocofmt.json
├── ins_seg_train_cocofmt.json
├── ins_seg_val_cocofmt.json
├── box_track_train_cocofmt.json
Binary file added doc/images/pose.gif
Binary file added doc/images/pose1.png
