Adding pose estimation models, adding new models to previous tasks
thomasehuang committed Nov 29, 2021
1 parent 10311ad commit f6ef24d
Showing 107 changed files with 2,575 additions and 121 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -6,7 +6,7 @@

In this repository, we provide popular models for each task in the [BDD100K dataset](https://www.bdd100k.com/). For each task in the dataset, we make publicly available the model weights, evaluation results, predictions, and visualizations, as well as scripts for evaluation and visualization. The goal is to provide a set of competitive baselines that facilitate research and serve as a common benchmark for comparison.

The number of pre-trained models in this zoo is :one::one::five:. **You can include your models in this repo as well!** See [contribution](./doc/CONTRIBUTING.md) instructions.
The number of pre-trained models in this zoo is :one::seven::nine:. **You can include your models in this repo as well!** See [contribution](./doc/CONTRIBUTING.md) instructions.

This repository currently supports the tasks listed below. For more information about each task, click on the task name. We plan to support all tasks in the BDD100K dataset eventually; see the [roadmap](#roadmap) for our plan and progress.

@@ -17,14 +17,15 @@ This repository currently supports the tasks listed below. For more information
- [**Drivable Area**](./drivable)
- [**Multiple Object Tracking (MOT)**](./mot)
- [**Multiple Object Tracking and Segmentation (MOTS)**](./mots)
- [**Pose Estimation**](./pose)

If you have any questions, please go to the BDD100K [discussions](https://github.com/bdd100k/bdd100k/discussions).

## Roadmap

- [x] Pose estimation
- [ ] Lane marking
- [ ] Panoptic segmentation
- [ ] Pose estimation

## Dataset

107 changes: 77 additions & 30 deletions det/README.md

Large diffs are not rendered by default.

15 changes: 15 additions & 0 deletions det/configs/det/faster_rcnn_hrnetv2p_w18_1x_det_bdd100k.py
@@ -0,0 +1,15 @@
"""HRNet18, 1x schedule."""

_base_ = "./faster_rcnn_hrnetv2p_w32_1x_det_bdd100k.py"
model = dict(
pretrained="open-mmlab://msra/hrnetv2_w18",
backbone=dict(
extra=dict(
stage2=dict(num_channels=(18, 36)),
stage3=dict(num_channels=(18, 36, 72)),
stage4=dict(num_channels=(18, 36, 72, 144)),
),
),
neck=dict(type="HRFPN", in_channels=[18, 36, 72, 144], out_channels=256),
)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/faster_rcnn_hrnetv2p_w18_1x_det_bdd100k.pth"
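The w18 variants above override only the channel widths; the training schedule, dataset settings, and detection heads are inherited from the w32 config through the `_base_` chain. A minimal sketch of how mmcv resolves that inheritance (assuming mmcv is installed and the path is given relative to the repo root):

```python
from mmcv import Config

# mmcv follows the _base_ chain (w18 -> w32 -> r50_fpn) and merges overrides.
cfg = Config.fromfile("det/configs/det/faster_rcnn_hrnetv2p_w18_1x_det_bdd100k.py")
print(cfg.model.backbone.extra.stage2.num_channels)  # (18, 36), from this file
print(cfg.model.neck.out_channels)                   # 256, from the neck override
```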
15 changes: 15 additions & 0 deletions det/configs/det/faster_rcnn_hrnetv2p_w18_3x_det_bdd100k.py
@@ -0,0 +1,15 @@
"""HRNet18, 3x schedule, MS training."""

_base_ = "./faster_rcnn_hrnetv2p_w32_3x_det_bdd100k.py"
model = dict(
pretrained="open-mmlab://msra/hrnetv2_w18",
backbone=dict(
extra=dict(
stage2=dict(num_channels=(18, 36)),
stage3=dict(num_channels=(18, 36, 72)),
stage4=dict(num_channels=(18, 36, 72, 144)),
),
),
neck=dict(type="HRFPN", in_channels=[18, 36, 72, 144], out_channels=256),
)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/faster_rcnn_hrnetv2p_w18_3x_det_bdd100k.pth"
49 changes: 49 additions & 0 deletions det/configs/det/faster_rcnn_hrnetv2p_w32_1x_det_bdd100k.py
@@ -0,0 +1,49 @@
"""HRNet32, 1x schedule."""

_base_ = "./faster_rcnn_r50_fpn_1x_det_bdd100k.py"
model = dict(
pretrained="open-mmlab://msra/hrnetv2_w32",
backbone=dict(
_delete_=True,
type="HRNet",
extra=dict(
stage1=dict(
num_modules=1,
num_branches=1,
block="BOTTLENECK",
num_blocks=(4,),
num_channels=(64,),
),
stage2=dict(
num_modules=1,
num_branches=2,
block="BASIC",
num_blocks=(4, 4),
num_channels=(32, 64),
),
stage3=dict(
num_modules=4,
num_branches=3,
block="BASIC",
num_blocks=(4, 4, 4),
num_channels=(32, 64, 128),
),
stage4=dict(
num_modules=3,
num_branches=4,
block="BASIC",
num_blocks=(4, 4, 4, 4),
num_channels=(32, 64, 128, 256),
),
),
),
neck=dict(
_delete_=True,
type="HRFPN",
in_channels=[32, 64, 128, 256],
out_channels=256,
),
)
data = dict(samples_per_gpu=2, workers_per_gpu=2)
optimizer = dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/faster_rcnn_hrnetv2p_w32_1x_det_bdd100k.pth"
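The `_delete_=True` markers matter here: in mmcv's config merging they replace the corresponding base dict outright instead of merging into it, so none of the ResNet backbone keys from the base config leak into the HRNet definition. A simplified sketch of the semantics (mmcv's actual merge handles more cases):

```python
def merge(base: dict, override: dict) -> dict:
    """Simplified sketch of mmcv-style config merging with _delete_."""
    override = dict(override)          # avoid mutating the caller's dict
    if override.pop("_delete_", False):
        return override                # drop all base keys entirely
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)  # recurse into nested dicts
        else:
            merged[key] = value
    return merged

# Because the backbone dict above carries _delete_=True, base keys such as
# the ResNet "depth" do not survive the merge with the HRNet settings.
```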
49 changes: 49 additions & 0 deletions det/configs/det/faster_rcnn_hrnetv2p_w32_3x_det_bdd100k.py
@@ -0,0 +1,49 @@
"""HRNet32, 3x schedule, MS training."""

_base_ = "./faster_rcnn_r50_fpn_3x_det_bdd100k.py"
model = dict(
pretrained="open-mmlab://msra/hrnetv2_w32",
backbone=dict(
_delete_=True,
type="HRNet",
extra=dict(
stage1=dict(
num_modules=1,
num_branches=1,
block="BOTTLENECK",
num_blocks=(4,),
num_channels=(64,),
),
stage2=dict(
num_modules=1,
num_branches=2,
block="BASIC",
num_blocks=(4, 4),
num_channels=(32, 64),
),
stage3=dict(
num_modules=4,
num_branches=3,
block="BASIC",
num_blocks=(4, 4, 4),
num_channels=(32, 64, 128),
),
stage4=dict(
num_modules=3,
num_branches=4,
block="BASIC",
num_blocks=(4, 4, 4, 4),
num_channels=(32, 64, 128, 256),
),
),
),
neck=dict(
_delete_=True,
type="HRFPN",
in_channels=[32, 64, 128, 256],
out_channels=256,
),
)
data = dict(samples_per_gpu=2, workers_per_gpu=2)
optimizer = dict(type="SGD", lr=0.02, momentum=0.9, weight_decay=0.0001)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/faster_rcnn_hrnetv2p_w32_3x_det_bdd100k.pth"
10 changes: 10 additions & 0 deletions det/configs/det/libra_faster_rcnn_r101_fpn_3x_det_bdd100k.py
@@ -0,0 +1,10 @@
"""Libra R-CNN with ResNet101-FPN, 3x schedule, MS training."""

_base_ = "./libra_faster_rcnn_r50_fpn_3x_det_bdd100k.py"
model = dict(
backbone=dict(
depth=101,
init_cfg=dict(type="Pretrained", checkpoint="torchvision://resnet101"),
)
)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/libra_faster_rcnn_r101_fpn_3x_det_bdd100k.pth"
53 changes: 53 additions & 0 deletions det/configs/det/libra_faster_rcnn_r50_fpn_1x_det_bdd100k.py
@@ -0,0 +1,53 @@
"""Libra R-CNN with ResNet50-FPN, 1x schedule."""

_base_ = "./faster_rcnn_r50_fpn_1x_det_bdd100k.py"
model = dict(
neck=[
dict(
type="FPN",
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5,
),
dict(
type="BFP",
in_channels=256,
num_levels=5,
refine_level=2,
refine_type="non_local",
),
],
roi_head=dict(
bbox_head=dict(
loss_bbox=dict(
_delete_=True,
type="BalancedL1Loss",
alpha=0.5,
gamma=1.5,
beta=1.0,
loss_weight=1.0,
)
)
),
# model training and testing settings
train_cfg=dict(
rpn=dict(sampler=dict(neg_pos_ub=5), allowed_border=-1),
rcnn=dict(
sampler=dict(
_delete_=True,
type="CombinedSampler",
num=512,
pos_fraction=0.25,
add_gt_as_proposals=True,
pos_sampler=dict(type="InstanceBalancedPosSampler"),
neg_sampler=dict(
type="IoUBalancedNegSampler",
floor_thr=-1,
floor_fraction=0,
num_bins=3,
),
)
),
),
)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/libra_faster_rcnn_r50_fpn_1x_det_bdd100k.pth"
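Note the list-valued `neck`: mmdetection builds each entry and chains them in order, so the BFP consumes the FPN's pyramid outputs and refines them before the detection heads see the features. A simplified sketch of that composition (not mmdetection's actual builder code):

```python
import torch.nn as nn

class ChainedNeck(nn.Module):
    """Sketch: a list-valued neck behaves like a sequential chain of modules."""

    def __init__(self, necks):
        super().__init__()
        self.necks = nn.ModuleList(necks)

    def forward(self, feats):
        for neck in self.necks:   # FPN first, then BFP refinement
            feats = neck(feats)
        return feats
```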
53 changes: 53 additions & 0 deletions det/configs/det/libra_faster_rcnn_r50_fpn_3x_det_bdd100k.py
@@ -0,0 +1,53 @@
"""Libra R-CNN with ResNet50-FPN, 3x schedule, MS training."""

_base_ = "./faster_rcnn_r50_fpn_3x_det_bdd100k.py"
model = dict(
neck=[
dict(
type="FPN",
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5,
),
dict(
type="BFP",
in_channels=256,
num_levels=5,
refine_level=2,
refine_type="non_local",
),
],
roi_head=dict(
bbox_head=dict(
loss_bbox=dict(
_delete_=True,
type="BalancedL1Loss",
alpha=0.5,
gamma=1.5,
beta=1.0,
loss_weight=1.0,
)
)
),
# model training and testing settings
train_cfg=dict(
rpn=dict(sampler=dict(neg_pos_ub=5), allowed_border=-1),
rcnn=dict(
sampler=dict(
_delete_=True,
type="CombinedSampler",
num=512,
pos_fraction=0.25,
add_gt_as_proposals=True,
pos_sampler=dict(type="InstanceBalancedPosSampler"),
neg_sampler=dict(
type="IoUBalancedNegSampler",
floor_thr=-1,
floor_fraction=0,
num_bins=3,
),
)
),
),
)
load_from = "https://dl.cv.ethz.ch/bdd100k/det/models/libra_faster_rcnn_r50_fpn_3x_det_bdd100k.pth"
4 changes: 1 addition & 3 deletions det/test.py
@@ -107,9 +107,7 @@ def main() -> None:

cfg = Config.fromfile(args.config)
if cfg.load_from is None:
cfg_name = os.path.split(args.config)[-1].replace(
"_bdd100k.py", ".pth"
)
cfg_name = os.path.split(args.config)[-1].replace(".py", ".pth")
cfg.load_from = MODEL_SERVER + cfg_name
if args.cfg_options is not None:
cfg.merge_from_dict(args.cfg_options)
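This change widens the filename mapping: previously only configs ending in `_bdd100k.py` produced a usable name, and the replacement also stripped the `_bdd100k` suffix that the hosted weights actually carry. A quick sketch of the new derivation (the `MODEL_SERVER` value is an assumption inferred from the `load_from` URLs above, not quoted from test.py):

```python
import os

MODEL_SERVER = "https://dl.cv.ethz.ch/bdd100k/det/models/"  # assumed base URL

config_path = "configs/det/faster_rcnn_hrnetv2p_w18_1x_det_bdd100k.py"
# Old: .replace("_bdd100k.py", ".pth") -> "..._1x_det.pth" (suffix dropped).
# New: .replace(".py", ".pth") keeps the full stem, matching the hosted files.
cfg_name = os.path.split(config_path)[-1].replace(".py", ".pth")
print(MODEL_SERVER + cfg_name)
# https://dl.cv.ethz.ch/bdd100k/det/models/faster_rcnn_hrnetv2p_w18_1x_det_bdd100k.pth
```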
56 changes: 56 additions & 0 deletions doc/CONTRIBUTING.md
@@ -59,6 +59,7 @@ Each task in BDD100K has its own template and guidelines. Click the links below
- [**Drivable Area**](#semantic-segmentation-and-drivable-area)
- [**Multiple Object Tracking (MOT)**](#mot)
- [**Multiple Object Tracking and Segmentation (MOTS)**](#mots)
- [**Pose Estimation**](#pose-estimation)

## Tagging

@@ -379,3 +380,58 @@ Multiple object tracking and segmentation requires detecting, tracking, and segm
| ResNet-50 | 28.1 | 45.4 | 874 | [scores](https://dl.cv.ethz.ch/bdd100k/mots/scores-val/pcan-frcnn_r50_fpn_12e_mots_bdd100k.json) | 31.9 | 50.4 | 845 | [scores](https://dl.cv.ethz.ch/bdd100k/mots/scores-test/pcan-frcnn_r50_fpn_12e_mots_bdd100k.json) | [config](https://github.com/SysCV/pcan/blob/main/configs/segtrack-frcnn_r50_fpn_12e_bdd10k.py) | [model](https://dl.cv.ethz.ch/bdd100k/mots/models/pcan-frcnn_r50_fpn_12e_mots_bdd100k.pth) \| [MD5](https://dl.cv.ethz.ch/bdd100k/mots/models/pcan-frcnn_r50_fpn_12e_mots_bdd100k.md5) | [preds](https://dl.cv.ethz.ch/bdd100k/mots/preds/pcan-frcnn_r50_fpn_12e_mots_bdd100k.zip) | [visuals](https://dl.cv.ethz.ch/bdd100k/mots/visuals/pcan-frcnn_r50_fpn_12e_mots_bdd100k.zip) |

[[Code](https://github.com/SysCV/pcan)] [[Usage Instructions](https://github.com/SysCV/pcan/blob/main/docs/GET_STARTED.md)]

## Pose Estimation

Template and guidelines below:

### Method Name

[Paper name]() [Venue and Year]

Authors: Author list

<details>
<summary>Abstract</summary>
Put your abstract here.
</details>

#### Results

| Backbone | Input Size | Pose AP-val | Scores-val | Pose AP-test | Scores-test | Config | Weights | Preds | Visuals |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| | | | [scores]() | | [scores]() | [config]() | [model]() \| [MD5]() | [preds]() | [visuals]() |

[[Code]()] [[Usage Instructions]()]

Other information.

### Guidelines

- The scores file should be a JSON file with evaluation results for all the BDD100K pose estimation [metrics](https://doc.bdd100k.com/evaluate.html#pose-estimation).
- The predictions should be a JSON file containing model predictions for the entire validation set.
- The visuals should be a zip file with pose visualizations on the validation set.
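As a rough sanity check of a submission's layout, the three artifacts can be inspected like this (the file names below are hypothetical placeholders, and the actual JSON schema is defined by the BDD100K evaluation tools, not shown here):

```python
import json
import zipfile

# Hypothetical local file names; use whatever your results table links to.
with open("hrnet_pose_scores_val.json") as f:
    scores = json.load(f)   # dict of pose-estimation metric values

with open("hrnet_pose_preds_val.json") as f:
    preds = json.load(f)    # predictions for the full validation set

with zipfile.ZipFile("hrnet_pose_visuals_val.zip") as zf:
    assert zf.namelist(), "visuals archive should contain rendered frames"
```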

Example below:

### HRNet

[Deep High-Resolution Representation Learning for Visual Recognition](https://arxiv.org/abs/1908.07919) [CVPR 2019 / TPAMI 2020]

Authors: [Jingdong Wang](https://jingdongwang2017.github.io/), [Ke Sun](https://github.com/sunke123), [Tianheng Cheng](https://scholar.google.com/citations?user=PH8rJHYAAAAJ), Borui Jiang, Chaorui Deng, [Yang Zhao](https://yangyangkiki.github.io/), Dong Liu, [Yadong Mu](http://www.muyadong.com/), Mingkui Tan, [Xinggang Wang](https://xinggangw.info/), [Wenyu Liu](http://eic.hust.edu.cn/professor/liuwenyu/), [Bin Xiao](https://www.microsoft.com/en-us/research/people/bixi/)

<details>
<summary>Abstract</summary>
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named as High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) Connect the high-to-low resolution convolution streams in parallel; (ii) Repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. All the codes are available at [this https URL](https://github.com/HRNet).
</details>

#### Results

| Backbone | Input Size | Pose AP-val | Scores-val | Pose AP-test | Scores-test | Config | Weights | Preds | Visuals |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| HRNet-w32 | 256 * 192 | 48.83 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-val/hrnet_w32_256x192_pose_bdd100k.json) | 46.13 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-test/hrnet_w32_256x192_pose_bdd100k.json) | [config](./configs/hrnet_w32_256x192_pose_bdd100k.py) | [model](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w32_256x192_pose_bdd100k.pth) \| [MD5](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w32_256x192_pose_bdd100k.md5) | [preds](https://dl.cv.ethz.ch/bdd100k/pose/preds/hrnet_w32_256x192_pose_bdd100k.json) | [visuals](https://dl.cv.ethz.ch/bdd100k/pose/visuals/hrnet_w32_256x192_pose_bdd100k.zip) |
| HRNet-w48 | 256 * 192 | 50.32 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-val/hrnet_w48_256x192_pose_bdd100k.json) | 47.36 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-test/hrnet_w48_256x192_pose_bdd100k.json) | [config](./configs/hrnet_w48_256x192_pose_bdd100k.py) | [model](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w48_256x192_pose_bdd100k.pth) \| [MD5](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w48_256x192_pose_bdd100k.md5) | [preds](https://dl.cv.ethz.ch/bdd100k/pose/preds/hrnet_w48_256x192_pose_bdd100k.json) | [visuals](https://dl.cv.ethz.ch/bdd100k/pose/visuals/hrnet_w48_256x192_pose_bdd100k.zip) |
| HRNet-w32 | 320 * 256 | 49.86 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-val/hrnet_w32_320x256_pose_bdd100k.json) | 46.90 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-test/hrnet_w32_320x256_pose_bdd100k.json) | [config](./configs/hrnet_w32_320x256_pose_bdd100k.py) | [model](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w32_320x256_pose_bdd100k.pth) \| [MD5](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w32_320x256_pose_bdd100k.md5) | [preds](https://dl.cv.ethz.ch/bdd100k/pose/preds/hrnet_w32_320x256_pose_bdd100k.json) | [visuals](https://dl.cv.ethz.ch/bdd100k/pose/visuals/hrnet_w32_320x256_pose_bdd100k.zip) |
| HRNet-w48 | 320 * 256 | 50.16 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-val/hrnet_w48_320x256_pose_bdd100k.json) | 47.32 | [scores](https://dl.cv.ethz.ch/bdd100k/pose/scores-test/hrnet_w48_320x256_pose_bdd100k.json) | [config](./configs/hrnet_w48_320x256_pose_bdd100k.py) | [model](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w48_320x256_pose_bdd100k.pth) \| [MD5](https://dl.cv.ethz.ch/bdd100k/pose/models/hrnet_w48_320x256_pose_bdd100k.md5) | [preds](https://dl.cv.ethz.ch/bdd100k/pose/preds/hrnet_w48_320x256_pose_bdd100k.json) | [visuals](https://dl.cv.ethz.ch/bdd100k/pose/visuals/hrnet_w48_320x256_pose_bdd100k.zip) |

[[Code](https://github.com/HRNet)] [[Usage Instructions](https://github.com/SysCV/bdd100k-models/tree/main/pose#usage)]
16 changes: 16 additions & 0 deletions doc/PREPARE_DATASET.md
@@ -12,6 +12,9 @@ On the official download page, the required data and annotations for each task a
- `object detection` set:
- images: `100K Images`
- annotations: `Detection 2020 Labels`
- `pose estimation` set:
- images: `100K Images`
- annotations: `Pose Estimation Labels`
- `instance segmentation` set:
- images: `10K Images`
- annotations: `Instance Segmentation`
@@ -44,6 +47,14 @@ python -m bdd100k.label.to_coco -m det \
-o bdd100k/jsons/det_${SET_NAME}_cocofmt.json
```

To convert the pose estimation set, you can run:
```bash
mkdir bdd100k/jsons
python -m bdd100k.label.to_coco -m pose \
-i bdd100k/labels/pose_21/pose_${SET_NAME}.json \
-o bdd100k/jsons/pose_${SET_NAME}_cocofmt.json
```

To convert the instance segmentation set, you can run:
```bash
mkdir bdd100k/jsons
@@ -103,6 +114,9 @@ bdd100k-models
│ ├── det_20
| | ├── det_train.json
| | └── det_val.json
│ ├── pose_21
| | ├── pose_train.json
| | └── pose_val.json
│ ├── ins_seg
| | ├── bitmasks
| | | ├── train
@@ -137,6 +151,8 @@ bdd100k-models
└── jsons
├── det_train_cocofmt.json
├── det_val_cocofmt.json
├── pose_train_cocofmt.json
├── pose_val_cocofmt.json
├── ins_seg_train_cocofmt.json
├── ins_seg_val_cocofmt.json
├── box_track_train_cocofmt.json
Binary file added doc/images/pose.gif
Binary file added doc/images/pose1.png
