DETRs Beat YOLOs on Real-time Object Detection

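Introduction

RT-DETR is the first real-time end-to-end object detector. Specifically, we design an efficient hybrid encoder that processes multi-scale features by decoupling intra-scale interaction from cross-scale fusion, and we propose IoU-aware query selection to improve the initialization of the decoder queries. In addition, RT-DETR supports flexible adjustment of inference speed by using different decoder layers, without retraining, which facilitates the practical application of real-time object detectors. RT-DETR-L achieves 53.0% AP on COCO val2017 and 114 FPS on a T4 GPU, RT-DETR-X achieves 54.8% AP and 74 FPS, and RT-DETR-H achieves 56.3% AP and 40 FPS, outperforming all YOLO detectors of the same scale in both speed and accuracy. RT-DETR-R50 achieves 53.1% AP and 108 FPS, and RT-DETR-R101 achieves 54.3% AP and 74 FPS, outperforming all DETR detectors with the same backbone in accuracy. For more details, please refer to our paper.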
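The query selection step described above can be illustrated with a minimal sketch. It assumes that each encoder feature carries per-class scores and that the K features with the highest maximum class score initialize the decoder queries. The function name `select_queries` and the toy scores are hypothetical, and the IoU-aware training objective itself is not modelled here; this is not the repository's implementation.

```python
def select_queries(class_scores, k):
    """Return indices of the k features with the highest max class score.

    class_scores: list of per-feature lists of class scores (hypothetical toy input).
    """
    # Confidence of a feature = its best class score.
    confidence = [max(scores) for scores in class_scores]
    # Sort feature indices by confidence, highest first; ties keep original order.
    order = sorted(range(len(confidence)), key=lambda i: -confidence[i])
    return order[:k]


# Toy example: 5 encoder features, 3 classes.
scores = [
    [0.10, 0.20, 0.05],
    [0.90, 0.02, 0.01],
    [0.30, 0.35, 0.40],
    [0.05, 0.85, 0.10],
    [0.15, 0.10, 0.12],
]
top = select_queries(scores, k=2)
print(top)  # [1, 3]
```

The selected indices would then be used to gather the corresponding encoder features as initial decoder queries.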

Base models

Model	Epoch	Backbone	Input shape	$AP^{val}$	$AP^{val}_{50}$	Params(M)	FLOPs(G)	T4 TensorRT FP16(FPS)	Pretrained Model	config
RT-DETR-R18	6x	ResNet-18	640	46.5	63.8	20	60	217	download	config
RT-DETR-R34	6x	ResNet-34	640	48.9	66.8	31	92	161	download	config
RT-DETR-R50-m	6x	ResNet-50	640	51.3	69.6	36	100	145	download	config
RT-DETR-R50	6x	ResNet-50	640	53.1	71.3	42	136	108	download	config
RT-DETR-R101	6x	ResNet-101	640	54.3	72.7	76	259	74	download	config
RT-DETR-L	6x	HGNetv2	640	53.0	71.6	32	110	114	download	config
RT-DETR-X	6x	HGNetv2	640	54.8	73.1	67	234	74	download	config
RT-DETR-H	6x	HGNetv2	640	56.3	74.8	123	490	40	download	config
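As a rough aid for reading the table, the sketch below picks the most accurate base model that meets a throughput budget and converts FPS to per-frame latency. The AP and FPS values are copied from the table above (T4, TensorRT FP16); the helper names `latency_ms` and `best_model` are hypothetical, and numbers on other hardware will differ.

```python
# (model, AP on COCO val2017, T4 TensorRT FP16 FPS), copied from the table above.
BASE_MODELS = [
    ("RT-DETR-R18", 46.5, 217),
    ("RT-DETR-R34", 48.9, 161),
    ("RT-DETR-R50-m", 51.3, 145),
    ("RT-DETR-R50", 53.1, 108),
    ("RT-DETR-R101", 54.3, 74),
    ("RT-DETR-L", 53.0, 114),
    ("RT-DETR-X", 54.8, 74),
    ("RT-DETR-H", 56.3, 40),
]


def latency_ms(fps):
    """Convert throughput in frames per second to per-frame latency in milliseconds."""
    return 1000.0 / fps


def best_model(min_fps):
    """Return the highest-AP entry whose FPS is at least min_fps, or None."""
    candidates = [m for m in BASE_MODELS if m[2] >= min_fps]
    return max(candidates, key=lambda m: m[1]) if candidates else None


name, ap, fps = best_model(min_fps=100)
print(f"{name}: AP {ap}, {fps} FPS, about {latency_ms(fps):.1f} ms per frame")
```

With a budget of 100 FPS this selects RT-DETR-R50; relaxing the budget to 60 FPS selects RT-DETR-X.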

High-accuracy models

Model	Epoch	Backbone	Input shape	$AP^{val}$	$AP^{val}_{50}$	Pretrained Model	config
RT-DETR-Swin	3x	Swin_L_384	640	56.2	73.5	download	config
RT-DETR-FocalNet	3x	FocalNet_L_384	640	56.9	74.3	download	config

Models pretrained on Objects365

Model	Epoch	Dataset	Input shape	$AP^{val}$	$AP^{val}_{50}$	T4 TensorRT FP16(FPS)	Weight	Logs
RT-DETR-R18	1x	Objects365	640	22.9	31.2	-	download	log
RT-DETR-R18	5x	COCO + Objects365	640	49.2	66.6	217	download	log
RT-DETR-R50	1x	Objects365	640	35.1	46.2	-	download	log
RT-DETR-R50	2x	COCO + Objects365	640	55.3	73.4	108	download	log
RT-DETR-R101	1x	Objects365	640	36.8	48.3	-	download	log
RT-DETR-R101	2x	COCO + Objects365	640	56.2	74.5	74	download	log

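Notes:

COCO + Objects365 denotes results obtained by fine-tuning on COCO from Objects365-pretrained weights.
All RT-DETR base models are trained on 4 GPUs.
RT-DETR is trained on COCO train2017 and evaluated on val2017.
The high-accuracy models RT-DETR-Swin and RT-DETR-FocalNet are trained on 8 GPUs and require more GPU memory.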
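To make the effect of Objects365 pretraining easier to see, the following sketch computes the AP difference between the COCO + Objects365 rows and the corresponding base models. All values are copied from the tables above; the helper name `ap_gain` is hypothetical. Note that the training schedules differ between the two tables, so this is not a strictly like-for-like comparison.

```python
# AP on COCO val2017, copied from the tables above.
BASE_AP = {"RT-DETR-R18": 46.5, "RT-DETR-R50": 53.1, "RT-DETR-R101": 54.3}
FINETUNED_AP = {"RT-DETR-R18": 49.2, "RT-DETR-R50": 55.3, "RT-DETR-R101": 56.2}


def ap_gain(model):
    """AP improvement from Objects365 pretraining, rounded to one decimal."""
    return round(FINETUNED_AP[model] - BASE_AP[model], 1)


for model in BASE_AP:
    print(f"{model}: +{ap_gain(model)} AP")
```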

Quick start

Dependencies:

PaddlePaddle >= 2.4.1

Installation

Installation guide

Training and evaluation

Training on a single GPU:

# training on single-GPU
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml --eval

Training on multiple GPUs:

# training on multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml --fleet --eval

Evaluation:

python tools/eval.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
              -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams

Inference:

python tools/infer.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
              -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams \
              --infer_img=./demo/000000570688.jpg

For details, see the quick start guide.

Deployment

1. Export the model

cd PaddleDetection
python tools/export_model.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
              -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams trt=True \
              --output_dir=output_inference

2. Convert the model to ONNX

Install Paddle2ONNX and ONNX:

pip install onnx==1.13.0
pip install paddle2onnx==1.0.5

Convert the model:

paddle2onnx --model_dir=./output_inference/rtdetr_r50vd_6x_coco/ \
            --model_filename model.pdmodel  \
            --params_filename model.pdiparams \
            --opset_version 16 \
            --save_file rtdetr_r50vd_6x_coco.onnx
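After conversion, it can be useful to confirm that the exported file is a well-formed ONNX model before building a TensorRT engine. The sketch below uses `onnx.load` and `onnx.checker.check_model` from the `onnx` package installed above; the helper name `check_onnx` is hypothetical, and the file name assumes the `--save_file` value used in the command above.

```python
import os


def check_onnx(path):
    """Validate an ONNX file and return the names of its graph inputs."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"ONNX file not found: {path}")
    import onnx  # imported lazily so the path check works without onnx installed

    model = onnx.load(path)
    onnx.checker.check_model(model)  # raises if the model is malformed
    return [inp.name for inp in model.graph.input]


if __name__ == "__main__":
    onnx_path = "rtdetr_r50vd_6x_coco.onnx"  # file written by the command above
    if os.path.isfile(onnx_path):
        print(check_onnx(onnx_path))
    else:
        print(f"{onnx_path} not found; run the conversion first")
```

The returned input names are the ones to use when specifying shapes for downstream tools.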

3. Convert to TensorRT (optional)

Make sure the TensorRT version is >= 8.5.1.
For TensorRT inference, refer to the relevant RT-DETR code or other online resources.

trtexec --onnx=./rtdetr_r50vd_6x_coco.onnx \
        --workspace=4096 \
        --shapes=image:1x3x640x640 \
        --saveEngine=rtdetr_r50vd_6x_coco.trt \
        --avgRuns=100 \
        --fp16

Quantization and compression

For detailed steps, see: RT-DETR automatic quantization and compression

Model	Base mAP	ACT-quantized mAP	TRT-FP32	TRT-FP16	TRT-INT8	Config	Quantized model
RT-DETR-R50	53.1	53.0	32.05ms	9.12ms	6.96ms	config	Model
RT-DETR-R101	54.3	54.1	54.13ms	12.68ms	9.20ms	config	Model
RT-DETR-HGNetv2-L	53.0	52.9	26.16ms	8.54ms	6.65ms	config	Model
RT-DETR-HGNetv2-X	54.8	54.6	49.22ms	12.50ms	9.24ms	config	Model

Test environment for the table above: Tesla T4, TensorRT 8.6.0, CUDA 11.7, batch_size=1.
See also: PaddleSlim automatic compression example
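The trade-off in the table can be summarized with a short calculation: the mAP lost to quantization and the INT8 speedup relative to FP16 and FP32. The values are copied from the table above; the helper name `summarize` is hypothetical.

```python
# (model, base mAP, ACT-quantized mAP, FP32 ms, FP16 ms, INT8 ms), copied from the table above.
RESULTS = [
    ("RT-DETR-R50", 53.1, 53.0, 32.05, 9.12, 6.96),
    ("RT-DETR-R101", 54.3, 54.1, 54.13, 12.68, 9.20),
    ("RT-DETR-HGNetv2-L", 53.0, 52.9, 26.16, 8.54, 6.65),
    ("RT-DETR-HGNetv2-X", 54.8, 54.6, 49.22, 12.50, 9.24),
]


def summarize(row):
    """Return (mAP drop, INT8 speedup over FP16, INT8 speedup over FP32)."""
    name, base_map, quant_map, fp32_ms, fp16_ms, int8_ms = row
    return (
        round(base_map - quant_map, 1),
        round(fp16_ms / int8_ms, 2),
        round(fp32_ms / int8_ms, 2),
    )


for row in RESULTS:
    drop, vs_fp16, vs_fp32 = summarize(row)
    print(f"{row[0]}: -{drop} mAP, {vs_fp16}x vs FP16, {vs_fp32}x vs FP32")
```

For every model listed, the mAP drop is at most 0.2.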

Miscellaneous

1. Counting parameters and FLOPs

The following snippet reports the parameter count and FLOPs:

import paddle
from ppdet.core.workspace import create, load_config

# Build the model from its config file.
cfg_path = './configs/rtdetr/rtdetr_r50vd_6x_coco.yml'
cfg = load_config(cfg_path)
model = create(cfg.architecture)

# Dummy inputs for a single 640x640 image.
blob = {
    'image': paddle.randn([1, 3, 640, 640]),
    'im_shape': paddle.to_tensor([[640, 640]]),
    'scale_factor': paddle.to_tensor([[1., 1.]])
}
# Report the parameter count and FLOPs.
paddle.flops(model, None, blob, custom_ops=None, print_detail=False)

2. End-to-end speed benchmarking of YOLOs

See the RT-DETR benchmark section or other online resources.

Citing RT-DETR

If you use RT-DETR in your research, please cite our paper:

@misc{lv2023detrs,
      title={DETRs Beat YOLOs on Real-time Object Detection},
      author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
      year={2023},
      eprint={2304.08069},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
