Updated materials

[Survey] The YOLO series in one article: YOLOv6, YOLOX, PP-YOLO, PP-YOLOE, YOLOv5, https://zhuanlan.zhihu.com/p/533243893

A Review of YOLO Object Detection Based on Deep Learning

https://jeit.ac.cn/cn/article/doi/10.11999/JEIT210790

Abstract:

Object detection is a fundamental task and an active research topic in computer vision. YOLO (You Only Look Once) formulates object detection as a regression problem, which enables end-to-end training and detection. Owing to its favorable speed-accuracy trade-off, it has been at the forefront of object detection in recent years and has been successfully studied, improved, and applied in many different fields. This paper presents a detailed survey of the YOLO series of algorithms, together with their important improvements and applications. First, the YOLO family and its important improvements are systematically reviewed, including YOLOv1-v4, YOLOv5, Scaled-YOLOv4, YOLOR, and the latest YOLOX. Then, the backbone networks and loss functions that play an important role in YOLO are analyzed and summarized in detail. Next, YOLO algorithms are systematically classified according to their improvement ideas or application scenarios, such as attention mechanisms, 3D scenes, aerial scenes, and edge computing. Finally, the characteristics of the YOLO series are summarized, and possible directions for improvement and research trends are analyzed on the basis of the latest literature.

Key words: Deep learning; Convolutional Neural Network (CNN); Object detection; YOLO

DOI: 10.11999/JEIT210790

Citation:

SHAO Yanhua, ZHANG Duo, CHU Hongyu, ZHANG Xiaoqiang, RAO Yunbo. A Review of YOLO Object Detection Based on Deep Learning[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT210790

bib

@article{210790,
  title = "A Review of YOLO Object Detection Based on Deep Learning",
  journal = "Journal of Electronics & Information Technology",
  volume = "44",
  number = "210790",
  pages = "1",
  year = "2022",
  note = "",
  issn = "1009-5896",
  doi = "10.11999/JEIT210790",
  url = "//article/id/dae079cf-663b-42d8-8408-11cf435b2138",
  author = "SHAO Yanhua and ZHANG Duo and CHU Hongyu and ZHANG Xiaoqiang and RAO Yunbo",
  keywords = "deep learning, convolutional neural network, object detection, YOLO",
}
