
Comparison of some models on CPU vs VPU (neurochip) vs GPU #5079

Open
AlexeyAB opened this issue Mar 21, 2020 · 28 comments
Labels: Explanations (Explanations of the source code, algorithms or method of use)

Comments

@AlexeyAB (Owner) commented Mar 21, 2020

  • batch=1 (sync-mode)
  • CPU, VPU
    • OpenCV 4.2.0 (master-branch 21 Mar 2020)
    • OpenVINO 2020.1.033
  • GPU
    • CUDA 10.0
    • cuDNN 7.4.2
    • Darknet (Mar 22, 2020) GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1

Accuracy and FPS:

| Model | AP50...95 (MSCOCO), accuracy | mAP50 (MSCOCO), accuracy | CPU - 90 Watt - FP32 (Intel Core i7-6700K 4GHz 8 Logical Cores) OpenCV-DLIE, FPS | VPU - 2 Watt - FP16 (Intel Myriad X) OpenCV-DLIE, FPS | GPU - 175 Watt - FP32/16 (nVidia GeForce RTX 2070) Darknet-cuDNN, FPS |
|---|---|---|---|---|---|
| yolov4-tiny 416x416 | - | 40.2% | - | - | 330 |
| yolov3-tiny 416x416 | - | 33.1% | 35 | 6.5 | 340 |
| yolov3-tiny-PRN 416x416 | - | 33.1% | 46 | 5.3 | 370 |
| EfficientNetB0-Yolo 416x416 | - | 45.5% | 11 | - | 55 |
| yolov3 416x416 | 31.0% | 55.3% | - | - | - |
| yolov3-spp 512x512 | - | ~59.6% | 3.3 | 1.1 | 52 |
| csresnext50-opt 512x512 | 42.4% | 64.4% | 3.5 | 0.64 | 37 |
| csdarknet53-opt 256x256 async=3 | 33.3% | 53.0% | 14 | 11 | 74 |
| csdarknet53-opt 512x512 | 42.4% | 64.5% | 3.5 | 1.23 | 50 |
| csdarknet53-mish 512x512 (YOLOv4) | 43.0% | 64.9% | - | - | 50 |
| csresnext50-opt 608x608 | 43.2% | 65.4% | - | - | 34 |
| csdarknet53-mish 608x608 (YOLOv4) | 43.5% | 65.7% | - | - | 37 |
AlexeyAB added the Explanations label Mar 21, 2020
@WongKinYiu (Collaborator)

@AlexeyAB Hello,

So currently EfficientNetB0-Yolo is the fastest model on VPU?

@AlexeyAB (Owner, Author)

@WongKinYiu Hi,

Yes, it seems the VPU (Intel Myriad X) is highly optimized for grouped convolutions and maybe SE blocks. I will test it more.

Perhaps with the new Google Coral Edge TPU the performance ratio will, in general, be similar to that of the Intel Myriad X.

So maybe it makes sense to train GhostNet ghostnet.cfg.txt and yolov3-tiny-3l-ghostnet (as a new tiny-yolo model): #4418 (comment)

@WongKinYiu (Collaborator)

@AlexeyAB Thanks,

ghostnet is now training: 40k/800k iterations.

@AlexeyAB (Owner, Author) commented Mar 23, 2020

@WongKinYiu Do you train ghostnet with CutMix+Mosaic+Label-smoothing?

Also, did we get an improvement for any network with DropBlock?

@LukeAI commented Mar 23, 2020

This is a fantastic resource. If at all possible, it'd be great to also see results for batch=4 or similar.

@WongKinYiu (Collaborator)

@AlexeyAB No, just the ghostnet.cfg.txt you provided before.

@AlexeyAB (Owner, Author)

@WongKinYiu I also added https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg, which you can try to train with subdivisions=6 or 4.

@WongKinYiu (Collaborator)

@AlexeyAB Thanks, I am reviewing the code of the new commits.

@WongKinYiu (Collaborator)

@AlexeyAB I set subdivisions=4 and the training has started now.

@ShaneHsieh

Hi @AlexeyAB
When you test the CPU and VPU, do you use FP32?
As far as I know, the VPU can use FP16 and INT8; this information is very important.

@AlexeyAB (Owner, Author) commented Mar 27, 2020

@ShaneHsieh I added this information: the CPU uses FP32, the VPU uses FP16, and the GPU uses FP32/16 (Tensor Cores). Each device uses the lowest floating-point precision it can that increases speed without loss of accuracy.
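A small illustration of this precision trade-off (my own sketch, not part of the benchmark; assumes only NumPy): FP16 keeps roughly 3 decimal digits, which is plenty for normalized CNN weights and activations, but it loses low-order bits on large values, which is why accumulations are often kept in FP32.

```python
import numpy as np

# FP16 has a 10-bit mantissa: a small normalized value keeps about
# 3 decimal digits of precision, enough for typical CNN weights.
w = np.float16(0.123456789)
print(w)  # about 0.1235

# Above 2048, consecutive integers are no longer representable in FP16,
# so adding 1 to 2048 is lost entirely after rounding.
s = np.float16(2048) + np.float16(1)
print(s == np.float16(2048))
```

This is why inference in FP16 usually costs little or no accuracy for these detectors, while halving memory traffic.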

@ShaneHsieh

Thanks.
Comparing CPU and GPU when both use FP32, the CPU with EfficientNetB0-Yolo can get relatively good performance. That is good information.

@andeyeluguo

What does OpenCV-DLIE mean?

@WongKinYiu (Collaborator)

OpenCV-DLIE is OpenCV with the Deep Learning Inference Engine backend, which is provided by the OpenVINO Toolkit.

@WongKinYiu (Collaborator)

Yes, you can use the OpenCV dnn module to run the models, for example yolov3, yolov3-tiny-prn, efficientnetb0-yolo, and so on.

But because the mish activation function and the eliminate-grid-sensitivity trick are not yet supported by the OpenCV dnn module, you cannot run yolov4 at this time.

@andeyeluguo

Does it support AlexeyAB's version? So far I have only found the TensorFlow YOLO version that OpenVINO supports.

@WongKinYiu (Collaborator)

For your reference: opencv/opencv#16436

@andeyeluguo

Could you please give me a tutorial on how to convert the cfg file to the XML format that OpenVINO supports? I saw the question "Does the OpenCV-OpenVINO version support the Yolo v3 network?" on that site; it may have been asked by AlexeyAB.

@WongKinYiu (Collaborator)

Darknet is supported already. https://github.com/opencv/opencv/wiki/Deep-Learning-in-OpenCV

@AlexeyAB (Owner, Author) commented Apr 26, 2020

@andeyeluguo For using Yolo with OpenVINO (on CPU, GPU, VPU, ...) you should

  1. install OpenVINO as usual
  2. install OpenCV with OpenVINO-backend: https://github.com/opencv/opencv/wiki/Intel's-Deep-Learning-Inference-Engine-backend
  3. run yolov3.cfg + yolov3.weights by using OpenCV-dnn; examples of how to use Yolo: https://docs.opencv.org/master/da/d9d/tutorial_dnn_yolo.html

YOLOv4 will be supported for OpenCV+OpenVINO soon: opencv/opencv#17148

I added Yolo v2 to OpenCV 2.5 years ago: opencv/opencv#9705

@mmaaz60 commented Apr 27, 2020

Can these models also be run on NCS 2 using the OpenCV DNN module with IE backend?

@Luxonis-Brandon

@mmaaz60 it seems like that is the case. We will be trying on DepthAI (Myriad X based) shortly and will circle back.

Also @AlexeyAB if you have any instructions on how to use YOLOv4 on VPU, we'd be keen to try them out on DepthAI.

@AlexeyAB (Owner, Author)

@Luxonis-Brandon

The current version of YOLOv4 is for real-time detection on GPU. Later we will release YOLOv4-VPU for real-time (>= 30 FPS) detection on VPU.

[image: modern_gpus chart]


There are two ways to run YOLOv4 on MyriadX:

  1. Support for YOLOv4 in OpenVINO - wait until it is added to OpenVINO
  2. Support for YOLOv4 in OpenCV-dnn (with the OpenVINO IE backend) - wait until this issue is solved: Feature-request: State-of-art Yolo v4 Detector opencv/opencv#17148

Right now, you can try to use a slightly simpler version of YOLOv4 (about 0.5% worse) on the Intel Myriad X VPU by using C++ with OpenVINO:

use

```cpp
// actually the scale should be 1.05, 1.1 and 1.2 for the corresponding [yolo] layers, instead of 1.1 everywhere
double x = (col + output_blob[box_index + 0 * side_square]*1.1 + (1 - 1.1)/2) / side * resized_im_w;
double y = (row + output_blob[box_index + 1 * side_square]*1.1 + (1 - 1.1)/2) / side * resized_im_h;
```
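In Python, the same decoding can be sketched like this (my own rendering of the formula in the fragment above; `pred` is the network's x or y output for one cell, already sigmoid-activated to [0, 1]):

```python
# Decode a box-center coordinate with the "eliminate grid sensitivity" scale.
# scale is 1.05, 1.1 or 1.2 for the corresponding [yolo] layer (1.1 in the
# fragment above); scale=1.0 recovers the original YOLOv3 decoding.
def decode_center(pred, cell, side, resized_dim, scale=1.1):
    return (cell + pred * scale + (1 - scale) / 2) / side * resized_dim

# A mid-cell prediction (pred=0.5) is unchanged by the scale, but at the
# extremes scale > 1 lets the center actually reach (and slightly pass) the
# cell borders, which a plain sigmoid output can only approach asymptotically.
print(decode_center(0.5, cell=3, side=13, resized_dim=416))             # mid-cell
print(decode_center(1.0, cell=3, side=13, resized_dim=416))             # scaled edge
print(decode_center(1.0, cell=3, side=13, resized_dim=416, scale=1.0))  # unscaled edge
```

The `(1 - scale) / 2` term re-centers the stretched sigmoid, so mid-cell predictions are identical under any scale; only near-edge predictions move.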

@AlexeyAB (Owner, Author) commented Apr 28, 2020

@Luxonis-Brandon

I just tested csdarknet53-opt (YOLOv4 without MISH; in the cfg set width=256 height=256; 33.3% AP | 53.0% AP50) on your DepthAI (Myriad X) device with network resolution 256x256 and async=3 by using OpenCV (OpenVINO IE backend), and got 11 FPS.

@AlexeyAB (Owner, Author)

[image: OpenCV_Vs_TensorRT chart]

@ausk commented Jun 23, 2020

OpenCV 4.4.0-pre, compiled from source. OpenVINO 2020.R3, Myriad.
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)

Input 416x416

  • efficient-b0: 395 ms
  • yolov3: 550 ms
  • yolov3-tiny-prn: 168 ms
  • yolov3-tiny: 128 ms
  • yolov4: 940 ms
  • efnet-coco: 395 ms
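For comparison with the FPS table at the top, these per-frame latencies convert to synchronous throughput as 1000 / latency (a simple sketch of mine; async inference with several in-flight requests would give somewhat higher numbers):

```python
# Latencies reported above, in milliseconds per 416x416 frame on Myriad.
latencies_ms = {
    "efficient-b0": 395,
    "yolov3": 550,
    "yolov3-tiny-prn": 168,
    "yolov3-tiny": 128,
    "yolov4": 940,
}

# Synchronous FPS = 1000 ms / latency per frame.
fps = {name: 1000.0 / ms for name, ms in latencies_ms.items()}
for name, value in sorted(fps.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.1f} FPS")
```

So yolov4 at 940 ms is roughly 1 FPS synchronous on this device, consistent with the earlier point that the current YOLOv4 targets real-time on GPU rather than VPU.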

@AlexeyAB (Owner, Author)

YOLOv4-tiny released: #6067

AlexeyAB unpinned this issue Jun 25, 2020
@linyib commented Mar 12, 2024

Hi, does anyone have the efficientnet-lite3.weights file? Can you share it with me?


9 participants