
Comparison of some models on CPU vs VPU (neurochip) vs GPU #5079

Open
AlexeyAB opened this issue Mar 21, 2020 · 28 comments
Labels: Explanations (Explanations of the source code, algorithms or method of use)

Comments

@AlexeyAB (Owner) commented Mar 21, 2020

  • batch=1 (sync-mode)
  • CPU, VPU
    • OpenCV 4.2.0 (master-branch 21 Mar 2020)
    • OpenVINO 2020.1.033
  • GPU
    • CUDA 10.0
    • cuDNN 7.4.2
    • Darknet (Mar 22, 2020) GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1

Accuracy and FPS:

| Model | AP50...95 (MSCOCO), accuracy | mAP50 (MSCOCO), accuracy | CPU - 90 Watt - FP32 (Intel Core i7-6700K 4GHz 8 Logical Cores) OpenCV-DLIE, FPS | VPU - 2 Watt - FP16 (Intel Myriad X) OpenCV-DLIE, FPS | GPU - 175 Watt - FP32/16 (nVidia GeForce RTX 2070) Darknet-cuDNN, FPS |
|---|---|---|---|---|---|
| yolov4-tiny 416x416 | - | 40.2% | - | - | 330 |
| yolov3-tiny 416x416 | - | 33.1% | 35 | 6.5 | 340 |
| yolov3-tiny-PRN 416x416 | - | 33.1% | 46 | 5.3 | 370 |
| EfficientNetB0-Yolo 416x416 | - | 45.5% | 11 | - | 55 |
| yolov3 416x416 | 31.0% | 55.3% | - | - | - |
| yolov3-spp 512x512 | - | ~59.6% | 3.3 | 1.1 | 52 |
| csresnext50-opt 512x512 | 42.4% | 64.4% | 3.5 | 0.64 | 37 |
| csdarknet53-opt 256x256 async=3 | 33.3% | 53.0% | 14 | 11 | 74 |
| csdarknet53-opt 512x512 | 42.4% | 64.5% | 3.5 | 1.23 | 50 |
| csdarknet53-mish 512x512 (YOLOv4) | 43.0% | 64.9% | - | - | 50 |
| csresnext50-opt 608x608 | 43.2% | 65.4% | - | - | 34 |
| csdarknet53-mish 608x608 (YOLOv4) | 43.5% | 65.7% | - | - | 37 |
AlexeyAB added the Explanations label Mar 21, 2020
@WongKinYiu (Collaborator)

@AlexeyAB Hello,

So currently EfficientNetB0-Yolo is the fastest model on VPU?

@AlexeyAB (Owner, Author)

@WongKinYiu Hi,

Yes, it seems the VPU (Intel Myriad X) is highly optimized for grouped convolutions and maybe SE blocks. I will test it more.

Perhaps with the new Google Coral Edge TPU the performance ratio will, in general, be similar to that of the Intel Myriad X.

So maybe it makes sense to train GhostNet ghostnet.cfg.txt and yolov3-tiny-3l-ghostnet (as a new tiny-yolo model): #4418 (comment)

@WongKinYiu (Collaborator)

@AlexeyAB Thanks,

ghostnet is now training: 40k/800k iterations.

@AlexeyAB (Owner, Author) commented Mar 23, 2020

@WongKinYiu Do you train ghostnet with CutMix+Mosaic+Label-smoothing?

Also, did we get an improvement for any network with DropBlock?

@LukeAI commented Mar 23, 2020

This is a fantastic resource. If at all possible, it'd be great to also see results for batch=4 or similar.

@WongKinYiu (Collaborator)

@AlexeyAB No, just the ghostnet.cfg.txt you provided before.

@AlexeyAB (Owner, Author)

@WongKinYiu I also added https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg, which you can try to train with subdivisions=6 or 4.

@WongKinYiu (Collaborator)

@AlexeyAB Thanks, I am reviewing the code of the new commits.

@WongKinYiu (Collaborator)

@AlexeyAB I set subdivisions=4 and the training has started now.

@ShaneHsieh

Hi @AlexeyAB
When you test the CPU and VPU, do you use FP32?
As far as I know, the VPU can use FP16 and INT8; this information is very important.

@AlexeyAB (Owner, Author) commented Mar 27, 2020

@ShaneHsieh I added this information: the CPU uses FP32, the VPU uses FP16, and the GPU uses FP32/16 (Tensor Cores). Each device uses the lowest floating-point precision it can that increases speed without loss of accuracy.
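A small illustration of this precision trade-off (my own sketch, not part of the benchmark; assumes only NumPy): FP16 keeps roughly 3 decimal digits, which is plenty for normalized CNN weights and activations, but it loses low-order bits on large values, which is why accumulations are often kept in FP32.

```python
import numpy as np

# FP16 has a 10-bit mantissa: a small normalized value keeps about
# 3 decimal digits of precision, enough for typical CNN weights.
w = np.float16(0.123456789)
print(w)  # about 0.1235

# Above 2048, consecutive integers are no longer representable in FP16,
# so adding 1 to 2048 is lost entirely after rounding.
s = np.float16(2048) + np.float16(1)
print(s == np.float16(2048))
```

This is why inference in FP16 usually costs little or no accuracy for these detectors, while halving memory traffic.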

@ShaneHsieh

Thanks.
Comparing CPU and GPU when both use FP32, the CPU with EfficientNetB0-Yolo can get relatively good performance. That is good information.

@andeyeluguo

What does OpenCV-DLIE mean?

@WongKinYiu (Collaborator)

OpenCV-DLIE is OpenCV with the Deep Learning Inference Engine backend, which is provided by the OpenVINO Toolkit.

@WongKinYiu (Collaborator)

Yes, you can use the OpenCV dnn module to run the models, for example yolov3, yolov3-tiny-prn, efficientnetb0-yolo, and so on.

But because the mish activation function and the eliminate-grid-sensitivity trick are not yet supported by the OpenCV dnn module, you cannot run yolov4 at this time.

@andeyeluguo

Does it support AlexeyAB's version? So far I have only found the TensorFlow YOLO version that OpenVINO supports.

@WongKinYiu (Collaborator)

For your reference: opencv/opencv#16436

@andeyeluguo

Could you please give me a tutorial on how to convert the cfg file to the XML format that OpenVINO supports? I saw the question "Does the OpenCV-OpenVINO version support the Yolo v3 network?" on that site; it may have been asked by AlexeyAB.

@WongKinYiu (Collaborator)

Darknet is supported already. https://github.com/opencv/opencv/wiki/Deep-Learning-in-OpenCV

@AlexeyAB (Owner, Author) commented Apr 26, 2020

@andeyeluguo For using Yolo with OpenVINO (on CPU, GPU, VPU, ...) you should

  1. install OpenVINO as usual
  2. install OpenCV with OpenVINO-backend: https://github.com/opencv/opencv/wiki/Intel's-Deep-Learning-Inference-Engine-backend
  3. run yolov3.cfg + yolov3.weights by using OpenCV-dnn; examples of how to use Yolo: https://docs.opencv.org/master/da/d9d/tutorial_dnn_yolo.html

YOLOv4 will be supported for OpenCV+OpenVINO soon: opencv/opencv#17148

I added Yolo v2 to OpenCV 2.5 years ago: opencv/opencv#9705

@mmaaz60 commented Apr 27, 2020

Can these models also be run on NCS 2 using the OpenCV DNN module with IE backend?

@Luxonis-Brandon

@mmaaz60 it seems like that is the case. We will be trying on DepthAI (Myriad X based) shortly and will circle back.

Also @AlexeyAB if you have any instructions on how to use YOLOv4 on VPU, we'd be keen to try them out on DepthAI.

@AlexeyAB (Owner, Author)

@Luxonis-Brandon

The current version of YOLOv4 is for real-time detection on GPU. Later we will release YOLOv4-VPU for real-time (>= 30 FPS) detection on VPU.

[image: modern_gpus chart]


There are two ways to run YOLOv4 on MyriadX:

  1. Support for YOLOv4 in OpenVINO - wait until it is added to OpenVINO
  2. Support for YOLOv4 in OpenCV-dnn (with the OpenVINO IE backend) - wait until this issue is solved: Feature-request: State-of-art Yolo v4 Detector opencv/opencv#17148

Right now, you can try to use a slightly simpler version of YOLOv4 (about 0.5% worse) on the Intel Myriad X VPU by using C++ with OpenVINO:

use

```cpp
// actually the scale should be 1.05, 1.1 and 1.2 for the corresponding [yolo] layers, instead of 1.1 everywhere
double x = (col + output_blob[box_index + 0 * side_square]*1.1 + (1 - 1.1)/2) / side * resized_im_w;
double y = (row + output_blob[box_index + 1 * side_square]*1.1 + (1 - 1.1)/2) / side * resized_im_h;
```
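In Python, the same decoding can be sketched like this (my own rendering of the formula in the fragment above; `pred` is the network's x or y output for one cell, already sigmoid-activated to [0, 1]):

```python
# Decode a box-center coordinate with the "eliminate grid sensitivity" scale.
# scale is 1.05, 1.1 or 1.2 for the corresponding [yolo] layer (1.1 in the
# fragment above); scale=1.0 recovers the original YOLOv3 decoding.
def decode_center(pred, cell, side, resized_dim, scale=1.1):
    return (cell + pred * scale + (1 - scale) / 2) / side * resized_dim

# A mid-cell prediction (pred=0.5) is unchanged by the scale, but at the
# extremes scale > 1 lets the center actually reach (and slightly pass) the
# cell borders, which a plain sigmoid output can only approach asymptotically.
print(decode_center(0.5, cell=3, side=13, resized_dim=416))             # mid-cell
print(decode_center(1.0, cell=3, side=13, resized_dim=416))             # scaled edge
print(decode_center(1.0, cell=3, side=13, resized_dim=416, scale=1.0))  # unscaled edge
```

The `(1 - scale) / 2` term re-centers the stretched sigmoid, so mid-cell predictions are identical under any scale; only near-edge predictions move.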

@AlexeyAB (Owner, Author) commented Apr 28, 2020

@Luxonis-Brandon

I just tested csdarknet53-opt (YOLOv4 without MISH; in the cfg set width=256 height=256; 33.3% AP | 53.0% AP50) on your DepthAI (Myriad X) device with network resolution 256x256 and async=3 by using OpenCV (OpenVINO IE backend), and got 11 FPS.

@AlexeyAB (Owner, Author)

[image: OpenCV_Vs_TensorRT chart]

@ausk commented Jun 23, 2020

OpenCV 4.4.0-pre, compiled from source. OpenVINO 2020.R3, Myriad.
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)

Input 416x416

  • efficient-b0: 395 ms
  • yolov3: 550 ms
  • yolov3-tiny-prn: 168 ms
  • yolov3-tiny: 128 ms
  • yolov4: 940 ms
  • efnet-coco: 395 ms
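For comparison with the FPS table at the top, these per-frame latencies convert to synchronous throughput as 1000 / latency (a simple sketch of mine; async inference with several in-flight requests would give somewhat higher numbers):

```python
# Latencies reported above, in milliseconds per 416x416 frame on Myriad.
latencies_ms = {
    "efficient-b0": 395,
    "yolov3": 550,
    "yolov3-tiny-prn": 168,
    "yolov3-tiny": 128,
    "yolov4": 940,
}

# Synchronous FPS = 1000 ms / latency per frame.
fps = {name: 1000.0 / ms for name, ms in latencies_ms.items()}
for name, value in sorted(fps.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.1f} FPS")
```

So yolov4 at 940 ms is roughly 1 FPS synchronous on this device, consistent with the earlier point that the current YOLOv4 targets real-time on GPU rather than VPU.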

@AlexeyAB (Owner, Author)

YOLOv4-tiny released: #6067

AlexeyAB unpinned this issue Jun 25, 2020
@linyib commented Mar 12, 2024

Hi, does anyone have the efficientnet-lite3.weights file? Can you share it with me?


9 participants