About Speed on CPU and GPU #1398

Closed
akde opened this issue Aug 14, 2018 · 10 comments

@akde

akde commented Aug 14, 2018

Hi @AlexeyAB Thanks a lot for the repository, especially for the tracking part. And of course for the quick responses.

I switched to the AlexeyAB repository from pjreddie's because of the tracking feature. Now I am trying to maximize the FPS I get with the AlexeyAB repository.

I have tested AlexeyAB and pjreddie repositories on the same PC and video (1920 x 1080, avi format with yolov3-tiny).
Here are the results:

Computer specifications:
GPU: GT 730
CPU: i7-4790 @ 3.60GHz
CUDA: 8.0
OS: Ubuntu 16.04

         | pjreddie          | AlexeyAB
---------+-------------------+------------------
GPU      | 0     1     1     | 0     1     1
AVX      | -     -     -     | 0     0     1
OPENMP   | 0     0     1     | 0     0     1
LIBSO    | -     -     -     | 0     0     1
FPS      | 1.1   10.4  10.4  | 1.2   3.3   3.3

As you can see from the table, without GPU support both repositories perform about equally. However, with GPU support enabled, the maximum FPS I can get with the AlexeyAB repository is 3.3, while it was 10.4 with the pjreddie repository. Am I doing something wrong while using @AlexeyAB's repository? What can I do to get a higher FPS?
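
To put a number on the gap, the FPS figures from the table can be compared directly. This is a minimal Python sketch; the dictionary layout and the `ratio` helper are illustrative, and only the FPS values come from the table above.

```python
# FPS values copied from the table above, keyed by (repository, build).
fps = {
    ("pjreddie", "cpu"): 1.1,
    ("pjreddie", "gpu"): 10.4,
    ("AlexeyAB", "cpu"): 1.2,
    ("AlexeyAB", "gpu"): 3.3,
}


def ratio(a, b):
    """Return how many times faster configuration a is than configuration b."""
    return fps[a] / fps[b]


gpu_gap = ratio(("pjreddie", "gpu"), ("AlexeyAB", "gpu"))
cpu_gap = ratio(("AlexeyAB", "cpu"), ("pjreddie", "cpu"))

print(f"GPU builds: pjreddie is {gpu_gap:.2f}x faster")
print(f"CPU builds: AlexeyAB is {cpu_gap:.2f}x faster")
```

The GPU builds differ by roughly a factor of three, while the CPU builds are within about 10% of each other.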

@AlexeyAB
Owner

@akde Hi,

  • What command do you use for both repositories?
  • Be sure to use DEBUG=0
  • Try to install cuDNN and set CUDNN=1 in the Makefile.
  • AVX=1 and OPENMP=1 have an effect only if GPU=0 (i.e. only when the CPU is used instead of the GPU).
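
The last point can be sketched as a small filter over the Makefile flags. This is a hypothetical Python helper for illustration only, not part of darknet; it encodes just the rule stated above, that AVX and OPENMP matter only for CPU builds.

```python
# Flags that, per the comment above, only matter when GPU=0.
CPU_ONLY_FLAGS = {"AVX", "OPENMP"}


def effective_flags(flags):
    """Return the flags expected to influence speed for this build.

    Hypothetical helper: for GPU builds (GPU=1) the CPU-only flags are
    dropped, because they are reported to have no effect there.
    """
    if flags.get("GPU", 0) == 1:
        return {k: v for k, v in flags.items() if k not in CPU_ONLY_FLAGS}
    return dict(flags)


build = {"GPU": 1, "CUDNN": 0, "AVX": 1, "OPENMP": 1, "DEBUG": 0}
print(effective_flags(build))
```

For a GPU build this leaves GPU, CUDNN and DEBUG as the flags worth varying when comparing FPS.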

@akde
Author

akde commented Aug 14, 2018

Hi @AlexeyAB thx for immediate response:

  • What command do you use for both repositories?
    ./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights ~/darknet/teklerbad.mp4

  • Be sure to use DEBUG=0
    in both Makefiles DEBUG = 0

  • Try to install cuDNN and set CUDNN=1 in the Makefile.
    Sadly, the compute capability of my GPU is NOT high enough for cuDNN (to be more specific, its compute capability is 2.1, but cuDNN requires at least 3.0).

  • AVX=1 and OPENMP=1 have an effect only if GPU=0 (i.e. only when the CPU is used instead of the GPU).
    Just as you said, when the GPU is enabled, AVX and OPENMP do NOT have any effect on FPS: it is 3.3 in both cases.

@AlexeyAB
Owner

computer specifications
GPU GT 730
  • Try to install cuDNN and set CUDNN=1 in the Makefile.
    Sadly, the compute capability of my GPU is NOT high enough for cuDNN (to be more specific, its compute capability is 2.1, but cuDNN requires at least 3.0).

As far as I can see, the GeForce GT 730 has compute capability 3.5: https://en.wikipedia.org/wiki/CUDA#GPUs_supported

image

@akde
Author

akde commented Aug 14, 2018

@AlexeyAB Thx for the reply!

At first I thought the same! But then I realized that it has multiple versions:
image
https://developer.nvidia.com/cuda-gpus
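
The two GT 730 values discussed here (2.1 and 3.5) fall on opposite sides of the cuDNN requirement quoted earlier in the thread. A minimal Python sketch of that check follows; the helper names are hypothetical, and the 3.0 minimum is the figure stated in this thread, which may differ between cuDNN versions.

```python
# Minimum compute capability for cuDNN as stated in this thread.
# Assumption: the exact minimum depends on the cuDNN version.
CUDNN_MIN_CC = (3, 0)


def parse_cc(text):
    """Turn a string such as '3.5' into the tuple (3, 5)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def can_use_cudnn(cc_text):
    """Compare as (major, minor) tuples rather than as floats."""
    return parse_cc(cc_text) >= CUDNN_MIN_CC


for cc in ("2.1", "3.5"):
    verdict = "CUDNN=1 possible" if can_use_cudnn(cc) else "keep CUDNN=0"
    print(cc, "->", verdict)
```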

@AlexeyAB
Owner

@akde Yes, looks like it is 2.1

Try to change this line:

ARCH= -gencode arch=compute_30,code=sm_30 \

to this:
ARCH= -gencode arch=compute_21,code=sm_21 \

It may not work properly on an old GPU.

@akde
Author

akde commented Aug 14, 2018

@AlexeyAB thx for immediate response!

in my case it was
ARCH= -gencode arch=compute_20,code=sm_20

then I changed it into
ARCH= -gencode arch=compute_21,code=sm_21
and it gave me this error.

nvcc fatal : Unsupported gpu architecture 'compute_21'
Makefile:136: recipe for target 'obj/convolutional_kernels.o' failed
make: *** [obj/convolutional_kernels.o] Error 1

so I believe the correct way is ARCH= -gencode arch=compute_20,code=sm_20
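
The outcome above can be captured in a small helper that builds the ARCH value from a compute capability. This is a hypothetical Python sketch, not part of darknet or nvcc; the only fallback it encodes is the one confirmed by the error message above (compute_21 is rejected, compute_20 is used instead).

```python
# Architectures that nvcc rejected in this thread, mapped to the value
# that was used instead. Only the 2.1 -> 2.0 case is confirmed above.
ARCH_FALLBACK = {"21": "20"}


def gencode_flag(cc_text):
    """Build a -gencode option for a compute capability such as '2.1'."""
    digits = cc_text.replace(".", "")
    arch = ARCH_FALLBACK.get(digits, digits)
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


print("ARCH=", gencode_flag("2.1"))
print("ARCH=", gencode_flag("3.0"))
```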

I am attaching the Makefiles for both (AlexeyAB and pjreddie) repositories. Everything looks the same to me except the FPS.
https://drive.google.com/open?id=1HQevPS6fgRzNCtN4yV2Dk_bwTuqO6mCp

It is a bit weird to see a 3x performance difference between such similar setups.

I have read other issues and saw that there is a rescale line which is NOT in pjreddie repository. Can this line cause 3x FPS reduction?

@AlexeyAB
Owner

I have read other issues and saw that there is a rescale line which is NOT in pjreddie repository. Can this line cause 3x FPS reduction?

What is the rescale line?

I can only recommend using a modern GPU. I don't have an old GPU, so I can't test and tune the code for it.

@akde
Author

akde commented Aug 14, 2018

@AlexeyAB You are totally right, I need a more recent setup. But for now this is the best I can get.

The rescale line is mentioned here (darknet/src/detector.c, line 1124 in 3e856ec):

image sized = resize_image(im, net.w, net.h);

@AlexeyAB
Owner

@akde No. I don't use the resize_image() function for detector demo on video. Instead of resize_image() I use cvResize(), a very fast OpenCV function (AVX/OpenMP optimized):

cvResize(src, *in_img, CV_INTER_LINEAR);


By contrast, pjreddie/darknet uses letterbox_image_into() for detector demo on video: https://github.com/pjreddie/darknet/blob/9a4b19c4158b064a164e34a83ec8a16401580850/src/demo.c#L144
which calls the slow resize_image() function internally: https://github.com/pjreddie/darknet/blob/9a4b19c4158b064a164e34a83ec8a16401580850/src/image.c#L959
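
For context, letterboxing scales a frame to fit the network input while keeping its aspect ratio, then pads the rest. The Python sketch below shows only that size arithmetic; the function name is hypothetical, it is not darknet's code, and the 416x416 network input is an assumption (the thread does not state the size used). The 1920x1080 frame size is taken from the first post.

```python
def letterbox_size(src_w, src_h, net_w, net_h):
    """Largest size that fits inside net_w x net_h with the source aspect ratio."""
    if net_w / src_w < net_h / src_h:
        # Width is the limiting dimension.
        return net_w, (src_h * net_w) // src_w
    # Height is the limiting dimension.
    return (src_w * net_h) // src_h, net_h


# Assumed 416x416 network input; 1920x1080 video as in the first post.
new_w, new_h = letterbox_size(1920, 1080, 416, 416)
padding_rows = 416 - new_h
print(new_w, new_h, padding_rows)
```

A plain resize, by comparison, simply maps the whole frame onto the destination image size without padding.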


Your issue is related to some CUDA functions that aren't optimized for very old GPUs.

@akde
Author

akde commented Aug 15, 2018

@AlexeyAB Hi, thx for the detailed explanation and for all your help during this conversation. Apparently, as you suggest, I need to update my setup.

akde closed this as completed Aug 15, 2018