
How does 'route' layer work in yolov2? #487

Open
wsyzzz opened this issue Mar 20, 2018 · 23 comments
Labels
want enhancement Want to improve accuracy, speed or functionality

Comments

@wsyzzz commented Mar 20, 2018

Hi everyone, does anyone know how the 'route' layer works in yolov2? I googled it and only found that "the route layer is to bring finer grained features in from earlier in the network". So how does it 'bring finer grained features in'?

E.g., here is a case to illustrate. I run ./darknet detector test cfg/coco.data cfg/yolo.cfg yolo.weights data/dog.jpg


layer        |          filters         |           size          |            input                   |             output           |
0 conv                    32                     3×3  /  1                   416×416×3                ->           416×416×32
……
16  conv                 512                     3×3  /  1                   26×26×256                ->           26×26×512
……
24  conv                1024                     3×3  /  1                   13×13×1024               ->           13×13×1024
25  route                16
26  conv                 64                      1×1  /  1                   26×26×512                ->           26×26×64
27  reorg                                             /  2                   26×26×64                 ->           13×13×256
28  route              27  24
29  conv                1024                     3×3  /  1                   13×13×1280               ->           13×13×1024
……

And in 'yolo.cfg', I found that

[route]                            
layers=-9
……
[route]  
layers=-1,-3 

So it's clear that the 25th route layer uses the 16th layer ('layers=-9' in the .cfg file), and the 28th route layer uses the 27th and 24th layers ('layers=-1,-3' in the .cfg file).

Here is my question: how does the 25th route layer use the 13×13×1024 input from the 24th layer and the 26×26×512 input from the 16th layer to obtain a 26×26×512 output (judging by the 26th layer's input)?

I am looking forward to any advice. Thanks for your response!

@AlexeyAB (Owner)

The route-layer is the same as the concat-layer in Caffe.
(When route uses only one input, the route-layer is the same as an identity-layer in Caffe.)

More: #120 (comment)
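A minimal sketch of what that means in code (my own illustration, not Darknet's actual forward_route_layer): with two inputs of the same spatial size, the route layer just copies one input's channels after the other's into the output buffer.

```c
#include <string.h>

/* Channel-wise concatenation of two CHW feature maps with the same
 * spatial size h x w; the output has ca + cb channels.
 * Illustration only -- not Darknet's real route-layer code. */
void route_concat(const float *a, int ca, const float *b, int cb,
                  int h, int w, float *out)
{
    size_t plane = (size_t)h * w;
    memcpy(out, a, ca * plane * sizeof(float));
    memcpy(out + ca * plane, b, cb * plane * sizeof(float));
}
```

With a single input the copy degenerates to the identity, which is exactly what the 25th route layer does with layer 16's output.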

@wsyzzz (Author) commented Mar 21, 2018

In layer 25, you mean we take the result of layer 16 as the output of layer 25, which then becomes the input of layer 26. But what do we do with the output of layer 24? If we drop it, layers 17-24 would be meaningless.
@AlexeyAB

@AlexeyAB (Owner)

Layer_28 will concatenate layer_27 + layer_24.

[image: yolo_voc network diagram]

@wsyzzz (Author) commented Mar 21, 2018

So the route-layer and reorg-layer form a module that concatenates the current output (like layer 24's) with an earlier output (like layer 16's), which is what 'bring finer grained features in from earlier in the network' means.
Their functions fit their names: the route-layer is like a route sign pointing to the layer(s) we want to concatenate, and the reorg-layer is literally a 'reorganization' layer.
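To make the 'reorganization' concrete, here is a hedged space-to-depth sketch with stride 2 (my own index ordering; Darknet's reorg_cpu walks memory differently): each 2x2 spatial block is folded into 4 channels, which is how 26x26x64 becomes 13x13x256.

```c
/* Space-to-depth with stride 2 on a CHW tensor:
 * input c x h x w  ->  output (4c) x (h/2) x (w/2).
 * Sketch only -- the channel ordering differs from Darknet's reorg. */
void reorg_stride2(const float *in, int c, int h, int w, float *out)
{
    int stride = 2;
    int oc = c * stride * stride, oh = h / stride, ow = w / stride;
    for (int k = 0; k < oc; ++k)
        for (int y = 0; y < oh; ++y)
            for (int x = 0; x < ow; ++x) {
                int ic = k % c;              /* source channel          */
                int off = k / c;             /* which 2x2 sub-position  */
                int iy = y * stride + off / stride;
                int ix = x * stride + off % stride;
                out[(k * oh + y) * ow + x] = in[(ic * h + iy) * w + ix];
            }
}
```

The element count is preserved (26*26*64 = 13*13*256); only the layout changes so the result can be concatenated with a 13x13 feature map.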
Thank you, Alexey! You are the most patient author I've ever met!

Here's another question I'm wondering about. My object detection is slow. I use the yolo-voc.cfg network at 416x416 on your fork and get 9.2 s per picture on average. My CPU is an eight-core i7-7700K with 39.76 GFLOPS [1]. Due to some restrictions, I cannot use a GPU. According to issue #80, I should achieve about ~0.01 FPS per 1 GFLOPS-SP, so I should be getting 0.3976 FPS, i.e. ~2.5 s per picture. Could you figure out what the problem is? Thanks a lot!
[1] CPU performance

@AlexeyAB (Owner)

~0.01 FPS per 1 GFLOPS-SP holds only if all CPU resources are used.

But Darknet Yolo is well optimized only for GPU, not for CPU. I.e., Darknet doesn't use SSE3/4/AVX (SIMD) optimizations, so it is about 3-4x slower than it could be.
Did you compile with OPENMP=1 in the Makefile? That makes it use multiple threads on the CPU.
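For illustration, this is the kind of change OPENMP=1 enables (a hedged sketch, not Darknet's actual gemm.c): the heavy inner loops, such as the GEMM behind each convolution, get an OpenMP pragma so rows are computed on all cores; without -fopenmp the pragma is simply ignored and the loop runs serially.

```c
/* Naive matrix multiply C = A (m x k) times B (k x n).
 * With OpenMP enabled, rows of C are computed on different cores;
 * without it, the pragma is ignored and the code still works. */
void gemm_omp(int m, int n, int k, const float *A, const float *B, float *C)
{
    #pragma omp parallel for
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            float s = 0.0f;
            for (int p = 0; p < k; ++p)
                s += A[i * k + p] * B[p * n + j];
            C[i * n + j] = s;
        }
}
```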


I added Yolo v2 to OpenCV: opencv/opencv#9705

So if you want to use Yolo on CPU, the fastest way is to use the Yolo v2 built into OpenCV since 3.4.0; it can process ~2.5 s per picture or faster.

@wsyzzz (Author) commented Mar 21, 2018

Thanks for your advice! I tried your first solution and set OPENMP=1. Now I get results in a flash, but it reports 6.2 s, even though by feel it is much faster than the earlier run that reported 3.2 s... Is there a problem with the timer?

@AlexeyAB (Owner)

Yes, clock_t shows CPU time instead of real steady time: https://stackoverflow.com/a/10874375/1558037

So it actually works much faster than reported.

@wsyzzz (Author) commented Mar 21, 2018

Got it! You've really helped me a lot! I just need to use another timer function!

@wsyzzz (Author) commented Mar 22, 2018

@AlexeyAB Do you mean there are two versions of Yolo v2: one built without OpenCV, and one built into OpenCV 3.4, with the latter being faster? Can the latter one support OpenCV 3.1? Thanks~

@AlexeyAB (Owner)

I mean that OpenCV 3.4.0 already contains Yolo v2 for CPU inside OpenCV. So you can just install OpenCV 3.4.0 without installing Darknet, and you can use the fastest version of Yolo v2 for CPU.

> Can the latter one support OpenCV 3.1?

Yolo v2 is built into OpenCV only since 3.4.0.

But this repository can be used with any OpenCV version.

@TaihuLight commented Mar 28, 2018

@wsyzzz @AlexeyAB
What is the difference in function and workflow between the shortcut layer and the route layer for residual connections?

@AlexeyAB (Owner)

@TaihuLight

  • The [route]-layer concatenates the values:

    • 1st input: 1, 2, 3
    • 2nd input: 4, 5, 6
    • output: 1, 2, 3, 4, 5, 6

  • The [shortcut]-layer adds (+) the values:

    • 1st input: 1, 2, 3
    • 2nd input: 4, 5, 6
    • output: 5, 7, 9
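In Darknet terms, the [shortcut] case boils down to an element-wise sum, roughly what shortcut_cpu() in blas.c does (a simplified sketch assuming equal shapes, not the real multi-shape code):

```c
/* Element-wise residual addition: out[i] += add[i].
 * Simplified sketch of a shortcut layer for equal input shapes. */
void shortcut_add(int n, const float *add, float *out)
{
    for (int i = 0; i < n; ++i)
        out[i] += add[i];
}
```

With out = {1, 2, 3} and add = {4, 5, 6} this yields {5, 7, 9}, while a route layer would instead output the six values {1, 2, 3, 4, 5, 6}.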

@TaihuLight commented Mar 28, 2018

@AlexeyAB Thank you. I think the SORT layer (#473) can be implemented by changing the functions in [shortcut] layers as follows:

The [SORT-shortcut] layer adds (+) and multiplies (*) the values:
1st input: 1, 2, 3
2nd input: 4, 5, 6
output: 5 + sqrt(1 * 4), 7 + sqrt(2 * 5), 9 + sqrt(3 * 6)


@AlexeyAB (Owner)

@TaihuLight

If [shortcut] is calculated as:

  • y = x + F(x) - forward
  • delta_x = delta_y + delta_F(x) - delta for back-propagation

then [SORT-shortcut] should be calculated as (https://arxiv.org/pdf/1703.06993.pdf):

  • y = x + F(x) + sqrt( ReLU(x) * ReLU(F(x)) + 0.0001 ) - forward
  • so how should the delta for back-propagation be calculated?

@TaihuLight commented Mar 28, 2018

@AlexeyAB
https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp
This is the SORT layer code implemented in Caffe by the author of the paper. I am studying it and hope it can help us.

@wsyzzz (Author) commented Mar 30, 2018

@AlexeyAB
Here is a small question from using the Yolo example. I used https://github.com/AlexeyAB/opencv.git and ran yolo_object_detection.cpp. It returned the error 'readNetFromDarknet' was not declared in this scope at line 43. I found this function in opencv/modules/dnn/include/opencv2/dnn/dnn.hpp at line 620, between CV_EXPORTS_W Importer and createCaffeImporter (I'm sorry, I don't know how to link to that line). But in the files I cloned, the same dnn.hpp doesn't have this function, and createCaffeImporter comes right after CV_EXPORTS_W Importer.

I can download dnn.hpp individually to update the file, but that may confuse some people. Could you try to figure it out? Thanks.

@AlexeyAB (Owner)

@wsyzzz
Just use the original OpenCV; I pulled Yolo v2 directly into OpenCV starting with 3.4.0.

If you want to use my repo, just switch to the branch dnn_darknet_yolo_v2: https://github.com/AlexeyAB/opencv/tree/dnn_darknet_yolo_v2
The contribution rules require that all pull requests be made from additional branches.

@fvlntn commented Apr 6, 2018

@AlexeyAB

If forward is y = x + F(x) + sqrt(ReLU(x)*ReLU(F(x)) + 0.01)

https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp says, for two inputs x1 / x2 (this is how it is implemented in Caffe, and it is symmetrical in x1 and x2):

[image: backward formulas from sort_layer.cpp]

@TaihuLight commented Apr 12, 2018

@ralek67 @AlexeyAB
What does negativeReLUSlope mean in your formula?
What does top_diff denote in https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp?

@fvlntn commented Apr 12, 2018

If ReLU is leaky:
y = x if x > 0
y = 0.01x if x < 0
then negativeReLUSlope = 0.01.

In Tiffany's SORT layer it is 0.

Forward: y = x + ???
Back: dx = dy + ???

top_diff = dy; it is the delta for backpropagation:
bottom_diff[i] = top_diff[i] * (1.0 + bottom_gradient_data[i] * (bottom_data[i] > 0));

In my formula it is the inverse: it gives dx1/dy = 1 + ...
If you then multiply by dy, you get the same formula as in Tiffany's code:
dx1 = dy * (1 + Gradient * (x1 > 0))
so by identification:
dx1 = bottom_diff
dx2 = bottom_diff_1
dy = top_diff

Hope that makes sense.
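Written out explicitly (my own derivation from the non-leaky forward formula, so treat it as a sketch rather than the paper's notation), the gradient that the Caffe code implements is:

```latex
% Forward (two inputs, non-leaky ReLU, shift \epsilon):
%   y = x_1 + x_2 + \sqrt{\mathrm{ReLU}(x_1)\,\mathrm{ReLU}(x_2) + \epsilon}
\frac{\partial y}{\partial x_1}
  = 1 + \underbrace{\frac{\mathrm{ReLU}(x_2)}
                         {2\sqrt{\mathrm{ReLU}(x_1)\,\mathrm{ReLU}(x_2) + \epsilon}}}_{\text{bottom\_gradient\_data}}
        \cdot [x_1 > 0],
\qquad
dx_1 = dy \cdot \frac{\partial y}{\partial x_1}
```

which matches bottom_diff[i] = top_diff[i] * (1 + bottom_gradient_data[i] * (bottom_data[i] > 0)) term by term.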

@AlexeyAB added the 'want enhancement' label and removed 'enhancement' on Apr 12, 2018
@TaihuLight commented Apr 16, 2018

@ralek67 Could you share the process of deriving your formula for the gradient?
https://stackoverflow.com/questions/44512126/how-to-calculate-gradients-in-resnet-architecture

For the forward pass, if y = x + F(x) + sqrt(ReLU(x)*ReLU(F(x)) + 0.001) and ReLU is leaky, SORT_shortcut can be implemented by replacing
out[out_index] += add[add_index];
with

float sqrt_shift = 0.001;
out[out_index] = out[out_index] + add[add_index] + sqrtf(max(sqrt_shift, out[out_index] * add[add_index] + sqrt_shift));

Is it correct? Is my understanding of the following code correct?

@ralek67 @AlexeyAB @wsyzzz 

@fvlntn commented Apr 19, 2018

I don't know how to add it in Darknet, tbh.
But just from reading it, your forward pass is wrong. You wrote:
out = out + add + sqrt(max(sqrt_shift, out*add + sqrt_shift))
(It's obviously wrong, since out*add is clearly positive after ReLU, so max(sqrt_shift, something_positive + sqrt_shift) always equals something_positive + sqrt_shift, and the whole thing could be simplified to sqrt(out*add + sqrt_shift), except if leaky.)

It should be:
out = out + add + sqrt(max(0, out) * max(0, add) + sqrt_shift)
in blas.c.

But according to the author of the paper, it was implemented like this:
https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp

Forward:
[image: forward pass code from sort_layer.cpp]

If you just read the Backward_cpu function, for instance, you get:
[image: backward pass code from sort_layer.cpp]

And that's exactly what the math says if ReLU isn't leaky:
[image: derivative formula]

The thing is, you keep talking about y = x + F(x) instead of two inputs y = x1 + x2.
If that's what you want: if
y = x + F(x) + sqrt(ReLU(x)*ReLU(F(x)) + sqrt_shift)
then:
[image: handwritten derivative]

But that's just 16-year-old student maths.

@TaihuLight commented Apr 20, 2018

(1) So the forward pass is y = x + F(x) + sqrt( x * F(x) + sqrt_shift ), with sqrt_shift = 0.001, and ReLU is linear for the shortcut layer in YOLOv3. If out denotes the input value x = state.net.layers[l.index].output, and add denotes F(x) = state.input (the output value of the previous layer), then SORT_shortcut can be implemented with

float sqrt_shift = 0.001;
out[out_index] = out[out_index] + add[add_index] + sqrt(max(0.0, out[out_index]) * max(0.0, add[add_index]) + sqrt_shift);

in the function shortcut_cpu() of blas.c.

@ralek67 Thank you very much! Although I have spent three days understanding the functions of the shortcut layer, I still do not know how to implement the backward pass!
(2) Then the gradient in the backward SORT-shortcut layer is calculated as follows; is it correct?
Besides, how should the backward pass of the shortcut layer be modified if the gradient is correct? I sincerely need your help.
[image: proposed gradient formula]

(3) Meanwhile, the forward [shortcut] is y = x + F(x) in YOLOv3; then how should the code of the backward pass, i.e. backward_shortcut_layer(), be understood?

  • delta_x = delta_y + delta_F(x) - delta for back-propagation
  • gradient_array() computes l.delta = l.delta * gradient(l.output, a), where gradient(l.output, a) is the derivative of the activation (linear for YOLOv3) in the shortcut layer, and l.delta is the sum of the incoming gradients??
  • axpy_cpu computes state.delta = state.delta + 1 * l.delta
  • shortcut_cpu computes state.net.layers[l.index].delta = state.net.layers[l.index].delta + l.delta

In the diagram above, E denotes the error at the output neuron; the gradient delta_x is the sum of the incoming gradient delta_y and the product of the gradients delta_y and delta_F.

(4) Then how should axpy_cpu() and shortcut_cpu() in the backward pass of the original shortcut layer be understood, according to the gradient computation shown in the diagram above?
@AlexeyAB @ralek67

void backward_shortcut_layer(const layer l, network_state state)
{
    gradient_array(l.output, l.outputs*l.batch, l.activation, l.delta);  // l.delta *= activation'(l.output)
    axpy_cpu(l.outputs*l.batch, 1, l.delta, 1, state.delta, 1);          // state.delta += l.delta (to the previous layer)
    shortcut_cpu(l.batch, l.out_w, l.out_h, l.out_c, l.delta, l.w, l.h, l.c, state.net.layers[l.index].delta); // dx = dx + dy (to the routed layer)
}
