Add support for fused convolutions #2294
Conversation
This looks good. Does it speed up performance? CPU and GPU?
I didn't benchmark anything, but I guess it will be the same as without "fusing" the convolutions, since we are still calling the affine layer.
If we could add a method like .disable() to the affine layer...
I would do that. Just need to add a boolean flag to the affine layer.
Yes, if that is possible, me too :)
Yeah, I would go with a .disable() on the affine layer :)
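To make the idea concrete, here is a minimal sketch of what a disable-able affine-style layer could look like; this is not dlib's actual affine_ implementation, just the shape of the boolean flag being discussed:

```cpp
#include <vector>

// Illustrative only: a simplified affine-style layer with a disable() switch.
// dlib's real affine_ layer has a different interface; this just shows how a
// boolean flag can turn the forward pass into a plain copy (an identity).
class toy_affine
{
public:
    void disable() { enabled = false; }

    // y = gamma * x + beta when enabled, otherwise y = x
    void forward(const std::vector<float>& x, std::vector<float>& y) const
    {
        y = x;
        if (!enabled)
            return;  // the convolution below already absorbed gamma/beta
        for (auto& v : y)
            v = gamma * v + beta;
    }

private:
    bool enabled = true;
    float gamma = 1.0f;  // scale learned by the batch norm
    float beta  = 0.0f;  // shift learned by the batch norm
};
```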
I've updated the PR.
So, some improvements that we can get for free :)
So 7% performance gain depending on which way you look at it. Nice work!
Darknet FPS / dlib FPS ~ 70%. So it must be something else that's holding dlib back. But this is definitely a step closer.
ResNet-50 has only 49 batch normalizations (YOLOv3 has 75 and YOLOv4 has 109). So I am expecting to gain a bit more on those models :)
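For context on how these FPS figures can be obtained, here is one way to time the forward pass of a dlib network before and after fusing; the network and image types are whatever the model in question uses, and the helper name is just for this sketch:

```cpp
#include <chrono>

// Rough throughput measurement: runs `iters` forward passes and returns
// frames per second. Works with any dlib inference network whose operator()
// accepts a single image.
template <typename net_type, typename image_type>
double measure_fps(net_type& net, const image_type& img, int iters = 100)
{
    using clock = std::chrono::steady_clock;
    net(img);  // warm-up pass so one-time allocations don't skew the timing
    const auto t0 = clock::now();
    for (int i = 0; i < iters; ++i)
        net(img);
    const auto t1 = clock::now();
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return iters / seconds;
}
```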
@davisking when you have time, let us know what you think about this PR, notably the naming of functions, etc. Also, maybe that visitor could be more generic and accept the case where the batch norm has an fc layer as its input... But I am not sure it's worth it... In that case, the visitor should be named differently. Ah, I put the visitor there because it depends directly on the affine and convolution layers.
It does make you wonder if there is any point in the affine layer anymore. Unless people are using it during training too. My guess is they are using it for converting batchnorm layers. In which case, this visitor should always be used. So maybe the visitor should somehow be implicitly called when inferring networks. Hmm
I guess it should be disabled by default, and then enabled once it's initialized with a batch norm layer.
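To make the intended workflow concrete, this is roughly how the explicit version could look from user code. The tiny network type is only for illustration, and fuse_conv_affine_layers is a placeholder name for the visitor discussed in this PR, not an existing dlib function:

```cpp
#include <dlib/dnn.h>

// Toy deployment network: in a real model the affine_ layers are what the
// bn_ layers become when the trained network is converted for inference.
using net_type = dlib::loss_multiclass_log<
    dlib::fc<10, dlib::avg_pool_everything<
    dlib::relu<dlib::affine<dlib::con<16, 3, 3, 1, 1,
    dlib::input<dlib::matrix<dlib::rgb_pixel>>>>>>>>;

int main()
{
    net_type net;
    // dlib::deserialize("model.dnn") >> net;  // load trained weights here

    // Placeholder for the visitor discussed in this PR: it would fold each
    // affine_'s gamma/beta into the convolution directly below it and then
    // disable the affine_ so it behaves as an identity at inference time.
    // fuse_conv_affine_layers(net);
}
```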
After fixing the segfault, I now get an error when trying to do inference in a fused network that had bias disabled in convolutions.
So I am guessing there are some other parameters that should be modified elsewhere in the network... I'll keep digging. EDIT: I found the problem.
Sweet. Can't look today. Might look tomorrow. Not sure.
No hurries :) Thank you!
Update, I tried fusing the weights of another network (VoVNet based) I had already trained, which already had duplicative bias disabled.
Yeah, it defaults to an identity transform, so nothing is wrong with having it not actually do anything until it gets assigned. Not sure how I missed this comment and didn't reply until now :/
I'm severely tardy in looking at this PR too :( Should I look now or are you editing it again?
There's no hurry, don't feel any pressure :)
The functionality is there already; I should probably add some tests for networks that have convolutions with and without biases.
Yeah this looks cool. I left some comments. Totally add some tests too :)
Co-authored-by: Davis E. King <davis@dlib.net>
If the tests pass, then it's ready :)
This is nice :)
I was having a problem, and I found that this specific commit is the one causing it. Testing with the commit before this one, it works.
Oh, that's odd, I've been using this a lot, without issues. Can you give more details? Maybe it's related to CUDA? I've only ever tried it with CUDA 11, I think. Edit: I just saw the issue.
I should read the code, but it seems to me that the problem might be with cuDNN 7.
I've been playing a bit with the idea of having fused convolutions (convolution + batch norm) in dlib.
I think the first step would be to move all the operations that are done by the affine_ layer into the convolution, that is, update the bias of the convolution and re-scale the filters (the arithmetic is sketched below). This PR adds some helper methods that allow doing this. The next step could be adding a new layer that can be constructed from an affine_ layer and is a no-op, like the tag layers, or adding a version of the affine layer that does nothing (just outputs its input, without copying anything). How would you approach this?
Finally, here's an example that uses a visitor to update the convolutions that are below an affine layer.
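To make "update the bias and re-scale the filters" concrete, here is a minimal sketch of the arithmetic, written against plain std::vector rather than dlib tensors; the names are illustrative, not the helper methods added by this PR:

```cpp
#include <cstddef>
#include <vector>

// An affine_ layer that came from a batch norm computes y = gamma * x + beta
// per output channel. Folding it into the convolution below means
//     W'_k = gamma_k * W_k              (re-scale every weight of filter k)
//     b'_k = gamma_k * b_k + beta_k     (update the bias of filter k)
// so that conv(x, W', b') == affine(conv(x, W, b)).
struct conv_params
{
    std::vector<std::vector<float>> filters;  // filters[k]: weights of filter k
    std::vector<float> biases;                // biases[k]: zero if bias was disabled
};

void fuse_affine_into_conv(conv_params& conv,
                           const std::vector<float>& gamma,
                           const std::vector<float>& beta)
{
    for (std::size_t k = 0; k < conv.filters.size(); ++k)
    {
        for (auto& w : conv.filters[k])
            w *= gamma[k];                                      // re-scale filters
        conv.biases[k] = gamma[k] * conv.biases[k] + beta[k];   // update bias
    }
}
```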
The example can be built by putting the file into the examples folder and loading the pretrained ResNet-50 from dnn_introduction3_ex.cpp. If we manage to make something interesting out of it, maybe it would be interesting to have this visitor in dlib, too. Output with this image (elephant.jpg):
UPDATE: made the visitor more generic and showed results with a real image.