
Add support for fused convolutions #2294

Merged
merged 58 commits into from
Oct 11, 2021

Conversation

arrufat
Contributor

@arrufat arrufat commented Feb 2, 2021

I've been playing a bit with the idea of having fused convolutions (convolution + batch_norm) in dlib.
I think the first step would be to move all the operations that are done by the affine_ layer into the convolution, that is, update the bias of the convolution and re-scale the filters.

This PR adds some helper methods that allow doing this. The next step could be adding a new layer that can be constructed from an affine_ layer and is a no-op, like the tag layers, or adding a version of the affine_ layer that does nothing (just outputs its input, without copying or anything). How would you approach this?

Finally, here's an example that uses a visitor to update the convolutions that are below an affine layer.
It can be built by putting the file into the examples folder and loading the pretrained ResNet-50 from dnn_introduction3_ex.cpp. If we manage to make something interesting out of it, maybe it would be worth including this visitor, too.

#include "resnet.h"

#include <dlib/dnn.h>
#include <dlib/image_io.h>

using namespace std;
using namespace dlib;

class visitor_fuse_convolutions
{
    public:
    template <typename T> void fuse_convolutions(T&) const
    {
        // do nothing for other layer types
    }

    // handle the standard case (convolutional layer followed by an affine layer)
    template <long nf, long nr, long nc, int sy, int sx, int py, int px, typename U, typename E>
    void fuse_convolutions(add_layer<affine_, add_layer<con_<nf, nr, nc, sy, sx, py, px>, U>, E>& l)
    {
        // get the parameters from the affine layer as alias_tensor_instance
        auto gamma = l.layer_details().get_gamma();
        auto beta = l.layer_details().get_beta();

        // get the convolution below the affine layer and its parameters
        auto& conv = l.subnet().layer_details();
        const long num_filters_out = conv.num_filters();
        const long num_rows = conv.nr();
        const long num_cols = conv.nc();
        tensor& params = conv.get_layer_params();
        // guess the number of input filters
        long num_filters_in;
        if (conv.bias_is_disabled())
            num_filters_in = params.size() / num_filters_out / num_rows / num_cols;
        else
            num_filters_in = (params.size() - num_filters_out) / num_filters_out / num_rows / num_cols;

        // set the new number of parameters for this convolution
        const size_t num_params = num_filters_in * num_filters_out * num_rows * num_cols + num_filters_out;
        alias_tensor filters(num_filters_out, num_filters_in, num_rows, num_cols);
        alias_tensor biases(1, num_filters_out);
        if (conv.bias_is_disabled())
        {
            conv.enable_bias();
            resizable_tensor new_params = params;
            new_params.set_size(num_params);
            biases(new_params, filters.size()) = 0;
            params = new_params;
        }

        // update the biases: the affine transform maps a bias b to gamma*b + beta,
        // so scale any existing bias by gamma before adding beta
        auto b = biases(params, filters.size());
        const matrix<float> fused_bias = pointwise_multiply(mat(b), mat(gamma)) + mat(beta);
        b = fused_bias;

        // rescale the filters
        DLIB_CASSERT(filters.num_samples() == gamma.k());
        auto t = filters(params, 0);
        float* f = t.host();
        const float* g = gamma.host();
        for (long n = 0; n < filters.num_samples(); ++n)
        {
            for (long k = 0; k < filters.k(); ++k)
            {
                for (long r = 0; r < filters.nr(); ++r)
                {
                    for (long c = 0; c < filters.nc(); ++c)
                    {
                        f[tensor_index(t, n, k, r, c)] *= g[n];
                    }
                }
            }
        }

        // reset the affine layer
        gamma = 1;
        beta = 0;
    }

    template <typename input_layer_type>
    void operator()(size_t , input_layer_type& l) const
    {
        // ignore other layers
    }

    template <typename T, typename U, typename E>
    void operator()(size_t , add_layer<T, U, E>& l)
    {
        fuse_convolutions(l);
    }
};

int main(const int argc, const char** argv)
try
{
    resnet::infer_50 net1, net2;
    std::vector<std::string> labels;
    deserialize("resnet50_1000_imagenet_classifier.dnn") >> net1 >> labels;
    net2 = net1;
    matrix<rgb_pixel> image;
    load_image(image, "elephant.jpg");

    const auto& label1 = labels[net1(image)];
    const auto& out1 = net1.subnet().get_output();
    resizable_tensor probs(out1);
    tt::softmax(probs, out1);
    cout << "pred1: " << label1 << " (" << max(mat(probs)) << ")" << endl;


    // fuse the convolutions in the network
    dlib::visit_layers_backwards(net2, visitor_fuse_convolutions());
    const auto& label2 = labels[net2(image)];
    const auto& out2 = net2.subnet().get_output();
    tt::softmax(probs, out2);
    cout << "pred2: " << label2 << " (" << max(mat(probs)) << ")" << endl;

    cout << "max abs difference: " << max(abs(mat(out1) - mat(out2))) << endl;
    DLIB_CASSERT(max(abs(mat(out1) - mat(out2))) < 1e-2);
}
catch (const exception& e)
{
    cout << e.what() << endl;
    return EXIT_FAILURE;
}

output with this image (elephant.jpg):

pred1: African_elephant (0.962677)
pred2: African_elephant (0.962623)
max abs difference: 0.00436211

UPDATE: make the visitor more generic and show results with a real image

@arrufat arrufat mentioned this pull request Feb 3, 2021
@pfeatherstone
Contributor

This looks good. Does it speed up performance? CPU and GPU?

@arrufat
Contributor Author

arrufat commented Feb 3, 2021

I didn't benchmark anything, but I guess it will be the same as without "fusing" the convolutions, since we are still calling the affine_ layer. Now we need a layer that can be constructed from an affine_ layer and is a no-op. Then the process could look like:

  1. train with network defined with bn_con_
  2. assign to the same network, but defined with affine_
  3. run the visitor_fuse_convolutions
  4. assign the modified network to another one defined with a no-op layer instead of affine_

If we could add a method like .disable() to the affine_ layer and make it a no-op, that would be great, since step 4 could be done in the visitor.

@pfeatherstone
Contributor

If we could add a method like .disable() to the affine_ layer and make it a no-op, that would be great, since step 4 could be done in the visitor.

I would do that. Just need to add a boolean flag to the affine_ layer. Need to make that flag serializable and bump the version of the layer when serializing while still supporting the old version (i.e. without the boolean flag) when deserializing. That's the easiest thing to do from a user perspective. They don't have to create a 3rd type of network and they don't need to care about how deserialization works.

@arrufat
Contributor Author

arrufat commented Feb 3, 2021

Yes, if that is possible, me too :)

@davisking
Owner

Yeah, I would go with a .disable() on the affine layer :)

@arrufat
Contributor Author

arrufat commented Feb 3, 2021

I've updated the affine_ layer to support the disabled mode.
In that example, I ran the image through both networks 1000 times, and these are the average timings:

layer     inference  fps  net size     VRAM
affine    4.115 ms   243  86.9994 MiB  594 MiB
disabled  3.829 ms   261  86.8261 MiB  588 MiB

So, some improvements that we can get for free :)

@pfeatherstone
Contributor

So 7% performance gain depending on which way you look at it. Nice work!

@pfeatherstone
Contributor

pfeatherstone commented Feb 3, 2021

Darknet FPS / dlib FPS ~ 70%, so it must be something else that's holding dlib back. But this is definitely a step closer.

@arrufat
Contributor Author

arrufat commented Feb 3, 2021

ResNet-50 has only 49 batch normalizations (YOLOv3 has 75 and YOLOv4 has 109). So I am expecting to gain a bit more on those models :)

@arrufat
Contributor Author

arrufat commented Feb 3, 2021

@davisking when you have time, let us know what you think about this PR, notably the naming of functions, etc.

Also, maybe that visitor could be more generic and accept the case where the batch norm has an fc layer as its input... but I am not sure it's worth it. In that case, the visitor should be named differently.

Ah, I put the visitor there because it depends directly on the affine_ layer (I tried to put it into the utilities header, but that didn't work).

@arrufat arrufat changed the title add helper methods to implement fused convolutions Add support for fused convolutions Feb 3, 2021
@pfeatherstone
Contributor

pfeatherstone commented Feb 3, 2021

It does make you wonder if there is any point in the affine_ layer anymore, unless people are using it during training too. My guess is they are using it for converting batch norm layers, in which case this visitor should always be used. So maybe the visitor should somehow be implicitly called when inferring with networks. Hmm.

@arrufat
Contributor Author

arrufat commented Feb 4, 2021

From the affine_ layer documentation, this layer does nothing (identity transform) unless initialized with a bn_ layer. And it has no trainable parameters, so there's no point in using it for training.

I guess it should be disabled by default, and then enabled once it's initialized with a bn_ layer.
That way, when users print the net, they will see "disabled" if the affine_ hasn't been initialized properly. Thoughts on this?

@arrufat
Contributor Author

arrufat commented Feb 4, 2021

After fixing the segfault, I now get this error when trying to do inference with a fused network that had bias disabled in its convolutions:

Error detected at line 258.
Error detected in file ../external/dlib/dlib/cuda/cudnn_dlibapi.cpp.
Error detected in function void dlib::cuda::add(float, dlib::tensor&, float, const dlib::tensor&).

Failing expression was (have_same_dimensions(src, dest) || (src.num_samples()==1 && src.k()==dest.k() && src.nr()==1 && src.nc()==1) || (src.num_samples()==1 && src.k()==dest.k() && src.nr()==dest.nr() && src.nc()==dest.nc()) || (src.num_samples()==1 && src.k()==1 && src.nr()==dest.nr() && src.nc()==dest.nc()) || (src.num_samples()==dest.num_samples() && src.k()==1 && src.nr()==1 && src.nc()==1)) && is_same_object(src,dest) == false.

         dest.num_samples(): 1
         dest.k():           64
         dest.nr():          112
         dest.nc():          112
         src.num_samples():  0
         src.k():            0
         src.nr():           0
         src.nc():           0

So I am guessing there are some other parameters that should be modified elsewhere in the network... I'll keep digging.

EDIT: I found the problem: the biases alias tensor was only set up if the bias was enabled, so I have to set it up when enabling the bias. Either that, or set it up no matter what.

@davisking
Owner

Sweet. Can't look today. Might look tomorrow. Not sure.

@arrufat
Contributor Author

arrufat commented Sep 5, 2021

No hurries :) Thank you!

@arrufat
Contributor Author

arrufat commented Sep 6, 2021

Update: I tried fusing the weights of another network (VoVNet-based) I had already trained, which already had bias disabled in its convolutions.
The inference speed increased by around 7% as expected, and the VRAM usage decreased as well, in contrast to the YOLOv3 model... I don't know what you're going to do with this information, but here it is anyway :)

@davisking
Owner

From the affine_ layer documentation, this layer does nothing (identity transform) unless initialized with a bn_ layer. And it has no trainable parameters, so there's no point in using it for training.

I guess it should be disabled by default, and then enabled once it's initialized with a bn_ layer.
That way, when the user couts the net, it will see disabled if the affine_ hasn't been initialized properly. Thoughts on this?

Yeah, it defaults to an identity transform. So nothing wrong with having it default to not actually doing anything until it gets assigned. Not sure how I missed this comment and didn't reply until now :/

@davisking
Owner

I'm severely tardy in looking at this PR too :( Should I look now or are you editing it again?

@arrufat
Contributor Author

arrufat commented Sep 25, 2021

There's no hurry, don't feel any pressure :)
You can have a look now, I think it's ready (it has worked on all the networks I've tried).

@arrufat
Contributor Author

arrufat commented Sep 25, 2021

The functionality is already there, but I should probably add some tests for networks that have convolutions with and without biases.
I mean, it works, but there are no tests.
I should also probably update an example program (of your choosing) to showcase this new functionality. Having the extra 7% speedup is quite nice :)

Owner

@davisking davisking left a comment


Yeah this looks cool. I left some comments. Totally add some tests too :)

@arrufat
Contributor Author

arrufat commented Sep 26, 2021

If the tests pass, then it's ready :)

Owner

@davisking davisking left a comment


This is nice :)

@davisking davisking merged commit adca747 into davisking:master Oct 11, 2021
@facug91
Contributor

facug91 commented Aug 26, 2022

I was having a problem on master when running the same example code I used in #2655, using the Docker image nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04. When I executed the code, I got the following exception:

Error while calling cudnnConvolutionBiasActivationForward( context(), &alpha1, descriptor(data), data.device(), (const cudnnFilterDescriptor_t)filter_handle, filters.device(), (const cudnnConvolutionDescriptor_t)conv_handle, (cudnnConvolutionFwdAlgo_t)forward_algo, forward_workspace, forward_workspace_size_in_bytes, &alpha2, out_desc, out, descriptor(biases), biases.device(), identity_activation_descriptor(), out_desc, out) in file /home/user/dlib/dlib/cuda/cudnn_dlibapi.cpp:1219. code: 9, reason: CUDNN_STATUS_NOT_SUPPORTED

I found that this specific commit is the one causing the problem; testing with the commit before this one, it works.

@arrufat
Contributor Author

arrufat commented Aug 26, 2022

Oh, that's odd, I've been using this a lot without issues. Can you give more details? Maybe it's related to the CUDA version? I've only ever tried it with CUDA 11, I think.

Edit: I just saw the issue.

@facug91
Contributor

facug91 commented Aug 26, 2022

I should read the code, but it seems to me that the problem might be with cuDNN 7.
