
border_mode='same' and/or custom zero padding for convolutions #2118

Closed
2 of 3 tasks
benanne opened this issue Sep 18, 2014 · 19 comments

@benanne
Contributor

benanne commented Sep 18, 2014

Several convolution implementations (cuda-convnet, cudnn, ...) provide a way to implicitly pad the input with a border of zeros. This mechanism is then used to implement the various 'border modes' (i.e. full, valid, same).

Currently, Theano's theano.tensor.nnet.conv2d only allows setting the border_mode, and as far as I can tell all current implementations only support valid and full, but not same.

The main advantage of border_mode='same' is that the output feature maps are the same size as the input, so this makes it a lot easier to design architectures with lots of stacked layers.

Implementing a same convolution in Theano is of course possible by doing the padding manually and then using a valid convolution, or by using a full convolution and then slicing the output. But neither of these methods is optimal.
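For illustration, a minimal sketch of the pad-then-valid variant (the helper name is made up here; it assumes stride 1, odd filter sizes and the usual 4D (batch, channels, height, width) layout):

import theano.tensor as T
from theano.tensor.nnet import conv2d

def same_conv_via_valid(x, w):
    # Zero-pad the input by half the filter size on each side, then run
    # a 'valid' convolution; with odd filter sizes the output spatial
    # dimensions match the input's.
    ph = (w.shape[2] - 1) // 2
    pw = (w.shape[3] - 1) // 2
    padded = T.zeros((x.shape[0], x.shape[1],
                      x.shape[2] + 2 * ph, x.shape[3] + 2 * pw))
    padded = T.set_subtensor(
        padded[:, :, ph:ph + x.shape[2], pw:pw + x.shape[3]], x)
    return conv2d(padded, w, border_mode='valid')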

Considering that same convolutions are pretty commonly used now in state of the art networks (Krizhevsky's 2012 ImageNet entry, VGG, GoogLeNet, ...), it would be useful to have proper support for this in Theano as well.

The easiest way to achieve this would be to make theano.tensor.nnet.conv2d support border_mode='same'. The most flexible way would be to explicitly expose a pad argument, like cuda-convnet and cudnn. The most convenient way would be to support both :)

I don't really have any code to contribute unfortunately, but I wanted to bring this up in case someone else is interested in this and wants to work on it.

TODO:

@madisonmay
Contributor

👍

Agreed that this would be an excellent addition to Theano in light of how common 'same' convolutions are in recent papers.

@nouiz
Member

nouiz commented Sep 19, 2014

I agree it would be good, but I don't have the time. If you want to use just the GPU version, I can describe how to do it quickly: on the CPU, just disable the C code for the cases it doesn't support.

If you use GpuCorrMM directly, does it support it? The parameter is there, but I forgot whether the implementation is finished for it. If that is the implementation you want to use, it would be faster to just finish it if it isn't.


@benanne
Contributor Author

benanne commented Sep 19, 2014

Ideally it would be available in all implementations so I can use whatever is fastest :) I don't think I have the skills to implement this on the GPU (or the time to learn them, to be honest), which is why I made this issue instead of a pull request. But now it's here, so hopefully someone with the same needs and the right skill set will see it!

@abergeron
Member

Actually, if you want to do it for GpuDnnConv, you would have the skill for it :) It's just a matter of computing the right amount of padding to pass in the descriptor.

@benanne
Contributor Author

benanne commented Sep 23, 2014

That would mean I have to use GpuDnnConv directly though, right? I wouldn't be able to rely on the optimization to swap it in because theano.tensor.nnet.conv2d and GpuConv don't support border_mode='same'.

I like how I can currently try every implementation (legacy, cudnn, conv_gemm and conv_fft) just by setting a flag (well, except cuda-convnet), so in the long term I'd love to see this implemented everywhere. But I guess it's a good first step, I'll look into it if I find some time.

I believe I saw that conv_gemm already supports something similar, but it's called half instead of same. I guess this actually makes a little more sense semantically, since it's not possible to do a same convolution if the filter size is even (or it would be asymmetric).

So there are a number of ways this could be implemented:

  • Theano could complain if same is used with an even filter size
  • ... or it could just pad asymmetrically, but I guess some implementations (like cudnn) would not support this
  • or we could just call it half everywhere instead of same and then it's unambiguous. Of course in that case it's up to the user to realize that with even filter sizes, the output dimensionality will not be the same as the input dimensionality (it will differ by 1).

I guess the latter is probably the cleanest approach, but same is also used by numpy (http://docs.scipy.org/doc/numpy/reference/generated/numpy.convolve.html), so it would be inconsistent with that. Not really an issue for me personally but I thought I'd mention this.
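To make the even/odd distinction concrete, here is the plain shape arithmetic (illustrative Python, not Theano code):

# With input size n, filter size f and symmetric padding p = f // 2 ('half'),
# the output size is n + 2 * (f // 2) - f + 1.
for n, f in [(10, 3), (10, 4)]:
    out = n + 2 * (f // 2) - f + 1
    print(n, f, out)  # (10, 3) -> 10: same as the input; (10, 4) -> 11: off by one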

@nouiz
Member

nouiz commented Sep 24, 2014

Just a quick comment: same is used by the scipy convolution, so we must have the same behavior as it. If it happens that half is the same, I think we should rename half to same. If they are not the same, we could support both, depending on the need and people's time to implement them :)

I would also like to have a good "conv2d() + opt" interface. But it will need some work on conv2d (implement same/half/pad and implement the subsample in a simple graph). No time for this in the short term.

@f0k
Contributor

f0k commented Oct 7, 2014

Several convolution implementations (cuda-convnet, cudnn, ...) provide a way to implicitly pad the input with a border of zeros. This mechanism is then used to implement the various 'border modes' (i.e. full, valid, same).

Some implementations include a different algorithm for border_mode='full', though, because the gradient wrt. input of a valid convolution is a full convolution. In the Pylearn2 wrapper of cuda-convnet, this is ImgActs; in Theano's wrapper of the caffe convolution, it is GpuCorrMM_gradInputs. Even if they support custom zero-padding as an additional feature, using it for a full convolution in the forward pass is a lot slower than using their implementation for the gradient wrt. inputs. (This is something to be taken care of by the graph optimizer, but we need to ensure the information is there to do it.)
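A quick shape check of that identity (plain arithmetic, illustrative only):

# 'valid' convolution: input n, filter f -> output n - f + 1.
# A 'full' convolution of that output restores the input size,
# which is exactly the shape the gradient wrt. inputs must have:
n, f = 10, 3
out_valid = n - f + 1          # 8
back_full = out_valid + f - 1  # 10 == n
assert back_full == n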

Now a general problem is that border_mode and pad are redundant, with the latter being more expressive. cuda-convnet, cuDNN and the caffe convolution only support padding and no border_mode. For the caffe convolution wrapper, I made pad support integers, integer tuples and string constants ("full" and "half"), and basically dropped support for border_mode (it's still there as a parameter for clarity, but it only supports "valid"). This exposes the full feature set of the convolution implementations we have at hand, and I guess the cleanest way would be changing the conv2d interface to support pad and to remove border_mode. For compatibility reasons and consistency with numpy/scipy, this will not happen, but we could either add pad and use border_mode only as a deprecated way of setting pad, or we could have border_mode support integers, integer tuples, and string constants ("valid", "full" and "half", maybe "same" if somebody feels like implementing it).

I would also like to have a good "conv2d() + opt" interface. But it will need some work on conv2d (implement same/half/pad and implement the subsample in a simple graph). No time for this in the short term.

I think it's not necessary to have the legacy ConvOp support everything (at least not the GPU version). It could just serve as a vehicle to hold the parameters, hope to be replaced by another implementation and bail out at runtime if it finds that it was left in the graph with an unsupported combination of parameters.

@abergeron, @nouiz: What do you think about extending the conv2d() interface to support:

  • border_mode="valid": padding of (0, 0)
  • border_mode="full": padding of (kh-1, kw-1)
  • border_mode="half": padding of (kh//2, kw//2)
  • border_mode=n: padding of (n,n)
  • border_mode=(n,m): padding of (n,m)

The latter two options abuse border_mode for custom padding, but as I said above, replacing border_mode by pad is not feasible for compatibility reasons and supporting both border_mode and pad is redundant (what to do if both are given?).
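As a sketch of the proposed semantics (a hypothetical helper, not actual Theano code):

def border_mode_to_pad(border_mode, kh, kw):
    # Map the proposed border_mode values to an explicit (pad_h, pad_w).
    if border_mode == 'valid':
        return (0, 0)
    elif border_mode == 'full':
        return (kh - 1, kw - 1)
    elif border_mode == 'half':
        return (kh // 2, kw // 2)
    elif isinstance(border_mode, int):
        return (border_mode, border_mode)
    elif isinstance(border_mode, tuple):
        return border_mode
    raise ValueError("invalid border_mode: %r" % (border_mode,))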

I feel reasonably acquainted with the code to work on that (not now, but next month).

@nouiz
Member

nouiz commented Oct 9, 2014

I'm OK with this extension of border_mode. I agree we need to keep compatibility with the numpy semantics, but we can extend them. We can note in the documentation that other systems call it pad.

I'll probably keep the pad parameter in the ops that use it, to not break users. Just make sure that if both pad and border_mode are provided, they are equivalent; otherwise raise an error.
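A sketch of that consistency check (hypothetical code, reusing the border_mode_to_pad sketch from the comment above):

# Hypothetical validation in an op that accepts both parameters, sketch only:
if pad is not None and border_mode is not None:
    if border_mode_to_pad(border_mode, kh, kw) != tuple(pad):
        raise ValueError("pad and border_mode are not equivalent")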

@f0k
Contributor

f0k commented Oct 9, 2014

I'll probably keep the pad parameter in the ops that use it, to not break users.

Yes, I think the different wrappers of existing convolution implementations should provide whatever their implementations offer and not be forced to mimic the conv2d() interface. The graph optimizers can deal with how to call a particular wrapper to replace a particular ConvOp instance.

@jwingit

jwingit commented Sep 11, 2015

A simple way to emulate border_mode='same' is to access a subset of the conv2d output. E.g., for a square filter with odd size you could define this function:

import theano.tensor as T
from theano.tensor.nnet import conv2d

def conv2D_border_mode_same(x, w):
    fso = (T.shape(w)[2] - 1) // 2  # half of (filter size - 1), to center the crop
    n = T.shape(x)[2]               # spatial size of the (square) input
    return conv2d(x, w, border_mode='full')[:, :, fso:n + fso, fso:n + fso]

@nouiz
Member

nouiz commented Sep 12, 2015

Yes, but that triggers extra computation; it would be better not to do that. Still, it gives a quick workaround that is useful until this gets done. There is a PR with a new CPU convolution that has custom padding. We already have that for cudnn.


@nouiz
Member

nouiz commented Feb 24, 2017

Custom border modes have been done for some time already. We probably won't implement border_mode='same', so closing.

@nouiz nouiz closed this as completed Feb 24, 2017
@ragavvenkatesan

ragavvenkatesan commented Mar 7, 2017

if border_mode == 'same':  # this is used typically by VGG.
    # `image_shape`, `filter_shape`, `input`, `filters` and `subsample`
    # come from the surrounding layer class.
    _out_height = image_shape[2]
    _out_width = image_shape[3]
    x = (filter_shape[2] - 1) // 2
    y = (filter_shape[3] - 1) // 2
    border_mode = (x, y)

    from theano.sandbox import cuda
    if not cuda.dnn.dnn_available():
        raise Exception("cuDNN is needed for this type of convolution.")
    # dnn_conv defaults to subsample=(1, 1), so both cases collapse into one call.
    self.out = cuda.dnn.dnn_conv(
        img=input,
        kerns=filters,
        border_mode=border_mode,
        subsample=subsample,
        conv_mode='cross',
    )

Here is how I do this at the moment, but with Theano 0.10.0 the cuda backend is going to be removed. I wonder if there is a reason why same won't be supported in Theano using the libgpuarray backend?

@nouiz
Member

nouiz commented Mar 8, 2017 via email

@f0k
Contributor

f0k commented Mar 8, 2017

Here is how I do this at the moment

Looks like you'd just want to use border_mode='half': http://deeplearning.net/software/theano/library/tensor/nnet/conv.html#theano.tensor.nnet.conv2d
There is no implementation of border_mode='same' since that would require asymmetric padding for even filter sizes. For odd filter sizes, border_mode='half' does exactly what you want. Sorry if the naming is confusing; I guess that was based on my suggestion, meant to make clear that it does symmetric padding with half the filter size.
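For example (assuming an odd filter size, here 3x3):

import theano.tensor as T
from theano.tensor.nnet import conv2d

x = T.tensor4('x')  # (batch, channels, height, width)
w = T.tensor4('w')  # (num_filters, channels, 3, 3)
y = conv2d(x, w, border_mode='half')  # output height/width match the input's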

@rjpg

rjpg commented May 13, 2018

In the "same" mode (zero padding) it would be nice to have something like "same donut" and "same cylinder" to choose "top/bottom" ("cylinder-vertical") of "side to side" ("cylinder-horizontal") and instead of zeros it would "roll" the extremes of the images ... For example the top zeros would be the values on the bottom of the image. This would be nice in multivariables time-series problems , where I have better results than using LSTM ... (Instead of a CNN looking to 2D images it looks into a 2D map of several values through time ). The kernel map (CNN "neuron") instead of making correlations with "zeros", while working on the edge of the multivariable time-serie map, it would be nice to make correlations with with the variables from top to bottom...

Well it is difficult to explain ... Where can I see the source code of keras tensorflow where they put zeros around the image ? so I can make a new version for me to put other things instead of zeros ?

@lamblin
Member

lamblin commented May 14, 2018

#5827 is about that feature request.
It is unlikely we will work on integrating this in the interface, but you could probably do something with subtensors and concatenation to emulate it.
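A minimal sketch of that emulation (the helper name is made up; it wraps along the height axis only, matching the "horizontal cylinder" case, and assumes pad is a plain Python int):

import theano.tensor as T

def wrap_pad_rows(x, pad):
    # Circularly pad a (batch, channels, height, width) tensor along the
    # height axis: the last `pad` rows wrap to the top, the first `pad`
    # rows wrap to the bottom. Follow with a 'valid' convolution.
    top = x[:, :, -pad:, :]
    bottom = x[:, :, :pad, :]
    return T.concatenate([top, x, bottom], axis=2)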

@f0k
Contributor

f0k commented May 25, 2018

Where can I see the source code of keras tensorflow where they put zeros around the image ?

On CUDA GPUs, the actual implementation is usually in cuDNN, which is a closed-source library provided by Nvidia. Libraries will usually have fall-back implementations that you could modify (often based on im2col and gemm), but they are a bit slower. The easiest workaround will be to pad your network input (with wraparound) and then use unpadded ("valid") convolutions throughout.

@rjpg

rjpg commented May 25, 2018

Yes, nice idea... I will build a custom layer using the Keras Lambda function to transform the maps: it will add lines at the top and bottom of each feature map with the values from the bottom and top lines of that map, creating a horizontal cylinder, sized according to the filter used in the next layer (half of it...). Nice idea, thanks!

This way the kernel (CNN neuron) will travel over the feature maps like in the Asteroids game :-) what goes off the top appears at the bottom :-) In a multivariate time series it will correlate all variables with all the others, and not with zeros (the ones at the top and bottom).
