
Unshared convolution #6007

Closed · wants to merge 42 commits

Conversation

@vikramnitin9 commented Jun 4, 2017:

Related to #5620 .

Task list:

  • C code for grad inputs and tests

  • All tests need to pass.

  • Change weight tensor dimension ordering instead of dimshuffle

  • Integrate unshared tests with AbstractConv

  • GPU code

@gvtulder left a comment:

Thanks for starting the pull request. I know you're not finished yet, but I left some first comments.

@@ -1507,6 +1526,13 @@ def conv(self, img, kern, mode="valid", dilation=1):
out_shape = get_conv_output_shape(img.shape, kern.shape,
mode, [1] * self.convdim, dilation)

out_rows = out_shape[self.convdim]
out_cols = out_shape[self.convdim+1]
@gvtulder:

This only works for 2D, so make sure you check that somewhere if you don't want to implement it now (the rest of the function does 3D as well).

I also don't think self.convdim is the best thing to use here; it doesn't really make sense. It's just a coincidence that batch and output channels use exactly self.convdim dimensions, so I'd just use 2.

Even better: you could make this whole first part work in a convdim-agnostic way, by checking if kern.shape[2] == np.prod(out_shape[2:]).

out_rows = out_shape[self.convdim]
out_cols = out_shape[self.convdim+1]

if unshared == True:
@gvtulder:

Do you have a specific reason for explicitly checking for == True here? (See also unshared == False below.)

else:
for reg in xrange(kern.shape[1]):
row = reg // out_cols
col = reg - row*out_cols
@gvtulder:

Perhaps col = reg % out_cols would be nicer?

if self.convdim == 3:
raise NotImplementedError("Unshared 3D convolution not implemented")
elif kern.type.ndim != 3 + self.convdim:
raise TypeError('kern must be %dD tensor' % (3 + self.convdim))
@gvtulder, Jun 5, 2017:

I would swap this: explicitly check that self.convdim == 2 and raise an error if it is not. The AbstractConv class works for any convdim, even though it's only currently used for 2D and 3D.

If true, then unshared or 'locally connected' convolution will be
performed. A different kernel will be used for each region of the
input.

@gvtulder:

I think it would be useful to change the documentation for filters as well, to explain how you should pass the unshared weights.

@gvtulder commented Jun 5, 2017:

What will the shape of the filter parameter look like for unshared convolution? From your Python implementation, it seems that you combine the output_rows * output_cols filters in one extra dimension, i.e., (output channels, input channels, output rows * output cols, filter rows, filter columns). Is this the shape we want?

For comparison: Lasagne uses two additional dimensions, i.e., (output channels, input channels, output rows, output cols, filter rows, filter columns). I think I would find that easier to work with.

@vikramnitin9 (author):

The method I've used to compute the gradients wrt inputs seems very messy. In order to express it as a convolution, the weight matrix had to be rearranged quite a bit. Is there a better way?

@vikramnitin9 changed the title from "Unshared convolution python code" to "Unshared convolutio" on Jun 8, 2017
@vikramnitin9 changed the title from "Unshared convolutio" to "Unshared convolution" on Jun 8, 2017
@@ -1565,6 +1565,7 @@ def conv(self, img, kern, mode="valid", dilation=1, unshared=False):
if unshared is True:
for row in xrange(kern.shape[2]):
for col in xrange(kern.shape[3]):
val = _valfrommode('valid')
out[b, n, row, col] += _convolve2d(img[b, im0, row:row + kern.shape[4],
col:col + kern.shape[5]],
dilated_kern[n, im0, row, col, ...],
@gvtulder:

Is convolve2d necessary here? Since the image part you're extracting has the same size as the kernel, isn't this equal to an elementwise multiplication and summation? (Perhaps with a flip somewhere.)

@vikramnitin9 (author):

I'd retained the convolve2d because it would eliminate two more loops, and because otherwise I would have to flip the filter manually. But I've made it element-wise now.

else:
kern[:, :, i, j, m, m] = kern[:, :, i + m, j + n, m, n]
new_kern[:, :, i, j, m, n] = kern[:, :, i + m, j + n, m, n]
@gvtulder:

Does this work? It's somewhat complex, so I might have made a mistake in my manual derivations, but I think this is missing the first part of the kernel and shifting everything else too far to the left.

In a 1D case, with a 5-pixel input and 3-pixel kernel, I think you want the kernel to be changed like this:

kern = [[a, b, c], [d, e, f], [g, h, i]]
new_kern = [[_, _, a], [_, b, d], [c, e, g], [f, h, _], [i, _, _]]

but this implementation would change it to this, I think:

new_kern = [[a, b, c], [d, e, _], [g, _, _], [_, _, _], [_, _, _]]
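
To make the intended rearrangement concrete, here is a small illustrative script (hypothetical helper, not code from this PR) that builds the first new_kern above for the 1D valid-mode case, with zeros standing in for the unused slots:

    import numpy as np

    def rearrange_1d(kern):
        # kern: (output positions, taps); returns (input pixels, taps),
        # where row p collects the weights that touch input pixel p
        out_pos, K = kern.shape
        new_kern = np.zeros((out_pos + K - 1, K), dtype=kern.dtype)
        for p in range(out_pos + K - 1):      # input pixel
            for j in range(K):                # slot in the rearranged row
                o = p - (K - 1) + j           # output position touching p
                if 0 <= o < out_pos:
                    new_kern[p, j] = kern[o, (K - 1) - j]
        return new_kern

    kern = np.arange(1, 10).reshape(3, 3)  # [[a, b, c], [d, e, f], [g, h, i]]
    print(rearrange_1d(kern))
    # rows: [_, _, a], [_, b, d], [c, e, g], [f, h, _], [i, _, _] (with _ = 0)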

@gvtulder:

> The method I've used to compute the gradients wrt inputs seems very messy. In order to express it as a convolution, the weight matrix had to be rearranged quite a bit. Is there a better way?

Maybe it's better to implement the gradient computation directly in Python, without using convolve2d?

I guess the reason to use convolve2d is that it can speed things up in the shared case, because it's really doing a series of convolutions. In the unshared case you have to call it separately for every pixel, so convolve2d might not be much faster than an elementwise multiplication. For the gradient wrt inputs you have to do the same, but you would need a tricky reorganisation of the kernel matrix first. At that point, convolve2d may be more trouble than it's worth.

I would suggest making a new helper function for unshared grad wrt inputs in BaseAbstractConv, reusing a little bit of code from conv, and using that in the gradInputs perform. I think that would be simpler than trying to reorganise the kernel so you can do a forward convolution.

What do you think?

if (kern.shape[2], kern.shape[3]) != (out_shape[2], out_shape[3]):
raise ValueError('Kernel shape ({},{}) does not match '
'output size ({},{})'.format(kern.shape[2], kern.shape[3],
out_shape[2], out_shape[3]))

out = np.zeros(out_shape, dtype=img.dtype)
dil_kern_shp = kern.shape[:-self.convdim] + tuple(
@gvtulder:

A few lines below, I think you might need to change this assignment of dilated_kern:

dilated_kern[(slice(None), slice(None)) +
             tuple(slice(None, None, dilation[i]) for i in range(self.convdim))
             ] = kern

because it doesn't include the row/column dimensions in an unshared convolution.

@@ -1022,7 +1044,8 @@ def conv2d_grad_wrt_weights(input,
border_mode=border_mode,
subsample=subsample,
filter_flip=filter_flip,
filter_dilation=filter_dilation)
filter_dilation=filter_dilation,
unshared=False)
@gvtulder:

I think you want unshared=unshared here.

@vikramnitin9 (author):

I'd actually added this locally and forgotten to commit it. Fixed now, thanks.

Ditto here :)

@gvtulder left a comment:

Thanks for adding the tests. The forward convolution seems to work fine, good!

There still seems to be a small problem in the grad wrt weights. The test passes, but when I tried to test the value of the result it didn't work (see the test below).

grad_weights = conv.conv2d_grad_wrt_weights(inputs, topgrad,
self.kshp, unshared=True)
grad_func = theano.function([inputs, topgrad], grad_weights, mode=self.mode)
grad_val = grad_func(inputs_val, topgrad_val)
@gvtulder, Jun 11, 2017:

Can I suggest a small extension of this test that also checks the result?

    def test_gradweight(self):
        inputs = theano.tensor.tensor4()
        topgrad = theano.tensor.tensor4()

        inputs_val = np.random.random(self.imshp).astype(theano.config.floatX)
        topgrad_val = np.random.random(self.topgrad_shape).astype(theano.config.floatX)

        # compute gradient for unshared convolution
        grad_weights_unshared = conv.conv2d_grad_wrt_weights(inputs, topgrad,
                                                             self.kshp, unshared=True)
        grad_func_unshared = theano.function([inputs, topgrad], grad_weights_unshared, mode=self.mode)
        grad_val_unshared = grad_func_unshared(inputs_val, topgrad_val)

        # compute gradient for shared convolution as a reference
        grad_weights_shared = conv.conv2d_grad_wrt_weights(inputs, topgrad,
                                                           self.kshp, unshared=False)
        grad_func_shared = theano.function([inputs, topgrad], grad_weights_shared, mode=self.mode)
        grad_val_shared = grad_func_shared(inputs_val, topgrad_val)

        # summing over all unshared gradients should give the shared gradient
        utt.assert_allclose(grad_val_shared, np.sum(grad_val_unshared, axis=(2, 3)))

@vikramnitin9 (author):

I'd actually added this locally and forgotten to commit it. Fixed now, thanks.

@vikramnitin9 force-pushed the unshared branch 2 times, most recently from 1b14a0d to 8d44e98 on June 13, 2017 at 11:10
@vikramnitin9 (author):

I have a doubt here. Why do we flip the topgrad, convolve and then again flip the output? Isn't this equivalent to flipping (or not flipping) the kern and convolving?
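
(For the plain 1D full-mode case the equivalence does hold — a quick NumPy check, purely illustrative and not part of the PR:)

    import numpy as np

    x = np.random.rand(5)  # stand-in for topgrad
    k = np.random.rand(3)  # stand-in for kern

    # flip the input, convolve, flip the output ...
    a = np.convolve(x[::-1], k)[::-1]
    # ... equals convolving with the flipped kernel
    b = np.convolve(x, k[::-1])
    assert np.allclose(a, b)

Whether it still holds once dilation and subsampling are in play is the open question here.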

@vikramnitin9 (author):

> I would suggest making a new helper function for unshared grad wrt inputs in BaseAbstractConv, reusing a little bit of code from conv, and using that in the gradInputs perform. I think that would be simpler than trying to reorganise the kernel so you can do a forward convolution.

I've just packed it all into the gradInputs perform function for now. Do you think it looks fine or should I move it into a separate function?

@gvtulder left a comment:

Good work.

In general, I think it's good to try to keep things 2D/3D agnostic, except for the few cases where that is really inconvenient (mostly the perform functions). I suppose that once unshared convolution works for 2D, at some point we'll want to apply it to 3D as well, and then it's easier if we don't have to change more than necessary.

A second thing: the perform functions are becoming fairly long. Maybe it would be better to move some of the unshared-specific code to helper functions, so that the difference between shared and unshared is a bit easier to see. For instance, the _convolve2d function could have a _convolve2d_unshared equivalent.

Exaggerated, it's now a bit like this:

    def perform(...):
        if self.unshared:
            # a lot of code with nested loops
        else:
            self.conv(a, b)

but I would prefer:

    def perform(...):
        if self.unshared:
            self.unshared_conv(a, b)
        else:
            self.conv(a, b)

else:
dilated_kern[(slice(None), slice(None)) +
tuple(slice(None, None, dilation[i]) for i in range(self.convdim))
] = kern
@gvtulder:

You can get rid of the if if you do something like this:

dilated_kern[(slice(None),) * (dilated_kern.ndim - self.convdim) +
             ...


dilated_kern[(slice(None),) * 4 +
tuple(slice(None, None, dilation[i]) for i in range(self.convdim))
] = kern
@gvtulder:

I'd prefer not to use these magic numbers like 4 that only work for the 2D case, since it's probably useful to have unshared 3D convolution in the future as well.

broadcastable = [topgrad.broadcastable[1],
img.broadcastable[1]] + ([False] * 4)
output = theano.tensor.TensorType(theano.config.floatX,
broadcastable=broadcastable)()
@gvtulder:

Is there a reason you made this a special case for unshared convolution? Wouldn't the line output = img.type.clone(...)() for the shared convolution work here as well?

In any case, I don't think it's safe to assume that the image is of type theano.config.floatX, so using img.type.clone would probably be better?

] = kern
# Flip the kernel since the convolution is done manually
dilated_kern = dilated_kern[(slice(None),) * 4 +
(slice(None, None, -1),) * 2]
@gvtulder:

I wouldn't do this flip here. Instead, you can do it when you compute the result of the convolution. That makes the shared and unshared cases more similar.

for k_col in xrange(kern.shape[5]):
out[b, n, row, col] += img[b, im0, row + k_row,
col + k_col] * \
dilated_kern[n, im0, row, col, k_row, k_col]
@gvtulder:

Instead of flipping the kernel earlier, compute the correct indices here.

@vikramnitin9 (author):

I was actually thinking of doing an np.sum(np.multiply(..., ...)) to eliminate the two inner loops and not unnecessarily call _convolve2d. Do you think I should? That will mean flipping the kernel earlier.

@gvtulder:

Yes, that could work.

if any(self.filter_dilation[i] > 1 for i in range(self.convdim)):
kern = kern[(slice(None),) * 4 +
tuple(slice(None, None, self.filter_dilation[i])
for i in range(self.convdim))]
@gvtulder:

Keep this outside the if unshared.

(slice(None, None, -1),) * self.convdim)
topgrad = topgrad.transpose(axes_order)[flip_filters]
img = img.transpose(axes_order)

if self.unshared is True:
@gvtulder:

Remember to check that self.convdim == 2.


# Rearranging the weights into the correct shape for unshared
if self.unshared is True:
@gvtulder:

Check that self.convdim == 2.

@@ -1690,9 +1690,9 @@ def perform(self, node, inp, out_):
img = new_img
if not self.filter_flip:
if self.unshared is True:
@gvtulder:

Check self.convdim == 2 (probably earlier in the function).

@vikramnitin9 (author):

I've now added a check in the init function of BaseAbstractConv. Will that do?

@gvtulder:

It's certainly good to have it there, and hopefully it's somewhat temporary anyway. However, I do think I'd like to have an explicit check in perform as well. It's good to have the check and warning close to where the limitation actually is, in my opinion. It's useful as documentation and also makes it future-proof: if at some point the check is removed from BaseAbstractConv, it's better for perform to crash than to silently run a shared convolution.

Compare this with the 2D/3D limitation of the convolution: that's also explicitly checked in BaseAbstractConv.conv.

reg_row = row - ((kern.shape[4] - 1) - k_row)
reg_col = col - ((kern.shape[5] - 1) - k_col)
reg_row = max(0, min(reg_row, kern.shape[2] - 1))
reg_col = max(0, min(reg_col, kern.shape[3] - 1))
@gvtulder:

I'm surprised to see max/min here, are you sure this is correct? If a value falls outside the available range, wouldn't you want to ignore it, rather than add it to the nearest row/col?

Isn't this the same as the loops and indices for the forward convolution, except that the source and target variables are swapped?

@vikramnitin9 (author):

> I'm surprised to see max/min here, are you sure this is correct? If a value falls outside the available range, wouldn't you want to ignore it, rather than add it to the nearest row/col?

I did this since the topgrad for these entries will be zero anyway, so nothing will be added. But I've made a new version now.

> Isn't this the same as the loops and indices for the forward convolution, except that the source and target variables are swapped?

The difference is that the row and col of the kern vary within the inner two loops also.

@gvtulder:

> I have a doubt here. Why do we flip the topgrad, convolve and then again flip the output? Isn't this equivalent to flipping (or not flipping) the kern and convolving?

Possibly, yes. Maybe there is a combination of dilation, subsampling, etc. where it isn't? (I tried to come up with an example that wouldn't work, but I failed.)

@vikramnitin9 (author):

Regarding moving code into helper functions, I'd probably have to have different functions for each of the three operations (forward, gradWeights, gradInput), since the computation performed in the loop is different in each of these cases (unlike for a normal computation, where the same conv function can be used).

Should I go ahead and create separate functions just the same?

@gvtulder:

> Regarding moving code into helper functions, I'd probably have to have different functions for each of the three operations (forward, gradWeights, gradInput), since the computation performed in the loop is different in each of these cases (unlike for a normal computation, where the same conv function can be used).
>
> Should I go ahead and create separate functions just the same?

Yes, I still think that would be useful. But maybe that's a matter of taste. Ideally, I'd like to see a perform method that includes only a few unshared-specific lines.

raise NotImplementedError('Unshared convolution not implemented for %dD'
% self.convdim)
elif kern.type.ndim != 6:
if kern.type.ndim != 6:
@gvtulder:

!= 2 + 2 * self.convdim

reg_col = max(0, min(reg_col, kern.shape[3] - 1))
if reg_row not in range(kern.shape[2]) or \
reg_col not in range(kern.shape[3]):
continue
@gvtulder:

I don't think this is very efficient: it's creating these range objects for every pixel. Normal comparisons are probably faster. (And I'd invert the comparison, so you don't need the continue.)

@gvtulder:

> Regarding moving code into helper functions, I'd probably have to have different functions for each of the three operations (forward, gradWeights, gradInput), since the computation performed in the loop is different in each of these cases (unlike for a normal computation, where the same conv function can be used).

Actually, I think you can do with only one helper function for unshared convolution, if you use a direction argument that indicates whether it's a forward, gradinputs or gradweights convolution (similar to the parameter in the corrMM implementation). The loops should be exactly the same for all three cases, I think. It's just the computation inside the loop that reads and writes different variables.

@gvtulder left a comment:

I like the new unshared2d function.

else:
kshp = kernel_shape[2:]

kshp = kernel_shape[-2:]
@gvtulder:

I like that you got rid of the unshared argument. Perhaps you could use the length of image_shape to compute the value of convdim, then use that instead of 2?

@@ -1520,7 +1517,7 @@ def flops(self, inp, outp):
raise NotImplementedError(
'flops not implemented for convdim={}', self.convdim)

def conv(self, img, kern, mode="valid", dilation=1, unshared=False):
def conv(self, img, kern, mode="valid", dilation=1, unshared=False, direction=0):
@gvtulder:

Could you change the direction to a string? (E.g., "forward", "backprop weights", "backprop inputs" as in BaseCorrMM.)

dil_kernshp = tuple((kern.shape[2 + i] - 1) * self.filter_dilation[i] + 1
for i in range(self.convdim))
@gvtulder:

Doesn't kern.shape[2 + i] give you the output shape for unshared convolution?

If you don't want a special case for unshared, perhaps kern.shape[-self.convdim + i] would work?

if (reg_row >= 0 and reg_row < kern.shape[0]) and \
(reg_col >= 0 and reg_col < kern.shape[1]):
out[row, col] += kern[reg_row, reg_col, k_row, k_col] * \
inp[reg_row, reg_col]
@gvtulder:

I think this is more complicated than it needs to be. If you loop over the correct range, I don't think you need these ifs checking whether you're inside the range.

Reversing your forward convolution, these loops give the same result for the grad wrt inputs:

for row in xrange(kern.shape[0]):
    for col in xrange(kern.shape[1]):
        for k_row in xrange(kern.shape[2]):
            for k_col in xrange(kern.shape[3]):
                out[row + k_row, col + k_col] += inp[row, col] * \
                    kern[row, col, -(k_row + 1), -(k_col + 1)]

@vikramnitin9 (author):

This certainly looks a lot simpler!

But for the common unshared2d function, the outer two loops are common for all three directions (they go from 0 to out_shape[0] and 0 to out_shape[1]). We need them to go from 0 to kern.shape[0] and 0 to kern.shape[1].

I think the easiest way to fix this is to have a condition to check whether row and col are less than kern.shape[0] and kern.shape[1] respectively, rather than modifying the loops for k_row and k_col.

The conditions will also look much simpler and cleaner.

@gvtulder left a comment:

I haven't yet looked at your C changes in detail. I'll do that when there is a bit more.

out[row + k_row, col + k_col] += inp[row, col] * \
kern[row, col, -(k_row + 1), -(k_col + 1)]
else:
raise ValueError("unshared2d: invalid value for 'direction'")
@gvtulder:

It might be helpful to include the value in the error message.

out[row, col, ...] += kern[row, col] * \
inp[row:row + out_shape[2], col:col + out_shape[3]]
elif direction == "backprop inputs":
if row < kern.shape[0] and col < kern.shape[1]:
@gvtulder:

I appreciate your effort not to duplicate code, but I still think this solution is less than ideal. It feels a bit silly to have two loops that go on for too long, and then correct this with an if that checks the indices.

There may also be a conceptual problem: your for loops suggest that you're iterating over the rows and columns of out, while in fact for the grad wrt inputs you're iterating over the rows and columns of kern (which just happens to be slightly smaller than out).

So, I would propose to move the for row and for col loops inside the ifs, so that they're specific to each direction. This adds three lines of code, but is easier to understand and does not have loops that are longer than necessary. (As an added bonus it tests the value of direction only once.)
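
(For illustration, a rough sketch of that restructuring — single-channel 2D slices, xrange as elsewhere in this code base, index conventions following the fragments quoted above. This is an assumption about the helper's eventual shape, not the PR's actual code:)

    import numpy as np

    def unshared2d(inp, kern, out_shape, direction="forward"):
        # kern: (out_rows, out_cols, k_rows, k_cols); the kernel flip is
        # written out explicitly here so that the two directions are
        # consistent (the PR handles flipping elsewhere via filter_flip)
        out = np.zeros(out_shape, dtype=inp.dtype)
        if direction == "forward":
            for row in xrange(out_shape[0]):
                for col in xrange(out_shape[1]):
                    out[row, col] = np.sum(
                        inp[row:row + kern.shape[2], col:col + kern.shape[3]] *
                        kern[row, col, ::-1, ::-1])
        elif direction == "backprop inputs":
            # inp is the topgrad here; the loops now run over the kernel's
            # region dimensions, which are smaller than out
            for row in xrange(kern.shape[0]):
                for col in xrange(kern.shape[1]):
                    for k_row in xrange(kern.shape[2]):
                        for k_col in xrange(kern.shape[3]):
                            out[row + k_row, col + k_col] += inp[row, col] * \
                                kern[row, col, -(k_row + 1), -(k_col + 1)]
        else:
            raise ValueError("unshared2d: invalid value '%s' for 'direction'"
                             % direction)
        return out

(The "backprop weights" branch is omitted from this sketch.)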

@vikramnitin9 (author) commented Jun 22, 2017:

Right now, the weights are of the form [out_filters, in_filters, out_rows, out_cols, kH, kW]. I think it actually makes more logical sense to have them as [out_filters, out_rows, out_cols, in_filters, kH, kW] (one block of shape [in_filters, kH, kW] corresponding to each region).

Also, it makes things easier in the C code (having those three dims together).

@vikramnitin9 (author):

@gvtulder Do you think I should go ahead and change this?

@gvtulder commented Jun 23, 2017:

@vikramnitin9 I'm not yet convinced that it's a good idea.

Of the current order, I like that the first two dimensions always have the same meaning for shared and unshared. There's now a clear relation img.shape[1] == kern.shape[1] and kern.shape[0] == out.shape[1], which you can depend on without knowing if it's shared or unshared convolution. So while it may make some things simpler, I think it would add complexity in a lot of other places.

Lasagne uses [out_filters, in_filters, out_rows, out_cols, kH, kW], which may be another reason to use that approach. (Keras puts in_filters at the end.) (Edit: this is incorrect, it's using [..., kH, kW, out_rows, out_cols].)

Would it make the C or GPU code a lot more efficient if the out_cols, out_rows came before the in_filters? (There was some open-ended discussion in the issue you linked to before.)

@vikramnitin9 (author) commented Jun 24, 2017:

im2col reshapes the image as [regions, in_filters*kH*kW]. Normally the weight matrix would be implicitly reshaped to [out_filters, in_filters*kH*kW].

For unshared, it could be reshaped to [out_filters, regions, in_filters*kH*kW] and a vector-matrix product performed for each region. This will be easy if in_filters, kH and kW are together in the weights array, so we don't actually need to reshape.

So that's why I said the C code will become easier.

Lasagne uses [out_filters, in_filters, kH, kW, out_rows, out_cols]. That can be implicitly reshaped to [out_filters, in_filters*kH*kW, regions]. Only thing is, I find it hard to imagine the weight matrix logically organised like this.

Pylearn2 uses [out_rows, out_cols, in_filters, kH, kW, num_groups, filters_per_group]

Either way, the three dimensions being together makes the reshaping easy.
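
(A shapes-only NumPy sketch of that argument — illustrative names and sizes, single batch element, not code from the PR:)

    import numpy as np

    in_filters, kH, kW = 3, 2, 2
    out_rows, out_cols = 4, 4
    out_filters = 8
    regions = out_rows * out_cols

    # im2col output: one row of in_filters*kH*kW values per output region
    cols = np.random.rand(regions, in_filters * kH * kW)

    # unshared weights as [out_filters, out_rows, out_cols, in_filters, kH, kW]
    w = np.random.rand(out_filters, out_rows, out_cols, in_filters, kH, kW)

    # the last three dims are contiguous, so this reshape is a pure view
    w2 = w.reshape(out_filters, regions, in_filters * kH * kW)

    # one vector-matrix product per region
    out = np.einsum('rk,ork->or', cols, w2).reshape(out_filters,
                                                    out_rows, out_cols)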

@gvtulder:

@vikramnitin9 You're right, I misread the order of the Lasagne weights.

If it's more efficient, as you say, to have [out_filters, out_rows, out_cols, in_filters, kH, kW], then I think it's fine to use that order. Would it be a good idea to finish the C implementation first, using that new shape, before you go back and change the Python code you already wrote? You can do a dimshuffle for now to convert the input to the new shape.

@vikramnitin9 (author):

Sure. I'm currently working on the C code under the assumption that weights are in the required format. Still need to do grad wrt inputs and add tests.

@vikramnitin9 (author):

Rebase done. Should be all fine now.

@gvtulder:

jenkins test this

@gvtulder:

It looks like the magic phrase doesn't work for me (although Jenkins does run if I create a PR myself, but that might be a different switch). It was worth trying.

@nouiz or @lamblin, could you trigger the GPU tests please?

@vikramnitin9 (author) commented Jul 13, 2017:

The Travis tests passed except for the abstractconv, which timed out as usual.

For the Jenkins test, should I add unshared tests to theano/gpuarray/tests/test_abstractconv.py or will the current set do?

@gvtulder:

Hmm, yes, there need to be some GPU tests. I thought I had seen them, also because you had these errors, but apparently I was wrong.

I think there need to be tests in test_gemmcorr.py and in test_abstractconv.py in the theano.gpuarray directory. (In fact, I suspect the abstractconv tests might even fail right now, because you've introduced this new u parameter that the GPU tests don't know about.) The abstractconv tests should be fairly easy to expand, the other test can be based on the existing GPU and CPU tests.

@vikramnitin9 (author):

Sorry for the delay. Here are the GPU tests.

@gvtulder left a comment:

Thanks, the tests look good. I have some comments about explicitly skipping invalid parameter values, but otherwise I think they should be fine.

@@ -26,7 +26,7 @@ def setup_class(cls):
# provide_shape is not used by the cuDNN implementation
cls.provide_shape = [False]

def tcase(self, i, f, s, b, flip, provide_shape, fd=(1, 1)):
def tcase(self, i, f, s, b, flip, provide_shape, fd=(1, 1), u=False):
@gvtulder:

It would be good to raise a SkipTest error if unshared is true. (Like for fd.) See also the other instances below.

@vikramnitin9 (author):

I've changed this, thanks

@@ -196,7 +196,7 @@ def setup_class(cls):
cls.shared = staticmethod(gpuarray_shared_constructor)
cls.mode = mode_with_gpu.excluding('cudnn')

def tcase(self, i, f, s, b, flip, provide_shape, fd=(1, 1, 1)):
def tcase(self, i, f, s, b, flip, provide_shape, fd=(1, 1, 1), u=False):
@gvtulder:

SkipTest for those cases as well. (You shouldn't just ignore values you don't like, since that would suggest that the test is run and passes.)

@vikramnitin9 (author):

For the 3D case, I've put a check in run_fwd to SkipTest if unshared is True and it is not a 2D convolution.

@vikramnitin9 (author):

Actually, that would require passing the parameter here, which I haven't done. Rather than add it, I'll just put a SkipTest right here.

@@ -425,8 +425,7 @@ def test_non_contiguous(self):

class TestUnsharedCorr2D(utt.InferShapeTester):
if theano.config.mode == "FAST_COMPILE":
# mode = theano.compile.get_mode("FAST_RUN").excluding("conv_dnn")
mode = theano.compile.get_default_mode().including('gpuarray').excluding('gpu').excluding('cudnn')
mode = theano.compile.get_mode("FAST_RUN").excluding("conv_dnn")
@gvtulder:

Do you need to exclude conv_dnn here? The shared convolution tests don't do this.

@@ -453,13 +452,13 @@ def test_fwd(self):

conv_unshared = corr.CorrMM(unshared=True, subsample=self.sub,
filter_dilation=self.dil)(self.input, self.filters)
unshared_func = theano.function([self.input, self.filters], conv_unshared)
unshared_func = theano.function([self.input, self.filters], conv_unshared, mode=self.mode)
@gvtulder:

This looks a bit weird, but it is because you're using the Python implementation as your reference, right? I'm wondering whether it wouldn't be better to have a separate implementation for the tests, like the one that's used for the normal convolution.

@gvtulder:

@nouiz or @lamblin, there are GPU tests now, so can you give this a go on Jenkins?

@lamblin (Member) commented Jul 17, 2017:

Jenkins test this please

@lamblin commented Jul 17, 2017:

Also, tests are timing out on Travis because some parts exceed 50 minutes.
I think we should find a way of testing fewer cases or smaller sizes, at least in the slower (python) modes.

@gvtulder left a comment:

A few small things I came across.

PyErr_Format(PyExc_ValueError,
"GpuCorrMM: impossible output shape\\n"
" bottom shape: %%ld x %%ld x %%ld x %%ld\\n"
" weights shape: %%ld x %%ld x %%ld x %%ld %%ld %%ld\\n"
@gvtulder:

There are some x missing near the end.

PyErr_Format(PyExc_ValueError,
"GpuCorrMM backprop wrt. weights: impossible output shape\\n"
" bottom shape: %%ld x %%ld x %%ld x %%ld\\n"
" weights shape: %%ld x %%ld x %%ld x %%ld %%ld %%ld\\n"
@gvtulder:

There are some x missing near the end.

PyErr_Format(PyExc_ValueError,
"GpuCorrMM backprop wrt. inputs: impossible output shape\\n"
" bottom shape: %%ld x %%ld x %%ld x %%ld\\n"
" weight shape: %%ld x %%ld x %%ld x %%ld\\n"
@gvtulder:

This is missing the extra two dimensions.

if (unshared) {
PyErr_Format(PyExc_ValueError,
"GpuCorrMM requires weight to be C-contiguous, "
"but strides are: %ld %ld %ld %ld\n",
@gvtulder:

This is missing two dimensions.

" bottom shape: %d %d %d %d\n"
" weight shape: %d %ld %ld %d %d %d"
" (expected %d %d %d %d %d %d)\n"
" top shape(calculated): %d %d %d %d\n",
@gvtulder:

Since the shared version uses %ld everywhere, it's probably good to do that here as well. (My compiler gave some warnings here.)

"GpuCorrMM shape inconsistency:\n"
" bottom shape: %d %d %d %d\n"
" weight shape: %d %d %d %d %d %d\n"
" top shape: %ld %ld %ld %ld (expected %d %d %d %d)\n",
@gvtulder:

Idem.

@gvtulder:

For what it's worth: I tried to use gemv in the forward convolution gpuarray/corr_gemm.c on my system. With device=cuda, it works without problems. opencl on my GPU worked as well. Only opencl on CPU with the Intel OpenCL library had a segfault.

It may be interesting to change the GPU version to use the batched gemm functions, instead of calling gemm multiple times. (But it's more important to get the tests to work first, I think.)
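
(Schematically, in NumPy terms — np.matmul broadcasting over a leading batch axis is the CPU analogue of a batched gemm; sizes here are illustrative:)

    import numpy as np

    regions, k, n = 16, 27, 8
    cols = np.random.rand(regions, 1, k)  # one im2col row per region
    w = np.random.rand(regions, k, n)     # one weight matrix per region

    # a single batched product instead of `regions` separate gemm calls
    out = np.matmul(cols, w)              # shape (regions, 1, n)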

@vikramnitin9 (author):

Interestingly, I had a segfault with opencl on the Intel Integrated HD Graphics device, but it ran fine on my i5 CPU.

@gvtulder:

When you start working on the rebase, be careful that you keep things backwards compatible: put the new parameters, props etc. for unshared convolution after the parameters that have already been added for the grouped convolution.

@vikramnitin9 (author):

Update: I'm working on the integration with grouped convolution. Should be done tomorrow or the day after.

@vikramnitin9 (author):

I've squashed the commits, rebased and started a new PR since this one is already very messy. That is at #6286. I'm closing this PR.
