
Unshared convolution #6007

Closed · wants to merge 42 commits

Conversation

@vikramnitin9 commented Jun 4, 2017:

Related to #5620 .

Task list:

  • C code for grad inputs and tests

  • All tests need to pass.

  • Change weight tensor dimension ordering instead of dimshuffle

  • Integrate unshared tests with AbstractConv

  • GPU code

@gvtulder left a comment:

Thanks for starting the pull request. I know you're not finished yet, but I left some first comments.

@@ -1507,6 +1526,13 @@ def conv(self, img, kern, mode="valid", dilation=1):
out_shape = get_conv_output_shape(img.shape, kern.shape,
mode, [1] * self.convdim, dilation)

out_rows = out_shape[self.convdim]
out_cols = out_shape[self.convdim+1]
@gvtulder:

This only works for 2D, so make sure you check that somewhere if you don't want to implement it now (the rest of the function does 3D as well).

I also don't think self.convdim is the best thing to use here; it doesn't really make sense. It's just a coincidence that batch and output channels use exactly self.convdim dimensions, so I'd just use 2.

Even better: you could make this whole first part work in a convdim-agnostic way, by checking if kern.shape[2] == np.prod(out_shape[2:]).

out_rows = out_shape[self.convdim]
out_cols = out_shape[self.convdim+1]

if unshared == True:
@gvtulder:

Do you have a specific reason for explicitly checking for == True here? (See also unshared == False below.)

else:
for reg in xrange(kern.shape[1]):
row = reg // out_cols
col = reg - row*out_cols
@gvtulder:

Perhaps col = reg % out_cols would be nicer?

if self.convdim == 3:
raise NotImplementedError("Unshared 3D convolution not implemented")
elif kern.type.ndim != 3 + self.convdim:
raise TypeError('kern must be %dD tensor' % (3 + self.convdim))
@gvtulder, Jun 5, 2017:

I would swap this: explicitly check that self.convdim == 2 and raise an error if it is not. The AbstractConv class works for any convdim, even though it's only currently used for 2D and 3D.

If true, then unshared or 'locally connected' convolution will be
performed. A different kernel will be used for each region of the
input.

@gvtulder:

I think it would be useful to change the documentation for filters as well, to explain how you should pass the unshared weights.

@gvtulder commented Jun 5, 2017:

What will the shape of the filter parameter look like for unshared convolution? From your Python implementation, it seems that you combine the output_rows * output_cols filters in one extra dimension, i.e., (output channels, input channels, output rows * output cols, filter rows, filter columns). Is this the shape we want?

For comparison: Lasagne uses two additional dimensions, i.e., (output channels, input channels, output rows, output cols, filter rows, filter columns). I think I would find that easier to work with.

@vikramnitin9 (author):

The method I've used to compute the gradients wrt inputs seems very messy. In order to express it as a convolution, the weight matrix had to be rearranged quite a bit. Is there a better way?

@vikramnitin9 changed the title from "Unshared convolution python code" to "Unshared convolutio" on Jun 8, 2017
@vikramnitin9 changed the title from "Unshared convolutio" to "Unshared convolution" on Jun 8, 2017
@@ -1565,6 +1565,7 @@ def conv(self, img, kern, mode="valid", dilation=1, unshared=False):
if unshared is True:
for row in xrange(kern.shape[2]):
for col in xrange(kern.shape[3]):
val = _valfrommode('valid')
out[b, n, row, col] += _convolve2d(img[b, im0, row:row + kern.shape[4],
col:col + kern.shape[5]],
dilated_kern[n, im0, row, col, ...],
@gvtulder:

Is convolve2d necessary here? Since the image part you're extracting has the same size as the kernel, isn't this equal to an elementwise multiplication and summation? (Perhaps with a flip somewhere.)

@vikramnitin9 (author):

I'd retained the convolve2d because it would eliminate two more loops, and because otherwise I would have to flip the filter manually. But I've made it element-wise now.

else:
kern[:, :, i, j, m, m] = kern[:, :, i + m, j + n, m, n]
new_kern[:, :, i, j, m, n] = kern[:, :, i + m, j + n, m, n]
@gvtulder:

Does this work? It's somewhat complex, so I might have made a mistake in my manual derivations, but I think this is missing the first part of the kernel and shifting everything else too far to the left.

In a 1D case, with a 5-pixel input and 3-pixel kernel, I think you want the kernel to be changed like this:

kern = [[a, b, c], [d, e, f], [g, h, i]]
new_kern = [[_, _, a], [_, b, d], [c, e, g], [f, h, _], [i, _, _]]

but this implementation would change it to this, I think:

new_kern = [[a, b, c], [d, e, _], [g, _, _], [_, _, _], [_, _, _]]
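
To make the intended rearrangement concrete, here is a small illustrative script (hypothetical helper, not code from this PR) that builds the first new_kern above for the 1D valid-mode case, with zeros standing in for the unused slots:

    import numpy as np

    def rearrange_1d(kern):
        # kern: (output positions, taps); returns (input pixels, taps),
        # where row p collects the weights that touch input pixel p
        out_pos, K = kern.shape
        new_kern = np.zeros((out_pos + K - 1, K), dtype=kern.dtype)
        for p in range(out_pos + K - 1):      # input pixel
            for j in range(K):                # slot in the rearranged row
                o = p - (K - 1) + j           # output position touching p
                if 0 <= o < out_pos:
                    new_kern[p, j] = kern[o, (K - 1) - j]
        return new_kern

    kern = np.arange(1, 10).reshape(3, 3)  # [[a, b, c], [d, e, f], [g, h, i]]
    print(rearrange_1d(kern))
    # rows: [_, _, a], [_, b, d], [c, e, g], [f, h, _], [i, _, _] (with _ = 0)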

@gvtulder:

> The method I've used to compute the gradients wrt inputs seems very messy. In order to express it as a convolution, the weight matrix had to be rearranged quite a bit. Is there a better way?

Maybe it's better to implement the gradient computation directly in Python, without using convolve2d?

I guess the reason to use convolve2d is that it can speed things up in the shared case, because it's really doing a series of convolutions. In the unshared case you have to call it separately for every pixel, so convolve2d might not be much faster than an elementwise multiplication. For the gradient wrt inputs you have to do the same, but you would need a tricky reorganisation of the kernel matrix first. At that point, convolve2d may be more trouble than it's worth.

I would suggest making a new helper function for unshared grad wrt inputs in BaseAbstractConv, reusing a little bit of code from conv, and using that in the gradInputs perform. I think that would be simpler than trying to reorganise the kernel so you can do a forward convolution.

What do you think?

if (kern.shape[2], kern.shape[3]) != (out_shape[2], out_shape[3]):
raise ValueError('Kernel shape ({},{}) does not match '
'output size ({},{})'.format(kern.shape[2], kern.shape[3],
out_shape[2], out_shape[3]))

out = np.zeros(out_shape, dtype=img.dtype)
dil_kern_shp = kern.shape[:-self.convdim] + tuple(
@gvtulder:

A few lines below, I think you might need to change this assignment of dilated_kern:

dilated_kern[(slice(None), slice(None)) +
             tuple(slice(None, None, dilation[i]) for i in range(self.convdim))
             ] = kern

because it doesn't include the row/column dimensions in an unshared convolution.

@@ -1022,7 +1044,8 @@ def conv2d_grad_wrt_weights(input,
border_mode=border_mode,
subsample=subsample,
filter_flip=filter_flip,
filter_dilation=filter_dilation)
filter_dilation=filter_dilation,
unshared=False)
@gvtulder:

I think you want unshared=unshared here.

@vikramnitin9 (author):

I'd actually added this locally and forgotten to commit it. Fixed now, thanks.

Ditto here :)

@gvtulder left a comment:

Thanks for adding the tests. The forward convolution seems to work fine, good!

There still seems to be a small problem in the grad wrt weights. The test passes, but when I tried to test the value of the result it didn't work (see the test below).

grad_weights = conv.conv2d_grad_wrt_weights(inputs, topgrad,
self.kshp, unshared=True)
grad_func = theano.function([inputs, topgrad], grad_weights, mode=self.mode)
grad_val = grad_func(inputs_val, topgrad_val)
@gvtulder, Jun 11, 2017:

Can I suggest a small extension of this test that also checks the result?

    def test_gradweight(self):
        inputs = theano.tensor.tensor4()
        topgrad = theano.tensor.tensor4()

        inputs_val = np.random.random(self.imshp).astype(theano.config.floatX)
        topgrad_val = np.random.random(self.topgrad_shape).astype(theano.config.floatX)

        # compute gradient for unshared convolution
        grad_weights_unshared = conv.conv2d_grad_wrt_weights(inputs, topgrad,
                                                             self.kshp, unshared=True)
        grad_func_unshared = theano.function([inputs, topgrad], grad_weights_unshared, mode=self.mode)
        grad_val_unshared = grad_func_unshared(inputs_val, topgrad_val)

        # compute gradient for shared convolution as a reference
        grad_weights_shared = conv.conv2d_grad_wrt_weights(inputs, topgrad,
                                                           self.kshp, unshared=False)
        grad_func_shared = theano.function([inputs, topgrad], grad_weights_shared, mode=self.mode)
        grad_val_shared = grad_func_shared(inputs_val, topgrad_val)

        # summing over all unshared gradients should give the shared gradient
        utt.assert_allclose(grad_val_shared, np.sum(grad_val_unshared, axis=(2, 3)))

@vikramnitin9 (author):

I'd actually added this locally and forgotten to commit it. Fixed now, thanks.

@vikramnitin9 force-pushed the unshared branch 2 times, most recently from 1b14a0d to 8d44e98 on June 13, 2017 at 11:10
@vikramnitin9 (author):

I have a doubt here. Why do we flip the topgrad, convolve and then again flip the output? Isn't this equivalent to flipping (or not flipping) the kern and convolving?
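
(For the plain 1D full-mode case the equivalence does hold — a quick NumPy check, purely illustrative and not part of the PR:)

    import numpy as np

    x = np.random.rand(5)  # stand-in for topgrad
    k = np.random.rand(3)  # stand-in for kern

    # flip the input, convolve, flip the output ...
    a = np.convolve(x[::-1], k)[::-1]
    # ... equals convolving with the flipped kernel
    b = np.convolve(x, k[::-1])
    assert np.allclose(a, b)

Whether it still holds once dilation and subsampling are in play is the open question here.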

@vikramnitin9 (author):

> I would suggest making a new helper function for unshared grad wrt inputs in BaseAbstractConv, reusing a little bit of code from conv, and using that in the gradInputs perform. I think that would be simpler than trying to reorganise the kernel so you can do a forward convolution.

I've just packed it all into the gradInputs perform function for now. Do you think it looks fine or should I move it into a separate function?

@gvtulder left a comment:

Good work.

In general, I think it's good to try to keep things 2D/3D agnostic, except for the few cases where that is really inconvenient (mostly the perform functions). I suppose that once unshared convolution works for 2D, at some point we'll want to apply it to 3D as well, and then it's easier if we don't have to change more than necessary.

A second thing: the perform functions are becoming fairly long. Maybe it would be better to move some of the unshared-specific code to helper functions, so that the difference between shared and unshared is a bit easier to see. For instance, the _convolve2d function could have a _convolve2d_unshared equivalent.

Exaggerated, it's now a bit like this:

    def perform(...):
        if self.unshared:
            # a lot of code with nested loops
        else:
            self.conv(a, b)

but I would prefer:

    def perform(...):
        if self.unshared:
            self.unshared_conv(a, b)
        else:
            self.conv(a, b)

else:
dilated_kern[(slice(None), slice(None)) +
tuple(slice(None, None, dilation[i]) for i in range(self.convdim))
] = kern
@gvtulder:

You can get rid of the if if you do something like this:

dilated_kern[(slice(None),) * (dilated_kern.ndim - self.convdim) +
             ...


dilated_kern[(slice(None),) * 4 +
tuple(slice(None, None, dilation[i]) for i in range(self.convdim))
] = kern
@gvtulder:

I'd prefer not to use these magic numbers like 4 that only work for the 2D case, since it's probably useful to have unshared 3D convolution in the future as well.

broadcastable = [topgrad.broadcastable[1],
img.broadcastable[1]] + ([False] * 4)
output = theano.tensor.TensorType(theano.config.floatX,
broadcastable=broadcastable)()
@gvtulder:

Is there a reason you made this a special case for unshared convolution? Wouldn't the line output = img.type.clone(...)() for the shared convolution work here as well?

In any case, I don't think it's safe to assume that the image is of type theano.config.floatX, so using img.type.clone would probably be better?

] = kern
# Flip the kernel since the convolution is done manually
dilated_kern = dilated_kern[(slice(None),) * 4 +
(slice(None, None, -1),) * 2]
@gvtulder:

I wouldn't do this flip here. Instead, you can do it when you compute the result of the convolution. That makes the shared and unshared cases more similar.

for k_col in xrange(kern.shape[5]):
out[b, n, row, col] += img[b, im0, row + k_row,
col + k_col] * \
dilated_kern[n, im0, row, col, k_row, k_col]
@gvtulder:

Instead of flipping the kernel earlier, compute the correct indices here.

@vikramnitin9 (author):

I was actually thinking of doing an np.sum(np.multiply(..., ...)) to eliminate the two inner loops and not unnecessarily call _convolve2d. Do you think I should? That will mean flipping the kernel earlier.

@gvtulder:

Yes, that could work.

if any(self.filter_dilation[i] > 1 for i in range(self.convdim)):
kern = kern[(slice(None),) * 4 +
tuple(slice(None, None, self.filter_dilation[i])
for i in range(self.convdim))]
@gvtulder:

Keep this outside the if unshared.

(slice(None, None, -1),) * self.convdim)
topgrad = topgrad.transpose(axes_order)[flip_filters]
img = img.transpose(axes_order)

if self.unshared is True:
@gvtulder:

Remember to check that self.convdim == 2.


# Rearranging the weights into the correct shape for unshared
if self.unshared is True:
@gvtulder:

Check that self.convdim == 2.

@@ -1690,9 +1690,9 @@ def perform(self, node, inp, out_):
img = new_img
if not self.filter_flip:
if self.unshared is True:
@gvtulder:

Check self.convdim == 2 (probably earlier in the function).

@vikramnitin9 (author):

I've now added a check in the init function of BaseAbstractConv. Will that do?

@gvtulder:

It's certainly good to have it there, and hopefully it's somewhat temporary anyway. However, I do think I'd like to have an explicit check in perform as well. It's good to have the check and warning close to where the limitation actually is, in my opinion. It's useful as documentation and also makes it future-proof: if at some point the check is removed from BaseAbstractConv, it's better for perform to crash than to silently run a shared convolution.

Compare this with the 2D/3D limitation of the convolution: that's also explicitly checked in BaseAbstractConv.conv.

reg_row = row - ((kern.shape[4] - 1) - k_row)
reg_col = col - ((kern.shape[5] - 1) - k_col)
reg_row = max(0, min(reg_row, kern.shape[2] - 1))
reg_col = max(0, min(reg_col, kern.shape[3] - 1))
@gvtulder:

I'm surprised to see max/min here, are you sure this is correct? If a value falls outside the available range, wouldn't you want to ignore it, rather than add it to the nearest row/col?

Isn't this the same as the loops and indices for the forward convolution, except that the source and target variables are swapped?

@vikramnitin9 (author):

> I'm surprised to see max/min here, are you sure this is correct? If a value falls outside the available range, wouldn't you want to ignore it, rather than add it to the nearest row/col?

I did this since the topgrad for these entries will be zero anyway, so nothing will be added. But I've made a new version now.

> Isn't this the same as the loops and indices for the forward convolution, except that the source and target variables are swapped?

The difference is that the row and col of the kern vary within the inner two loops also.

@gvtulder:

> I have a doubt here. Why do we flip the topgrad, convolve and then again flip the output? Isn't this equivalent to flipping (or not flipping) the kern and convolving?

Possibly, yes. Maybe there is a combination of dilation, subsampling, etc. where it isn't? (I tried to come up with an example that wouldn't work, but I failed.)

@vikramnitin9 (author):

Regarding moving code into helper functions, I'd probably have to have different functions for each of the three operations (forward, gradWeights, gradInput), since the computation performed in the loop is different in each of these cases (unlike for a normal computation, where the same conv function can be used).

Should I go ahead and create separate functions just the same?

@gvtulder:

> Regarding moving code into helper functions, I'd probably have to have different functions for each of the three operations (forward, gradWeights, gradInput), since the computation performed in the loop is different in each of these cases (unlike for a normal computation, where the same conv function can be used).
>
> Should I go ahead and create separate functions just the same?

Yes, I still think that would be useful. But maybe that's a matter of taste. Ideally, I'd like to see a perform method that includes only a few unshared-specific lines.

raise NotImplementedError('Unshared convolution not implemented for %dD'
% self.convdim)
elif kern.type.ndim != 6:
if kern.type.ndim != 6:
@gvtulder:

!= 2 + 2 * self.convdim

reg_col = max(0, min(reg_col, kern.shape[3] - 1))
if reg_row not in range(kern.shape[2]) or \
reg_col not in range(kern.shape[3]):
continue
@gvtulder:

I don't think this is very efficient: it's creating these range objects for every pixel. Normal comparisons are probably faster. (And I'd invert the comparison, so you don't need the continue.)

@gvtulder:

> Regarding moving code into helper functions, I'd probably have to have different functions for each of the three operations (forward, gradWeights, gradInput), since the computation performed in the loop is different in each of these cases (unlike for a normal computation, where the same conv function can be used).

Actually, I think you can do with only one helper function for unshared convolution, if you use a direction argument that indicates whether it's a forward, gradinputs or gradweights convolution (similar to the parameter in the corrMM implementation). The loops should be exactly the same for all three cases, I think. It's just the computation inside the loop that reads and writes different variables.

@gvtulder left a comment:

I like the new unshared2d function.

else:
kshp = kernel_shape[2:]

kshp = kernel_shape[-2:]
@gvtulder:

I like that you got rid of the unshared argument. Perhaps you could use the length of image_shape to compute the value of convdim, then use that instead of 2?

@@ -1520,7 +1517,7 @@ def flops(self, inp, outp):
raise NotImplementedError(
'flops not implemented for convdim={}', self.convdim)

def conv(self, img, kern, mode="valid", dilation=1, unshared=False):
def conv(self, img, kern, mode="valid", dilation=1, unshared=False, direction=0):
@gvtulder:

Could you change the direction to a string? (E.g., "forward", "backprop weights", "backprop inputs" as in BaseCorrMM.)

dil_kernshp = tuple((kern.shape[2 + i] - 1) * self.filter_dilation[i] + 1
for i in range(self.convdim))
@gvtulder:

Doesn't kern.shape[2 + i] give you the output shape for unshared convolution?

If you don't want a special case for unshared, perhaps kern.shape[-self.convdim + i] would work?

if (reg_row >= 0 and reg_row < kern.shape[0]) and \
(reg_col >= 0 and reg_col < kern.shape[1]):
out[row, col] += kern[reg_row, reg_col, k_row, k_col] * \
inp[reg_row, reg_col]
@gvtulder:

I think this is more complicated than it needs to be. If you loop over the correct range, I don't think you need these ifs checking whether you're inside the range.

Reversing your forward convolution, these loops give the same result for the grad wrt inputs:

for row in xrange(kern.shape[0]):
    for col in xrange(kern.shape[1]):
        for k_row in xrange(kern.shape[2]):
            for k_col in xrange(kern.shape[3]):
                out[row + k_row, col + k_col] += inp[row, col] * \
                    kern[row, col, -(k_row + 1), -(k_col + 1)]

@vikramnitin9 (author):

This certainly looks a lot simpler!

But for the common unshared2d function, the outer two loops are common for all three directions (they go from 0 to out_shape[0] and 0 to out_shape[1]). We need them to go from 0 to kern.shape[0] and 0 to kern.shape[1].

I think the easiest way to fix this is to have a condition to check whether row and col are less than kern.shape[0] and kern.shape[1] respectively, rather than modifying the loops for k_row and k_col.

The conditions will also look much simpler and cleaner.

@gvtulder left a comment:

I haven't yet looked at your C changes in detail. I'll do that when there is a bit more.

out[row + k_row, col + k_col] += inp[row, col] * \
kern[row, col, -(k_row + 1), -(k_col + 1)]
else:
raise ValueError("unshared2d: invalid value for 'direction'")
@gvtulder:

It might be helpful to include the value in the error message.

out[row, col, ...] += kern[row, col] * \
inp[row:row + out_shape[2], col:col + out_shape[3]]
elif direction == "backprop inputs":
if row < kern.shape[0] and col < kern.shape[1]:
@gvtulder:

I appreciate your effort not to duplicate code, but I still think this solution is less than ideal. It feels a bit silly to have two loops that go on for too long, and then correct this with an if that checks the indices.

There may also be a conceptual problem: your for loops suggest that you're iterating over the rows and columns of out, while in fact for the grad wrt inputs you're iterating over the rows and columns of kern (which just happens to be slightly smaller than out).

So, I would propose to move the for row and for col loops inside the ifs, so that they're specific to each direction. This adds three lines of code, but is easier to understand and does not have loops that are longer than necessary. (As an added bonus it tests the value of direction only once.)
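
(For illustration, a rough sketch of that restructuring — single-channel 2D slices, xrange as elsewhere in this code base, index conventions following the fragments quoted above. This is an assumption about the helper's eventual shape, not the PR's actual code:)

    import numpy as np

    def unshared2d(inp, kern, out_shape, direction="forward"):
        # kern: (out_rows, out_cols, k_rows, k_cols); the kernel flip is
        # written out explicitly here so that the two directions are
        # consistent (the PR handles flipping elsewhere via filter_flip)
        out = np.zeros(out_shape, dtype=inp.dtype)
        if direction == "forward":
            for row in xrange(out_shape[0]):
                for col in xrange(out_shape[1]):
                    out[row, col] = np.sum(
                        inp[row:row + kern.shape[2], col:col + kern.shape[3]] *
                        kern[row, col, ::-1, ::-1])
        elif direction == "backprop inputs":
            # inp is the topgrad here; the loops now run over the kernel's
            # region dimensions, which are smaller than out
            for row in xrange(kern.shape[0]):
                for col in xrange(kern.shape[1]):
                    for k_row in xrange(kern.shape[2]):
                        for k_col in xrange(kern.shape[3]):
                            out[row + k_row, col + k_col] += inp[row, col] * \
                                kern[row, col, -(k_row + 1), -(k_col + 1)]
        else:
            raise ValueError("unshared2d: invalid value '%s' for 'direction'"
                             % direction)
        return out

(The "backprop weights" branch is omitted from this sketch.)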

@vikramnitin9 (author) commented Jun 22, 2017:

Right now, the weights are of the form [out_filters, in_filters, out_rows, out_cols, kH, kW]. I think it actually makes more logical sense to have them as [out_filters, out_rows, out_cols, in_filters, kH, kW] (one block of shape [in_filters, kH, kW] corresponding to each region).

Also, it makes things easier in the C code (having those three dims together).

@vikramnitin9 (author):

@gvtulder Do you think I should go ahead and change this?

@gvtulder commented Jun 23, 2017:

@vikramnitin9 I'm not yet convinced that it's a good idea.

Of the current order, I like that the first two dimensions always have the same meaning for shared and unshared. There's now a clear relation img.shape[1] == kern.shape[1] and kern.shape[0] == out.shape[1], which you can depend on without knowing if it's shared or unshared convolution. So while it may make some things simpler, I think it would add complexity in a lot of other places.

Lasagne uses [out_filters, in_filters, out_rows, out_cols, kH, kW], which may be another reason to use that approach. (Keras puts in_filters at the end.) (Edit: this is incorrect, it's using [..., kH, kW, out_rows, out_cols].)

Would it make the C or GPU code a lot more efficient if the out_cols, out_rows came before the in_filters? (There was some open-ended discussion in the issue you linked to before.)

@vikramnitin9 (author) commented Jun 24, 2017:

im2col reshapes the image as [regions, in_filters*kH*kW]. Normally the weight matrix would be implicitly reshaped to [out_filters, in_filters*kH*kW].

For unshared, it could be reshaped to [out_filters, regions, in_filters*kH*kW] and a vector-matrix product performed for each region. This will be easy if in_filters, kH and kW are together in the weights array, so we don't actually need to reshape.

So that's why I said the C code will become easier.

Lasagne uses [out_filters, in_filters, kH, kW, out_rows, out_cols]. That can be implicitly reshaped to [out_filters, in_filters*kH*kW, regions]. Only thing is, I find it hard to imagine the weight matrix logically organised like this.

Pylearn2 uses [out_rows, out_cols, in_filters, kH, kW, num_groups, filters_per_group]

Either way, the three dimensions being together makes the reshaping easy.
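
(A shapes-only NumPy sketch of that argument — illustrative names and sizes, single batch element, not code from the PR:)

    import numpy as np

    in_filters, kH, kW = 3, 2, 2
    out_rows, out_cols = 4, 4
    out_filters = 8
    regions = out_rows * out_cols

    # im2col output: one row of in_filters*kH*kW values per output region
    cols = np.random.rand(regions, in_filters * kH * kW)

    # unshared weights as [out_filters, out_rows, out_cols, in_filters, kH, kW]
    w = np.random.rand(out_filters, out_rows, out_cols, in_filters, kH, kW)

    # the last three dims are contiguous, so this reshape is a pure view
    w2 = w.reshape(out_filters, regions, in_filters * kH * kW)

    # one vector-matrix product per region
    out = np.einsum('rk,ork->or', cols, w2).reshape(out_filters,
                                                    out_rows, out_cols)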

@gvtulder:

@vikramnitin9 You're right, I misread the order of the Lasagne weights.

If it's more efficient, as you say, to have [out_filters, out_rows, out_cols, in_filters, kH, kW], then I think it's fine to use that order. Would it be a good idea to finish the C implementation first, using that new shape, before you go back and change the Python code you already wrote? You can do a dimshuffle for now to convert the input to the new shape.

@vikramnitin9 (author):

Sure. I'm currently working on the C code under the assumption that weights are in the required format. Still need to do grad wrt inputs and add tests.

@vikramnitin9 (author):

Rebase done. Should be all fine now.

@gvtulder:

jenkins test this

@gvtulder:

It looks like the magic phrase doesn't work for me (although Jenkins does run if I create a PR myself, but that might be a different switch). It was worth trying.

@nouiz or @lamblin, could you trigger the GPU tests please?

@vikramnitin9 (author) commented Jul 13, 2017:

The Travis tests passed except for the abstractconv, which timed out as usual.

For the Jenkins test, should I add unshared tests to theano/gpuarray/tests/test_abstractconv.py or will the current set do?

@gvtulder:

Hmm, yes, there need to be some GPU tests. I thought I had seen them, also because you had these errors, but apparently I was wrong.

I think there need to be tests in test_gemmcorr.py and in test_abstractconv.py in the theano.gpuarray directory. (In fact, I suspect the abstractconv tests might even fail right now, because you've introduced this new u parameter that the GPU tests don't know about.) The abstractconv tests should be fairly easy to expand, the other test can be based on the existing GPU and CPU tests.

@vikramnitin9 (author):

Sorry for the delay. Here are the GPU tests.

@gvtulder left a comment:

Thanks, the tests look good. I have some comments about explicitly skipping invalid parameter values, but otherwise I think they should be fine.

@@ -26,7 +26,7 @@ def setup_class(cls):
# provide_shape is not used by the cuDNN implementation
cls.provide_shape = [False]

def tcase(self, i, f, s, b, flip, provide_shape, fd=(1, 1)):
def tcase(self, i, f, s, b, flip, provide_shape, fd=(1, 1), u=False):
@gvtulder:

It would be good to raise a SkipTest error if unshared is true. (Like for fd.) See also the other instances below.

@vikramnitin9 (author):

I've changed this, thanks

@@ -196,7 +196,7 @@ def setup_class(cls):
cls.shared = staticmethod(gpuarray_shared_constructor)
cls.mode = mode_with_gpu.excluding('cudnn')

def tcase(self, i, f, s, b, flip, provide_shape, fd=(1, 1, 1)):
def tcase(self, i, f, s, b, flip, provide_shape, fd=(1, 1, 1), u=False):
@gvtulder:

SkipTest for those cases as well. (You shouldn't just ignore values you don't like, since that would suggest that the test is run and passes.)

@vikramnitin9 (author):

For the 3D case, I've put a check in run_fwd to SkipTest if unshared is True and it is not a 2D convolution.

@vikramnitin9 (author):

Actually, that would require passing the parameter here, which I haven't done. Rather than add it, I'll just put a SkipTest right here.

@@ -425,8 +425,7 @@ def test_non_contiguous(self):

class TestUnsharedCorr2D(utt.InferShapeTester):
if theano.config.mode == "FAST_COMPILE":
# mode = theano.compile.get_mode("FAST_RUN").excluding("conv_dnn")
mode = theano.compile.get_default_mode().including('gpuarray').excluding('gpu').excluding('cudnn')
mode = theano.compile.get_mode("FAST_RUN").excluding("conv_dnn")
@gvtulder:

Do you need to exclude conv_dnn here? The shared convolution tests don't do this.

@@ -453,13 +452,13 @@ def test_fwd(self):

conv_unshared = corr.CorrMM(unshared=True, subsample=self.sub,
filter_dilation=self.dil)(self.input, self.filters)
unshared_func = theano.function([self.input, self.filters], conv_unshared)
unshared_func = theano.function([self.input, self.filters], conv_unshared, mode=self.mode)
@gvtulder:

This looks a bit weird, but it is because you're using the Python implementation as your reference, right? I'm wondering whether it wouldn't be better to have a separate implementation for the tests, like the one that's used for the normal convolution.

@gvtulder:

@nouiz or @lamblin, there are GPU tests now, so can you give this a go on Jenkins?

@lamblin (Member) commented Jul 17, 2017:

Jenkins test this please

@lamblin commented Jul 17, 2017:

Also, tests are timing out on Travis because some parts exceed 50 minutes.
I think we should find a way of testing fewer cases or smaller sizes, at least in the slower (python) modes.

@gvtulder left a comment:

A few small things I came across.

PyErr_Format(PyExc_ValueError,
"GpuCorrMM: impossible output shape\\n"
" bottom shape: %%ld x %%ld x %%ld x %%ld\\n"
" weights shape: %%ld x %%ld x %%ld x %%ld %%ld %%ld\\n"
@gvtulder:

There are some x missing near the end.

PyErr_Format(PyExc_ValueError,
"GpuCorrMM backprop wrt. weights: impossible output shape\\n"
" bottom shape: %%ld x %%ld x %%ld x %%ld\\n"
" weights shape: %%ld x %%ld x %%ld x %%ld %%ld %%ld\\n"
@gvtulder:

There are some x missing near the end.

PyErr_Format(PyExc_ValueError,
"GpuCorrMM backprop wrt. inputs: impossible output shape\\n"
" bottom shape: %%ld x %%ld x %%ld x %%ld\\n"
" weight shape: %%ld x %%ld x %%ld x %%ld\\n"
@gvtulder:

This is missing the extra two dimensions.

if (unshared) {
PyErr_Format(PyExc_ValueError,
"GpuCorrMM requires weight to be C-contiguous, "
"but strides are: %ld %ld %ld %ld\n",
@gvtulder:

This is missing two dimensions.

" bottom shape: %d %d %d %d\n"
" weight shape: %d %ld %ld %d %d %d"
" (expected %d %d %d %d %d %d)\n"
" top shape(calculated): %d %d %d %d\n",
@gvtulder:

Since the shared version uses %ld everywhere, it's probably good to do that here as well. (My compiler gave some warnings here.)

"GpuCorrMM shape inconsistency:\n"
" bottom shape: %d %d %d %d\n"
" weight shape: %d %d %d %d %d %d\n"
" top shape: %ld %ld %ld %ld (expected %d %d %d %d)\n",
@gvtulder:

Idem.

@gvtulder:

For what it's worth: I tried to use gemv in the forward convolution gpuarray/corr_gemm.c on my system. With device=cuda, it works without problems. opencl on my GPU worked as well. Only opencl on CPU with the Intel OpenCL library had a segfault.

It may be interesting to change the GPU version to use the batched gemm functions, instead of calling gemm multiple times. (But it's more important to get the tests to work first, I think.)
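
(Schematically, in NumPy terms — np.matmul broadcasting over a leading batch axis is the CPU analogue of a batched gemm; sizes here are illustrative:)

    import numpy as np

    regions, k, n = 16, 27, 8
    cols = np.random.rand(regions, 1, k)  # one im2col row per region
    w = np.random.rand(regions, k, n)     # one weight matrix per region

    # a single batched product instead of `regions` separate gemm calls
    out = np.matmul(cols, w)              # shape (regions, 1, n)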

@vikramnitin9 (author):

Interestingly, I had a segfault with opencl on the Intel Integrated HD Graphics device, but it ran fine on my i5 CPU.

@gvtulder:

When you start working on the rebase, be careful that you keep things backwards compatible: put the new parameters, props etc. for unshared convolution after the parameters that have already been added for the grouped convolution.

@vikramnitin9 (author):

Update: I'm working on the integration with grouped convolution. Should be done tomorrow or the day after.

@vikramnitin9 (author):

I've squashed the commits, rebased and started a new PR since this one is already very messy. That is at #6286. I'm closing this PR.
