Tied weights with transpose flag for InnerProduct layer #3612

Merged
1 commit merged into BVLC:master from kashefy:tied_weights_ip_transpose on Feb 25, 2016

Conversation

2 participants
Contributor

kashefy commented Jan 29, 2016

I wanted to train an autoencoder where the decoder uses the transpose of the encoder's weight matrix. This was first discussed in #670 and followed up in #1211 (comment), but it seemed this wasn't resolved. I found @jeffdonahue's suggestion in this comment to just add a transpose flag to the InnerProduct layer quite reasonable.

This PR adds a transpose flag to the InnerProduct layer as well as to its params protobuf message.
When the flag is set to true (as for the decoder), the forward pass instructs the matrix multiplication routine NOT to transpose the weight matrix; when it is false (the default, and what you want in the usual case and for the encoder), the weight matrix is transposed as before.
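In gemm terms, the flag only changes how the same flat weight buffer is read. A loop-based sketch of that dispatch (not Caffe's actual code; explicit loops stand in for `caffe_cpu_gemm`, and all names here are made up for illustration):

```cpp
#include <cassert>
#include <vector>

// Sketch of the forward pass with the transpose flag. In Caffe, a normal IP
// layer stores its weights as an N x K matrix (num_output x input dim) and
// forward computes top = bottom * W^T (gemm with CblasTrans on the weights).
// With transpose: true the weights are stored K x N and forward uses
// CblasNoTrans, i.e. top = bottom * W directly.
std::vector<float> ip_forward(const std::vector<float>& bottom, int M, int K,
                              const std::vector<float>& weight, int N,
                              bool transpose) {
  std::vector<float> top(M * N, 0.f);
  for (int m = 0; m < M; ++m)
    for (int n = 0; n < N; ++n)
      for (int k = 0; k < K; ++k)
        // transpose=false: weight is N x K, read weight[n*K + k] (W^T view).
        // transpose=true:  weight is K x N, read weight[k*N + n] directly.
        top[m * N + n] += bottom[m * K + k] *
            (transpose ? weight[k * N + n] : weight[n * K + k]);
  return top;
}
```

Running the same input through both paths, with the second weight buffer being the explicit transpose of the first, yields identical outputs, which is the whole point of flipping the gemm flag instead of moving any data.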

Tying the weights between encoder and decoder requires:

  1. sharing the encoder's weight params with the inner product layer that is the decoder.
  2. setting share_mode to PERMISSIVE in both IP layers (otherwise it won't allow for the shape mismatch)
  3. Adding 'transpose: true' to the decoder's inner_product_param

A sample trainval.prototxt to demonstrate usage.
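A minimal sketch of what such a prototxt might contain, following the three steps above (layer names, blob names, and `num_output` values are hypothetical):

```protobuf
# Encoder: owns the weights; the param name makes them shareable.
layer {
  name: "encoder"
  type: "InnerProduct"
  bottom: "data"
  top: "code"
  # Steps 1 and 2: shared weight name, permissive shape checking.
  param { name: "tied_weights" share_mode: PERMISSIVE }
  inner_product_param { num_output: 64 }
}
# Decoder: shares the encoder's weights and multiplies by their transpose.
layer {
  name: "decoder"
  type: "InnerProduct"
  bottom: "code"
  top: "reconstruction"
  param { name: "tied_weights" share_mode: PERMISSIVE }
  inner_product_param {
    num_output: 784   # the encoder's input dimension
    transpose: true   # step 3
  }
}
```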

I haven't written unit tests around this yet. Open to suggestions on what makes sense to test here.

Thanks for reviewing and looking forward to the feedback.

@jeffdonahue jeffdonahue commented on an outdated diff Jan 29, 2016

src/caffe/layers/inner_product_layer.cu
@@ -19,8 +19,9 @@ void InnerProductLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
caffe_gpu_axpy<Dtype>(N_, bias_multiplier_.cpu_data()[0],
this->blobs_[1]->gpu_data(), top_data);
} else {
- caffe_gpu_gemm<Dtype>(CblasNoTrans, CblasTrans, M_, N_, K_, (Dtype)1.,
- bottom_data, weight, (Dtype)0., top_data);
+ caffe_gpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasNoTrans : CblasTrans,
+ M_, N_, K_, (Dtype)1.,
+ bottom_data, weight, (Dtype)0., top_data);
@jeffdonahue

jeffdonahue Jan 29, 2016

Contributor

remove added indent

@jeffdonahue jeffdonahue commented on an outdated diff Jan 29, 2016

src/caffe/layers/inner_product_layer.cpp
bottom_data, weight, (Dtype)0., top_data);
+
@jeffdonahue

jeffdonahue Jan 29, 2016

Contributor

remove added empty line here and on 84

Contributor

jeffdonahue commented Jan 29, 2016

Thanks @kashefy! This looks pretty good to me.

setting share_mode to PERMISSIVE in both IP layers (otherwise it won't allow for the shape mismatch)

This shouldn't be needed -- instead the weight param should be set to the correct shape by swapping N_ & K_, changing lines 32-33 of inner_product_layer.cpp to be conditioned on transpose.
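A sketch of that shape conditioning (paraphrased, not the literal Caffe source; the helper function here is hypothetical):

```cpp
#include <cassert>
#include <vector>

// Sketch of conditioning the weight-blob shape on the transpose flag in
// LayerSetUp. A normal IP layer stores weights as N x K (num_output x
// input dim); a transposed one stores them K x N. That way an encoder
// (N outputs, K inputs, transpose off) and its tied decoder (K outputs,
// N inputs, transpose on) end up with identically shaped weight blobs
// and can share them with no PERMISSIVE share_mode required.
std::vector<int> weight_shape(int num_output, int input_dim, bool transpose) {
  if (transpose) {
    return {input_dim, num_output};  // K x N
  }
  return {num_output, input_dim};    // N x K
}
```

For example, a 784-in/64-out encoder and a 64-in/784-out transposed decoder would both get a 64 x 784 weight blob.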

Besides that, please see the style nitpicks and squash your history to a single commit.

Re testing: it would be good to have a few unit tests:

  • verify the correct shape of the parameter with and without transpose set
  • a gradient check with transpose set
  • a forward check, for example: initialize an IP layer without transpose and the parameter randomly initialized, run Forward, save the result; initialize another IP layer with transpose, manually copy and transpose the value of the parameter from the first IP layer, then run Forward on the same input and check that the result is the same
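The gradient check in particular can be sketched numerically (a loop-based stand-in, not Caffe's actual GradientChecker; all names and dimensions are made up). With transpose on, the weight is stored K x N and forward computes top = bottom * W; taking loss = sum(top), the analytic weight gradient is dL/dW[k][n] = sum_m bottom[m][k], which we compare against a central finite difference:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Forward pass with transpose: true, reduced to a scalar loss (sum of top).
float sum_forward_t(const std::vector<float>& bottom, int M, int K,
                    const std::vector<float>& w, int N) {
  float s = 0.f;
  for (int m = 0; m < M; ++m)
    for (int n = 0; n < N; ++n)
      for (int k = 0; k < K; ++k)
        s += bottom[m * K + k] * w[k * N + n];  // w is K x N
  return s;
}

// Compare the analytic weight gradient against central finite differences.
bool gradient_check_t(const std::vector<float>& bottom, int M, int K,
                      std::vector<float> w, int N, float eps, float tol) {
  for (size_t i = 0; i < w.size(); ++i) {
    const int k = static_cast<int>(i) / N;
    float analytic = 0.f;  // dL/dw[k][n] = sum_m bottom[m][k]
    for (int m = 0; m < M; ++m) analytic += bottom[m * K + k];
    const float orig = w[i];
    w[i] = orig + eps;
    const float plus = sum_forward_t(bottom, M, K, w, N);
    w[i] = orig - eps;
    const float minus = sum_forward_t(bottom, M, K, w, N);
    w[i] = orig;
    const float numeric = (plus - minus) / (2 * eps);
    if (std::fabs(numeric - analytic) > tol) return false;
  }
  return true;
}
```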
Contributor

kashefy commented Jan 29, 2016

@jeffdonahue, thanks for the feedback. Will fix styling (travis build failed because of it) and add the unit tests.

Contributor

jeffdonahue commented Jan 30, 2016

Great, thanks. Also just noticed you didn't change backward -- pretty sure that will need a different CblasTrans setting as well. (But no need to think about it once you write the gradient check :)
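To see why backward needs its own flag flip, here is a loop-based sketch of the bottom-gradient computation (not Caffe's actual code; explicit loops stand in for gemm). With transpose off, forward is top = bottom * W^T (W stored N x K), so the bottom gradient is top_diff * W; with transpose on, forward is top = bottom * W (W stored K x N), so the bottom gradient is top_diff * W^T -- the opposite transpose setting from forward:

```cpp
#include <cassert>
#include <vector>

// Bottom-gradient sketch for the IP layer with a transpose flag.
std::vector<float> ip_backward_bottom(const std::vector<float>& top_diff,
                                      int M, int N,
                                      const std::vector<float>& weight, int K,
                                      bool transpose) {
  std::vector<float> bottom_diff(M * K, 0.f);
  for (int m = 0; m < M; ++m)
    for (int k = 0; k < K; ++k)
      for (int n = 0; n < N; ++n)
        // transpose=false: weight is N x K, read weight[n*K + k].
        // transpose=true:  weight is K x N, read weight[k*N + n] (W^T view).
        bottom_diff[m * K + k] += top_diff[m * N + n] *
            (transpose ? weight[k * N + n] : weight[n * K + k]);
  return bottom_diff;
}
```

As in forward, a weight buffer and its explicit transpose give identical gradients through the two code paths; only the storage interpretation (and hence the gemm flag) differs.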

Contributor

kashefy commented Feb 3, 2016

quick update:
Wrote unit tests around forward, backward, blob shape (with and without transpose). Tests pass. Fixed styling.
Transposing shared weights needs more work. The transposing IP layer stores the weight shape in its post-transpose form, but the actual transposition doesn't happen until the multiplication is called. Setting the decoder's num_output collides with these assumptions.
Maybe I need a second flag next to the transpose one to hold off on switching the weight shapes in case the weights are tied to another 'encoder' layer.

@jeffdonahue jeffdonahue commented on an outdated diff Feb 3, 2016

src/caffe/test/test_inner_product_layer.cpp
@@ -148,4 +265,127 @@ TYPED_TEST(InnerProductLayerTest, TestGradient) {
}
}
+TYPED_TEST(InnerProductLayerTest, TestGradientTransposeFalse) {
@jeffdonahue

jeffdonahue Feb 3, 2016

Contributor

Shouldn't this test be TestGradientTransposeTrue (with the corresponding change from set_transpose(false) to set_transpose(true))? This test is effectively a duplicate of the existing test above (TestGradient), I'd think.

And if this test is changed to be done with transpose on, is there still anything additionally tested by TestBackwardTranspose? I would think the combination of TestForward with the gradient check would cover all functionality.

@jeffdonahue jeffdonahue and 1 other commented on an outdated diff Feb 3, 2016

src/caffe/test/test_inner_product_layer.cpp
+ const int count = this->blob_top_->count();
+ Blob<Dtype>* const top = new Blob<Dtype>();
+ top->ReshapeLike(*this->blob_top_);
+ caffe_copy(count, this->blob_top_->cpu_data(), top->mutable_cpu_data());
+ this->blob_top_vec_.clear();
+ this->blob_top_vec_.push_back(new Blob<Dtype>());
+ inner_product_param->set_transpose(true);
+ shared_ptr<InnerProductLayer<Dtype> > ip_t(
+ new InnerProductLayer<Dtype>(layer_param));
+ ip_t->SetUp(this->blob_bottom_vec_, this->blob_top_vec_);
+ const int count_w = layer->blobs()[0]->count();
+ EXPECT_EQ(count_w, ip_t->blobs()[0]->count());
+ // manually copy and transpose the weights from 1st IP layer into 2nd
+ const Dtype* w = layer->blobs()[0]->cpu_data();
+ Dtype* w_t = ip_t->blobs()[0]->mutable_cpu_data();
+ const int WIDTH = layer->blobs()[0]->shape(1);
@jeffdonahue

jeffdonahue Feb 3, 2016

Contributor

variable names WIDTH and WIDTH_T should be lowercased

@kashefy

kashefy Feb 3, 2016

Contributor

I capitalized them because they're constants. Overkill?

@jeffdonahue

jeffdonahue Feb 3, 2016

Contributor

In general we follow the Google style guide, but I think our convention is to use regular lowercase with underscore variable names (my_variable_name) for consts initialized from other variables (as in this case), but camel case with k prefix (kMyVariableName) for consts directly initialized with their values known at compile time per the Google style guide. We use all caps names only for macros AFAIK.

@kashefy

kashefy Feb 3, 2016

Contributor

Got it, will fix. Thanks for clarifying.

Contributor

jeffdonahue commented Feb 3, 2016

@kashefy the tests and code look good but see comments/nitpicks above. Once you've addressed these, please squash your history to a single commit (or, if you prefer, two commits -- one for the style fixes of existing code, and another for your added feature and tests), and I can merge this. Thanks!

Contributor

kashefy commented Feb 3, 2016

@jeffdonahue thanks for the feedback. Will go over the redundant test. The PR as it now stands only adds a transpose feature to the IP layer. Tying weights in an autoencoder doesn't work yet. If you think the transpose feature is useful on its own, I can do the tying part in another PR.

Contributor

jeffdonahue commented Feb 3, 2016

I'm not sure I understand why shared weights between an "encoder" and "decoder" layer wouldn't work in the current form. Both the shape and memory layout of the weight matrix would be the same between a normal IP layer (transpose = false) that takes D-dimensional input and produces N-dimensional output, and a transposed IP layer (transpose = true) that takes N-dimensional input and produces D-dimensional output. Given that, I would think the only other thing that should need to be done in the transpose=true case (and which you have done here) is to change the BLAS transpose settings in forward/backward when reading from/writing to the weights.

I could certainly be missing something though.

Contributor

kashefy commented Feb 3, 2016

Transposing works for tied weights in an autoencoder as well. All good to go.

Contributor

kashefy commented Feb 5, 2016

Failures are due to import errors when running python nose tests. Possible solution in #3638

Contributor

kashefy commented Feb 8, 2016

Travis job passing now, can't really explain why. But glad the import errors are gone. All good to go.

Contributor

kashefy commented Feb 17, 2016

Hello @jeffdonahue, I think this is ready. The transpose worked for shared weights as-is, after all.

Contributor

jeffdonahue commented Feb 18, 2016

@kashefy thanks, looks like this is almost there! But could you add a simple TestGradientTranspose test? It should be exactly the same as the existing TestGradient but of course have one extra line that does set_transpose(true). And with that test added I'm inclined to say TestBackwardTranspose should be removed, unless you think there is something additionally tested in that which isn't covered by the gradient checker.

@kashefy kashefy transpose parameter added to IP layer to support tied weights in an autoencoder. Arguments to matrix multiplication function are conditioned on this parameter, no actual transposing takes place.

test ip gradient computation with transpose on
8f847fa
Contributor

kashefy commented Feb 20, 2016

Hello @jeffdonahue, I've added TestGradientTranspose (basically TestGradient + set transpose to true, as you suggested). You're right, TestBackwardTranspose is somewhat redundant; it doesn't cover anything the gradient checker isn't already covering. However, I find it helpful in narrowing down where a failure could come from. The test was actually very helpful in setting up the backward computation, so I'm a bit reluctant to throw it out. I tend to use tests as a debug aid, so I usually end up writing more to better understand where a failure comes from, at the expense of redundancy and code length.

Contributor

kashefy commented Feb 25, 2016

Hello @jeffdonahue, do you think the current tests are sufficient? Anything else you think should go into this PR? Thanks.

Contributor

jeffdonahue commented Feb 25, 2016

@kashefy thanks for adding the gradient check; I suppose it can't hurt much to have the backward test as it's presumably very quick (relative to the full gradient check). LGTM -- thanks for this work.

@jeffdonahue jeffdonahue added a commit that referenced this pull request Feb 25, 2016

@jeffdonahue jeffdonahue Merge pull request #3612 from kashefy/tied_weights_ip_transpose
Tied weights with transpose flag for InnerProduct layer
fe0f441

@jeffdonahue jeffdonahue merged commit fe0f441 into BVLC:master Feb 25, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@fxbit fxbit added a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016

@jeffdonahue @fxbit jeffdonahue + fxbit Merge pull request #3612 from kashefy/tied_weights_ip_transpose
Tied weights with transpose flag for InnerProduct layer
b47c249