Tied weights with transpose flag for InnerProduct layer #3612
Conversation
jeffdonahue commented on an outdated diff · Jan 29, 2016
@@ -19,8 +19,9 @@ void InnerProductLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
       caffe_gpu_axpy<Dtype>(N_, bias_multiplier_.cpu_data()[0],
           this->blobs_[1]->gpu_data(), top_data);
     } else {
-      caffe_gpu_gemm<Dtype>(CblasNoTrans, CblasTrans, M_, N_, K_, (Dtype)1.,
-          bottom_data, weight, (Dtype)0., top_data);
+      caffe_gpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasNoTrans : CblasTrans,
+          M_, N_, K_, (Dtype)1.,
+          bottom_data, weight, (Dtype)0., top_data);
jeffdonahue commented on an outdated diff · Jan 29, 2016
Thanks @kashefy! This looks pretty good to me.
This shouldn't be needed -- instead the weight param should be set to the correct shape by swapping … Besides that, please see the style nitpicks and squash your history to a single commit. Re testing: it would be good to have a few unit tests: …
@jeffdonahue, thanks for the feedback. Will fix the styling (the Travis build failed because of it) and add the unit tests.
Great, thanks. Also just noticed you didn't change backward -- pretty sure that will need a different …
quick update: …
jeffdonahue commented on an outdated diff · Feb 3, 2016
@@ -148,4 +265,127 @@ TYPED_TEST(InnerProductLayerTest, TestGradient) {
   }
 }
+TYPED_TEST(InnerProductLayerTest, TestGradientTransposeFalse) {
jeffdonahue and 1 other commented on an outdated diff · Feb 3, 2016
+  const int count = this->blob_top_->count();
+  Blob<Dtype>* const top = new Blob<Dtype>();
+  top->ReshapeLike(*this->blob_top_);
+  caffe_copy(count, this->blob_top_->cpu_data(), top->mutable_cpu_data());
+  this->blob_top_vec_.clear();
+  this->blob_top_vec_.push_back(new Blob<Dtype>());
+  inner_product_param->set_transpose(true);
+  shared_ptr<InnerProductLayer<Dtype> > ip_t(
+      new InnerProductLayer<Dtype>(layer_param));
+  ip_t->SetUp(this->blob_bottom_vec_, this->blob_top_vec_);
+  const int count_w = layer->blobs()[0]->count();
+  EXPECT_EQ(count_w, ip_t->blobs()[0]->count());
+  // manually copy and transpose the weights from 1st IP layer into 2nd
+  const Dtype* w = layer->blobs()[0]->cpu_data();
+  Dtype* w_t = ip_t->blobs()[0]->mutable_cpu_data();
+  const int WIDTH = layer->blobs()[0]->shape(1);
@kashefy the tests and code look good but see comments/nitpicks above. Once you've addressed these, please squash your history to a single commit (or, if you prefer, two commits -- one for the style fixes of existing code, and another for your added feature and tests), and I can merge this. Thanks!
@jeffdonahue thanks for the feedback. Will go over the redundant test. The PR as it is now only adds a transpose feature to the ip layer. Tying weights in an autoencoder doesn't work yet. If you think the transpose feature is useful on its own, I can do the tying part in another PR.
I'm not sure I understand why shared weights between an "encoder" and "decoder" layer wouldn't work in the current form. Both the shape and memory layout of the weight matrix would be the same between a normal IP layer (… I could certainly be missing something though.
Transposing works for tied weights in an autoencoder as well. All good to go.
Failures are due to import errors when running the python nose tests. Possible solution in #3638.
Travis job is passing now; can't really explain why, but glad the import errors are gone. All good to go.
Hello @jeffdonahue, I think this is ready. The transpose worked for shared weights after all, as-is.
@kashefy thanks, looks like this is almost there! But could you add a simple …
Hello @jeffdonahue, I've added …
Hello @jeffdonahue, do you think the current tests are sufficient? Anything else you think should go into this PR? Thanks.
@kashefy thanks for adding the gradient check; I suppose it can't hurt much to have the backward test as it's presumably very quick (relative to the full gradient check). LGTM -- thanks for this work.
jeffdonahue added a commit that referenced this pull request · Feb 25, 2016 (fe0f441)
jeffdonahue merged commit fe0f441 into BVLC:master · Feb 25, 2016 · 1 check passed
fxbit added a commit to Yodigram/caffe that referenced this pull request · Sep 1, 2016 (b47c249, authored by jeffdonahue and fxbit)
kashefy commented · Jan 29, 2016
I wanted to train an autoencoder where the decoder uses the transpose of the encoder's weight matrix. This was first discussed in #670 and followed up in #1211 (comment), but it seemed this wasn't resolved. I found @jeffdonahue's suggestion in this comment to just add a transpose flag to the InnerProduct layer quite reasonable.
This PR adds a transpose flag to the InnerProduct layer as well as to its params protobuf message.
When the flag is set to true (as for the decoder), the forward pass instructs the matrix multiplication routine NOT to transpose the weight matrix; when left at the default of false, the weight matrix is transposed, which is what you want in the usual case and for the encoder.
Tying the weights between encoder and decoder requires: …
A sample trainval.prototxt to demonstrate usage.
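For illustration only, a hedged sketch of how such a prototxt might tie the weights: the weight blob is shared by param name and the new transpose flag is set on the decoder (all layer/param names and the num_output values here are made up):

```protobuf
layer {
  name: "encode"
  type: "InnerProduct"
  bottom: "data"
  top: "code"
  param { name: "tied_w" }   # share the weight blob by name
  param { name: "encode_b" }
  inner_product_param { num_output: 64 }
}
layer {
  name: "decode"
  type: "InnerProduct"
  bottom: "code"
  top: "recon"
  param { name: "tied_w" }   # same blob as the encoder's weights
  param { name: "decode_b" }
  inner_product_param {
    num_output: 784      # back to the (assumed) input dimension
    transpose: true      # use the shared N x K blob as K x N
  }
}
```

With these shapes the encoder's weight blob is 64 x 784 (N x K), and the decoder with transpose set expects exactly a 64 x 784 blob for its K x N layout, so the shared blob fits both layers.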
I haven't written unit tests around this yet. Open to suggestions as to what makes sense to test for here.
Thanks for reviewing and looking forward to the feedback.