RNN + LSTM Layers #3948
Conversation
jeffdonahue added the ready for review label Apr 5, 2016
weiliu89 added a commit to weiliu89/caffe that referenced this pull request Apr 7, 2016 (b1678f3)
longjon commented on an outdated diff Apr 8, 2016
+#include <utility>
+#include <vector>
+
+#include "caffe/blob.hpp"
+#include "caffe/common.hpp"
+#include "caffe/layer.hpp"
+#include "caffe/net.hpp"
+#include "caffe/proto/caffe.pb.h"
+
+namespace caffe {
+
+template <typename Dtype> class RecurrentLayer;
+
+/**
+ * @brief An abstract class for implementing recurrent behavior inside of an
+ *        unrolled network. This Layer type cannot be instantiated -- instead,
shelhamer added the focus label Apr 8, 2016
weiliu89 added a commit to weiliu89/caffe that referenced this pull request Apr 9, 2016 (8afb9c5)
weiliu89 commented Apr 10, 2016
It doesn't work with the current net_spec.py. Specifically: 1) it fails when using L.LSTM() or L.RNN(), since only RecurrentParameter is defined in caffe.proto; 2) it fails when using L.Recurrent(), since RecurrentLayer is not registered (it is an abstract class). I worked around this with a simple hack, adding the following in the param_name_dict() function in net_spec.py
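The actual patch is elided above, but the kind of mapping being described can be sketched in pure Python. Everything here (the function name, the base dictionary contents) is illustrative, not the real net_spec.py code:

```python
# Illustrative sketch only -- not the actual net_spec.py patch, which is not
# shown in the comment above. The idea: net_spec derives a layer-type ->
# parameter-field-name mapping from caffe.proto, so layer types that *share*
# a parameter message (LSTM and RNN both use RecurrentParameter) need
# explicit extra entries.
def build_param_name_dict():
    # Stand-in for the mapping net_spec normally derives from caffe.proto,
    # where each layer type has a matching <Name>Parameter message.
    param_names = {'Convolution': 'convolution',
                   'InnerProduct': 'inner_product'}
    # Hack analogous to the one described: point LSTM and RNN at 'recurrent',
    # since there is no LSTMParameter or RNNParameter message.
    for shared_type in ('LSTM', 'RNN'):
        param_names[shared_type] = 'recurrent'
    return param_names
```

With such a mapping in place, a call like L.LSTM(..., recurrent_param=...) can resolve its parameter field even though the layer type name does not match the parameter message name.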
@weiliu89 the recurrent parameter for these layers, like the convolution parameter for
Whether or not to map these shared parameter types as you suggest here or as suggested or
shelhamer commented on an outdated diff Apr 16, 2016
+ * @param top output Blob vector (length 1)
+ *   -# @f$ (T \times N \times D) @f$
+ *      the time-varying output @f$ y @f$, where @f$ D @f$ is
+ *      <code>recurrent_param.num_output()</code>.
+ *      Refer to documentation for particular RecurrentLayer implementations
+ *      (such as RNNLayer and LSTMLayer) for the definition of @f$ y @f$.
+ */
+  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
+      const vector<Blob<Dtype>*>& top);
+  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
+      const vector<Blob<Dtype>*>& top);
+  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
+      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
+
+  /// @brief A helper function, useful for stringifying timestep indices.
+  virtual string int_to_str(const int t) const;
shelhamer commented on an outdated diff Apr 16, 2016
+  }
+}
+
+template <typename Dtype>
+void RecurrentLayer<Dtype>::Reset() {
+  // "Reset" the hidden state of the net by zeroing out all recurrent outputs.
+  for (int i = 0; i < recur_output_blobs_.size(); ++i) {
+    caffe_set(recur_output_blobs_[i]->count(), Dtype(0),
+        recur_output_blobs_[i]->mutable_cpu_data());
+  }
+}
+
+template <typename Dtype>
+void RecurrentLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
+    const vector<Blob<Dtype>*>& top) {
+  // Hacky fix for test time... reshare all the shared blobs.
shelhamer commented on an outdated diff Apr 16, 2016
+
+#include "caffe/blob.hpp"
+#include "caffe/common.hpp"
+#include "caffe/layer.hpp"
+#include "caffe/layers/recurrent_layer.hpp"
+#include "caffe/net.hpp"
+#include "caffe/proto/caffe.pb.h"
+
+namespace caffe {
+
+template <typename Dtype> class RecurrentLayer;
+
+/**
+ * @brief Processes sequential inputs using a "Long Short-Term Memory" (LSTM)
+ *        [1] style recurrent neural network (RNN). Implemented by unrolling
+ *        the LSTM computation through time.
shelhamer commented on an outdated diff Apr 16, 2016
+#include "caffe/filler.hpp"
+#include "caffe/layers/lstm_layer.hpp"
+
+#include "caffe/test/test_caffe_main.hpp"
+#include "caffe/test/test_gradient_check_util.hpp"
+
+namespace caffe {
+
+template <typename TypeParam>
+class LSTMLayerTest : public MultiDeviceTest<TypeParam> {
+  typedef typename TypeParam::Dtype Dtype;
+
+ protected:
+  LSTMLayerTest() : num_output_(7) {
+    blob_bottom_vec_.push_back(&blob_bottom_);
+    blob_bottom_vec_.push_back(&blob_bottom_flush_);
shelhamer (Owner) commented
LGTM overall -- my only comments were about comments and naming (and that one int -> string function). @longjon are you done with your review?
Looks great. Thanks for this @jeffdonahue. We've been using a variant of this for a while and it has performed great. One thing we can additionally PR/gist (if it's useful) is a wrapper around the LSTM layer that allows for arbitrary-length (batched) forward propagation, which came in handy when doing inference on arbitrary-length sequences (relaxing the constraint around T_ while preserving memory efficiency for the forward pass by reusing activations across timesteps).
@shelhamer @longjon thanks for the review! Fixed as suggested. @ajtulloch glad to hear it's been working for you guys, thanks for looking it over! I'm not sure I understand the idea of the wrapper, though. I think this implementation should be able to do what you're saying -- memory-efficient forward propagation over arbitrarily long sequences -- by feeding in
@jeffdonahue yeah, the only contribution was around allowing variable-
Ah -- batching the input transformation regardless of sequence length indeed makes sense. Thanks in advance for posting the code!
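For readers wondering how variable-length sequences are packed for these layers: the second bottom (the flush/continuation indicator that appears as blob_bottom_flush_ in the tests above) marks, per timestep and per stream, whether the hidden state carries over. A minimal pure-Python sketch, assuming the common convention of 0 at each sequence start and 1 while the sequence continues:

```python
def cont_markers(seq_lens, T):
    """Build a T x N continuation-indicator table for N sequences packed
    into a batch of T timesteps. Convention assumed here (illustrative):
    0 at the first timestep of each sequence -- the hidden state is reset
    there -- and 1 while the sequence continues; padded positions stay 0."""
    N = len(seq_lens)
    cont = [[0] * N for _ in range(T)]
    for n, length in enumerate(seq_lens):
        for t in range(1, min(length, T)):
            cont[t][n] = 1
    return cont

# Two sequences of lengths 3 and 2 packed into T=3, N=2:
# t=0: both sequences start; t=1: both continue; t=2: only the first does.
```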
niketanpansare added a commit to niketanpansare/systemml that referenced this pull request May 10, 2016 (a416baf)
MinaRe commented May 13, 2016
Dear all, I have a very big matrix (rows are IDs and columns are labels) and I was wondering how I can do the training in Caffe with just fully connected layers? Thanks a lot.
niketanpansare added a commit to niketanpansare/systemml that referenced this pull request May 13, 2016 (9579358)
dangweili commented May 20, 2016
When will this pull request be merged?
yshean commented May 21, 2016
Anyone successfully merged @jeffdonahue's caffe:recurrent-layer and BVLC's caffe:master? Why does the assertion of
myfavouritekk added a commit to myfavouritekk/caffe that referenced this pull request May 24, 2016 (3dcc5f8)
aralph added a commit to aralph/caffe that referenced this pull request Jun 1, 2016 (d6031ee)
jeffdonahue added some commits Feb 15, 2015
Thanks again for the reviews, everyone. Sorry for the delays -- I wanted to do some additional testing, but I'm now comfortable enough with this to merge.
jeffdonahue merged commit 58b10b4 into BVLC:master Jun 2, 2016 (1 check passed)
Very nice work @jeffdonahue.
jeffdonahue deleted the jeffdonahue:recurrent-layer branch Jun 2, 2016
This was referenced Jun 2, 2016
jakirkham commented Jun 3, 2016
Any plans for a release?
Could you link to a working tutorial/example on using these layers? It would be easier for new learners. I know you have one somewhere.
yjxiong pushed a commit to yjxiong/caffe that referenced this pull request Jun 15, 2016 (702db71, authored by jeffdonahue)
wenwei202 commented Jun 30, 2016
Great work!!! @jeffdonahue I used https://github.com/jeffdonahue/caffe/tree/recurrent-rebase-cleanup/ as the example to do
Any suggestions?
UsamaShafiq91 commented Jul 1, 2016
@jeffdonahue I am new to Caffe. Do you have any example of RNNs -- how to use the RNN layer?
agethen commented Jul 26, 2016
@jeffdonahue May I ask your help with a clarification? I can see in
wenwei202 commented Jul 29, 2016
Hello, what makes it necessary to switch the dimension order of the bottom blob from
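On the dimension-order question above, one plausible motivation (an assumption offered here, not the authors' answer): with time as the outermost axis, each timestep of the whole batch is a single contiguous slice of the blob, which the per-timestep layers of the unrolled net can view without copying. A tiny sketch of that layout property:

```python
# Sketch of why a T x N x D (time-major) layout is convenient: in row-major
# storage, timestep t for the entire batch is one contiguous run of N*D
# values, so each step of the unrolled net can alias it cheaply.
T, N, D = 3, 2, 4
flat = list(range(T * N * D))  # stand-in for a Blob's row-major buffer

def timestep_slice(t):
    # Contiguous N*D values holding timestep t for all N streams.
    return flat[t * N * D:(t + 1) * N * D]
```

With an N x T ordering, the values for one timestep would instead be strided across the buffer, so per-timestep views would require gathering or copying.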
shelhamer referenced this pull request Aug 25, 2016: "Why LSTMParameter code isn't merged?" #4629 (closed)
fxbit added a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016 (fd13748, jeffdonahue + fxbit)
jeffdonahue commented Apr 5, 2016
This PR includes the core functionality (with minor changes) of #2033 -- the RNNLayer and LSTMLayer implementations (as well as the parent RecurrentLayer class) -- without the COCO data downloading/processing tools or the LRCN example. Breaking off this chunk for merge should make users who are already using these layer types on their own happy, without adding a large review/maintenance burden for the examples (which have already broken multiple times due to changes in the COCO data distribution format...). On the other hand, without any example of how to format the input data for these layers, it will be fairly difficult to get started, so I'd still like to follow up with at least a simple sequence example for official inclusion in Caffe (maybe memorizing a random integer sequence -- I think I have some code for that somewhere) soon after the core functionality is merged.
There's still at least one documentation TODO: I added expose_hidden to allow direct access (via bottoms/tops) to the recurrent model's 0th-timestep and T-th-timestep hidden states, but didn't add anything to the list of bottoms/tops -- I still need to do that. Otherwise, this should be ready for review.
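To illustrate what exposing the boundary hidden states enables (a toy sketch, not Caffe code and not the actual layer math): carrying the final hidden state of one chunk into the next chunk reproduces a single full-length forward pass, which is the basis for memory-efficient inference over arbitrarily long sequences.

```python
# Toy scalar "RNN" (a stand-in update rule, not Caffe's): processing a long
# sequence T_ steps at a time while carrying the hidden state forward gives
# the same outputs as one pass over the whole sequence.
def rnn_forward(xs, h0=0.0, w=0.5, u=0.3):
    h = h0
    outs = []
    for x in xs:
        h = w * h + u * x  # stand-in for one recurrent timestep
        outs.append(h)
    return outs, h

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
full, _ = rnn_forward(xs)

# Chunked: feed T_=2 steps at a time, reusing the final hidden state as the
# next chunk's initial state (what exposing boundary hidden states allows).
chunked, h = [], 0.0
for i in range(0, len(xs), 2):
    out, h = rnn_forward(xs[i:i + 2], h0=h)
    chunked.extend(out)

assert chunked == full  # identical operations in identical order
```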