Unrolled recurrent layers (RNN, LSTM) #2033

Closed · wants to merge 11 commits

Contributor

jeffdonahue commented Mar 4, 2015

(Replaces #1873)

Based on #2032 (which adds EmbedLayer -- not required for RNNs, but often used with them in practice, and needed for my examples), which in turn is based on #1977.

This adds an abstract class RecurrentLayer intended to support recurrent architectures (RNNs, LSTMs, etc.) using an internal network unrolled in time. RecurrentLayer implementations (here, just RNNLayer and LSTMLayer) specify the recurrent architecture by filling in a NetParameter with appropriate layers.

RecurrentLayer requires 2 input (bottom) Blobs. The first -- the input data itself -- has shape T x N x ... and the second -- the "sequence continuation indicators" delta -- has shape T x N, each holding T timesteps of N independent "streams". delta_{t,n} should be a binary indicator (i.e., value in {0, 1}), where a value of 0 means that timestep t of stream n is the beginning of a new sequence, and a value of 1 means that timestep t of stream n is continuing the sequence from timestep t-1 of stream n. Under the hood, the previous timestep's hidden state is multiplied by these delta values. The fact that these indicators are specified on a per-timestep and per-stream basis allows for streams of arbitrary different lengths without any padding or truncation. At the beginning of the forward pass, the final hidden state from the previous forward pass (h_T) is copied into the initial hidden state for the new forward pass (h_0), allowing for exact inference across arbitrarily long sequences, even if T == 1. However, if any sequences cross batch boundaries, backpropagation through time is approximate -- it is truncated along the batch boundaries.

Note that the T x N arrangement in memory, used for computational efficiency, is somewhat counterintuitive, as it requires one to "interleave" the data streams.
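To make that layout concrete, here is a small sketch (mine, not code from this PR; shapes and values are purely illustrative) of packing two independent streams into the T x N arrangement together with the corresponding continuation indicators:

import numpy as np

T, N, D = 3, 2, 5
# Stream 0 spans all three timesteps; stream 1 ends after timestep 1 and a
# new sequence begins at timestep 2.
x = np.zeros((T, N, D), dtype=np.float32)      # T x N x ... input blob
cont = np.array([[0, 0],    # t=0: both streams start new sequences
                 [1, 1],    # t=1: both continue from t-1
                 [1, 0]],   # t=2: stream 0 continues, stream 1 restarts
                dtype=np.float32)              # T x N indicators (delta)
# x[t, n] holds timestep t of stream n, so the streams are interleaved in
# memory rather than stored contiguously per stream.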

There is also an optional third input whose dimensions are simply N x ... (i.e. the first axis must have dimension N and the others can be anything) which is a "static" input to the LSTM. It's equivalent to (but more efficient than) copying the input across the T timesteps and concatenating it with the "dynamic" first input (I was using my TileLayer -- #2083 -- for this purpose at some point before adding the static input). It's used in my captioning experiments to input the image features as they don't change over time. For most problems there will be no such "static" input and you should simply ignore it and just specify the first two input blobs.
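For anyone wiring this up, below is a minimal pycaffe NetSpec sketch of an LSTM layer with the two required bottoms and the optional static third bottom. The layer type name, bottom order, and recurrent_param field are assumed to match this branch, and the shapes are placeholders:

import caffe
from caffe import layers as L

n = caffe.NetSpec()
n.data = L.Input(shape=dict(dim=[20, 16, 100]))   # T x N x ... "dynamic" input
n.cont = L.Input(shape=dict(dim=[20, 16]))        # T x N continuation indicators
n.feat = L.Input(shape=dict(dim=[16, 1000]))      # optional N x ... "static" input
n.lstm = L.LSTM(n.data, n.cont, n.feat,
                recurrent_param=dict(num_output=256))
print(n.to_proto())

Dropping n.feat from the LSTM call gives the usual two-input form.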

I've added scripts to download COCO2014 (and splits), and prototxts for training a language model and LRCN captioning model on the data. From the Caffe root directory, you should be able to download and parse the data by doing:

cd data/coco
./get_coco_aux.sh # download train/val/test splits
./download_tools.sh # download official COCO tool
cd tools
python setup.py install # follow instructions to install tools and download COCO data if needed
cd ../../.. # back to caffe root
./examples/coco_caption/coco_to_hdf5_data.py

Then, you can train a language model using ./examples/coco_caption/train_language_model.sh, or train LRCN for captioning using ./examples/coco_caption/train_lrcn.sh (assuming you have downloaded models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel).

Still on the TODO list: upload a pretrained model to the zoo; add a tool to preview generated image captions and compute retrieval & generation scores.


cvondrick commented Mar 20, 2015

Firstly, thanks for the fantastic code. I had been playing with my own LSTM, and found this PR, and it is above and beyond any of my own attempts. Really nice job.

There seems to be a bug in the ReshapeLayer of this PR. In some cases, the ReshapeLayer will produce all zeros instead of actually copying the data. I've created a minimal test case that shows this failure for this PR:

# Load a random dataset 
layer {
  name: "ToyData_1"
  type: "DummyData"
  top: "ToyData_1"
  dummy_data_param {
    shape {
      dim: 101 
      dim: 7
      dim: 3
    }
    data_filler {
      type: "gaussian"
      std: 1
    }
  }
}

layer {
  name: "ZeroData"
  type: "DummyData"
  top: "ZeroData"
  dummy_data_param {
    shape {
      dim: 101 
      dim: 7
      dim: 3
    }
    data_filler {
      type: "constant"
      value: 0
    }
  }
}


# Reshape ToyData_1 to be the same size
layer {
  name: "Reshape"
  type: "Reshape"
  bottom: "ToyData_1"
  top: "ToyData_2"
  reshape_param {
    shape {
      dim: 101
      dim: 7
      dim: 3
    }
  }
}





# Since ToyData_1 should be ToyData_2, we expect this loss to be zero. But, it is not.
layer {
  name: "EuclideanLoss"
  bottom: "ToyData_1"
  bottom: "ToyData_2"
  top: "ToyData_1_vs_2_Difference"
  type: "EuclideanLoss"
}

# We expect this loss to be non-zero, and it is non-zero. 
layer {
  name: "EuclideanLoss"
  bottom: "ToyData_1"
  bottom: "ZeroData"
  top: "ToyData_1_vs_Zero_Difference"
  type: "EuclideanLoss"
}

# Since ToyData_2 should be ToyData_1 and ToyData_1 is non-zero, we expect this loss to be non-zero. But, it is zero.
layer {
  name: "EuclideanLoss"
  bottom: "ToyData_2"
  bottom: "ZeroData"
  top: "ToyData_2_vs_Zero_Difference"
  type: "EuclideanLoss"
}

Above, it loads a random dataset, ToyData_1. It then reshapes it to the exact same size (identity) to create ToyData_2. We would expect that || ToyData_1 - ToyData_2 ||_2 == 0

However, if you train with the above model on this branch, you will see that the Euclidean loss between ToyData_1 and ToyData_2 is non-zero. Moreover, the loss between ToyData_2 and a blob of all zeros is zero. Note that, as expected, the loss between ToyData_1 and all zeros is non-zero.

It seems there is a bug with reshape. I've fixed it here by copying an older version of Reshape into this branch: https://github.com/cvondrick/caffe/commit/3e1a0ff73fef23b8cb8adc8223e0bb2c8900e56b

Unfortunately, I didn't have time to write a real unit test for this. But I hope this bug report helps.

The same issue occurs in #2088

Carl


@cvondrick cvondrick referenced this pull request in jeffdonahue/caffe Mar 20, 2015

Open

Fix bug in recurrent branch #6


Contributor

jeffdonahue commented Mar 20, 2015

Well that's disturbing... I don't have time to look into it now but thanks a lot for reporting, Carl! Will follow up when I've figured something out.


Contributor

jeffdonahue commented Mar 20, 2015

Oops, failed to read to the end and see that you already had a fix. Thanks for posting the fix! (I think the current version of my ReshapeLayer PR may do what your fix does, in which case I'll just rebase this onto that PR as I should anyway.)


cvondrick commented Mar 20, 2015

Thanks Jeff -- yeah, we fixed it by copying a ReshapeLayer from somewhere else. Unfortunately, we have lost track of exactly where that layer came from, but I'm sure somebody here (maybe even you) wrote it at some point.



hf commented Mar 24, 2015

When is this feature going to be ready? Is there something to be done?


thuyen commented Mar 24, 2015

For the captioning model, can anyone show me how to generate captions after training is done? The current LSTM layers process the whole input sequence (20 words in the coco example) across time, but for generation we need to produce words one at a time (the current time step's output is the input to the next).


vadimkantorov commented Mar 24, 2015

I've just tried to run train_lrcn.sh (after running coco_to_hdf5_data.py and the other scripts) and I get a "dimensions don't match" error:

F0324 16:37:24.435840 20612 eltwise_layer.cpp:51] Check failed: bottom[i]->shape() == bottom[0]->shape()

The stack-trace and log are here: http://pastebin.com/fWUxsSmv

I've uncommented line 471 in net.cpp to find the faulty layer (the only modification). It seems it happens in lstm2 which blends input from the language model and from the image CNN.

train_language_model.sh runs fine without errors.

Ideas?



Contributor

ih4cku commented Mar 24, 2015

By the way, does Caffe's recurrent layer support bi-directional RNN?


vadimkantorov commented Mar 24, 2015

Both the factored and unfactored setups are affected. There seem to be some dimension problems when blending the CNN input with the embedded text input.



ritsu1228 commented Mar 25, 2015

I have the same question as @thuyen. My understanding is that the current unrolled architecture slices an input sentence and feeds the resulting words to each time step at once. So, for both the train and test nets, the ground-truth sentences are fed to the unrolled net. However, for captioning an image there is no sentence to give to the net, and I don't think it is correct to give the start symbol to each layer. Did I miss anything?



ritsu1228 commented Mar 25, 2015

The dimension check fails for the static input (the image feature) with size 100x4000 vs. 1x100x4000. It seems to be caused by the Reshape layer; @cvondrick's fix seems to solve this.



Contributor

jeffdonahue commented Mar 25, 2015

Yes, as noted by @cvondrick, this works with the older version of the ReshapeLayer which puts everything in Reshape (as opposed to the newer one that uses LayerSetUp -- see discussion with @longjon in #2088). I don't yet have any idea why the Reshape version would work but the LayerSetUp version wouldn't, but I've just force pushed a new version of this branch that uses the previous ReshapeLayer version, and confirmed that both example scripts (train_lrcn.sh & train_language_model.sh) run. Sorry for breaking the LRCN one.


Contributor

jeffdonahue commented Mar 25, 2015

By the way, does Caffe's recurrent layer support bi-directional RNN?

You can create a bi-directional RNN using 2 RNN layers and feeding one the input in forward order and the other the input in backward order, and fusing their per-timestep outputs however you like (e.g. eltwise sum or concatenation).
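A rough NetSpec sketch of that wiring is below. Note that "Reverse" is a hypothetical layer standing in for whatever flips the time axis (no such layer exists in this branch, as the discussion further down points out), and the RNN bottoms/recurrent_param are assumed to match this branch, so the sketch only illustrates the topology:

import caffe
from caffe import layers as L

n = caffe.NetSpec()
n.data = L.Input(shape=dict(dim=[20, 16, 100]))   # T x N x D
n.cont = L.Input(shape=dict(dim=[20, 16]))
n.rnn_fwd = L.RNN(n.data, n.cont, recurrent_param=dict(num_output=128))
n.data_rev = L.Reverse(n.data)                    # hypothetical time-reversal layer
n.cont_rev = L.Reverse(n.cont)
n.rnn_bwd_rev = L.RNN(n.data_rev, n.cont_rev, recurrent_param=dict(num_output=128))
n.rnn_bwd = L.Reverse(n.rnn_bwd_rev)              # flip back to align timesteps
n.birnn = L.Concat(n.rnn_fwd, n.rnn_bwd, axis=2)  # or fuse with an eltwise sum
print(n.to_proto())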


vadimkantorov commented Mar 25, 2015

Thanks @jeffdonahue, training lrcn now works! Same question as @thuyen @ritsu1228: does anyone have an idea how to hook in when the first word after the start symbol gets produced, and put the next symbol into the input_sentence tensor in memory before the next round of the unrolled net runs?



read-mind commented Mar 27, 2015

As @jeffdonahue mentioned, a bidirectional RNN can be built with two RNNs, and it's easy to prepare the reversed input sequence, but how do you reverse the output of one RNN when fusing the two RNN outputs in Caffe? It seems no layer does the reverse.



Contributor

jeffdonahue commented Mar 30, 2015

As @jeffdonahue mentioned, a bidirectional RNN can be built with two RNNs, and it's easy to prepare the reversed input sequence, but how do you reverse the output of one RNN when fusing the two RNN outputs in Caffe? It seems no layer does the reverse.

True; one would need to implement an additional layer to do the reversal. You'd also need to be careful to ensure that your instances do not cross batch boundaries (as is allowed by my implementation as it works fine for unidirectional architectures) since inference at each timestep is dependent on all other timesteps in a bidirectional RNN.

Same question as @thuyen @ritsu1228. Does anyone have an idea how to hook in when the first word after the start symbol gets produced and put the next symbol into the input_sentence tensor in memory before the next round of the unrolled net runs?

In the not-too-distant future I'll add code for evaluation, including using the model's own predictions as input in future timesteps as you mention.


Contributor

jeffdonahue commented Mar 30, 2015

I've also gotten a number of questions on the optional third input to RecurrentLayer -- I've added some clarification in the original post:

There is also an optional third input whose dimensions are simply N x ... (i.e. the first axis must have dimension N and the others can be anything) which is a "static" input to the LSTM. It's equivalent to (but more efficient than) copying the input across the T timesteps and concatenating it with the "dynamic" first input (I was using my TileLayer -- #2083 -- for this purpose at some point before adding the static input). It's used in my captioning experiments to input the image features as they don't change over time. For most problems there will be no such "static" input and you should simply ignore it and just specify the first two input blobs.


liqing-ustc commented Mar 31, 2015

Thanks for the fantastic code. But the code of the Reshape function in RecurrentLayer confuses me. When passing data from output_blobs_ to the top blobs, why is it

    output_blobs_[i]->ShareData(*top[i]);
    output_blobs_[i]->ShareDiff(*top[i]);

rather than

    top[i]->ShareData(*output_blobs_[i]);
    top[i]->ShareDiff(*output_blobs_[i]);

It seems that the top blobs are just reshaped and left empty.

The original code is here:

template <typename Dtype>
void RecurrentLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  CHECK_EQ(top.size(), output_blobs_.size());
  for (int i = 0; i < top.size(); ++i) {
    top[i]->ReshapeLike(*output_blobs_[i]);
    output_blobs_[i]->ShareData(*top[i]);
    output_blobs_[i]->ShareDiff(*top[i]);
  }
  x_input_blob_->ShareData(*bottom[0]);
  x_input_blob_->ShareDiff(*bottom[0]);
  cont_input_blob_->ShareData(*bottom[1]);
  if (static_input_) {
    x_static_input_blob_->ShareData(*bottom[2]);
    x_static_input_blob_->ShareDiff(*bottom[2]);
  }
}


sunbaigui commented Apr 11, 2015

@jeffdonahue I've trained an LRCN model according to your example, and I'm curious how to predict the sentence for a given image. It seems there's no such tool or deploy.prototxt yet; are you working on it?


@factom factom referenced this pull request in jeffdonahue/caffe Apr 27, 2015

Closed

Recurrent net fix #8

@dribnet dribnet referenced this pull request in jeffdonahue/caffe Apr 27, 2015

Open

update recurrent branch with latest BVLC/master #9


niuchuang commented May 1, 2015

Could someone give me some guidance on how to construct an RNN with jeffdonahue's PR? I have downloaded lrcn.prototxt, but unfortunately I cannot understand most of its contents, such as include { stage: "freeze-convnet" }, include { stage: "unfactored" } and so on. In fact, I have some time-sequence image data, each of which has a label. I have trained the reference model in caffe with these data, and now I am trying to use an RNN to classify them. What documentation should I read so that I can understand lrcn.prototxt and the like, and then train an RNN model with my data? Many thanks!



shls commented May 4, 2015

@jeffdonahue Hi. Could you please give me an example about using my own images based on LRCN? Thanks in advance.


Edward12138 commented May 27, 2015

Hi, I'm wondering if anyone can explain why all the element-wise product operations are implemented as sum operations. Thanks!



ghost commented Jun 9, 2015

@jeffdonahue Hi, thank you for this code. I'm able to train the LRCN on mscoco and my own data set, but I am unable to load the network to view the generated sentences. My understanding is that the layers are unrolled at all times, making sentence generation given an image rather unintuitive. Could you provide some clarification on using the LRCN for generation?
Thanks again!


raingo commented Jun 10, 2015

@jeffdonahue Hi, I had a look at the code, and have one question.

Between lines 141-149 of rnn_layer.cpp, x_cont is combined with the previous hidden state. Why are they combined using an element-wise sum rather than an element-wise product?

Thanks!


Edward12138 commented Jun 10, 2015

@raingo I've got the same question. Not sure if they use some trick in the sum layer.



pakjce commented Jul 1, 2015

In the RecurrentLayer class, all recurrent networks are unfolded in the LayerSetUp() method. However, in the case of a real-time application such as robotic control (batch size is one), unfolding is not necessary. Moreover, it could cause an out-of-memory problem with a long sequence. Is there any plan to consider this case?


Contributor

jeffdonahue commented Jul 2, 2015

@raingo EltwiseLayer with the addition operation optionally takes scalar coefficients for each of the addends. I modified the layer to accept an additional bottom Blob of coefficients when the coeff_blob option is set. It is a little hacky and should probably be in a separate layer.

@pakjce as far as I can tell there is no reason to have a special case for T==1; just provide 1xN blobs to indicate to the layer you're using data with a single timestep -- this is what I do for inference.
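As an illustration of the T == 1 usage described above, here is a hedged pycaffe sketch of stepping a deployed net one timestep at a time; the file names, blob names ('data', 'cont'), and shapes are assumptions for illustration, not names from this PR:

import numpy as np
import caffe

# Hypothetical deploy net whose bottoms are a 1 x 1 x 100 'data' blob and a
# 1 x 1 'cont' blob (T = 1, N = 1).
net = caffe.Net('lstm_deploy.prototxt', 'lstm.caffemodel', caffe.TEST)

for t in range(20):
    x = np.random.randn(1, 1, 100).astype(np.float32)   # placeholder input
    cont = np.full((1, 1), 0 if t == 0 else 1, dtype=np.float32)
    net.blobs['data'].data[...] = x
    net.blobs['cont'].data[...] = cont
    out = net.forward()
    # Because the final hidden state of each forward pass is copied into the
    # initial hidden state of the next, state carries across these T == 1 calls.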

I pushed a very unpolished branch to my public fork containing inference code (including deploy prototxts as many people have requested). I used retrieval_experiment.py for all image captioning results in the paper. (You'll probably have to manually modify the variable declarations at the beginning of the main function as appropriate for your desired experimental config.) The file captioner.py in that same directory contains most of the relevant inference code.


lostoy commented Jul 6, 2015

Can anyone share some ideas on how to do temporal average pooling with this LSTM, for video classification?


mgarbade commented Jul 15, 2015

In @jeffdonahue's post from Mar 4 (first comment on top of this thread) he explains how to set up his LRCN caffe branch for COCO.
However I cannot make it run.
First of all I had to manually download COCO since "get_coco_aux.sh" only downloaded the filename lists "coco2014_filename.train.txt" etc.
When I try to execute "./examples/coco_caption/coco_to_hdf5_data.py" I get the error:
IOError: [Errno 2] No such file or directory: './data/coco/coco/annotations/captions_train2014.json'

I would adapt the python script "coco_to_hdf5_data.py" myself if I could but it is quite lengthy and I don't know how to repair it. Also I don't have the file "captions_train2014.json" required by the script. It wasn't downloaded.

Any help would be appreciated.



seed93 commented Jul 16, 2015

@mgarbade You need to install pycocotools using download_tools.sh


mgarbade commented Jul 16, 2015

@seed93 I tried to do this already. It's not downloading the dataset however. It only produces the following error:

data/coco/download_tools.sh 
~/libs/caffe/caffe-recurrent-rebase-cleanup ~/libs/caffe/caffe-recurrent-rebase-cleanup
fatal: destination path 'coco' already exists and is not an empty directory.
~/libs/caffe/caffe-recurrent-rebase-cleanup
Cloned COCO tools to: /home/garbade/libs/caffe/caffe-recurrent-rebase-cleanup/data/coco/coco
To setup COCO tools (and optionally download data), run:
cd /home/garbade/libs/caffe/caffe-recurrent-rebase-cleanup/data/coco/coco
python setup.py install
and follow the prompts.

In the meantime I found the file "captions_train2014.json", which is also part of the COCO dataset I downloaded manually. However, the problem remains that since the dataset is not downloaded automatically, the paths to the manually downloaded dataset are probably wrong...
Did you manage to get the example running on your computer?


Member

shaibagon commented Mar 13, 2016

I have posted a simple example of LSTM implementation using caffe.NetSpec() on stackoverflow: http://stackoverflow.com/a/35967589/1714410, @chriss2401 I hope you'll find this example useful.

This implementation should work with regular caffe branch (master), and does not need this recurrent PR.

Advantages of using NetSpec():

  1. No need to use "non-master" branches of caffe, no need to merge PR to a private branch...
  2. More flexibility: you can easily replace the TanH activation with other form (e.g., PReLU), you can feed one time step LSTM output as an input to the next time step (AFAIK, this PR does not yet support this feature).

Disadvantages:

  1. LSTM is not encapsulated as a single layer in the prototxt: you end up with a very long prototxt describing the model and it is more difficult to debug architecture issues.
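For readers who want a starting point, here is a minimal sketch in the same spirit (my own rough version, not the code from the SO answer; all layer, blob, and parameter names are illustrative) of one LSTM time step built from stock layers with NetSpec, with weights tied across steps via shared param names:

import caffe
from caffe import layers as L, params as P

def lstm_step(n, x, h_prev, c_prev, hidden_dim, step):
    """Append one unrolled LSTM step to NetSpec n; returns (h_t, c_t)."""
    # Gate pre-activations from the input and the previous hidden state.
    # Sharing param names ('W_x', 'W_h', 'b') ties the weights across steps.
    gx = L.InnerProduct(x, num_output=4 * hidden_dim, bias_term=True,
                        param=[dict(name='W_x'), dict(name='b')])
    gh = L.InnerProduct(h_prev, num_output=4 * hidden_dim, bias_term=False,
                        param=[dict(name='W_h')])
    gates = L.Eltwise(gx, gh, operation=P.Eltwise.SUM)
    # Split the 4*H pre-activations into input, forget, output, candidate.
    i_pre, f_pre, o_pre, g_pre = L.Slice(
        gates, ntop=4, axis=1,
        slice_point=[hidden_dim, 2 * hidden_dim, 3 * hidden_dim])
    i, f, o = L.Sigmoid(i_pre), L.Sigmoid(f_pre), L.Sigmoid(o_pre)
    g = L.TanH(g_pre)
    # c_t = f * c_{t-1} + i * g ;  h_t = o * tanh(c_t)
    c = L.Eltwise(L.Eltwise(f, c_prev, operation=P.Eltwise.PROD),
                  L.Eltwise(i, g, operation=P.Eltwise.PROD),
                  operation=P.Eltwise.SUM)
    h = L.Eltwise(o, L.TanH(c), operation=P.Eltwise.PROD)
    setattr(n, 'c%d' % step, c)   # name the step's tops in the prototxt
    setattr(n, 'h%d' % step, h)
    return h, c

n = caffe.NetSpec()
n.x1 = L.Input(shape=dict(dim=[1, 32]))   # per-step inputs, N x D
n.x2 = L.Input(shape=dict(dim=[1, 32]))
n.h0 = L.Input(shape=dict(dim=[1, 16]))   # initial hidden and cell states
n.c0 = L.Input(shape=dict(dim=[1, 16]))
h1, c1 = lstm_step(n, n.x1, n.h0, n.c0, 16, 1)
h2, c2 = lstm_step(n, n.x2, h1, c1, 16, 2)
print(n.to_proto())   # a per-step loss can then be attached to each h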

fl2o commented Mar 14, 2016

@shaibagon, Thank you for your stackoverflow post!

However, I need to process signals of different sizes using the LSTM, and I am wondering if this is possible using your code? I also have a label associated with each time step (it's a regression problem); how do I take that into account using the EuclideanLoss layer? Thanks!

@chriss2401, if you succeed, I would be interested in the changes you made to Jeff's code!


chriss2401 commented Mar 14, 2016

@shaibagon thanks a lot for your reference, I appreciate it. Ideally I would like to build a many-to-many regression RNN (one output for each time step), but it seems not entirely straightforward with this implementation (you have to use two LSTM layers and define which stage you're at, I guess).



Member

shaibagon commented Mar 15, 2016

@chriss2401 @fl2o The implementation I posted at SO is only a sketch: once you understand how to build the proper prototxt (NetSpec) for a single time step LSTM unit using basic caffe layers, the sky is the limit: you can use the h1 output of single_time_step_lstm(...) as the per-time-step descriptor on top of which you can have an EuclideanLoss layer for each time step. I suppose you can build a more complex python script that stacks together several LSTM units (per time step) etc.
For instance, I currently have an architecture with a stack of 3 LSTM units per time step; unrolling it for 10 time steps results in a prototxt file with ~6.5K lines...

BTW, I find the draw_net utility very useful for inspecting the resulting model for structural "bugs"


Contributor

jeffdonahue commented Mar 15, 2016

I totally agree (and I think I've commented elsewhere) that NetSpec is definitely the way to go for the most part -- and I may never have bothered to write the C++ versions here if I'd been aware of NetSpec at the time. So thanks for posting your example @shaibagon!

The one advantage these C++ implementations have over the NetSpec route is the ability to remember the final timestep hidden state and carry it over as the zeroth timestep hidden state of the next batch for the ability to handle arbitrarily long sequences at training time (with gradients between batches truncated). This is (AFAIK) impossible to do through NetSpec, even if you're allowed to add extra layers; it would require some extra primitives in the Net class itself, I think. And in practice I don't know how much of an advantage this feature is for any application, but there is an example of it being used in train_language_model.sh if anyone is interested...


Member

shaibagon commented Mar 15, 2016

@jeffdonahue the feature of carrying h and c to the next temporal batch is a very important one. Do you need the entire LSTM layer for this feature, or is it already implemented in the LSTMUnit layer?
The reason I am asking is that I wish to have a net that feeds one time step's h as the bottom of the next time step's LSTM. AFAIK, this is not possible using your LSTM layer in the recurrent branch. However, I managed to do that using NetSpec(). If I can use your LSTMUnits combined with NetSpec() I believe I can enjoy both worlds: having h and c carried across temporal batches AND managing to feed h_{t-1} as a bottom to LSTM_{t}!


Contributor

jeffdonahue commented Mar 15, 2016

Nope -- LSTMUnit is just a specialized non-linearity that computes c_t and h_t given the gate input values (applying sigmoid/tanh and summing), written with the hope that there's a speed/memory advantage to doing it in one GPU kernel vs. the equivalent composition of sigmoid/tanh/slice/concat/eltwise sum/prod layers. (I'm pretty sure there is a (slight) memory advantage; not so sure of the speed advantage...)
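In equations, this corresponds to the standard non-peephole LSTM update (a sketch; a_i, a_f, a_o, a_g denote the four slices of the gate input blob, and the continuation indicator additionally multiplies the previous hidden state as described in the original post):

i_t = \sigma(a_i), \quad f_t = \sigma(a_f), \quad o_t = \sigma(a_o), \quad g_t = \tanh(a_g)
c_t = f_t \odot c_{t-1} + i_t \odot g_t
h_t = o_t \odot \tanh(c_t)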

I thought of doing this at some point, but probably won't get to it (so someone else should feel free to try if they like). You'd want something like a "SourceLayer", taking no input, that can provide the 0th timestep hidden state (would be very simple, copying its memory to its top, similar to @longjon's parameter layer which is a PR that I can't find right now...), and a "SinkLayer", taking the final timestep hidden state as input, storing it internally, and producing no output. Then you'd have to add some logic to Net so that it copies the SinkLayer's stored state to the SourceLayer's internal state at the beginning of the next call to Forward. (I think this primitive could allow for some very complex behavior beyond the obvious application to recurrent nets -- you might be able to implement parameter storage and SGD itself all in Forward with these primitives.)

@shaibagon

Member

shaibagon commented Mar 15, 2016

@jeffdonahue I was thinking of using the "HDF5Output" layer as the "sink" layer and a corresponding "HDF5Data" layer as the "source" layer. Less efficient than using specialized in-memory layers, but better than hacking SGDSolver... (IMHO).

@shaibagon

Member

shaibagon commented Mar 15, 2016

@jeffdonahue BTW, working with caffe.NetSpec() to construct networks starts to feel like working with Theano... In that case, I suppose Theano can offer more flexibility...

@jeffdonahue

Contributor

jeffdonahue commented Mar 15, 2016

I suppose the HDF5 combination could work (but probably requires more hacking of at least the data layer to accept changes to the file list at runtime? Sounds a bit hairy...). And introducing more IO into the training loop is never going to be great for speed. But may be okay for some applications, and I wouldn't doubt it's the quickest way to go.

> @jeffdonahue BTW, working with caffe.NetSpec() to construct networks starts to feel like working with Theano... In that case, I suppose Theano can offer more flexibility...

Of course -- see also Caffe2, TensorFlow, Torch, [...]. (BTW, I definitely wasn't actually advocating implementing learning using the primitive I described.) Caffe (still) has its advantages, but it's certainly not the most flexible framework. (Don't want to go into that discussion any further here, though...)

@shaibagon

Member

shaibagon commented Mar 15, 2016

@jeffdonahue you overwrite the same HDF5 file all the time, so you don't need to change the HDF5 file list at all... assuming the HDF5Data layer does not do prefetching :O
Very hacky, I agree.
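
A rough sketch of that hack (not part of this PR): after each Forward, dump the final hidden/cell state into the single HDF5 file that the HDF5Data "source" layer reads, overwriting it in place so the file list never changes. The file and dataset names below are illustrative.

import h5py
import numpy as np

STATE_FILE = 'recurrent_state.h5'  # the only file listed in the HDF5Data source list

def save_state(h_final, c_final):
    # 'w' truncates and rewrites the same file every iteration.
    with h5py.File(STATE_FILE, 'w') as f:
        f.create_dataset('h0', data=h_final.astype(np.float32))
        f.create_dataset('c0', data=c_final.astype(np.float32))

# Zero state before the very first batch (N x hidden_dim, made-up sizes):
save_state(np.zeros((32, 256)), np.zeros((32, 256)))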

@jeffdonahue

Contributor

jeffdonahue commented Mar 15, 2016

Ah, I didn't know the HDF5OutputLayer worked that way, I see... sounds a little scary, but might work... good luck!

@fl2o

fl2o commented Mar 15, 2016

@shaibagon Thanks for the highlight, but I struggle to see how to handle signals with different lengths (i.e., different numbers of timesteps) during training using NetSpec, since I can't change my unrolled net architecture during training...
Should I use a very long LSTM, stop the forward pass once I have reached the end of the signal being processed, and then start the backward pass?

@shaibagon

Member

shaibagon commented Mar 15, 2016

@fl2o AFAIK, if you want exact backprop for recurrent nets in Caffe, there's no way around explicitly unrolling the net across ALL time steps.
However, if you use @jeffdonahue 's recurrent branch, you will get exact forward estimation, and backprop that is exact up to the temporal batch boundary. This can alleviate the need to explicitly unroll very long temporal nets.

Regarding working with very long sequences:

  1. You may define maxT and explicitly unroll your net to span maxT time steps, padding shorter sequences with some "null" data/label (see the sketch below).
  2. Since Caffe uses SGD for training, it is better to have more than one sequence participating in a forward-backward pass (i.e., a mini-batch of size > 1). Otherwise the gradient estimate will be very noisy.

Can you afford all these blobs in memory at once?
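
A minimal sketch of option 1 (an illustrative helper, not from this PR): pack N variable-length sequences into the T x N layout with zero-padded data, continuation indicators that start each sequence with 0, and padded labels set to a value the loss can ignore.

import numpy as np

def pack_batch(sequences, labels, max_t, pad_label=-1):
    """sequences: list of N arrays of shape (T_n, feat_dim); labels: list of N arrays of shape (T_n,)."""
    n = len(sequences)
    feat_dim = sequences[0].shape[1]
    data = np.zeros((max_t, n, feat_dim), dtype=np.float32)
    label = np.full((max_t, n), pad_label, dtype=np.float32)
    cont = np.zeros((max_t, n), dtype=np.float32)
    for i, (seq, lab) in enumerate(zip(sequences, labels)):
        t_n = min(len(seq), max_t)
        data[:t_n, i] = seq[:t_n]
        label[:t_n, i] = lab[:t_n]
        cont[1:t_n, i] = 1  # cont[0, i] = 0 marks the start of a new sequence
    return data, cont, label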

@shaibagon

Member

shaibagon commented Mar 15, 2016

@jeffdonahue BTW, is there a reason why this PR is not merged into master?

@fl2o

fl2o commented Mar 15, 2016

@shaibagon I am going to try padding shorter sequences with some "null" data/label (should I use a special token or just 0?) in order to avoid the gradient estimation problem, but I am not sure yet about the memory issue! (maxT will be around 400, while minT ~50.)

@shaibagon

Member

shaibagon commented Mar 15, 2016

@fl2o I'm not certain just using 0 is enough. You want no gradients to be computed from these padded time steps. You might need to have an "ignore_label" and implement your loss layer to support it.
Make sure no gradients from the padded time steps are propagated into the "real" time steps.
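
One way to realize that with NetSpec (a sketch, not from this PR; the blob names, dimensions, and HDF5 source file are illustrative): give padded timesteps a label of -1 and point the loss layer's loss_param.ignore_label at it, which Caffe's SoftmaxWithLoss already supports.

from caffe import layers as L
from caffe import NetSpec

n = NetSpec()
n.data, n.cont, n.label = L.HDF5Data(source='train_h5_list.txt', batch_size=20, ntop=3)
n.lstm = L.LSTM(n.data, n.cont,
                recurrent_param=dict(num_output=256,
                                     weight_filler=dict(type='uniform', min=-0.08, max=0.08),
                                     bias_filler=dict(type='constant', value=0)))
n.score = L.InnerProduct(n.lstm, num_output=10, axis=2,
                         weight_filler=dict(type='xavier'))
n.loss = L.SoftmaxWithLoss(n.score, n.label,
                           softmax_param=dict(axis=2),
                           loss_param=dict(ignore_label=-1))
print(n.to_proto())  # paste the result into a train prototxt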

@fl2o

fl2o commented Mar 15, 2016

That's what I was wondering...
I wonder if it isn't "easier" to use this PR directly ^^
I'll figure it out! Thank you @shaibagon

@shaibagon

Member

shaibagon commented Mar 15, 2016

@fl2o in the future, I think it would be best to keep this GitHub issue thread for PR-related comments only. For more general inquiries and questions about LSTMs in Caffe, it might be better to ask a question on Stack Overflow.

@chriss2401

chriss2401 Mar 15, 2016

@shaibagon Cheers for all the helpful comments.

@jeffdonahue jeffdonahue referenced this pull request Apr 5, 2016

Merged

RNN + LSTM Layers #3948

@lood339

lood339 commented Apr 11, 2016

Hi, I used the LRCN code to generate captions from an image. I replaced AlexNet with GoogLeNet. The result looks like this:
"A brown cat sitting top top top top ...."
The sentence repeats the word "top" a lot. Is there any reason for this?
I also tried other modifications. It seems the LSTM is very sensitive to the learning parameters. Is this conclusion right in general?
Thanks.

+ print ('Exhausted all data; cutting off batch at timestep %d ' +
+ 'with %d streams completed') % (t, num_completed_streams)
+ for name in self.substream_names:
+ batch[name] = batch[name][:t, :]

@yangfly

yangfly Apr 14, 2016

words at timestep t should not be dropped:
batch[name] = batch[name][:(t+1), :]
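
A quick numpy check of the off-by-one (illustration only):

import numpy as np

batch = np.arange(12).reshape(4, 3)   # 4 timesteps x 3 streams
t = 2
print(batch[:t, :].shape)        # (2, 3) -- timestep t is dropped
print(batch[:(t + 1), :].shape)  # (3, 3) -- timestep t is kept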

+ 'with %d streams completed') % (t, num_completed_streams)
+ for name in self.substream_names:
+ batch[name] = batch[name][:t, :]
+ batch_indicators = batch_indicators[:t, :]

@yangfly

yangfly Apr 14, 2016

batch_indicators = batch_indicators[:(t+1), :]

@liminchen

liminchen Apr 16, 2016

Could anyone tell me what's the difference between C_diff and c_term_diff in the Backward_cpu function? I'm trying to understand the code and write a GRU version. Thanks in advance!

template <typename Dtype>
void GRUUnitLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  CHECK(!propagate_down[2]) << "Cannot backpropagate to sequence indicators.";
  if (!propagate_down[0] && !propagate_down[1]) { return; }

  const int num = bottom[0]->shape(1);
  const int x_dim = hidden_dim_ * 4;
  const Dtype* C_prev = bottom[0]->cpu_data();
  const Dtype* X = bottom[1]->cpu_data();
  const Dtype* flush = bottom[2]->cpu_data();
  const Dtype* C = top[0]->cpu_data();
  const Dtype* H = top[1]->cpu_data();
  const Dtype* C_diff = top[0]->cpu_diff();
  const Dtype* H_diff = top[1]->cpu_diff();
  Dtype* C_prev_diff = bottom[0]->mutable_cpu_diff();
  Dtype* X_diff = bottom[1]->mutable_cpu_diff();
  for (int n = 0; n < num; ++n) {
    for (int d = 0; d < hidden_dim_; ++d) {
      const Dtype i = sigmoid(X[d]);
      const Dtype f = (*flush == 0) ? 0 :
          (*flush * sigmoid(X[1 * hidden_dim_ + d]));
      const Dtype o = sigmoid(X[2 * hidden_dim_ + d]);
      const Dtype g = tanh(X[3 * hidden_dim_ + d]);
      const Dtype c_prev = C_prev[d];
      const Dtype c = C[d];
      const Dtype tanh_c = tanh(c);
      Dtype* c_prev_diff = C_prev_diff + d;
      Dtype* i_diff = X_diff + d;
      Dtype* f_diff = X_diff + 1 * hidden_dim_ + d;
      Dtype* o_diff = X_diff + 2 * hidden_dim_ + d;
      Dtype* g_diff = X_diff + 3 * hidden_dim_ + d;
      const Dtype c_term_diff =
          C_diff[d] + H_diff[d] * o * (1 - tanh_c * tanh_c);
      *c_prev_diff = c_term_diff * f;
      *i_diff = c_term_diff * g * i * (1 - i);
      *f_diff = c_term_diff * c_prev * f * (1 - f);
      *o_diff = H_diff[d] * tanh_c * o * (1 - o);
      *g_diff = c_term_diff * i * (1 - g * g);
    }
    C_prev += hidden_dim_;
    X += x_dim;
    C += hidden_dim_;
    H += hidden_dim_;
    C_diff += hidden_dim_;
    H_diff += hidden_dim_;
    X_diff += x_dim;
    C_prev_diff += hidden_dim_;
    ++flush;
  }
}
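
Reading it off the code above: C_diff is only the gradient that arrives at c_t directly from its consumers (e.g. the next timestep's unit), while c_term_diff additionally folds in the gradient that reaches c_t through h_t = o_t \odot \tanh(c_t):

    \texttt{c\_term\_diff} \;=\; \frac{\partial L}{\partial c_t}
    \;=\; \texttt{C\_diff}[d] \;+\; \texttt{H\_diff}[d] \, o_t \bigl(1 - \tanh^2(c_t)\bigr)

That total is then shared by everything feeding c_t (c_{t-1}, i, f, g), whereas the output-gate gradient uses H_diff alone.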

@maydaygmail

maydaygmail Apr 29, 2016

@jeffdonahue in captioner.py, when generating a sentence, does it use only the previous word to generate the current word, rather than all the previous words?

@anguyen8

anguyen8 commented May 3, 2016

Does anyone know if there is a pre-trained image-captioning LRCN model out there? I'd greatly appreciate it if this were included in the Model Zoo.

@jeffdonahue : would you be able to release the model from your CVPR'15 paper?

@anteagle

anteagle May 16, 2016

Has this branch landed in master? The layers are in master, but it seems the examples are not there. Could anyone point me to the right way to get this branch? I did git pull #2033, but it just showed Already up-to-date.

@shaibagon

Member

shaibagon commented May 16, 2016

@anteagle it seems like the PR only contained the LSTM RNN layers and not the examples (too much to review). You'll have to go to Jeff Donahue's "recurrent" branch.

@anteagle

anteagle May 17, 2016

@shaibagon thanks, I got it from Jeff's repo, though it has not been updated for a while.

@jeffdonahue

Contributor

jeffdonahue commented Jun 3, 2016

Closing with the merge of #3948 -- though this PR still contains examples that PR lacked, and I should eventually restore and rebase those on the now merged version. In the meantime I'll keep my recurrent branch (and other mentioned branches) open and in their current form for reference.

@yangzhikai

yangzhikai May 10, 2017

Hello, I have a question. When I read the file 'lstm_layer.cpp', I find a lot of 'add_top', 'add_bottom', and 'add_dim', but I can't find their definitions anywhere in the caffe folder. Could you tell me where I can find them, and what code such as 'add_bottom("c_" + tm1s);' means?

@jeffdonahue

Contributor

jeffdonahue commented May 10, 2017

The methods you refer to are all automatically generated by protobuf. See caffe.proto for the declarations of top, bottom, etc., which result in the protobuf compiler automatically generating the add_top, add_bottom methods. (The resulting C++ definitions are in the protobuf-generated header file caffe.pb.h.)
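
For example, the same generated accessors are visible from Python through the compiled caffe_pb2 module (repeated fields use append/extend instead of C++'s add_*):

from caffe.proto import caffe_pb2

layer = caffe_pb2.LayerParameter()
layer.type = 'LSTMUnit'
layer.bottom.append('c_0')        # C++: layer->add_bottom("c_0");
layer.top.extend(['c_1', 'h_1'])  # C++: add_top("c_1"); add_top("h_1");

shape = caffe_pb2.BlobShape()
shape.dim.extend([1, 32, 256])    # C++: add_dim(1); add_dim(32); add_dim(256);

print(layer)  # prints the message in prototxt text format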

@yangzhikai

yangzhikai May 11, 2017

Oh, thank you very much. I had not found this file (caffe.pb.h) because I hadn't compiled it before!

@soulslicer

soulslicer Mar 7, 2018

Hi, is there any working example of the layer in caffe?

@cuixing158

cuixing158 Jun 15, 2018

Same question: is there any working example of the layer in Caffe?
