
https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L496 #2

Closed
skaae opened this issue Jul 19, 2017 · 7 comments

Comments

skaae commented Jul 19, 2017

Is it correct that the code does not implement images2neibs? I think its equivalent in PyTorch is unfold?

This line in the original code: https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L496

skaae closed this as completed Jul 19, 2017

koz4k (Member) commented Jul 19, 2017

We use convolution instead. It works the same in this case.
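
Roughly, the equivalence looks like this (a quick illustrative sketch with made-up sizes, not the exact code in this repo):

```python
import torch
import torch.nn as nn

# Extracting non-overlapping frames (what images2neibs / Tensor.unfold does)
# and applying a shared nn.Linear is the same computation as a single
# nn.Conv1d whose stride equals its kernel size.
batch, time, frame_size, dim = 2, 16, 4, 8

x = torch.randn(batch, time)                        # raw sample sequence

# unfold into non-overlapping frames: (batch, time // frame_size, frame_size)
frames = x.unfold(1, frame_size, frame_size)
linear = nn.Linear(frame_size, dim)
out_linear = linear(frames)                         # (batch, time // frame_size, dim)

# the same projection expressed as a strided convolution
conv = nn.Conv1d(1, dim, kernel_size=frame_size, stride=frame_size)
with torch.no_grad():
    conv.weight.copy_(linear.weight.view(dim, 1, frame_size))
    conv.bias.copy_(linear.bias)
out_conv = conv(x.unsqueeze(1)).transpose(1, 2)     # (batch, time // frame_size, dim)

print(torch.allclose(out_linear, out_conv, atol=1e-6))  # True
```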

koz4k reopened this Jul 19, 2017
koz4k closed this as completed Jul 19, 2017

skaae (Author) commented Jul 19, 2017

Yes, I figured that out after a little looking around.

greaber commented Aug 23, 2017

I am still confused by this. In the paper they say they use "linear projections," so I thought you could use nn.Linear for this. As I understand transposed convolution, it would differ from nn.Linear only if you wanted to upsample using information from two or more consecutive frames, but I wonder if I am making some big mistake here. Similarly, for the MLP, why do you use nn.Conv1d instead of nn.Linear?

koz4k (Member) commented Aug 23, 2017

You are absolutely right, we could have used nn.Linear for these linear projections, but then we would have had to use reshapes to make it work. In fact, that is exactly what the original implementation does. However, we found that a bit confusing, so we decided to use (transposed) convolutions instead.

The thing is, we need to take the time axis into account. We have different layers of RNNs operating at different time scales. If we have two layers of RNNs, the upper one 4x slower than the lower one, the upsampling layer has to transform a (batch_size x time x dim) tensor to a (batch_size x 4*time x dim) tensor. Transposed convolution with stride=4 and kernel_size=4 does exactly that. The operation does not use information from two consecutive upper frames nor does it provide input for more than one lower frame at the same time. Frames do not overlap because stride is equal to kernel_size.
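
To make the shape bookkeeping concrete, here is a minimal sketch (illustrative sizes only; note that PyTorch's ConvTranspose1d expects the channel axis second, so the (batch_size x time x dim) tensor is transposed around the op):

```python
import torch
import torch.nn as nn

batch_size, time, dim, ratio = 8, 10, 32, 4      # upper tier runs 4x slower

h = torch.randn(batch_size, time, dim)           # upper-tier RNN outputs

# transposed convolution with stride == kernel_size == ratio:
# each upper frame produces exactly `ratio` non-overlapping lower frames
upsample = nn.ConvTranspose1d(dim, dim, kernel_size=ratio, stride=ratio)
out = upsample(h.transpose(1, 2)).transpose(1, 2)

# the nn.Linear + reshape version from the original implementation
# would look roughly like this instead
project = nn.Linear(dim, ratio * dim)
out_linear = project(h).view(batch_size, time * ratio, dim)

print(out.shape, out_linear.shape)               # both: (8, 40, 32)
```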

For the MLP, the reason we use nn.Conv1d is similar. We could use nn.Linear if we wanted to predict only one timestep at a time. However, we want to "slide" the MLP along the time axis, and convolution does exactly that. The difference is that now we want the receptive fields to overlap, so we use convolution with stride=1.
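
A rough sketch of the sliding MLP (made-up layer sizes, conditioning from the RNN tiers omitted, not the actual model code):

```python
import torch
import torch.nn as nn

batch_size, time, frame_size, hidden, q_levels = 8, 100, 4, 256, 256

# input already in (batch, channels, time) layout
x = torch.randn(batch_size, 1, time)

# "sliding" MLP: the first layer sees frame_size consecutive samples, and the
# receptive fields overlap because stride=1; the remaining layers are pointwise
mlp = nn.Sequential(
    nn.Conv1d(1, hidden, kernel_size=frame_size, stride=1),
    nn.ReLU(),
    nn.Conv1d(hidden, hidden, kernel_size=1),
    nn.ReLU(),
    nn.Conv1d(hidden, q_levels, kernel_size=1),  # logits over quantization levels
)

logits = mlp(x)          # (batch_size, q_levels, time - frame_size + 1)
print(logits.shape)      # predictions for all timesteps in one pass
```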

greaber commented Aug 23, 2017

I see, thanks a lot. I wrote my own SampleRNN implementation, but it runs more than twice as slowly as yours. I found it hard to pin down with profiling tools exactly why my model was so slow, but given what you write, it seems very likely that the difference comes down to two things:

  1. I am using LSTM/GRU cells written in Python so that I can apply weight norm and other variations, which means no cuDNN and no batching along the time dimension (but because of that there is no need for any fancy reshapes to upsample with nn.Linear, just a split).

  2. I'm predicting one step at a time with the MLP (not for any particular reason, just because I didn't think of doing it the way you do).

koz4k (Member) commented Aug 27, 2017

Yeah, that's probably the reason. Intuitively, though, the MLP should have the largest impact on speed, because it gets executed most often (depending on the frame_size of the lowest RNN tier). So it might be possible to achieve speed similar to ours with a hybrid version: custom RNN cells called one timestep at a time and a "sliding" MLP implemented with convolution. It should be pretty straightforward to implement weight normalization for the convolution op.
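
For example, just as a sketch (assuming a recent enough PyTorch), the built-in torch.nn.utils.weight_norm wrapper can be applied directly to a Conv1d:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# weight normalization on the convolutional "sliding MLP" layer:
# the weight is reparameterized as g * v / ||v||, with g and v learned
conv = weight_norm(nn.Conv1d(256, 256, kernel_size=4, stride=1))

x = torch.randn(8, 256, 100)
y = conv(x)                    # (8, 256, 97)
print(y.shape)
```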

greaber commented Aug 30, 2017

Yes, you are right. With the sliding MLP, but still using custom RNN cells, yours is only about 25% faster than mine (for a configuration similar to the three-tier network in the SampleRNN paper, but with no weight norm).
