
https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L496 #2

Closed
skaae opened this issue Jul 19, 2017 · 7 comments

Comments

skaae commented Jul 19, 2017

Is it correct that the code does not implement images2neibs? I think its equivalent in PyTorch is unfold?

This line in the original code: https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L496

skaae closed this as completed Jul 19, 2017

koz4k (Member) commented Jul 19, 2017

We use convolution instead. It works the same in this case.
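
Roughly, the equivalence looks like this (a quick illustrative sketch with made-up sizes, not the exact code in this repo):

```python
import torch
import torch.nn as nn

# Extracting non-overlapping frames (what images2neibs / Tensor.unfold does)
# and applying a shared nn.Linear is the same computation as a single
# nn.Conv1d whose stride equals its kernel size.
batch, time, frame_size, dim = 2, 16, 4, 8

x = torch.randn(batch, time)                        # raw sample sequence

# unfold into non-overlapping frames: (batch, time // frame_size, frame_size)
frames = x.unfold(1, frame_size, frame_size)
linear = nn.Linear(frame_size, dim)
out_linear = linear(frames)                         # (batch, time // frame_size, dim)

# the same projection expressed as a strided convolution
conv = nn.Conv1d(1, dim, kernel_size=frame_size, stride=frame_size)
with torch.no_grad():
    conv.weight.copy_(linear.weight.view(dim, 1, frame_size))
    conv.bias.copy_(linear.bias)
out_conv = conv(x.unsqueeze(1)).transpose(1, 2)     # (batch, time // frame_size, dim)

print(torch.allclose(out_linear, out_conv, atol=1e-6))  # True
```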

koz4k reopened this Jul 19, 2017
koz4k closed this as completed Jul 19, 2017

skaae (Author) commented Jul 19, 2017

Yes, I figured that out after a little looking around.

greaber commented Aug 23, 2017

I am still confused by this. In the paper they say they use "linear projections," so I thought you could use nn.Linear for this. As I understand transposed convolution, it would differ from nn.Linear only if you wanted to upsample using information from two or more consecutive frames, but I wonder if I am making some big mistake here. Similarly, for the MLP, why do you use nn.Conv1d instead of nn.Linear?

koz4k (Member) commented Aug 23, 2017

You are absolutely right, we could have used nn.Linear for these linear projections, but then we would have had to use reshapes to make it work. In fact, that is exactly what the original implementation does. However, we found that a bit confusing, so we decided to use (transposed) convolutions instead.

The thing is, we need to take the time axis into account. We have different layers of RNNs operating at different time scales. If we have two layers of RNNs, the upper one 4x slower than the lower one, the upsampling layer has to transform a (batch_size x time x dim) tensor to a (batch_size x 4*time x dim) tensor. Transposed convolution with stride=4 and kernel_size=4 does exactly that. The operation does not use information from two consecutive upper frames nor does it provide input for more than one lower frame at the same time. Frames do not overlap because stride is equal to kernel_size.
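
To make the shape bookkeeping concrete, here is a minimal sketch (illustrative sizes only; note that PyTorch's ConvTranspose1d expects the channel axis second, so the (batch_size x time x dim) tensor is transposed around the op):

```python
import torch
import torch.nn as nn

batch_size, time, dim, ratio = 8, 10, 32, 4      # upper tier runs 4x slower

h = torch.randn(batch_size, time, dim)           # upper-tier RNN outputs

# transposed convolution with stride == kernel_size == ratio:
# each upper frame produces exactly `ratio` non-overlapping lower frames
upsample = nn.ConvTranspose1d(dim, dim, kernel_size=ratio, stride=ratio)
out = upsample(h.transpose(1, 2)).transpose(1, 2)

# the nn.Linear + reshape version from the original implementation
# would look roughly like this instead
project = nn.Linear(dim, ratio * dim)
out_linear = project(h).view(batch_size, time * ratio, dim)

print(out.shape, out_linear.shape)               # both: (8, 40, 32)
```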

For the MLP, the reason we use nn.Conv1d is similar. We could use nn.Linear if we wanted to predict only one timestep at a time. However, we want to "slide" the MLP along the time axis, and convolution does exactly that. The difference is that now we want the receptive fields to overlap, so we use convolution with stride=1.
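
A rough sketch of the sliding MLP (made-up layer sizes, conditioning from the RNN tiers omitted, not the actual model code):

```python
import torch
import torch.nn as nn

batch_size, time, frame_size, hidden, q_levels = 8, 100, 4, 256, 256

# input already in (batch, channels, time) layout
x = torch.randn(batch_size, 1, time)

# "sliding" MLP: the first layer sees frame_size consecutive samples, and the
# receptive fields overlap because stride=1; the remaining layers are pointwise
mlp = nn.Sequential(
    nn.Conv1d(1, hidden, kernel_size=frame_size, stride=1),
    nn.ReLU(),
    nn.Conv1d(hidden, hidden, kernel_size=1),
    nn.ReLU(),
    nn.Conv1d(hidden, q_levels, kernel_size=1),  # logits over quantization levels
)

logits = mlp(x)          # (batch_size, q_levels, time - frame_size + 1)
print(logits.shape)      # predictions for all timesteps in one pass
```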

greaber commented Aug 23, 2017

I see, thanks a lot. I wrote my own SampleRNN implementation, but it runs more than twice as slowly as yours. I found it hard to pin down with profiling tools exactly why my model was so slow, but given what you write, it seems very likely that the difference comes down to two things:

  1. I am using LSTM/GRU cells written in Python so that I can apply weight norm and other variations, which means no cuDNN and no batching along the time dimension (but because of that there is no need for any fancy reshapes to upsample with nn.Linear, just a split).

  2. I'm predicting one step at a time with the MLP (not for any particular reason, just because I didn't think of doing it the way you do).

koz4k (Member) commented Aug 27, 2017

Yeah, that's probably the reason. Intuitively, though, the MLP should have the largest impact on speed, because it gets executed most often (depending on the frame_size of the lowest RNN tier). So it might be possible to achieve speed similar to ours with a hybrid version: custom RNN cells called one timestep at a time and a "sliding" MLP implemented with convolution. It should be pretty straightforward to implement weight normalization for the convolution op.
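
For example, just as a sketch (assuming a recent enough PyTorch), the built-in torch.nn.utils.weight_norm wrapper can be applied directly to a Conv1d:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# weight normalization on the convolutional "sliding MLP" layer:
# the weight is reparameterized as g * v / ||v||, with g and v learned
conv = weight_norm(nn.Conv1d(256, 256, kernel_size=4, stride=1))

x = torch.randn(8, 256, 100)
y = conv(x)                    # (8, 256, 97)
print(y.shape)
```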

greaber commented Aug 30, 2017

Yes, you are right. With the sliding MLP, but still using custom RNN cells, yours is only about 25% faster than mine (for a configuration similar to the three-tier network in the SampleRNN paper, but with no weight norm).
