What does n_frame_samples represent? #23
ie:

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
frames = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
# so... in this case we have 5 frames, each frame with 2 samples?
# therefore n_frame_samples = 2?
```
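The chunking described in that example can be sketched in plain Python. `to_frames` is a hypothetical helper for illustration only, not code from the repository:

```python
# Split a sample list into non-overlapping frames of n_frame_samples each.
# to_frames is a hypothetical helper, written here only to illustrate the terminology.
def to_frames(data, n_frame_samples):
    return [data[i:i + n_frame_samples]
            for i in range(0, len(data), n_frame_samples)]

frames = to_frames([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 2)
print(frames)       # [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(len(frames))  # 5 frames, 2 samples each
```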
That's correct. In our terminology, n_frame_samples is the number of samples in each frame.
Thanks!
To be 100% clear:

```python
ns_frame_samples = list(map(int, np.cumprod(frame_sizes)))
frames = []
for (frame_size, n_frame_samples) in zip(frame_sizes, ns_frame_samples):
    print(f'{frame_size} frames each with {n_frame_samples} samples. (ie: (batch_size=128, seq_len={frame_size}, dim={n_frame_samples})')
    f = FrameLevelRNN(frame_size, n_frame_samples, n_rnn, dim, learn_h0, weight_norm)
    frames.append(f)
```

Do these print statements make sense?

```
16 frames each with 16 samples. (ie: (batch_size=128, seq_len=16, dim=16)  # this is the Tier 3 frame in the paper
4 frames each with 64 samples. (ie: (batch_size=128, seq_len=4, dim=64)   # this is the Tier 2 frame in the paper
```
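For concreteness, the `np.cumprod` line above with `frame_sizes = [16, 4]` (the values implied by the printout; the thread never states them explicitly) works out to:

```python
import numpy as np

# frame_sizes as implied by the printout above (an assumption, not pinned
# down in the thread): tier 3 first, then tier 2.
frame_sizes = [16, 4]
ns_frame_samples = list(map(int, np.cumprod(frame_sizes)))
print(ns_frame_samples)  # [16, 64]

# zip pairs them exactly as printed: (16 frames, 16 samples), (4 frames, 64 samples)
print(list(zip(frame_sizes, ns_frame_samples)))  # [(16, 16), (4, 64)]
```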
Is there a question?
@koz4k yeah, do the print statements make sense? |
Not really. I assume that by
ok, that's what I was thinking. So what's the interpretation for the way you guys have it set up?

```python
# does 16 here represent the original sequence divided into 16 chunks?
x = torch.randn(128, 16, 64)  # the original `np.shape(128, 16, 64)` was presumably meant as a shape annotation
out = tier_3_frame(x)
# out.shape == (128, 64, 1024)
# does 64 here represent 64 non-overlapping parts of the original sequence?
```

I guess maybe my confusion is what do 16, 4 represent as related to the paper?
Yeah, that seems right. 16 and 4 probably don't mean anything in the paper; there, "frame size" refers to the number of samples.
@koz4k Thanks btw! I think I got it now. Looks like in your code, 64 is your "frame size", ie: for a sequence that is 1024 long where you want 64 samples per frame, you get 16 frames. In their code, they used 8 instead of 64 here. Then every tier after divides by 4... so tier 2 would be 16, tier 1 gets 1?

```python
# init example data
time_steps = 1024
bs = 128
seq_len = 1024
frame_size = 64
steps = seq_len // frame_size
X = Variable(torch.FloatTensor(bs, steps, frame_size).random_(0, 256))

# get conditioning context for tier 3
tier_3 = FrameLevel(frame_size=frame_size, upsample_ratio=4)
conditioning_c = tier_3(X, None)
```

Using linear layers would then look like (without the init, weight norm and hidden reset stuff, just high level):

```python
class FrameLevel(nn.Module):
    """
    Generates conditioning context from the raw signal.
    """
    def __init__(self, frame_size, upsample_ratio):
        super(FrameLevel, self).__init__()
        self.frame_size = frame_size
        self.nb_layers = 2
        self.batch_size = 128
        self.hidden_size = 1024
        self.context_c = None
        self.hidden = None
        self.upsample_ratio = upsample_ratio

        # input expand projection
        self.x_proj_fc = TimeDistributedLinear(self.frame_size, self.hidden_size)

        # rnn
        self.rnn = nn.GRU(
            input_size=self.hidden_size,
            hidden_size=self.hidden_size,
            num_layers=self.nb_layers,
            batch_first=True
        )

        # context projection
        # this will be passed to the next module as conditioning
        context_out_size = self.upsample_ratio * self.hidden_size
        self.c_proj_fc = TimeDistributedLinear(self.hidden_size, context_out_size)

    def init_hidden(self):
        hidden = Variable(torch.zeros(self.nb_layers, self.batch_size, self.hidden_size))
        if self.on_gpu:
            hidden = hidden.cuda()
        return hidden

    def forward(self, x_seq, above_context_c):
        # project input to rnn hidden; add context from the tier above if it exists
        # inp = Wf + c (c only when there is a tier above)
        rnn_input = self.x_proj_fc(x_seq)
        if above_context_c is not None:
            rnn_input += above_context_c

        # run through RNN
        output, self.hidden = self.rnn(rnn_input, self.hidden)

        # project h_t so it can be used at the next level
        output = self.c_proj_fc(output)
        output = output.view(output.size(0), -1, self.hidden_size)

        # output will be used as conditioning context for the next frame level;
        # hidden is not passed back because it is saved in the frame object,
        # so whenever TBPTT starts a new seq, hidden must be reset externally
        self.context_c = output
        return self.context_c
```
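The forward pass above can be summarized as pure shape bookkeeping. `forward_shapes` is a hypothetical helper (no torch needed), and the dimensions are the thread's example values:

```python
# Trace the tensor shapes through FrameLevel.forward without torch.
# forward_shapes is a hypothetical helper for illustration only.
def forward_shapes(bs, steps, frame_size, hidden, ratio):
    x = (bs, steps, frame_size)            # input frames
    rnn_in = (bs, steps, hidden)           # after x_proj_fc
    rnn_out = (bs, steps, hidden)          # GRU preserves the time axis
    proj = (bs, steps, ratio * hidden)     # after c_proj_fc
    context = (bs, steps * ratio, hidden)  # after .view(batch, -1, hidden)
    return context

print(forward_shapes(128, 16, 64, 1024, 4))  # (128, 64, 1024)
```

Each frame at this tier ends up contributing `upsample_ratio` conditioning vectors to the tier below, which is where the 16 → 64 expansion along the time axis comes from.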
Yes, that's right.
@koz4k sweet... thanks! |
Np :) |
Terminology is a bit confusing. What does n_frame_samples mean? Is it the number of samples per frame, or the number of frames?
Is the RNN taking in a sequence of frames (ie: a frame per timestep), where the dimension of each frame is n_frame_samples?