What does n_frame_samples represent? #23

Closed
williamFalcon opened this issue Sep 21, 2018 · 13 comments

@williamFalcon commented Sep 21, 2018

The terminology is a bit confusing. What does n_frame_samples mean? Is it the number of samples per frame, or the number of frames?

Is the RNN taking in a sequence of frames (ie: a frame per timestep), where the dimension of each frame is "n_frame_samples"?

@williamFalcon (Author) commented Sep 21, 2018

ie:

data = [1,2,3,4,5,6,7,8,9,10]
frames = [[1,2], [3,4], [5,6], [7,8], [9,10]]
# so in this case we have 5 frames, each frame with 2 samples? therefore n_frame_samples = 2?

@koz4k (Member) commented Sep 21, 2018

That's correct. In our terminology, frame_size is the size of a frame in terms of lower-tier frames, and n_frame_samples is the size of a frame in terms of samples. The relationship between the two can be seen at https://github.com/deepsound-project/samplernn-pytorch/blob/master/model.py#L20.
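
To make that concrete, here's a minimal sketch of the relationship (the frame_sizes values are just the example ones from this thread, nothing canonical):

    import numpy as np

    # frame_sizes[i] = size of a tier's frame, counted in lower-tier frames
    frame_sizes = [16, 4]

    # n_frame_samples[i] = size of that tier's frame, counted in raw samples,
    # i.e. the running product of the frame sizes
    ns_frame_samples = list(map(int, np.cumprod(frame_sizes)))
    print(ns_frame_samples)  # [16, 64]

    # lowest tier here: frames of 16 samples
    # tier above it: 4 lower-tier frames = 4 * 16 = 64 samples per frame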

@williamFalcon (Author)

Thanks!

@williamFalcon (Author) commented Sep 21, 2018

To be 100% clear:

ns_frame_samples = list(map(int, np.cumprod(frame_sizes)))
frames = []
for (frame_size, n_frame_samples) in zip(frame_sizes, ns_frame_samples):
    print(f'{frame_size} frames each with {n_frame_samples} samples (ie: batch_size=128, seq_len={frame_size}, dim={n_frame_samples})')
    f = FrameLevelRNN(frame_size, n_frame_samples, n_rnn, dim, learn_h0, weight_norm)
    frames.append(f)

Do these print statements make sense?

16 frames each with 16 samples (ie: batch_size=128, seq_len=16, dim=16)   # this is the Tier 3 frame in the paper
4 frames each with 64 samples (ie: batch_size=128, seq_len=4, dim=64)     # this is the Tier 2 frame in the paper

@koz4k

@koz4k (Member) commented Sep 21, 2018

Is there a question?

@williamFalcon (Author)

@koz4k yeah, do the print statements make sense?

@koz4k (Member) commented Sep 21, 2018

Not really. I assume that by seq_len you mean the number of frames in the entire sequence, and by dim you mean the number of samples in a single frame. If so, then seq_len = n_seq_samples // n_frame_samples, where n_seq_samples is the number of samples in the sequence. dim is OK.
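
As a quick numeric check with the numbers from this thread (example values, not anything fixed by the repo):

    n_seq_samples = 1024     # samples in the whole sequence
    n_frame_samples = 64     # samples per frame at this tier
    seq_len = n_seq_samples // n_frame_samples
    print(seq_len)           # 16 frames per sequence at this tier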

@williamFalcon (Author) commented Sep 21, 2018

OK, that's what I was thinking. So what's the interpretation for the way you guys have it set up?
The first layer (tier 3) takes input like:

# does 16 here represent the original sequence divided into 16 chunks?
x = torch.randn(128, 16, 64)   # shape (128, 16, 64)
out = tier_3_frame(x)
# out.shape == (128, 64, 1024)
# does 64 here represent 64 non-overlapping parts of the original sequence?

I guess my confusion is: what do 16 and 4 represent in relation to the paper?

@koz4k (Member) commented Sep 21, 2018

Yeah, that seems right. 16 and 4 probably don't mean anything in the paper; there, "frame size" refers to the number of samples.

@williamFalcon (Author) commented Sep 21, 2018

@koz4k Thanks btw!

I think I've got it now. Looks like in your code, 64 is your "frame size", ie: for a sequence that is 1024 samples long with 64 samples per frame, you get 16 frames. In their code, they used 8 instead of 64 here.

Then every tier after divides by 4... so tier 2 would be 16, and tier 1 gets 1?

    # init example data
    import torch
    from torch.autograd import Variable

    bs = 128
    seq_len = 1024            # samples per training sequence
    frame_size = 64           # samples per frame at this tier
    steps = seq_len // frame_size   # 16 frames
    X = Variable(torch.FloatTensor(bs, steps, frame_size).random_(0, 256))

    # get conditioning context for tier 3
    tier_3 = FrameLevel(frame_size=frame_size, upsample_ratio=4)
    conditioning_c = tier_3(X, None)

Using linear layers, it would then look like this (without the init, weight norm, and hidden reset stuff, just high level):

import torch
import torch.nn as nn
from torch.autograd import Variable


class FrameLevel(nn.Module):
    """
    Generates conditioning context from the raw signal.
    """

    def __init__(self, frame_size, upsample_ratio):
        super(FrameLevel, self).__init__()

        self.frame_size = frame_size
        self.nb_layers = 2
        self.batch_size = 128
        self.hidden_size = 1024
        self.context_c = None
        self.hidden = None
        self.upsample_ratio = upsample_ratio
        self.on_gpu = False  # set this externally when the model is moved to GPU

        # input expand projection
        # (TimeDistributedLinear: applies a Linear at every timestep;
        # nn.Linear on the last dim does the same thing)
        self.x_proj_fc = TimeDistributedLinear(self.frame_size, self.hidden_size)

        # rnn
        self.rnn = nn.GRU(
            input_size=self.hidden_size,
            hidden_size=self.hidden_size,
            num_layers=self.nb_layers,
            batch_first=True
        )

        # context projection
        # this will be passed to the next module as conditioning
        context_out_size = self.upsample_ratio * self.hidden_size
        self.c_proj_fc = TimeDistributedLinear(self.hidden_size, context_out_size)

    def init_hidden(self):
        hidden = Variable(torch.zeros(self.nb_layers, self.batch_size, self.hidden_size))
        if self.on_gpu:
            hidden = hidden.cuda()
        return hidden

    def forward(self, x_seq, above_context_c):

        # project input to the rnn hidden size; add context from the tier above,
        # if there is one: inp = W*f + c
        rnn_input = self.x_proj_fc(x_seq)
        if above_context_c is not None:
            rnn_input = rnn_input + above_context_c  # out-of-place add is safer for autograd

        # run through RNN (self.hidden is None on the first call, which the GRU treats as zeros)
        output, self.hidden = self.rnn(rnn_input, self.hidden)

        # project h_t and reshape so it can be used at the next (finer) tier
        output = self.c_proj_fc(output)
        output = output.view(output.size(0), -1, self.hidden_size)

        # output will be used as conditioning context for the next tier
        # hidden is not passed back because it is saved in the frame object;
        # whenever TBPTT starts a new sequence, hidden must be reset externally
        self.context_c = output
        return self.context_c
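
For what it's worth, here's my reading of the shape flow through forward with the numbers above (just a sanity check, not code from the repo):

    import torch

    bs, steps, frame_size = 128, 16, 64
    hidden_size, upsample_ratio = 1024, 4

    # after x_proj_fc and the GRU: one hidden state per frame
    h = torch.randn(bs, steps, hidden_size)                   # (128, 16, 1024)

    # after c_proj_fc: upsample_ratio * hidden_size features per frame
    c = torch.randn(bs, steps, upsample_ratio * hidden_size)  # (128, 16, 4096)

    # the view splits each frame's state into upsample_ratio conditioning
    # vectors, one per lower-tier frame
    c = c.view(c.size(0), -1, hidden_size)
    print(c.shape)                                            # (128, 64, 1024)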

@koz4k (Member) commented Sep 21, 2018

Yes, that's right.

@williamFalcon (Author)

@koz4k sweet... thanks!

@koz4k (Member) commented Sep 21, 2018

Np :)
