Did you make a mistake in 'class ResidualLayer'? It seems you made two identical layers? #4

Open
drawingsnow opened this issue Apr 20, 2020 · 8 comments

Comments

@drawingsnow

    # First branch: 1-D convolution followed by instance normalization.
    self.conv1d_layer = nn.Sequential(nn.Conv1d(in_channels=in_channels,
                                                out_channels=out_channels,
                                                kernel_size=kernel_size,
                                                stride=1,
                                                padding=padding),
                                      nn.InstanceNorm1d(num_features=out_channels,
                                                        affine=True))

    # Second branch: same configuration as the first.
    self.conv_layer_gates = nn.Sequential(nn.Conv1d(in_channels=in_channels,
                                                    out_channels=out_channels,
                                                    kernel_size=kernel_size,
                                                    stride=1,
                                                    padding=padding),
                                          nn.InstanceNorm1d(num_features=out_channels,
                                                            affine=True))
@Georgehappy1

This is a gated CNN, so it needs two convolutions of the same shape: one branch produces the features and the other produces the gate, as in the sketch below.
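
For illustration, a minimal, self-contained sketch of the gating that two identical branches enable (layer sizes and variable names here are assumptions, not the repo's exact code):

    import torch
    import torch.nn as nn

    # Two identically shaped 1-D conv branches; the second, squashed through a
    # sigmoid, elementwise-gates the first (a gated linear unit).
    conv_features = nn.Conv1d(in_channels=24, out_channels=64, kernel_size=3, padding=1)
    conv_gates = nn.Conv1d(in_channels=24, out_channels=64, kernel_size=3, padding=1)

    x = torch.randn(2, 24, 128)                             # (batch, channels, time)
    out = conv_features(x) * torch.sigmoid(conv_gates(x))   # gated output
    print(out.shape)                                        # torch.Size([2, 64, 128])

The identical shapes are what make the elementwise multiplication possible.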

@drawingsnow
Author

drawingsnow commented Apr 24, 2020 via email

@Georgehappy1

  1. This implementation uses interpolate rather than pixel shuffle for upsampling. During upsampling, the input is first fed into a Conv2d layer whose output has 1024 channels; the output is then interpolated, which only doubles the width and height and does not change the channel count (for details see the model_vc2.py file, and the sketch after this list).
  2. The original input is 3-dimensional ([batch_size, width, height]). To do 2-D convolution, it is unsqueezed at the second dimension, so the size becomes [batch_size, 1, width, height]. The input fed into the last conv layer is also 4-dimensional with more than one channel, so the final 1×1 conv reduces the channel count to 1. At the end, the output is squeezed at the second dimension to remove the channel dimension, making it the same size as the original input.
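
A minimal sketch of both steps (shapes and layer sizes are assumptions for illustration, not the repo's exact values):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(2, 24, 32)                 # 3-D input: [batch, width, height]
    x = x.unsqueeze(1)                         # -> [2, 1, 24, 32] so Conv2d applies

    conv = nn.Conv2d(1, 1024, kernel_size=3, padding=1)
    h = conv(x)                                # -> [2, 1024, 24, 32]
    h = F.interpolate(h, scale_factor=2)       # -> [2, 1024, 48, 64]; channels unchanged

    last = nn.Conv2d(1024, 1, kernel_size=1)   # 1x1 conv squeezes channels to 1
    y = last(h).squeeze(1)                     # -> [2, 48, 64], back to 3-D
    print(y.shape)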

@drawingsnow
Author

Thanks!
  1. I know you use interpolate rather than pixel shuffle for upsampling; what I mean is, is there a problem with the structure in the original paper?
  2. I get it, thanks.
  3. Why should we swap the dimensions in the discriminator? What we want is just a feature map, right? I can't understand this line:

    downSample4 = downSample4.contiguous().permute(0, 2, 3, 1).contiguous()

@Georgehappy1

  1. I get what you said, and I also think the structure in the original paper seems to have a problem with the output channel count after pixel shuffle.
  2. Yes, I think it would still work without the swap. The contiguous() call lays the tensor's storage out contiguously in memory. If a tensor has gone through transpose() or permute() and you then want to call view() on it, you must call .contiguous() first, because view() requires contiguous storage. See the sketch below.
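
A minimal sketch of that behavior (the shapes here are arbitrary assumptions):

    import torch

    x = torch.randn(2, 8, 4, 4)        # e.g. [batch, channels, H, W]
    y = x.permute(0, 2, 3, 1)          # -> [2, 4, 4, 8]; a strided view, not a copy
    print(y.is_contiguous())           # False: strides no longer match the layout

    # y.view(2, -1) would raise a RuntimeError here; copying first fixes it:
    z = y.contiguous().view(2, -1)     # contiguous copy, then reshape
    print(z.shape)                     # torch.Size([2, 128])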

@drawingsnow
Author

Thanks a lot!

@drawingsnow
Author

Hi, I've run into a new puzzle.

Here is the test code for trainingDataset:

    import numpy as np
    import torch

    if __name__ == '__main__':
        trainA = np.random.randn(162, 24, 554)
        trainB = np.random.randn(158, 24, 554)
        dataset = trainingDataset(trainA, trainB)
        trainLoader = torch.utils.data.DataLoader(dataset=dataset,
                                                  batch_size=2,
                                                  shuffle=True)
        for epoch in range(10):
            for i, (trainA, trainB) in enumerate(trainLoader):
                print(trainA.shape, trainB.shape)

What is the first dimension of trainA and trainB? I know it is probably (?, feature_numbers, length).

@drawingsnow
Author

I think it means: I have 162 voice segments, and I extract features of shape (24, 554) from every segment, so the input is (162, 24, 554). Did I get it right?
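
If that reading is right, the loader above should yield batches of shape [2, 24, 554], i.e. two segments per batch. A quick self-contained check, substituting torch's TensorDataset for the repo's trainingDataset as an assumption:

    import numpy as np
    import torch
    from torch.utils.data import TensorDataset, DataLoader

    # 162 segments, each a (24, 554) feature matrix.
    trainA = torch.from_numpy(np.random.randn(162, 24, 554))
    loader = DataLoader(TensorDataset(trainA), batch_size=2, shuffle=True)
    (batch,) = next(iter(loader))
    print(batch.shape)   # torch.Size([2, 24, 554]): the first dim indexes segments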
