
Can we extract both vocals and music background from this repo? #1

Closed
leminhnguyen opened this issue Sep 14, 2021 · 5 comments

Comments

@leminhnguyen

No description provided.

@qiuqiangkong
Collaborator

Yes, both vocals and accompaniment are supported.

@leminhnguyen
Author

leminhnguyen commented Sep 16, 2021

The pretrained model doesn't seem to work. I downloaded the checkpoints with your script and then ran separate_vocals.sh, but a size-mismatch error was raised:

size mismatch for stft.conv_real.weight: copying a param with shape torch.Size([1025, 1, 2048]) from checkpoint, the shape in current model is torch.Size([257, 1, 512]).
size mismatch for stft.conv_imag.weight: copying a param with shape torch.Size([1025, 1, 2048]) from checkpoint, the shape in current model is torch.Size([257, 1, 512]).
size mismatch for istft.ola_window: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for istft.conv_real.weight: copying a param with shape torch.Size([2048, 2048, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]).
size mismatch for istft.conv_imag.weight: copying a param with shape torch.Size([2048, 2048, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]).
size mismatch for bn0.weight: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([257]).
size mismatch for bn0.bias: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([257]).
size mismatch for bn0.running_mean: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([257]).
size mismatch for bn0.running_var: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([257]).
size mismatch for encoder_block1.conv_block1.bn1.weight: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for encoder_block1.conv_block1.bn1.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for encoder_block1.conv_block1.bn1.running_mean: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for encoder_block1.conv_block1.bn1.running_var: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for encoder_block1.conv_block1.conv1.weight: copying a param with shape torch.Size([32, 2, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 8, 3, 3]).
size mismatch for encoder_block1.conv_block1.shortcut.weight: copying a param with shape torch.Size([32, 2, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 8, 1, 1]).
size mismatch for after_conv2.weight: copying a param with shape torch.Size([8, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 1, 1]).
size mismatch for after_conv2.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([32]).
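
The shapes in the trace point to an STFT configuration mismatch: 1025 frequency bins in the checkpoint imply a 2048-sample window (window_size / 2 + 1), while the model being constructed uses a 512-sample window (257 bins). A quick way to see which configuration a checkpoint expects is to inspect its parameter shapes directly. A minimal sketch, assuming the checkpoint path below and that the weights may be nested under a "model" key (both are assumptions, not the repo's documented layout):

```python
import torch

# Hypothetical path: point it at the checkpoint downloaded by download_checkpoints.sh.
checkpoint = torch.load("downloads/resunet143_ismir2021_vocals.pth", map_location="cpu")

# Some training frameworks nest weights under a "model" key; fall back to the raw dict.
state_dict = checkpoint.get("model", checkpoint)

# Print the STFT-related shapes: 1025 bins implies a 2048-sample window,
# since n_bins = window_size / 2 + 1.
for name, tensor in state_dict.items():
    if "stft" in name or name.startswith("bn0"):
        print(name, tuple(tensor.shape))
```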

@qiuqiangkong
Collaborator

qiuqiangkong commented Sep 16, 2021 via email

@leminhnguyen
Author

leminhnguyen commented Sep 16, 2021

Hey @qiuqiangkong, in separate_scripts/download_checkpoints.sh the ismir2021 checkpoint is downloaded, but in separate_scripts/separate_vocals.sh the default model is resunet_subbandtime. That leads to the size-mismatch error 😄.
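
To illustrate the failure mode (this is a toy reproduction, not the repo's actual model class): when the constructed model's STFT window differs from the one the checkpoint was trained with, load_state_dict raises exactly the size-mismatch errors quoted above, and using the model that matches the downloaded checkpoint makes them go away.

```python
import torch.nn as nn

class TinySpectrogramFrontEnd(nn.Module):
    """Toy stand-in: the conv weight shape follows the STFT window size,
    just like stft.conv_real.weight in the real separator."""
    def __init__(self, window_size: int):
        super().__init__()
        n_bins = window_size // 2 + 1
        # Weight shape (n_bins, 1, window_size), e.g. (1025, 1, 2048) vs (257, 1, 512).
        self.conv_real = nn.Conv1d(1, n_bins, kernel_size=window_size, bias=False)

# Weights from a model built with a 2048-sample window (as in the ismir2021 checkpoint)...
ckpt = TinySpectrogramFrontEnd(window_size=2048).state_dict()

# ...cannot be loaded into a model built with a 512-sample window:
# PyTorch raises the same "size mismatch" RuntimeError.
try:
    TinySpectrogramFrontEnd(window_size=512).load_state_dict(ckpt)
except RuntimeError as err:
    print(err)
```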

I've tried the resunet_ismir2021 model to separate vocals; it removed about 70% of the accompaniment from the audio, which is awesome. Can I improve the results further by finetuning your pretrained model? Btw, could you also release the resunet_subbandtime model?
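
If finetuning is the route taken, the usual warm-start recipe applies: load the pretrained weights, then continue training on your own (mixture, vocals) pairs with a small learning rate. A rough sketch only; build_resunet_ismir2021, my_dataloader, the checkpoint path, and the L1 waveform loss are placeholders/assumptions, not the repo's actual training pipeline.

```python
import torch

# Hypothetical placeholders: swap in the repo's real model constructor and your own data.
model = build_resunet_ismir2021()                                   # assumed constructor
checkpoint = torch.load("downloads/resunet143_ismir2021_vocals.pth", map_location="cpu")
model.load_state_dict(checkpoint.get("model", checkpoint))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)           # small LR for finetuning
criterion = torch.nn.L1Loss()                                       # a common waveform loss

model.train()
for mixture, vocals in my_dataloader:                               # your (mixture, vocals) pairs
    optimizer.zero_grad()
    estimate = model(mixture)
    loss = criterion(estimate, vocals)
    loss.backward()
    optimizer.step()
```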

Again, thanks for your amazing work.

@qiuqiangkong
Collaborator

qiuqiangkong commented Sep 16, 2021 via email
