
Can we extract both vocals and music background from this repo? #1

Closed
leminhnguyen opened this issue Sep 14, 2021 · 5 comments

Comments

@leminhnguyen

No description provided.

@qiuqiangkong
Collaborator

Yes, both vocals and accompaniment are supported.

@leminhnguyen
Author

leminhnguyen commented Sep 16, 2021

The pretrained model doesn't seem to work. I downloaded the checkpoints with your script and then ran separate_vocals.sh, but a size-mismatch error was raised:

size mismatch for stft.conv_real.weight: copying a param with shape torch.Size([1025, 1, 2048]) from checkpoint, the shape in current model is torch.Size([257, 1, 512]).
size mismatch for stft.conv_imag.weight: copying a param with shape torch.Size([1025, 1, 2048]) from checkpoint, the shape in current model is torch.Size([257, 1, 512]).
size mismatch for istft.ola_window: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for istft.conv_real.weight: copying a param with shape torch.Size([2048, 2048, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]).
size mismatch for istft.conv_imag.weight: copying a param with shape torch.Size([2048, 2048, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]).
size mismatch for bn0.weight: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([257]).
size mismatch for bn0.bias: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([257]).
size mismatch for bn0.running_mean: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([257]).
size mismatch for bn0.running_var: copying a param with shape torch.Size([1025]) from checkpoint, the shape in current model is torch.Size([257]).
size mismatch for encoder_block1.conv_block1.bn1.weight: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for encoder_block1.conv_block1.bn1.bias: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for encoder_block1.conv_block1.bn1.running_mean: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for encoder_block1.conv_block1.bn1.running_var: copying a param with shape torch.Size([2]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for encoder_block1.conv_block1.conv1.weight: copying a param with shape torch.Size([32, 2, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 8, 3, 3]).
size mismatch for encoder_block1.conv_block1.shortcut.weight: copying a param with shape torch.Size([32, 2, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 8, 1, 1]).
size mismatch for after_conv2.weight: copying a param with shape torch.Size([8, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 1, 1]).
size mismatch for after_conv2.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([32]).
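
The shapes in the trace point to an STFT configuration mismatch: 1025 frequency bins in the checkpoint imply a 2048-sample window (window_size / 2 + 1), while the model being constructed uses a 512-sample window (257 bins). A quick way to see which configuration a checkpoint expects is to inspect its parameter shapes directly. A minimal sketch, assuming the checkpoint path below and that the weights may be nested under a "model" key (both are assumptions, not the repo's documented layout):

```python
import torch

# Hypothetical path: point it at the checkpoint downloaded by download_checkpoints.sh.
checkpoint = torch.load("downloads/resunet143_ismir2021_vocals.pth", map_location="cpu")

# Some training frameworks nest weights under a "model" key; fall back to the raw dict.
state_dict = checkpoint.get("model", checkpoint)

# Print the STFT-related shapes: 1025 bins implies a 2048-sample window,
# since n_bins = window_size / 2 + 1.
for name, tensor in state_dict.items():
    if "stft" in name or name.startswith("bn0"):
        print(name, tuple(tensor.shape))
```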

@qiuqiangkong
Collaborator

qiuqiangkong commented Sep 16, 2021 via email

@leminhnguyen
Author

leminhnguyen commented Sep 16, 2021

Hey @qiuqiangkong, in separate_scripts/download_checkpoints.sh the ismir2021 checkpoint is downloaded, but in separate_scripts/separate_vocals.sh the default model is resunet_subbandtime. That leads to the size-mismatch error 😄.
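
To illustrate the failure mode (this is a toy reproduction, not the repo's actual model class): when the constructed model's STFT window differs from the one the checkpoint was trained with, load_state_dict raises exactly the size-mismatch errors quoted above, and using the model that matches the downloaded checkpoint makes them go away.

```python
import torch.nn as nn

class TinySpectrogramFrontEnd(nn.Module):
    """Toy stand-in: the conv weight shape follows the STFT window size,
    just like stft.conv_real.weight in the real separator."""
    def __init__(self, window_size: int):
        super().__init__()
        n_bins = window_size // 2 + 1
        # Weight shape (n_bins, 1, window_size), e.g. (1025, 1, 2048) vs (257, 1, 512).
        self.conv_real = nn.Conv1d(1, n_bins, kernel_size=window_size, bias=False)

# Weights from a model built with a 2048-sample window (as in the ismir2021 checkpoint)...
ckpt = TinySpectrogramFrontEnd(window_size=2048).state_dict()

# ...cannot be loaded into a model built with a 512-sample window:
# PyTorch raises the same "size mismatch" RuntimeError.
try:
    TinySpectrogramFrontEnd(window_size=512).load_state_dict(ckpt)
except RuntimeError as err:
    print(err)
```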

I've tried the resunet_ismir2021 model to separate vocals; it removed about 70% of the accompaniment from the audio, which is awesome. Can I improve the results further by finetuning your pretrained model? Btw, could you also release the resunet_subbandtime model?
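
If finetuning is the route taken, the usual warm-start recipe applies: load the pretrained weights, then continue training on your own (mixture, vocals) pairs with a small learning rate. A rough sketch only; build_resunet_ismir2021, my_dataloader, the checkpoint path, and the L1 waveform loss are placeholders/assumptions, not the repo's actual training pipeline.

```python
import torch

# Hypothetical placeholders: swap in the repo's real model constructor and your own data.
model = build_resunet_ismir2021()                                   # assumed constructor
checkpoint = torch.load("downloads/resunet143_ismir2021_vocals.pth", map_location="cpu")
model.load_state_dict(checkpoint.get("model", checkpoint))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)           # small LR for finetuning
criterion = torch.nn.L1Loss()                                       # a common waveform loss

model.train()
for mixture, vocals in my_dataloader:                               # your (mixture, vocals) pairs
    optimizer.zero_grad()
    estimate = model(mixture)
    loss = criterion(estimate, vocals)
    loss.backward()
    optimizer.step()
```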

Again, thanks for your amazing work.

@qiuqiangkong
Collaborator

qiuqiangkong commented Sep 16, 2021 via email
