INFO:rvc.configs.config:No supported Nvidia GPU found
INFO:rvc.configs.config:overwrite configs.json
INFO:rvc.configs.config:Use mps instead
INFO:rvc.configs.config:is_half:False, device:mps
UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
DEBUG:rvc.lib.infer_pack.models:gin_channels: 256, self.spk_embed_dim: 109
INFO:rvc.modules.vc.modules:Select index:
INFO:fairseq.tasks.hubert_pretraining:current directory is /Retrieval-based-Voice-Conversion
INFO:fairseq.tasks.hubert_pretraining:HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
INFO:fairseq.models.hubert.hubert:HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'conv_pos_batch_norm': False, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
Traceback (most recent call last):
File "/Retrieval-based-Voice-Conversion/rvc/modules/vc/pipeline.py", line 307, in pipeline
index = faiss.read_index(file_index)
File "/Retrieval-based-Voice-Conversion/.venv/lib/python3.10/site-packages/faiss/swigfaiss_avx2.py", line 9924, in read_index
return _swigfaiss_avx2.read_index(*args)
TypeError: Wrong number or type of arguments for overloaded function 'read_index'.
Possible C/C++ prototypes are:
faiss::read_index(char const *,int)
faiss::read_index(char const *)
faiss::read_index(FILE *,int)
faiss::read_index(FILE *)
faiss::read_index(faiss::IOReader *,int)
faiss::read_index(faiss::IOReader *)
INFO:rvc.modules.vc.pipeline:Loading rmvpe model,assets/rmvpe/rmvpe.pt
/Retrieval-based-Voice-Conversion/.venv/lib/python3.10/site-packages/torch/functional.py:650: UserWarning: The operator 'aten::_fft_r2c' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
/Retrieval-based-Voice-Conversion/rvc/lib/infer_pack/attentions.py:334: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:474.)
x = F.pad(
{'npy': 6.011045217514038, 'f0': 135.6644949913025, 'infer': 27.060052633285522}
Finish inference. Check output1.mp3
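One error that stands out is the faiss TypeError: pipeline.py line 307 calls faiss.read_index(file_index), and judging by the overload list in the error, file_index was not a plain str at that point (the "Select index:" line above is empty, so it may simply be unset, or it may be a pathlib.Path). The kind of guard I would expect to avoid the crash looks roughly like this; the index filename is a placeholder, just for illustration:

from pathlib import Path

import faiss

# placeholder path, for illustration only
file_index = Path("logs/my-model/added_IVF256_Flat_nprobe_1.index")

index = None
if file_index and Path(file_index).is_file():
    # faiss.read_index only accepts a plain str path (or FILE*/IOReader), so convert Path objects explicitly
    index = faiss.read_index(str(file_index))
else:
    # no usable .index file: skip retrieval instead of crashing
    print("No .index file found, running inference without retrieval")

If no index is actually being loaded, that could also explain part of the quality drop, since the retrieval step is what pulls the converted timbre toward the training data.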
Although I get an output file, the audio has a lot of artifacts/noise and is not smooth at all. I see some warnings and errors in the console output; are they the cause, or is it the models I am using?
Also, how do I combine the converted vocals with the instrumental when the input is a full song? (The kind of mixing step I had in mind is sketched below.)
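This is just a rough idea, assuming the instrumental has already been separated beforehand (for example with a source-separation tool) and that RVC only converted the vocal stem; the instrumental filename is a placeholder:

from pydub import AudioSegment  # pydub requires ffmpeg to be installed

vocal = AudioSegment.from_file("output1.mp3")               # converted vocal from RVC
instrumental = AudioSegment.from_file("instrumental.wav")   # placeholder: the separated instrumental stem

# overlay() keeps the length of the base segment, so start from the longer track
mix = instrumental.overlay(vocal) if len(instrumental) >= len(vocal) else vocal.overlay(instrumental)
mix.export("output1_mixed.mp3", format="mp3")

An equivalent ffmpeg amix call would work too; is there a recommended way to do this within the project itself?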
Thanks,
MacBook Pro Intel i9 8-Core / AMD Radeon Pro 5300M / 32GB DDR4 RAM / macOS Sonoma 14.2
Python 3.10.13
Poetry 1.7.1
CLI command:
command output: (the full log is pasted at the top of this issue)
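In case it is relevant to the quality question: this is an Intel Mac with an AMD GPU running on the MPS backend, and the log shows several CPU fallbacks. One A/B test I was thinking of is forcing CPU inference and comparing the output; if the artifacts disappear, that would point at the MPS path rather than the models. A quick check, nothing RVC-specific:

import torch

# confirm the MPS backend is actually built and usable on this machine
print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

# forcing CPU is a useful comparison point when debugging audio artifacts
device = torch.device("cpu")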