
some errors throw during infer, and output file generated is bad quality #14

Closed · ybwai opened this issue Feb 7, 2024 · 1 comment
Labels: help wanted (Extra attention is needed)

ybwai commented Feb 7, 2024

MacBook Pro Intel i9 8-Core / AMD Radeon Pro 5300M / 32GB DDR4 RAM / macOS Sonoma 14.2
Python 3.10.13
Poetry 1.7.1
CLI command:

PYTORCH_ENABLE_MPS_FALLBACK=1  rvc infer -rmr 1 -p 0 -ir 0.75  -m weights/Peter/model.pth -if weights/Peter/index.index -i input.mp3 -o output1.mp3

command output:

INFO:rvc.configs.config:No supported Nvidia GPU found
INFO:rvc.configs.config:overwrite configs.json
INFO:rvc.configs.config:Use mps instead
INFO:rvc.configs.config:is_half:False, device:mps
UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
DEBUG:rvc.lib.infer_pack.models:gin_channels: 256, self.spk_embed_dim: 109
INFO:rvc.modules.vc.modules:Select index: 
INFO:fairseq.tasks.hubert_pretraining:current directory is /Retrieval-based-Voice-Conversion
INFO:fairseq.tasks.hubert_pretraining:HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
INFO:fairseq.models.hubert.hubert:HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'conv_pos_batch_norm': False, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
Traceback (most recent call last):
  File "/Retrieval-based-Voice-Conversion/rvc/modules/vc/pipeline.py", line 307, in pipeline
    index = faiss.read_index(file_index)
  File "/Retrieval-based-Voice-Conversion/.venv/lib/python3.10/site-packages/faiss/swigfaiss_avx2.py", line 9924, in read_index
    return _swigfaiss_avx2.read_index(*args)
TypeError: Wrong number or type of arguments for overloaded function 'read_index'.
  Possible C/C++ prototypes are:
    faiss::read_index(char const *,int)
    faiss::read_index(char const *)
    faiss::read_index(FILE *,int)
    faiss::read_index(FILE *)
    faiss::read_index(faiss::IOReader *,int)
    faiss::read_index(faiss::IOReader *)
INFO:rvc.modules.vc.pipeline:Loading rmvpe model,assets/rmvpe/rmvpe.pt
/Retrieval-based-Voice-Conversion/.venv/lib/python3.10/site-packages/torch/functional.py:650: UserWarning: The operator 'aten::_fft_r2c' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/Retrieval-based-Voice-Conversion/rvc/lib/infer_pack/attentions.py:334: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:474.)
  x = F.pad(
{'npy': 6.011045217514038, 'f0': 135.6644949913025, 'infer': 27.060052633285522}
Finish inference. Check output1.mp3

Although I get an output file, the audio has lots of artefacts/noise and is not smooth at all. I see some warnings and errors in the console output; are they the cause, or is it the models I am using?

Also, how do I combine the output with the instrumental when converting music audio?

Thanks,
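On the instrumental question: RVC itself does not ship a mixer, so one option is an external tool such as ffmpeg's `amix` filter. As a stdlib-only sketch (`mix_wavs` is a hypothetical helper, not part of the RVC codebase, and it assumes both tracks have already been converted to 16-bit PCM WAV, since the command above outputs mp3), you can sum the converted vocal with the instrumental sample-by-sample:

```python
import array
import wave

def mix_wavs(vocal_path: str, instrumental_path: str, out_path: str,
             vocal_gain: float = 1.0) -> None:
    """Mix two 16-bit PCM WAV files sample-by-sample, clipping to int16."""
    with wave.open(vocal_path, "rb") as v, wave.open(instrumental_path, "rb") as i:
        assert v.getframerate() == i.getframerate(), "sample rates must match"
        assert v.getnchannels() == i.getnchannels(), "channel counts must match"
        assert v.getsampwidth() == 2 and i.getsampwidth() == 2, "16-bit PCM only"
        params = v.getparams()
        vocals = array.array("h", v.readframes(v.getnframes()))
        backing = array.array("h", i.readframes(i.getnframes()))
    n = min(len(vocals), len(backing))  # truncate to the shorter track
    mixed = array.array("h", (
        max(-32768, min(32767, int(vocals[k] * vocal_gain) + backing[k]))
        for k in range(n)
    ))
    with wave.open(out_path, "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(params.framerate)
        out.writeframes(mixed.tobytes())
```

Lowering `vocal_gain` (e.g. 0.8) leaves headroom and reduces clipping when both tracks are loud.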

Tps-F (Member) commented Feb 20, 2024

The index file does not seem to be loaded. Check the path, or try the full path.
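This fits the traceback: the empty `Select index:` log line suggests `faiss.read_index` was handed something other than a valid string path. `read_index` is a SWIG wrapper, so a `None`, `Path`, or otherwise wrong-typed argument surfaces as the opaque "Wrong number or type of arguments" TypeError rather than a file error. A minimal guard (a sketch; `load_faiss_index` is a hypothetical helper, not part of the RVC codebase) fails fast with a clearer message:

```python
from pathlib import Path

def load_faiss_index(file_index):
    """Validate the index path before handing it to faiss.read_index.

    faiss.read_index expects a plain str; validating up front turns the
    opaque SWIG TypeError into an actionable FileNotFoundError/ValueError.
    """
    if not file_index:
        raise ValueError("empty index path: pass -if with the .index file")
    path = Path(file_index).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"FAISS index not found: {path}")
    import faiss  # imported lazily; assumes faiss-cpu or faiss-gpu is installed
    return faiss.read_index(str(path))  # faiss wants a str, not a Path object
```

With the guard in place, a mistyped `-if weights/Peter/index.index` reports the resolved path it looked for instead of the SWIG overload dump.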
