multilingual VITS:speaker_wav #2488

guitarj · 2023-04-07T06:24:28Z


I have trained a multilingual vits_tts model (using only the Chinese multi-speaker dataset AISHELL-3). Now I am trying to synthesize Chinese speech in a new speaker's voice by passing speaker_wav:

tts --text "wo3 shi4 quan2 shi4 jie4 zui4 mei3 de5 ren2 " \
    --model_path checkpoint_260000.pth \
    --config_path config.json \
    --speaker_wav "SSB03150474.wav" \
    --language_idx "cn_CN" \
    --out_path /home/bai/out/vtest.wav

However, I am encountering the following error message:

Traceback (most recent call last):
  File "/root/miniconda3/envs/coqui/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/home/bai/TTS/TTS/bin/synthesize.py", line 357, in main
    wav = synthesizer.tts(
  File "/home/bai/TTS/TTS/utils/synthesizer.py", line 269, in tts
    speaker_embedding = self.tts_model.speaker_manager.compute_embedding_from_clip(speaker_wav)
  File "/home/bai/TTS/TTS/tts/utils/managers.py", line 359, in compute_embedding_from_clip
    embedding = _compute(wf)
  File "/home/bai/TTS/TTS/tts/utils/managers.py", line 342, in _compute
    waveform = self.encoder_ap.load_wav(wav_file, sr=self.encoder_ap.sample_rate)
AttributeError: 'NoneType' object has no attribute 'load_wav'

What should I do to fix the problem? Thanks!

P.S. I was able to synthesize speech successfully using a speaker_idx from the training set.


p0p4k · 2023-04-09T15:04:33Z


Provide the speaker encoder path and config.


guitarj Apr 9, 2023
Author

How do I do that? Thanks!

nanonomad Apr 9, 2023

--encoder_path "/path/to/speaker_encoder.pth" --encoder_config_path "/path/to/config_se.json"
They may be in an unexpected place on your system if you're using a downloaded and cached model. There's a standalone download for the released encoder: https://github.com/coqui-ai/TTS/releases/tag/speaker_encoder_model
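
As a rough illustration of the suggestion above, the full command can be assembled from Python so that no flag is dropped or mistyped. This is only a sketch: the helper name build_tts_command is hypothetical, the flag names are the ones used in this thread (--encoder_path and --encoder_config_path alongside the original arguments), and the encoder paths are placeholders rather than real file locations.

```python
import shlex
from typing import List


def build_tts_command(
    text: str,
    model_path: str,
    config_path: str,
    speaker_wav: str,
    language_idx: str,
    out_path: str,
    encoder_path: str,
    encoder_config_path: str,
) -> List[str]:
    """Return the argument list for the tts CLI, including the encoder flags."""
    return [
        "tts",
        "--text", text,
        "--model_path", model_path,
        "--config_path", config_path,
        "--speaker_wav", speaker_wav,
        "--language_idx", language_idx,
        "--out_path", out_path,
        # Speaker encoder checkpoint and its config (placeholder paths below).
        "--encoder_path", encoder_path,
        "--encoder_config_path", encoder_config_path,
    ]


cmd = build_tts_command(
    text="wo3 shi4 quan2 shi4 jie4 zui4 mei3 de5 ren2 ",
    model_path="checkpoint_260000.pth",
    config_path="config.json",
    speaker_wav="SSB03150474.wav",
    language_idx="cn_CN",
    out_path="/home/bai/out/vtest.wav",
    encoder_path="/path/to/speaker_encoder.pth",
    encoder_config_path="/path/to/config_se.json",
)

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(cmd))
```

The list form can also be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues with the trailing space in the text argument.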

guitarj Apr 10, 2023
Author

I have tried the method you suggested, but it still doesn't work:

tts --text "wo3 shi4 quan2 shi4 jie4 zui4 mei3 de5 ren2 " --model_path checkpoint_260000.pth --config_path config.json --speaker_wav "home/bai/spkt/SSB03150474.wav" --language_idx "cn_CN" --out_path /home/bai/out/vtest.wav --encoder_path "speakers.pth" --encoder_config_path "config_se.json"

Traceback (most recent call last):
  File "/root/miniconda3/envs/coqui/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/home/bai/TTS/TTS/bin/synthesize.py", line 316, in main
    synthesizer = Synthesizer(
  File "/home/bai/TTS/TTS/utils/synthesizer.py", line 75, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/home/bai/TTS/TTS/utils/synthesizer.py", line 122, in _load_tts
    self.tts_model.speaker_manager.init_encoder(self.encoder_checkpoint, self.encoder_config, use_cuda)
  File "/home/bai/TTS/TTS/tts/utils/managers.py", line 326, in init_encoder
    self.encoder_criterion = self.encoder.load_checkpoint(
  File "/home/bai/TTS/TTS/encoder/models/base_encoder.py", line 125, in load_checkpoint
    raise error
  File "/home/bai/TTS/TTS/encoder/models/base_encoder.py", line 120, in load_checkpoint
    self.load_state_dict(state["model"])
KeyError: 'model'

I can't find "speaker_encoder.pth". Did you mean "speakers.pth"?

p0p4k Apr 11, 2023

Go to train_yourtts.py in the recipes and read what the speaker encoder is.
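
For context on the KeyError: 'model' in the traceback above: the failing line is self.load_state_dict(state["model"]), so the file given to --encoder_path has to load as a dict with a "model" entry. The file speakers.pth evidently does not, which suggests (an assumption, not confirmed in this thread) that it is a different kind of file from the speaker encoder checkpoint. A minimal sketch of that check follows; the helper name has_model_weights and the two sample dicts are hypothetical stand-ins, and a real file would first be loaded with something like torch.load(path, map_location="cpu").

```python
from typing import Any


def has_model_weights(state: Any) -> bool:
    """Return True if the loaded object is a dict with a "model" entry."""
    return isinstance(state, dict) and "model" in state


# Hypothetical stand-ins for loaded files, for illustration only.
# Real contents would come from torch.load(path, map_location="cpu").
encoder_like = {"model": {"layer.weight": [0.0]}, "step": 1000}
other_file_like = {"speaker_a": 0, "speaker_b": 1}

print(has_model_weights(encoder_like))     # True
print(has_model_weights(other_file_like))  # False
```

Running the same check on a candidate file before passing it to --encoder_path would show whether it can satisfy the state["model"] lookup.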

Answer selected by guitarj
