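After fine-tuning TTS, how do I use style_fs2 to change the speaking rate? I followed the style_fs2 docs and swapped in my own fastspeech2 model, but it keeps failing with OOM #2504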

Closed
kq-cheng opened this issue Oct 8, 2022 · 4 comments

kq-cheng commented Oct 8, 2022

    python3 style_syn.py \
        --fastspeech2-config=/opt/ml/server/PaddleSpeechserver/demos/speech_web/speech_server/tmp_dir/finetune/trans/exp/default.yaml \
        --fastspeech2-checkpoint=/opt/ml/server/PaddleSpeechserver/demos/speech_web/speech_server/tmp_dir/finetune/trans/exp/checkpoints/snapshot_iter_120292.pdz \
        --fastspeech2-stat=download/fastspeech2_nosil_baker_ckpt_0.4/speech_stats.npy \
        --fastspeech2-pitch-stat=download/fastspeech2_nosil_baker_ckpt_0.4/pitch_stats.npy \
        --fastspeech2-energy-stat=download/fastspeech2_nosil_baker_ckpt_0.4/energy_stats.npy \
        --pwg-config=download/pwg_baker_ckpt_0.4/pwg_default.yaml \
        --pwg-checkpoint=download/pwg_baker_ckpt_0.4/pwg_snapshot_iter_400000.pdz \
        --pwg-stat=download/pwg_baker_ckpt_0.4/pwg_stats.npy \
        --text=/opt/ml/server/PaddleSpeechserver/demos/speech_web/speech_server/tmp_dir/finetune/trans/sentences.txt \
        --output-dir=output \
        --phones-dict=download/fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt

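Here, --fastspeech2-config and --fastspeech2-checkpoint point to my own fine-tuned model; everything else is unchanged. I don't know how to generate those .npy files. Could they be what is causing the OOM? Enabling GPU memory growth doesn't help, and switching to the CPU also produces a memory error.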

Collaborator

yt605155624 commented Oct 8, 2022

"Those .npy files", as well as phone_id_map.txt, should be taken from the pretrained model; in this case, that means the files inside fastspeech2_aishell3_ckpt_1.1.0.zip.
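For context on what those .npy files hold: assuming they follow the common z-score layout (row 0 is the per-dimension mean, row 1 the standard deviation, both computed over the training data), the acoustic model works on normalized features and the stats are what map its outputs back to the original scale. That is why they need to match the data the model was trained with rather than being regenerated. The numpy sketch below is a hypothetical illustration of that layout, not PaddleSpeech's own code; `compute_stats`, `normalize` and `denormalize` are made-up names.

```python
import numpy as np


def compute_stats(features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: (num_frames, dim) features -> (2, dim) stats.

    Row 0 holds the per-dimension mean, row 1 the standard deviation.
    This (2, dim) layout is an assumption made for illustration.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return np.stack([mean, std], axis=0)


def normalize(x: np.ndarray, stats: np.ndarray) -> np.ndarray:
    """Z-score normalize features with precomputed stats."""
    return (x - stats[0]) / stats[1]


def denormalize(x: np.ndarray, stats: np.ndarray) -> np.ndarray:
    """Map normalized model outputs back to the original feature scale."""
    return x * stats[1] + stats[0]


if __name__ == "__main__":
    train_feats = np.array([[1.0, 2.0], [3.0, 6.0]])
    stats = compute_stats(train_feats)  # mean [2, 4], std [1, 2]
    z = normalize(train_feats, stats)
    # Denormalizing with the same stats recovers the original features;
    # stats from different data would put the output on the wrong scale.
    print(np.allclose(denormalize(z, stats), train_feats))  # True
```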
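The fine-tuned model is a multi-speaker model, whereas style_fs2 is written for a single-speaker model. A multi-speaker model takes an additional spk_id input and also needs speaker_id_map.txt. For details, compare synthesize_e2e.sh in csmsc/tts3 with the one in aishell3/tts3, and modify the code accordingly.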
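A minimal sketch of the bookkeeping this implies, assuming phone_id_map.txt and speaker_id_map.txt each contain one whitespace-separated `<name> <id>` pair per line: the phone map gives the model's input vocabulary size, and the speaker map gives the speaker count (spk_num) and the integer spk_id to pass at inference time. The function names and the speaker names in the example are hypothetical.

```python
from typing import Dict, Iterable


def parse_id_map(lines: Iterable[str]) -> Dict[str, int]:
    """Parse '<name> <id>' lines into a {name: id} dict.

    Assumes whitespace-separated pairs; blank or malformed lines are skipped.
    """
    mapping: Dict[str, int] = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) != 2:
            continue
        name, idx = parts
        mapping[name] = int(idx)
    return mapping


def load_id_map(path: str) -> Dict[str, int]:
    """Read an id map file such as phone_id_map.txt or speaker_id_map.txt."""
    with open(path, encoding="utf-8") as f:
        return parse_id_map(f)


if __name__ == "__main__":
    # Hypothetical contents; the real file ships with the pretrained model.
    speaker_lines = ["spk_a 0", "spk_b 1", ""]
    spk_map = parse_id_map(speaker_lines)
    spk_num = len(spk_map)      # value for FastSpeech2(spk_num=...)
    spk_id = spk_map["spk_b"]   # integer speaker id used at inference
    print(spk_num, spk_id)      # 2 1
```

Skipping blank lines keeps spk_num from being overcounted if the map file ends with an empty line.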
Also, if you only want to change the speaking rate, you can refer directly to #2383.

yt605155624 self-assigned this Oct 8, 2022
yt605155624 added the T2S label Oct 8, 2022
Author

kq-cheng commented Oct 9, 2022

OK, I'll give it a try. Thanks!

kq-cheng closed this as completed Oct 9, 2022

Christophy commented Nov 25, 2022

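Roughly like this, for reference for anyone who comes later: adding speaker_id_map.txt and spk_id is all it takes; without them the generated voice sounds very strange.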

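    # am_config, vocab_size, fastspeech2_normalizer, pwd and part_phone_ids
    # come from the surrounding synthesis script (not shown here).

    # Count speakers from speaker_id_map.txt (one "<speaker> <id>" pair per line)
    with open("/home/test/speaker_id_map.txt", 'rt') as f:
        spk_id_map = [line.strip().split() for line in f.readlines()]
    spk_num = len(spk_id_map)

    odim = am_config.n_mels
    # Passing spk_num builds the multi-speaker variant (with a speaker embedding).
    # Loading the fine-tuned checkpoint into `model` is not shown in this excerpt.
    model = FastSpeech2(
        idim=vocab_size,
        odim=odim,
        spk_num=spk_num,
        **am_config["model"], )

    # Wrap the model with the pretrained model's pitch/energy statistics
    am_inference = StyleFastSpeech2Inference(
        fastspeech2_normalizer, model,
        pwd + "/fastspeech2_mix_ckpt_1.2.0/pitch_stats.npy",
        pwd + "/fastspeech2_mix_ckpt_1.2.0/energy_stats.npy")
    am_inference.eval()

    # Select speaker 0; the *_scale arguments scale the predicted
    # durations, pitch and energy
    spk_id = paddle.to_tensor(0)
    mel = am_inference(
        part_phone_ids,
        durations=None, durations_scale=1.2, durations_bias=None,
        pitch=None, pitch_scale=1.3, pitch_bias=None,
        energy=None, energy_scale=1.3, energy_bias=None,
        robot=False,
        spk_id=spk_id)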
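The durations_scale argument in the call above is what changes the speaking rate. Conceptually, FastSpeech2 predicts a number of mel frames for each phone, and a length regulator repeats each phone's hidden vector that many times. The simplified numpy sketch below shows how scaling those frame counts lengthens (scale > 1, slower speech) or shortens (scale < 1, faster speech) the output. The function names are hypothetical, and StyleFastSpeech2Inference may round or apply durations_bias differently.

```python
import numpy as np


def scale_durations(durations, scale: float) -> np.ndarray:
    """Multiply per-phone frame counts by `scale`, round, clamp at 0."""
    scaled = np.round(np.asarray(durations, dtype=np.float64) * scale)
    return np.maximum(scaled, 0).astype(np.int64)


def length_regulate(encodings: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Repeat row i of `encodings` durations[i] times along the time axis."""
    return np.repeat(encodings, durations, axis=0)


if __name__ == "__main__":
    phone_enc = np.random.rand(3, 4)   # 3 phones, 4-dim hidden vectors (toy sizes)
    durations = np.array([2, 3, 5])    # predicted frames per phone
    slower = scale_durations(durations, 1.2)
    print(slower.tolist())                              # [2, 4, 6]
    print(length_regulate(phone_enc, durations).shape)  # (10, 4)
    print(length_regulate(phone_enc, slower).shape)     # (12, 4)
```

Because the output length grows with the summed durations, larger scales also produce longer mel spectrograms for the vocoder to process.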


SG-XM commented Dec 8, 2022

Hi @Christophy, would you mind sharing your complete code for reference? After making these changes I still get GPU OOM (see #2722).
