[TTS] High-frequency noise when concatenating chunks in streaming vocoder inference #2413

Closed
SoloPro-Git opened this issue Sep 20, 2022 · 2 comments
SoloPro-Git commented Sep 20, 2022

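Updated conclusion:
In streaming vocoder synthesis, calling change_speed() on each sub_wav separately appears to be the problem.
When speed < 1, the concatenated audio has noisy discontinuities. The workaround is to delete all the zero-valued samples that appear at the end of each newly generated sub_wav.
When speed > 1, the concatenated audio has a noise envelope. This is not yet solved.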
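The workaround for the speed < 1 case can be sketched as follows. This is a minimal illustration and not the PaddleSpeech code: `trim_trailing_zeros` and `concat_sub_wavs` are hypothetical helper names, and the chunks are assumed to be 1-D NumPy PCM arrays as returned by change_speed().

```python
import numpy as np

def trim_trailing_zeros(sub_wav: np.ndarray) -> np.ndarray:
    """Drop the run of exactly-zero samples at the end of a 1-D chunk."""
    nonzero = np.flatnonzero(sub_wav)
    if nonzero.size == 0:
        # Assumption: an all-zero chunk is treated as pure padding and dropped.
        return sub_wav[:0]
    return sub_wav[: nonzero[-1] + 1]

def concat_sub_wavs(sub_wavs):
    """Trim each chunk, then join the chunks into a single waveform."""
    trimmed = [trim_trailing_zeros(np.asarray(w)) for w in sub_wavs]
    return np.concatenate(trimmed) if trimmed else np.zeros(0, dtype=np.int16)

# Toy int16 chunks with zero tails, standing in for change_speed() output.
chunk_a = np.array([3, -5, 7, 0, 0, 0], dtype=np.int16)
chunk_b = np.array([2, 0, -4, 0], dtype=np.int16)
joined = concat_sub_wavs([chunk_a, chunk_b])
```

Only the zeros at the tail are removed; zeros inside a chunk are kept. Note that this also shortens genuine silence at the end of a chunk, so it may not be appropriate for every input.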


Update 1:
In audio_process.py, the change_speed() function introduces noise once it has converted the PCM data.

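We are doing streaming inference. After the vocoder synthesizes block1 and block2 in streaming mode, we join b1 and b2, and the resulting audio has some high-frequency noise when played back.
We suspect that the waveform is discontinuous at the junction between b1 and b2.
We looked through the pad and depad code and could not find a problem there.
From the spectrogram, however, we have two observations:
1. Some splice points show a distinct vertical line in the spectrogram.
2. At the position of that vertical line, the time-domain waveform has amplitude 0.
Could you please take a look?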
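One way to confirm the second observation without reading it off a plot is to search the concatenated waveform for runs of zero samples. The sketch below is a hypothetical diagnostic, not part of PaddleSpeech; `find_zero_runs` and the `min_len` threshold are assumptions made for illustration.

```python
import numpy as np

def find_zero_runs(wav, min_len=8):
    """Return (start, end) pairs, end exclusive, for runs of >= min_len zero samples."""
    is_zero = np.asarray(wav) == 0
    # Pad with False so that every run has both a rising and a falling edge.
    padded = np.concatenate(([False], is_zero, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_len]

# Toy waveform with one long zero run and one isolated zero sample.
wav = np.array([1, 2, 0, 0, 0, 0, 3, 4, 0, 5], dtype=np.int16)
runs = find_zero_runs(wav, min_len=3)
```

If the reported start indices line up with the sample offsets where sub_wav chunks were joined, the gaps come from the chunks themselves rather than from the vocoder output.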


We are using fastspeech2 and hifigan.
The block size is 36 and the pad size is 20.

image
image

Collaborator

yt605155624 commented Sep 20, 2022

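Following the README at https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_tts_server, you can increase the pad size until the streaming output numerically matches the non-streaming output. For how to compute the pad value, see https://aistudio.baidu.com/aistudio/projectdetail/4151335. The current configuration is a trade-off between quality and speed, not the mathematically optimal configuration.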
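The effect of the pad size can be illustrated with a toy model. The sketch below is not HiFiGAN or the PaddleSpeech streaming code: `toy_vocoder` is a hypothetical stand-in (a moving average with an assumed context radius of 3 frames), and `stream_synthesize` is an assumed pad/depad loop. It only shows the general principle that chunked output matches the non-streaming output once the pad covers the model's context.

```python
import numpy as np

RADIUS = 3  # hypothetical context radius of the toy model, in frames
KERNEL = np.ones(2 * RADIUS + 1) / (2 * RADIUS + 1)

def toy_vocoder(frames):
    """Stand-in for a convolutional vocoder: each output uses RADIUS frames per side."""
    return np.convolve(frames, KERNEL, mode="same")

def stream_synthesize(frames, block, pad):
    """Process block by block with `pad` context frames, then depad and concatenate."""
    n = len(frames)
    pieces = []
    for start in range(0, n, block):
        end = min(start + block, n)
        left = max(start - pad, 0)   # context borrowed from the previous block
        right = min(end + pad, n)    # context borrowed from the next block
        out = toy_vocoder(frames[left:right])
        # Depad: keep only the outputs that belong to [start, end).
        pieces.append(out[start - left : out.shape[0] - (right - end)])
    return np.concatenate(pieces)

frames = np.sin(0.3 * np.arange(100)) + 2.0
full = toy_vocoder(frames)
matches_with_large_pad = np.allclose(stream_synthesize(frames, block=36, pad=20), full)
matches_with_small_pad = np.allclose(stream_synthesize(frames, block=36, pad=1), full)
```

In this toy setup a pad of 20 is far more than the 3 frames of context required, so the chunked result equals the full result, while a pad of 1 leaves errors at every block boundary. A real vocoder has a much larger receptive field, which is why the pad value has to be computed for the actual model.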

@liwei0826

There is indeed a problem here, and I don't know how to solve it.
