Judging from the lack of reports, it's possible no one else has this problem, but the ORT ScatterElements buffer mismatch fix did not work for me. I still get a robotic voice creeping in after a few seconds of generation.
What does work for me: in PocketTTS::stream, change
int take = gen_done ? (int)queue.size() : std::min((int)queue.size(), want);
to
int take = std::min((int)queue.size(), want);
That is, even after gen_thread is done, don't take more than max_chunks_frames at a time.
This seems to remove a good optimization, but in my tests (M1 Mac and Android Snapdragon 670) it does not affect performance: RTFx and TTFA are unchanged. OTOH, the robotic voice is fixed, for both fp32 and int8.
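To show what the capped take does once generation has finished, here is a minimal standalone sketch. It is not the actual PocketTTS code: `Frame`, `frames_to_take`, and `drain_chunk_sizes` are hypothetical names made up for illustration, and the queue is modeled as a plain `std::deque`. Only the `std::min` expression mirrors the changed line.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical stand-in for whatever frame type the real queue holds.
using Frame = int;

// Number of frames to hand over in one pass: never more than `want`,
// regardless of whether the generator thread has finished.
inline int frames_to_take(const std::deque<Frame>& queue, int want) {
    assert(want > 0);
    return std::min(static_cast<int>(queue.size()), want);
}

// Drains a copy of the queue in capped chunks and records each chunk size.
inline std::vector<int> drain_chunk_sizes(std::deque<Frame> queue, int want) {
    std::vector<int> sizes;
    while (!queue.empty()) {
        const int take = frames_to_take(queue, want);
        queue.erase(queue.begin(), queue.begin() + take);
        sizes.push_back(take);
    }
    return sizes;
}
```

With 10 frames left in the queue and `want` set to 3, this drains in chunks of 3, 3, 3, and 1, where the original `gen_done` branch would have taken all 10 in a single pass.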