Streaming mode: text stream with gaps #212

matevosashot · 2026-02-12T20:45:30Z

First of all, thank you for releasing this amazing TTS model.

Over the last few days, I have been trying to run the model in fully streaming mode. True streaming is possible with VoiceDesign (the CustomVoice model): the first audio chunk is produced right after the first text token is processed.

However, the text has to be fed to the model continuously. There are usually about 10 times fewer text tokens than corresponding audio codes, so once the text tokens run out, tts_pad tokens are fed to the model to generate the remaining audio. For example:

audio stream:       <language><speaker>pBaaaaaaaaaaaaaaE
 text stream: <role>pppppppppppppppppppBttttttttEppppppp
                                        |
                                        V 
                                predicts next audio
                                code autoregressively

The first part is the initial prompt to the model, which specifies the language and speaker. Then the text comes in as a stream, and the audio is generated autoregressively. Above, I used the following notation:

p = pad token
B = bos token
E = eos token
t = text token
a = audio codec code
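To make the layout concrete, here is a minimal Python sketch that lines up the two streams after the prompt, using the single-letter labels from the notation above. The alignment it encodes (text B in the same column as the audio pad, one column ahead of the audio B) is read off the diagram and is an assumption, not the model's documented input format; `layout_streams` is a hypothetical helper.

```python
from typing import List, Tuple

# Single-letter labels from the notation above.
PAD, BOS, EOS = "p", "B", "E"


def layout_streams(text: List[str], audio: List[str]) -> Tuple[List[str], List[str]]:
    """Line up the text and audio streams column by column after the prompt.

    Assumed layout (read off the diagram, not taken from the model code):
      audio: p B a a a ... a E
      text:  B t t ... t E p ... p
    i.e. the text BOS sits one column ahead of the audio BOS, and once the
    text ends, pad tokens fill the text side until the audio EOS.
    """
    text_stream = [BOS] + list(text) + [EOS]
    audio_stream = [PAD, BOS] + list(audio) + [EOS]
    n_pad = len(audio_stream) - len(text_stream)
    if n_pad < 0:
        raise ValueError("expected the text stream to be shorter than the audio stream")
    text_stream += [PAD] * n_pad  # pads fed after the text runs out
    return text_stream, audio_stream


if __name__ == "__main__":
    # 8 text tokens and 14 audio codes; counts chosen only for illustration.
    text_s, audio_s = layout_streams(["t"] * 8, ["a"] * 14)
    print(" text:", "".join(text_s))
    print("audio:", "".join(audio_s))
```

In a live streaming loop the audio length is not known in advance: pads would simply keep being fed on the text side until the model predicts the audio E. The helper only reconstructs the alignment after the fact.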

The only limitation is that the text must arrive without interruption. For example, I cannot generate audio in the following scenario:

audio stream:       <language><speaker>pBaaaaaaaaaaaaaaaaaaaaaE
 text stream: <role>pppppppppppppppppppBtttgggtttgtttgttEpppppp

where g (gap) marks a frame in which no text has arrived. I have tried various tokens in place of g.
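The gapped scenario can be sketched the same way: for every frame, the text side gets the token that arrived in time, a placeholder g if nothing arrived, or pads after E. `text_stream_with_gaps` is a hypothetical helper, and `GAP` is only a placeholder label, since which token (if any) the model accepts in those frames is exactly the open question.

```python
from typing import Iterable, List, Optional

# Labels from the notation above; GAP is only a placeholder for "no text yet".
PAD, BOS, EOS, GAP = "p", "B", "E", "g"


def text_stream_with_gaps(arrivals: Iterable[Optional[str]], n_frames: int) -> List[str]:
    """Build the per-frame text stream when text arrives irregularly.

    Each item of `arrivals` describes one frame after the text BOS: a text
    token if one arrived in time, None if nothing arrived (a gap), or EOS
    when the text is finished. Pads fill the remaining frames after EOS.
    """
    stream = [BOS]
    for item in arrivals:
        if item is None:
            stream.append(GAP)  # no text available for this frame
            continue
        stream.append(item)
        if item == EOS:
            break
    # Pad the text side up to the total number of frames (no-op if already long enough).
    stream += [PAD] * max(0, n_frames - len(stream))
    return stream


if __name__ == "__main__":
    # Arrival pattern copied from the second diagram (t = token, g = gap).
    arrivals = [None if c == GAP else c for c in "tttgggtttgtttgtt"] + [EOS]
    print("".join(text_stream_with_gaps(arrivals, n_frames=24)))
```

In this column-aligned layout, every audio step still pairs with some text-side input, so a gap frame has to be filled with something rather than skipped.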

Do you have any ideas on how to generate the audio from the non-continuous stream of text?
