
Typical TTS Scenarios


Voice Assistant

Many enterprises want to build their own voice assistant. Typically, the TTS in this scenario speaks short utterances in human-bot interactions. A few key design considerations in this scenario:

  • Minimize dialog latency. It is good to use Direct Line Speech, which hooks up the bot service and the TTS/SR service in the closest region.
  • Use an off-the-shelf voice. Neural voices are recommended for their high quality and personality fit (see the synthesis sketch after this list).
  • Use a branded voice. Custom Neural Voice is recommended to create a branded voice from a few hundred recordings.
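
Below is a minimal sketch of synthesizing a short assistant reply with an off-the-shelf neural voice, assuming the Python Speech SDK (azure-cognitiveservices-speech); the subscription key, region, and voice name are placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: use your own subscription key and region.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
# Pick an off-the-shelf neural voice; the name below is just an example.
speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"

synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# Voice assistant replies are short, so a single blocking call is usually fine.
result = synthesizer.speak_text_async("Sure, turning on the living room lights.").get()
if result.reason == speechsdk.ResultReason.Canceled:
    print("Synthesis canceled:", result.cancellation_details.error_details)
```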

Immersive Reading

Applications can use TTS to read long content such as an email, a news article, or a chapter of a book. Since the content in this scenario is usually long, it is desirable to start playing the response while TTS is still rendering. There are two options:

  • The Speech SDK supports streaming output of the response. Application developers can implement the streaming playback logic for the audio stream on the client (see the sketch after this list).
  • Use the Immersive Reader SDK, which uses Azure TTS under the hood.
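
As a rough illustration of the first option, the sketch below consumes the audio stream while synthesis is still in progress, again assuming the Python Speech SDK; long_article_text and the playback step are placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
# Pass audio_config=None so the SDK does not play to the default speaker;
# we pull the audio ourselves and feed it to the client-side player.
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)

long_article_text = "..."  # placeholder for the email / article / chapter text

# start_speaking_* returns once synthesis has started, not when it finishes,
# so audio can be played while the rest of the content is still rendering.
result = synthesizer.start_speaking_text_async(long_article_text).get()
stream = speechsdk.AudioDataStream(result)

audio_buffer = bytes(16000)
filled = stream.read_data(audio_buffer)
while filled > 0:
    # Feed audio_buffer[:filled] into the player or network sink here.
    filled = stream.read_data(audio_buffer)
```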

Audio Content Generation

TTS can be used to turn a long article or even a whole book into audio files; one possible approach is sketched below.
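
A minimal sketch, assuming the Python Speech SDK with file output; the voice name, file name, and chapter_text are placeholders, and a real pipeline would loop over chapters and may prefer SSML for finer pronunciation control.

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"  # example voice

# Write the synthesized audio straight to a file instead of the speaker.
audio_config = speechsdk.audio.AudioOutputConfig(filename="chapter_01.wav")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

chapter_text = "..."  # placeholder for one chapter of the book
result = synthesizer.speak_text_async(chapter_text).get()
if result.reason == speechsdk.ResultReason.Canceled:
    print("Synthesis canceled:", result.cancellation_details.error_details)
```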

Connected Car

In the car scenario, TTS normally still needs to work when the vehicle is disconnected, so a hybrid design is desirable.

There can be different kinds of skills in the car scenario:

  • If it is an online skill like weather, it is recommended to call TTS, compress the audio with SILK/OGG/MP3, and send it to the car head unit to render.
  • If it is an on-device skill like opening the window, use the policy below (see the sketch after this list):
  1. Use online TTS when there is a connection.
  2. Fall back to an on-device voice from the same voice talent when disconnected.
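
The sketch below illustrates that fallback policy, assuming the Python Speech SDK for the online path; local_engine is a hypothetical wrapper around the embedded on-device voice, not a real SDK object.

```python
import azure.cognitiveservices.speech as speechsdk

def speak_hybrid(text, speech_config, local_engine):
    """Prefer online neural TTS; fall back to the on-device voice when offline.

    local_engine is a hypothetical interface to the embedded TTS engine on the
    head unit, built with a voice from the same voice talent as the online voice.
    """
    try:
        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
        result = synthesizer.speak_text_async(text).get()
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            return result.audio_data
    except Exception:
        # Network error, timeout, or service unavailable: fall through to the device voice.
        pass
    return local_engine.speak(text)
```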

We offer a hybrid solution for the connected car scenario.

Accessibility

Windows 10 provides TTS in more than 40 locales for accessibility scenarios. It can be invoked using the WinRT Speech API.
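
As a rough illustration, the underlying WinRT API is Windows.Media.SpeechSynthesis.SpeechSynthesizer; the sketch below calls it through the community Python/WinRT projection (the winsdk package), which is an assumption here, so the import path and snake_case method name should be verified against that package.

```python
# Assumption: the community Python/WinRT projection ("winsdk" package) is installed.
# The underlying WinRT API is Windows.Media.SpeechSynthesis.SpeechSynthesizer.
import asyncio
from winsdk.windows.media.speechsynthesis import SpeechSynthesizer

async def main():
    synthesizer = SpeechSynthesizer()
    # Synthesize to an in-memory stream; feed it to an audio player of your choice.
    stream = await synthesizer.synthesize_text_to_stream_async("Hello from Windows TTS.")
    print("Synthesized", stream.size, "bytes of audio")

asyncio.run(main())
```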