
How to do integration with Azure TTS


The overall architecture is important for a good end-to-end experience in terms of latency, reliability, and scale.

What is the caller of TTS Service?

It could be a client or a service. For a service, it could be an integrated service such as Direct Line Speech, or a 3rd-party application service.

It is usually recommended to call the TTS service from an application service, which acts as a middle layer between the customer's client application and TTS.
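As a rough illustration, here is a minimal Java sketch (using the Speech SDK) of an application-service method that calls TTS and returns the audio bytes to its own client. The class name, subscription key, and region values are placeholders, not part of any official sample.

```java
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;

// Minimal sketch: an application-service method that calls Azure TTS and
// returns the synthesized audio bytes to its own client.
// "YourSubscriptionKey" and "YourServiceRegion" are placeholders.
public class TtsMiddleLayer {
    private final SpeechConfig config =
        SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");

    public byte[] synthesize(String text) throws Exception {
        // Pass a null AudioConfig so the audio is returned in memory
        // instead of being played on a local speaker.
        SpeechSynthesizer synthesizer = new SpeechSynthesizer(config, (AudioConfig) null);
        try {
            SpeechSynthesisResult result = synthesizer.SpeakTextAsync(text).get();
            if (result.getReason() == ResultReason.SynthesizingAudioCompleted) {
                return result.getAudioData();
            }
            throw new RuntimeException("Synthesis failed: " + result.getReason());
        } finally {
            synthesizer.close();
        }
    }
}
```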

Where is the caller calling from?

Is it from an Azure DC or from a 3rd-party cloud?

In general, it is better to keep the caller's region close to the TTS service region to minimize latency.

Cross-DC calls usually take more time, so the closer the DCs are, the lower the latency.

We recommend hosting the application caller service in an Azure DC.
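A simple way to compare candidate regions is to measure the wall-clock time of a synthesis call from the place where the caller service actually runs. A minimal sketch, where the key is a placeholder and "eastus" is just an example region:

```java
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;

// Rough sketch for comparing regions: time one synthesis call end to end
// from the machine that will host the caller service.
public class LatencyCheck {
    public static void main(String[] args) throws Exception {
        SpeechConfig config = SpeechConfig.fromSubscription("YourSubscriptionKey", "eastus");
        SpeechSynthesizer synthesizer = new SpeechSynthesizer(config, (AudioConfig) null);

        long start = System.nanoTime();
        SpeechSynthesisResult result = synthesizer.SpeakTextAsync("Hello world").get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Reason: " + result.getReason()
                + ", end-to-end latency: " + elapsedMs + " ms");

        result.close();
        synthesizer.close();
    }
}
```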

What is the audio encoding format?

To minimize latency and bandwidth, a streaming, compressed audio format is recommended.

how-to-choose-different-audio-output-format
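As an illustration, the sketch below selects a compressed MP3 output format and reads the audio as it arrives via AudioDataStream, rather than waiting for the full result. The key/region values and the buffer handling are placeholders.

```java
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;

// Sketch: pick a compressed, streamable output format and consume the audio
// chunk by chunk so it can be forwarded to the client as it is produced.
public class CompressedStreaming {
    public static void main(String[] args) throws Exception {
        SpeechConfig config = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
        // A compressed format such as MP3 keeps bandwidth low.
        config.setSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Audio24Khz48KBitRateMonoMp3);

        SpeechSynthesizer synthesizer = new SpeechSynthesizer(config, (AudioConfig) null);

        // StartSpeakingTextAsync returns as soon as synthesis starts,
        // so audio can be streamed while the rest is still being generated.
        SpeechSynthesisResult result = synthesizer.StartSpeakingTextAsync("Hello world").get();
        AudioDataStream stream = AudioDataStream.fromResult(result);

        byte[] buffer = new byte[4096];
        long read;
        while ((read = stream.readData(buffer)) > 0) {
            // Forward `read` bytes of `buffer` to your client here.
        }

        result.close();
        synthesizer.close();
    }
}
```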

How to debug issues

We recommend adding a trace ID to each request so that issues can be debugged later.

How to get request ID for debugging purpose
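For example, a minimal sketch that logs the result ID (which identifies the request on the service side) together with the cancellation details on failure; the key and region are placeholders.

```java
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;

// Sketch: log the result ID of every synthesis request so it can be shared
// with Azure support when investigating an issue.
public class RequestIdLogging {
    public static void main(String[] args) throws Exception {
        SpeechConfig config = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
        SpeechSynthesizer synthesizer = new SpeechSynthesizer(config, (AudioConfig) null);

        SpeechSynthesisResult result = synthesizer.SpeakTextAsync("Hello world").get();
        // The result ID identifies this request on the service side.
        System.out.println("Result id: " + result.getResultId() + ", reason: " + result.getReason());

        if (result.getReason() == ResultReason.Canceled) {
            SpeechSynthesisCancellationDetails details =
                SpeechSynthesisCancellationDetails.fromResult(result);
            System.out.println("Canceled: " + details.getErrorCode() + " - " + details.getErrorDetails());
        }

        result.close();
        synthesizer.close();
    }
}
```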

How to scale the service to meet the SLA

In general, if you need a higher request-per-second quota than the default, engage Azure support or your Azure contacts; we will respond quickly.

How to increase TTS request limits?

How to reduce SDK latency

For service-to-service calls, it is better to use a synthesizer pool, which saves the connection setup time on each request (see the sample and sketch below).

https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/java/jre/console/src/com/microsoft/cognitiveservices/speech/samples/console/SpeechSynthesisScenarioSamples.java
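To show the idea behind the pooled sample linked above, here is a minimal sketch that creates one synthesizer, pre-opens its connection via the SDK's Connection API, and reuses it for multiple requests. The linked sample goes further and keeps a pool of such synthesizers so concurrent requests each get their own instance. The class name, key, and region are placeholders, and this assumes a Speech SDK version that exposes Connection.fromSpeechSynthesizer.

```java
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;

// Sketch of connection reuse: build the synthesizer once, pre-open its
// connection, and reuse it so each request skips the connection handshake.
public class ReusedSynthesizer {
    private final SpeechSynthesizer synthesizer;

    public ReusedSynthesizer(String key, String region) {
        SpeechConfig config = SpeechConfig.fromSubscription(key, region);
        synthesizer = new SpeechSynthesizer(config, (AudioConfig) null);
        // Pre-connect so the first real request does not pay the handshake cost.
        Connection connection = Connection.fromSpeechSynthesizer(synthesizer);
        connection.openConnection(true);
    }

    // One synthesizer handles one request at a time; for concurrent traffic,
    // keep a pool of these objects as the linked sample does.
    public byte[] speak(String text) throws Exception {
        SpeechSynthesisResult result = synthesizer.SpeakTextAsync(text).get();
        byte[] audio = result.getAudioData();
        result.close();
        return audio;
    }
}
```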