How to call multiple voices in SSML

zhifzhan edited this page Mar 13, 2020 · 13 revisions

Customers may want to use multiple voices in one SSML document to deliver experiences such as role-play storytelling. Azure TTS supports combining multiple voices with SSML.

Multiple Standard Voices

To use multiple standard voices, compose the SSML so that each passage is wrapped in a voice element naming the voice that should speak it.

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-AriaNeural">
        This is the text that is spoken.
    </voice>
    <voice name="en-US-GuyNeural">
        This is the text that is spoken.
    </voice>
</speak> 

Everything else is the same as for SSML with a single voice.
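When the list of speakers comes from data (for example, the lines of a story), the SSML can be assembled in code rather than by hand. The following is a minimal sketch, not an official helper: the function name build_multi_voice_ssml and the (voice name, text) input shape are assumptions made for illustration, and the namespace used is the W3C SSML namespace URI.

```python
from xml.sax.saxutils import escape, quoteattr

# W3C SSML namespace URI.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_multi_voice_ssml(segments, lang="en-US"):
    """Build one SSML document with a <voice> element per segment.

    segments is a list of (voice_name, text) pairs; this input shape is
    an assumption for the example, not something the service prescribes.
    """
    parts = [
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
    ]
    for voice_name, text in segments:
        # quoteattr/escape keep characters such as & and < from breaking the XML.
        parts.append(f"    <voice name={quoteattr(voice_name)}>{escape(text)}</voice>")
    parts.append("</speak>")
    return "\n".join(parts)


ssml = build_multi_voice_ssml([
    ("en-US-AriaNeural", "Once upon a time..."),
    ("en-US-GuyNeural", "Who goes there?"),
])
print(ssml)
```

The resulting string is sent to the service exactly like single-voice SSML.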

Multiple Custom Voices

For a custom voice, the endpoint URL currently has to include the deployment ID of that voice. Refer to:

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/regions#custom-voices

To access multiple custom voices in one SSML document like the one above, each voice needs to be deployed to its own endpoint. Then pass one deploymentId parameter per voice in the endpoint URL to specify the voices needed.

For example:

Suppose 3 custom voices are deployed to 3 endpoints:

Voice Name    Endpoint URL
VoiceA        https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId=44aa21a9-56cb-4959-b4c8-91a14a68b0b2
VoiceB        https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId=da90e300-3f79-462e-85cc-dac44b44ad33
VoiceC        https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId=3655bafe-073a-4291-aae7-7d2e7160b0f6

Combine their deploymentId values into one URL, repeating the deploymentId parameter once per voice, to access all 3 voices through one endpoint:

https://eastasia.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId=44aa21a9-56cb-4959-b4c8-91a14a68b0b2&deploymentId=da90e300-3f79-462e-85cc-dac44b44ad33&deploymentId=3655bafe-073a-4291-aae7-7d2e7160b0f6
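The combined URL is the base endpoint with a repeated deploymentId query parameter. As a rough sketch of that string construction (the function name build_multi_voice_endpoint is made up for this example; the host pattern and IDs are the ones from the table above):

```python
from urllib.parse import urlencode


def build_multi_voice_endpoint(region, deployment_ids):
    """Return one endpoint URL carrying a deploymentId parameter per voice."""
    base = f"https://{region}.voice.speech.microsoft.com/cognitiveservices/v1"
    # A list of pairs lets the same key appear several times in the query string.
    query = urlencode([("deploymentId", d) for d in deployment_ids])
    return f"{base}?{query}"


url = build_multi_voice_endpoint("eastasia", [
    "44aa21a9-56cb-4959-b4c8-91a14a68b0b2",  # VoiceA
    "da90e300-3f79-462e-85cc-dac44b44ad33",  # VoiceB
    "3655bafe-073a-4291-aae7-7d2e7160b0f6",  # VoiceC
])
print(url)
```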

With the following SSML:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="VoiceA">
        This is VoiceA.
    </voice>
    <voice name="VoiceB">
        Then VoiceB.
    </voice>
    <voice name="VoiceC">
        And this is VoiceC.
    </voice>
</speak> 

All endpoints need to be in the same subscription. The voices can be in different languages.

Every deploymentId must be valid. A single invalid ID causes the whole request to fail.

If there are too many voices (more than 10) to put into the URL, it is recommended to construct the URL in code, based on the SSML content, so that each request carries only the deploymentId values of the voices it actually uses.
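One possible way to construct the URL from the SSML content is to parse the SSML, collect the voice names it references, and look each one up in a table of deployment IDs. This is only a sketch under stated assumptions: the VOICE_DEPLOYMENTS table and the endpoint_for_ssml function are hypothetical names, and the table has to be maintained by the application, since nothing here queries the service for it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical lookup table kept by the application: voice name -> deployment ID.
VOICE_DEPLOYMENTS = {
    "VoiceA": "44aa21a9-56cb-4959-b4c8-91a14a68b0b2",
    "VoiceB": "da90e300-3f79-462e-85cc-dac44b44ad33",
    "VoiceC": "3655bafe-073a-4291-aae7-7d2e7160b0f6",
}


def endpoint_for_ssml(ssml, region="eastasia", deployments=VOICE_DEPLOYMENTS):
    """Build an endpoint URL listing only the voices used in the given SSML."""
    root = ET.fromstring(ssml)
    ids = []
    for elem in root.iter():
        # Tags look like "{namespace}voice"; compare the local name only.
        if elem.tag.rsplit("}", 1)[-1] != "voice":
            continue
        name = elem.get("name")
        if name not in deployments:
            # Fail early: an unknown voice would make the request fail anyway.
            raise KeyError(f"no deployment ID known for voice {name!r}")
        if deployments[name] not in ids:
            ids.append(deployments[name])  # keep first-use order, no duplicates
    query = urlencode([("deploymentId", d) for d in ids])
    return f"https://{region}.voice.speech.microsoft.com/cognitiveservices/v1?{query}"


sample = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<voice name="VoiceA">This is VoiceA.</voice>'
    '<voice name="VoiceC">And this is VoiceC.</voice>'
    '<voice name="VoiceA">VoiceA again.</voice>'
    "</speak>"
)
print(endpoint_for_ssml(sample))
```

For the sample above, only the deployment IDs of VoiceA and VoiceC end up in the URL, each once.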

Other Notes

Currently, mixing custom voices and standard voices in one SSML document is not supported.
