[Neural] How do I control the pace of the generated speech #142

boltomli · 2019-10-09T01:51:17Z

How do I control the pace of the generated speech? I need to slow it down by 10%.
(en-US, JessaNeural)
X-Microsoft-OutputFormat: riff-24khz-16bit-mono-pcm

Originally posted by @shoutbomb in #128 (comment)

boltomli · 2019-10-09T04:33:09Z

@shoutbomb please try SSML to control the synthesis, for example:

<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)'><prosody rate="-10%">Hello Jessa!</prosody></voice></speak>

shoutbomb · 2019-10-09T07:44:49Z

This is a snippet of the function that currently generates the working SSML. Could you please suggest how I can change it to include the 'prosody' node?

public function azure_tts_build_resource_uri($tts_text, $tts_directory, $tts_lang="en-us",$tts_gender="Female",$tts_voiceid="Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)") {

    $doc = new DOMDocument();
    
    $root = $doc->createElement( "speak" );
    $root->setAttribute( "version" , "1.0" );
    $root->setAttribute( "xml:lang" , "$tts_lang" );
    
    $voice = $doc->createElement( "voice" );
    $voice->setAttribute( "xml:lang" , "$tts_lang" );
    $voice->setAttribute( "xml:gender" , "$tts_gender" );
    $voice->setAttribute( "name" , "$tts_voiceid" );
    
    $text = $doc->createTextNode("$tts_text");
    
    $voice->appendChild( $text );
    $root->appendChild( $voice );
    $doc->appendChild( $root );
    $data = $doc->saveXML(); 

        ... and so on.
   }

Many thanks...

boltomli · 2019-10-09T08:11:42Z

For a simple case, you can construct the node inside <voice>. In fact, it all lives in the text part: <prosody rate="-10%">Hello Jessa!</prosody>. If you don't need to tune the text part by part, just add a prosody element to $voice and then create the text node inside it. If you also want some text that is not inside the prosody node, that works as well. The generated SSML would look like <voice><prosody rate="-10%">this is slower</prosody>this is regular (default)</voice> (voice attributes are omitted for clarity).

Refer to https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup
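
As a rough illustration of the suggestion above, the DOMDocument snippet from the question could be adapted along these lines. This is only a sketch: the function name build_ssml_with_prosody and the $tts_rate parameter are hypothetical and not part of the original code, and the elided remainder of the original function (the "... and so on." part) is not reproduced.

```php
<?php
// Hypothetical helper, adapted from the snippet in the question.
// The only structural change is the extra <prosody> element that
// wraps the text node; $tts_rate is an assumed new parameter.
function build_ssml_with_prosody(
    string $tts_text,
    string $tts_rate = "-10%",
    string $tts_lang = "en-US",
    string $tts_gender = "Female",
    string $tts_voiceid = "Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)"
): string {
    $doc = new DOMDocument();

    $root = $doc->createElement("speak");
    $root->setAttribute("version", "1.0");
    $root->setAttribute("xml:lang", $tts_lang);

    $voice = $doc->createElement("voice");
    $voice->setAttribute("xml:lang", $tts_lang);
    $voice->setAttribute("xml:gender", $tts_gender);
    $voice->setAttribute("name", $tts_voiceid);

    // New: put the text inside <prosody rate="..."> instead of
    // appending it directly to <voice>.
    $prosody = $doc->createElement("prosody");
    $prosody->setAttribute("rate", $tts_rate);
    $prosody->appendChild($doc->createTextNode($tts_text));

    $voice->appendChild($prosody);
    $root->appendChild($voice);
    $doc->appendChild($root);

    // Serialize the document to an SSML string.
    return $doc->saveXML();
}

echo build_ssml_with_prosody("Hello Jessa!");
```

To mix slowed and regular text, it should be enough to append a second text node to $voice after the prosody element, for example $voice->appendChild($doc->createTextNode("this is regular")).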

boltomli closed this as completed Oct 12, 2019
