[Neural] How do I control the pace of the generated speech #142

Closed
boltomli opened this issue Oct 9, 2019 · 3 comments

Comments

boltomli commented Oct 9, 2019

How do I control the pace of the generated speech? I need to slow it down by 10%.
(en-US, JessaNeural)
X-Microsoft-OutputFormat: riff-24khz-16bit-mono-pcm

Originally posted by @shoutbomb in #128 (comment)

boltomli commented Oct 9, 2019

@shoutbomb please try using SSML to control the synthesis, for example:

<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)'><prosody rate="-10%">Hello Jessa!</prosody></voice></speak>

@shoutbomb

This is a snippet of the function that generates the SSML I currently use, which works. Could you please suggest how I can change it to include the 'prosody' node?

public function azure_tts_build_resource_uri($tts_text, $tts_directory, $tts_lang = "en-us", $tts_gender = "Female", $tts_voiceid = "Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)") {

    $doc = new DOMDocument();
    
    $root = $doc->createElement( "speak" );
    $root->setAttribute( "version" , "1.0" );
    $root->setAttribute( "xml:lang" , "$tts_lang" );
    
    $voice = $doc->createElement( "voice" );
    $voice->setAttribute( "xml:lang" , "$tts_lang" );
    $voice->setAttribute( "xml:gender" , "$tts_gender" );
    $voice->setAttribute( "name" , "$tts_voiceid" );
    
    $text = $doc->createTextNode("$tts_text");
    
    $voice->appendChild( $text );
    $root->appendChild( $voice );
    $doc->appendChild( $root );
    $data = $doc->saveXML(); 

        ... and so on.
   }

Many thanks...

boltomli commented Oct 9, 2019

For a simple case, you can construct the node inside <voice>. In fact, it all goes in the text part: <prosody rate="-10%">Hello Jessa!</prosody>. If you don't need to tune the speech part by part, just add a prosody element to $voice and create the text node inside it. If you also want some text outside the prosody node, that works as well. The generated SSML would look like <voice><prosody rate="-10%">this is slower</prosody>this is regular (default)</voice> (voice attributes omitted for clarity).

Refer to https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup
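
As a rough illustration of the suggestion above, here is a minimal sketch of how the DOMDocument code from the earlier snippet might be adapted. The function name build_ssml_with_rate and the $tts_rate parameter are hypothetical names introduced only for this example; the rest mirrors the attributes already set in the original function, and the elided part of that function is not reproduced.

```php
<?php
// Hypothetical helper for illustration only; not the original
// azure_tts_build_resource_uri function.
function build_ssml_with_rate(
    $tts_text,
    $tts_rate = "-10%", // assumed parameter: relative speaking rate
    $tts_lang = "en-US",
    $tts_gender = "Female",
    $tts_voiceid = "Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)"
) {
    $doc = new DOMDocument();

    $root = $doc->createElement("speak");
    $root->setAttribute("version", "1.0");
    $root->setAttribute("xml:lang", $tts_lang);

    $voice = $doc->createElement("voice");
    $voice->setAttribute("xml:lang", $tts_lang);
    $voice->setAttribute("xml:gender", $tts_gender);
    $voice->setAttribute("name", $tts_voiceid);

    // Wrap the text in <prosody rate="..."> instead of appending it
    // directly to <voice>.
    $prosody = $doc->createElement("prosody");
    $prosody->setAttribute("rate", $tts_rate);
    $prosody->appendChild($doc->createTextNode($tts_text));

    $voice->appendChild($prosody);
    $root->appendChild($voice);
    $doc->appendChild($root);

    return $doc->saveXML();
}

echo build_ssml_with_rate("Hello Jessa!");
```

To mix slowed and regular speech, append a second text node to $voice after the prosody element; that text is then spoken at the default rate.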
