Reminder: This app is primarily designed to help people who can't speak present long streams of communication. Synthesised speech like this typically sounds monotone. People currently work around that by speaking sentence by sentence or paragraph by paragraph - but we'd like to give people more control, e.g. some way to indicate pauses, tone and expression within the text itself.
Some voices support something called SSML. It's an XML markup language that tells the synthesiser to read the text differently. It's neat - but it's not supported by all voice engines, in particular not by the built-in iOS engine. So we first need to detect which engine is being used and then provide a textView.inputAccessoryView with options. These options differ depending on whether the engine is SSML-compatible or not. (https://daddycoding.com/2019/10/30/ios-tutorial-input-accessory-view/)
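The detection step could be a small pure function. This is only a sketch under a big assumption: built-in iOS voices (which don't accept SSML) register under Apple identifier prefixes, and an SSML-capable third-party engine would use its own prefix - the prefix check and the `MarkupMode` name are made up for illustration.

```swift
import Foundation

/// Which accessory-view mode to show for a given voice.
/// Sketch only: the "com.apple." prefix check is an assumption.
enum MarkupMode {
    case ssml      // engine accepts SSML - show the SSML accessory view
    case markers   // built-in engine - show our own marker-based accessory view
}

func markupMode(forVoiceIdentifier id: String) -> MarkupMode {
    id.hasPrefix("com.apple.") ? .markers : .ssml
}
```

In the view controller this would then drive which accessory view gets assigned, e.g. `textView.inputAccessoryView = accessoryView(for: markupMode(forVoiceIdentifier: voice.identifier))`.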
If the user chooses a voice without SSML support, we need to show the following options in the InputAccessoryView:
Speech rate (so we can change the rate for a portion of the text), speech volume, spelling mode (12345 gets read as 1 2 3 4 5 - i.e. it inserts spaces into the text), and silence (n ms). (NB: Wrise does this well - see the pictures at https://www.assistiveware.com/products/wrise.) Graphic markers would exist in the text to identify these elements, and behind the scenes it would have to create some format that the voice synthesiser reads and uses.
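A minimal sketch of what that behind-the-scenes format could look like. The marker syntax here ("⏸500" meaning 500 ms of silence) and the `Segment` field names are assumptions purely for illustration; each segment would then map onto its own AVSpeechUtterance, with the delay becoming `preUtteranceDelay`:

```swift
import Foundation

/// One run of text with its own speech settings. Rate and volume markers
/// would extend this the same way; kept to the pause case for brevity.
struct Segment {
    var text: String
    var preDelayMs: Int = 0
}

/// Spelling mode: "12345" becomes "1 2 3 4 5".
func spellOut(_ text: String) -> String {
    text.map(String.init).joined(separator: " ")
}

/// Split marked-up text on a hypothetical pause marker, e.g. "⏸500".
/// Deliberately simplified: markers are assumed well-formed.
func segments(from marked: String) -> [Segment] {
    var result: [Segment] = []
    for (i, part) in marked.components(separatedBy: "⏸").enumerated() {
        var text = part
        var delayMs = 0
        if i > 0, let digits = part.range(of: #"^\d+"#, options: .regularExpression) {
            delayMs = Int(part[digits]) ?? 0
            text = String(part[digits.upperBound...])
        }
        let trimmed = text.trimmingCharacters(in: .whitespaces)
        if !trimmed.isEmpty { result.append(Segment(text: trimmed, preDelayMs: delayMs)) }
    }
    return result
}
```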
For SSML-compatible voices, provide a similar-looking inputAccessoryView that does something different: it creates SSML-compatible XML (while showing the individual only the text and some graphic markers). This would be neat - I'm not aware of any apps that let you mark up text and play back SSML-marked-up speech.
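Behind the same marker UI, the SSML path would instead emit standard SSML elements. The element names below (`<break>`, `<prosody>`, `<say-as>`, `<speak>`) come from the SSML 1.1 spec; the helper-function shapes are just a sketch:

```swift
import Foundation

/// Escape user text for inclusion in SSML/XML.
func xmlEscaped(_ text: String) -> String {
    text.replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
}

/// A silence marker becomes a <break>.
func breakTag(ms: Int) -> String { "<break time=\"\(ms)ms\"/>" }

/// Rate/volume markers become <prosody> around the affected span.
func prosody(_ text: String, rate: String? = nil, volume: String? = nil) -> String {
    var attrs = ""
    if let rate = rate { attrs += " rate=\"\(rate)\"" }
    if let volume = volume { attrs += " volume=\"\(volume)\"" }
    return "<prosody\(attrs)>\(xmlEscaped(text))</prosody>"
}

/// Spelling mode becomes <say-as interpret-as="characters">.
func spellOutTag(_ text: String) -> String {
    "<say-as interpret-as=\"characters\">\(xmlEscaped(text))</say-as>"
}

/// Wrap the assembled body in a <speak> root for the engine.
func speakDocument(_ body: String) -> String {
    "<speak version=\"1.1\" xmlns=\"http://www.w3.org/2001/10/synthesis\">\(body)</speak>"
}
```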
See here for ideas: https://ssml-editor.azurewebsites.net or https://www.getwoord.com/ssml-editor, or the editors from Microsoft, Google, Amazon and IBM (see their own product pages).
Note: it may be that we choose NOT to support SSML, since the key aspects of timing and rate are covered well enough without it. That's fine - but going forward there are a lot more elements of SSML that would be useful, including e.g. style. See https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-voice
I'm not sure whether this is really any easier, but we could instead look at supporting Speech Markdown - see the JS library, which we could use via JavaScriptCore.
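If we went that route, loading the library into a JSContext might look roughly like this. Heavy hedging: the bundled file name, the global `speechmarkdown` object, and the `toSSML(…, { platform })` call shape are assumptions about speechmarkdown-js's build output and would need checking against the library's actual distribution:

```swift
import JavaScriptCore

/// Convert Speech Markdown to SSML by running the speechmarkdown-js bundle
/// inside JavaScriptCore. Sketch only: assumes the library ships in the app
/// bundle as "speechmarkdown.js" and exposes a SpeechMarkdown class.
func ssmlFromSpeechMarkdown(_ source: String) -> String? {
    guard let context = JSContext(),
          let url = Bundle.main.url(forResource: "speechmarkdown", withExtension: "js"),
          let library = try? String(contentsOf: url) else { return nil }
    context.evaluateScript(library)
    // Pass the user's text in as a variable rather than splicing it into the
    // script string, so we don't have to escape it ourselves.
    context.setObject(source, forKeyedSubscript: "smdSource" as NSString)
    let call = "new speechmarkdown.SpeechMarkdown().toSSML(smdSource, { platform: 'microsoft-azure' })"
    return context.evaluateScript(call)?.toString()
}
```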
There are a number of steps to get this done. Here's one idea