Edit SSML/Non-SSML markers in-app (inputAccessoryView for SSML and non-SSML) #2

willwade · 2022-11-02T10:31:06Z

Reminder: This app is primarily designed to help people who can't speak to present long streams of communication. Synthesised speech of that length can sound monotone. People currently manage this by speaking sentence by sentence or paragraph by paragraph - but we'd like to give them more control, e.g. by marking pauses, tone and expression within the text itself.

Some voices support SSML, an XML markup language that tells the synthesiser to read the text differently. It's neat - but it's not supported by all voice engines, in particular not the built-in iOS engine. So we first need to detect which engine is being used and then provide a textView.inputAccessoryView with options. These options differ depending on whether the engine is SSML-compatible or not. (https://daddycoding.com/2019/10/30/ios-tutorial-input-accessory-view/)
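A minimal sketch of the "detect the engine, then pick the options" step, written in Python for brevity rather than as app code. `Voice`, `supports_ssml` and `accessory_options` are hypothetical names. The non-SSML options are the ones listed in this issue; offering an extra "Style" button for SSML voices is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Voice:
    """Hypothetical description of the voice the user has selected."""
    name: str
    supports_ssml: bool


# Options this issue asks for when the voice has no SSML support.
PLAIN_OPTIONS = ["Speech rate", "Speech volume", "Spelling mode", "Silence"]

# Assumption: SSML voices get the same buttons plus SSML-only extras such as style.
SSML_OPTIONS = PLAIN_OPTIONS + ["Style"]


def accessory_options(voice: Voice) -> List[str]:
    """Return the button titles the input accessory view should show."""
    return list(SSML_OPTIONS if voice.supports_ssml else PLAIN_OPTIONS)
```

Keeping the two option sets in one place means the accessory view can look the same for both kinds of voice and only the generated output differs.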

If the user chooses a voice without SSML support, we need to show the following options in the inputAccessoryView:
Speech rate (so we can change the rate for a portion of the text), speech volume, spelling mode (12345 gets read as 1 2 3 4 5 - i.e. it puts spaces into the text), and silence (n ms). (NB: Wrise does this well - look at the pics at https://www.assistiveware.com/products/wrise ). Graphic markers would appear in the text to identify these points, and behind the scenes the app would have to create some format that the voice synthesiser reads and uses.
For SSML-compatible voices, provide a similar-looking inputAccessoryView that does something different: it creates SSML-compatible XML (but only shows text and some graphic markers to the user). This would be neat - there are no apps that I'm aware of that allow you to mark up and play SSML marked-up speech.
For ideas, see https://ssml-editor.azurewebsites.net or https://www.getwoord.com/ssml-editor, or the editors from Microsoft, Google, Amazon and IBM (see their own product pages).
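One way to picture the "format behind the scenes" is a single list of marked-up segments that renders either to SSML or to a plain fallback. The sketch below is Python for brevity, not app code. `Text`, `Silence`, `to_ssml` and `to_plain_utterances` are hypothetical names, and `interpret-as="characters"` is a commonly supported value whose handling may vary by engine.

```python
from dataclasses import dataclass
from typing import List, Optional, Union
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Text:
    text: str
    rate: Optional[str] = None    # e.g. "slow" or "80%"
    volume: Optional[str] = None  # e.g. "soft" or "loud"
    spell: bool = False           # spelling mode


@dataclass
class Silence:
    ms: int


Segment = Union[Text, Silence]


def spell_out(text: str) -> str:
    """Spelling mode without SSML: put a space between characters."""
    return " ".join(ch for ch in text if not ch.isspace())


def to_ssml(segments: List[Segment]) -> str:
    """Render the segments as one SSML document."""
    parts = []
    for seg in segments:
        if isinstance(seg, Silence):
            parts.append(f'<break time="{seg.ms}ms"/>')
            continue
        body = escape(seg.text)
        if seg.spell:
            body = f'<say-as interpret-as="characters">{body}</say-as>'
        attrs = ""
        if seg.rate:
            attrs += f" rate={quoteattr(seg.rate)}"
        if seg.volume:
            attrs += f" volume={quoteattr(seg.volume)}"
        if attrs:
            body = f"<prosody{attrs}>{body}</prosody>"
        parts.append(body)
    return "<speak>" + "".join(parts) + "</speak>"


def to_plain_utterances(segments: List[Segment]) -> List[dict]:
    """Fallback for non-SSML voices: one dict per chunk to speak in order."""
    out: List[dict] = []
    for seg in segments:
        if isinstance(seg, Silence):
            if not out:  # leading silence: attach it to an empty chunk
                out.append({"text": "", "rate": None, "volume": None, "pause_after_ms": 0})
            out[-1]["pause_after_ms"] += seg.ms
            continue
        text = spell_out(seg.text) if seg.spell else seg.text
        out.append({"text": text, "rate": seg.rate, "volume": seg.volume, "pause_after_ms": 0})
    return out


segments = [Text("My PIN is "), Text("12345", spell=True), Silence(500), Text("Thank you", rate="slow")]
print(to_ssml(segments))
print(to_plain_utterances(segments))
```

On iOS the plain chunks could plausibly map onto one `AVSpeechUtterance` each, using its `rate`, `volume` and `postUtteranceDelay` properties, but that mapping is an assumption and not something this issue specifies.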

Note: It may be that we choose NOT to support SSML, as the key aspects of timing and rate are handled well enough without it. Which is fine - but going forward there are a lot more elements of SSML that are useful, e.g. style. See https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-voice

I'm not sure if this is really any easier - but we could just look to support Speech Markdown - see the JS library, which we could use with JavaScriptCore.

There are a number of steps to get this done. Here's one idea:

Be able to read in and play an SSML file
Be able to show markers from SSML in the app in a visual way
Be able to edit SSML
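
The second step above, turning SSML into something the app can draw as markers, can be sketched with a standard XML parser. This is a rough Python illustration, not app code. `ssml_to_markers` and the item kinds ("text", "marker", "open", "close") are hypothetical, and real SSML from different vendors may need more cases.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def local(name: str) -> str:
    """Drop an XML namespace prefix such as '{http://...}break'."""
    return name.rsplit("}", 1)[-1]


def ssml_to_markers(ssml: str) -> List[Tuple[str, str]]:
    """Flatten SSML into (kind, value) items an editor could display in order."""
    items: List[Tuple[str, str]] = []

    def walk(el: ET.Element) -> None:
        if el.text:
            items.append(("text", el.text))
        for child in el:
            name = local(child.tag)
            if name == "break":
                # A pause is a single point marker with no content of its own.
                items.append(("marker", "silence " + child.get("time", "")))
            else:
                # Any other element wraps a span of text: show a start and an end marker.
                attrs = "".join(f" {local(k)}={v}" for k, v in child.attrib.items())
                items.append(("open", name + attrs))
                walk(child)
                items.append(("close", name))
            if child.tail:
                items.append(("text", child.tail))

    walk(ET.fromstring(ssml))
    return items


example = '<speak>Hello<break time="500ms"/><prosody rate="slow">world</prosody>!</speak>'
for kind, value in ssml_to_markers(example):
    print(kind, repr(value))
```

Editing (the third step) could then work on this flat list and write SSML back out, so the user never sees the raw XML.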

willwade · 2023-03-20T16:33:32Z

I found this interesting, and kind of related. IBM's Tune by Example lets you record some speech and then marks up your TTS with the matching prosody. I'm not sure we could make use of it - but it's interesting: https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-intro&mhsrc=ibmsearch_a&mhq=tune+by+example

willwade changed the title ~~Edit SSML in-app~~ Edit SSML/Non-SSML markers in-app Nov 15, 2022

willwade changed the title ~~Edit SSML/Non-SSML markers in-app~~ Edit SSML/Non-SSML markers in-app (inputAccessoryView for SSML and non-SSML) Nov 16, 2022

willwade mentioned this issue Feb 27, 2023

Dealing with being offline #29

Closed
