Edit SSML/Non-SSML markers in-app (inputAccessoryView for SSML and non-SSML) #2

Open
3 tasks
willwade opened this issue Nov 2, 2022 · 1 comment

willwade commented Nov 2, 2022

Reminder: this app is primarily designed to help people who can't speak to present long streams of communication. Long passages of synthesised speech can sound monotone. People currently work around this by speaking sentence by sentence or paragraph by paragraph, but we'd like to give them more control, e.g. a way to indicate pauses, tone and expression within the text.

Some voices support something called SSML. It's an XML markup language that tells the synthesiser to read the text differently. It's neat, but it's not supported by all voice engines, particularly not the built-in iOS engine. So for this we first need to detect which engine is being used and then provide a textView.inputAccessoryView with options. These options differ depending on whether the engine is SSML-compatible or not. (https://daddycoding.com/2019/10/30/ios-tutorial-input-accessory-view/)
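
To make the detection step concrete, here is a minimal sketch of choosing the accessory options from a voice's capabilities. Everything in it is an assumption for illustration: `VoiceInfo`, `MarkupAction` and `accessoryActions(for:)` are hypothetical names, and the SSML-only extras (`emphasis`, `style`) are just examples of what an SSML-capable voice might add.

```swift
// Hypothetical model: none of these names come from the app's codebase.
enum MarkupAction: String {
    case rate, volume, spellingMode, silence   // the four options proposed for every voice
    case emphasis, style                       // assumed extras for SSML-capable voices
}

struct VoiceInfo {
    let name: String
    let supportsSSML: Bool   // assumption: the app can look this up per engine
}

/// Returns the actions the accessory bar should offer for the given voice.
func accessoryActions(for voice: VoiceInfo) -> [MarkupAction] {
    let common: [MarkupAction] = [.rate, .volume, .spellingMode, .silence]
    return voice.supportsSSML ? common + [.emphasis, .style] : common
}
```

In the app, the returned list could drive the buttons of a toolbar assigned to `textView.inputAccessoryView`, so both kinds of voice get a similar-looking bar.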

  • If the user chooses a voice with no SSML support, we need to show the following options in the inputAccessoryView:
    Speech rate (so we can change the rate for a portion of the text), speech volume, spelling mode (12345 gets read as 1 2 3 4 5, i.e. it puts spaces into the text), and silence (n ms). (NB: Wrise does this well - look at the pics at https://www.assistiveware.com/products/wrise ). Graphic markers would appear in the text to identify these elements, and behind the scenes the app would have to create some format that the voice synthesiser can read and use.

  • For SSML-compatible voices, provide a similar-looking inputAccessoryView that does something different: it creates SSML-compatible XML but only shows the text and some graphic markers to the user. This would be neat - there are no apps that I'm aware of that allow you to mark up and play SSML marked-up speech.

  • See here for ideas: https://ssml-editor.azurewebsites.net or https://www.getwoord.com/ssml-editor, or the editors from Microsoft, Google, Amazon and IBM (see their own product pages).
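
One way to picture the two cases above is a single marker model with two back-ends: one that flattens markers into per-chunk settings for a non-SSML voice, and one that renders SSML. This is only a sketch under assumptions. `Marker`, `SpeechStep`, `speechSteps(from:)` and `ssml(from:)` are hypothetical names, volume is left out for brevity (it would follow the same pattern as rate), and `say-as interpret-as="characters"` is widely supported but not guaranteed on every engine.

```swift
import Foundation

// Hypothetical marker model; the names are illustrative, not from the app.
enum Marker {
    case plain(String)
    case rate(String, multiplier: Double)   // 0.5 = half the normal speed
    case spelled(String)                    // read character by character
    case silence(milliseconds: Int)
}

/// Spelling mode for engines without SSML: puts a space between characters.
func spelledOut(_ text: String) -> String {
    text.map { String($0) }.joined(separator: " ")
}

/// One chunk to hand to a non-SSML synthesiser.
struct SpeechStep: Equatable {
    var text: String
    var rateMultiplier: Double
    var pauseAfterSeconds: Double
}

/// Non-SSML back-end: flatten markers into per-chunk settings.
func speechSteps(from markers: [Marker]) -> [SpeechStep] {
    var steps: [SpeechStep] = []
    for marker in markers {
        switch marker {
        case .plain(let text):
            steps.append(SpeechStep(text: text, rateMultiplier: 1, pauseAfterSeconds: 0))
        case .rate(let text, let multiplier):
            steps.append(SpeechStep(text: text, rateMultiplier: multiplier, pauseAfterSeconds: 0))
        case .spelled(let text):
            steps.append(SpeechStep(text: spelledOut(text), rateMultiplier: 1, pauseAfterSeconds: 0))
        case .silence(let ms):
            // Attach the pause to the previous chunk; a leading silence is ignored in this sketch.
            if !steps.isEmpty { steps[steps.count - 1].pauseAfterSeconds += Double(ms) / 1000 }
        }
    }
    return steps
}

/// Escape the characters that are special in XML text.
func xmlEscaped(_ text: String) -> String {
    text.replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
}

/// SSML back-end: render the same markers as SSML.
/// SSML 1.1 treats a percentage rate as a multiplier; some engines interpret it differently.
func ssml(from markers: [Marker]) -> String {
    let body = markers.map { marker -> String in
        switch marker {
        case .plain(let text):
            return xmlEscaped(text)
        case .rate(let text, let multiplier):
            let percent = Int((multiplier * 100).rounded())
            return "<prosody rate=\"\(percent)%\">\(xmlEscaped(text))</prosody>"
        case .spelled(let text):
            return "<say-as interpret-as=\"characters\">\(xmlEscaped(text))</say-as>"
        case .silence(let ms):
            return "<break time=\"\(ms)ms\"/>"
        }
    }.joined()
    return "<speak>\(body)</speak>"
}
```

On iOS, each `SpeechStep` could plausibly map onto an `AVSpeechUtterance` through its `rate`, `volume` and `postUtteranceDelay` properties, though the scaling from a multiplier to the utterance rate range would need working out.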

Note: it may be that we choose NOT to support SSML, since the key aspects, timing and rate, work well without it. That is fine, but going forward there are a lot more elements of SSML that are useful, e.g. style. See https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-voice

I'm not sure if this is really any easier, but we could just look to support Speech Markdown - see the JS library, which we could use with JavaScriptCore.

There are a number of steps to get this done. Here's one idea:

  • Be able to read in and play an SSML file
  • Be able to show markers from SSML in the app in a visual way
  • Be able to edit SSML
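
As a rough sketch of the first two steps, SSML could be read with Foundation's `XMLParser` and turned into a flat list of text and marker tokens that the UI then draws as graphics. The `DisplayToken` and `SSMLTokenizer` names are hypothetical, and this version only records where an element starts (closing tags are not tracked), so it is a starting point, not a full editor model.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // XMLParser lives in this module on Linux
#endif

// Hypothetical token type: what the text view would show.
enum DisplayToken: Equatable {
    case text(String)
    case marker(String)   // e.g. "break time=500ms"
}

final class SSMLTokenizer: NSObject, XMLParserDelegate {
    private(set) var tokens: [DisplayToken] = []

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        // The root <speak> element carries no marker of its own.
        guard elementName != "speak" else { return }
        // Sort attributes so the marker label is stable.
        let attributes = attributeDict.sorted { $0.key < $1.key }
            .map { "\($0.key)=\($0.value)" }
        tokens.append(.marker(([elementName] + attributes).joined(separator: " ")))
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // XMLParser may deliver text in pieces, so merge adjacent text tokens.
        if case .text(let previous)? = tokens.last {
            tokens[tokens.count - 1] = .text(previous + string)
        } else {
            tokens.append(.text(string))
        }
    }
}

/// Parses an SSML string into display tokens, or returns nil if the XML is malformed.
func displayTokens(fromSSML ssml: String) -> [DisplayToken]? {
    let parser = XMLParser(data: Data(ssml.utf8))
    let tokenizer = SSMLTokenizer()
    parser.delegate = tokenizer
    return parser.parse() ? tokenizer.tokens : nil
}
```

Editing would then amount to changing this token list and regenerating the SSML from it.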
@willwade willwade changed the title Edit SSML in-app Edit SSML/Non-SSML markers in-app Nov 15, 2022
@willwade willwade changed the title Edit SSML/Non-SSML markers in-app Edit SSML/Non-SSML markers in-app (inputAccessoryView for SSML and non-SSML) Nov 16, 2022
@willwade

I found this interesting, and kind of related. IBM's Tune by Example allows you to record some speech, and it then marks up your TTS with the matching prosody. I'm not sure we could make use of it, but it's interesting: https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-tbe-intro&mhsrc=ibmsearch_a&mhq=tune+by+example
