Add support for additional TTS integrations through non-Microsoft focused SpeechService interface #2379

druggedhippo · 2022-08-19T15:32:51Z

EDDI currently uses whatever built-in Windows TTS system is installed. Unfortunately, the built-in Windows TTS voices are not particularly good.

This feature request asks for a more modular SpeechService class that lets other speech engines plug in without relying on the Windows TTS interfaces, as long as they provide the same WAV stream that the existing class uses.
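To make the request concrete, here is a rough sketch of what such a pluggable contract could look like. It is written in Python purely for illustration and is not EDDI's actual code; `SpeechEngine`, `SilenceEngine`, `register`, and `speak` are hypothetical names, and the stand-in engine only emits silence where a real engine would call SAPI or a cloud service.

```python
import io
import wave
from abc import ABC, abstractmethod


class SpeechEngine(ABC):
    """Hypothetical engine contract: text in, complete WAV file bytes out."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        ...


class SilenceEngine(SpeechEngine):
    """Stand-in engine that emits silence instead of calling a real TTS backend."""

    def __init__(self, sample_rate: int = 16000):
        self.sample_rate = sample_rate

    def synthesize(self, text: str) -> bytes:
        # 50 ms of 16-bit mono silence per character, purely for illustration.
        frames = int(self.sample_rate * 0.05) * len(text)
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(self.sample_rate)
            wav.writeframes(b"\x00\x00" * frames)
        return buffer.getvalue()


class SpeechService:
    """Routes speech requests to whichever registered engine is selected."""

    def __init__(self):
        self._engines = {}

    def register(self, name: str, engine: SpeechEngine) -> None:
        self._engines[name] = engine

    def speak(self, engine_name: str, text: str) -> bytes:
        # Every engine returns WAV bytes, so the playback code never needs to
        # know which backend produced them.
        return self._engines[engine_name].synthesize(text)
```

The point of the sketch is that the playback side depends only on the WAV bytes, so adding another backend means writing one more `SpeechEngine` subclass and registering it.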

Examples of other engines could include (but are not limited to):

Amazon Polly - https://ai-service-demos.go-aws.com/polly
Google - https://cloud.google.com/text-to-speech
Microsoft Azure - https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/
Different versions of the SAPI interface

As a proof of concept, here is an Amazon Polly implementation I created.

https://gist.github.com/druggedhippo/0a887973ee019dea1fc9e522f513b0f5

Example audio of Amazon Polly processing an EDDI TTS prompt in real time:

https://imgur.com/zyoWmQg

Tkael · 2022-10-03T18:55:52Z

Thank you for this. 😀

As you have effectively demonstrated, it is indeed possible to add additional speech synthesizers to EDDI, including for voices sourced from various cloud development environments (Azure, AWS, etc.).

These cloud voices typically require the user to provide specific credentials and are limited in some way (either as timed trials or offering to render a limited number of words for free each month).

We're happy to support additional voices in EDDI, but it is also important to note that voices from different sources do not always behave alike (in terms of SSML support, lexicons, etc.).
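As one illustration of the kind of difference involved, an integration might have to fall back to plain text when a voice does not accept SSML. The sketch below is a hypothetical example in Python, not EDDI's actual behaviour; `prepare_prompt` and the `supports_ssml` flag are assumed names, and the tag stripping is deliberately naive.

```python
import html
import re

# Matches any XML-style tag such as <speak>, </speak>, or <break time="500ms"/>.
_TAG = re.compile(r"<[^>]+>")


def prepare_prompt(ssml: str, supports_ssml: bool) -> str:
    """Return the prompt unchanged for SSML-capable voices, else plain text."""
    if supports_ssml:
        return ssml
    # Drop the markup, collapse leftover whitespace, then decode XML entities.
    plain = _TAG.sub("", ssml)
    return html.unescape(" ".join(plain.split()))


prompt = '<speak>Hello <break time="500ms"/> Commander &amp; crew</speak>'
print(prepare_prompt(prompt, supports_ssml=False))  # Hello Commander & crew
```

A fallback like this loses pauses, emphasis, and phoneme hints, which is part of why voices from different sources can sound noticeably different for the same prompt.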

We would need to do some additional work to document the new capability and help users enter their credentials for accessing the voice. Some UI changes to allow capturing credentials in EDDI would probably also be very welcome.

Tkael · 2022-11-18T02:22:31Z

Related: https://github.com/jamescl604/MSCognitiveSpeechForVoiceAttack

Tkael · 2023-05-15T00:33:33Z

https://cloud.google.com/text-to-speech/docs/libraries

Tkael added the "9. enhancement" label (The behaviour is as specified, but we would like to modify or extend the spec.) · Aug 19, 2022
