(Feature): Natural voice playback using tts for edge #25

Iheuzio · 2023-09-05T16:59:52Z

Hi,

I recently saw the project here, and it uses the speech services directly in Windows. However, the natural voices sound much better but are locked out of the normal Windows API. They could be accessed through the Edge/Chrome API via chrome.ttsEngine or chrome.tts.

The added benefit is that you do not need Windows to run this, and you can relay the calls to Windows or possibly to whatever text reader is configured for the browser. There is a GitHub project here where the work is already done, so you can simply pass the text from Unity to the edge-tts command from that program.

If this sounds reasonable, I could try working on a possible integration.

Osmodium · 2023-09-07T07:51:16Z

I'm just realizing; are you talking about these voices? https://www.ghacks.net/2018/08/11/unlock-all-windows-10-tts-voices-system-wide-to-get-more-of-them/
Because then the mod already supports the voices, you "just" have to enable them through the guide (also described in the only article on the mod page)

Iheuzio · 2023-09-07T11:07:35Z

It should read with whichever TTS voice is configured in Chrome/Chromium. This is separate from the Windows voices and would require passing the text through the browser API in order to be read.

Osmodium · 2023-09-07T11:48:02Z

I think you might mean this? https://learn.microsoft.com/en-us/archive/msdn-magazine/2019/june/speech-text-to-speech-synthesis-in-net
It was not possible to use SpeechSynthesis when the project was created, since the .NET version was incompatible with the version used in Unity, so the sideloading did not work. It might work now, but I have not tested this yet.

Couple of questions if not:

Would this require the user to have chrome installed?
Would it only be able to use the voice configured in chrome and not be able to take an argument for the voice?
Does it require an internet connection?

Iheuzio · 2023-11-23T19:57:49Z

Hi, sorry for the delay in response. Was busy working on other stuff.

You could check the forked repository I made for testing the changes, it works well.
here

It requires ffmpeg, mpv and edge-tts to be installed. Then it would work like this:
View Video demo

As for your questions:

Would this require the user to have chrome installed?

Yes, you must have a Chromium browser (Edge and Chrome are supported). Only Windows would work with this feature.

Would it only be able to use the voice configured in chrome and not be able to take an argument for the voice?

You can pass any one of the voices listed by edge-tts --list-voices, and you can configure rate, pitch and all that as well: see documentation

Does it require an internet connection?

No internet connection is needed

Iheuzio · 2023-11-23T21:16:52Z

If you want to communicate on discord I'm in the server @iheuzio.

I could integrate a change that adds a menu option to toggle this feature on or off, as well as a script that sets up mpv, ffmpeg and the rest on Windows to make installation easier. Let me know if that sounds good.

Osmodium · 2023-11-24T16:37:19Z

Thanks for providing a PoC of it, and no worries about taking time :)
I haven't had time to look at the fork yet, but it looks and sounds good.
I have some requests/concerns:
It sounds like it takes a while for the audio to play after clicking, can this be reduced?
I imagine it needs some checks for whether the computer has the applications installed, and in the correct supported versions. Have you included this?

Iheuzio · 2023-11-24T17:22:11Z

It sounds like it takes a while for the audio to play after clicking, can this be reduced?

Yes. The longer the text, the longer it takes to save the temporary mp3 file. We could split the audio into many smaller mp3 files and play them one after another while the later ones are still being generated.
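One way the splitting idea could look on the text side is sketched below. This is only an illustration, not code from the fork: TextChunker, SplitIntoChunks and the maxChars limit are hypothetical names, and the sentence detection is deliberately naive. Each chunk could then be synthesized to its own mp3 so playback of the first chunk can start while later ones are still being generated.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: groups whole sentences into chunks of roughly maxChars.
public static class TextChunker
{
    public static List<string> SplitIntoChunks(string text, int maxChars)
    {
        var chunks = new List<string>();
        if (string.IsNullOrWhiteSpace(text))
            return chunks;

        var current = new StringBuilder();
        foreach (string sentence in SplitSentences(text))
        {
            // Close the current chunk if adding this sentence would exceed the limit.
            if (current.Length > 0 && current.Length + 1 + sentence.Length > maxChars)
            {
                chunks.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0)
                current.Append(' ');
            current.Append(sentence);
        }
        if (current.Length > 0)
            chunks.Add(current.ToString());
        return chunks;
    }

    // Naive sentence splitter: a sentence ends at '.', '!' or '?' followed by
    // whitespace or the end of the text.
    private static IEnumerable<string> SplitSentences(string text)
    {
        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            bool isEnd = c == '.' || c == '!' || c == '?';
            bool atBoundary = i + 1 == text.Length || char.IsWhiteSpace(text[i + 1]);
            if (isEnd && atBoundary)
            {
                string sentence = text.Substring(start, i - start + 1).Trim();
                if (sentence.Length > 0)
                    yield return sentence;
                start = i + 1;
            }
        }
        if (start < text.Length)
        {
            string tail = text.Substring(start).Trim();
            if (tail.Length > 0)
                yield return tail;
        }
    }
}
```

For example, TextChunker.SplitIntoChunks("One. Two. Three.", 10) yields the two chunks "One. Two." and "Three.". A single sentence longer than maxChars is kept whole rather than cut mid-sentence.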

I imagine needing to have some checks as to if the computer has the applications installed and the correct supported versions of them, have you included this?

I did not test that; this was done just to show that using edge-tts was possible. I can work on an actual integration later, but my code was written in 30 minutes, so it's pretty bad.
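For the "is it installed" part of that check, a minimal sketch could look like the following. It is an assumption about how such a check might be done, not something present in the fork: ToolCheck and FindMissing are hypothetical names, and the code only looks for the executables on PATH without verifying versions.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: reports which external tools cannot be found on PATH.
public static class ToolCheck
{
    // Extensions tried for each tool name; the empty entry covers non-Windows systems.
    private static readonly string[] Extensions = { "", ".exe", ".cmd", ".bat" };

    public static string[] FindMissing(params string[] tools)
    {
        string path = Environment.GetEnvironmentVariable("PATH") ?? string.Empty;
        string[] dirs = path.Split(new[] { Path.PathSeparator }, StringSplitOptions.RemoveEmptyEntries);

        return tools
            .Where(tool => !dirs.Any(dir => Extensions.Any(ext => ExistsIn(dir, tool + ext))))
            .ToArray();
    }

    private static bool ExistsIn(string dir, string fileName)
    {
        try
        {
            return File.Exists(Path.Combine(dir, fileName));
        }
        catch (ArgumentException)
        {
            // A malformed PATH entry should not break the whole check.
            return false;
        }
    }
}
```

A call such as ToolCheck.FindMissing("ffmpeg", "mpv", "edge-tts") returns the names that were not found, which the mod could show to the user before enabling the feature.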

adamstradomski · 2023-12-29T22:30:14Z

Has this idea been dropped? I would love to see it work, especially with Rogue Trader :)

But I believe edge-tts does require an internet connection. Isn't it a Python wrapper over the Bing API?

Iheuzio · 2023-12-30T00:11:14Z

Yeah, it requires an internet connection for the natural voices. My bad. However, I'll be able to work on a simple, proper integration with the existing TTS, possibly after January 12th.

Osmodium · 2024-01-19T15:21:50Z

@Iheuzio Saw your video on the Owlcat forum. Looks promising with a bit of delay still 👍
Couldn't contact you on Discord since we aren't friends there.
Also: It seems like the service is not always up, which might create confusion if the mod switches after a delay to the "standard" voices. It seems a bit unstable to me still.
Let me know if you want me to take a look/help with it.
Tried this link: https://speech.platform.bing.com/consumer/speech/synthesize/readaloud/readaloud/voices/list?trustedclienttoken=6A5AA1D4EAFF4E9FB37E23D68491D6F4
However it would be cool to have this as a toggle, so if people wanted to use it, they could with the caveats it might have :)

Osmodium · 2024-01-19T22:05:13Z

I just dabbled a bit with this in LinqPad, and I got it to work without having the python program installed, which would probably be preferred?

Wazard · 2024-02-03T14:23:13Z

Hey, are you still working on that idea? I do have experience coding with C# and Unity but none in TTS, so I don't know if I would be of any help.

Osmodium · 2024-02-04T09:48:12Z

@Wazard Yes, and I've gotten it to work, but discovered that the service that is being used to generate the audio does not support multiple voices in one request. So I'm currently working on parallelizing calls for each section of the dialog.
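The general shape of "one request per section, issued in parallel" can be sketched as follows. This is not the mod's actual code: DialogSection, ParallelSynthesis and the synthesizeAsync delegate are hypothetical stand-ins for whatever performs the real request to the service. The sketch relies on Task.WhenAll returning results in the same order as the input tasks, so the audio can still be played back in dialog order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical model: one piece of dialog spoken by a single voice.
public sealed class DialogSection
{
    public string Voice { get; }
    public string Text { get; }

    public DialogSection(string voice, string text)
    {
        Voice = voice;
        Text = text;
    }
}

public static class ParallelSynthesis
{
    // Starts one synthesis request per section and waits for all of them.
    // synthesizeAsync is supplied by the caller (assumption: it returns the
    // audio bytes for one section using that section's voice).
    public static Task<byte[][]> SynthesizeAllAsync(
        IReadOnlyList<DialogSection> sections,
        Func<DialogSection, Task<byte[]>> synthesizeAsync)
    {
        // Task.WhenAll preserves input order in its result array, regardless
        // of which request finishes first.
        return Task.WhenAll(sections.Select(synthesizeAsync));
    }
}
```

If any single request fails, the combined task fails as well, so a real integration would probably want per-section error handling or a fallback to the standard voices.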

Wazard · 2024-02-04T13:17:32Z

@Osmodium sorry for the probably dumb question, but: I saw that in Narrator you can add and download the natural voices. Wouldn't it be possible to use them within Windows itself this way?

Iheuzio · 2024-02-04T18:29:46Z

Hi, currently I won't have time to focus on this mod. I'm working on other stuff at the moment. I may be able to help out later, but not for the time being.

bubval · 2024-03-03T19:16:15Z

Hey @Osmodium it's awesome to hear that you have it working. Would it be possible to have a release not including multiple voices? How's the progress so far?

Osmodium · 2024-04-01T19:10:50Z

Hi! I have just uploaded the experimental version of the mod (0.9.4-EXP) here which includes Natural Voices through the Bing service. It is the version from over a month ago, but progress has been slow.
It is still WIP so there might be bugs.

Christian-Arning · 2024-04-07T18:49:57Z

@Osmodium does this mod also work for Pathfinder: WotR? If not, how can I make it work there?

BelegCufea · 2024-05-23T07:32:59Z

OK. I have found the NaturalVoiceSAPIAdapter repo on GitHub that enables us to use Natural voices (including those for Edge) with TTS.

But it needs slight adjustment in SpeechMod to function. Just five new lines in GetAvailableVoices() method in WindowsVoiceUnity.cs.

Put these lines:

if (voices[i].Contains("(Natural)"))
{
    voices[i] = voices[i].Replace("(Natural)", "");
    voices[i] = voices[i].Replace("(", "Natural (");
}

just under

if (!voices[i].Contains('-'))
    voices[i] = $"{voices[i]}#Unknown";
else
    voices[i] = voices[i].Replace(" - ", "#");

Whole method should look like this:

public static string[] GetAvailableVoices()
{
    // Newline-delimited list of voice names from the native side.
    string voicesDelim = getVoicesAvailable();
    if (string.IsNullOrWhiteSpace(voicesDelim))
        return Array.Empty<string>();
    string[] voices = voicesDelim.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < voices.Length; ++i)
    {
        // Turn "Name - Language" into "Name#Language"; mark entries without a dash as unknown.
        if (!voices[i].Contains('-'))
            voices[i] = $"{voices[i]}#Unknown";
        else
            voices[i] = voices[i].Replace(" - ", "#");
        // New: move the "(Natural)" marker from the name part to the language part.
        if (voices[i].Contains("(Natural)"))
        {
            voices[i] = voices[i].Replace("(Natural)", "");
            voices[i] = voices[i].Replace("(", "Natural (");
        }
    }
    return voices;
}

You will get something like this:

I only tried it for a few dialogs and books, but it seems to work just fine.
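To make the effect of the added lines concrete, here is the same per-entry logic pulled out into a standalone function. This is just an illustration: VoiceNameDemo and Normalize are hypothetical names, and the sample voice strings are assumptions about what the adapter reports, not output captured from a real system.

```csharp
using System;

// Hypothetical standalone version of the per-voice logic in GetAvailableVoices().
public static class VoiceNameDemo
{
    public static string Normalize(string voice)
    {
        // Existing behavior: "Name - Language" becomes "Name#Language".
        voice = voice.Contains("-")
            ? voice.Replace(" - ", "#")
            : voice + "#Unknown";

        // Added behavior: drop "(Natural)" from the name and prefix the
        // remaining parenthesized part with "Natural".
        if (voice.Contains("(Natural)"))
        {
            voice = voice.Replace("(Natural)", "");
            voice = voice.Replace("(", "Natural (");
        }
        return voice;
    }
}

// VoiceNameDemo.Normalize("Microsoft Aria (Natural) - English (United States)")
//   returns "Microsoft Aria #English Natural (United States)"
// VoiceNameDemo.Normalize("Microsoft David Desktop - English (United States)")
//   returns "Microsoft David Desktop#English (United States)"
```

Note that the space that preceded "(Natural)" stays in the name part, and that every remaining "(" in the entry gets the "Natural" prefix, so an entry with more than one parenthesized group would be prefixed more than once.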

Osmodium · 2024-05-23T08:21:48Z

@BelegCufea That looks awesome and interesting! It looks like it only works for Windows 11, or I might be missing something.

BelegCufea · 2024-05-23T10:52:05Z

@Osmodium I have no idea, but on the Repo page there is a mention about Win 10 in System Requirements section:

I'm using Windows 10. Can I use the Narrator natural voices on Windows 11?

Yes, as long as your Windows 10 build number is 17763 or above (version 1809). You can choose and install Windows 11 Narrator voices here.

Windows 10's Narrator doesn't support natural voices directly, but it does support SAPI 5 voices. So you can make Windows 11 Narrator voices work on Windows 10 via this engine.

Osmodium · 2024-05-23T11:49:09Z

@BelegCufea I will test this out, since I'm on Windows 10. Thanks!

Osmodium · 2024-05-23T12:51:55Z

It seems to be working pretty well, apart from crashing when told to stop. I have to look into this, but otherwise this is a pretty elegant solution for those who want those voices, and possibly others too.

BelegCufea · 2024-05-23T13:57:01Z

@Osmodium Nice. Works fine on Win 11. No crashing at all. Even when interrupting dialogs.

Sometimes there is a slight delay when using Edge voices, but not always. And even when there is one, it is acceptable (me thinks :-) )

And it seems to have some problem with the <silence/> tag. I had to change the phonetics like so:

  "—": " . . ",
  "...": " . . . ",

BelegCufea · 2024-05-26T19:35:24Z

@Osmodium Update after a few more hours of play.

Occasionally (about 1% of the time), the game becomes unresponsive when starting a new "page" of dialogue.
I always wait for the dialogue to be fully read before proceeding, so there is no interrupting.
This only happens when using Edge voices.
I can't determine where the freeze is occurring: in SAPIAdapter, in the WindowsVoice DLL, or in the C# wrapper.
As far as I can tell, this has never happened to me with offline Natural voices.

LeapSoftware · 2024-06-01T23:55:35Z

Hey all, just thought I'd add my own experience to the above mentioned ^.

I cloned the repo and made the changes you mentioned above (plus using NaturalVoiceSAPIAdapter). I was able to get it working and detecting all online and locally downloaded natural voices. After using it for a bit, I can confirm that using online voices seems to hang the game indefinitely every so often (far more often than 1%). I have not delved into where exactly it is throwing (I can't see any errors), but there is definitely an issue there.

If I find any other issues I'll post here :)

BelegCufea · 2024-06-02T06:41:00Z

@LeapSoftware Thanks for info.

Unfortunately, that is true. It is unstable when using online voices. Nevertheless, I have had no problems so far using offline natural voices.

For anyone interested, I compiled the changes at my fork. It is highly experimental though! Use at your own risk :-)

Wazard · 2024-06-03T20:27:18Z

@LeapSoftware Thanks for info.

Unfortunately, that is true. It is unstable when using online voices. Nevertheless, I have had no problems so far using offline natural voices.

For anyone interested, I compiled the changes at my fork. It is highly experimental though! Use at your own risk :-)

How do I make it work? The only two voices available are Zira and David. I'm on Win 11 with natural voices installed.

Osmodium · 2024-06-03T22:05:15Z

I'll add the code to the project and I guess I can do a small writeup about having to use NaturalVoiceSAPIAdapter to make it work.

Osmodium self-assigned this Sep 7, 2023

Osmodium added the enhancement New feature or request label Sep 7, 2023
