(Feature): Natural voice playback using tts for edge #25

Open
Iheuzio opened this issue Sep 5, 2023 · 30 comments

@Iheuzio

Iheuzio commented Sep 5, 2023

Hi,

I recently saw the project here, and it uses the speech services directly in Windows. The natural voices sound much better, but they are locked out of the normal Windows API. They could be accessed with the Edge/Chrome API through chrome.ttsEngine or chrome.tts.

An added benefit is that you would not need Windows to run this: you could relay the calls to Windows, or possibly to whatever text reader is configured for the browser. There is a GitHub project here where the work is already done, so you can simply pass the text from Unity to the edge-tts command that project provides.

If this sounds reasonable, I could try working on a possible integration.


Osmodium self-assigned this Sep 7, 2023
Osmodium added the enhancement (New feature or request) label Sep 7, 2023
@Osmodium
Owner

Osmodium commented Sep 7, 2023

I'm just realizing: are you talking about these voices? https://www.ghacks.net/2018/08/11/unlock-all-windows-10-tts-voices-system-wide-to-get-more-of-them/
If so, the mod already supports those voices; you "just" have to enable them by following the guide (also described in the only article on the mod page).

@Iheuzio
Author

Iheuzio commented Sep 7, 2023

It should read with whichever TTS voice is configured in Chrome/Chromium. These voices are separate from the Windows voices, so the text would have to be passed through the browser API in order to be read.

@Osmodium
Owner

Osmodium commented Sep 7, 2023

I think you might mean this? https://learn.microsoft.com/en-us/archive/msdn-magazine/2019/june/speech-text-to-speech-synthesis-in-net
It was not possible to use SpeechSynthesis when the project was created, because its .NET version was incompatible with the one used in Unity, so the sideloading did not work. It might work now, but I have not tested this yet.

A couple of questions if not:

  • Would this require the user to have chrome installed?
  • Would it only be able to use the voice configured in chrome and not be able to take an argument for the voice?
  • Does it require an internet connection?

@Iheuzio
Author

Iheuzio commented Nov 23, 2023

Hi, sorry for the delayed response. I was busy working on other things.

You can check the forked repository I made for testing the changes here; it works well.

It requires ffmpeg, mpv, and edge-tts to be installed. It then works like this:
View Video demo

As for your questions:

  • Would this require the user to have chrome installed?

Yes, you must have a Chromium browser (Edge and Chrome are supported), and this feature would only work on Windows.

  • Would it only be able to use the voice configured in chrome and not be able to take an argument for the voice?

You can pass any of the voices listed by running edge-tts --list-voices, and you can configure rate, pitch, and so on as well: see documentation

  • Does it require an internet connection?

No internet connection is needed
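
For illustration, here is a minimal C# sketch of how the mod could hand text to the edge-tts command with a chosen voice, rate, and pitch. The class and method names are hypothetical, the voice name in the usage note is only an example of what edge-tts --list-voices prints, and the sketch assumes edge-tts is on PATH and that the runtime provides ProcessStartInfo.ArgumentList. It is not the code from the fork.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class EdgeTtsCommand
{
    // Builds the argument list for the edge-tts CLI.
    // rate, pitch and text use the "--option=value" form so that values
    // starting with '-' (for example "-10%") are not parsed as options.
    public static List<string> BuildArguments(
        string text, string voice, string rate, string pitch, string outputPath)
    {
        return new List<string>
        {
            "--voice", voice,
            "--rate=" + rate,
            "--pitch=" + pitch,
            "--text=" + text,
            "--write-media", outputPath
        };
    }

    // Starts edge-tts, waits until it exits, and returns its exit code.
    // Assumption: edge-tts is installed and reachable through PATH.
    public static int Run(IEnumerable<string> arguments)
    {
        var startInfo = new ProcessStartInfo("edge-tts")
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        foreach (string argument in arguments)
            startInfo.ArgumentList.Add(argument);

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A call could look like EdgeTtsCommand.Run(EdgeTtsCommand.BuildArguments("Hello there", "en-US-AriaNeural", "+10%", "-5Hz", "line.mp3")), after which the resulting file would be played by mpv or by the game itself.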

@Iheuzio
Author

Iheuzio commented Nov 23, 2023

If you want to communicate on Discord, I'm in the server as @iheuzio.

I could add a menu option that toggles this feature on or off, as well as a script that sets up mpv, ffmpeg, and the other dependencies on Windows to make installation easier. Let me know if that sounds good.

@Osmodium
Owner

Thanks for providing a PoC of it, and no worries about taking time :)
I haven't had time to look at the fork yet, but it looks and sounds good.
I have some requests/concerns:
It sounds like it takes a while for the audio to play after clicking, can this be reduced?
I imagine needing to have some checks as to if the computer has the applications installed and the correct supported versions of them, have you included this?
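
One way the installation check asked about here could look is a lookup of each required tool on PATH before the feature is enabled. This is only a sketch under assumptions: the class and method names are hypothetical, the list of file extensions is supplied by the caller, and version checks are not covered.

```csharp
using System;
using System.IO;

public static class DependencyCheck
{
    // Returns the full path of the first matching executable,
    // or null when the tool is not found in any PATH directory.
    public static string FindOnPath(string name, string pathVariable, string[] extensions)
    {
        string[] directories = pathVariable.Split(
            new[] { Path.PathSeparator }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string directory in directories)
        {
            foreach (string extension in extensions)
            {
                string candidate = Path.Combine(directory.Trim(), name + extension);
                if (File.Exists(candidate))
                    return candidate;
            }
        }
        return null;
    }

    // Returns the tools that could not be located.
    public static string[] FindMissing(string[] tools, string pathVariable, string[] extensions)
    {
        return Array.FindAll(tools, tool => FindOnPath(tool, pathVariable, extensions) == null);
    }
}
```

For the proof of concept, the call might be DependencyCheck.FindMissing(new[] { "ffmpeg", "mpv", "edge-tts" }, Environment.GetEnvironmentVariable("PATH") ?? "", new[] { ".exe", ".cmd", ".bat", "" }); an empty result would mean all three tools were found.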

@Iheuzio
Author

Iheuzio commented Nov 24, 2023

  • It sounds like it takes a while for the audio to play after clicking, can this be reduced?

Yes. The longer the text, the longer it takes to save the temporary MP3 file. We could split the audio into many smaller MP3 files and play each one in order while the remaining ones are still being generated.

  • I imagine needing to have some checks as to if the computer has the applications installed and the correct supported versions of them, have you included this?

I did not test that; this was done only to show that using edge-tts is possible. I can work on an actual integration later, but my code was written in 30 minutes, so it's pretty rough.
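
The idea of splitting long text so that playback can start sooner could be sketched roughly as follows. This is a hypothetical helper, not code from the fork: it cuts after '.', '!' or '?' and groups sentences into chunks of at most maxChars characters (a single sentence longer than that is kept whole). Each chunk would then be synthesized to its own file and played in order while the next one is still being prepared.

```csharp
using System.Collections.Generic;
using System.Text;

public static class TextChunker
{
    // Groups whole sentences into chunks no longer than maxChars where possible.
    public static List<string> Split(string text, int maxChars)
    {
        var chunks = new List<string>();
        var current = new StringBuilder();
        int start = 0;

        for (int i = 0; i < text.Length; i++)
        {
            bool sentenceEnd = text[i] == '.' || text[i] == '!' || text[i] == '?';
            bool lastCharacter = i == text.Length - 1;
            if (!sentenceEnd && !lastCharacter)
                continue;

            string sentence = text.Substring(start, i - start + 1);
            start = i + 1;

            // Close the current chunk if adding this sentence would exceed the limit.
            if (current.Length > 0 && current.Length + sentence.Length > maxChars)
            {
                chunks.Add(current.ToString().Trim());
                current.Clear();
            }
            current.Append(sentence);
        }

        string rest = current.ToString().Trim();
        if (rest.Length > 0)
            chunks.Add(rest);
        return chunks;
    }
}
```

With the input "One. Two! Three? Four." and maxChars set to 10, this yields the three chunks "One. Two!", "Three?" and "Four.". The splitting is deliberately naive; abbreviations and ellipses would need extra handling.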

@adamstradomski

Has this idea stalled? I would love to see it work, especially with Rogue Trader :)

But I believe edge-tts does require an internet connection. Isn't it a Python wrapper over the Bing API?
image

@Iheuzio
Author

Iheuzio commented Dec 30, 2023

Yeah, the natural voices require an internet connection. My bad. I should be able to work on a simple, proper integration with the existing TTS, possibly after January 12th.

@Osmodium
Owner

Osmodium commented Jan 19, 2024

@Iheuzio Saw your video on the Owlcat forum. Looks promising, though still with a bit of delay 👍
Couldn't contact you on Discord since we aren't friends there.
Also: it seems like the service is not always up, which might create confusion if the mod switches to the "standard" voices after a delay. It still seems a bit unstable to me.
Let me know if you want me to take a look/help with it.
Tried this link: https://speech.platform.bing.com/consumer/speech/synthesize/readaloud/readaloud/voices/list?trustedclienttoken=6A5AA1D4EAFF4E9FB37E23D68491D6F4
However, it would be cool to have this as a toggle, so people who want to use it can, with the caveats it might have :)
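
To make the toggle-plus-fallback idea concrete, here is a rough C# sketch. Everything in it is an assumption for illustration: the class name, the delegates standing in for the online and local speech paths, and the timeout handling. It tries the online voice only when the toggle is on, and uses the local voice if the online call fails or does not finish in time.

```csharp
using System;
using System.Threading.Tasks;

public static class SpeechFallback
{
    // Returns true when the online path spoke the text, false when the local voice was used.
    public static async Task<bool> SpeakAsync(
        string text,
        bool naturalVoicesEnabled,
        Func<string, Task> speakOnline,
        Action<string> speakLocal,
        TimeSpan timeout)
    {
        if (naturalVoicesEnabled)
        {
            try
            {
                Task online = speakOnline(text);
                Task finished = await Task.WhenAny(online, Task.Delay(timeout));
                if (finished == online)
                {
                    await online; // rethrows if the online call failed
                    return true;
                }
                // Timed out. A real integration would also cancel the online
                // request here so it cannot start playing later.
            }
            catch (Exception)
            {
                // Service unavailable or request failed: use the local voice.
            }
        }

        speakLocal(text);
        return false;
    }
}
```

Whether a silent switch to the standard voices is desirable is a separate design question; the boolean result at least lets the caller show the player which voice was used.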

@Osmodium
Owner

I just dabbled a bit with this in LINQPad, and I got it to work without having the Python program installed, which would probably be preferred?

@Wazard

Wazard commented Feb 3, 2024

Hey, are you still working on that idea? I have experience coding with C# and Unity but none with TTS, so I don't know if I would be of any help.

@Osmodium
Owner

Osmodium commented Feb 4, 2024

@Wazard Yes, and I've gotten it to work, but discovered that the service that is being used to generate the audio does not support multiple voices in one request. So I'm currently working on parallelizing calls for each section of the dialog.
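
As a rough illustration of that approach (not the mod's actual code), one request per dialog section can be started at once and the results collected in dialog order. DialogSegment and the synthesize delegate are hypothetical stand-ins for whatever the mod uses to call the service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public sealed class DialogSegment
{
    public string Voice;
    public string Text;
}

public static class ParallelSynthesis
{
    // Starts one synthesis request per segment, each with its own voice,
    // and returns the audio buffers in the same order as the segments.
    public static async Task<byte[][]> SynthesizeAllAsync(
        IReadOnlyList<DialogSegment> segments,
        Func<string, string, Task<byte[]>> synthesize)
    {
        Task<byte[]>[] requests = segments
            .Select(segment => synthesize(segment.Voice, segment.Text))
            .ToArray();

        // Task.WhenAll preserves the order of the input tasks,
        // regardless of which request finishes first.
        return await Task.WhenAll(requests);
    }
}
```

Because the results keep their order, playback can simply walk the returned array. Starting playback of the first section before the later ones have finished would need a slightly different structure, such as awaiting the tasks one by one.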

@Wazard

Wazard commented Feb 4, 2024

@Osmodium Sorry for the probably dumb question, but I saw that you can add and download the natural voices from Narrator. Wouldn't it be possible to use them within Windows itself that way?

@Iheuzio
Author

Iheuzio commented Feb 4, 2024

Hi, I currently won't have time to focus on this mod. I'm working on other things at the moment. I may be able to help out later, but not for the time being.

@bubval

bubval commented Mar 3, 2024

Hey @Osmodium, it's awesome to hear that you have it working. Would it be possible to have a release that does not include multiple voices? How's the progress so far?

@Osmodium
Owner

Osmodium commented Apr 1, 2024

Hi! I have just uploaded the experimental version of the mod (0.9.4-EXP) here which includes Natural Voices through the Bing service. It is the version from over a month ago, but progress has been slow.
It is still WIP so there might be bugs.

@Christian-Arning

@Osmodium Does this mod also work for Pathfinder: WotR? If not, how can I make it work there?

@BelegCufea

OK. I have found the NaturalVoiceSAPIAdapter repo on GitHub, which enables us to use Natural voices (including those for Edge) with TTS.

But it needs a slight adjustment in SpeechMod to function: just five new lines in the GetAvailableVoices() method in WindowsVoiceUnity.cs.

Put these lines:

if (voices[i].Contains("(Natural)"))
{
    voices[i] = voices[i].Replace("(Natural)", "");
    voices[i] = voices[i].Replace("(", "Natural (");
}

just under

if (!voices[i].Contains('-'))
    voices[i] = $"{voices[i]}#Unknown";
else
    voices[i] = voices[i].Replace(" - ", "#");

The whole method should look like this:

public static string[] GetAvailableVoices()
{
    // The native call returns all voice names in one newline-delimited string.
    string voicesDelim = getVoicesAvailable();
    if (string.IsNullOrWhiteSpace(voicesDelim))
        return Array.Empty<string>();
    string[] voices = voicesDelim.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < voices.Length; ++i)
    {
        // Replace the " - " separator with '#'; names without a dash get "#Unknown".
        if (!voices[i].Contains('-'))
            voices[i] = $"{voices[i]}#Unknown";
        else
            voices[i] = voices[i].Replace(" - ", "#");
        // New lines: drop the "(Natural)" marker and put "Natural" before the remaining parenthesis.
        if (voices[i].Contains("(Natural)"))
        {
            voices[i] = voices[i].Replace("(Natural)", "");
            voices[i] = voices[i].Replace("(", "Natural (");
        }
    }
    return voices;
}

You will get something like this:
image

I only tried it for a few dialogs and books, but it seems to work just fine.
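
To show what the string handling in GetAvailableVoices() does to a single name, here is the same sequence of replacements as a standalone helper. The helper name and the sample voice names in the note that follows are illustrative assumptions; the real names depend on which voices are installed.

```csharp
public static class VoiceNameFormat
{
    // Same steps as the loop body in GetAvailableVoices(), applied to one name.
    public static string Format(string voice)
    {
        if (!voice.Contains("-"))
            voice = voice + "#Unknown";
        else
            voice = voice.Replace(" - ", "#");

        if (voice.Contains("(Natural)"))
        {
            voice = voice.Replace("(Natural)", "");
            voice = voice.Replace("(", "Natural (");
        }
        return voice;
    }
}
```

With these steps, "Microsoft David Desktop - English (United States)" becomes "Microsoft David Desktop#English (United States)", "Microsoft Aria (Natural) - English (United States)" becomes "Microsoft Aria #English Natural (United States)", and a name without a dash such as "Some Voice" becomes "Some Voice#Unknown".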

@Osmodium
Owner

@BelegCufea That looks awesome and interesting! It looks like it only works for Windows 11, or I might be missing something.

@BelegCufea

@Osmodium I have no idea, but the repo page mentions Windows 10 in its System Requirements section:

I'm using Windows 10. Can I use the Narrator natural voices on Windows 11?

Yes, as long as your Windows 10 build number is 17763 or above (version 1809). You can choose and install Windows 11 Narrator voices here.

Windows 10's Narrator doesn't support natural voices directly, but it does support SAPI 5 voices. So you can make Windows 11 Narrator voices work on Windows 10 via this engine.

@Osmodium
Owner

@BelegCufea I will test this out, since I'm on Windows 10. Thanks!

@Osmodium
Owner

It seems to be working pretty well, apart from crashing when told to stop. I have to look into this, but otherwise this is a pretty elegant solution for those who want those voices, and possibly others too.

@BelegCufea

@Osmodium Nice. Works fine on Win 11. No crashing at all. Even when interrupting dialogs.

Sometimes there is a slight delay when using Edge voices, but not always. And even when there is one, it is acceptable (me thinks :-) )

It also seems to have a problem with the <silence/> tag. I had to change the phonetics like so:

  "—": " . . ",
  "...": " . . . ", 

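The effect of those two entries can be sketched as plain, ordered string substitution. This assumes the mod applies its phonetic dictionary as simple text replacement, which the thread does not confirm; the class name is hypothetical, and the em dash is written as \u2014 in the usage note.

```csharp
using System.Collections.Generic;

public static class PhoneticReplacer
{
    // Applies each replacement in the given order using plain string substitution.
    public static string Apply(
        string text, IEnumerable<KeyValuePair<string, string>> replacements)
    {
        foreach (KeyValuePair<string, string> replacement in replacements)
            text = text.Replace(replacement.Key, replacement.Value);
        return text;
    }
}
```

Given the two entries in the order shown in the phonetics snippet, the input "Wait\u2014what..." becomes "Wait . . what . . . ", so the voice gets spaced periods to pause on instead of a <silence/> tag.
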
@BelegCufea

@Osmodium Update after a few more hours of play.

  • Occasionally (about 1% of the time), the game becomes unresponsive when starting a new "page" of dialogue.
  • I always wait for the dialogue to be fully read before proceeding, so there is no interrupting.
  • This only happens when using Edge voices.
  • I can't determine where the freeze is occurring: in SAPIAdapter, in the WindowsVoice DLL, or in the C# wrapper.
  • As far as I can tell, this has never happened to me with offline Natural voices.

@LeapSoftware

LeapSoftware commented Jun 1, 2024

Hey all, just thought I'd add my own experience to the above.

I cloned the repo and made the changes you mentioned above (plus using NaturalVoiceSAPIAdapter). I was able to get it working and detecting all online and locally downloaded natural voices. After using it for a bit, I can confirm that the online voices every so often (far more often than 1%) hang the game indefinitely. I have not delved into where exactly it is failing (I can't see any errors), but there is definitely an issue there.

If I find any other issues, I'll post here :)

@BelegCufea

@LeapSoftware Thanks for info.

Unfortunately, that is true. It is unstable when using online voices. Nevertheless, I have had no problems so far using offline natural voices.

For anyone interested, I compiled the changes at my fork. It is highly experimental though! Use at your own risk :-)

@Wazard

Wazard commented Jun 3, 2024

@BelegCufea How do I make it work? The only two voices available are Zira and David. I'm on Windows 11 with the natural voices installed.

@Osmodium
Owner

Osmodium commented Jun 3, 2024

I'll add the code to the project and I guess I can do a small writeup about having to use NaturalVoiceSAPIAdapter to make it work.
