Merge of coqui voices broke stuff #126

aedocw · 2023-12-23T22:01:05Z

Due to incomplete testing, the merge that made studio voices work also broke a bunch of other stuff. This merge fixes that and also includes a test script that I will use in the future to validate a few common use cases. It would be nice to add some real tests to CI, but the test runners do not have GPUs, so they would not be that useful.

danielw97 · 2023-12-23T22:36:52Z

Hi again,
If you'd rather have a separate issue for this, let me know. In my testing just now, after pulling your most recent commit, I'm getting the following error; I believe xtts is being sent a bigger text chunk than it can handle.
This is using one of the coqui studio voices, btw.

Error: ❗ XTTS can only generate text with a maximum of 400 tokens. ... Retrying (0 retries left)
This is a longer paragraph, although using a fine-tuned model last week with the same book didn't cause this problem.
Thanks for all of your work.

aedocw · 2023-12-23T22:40:35Z

Hmm, I might need to fully revert this then. I have not tested with really long text, so I have not run into the token limit. I think it's fair to keep it under this issue, as the merge of coqui voices sure did break stuff! For what it's worth, it is working for me with the current chunk size with epubs; maybe I need to test with text longer than what I have sent to it so far.


danielw97 · 2023-12-23T22:44:23Z

Other than that, everything seems to be working. I wonder whether it's possible to incorporate the same segmenting code that is used with xtts, as I assume the limits are the same (if it isn't already)?
I think this is an outlier with a longer paragraph that xtts handled fine, although I can see how it may cause issues.
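
For anyone following along, here is a rough sketch of the kind of segmenting being discussed: splitting a long paragraph into smaller pieces before it reaches the TTS engine. This is not the project's actual code. The function name `split_for_tts` and the `max_chars` limit are made up for illustration, and character count is only a stand-in for the real 400-token limit.

```python
import re


def split_for_tts(text, max_chars=250):
    """Greedily pack sentences into chunks of at most max_chars characters.

    Hypothetical helper: max_chars is an assumed proxy for the token limit.
    A single word longer than max_chars is kept whole, so it can exceed it.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        pieces = [sentence]
        if len(sentence) > max_chars:
            # An overlong sentence is broken up on word boundaries.
            pieces, buf = [], ""
            for word in sentence.split():
                candidate = f"{buf} {word}".strip()
                if len(candidate) > max_chars and buf:
                    pieces.append(buf)
                    buf = word
                else:
                    buf = candidate
            if buf:
                pieces.append(buf)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) > max_chars and current:
                # Adding this piece would overflow, so start a new chunk.
                chunks.append(current)
                current = piece
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be sent to the engine on its own, which should keep a long paragraph from hitting the limit in one request.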

danielw97 · 2023-12-24T01:04:44Z

Also, at least in my mind, the same text processing should be used, as I believe this is basically xtts under the hood, unless I am mistaken.

aedocw · 2023-12-24T01:05:03Z

I'm going to reopen this one until things are sorted out.

The difference has to do with how I'm calling XTTS in these two cases. I use the xtts cloning model with their inference streaming approach, but the docs don't indicate how you can use that with their voices.

danielw97 · 2023-12-24T01:32:00Z

Okay, thanks.
No rush, especially this time of year with the holidays of course, although I wanted to let you know about the errors I was seeing.
Maybe they've not implemented the streaming approach with their studio voices, although it might be worth asking on the Discord as it might just not be documented.

aedocw · 2023-12-24T01:35:58Z

Haha yes, I have asked on Discord; no answer yet though (and someone else just asked the same question today).

I put a potential fix in the branch "fixes" if you want to try it out when you get a chance.

As far as the holidays, it's OK, this is relaxing and I always sleep better after fixing some bugs :)

Thanks, and happy holidays to you too!

danielw97 · 2023-12-24T01:42:26Z

Thanks, I've got some time this evening and will test this now.
Edit: I've run the troublesome paragraph and that seems to have fixed it. I appreciate your quick work on this.

aedocw · 2023-12-24T01:53:08Z

Thanks, I appreciate your testing and all your feedback!

You should see this, indicating the right xtts version:

 > tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
 > Using model: xtts

danielw97 · 2023-12-24T01:54:57Z

Yes, that's what I got in the end.
I made a silly mistake the first time, specifying --model instead of --engine, although that was fixed fairly quickly. All good now.

aedocw · 2023-12-24T02:00:34Z

Excellent! I'll figure out what's going on with other languages, hopefully tonight, and merge this branch.

aedocw · 2023-12-24T04:13:02Z

Found the problem with Coqui voices when reading plain text (rather than epub) and specifying a language other than English. On line 164 I replace all periods with commas if language != en, and that seems to break something along the way (maybe it confuses the segmenter that breaks everything up into individual sentences).

Replacing periods with commas did seem to help for non-English languages, where it would otherwise seem to always pronounce the period at the end of sentences as "dot" or some variation of that. I changed that to happen just before the sentence is sent to TTS; hopefully it is still effective for other languages.
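
To illustrate the ordering, here is a minimal sketch, assuming a simple regex sentence splitter in place of the real segmenter. The names `prepare_for_tts` and `read_text` are hypothetical and are not the functions in the repo. The point is only that segmentation sees the original periods, and the comma swap is applied per sentence right before synthesis.

```python
import re


def prepare_for_tts(sentence, language):
    """Swap periods for commas in non-English text right before synthesis."""
    if language != "en":
        return sentence.replace(".", ",")
    return sentence


def read_text(text, language):
    """Segment first, on the untouched text, then adjust each sentence."""
    # Assumed stand-in for the real sentence segmenter.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Each returned string is what would be handed to the TTS engine.
    return [prepare_for_tts(s, language) for s in sentences if s]
```

If the periods were replaced before this split instead, the splitter would find no sentence boundaries in plain text, which matches the guess above about the segmenter getting confused.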

aedocw · 2023-12-24T04:27:47Z

I believe things are all fixed now, please log bugs as always :)

aedocw self-assigned this Dec 23, 2023

aedocw added the bug label Dec 23, 2023

aedocw linked a pull request Dec 23, 2023 that will close this issue

Fixes for things I just broke #127

Merged

aedocw closed this as completed in #127 Dec 23, 2023

aedocw reopened this Dec 24, 2023

aedocw closed this as completed Dec 24, 2023
