[Feature request] pronunciation, cadence and nuances in XTTS v2... #3764

0wwafa · 2024-05-29T18:55:07Z

Hello!
I have used XTTS v2 for a while and made great voices.
I wish to know one thing:
every cloned voice, when it "speaks", has the same cadence and pronunciation (clearly coming from the trained model).
How could I also get those from the speaker?
I mean, to really clone a voice, you need not only the frequencies but also their nuances.
Can you please post an example or, even better, add the feature directly to XTTS v2?
That way one could choose between a standard voice, a "speaker" voice, or a speaker voice plus "nuance".
That would be great!
Thanks.

0wwafa · 2024-06-08T14:38:39Z

How can I do this manually? Can anybody help?

Aphexus · 2024-06-18T15:36:44Z

I don't think this is entirely true. I put in some text containing something like "in the butt, yeah in the butt!" and, in the last part, it raised the pitch on "yeah" and sounded more excited, so it felt like an exclamation (as if it took the "!" into account).

So there are some nuances. Maybe there should be a way to modify the speech with "special tokens" that raise or lower the pitch, increase the speed, and so on. I think that, for this to work, someone would have to label a training set that way; otherwise it likely won't feel natural.
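
To make the "special tokens" idea concrete, here is a minimal sketch of how such markup could be split into segments before synthesis. Everything in it is an assumption for illustration: the `[fast]...[/fast]` tag syntax, the `STYLES` table, and the `Segment` and `parse_markup` names are hypothetical and are not part of XTTS v2. It only parses text; each segment would still have to be synthesized and adjusted separately, which is not the same as a model trained on labeled data.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical inline tags such as "[fast]...[/fast]"; XTTS v2 does not define these.
TAG_PATTERN = re.compile(r"\[(?P<tag>[a-z]+)\](?P<body>.*?)\[/(?P=tag)\]", re.DOTALL)

# Assumed mapping from tag name to (speed multiplier, pitch shift in semitones).
STYLES = {
    "fast": (1.25, 0.0),
    "slow": (0.8, 0.0),
    "high": (1.0, 2.0),
    "low": (1.0, -2.0),
}


@dataclass
class Segment:
    text: str
    speed: float = 1.0        # 1.0 means unchanged speaking rate
    pitch_shift: float = 0.0  # 0.0 means unchanged pitch


def parse_markup(text: str) -> List[Segment]:
    """Split text into plain and tagged segments, in reading order."""
    segments: List[Segment] = []
    cursor = 0
    for match in TAG_PATTERN.finditer(text):
        # Untagged text before this tag keeps the default style.
        plain = text[cursor:match.start()].strip()
        if plain:
            segments.append(Segment(plain))
        # Unknown tag names fall back to the default style.
        speed, pitch = STYLES.get(match.group("tag"), (1.0, 0.0))
        body = match.group("body").strip()
        if body:
            segments.append(Segment(body, speed, pitch))
        cursor = match.end()
    # Untagged text after the last tag.
    tail = text[cursor:].strip()
    if tail:
        segments.append(Segment(tail))
    return segments
```

For example, `parse_markup("Hello there. [high]Yeah![/high] Bye.")` yields three segments, with only the middle one carrying a pitch shift.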

0wwafa · 2024-06-18T17:17:15Z

@Aphexus lol, yes, there are, but they are not the same as the speaker's.

like:

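from TTS.api import TTS
# Load the multilingual XTTS v2 model on the CPU.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cpu")
# Read the input text, joining lines with spaces so words are not fused together.
with open("text.txt", "r", encoding="utf-8") as f:
    t = f.read().replace("\n", " ")
tts.tts_to_file(text=t, speaker_wav=["./speaker1.wav", "./speaker2.wav"], language="en", file_path="test.wav")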

No matter how long the samples are or how many I provide, the intonation does not match the original, even if the voice is similar.

hjj-lmx · 2024-07-13T08:59:11Z

Have you found a solution to the inconsistency between the generated sound and the original uploaded sound?

0wwafa · 2024-07-13T13:35:48Z

Have you found a solution to the inconsistency between the generated sound and the original uploaded sound?

no.

0wwafa · 2024-09-01T07:44:14Z

no :(


0wwafa added the "feature request" label (feature requests for making TTS better) on May 29, 2024
