[Feature request] pronunciation, cadence and nuances in XTTS v2... #3764
Comments
How can I do this manually? Can anybody help?
I don't think this is entirely true. I put in some text containing something like "in the butt, yeah in the butt!", and in the last part it raised the pitch of "yeah" and sounded more excited, so it felt like an exclamation (as if it took the "!" into account). So there are some nuances. Maybe there should be a way to modify the speech with "special tokens" that raise or lower the pitch, increase the speed, and so on. I think that for this to work, someone would have to label a training set that way; otherwise it likely won't feel natural.
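The "special tokens" idea above can be sketched on the text side only. This is a minimal sketch, assuming hypothetical inline tags `<slow>` and `<fast>` that XTTS v2 does not define; the tag names, the speed values, and the helper `split_by_speed_tags` are all made up for illustration. It only splits the input into segments with a speed factor attached and does not call any TTS library.

```python
import re

# Hypothetical tags and speed factors (assumptions, not part of XTTS v2).
TAG_SPEED = {"slow": 0.8, "fast": 1.25}

# Matches <slow>...</slow> or <fast>...</fast>; the closing tag must match the opening one.
_TAG_RE = re.compile(r"<(slow|fast)>(.*?)</\1>", re.DOTALL)


def split_by_speed_tags(text, default_speed=1.0):
    """Split text into (segment, speed) pairs using the hypothetical tags."""
    segments = []
    pos = 0
    for match in _TAG_RE.finditer(text):
        # Untagged text before this tag keeps the default speed.
        plain = text[pos:match.start()].strip()
        if plain:
            segments.append((plain, default_speed))
        # Tagged text gets the speed factor mapped to its tag name.
        tagged = match.group(2).strip()
        if tagged:
            segments.append((tagged, TAG_SPEED[match.group(1)]))
        pos = match.end()
    # Any untagged text after the last tag.
    tail = text[pos:].strip()
    if tail:
        segments.append((tail, default_speed))
    return segments


if __name__ == "__main__":
    for segment, speed in split_by_speed_tags("Hello there. <fast>Yeah!</fast> Goodbye."):
        print(f"{speed:.2f}  {segment}")
```

Each segment could then be synthesized separately, passing the speed factor to whatever speed control the synthesis call exposes, and the resulting audio concatenated. This only covers speaking rate; pitch or emotion tags would most likely need training data labeled that way, as noted above.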
@Aphexus lol, yes, there are some nuances, but they are not the same as the speaker's. No matter how long the samples are or how many there are, the intonation does not match the original, even if the voice is similar.
Have you found a solution to the inconsistency between the generated sound and the original uploaded sound?
No.
no :(
On Sat, Jul 13, 2024 at 11:59 AM hjj-lmx wrote:
> Have you found a solution to the inconsistency between the generated sound and the original uploaded sound?
Hello!
I have used XTTS v2 for a while and made great voices.
I wish to know one thing:
Every voice I make "speaks" with the same cadence and pronunciation (clearly coming from the trained model).
How can I capture those from the speaker as well?
I mean, to really clone a voice, you need not only the frequencies but also the speaker's nuances.
Can you please post an example or, even better, add the feature directly to XTTS v2?
That way one could choose between a standard voice, a "speaker" voice, or a speaker voice plus "nuance".
That would be great!
Thanks.