Voices not stretching when OpenAI models are used. #90

Open
Hiuzuki opened this issue Mar 22, 2024 · 5 comments

Hiuzuki commented Mar 22, 2024

I'm using Azure TTS to dub some tutorials from English into my language, PT-BR, and I've run into a specific problem: in the finished output, the audio clips overlap at the transitions, as if no stretching or compressing had been applied. This only happens when I use the OpenAI voices (which I want to use for the obvious reason that they sound much better). The conventional Azure voices work perfectly, with no overlap, but they sound strange.
I've tried a bunch of settings in the .ini files, but they don't seem to affect these OpenAI voice models.
Any suggestions?

Owner

ThioJoe commented Mar 25, 2024

Hm, I wonder if those voices just don't support the mstts:audioduration SSML tag. In the meantime, try going into config.ini and setting force_stretch_with_twopass = True; I believe that should work. If it doesn't, try flipping the two_pass_voice_synth option to the opposite of its current value and run it again. It's probably better to test this with a small file so you don't pay for a whole video's worth of API processing.
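For context, here is a minimal sketch of what an SSML request carrying the mstts:audioduration tag can look like. It is not taken from this project's code: the helper name build_ssml and the default voice name are assumptions for illustration only, and whether the OpenAI voices honor the tag is exactly the open question above.

```python
from xml.sax.saxutils import escape

def build_ssml(text: str, duration_ms: int,
               voice: str = "pt-BR-FranciscaNeural",  # assumed voice, for illustration
               lang: str = "pt-BR") -> str:
    """Wrap text in SSML that asks Azure to fit the speech into duration_ms."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="http://www.w3.org/2001/mstts" '
        f'xml:lang="{lang}">'
        f'<voice name="{voice}">'
        # The tag under discussion: requested length of the output audio.
        f'<mstts:audioduration value="{duration_ms}ms"/>'
        f'{escape(text)}'
        '</voice></speak>'
    )

print(build_ssml("Olá, bem-vindo ao tutorial.", 2500))
```

If a voice ignores the tag, the returned clip simply has its natural length, which would explain clips running into each other when no local stretching follows.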

Also, you're using ffmpeg for the stretching, right? I added that in the past couple of months, and I've found it works better than rubberband.

Edit: After looking, it might not actually work even with the force stretch option set to true because of some checks in audio_builder.py, so I might have to change that.
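To illustrate what local stretching with ffmpeg involves, here is a rough sketch that computes the speed factor needed to fit a clip into its subtitle slot and builds an ffmpeg command using the atempo audio filter. This is not the code in audio_builder.py; the function names are hypothetical, and the chaining keeps each atempo stage within 0.5 to 2.0 on the assumption that the installed ffmpeg build may not accept larger factors in a single stage.

```python
from typing import List

def atempo_chain(speed: float) -> str:
    """Build an ffmpeg filter string for a speed factor, chaining atempo stages."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    stages = []
    while speed > 2.0:      # split large speed-ups into 2.0x stages
        stages.append(2.0)
        speed /= 2.0
    while speed < 0.5:      # split large slow-downs into 0.5x stages
        stages.append(0.5)
        speed /= 0.5
    stages.append(speed)
    return ",".join(f"atempo={s:.6f}" for s in stages)

def stretch_command(src: str, dst: str, actual_ms: int, target_ms: int) -> List[str]:
    """Return an ffmpeg argument list that retimes src to last target_ms."""
    speed = actual_ms / target_ms  # > 1 shortens the clip, < 1 lengthens it
    return ["ffmpeg", "-y", "-i", src, "-filter:a", atempo_chain(speed), dst]

# A 3.0 s clip that has to fit a 2.0 s slot needs a 1.5x speed-up.
print(stretch_command("clip.wav", "clip_fit.wav", 3000, 2000))
```

The returned list can be passed to subprocess.run once ffmpeg is on the PATH.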

Author

Hiuzuki commented Mar 25, 2024

OK, I'll be waiting anxiously.

Owner

ThioJoe commented Mar 26, 2024

Ok try replacing your audio_builder.py file with the latest one: https://github.com/ThioJoe/Auto-Synced-Translated-Dubs/blob/main/Scripts/audio_builder.py

And also add this new option anywhere in your config.ini file and make sure it's set to true:

# This will make it so the audio clips get stretched locally even if the TTS service allows specifying exact duration
# This could be used when a TTS service like Azure is creating clips of incorrect length, or if certain voices don't support exact length
# Possible Values: True  |  False (Default)
force_always_stretch = True

You shouldn't have to have force_stretch_with_twopass enabled, but if the above doesn't work maybe enable it and try again.
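To check that the new option is actually being picked up, something like the following sketch can help. It uses Python's standard configparser; the section name [SETTINGS] and the helper name are assumptions for illustration, so adjust them to match your own config.ini.

```python
import configparser

# Assumed layout: the section name is a placeholder, not confirmed by this thread.
SAMPLE = """
[SETTINGS]
# Possible Values: True  |  False (Default)
force_always_stretch = True
"""

def read_force_always_stretch(ini_text: str, section: str = "SETTINGS") -> bool:
    """Return the force_always_stretch flag, defaulting to False when absent."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser[section].getboolean("force_always_stretch", fallback=False)

print(read_force_always_stretch(SAMPLE))
```

The fallback of False mirrors the default stated in the option's comment, so a config file without the line behaves as before.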

Author

Hiuzuki commented Mar 26, 2024

Hello, it's working well now. There are still some small overlaps, but they're practically unnoticeable, and the result is absurdly better than before.
Thank you very much.

[Image: screenshot, 2024-03-26 205528]

Owner

ThioJoe commented Mar 27, 2024

OK, great. For the remaining overlaps, you could try adjusting the add_line_buffer_milliseconds setting in config.ini, which adds a bit of extra space between the clips.
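As a rough picture of how a per-line buffer can create that space, the sketch below shrinks each subtitle window by the buffer on both sides before clips are fitted into it. This is one plausible interpretation, not the project's confirmed behavior, and the function name is hypothetical.

```python
from typing import List, Tuple

def apply_line_buffer(lines: List[Tuple[int, int]], buffer_ms: int) -> List[Tuple[int, int]]:
    """Shrink each (start_ms, end_ms) window by buffer_ms on both ends.

    Windows too short to absorb the buffer are left unchanged.
    """
    buffered = []
    for start, end in lines:
        if end - start > 2 * buffer_ms:
            buffered.append((start + buffer_ms, end - buffer_ms))
        else:
            buffered.append((start, end))
    return buffered

# Two back-to-back lines, with a 50 ms buffer on each side.
print(apply_line_buffer([(0, 1000), (1000, 2500)], 50))
```

Under this reading, back-to-back lines end up separated by twice the buffer, while the speech stays centered in its original slot.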
