Replace Encodec decoding with Vocos #34

Merged 1 commit on Aug 28, 2023

@v0xie v0xie commented Aug 27, 2023

This PR replaces the Encodec decoding with Vocos (https://github.com/charactr-platform/vocos).

In "utils/generation.py", generate_audio and generate_audio_from_long_text now both decode with Vocos.
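For readers unfamiliar with Vocos, the decoding path described above can be sketched roughly as follows. This is a minimal sketch and not the code from this PR: the helper names `bandwidth_id_for` and `decode_with_vocos` are hypothetical, the `(n_codebooks, frames)` input layout is an assumption, and the Vocos calls follow the usage shown in the Vocos README for the `charactr/vocos-encodec-24khz` checkpoint.

```python
from typing import Sequence

# Target bandwidths (kbps) of the 24 kHz Encodec model, in the order that
# Vocos indexes them through `bandwidth_id`.
ENCODEC_BANDWIDTHS_KBPS: Sequence[float] = (1.5, 3.0, 6.0, 12.0)

# 75 frames per second * 10 bits per 1024-entry codebook = 0.75 kbps.
KBPS_PER_CODEBOOK = 0.75


def bandwidth_id_for(n_codebooks: int) -> int:
    """Map a codebook count to the matching Vocos bandwidth index.

    Hypothetical helper; raises ValueError for unsupported counts.
    """
    kbps = n_codebooks * KBPS_PER_CODEBOOK
    return ENCODEC_BANDWIDTHS_KBPS.index(kbps)


def decode_with_vocos(codes, device="cpu"):
    """Decode Encodec codes to a waveform with Vocos.

    `codes` is assumed to be an integer tensor of shape
    (n_codebooks, frames). Imports are deferred so the helper above
    can be used without torch or vocos installed.
    """
    import torch
    from vocos import Vocos

    vocos = Vocos.from_pretrained("charactr/vocos-encodec-24khz").to(device)
    codes = codes.to(device)
    # Turn discrete codes into the continuous features Vocos expects.
    features = vocos.codes_to_features(codes)
    bandwidth_id = torch.tensor(
        [bandwidth_id_for(codes.shape[0])], device=device
    )
    return vocos.decode(features, bandwidth_id=bandwidth_id)
```

With 8 codebooks, the helper selects the 6 kbps setting (index 2). In a real pipeline the Vocos model would be loaded once and reused rather than re-created on every call.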

The Encodec dependency can't be removed entirely, because Vocos doesn't have a way to encode audio.

In my limited testing on a 4090, Vocos decoding is only marginally slower than Encodec decoding, and I think the improvement in quality more than makes up for it.

I am attaching some generated audio samples so you can hear the difference the new decoder makes. This PR does not implement upsampling from 24 kHz to 44.1 kHz as recommended by the Vocos examples, but the samples zip includes a few generations that were upsampled to 44.1 kHz.
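For reference, the 24 kHz to 44.1 kHz upsampling step that this PR leaves out could look roughly like the sketch below. The function names are hypothetical, and using `torchaudio.functional.resample` is an assumption about tooling, not necessarily how the attached samples were produced. The length helper gives only the nominal output size implied by the sample-rate ratio.

```python
import math
from fractions import Fraction


def resample_ratio(orig_hz: int = 24_000, new_hz: int = 44_100) -> Fraction:
    """Exact ratio between the target and source sample rates."""
    return Fraction(new_hz, orig_hz)


def upsampled_length(n_samples: int, orig_hz: int = 24_000,
                     new_hz: int = 44_100) -> int:
    """Nominal number of output samples after resampling."""
    return math.ceil(n_samples * resample_ratio(orig_hz, new_hz))


def upsample(audio, orig_hz: int = 24_000, new_hz: int = 44_100):
    """Resample a waveform tensor; torchaudio is imported lazily."""
    import torchaudio

    return torchaudio.functional.resample(
        audio, orig_freq=orig_hz, new_freq=new_hz
    )
```

The ratio reduces to 147/80, so one second of decoded audio (24,000 samples) becomes 44,100 samples.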

samples.zip

@Plachtaa Plachtaa merged commit 6db2af0 into Plachtaa:master Aug 28, 2023