Consider giving Silero TTS models a try. They are published under an open license for non-commercial / personal use. Our TTS models are here: https://github.com/snakers4/silero-models#text-to-speech (accompanying article: https://habr.com/ru/post/549482/).
Most importantly, our TTS models run decently on a single CPU thread / core and depend almost exclusively on PyTorch.
Just let me repost some of the benchmarks here:
RTF (Real Time Factor) - the time the synthesis takes divided by the duration of the generated audio;
RTS (Real Time Speed) = 1 / RTF - how many times faster than real time the synthesis runs;
We benchmarked the models on two devices using PyTorch 1.8 utils:
CPU - Intel i7-6800K CPU @ 3.40GHz;
GPU - 1080 Ti;
When measuring CPU performance, we also limited the number of threads used;
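The two metrics defined above can be computed with a few lines of Python. This is only a minimal sketch, not the benchmark code we actually ran: `measure_rtf`, `dummy_synthesize` and the sample text are hypothetical names made up for illustration, and the dummy function stands in for a real TTS call. For CPU runs, the thread count could be capped with `torch.set_num_threads(n)` before timing, which is an assumption about the setup rather than a description of it.

```python
import time


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF: time spent synthesizing divided by duration of the audio produced."""
    return synthesis_seconds / audio_seconds


def real_time_speed(rtf):
    """RTS = 1 / RTF: how many times faster than real time the synthesis runs."""
    return 1.0 / rtf


def measure_rtf(synthesize, text, sample_rate):
    """Time one call to `synthesize` (hypothetical callable returning audio samples)."""
    start = time.perf_counter()
    audio = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(audio) / sample_rate
    return real_time_factor(elapsed, audio_seconds)


def dummy_synthesize(text, sample_rate=16000):
    """Stand-in for a real TTS model: returns one second of silence."""
    return [0.0] * sample_rate


if __name__ == "__main__":
    rtf = measure_rtf(dummy_synthesize, "hello world", sample_rate=16000)
    print(f"RTF = {rtf:.6f}, RTS = {real_time_speed(rtf):.1f}")
```

With an RTF of 0.5, for example, one second of audio takes half a second to synthesize, so RTS is 2.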
For the 16 kHz models we got the following metrics:

| Batch size | Device        | RTF   | RTS   |
| ---------- | ------------- | ----- | ----- |
| 1          | CPU 1 thread  | 0.7   | 1.4   |
| 1          | CPU 2 threads | 0.4   | 2.3   |
| 1          | CPU 4 threads | 0.3   | 3.1   |
| 4          | CPU 1 thread  | 0.5   | 2.0   |
| 4          | CPU 2 threads | 0.3   | 3.2   |
| 4          | CPU 4 threads | 0.2   | 4.9   |
| 1          | GPU           | 0.06  | 16.9  |
| 4          | GPU           | 0.02  | 51.7  |
| 8          | GPU           | 0.01  | 79.4  |
| 16         | GPU           | 0.008 | 122.9 |
| 32         | GPU           | 0.006 | 161.2 |
For the 8 kHz models we got the following metrics:

| Batch size | Device        | RTF   | RTS   |
| ---------- | ------------- | ----- | ----- |
| 1          | CPU 1 thread  | 0.5   | 1.9   |
| 1          | CPU 2 threads | 0.3   | 3.0   |
| 1          | CPU 4 threads | 0.2   | 4.2   |
| 4          | CPU 1 thread  | 0.4   | 2.8   |
| 4          | CPU 2 threads | 0.2   | 4.4   |
| 4          | CPU 4 threads | 0.1   | 6.6   |
| 1          | GPU           | 0.06  | 17.5  |
| 4          | GPU           | 0.02  | 55.0  |
| 8          | GPU           | 0.01  | 92.1  |
| 16         | GPU           | 0.007 | 147.7 |
| 32         | GPU           | 0.004 | 227.5 |