Training data that covers more word instances gives you a better chance of reducing the DSAT ratio of the voice, where DSAT refers to the dissatisfactory parts of the speech, such as glitches. Here are some DSAT examples.
Audio | Script | DSAT description |
---|---|---|
https://nerualttswaves.blob.core.windows.net/dsat-samples/SarahFSL24K(skyman)_638077517539111683.wav | He leapt again, and the club caught him once more. | The audio of the bolded portion is distorted and this sentence should be in a descending tone. |
https://nerualttswaves.blob.core.windows.net/dsat-samples/SarahFSL24K(skyman)_638077518138794736.wav | Curly rushed her antagonist, who struck again and leaped aside. | The audio of the bolded portion is distorted. |
https://nerualttswaves.blob.core.windows.net/dsat-samples/SarahFSL24K(skyman)_638077519100474702.wav | You're joking me, sir, the other managed to articulate. | The audio of the bolded portion is distorted. |
https://nerualttswaves.blob.core.windows.net/dsat-samples/SarahFSL24K(skyman)_638077518658058847.wav | You've got a farmin' partner here at r.a.b. farmers coop. | The audio of the bolded portion is distorted. |
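
To compare voices trained on different amounts of data, it can help to put a number on the DSAT ratio. The sketch below is only an illustration: it assumes the ratio is the share of reviewed utterances in which a listener flagged at least one defect, which this document does not define. The `Review` class, the `dsat_ratio` function, and the two "clean" utterances are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """One reviewed utterance (hypothetical structure, not part of any SDK)."""
    script: str
    has_dsat: bool  # True if a listener flagged a glitch or distortion


def dsat_ratio(reviews):
    """Return the fraction of reviewed utterances flagged as DSAT.

    Assumption: the ratio is counted per utterance, not per word or per second.
    """
    if not reviews:
        raise ValueError("at least one review is required")
    return sum(r.has_dsat for r in reviews) / len(reviews)


reviews = [
    # Two scripts taken from the table above, both reported as distorted.
    Review("He leapt again, and the club caught him once more.", True),
    Review("Curly rushed her antagonist, who struck again and leaped aside.", True),
    # Hypothetical utterances with no reported defect, for illustration only.
    Review("A hypothetical utterance with no reported glitch.", False),
    Review("Another hypothetical clean utterance.", False),
]

print(f"DSAT ratio: {dsat_ratio(reviews):.0%}")
```

Running the same review set against a voice retrained with more data would show whether the ratio actually drops.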