
(suggestion) Ability to preface Gaussian Diffusion with a user-selectable acoustic model #198

SouperDuper opened this issue Dec 10, 2023

Vaguely inspired by DiffSinger's Shallow Diffusion mechanism.

Gaussian Diffusion has a tendency to become "lost" at times, producing undesirable results.

The ability to (separately?) train and use a lower-quality acoustic model before diffusion could provide scaffolding that improves model stability, at the cost of additional processing overhead.
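
To make the idea concrete, here is a rough sketch of what such two-stage inference might look like, assuming a PyTorch setup. It follows the standard DDPM forward process to noise the auxiliary model's output to an intermediate step `k`, then denoises only from `k` down to 0 instead of starting from pure noise. `aux_model`, `diffusion`, `p_sample`, and `alphas_cumprod` are hypothetical placeholders, not existing NNSVS interfaces:

```python
import torch

def shallow_diffusion_infer(aux_model, diffusion, alphas_cumprod, lin_feats, k=40):
    # 1) Coarse prediction from the cheaper auxiliary acoustic model.
    coarse_mel = aux_model(lin_feats)                    # (B, frames, n_mels)

    # 2) Forward-diffuse the coarse prediction to intermediate step k
    #    (standard q(x_k | x_0)), rather than sampling pure noise at step T.
    noise = torch.randn_like(coarse_mel)
    a_bar = alphas_cumprod[k]
    x_k = a_bar.sqrt() * coarse_mel + (1.0 - a_bar).sqrt() * noise

    # 3) Denoise only from step k down to 0, conditioned on the same
    #    linguistic features; p_sample stands in for one reverse step.
    x = x_k
    for t in reversed(range(k)):
        x = diffusion.p_sample(x, t, cond=lin_feats)
    return x
```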

In addition, the suggestion is to let users select which acoustic model is used, so they can weigh the pros and cons of each option, since the choice may influence the final result.
(Such as FFConvLSTM → GaussianDiffusion compared to BiLSTMResF0NonAttentiveDecoder → GaussianDiffusion.)
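
For illustration only, one way this selection could be exposed is by instantiating the pre-diffusion model from a dotted class path given in the config. The helper below is a sketch; the exact module paths for FFConvLSTM / BiLSTMResF0NonAttentiveDecoder depend on the installed NNSVS version:

```python
import importlib

def build_aux_model(class_path: str, **kwargs):
    # e.g. class_path = "nnsvs.model.FFConvLSTM" (module path shown as an
    # example; check where the class lives in your NNSVS version).
    module_name, class_name = class_path.rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**kwargs)
```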
