Question: Voice activated AGC #191
I assume you mean automatic gain control? I don't have a good solution for this. We do have a decent SNR estimate, though, so you could use that as a starting point.
Yes, automatic gain control. I guess (average SNR × sample magnitude) could supply what is missing; in fact, that is probably a more accurate VAD than many VAD algorithms. If I remember rightly, an AGC has an attack time that pulls a too-loud signal down quickly, but it does so in steps across the samples of the attack envelope, to avoid the clipping/distortion that simply cutting the amplitude would create. Where Speex AGC goes wrong (it is documented to some extent, but I never found it great) is that it doesn't key purely on voice. Wouldn't (average SNR × sample magnitude) fix that main problem? The AGC itself is very simple; the problem has always been the VAD metric and its computational load, and most of that is already done here. Maybe also add a dynamic hold that ends after more than 200 ms of no detected voice (an intonation-pause setting).
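A minimal Rust sketch of the idea above. It assumes the denoiser provides one SNR estimate in dB per frame (for example 480-sample frames, 10 ms at 48 kHz). `SnrGatedAgc` and all of its thresholds and coefficients are hypothetical illustrations, not DeepFilterNet's API. The gain adapts only on frames whose SNR passes the threshold, stays frozen during pauses shorter than the 200 ms hold, and drifts back toward unity after that. Each gain change is ramped across the samples of the frame rather than applied as a step.

```rust
/// Hypothetical SNR-gated AGC. The per-frame SNR (dB) is assumed to come from
/// the denoiser; thresholds and coefficients are illustrative guesses.
struct SnrGatedAgc {
    target_rms: f32,       // desired speech level (linear, full scale = 1.0)
    snr_threshold_db: f32, // frames at or above this SNR count as voice
    attack: f32,           // smoothing coefficient when the gain must fall (fast)
    release: f32,          // smoothing coefficient when the gain may rise (slow)
    max_gain: f32,         // never boost more than this
    hold_frames: usize,    // unvoiced frames tolerated before leaving "speech"
    hold_left: usize,      // frames remaining in the current hold
    gain: f32,             // current linear gain
}

impl SnrGatedAgc {
    fn new(frame_ms: f32) -> Self {
        Self {
            target_rms: 0.1,
            snr_threshold_db: 10.0,
            attack: 0.5,
            release: 0.05,
            max_gain: 10.0,
            // ~200 ms hold so intonation pauses do not end the speech segment.
            hold_frames: (200.0 / frame_ms).ceil() as usize,
            hold_left: 0,
            gain: 1.0,
        }
    }

    /// Apply the AGC in place to one frame, given that frame's SNR estimate.
    fn process_frame(&mut self, frame: &mut [f32], snr_db: f32) {
        if frame.is_empty() {
            return;
        }
        let voiced = snr_db >= self.snr_threshold_db;
        if voiced {
            self.hold_left = self.hold_frames;
        } else if self.hold_left > 0 {
            self.hold_left -= 1;
        }
        let rms = (frame.iter().map(|x| x * x).sum::<f32>() / frame.len() as f32).sqrt();
        let desired = if voiced && rms > 1e-6 {
            (self.target_rms / rms).min(self.max_gain) // adapt only on voice
        } else if self.hold_left > 0 {
            self.gain // pause inside speech: freeze the gain
        } else {
            1.0 // no voice for longer than the hold: drift back to unity
        };
        let coeff = if desired < self.gain { self.attack } else { self.release };
        let new_gain = self.gain + coeff * (desired - self.gain);
        // Ramp from the old to the new gain across the frame instead of a step.
        let n = frame.len() as f32;
        for (i, s) in frame.iter_mut().enumerate() {
            *s *= self.gain + (new_gain - self.gain) * (i as f32 + 1.0) / n;
        }
        self.gain = new_gain;
    }
}
```

`process_frame` would be called once per enhanced frame with that frame's SNR estimate. Measuring the level as RMS is a design choice in this sketch; it does not guarantee the absence of clipping, so a peak-based level or a limiter after it would still be needed.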
I currently don't have time to integrate AGC here, but I would welcome any patches that use the SNR estimate.
Unfortunately I'm a Rust noob. Maybe another time, or someone else will do it, as much of the processing load is already handled and it would be great to make use of it.
@Rikorose Hendrik, if you or someone else does get time, it would also be great to have a parameter so that the binary reads audio from stdin and writes to stdout, so you could also build simple pipes.
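The stdin/stdout request could look roughly like the Rust sketch below. It assumes raw mono 32-bit float little-endian PCM on both ends (no WAV header) and a caller-supplied per-frame `process` closure standing in for the enhancement step. `pipe_frames` is a hypothetical helper, not an existing flag or function of the binary.

```rust
use std::io::{self, Read, Write};

/// Hypothetical streaming filter: reads raw mono f32 little-endian PCM from
/// `input`, applies `process` to each frame, and writes the result to `output`.
fn pipe_frames<R: Read, W: Write, F: FnMut(&mut [f32])>(
    mut input: R,
    mut output: W,
    frame_len: usize,
    mut process: F,
) -> io::Result<()> {
    let mut bytes = vec![0u8; frame_len * 4];
    let mut frame = vec![0f32; frame_len];
    loop {
        // Fill a whole frame; a short read at EOF yields a partial final frame.
        let mut filled = 0;
        while filled < bytes.len() {
            match input.read(&mut bytes[filled..]) {
                Ok(0) => break,
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        // Trailing bytes that do not form a whole sample are dropped.
        let samples = filled / 4;
        if samples == 0 {
            break;
        }
        for (s, c) in frame.iter_mut().zip(bytes[..samples * 4].chunks_exact(4)) {
            *s = f32::from_le_bytes([c[0], c[1], c[2], c[3]]);
        }
        process(&mut frame[..samples]);
        for s in &frame[..samples] {
            output.write_all(&s.to_le_bytes())?;
        }
        if filled < bytes.len() {
            break; // EOF reached mid-frame
        }
    }
    output.flush()
}
```

For a real pipe this might be called as `pipe_frames(std::io::stdin().lock(), std::io::stdout().lock(), 480, |frame| { /* enhance frame */ })`, with the upstream tool emitting raw f32 PCM. Keeping the reader and writer generic also makes the loop easy to test on in-memory buffers.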
There is an interesting ultra-low-load AGC, dagc, with Rust code. I have no idea how it compares with previous AGCs, but its claims seem really good.
@Rikorose I was wondering about this, as I can never find a good AGC that works for speech, especially with noise, and the attack and decay never work right.
I am assuming that, as well as the sample magnitude, the model knows that what remains should be voice?
At that point, is it possible to embed a more voice-tailored AGC?
I guess one could just put an AGC afterwards, but again they often seem to have poor parameters. Might one work better if it knew the signal history and had a certain amount of look-ahead?
I have never really found a good AGC solution for a voice stream and just wondered what you think.
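On the look-ahead question, one common approach is to delay the output by a few milliseconds so the gain computer sees a loud onset before it reaches the output. The Rust sketch below is a hypothetical `LookaheadAgc` with illustrative parameters. It uses a naive O(lookahead) peak scan per sample and a simple noise-floor freeze in place of the SNR gating discussed in this thread.

```rust
use std::collections::VecDeque;

/// Hypothetical look-ahead AGC: output is delayed by `lookahead` samples so the
/// gain can start dropping before a loud onset reaches the output.
struct LookaheadAgc {
    delay: VecDeque<f32>, // samples waiting to be output
    lookahead: usize,     // delay in samples
    target_peak: f32,     // desired peak level (linear)
    noise_floor: f32,     // below this peak, freeze the gain
    max_gain: f32,        // never boost more than this
    attack: f32,          // per-sample smoothing when the gain falls
    release: f32,         // per-sample smoothing when the gain rises
    gain: f32,            // current linear gain
}

impl LookaheadAgc {
    fn new(lookahead: usize) -> Self {
        Self {
            delay: VecDeque::with_capacity(lookahead + 1),
            lookahead,
            target_peak: 0.5,
            noise_floor: 1e-3,
            max_gain: 8.0,
            attack: 0.01,
            release: 0.0005,
            gain: 1.0,
        }
    }

    /// Push one input sample; returns one output sample delayed by `lookahead`.
    fn process(&mut self, x: f32) -> f32 {
        self.delay.push_back(x);
        // Peak over everything in the delay line, i.e. the sample about to be
        // output plus the next `lookahead` samples (naive O(lookahead) scan).
        let peak = self.delay.iter().fold(0.0f32, |m, s| m.max(s.abs()));
        let desired = if peak > self.noise_floor {
            (self.target_peak / peak).min(self.max_gain)
        } else {
            self.gain // near silence: hold the gain instead of boosting noise
        };
        let coeff = if desired < self.gain { self.attack } else { self.release };
        self.gain += coeff * (desired - self.gain);
        if self.delay.len() > self.lookahead {
            self.delay.pop_front().unwrap() * self.gain
        } else {
            0.0 // buffer still filling: emit silence for the first `lookahead` calls
        }
    }
}
```

With an attack coefficient of 0.01 per sample, the gain covers about 99% of a downward step within 480 samples, so a look-ahead of that order keeps overshoot small; it is still not a hard limiter. Replacing the noise-floor check with a gate on the SNR estimate mentioned in this thread would let the "history" (slow release) and the look-ahead both key on voice.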