Question:Voice activated AGC #191

Closed
StuartIanNaylor opened this issue Nov 24, 2022 · 6 comments · May be fixed by #213
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@StuartIanNaylor

@Rikorose I was wondering about this because I can never find a good AGC that works for speech, especially with noise, and the attack and decay (release) never seem to work right.
I assume that, as well as the sample magnitude, the model knows that what remains should be voice?
At that point, would it be possible to embed a more voice-tailored AGC?
Alternatively, an AGC could simply run afterwards, but those often seem to have poor parameters; maybe they would work better with knowledge of the signal history and a certain amount of look-ahead?

I have never really found a good AGC solution for a voice stream and just wondered what you think.

@Rikorose
Owner

I assume you mean automatic gain control? I don't have a good solution for this. We have a decent SNR estimate though, so you could use this as a base and start from there.

@StuartIanNaylor
Author

StuartIanNaylor commented Nov 24, 2022

Yes, automatic gain control. I guess (average SNR * sample magnitude) could be used, as that is exactly what is lacking; in fact, it is probably a more accurate VAD than many VAD algorithms.
Maybe something as simple as the least-mean-squares approach in https://www.allaboutcircuits.com/technical-articles/adaptive-gain-control-with-the-least-mean-squares-algorithm/
Even more complex designs are still relatively simple once you have that estimated VAD level:
https://www.ti.com/lit/wp/spraal1/spraal1.pdf
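To make the LMS idea from the first link concrete, here is a minimal per-sample sketch in Rust. It assumes the error is measured on output power (target power minus y²) and that the gain is clamped to a fixed range; `LmsAgc`, `target_rms`, and `mu` are hypothetical names, and the linked article's exact update rule may differ.

```rust
/// Minimal LMS-style automatic gain control (a sketch; the update rule is an
/// assumption and may differ from the linked article).
pub struct LmsAgc {
    gain: f32,
    target_power: f32, // desired mean output power, R = target_rms^2
    mu: f32,           // step size: small values adapt slowly but stay stable
    min_gain: f32,
    max_gain: f32,
}

impl LmsAgc {
    pub fn new(target_rms: f32, mu: f32) -> Self {
        Self {
            gain: 1.0,
            target_power: target_rms * target_rms,
            mu,
            min_gain: 0.01, // hypothetical clamp range
            max_gain: 10.0,
        }
    }

    pub fn gain(&self) -> f32 {
        self.gain
    }

    /// Applies the current gain to one sample, then nudges the gain so the
    /// instantaneous output power y^2 moves towards the target power.
    pub fn process(&mut self, x: f32) -> f32 {
        let y = self.gain * x;
        let error = self.target_power - y * y;
        self.gain = (self.gain + self.mu * error).clamp(self.min_gain, self.max_gain);
        y
    }
}
```

The fixed point is where the mean output power equals the target, and `mu` has to stay small relative to the signal power for the update to remain stable. On its own this adapts to whatever is present, noise included, which is the weakness described above; the proposal here is to drive such an update only when the SNR estimate indicates voice.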

If I remember rightly, you have an attack time that reduces the gain fairly quickly when the signal is too loud, but does so in steps over the samples of the attack envelope to avoid the clipping/distortion effect that simply cutting the amplitude would create.
Decay (release) is the inverse, but is generally a longer, slower increase in gain when the signal is too quiet.
Sometimes there is also a hold, which keeps the current AGC level for a set length of time after an attack before the decay is allowed to start.
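The attack/hold/decay behaviour described above can be sketched as a one-pole gain smoother. This is an illustration under assumptions: the caller supplies a per-sample desired gain (for example, target level divided by a level estimate), time constants are converted with coef = exp(-1 / (time * sample_rate)), and `GainSmoother` and its parameters are hypothetical names.

```rust
/// Converts a time constant in seconds to a one-pole smoothing coefficient.
fn one_pole_coef(time_s: f32, sample_rate: f32) -> f32 {
    (-1.0 / (time_s * sample_rate)).exp()
}

/// Hypothetical attack/hold/decay smoother for an AGC gain.
pub struct GainSmoother {
    gain: f32,
    attack_coef: f32,
    release_coef: f32,
    hold_samples: usize,
    hold_left: usize,
}

impl GainSmoother {
    pub fn new(sample_rate: f32, attack_s: f32, hold_s: f32, release_s: f32) -> Self {
        Self {
            gain: 1.0,
            attack_coef: one_pole_coef(attack_s, sample_rate),
            release_coef: one_pole_coef(release_s, sample_rate),
            hold_samples: (hold_s * sample_rate) as usize,
            hold_left: 0,
        }
    }

    pub fn gain(&self) -> f32 {
        self.gain
    }

    /// Moves the applied gain one sample towards `desired_gain`.
    pub fn step(&mut self, desired_gain: f32) -> f32 {
        if desired_gain < self.gain {
            // Attack: signal too loud, so reduce the gain quickly but smoothly,
            // then re-arm the hold timer.
            self.gain = desired_gain + self.attack_coef * (self.gain - desired_gain);
            self.hold_left = self.hold_samples;
        } else if self.hold_left > 0 {
            // Hold: keep the current gain for a while after an attack.
            self.hold_left -= 1;
        } else {
            // Decay/release: signal too quiet, so raise the gain slowly.
            self.gain = desired_gain + self.release_coef * (self.gain - desired_gain);
        }
        self.gain
    }
}
```

Stepping the gain through a smoothed curve, rather than jumping straight to the desired value, is what avoids the distortion mentioned above; the hold simply delays the start of the release.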

Where it all goes wrong, for example with the Speex AGC (which does give some details, though I never found it great), is that the AGC isn't driven purely by voice. Wouldn't gating it on (average SNR * sample magnitude) fix that main problem?

The AGC itself is very simple; the problem has always been the VAD metric and its processing load, and for the most part that is already done here. Maybe also add a dynamic hold that ends after more than 200 ms (an intonation-pause setting) with no voice detected.
Having it coordinated with voice solves the AGC problem, and it could perhaps be extended later using longer-term historical data, but even a simple voice-based AGC would be a huge improvement, without the load of implementing a secondary ML-based AGC.
PS: firing an event once each time the dynamic hold ends after more than 200 ms (intonation-pause setting) with no voice detected would also be super, say, published on D-Bus by the LADSPA plugin. Since the processing load is already being paid, it would be great to be able to share that information.
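A minimal sketch of the voice gate with a dynamic hold (hangover) and a fire-once end event follows. It assumes a per-frame SNR estimate in dB is available (how DeepFilterNet exposes its SNR estimate is not shown in this thread, so `snr_db` is a hypothetical input), a fixed threshold, and a fixed frame length; `VoiceGate` and `VadEvent` are hypothetical names, and publishing the event (for example on D-Bus) is left out.

```rust
/// Events emitted by the hypothetical voice gate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VadEvent {
    None,
    SpeechStart,
    SpeechEnd, // fired once when the hangover (dynamic hold) expires
}

pub struct VoiceGate {
    snr_threshold_db: f32,
    hangover_frames: usize, // frames of silence tolerated before ending speech
    silent_frames: usize,
    active: bool,
}

impl VoiceGate {
    pub fn new(snr_threshold_db: f32, hangover_ms: f32, frame_ms: f32) -> Self {
        Self {
            snr_threshold_db,
            hangover_frames: (hangover_ms / frame_ms).ceil() as usize,
            silent_frames: 0,
            active: false,
        }
    }

    /// True while speech is considered ongoing, including short pauses.
    pub fn is_active(&self) -> bool {
        self.active
    }

    /// Feeds one frame's SNR estimate (dB) and returns any state change.
    pub fn update(&mut self, snr_db: f32) -> VadEvent {
        if snr_db >= self.snr_threshold_db {
            self.silent_frames = 0;
            if !self.active {
                self.active = true;
                return VadEvent::SpeechStart;
            }
        } else if self.active {
            self.silent_frames += 1;
            // End only after MORE than the hangover time without voice.
            if self.silent_frames > self.hangover_frames {
                self.active = false;
                self.silent_frames = 0;
                return VadEvent::SpeechEnd;
            }
        }
        VadEvent::None
    }
}
```

Because the gate stays active through pauses shorter than the hangover, an AGC driven by `is_active()` would not start raising the gain during normal intonation pauses, and `SpeechEnd` fires exactly once per utterance.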

@Rikorose
Owner

I currently don't have time to integrate AGC here, but I would welcome any patches that utilize the SNR estimate.

@StuartIanNaylor
Author

Unfortunately I'm a Rust noob. Maybe another time, or someone else will pick it up, as much of the processing load is already done and it would be great to use it.

Rikorose added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Dec 1, 2022
@StuartIanNaylor
Author

@Rikorose Hendrik, if you or someone else does get time, it would also be great to have a parameter so that the binary reads audio from stdin and writes it to stdout, so that simple pipes can be built.
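For the stdin/stdout request, a generic pipe filter might look like the following. This is not an existing DeepFilterNet option; it assumes headerless 32-bit little-endian float mono samples, and `run_pipe` with its per-sample `process` closure is a hypothetical stand-in for the real processing.

```rust
use std::io::{self, Read, Write};

/// Reads raw f32 little-endian samples from `input`, passes each through
/// `process`, and writes the results to `output` in the same format.
pub fn run_pipe<R: Read, W: Write, F: FnMut(f32) -> f32>(
    mut input: R,
    mut output: W,
    mut process: F,
) -> io::Result<()> {
    let mut buf = [0u8; 4096];
    let mut pending: Vec<u8> = Vec::new(); // bytes not yet forming a whole sample
    loop {
        let n = match input.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        pending.extend_from_slice(&buf[..n]);
        // Only convert complete 4-byte samples; keep any remainder for later.
        let whole = pending.len() - pending.len() % 4;
        let mut out = Vec::with_capacity(whole);
        for chunk in pending[..whole].chunks_exact(4) {
            let x = f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
            out.extend_from_slice(&process(x).to_le_bytes());
        }
        output.write_all(&out)?;
        pending.drain(..whole);
    }
    // Trailing bytes that do not form a whole sample are dropped.
    output.flush()
}
```

A binary would call it as `run_pipe(std::io::stdin().lock(), std::io::stdout().lock(), |x| x)` with the real processing in place of the closure. A denoiser that works on frames rather than single samples would need to buffer internally.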

@StuartIanNaylor
Author

There is an interesting ultra-low-load AGC, dagc, with Rust code:
https://github.com/sile/dagc
The accompanying paper: https://hal.univ-lorraine.fr/hal-01397371/document

I have no idea how it compares with previous AGCs, but the claims seem really good.
It has a 'freeze gain' feature, which would likely make it a perfect pairing for an AGC that runs after the voice processing.
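The thread does not show dagc's API, so the sketch below does not use the crate; it only illustrates the general freeze-gain idea as it might pair with a voice gate: the gain follows a smoothed power estimate while voice is active and is held constant otherwise. `FreezeGainAgc`, `alpha`, and the boolean `voice_active` input are hypothetical.

```rust
/// Hypothetical post-denoiser AGC with "freeze gain": the gain adapts only
/// while `voice_active` is true and is held at its last value otherwise.
pub struct FreezeGainAgc {
    gain: f32,
    target_rms: f32,
    power: f32,    // smoothed input power, updated only during voice
    alpha: f32,    // smoothing factor for the power estimate, in (0, 1]
    max_gain: f32, // upper bound so near-silent speech is not over-amplified
}

impl FreezeGainAgc {
    pub fn new(target_rms: f32, alpha: f32, max_gain: f32) -> Self {
        // Start the power estimate at the target so the initial gain is 1.0.
        Self { gain: 1.0, target_rms, power: target_rms * target_rms, alpha, max_gain }
    }

    pub fn gain(&self) -> f32 {
        self.gain
    }

    pub fn process(&mut self, x: f32, voice_active: bool) -> f32 {
        if voice_active {
            self.power = (1.0 - self.alpha) * self.power + self.alpha * x * x;
            if self.power > 0.0 {
                self.gain = (self.target_rms / self.power.sqrt()).min(self.max_gain);
            }
        }
        // Without voice the last gain is frozen, so noise in pauses does not
        // pull the gain up.
        self.gain * x
    }
}
```

The design choice is that the voice decision, not the AGC itself, determines when adaptation happens; the AGC can then stay deliberately simple.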

2 participants