I'm not all that sure about silero-vad, as the Number Detector and Language Classifier make it a bit 'fat' for just VAD.
Maybe there are simpler and easier ways to chunk spoken audio to fit beam search lengths of incoming realtime audio?
Z-yq, I haven't looked into it much, but a simpler, lower-parameter model than silero could likely be used.
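To make "simpler" concrete, here is a rough sketch of a plain energy-threshold chunker, which is not a neural VAD and not what silero does. Everything in it (`frame_rms`, `chunk_speech`, the threshold, the hangover count, and `max_frames` as a stand-in for whatever length the beam search can take) is an assumption for illustration and would need tuning on real audio.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of each non-overlapping frame (a trailing partial frame is dropped)."""
    n_frames = len(samples) // frame_len
    out = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        out.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return out


def chunk_speech(
    samples: Sequence[float],
    frame_len: int = 160,      # assumed: 10 ms at 16 kHz
    threshold: float = 0.02,   # assumed RMS level separating speech from silence
    hangover: int = 5,         # quiet frames tolerated before a chunk is closed
    max_frames: int = 300,     # assumed cap so a chunk fits the decoder's window
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of detected speech chunks."""
    chunks: List[Tuple[int, int]] = []
    start = None  # frame index where the open chunk began
    quiet = 0     # consecutive quiet frames seen inside the open chunk
    rms_values = frame_rms(samples, frame_len)
    for i, rms in enumerate(rms_values):
        if rms >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= hangover:
                # Close the chunk at the first quiet frame of this run.
                chunks.append((start * frame_len, (i - quiet + 1) * frame_len))
                start, quiet = None, 0
        # Force a split when the open chunk reaches the length cap.
        if start is not None and (i + 1 - start) >= max_frames:
            chunks.append((start * frame_len, (i + 1) * frame_len))
            start, quiet = None, 0
    if start is not None:
        # Flush the chunk still open at end of input, minus trailing quiet frames.
        chunks.append((start * frame_len, (len(rms_values) - quiet) * frame_len))
    return chunks
```

An energy gate like this will fall over in noisy rooms, which is where a small learned model earns its keep, but it shows how little logic the chunking itself needs.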
Also, I think far-field and BSS/beamforming will likely end up as wireless distributed arrays with the ASR run centrally, given how varied the uses of zonal systems could be.
https://github.com/breizhn/DTLN is a pretty good filter, but the dataset needs to be mixed with noise and then processed by DTLN (or whichever filter is used) so that the filter's artefacts are trained in. https://github.com/Rikorose/DeepFilterNet is truly outstanding but puts more load on the system, and it's a shame the LADSPA plugin uses Tract as its ML framework, as it is single-threaded only.
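A minimal sketch of that dataset step, assuming clean speech and noise are already loaded as lists of float samples. `mix_at_snr` and `make_training_example` are hypothetical helpers, and `denoise` is a placeholder for whatever wraps the chosen filter; no real DTLN or DeepFilterNet call is shown here.

```python
import math
from typing import Callable, List, Sequence


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add noise to speech, scaled so the speech-to-noise power ratio equals snr_db."""
    n = len(speech)
    if n == 0:
        return []
    if len(noise) == 0:
        raise ValueError("noise must contain at least one sample")
    # Loop the noise clip if it is shorter than the speech clip.
    looped = [noise[i % len(noise)] for i in range(n)]
    p_speech = sum(x * x for x in speech) / n
    p_noise = sum(x * x for x in looped) / n
    if p_noise == 0.0:
        return list(speech)  # silent noise clip: nothing to mix in
    # Solve p_speech / (gain^2 * p_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, looped)]


def make_training_example(
    speech: Sequence[float],
    noise: Sequence[float],
    snr_db: float,
    denoise: Callable[[List[float]], List[float]],  # placeholder for the real filter
) -> List[float]:
    """Mix noise in, then run the filter so its artefacts end up in the training audio."""
    return denoise(mix_at_snr(speech, noise, snr_db))
```

The point of passing the mixed audio through the same filter used at inference is that the ASR or KWS model then sees the filter's artefacts during training rather than meeting them for the first time in production.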