feat: jump based on "Voice Activity Detection"/"Speech Recognition" #164

nvxos · 2023-07-20T07:44:31Z

I know this is no small feat, but in my opinion it would be insanely good if the extension could jump/cut based on VAD (Voice Activity Detection) or speech recognition.

I did some research on this, mainly to find an editor that can cut the parts of a video that don't contain speech. For example, there's the new paid jumpcutter from carykh (jumpcutter.com; GUI, in beta, has a trial) that can now jump/cut using VAD, but it's a bit slow and lacking, and it can't be used from the CLI, which is mainly what I'm looking for. There are also cloud-based services like wisecut.video, but they aren't suitable for my use case because they're priced/limited by video length, size, etc.

It was while doing this research that I found this extension, which turned out to be pretty useful for a different use case than the one I was looking into (I have a lot of media files from which I'd like to trim the non-speech parts, but I also consume quite a bit of content online, and I'm glad to have found this extension for that).

So, having not found anything that does what I want, I'm now looking into maybe writing a script myself.
I thought I could share the resources I've found so far, to help if this ever gets implemented in this extension, which I think would be a huge and useful feature.
Sadly, almost everything I've found is in Python, so I'm not sure how well it applies to this project.

But here's what I've got so far:

https://archive.is/20220527092223/https://towardsdatascience.com/automatic-video-editing-using-python-324e5efd7eba
https://archive.is/S6a4V
https://wandb.ai/yvrjsharma/posts/reports/Video-Editing-Using-Automatic-Speech-Recognition---VmlldzoyMTY4OTQy
https://realpython.com/python-speech-recognition/
https://thegradient.pub/one-voice-detector-to-rule-them-all/

https://github.com/openai/whisper
https://github.com/snakers4/silero-vad
https://github.com/wiseman/py-webrtcvad
https://github.com/Picovoice/cobra
https://github.com/alibaba-damo-academy/FunASR

Edit:
Adding some links which seem more suitable for this extension:
https://github.com/ccoreilly/vosk-browser
https://github.com/wiseman/py-webrtcvad

(Silero VAD seems to be the best model to use)
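To make the idea a bit more concrete, here is a minimal sketch of the step that would come after the VAD itself: turning per-frame speech/non-speech decisions into the time spans to keep. It does not call any of the libraries linked above. The function name `speech_segments`, the `frame_ms` and `margin_ms` parameters, and the sample input are all made up for illustration, and the defaults are not tuned values.

```python
from typing import List, Tuple


def speech_segments(
    frame_is_speech: List[bool],
    frame_ms: int = 30,
    margin_ms: int = 90,
) -> List[Tuple[float, float]]:
    """Turn per-frame VAD decisions into (start, end) spans, in seconds, to keep.

    frame_is_speech is assumed to come from some VAD run on fixed-size frames.
    frame_ms and margin_ms are illustrative defaults, not tuned values.
    """
    total_ms = len(frame_is_speech) * frame_ms
    spans: List[List[int]] = []
    for i, is_speech in enumerate(frame_is_speech):
        if not is_speech:
            continue
        # Pad each speech frame so word onsets and tails are not clipped.
        start = max(0, i * frame_ms - margin_ms)
        end = min(total_ms, (i + 1) * frame_ms + margin_ms)
        if spans and start <= spans[-1][1]:
            # Touches or overlaps the previous span: extend it.
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])
    return [(s / 1000, e / 1000) for s, e in spans]


# Hypothetical VAD output: 10 frames of 30 ms each.
frames = [False, False, True, True, False, False, False, False, False, True]
print(speech_segments(frames, frame_ms=30, margin_ms=30))
# → [(0.03, 0.15), (0.24, 0.3)]
```

Everything outside the returned spans would be what gets cut or skipped. The margin plays the same role as the padding that loudness-based cutters usually add around loud parts.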

WofWca · 2023-07-20T13:30:06Z

I haven't read through it yet, but take a look at #46 for now.

nvxos · 2023-07-20T16:57:36Z

Oh yeah, sorry, I didn't even think to search the existing issues, because of how rare it seems to be for "loudness-based" software to have this kind of feature. That's cool.
Reading through the issue you linked, I guess my post doesn't bring much to the table. Feel free to close it if you think it just duplicates that discussion.

WofWca · 2023-07-21T16:03:30Z

Thanks a lot for the links! I guess I'll close this as a duplicate, and let's continue the discussion there.

FYI the extension is modular enough in this regard, so if you have an algorithm, it shouldn't be hard to integrate it into the extension. Here's the responsible part.

Duplicate of #46.

WofWca closed this as not planned Jul 21, 2023

WofWca mentioned this issue May 6, 2024

Better Voice Activity Detection (VAD) (volume threshold) algorithm? #46

Open
