feat: jump based on "Voice Activity Detection"/"Speech Recognition" #164

Closed
nvxos opened this issue Jul 20, 2023 · 3 comments

Comments

@nvxos

nvxos commented Jul 20, 2023

I know this is no small feat, but in my opinion it would be insanely good if the extension could jump/cut based on VAD (Voice Activity Detection) or speech recognition.

I did some research on the matter, mainly looking for an editor that could cut the parts of a video that don't contain speech. For example, there's the new paid jumpcutter from carykh (jumpcutter.com; GUI, in beta, with a trial) that can now jump/cut using VAD, but it's a bit slow and lacking, and it can't be used from the CLI, which is mainly what I'm looking for. There are also cloud-based services like wisecut.video, but they aren't suitable for my use case because they are priced/limited by video time, size, etc.

It was while doing this research that I found this extension, which turned out to be pretty useful for a different use case than the one I was looking into (I have a lot of media files whose non-speech parts I would like to trim, but I also consume quite a bit of content online, and I'm glad to have found this extension for that).

Since I haven't found anything that does what I want, I'm now looking into writing a script to do it myself.
So I thought I could share the resources I've found so far, in case they help if this ever gets implemented in the extension, which I think would be a huge and useful feature.
Sadly, almost everything I've found is in Python, so I'm not sure how well it applies to this project.

But here's what I've got so far:

https://archive.is/20220527092223/https://towardsdatascience.com/automatic-video-editing-using-python-324e5efd7eba
https://archive.is/S6a4V
https://wandb.ai/yvrjsharma/posts/reports/Video-Editing-Using-Automatic-Speech-Recognition---VmlldzoyMTY4OTQy
https://realpython.com/python-speech-recognition/
https://thegradient.pub/one-voice-detector-to-rule-them-all/

https://github.com/openai/whisper
https://github.com/snakers4/silero-vad
https://github.com/wiseman/py-webrtcvad
https://github.com/Picovoice/cobra
https://github.com/alibaba-damo-academy/FunASR

Edit:
Adding some links which seem more suitable for this extension:
https://github.com/ccoreilly/vosk-browser
https://github.com/wiseman/py-webrtcvad

(Silero VAD seems to be the best model to use)
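
Whichever of the detectors linked above is used, a script that cuts non-speech parts still needs a step that turns per-frame speech/non-speech decisions into time ranges to keep. The following is only a minimal sketch of that step, not code from the extension or from any of the linked libraries. The function name `speech_segments` and the parameters `frame_ms`, `pad_ms`, and `min_gap_ms` are hypothetical, and the input is assumed to be one boolean per fixed-length audio frame, produced by some VAD.

```python
from typing import Iterable, List, Tuple


def speech_segments(
    flags: Iterable[bool],
    frame_ms: int = 30,
    pad_ms: int = 150,
    min_gap_ms: int = 300,
) -> List[Tuple[int, int]]:
    """Convert per-frame VAD flags into (start_ms, end_ms) ranges to keep.

    All names and defaults here are illustrative assumptions.
    """
    flags = list(flags)
    total_ms = len(flags) * frame_ms

    # 1. Collect raw runs of consecutive speech frames.
    runs: List[Tuple[int, int]] = []
    run_start = None
    for i, is_speech in enumerate(flags):
        if is_speech and run_start is None:
            run_start = i
        elif not is_speech and run_start is not None:
            runs.append((run_start * frame_ms, i * frame_ms))
            run_start = None
    if run_start is not None:
        runs.append((run_start * frame_ms, total_ms))

    # 2. Pad each run so word onsets and endings are not clipped, then
    #    merge runs whose remaining gap is shorter than min_gap_ms.
    merged: List[Tuple[int, int]] = []
    for start, end in runs:
        start = max(0, start - pad_ms)
        end = min(total_ms, end + pad_ms)
        if merged and start - merged[-1][1] < min_gap_ms:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # 50 frames of 30 ms each: two bursts of speech separated by a short pause.
    demo = [False] * 10 + [True] * 10 + [False] * 5 + [True] * 5 + [False] * 20
    print(speech_segments(demo, frame_ms=30, pad_ms=60, min_gap_ms=300))
    # [(240, 960)]
```

Everything outside the returned ranges would be skipped or sped up. The padding and minimum-gap values are tuning knobs rather than recommended settings.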

@WofWca
Owner

WofWca commented Jul 20, 2023

I haven't read through this yet, but take a look at #46 for now.

@nvxos
Author

nvxos commented Jul 20, 2023

Oh yeah, sorry, I didn't even think to search the existing issues, because it seems so rare for "loudness-based" software to have this kind of feature. That's cool.
Reading through the issue you linked, I guess my post doesn't bring much to the table. Feel free to close it if you think it just duplicates that discussion.

@WofWca
Owner

WofWca commented Jul 21, 2023

Thanks a lot for the links! I guess I'll close this as a duplicate, and let's continue the discussion there.

FYI the extension is modular enough in this regard, so if you have an algorithm, it shouldn't be hard to integrate it into the extension. Here's the responsible part.

Duplicate of #46.

WofWca closed this as not planned on Jul 21, 2023