Feature request: Wyoming API #4

Closed
ser opened this issue Feb 15, 2024 · 6 comments
Comments

@ser

ser commented Feb 15, 2024

It would be cool if the extension could communicate with a local Faster Whisper instance via the Wyoming protocol API:

https://github.com/rhasspy/wyoming-faster-whisper

The advantage is that voice recognition could work on cheap GNOME clients, with one more capable machine on the local network doing the transcription.
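
For readers unfamiliar with the protocol, here is a rough sketch of how a client might frame audio for a Wyoming server. It assumes the event layout described in the rhasspy/wyoming README: a one-line JSON header with `type`, optional `data`, and `payload_length`, followed by the raw payload bytes. The helper names `encode_event` and `audio_events` are made up for illustration, and the default audio format (16 kHz, 16-bit, mono) is an assumption about what the server expects.

```python
import json


def encode_event(event_type, data=None, payload=b""):
    """Serialize one event: a JSON header line, then optional raw payload bytes."""
    header = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def audio_events(pcm, rate=16000, width=2, channels=1, chunk_bytes=2048):
    """Yield audio-start, one audio-chunk per slice of PCM, then audio-stop."""
    fmt = {"rate": rate, "width": width, "channels": channels}
    yield encode_event("audio-start", fmt)
    for offset in range(0, len(pcm), chunk_bytes):
        yield encode_event("audio-chunk", fmt, pcm[offset:offset + chunk_bytes])
    yield encode_event("audio-stop")
```

A client would write these byte strings to a TCP socket connected to the server and then read events back until one of type `transcript` arrives. That part is omitted here because it depends on the server setup.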

@QuantiusBenignus
Owner

Hi @ser, thanks for the suggestion.
The idea is nice, but it seems that this is somewhat of an edge use case.
Hardware like the Raspberry Pi can run GNOME, but such boards are often run headless, with no mouse, keyboard, etc.
And while I have a mic hat on mine, they typically do not have a built-in microphone.
I looked at the protocol and, while I understand why it is being used, I think there are other, leaner options for blasting audio data over a LAN to a (server or not) instance of [faster]whisper[.cpp] and then getting the text result back. Maybe RTP or another low-latency, lightweight approach.

Since the idea behind this very simple extension is to remain such, I would rather not add features that, IMHO, will see limited use.
Still, I will keep this issue open for some time and put some thought into a possible lightweight solution.

Actually, for this use case, I would recommend starting from something like cliblurt which uses minimal resources (GUI is optional) and is not GNOME only (should work under XFCE4 for example.)

@ser
Author

ser commented Feb 17, 2024

It is not only Pis: 90% of my computers, mostly older PCs and laptops, are unable to handle speech recognition in a sensible time. So, in other words, do you plan to add any API, or have you decided to keep everything local?

BTW, this local stack is very complex, to be honest. Making use of a server-client architecture would simplify things a lot, even on the same machine.

@QuantiusBenignus
Owner

Valid points. It may not be such an edge case after all.
I am going to create an option to choose between a local whisper.cpp and sending the audio data to a server for transcription.
This will be a call to a whisper.cpp server simply because the data-transfer format is simpler.

If you would like, you can then use that as a base to craft an appropriate "multipart/form-data" curl request to conform to the Wyoming protocol and call the referenced faster-whisper server.

Setting up this little hack will likely remain complex, since it is not a monolithic app but rather uses the built-in tools and flexibility of the Linux system. An installation script may help automate things a bit; we will see.

@QuantiusBenignus
Owner

Hi @ser, the extension can now be set up to transcribe over the network using a whisper.cpp server.
Please, see here for details.
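
As an illustration of what such a network call might look like, here is a minimal sketch that posts a WAV file to a whisper.cpp server. It assumes the `/inference` endpoint and the `file`, `temperature`, and `response_format` form fields shown in the whisper.cpp server example README, and that the JSON response carries the transcript under a `text` key. The host, port, and the helper names `build_multipart` and `transcribe` are placeholders, not the extension's actual code (which uses shell tools).

```python
import json
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path, url="http://127.0.0.1:8080/inference"):
    """Send a WAV file to the (assumed) server URL and return the transcript."""
    with open(wav_path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"temperature": "0.0", "response_format": "json"}, "file", "audio.wav", audio
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["text"]
```

Calling `transcribe("recording.wav", url="http://<server>:<port>/inference")` from a low-powered client would hand the heavy lifting to the more capable machine, which is the scenario discussed above.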

Talking to a faster-whisper server should be possible to implement in a similar fashion.
With a lot more work, this could of course all be written in GJS to run inside GNOME Shell, but it would waste a lot more CPU cycles and memory. The command-line shell remains unbeatable for speed and flexibility.

@ser
Author

ser commented Feb 18, 2024

Fantastic! I am now investigating how many resources a whisper.cpp server would take in addition to my current Faster Whisper setup.

ser closed this as completed Feb 18, 2024
@ser
Author

ser commented Feb 24, 2024

So in the end I decided to write a Wyoming server that uses the Whisper API, to avoid the need to run two STT services: https://github.com/ser/wyoming-whisper-api-client
