System wide STT service #161
…so that dicio / vosk is registered in the system as a speech recognition service which can be queried by other apps without any dicio UI.
- split VoskInputDevice.java into 3 parts: the dicio recognition service SttService.java using vosk, SpeechRecogServiceInputDevice.java as a more generalized input for Dicio, and VoskInputDevice.java, which handles downloading of vosk models
- added preference option to use the system-provided STT service for dicio instead of vosk
- Bugfix: Load new model when language changed
- Bugfix: Breakdown when no model is downloaded
- Implemented error message notifications for analyzing errors when in background
- Removed the audio permission requirement from the manifest declaration of the STT service, since it may cause breakdowns in the calling app instead of reporting ERROR_INSUFFICIENT_PERMISSIONS
Code structure mostly looks good to me. Thank you again for looking into this!
Before merging this we will have to make sure users understand the difference between these things:
- STT service available silently to other apps and possibly also usable by Dicio to provide instant availability (implemented in this PR)
- built-in Vosk STT run in the Dicio app process
- STT service accessible to users from the drawer to e.g. take dictation, and also usable by other apps in a non-silent way
Continuing on the discussion from: #151 (comment). I agree with you that at the moment it is best to keep Vosk integrated directly in Dicio. However, I think it would be a good idea to create a separate Gradle module (named e.g. vosk-stt-service) so that it is basically developed as a standalone project (this is also what e.g. Sapphire does). The app project will then depend directly on vosk-stt-service, and so will still be provided without the need to install two APKs. What do you think about this? Would it be too complicated to do at this point?
app/src/main/java/org/stypox/dicio/input/SpeechRecogServiceInputDevice.java
…urity in order to notify user that speech input is started from background).
Yes, would the readme file be a good place for it? Additionally, a button in the dicio settings as a shortcut to the Android settings menu for STT would be useful. But I am not sure whether it is in the same place in all Android versions. In Android 10 it seems to be within the assistant settings, but on Android 13 I couldn't find the settings menu at all (or it didn't show up; still an open TODO, see below). The Intent (= drawer = non-silent) way of requesting speech input should be intuitive, as it shows up when dicio is first installed, like other standard apps.
I haven't looked into Sapphire yet, so I am not sure whether it is what I guess it is. If it is something like a plugin, so that the apps are related to each other, this sounds generally like a very good approach. However, while looking for documentation about vosk in order to enable more RecognizerIntent extras, I just noticed that there is actually a stand-alone stt-service project by the vosk developers. At first I doubted whether it is useful at all to spend more time than necessary on this branch. However, since the other project is not easily available (neither on F-Droid nor in the Play Store, and the latest apk release fails to install) and it seems it will take some time until it is, I think at least for the moment it is definitely useful to let dicio export its vosk implementation, especially with the faster initialization for dicio in mind. But spending too much effort on "reinventing the wheel" and making a completely standalone app doesn't make sense any more in my eyes, at least as long as the further development of the vosk stand-alone app is not given up (or other features are missing; I haven't tried it yet). What do you think? |
Current state of this PR:
Extended dicio features
Limitations
|
@nebkrid could you provide an apk? I would like to test your PR as a user. |
@lman0 I just compiled it to test it, here is an apk: app-debug.zip
It persists on regular app updates, as long as you don't change the names of the pertaining classes. |
Here is my test result: it doesn't show up as a voice IME for other keyboards (tested with openboard and florisboard, both found on F-Droid, and with the AOSP keyboard as well). Is that normal? The microphone remains activated all the time. |
@cvzi thank you for compiling and for your hints and answers! @lman0 thank you for testing!
What this PR implements is not explicitly an IME; it registers a speech recognition provider which can then be used by all other apps (not just edit text fields), including e.g. IMEs. At least florisboard (linkToCode) searches explicitly for an IME which supports voice (and I guess the others do it the same way). Supporting IME requests explicitly can be discussed too, but I think it would be better (and easier) for a keyboard app to request the system STT service (which can then be set to vosk/dicio) than for a speech assistant app to implement all the code and requirements to serve as a keyboard.
Yes, you are right. The microphone should not be occupied all the time. Though it does not happen on my device, I have an idea what it might be. Maybe you can check how this apk behaves? |
I think that, since even the AOSP keyboard searches for a voice IME, all keyboards will follow the same trend and look for an IME for voice recognition. If you know some (except dicio), I would be interested to know some app names. About the new app you provided: first, I maybe used too strong a word, namely 'takeover'. It's the same: the microphone notification remains shown all the time, until dicio is removed and the device rebooted. Is it the same with Google? |
There is the kõnele app, found on F-Droid, which provides an IME backed by kõnele-service, which calls an online STT. |
@nebkrid if the kõnele app is installed and dicio is the only one selected as STT service inside the kõnele settings, then interestingly the microphone notification stops showing when the kõnele app stops the call to the STT service. (But the recognition still works if speech-to-text is done again.) Otherwise, if I try to open the dicio settings from inside the kõnele settings, it says that dicio blocks the intent. The combination of the kõnele app with the dicio STT makes the microphone button usable with other IMEs. |
PS: if you search for the kõnele app inside the F-Droid app, you must type kõnele with the õ, otherwise you will not find it. |
In case this PR becomes a standalone app as discussed above, I do agree that this would be better for compatibility. However, for the moment, especially with konele as a working IME with STT (thank you for pointing to this app!), I would prefer to stay focused on the STT. Additionally, I still believe it would be an even more user-friendly option for other keyboard apps to directly query the STT service, as this does not require changing the keyboard UI. That this is not done yet is probably because there are actually no other speech recognition services easily available besides the Google one, so that the STT service part is not well known, or at least offers no benefit at first sight. But it might be an idea to open such a feature request issue in these apps.
Most apps use the intent approach, which is already implemented in dicio, as this does not need the microphone permission. But when I once searched the Play Store for STT service apps, I saw some dictation apps which in the end used the Google background STT service. And since this function is generally available in Android, I think more apps will come with time. Personally, I am interested in STT for automation (like in #154).
You could still try this app-debug.zip, which shows two types of Toast messages when the microphone should be released, to make sure that the methods are called on your device. But if the toasts are showing, and since it seems only your device has this error (how is it actually with the konele app as STT, with its own recognition service, on your device?), I don't know what could be the reason. |
The kõnele STT service doesn't have this notification always on. Even if I kill dicio, the notification remains. But if I use the kõnele app/IME to call the dicio STT service, then the notification disappears once I do speech recognition and stop the kõnele listening. |
@nebkrid there is a bug with dicio: if I disable WiFi/4G (i.e. offline), dicio can't evaluate the STT content that comes from the STT service (instead of the internal vosk) and shows a 'network error'. |
Please double-check which STT service is set as the default one in your system settings. Killing the dicio process causes this to be set back to a different one. The network error means that a different STT than the dicio one is used, since the dicio STT never returns a network error. (I guess in your setup the konele STT via network is requested.)
Are both toasts, "stop recognizer" and "shutdown", showing up?
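To tell which kind of error a recognizer reported, it can help to log the error code by name. A minimal sketch, assuming the integer values documented for android.speech.SpeechRecognizer (copied here as literals so that the snippet runs without the Android SDK); the class SttErrorNames and its methods are hypothetical and not part of Dicio or this PR:

```java
import java.util.Map;

// Hypothetical helper for illustration only, not code from this PR.
final class SttErrorNames {
    // Error codes as documented for android.speech.SpeechRecognizer,
    // written as literals (an assumption made to avoid the Android dependency).
    private static final Map<Integer, String> NAMES = Map.of(
            1, "ERROR_NETWORK_TIMEOUT",
            2, "ERROR_NETWORK",
            3, "ERROR_AUDIO",
            4, "ERROR_SERVER",
            5, "ERROR_CLIENT",
            6, "ERROR_SPEECH_TIMEOUT",
            7, "ERROR_NO_MATCH",
            8, "ERROR_RECOGNIZER_BUSY",
            9, "ERROR_INSUFFICIENT_PERMISSIONS");

    private SttErrorNames() {}

    // Returns the constant name for a code, or a placeholder for unknown codes.
    static String describe(int errorCode) {
        return NAMES.getOrDefault(errorCode, "UNKNOWN(" + errorCode + ")");
    }

    // True for the two network-related codes. An offline recognizer such as
    // the one in this PR is not expected to report these.
    static boolean isNetworkError(int errorCode) {
        return errorCode == 1 || errorCode == 2;
    }
}
```

Calling `SttErrorNames.describe(error)` from a listener's error callback and writing the result to the log would show whether a network-related code was returned, which points to a different STT service than the dicio one being active.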
|
You are right about the bug: when I uninstalled both dicio and kõnele and rebooted. Then when I selected the STT source inside dicio as Android STT (and closed it to make sure it uses Android STT). It was tricky. @nebkrid it seems that the reason kõnele is split into STT and IME was to not be impacted if the app is stopped. By the way, if I use the internal vosk instead of the external STT, the toasts still show that the speech recognizer was stopped and then shut down. |
@nebkrid Both toasts show up when kõnele uses the dicio STT and stops listening. I think it may be linked to the fact that dicio automatically starts to listen (use the microphone) when started. |
The recognizer stop, yes; the shutdown was something I only added for your issue with the microphone symbol that keeps showing, and it is not necessary for me to disable the system microphone symbol. (This must be gray after step 2 above. Otherwise the dicio STT process is still running in the background without UI and is reused as soon as the UI is loaded again.) |
When I use the audio recorder, the microphone stops showing once I stop the recording. https://stackoverflow.com/questions/14252400/how-to-stop-recording-in-android By the way, when dicio is closed within app info, with internal vosk set before As expected, closing dicio deselects the dicio setting from the STT selection, but that is already known
I understand your issue and agree that it is annoying for you that it still seems to listen. However, if the test with the 4 steps described above does not help, I have no idea left what causes the microphone symbol to keep showing. Technically, with this test there should be no difference left between konele finishing its use of the dicio STT service and dicio using it. And the toasts confirm that the methods for releasing and stopping the microphone are called. |
@nebkrid I understand, it's OK, since it seems I am the only one who has this situation, and it's somewhat cosmetic for some. It's better to have other problems resolved first. The more problematic one is the reset of the selection of dicio. For instance, I found that konele itself, like dicio, also has an internal/offline STT. |
But is this actually a real problem in daily use? To me it seems that this is only a problem while developing (because of the reinstalls).
Yes, this is because you started the background service when using konele with dicio. It seems that you are technically interested in this whole system and have also done a lot of comparing. Because you once wrote "test as a user", I don't know whether you have programming skills, but even without any I think it would be interesting for you to read some material about the Android system and how apps interact. Just search for things like "android developer xxx" and you will be pointed to the well-explained Android developer resources. (E.g. this one; just don't be confused by the referenced classes. Keep reading, follow the linked classes and read their introductions, and with time it will make sense ;) ). |
In daily use, in my case, I try to swipe away recent apps that I don't use, and since the green icon is always up, I try to close dicio whenever I can. I said 'test as a user' because the people who usually respond to such pull requests are devs, so I wanted to express that I'm not. And I compared with konele because it is the app I use every day and since I find your pull request interesting. I tried to express what I found and what I searched for in their issues. |
What's the hold up on getting this merged? |
@hexedsilicon sorry for my late reply. Haven't been on github for a while. |
Sorry for taking so long to actually take a look at this PR. In the meantime I did a complete refactor, so this PR is not applicable to the current code anymore. However, now the SttInputDevice is much simpler to interact with, so recreating this PR was simple: #227. Thank you everyone for your efforts! I will keep this PR open as a reminder to implement the other things this PR provides, namely the possibility to use the system STT service in Dicio, and the sound played when starting to listen. @nebkrid could you review #227? |
Great news that you now want to include this feature. I was actually using two apps in parallel within the last year - one from this PR as a stt service provider, and as a second one your current main branch with updated and new features. |
This is still in development, but the pull request is already opened for testing and reviewing purpose.