System wide STT service #161

Open
wants to merge 6 commits into
base: master
Conversation

nebkrid
Contributor

@nebkrid nebkrid commented Feb 1, 2023

  • Makes the speech-to-text service available system-wide, so that other apps can request it via SpeechRecognizer without a dicio popup, provided the user selects dicio as the default STT service in the system settings.
  • Adds the possibility for dicio to use an STT service other than vosk (according to the system settings).

This is still in development, but the pull request is already open for testing and review.

…so that dicio / vosk is registered in system as speech recognition service which can be queried by other apps without any dicio UI.

- split VoskInputDevice.java into 3 parts: the dicio recognition service SttService.java using vosk, SpeechRecogServiceInputDevice.java as a more general input device for dicio, and VoskInputDevice.java, which handles downloading vosk models
- added preference option to use the system-provided STT service for dicio instead of vosk
- Bugfix: load the new model when the language is changed
- Bugfix: crash when no model is downloaded
- Implemented error message notifications for analyzing errors when running in the background
- Removed the audio permission requirement from the manifest declaration of the STT service, since it may cause crashes in the calling app instead of reporting ERROR_INSUFFICIENT_PERMISSIONS
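The last point can be illustrated with a minimal sketch, which is not the code from this PR. It assumes the service checks the permission itself at the start of a request and reports the error through the caller's callback. `PermissionGate`, `ErrorCallback` and the permission supplier are hypothetical stand-ins for the Android classes; only the numeric value 9 is taken from `SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS`.

```java
import java.util.function.BooleanSupplier;

final class PermissionGate {
    /** Hypothetical stand-in for the listener a calling app hands to the STT service. */
    interface ErrorCallback {
        void onError(int errorCode);
    }

    // Same numeric value as android.speech.SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS.
    static final int ERROR_INSUFFICIENT_PERMISSIONS = 9;

    /**
     * Checks the permission at request time and reports a missing permission
     * to the caller as an error code. Returns true only if listening may start.
     */
    static boolean startListening(BooleanSupplier hasRecordAudioPermission,
                                  ErrorCallback callback) {
        if (!hasRecordAudioPermission.getAsBoolean()) {
            callback.onError(ERROR_INSUFFICIENT_PERMISSIONS);
            return false;
        }
        // A real service would open the microphone and start the recognizer here.
        return true;
    }

    public static void main(String[] args) {
        // Simulates a caller without RECORD_AUDIO permission.
        startListening(() -> false, code -> System.out.println("onError(" + code + ")"));
    }
}
```

With this shape, the calling app receives a regular error callback that it can handle, rather than failing when it binds to the service.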
Owner

@Stypox Stypox left a comment

Code structure mostly looks good to me. Thank you again for looking into this!

Before merging this we will have to make sure users understand the difference between these things:

  • STT service available silently to other apps and possibly also usable by Dicio to provide instant availability (implemented in this PR)
  • built-in Vosk STT run in the Dicio app process
  • STT service accessible to users from the drawer to e.g. take dictation, and also usable by other apps in a non-silent way

Continuing on the discussion from: #151 (comment) . I agree with you that at the moment it is best to keep Vosk integrated directly in Dicio. However, I think it would be a good idea to create a separate gradle module (named e.g. vosk-stt-service) so that it is basically developed as a standalone project (this is also what e.g. Sapphire does). The app project will then depend directly on vosk-stt-service, and so will still be provided without the need to install two APKs. What do you think about this? Would it be too complicated to do at this point?

…urity in order to notify user that speech input is started from background).
@nebkrid
Contributor Author

nebkrid commented Feb 7, 2023

Before merging this we will have to make sure users understand the difference between these things

Yes, would the readme file be a good place for it? Additionally, a button in the dicio settings as a shortcut to the Android settings menu for STT would be useful. But I am not sure whether it is in the same place in all Android versions. In Android 10 it seems to be within the assistant settings, but on Android 13 I couldn't find the settings menu at all (or it didn't show up; still an open TODO, see below). The Intent (= drawer = non-silent) way of requesting speech input should be intuitive, as it shows up when dicio is first installed, like for other standard apps.

... separate gradle module... The app project will then depend directly on vosk-stt-service, and so will still be provided without the need to install two APKs. What do you think about this? Would it be too complicated to do at this point?

I haven't looked into Sapphire yet, and therefore I am not sure whether it is what I guess it is. If it is something like a plugin, so that the apps are related to each other, this sounds generally like a very good approach. However, while looking for documentation about vosk in order to enable more RecognizerIntent extras, I just noticed that there is actually a stand-alone stt-service project by the vosk developers. At first I doubted whether it is useful at all to spend more time than necessary on this branch. However, since the other project is not easily available (neither on f-droid nor the play store, and the latest apk release fails to install) and it seems that it will take some time until it is, I think that at least for the moment it is definitely useful to let dicio export its vosk implementation, especially with the initialization speed-up for dicio in mind. But spending too much effort on "reinventing the wheel" and making a completely standalone app doesn't make sense any more in my eyes, at least as long as the development of the vosk stand-alone app has not been given up (or other features are missing; I haven't tried it yet). What do you think?

@nebkrid
Contributor Author

nebkrid commented Feb 7, 2023

Current state of this PR:
Implemented Features in the STT service:

  • exports vosk speech recognition as an STT service to the system (needs to be tested on other devices, to see whether it is available on all android versions / devices)
  • Handles changing the language in dicio settings
  • Sound / Toast when STT starts (individually settable for each app in preferences)
  • Handles RecognizerIntent.EXTRA_MAX_RESULTS
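As a rough illustration of what handling a maximum-results extra can look like, here is a minimal sketch that is not taken from this PR. It assumes the recognizer yields a ranked list of hypotheses and that a non-positive or missing limit means "return everything"; `MaxResultsSketch` and `limitResults` are hypothetical names.

```java
import java.util.List;

final class MaxResultsSketch {
    /**
     * Returns at most maxResults hypotheses, keeping the recognizer's ranking.
     * Assumption: maxResults <= 0 means the caller did not set a limit.
     */
    static List<String> limitResults(List<String> rankedHypotheses, int maxResults) {
        if (maxResults <= 0 || maxResults >= rankedHypotheses.size()) {
            return List.copyOf(rankedHypotheses);
        }
        return List.copyOf(rankedHypotheses.subList(0, maxResults));
    }

    public static void main(String[] args) {
        List<String> hypotheses = List.of("turn on the light", "turn on the lights", "turn of the light");
        // Prints the two best-ranked hypotheses.
        System.out.println(limitResults(hypotheses, 2));
    }
}
```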

Extended dicio features

  • Speed up voice recognition initialization since model is kept loaded in background even if dicio is closed
  • sound notification preference for dicio
  • dicio preference for using the system STT service instead of vosk

Limitations

  • only one language supported (as chosen in dicio settings)
  • Even if a requesting app explicitly requests a different language, it keeps listening in the installed language (but this may be interpreted as a feature instead of a bug ;) )
  • No advanced RecognizerIntent extras
  • only the main microphone is used, even if Bluetooth earbuds with microphones are connected (device specific? no change to dicio behaviour as it was without the service)
  • reinstallation via android studio resets the system setting to the default STT service (I don't know whether this will happen in case of normal updates, too. It may also be device / android version specific. It seems to happen when the dicio process is killed by the user / an update.)
  • To keep the initialization speed-up for 3rd party app STT requests, it may be necessary to open the dicio UI at least once after each device reboot (in order to trigger the startService call, which prevents the service from being destroyed when the system speech recognizer unbinds)
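The last limitation follows from the general Android rule that a service which is only bound is destroyed once its last client unbinds, while a service that was also started stays alive until it is stopped. The following is a simplified plain-Java model of that rule, not Android code; `ServiceLifecycleSketch` and its methods are hypothetical names chosen to mirror `startService`, `bindService` and `unbindService`.

```java
final class ServiceLifecycleSketch {
    private boolean started;   // startService() was called and stopService() was not
    private int boundClients;  // number of clients currently bound
    private boolean alive;     // service instance exists (speech model stays loaded)

    void startService() { started = true; alive = true; }

    void bind() { boundClients++; alive = true; }

    void unbind() {
        if (boundClients > 0) boundClients--;
        destroyIfUnused();
    }

    void stopService() { started = false; destroyIfUnused(); }

    boolean isAlive() { return alive; }

    // The service goes away only when it is neither started nor bound.
    private void destroyIfUnused() {
        if (!started && boundClients == 0) alive = false;
    }

    public static void main(String[] args) {
        ServiceLifecycleSketch boundOnly = new ServiceLifecycleSketch();
        boundOnly.bind();    // system speech recognizer binds for a request
        boundOnly.unbind();  // request finished
        System.out.println("bound only, after unbind: alive=" + boundOnly.isAlive());

        ServiceLifecycleSketch startedToo = new ServiceLifecycleSketch();
        startedToo.startService(); // e.g. triggered by opening the dicio UI once
        startedToo.bind();
        startedToo.unbind();
        System.out.println("started and bound, after unbind: alive=" + startedToo.isAlive());
    }
}
```

In the second case the service survives the unbind, which is why the loaded model remains available for the next request.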

@lman0

lman0 commented Feb 7, 2023

@nebkrid could you provide an apk? I would like to test your PR as a user.
Thanks

@cvzi

cvzi commented Feb 7, 2023

@lman0 I just compiled it to test it, here is an apk: app-debug.zip


  • reinstallation via android studio resets system setting to default STT service (don't know whether this will happen in case of normal updates, too. May also be device/android version specific. Seems to happen when the dicio process is killed by user / an update.)

It persists on regular app updates, as long as you don't change the names of the pertaining classes.

@lman0

lman0 commented Feb 7, 2023

Here are my results from using it:
@nebkrid

It doesn't show up as a voice ime for other keyboards (tested with openboard and florisboard, both found on fdroid, and with the aosp keyboard as well). Is that normal?
I should mention that I use the French language.
And since I use aosp android 12, I don't have google play voice recognition.
Only dicio shows up as voice recognition in the settings, but I don't know how to use it.

The microphone remains activated all the time.
Unless I'm wrong, stt should not take over the microphone all the time but only when it is called by apps (like dicio, a keyboard, ...), right?

@nebkrid
Contributor Author

nebkrid commented Feb 9, 2023

@cvzi thank you for compiling and for your hints and answers!

@lman0 thank you for testing!

don't show as voice ime

What this PR implements is not explicitly an IME; it registers dicio as a speech recognition provider, which can then be used by all other apps (not just in edit text fields), including, e.g., IMEs. At least florisboard (linkToCode) searches explicitly for an IME that supports voice (and I guess the others do it the same way). Explicit support for IME requests can be discussed too, but I think it would be better (and easier) for a keyboard app to request the system STT service (which can then be set to vosk/dicio) than for a speech assistant app to implement all the code and requirements to serve as a keyboard.

Unless i'm wrong stt should not takeover the microphone all the time but only when it called by apps (like dicio, keyboard , ...),right?

Yes, you are right. The microphone should not be occupied all the time. Though it does not happen on my device, I have an idea what it might be. Maybe you can check how this apk behaves?

@lman0

lman0 commented Feb 9, 2023

I think that, since even the aosp keyboard searches for a voice ime, all keyboards will follow the same trend and look for an ime voice recognition.
So it's better to be able to respond to such requests, since it is basically the 'normal' way.
Otherwise it will make the stt service mostly unusable by keyboards.
And besides keyboards, there are, to my knowledge, very few apps that use the stt service directly.

If you know some (except dicio), I would be interested to know some app names.

About the new app you have given:

First, I maybe used too strong a word, aka 'takeover'.
To be precise, the microphone notification is always shown
since dicio has been set as the stt service, but it is still possible to use an audio recorder (the audio recorder takes the microphone just fine and records audio properly).

It's the same: the microphone notification remains shown all the time, until dicio is removed and the device rebooted.

Is it the same with google?

@lman0

lman0 commented Feb 10, 2023

There is the kõnele app, found on fdroid, that provides an ime backed by kõnele-service, which calls an online STT.
Maybe you could see how it works.

@lman0

lman0 commented Feb 10, 2023

@nebkrid if the kõnele app is installed, and dicio is the only one selected as stt service inside the kõnele settings,
then it seems to work
when pressing the microphone button inside another ime (the kõnele IME must be activated in the android settings).

Interestingly, the microphone notification stops showing when the kõnele app ends its call to the stt service. (But the recognition still works if speech-to-text is done again.)

Otherwise, if I try, inside the kõnele settings, to open the dicio settings, it says that dicio blocks the intent.
(It's about a permission denial when clicking on dicio inside the recognition service settings of the kõnele ime app.)
Maybe something to improve.

The combo of the kõnele app with dicio stt allows having a usable microphone button with other imes.

@lman0

lman0 commented Feb 10, 2023

PS: if you search for the kõnele app inside the fdroid app, you must type kõnele with the õ, otherwise you will not find it.

@nebkrid
Contributor Author

nebkrid commented Feb 10, 2023

... So it's better to create a be able to respond to such request since it basicaly the 'norm' way. ...
...The combo kõnele app with dicio stt, allow to have microphone button usable with other ime....

In case this PR becomes a standalone app as discussed above, I do agree that this would be better for compatibility. However, for the moment, especially with konele as a working IME with STT (thank you for pointing to this app!), I would prefer to stay focused on the STT. Additionally, I still believe it would be an even more user-friendly option for other keyboard apps to query the STT service directly, as this does not require changing the keyboard UI. That this is not done yet is probably because there are actually no other speech recognition services easily available besides the google one, so the STT service part is not well known, or at least offers no benefit at first sight. But it might be an idea to open such a feature request issue in these apps.

And beside keyboard, there is to my knowledge , very few apps that use directly the sst service .

Most apps use the intent approach which is already implemented in dicio, as this does not need microphone permission. But when I searched once in the play store for STT service apps I saw some dictation apps which at the end used the google background STT service. And since this function is generally available in android, I think more apps will come with time. Personally, I am interested in STT for automation (like in #154) .

First , I maybe said a word too strong, aka 'takeover' .
to be precise the microphone notifications is always shown
Since dico have been put as stt service but it still possible to use audio recorder (the audio recorder take the microphone just fine, an record properly audio)

You could still try this app-debug.zip, which shows two types of Toast messages when the microphone should be released, to make sure that the methods are called on your device. But if the toasts are showing, and since it seems only your device has this error (how does it actually behave with the konele app as STT with its own recognition service on your device?), I don't know what could be the reason.

@lman0

lman0 commented Feb 11, 2023

Kõnele stt-service doesn't have this notification always on.

Even if I kill dicio the notification remains.

But if I use the kõnele app/ime to call the dicio stt service, then the notification disappears once I do speech recognition and stop the kõnele listening.

@lman0

lman0 commented Feb 11, 2023

@nebkrid there is a bug with dicio: if I disable WiFi/4G (aka offline), dicio can't evaluate the stt content that comes from the stt service (instead of the internal vosk) and shows a 'network error'.
But kõnele with dicio stt works.
Dicio with internal vosk works with no error.
Even with all network-related skills disabled.

@nebkrid
Contributor Author

nebkrid commented Feb 11, 2023

if I disable the WiFi/4g (aka offline), dicio can't evaluate the stt content that come from the sst (instead of the internal vosk)and show an 'network error'

Please double-check which STT service is set as the default one in your system settings. Killing the dicio process causes this to be set back to a different one. The network error means that an STT other than the dicio one is used, since the dicio STT never returns a network error. (I guess in your setup the konele STT via network is requested.)
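The reasoning above can be turned into a small diagnostic helper. This is only a sketch: `SttErrorSketch` and its methods are hypothetical, and only the numeric values mirror the `SpeechRecognizer.ERROR_*` constants. The claim that an offline recognizer never reports the two network codes is the assumption stated in this comment, not something the code can verify.

```java
final class SttErrorSketch {
    /** Maps the numeric codes of android.speech.SpeechRecognizer to their constant names. */
    static String name(int code) {
        switch (code) {
            case 1: return "ERROR_NETWORK_TIMEOUT";
            case 2: return "ERROR_NETWORK";
            case 3: return "ERROR_AUDIO";
            case 4: return "ERROR_SERVER";
            case 5: return "ERROR_CLIENT";
            case 6: return "ERROR_SPEECH_TIMEOUT";
            case 7: return "ERROR_NO_MATCH";
            case 8: return "ERROR_RECOGNIZER_BUSY";
            case 9: return "ERROR_INSUFFICIENT_PERMISSIONS";
            default: return "UNKNOWN(" + code + ")";
        }
    }

    /**
     * Assumption from the discussion: an offline recognizer such as the dicio/vosk
     * service never reports these codes, so seeing one hints that a different
     * (online) STT service is currently the system default.
     */
    static boolean hintsAtOtherSttService(int code) {
        return code == 1 || code == 2;
    }

    public static void main(String[] args) {
        int code = 2; // the code a caller would receive in its error callback
        System.out.println(name(code) + ", other STT service likely: " + hintsAtOtherSttService(code));
    }
}
```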

But if use kõnele app/ime to call dicio stt service then the notification disappear once I do speech recognition and stop the kõnele listening.

Do both toasts, "stop recognizer" and "shutdown", show up?
Please additionally try the following:

  1. in dicio settings -> input and output -> input method -> choose System provided STT service
  2. force-kill the dicio process in the device settings screen for dicio (the one where you could also uninstall the app) so that the microphone notification is gone.
  3. check that dicio is set as default STT service in system settings
  4. open the dicio app and check whether the microphone notification error is gone

@lman0

lman0 commented Feb 11, 2023

You are right about the bug:
Indeed, since I had installed kõnele before dicio, when the dicio speech service was stopped, the default stt fell back to kõnele.

When I uninstalled both dicio and kõnele, rebooted,
then installed dicio,
rebooted,
then installed kõnele,
and made sure it's dicio on both (stt and ime),

then when I selected the STT source inside dicio as android stt (and closed it to make sure it uses android stt),
dicio worked correctly when offline.

It was tricky.

@nebkrid it seems that the reason kõnele is split into stt and ime was to not be impacted if the app was stopped.

By the way, if I use internal vosk instead of external stt, the toasts still show that the speech recognizer was stopped and then shut down.
Is that normal?

@lman0

lman0 commented Feb 11, 2023

@nebkrid Both toasts show up when kõnele uses dicio stt and stops listening.
And it's not a notification 'error'; near the battery icon, a microphone icon shows that an app is using the microphone.
It's disturbing because it feels like dicio listens all the time even though the stt should not.
The icon stops showing when kõnele makes the call and dicio was stopped.
The icon still shows if dicio is started and then stopped.

I think it may be linked to the fact that dicio automatically starts to listen (use the microphone) when started,
but it may not tell the system that dicio no longer uses the microphone when stopping the speech recognition, either internal or not.
I wonder if this would still be the same if dicio stt were split into another package (with another package name).

@nebkrid
Contributor Author

nebkrid commented Feb 11, 2023

By the way, if I use internal vosk instead of external stt, the toast still show that the speeche recognizer was stopped then shutdown.
Is that normal?

The recognizer stop, yes; the shutdown was something I only added for your issue with the microphone icon staying visible, and it is not necessary on my device to make the system microphone symbol disappear.
Since they show up, everything regarding the microphone is released. When set to internal vosk, the process is kept running (with released microphone but loaded speech model) in order to improve loading speed, whereas when dicio is set to Android STT it is not specified whether it will keep running, and at least on my device it is stopped very soon after dicio is closed. For your tests it is important, when changing from internal vosk to Android STT, to make sure that the dicio process is killed in the settings (not just the UI "swiped away"), because otherwise there is technically no difference to internal vosk if it was started once with it (which always happens when it is freshly installed).
[screenshot]

(This must be gray after step 2 above. Otherwise the dicio STT process is still running in background without UI and reused as soon as UI loaded again.)
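The behaviour described here (microphone released, speech model kept loaded, model reloaded only when the language changes) can be sketched in plain Java. This is an illustration under assumptions, not the implementation in this PR: `ModelHolderSketch` is a hypothetical class, and the loader function stands in for whatever actually loads a vosk model.

```java
import java.util.function.Function;

final class ModelHolderSketch<M> {
    private final Function<String, M> loader; // hypothetical: loads a model for a language
    private String loadedLanguage;
    private M model;
    private boolean microphoneOpen;

    ModelHolderSketch(Function<String, M> loader) {
        this.loader = loader;
    }

    /** Opens the microphone; loads a model only if none is loaded or the language changed. */
    M startListening(String language) {
        if (model == null || !language.equals(loadedLanguage)) {
            model = loader.apply(language);
            loadedLanguage = language;
        }
        microphoneOpen = true;
        return model;
    }

    /** Releases the microphone but keeps the model in memory for a fast next start. */
    void stopListening() {
        microphoneOpen = false;
    }

    boolean isMicrophoneOpen() { return microphoneOpen; }

    boolean isModelLoaded() { return model != null; }

    public static void main(String[] args) {
        ModelHolderSketch<String> holder = new ModelHolderSketch<>(lang -> {
            System.out.println("loading model for " + lang);
            return "model-" + lang;
        });
        holder.startListening("en"); // loads the model
        holder.stopListening();      // microphone released, model kept
        holder.startListening("en"); // no reload
        holder.startListening("fr"); // language changed, reload
    }
}
```

The sketch keeps the expensive model and the microphone as separate pieces of state, which is why stopping the recognizer does not by itself end the process.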

@lman0

lman0 commented Feb 11, 2023

@nebkrid

When I use the audio recorder, the microphone icon stops showing once I stop the recording.
But in dicio's case, it doesn't stop showing.
I think that the microphone is not released or stopped once the dicio app has finished recording the audio necessary for speech recognition.

https://stackoverflow.com/questions/14252400/how-to-stop-recording-in-android

By the way, when dicio is closed within app info, with internal vosk set before,
and I restart dicio,
the toasts remain when using internal vosk.

As expected, closing dicio deselects dicio from the stt selection, but that is already known.

@nebkrid
Contributor Author

nebkrid commented Feb 11, 2023

When i use the audio recorder, microphone stop showing once i stop the recording.
But in dicio case , it's not stopping showing.

I understand your issue and agree that it is annoying for you that it still seems to listen. However, if the test with the 4 steps described above does not help, I have no idea left what causes the microphone symbol to keep showing. Technically, with this test there should be no difference left between konele finishing its use of the dicio STT service and dicio finishing its use of it. And the toasts confirm that the methods for releasing and stopping the microphone are called.
Therefore, I am very sorry, but I really have no idea left what is different on your phone. (I tested it on three devices, and neither cvzi nor stypox has mentioned this yet.) Maybe a log output would help if it shows any specific errors, but I don't know whether this is possible from a compiled apk (@Stypox is this possible?)

@lman0

lman0 commented Feb 12, 2023

@nebkrid I understand, it's ok, since it seems I am the only one who has this situation, and it's somewhat cosmetic for some.

It's better to have other problems resolved first.

And the more problematic one is the reset of the selection of dicio.
This problem doesn't occur if dicio is not stopped.

For instance,
when I start my phone without starting dicio
and I use kõnele to use dicio, it works without any problem.
And when I checked the app info of dicio, I discovered that dicio had been started silently
(there is no GUI in the list of apps).
Moreover, in my case, the microphone icon is not there even though dicio has been started silently.

I found that konele itself, like dicio, also has an internal/offline stt.
Maybe checking the konele source code would help, since it seems not to have dicio's reset problem.

@nebkrid
Contributor Author

nebkrid commented Feb 18, 2023

And the more problematic , is the reset of the selection of dicio.
This problem don't occurs if dicio is not stopped.

But is this actually a real problem in daily use? To me it seems that this is only a problem while developing (because of the reinstalls).

And if check app info of dicio , I discovered that dicio have been started silently
(there is no GUI in the list of app).

Yes, this is because you started the background service when using konele with dicio. It seems that you are technically interested in this whole system and have also done a lot of comparing. Because you once wrote "test as a user", I don't know whether you have programming skills, but even without any I think it would be interesting for you to read a bit about the android system and how apps interact. Just search for things like "android developer xxx" and you will be pointed to the well-written android developer resources. (e.g. this one, just don't be confused by the referenced classes. Keep on reading, follow the linked classes and read their introductions, and with time it will make sense ;) ).

@lman0

lman0 commented Feb 18, 2023

But is this actually a real problem in daily use? To me it seems that this is only a problem while developing (because of the reinstalls).

In daily use, in my case, I try to swipe away recent apps that I don't use, and since the green icon is always up I try to close dicio whenever I can.
And when I update dicio, I don't want to have to check that dicio is still the default...

I said 'test as a user' because people who usually respond to such pull requests are devs, so I wanted to express that I'm not.

And I compared with konele because it is the app I use every day and since I find your pull request interesting.

I tried to express what I found and what I searched for in their issues,
so you could check with them or in their source code how they manage to do it well when it seems to still be problematic here.

@hexedsilicon

What's the hold up on getting this merged?

@nebkrid
Contributor Author

nebkrid commented Aug 19, 2023

@hexedsilicon sorry for my late reply. I haven't been on github for a while.
I am not quite sure. I stopped working on this when it was working from my perspective and with the possibilities (devices) I have for testing.
I guess (but don't know) that @Stypox wants to have separate modules for it, as he suggested in a comment above. I have long wanted to look into this but haven't had the time / motivation yet due to the device-dependent issues discussed above. Did you try it out, and can you confirm that it is working on devices other than mine, too?
From my perspective it is working and I am using it regularly. Therefore it would indeed be great to have it merged into the main branch, since even if this feature does not work perfectly on some devices, it does not change the behaviour of the existing app if it is not activated. (At least it didn't on the branch from which I created the PR.)

@Stypox
Owner

Stypox commented Jul 26, 2024

Sorry for taking so long to actually take a look at this PR. In the meantime I did a complete refactor, so this PR is not applicable to the current code anymore. However, now the SttInputDevice is much simpler to interact with, so recreating this PR was simple: #227. Thank you everyone for your efforts! I will keep this PR open as a reminder to implement the other things this PR provides, namely the possibility to use the system STT service in Dicio, and the sound played when starting to listen. @nebkrid could you review #227?

@nebkrid
Contributor Author

nebkrid commented Jul 27, 2024

Great news that you now want to include this feature. I have actually been using two apps in parallel over the last year: one from this PR as an stt service provider, and a second one from your current main branch with updated and new features.
Since I am not used to Kotlin (and it was quite a while ago, so I don't remember the detailed implementation requirements any more) I didn't review your code in detail, but I tested your sample app from #227 with the app in which I regularly use the stt feature (on an Android 13 device), and it is working! Thank you!


5 participants