
Implementing Speech-To-Text Service for other Apps #100

Closed
wants to merge 3 commits

Conversation

@nebkrid (Contributor) commented Nov 20, 2022

Implemented export of the Speech-To-Text functionality to other apps, which can invoke it via "startActivityForResult" with an "Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)".

The extra "RecognizerIntent.EXTRA_PROMPT" is also supported.
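For reference, a minimal sketch of how a client app might invoke the exported recognizer through the standard Android API (this client-side code is not part of the PR; the class name and request code are assumptions, while the `RecognizerIntent` constants are real Android API):

```java
import android.app.Activity;
import android.content.Intent;
import android.speech.RecognizerIntent;
import java.util.ArrayList;

public class ClientActivity extends Activity {
    private static final int STT_REQUEST_CODE = 42; // arbitrary request code

    void startSpeechRecognition() {
        final Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        // optional prompt shown by the recognizer UI; this PR adds support for it
        intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say something");
        startActivityForResult(intent, STT_REQUEST_CODE);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == STT_REQUEST_CODE && resultCode == RESULT_OK && data != null) {
            // recognized utterances, best match first
            final ArrayList<String> results =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            // ... use results ...
        }
    }
}
```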

@Stypox (Owner) commented Nov 21, 2022

Thank you!

  • There are some checkstyle issues; in theory, Android Studio should have reported the errors to you when building the app.
  • I don't think you needed to create a skill, since skills interpret user sentences and react by providing some interesting output, whereas what you are doing here is just recognizing speech. But I get that it was simpler to connect to the already-implemented Vosk STT this way. For now this is OK, but once everything works well this part should be moved to a separate activity.
  • Were you able to test whether these changes work? Can you suggest a specific app or keyboard that would allow the RecognizerIntent.ACTION_RECOGNIZE_SPEECH intent to be tested?

@nebkrid (Contributor, Author) commented Nov 23, 2022

Thanks for your feedback.

  • checkstyle: indeed I turned it off, because I had the same issue with the unchanged code, as described here. When I checked the config, the SuppressionSingleFilter was already added (of course, at that moment I realised who had asked the question ;) ). So it seems the solution does not work in general. Do you perhaps have any news that is not posted there? Manually turning it on and off would be really annoying...
  • Yes, I agree that this should be moved to a separate activity in the future. Then it may even become possible to avoid reloading the voice model every time.
  • Test apps: yes and no (at first). I first tested it with my own app, which was what triggered me to look for this feature. It already worked with the Google speech recognition, and since it also worked with my changes I assumed it would work universally. However, when you asked for an example I tried some more apps, and the results were mixed. I now understand that there are two result mechanisms that need to be served, and I have now implemented the second one. With today's changes it also works with other apps: I tested with openHAB (but I don't know whether you can test that without an openHAB server running) and with Automate (using an "App decision?" block with the "android.speech.action.RECOGNIZE_SPEECH" action; something like this should also work for Tasker without an explicit Tasker plugin being available, I guess).
    Generally, there are more extras that can be passed with the intent. I think I will implement the easy ones (where I can expect them to work as described) within the next days, too. At least there shouldn't currently be a breaking one any more.
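The two result mechanisms mentioned above can be sketched roughly as follows on the recognizer side (a hypothetical sketch, not the PR's actual code; the `RecognizerIntent` and `PendingIntent` APIs are real Android API, the method and class names are assumptions):

```java
import android.app.Activity;
import android.app.PendingIntent;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import java.util.ArrayList;

public final class SpeechResultSender {
    static void returnResults(final Activity activity, final ArrayList<String> utterances)
            throws PendingIntent.CanceledException {
        final Intent resultIntent = new Intent();
        resultIntent.putStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS, utterances);

        // Mechanism 1: the classic activity result, delivered to the caller's
        // onActivityResult()
        activity.setResult(Activity.RESULT_OK, resultIntent);

        // Mechanism 2: some apps instead pass a PendingIntent that has to be
        // fired with the results attached
        final PendingIntent pendingIntent = activity.getIntent()
                .getParcelableExtra(RecognizerIntent.EXTRA_RESULTS_PENDINGINTENT);
        if (pendingIntent != null) {
            final Bundle bundle = activity.getIntent()
                    .getBundleExtra(RecognizerIntent.EXTRA_RESULTS_PENDINGINTENT_BUNDLE);
            if (bundle != null) {
                // the caller may ask for this bundle to be echoed back
                resultIntent.putExtras(bundle);
            }
            pendingIntent.send(activity, Activity.RESULT_OK, resultIntent);
        }

        activity.finish();
    }
}
```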

@nebkrid (Contributor, Author) commented Dec 8, 2022

Hi,
I manually reactivated checkstyle and made the changes (there were still 3 issues left that I couldn't solve; I don't know whether they even show up in your configuration).
Additionally, as working app examples: Google Maps and eBay use android.speech.action.RECOGNIZE_SPEECH in their search fields, and Dicio works with both. (When first requested, the Android system shows a popup to choose between Google speech input and Dicio, as for other default apps.)

PS: I left two "TODO" annotations. They are not really relevant, since none of the apps I tested use these extra parameters (and even if they were used, they are only an additional hint, not required for the speech recognition). However, Android designed them to be available, so if there is a way to pass these extras to the Vosk speech engine, it may be helpful to add it. I therefore left them as a reminder so that this does not have to be researched again. If they clutter the code, I can remove them.

@Stypox (Owner) commented Dec 13, 2022

@nebkrid thank you for the research! I really appreciated it. I opened #109 based on your implementation, but instead of creating a skill like you did, I created a separate stt activity that can popup on top of apps. I would like you to take a look at #109 and tell me whether it's fine, if you have some time. Thanks :-)

@nebkrid (Contributor, Author) commented Dec 13, 2022

Thanks for your feedback; indeed, the separate activity is much nicer. Luckily I already had time today to look at it. I added two small things, and if I correctly figured out how to collaborate on GitHub, these should now show up in your #109 (otherwise please let me know how best to merge this :) ).
