This repository has been archived by the owner on Aug 30, 2020. It is now read-only.

Clarification on intent handling / Remote Server needed #217

Open
patrickjane opened this issue May 1, 2020 · 23 comments

@patrickjane

I want to use rhasspy to build a voice assistant which (for now) contains the following functionality:

  • control lights via home assistant
  • control heater via home assistant
  • get train departure information (from the internet)
  • get weather info (from the internet)

As you can see, not all of those tasks are handled by home assistant. In fact, I have existing implementations (in python) for Snips.ai for all those tasks.
To reuse them, I am developing some kind of plugin-based skill server, where I can hook on my existing code from snips (with, of course, slight modifications).

Now, regarding rhasspy, I understand that it can have several endpoints for intent handling:

  • home assistant (not viable for me as I need to handle non-HA-related intents)
  • node-red (not compatible with my existing skill implementations)
  • remote server <<< this is what I am trying to use

From the documentation I understand that rhasspy will HTTP POST any detected intent to my server. This works; I can see the intent JSON coming in at my skill server. However, it is unclear to me what kind of HTTP response is expected. From the documentation I can see it must be JSON, but I fail to find a detailed description of what this JSON should look like.

If I return an empty JSON object, rhasspy complains: TypeError: e.data.intent is undefined (I get this error as a popup in the rhasspy browser UI, not in the rhasspy log files)
The log looks like:

May  1 11:32:56 calypso rhasspy[547]: DEBUG:InboxActor: -> stopped
May  1 11:32:56 calypso rhasspy[547]: DEBUG:__main__:{"intent": {"name": "GetTemperature", "confidence": 1.0}, "entities": [], "text": "wie warm ist es gerade", "raw_text": "wie warm ist es gerade", "recognize_seconds": 0.0030146099998091813, "tokens": ["wie", "warm", "ist", "es", "gerade"], "raw_tokens": ["wie", "warm", "ist", "es", "gerade"], "wav_seconds": 0.0, "transcribe_seconds": 0.0, "speech_confidence": 1, "slots": {}, "wakeId": "", "siteId": "default", "time_sec": 0.014754056930541992}
May  1 11:32:56 calypso rhasspy[547]: DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 10.0.60.40:8888
May  1 11:32:56 calypso rhasspy[547]: DEBUG:urllib3.connectionpool:http://10.0.60.40:8888 "POST /intents HTTP/1.1" 200 2
May  1 11:32:56 calypso rhasspy[547]: DEBUG:RemoteIntentHandler:{}
May  1 11:32:56 calypso rhasspy[547]: DEBUG:InboxActor: -> stopped
May  1 11:32:56 calypso rhasspy[547]: [2020-05-01 11:32:56,159] 10.0.60.40:61297 POST /api/text-to-intent 1.1 200 2 110961
May  1 11:32:56 calypso rhasspy[547]: INFO:quart.serving:10.0.60.40:61297 POST /api/text-to-intent 1.1 200 2 110961

If I return { "intent": {}}, rhasspy complains: TypeError: e.data.time_sec is undefined (again as a popup in the rhasspy browser UI, not in the rhasspy log files)
The log looks like:

May  1 11:35:38 calypso rhasspy[547]: DEBUG:InboxActor: -> stopped
May  1 11:35:38 calypso rhasspy[547]: DEBUG:__main__:{"intent": {"name": "GetTemperature", "confidence": 1.0}, "entities": [], "text": "wie warm ist es gerade", "raw_text": "wie warm ist es gerade", "recognize_seconds": 0.0030003929987287847, "tokens": ["wie", "warm", "ist", "es", "gerade"], "raw_tokens": ["wie", "warm", "ist", "es", "gerade"], "wav_seconds": 0.0, "transcribe_seconds": 0.0, "speech_confidence": 1, "slots": {}, "wakeId": "", "siteId": "default", "time_sec": 0.01453399658203125}
May  1 11:35:38 calypso rhasspy[547]: DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 10.0.60.40:8888
May  1 11:35:38 calypso rhasspy[547]: DEBUG:urllib3.connectionpool:http://10.0.60.40:8888 "POST /intents HTTP/1.1" 200 14
May  1 11:35:38 calypso rhasspy[547]: DEBUG:RemoteIntentHandler:{'intent': {}}
May  1 11:35:38 calypso rhasspy[547]: DEBUG:InboxActor: -> stopped
May  1 11:35:38 calypso rhasspy[547]: [2020-05-01 11:35:38,979] 10.0.60.40:61320 POST /api/text-to-intent 1.1 200 13 120977
May  1 11:35:38 calypso rhasspy[547]: INFO:quart.serving:10.0.60.40:61320 POST /api/text-to-intent 1.1 200 13 120977

From the documentation I can see that, in order to have speech output, this should be returned:

{
  ...
  "speech": {
    "text": "Some text to speak."
  }
}

and in case of forwarding something (what?) to home assistant, this should be given:

{
  // rest of input JSON
  // ...
  "hass_event": {
    "event_type": "...",
    "event_data": {
      "key": "value",
      // ...
    }
  }
}

(and in this case: what is 'rest of input JSON'?)

So, long story short: what do I need to send back to rhasspy after my remote server has successfully handled some intent which was detected by rhasspy and sent to my remote server?

And what is the idea of the "forward to home assistant" feature? I mean, if my remote server shall handle the intent, why forward anything else to home assistant? Is this meant to be some kind of lightweight wrapper for the HA API, to enable the remote server to easily generate HA events in addition to its own intent handling?

@mathquis
Contributor

mathquis commented May 1, 2020

If your intents are already handled via the Snips MQTT Hermes protocol, then you won't have to do much using the next version of Rhasspy (2.5), as it is already completely compatible with the Hermes protocol. It is even going to offer Snips NLU.

For the last few months, Rhasspy has gone through an intense restructuring of its services for improved modularity based on the MQTT Hermes protocol.

I think the official release of the 2.5 version is approaching rapidly.

For more info, maybe this can help:
https://rhasspy.github.io/rhasspy-voltron/tutorials.html

@patrickjane
Author

Okay, so you're saying that the "remote server" / HTTP-based variant is going to be deprecated soon?
When reading the docs, I rather had the impression that the snips/hermes interface is merely "something rhasspy also does", while the preferred way is home assistant/node-red.

@mathquis
Contributor

mathquis commented May 1, 2020

I think the remote HTTP handler will live on as a separate Rhasspy service.

As the Hermes MQTT protocol will be used as the underlying glue between all services, it might be simpler to interface directly with it instead of relying on an additional service just to forward intents and dialogue handling messages.

If your skills already handle Snips topics, then they should be completely compatible with the next version of Rhasspy (2.5).

How did you handle your skills with Snips? Were you using snips-skill-server?

@patrickjane
Author

patrickjane commented May 1, 2020

Okay, I see. I wanted to implement both connectors anyway (HTTP + hermes), since it's not much effort.

Yes I was using the snips-skill-server previously, so basically I am trying to make a replacement for it since snips is dead.

In the current version (2.4.x) I can see there are already options for MQTT/snips/hermes. Is there going to be a bigger change to this interface in the upcoming 2.5 release?

@synesthesiam
Owner

Hi @patrickjane, the short answer is that the same intent JSON should be returned (in 2.4). My original idea was that an intent handler could alter the intent before it got passed to Home Assistant (maybe add some extra information).

Going forward in 2.5, remote HTTP handling is fully supported. Like any other Hermes-compatible service, rhasspy-remote-http-hermes listens for intents via MQTT and POSTs them to some HTTP endpoint. It only expects a JSON object back with an optional "speech" property (with a "text" sub-property).
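For 2.4, that means the simplest valid handler echoes the received intent JSON back unchanged, optionally adding a "speech" property. A minimal sketch using only the Python standard library (the /intents path, port 8888, and the reply text are illustrative, chosen to match the logs above; this is not an official Rhasspy example):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_intent(intent):
    """Echo the intent JSON back; optionally add a 'speech' property."""
    if intent.get("intent", {}).get("name") == "GetTemperature":
        # Made-up answer text; Rhasspy will speak whatever is in speech.text.
        intent["speech"] = {"text": "Es sind 21 Grad."}
    return intent

class IntentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        intent = json.loads(self.rfile.read(length))
        body = json.dumps(handle_intent(intent)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To run the server:
#   HTTPServer(("0.0.0.0", 8888), IntentHandler).serve_forever()
```

Returning the full input JSON (rather than an empty object) avoids the `e.data.intent is undefined` / `e.data.time_sec is undefined` errors described earlier, since the web UI expects those fields to be present in the response.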

If you're using NodeRED, you have many choices in 2.5 to handle intents: directly via MQTT (Hermes protocol), via WebSocket (/api/intent or through the /api/mqtt bridge), or with the "remote" HTTP intent handling system. These can all be used simultaneously as well :)

@patrickjane
Author

Okay I see. Meanwhile I've switched to MQTT, and I am receiving the intents from rhasspy via hermes/intent/#. So I've dropped the HTTP approach.

I have used node-red before with home assistant to do automations, however at some point I have dropped node-red and decided to just do everything in home assistant.
I have been a programmer for quite a long time, and I would probably always favor coding something fancy over clicking things together in node-red. I would probably never try to implement the dark-sky API within node-red. But that might be just me.

Also I liked the way snips did it, in that you could pull existing skills from their store and just plug them into your system without much effort. This is why I started working on a skill-server replacement.

@patrickjane
Author

So in 2.4 rhasspy does not yet listen on hermes/tts/say, right? It currently only pushes intents to hermes/intent/#, but there is no further MQTT interaction, I guess.
So for now I have my existing skills working with said skill server, which might look like this:

INFO:hss: Hermes Skill Server v1.0.0
INFO:hss: Copyright (c) 2020-2020 Patrick Fial
INFO:skillserver: Loading skills ...
INFO:collection: Initializing skills ...
DEBUG:collection: Registering intent 's710:getForecast' for skill 's710-weather'
DEBUG:collection: Registering intent 's710:getTemperature' for skill 's710-weather'
DEBUG:collection: Registering intent 's710:hasRain' for skill 's710-weather'
DEBUG:collection: Registering intent 's710:hasSun' for skill 's710-weather'
DEBUG:collection: Registering intent 's710:hasSnow' for skill 's710-weather'
INFO:collection: Loaded 1 skill
INFO:skillserver: Connecting to MQTT server ...
INFO:mqtt: Connected to 10.0.50.5:1883
INFO:mqtt: Publishing TTS to topic 'hermes/tts/say'
INFO:mqtt: Subscribing to topic 'hermes/intent/#' ...

DEBUG:mqtt: Received message on topic 'hermes/intent/s710:getTemperature'
INFO:controller: Handling request with skill 's710-weather/s710:getTemperature'
ERROR:controller: Skill 's710-weather' raised exception while handling intent 's710:getTemperature' ('Skill' object has no attribute 'known_intents')
ERROR:controller: Respawning skill
DEBUG:collection: Registering intent 's710:getForecast' for skill 's710-weather'
DEBUG:collection: Registering intent 's710:getTemperature' for skill 's710-weather'
DEBUG:collection: Registering intent 's710:hasRain' for skill 's710-weather'
DEBUG:collection: Registering intent 's710:hasSun' for skill 's710-weather'
DEBUG:collection: Registering intent 's710:hasSnow' for skill 's710-weather'
INFO:collection: Skill 's710-weather' respawned

DEBUG:mqtt: Received message on topic 'hermes/intent/s710:getTemperature'
INFO:controller: Handling request with skill 's710-weather/s710:getTemperature'
INFO:controller: Skill 's710-weather/s710:getTemperature' response: 'Morgen abend wird es zwischen 12 und 15 Grad warm.'
INFO:mqtt: Publishing response to 'hermes/tts/say'
DEBUG:mqtt: Response '{"sessionId": "", "siteId": "default", "text": "Morgen abend wird es zwischen 12 und 15 Grad warm.", "lang": "de_DE"}'
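The hermes/tts/say payload shown in the last log line could be built like this (a sketch; the paho-mqtt usage in the comment is illustrative and not part of hss-server):

```python
import json

def tts_say_message(text, site_id="default", session_id="", lang="de_DE"):
    """Build the topic and payload for a hermes/tts/say message,
    matching the shape seen in the log above."""
    payload = {
        "sessionId": session_id,
        "siteId": site_id,
        "text": text,
        "lang": lang,
    }
    return "hermes/tts/say", json.dumps(payload)

# Publishing it with paho-mqtt would look roughly like:
#   import paho.mqtt.client as mqtt
#   client = mqtt.Client()
#   client.connect("10.0.50.5", 1883)
#   topic, payload = tts_say_message("Morgen abend wird es zwischen 12 und 15 Grad warm.")
#   client.publish(topic, payload)
```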

If you think this project might be useful for rhasspy, I'd be happy to put it on GitHub; so far I guess it just satisfies my own personal needs for the voice assistant.

The idea of the skill server would be:

  • serve as a platform which hosts the skills
  • each skill having its own git repo, python venv and runtime
  • each skill optionally having its own configuration
  • abstract as much as possible away from the actual skill implementation
  • which boils down to your skill implementation overriding as few as two methods (namely get_intentlist and handle)

I might add a little cli tool for this to handle skill installation & setup (same as we had with snips).
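As an illustration of that idea, a skill might look like the following sketch. The BaseSkill class here is hypothetical; only the two method names, get_intentlist and handle, come from the description above, and the real hss-skill API may differ:

```python
class BaseSkill:
    """Hypothetical base class that the skill server would provide."""

    def get_intentlist(self):
        """Return the intents this skill handles."""
        raise NotImplementedError

    def handle(self, request, session_id, site_id):
        """Process one intent request and return the text to speak."""
        raise NotImplementedError

class WeatherSkill(BaseSkill):
    def get_intentlist(self):
        # Intent names taken from the log excerpt above.
        return ["s710:getTemperature", "s710:getForecast"]

    def handle(self, request, session_id, site_id):
        if request["intent"]["name"] == "s710:getTemperature":
            # Example answer; a real skill would query a weather API here.
            return "Morgen abend wird es zwischen 12 und 15 Grad warm."
        return "Das weiss ich leider nicht."
```

The server would then own the MQTT connection, dispatch incoming intents to the matching skill via get_intentlist, and publish whatever handle returns to hermes/tts/say.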

@patrickjane
Author

@synesthesiam
Okay so meanwhile I actually came up with this:

https://github.com/patrickjane/hss-server
https://github.com/patrickjane/hss-skill

I'd be happy to work on some kind of skill-platform/marketplace thingy, if you guys are up for it.

@mathquis
Contributor

mathquis commented May 7, 2020

This is pretty neat 😊👍

I’d be even better if languages outside of python could be used for skills. Like executing a command line and using stdin/stdout to communicate over a simple JSON protocol. A simple JSON/YAML at skill root level with skill properties (name, description, author, intents to handle, command to execute, etc) maybe ?

Just thinking 🤔🤗
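As a sketch of that idea (every field name here is hypothetical, not an existing format), such a skill descriptor could look like:

```json
{
  "name": "s710-weather",
  "description": "Weather forecasts and temperature answers",
  "author": "patrickjane",
  "intents": ["s710:getForecast", "s710:getTemperature"],
  "command": "./venv/bin/python main.py"
}
```

The skill server would read this file, spawn the given command, and exchange JSON intent/response messages with the process over stdin/stdout, regardless of the language the skill is written in.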

@patrickjane
Author

patrickjane commented May 7, 2020

I think we have that simple JSON protocol already, which is hermes over MQTT. Introducing a similar transport-/language-agnostic protocol on top of it might not be that useful, since you could achieve that with something like node-red already I guess.

I think it comes down a little to how the overall workflow of "skills" should be, and whether it is meant more for developers and hackers, or more for the average user.
What I mean is that with Snips/Alexa/... one would usually not install the assistant and then start programming. The average user would merely install the assistant and pick up existing skills from some kind of skill marketplace. Still, developers/hackers are the ones who can and will provide a plethora of skills, and from my point of view it would be okay (for a skill server, that is) to keep it in one technology stack.

[edit] By the way, when it comes to sharing and installing skills from other developers, one half of a skill currently cannot be easily shared, namely the sentences & slots. Is there any idea/concept to enable easy sharing of these in future versions of rhasspy?

@mathquis
Contributor

mathquis commented May 7, 2020

Fair enough ;)

It boils down to how the sentences/slots are registered and forwarded to the ASR and NLU services. Maybe @synesthesiam could provide more insights on how Rhasspy 2.5 will handle the dataset.

@philtweir

philtweir commented May 29, 2020

Just to feed back, @patrickjane (and others): I've been experimenting with your https://github.com/patrickjane/hss-server and rhasspy 2.5, and have managed to proof-of-concept (very roughly) a dialogue-based countdown timer skill and a skill for adding reactions to RocketChat's most recent Gnome notification (happy to share once tidied a little). For keeping those modular, I find your hss-skill pattern works quite well, and I can see a couple of other itches I'll likely use it to scratch. However, I'm keen to know if you or others have had any more thoughts on the trajectory.

@patrickjane
Author

patrickjane commented May 30, 2020

I have been using hss-server with rhasspy 2.4 for a while now, and indeed it does work, at least well enough to replace my existing snips-based skills.
I have a couple of skills on my account, which could serve as example implementations (here, here, here and here).

Yet, hss-server is just a first draft, and I would be happy to augment and improve it further. A few things I would like to implement, but will need help from rhasspy developers with:

  1. Support real dialogs, e.g. skills should be able to ask questions before answering/processing the intent. This is supported by the hermes protocol, and I would like to implement the skill-server part, however rhasspy would need to support this as well

  2. Provide the sentences with the skill itself, so that potential users (non-developers) don't need to create their own sentences which must then match the skill implementation. Maybe some kind of API in rhasspy for this would be great.

  3. Provide some kind of platform which lists skills made by others. Currently, hss-cli installs a skill by cloning a Git repo via URL. This could be improved by providing a small website which lists existing skills along with their name + Git URL, which would then be used by hss-cli instead. This way one could simply do hss-cli -i my-skill, and hss-cli would just look up the Git URL on said website and install it.
    It's a minor improvement, but then again every other major voice assistant has some kind of skills marketplace/repository, and I think it's a great way of making the assistant accessible to non-developers.

Some thoughts on 2):
Currently, this would be possible, since rhasspy's HTTP API contains endpoints for a) updating sentences and b) triggering training. However, I don't feel exactly good about having hss-server push sentences into rhasspy like this. It would be one option, but I'd like to hear the rhasspy developers' opinion on this first; maybe there's a smarter way.

@koenvervloesem
Contributor

Hi @patrickjane, impressive work! I have been thinking about the same functionality and implementing part of it too.

Some remarks:

  1. What specific support do you need in the skill server? Isn't this just the apps talking the Hermes protocol? I'm not sure the Hermes implementation has been tested yet with complex dialogues, but this task should be handled by rhasspy-dialogue-hermes.
  2. I proposed the same functionality in rhasspy/rhasspy-hermes#12 ("Add a way to upload a sentences.ini and retrain Rhasspy"). Maybe you can add your remarks, requirements, and ideas for the API there?
  3. This is also something I have been thinking about, but I don't think it makes sense yet to implement this, as there's no real app ecosystem yet. I know this is a bit of a chicken-and-egg problem, but I think it's better to come up first with one or more good app helper libraries and a way to securely install Rhasspy apps.

Are you active on the forum? I have discussed these and other topics here:

* [Helper library to develop Rhasspy apps in Python](https://community.rhasspy.org/t/helper-library-to-develop-rhasspy-apps-in-python/969)
* [Rhasspy apps with Hermes MQTT in Python](https://community.rhasspy.org/t/rhasspy-apps-with-hermes-mqtt-in-appdaemon/937)
* [Secure architecture for Rhasspy apps](https://community.rhasspy.org/t/secure-architecture-for-rhasspy-apps/939)

The result of my thoughts in the first forum post is a helper library for Rhasspy apps, rhasspy-hermes-app. This is just a wrapper library around rhasspy-hermes to make it as easy as possible to create Rhasspy apps. It's still a proof of concept, but already quite usable.

It seems to me that your hss-server and hss-skill are tightly coupled. I'm not sure that's the best way to go forward. Ultimately a skill server should be able to install skills developed in various languages, as @mathquis already remarked. So that's why I'm not too fond of the idea of coupling a skill server to an app library.

But even with Python alone it would be better to make the architecture more flexible. There are a couple of initiatives to create libraries to develop Rhasspy apps (@daniele-athome is also working on a proof of concept for Rhasspy Hermes apps in AppDaemon) and it would probably be better if we could share some parts of the API. Because one of Rhasspy's strong points is its flexibility in which services you can use with it, I think we should try to keep our options open for the creation and distribution of Rhasspy apps, so it's nice that there are various app platform implementations. But it would be good if we could share some resources.

Another idea I have created a proof of concept for is running each Rhasspy app in a Docker container. You can see my thoughts about it in the third forum link mentioned above. Coupled with Mosquitto's access control list and a username and password for each app, we can precisely limit what an app is able to do. My goal is to work on this idea further, because I don't like the idea of apps being able to do what they want on my machine or my network. So an alternative "skill server" could just install Docker containers this way to add Rhasspy apps.

These are just some ideas :-) Don't let this dissuade you from working on hss-server and hss-skill, I think there's still a lot to explore in this domain and having multiple implementations for Rhasspy is good.

@patrickjane
Author

patrickjane commented May 30, 2020

> Hi @patrickjane, impressive work! I have been thinking about the same functionality and implementing part of it too.
>
> Some remarks:
>
> 1. What specific support do you need in the skill server? Isn't this just the apps talking the Hermes protocol? I'm not sure the Hermes implementation has been tested yet with complex dialogues, but this task should be handled by [rhasspy-dialogue-hermes](https://github.com/rhasspy/rhasspy-dialogue-hermes/).

Well mainly rhasspy would need to support the following workflow:

  1. recognize the intent
  2. publish the intent over MQTT
  3. skill-server/skill/app receives the intent request
  4. skill-server/skill/app requests additional info by asking a question; this is sent via MQTT / hermes dialogue
  5. rhasspy should speak the question, wait for/record the user's response (do NLU & stuff) and send back the response via MQTT, reusing the session ID
  6. skill-server/skill/app must match the MQTT message to the same session so that intent processing can continue

As I am still on rhasspy 2.4, I have no clue whether or not this already works.
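For reference, step 4 maps to the Hermes hermes/dialogueManager/continueSession message. A small sketch of building that message, assuming the standard Hermes payload fields sessionId, text, and intentFilter:

```python
import json

def continue_session(session_id, question, intent_filter=None):
    """Build the topic and payload asking the dialogue manager to
    speak a question and keep the session open for the answer."""
    payload = {"sessionId": session_id, "text": question}
    if intent_filter:
        # Restrict the follow-up answer to these intents only.
        payload["intentFilter"] = intent_filter
    return "hermes/dialogueManager/continueSession", json.dumps(payload)
```

The answer then comes back as a new intent message carrying the same sessionId, which is what step 6 uses to match the response to the running dialogue.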

> 2. I proposed the same functionality in [rhasspy/rhasspy-hermes#12](https://github.com/rhasspy/rhasspy-hermes/issues/12). Maybe you can add your remarks, requirements, ideas for the API there?

Will do.

> 3. This is also something I have been thinking about, but I don't think it makes sense yet to implement this, as there's no real app ecosystem yet. I know this is a bit of a chicken-and-egg problem, but I think it's better to come up first with one or more good app helper libraries and a way to securely install Rhasspy apps.

Agreed. Everything I have written in this thread should be considered support for rhasspy, and while I've already implemented a working skill server, it should be seen as merely a first draft and subject to change. I would be happy to contribute, and I agree we should share resources.

When I first started using rhasspy, I found that there was no real intent handling in place other than publishing intents via MQTT or HTTP or to home assistant, none of which was suitable for me.

> Are you active on the forum? I have discussed these and other topics here:
>
> * [Helper library to develop Rhasspy apps in Python](https://community.rhasspy.org/t/helper-library-to-develop-rhasspy-apps-in-python/969)
> * [Rhasspy apps with Hermes MQTT in Python](https://community.rhasspy.org/t/rhasspy-apps-with-hermes-mqtt-in-appdaemon/937)
> * [Secure architecture for Rhasspy apps](https://community.rhasspy.org/t/secure-architecture-for-rhasspy-apps/939)

Nope, didn't even know a forum exists :D

> The result of my thoughts in the first forum post is a helper library for Rhasspy apps, rhasspy-hermes-app. This is just a wrapper library around rhasspy-hermes to make it as easy as possible to create Rhasspy apps. It's still a proof of concept, but already quite usable.
>
> It seems to me that your hss-server and hss-skill are tightly coupled. I'm not sure that's the best way to go forward. Ultimately a skill server should be able to install skills developed in various languages, as @mathquis already remarked. So that's why I'm not too fond of the idea of coupling a skill server to an app library.
>
> But even with Python alone it would be better to make the architecture more flexible. There are a couple of initiatives to create libraries to develop Rhasspy apps (@daniele-athome is also working on a proof of concept for Rhasspy Hermes apps in AppDaemon) and it would probably be better if we could share some parts of the API. Because one of Rhasspy's strong points is its flexibility in which services you can use with it, I think we should try to keep our options open for the creation and distribution of Rhasspy apps, so it's nice that there are various app platform implementations. But it would be good if we could share some resources.

Well, I think both approaches have their pros and cons, and as I stated earlier, it pretty much comes down to how rhasspy is meant to do intent handling. There is already the option to publish intents over MQTT, over HTTP, and to home assistant. Let's ignore the HTTP stuff; then MQTT alone already brings the language-agnostic decoupling, since it allows anyone to hook onto the MQTT messages using their favourite language. It would still work with a running hss-server, since the server would just drop unknown intents.

When I started working on hss-server (and the effort made so far is really trivial, so starting from scratch again is not an issue for me), I had the idea of resembling the snips-skill-server, which acted in pretty much the same way.

Some of the reasons for using a server approach over standalone app runtimes were:

  • Avoid intent collisions, which can hardly be achieved when independent processes are in place that don't know about each other. Although snips had a skill-server, I always found it painful when intents were delivered to multiple skills, and skills were responsible for dropping unknown intents.
  • Abstract the hermes protocol away from the skill implementation. In snips, one had to open a dedicated MQTT connection per skill, which implies several things: a) each skill must know where the MQTT server is located, and b) each skill must implement hermes message parsing (which, of course, can be abstracted away by using dedicated hermes protocol libraries). I never got the idea behind this in snips, as it was their skill system anyway, and most likely you would never run skills with different MQTT servers within one skill server (you could always just run multiple skill servers). With the skill-server approach, you can implement the MQTT connection just once and then forward the intent to the appropriate skill.
  • Reduce the complexity of developing skills. To implement a skill using hss-skill, you basically just need to implement two abstract methods, that's it. Piece of cake. Voice assistant & config parsing come for free without additional code.
  • Reduce the complexity of installing and starting skills, as this is handled by the server itself, with no other runtime managers like systemd involved. That means that once hss-server is installed (e.g. using systemd), no more steps are necessary to have a skill up and running.
  • Support the skill marketplace/ecosystem.
  • Prompt the user for configuration parameters upon installation/updating of skills (the same way snips-skill-server did).

So basically what I want to say is: if rhasspy decides to offer intent handling, but at the same time makes it possible to use any given programming language, then it gets a bit hard to bring all this together. As I said, you've got the decoupling via MQTT already; I see no real benefit of further decoupling within the skill server, only to enable developers to implement skills in other programming languages.

Especially when we're talking about installing skills from other developers and a skill marketplace/ecosystem, I think all this might get really complicated if it shall be possible to install skills written in arbitrary programming languages. Just think about all the stuff that might need to be set up: dependencies, tools, libraries. Right now, hss-cli creates a Python venv for each skill and then installs the skill's dependencies upon installation. Similar tasks would be necessary for other programming languages as well. So unless you want to decouple completely (that is, provide no rhasspy tool/script to install skills and just have the user/developer install their skills manually), this might get overcomplicated.
I think other voice assistants also don't support arbitrary programming languages for their skill development.

> Another idea I have created a proof of concept for is running each Rhasspy app in a Docker container. You can see my thoughts about it in the third forum link mentioned above. Coupled with Mosquitto's access control list and a username and password for each app, we can precisely limit what an app is able to do. My goal is to work on this idea further, because I don't like the idea of apps being able to do what they want on my machine or my network. So an alternative "skill server" could just install Docker containers this way to add Rhasspy apps.

Then you would need some sort of protocol between the skill server and the docker container. And this would essentially mean you're back to square one, as hermes is that protocol already. So either the docker containers which contain the skills need to implement the hermes protocol again, or we're talking about a second, non-standard protocol.

However, maybe the skill server could support two kinds of skills, docker-based and python-based, where docker-based skills would have no interaction with the skill server at all.

But even then, the user would need to configure MQTT connection parameters for every docker container, which is pretty much what I wanted to avoid in the first place.

> These are just some ideas :-) Don't let this dissuade you from working on hss-server and hss-skill, I think there's still a lot to explore in this domain and having multiple implementations for Rhasspy is good.

No worries, all I want is to contribute to rhasspy's functionality. Although it is named hss-server, for me it's more like a rhasspy-skill-server.
I am a professional C++/Node.js/Python developer; if you want me to contribute, just tell me what to implement and I'll go for it ;)

BTW: you're using the term "app" for what I consider "skill", so in the above just read "skill" as "app" :)

[edit] So maybe as some kind of rough requirements list for intent handling:

  • interface between rhasspy and the skill shall use the hermes protocol (independent from whether a skill server is used or not)
  • skills shall provide sentences
  • upon installation, sentences shall be supplied to rhasspy, training shall be triggered
  • there should be a way to easily install a skill (by script, cli, ...)
  • when installing a skill, the user should be prompted for its configuration parameters, so that they do not have to manually edit config files after installation (optional)
  • rhasspy should be aware of available skills / intents (optional)
  • when installing a skill which provides the same intents as another already installed skill, the user should receive an error message
  • skills should only receive intent messages for intents they are handling (optional)
  • interaction with a skill marketplace shall allow easy skill lookup and installation (optional)

(to be continued)

@koenvervloesem
Contributor

> Well mainly rhasspy would need to support the following workflow:

This should already work on the Rhasspy 2.5 pre-release :-) Have a look at Rhasspy Voltron. That's why I was a bit puzzled why you would need a skill server for this.

@philtweir

I can confirm that this flow seems to work for me with hss-server and Rhasspy Voltron. My understanding (from earlier in the issue) was that that was where you were targeting, @patrickjane?

> Support real dialogs, e.g. skills should be able to ask questions before answering/processing the intent. This is supported by the hermes protocol, and I would like to implement the skill-server part, however rhasspy would need to support this as well

I'm not sure if this is quite what you mean, but I have added a couple of small tweaks to my local versions of hss-server and hss-skill: a BaseSkill.next(...) with the same args as BaseSkill.done(...), which sends a hermes/dialogueManager/continueSession instead of the ttsTopic, with an intent filter of only the current intent. The answer then comes back into the skill's handle method, and an if-else switch on the text gives separate flows for the original and follow-up commands.

Working example:

Me: "jarvis set a timer"
Skill: "How long for" [implemented as return self.next(..."How long for") in hss-skill]
Me: "one minute"
Skill: "Timer set for 60 seconds" [self.done]
...
Skill: "Your timer is up" [separate thread in skill, repeats every 5s]
Me: "jarvis cancel timer"
Skill: "Timer cancelled"

Rhasspy seems to implement that fine, and as far as I understand, the dialogue state handling parameters from the Hermes protocol are implemented (though I haven't tried them). I have tested the "no-matching-intent" dialogue event too, and that can be picked up for conversational-response misses. The main downside is that, as Rhasspy still has to match the intent (even if filtered), the possible responses must be sentences for that intent, just as the original command is.

I do recall seeing a suggestion in the forums of modally switching the STT for follow-ups, which would be nice. But even just some way of making an intent, or certain sentences that trigger it, matchable only in follow-up dialogue (so words like "no" and "yes" wouldn't technically be valid opening commands) would help; the next step, switching speech-to-text from e.g. PocketSphinx to DeepSpeech in follow-ups for greater freedom, would be a bonus.

@philtweir

philtweir commented May 30, 2020

On the language-independence point, as @koenvervloesem says, I was thinking that too - I can see your reservations @patrickjane, but if RPyC is negotiable, there are one or two RPC options.

IMO a simple option from a skill-maker's perspective (which should be an almost drop-in replacement from a skill-maker-flow perspective) would be WAMP with Autobahn - I have used this on a number of projects for near-transparent RPC between languages in a Python-native-feeling way (it also has the bonus of supporting event subscription out of the box). Happy to PoC that, if it would be a potential option. That said, with MQTT already there, there's maybe an argument for RPC over MQTT, but those options don't seem nearly as mature as either RPyC or Autobahn.

A second benefit of this is that it'll work fine with venv or dockerized processes (Python or otherwise), and not increase the code a skill-creator would write.

To @koenvervloesem 's question about where a skill-server would fit - I think @patrickjane 's point about abstracting the MQTT protocol interaction away is important. I probably wouldn't have bothered getting started with skills if it wasn't just a case of "fill in this def handle" and away you go. It's only a small papercut, but hss-server (or, seemingly, Rhasspy Hermes App) does address it.

And of course the bullets @patrickjane mentioned sound like things that, given the modular nature of Rhasspy, it would want to defer to a handler such as hss-server or Rhasspy Hermes App (which I hadn't seen and haven't yet looked at properly!)

@philtweir

(and a language-independent RPC framework would avoid every language having to have a Hermes implementation as a library for skill-makers)

@patrickjane
Author

I can confirm that this flow seems to work for me with hss-server and Rhasspy Voltron - my understanding (from further up the issue) was that that was the setup you were targeting, @patrickjane?

Support real dialogs, e.g. skills should be able to ask questions before answering/processing the intent. This is supported by the hermes protocol, and I would like to implement the skill-server part; however, rhasspy would need to support this as well.

I'm not sure if this is quite what you mean, but I have added a couple of small tweaks to my local version of hss-server and hss-skill to add a BaseSkill.next(...) with the same args as BaseSkill.done(...), but which sends a hermes/dialogueManager/continueSession instead of the ttsTopic, with an intent filter of only the current intent. That then comes back into the skill's handle method, and an if-else switch on the text gives separate flows for the original and follow-up commands.

Working example:

Me: "jarvis set a timer"
Skill: "How long for" [implemented as return self.next(..."How long for") in hss-skill]
Me: "one minute"
Skill: "Timer set for 60 seconds" [self.done]
...
Skill: "Your timer is up" [separate thread in skill, repeats every 5s]
Me: "jarvis cancel timer"
Skill: "Timer cancelled"

Yeah, that's exactly the idea. Although I would have named it BaseSkill.ask() :)

I haven't worked on this since 2.5 is not yet released.

Rhasspy seems to implement that fine, and afaiu the dialogue-state-handling parameters from the Hermes protocol are implemented (though I haven't tried them) - I have also tested the "no-matching-intent" dialogue event, and that can be picked up for conversational-response misses. The main downside is that, since it still has to match the intent (even if filtered), the possible responses must be sentences for that intent in Rhasspy, just as the original command is.

I do recall seeing a suggestion in the forums of modally switching the STT for follow-ups, which would be nice. But even just some way of making an intent, or certain sentences that trigger it, matchable only in follow-up dialogue (so words like "no" and "yes" wouldn't technically be valid opening commands) would help; the next step, switching speech-to-text from e.g. PocketSphinx to DeepSpeech in follow-ups for greater freedom, would be a bonus.

That's what I mean by "rhasspy needs to support this". Doing full/plain new intent recognition for a follow-up question is probably no more than a workaround, I guess.
Most likely, the follow-up responses should also go into sentences.ini; maybe even a separate NLU model could be used for that purpose.
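To make that concrete: with the current approach, follow-up answers have to be ordinary sentences of some intent in sentences.ini (intent names and sentences below are made up for illustration):

```ini
[TimerSet]
set a timer

[TimerDuration]
(one | two | five | ten) (minute | minutes)
half a minute
```

That is, "one minute" is a globally valid command, not something reachable only through continueSession's intent filter - which is exactly the limitation being discussed.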

Well mainly rhasspy would need to support the following workflow:

This should already work on the Rhasspy 2.5 pre-release :-) Have a look at Rhasspy Voltron. That's why I was a bit puzzled why you would need a skill server for this.

See the above. I'm gonna check it out when 2.5 is released.

(and a language-independent RPC framework would avoid every language having to have a Hermes implementation as a library for skill-makers)

I think my point was not so much about the protocol between skill-server and skill (which can easily be language-agnostic, e.g. HTTP/JSON), but more about the dependencies and the different handling required for different languages. For example, node.js based skills would require npm install, which in turn needs to find npm on your system. C++ based skills might depend on some C libraries; would you fire off apt install libxxx upon skill installation?
Unless you call some generic setup.sh upon installation and pretty much leave all those issues to the skill developers, I can't think of a viable solution that supports arbitrary technologies.

So while it's perfectly possible and fine by me to use a non-Python RPC protocol, you would still have issues when installing the skill via hss-cli and later running the skill process.
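To illustrate the dispatch problem: a skill server supporting multiple runtimes would need per-language install steps, roughly like this (a hypothetical sketch; the manifest keys and commands are assumptions, not actual hss-cli behaviour):

```python
# Hypothetical sketch of per-language dependency installation in a skill
# server. None of this is actual hss-cli behaviour; it just illustrates
# why each supported language drags in its own toolchain.
INSTALL_COMMANDS = {
    "python": ["pip", "install", "-r", "requirements.txt"],
    "node":   ["npm", "install"],   # needs npm present on the system
    "cpp":    ["./setup.sh"],       # left entirely to the skill author
}

def install_command(language):
    """Return the install command for a skill's declared language."""
    try:
        return INSTALL_COMMANDS[language]
    except KeyError:
        raise ValueError(f"unsupported skill language: {language}")

# A real installer would run this with subprocess in the skill directory:
# subprocess.run(install_command(manifest["language"]), cwd=skill_dir, check=True)
print(install_command("node"))
```

Every new entry in that table means another system dependency the host must provide, which is the maintenance cost being weighed here.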

@patrickjane
Author

Maybe we should close this issue and move the discussion to the forums? I think we have some really good ideas and should continue to discuss them.

@philtweir

philtweir commented May 30, 2020

Yeah, that's exactly the idea. Although I would have named it BaseSkill.ask() :)

Fair! Like I say, tidying required :)

That's what I mean by "rhasspy needs to support this". Doing full/plain new intent recognition for a follow-up question is probably no more than a workaround, I guess.

Indeed - given that an intent filter is part of the conversation-response API (which Rhasspy implements, afaict, and that is a start), that approach at least seems consistent with Hermes. However, it would make sense for Rhasspy to do some minimal implementation here, even if just to allow marking sentences as ineligible for initial intent matching. Conversely, a potential use case for full intent recognition (by specifying more than one, or no, intents in the filter) would be to ask a question that could switch the path to a different skill.

So while it's perfectly possible and fine by me to use a non-Python RPC protocol, you would still have issues when installing the skill via hss-cli and later running the skill process.

True, but perhaps it's a question of level of abstraction - if the decision is not made at the protocol level, and potential skill-family helper classes could be provided, then language-specific functionality is not baked in quite so deeply and stays encapsulated in the installation/provisioning code (a simple Python install class for JS might use nodeenv, for instance).

Most likely, the follow up responses should also go into sentences.ini, maybe even a separate NLU model could be used for those purposes.

Yes, I think this touches on some broader questions that would be great to get input from the Rhasspy architects on (as you'd suggested).

Maybe we should close this issue and move the discussion to the forums? I think we have some really good ideas and should continue to discuss them.

Agreed - I think it's safe to say this has turned into solution development rather than issue resolution! If you want to post a link, we can jump over - I'd be keen to keep @koenvervloesem in the loop. I have to say, from my brief look, I like the decorator skill syntax of Rhasspy Hermes App - wondering how hard it might be to use both hss-server and Rhasspy Hermes App together 🤔

@patrickjane
Author

Okay so I have posted here: https://community.rhasspy.org/t/hermes-skill-server-for-intent-handling/1054/3

Currently I am working on a proper Hermes dialogue implementation, and I also had an idea for a low-cost marketplace thingy; I'll see if I can get this up and running by tomorrow.
