Naomi TTI needs intent vectors #357

Open
aaronchantrill opened this issue Nov 26, 2021 · 1 comment

Comments

@aaronchantrill
Contributor

Detailed Description

I ran into some issues with the "Naomi TTI" plugin yesterday. Since Naomi started out as basically a keyword recognizer, I wanted to honor that tradition in the Naomi TTI plugin. The idea was to let Naomi decide how important individual words are to different intents, instead of having plugin authors "own" particular words. If a word appears in 100% of the templates for a particular intent, it is considered extremely important to that intent. If it appears in only 50% of the templates for the intent, it is considered less important to that intent. On top of that, if the word appears in every intent, its importance is reduced, and if it appears in only one intent, the presence of that word almost guarantees that the request belongs to that intent.
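To make the weighting idea concrete, here is a minimal sketch. It is not the actual Naomi TTI code: the function name `word_weights`, the input layout, and the choice to divide by the number of intents that use a word are all assumptions made for illustration.

```python
from collections import defaultdict


def word_weights(templates_by_intent):
    """Return {intent: {word: weight}} for a dict of {intent: [template, ...]}.

    Assumed formula (not taken from Naomi): the fraction of an intent's
    templates that contain the word, divided by the number of intents
    that use the word at all.
    """
    # Step 1: per intent, the fraction of templates containing each word.
    coverage = {}
    for intent, templates in templates_by_intent.items():
        counts = defaultdict(int)
        for template in templates:
            for word in set(template.upper().split()):
                counts[word] += 1
        coverage[intent] = {w: c / len(templates) for w, c in counts.items()}

    # Step 2: how many intents use each word.
    intents_using = defaultdict(int)
    for words in coverage.values():
        for word in words:
            intents_using[word] += 1

    # Step 3: words shared by many intents lose weight.
    return {
        intent: {w: frac / intents_using[w] for w, frac in words.items()}
        for intent, words in coverage.items()
    }


# Hypothetical intents and templates, for illustration only.
example = {"A": ["PLAY MUSIC", "PAUSE"], "B": ["PLAY GAME"]}
weights = word_weights(example)
```

In this toy input, "PLAY" ends up weighted higher for intent B than for intent A, because B uses it in every template while A uses it in only half.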

This is causing a conflict between the MPDControl intent and the Frotz intent, both of which rely heavily on the word "PLAY". MPDControl has five templates that use the word "PLAY", but also five additional ones, like "PAUSE", "RESUME", and "INCREASE VOLUME", that do not include it.

This means that the word "PLAY" is heavily skewed towards Frotz, which uses "PLAY" in all three of its templates. In fact, it is so heavily skewed that an exact match for "PLAY MUSIC" still resolves to the Frotz intent.
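The skew can be checked with the template counts given above (5 of 10 MPDControl templates and 3 of 3 Frotz templates contain "PLAY"). The helper name `template_share` is hypothetical, and this only reproduces the per-intent share, not the full Naomi scoring.

```python
def template_share(templates_with_word, total_templates):
    """Fraction of an intent's templates that contain a given word."""
    return templates_with_word / total_templates


# Counts taken from the description of the two intents.
mpd_share = template_share(5, 10)   # MPDControl: half of its templates
frotz_share = template_share(3, 3)  # Frotz: every template

# "PLAY" counts twice as strongly towards Frotz as towards MPDControl.
skew = frotz_share / mpd_share
```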

We have also been having trouble with things like distributing access to multiple joke intents: "JokeIntent", "DadJokeIntent" and "ChuckIntent". If the user simply asks Naomi "tell me a joke", it would make sense to choose one of them randomly.
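One way to get that behavior is to pick randomly among the intents that tie for the best score. This is a sketch only: `choose_intent`, the `tolerance` parameter, and the example scores are assumptions, not part of Naomi.

```python
import random


def choose_intent(scores, tolerance=1e-9, rng=random):
    """Pick one intent at random from those tied for the highest score.

    scores: {intent_name: score}. Scores within `tolerance` of the
    best score count as tied.
    """
    best = max(scores.values())
    tied = sorted(name for name, s in scores.items() if best - s <= tolerance)
    return rng.choice(tied)


# Made-up scores for "tell me a joke"; the three joke intents tie.
joke_scores = {
    "JokeIntent": 0.9,
    "DadJokeIntent": 0.9,
    "ChuckIntent": 0.9,
    "MPDControl": 0.2,
}
picked = choose_intent(joke_scores)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible in tests.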

Context

The default TTI engine for Naomi is meant to prevent plugin authors from "gaming the system" to get all requests routed through their own plugin.

Possible Implementation

I think what we need to do is start using vectors instead of simply trying to calculate how important specific words are to specific intents.
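As a rough illustration of the vector idea, the sketch below turns each phrase into a bag-of-words count vector and compares vectors by cosine similarity. Plain word counts stand in for real word vectors here, and the function names and templates are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words count vector; a stand-in for a real embedding."""
    return Counter(text.upper().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_intent(utterance, templates_by_intent):
    """Return the intent whose closest template best matches the utterance."""
    u = vectorize(utterance)
    return max(
        templates_by_intent,
        key=lambda intent: max(
            cosine(u, vectorize(t)) for t in templates_by_intent[intent]
        ),
    )


# Hypothetical templates, for illustration only.
templates = {
    "MPDControl": ["PLAY MUSIC", "PAUSE"],
    "Frotz": ["PLAY A GAME"],
}
match = best_intent("PLAY MUSIC", templates)
```

Because each template is compared as a whole, an utterance that exactly matches a template gets the maximum similarity for that template, regardless of how often its words appear in other intents.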

@aaronchantrill
Contributor Author

I'm searching for pre-trained vector models. It doesn't seem like they should need to be domain specific, so finding pre-trained ones would be a big time saver.

In the meantime, I did finally find a source for lists of word frequencies: https://www.lexipedia.org/
I have no idea what their license is, but since I would be having the software download a list directly from them, I think it should be okay to use. If there is an issue, I should be able to scrape and analyze Wikipedia myself and distribute my own list. I think this can be used to weight words in such a way that connector words like "in", "the", "of", etc. are reduced in weight to the point where they no longer meaningfully impact the intent matching algorithm.
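A sketch of how a frequency list could be turned into weights is below. The counts are made up for illustration and are not taken from any real list, and the log-based formula and the name `frequency_weights` are assumptions rather than a settled design.

```python
import math


def frequency_weights(word_counts):
    """Map corpus word counts to weights in [0, 1].

    Very common words get weights near 0; the rarest word gets 1.0.
    """
    total = sum(word_counts.values())
    raw = {w: math.log(total / c) for w, c in word_counts.items()}
    top = max(raw.values())
    if top == 0:
        # Only one word in the list: nothing to compare against.
        return {w: 1.0 for w in raw}
    return {w: r / top for w, r in raw.items()}


# Made-up counts, for illustration only.
counts = {"THE": 1000, "OF": 500, "PLAY": 10, "FROTZ": 1}
freq_weights = frequency_weights(counts)

# Words missing from the list could default to full weight.
weight_of_unknown = freq_weights.get("NAOMI", 1.0)
```

With these counts, "THE" and "OF" receive much smaller weights than "PLAY" or "FROTZ", so they contribute little to any intent score that multiplies by these weights.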
