Naomi TTI needs intent vectors #357

aaronchantrill · 2021-11-26T02:21:54Z

Detailed Description

I ran into some issues with the "Naomi TTI" plugin yesterday. Since Naomi started out as basically a keyword recognizer, I wanted to honor that tradition in the Naomi TTI plugin. The idea was to let Naomi decide how important individual words are to different intents, instead of having plugin authors "own" particular words. If a word appears in 100% of the templates for a particular intent, it is considered extremely important to that intent. If it appears in only 50% of the templates for the intent, it is considered less important to that intent. On top of that, if the word appears in every intent, its importance is reduced, and if it appears in only one intent, the presence of that word almost guarantees that the request belongs to that intent.

This is causing a problem between the MPDControl intent and the Frotz intent, which both rely heavily on the word "PLAY". MPDControl has five templates that use the word "PLAY", but also five additional ones like "PAUSE", "RESUME", and "INCREASE VOLUME" that do not include it.

This means that the word "PLAY" is heavily skewed towards Frotz, which uses "PLAY" in all three of its templates, while only half of the MPDControl templates contain it. The skew is so heavy that even an exact match for "PLAY MUSIC" still resolves to the Frotz intent.
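The skew can be reproduced with a minimal sketch of the scoring idea described above. This is not the Naomi TTI code: the function names `word_weights` and `score`, the template strings, and the exact formula (per-intent template coverage divided by the number of intents sharing the word) are all assumptions chosen for illustration. Only the template counts (five of ten for MPDControl, three of three for Frotz) come from the description.

```python
from collections import defaultdict


def word_weights(templates_by_intent):
    """Return {intent: {word: weight}} under an ASSUMED scoring rule."""
    # Fraction of each intent's templates that contain each word.
    coverage = {}
    for intent, templates in templates_by_intent.items():
        counts = defaultdict(int)
        for template in templates:
            for word in set(template.upper().split()):
                counts[word] += 1
        coverage[intent] = {w: c / len(templates) for w, c in counts.items()}

    # Number of intents in which each word appears at all.
    intents_with_word = defaultdict(int)
    for words in coverage.values():
        for word in words:
            intents_with_word[word] += 1

    # Assumption: a word shared by more intents counts for less.
    return {
        intent: {w: cov / intents_with_word[w] for w, cov in words.items()}
        for intent, words in coverage.items()
    }


def score(utterance, weights):
    """Sum the weights of the utterance's words for every intent."""
    words = utterance.upper().split()
    return {
        intent: sum(w.get(word, 0.0) for word in words)
        for intent, w in weights.items()
    }


# Hypothetical templates; only the PLAY / non-PLAY counts match the issue.
templates = {
    "MPDControl": [
        "PLAY MUSIC", "PLAY NEXT SONG", "PLAY PREVIOUS SONG",
        "PLAY THE PLAYLIST", "PLAY SOMETHING",
        "PAUSE", "RESUME", "STOP", "INCREASE VOLUME", "DECREASE VOLUME",
    ],
    "Frotz": ["PLAY A GAME", "PLAY ZORK", "PLAY AN ADVENTURE"],
}

weights = word_weights(templates)
scores = score("PLAY MUSIC", weights)
print(scores)
```

With these made-up templates, "PLAY" contributes twice as much to Frotz as to MPDControl, and the small extra weight for "MUSIC" is not enough to close the gap, so Frotz scores higher even though "PLAY MUSIC" is a literal MPDControl template.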

We have also been having trouble with things like distributing requests across multiple joke intents: "JokeIntent", "DadJokeIntent" and "ChuckIntent". If the user simply asks Naomi "tell me a joke", it would make sense to choose one of them randomly.
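One way to get that behavior, sketched here under assumptions, is to treat every intent whose score is within a small tolerance of the best score as tied, and pick among the tied intents at random. The function name `pick_intent`, the `tolerance` parameter, and the example scores are hypothetical and not part of Naomi.

```python
import random


def pick_intent(scores, tolerance=0.05, rng=random):
    """Pick randomly among intents scoring within `tolerance` of the best.

    `scores` maps intent name -> score. The tolerance value is an
    assumption; it would need tuning against real intent scores.
    """
    best = max(scores.values())
    # Sort so the candidate order does not depend on dict ordering.
    candidates = sorted(i for i, s in scores.items() if best - s <= tolerance)
    return rng.choice(candidates)


# Hypothetical scores for the request "tell me a joke".
joke_scores = {
    "JokeIntent": 0.90,
    "DadJokeIntent": 0.90,
    "ChuckIntent": 0.88,
    "WeatherIntent": 0.10,
}
print(pick_intent(joke_scores))
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible in tests.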

Context

The default TTI engine for Naomi is meant to prevent plugin authors from "gaming the system" to get all requests routed through their own plugin.

Possible Implementation

I think what we need to do is start using vectors instead of simply trying to calculate how important specific words are to specific intents.
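As a rough illustration of what matching on vectors could look like, the sketch below compares the whole utterance against each template with cosine similarity. It uses plain bag-of-words count vectors as a stand-in, which is an assumption: the issue does not say which kind of vectors would be used. The names `vectorize`, `cosine`, and `best_intent`, and the template strings, are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words count vector (a stand-in for a real embedding)."""
    return Counter(text.upper().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_intent(utterance, templates_by_intent):
    """Score each intent by its most similar template."""
    query = vectorize(utterance)
    scores = {
        intent: max(cosine(query, vectorize(t)) for t in templates)
        for intent, templates in templates_by_intent.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical templates for illustration only.
example_templates = {
    "MPDControl": ["PLAY MUSIC", "PAUSE", "RESUME", "INCREASE VOLUME"],
    "Frotz": ["PLAY A GAME", "PLAY ZORK"],
}

winner, similarity = best_intent("PLAY MUSIC", example_templates)
print(winner, similarity)
```

Because the utterance is compared as a whole rather than word by word, an exact template match gets a similarity close to 1.0 and beats templates that merely share the word "PLAY".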

aaronchantrill · 2021-12-13T02:01:20Z

I'm searching for pre-trained vector models. It doesn't seem like they should need to be domain specific, so it seems like finding pre-trained ones would be a big time saver.

In the meantime, I did finally find a source for lists of word frequencies: https://www.lexipedia.org/
I have no idea what their license is, but since I would be having the software download a list directly from them, I think it should be okay to use. If there is an issue, I should be able to scrape and analyze Wikipedia myself and distribute my own list. I think this can be used to weight words in such a way that connector words like "in", "the", "of", etc. are reduced in weight to the point where they no longer meaningfully impact the intent matching algorithm.
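A minimal sketch of that weighting, assuming a table of word frequencies per million words is available: each word gets a weight equal to the log of its inverse relative frequency, so very common words contribute much less than rare ones. The frequency values below are made up for illustration and are not taken from any real word list, and the names `WORD_FREQ_PER_MILLION` and `rarity_weight` are hypothetical.

```python
import math

# Made-up frequencies (occurrences per million words), NOT real data.
WORD_FREQ_PER_MILLION = {
    "THE": 50000.0,
    "OF": 30000.0,
    "IN": 20000.0,
    "PLAY": 150.0,
    "MUSIC": 120.0,
}


def rarity_weight(word, freqs=WORD_FREQ_PER_MILLION, default_per_million=1.0):
    """Weight a word by the log of its inverse relative frequency.

    Words missing from the table are assumed to be rare and get
    `default_per_million` as their frequency.
    """
    freq = freqs.get(word.upper(), default_per_million)
    return math.log(1_000_000 / freq)


for w in ("the", "of", "play", "music", "zork"):
    print(w, round(rarity_weight(w), 2))
```

Multiplying a word's per-intent importance by a weight like this would shrink the influence of connector words without having to maintain a hand-written stop-word list.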
