
Improving integration of open domain utterances #4266

Closed
dakshvar22 opened this issue Aug 16, 2019 · 4 comments
Assignees
Labels
type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@dakshvar22
Contributor

Description of Problem:
Currently, open-domain utterances such as chitchat can be integrated, but only in a slightly cumbersome manner. Each such utterance needs to be classified under a separate intent, and a single action is predicted based on a complex OR construct.

For example -

NLU

```
intent: ask_where_from
- Where are you from?
- Where do you live?
```

Story

```
* ask_weather OR ask_builder OR ask_howdoing OR ask_whoisit OR ask_whatisrasa OR ask_isbot OR ask_howold OR ask_languagesbot OR ask_restaurant OR ask_time OR ask_wherefrom OR ask_whoami OR handleinsult OR nicetomeeyou OR telljoke OR ask_whatismyname OR ask_howbuilt OR ask_whatspossible
 - action_chitchat
```

Template

```
utter_ask_where_from
- I was born in the city of Berlin
```

Can we do away with the complex OR constructs and micro-intents for each utterance type?

Overview of the Solution:
We can treat this as a more end-to-end dialogue system: given the user utterance, if it is of an open-domain type (e.g. chitchat), directly pick the most appropriate bot response using an ML model.
We call this a ResponseSelector component. It sits inside the NLU pipeline, works exactly like the EmbeddingIntentClassifier, embeds both the candidate responses and the user utterance in the same embedding space, and computes a similarity between them. Based on the similarity ranking, the model picks the most appropriate response.
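The selection step described above can be sketched in miniature. This is not the actual ResponseSelector implementation — it uses a toy bag-of-words "embedding" and cosine similarity in place of the learnt embedding space, and all names are illustrative:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; the real component learns dense
    # embeddings for utterances and responses jointly.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_response(utterance, candidate_responses):
    # Embed the utterance and every candidate response in the same
    # (toy) space and return the highest-similarity candidate.
    u = embed(utterance)
    return max(candidate_responses, key=lambda c: cosine(u, embed(c)))
```

The real component ranks candidates the same way, but over learnt embeddings rather than raw token overlap.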

We can have multiple types of open-domain utterances (e.g. chitchat, faq) and build either a separate response selector model per type or a unified model, depending on the configuration.

We always use a mapping policy for the open-domain intent to map it to an utter_x action, which queries the appropriate response selector model for the appropriate bot utterance. These utter_x actions can also be included in training stories in case there are follow-up actions that should be triggered, and this behaviour can then be learnt by the other core policies.
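The mapping step amounts to a direct intent-to-action lookup that bypasses normal policy prediction. A minimal sketch, with a hypothetical trigger table (the dict and function names are not Rasa APIs):

```python
# Hypothetical trigger table, mirroring the `triggers:` entries in the
# proposed domain format: an open-domain intent maps straight to its
# utter_* action, which then queries the matching response selector.
INTENT_TRIGGERS = {
    "chitchat": "utter_chitchat",
    "faq": "utter_faq",
}

def mapped_action(predicted_intent):
    # Return the mapped action for open-domain intents; None means the
    # usual core policies should predict the next action instead.
    return INTENT_TRIGGERS.get(predicted_intent)
```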

A simple PoC of this is described here

Examples (if relevant):
The new way of integrating open domain utterances could look like -

NLU

```
intent: chitchat, response: I was born in the city of Berlin
- Where are you from?
- Where do you live?
```

Domain

```yaml
intents:
- intent1
- intent2
- chitchat:
    triggers: utter_chitchat
    response: True
- faq:
    triggers: utter_faq
    response: True
```

Definition of Done:

@dakshvar22 dakshvar22 added the type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR label Aug 16, 2019
@dakshvar22 dakshvar22 self-assigned this Aug 16, 2019

dakshvar22 commented Aug 16, 2019

Some open design questions -

  1. Should the name of the action corresponding to an open-domain utterance type be self-generated from a template, since it is always of the form utter_* and is tightly coupled to its own response selector? The template could be utter_<intent>, e.g. utter_chitchat, utter_faq. We could also remove the need for the developer to specify a mapping between the intent and the action, since that would be done by default given the nature of the feature.

The domain file would then look like -

```yaml
intents:
- intent1
- intent2
- chitchat:
    response: True
- faq:
    response: True
```

The only place the developer would need to specify the action (if needed) would be in a training story, where they could use the same templated name.

This also helps in the case where you want to build only one response selector model for all utterance types: all utterances get classified under a default intent chitchat and action utter_chitchat.
This can then be driven by a configuration variable, as opposed to a data-driven approach where you would change the training data to test out the variation.

The config for the response selector in that case would look like -

```yaml
- name: ResponseSelector
  param_key: param_val
  merge_all: True
```

Specifying this in domain.yml isn't feasible, since the domain is currently not available inside the NLU pipeline.

Edit - Discussed this with @amn41 and we decided to follow a templated name.

  2. Should there be flexibility to specify different hyperparameters for each model corresponding to each utterance type (in case the developer is building multiple models for multiple utterance types)? If yes, the NLU config could look like -

```yaml
- name: ResponseSelector
  param_key: param_val
  response_type: chitchat
- name: ResponseSelector
  param_key: param_val
  response_type: faq
```

How do we decide whether to merge the models into one model across utterance types in the above case?

In the other case, we would do away with the response_type key in the config and have a universal config for the response selector -

```yaml
- name: ResponseSelector
  param_key: param_val
  merge_all: True/False
```

Based on merge_all, one or multiple response selectors are instantiated, but all with the same hyperparameters.

Edit - Decided to go the most transparent way, which is to have the developer write a config for multiple utterance types if needed and not worry about making it simpler. Instead of a merge_all key, we define a response_type key which allows the component to filter training examples by the corresponding utterance type. If response_type is not specified, all utterance types are collapsed under one default type.
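The decided filtering behaviour can be sketched as follows. This is a hypothetical illustration, not the Rasa implementation; the example dicts and the `response_type` field layout are assumptions:

```python
def filter_training_examples(examples, response_type=None):
    # With a response_type key, a ResponseSelector trains only on
    # examples of that utterance type; without it, all utterance
    # types collapse into one default selector.
    if response_type is None:
        return list(examples)
    return [e for e in examples if e.get("response_type") == response_type]
```

A selector configured with `response_type: faq` would thus see only the faq examples, while an unconfigured one trains on everything.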

@dakshvar22
Contributor Author

Based on user tests and feedback received, allowing multiple responses for similar utterances is a good-to-have capability and we should try to include it. However, the current training data format is not very flexible for this. For example -

```
intent: chitchat, response: I was born in the city of Berlin
- Where are you from?
- Where do you live?
```

The above format includes a single response cleanly. To allow multiple responses, we would need to add a delimiter between all candidate responses on the first line of the data point, which wouldn't render very cleanly in the markdown format.

The most intuitive option is to source other variants of the same response from a different file based on a key lookup. The key can be the primary response itself, and the variants can be written inside domain.yml: all variants of the response can be part of the templates section of the domain file and randomly picked at inference time. However, to accomplish that in the cleanest implementation we would need access to the domain inside the NLU component, which is not possible right now.
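The key-lookup idea above could be sketched like this. The variant store and function names are hypothetical, standing in for variants listed under the domain's templates section:

```python
import random

# Hypothetical variant store keyed by the primary response (the one the
# response selector predicts), mirroring how variants might be written
# in domain.yml.
RESPONSE_VARIANTS = {
    "I was born in the city of Berlin": [
        "I was born in the city of Berlin",
        "I'm from Berlin!",
    ],
}

def pick_response(primary_response, rng=random):
    # Look up the variants for the selected primary response and pick
    # one at random; fall back to the primary response itself if no
    # variants are defined.
    variants = RESPONSE_VARIANTS.get(primary_response, [primary_response])
    return rng.choice(variants)
```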

What is the best option?

@dakshvar22
Contributor Author

Document regarding nuances of different training formats -
https://docs.google.com/document/d/1eblzWgtxWKy5-ClBL3elrs-zch2ai_R8lPd0RgbBctM/edit?usp=sharing
Discussion regarding where bot utterances for open domain intents should go -
#4233 (comment)

@dakshvar22
Contributor Author

Fixed in #4233
