Tim Kettenacker, August 2020

# Ontology-aided Chatbot

## Topic Introduction and Use Case

With the first wave of AI applications arriving in the corporate world, the automation of domain-specific tasks is accelerating. AI applications are becoming increasingly proficient at solving and automating one particular task in one particular domain. The same is true for chatbots: a gym chatbot can be good at informing me about opening hours and generating a workout plan, but it will probably struggle to deliver accurate weather forecasts. If I use the vocabulary a chatbot understands (that is, if the training data matches the way people actually talk about the domain), the chatbot can understand my intent and potentially steer the conversation. For that, it needs to predict the intent behind each user input. Capturing and keeping the intent throughout a conversation is crucial to the acceptance and success of a chatbot. In this document, I illustrate both how NLP can be used to understand human language based on domain-specific training data, and how a mapping to an ontology of that domain can be used to navigate the conversation.

To support that idea, my use case is a website chatbot offering various iPhone models for sale (in German). The user may ask for iPhones in general or specific models as well as their product features and pricing.


## Architecture and Process Overview

The chatbot's browser interface is deliberately simple: a box in which the user types a question or comment. The input is processed in the backend, and a response is generated and displayed in the frontend. The user responds to that, and so on. The web service providing the user interface is served by the Python library Flask. The computation logic is written in Python, with the different components split up into classes. An instance of the **chatbot** class handles the user input and creates instances of the other classes to orchestrate the logic.

The **trainer** class is used to load a model that was pre-trained on the training data. Facebook's "fasttext" library is used for this component. Note that training does not follow the usual train-test routine because of the limited amount of usable data. The model is trained to predict the label for a user input, e.g. whether an input is about a product variant.

The result of this prediction is then passed on to an instance of the **Natural Language Processor**, for which I use the German models of "spacy". Because the expected input is very short (statements of only a couple of words), the prediction of the trainer instance needs to be verified. By applying shallow and dependency parsing, the overall structure of the input is examined. This makes it possible to detect, for example, whether a sentence is phrased as an open or a closed question, which in turn verifies or invalidates the trainer's prediction. The structure of a sentence may even hint at the presence of a product variant.

A central pillar of the architecture is the ontology, which serves as a schema model describing the domain. I chose the GoodRelations ontology (http://goodrelations.makolab.com/description) because of its popularity on e-commerce web pages and its broad coverage of the domain. I deleted some parts of my local copy of the ontology to declutter it and to reduce memory consumption, because the ontology is loaded into memory by the **Ontology Lookup** component. The Ontology Lookup maps the output generated by natural language processing to the entities in the ontology. When it finds a match, it traverses the graph, looking for adjacent hierarchical information. Depending on the entity and its outgoing relationships, it derives questions that the **Conversation Context** class instance can use to further narrow down the intent. The conversation context also stores and updates the current class and instance of the ontology as a reference point to steer the conversation.
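
The lookup-and-traverse step can be sketched with a plain dictionary standing in for the ontology graph. This is only an illustration under stated assumptions, not the project's Ontology Lookup code: the `SUBCLASSES` structure, the class and function names, and the wording of the questions are all hypothetical, and the real component works on the GoodRelations ontology loaded into memory.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical toy hierarchy standing in for the ontology graph:
# each key is an entity, each value lists its direct children.
SUBCLASSES = {
    "iphone": ["iphone 11", "iphone 11 pro"],
    "iphone 11": ["iphone 11 gelb 64 gb", "iphone 11 weiß 64 gb"],
    "iphone 11 pro": [],
    "iphone 11 gelb 64 gb": [],
    "iphone 11 weiß 64 gb": [],
}


@dataclass
class ConversationContext:
    """Keeps the current ontology node as the reference point of the chat."""
    current_entity: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def update(self, entity: str) -> None:
        self.current_entity = entity
        self.history.append(entity)


def lookup(term: str) -> Optional[str]:
    """Map an extracted term to an entity, ignoring case and outer whitespace."""
    key = term.strip().lower()
    return key if key in SUBCLASSES else None


def derive_question(entity: str) -> str:
    """Turn the children of an entity into a follow-up question."""
    children = SUBCLASSES.get(entity, [])
    if not children:
        # Leaf node: nothing left to narrow down, so offer details instead.
        return f"Would you like to hear about features or pricing of {entity}?"
    return "Are you interested in " + " or ".join(children) + "?"


context = ConversationContext()
match = lookup("Iphone")
if match is not None:
    context.update(match)
    print(derive_question(match))
```

Keeping the current entity in a separate context object means a later "yes" or "no" from the user can be interpreted relative to the node the question was derived from.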

A logging module is in place. Its main purpose is to capture user questions that could not be answered. This way, the log files reveal an information need that is not yet modelled but should be.
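
A minimal sketch of such a log, using Python's standard `logging` module. The logger name, the file name, the tab-separated layout and the `log_unanswered` helper are assumptions made for this example; the project's actual logging module may look different.

```python
import logging

# Hypothetical logger name and file name, chosen for this sketch only.
unanswered_log = logging.getLogger("chatbot.unanswered")
unanswered_log.setLevel(logging.INFO)

handler = logging.FileHandler("unanswered_questions.log", encoding="utf-8")
handler.setFormatter(logging.Formatter("%(asctime)s\t%(message)s"))
unanswered_log.addHandler(handler)


def log_unanswered(user_input: str, reason: str) -> None:
    """Record an input the chatbot could not handle, with a short reason."""
    unanswered_log.info("%s\t%s", reason, user_input)


# Example: the user asks about something the ontology does not cover.
log_unanswered("habt ihr auch ladekabel?", "no ontology match")
```

Writing the reason next to the raw input makes it easier to group the log later, for example to see which missing concepts come up most often.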

![Chatbot Architecture](chatbot-architecture.jpg)

## Typical intents reflected in the training data

Natural language understanding depends on the quality and quantity of the data that was used to train the model. From an engineering perspective, finding and selecting annotated data that suits your use case can be a hassle. For English text data, the Brown corpus is a common choice. Sources for German text data, however, are comparatively scarce and hard to find. Because there were no suitable sources for the domain of my use case, I chose to annotate my own data (which was probably the most cumbersome part of the project).

The annotated data has to reflect the real-world conversation and information exchange.

1. Usually, an inquiring user would start off the conversation asking

 * a general, open question about the availability of a product or brand, e.g. *'Welche iPhones habt ihr?'* (*'Which iPhones do you have?'*) or

 * a closed question about the availability, e.g. *'Habt ihr iPhone 11?'* (*'Do you have iPhone 11?'*)
 

2. The chatbot retrieves the item(s) found in the 'database' (the ontology) and 

 * in the case of an open question, displays the products available and asks for confirmation
 
 * in the case of a closed question, displays only one item and asks for confirmation
 

3. The user confirms or denies the selection 

 * when the selection is narrowed down to one specific product variant, the chatbot asks whether the user is interested in learning more about the features or the pricing 

 * when the user negates the selection, the loop is started over again

And so on, until the purchase is done or the user leaves the conversation.
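
The loop described above can be read as a small state machine driven by the predicted intent labels. The sketch below is a simplification under stated assumptions: the `State` names and the `next_state` function are hypothetical, the labels are the ones used in the training data, and it treats every confirmation as narrowing the selection down to one variant.

```python
from enum import Enum, auto


class State(Enum):
    AWAITING_QUESTION = auto()      # step 1: user asks about availability
    AWAITING_CONFIRMATION = auto()  # step 2: products shown, waiting for yes/no
    OFFERING_DETAILS = auto()       # step 3: selection confirmed


# Labels that open a conversation about a product.
INQUIRY_LABELS = {
    "product_availability_open_question",
    "product_availability_closed_question",
    "product_variant",
}


def next_state(state: State, label: str) -> State:
    """Return the follow-up state for a predicted intent label."""
    if state is State.AWAITING_QUESTION and label in INQUIRY_LABELS:
        return State.AWAITING_CONFIRMATION
    if state is State.AWAITING_CONFIRMATION:
        if label == "confirmation":
            return State.OFFERING_DETAILS
        if label == "rejection":
            return State.AWAITING_QUESTION  # start the loop over
    return state  # unexpected input: stay in the current state


state = State.AWAITING_QUESTION
for label in ["product_availability_closed_question", "confirmation"]:
    state = next_state(state, label)
print(state.name)
```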

The chatbot must be able to react to these situations, so labelled training data is needed to recognize patterns in the user input. Training data must be in a specific format. For the fasttext engine, it looks like this:

\__label__product_availability_closed_question habt ihr produkte

\__label__product_availability_closed_question führst du produkte

\__label__product_availability_open_question was für produkte gibt es

\__label__product_availability_open_question welches produkt habt ihr vorrätig

\__label__product_variant iphone 11 gelb 64 GB

\__label__product_variant iphone 11 weiß 64 GB

\__label__confirmation genau

\__label__rejection nein

As you can see, each line consists of a label followed by the associated text. It is a bit fiddly to find the right number of labels and text variants, especially if the sentences are very short. The many ways of expressing questions in German make it difficult for a machine to distinguish between the labels to be predicted. The difference between an open and a closed question is merely a change of a few words!
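
Lines in this format are easy to generate from (label, text) pairs. The helpers below are a sketch written for this document, not part of fasttext or of the project: `to_fasttext_line` and `parse_fasttext_line` are hypothetical names, and the sketch handles exactly one label per line. A file of such lines is what fasttext's `train_supervised` function takes as its `input` argument.

```python
from typing import Tuple

LABEL_PREFIX = "__label__"  # fasttext's default label prefix


def to_fasttext_line(label: str, text: str) -> str:
    """Format one example as '__label__<label> <text>' on a single line."""
    # Collapse whitespace so that a stray newline cannot split the example.
    return f"{LABEL_PREFIX}{label} {' '.join(text.split())}"


def parse_fasttext_line(line: str) -> Tuple[str, str]:
    """Split a training line back into (label, text)."""
    first, _, rest = line.strip().partition(" ")
    if not first.startswith(LABEL_PREFIX):
        raise ValueError(f"line does not start with a label: {line!r}")
    return first[len(LABEL_PREFIX):], rest


examples = [
    ("product_availability_closed_question", "habt ihr produkte"),
    ("product_availability_open_question", "was für produkte gibt es"),
    ("confirmation", "genau"),
]
lines = [to_fasttext_line(label, text) for label, text in examples]
print("\n".join(lines))
```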


## Predicting the intent

Still, the trained model is able to detect and distinguish open and closed questions in the user input with a reasonable degree of certainty, even if the exact sentence was not part of the training set:

In [19]:
import os
os.chdir('/Users/timkettenacker/dsproj_repos/python/bluefinch')
import fasttext
model = fasttext.load_model("ml_model/model_intent_detection.bin") # model was pre-trained by me

input_open = 'welche iphones habt ihr?'
print("Open question from the user: " + input_open)
print(model.predict(input_open.lower()))

Open question from the user: welche iphones habt ihr?
(('__label__product_availability_open_question',), array([0.95645875]))





In [20]:
input_closed = 'habt ihr iphone 11?'
print("Closed question from the user: " + input_closed)
print(model.predict(input_closed.lower()))

Closed question from the user: habt ihr iphone 11?
(('__label__product_availability_closed_question',), array([0.74043649]))


It is also able to detect mentions of a product variant in the user statement, as can be seen below. For such cases in particular, the Natural Language Processor has a built-in verification of the prediction.

In [24]:
input_variant = 'iphone 11 64 gb in weiß'
print("Statement about product variant: " + input_variant)
print(model.predict(input_variant.lower()))

Statement about product variant: iphone 11 64 gb in weiß
(('__label__product_variant',), array([0.9256106]))
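
The outputs above are pairs of a label tuple and a probability array. Before acting on a prediction, the label prefix can be stripped and the probability compared against a threshold. The following is a sketch under assumptions: `top_intent` is a hypothetical helper, and the default threshold of 0.6 is an arbitrary value for illustration, not one taken from the project.

```python
from typing import Optional, Sequence, Tuple

LABEL_PREFIX = "__label__"


def top_intent(
    labels: Sequence[str],
    probabilities: Sequence[float],
    threshold: float = 0.6,  # assumed value, would need tuning on real data
) -> Tuple[Optional[str], float]:
    """Return (intent, probability); intent is None below the threshold."""
    if len(labels) == 0:
        return None, 0.0
    probability = float(probabilities[0])
    label = labels[0]
    if label.startswith(LABEL_PREFIX):
        label = label[len(LABEL_PREFIX):]
    if probability < threshold:
        return None, probability
    return label, probability


# Values copied from the closed-question prediction shown earlier.
labels = ("__label__product_availability_closed_question",)
probabilities = [0.74043649]

print(top_intent(labels, probabilities))
print(top_intent(labels, probabilities, threshold=0.8))
```

With the stricter threshold the closed-question prediction is rejected, which is the kind of case where the structural check of the Natural Language Processor becomes useful.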


## Verifying the prediction with NLP

Just like humans, AI is bound to make mistakes. To validate the outcome of the prediction, it is possible to examine the structure of the input and see whether it fits the prediction. spaCy's NLP pipeline ships with some handy parsing options to inspect the linguistic annotations (see https://spacy.io/usage/spacy-101#annotations-pos-deps). For example, one can check whether the first word is one of the German question words starting with "w" (*welche*, *was*, *wer*, ...), which points to an open question:

In [36]:
import natural_language_processor # custom-built class
import de_core_news_sm
from collections import defaultdict
import spacy

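def generate_nlp_output(text):
    # Note: the model is loaded on every call, which is slow but fine for a demo
    nlp = de_core_news_sm.load()
    # Title-case the input, presumably so that lowercase nouns such as
    # 'iphone' look like capitalized German nouns to the tagger
    doc = nlp(text.title())
    # Map token index -> [text, coarse POS, fine-grained tag, dependency, head text]
    nlp_output = defaultdict(list)
    for token in doc:
        nlp_output[token.i] = [token.text, token.pos_, token.tag_, token.dep_, token.head.text]
    return nlp_output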

nlp_output = generate_nlp_output(input_open)
    
open_question_pointers = ['PWAT', 'PWAV', 'PWS']  # which, when, what, ...
closed_question_pointers = ['VMFIN', 'VVFIN', 'VVIMP', 'VAFIN']  # könnt, habt, ...
if nlp_output[0][2] in open_question_pointers:
    sentence_type = 'open_question'
elif nlp_output[0][2] in closed_question_pointers and nlp_output[len(nlp_output.keys()) - 1][0] == '?':
    sentence_type = 'closed_question'
else:
    sentence_type = "undefined"
    
print("The input '" + input_open + "' is of type " + sentence_type)

The input 'welche iphones habt ihr?' is of type open_question
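
Once the sentence type is known, it can be compared with the label predicted by the trainer. The sketch below shows one way to express that cross-check. The `cross_check` function, the `EXPECTED_TYPE` mapping and the three result strings are assumptions made for this example; how the project resolves a contradiction is not shown here.

```python
# Which sentence type each question label implies. Labels without a
# structural expectation (e.g. 'confirmation') are left out on purpose.
EXPECTED_TYPE = {
    "product_availability_open_question": "open_question",
    "product_availability_closed_question": "closed_question",
}


def cross_check(predicted_label: str, sentence_type: str) -> str:
    """Compare a predicted label with the sentence type found by parsing."""
    expected = EXPECTED_TYPE.get(predicted_label)
    if expected is None or sentence_type == "undefined":
        return "inconclusive"  # nothing to compare against
    return "verified" if expected == sentence_type else "contradicted"


print(cross_check("product_availability_open_question", "open_question"))
print(cross_check("product_availability_closed_question", "open_question"))
print(cross_check("product_variant", "undefined"))
```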


## Extracting product entities

Another very important task of the NLP pipeline is extracting entities from a user's message in order to grasp the products and product features the user mentions. It is the first step in sharpening the context of the conversation and keeping it alive throughout the chat. The entity extraction is tailored to the typical structure of the domain data and is therefore very specialized. For example, it tries to pay attention to the naming conventions of Apple products, where the brand name is followed by a version number or a Roman numeral, e.g. iPhone 11 or iPhone X. Note that it has a pretty hard fallback if multiple nouns are present (how would it distinguish between them anyway?), so it only works well for very short inputs, such as the replies that usually follow a chatbot question like "Are you interested in iPhone 11 or iPhone 11 Pro?"

In [48]:
def noun_extraction(nlp_output):
    found_nouns = []
    root_noun = ''

    for key, value in nlp_output.items():
        if value[1] in ['NOUN', 'PROPN'] and value[3] in ['ROOT', 'pnc']:
            root_noun = value[0]
            found_nouns.append(value[0])
        if root_noun in value[4] and value[1] in ['PROPN', 'NOUN', 'X', 'NUM'] and value[3] in ['nk', 'pnc']:
            found_nouns.append(value[0])

    nouns = ' '.join(found_nouns)
    return nouns

print("Input " + 'iphone 11' + ' extracts: ' )
print(noun_extraction(generate_nlp_output('iphone 11')))
print("Input " + 'iphone 11 pro' + ' extracts: ' )
print(noun_extraction(generate_nlp_output('iphone 11 pro')))
print("Input " + 'bietet ihr das iphone 11 pro an?' + ' extracts: ' )
print(noun_extraction(generate_nlp_output('bietet ihr das iphone 11 pro an?')))
print("Input " + 'ich möchte gerne über das iphone 11 pro genaueres wissen' + ' extracts: ' )
print(noun_extraction(generate_nlp_output('ich möchte gerne über das iphone 11 pro genaueres wissen')))

Input iphone 11 extracts: 
Iphone 11
Input iphone 11 pro extracts: 
Iphone Pro
Input bietet ihr das iphone 11 pro an? extracts: 
Iphone Pro
Input ich möchte gerne über das iphone 11 pro genaueres wissen extracts: 
Iphone Wissen
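
The naming convention itself (brand name, then a version number or Roman numeral, then an optional suffix) can also be written down as a regular expression. The following sketch is a different technique from the dependency-based `noun_extraction` above and is not part of the project; the pattern, the list of suffixes and the `extract_product` helper are assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical pattern: the brand, then either digits or an 'x'-style
# marker, then any number of suffix words. The suffix list is an assumption.
PRODUCT_PATTERN = re.compile(
    r"\b(iphone)\s+(\d+|x[sr]?)((?:\s+(?:pro|max|mini))*)\b",
    re.IGNORECASE,
)


def extract_product(text: str) -> Optional[str]:
    """Return the first product mention in normalized lowercase, or None."""
    match = PRODUCT_PATTERN.search(text)
    if match is None:
        return None
    brand, version, suffix = match.groups()
    # Normalize whitespace and case so the result can be used as a lookup key.
    return " ".join(f"{brand} {version}{suffix}".split()).lower()


for sentence in [
    "iphone 11",
    "bietet ihr das iphone 11 pro an?",
    "ich möchte gerne über das iphone 11 pro genaueres wissen",
    "welche iphones habt ihr?",
]:
    print(repr(sentence), "->", extract_product(sentence))
```

A pattern like this is rigid and has to be extended for every new product line, so it is better suited as a cross-check for the parser-based extraction than as a replacement for it.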


Extracted entities are then passed on to the ontology via the Ontology Lookup component.

In [None]:
## Ontology Lookup

