Tim Kettenacker, August 2020

# Ontology-aided Chatbot

## Topic Introduction and Use Case

With the first wave of AI applications arriving in the corporate world, the automation of domain-specific tasks is accelerating. AI applications are becoming increasingly proficient at solving and automating one particular thing in one particular domain. The same is true for chatbots: a gym chatbot can be good at informing me about the opening hours and generating a workout plan, but it probably struggles to deliver accurate weather forecasts. If I use the vocabulary a chatbot was trained on (that is, if the training data matches the way people actually talk about the domain), the chatbot can understand my intent and, potentially, steer the conversation. For that, it needs to predict the intent behind each user message. Capturing and keeping the intent throughout a conversation is crucial to the acceptance and success of a chatbot. In this document, I illustrate both how NLP can be used to understand human language based on domain-specific training data, and how a mapping to an ontology of that domain can be used to navigate the conversation.

To support that idea, my use case is a website chatbot offering various iPhone models for sale (in German). The user may ask for iPhones in general or specific models as well as their product features and pricing.


## Architecture and Process Overview

The chatbot browser interface is deliberately simple: a box where the user types any question or comment. The input is processed in the backend, and a response is generated and displayed in the frontend. The user responds to that, and so on. The web service providing the user interface is served by the Python library Flask. The computation logic is written in Python, with the different components split up into classes. An instance of the **chatbot** class handles the user input and creates instances of the other classes to orchestrate the logic.

The **trainer** class loads a model that was pre-trained on the training data. Facebook's "fasttext" library is used for that component. Note that training does not follow the usual train/test routine because of the limited amount of usable data. The model is trained to predict the label for a user input, e.g. whether an input is about a product variant.

The prediction result is then passed on to an instance of the **Natural Language Processor**. I use the German models of "spaCy" for that. Because the expected input consists of very short statements with only a couple of words, the result of the trainer instance needs to be verified with regard to the accuracy of the prediction. By applying shallow and dependency parsing, the overall sentence structure is examined. This makes it possible to detect and cross-check, for example, whether a sentence is phrased as an open or closed question, which in turn confirms or invalidates the prediction of the trainer. The structure of a sentence may even hint at the presence of a product variant.

A central pillar of the architecture is the ontology, which serves as a schema model describing the domain. I chose the GoodRelations ontology (http://goodrelations.makolab.com/description) due to its popular use in e-commerce web pages and its broad coverage of the domain. I deleted some parts in my local representation of the ontology to declutter it and reduce memory consumption, because the ontology is loaded into memory by the **Ontology Lookup** component. The Ontology Lookup maps the output generated by natural language processing to the entities in the ontology. When it finds a match, it traverses the graph, looking for adjacent hierarchical information. Depending on the entity and its outgoing relationships, it derives questions that can be used by the **Conversation Context** class instance to further narrow down the intent. The conversation context also stores and updates the current class and instance of the ontology as a reference point to steer the conversation.

A logging module is in place. Its main purpose is to capture the user questions that could not be answered. This way, the log files reveal an information need that is not yet modelled but should be.
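The project's actual logging module is not shown in this document. As a minimal sketch of the idea, assuming Python's standard `logging` package and a hypothetical logger name and helper function, unanswered questions could be recorded like this:

```python
import logging

# Hypothetical logger name; the project's real logging setup is not shown.
unanswered_log = logging.getLogger("chatbot.unanswered")


def log_unanswered(user_input, predicted_label, confidence):
    """Record a user message the chatbot could not answer and return the log line."""
    message = "unanswered input=%r label=%s confidence=%.2f" % (
        user_input, predicted_label, confidence)
    unanswered_log.warning(message)
    return message


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Example values are made up for illustration.
    log_unanswered("habt ihr auch ladekabel?",
                   "__label__product_availability_closed_question", 0.41)
```

Keeping the predicted label and its confidence next to the raw input makes it easier to tell a missing ontology entry apart from a misclassified intent when reviewing the log.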

![Chatbot Architecture](chatbot-architecture.jpg)

## Typical intents reflected in the training data

Natural language understanding depends on the quality and quantity of the data that was used to train the model. From an engineering perspective, finding and selecting annotated data that suits your use case can be a hassle. For English text data, the Brown corpus is a common choice. However, sources for German text data are comparatively scarce and hard to find. Because of the lack of sources for the specific domain of my use case, I chose to annotate my own data (which was probably the most cumbersome part of the project).

The annotated data has to reflect the real-world conversation and information exchange.

1. Usually, an inquiring user would start off the conversation asking

 * a general, open question about the availability of a product or brand, e.g. *'Welche iPhones habt ihr?'* (*'Which iPhones do you have?'*) or

 * a closed question about the availability, e.g. *'Habt ihr iPhone 11?'* (*'Do you have iPhone 11?'*)
 

2. The chatbot retrieves the item(s) found in the 'database' (the ontology) and

 * in the case of an open question, displays the products available and asks for confirmation
 
 * in the case of a closed question, displays only one item and asks for confirmation
 

3. The user confirms or denies the selection 

 * when the selection is narrowed down to one specific product variant, the chatbot asks whether the user is interested in learning more about the features or the pricing

 * when the user negates the selection, the loop is started over again

And so on, until the purchase is done or the user leaves the conversation.
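The three steps above can be read as a small state machine. The following is only a sketch under that reading, not the author's implementation: the state names and the `next_state` helper are hypothetical, and labels are written without the `__label__` prefix.

```python
from enum import Enum


class State(Enum):
    AWAIT_QUESTION = 1       # step 1: user asks an open or closed question
    AWAIT_CONFIRMATION = 2   # step 2: bot showed item(s) and waits for yes/no
    AWAIT_DETAIL_CHOICE = 3  # step 3: selection confirmed, bot offers features or pricing


def next_state(state, label):
    """Return the follow-up state for a predicted intent label."""
    if state is State.AWAIT_QUESTION and label.startswith("product_availability"):
        return State.AWAIT_CONFIRMATION
    if state is State.AWAIT_CONFIRMATION and label == "confirmation":
        return State.AWAIT_DETAIL_CHOICE
    if state is State.AWAIT_CONFIRMATION and label == "rejection":
        return State.AWAIT_QUESTION  # the loop starts over
    return state  # any other combination leaves the state unchanged


state = next_state(State.AWAIT_QUESTION, "product_availability_closed_question")
print(state.name)
```

This simplification ignores the distinction between products and product variants, which the ontology handles later in this document.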

The chatbot must be able to react to these situations, so it is important that labelled training data is available to recognize patterns in the user input. Training data needs to follow a specific format. For the fasttext engine, it looks like this:

\__label__product_availability_closed_question habt ihr produkte

\__label__product_availability_closed_question führst du produkte

\__label__product_availability_open_question was für produkte gibt es

\__label__product_availability_open_question welches produkt habt ihr vorrätig

\__label__product_variant iphone 11 gelb 64 GB

\__label__product_variant iphone 11 weiß 64 GB

\__label__confirmation genau

\__label__rejection nein

As you can see, each line consists of a label followed by the text associated with it. It is a bit fiddly to find the right number of labels and text variants, especially if the sentences are very short. The many ways of expressing questions in the German language make it difficult for a machine to distinguish between the labels to be predicted. Open and closed questions differ by merely a few words!
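To make the line format concrete, here is a small sketch that writes and reads such lines. The helper names are hypothetical, and lowercasing the text is an assumption that mirrors how inputs are lowercased before prediction in this document (the training lines above keep "GB" in upper case).

```python
def to_fasttext_line(label, text):
    """Format one example as '__label__<label> <text>' (text lowercased, whitespace collapsed)."""
    return "__label__%s %s" % (label, " ".join(text.lower().split()))


def parse_fasttext_line(line):
    """Split a training line back into (label, text)."""
    prefix, _, text = line.strip().partition(" ")
    if not prefix.startswith("__label__"):
        raise ValueError("line does not start with __label__: %r" % line)
    return prefix[len("__label__"):], text


examples = [
    ("product_availability_closed_question", "Habt ihr Produkte"),
    ("confirmation", "genau"),
]
lines = [to_fasttext_line(label, text) for label, text in examples]
print("\n".join(lines))
```

A file made of such lines, one example per line, is the kind of input fastText's supervised training expects.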


## Predicting the intent

Still, the trained model is able to detect and distinguish open and closed questions present in the user input with a degree of certainty, even if the exact same sentence has not been in the training set:

In [5]:
import os
os.chdir('/Users/timkettenacker/dsproj_repos/python/bluefinch')
import fasttext
model = fasttext.load_model("ml_model/model_intent_detection.bin") # model was pre-trained by me

input_open = 'welche iphones habt ihr?'
print("Open question from the user: " + input_open)
print(model.predict(input_open.lower()))

Open question from the user: welche iphones habt ihr?
(('__label__product_availability_open_question',), array([0.95645875]))





In [3]:
input_closed = 'habt ihr iphone 11?'
print("Closed question from the user: " + input_closed)
print(model.predict(input_closed.lower()))

Closed question from the user: habt ihr iphone 11?
(('__label__product_availability_closed_question',), array([0.74043649]))


It is also able to detect mentions of a product variant in the user statement, as can be seen below. Especially for such cases, there is an inbuilt verification of the prediction introduced in the Natural Language Processor.

In [4]:
input_variant = 'iphone 11 64 gb in weiß'
print("Statement about product variant: " + input_variant)
print(model.predict(input_variant.lower()))

Statement about product variant: iphone 11 64 gb in weiß
(('__label__product_variant',), array([0.9256106]))
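The confidence values differ noticeably between the three inputs, so it can make sense to gate on them before acting on a label. The sketch below is an assumption, not part of the project: `top_intent` and the threshold of 0.6 are hypothetical, and a plain list stands in for the NumPy array returned by the model.

```python
def top_intent(prediction, min_confidence=0.6):
    """Return (label, confidence) without the '__label__' prefix,
    or (None, confidence) if the confidence is below the threshold."""
    labels, probabilities = prediction
    label = labels[0].replace("__label__", "", 1)
    confidence = float(probabilities[0])
    if confidence < min_confidence:
        return None, confidence
    return label, confidence


# Same shape as the printed predictions: a tuple of labels and a sequence of probabilities.
closed = (("__label__product_availability_closed_question",), [0.74043649])
print(top_intent(closed))
```

A rejected prediction is a natural candidate for the structural verification described in the next section.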


## Verifying the prediction with NLP

Just like humans, AI is bound to make mistakes. To validate the outcome of the prediction, it is possible to examine the structure of the input and see if it fits the prediction. spaCy's NLP pipeline ships with some handy parsing options to inspect the linguistic annotations (see https://spacy.io/usage/spacy-101#annotations-pos-deps). For example, one can check if the first word is one of the German w-question words, pointing to an open question type:

In [6]:
import natural_language_processor  # custom-built class
import de_core_news_sm
from collections import defaultdict
import spacy

def generate_nlp_output(text):
    nlp = de_core_news_sm.load()
    doc = nlp(text.title())  # the input is title-cased before parsing
    nlp_output = defaultdict(list)
    for token in doc:
        # token index -> [text, coarse POS, fine-grained tag, dependency label, head text]
        nlp_output[token.i] = [token.text, token.pos_, token.tag_, token.dep_, token.head.text]
    return nlp_output

nlp_output = generate_nlp_output(input_open)

open_question_pointers = ['PWAT', 'PWAV', 'PWS']  # tags of interrogative words: welche, wann, was, ...
closed_question_pointers = ['VMFIN', 'VVFIN', 'VVIMP', 'VAFIN']  # finite or imperative verbs: könnt, habt, ...
first_tag = nlp_output[0][2]
last_text = nlp_output[len(nlp_output) - 1][0]
if first_tag in open_question_pointers:
    sentence_type = 'open_question'
elif first_tag in closed_question_pointers and last_text == '?':
    sentence_type = 'closed_question'
else:
    sentence_type = "undefined"

print("The input '" + input_open + "' is of type " + sentence_type)

The input 'welche iphones habt ihr?' is of type open_question
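How the detected sentence type is compared with the predicted label is not shown here. One possible cross-check, sketched with a hypothetical mapping and helper name, could look like this:

```python
# Hypothetical mapping from a predicted label to the sentence type the parser should report.
EXPECTED_SENTENCE_TYPE = {
    "__label__product_availability_open_question": "open_question",
    "__label__product_availability_closed_question": "closed_question",
}


def prediction_is_plausible(label, sentence_type):
    """True if the parsed sentence type does not contradict the predicted label."""
    expected = EXPECTED_SENTENCE_TYPE.get(label)
    if expected is None:
        return True  # labels without a structural expectation are not checked here
    return sentence_type == expected


print(prediction_is_plausible("__label__product_availability_open_question", "open_question"))
```

Labels such as confirmation or rejection pass through unchecked in this sketch, because a one-word reply carries no question structure to verify.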


## Extracting product entities

Another very important task of the NLP pipeline is extracting entities from a user's message in order to grasp the products and product features a user mentions. It is the first step in sharpening the context of the conversation and keeping it alive throughout the chat. The entity extraction is confined to the typical setup of the domain data and is thus very specialized. For example, it tries to pay attention to the naming conventions of Apple products, where the brand name is followed by a version number or Roman numeral, e.g. iPhone 11 or iPhone X. Mind that it has a pretty hard fallback if multiple nouns are present (how would it distinguish between them anyway), so it only works well for very short inputs that usually result from the chatbot asking "Are you interested in iPhone 11 or iPhone 11 Pro?"

In [7]:
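def noun_extraction(nlp_output):
    found_nouns = []
    root_noun = ''

    # Each value is [text, pos, tag, dep, head text], see generate_nlp_output
    for key, value in nlp_output.items():
        # A noun or proper noun that is the sentence root or a proper-noun component
        if value[1] in ['NOUN', 'PROPN'] and value[3] in ['ROOT', 'pnc']:
            root_noun = value[0]
            found_nouns.append(value[0])
        # Tokens attached to the root noun, e.g. version numbers.
        # Note: while root_noun is still '', the substring test below is always True.
        if root_noun in value[4] and value[1] in ['PROPN', 'NOUN', 'X', 'NUM'] and value[3] in ['nk', 'pnc']:
            found_nouns.append(value[0])

    return ' '.join(found_nouns)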

print("Input " + 'iphone 11' + ' extracts: ' )
print(noun_extraction(generate_nlp_output('iphone 11')))
print("Input " + 'iphone 11 pro' + ' extracts: ' )
print(noun_extraction(generate_nlp_output('iphone 11 pro')))
print("Input " + 'bietet ihr das iphone 11 pro an?' + ' extracts: ' )
print(noun_extraction(generate_nlp_output('bietet ihr das iphone 11 pro an?')))
print("Input " + 'ich möchte gerne über das iphone 11 pro genaueres wissen' + ' extracts: ' )
print(noun_extraction(generate_nlp_output('ich möchte gerne über das iphone 11 pro genaueres wissen')))

Input iphone 11 extracts: 
Iphone 11
Input iphone 11 pro extracts: 
Iphone Pro
Input bietet ihr das iphone 11 pro an? extracts: 
Iphone Pro
Input ich möchte gerne über das iphone 11 pro genaueres wissen extracts: 
Iphone Wissen


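As the outputs show, the heuristic is reliable only for the shortest input: the version number is dropped as soon as 'Pro' is present, and in the last sentence the trailing word 'Wissen' is picked up instead of 'Pro'. The extracted entities are then passed on to the ontology via the Ontology Lookup.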
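The naming convention described in this section (brand name, then a version number or Roman numeral) can also be expressed as a regular expression. This is a different technique from the dependency-based extraction above and only a sketch: the pattern, the helper name, and "pro" as the only recognized suffix are assumptions based on the examples in this document.

```python
import re

# Assumed pattern: "iphone", a version number or Roman numeral, then an optional "pro".
PRODUCT_PATTERN = re.compile(
    r"\b(iphone)\s+(\d{1,2}|[ivx]+)(?:\s+(pro))?\b", re.IGNORECASE)


def find_product_mentions(text):
    """Return product mentions such as 'iphone 11 pro', lowercased."""
    mentions = []
    for match in PRODUCT_PATTERN.finditer(text):
        parts = [group.lower() for group in match.groups() if group]
        mentions.append(" ".join(parts))
    return mentions


print(find_product_mentions("ich möchte gerne über das iphone 11 pro genaueres wissen"))
```

A pattern like this keeps the version number and handles several mentions in one message, but it has to be extended by hand for every new product line, which is the trade-off against a parser-based approach.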

## Mapping recognized entities to the ontology

Two notoriously difficult tasks in chatbots are keeping the state of the conversation and anaphora resolution (the problem of resolving what a pronoun or a noun phrase refers to). Both can be addressed with the help of ontologies. Ontologies contain a sketch of the entities in a domain and the relationships between those entities. If the entity the user is talking about is known, the relationships to other entities, and therefore the branches to possible information needs, are also known. Knowing the structure can therefore help steer the conversation. For example, if the current entity a user is referring to can be identified as "iPhone 11", a lookup in the ontology yields that "iPhone 11" is of type "Product". One can see that a "product" branches out into "individuals", "service items" and "some items". Those are the *children* of the product entity. The properties stored either on the product or on any of the children may lead to the 'real' intent behind the user's questions, so it is worth navigating them in the application to bring more context into the conversation.

![Ontology_model](ontology_product_variant.png)

Since "iPhone 11" has multiple product variants, it is important to ask the user which one they are actually referring to or which one sparks their interest. If there were no trace of this product and product variant constellation in the backend, it would be difficult to connect the dots and figure out the real intent. By utilizing the information from the ontology, a more informed decision can be made by both the user *and* the application.

![Ontology_relationships](ontology_individuals_relationships.png)

The first step is to map the entity extracted by NLP to its representation in the ontology (if applicable). For performance reasons, the whole ontology is kept in memory. The classes and individuals of the ontology are stored in Python dictionaries to make the lookup from the extracted nouns to the ontology values faster.

In [9]:
import ontology_lookup  # custom-built class
from owlready2 import *
from fuzzywuzzy import process, fuzz
from collections import defaultdict

onto_path.append("ontology_material")
ontology = get_ontology("GoodRelationsBluefinch_v2.owl")
ontology.load()

def display_classes_and_individuals(ontology):
    # class name -> class object
    classes = defaultdict(list)
    for cl in ontology.classes():
        classes[cl.name] = cl

    # individual label -> [IRI, name of its first asserted class]
    individuals = defaultdict(list)
    for individual in ontology.individuals():
        individuals[individual.label.first()] = [individual.iri, individual.is_a.first().name]

    return classes, individuals

# Build both dictionaries in a single pass over the ontology
classes, individuals = display_classes_and_individuals(ontology)
print('This is how the "iPhone 11" product instance is being stored:')
print(individuals['Apple iPhone 11'])

This is how the "iPhone 11" product instance is being stored:
['http://webprotege.stanford.edu/AppleIPhone11', 'Product']


Two search methods are applied to speed up matching entities discovered in a user's message to items in the ontology. The first one uses a fuzzy string comparison algorithm to detect overlaps between the nouns mentioned in the user input and the string representation of the individuals in the ontology. In other words, it checks whether the entity that was detected by NLP is actually in the ontology. Comparing the values requires a certain degree of fuzziness.

Depending on the type of question, higher or lower matching thresholds are applied. As long as the input is short (fewer than 50 characters) and not an open question, an honest attempt can be made by comparing the whole user message against all individuals.

In [10]:
def individual_lookup(nouns, sentence_type, text, individuals):
    if (len(text) < 50) and sentence_type != "open_question":
        # Short, non-open input: compare the whole message, with a stricter cutoff
        match_list = process.extractBests(text, individuals.keys(), scorer=fuzz.UWRatio,
                                          score_cutoff=70, limit=len(individuals))
    else:
        # Otherwise compare only the extracted nouns, with a more lenient cutoff
        match_list = process.extractBests(nouns, individuals.keys(), scorer=fuzz.UWRatio,
                                          score_cutoff=50, limit=len(individuals))
    # Transpose the (name, score) pairs into [[names...], [scores...]]
    match_list_cleaned = [[*x] for x in zip(*match_list)]
    return match_list_cleaned

input = 'habt ihr iphone 11?'
individual_lookup(noun_extraction(generate_nlp_output(input)), 'closed_question', input, individuals)

[['Apple iPhone 11 128 GB Weiß MWM22ZD/A',
  'Apple iPhone 11 256 GB Gelb MWMA2ZD/A',
  'Apple iPhone 11 256 GB Weiß MWM82ZD/A',
  'Apple iPhone 11 64 GB Gelb MWLW2ZD/A',
  'Apple iPhone 11 64 GB Weiß MWLU2ZD/A',
  'Apple iPhone 11 Hauptkameraspezifikation',
  'Apple iPhone 11 Pricing für 64 GB',
  'Apple iPhone 11 Pro 64 GB Nachtgrün MWC62ZD/A',
  'Apple iPhone 11 Pro Frontkameraspezifikation',
  'Apple iPhone 11 Pro Hauptkameraspezifikation',
  'Apple iPhone 11 Pro Produktspezifikation',
  'Apple iPhone 11 Simspezifikation',
  'Apple iPhone 11 Produktspezifikation',
  'Apple Iphone 11 Frontkameraspezifikation',
  'Apple Iphone',
  'Apple iPhone 11 Pricing für 128 GB',
  'Apple Iphone 11 Pricing für 265 GB',
  'Apple iPhone 11 Pro Pricing für 64 GB',
  'Apple iPhone 11'],
 [86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 71]]

For closed questions, many matches with a high score can be established. Also, individuals of type product variant come first in the output. For open questions, the cutoff threshold must be lower because the question is broader, so the range of returned results is also broader:

In [11]:
input = 'welche iphones habt ihr?'
individual_lookup(noun_extraction(generate_nlp_output(input)), 'open_question', input, individuals)

[['Apple iPhone 11 Hauptkameraspezifikation',
  'Apple iPhone 11 Garantie',
  'Apple Iphone 11 Frontkameraspezifikation',
  'Apple Iphone',
  'Apple iPhone 11 Kapazitaet',
  'Apple iPhone 11 128 GB Weiß MWM22ZD/A',
  'Apple iPhone 11 256 GB Gelb MWMA2ZD/A',
  'Apple iPhone 11 256 GB Weiß MWM82ZD/A',
  'Apple iPhone 11 64 GB Gelb MWLW2ZD/A',
  'Apple iPhone 11 64 GB Weiß MWLU2ZD/A',
  'Apple iPhone 11 Pricing für 64 GB',
  'Apple iPhone 11 Pro 64 GB Nachtgrün MWC62ZD/A',
  'Apple iPhone 11 Pro Frontkameraspezifikation',
  'Apple iPhone 11 Pro Hauptkameraspezifikation',
  'Apple iPhone 11 Simspezifikation',
  'Apple iPhone 11 Pricing für 128 GB',
  'Apple Iphone 11 Pricing für 265 GB',
  'Apple iPhone 11 Pro Pricing für 64 GB',
  'Apple iPhone 11',
  'Apple iPhone 11 Pro',
  'Apple iPhone 11 Pro Produktspezifikation',
  'Apple iPhone 11 Produktspezifikation'],
 [68,
  64,
  64,
  64,
  60,
  57,
  57,
  57,
  57,
  57,
  57,
  57,
  57,
  57,
  57,
  57,
  57,
  57,
  56,
  52,
  52,
  5

When an entity match is deemed successful by automatic means, i.e. the entities found score above a certain cutoff threshold, the result set of individuals is passed on to the second search method.
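The effect of a question-dependent cutoff can be illustrated without any third-party library. The sketch below deliberately swaps in `difflib.SequenceMatcher` from the standard library for fuzzywuzzy's `UWRatio`, so the scores are not comparable to the ones printed in this section, and the helper names are hypothetical.

```python
from difflib import SequenceMatcher


def score(query, candidate):
    """Case-insensitive similarity between 0 and 100 (difflib ratio, not UWRatio)."""
    return round(100 * SequenceMatcher(None, query.lower(), candidate.lower()).ratio())


def lookup(query, candidates, sentence_type):
    """Return (candidate, score) pairs at or above a cutoff that depends on the sentence type."""
    cutoff = 50 if sentence_type == "open_question" else 70
    scored = [(name, score(query, name)) for name in candidates]
    hits = [pair for pair in scored if pair[1] >= cutoff]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


names = ["Apple iPhone 11", "Samsung Galaxy"]  # second name is made up for contrast
print(lookup("apple iphone 11", names, "closed_question"))
```

A lower cutoff returns more candidates, which suits a broad question but leaves more work for the follow-up step that narrows the selection down.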

## Establishing conversation context

To drive the conversation, a conversation object is generated that holds the context of an ongoing conversation between the chatbot and a user and keeps it throughout the session. The context object is updated whenever an interaction between user and chatbot demands it. This way, the application can fetch responses that actually fit a user's inquiry, and it can tick off nodes and relationships that have already been part of the conversation and don't need to be revisited.

To do this, a second search method is applied on top of the result of the first search method (which checks whether any entity extracted by the NLP pipeline shows up in the ontology). It establishes a link to the ontology by comparing both the prediction and the name label of the found individual(s) to a set of pre-defined terms. For each label in the prediction outcome, a different logical pattern applies. For example, if the model predicts that a user inquires about the availability of a product, it checks which of the found individuals are really of type 'Product' and updates the conversation context accordingly. This serves as a verification of the approach.

In [12]:
def ontology_search_and_reason(recognized_individuals, prediction, ontology, classes,
                               individuals, current_context_class=None,
                               current_context_individuals=None, input=None):
    context_class = []
    context_individuals = []

    if 'product_availability' in str(prediction[0]):
        # Keep only the matched individuals that are products
        for ind in recognized_individuals:
            if individuals[ind][1] == 'Product':
                context_individuals.append(ontology.search(iri=individuals[ind][0]).first())
        context_class = classes['Product']

    # and much more ...

    return context_class, context_individuals
    
input = 'habt ihr iphone 11?'
recognized_individuals = individual_lookup(noun_extraction(generate_nlp_output(input)), 
                                           'closed_question', input, individuals)
context_class, context_individuals = ontology_search_and_reason(recognized_individuals[0], 
                            model.predict(input.lower()), ontology, classes, individuals)
context_class, context_individuals

(schema.org.Product, [webprotege.stanford.edu.AppleIPhone11])

In the example above, the context class object points to the 'product' node in the ontology, while the actual instance, 'Apple iPhone 11' as a manifestation of this product, is stored alongside in the context individual. The charm of this approach is that it now lets the application traverse the (outgoing) relationships of this individual. For example, we can see that 'Apple iPhone 11' is related to adjacent nodes through the property 'hat_Produktvariante'; the first four are printed here:

In [50]:
for i in range(0,4):
    print(context_individuals[0].hat_Produktvariante[i].label)

['Apple iPhone 11 128 GB Weiß MWM22ZD/A']
['Apple iPhone 11 256 GB Gelb MWMA2ZD/A']
['Apple iPhone 11 256 GB Weiß MWM82ZD/A']
['Apple iPhone 11 64 GB Gelb MWLW2ZD/A']
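The Conversation Context class itself is not shown in this document. A minimal sketch of what such a holder might look like follows; the class layout and method names are hypothetical, and plain strings stand in for the ontology objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConversationContext:
    """Hypothetical context holder for one chat session."""
    current_class: Optional[str] = None
    current_individuals: list = field(default_factory=list)
    visited: set = field(default_factory=set)

    def update(self, context_class, context_individuals):
        """Move the reference point and remember which nodes were already discussed."""
        self.current_class = context_class
        self.current_individuals = list(context_individuals)
        self.visited.update(context_individuals)

    def unvisited(self, candidates):
        """Filter out nodes that have already been part of the conversation."""
        return [c for c in candidates if c not in self.visited]


ctx = ConversationContext()
ctx.update("Product", ["Apple iPhone 11"])
print(ctx.unvisited(["Apple iPhone 11", "Apple iPhone 11 Pro"]))
```

Tracking visited nodes separately from the current reference point is what would allow the application to tick off topics that have already been covered.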


## Using the conversation context to respond to the user

The information gained is put to use in the response to the user by taking the current context nodes in the ontology as a reference point. Again, the predicted intent and the referenced nodes in the ontology aid the selection of the response. Note the different, more fine-grained responses that become possible through the combination of NLP prediction and ontology mapping:

In [25]:
import csv
import random

def traverse_graph(context_individuals):
    attached = defaultdict(list)
    if context_individuals.is_a[0].name == 'Product':
        attached['Variants'] = context_individuals.hat_Produktvariante
    return attached

def choose_response(context_class, context_individuals, prediction):
    # Load the reply templates, grouped by label (';'-separated file with a header row)
    possible_responses = defaultdict(list)
    with open("ml_model/responses.csv", 'r', newline='') as f:
        reader = csv.reader(f, delimiter=';')
        next(reader)  # toss headers
        for label, reply in reader:
            possible_responses[label].append(reply)

    response = None  # returned as-is if none of the rules below applies
    predicted = str(prediction[0])

    # Exactly one product found: ask the user to confirm it
    if context_class.name == 'Product' and "product_availability" in predicted:
        if len(context_individuals) == 1:
            response = random.choice(possible_responses['_product_availability_one']) % dict(
                first=context_individuals[0].label.first())

    # Confirmation within a product/individual context, or any product variant mention:
    # list the variants attached to the product(s) in context.
    # The parentheses spell out Python's precedence ('and' binds tighter than 'or').
    if (context_class.name in ['Product', 'Individual'] and "confirmation" in predicted) \
            or "product_variant" in predicted:
        if context_individuals[0].is_instance_of.first().name == 'Product':
            attached = [traverse_graph(individual) for individual in context_individuals]
            many = ""
            for entry in attached:
                for variant in entry['Variants']:
                    many += variant.label.first() + ", "
            response = random.choice(possible_responses['_product_variants']) % dict(many=many)

    return response

print("For a closed question like '" + input + "', the response prompts a verification from the user:")
choose_response(context_class, context_individuals, model.predict(input.lower()))

For a closed question like 'habt ihr iphone 11?', the response prompts a verification from the user:


'Ich habe Apple iPhone 11 gefunden. Möchtest du mehr darüber erfahren?'

Let us assume the user wants to know more about 'iPhone 11', as suggested by the chatbot, and confirms that in the input line. The current context still points to 'iPhone 11', thereby 'memorizing' the last inquiry from the user. Now the NLP engine is triggered again and predicts the user input "ja" ("yes") as a confirmation of the proposed 'iPhone 11' node.

The application is able to use 'iPhone 11' as a starting point to traverse the ontology loaded into memory. It finds all the product variants that are attached by the property 'hat_Produktvariante' to 'iPhone 11'. The chatbot is able to steer the dialogue by asking whether the user wants to know about any of the offered variants, narrowing down the intent of the user and managing the conversation.

In [21]:
choose_response(context_class, context_individuals, model.predict('ja'.lower()))

'Ich habe mehrere Produktvarianten gefunden. Welches interessiert dich näher? Apple iPhone 11 128 GB Weiß MWM22ZD/A, Apple iPhone 11 256 GB Gelb MWMA2ZD/A, Apple iPhone 11 256 GB Weiß MWM82ZD/A, Apple iPhone 11 64 GB Gelb MWLW2ZD/A, Apple iPhone 11 64 GB Weiß MWLU2ZD/A, '

If the user chose one of the variants in their response, the conversation context would move from the current class 'product' to the level of the 'individual'. 'Individual' holds a different set of modelled properties that can be used to steer the conversation, e.g. asking for product features or pricing.
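The response templates are read from `ml_model/responses.csv`, whose content is not shown. The sketch below assumes a layout that would be consistent with the code and replies in this section: a header row, `;` as delimiter, and `%(name)s` placeholders. The sample rows and the helper names are reconstructed guesses, not the real file.

```python
import csv
import io
import random
from collections import defaultdict

# Assumed file content; the real responses.csv may differ.
SAMPLE_CSV = (
    "label;reply\n"
    "_product_availability_one;Ich habe %(first)s gefunden. Möchtest du mehr darüber erfahren?\n"
    "_product_variants;Ich habe mehrere Produktvarianten gefunden. "
    "Welches interessiert dich näher? %(many)s\n"
)


def load_responses(handle):
    """Group reply templates by label from a ';'-separated file with a header row."""
    responses = defaultdict(list)
    reader = csv.reader(handle, delimiter=";")
    next(reader)  # skip the header row
    for label, reply in reader:
        responses[label].append(reply)
    return responses


def render(responses, label, **values):
    """Pick a random template for the label and fill in its named placeholders."""
    return random.choice(responses[label]) % values


responses = load_responses(io.StringIO(SAMPLE_CSV))
print(render(responses, "_product_availability_one", first="Apple iPhone 11"))
```

Keeping several templates per label and choosing one at random makes the replies less repetitive, while the named placeholders keep the ontology labels out of the template text.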

## Conclusion and further research

Managing the dialogue state in a chatbot session with the help of an ontology is a great step forward in creating transparency in AI-driven applications. Modelling the domain with an ontology deepens the understanding of the complexity of the domain. A formal, machine-readable representation of the domain aids in traversing the graph to create a link to the current context and helps to anticipate 'what comes next' in a conversation. The logging module could give insight into when a user does not find an answer to a question or is not satisfied with the response, in turn serving as a kind of feedback loop to the ontologist (is the model complete?). In some instances, it may be appropriate to break the purely line-based UI of the chatbot; in the example above, multiple instances are reported back to the user to choose from. It would be more convenient for the user *and* the backend to present the options in an interactive, selectable fashion, instead of applying nitty-gritty techniques in the NLP pipeline to pick out the exact one, especially if they all resemble each other very closely.