Tim Kettenacker, August 2020

# Ontology-aided Chatbot

## Topic Introduction and Use Case

With the first wave of AI applications arriving in the corporate world, the automation of domain-specific tasks is accelerating. AI applications are becoming increasingly proficient at solving and automating one particular task in one particular domain. The same is true for chatbots: a gym chatbot can be good at telling me the opening hours and generating a workout plan, but it will probably struggle to deliver accurate weather forecasts. If I use the vocabulary the chatbot understands (that is, if the training data matches the language actually used in the domain), the chatbot can recognize my intent and potentially steer the conversation. Capturing and keeping track of the intent throughout a conversation is crucial to the acceptance and success of a chatbot. In this document, I illustrate both how NLP can be used to understand human language based on domain-specific training data, and how a mapping to an ontology of that domain can be used to navigate the conversation.

To support that idea, my use case is a website chatbot offering various iPhone models for sale (in German). The user may ask for iPhones in general or specific models as well as their product features and pricing.


## Architecture and Process Overview

The chatbot's browser interface is deliberately simple: a box where the user types a question or comment. The input is processed in the backend, and a response is generated and displayed in the frontend. The user responds to that in turn, and so on. The web service providing the user interface is served by the Python library Flask. The computation logic is written in Python, with the different components split into classes. An instance of the **chatbot** class handles the user input and creates instances of the other classes to orchestrate the logic.
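To make the request flow concrete, here is a minimal sketch of how a chatbot class could orchestrate the other components behind a Flask endpoint. The `/chat` route, the JSON field names, and the three callables passed to `Chatbot` are assumptions made for illustration, not the project's actual interface.

```python
class Chatbot:
    """Orchestrates the components; each one is passed in as a plain callable."""

    def __init__(self, predict_label, analyze, lookup):
        self.predict_label = predict_label  # text -> predicted label
        self.analyze = analyze              # (text, label) -> verified label
        self.lookup = lookup                # (text, label) -> response string

    def respond(self, user_input):
        text = user_input.strip()
        if not text:
            return "Please enter a question."
        label = self.predict_label(text)
        label = self.analyze(text, label)
        return self.lookup(text, label)


def create_app(chatbot):
    """Wrap a Chatbot in a small Flask app (requires the flask package)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/chat", methods=["POST"])  # hypothetical route name
    def chat():
        payload = request.get_json(silent=True) or {}
        answer = chatbot.respond(payload.get("message", ""))
        return jsonify({"response": answer})

    return app


# Stand-in components so the sketch runs without a model or an ontology.
demo_bot = Chatbot(
    predict_label=lambda text: "product_variant" if "iphone" in text.lower() else "unknown",
    analyze=lambda text, label: label,
    lookup=lambda text, label: f"[{label}] {text}",
)
```

Calling `create_app(demo_bot).run()` would start a local development server. Passing the components in as callables keeps the orchestration testable without loading a model.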

The **trainer** class loads a model that was pre-trained on the training data. Facebook's "fasttext" library is used for this component. Note that training does not follow the usual train-test routine because of the limited amount of usable data. The model is trained to predict the label for a user input, for example whether an input is about a product variant.
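The prediction step could look roughly like the sketch below. `fasttext.load_model` and `model.predict` are the library's own calls; the function names and the `min_confidence` threshold are assumptions added here for illustration.

```python
def load_model(path):
    """Load a pre-trained fastText model (requires the fasttext package)."""
    import fasttext
    return fasttext.load_model(path)


def predict_label(model, text, min_confidence=0.5):
    """Return (label, confidence), or (None, confidence) below the threshold."""
    # fastText predicts one line at a time, so newlines are collapsed to spaces.
    cleaned = " ".join(text.split())
    labels, probabilities = model.predict(cleaned, k=1)
    label = labels[0].replace("__label__", "", 1)
    confidence = float(probabilities[0])
    if confidence < min_confidence:  # hypothetical cut-off, tune as needed
        return None, confidence
    return label, confidence
```

Returning `None` for low-confidence predictions gives the downstream components an explicit signal that the input needs further checking.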

The prediction is then passed on to an instance of the **Natural Language Processor**, for which I use the German models of "spacy". Because the expected input is very short (statements of only a couple of words), the accuracy of the trainer's prediction needs to be verified. Shallow and dependency parsing examine the overall structure of the input, so it can be cross-checked whether, for example, a sentence is phrased as an open or a closed question. This in turn confirms or invalidates the trainer's prediction. The structure of a sentence may even hint at the presence of a product variant.
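As a simplified illustration of this cross-check, the sketch below uses only the part-of-speech tag of the first token rather than a full dependency parse. It assumes that the German spaCy model emits STTS-style tags in `token.tag_` (for example `PWS` for interrogative pronouns and `VAFIN` for finite auxiliaries); the function names are hypothetical.

```python
FINITE_VERB_TAGS = {"VVFIN", "VAFIN", "VMFIN"}  # STTS tags for finite verbs


def tag_tokens(nlp, text):
    """Return (token, tag) pairs; nlp is e.g. spacy.load("de_core_news_sm")."""
    return [(token.text, token.tag_) for token in nlp(text)]


def question_type(tagged_tokens):
    """Classify by the first token: W-word -> open, finite verb -> closed."""
    if not tagged_tokens:
        return None
    _, first_tag = tagged_tokens[0]
    if first_tag.startswith("PW"):     # PWS, PWAT, PWAV: interrogative words
        return "open_question"
    if first_tag in FINITE_VERB_TAGS:  # verb-first word order
        return "closed_question"
    return None


def verify_prediction(predicted_label, tagged_tokens):
    """True if the structure supports the label, False if it contradicts it,
    None if the structure gives no evidence either way."""
    structure = question_type(tagged_tokens)
    if structure is None or not predicted_label.endswith("_question"):
        return None
    return predicted_label.endswith(structure)
```

For "was für produkte gibt es", the first token is an interrogative pronoun, so a predicted `product_availability_closed_question` label would be flagged as contradicted.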

A pillar of the architecture is the ontology, which serves as a schema model describing the domain. I chose the GoodRelations ontology (http://goodrelations.makolab.com/description) because of its popular use on e-commerce web pages and its broad coverage of the domain. I deleted some parts of my local copy of the ontology to declutter it and to reduce memory consumption, because the ontology is loaded into memory by the **Ontology Lookup** component. The Ontology Lookup maps the output generated by natural language processing to the entities in the ontology. When it finds a match, it traverses the graph, looking for adjacent hierarchical information. Depending on the entity and its outgoing relationships, it derives questions that the **Conversation Context** class instance can use to further narrow down the intent. The conversation context also stores and updates the current class and instance of the ontology as a reference point to steer the conversation.
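The lookup-and-traverse idea can be sketched with a toy in-memory graph. The triples, predicate names, and class interfaces below are invented for illustration and are not taken from the GoodRelations vocabulary or from the project's code.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object). All identifiers are illustrative.
TRIPLES = [
    ("Smartphone", "subClassOf", "Product"),
    ("iPhone", "subClassOf", "Smartphone"),
    ("iPhone 11", "instanceOf", "iPhone"),
    ("iPhone SE", "instanceOf", "iPhone"),
    ("iPhone 11", "hasColor", "gelb"),
    ("iPhone 11", "hasColor", "weiß"),
    ("iPhone 11", "hasStorage", "64 GB"),
]
HIERARCHY = ("subClassOf", "instanceOf")


class OntologyLookup:
    def __init__(self, triples):
        self.outgoing = defaultdict(list)  # subject -> [(predicate, object)]
        self.incoming = defaultdict(list)  # object -> [(predicate, subject)]
        self.labels = {}                   # lowercase label -> entity
        for s, p, o in triples:
            self.outgoing[s].append((p, o))
            self.incoming[o].append((p, s))
            self.labels[s.lower()] = s
            self.labels[o.lower()] = o

    def find(self, text):
        """Return the entity with the longest label contained in the text."""
        lowered = text.lower()
        matches = [label for label in self.labels if label in lowered]
        return self.labels[max(matches, key=len)] if matches else None

    def narrower(self, entity):
        """Subclasses and instances directly below the entity."""
        return sorted(s for p, s in self.incoming.get(entity, []) if p in HIERARCHY)

    def follow_up_questions(self, entity):
        children = self.narrower(entity)
        if children:
            return [f"Which {entity}? Options: {', '.join(children)}"]
        options = defaultdict(list)
        for p, o in self.outgoing.get(entity, []):
            if p not in HIERARCHY:
                options[p].append(o)
        # Only ask about properties that still offer a choice.
        return [f"Which {p}? Options: {', '.join(v)}"
                for p, v in options.items() if len(v) > 1]


class ConversationContext:
    """Keeps the current ontology entity as the conversation's reference point."""

    def __init__(self):
        self.current_entity = None
        self.pending_questions = []

    def update(self, lookup, text):
        entity = lookup.find(text)
        if entity is not None:  # keep the old reference point on a miss
            self.current_entity = entity
            self.pending_questions = lookup.follow_up_questions(entity)
        return self.current_entity
```

In this sketch, asking about "iphones" moves the context to the `iPhone` class and yields a question listing its instances; naming "iphone 11" then moves it to that instance and yields a question about the remaining choice of colour.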

A logging module is in place. Its main purpose is to capture the user questions that could not be answered. This way, the log files reveal information needs that are not yet modelled but should be.
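A minimal sketch of such a logger, using Python's standard `logging` module, might look like this. The logger name, the default file name, and the tab-separated line layout are assumptions.

```python
import logging


def build_unanswered_logger(handler=None, log_path="unanswered.log"):
    """Create a logger for unanswered questions; writes to a file by default."""
    logger = logging.getLogger("chatbot.unanswered")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these entries out of the root logger
    if not logger.handlers:   # avoid duplicate handlers on repeated calls
        if handler is None:
            handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s\t%(message)s"))
        logger.addHandler(handler)
    return logger


def log_unanswered(logger, user_input, predicted_label=None):
    """Record an input the chatbot could not answer, with its predicted label."""
    logger.info("label=%s\tinput=%s", predicted_label, user_input)
```

Storing the predicted label next to the raw input makes it easier to see later whether a gap lies in the training data or in the ontology.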

![Chatbot Architecture](chatbot-architecture.jpg)

## Training Data

Natural language understanding depends on the quality and quantity of the data that was used to train the model. From an engineering perspective, finding and selecting annotated data that suits your use case can be a hassle. For English text data, the Brown corpus is a common choice. However, sources for German text data are comparatively scarce and hard to find. Due to the lack of sources in the specific domain of my use case, I chose to annotate my own data (which was probably the most cumbersome part of the project).

Training data must follow a specific format. For the fasttext engine, it looks like this:

\__label__product_availability_closed_question habt ihr produkte

\__label__product_availability_closed_question führst du produkte

\__label__product_availability_open_question was für produkte gibt es

\__label__product_availability_open_question welches produkt habt ihr vorrätig

\__label__product_variant iphone 11 gelb 64 GB

\__label__product_variant iphone 11 weiß 64 GB

\__label__confirmation genau

\__label__rejection nein

As you can see, each line consists of a label followed by the associated text. Finding the right number of labels and text variants is a bit fiddly, especially when the sentences are very short. The many ways of phrasing questions in German make it difficult for a machine to distinguish between the labels to be predicted. Open and closed questions often differ by only a few words!
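Generating the training file from annotated pairs could be sketched as follows. `fasttext.train_supervised` is the library's training call; the helper names and the hyperparameter values (`epochs`, `word_ngrams`) are placeholders, not the settings used in this project.

```python
import re

LABEL_PREFIX = "__label__"


def to_fasttext_line(label, text):
    """Format one example as '__label__<label> <text>' on a single line."""
    if not re.fullmatch(r"\w+", label):
        raise ValueError(f"label must be a single word: {label!r}")
    return f"{LABEL_PREFIX}{label} {' '.join(text.split())}"


def write_training_file(examples, path):
    """Write (label, text) pairs to a file, one example per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for label, text in examples:
            handle.write(to_fasttext_line(label, text) + "\n")


def train(path, epochs=25, word_ngrams=2):
    """Train a supervised classifier (requires the fasttext package)."""
    import fasttext
    return fasttext.train_supervised(input=path, epoch=epochs, wordNgrams=word_ngrams)


EXAMPLES = [
    ("product_availability_closed_question", "habt ihr produkte"),
    ("product_availability_open_question", "was für produkte gibt es"),
    ("product_variant", "iphone 11 gelb 64 GB"),
    ("confirmation", "genau"),
]
```

Word n-grams are one option worth trying here, since open and closed questions differ mainly in word order and in a few function words.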