# Deep Learning Q&A Artificial Intelligence Chat Bot: A Transformer-Based Approach

#### Alexander Swanepoel, _University of Warwick, Computer Science MSc_

## Dataset

## Introduction

We explore the deep learning approach to building an artificial intelligence chat bot. Transformer-based models have had great success in natural language processing, surpassing earlier state-of-the-art systems such as OpenAI GPT. The model we are interested in is RoBERTa, a refined, robustly optimised version of BERT (Bidirectional Encoder Representations from Transformers), which was introduced by Devlin et al. (https://arxiv.org/abs/1810.04805). RoBERTa keeps the BERT architecture but revisits its pretraining procedure, using dynamic masking (the masked positions change each time a sequence is seen) and more carefully tuned hyperparameters. The results of its predecessor, BERT, are tabulated below; note that the RoBERTa paper reports that it outperforms BERT on the benchmarks it evaluates.

System                  | MultiNLI | Question NLI | SWAG
----------------------- | :------: | :----------: | :------:
BERT                    | **86.7** | **91.1**     | **86.3**
OpenAI GPT (Prev. SOTA) | 82.2     | 88.1         | 75.0

This notebook demonstrates the application of the RoBERTa model and exhibits its results, so that readers can experiment with and test the model for themselves. For completeness, we first outline the mathematical framework of the transformer deep learning model and give a brief insight into how it operates. We then build an application in which a rudimentary version of the model can be tested and sandboxed.

## Architecture

We give a brief insight into the workings of the transformer model, initially developed by Vaswani et al. in the paper 'Attention is All You Need' (reference: https://arxiv.org/abs/1706.03762). For succinctness, we describe the model in a natural language processing setting. Viewed as a black box, the model takes an input, applies some transformation, and yields a particular output, with some stochastic accounting. For example, if the input to the transformer black box is the string 'Je suis etudiant', the output would read 'I am a student'. The translation itself is a sequence of steps in which the input is first encoded and the encoding is then decoded, via a set of connections, into a tangible string. At a high level, the original paper stacks six encoders on top of one another, with connections from the encoder stack to a stack of the same number of decoders.

Each encoder comprises a self-attention layer and a feed-forward neural network, essentially the canonical multi-layer perceptron (https://en.wikipedia.org/wiki/Feedforward_neural_network). The area of technical innovation is the self-attention layer, which can be described as an association mechanism for the algorithm. For example, given the sentence 'The chicken did not cross the road, as it was exhausted.', an algorithm has to ponder the word 'it'. What is it in reference to? Is it the road, or the chicken? This is trivial for humans, but for a machine learning model it is quite a complex process. In essence, the self-attention layer evaluates the weightings and positions of the individual words in the input sequence: as each word is processed (with respect to each position in the input sequence), the layer searches the other positions for information that helps it encode that particular word.

This mechanism plays a role similar to the hidden state in recurrent neural network (RNN) machinery. An RNN maintains a hidden intermediate state that lets it combine its representation of the previous vectors (words) it has processed with the current vector. The self-attention layer serves the same purpose, although it attends to all positions in the sequence at once rather than processing them one after another. It is in the self-attention layer that the transformer forms an 'understanding' of the relevance of the other vectors to the vector currently being processed. For further understanding, please see https://en.wikipedia.org/wiki/Transformer_(machine_learning_model).

### Mathematical Framework 

For a deeper understanding, we briefly discuss the computation behind self-attention.

As discussed previously, the self-attention layer is built from scaled dot-product attention units. When a string is passed into the transformer, the attention weights between all of its tokens are computed simultaneously. For every token (i.e. word, in this context), the unit produces an embedding that contains information about the word itself together with a weighted combination of the other relevant words, each weighted by its attention weight.

Then, for each unit, the transformer learns three weight matrices: the query weights, $W_Q$, the key weights, $W_K$, and the value weights, $W_V$. For a given token $i$, the word embedding $x_i$ is multiplied by each of the matrices, yielding a query vector, $q_i = x_i W_Q$, a key vector, $k_i = x_i W_K$, and a value vector, $v_i = x_i W_V$. The attention weight $a_{ij}$ from token $i$ to token $j$ is computed from the dot product of the query and key vectors, $q_i \cdot k_j$. The dot products are divided by the square root of the dimension of the key vectors, $\sqrt{d_k}$, which stabilises gradients during training, and are then normalised with the softmax function. Because $W_Q$ and $W_K$ are different matrices, attention need not be symmetric: the weight $a_{ij}$ is in general not equal to $a_{ji}$.
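
Written out for a single pair of tokens, the steps above amount to the following standard formulation. It is stated under two assumptions not named in the text: $n$ denotes the number of tokens in the sequence, and $z_i$ is a label introduced here for the output embedding of token $i$; $d_k$ is the dimension of the key vectors.
\begin{equation*}
a_{ij} = \frac{\exp\left(q_i \cdot k_j / \sqrt{d_k}\right)}{\sum_{m=1}^{n} \exp\left(q_i \cdot k_m / \sqrt{d_k}\right)}, \qquad z_i = \sum_{j=1}^{n} a_{ij} v_j
\end{equation*}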

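Stacking the query, key and value vectors of all tokens as the rows of matrices $Q$, $K$ and $V$, the attention can then be defined as follows:
\begin{equation*}
\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
\end{equation*}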
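
The attention formula can be sketched in a few lines of NumPy. This is a minimal, single-head illustration and not the RoBERTa implementation: the sequence length, embedding sizes and random weight matrices are all assumptions chosen only to make the example run.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]                      # dimension of the key vectors
    scores = Q @ K.T / np.sqrt(d_k)        # scaled dot products q_i . k_j
    weights = softmax(scores)              # attention weights a_ij
    return weights @ V, weights

# Toy setup: every size and value below is an illustrative assumption
rng = np.random.default_rng(0)
n_tokens, d_model, d_k = 4, 8, 3
X = rng.normal(size=(n_tokens, d_model))   # one word embedding x_i per row
W_Q = rng.normal(size=(d_model, d_k))      # query weights
W_K = rng.normal(size=(d_model, d_k))      # key weights
W_V = rng.normal(size=(d_model, d_k))      # value weights

Q, K, V = X @ W_Q, X @ W_K, X @ W_V
output, weights = scaled_dot_product_attention(Q, K, V)

print(weights.sum(axis=1))  # each row of attention weights sums to 1
print(output.shape)         # one output vector of size d_k per token
```

Each row of `weights` holds the attention that one token pays to every token in the sequence, so it is a probability distribution over positions.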

## Construction

#### Prerequisite Installations

Below, we install the transformers library, import the required dependencies, and load the roberta-base model fine-tuned for QA (see https://huggingface.co/deepset/roberta-base-squad2 for more information).

In [1]:
#Transformer installation see https://huggingface.co/deepset/roberta-base-squad2
!pip install transformers



In [2]:
#Dependency Import with respect to RoBERTa Model
from transformers import AutoModelForQuestionAnswering
from transformers import AutoTokenizer
from transformers import pipeline

In [3]:
#Instantiating the Pipeline for Questions and Answers, and Model Load
model_name = "deepset/roberta-base-squad2"

nlp = pipeline(model=model_name, tokenizer=model_name, revision="v1.0", task="question-answering")

In [4]:
#Corpus, body of text (reference: https://en.wikipedia.org/wiki/Spider-Man)
wikipedia_text = """
Spider-Man is a superhero appearing in American comic books published by Marvel Comics. Created by writer-editor Stan Lee and artist Steve Ditko, he first appeared in the anthology comic book Amazing Fantasy #15 (August 1962) in the Silver Age of Comic Books. He has since been featured in movies, television shows, video games, and plays. Spider-Man is the alias of Peter Parker, an orphan raised by his Aunt May and Uncle Ben in New York City after his parents Richard and Mary Parker died in a plane crash. Lee and Ditko had the character deal with the struggles of adolescence and financial issues and gave him many supporting characters, such as Flash Thompson, J. Jonah Jameson and Harry Osborn, romantic interests Gwen Stacy, Mary Jane Watson and the Black Cat, and foes such as Doctor Octopus, the Green Goblin and Venom. In his origin story, he gets spider-related abilities from a bite from a radioactive spider; these include clinging to surfaces, superhuman strength and agility, and detecting danger with his "spider-sense." He also builds wrist-mounted "web-shooter" devices that shoot artificial spider webs of his own design.

When Spider-Man first appeared in the early 1960s, teenagers in superhero comic books were usually relegated to the role of sidekick to the protagonist. The Spider-Man series broke ground by featuring Peter Parker, a high school student from Queens behind Spider-Man's secret identity and with whose "self-obsessions with rejection, inadequacy, and loneliness" young readers could relate.[9] While Spider-Man had all the makings of a sidekick, unlike previous teen heroes such as Bucky and Robin, Spider-Man had no superhero mentor like Captain America and Batman; he thus had to learn for himself that "with great power there must also come great responsibility"—a line included in a text box in the final panel of the first Spider-Man story but later retroactively attributed to his guardian, his late Uncle Ben Parker.
"""

In [5]:
#Dictionary Pass Through; requires a Question and Context
# Define question set
question_set = {
                'question':'Who created Spiderman?', 
                'context':wikipedia_text
               }

In [6]:
# Run the question and context through the question-answering pipeline
results = nlp(question_set)



In [7]:
results['answer']

'Stan Lee and artist Steve Ditko'
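
The model is extractive: the answer is a substring of the context rather than newly generated text. Besides `'answer'`, the dictionary returned by the pipeline also carries a `'score'` and the `'start'` and `'end'` character offsets of the answer within the context. The sketch below illustrates how the offsets relate to the answer. It uses a shortened excerpt of the context and a hand-built stand-in dictionary (`mock_result` is hypothetical), so no model is run and no score is shown.

```python
# A shortened excerpt of the context used above
context = ("Created by writer-editor Stan Lee and artist Steve Ditko, "
           "he first appeared in the anthology comic book Amazing Fantasy #15.")

# Hand-built stand-in for a pipeline result (hypothetical: no model is run here,
# and a real result also carries a 'score' confidence value)
answer = "Stan Lee and artist Steve Ditko"
start = context.find(answer)
mock_result = {"answer": answer, "start": start, "end": start + len(answer)}

def span_from_result(result, context_text):
    """Recover the answer text by slicing the context with the character offsets."""
    return context_text[result["start"]:result["end"]]

recovered = span_from_result(mock_result, context)
print(recovered)
```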

#### Anvil Install & Dependencies

In [8]:
!pip install anvil-uplink



In [9]:
import anvil.server

In [10]:
anvil.server.connect('HMKZ6CRFQU23PH7ZEMHND45W-GYOCEGB4RZII3GIA')

Connecting to wss://anvil.works/uplink
Anvil websocket open
Connected to "Default environment" as SERVER
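
The uplink key passed to `anvil.server.connect` acts as a credential, so one may prefer not to store it in the notebook itself. A minimal sketch of one alternative is to read it from an environment variable. The variable name `ANVIL_UPLINK_KEY` and the helper `get_uplink_key` are assumptions made for this illustration and are not part of Anvil.

```python
import os

def get_uplink_key(env_var="ANVIL_UPLINK_KEY"):
    """Return the uplink key stored in the given environment variable."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Anvil uplink key."
        )
    return key

# Usage (commented out so the sketch runs without an Anvil account):
# import anvil.server
# anvil.server.connect(get_uplink_key())
```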


##### Callable Function Setup

In [11]:
# Tells the Jupyter server that this is an Anvil callable function
@anvil.server.callable
# Define the function that is going to do our NLP
def answer_questions(question_text, context_text): 
    # Convert this to a dictionary
    question_set = {
                'question':question_text, 
                'context':context_text
               }
    # Run it through the NLP pipeline
    results = nlp(question_set)
    
    return results['answer']
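
Because the function is called from a web form, the question or the context may arrive empty. The following is a sketch of one way to guard against that before the pipeline is invoked. The name `answer_questions_checked`, the messages it returns, and the `fake_pipeline` stand-in (used so the example runs without downloading a model) are all assumptions for illustration.

```python
def answer_questions_checked(question_text, context_text, qa_pipeline):
    """Validate the inputs, then delegate to a question-answering callable."""
    if not question_text or not question_text.strip():
        return "Please enter a question."
    if not context_text or not context_text.strip():
        return "Please provide some text to search for an answer."
    results = qa_pipeline({"question": question_text, "context": context_text})
    return results["answer"]

# Stand-in for the `nlp` pipeline: it simply returns the first word of the context
def fake_pipeline(question_set):
    return {"answer": question_set["context"].split()[0]}

print(answer_questions_checked("", "Some text", fake_pipeline))
print(answer_questions_checked("Who?", "Spider-Man is a superhero.", fake_pipeline))
```

In the notebook, the real `nlp` pipeline would be passed in place of `fake_pipeline`.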

In [12]:
anvil_result = answer_questions('Who is Spiderman\'s enemy?', wikipedia_text)

In [13]:
anvil_result

'Doctor Octopus'

## Conclusion 

The results are remarkable. This notebook has provided a succinct introduction to transformer-based deep learning, together with a library of canonical references to state-of-the-art methods. We hope that the pedagogical approach provided here will be of use at a future date, to expand on this work and build more complex systems.

With respect to the app constructed, we provide a private link to the app sandbox, where one can explore the app at one's leisure. The application is a parsing tool: one passes in a body of text from any source, poses a question about that text, and the AI responds with an answer produced by the deep learning model. The link for the application is as follows: https://gyocegb4rzii3gia.anvil.app/TJLRBEUDIC2J6NUXQHARB6FI (Please share with discretion, as server space and bandwidth are limited).