# Contextual Chatbots with Tensorflow

In conversations, context is king! We’ll build a chatbot framework using Tensorflow and add some context handling to show how this can be approached.

Ever wonder why most chatbots lack conversational [context](https://chatbotsmagazine.com/maintaining-context-in-chatbots-2016b6a5b7c6)?

How is this possible given the importance of context in nearly all conversations?

We’re going to create a [chatbot framework](https://chatbotsmagazine.com/design-framework-for-chatbots-aa27060c4ea3) and build a conversational model for an **island moped rental shop**. The chatbot for this small business needs to handle simple questions about hours of operation, reservation options and so on. We also want it to handle contextual responses such as inquiries about same-day rentals. Getting this right [could save a vacation](https://medium.com/p/how-a-messaging-app-saved-my-vacation-192b031a96f5)!

We’ll be working through 3 steps:

* We’ll transform conversational intent definitions to a Tensorflow model
* Next, we will build a chatbot framework to process responses
* Lastly, we’ll show how basic context can be incorporated into our response processor

We’ll be using **[tflearn](http://tflearn.org/)**, a layer above **[tensorflow](https://www.tensorflow.org/)**, and of course **[Python](https://www.python.org/)**. As always, we’ll use a **[Python notebook](https://ipython.org/notebook.html)** as a tool to facilitate our work.

# Transform Conversational Intent Definitions to a Tensorflow Model
The complete notebook for our first step is [here](https://github.com/ugik/notebooks/blob/master/Tensorflow%20chat-bot%20model.ipynb).

A chatbot framework needs a structure in which conversational intents are defined. One clean way to do this is with a JSON file, like [this](https://github.com/ugik/notebooks/blob/master/intents.json).

![chatbot intents](figs/1_pcbw_Y4acT750-lL98iw2Q.png)

Each conversational intent contains:

* a **tag** (a unique name)
* **patterns** (sentence patterns for our neural network text classifier)
* **responses** (one is chosen at random as the reply)

And later on we’ll add some basic contextual elements.
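To make the structure concrete, here is a minimal sketch of a one-intent file in the same shape. The `opentoday` patterns and response are the ones that appear later in this article; the `load_intents` helper and its key check are hypothetical additions, not part of the original notebook.

```python
import json

# A hypothetical one-intent document in the same shape as intents.json.
INTENTS_JSON = """
{"intents": [
  {"tag": "opentoday",
   "patterns": ["Are you open today?", "When do you open today?", "What are your hours today?"],
   "responses": ["Our hours are 9am-9pm every day"]}
]}
"""

REQUIRED_KEYS = {"tag", "patterns", "responses"}

def load_intents(text):
    """Parse an intents document and check that every intent has the required keys."""
    data = json.loads(text)
    for intent in data["intents"]:
        missing = REQUIRED_KEYS - set(intent)
        if missing:
            raise ValueError(f"intent {intent.get('tag')!r} is missing {sorted(missing)}")
    return data

intents = load_intents(INTENTS_JSON)
print(intents["intents"][0]["tag"], len(intents["intents"][0]["patterns"]))  # opentoday 3
```

Failing fast on a malformed intent is cheaper than discovering it after a training run.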

First we take care of our imports:

```python
# things we need for NLP
import nltk
from nltk.stem.lancaster import LancasterStemmer
stemmer = LancasterStemmer()

# things we need for Tensorflow
import numpy as np
import tflearn
import tensorflow as tf
import random
```

Have a look at "[Deep Learning in 7 lines of code](https://chatbotslife.com/deep-learning-in-7-lines-of-code-7879a8ef8cfb)" for a primer or [here](https://chatbotslife.com/tensorflow-demystified-80987184faf7) if you need to demystify Tensorflow.

```python
# import our chat-bot intents file
import json
with open('intents.json') as json_data:
    intents = json.load(json_data)
```

With our intents JSON [file](https://github.com/ugik/notebooks/blob/master/intents.json) loaded, we can now begin to organize our documents, words and classification classes.

```python
words = []
classes = []
documents = []
ignore_words = ['?']
# loop through each sentence in our intents patterns
for intent in intents['intents']:
    for pattern in intent['patterns']:
        # tokenize each word in the sentence
        w = nltk.word_tokenize(pattern)
        # add to our words list
        words.extend(w)
        # add to documents in our corpus
        documents.append((w, intent['tag']))
        # add to our classes list
        if intent['tag'] not in classes:
            classes.append(intent['tag'])

# stem and lower each word and remove duplicates
words = [stemmer.stem(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))

# remove duplicates
classes = sorted(list(set(classes)))

print (len(documents), "documents")
print (len(classes), "classes", classes)
print (len(words), "unique stemmed words", words)
```

```
27 documents
9 classes ['goodbye', 'greeting', 'hours', 'mopeds', 'opentoday', 'payments', 'rental', 'thanks', 'today']
48 unique stemmed words ["'d", "'s", 'a', 'acceiv', 'anyon', 'ar', 'bye', 'can', 'card', 'cash', 'credit', 'day', 'do', 'doe', 'good', 'goodby', 'hav', 'hello', 'help', 'hi', 'hour', 'how', 'i', 'is', 'kind', 'lat', 'lik', 'mastercard', 'mop', 'of', 'on', 'op', 'rent', 'see', 'tak', 'thank', 'that', 'ther', 'thi', 'to', 'today', 'we', 'what', 'when', 'which', 'work', 'yo', 'you']
```


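We create a list of documents (sentences): each document pairs the tokenized words of one pattern with the intent (class) it belongs to. Stemming and lowercasing are applied to the vocabulary list `words` here, and to each document's words when we build the training data.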
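The same bookkeeping can be sketched without NLTK on a tiny, hypothetical intents list. The regex tokenizer below is a stand-in for `nltk.word_tokenize` (it is not identical, e.g. for contractions), and stemming is omitted, so the vocabulary differs from the 48 Lancaster stems above.

```python
import re

def tokenize(sentence):
    # Hypothetical stand-in for nltk.word_tokenize: word runs and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

# Hypothetical miniature intents list (patterns invented for illustration).
intents = {"intents": [
    {"tag": "greeting", "patterns": ["Hi", "Hello, is anyone there?"]},
    {"tag": "hours", "patterns": ["What hours are you open?"]},
]}

ignore_words = ["?", ","]
documents, classes, words = [], [], []
for intent in intents["intents"]:
    for pattern in intent["patterns"]:
        tokens = tokenize(pattern)
        words.extend(tokens)
        # each document pairs the raw (unstemmed) tokens with the intent tag
        documents.append((tokens, intent["tag"]))
        if intent["tag"] not in classes:
            classes.append(intent["tag"])

# lowercase and de-duplicate the vocabulary (stemming omitted in this sketch)
words = sorted({w.lower() for w in words if w not in ignore_words})
classes = sorted(classes)

print(len(documents), "documents")  # 3 documents
print(classes)  # ['greeting', 'hours']
print(words)    # ['anyone', 'are', 'hello', 'hi', 'hours', 'is', 'open', 'there', 'what', 'you']
```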

The stem ‘tak’ will match ‘take’, ‘taking’, ‘takers’, etc. We could clean the words list and remove useless entries but this will suffice for now.

Unfortunately, this data structure won’t work with Tensorflow; we need to transform it further, from documents of words into tensors of numbers.

```python
# create our training data
training = []
# create an empty array for our output
output_empty = [0] * len(classes)

# training set, bag of words for each sentence
for doc in documents:
    # list of tokenized words for the pattern
    pattern_words = doc[0]
    # stem each word
    pattern_words = [stemmer.stem(word.lower()) for word in pattern_words]
    # create our bag of words array: 1 if the vocabulary word appears in the pattern
    bag = [1 if w in pattern_words else 0 for w in words]

    # output is a '0' for each tag and '1' for current tag
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1

    training.append([bag, output_row])

# shuffle our features and turn into np.array
# (dtype=object because each row holds two lists of different lengths;
#  recent NumPy versions raise an error for such ragged input without it)
random.shuffle(training)
training = np.array(training, dtype=object)

# split into features (x) and labels (y)
train_x = list(training[:,0])
train_y = list(training[:,1])
```

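Notice that our data is shuffled, so training batches mix examples from different intents. As called here, `model.fit()` is not given a `validation_set`, so the accuracy it reports is measured on the training data itself rather than on held-out test data.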

If we look at a single x and y list element, we see two arrays: x is a ‘[bag of words](https://en.wikipedia.org/wiki/Bag-of-words_model)’ for the intent pattern (one entry per vocabulary word), and y marks the intent class (one entry per class, with a 1 for the correct one).


**train_x example**: [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]

**train_y example**: [0, 0, 1, 0, 0, 0, 0, 0, 0] 
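Both encodings can be reproduced in plain Python on a toy vocabulary. The vocabulary and class names here are hypothetical; in the notebook the x arrays have 48 entries (one per stem) and the y arrays have 9 (one per class).

```python
def bag_of_words(tokens, vocabulary):
    """1 if the vocabulary word occurs in the tokens, else 0 (presence, not counts)."""
    present = set(tokens)
    return [1 if w in present else 0 for w in vocabulary]

def one_hot(tag, classes):
    """All zeros except a 1 at the position of the given class."""
    row = [0] * len(classes)
    row[classes.index(tag)] = 1
    return row

# Hypothetical toy vocabulary and classes
vocabulary = ['anyone', 'are', 'hello', 'hi', 'hours', 'is', 'open', 'there', 'what', 'you']
classes = ['greeting', 'hours']

x = bag_of_words(['what', 'hours', 'are', 'you', 'open'], vocabulary)
y = one_hot('hours', classes)
print(x)  # [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]
print(y)  # [0, 1]
```

Word order is lost in this representation; only which vocabulary words occur matters.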

We’re ready to build our model.

```python
# reset underlying graph data
tf.reset_default_graph()
# Build neural network
net = tflearn.input_data(shape=[None, len(train_x[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(train_y[0]), activation='softmax')
net = tflearn.regression(net)

# Define model and setup tensorboard
model = tflearn.DNN(net, tensorboard_dir='tflearn_logs')
# Start training (apply gradient descent algorithm)
model.fit(train_x, train_y, n_epoch=1000, batch_size=8, show_metric=True)
model.save('model.tflearn')
```

```
Training Step: 3999  | total loss: 0.20905 | time: 0.018s
| Adam | epoch: 1000 | loss: 0.20905 - acc: 0.9838 -- iter: 24/27
Training Step: 4000  | total loss: 0.19499 | time: 0.025s
| Adam | epoch: 1000 | loss: 0.19499 - acc: 0.9854 -- iter: 27/27
--
INFO:tensorflow:C:\Users\phs\textmining\python\text-mining-camp\note\arkwith\Contextual_Chatbots\model.tflearn is not in all_model_checkpoint_paths. Manually adding it.
```


This is the same tensor structure as we used in our 2-layer neural network in [our ‘toy’ example](https://chatbotslife.com/deep-learning-in-7-lines-of-code-7879a8ef8cfb). Watching the model fit our training data never gets old…

![interactive build of a model in tflearn](figs/1_5UIqnedBzsYTXJ81wEU-vg.GIF)
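To get a feel for how small this network is, we can count its trainable parameters from the layer sizes: 48 inputs (one per stem), two hidden layers of 8, and 9 outputs (one per class). This sketch assumes each fully connected layer has a bias vector, which we believe is tflearn's default.

```python
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out

layer_sizes = [48, 8, 8, 9]  # input width, two hidden layers, output classes
total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
print(total)  # 545
```

With only 27 training documents, a model this size fits in a fraction of a second per epoch.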

To complete this section of work, we’ll save (‘pickle’) our data structures (vocabulary, classes and training data) so the next notebook can use them; the model itself was already saved with `model.save()`.

```python
# save all of our data structures
import pickle
with open("training_data", "wb") as f:
    pickle.dump({'words': words, 'classes': classes, 'train_x': train_x, 'train_y': train_y}, f)
```
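The save and restore steps form a round trip, which is easy to check in isolation. This sketch uses small stand-in values and a temporary directory rather than the notebook's real data.

```python
import os
import pickle
import tempfile

# Illustrative stand-ins for the real structures saved above.
data = {'words': ['hi', 'hour', 'op'], 'classes': ['greeting', 'hours'],
        'train_x': [[1, 0, 0], [0, 1, 1]], 'train_y': [[1, 0], [0, 1]]}

path = os.path.join(tempfile.mkdtemp(), "training_data")
# 'with' guarantees the file is flushed and closed before we read it back
with open(path, "wb") as f:
    pickle.dump(data, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == data)  # True
```

Only unpickle files you created yourself: loading a pickle can execute arbitrary code.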


![...](figs/1_f9Sq7I_pauPQ9u4PbtPt4w.JPEG)

# Building Our Chatbot Framework

The complete notebook for our second step is [here](https://github.com/ugik/notebooks/blob/master/Tensorflow%20chat-bot%20response.ipynb).

We’ll build a simple state-machine to handle responses, using our intents model (from the previous step) as our classifier. That’s [how chatbots work](https://medium.freecodecamp.com/how-chat-bots-work-dfff656a35e2).

## <center>A contextual chatbot framework is a classifier within a state-machine.</center>

After loading the same imports, we’ll un-pickle our data structures and reload our intents file. Remember that our chatbot framework is separate from our model build: you don’t need to rebuild your model unless the intent patterns change. With several hundred intents and thousands of patterns, the model could take several minutes to build.



```python
# restore all of our data structures
import pickle
with open("training_data", "rb") as f:
    data = pickle.load(f)
words = data['words']
classes = data['classes']
train_x = data['train_x']
train_y = data['train_y']

# import our chat-bot intents file
import json
with open('intents.json') as json_data:
    intents = json.load(json_data)
```

Next we will load our saved Tensorflow (tflearn framework) model. Note that you first need to define the same model structure (`net` and `model = tflearn.DNN(net, ...)`) just as we did in the previous section.

```python
# load our saved model
# (assumes `net` and `model` were defined exactly as in the training step)
model.load('./model.tflearn')
```

```
INFO:tensorflow:Restoring parameters from C:\Users\phs\textmining\python\text-mining-camp\note\arkwith\Contextual_Chatbots\model.tflearn
```


Before we can begin processing intents, we need a way to produce a bag-of-words from user input. This is the same technique as we used earlier to create our training documents.

```python
def clean_up_sentence(sentence):
    # tokenize the pattern
    sentence_words = nltk.word_tokenize(sentence)
    # stem each word
    sentence_words = [stemmer.stem(word.lower()) for word in sentence_words]
    return sentence_words

# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=False):
    # tokenize and stem the sentence
    sentence_words = clean_up_sentence(sentence)
    # bag of words
    bag = [0]*len(words)
    for s in sentence_words:
        for i,w in enumerate(words):
            if w == s:
                bag[i] = 1
                if show_details:
                    print ("found in bag: %s" % w)

    return np.array(bag)
```

```python
p = bow("is your shop open today?", words)
print (p)
```

```
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
 0 0 0 1 0 0 0 0 0 1 0]
```
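Reading that vector back against the 48-word vocabulary printed earlier shows which stems matched. "shop" has no entry in the vocabulary, so it contributes nothing, while "your" and "open" appear under their Lancaster stems `yo` and `op`. This sketch simply pastes both printed outputs; `bow(..., show_details=True)` reports the same matches as it builds the bag.

```python
# The 48 stemmed vocabulary words printed earlier, in the same sorted order.
words = ["'d", "'s", 'a', 'acceiv', 'anyon', 'ar', 'bye', 'can', 'card', 'cash',
         'credit', 'day', 'do', 'doe', 'good', 'goodby', 'hav', 'hello', 'help', 'hi',
         'hour', 'how', 'i', 'is', 'kind', 'lat', 'lik', 'mastercard', 'mop', 'of',
         'on', 'op', 'rent', 'see', 'tak', 'thank', 'that', 'ther', 'thi', 'to',
         'today', 'we', 'what', 'when', 'which', 'work', 'yo', 'you']

# The bag printed by bow("is your shop open today?", words), pasted as text.
printed = """[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
 0 0 0 1 0 0 0 0 0 1 0]"""
bag = [int(token) for token in printed.strip("[]").split()]

# Which vocabulary entries are switched on?
matched = [w for w, flag in zip(words, bag) if flag == 1]
print(matched)  # ['is', 'op', 'today', 'yo']
```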


We are now ready to build our response processor.

```python
ERROR_THRESHOLD = 0.25
def classify(sentence):
    # generate probabilities from the model
    results = model.predict([bow(sentence, words)])[0]
    # filter out predictions below a threshold
    results = [[i,r] for i,r in enumerate(results) if r>ERROR_THRESHOLD]
    # sort by strength of probability
    results.sort(key=lambda x: x[1], reverse=True)
    return_list = []
    for r in results:
        return_list.append((classes[r[0]], r[1]))
    # return a list of (intent, probability) tuples
    return return_list

def response(sentence, userID='123', show_details=False):
    results = classify(sentence)
    # loop as long as there are matches to process
    while results:
        for i in intents['intents']:
            # find a tag matching the first result
            if i['tag'] == results[0][0]:
                # print a random response from the intent
                print(random.choice(i['responses']))
                return
        # no intent matched the top result; try the next one
        results.pop(0)
```

Each sentence passed to `response()` is classified. Our classifier uses `model.predict()` and is lightning fast. The probabilities returned by the model are lined up with our intent definitions to produce a list of potential responses.

If one or more classifications are above a threshold, we check whether a tag matches an intent and, if so, process it. We treat our classification list as a queue ordered by probability: we pop candidates off the front looking for a suitable match until we find one or the list is empty.
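The threshold, sort and fall-through logic can be tested without a trained model. In this sketch the probability vector is hypothetical (chosen to resemble the `classify()` output in this article), `rank` and `pick_response` are illustrative names, and the intents list deliberately omits `opentoday` to show the fallback to the next candidate.

```python
import random

ERROR_THRESHOLD = 0.25
classes = ['goodbye', 'greeting', 'hours', 'mopeds', 'opentoday',
           'payments', 'rental', 'thanks', 'today']

def rank(probabilities, classes, threshold=ERROR_THRESHOLD):
    """Keep (tag, probability) pairs above the threshold, most likely first."""
    ranked = [(classes[i], p) for i, p in enumerate(probabilities) if p > threshold]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

def pick_response(ranked, intents):
    """Pop candidates off the front until one has a matching intent."""
    by_tag = {intent['tag']: intent for intent in intents}
    while ranked:
        tag, _ = ranked.pop(0)
        if tag in by_tag:
            return random.choice(by_tag[tag]['responses'])
    return None  # nothing above the threshold matched an intent

# Hypothetical model output and a hypothetical intents list without 'opentoday'.
probs = [0.0, 0.0, 0.0, 0.0, 0.69, 0.0, 0.0, 0.0, 0.31]
intents = [{'tag': 'today', 'responses': ['placeholder reply for today']}]

print(rank(probs, classes))                          # [('opentoday', 0.69), ('today', 0.31)]
print(pick_response(rank(probs, classes), intents))  # placeholder reply for today
```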

Let’s look at a classification example: every tag above the threshold is returned with its probability, most likely first.

```python
classify('is your shop open today?')
```

```
[('opentoday', 0.6864895), ('today', 0.31298262)]
```

Notice that ‘is your shop open today?’ is not one of the patterns for this intent (`"patterns": ["Are you open today?", "When do you open today?", "What are your hours today?"]`); however, the terms ‘open’ and ‘today’ proved irresistible to our model (they are prominent in the chosen intent).
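The two probabilities above add up to about 0.9995 because the output layer uses `activation='softmax'`, which turns the nine raw outputs into values that sum to 1; the remaining ~0.0005 is spread over the other seven classes. A minimal pure-Python softmax, using arbitrary example inputs, looks like this:

```python
import math

def softmax(logits):
    # subtracting the max avoids overflow in exp(); the result is unchanged
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))          # 1.0
```

Because the outputs compete for a fixed total, a strong `opentoday` score necessarily pushes the other classes down.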

We can now generate a chatbot response from user-input:

```python
response('is your shop open today?')
```

```
Our hours are 9am-9pm every day
```


And other context-free responses…

```python
response('do you take cash?')
```

```
We accept most major credit cards
```


```python
response('what kind of mopeds do you rent?')
```

```
We rent Yamaha, Piaggio and Vespa mopeds
```


```python
response('Goodbye, see you later')
```

```
Bye! Come back again soon.
```


![...](figs/1_RrQH1Mt6R73nq6lO6vTZ2w.JPEG)

Let’s work some basic context into our moped rental chatbot conversation.

# Contextualization

We want to handle a question about renting a moped and ask if the rental is for today. That clarification question is a simple contextual response. If the user responds ‘today’ and the context is the rental timeframe, then it’s best they call the rental company’s 1-800 number. No time to waste.
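One way to picture this flow is as a tiny state machine in which one intent sets a context and another only applies inside it. This is a minimal sketch under stated assumptions: `set_context` and `requires_context` are hypothetical field names, the responses are placeholders, and `respond` takes a list of already-ranked tag strings rather than the `(tag, probability)` tuples returned by `classify()`.

```python
# Hypothetical intents: 'set_context' and 'requires_context' are illustrative
# field names, and the responses are placeholders.
INTENTS = {
    'rental': {'responses': ['Are you looking to rent today or later this week?'],
               'set_context': 'rentalday'},
    'today': {'responses': ['For same-day rentals please call our 1-800 number.'],
              'requires_context': 'rentalday'},
}

context = {}  # userID -> current context label

def respond(user_id, ranked_tags):
    """Return the first response whose context requirement is satisfied."""
    for tag in ranked_tags:
        intent = INTENTS.get(tag)
        if intent is None:
            continue
        required = intent.get('requires_context')
        if required is not None and context.get(user_id) != required:
            continue  # this intent only applies inside its context
        if 'set_context' in intent:
            context[user_id] = intent['set_context']
        return intent['responses'][0]
    return None

print(respond('123', ['today']))   # None: no rental question asked yet
print(respond('123', ['rental']))  # asks the clarifying question and sets the context
print(respond('123', ['today']))   # now answered with the 1-800 message
```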

To achieve this, we will add the notion of ‘state’ to our framework. This consists of a data structure to maintain state and specific code to manipulate it while processing intents.

Because the state of our state machine needs to be easily persisted, restored, copied, etc., it’s important to keep it all in a data structure such as a dictionary.
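Keeping state in plain dictionaries and strings is what makes those operations one-liners. The sketch below uses a hypothetical per-user layout (`{'123': {'context': ...}}`), not the structure the framework necessarily uses, to show persisting with JSON, restoring, and copying without affecting the original.

```python
import copy
import json

# Hypothetical per-user state: plain dicts and strings only, so it serializes cleanly.
state = {'123': {'context': 'rentalday'}}

snapshot = json.dumps(state)     # persist (e.g. to a file or a key-value store)
restored = json.loads(snapshot)  # restore
branch = copy.deepcopy(state)    # an independent copy to experiment with
branch['123']['context'] = None  # changing the copy leaves the original alone

print(restored == state, state['123']['context'])  # True rentalday
```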

Here’s our response process with basic contextualization: