# Building your own Chatbot

Chatbots, or more generally conversational software, are amazing tools for a lot of businesses. They help businesses serve their clients 24x7 without increasing effort, with consistent quality, and with the built-in option to defer to a human when the bot is not enough. 

They are a great example of technology, or AI, amplifying the impact of human effort. 

They range from voice-based solutions like Alexa, to text-based Intercom chat boxes, to menu-based navigation in Uber.

So far, we have seen a one-off application of each NLP topic we covered: 
- text cleaning using grammar and vocabulary insights 
- linguistics (and statistical parsers) to mine questions from text
- entity recognition for information extraction
- text similarity using word vectors like GloVe/word2vec

We now combine all of them into a much more complex setup and write our own chatbot from scratch. But before you build anything from scratch, you should ask why.

## Why should I build the service again? 


##### Related: Why can't I use FB/MSFT/some other cloud service?

- **Privacy and Competition**: As a business, is it a good idea to share your users' information with Facebook or Microsoft? Or even with a smaller company? 
- **Cost and Constraints**: A cloud service limits your design choices to those made by the intelligence provider, such as Google or Facebook. Additionally, you now pay for each HTTP call you make, and each call is slower than running code locally.
- **Freedom to Customize and Extend**: You can develop a solution that performs better for your use case! You don't have to cure world hunger, just keep shipping ever-increasing business value via quality software. If you are a big company, you have all the more reason to invest in extensible software.

For the sake of simplicity, we will assume that our bot does not need to remember the context of any question. It sees one input, responds to it, and is done, with no link to previous inputs.

## Word Vectors + Heuristic - Fancy Stuff = Quick Working Code

Let's start by simply loading the word vectors using gensim, as we have done earlier.

In [1]:
import numpy as np
import gensim
print(f"Gensim version: {gensim.__version__}")

Gensim version: 3.4.0


In [2]:
from tqdm import tqdm
class TqdmUpTo(tqdm):
    def update_to(self, b=1, bsize=1, tsize=None):
        if tsize is not None: self.total = tsize
        self.update(b * bsize - self.n)

def get_data(url, filename):
    """
    Download data if the filename does not exist already
    Uses Tqdm to show download progress
    """
    import os
    from urllib.request import urlretrieve
    
    if not os.path.exists(filename):

        dirname = os.path.dirname(filename)
        if not os.path.exists(dirname):
            os.makedirs(dirname)

        with TqdmUpTo(unit='B', unit_scale=True, miniters=1, desc=url.split('/')[-1]) as t:
            urlretrieve(url, filename, reporthook=t.update_to)
    else:
        print("File already exists, please remove if you wish to download again")

embedding_url = 'http://nlp.stanford.edu/data/glove.6B.zip'
get_data(embedding_url, 'data/glove.6B.zip')

File already exists, please remove if you wish to download again


In [3]:
# !unzip data/glove.6B.zip 
# !mv -v glove.6B.300d.txt data/glove.6B.300d.txt 
# !mv -v glove.6B.200d.txt data/glove.6B.200d.txt 
# !mv -v glove.6B.100d.txt data/glove.6B.100d.txt 
# !mv -v glove.6B.50d.txt data/glove.6B.50d.txt 

from gensim.scripts.glove2word2vec import glove2word2vec
glove_input_file = 'data/glove.6B.300d.txt'
word2vec_output_file = 'data/glove.6B.300d.txt.word2vec'
import os
if not os.path.exists(word2vec_output_file):
    glove2word2vec(glove_input_file, word2vec_output_file)

In [4]:
%%time
from gensim.models import KeyedVectors
filename = word2vec_output_file
embed = KeyedVectors.load_word2vec_format(word2vec_output_file, binary=False)

CPU times: user 1min 49s, sys: 2.11 s, total: 1min 51s
Wall time: 1min 47s


Let's quickly check that we can vectorize a word by looking up its embedding, e.g. for 'awesome':

In [5]:
assert embed['awesome'] is not None

The assertion passes for 'awesome', so this works!

Here is the first challenge for us: **Figuring out the right user intent**.

This is commonly referred to as the problem of intent categorization. 

As a toy example, we will try to build an order bot which someone like DoorDash/Swiggy/Zomato might use. 

## Use Case: Food Order Bot

Consider the following sample sentence:  “I’m looking for a cheap Chinese place in Indiranagar”

We want to pick out Chinese as a cuisine type in the sentence. We could use simple approaches, such as an exact substring match (searching for "Chinese") or TF-IDF based matching. 
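For comparison, here is a minimal sketch of the exact-match baseline. `KNOWN_CUISINES` and `find_cuisines_exact` are hypothetical names. The dictionary reuses the five cuisines that appear later in this notebook as `cuisine_refs`. Because "chinese" is not in that dictionary, the sample sentence produces no match at all.

```python
# Hypothetical hand-written dictionary (the same five cuisines used as references later)
KNOWN_CUISINES = {"mexican", "thai", "british", "american", "italian"}


def find_cuisines_exact(sentence, known=KNOWN_CUISINES):
    """Return the tokens of `sentence` that exactly match a dictionary entry."""
    # Lowercase and strip surrounding punctuation so "Thai," still matches "thai"
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    return [t for t in tokens if t in known]


print(find_cuisines_exact("I’m looking for a cheap Chinese place in Indiranagar"))  # []
print(find_cuisines_exact("Any good Thai places?"))  # ['thai']
```

The dictionary only finds what someone has already typed into it. The embedding approach below removes that limitation.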

Instead, we will generalize the model to discover cuisine types that we might not have listed yet, learning about them via the GloVe embeddings:

We’ll keep it as simple as possible: 
- provide some example cuisine types to tell the model that we need cuisines, and 
- look for the most similar words in the sentence

We’ll loop through the words in the sentence and pick out the ones whose average similarity to the reference words (here, the dot product of their word vectors) is above some threshold.

### Do word vectors even work for this? 

In [6]:
cuisine_refs = ["mexican", "thai", "british", "american", "italian"]
sample_sentence = "I’m looking for a cheap Indian or Chinese place in Indiranagar"

For simplicity's sake, the code below is written with for loops, but it can be vectorized for speed. 

We iterate over each word in the input sentence and compute its similarity score with respect to the known cuisine words. 

The higher the value, the more likely the word is related to our cuisine references, `cuisine_refs`.

In [7]:
tokens = sample_sentence.split()
tokens = [x.lower().strip() for x in tokens] 
threshold = 18.3
found = []
for term in tokens:
    if term in embed.vocab:
        scores = []
        for C in cuisine_refs:
            scores.append(np.dot(embed[C], embed[term].T))
            # hint: replace the np.dot above with cosine similarity, e.g.:
            # scores.append(embed.cosine_similarities(<vector1>, <vector_all_others>))
        mean_score = np.mean(scores)
        print(f"{term}: {mean_score}")
        if mean_score > threshold:
            found.append(term)
print(found)

looking: 7.448504447937012
for: 10.627421379089355
a: 11.809560775756836
cheap: 7.09670877456665
indian: 18.64516258239746
or: 9.692893981933594
chinese: 19.09498405456543
place: 7.651237487792969
in: 10.085711479187012
['indian', 'chinese']
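The for loop above can be vectorized. Below is a minimal sketch that stacks the token vectors and the reference vectors into two matrices, so that a single matrix product gives every token-reference dot product. The function name `find_similar_terms` is hypothetical. The sketch assumes `embed` supports `term in embed` and `embed[term]`. gensim's `KeyedVectors` supports both, and so does a plain dict of NumPy arrays, which the test uses as a toy stand-in for GloVe.

```python
import numpy as np


def find_similar_terms(embed, tokens, refs, threshold):
    """Vectorized version of the loop above.

    Returns the in-vocabulary tokens whose mean dot product with the
    reference words exceeds `threshold`, plus the mean score of every
    in-vocabulary token.
    """
    # Skip out-of-vocabulary tokens, exactly as the loop does
    known = [t for t in tokens if t in embed]
    if not known:
        return [], np.array([])
    word_mat = np.stack([embed[t] for t in known])  # shape (n_tokens, dim)
    ref_mat = np.stack([embed[r] for r in refs])    # shape (n_refs, dim)
    # (n_tokens, dim) @ (dim, n_refs) -> every token/reference dot product at once
    mean_scores = (word_mat @ ref_mat.T).mean(axis=1)  # shape (n_tokens,)
    found = [t for t, s in zip(known, mean_scores) if s > threshold]
    return found, mean_scores
```

With the GloVe vectors loaded above, `find_similar_terms(embed, tokens, cuisine_refs, threshold)` computes the same mean scores as the loop. It should therefore return the same `['indian', 'chinese']`.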


The threshold is determined empirically. Notice that we are able to infer 'Indian' and 'Chinese' as cuisines, even though they are not part of the reference set. 

Of course, exact matches will typically score much higher.
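The raw dot product grows with vector length, so the threshold of 18.3 is specific to these 300-dimensional GloVe vectors. The hint in the code comment points to gensim's `KeyedVectors.cosine_similarities`, which normalizes by vector length. Below is a minimal NumPy sketch of the same arithmetic, using the hypothetical name `mean_cosine_to_refs`. Cosine scores lie in [-1, 1], and an exact match scores 1.0 against itself, so the threshold would have to be re-tuned on that scale.

```python
import numpy as np


def mean_cosine_to_refs(vec, ref_vecs):
    """Mean cosine similarity between one word vector and each reference vector."""
    ref_mat = np.stack(ref_vecs)                                    # (n_refs, dim)
    dots = ref_mat @ vec                                            # (n_refs,)
    norms = np.linalg.norm(ref_mat, axis=1) * np.linalg.norm(vec)  # (n_refs,)
    return float(np.mean(dots / norms))


# Exact match: a vector compared with itself scores 1.0, regardless of its length
print(mean_cosine_to_refs(np.array([3.0, 4.0]), [np.array([3.0, 4.0])]))  # 1.0
```

With the loaded embeddings, a call would look like `mean_cosine_to_refs(embed[term], [embed[c] for c in cuisine_refs])`.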

This is a good example of better problem formulation: treating cuisine type as a "generic" concept that can be learned is more helpful than a dictionary of cuisine types. It also suggests that we can rely on word-vector-based approaches here. 

Can we extend this to user intent classification? Let's try:

### Next Stop: Classifying user intent

We want to be able to put sentences into categories by user "intent". An intent is a generic mechanism that groups multiple individual examples under one semantic umbrella. For example, "hi", "hey", "good morning", and "wassup!" are all examples of the intent _greeting_.

Using 'greeting' as an input, the back-end logic can then determine how to respond to the user. 

There are many ways we could combine word vectors to represent a sentence, but again we’re going to do the simplest thing possible: add them up. 

This is definitely a less-than-ideal solution (for instance, it ignores word order), but it works in practice for the simple nearest-centroid approach we use below. 

In [8]:
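def sum_vecs(embed, text):
    """Represent a sentence as the sum of its tokens' word vectors.
    Tokens missing from the GloVe vocabulary are skipped."""
    tokens = text.split(' ')
    vec = np.zeros(embed.vector_size)

    for term in tokens:
        if term in embed.vocab:
            vec = vec + embed[term]
    return vec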

sentence_vector = sum_vecs(embed, sample_sentence)
print(sentence_vector.shape)

(300,)
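Summing has one consequence worth noting: a sentence's vector grows with its length, and the intent classifier later in this notebook compares vectors by Euclidean distance. A common alternative, which this notebook does not use, is to average the token vectors instead. The sketch below uses the hypothetical name `mean_vecs` and a toy dict standing in for `embed`. A `dim` argument replaces `embed.vector_size`, so the function works with either.

```python
import numpy as np


def mean_vecs(embed, text, dim):
    """Average (instead of sum) the vectors of the in-vocabulary tokens."""
    vecs = [embed[term] for term in text.split(' ') if term in embed]
    if not vecs:
        return np.zeros(dim)  # same fallback as sum_vecs: an all-zero vector
    return np.mean(vecs, axis=0)


toy = {"hey": np.array([1.0, 0.0]), "there": np.array([3.0, 2.0])}
print(mean_vecs(toy, "hey there", 2))    # [2. 1.]
print(mean_vecs(toy, "hey hey hey", 2))  # [1. 0.]  (a sum would give [3. 0.])
```

If you switch to averaging, compute the centroids with the same function so that sentences and centroids stay comparable.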


Let's define a data dictionary with some examples for each intent. This dictionary can be updated as we collect more user inputs:

In [9]:
data={
  "greet": {
    "examples" : ["hello","hey there","howdy","hello","hi","hey","hey ho"],
    "centroid" : None
  },
  "inform": {
    "examples" : [
        "i'd like something asian",
        "maybe korean",
        "what mexican options do i have",
        "what italian options do i have",
        "i want korean food",
        "i want german food",
        "i want vegetarian food",
        "i would like chinese food",
        "i would like indian food",
        "what japanese options do i have",
        "korean please",
        "what about indian",
        "i want some chicken",
        "maybe thai",
        "i'd like something vegetarian",
        "show me french restaurants",
        "show me a cool malaysian spot",
        "where can I get some spicy food"
    ],
    "centroid" : None
  },
  "deny": {
    "examples" : [
      "nah",
      "any other places ?",
      "anything else",
      "no thanks",
      "not that one",
      "i do not like that place",
      "something else please",
      "no please show other options"
    ],
    "centroid" : None
  },
    "affirm":{
        "examples":[
            "yeah",
            "that works",
            "good, thanks",
            "this works",
            "sounds good",
            "thanks, this is perfect",
            "just what I wanted"
        ],
        "centroid": None
    }

}

The approach is simple: we compute a centroid for each user intent, that is, the mean of the vectors of its example sentences. The centroid acts as a central point for that intent. Each incoming text is then assigned to the intent whose centroid is nearest (by Euclidean distance) to the text's vector. 

In [10]:
def get_centroid(embed, examples):
    # One row per example sentence, each the sum of its word vectors
    C = np.zeros((len(examples),embed.vector_size))
    for idx, text in enumerate(examples):
        C[idx,:] = sum_vecs(embed,text)

    # The centroid is the mean of the example vectors
    centroid = np.mean(C,axis=0)
    assert centroid.shape[0] == embed.vector_size
    return centroid

In [11]:
# Adding Centroid to data dictionary
for label in data.keys():
    data[label]["centroid"] = get_centroid(embed,data[label]["examples"])

In [12]:
for label in data.keys():
    print(f"{label}: {data[label]['examples']}")

greet: ['hello', 'hey there', 'howdy', 'hello', 'hi', 'hey', 'hey ho']
inform: ["i'd like something asian", 'maybe korean', 'what mexican options do i have', 'what italian options do i have', 'i want korean food', 'i want german food', 'i want vegetarian food', 'i would like chinese food', 'i would like indian food', 'what japanese options do i have', 'korean please', 'what about indian', 'i want some chicken', 'maybe thai', "i'd like something vegetarian", 'show me french restaurants', 'show me a cool malaysian spot', 'where can I get some spicy food']
deny: ['nah', 'any other places ?', 'anything else', 'no thanks', 'not that one', 'i do not like that place', 'something else please', 'no please show other options']
affirm: ['yeah', 'that works', 'good, thanks', 'this works', 'sounds good', 'thanks, this is perfect', 'just what I wanted']


In [13]:
def get_intent(embed,data, text):
    intents = list(data.keys())
    vec = sum_vecs(embed,text)
    scores = np.array([ np.linalg.norm(vec-data[label]["centroid"]) for label in intents])
    return intents[np.argmin(scores)]

In [14]:
for text in ["hey ","i am looking for chinese food","not for me", "ok, this is good"]:
    print(f"text : '{text}', predicted_label : '{get_intent(embed, data, text)}'")

text : 'hey ', predicted_label : 'greet'
text : 'i am looking for chinese food', predicted_label : 'inform'
text : 'not for me', predicted_label : 'deny'
text : 'ok, this is good', predicted_label : 'affirm'


## Bot Responses

We now know how to understand and categorize user intent. Next, we need to respond to each user intent with some corresponding responses. Let's store these 'template' bot responses in one place:

In [15]:
templates = {
        "utter_greet": ["hey there!", "Hey! How you doin'? "],
        "utter_options": ["ok, let me check some more"],
        "utter_goodbye": ["Great, I'll go now. Bye bye", "bye bye", "Goodbye!"],
        "utter_default": ["Sorry, I didn't quite follow"],
        "utter_confirm": ["Got it", "Gotcha", "Your order is confirmed now"]
    }

Storing the response map as a separate entity is helpful: you can generate responses in a service separate from your intent-understanding module and then glue the two together. 

In [16]:
response_map = {
    "greet": "utter_greet",
    "affirm": "utter_goodbye",
    "deny": "utter_options",
    "inform": "utter_confirm",
    "default": "utter_default",
}

Notice that the response map does not need to depend only on the classified intent. You can convert it into a separate function that uses related context to generate the mapping and then picks a bot template. 

But here, for simplicity, let's keep it as a human-interpretable dictionary/JSON-style structure. 
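Here is a minimal sketch of that idea: a hypothetical `pick_template` function that consults a conversation `context` dict before falling back to the same static mapping as `response_map`. The `order_confirmed` key and the rule itself are illustrative assumptions. Under this rule, the first "affirm" confirms the order, and only a later one says goodbye.

```python
# Same static mapping as response_map, used as the fallback
BASE_MAP = {
    "greet": "utter_greet",
    "affirm": "utter_goodbye",
    "deny": "utter_options",
    "inform": "utter_confirm",
    "default": "utter_default",
}


def pick_template(intent, context):
    """Choose a template name from the intent plus conversation context."""
    # Hypothetical rule: an "affirm" before any confirmation confirms the order
    if intent == "affirm" and not context.get("order_confirmed", False):
        return "utter_confirm"
    # Otherwise use the static map, defaulting for unknown intents
    return BASE_MAP.get(intent, BASE_MAP["default"])


print(pick_template("affirm", {}))                         # utter_confirm
print(pick_template("affirm", {"order_confirmed": True}))  # utter_goodbye
```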

Let's write a simple get_bot_response function that takes the response mapping, the templates, and the intent as inputs and returns the actual bot response:

In [17]:
import random
def get_bot_response(bot_response_map, bot_templates, intent):
    # Fall back to the default response for intents the map does not cover
    if intent not in bot_response_map:
        intent = "default"
    select_template = bot_response_map[intent]
    choices = bot_templates[select_template]
    return random.choice(choices)

Let's quickly test this with one sentence: 

In [18]:
user_intent = get_intent(embed, data, "i want indian food")
get_bot_response(response_map, templates, user_intent)

'Got it'
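`get_bot_response` falls back to `"default"` only for intents missing from the map. However, `get_intent` always returns one of the four labels, even for text unlike any example, so `utter_default` never fires. One possible fix is to return `"default"` whenever the nearest centroid is farther away than some cutoff. The sketch below assumes this approach. `get_intent_with_fallback` and `max_distance` are hypothetical names, and the cutoff would need tuning on real data.

```python
import numpy as np


def get_intent_with_fallback(vec, centroids, max_distance):
    """Nearest-centroid intent, or "default" if no centroid is close enough.

    vec: sentence vector (e.g. from sum_vecs); centroids: {label: centroid}.
    """
    labels = list(centroids)
    dists = np.array([np.linalg.norm(vec - centroids[label]) for label in labels])
    best = int(np.argmin(dists))
    return labels[best] if dists[best] <= max_distance else "default"


toy_centroids = {"greet": np.array([0.0, 0.0]), "deny": np.array([10.0, 0.0])}
print(get_intent_with_fallback(np.array([1.0, 0.0]), toy_centroids, 2.0))   # greet
print(get_intent_with_fallback(np.array([5.0, 50.0]), toy_centroids, 2.0))  # default
```

With this notebook's objects, the inputs would be `sum_vecs(embed, text)` and `{label: data[label]["centroid"] for label in data}`. A returned `"default"` then maps to `utter_default` through `response_map`.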

**Better Response Personalisation?**:

You'll notice that the function picks one template at random for any particular "bot intent", so to speak. This keeps things simple here; in practice, you can train an ML model to pick a response personalized to the user. 

A simple personalization is to match the user's own talking/typing style. E.g. some users might be formal, with "Hello, how are you today?", while others might be more informal, with "yo". 

So a user who opens with "Hello" gets "Goodbye!" at the end of the conversation, while one who opens with "yo!" gets "bye bye" or even "ttyl". 
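Here is a minimal, rule-based sketch of that idea. It labels the user's first message as formal or informal using a small word list, then picks a goodbye in the matching style. `INFORMAL_MARKERS`, `GOODBYES`, and both functions are hypothetical. In practice, a trained model would replace the word list.

```python
import random

# Hypothetical style-specific goodbye templates
GOODBYES = {
    "formal": ["Goodbye!", "Great, I'll go now. Goodbye."],
    "informal": ["bye bye", "ttyl"],
}
# Hypothetical list of words that signal an informal user
INFORMAL_MARKERS = {"yo", "wassup", "sup", "hiya", "heya"}


def user_style(first_message):
    """Label a message 'informal' if it contains any informal marker word."""
    words = {w.strip("!?.,").lower() for w in first_message.split()}
    return "informal" if words & INFORMAL_MARKERS else "formal"


def styled_goodbye(first_message):
    """Pick a goodbye matching the style of the user's opening message."""
    return random.choice(GOODBYES[user_style(first_message)])


print(user_style("Hello, how are you today?"))  # formal
print(user_style("yo!"))                        # informal
```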

---

For now, let's go ahead and check the bot response for the sentences which we have already seen. 

In [19]:
for text in ["hey","i am looking for italian food","not for me", "ok, this is good"]:
    user_intent = get_intent(embed, data, text)
    bot_reply = get_bot_response(response_map, templates, user_intent)
    print(f"text : '{text}', intent: {user_intent}, bot: {bot_reply}")

text : 'hey', intent: greet, bot: Hey! How you doin'? 
text : 'i am looking for italian food', intent: inform, bot: Gotcha
text : 'not for me', intent: deny, bot: ok, let me check some more
text : 'ok, this is good', intent: affirm, bot: Goodbye!
