# **Chatbot**

This notebook builds a simple chatbot using the TensorFlow-based *tflearn* API.

The chatbot does not learn during the conversation. It simply picks the best matching answer from a JSON file.

In [2]:
from google.colab import drive
from google.colab import files
import io
import os
drive.mount('/content/drive')
#%cd /content/drive/My Drive/Colab Notebooks
os.chdir("/content/drive/My Drive/Colab Notebooks/Chatbot")
#os.listdir()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [15]:
import nltk
nltk.download('punkt')
from nltk.stem.lancaster import LancasterStemmer

import numpy as np
import tflearn
import tensorflow as tf
import random
import json
import pickle

stemmer = LancasterStemmer()

with open('intents.json', 'r') as file:
    data = json.load(file)


#print(data['intents'])

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
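The cells below read `data['intents']` and, for each intent, its `tag`, `patterns` and `responses` keys. That means `intents.json` is expected to look roughly like the following. The tags, patterns and the pairing of responses to tags are hypothetical placeholders; only the key names come from the code. The snippet parses such a document and lists the tags the classifier would learn.

```python
import json

# Hypothetical contents of intents.json; only the key names
# ("intents", "tag", "patterns", "responses") are taken from the notebook.
SAMPLE_INTENTS = """
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello", "Good day"],
     "responses": ["Hello!", "Hi there, how can I help?"]},
    {"tag": "goodbye",
     "patterns": ["Bye", "See you later"],
     "responses": ["Goodbye!", "Sad to see you go :("]}
  ]
}
"""

data = json.loads(SAMPLE_INTENTS)

# One label per intent, sorted like the notebook's `labels` list
labels = sorted(intent["tag"] for intent in data["intents"])
# Every pattern becomes one training sentence
n_patterns = sum(len(intent["patterns"]) for intent in data["intents"])

print(labels)      # ['goodbye', 'greeting']
print(n_patterns)  # 5
```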


In [0]:
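try:
    # Reuse the preprocessed data cached by a previous run, if any
    with open('data.pickle', 'rb') as f:
        words, labels, training, output = pickle.load(f)
except (FileNotFoundError, EOFError, pickle.UnpicklingError):
    # No usable cache: build the vocabulary and the training data from intents.json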
    words  = [] # List of all the words in the vocabulary
    labels = [] # List of all intents
    docs_x = [] # List of list of words, according to the intents
    docs_y = [] # List of intents of each element in docs_x

    for intent in data['intents']:
        for pattern in intent['patterns']:
            wrds = nltk.word_tokenize(pattern)
            words.extend(wrds)
            docs_x.append(wrds)
            docs_y.append(intent['tag'])

            if intent['tag'] not in labels:
                labels.append(intent['tag'])

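    # Normalize the vocabulary the same way the sentences are normalized below
    # (LancasterStemmer.stem also lowercases). Otherwise stemmed sentence words
    # never match unstemmed vocabulary entries. Delete data.pickle after changing
    # this preprocessing, or the stale cached vocabulary is loaded instead.
    words = [stemmer.stem(w.lower()) for w in words if w != '?']
    words = sorted(set(words))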

    labels = sorted(labels)

    training = []  # List of binary arrays, one array for each sentence.
    output   = []  # List of binary arrays, one array for the intent of each sentence.

    out_empty = [0 for _ in range(len(labels))]

    for x, doc in enumerate(docs_x):
        bag = []

        wrds = [stemmer.stem(w) for w in doc]

        for w in words:
            if w in wrds:
                bag.append(1)
            else:
                bag.append(0)
        
        output_row = out_empty[:]
        output_row[labels.index(docs_y[x])] = 1

        training.append(bag)
        output.append(output_row)
    
    # Converting list to numpy arrays
    training = np.array(training)
    output = np.array(output)

    with open('data.pickle', 'wb') as f:
        pickle.dump((words, labels, training, output), f)
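The preprocessing above turns every pattern into a binary vector over the vocabulary, plus a one-hot vector over the intents. The sketch below reproduces that encoding on a toy corpus using only the standard library. It assumes a simple regex tokenizer in place of `nltk.word_tokenize`, and plain lowercasing in place of the Lancaster stemmer, so the vectors differ from what NLTK would produce. What it illustrates is that the vocabulary and the sentences must go through the same normalization.

```python
import re

def normalize(sentence):
    """Hypothetical stand-in for nltk.word_tokenize + LancasterStemmer:
    split on word characters and lowercase each token."""
    return [tok.lower() for tok in re.findall(r"\w+", sentence)]

# Toy (pattern, intent) pairs, hypothetical
docs = [("Hi there", "greeting"), ("Hello", "greeting"), ("Bye now", "goodbye")]

# Vocabulary and labels, both sorted as in the notebook
words = sorted({w for pattern, _ in docs for w in normalize(pattern)})
labels = sorted({tag for _, tag in docs})

training, output = [], []
for pattern, tag in docs:
    tokens = set(normalize(pattern))
    # 1 if the vocabulary word appears in the pattern, else 0
    training.append([1 if w in tokens else 0 for w in words])
    # One-hot row marking the pattern's intent
    row = [0] * len(labels)
    row[labels.index(tag)] = 1
    output.append(row)

print(words)     # ['bye', 'hello', 'hi', 'now', 'there']
print(training)  # [[0, 0, 1, 0, 1], [0, 1, 0, 0, 0], [1, 0, 0, 1, 0]]
print(output)    # [[0, 1], [0, 1], [1, 0]]
```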

# Defining the DNN (deep neural network), a.k.a. MLP (multilayer perceptron)

We use *tflearn* to build our model. It has an input layer with one neuron per vocabulary word, two hidden layers with 8 neurons each, and an output layer with 6 neurons (one for each intent) that uses the softmax activation function.
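To make the layer sizes concrete, the sketch below runs one forward pass through a network of the same shape in plain Python and counts its weights and biases. The shape is V inputs, two hidden layers of 8 units, and 6 softmax outputs. The vocabulary size `V = 40` and the random initialization are assumptions for illustration. The hidden layers are left linear because that is tflearn's default activation when `fully_connected` is called without `activation`.

```python
import math
import random

def dense(x, weights, biases):
    """Fully connected layer without activation: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Hypothetical small random initialization (not tflearn's scheme)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

V, HIDDEN, N_INTENTS = 40, 8, 6  # V is an assumed vocabulary size
rng = random.Random(0)
layers = [init_layer(V, HIDDEN, rng),
          init_layer(HIDDEN, HIDDEN, rng),
          init_layer(HIDDEN, N_INTENTS, rng)]

bag = [0] * V
bag[3] = bag[17] = 1  # a sentence containing two vocabulary words

h = bag
for weights, biases in layers[:-1]:
    h = dense(h, weights, biases)  # linear hidden layers, like tflearn's default
probs = softmax(dense(h, *layers[-1]))

# (V*8 + 8) + (8*8 + 8) + (8*6 + 6) weights and biases
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(len(probs), round(sum(probs), 6))  # 6 1.0
print(n_params)                          # 454
```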

The DNN classifies the sentence the user types into one of the intents in the JSON file, and the bot then picks a random answer from that intent. If the probability of the predicted intent is not above 0.8, the answer is "*I don't understand it, sorry :(*" instead.
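The decision rule on its own, without the model, looks roughly like this. The function name `pick_response`, the tag names and the probability vectors are hypothetical. The responses are taken from the sample conversation at the end of the notebook, and the 0.8 threshold is the one used in `chat()`.

```python
import random

THRESHOLD = 0.8  # the value used in chat()

def pick_response(probs, labels, intents, rng=random):
    """Return a random response of the most likely intent, or the fallback
    message when its probability is not above THRESHOLD."""
    index = max(range(len(probs)), key=probs.__getitem__)  # argmax
    if probs[index] <= THRESHOLD:
        return "I don't understand it, sorry :("
    responses = next(i["responses"] for i in intents if i["tag"] == labels[index])
    return rng.choice(responses)

# Hypothetical intents and model outputs
intents = [{"tag": "age", "responses": ["I am 26 years old!"]},
           {"tag": "name", "responses": ["I'm Will!"]}]
labels = ["age", "name"]

print(pick_response([0.05, 0.95], labels, intents))  # I'm Will!
print(pick_response([0.45, 0.55], labels, intents))  # I don't understand it, sorry :(
```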

In [17]:
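tf.reset_default_graph()                                 # clear any graph left over from a previous run of this cell
net = tflearn.input_data(shape=[None, len(training[0])]) # input layer: one unit per vocabulary word
net = tflearn.fully_connected(net, 8)                    # hidden layer (tflearn's default activation is 'linear')
net = tflearn.fully_connected(net, 8)                    # hidden layer (tflearn's default activation is 'linear')
net = tflearn.fully_connected(net, len(output[0]), activation='softmax') # output layer: one unit per intent
net = tflearn.regression(net)                            # defaults: Adam optimizer, categorical cross-entropy loss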

model = tflearn.DNN(net)


model.fit(training, output, n_epoch=1000, batch_size=8, show_metric=True)
model.save('Model.tflearn')
print('Model was saved successfully!')

Training Step: 3999  | total loss: 0.07798 | time: 0.011s
| Adam | epoch: 1000 | loss: 0.07798 - acc: 0.9542 -- iter: 24/26
Training Step: 4000  | total loss: 0.07065 | time: 0.014s
| Adam | epoch: 1000 | loss: 0.07065 - acc: 0.9588 -- iter: 26/26
--
INFO:tensorflow:/content/drive/My Drive/Colab Notebooks/Chatbot/Model.tflearn is not in all_model_checkpoint_paths. Manually adding it.
Model was saved successfully!
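The step counter in the log follows from the training set size. The log shows 26 training sentences (`iter: 26/26`). With `batch_size=8`, each epoch needs ceil(26/8) = 4 mini-batches, and the last one holds only 2 sentences. That is why the log jumps from `iter: 24/26` to `iter: 26/26`. Over `n_epoch=1000` this gives the 4000 steps reported. A quick check:

```python
import math

n_samples, batch_size, n_epoch = 26, 8, 1000

steps_per_epoch = math.ceil(n_samples / batch_size)            # mini-batches per epoch
last_batch = n_samples - (steps_per_epoch - 1) * batch_size    # size of the final, partial batch
total_steps = steps_per_epoch * n_epoch

print(steps_per_epoch, last_batch, total_steps)  # 4 2 4000
```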


In [0]:
def bag_of_words(s, words):
    """Transforms a sentence into a binary array marking which words of the
    DNN's vocabulary appear in the sentence given by the user.
    This array is used as the input of the DNN."""

    bag = [0 for _ in range(len(words))]

    s_words = nltk.word_tokenize(s)
    s_words = [stemmer.stem(word.lower()) for word in s_words]

    for se in s_words:
        for i, w in enumerate(words):
            if w == se:
                bag[i] = 1
    
    return np.array(bag)

def chat():
    """Reads the user's input, classifies it with the DNN and prints a response
    from the predicted intent, or a fallback message when the DNN is not confident."""

    print('Start talking to the chatbot!')
    while True:
        inp = input('You: ').lower()
        if inp in ['quit', 'bye']:
            print("Bot: See you later.")
            break

        result = model.predict([bag_of_words(inp, words)])[0]
        index = np.argmax(result)
        tag = labels[index]

        responses = []
        for tg in data['intents']:
            if tg['tag'] == tag:
                responses = tg['responses']
        #print(result)
        confidence = result[index]  # probability of the most likely intent
        if confidence > 0.8 and responses:
            print('Bot: {}'.format(random.choice(responses)))
        else:
            print("Bot: I don't understand it, sorry :(")
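In `bag_of_words`, every token of the sentence is compared against every vocabulary word, which costs O(tokens × vocabulary) per sentence. A variant that builds a word-to-position dictionary once makes each lookup constant time. The sketch below assumes the tokens have already been tokenized and stemmed; that step needs NLTK and is omitted. The function names and the vocabulary are hypothetical, and it returns a plain list rather than a NumPy array.

```python
def build_index(words):
    """Map each vocabulary word to its position in the input vector."""
    return {w: i for i, w in enumerate(words)}

def bag_of_words_indexed(tokens, index):
    """Binary vector with 1 at the position of every known token.
    Unknown tokens are ignored, as in the notebook's bag_of_words."""
    bag = [0] * len(index)
    for tok in tokens:
        i = index.get(tok)
        if i is not None:
            bag[i] = 1
    return bag

vocab = ["age", "how", "name", "old", "what", "you"]  # hypothetical vocabulary
index = build_index(vocab)
print(bag_of_words_indexed(["how", "old", "are", "you"], index))  # [0, 1, 0, 1, 0, 1]
```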



In [0]:
chat()

Start talking to the chatbot!
You: Hi
Bot: Hi there, how can I help?
You: Whats your name?
Bot: I'm Will!
You: How old are you?
Bot: I am 26 years old!
You: I am leaving.
Bot: Sad to see you go :(
You: Bye.
Bot: Goodbye!
