<a href="https://colab.research.google.com/github/codexer-25aditi/Contact/blob/main/AI_ChatBot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Required external Python libraries


---



1. Natural Language Toolkit (NLTK): text processing for human language data (tokenization, stemming, etc.)

2. TensorFlow: machine learning models

3. TFLearn: a higher-level deep learning API built on top of TensorFlow





In [None]:
pip install nltk

In [None]:
pip install tensorflow

In [None]:
pip install tflearn

Collecting tflearn
  Downloading tflearn-0.5.0.tar.gz (107 kB)
Building wheels for collected packages: tflearn
  Building wheel for tflearn (setup.py) ... done
  Created wheel for tflearn: filename=tflearn-0.5.0-py3-

# Import statements

In [None]:
import nltk
nltk.download('punkt')
import numpy as np
import tensorflow as tf
import tflearn
import random
import json

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Instructions for updating:
non-resource variables are not supported in the long term


# The intents.json file

---

Uploading and viewing the intents.json file
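
Judging from the keys the loading cell reads, the file appears to hold a top-level `intents` list whose entries each carry a `tag`, a list of `patterns`, and a list of `responses`. The sketch below shows that layout under this assumption. It parses a one-intent sample from a string (the `sample_json` name is made up for illustration) instead of an uploaded file, and the entry is copied from the output printed in this notebook.

```python
import json

# A one-intent sample in the layout the loading cell appears to expect.
# The real intents.json holds many more entries.
sample_json = """
{
  "intents": [
    {
      "tag": "greeting",
      "patterns": ["Greetings", "Hi", "How are you", "Is anyone there?", "Hello", "Good day"],
      "responses": ["Hello, thanks for visiting", "Good to see you again", "Hi there, how can I help?"]
    }
  ]
}
"""

sample_intents = json.loads(sample_json)

# Each intent maps one tag to example user inputs and candidate replies.
first = sample_intents["intents"][0]
print(first["tag"], len(first["patterns"]), len(first["responses"]))
```

The patterns become training sentences for the tag, while the responses are only used at reply time.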



In [None]:
from google.colab import files
uploaded = files.upload()

with open(r"intents.json") as intents_file:
  intents = json.load(intents_file)
  for i in intents['intents']:
    print("tag:", i['tag'])
    print("patterns:", i['patterns'])
    print("responses:", i['responses'])
    print()

Saving intents.json to intents.json
tag: greeting
patterns: ['Greetings', 'Hi', 'How are you', 'Is anyone there?', 'Hello', 'Good day']
responses: ['Hello, thanks for visiting', 'Good to see you again', 'Hi there, how can I help?']

tag: goodbye
patterns: ['Bye', 'See you later', 'Goodbye']
responses: ['See you later, thanks for visiting', 'Have a nice day', 'Bye! Come back again soon.']

tag: thanks
patterns: ['Thanks', 'Thank you', "That's helpful"]
responses: ['Happy to help!', 'Any time!', 'My pleasure']

tag: hours
patterns: ['What hours are you open?', 'What are your hours?', 'When are you open?']
responses: ["We're open every day 9am-9pm", 'Our hours are 9am-9pm every day']

tag: timings
patterns: ['At what time are you open?', 'At what times do you remain open?', 'Timings?', 'Till what time does the shop remains open']
responses: ["We're open every day 9am-9pm", 'Our hours are 9am-9pm every day']

tag: payments
patterns: ['Do you take credit cards?', 'Do you accept Mastercard?'

# Stemming

---


Stemming is a language-processing technique that reduces words to their root form.

We will use the Lancaster stemming technique because it is straightforward, though also very aggressive, which suits a chatbot model: stemming shrinks the model's vocabulary and helps capture the more general meaning behind sentences.
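
To see why stemming shrinks the vocabulary, here is a minimal sketch built around a hypothetical `toy_stem` helper. It is a deliberately crude suffix stripper, not the Lancaster algorithm that NLTK implements, and the token list is made up for illustration.

```python
def toy_stem(word):
    """Crude suffix stripper for illustration only; NOT the Lancaster algorithm."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # keep at least three characters so short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

tokens = ["Opening", "opened", "opens", "open", "Hours", "hour"]

# Six surface forms collapse into two vocabulary entries.
stems = sorted({toy_stem(t) for t in tokens})
print(stems)  # ['hour', 'open']
```

A real stemmer applies many more rules, but the effect on the vocabulary is the same in kind: fewer distinct entries for the model to learn.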

In [None]:
stemmer = nltk.stem.lancaster.LancasterStemmer()

#list of all the distinct stemmed words: the vocabulary for the model
words = []
#list of all the distinct tags in our intents data
classes = []
#list of (tokenized pattern, tag) tuples, one per training sentence
documents = []
#tokens to leave out of the vocabulary
ignore_words = ['?']

for intent in intents['intents']:
    for pattern in intent['patterns']:
        w = nltk.word_tokenize(pattern)
        words.extend(w)
        documents.append((w, intent['tag']))

        if intent['tag'] not in classes:
            classes.append(intent['tag'])

# stem and lower each word and remove duplicates
words = [stemmer.stem(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))

# remove duplicates
classes = sorted(list(set(classes)))

print(len(documents), "documents")
print(len(classes), "unique classes")
print(len(words), "unique words")

91 documents
23 unique classes
129 unique words


# Bag of Words
------------------
Neural networks and other machine learning algorithms require numerical input, so our list of strings won't cut it. We need some way to represent our sentences with numbers, and this is where a bag of words comes in.

We represent each sentence with a list whose length equals the number of words in our model's vocabulary. Each position in the list represents one word from the vocabulary: a 1 means the word occurs in the sentence, and a 0 means it does not.

We call this a bag of words because the order in which the words appear in the sentence is lost; we only know which of the words in our model's vocabulary are present.
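
The encoding can be sketched on a toy scale as follows. The `bag_of_words` helper, the six-word vocabulary, and both sentences are made up for illustration (and left unstemmed for readability). Two sentences with the same words in a different order produce the same vector.

```python
def bag_of_words(sentence_words, vocabulary):
    """Return a 0/1 list marking which vocabulary words occur in the sentence."""
    present = set(sentence_words)
    return [1 if word in present else 0 for word in vocabulary]

# made-up vocabulary, sorted like the notebook's words list
vocabulary = ["are", "hello", "hours", "open", "what", "you"]

sentence_a = ["what", "hours", "are", "you", "open"]
sentence_b = ["are", "you", "open", "what", "hours"]  # same words, different order

bag_a = bag_of_words(sentence_a, vocabulary)
bag_b = bag_of_words(sentence_b, vocabulary)
print(bag_a)           # [1, 0, 1, 1, 1, 1]
print(bag_a == bag_b)  # True: word order is lost
```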


In [None]:
training = []
output = []

output_empty = [0]*len(classes)

#bag of words for each sentence
for doc in documents:
    bag = []
    pattern_words = doc[0]
    pattern_words = [stemmer.stem(word.lower()) for word in pattern_words]

    for w in words:
        bag.append(1 if w in pattern_words else 0)

    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1

    training.append([bag, output_row])

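# shuffle the data and convert it to a NumPy array
# (dtype=object because each row pairs a bag with a shorter one-hot label,
# which recent NumPy versions refuse to build as a regular array)
random.shuffle(training)
training = np.array(training, dtype=object)

train_x = list(training[:,0]) #input set for our neural network
train_y = list(training[:,1]) #output set for our neural network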

print(train_x)
print(train_y)

[[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0



# Training

----------------------------------------------------------
Now, using the bags of words we created, we will start training our chatbot model.

For our purposes we will use a fairly standard feed-forward neural network with three hidden layers of 8 neurons each. The goal of our network is to look at a bag of words and output the class it belongs to (one of our tags from the JSON file).
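
The output layer in the cell below uses a softmax activation, which turns raw scores into one probability per tag. The following is a minimal pure-Python sketch of that final step, not TFLearn's implementation. The `raw_scores` values are made up, and only three of the tags are used.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

tags = ["greeting", "goodbye", "thanks"]  # three tags from the intents file
raw_scores = [2.0, 0.5, -1.0]             # made-up output-layer scores

probs = softmax(raw_scores)

# The predicted class is the tag with the highest probability.
best = tags[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

Because the probabilities sum to 1, a single threshold can later be used to decide whether the top prediction is confident enough to answer.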

We start by defining the architecture of our model.


In [None]:
#Build neural network
net = tflearn.input_data(shape=[None, len(train_x[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(train_y[0]), activation='softmax')
net = tflearn.regression(net)

#Define model and setup tensorboard
model = tflearn.DNN(net, tensorboard_dir='tflearn_logs')

#Start training (tflearn.regression uses the Adam optimizer by default)
model.fit(train_x, train_y, n_epoch=1000, batch_size=8, show_metric=True)

#Save the trained model to the file model.tflearn for use in other scripts.
model.save('model.tflearn')

Training Step: 11999  | total loss: 0.00067 | time: 0.052s
| Adam | epoch: 1000 | loss: 0.00067 - acc: 1.0000 -- iter: 88/91
Training Step: 12000  | total loss: 0.00066 | time: 0.058s
| Adam | epoch: 1000 | loss: 0.00066 - acc: 1.0000 -- iter: 91/91
--
INFO:tensorflow:/content/model.tflearn is not in all_model_checkpoint_paths. Manually adding it.


## Loading the model
-----------------------------------------------------------------
We now change some aspects of our code to load the model and data if they have already been created. With these tweaks, we only retrain the model and recreate our data if we haven't done so already.
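
The same load-or-rebuild idea could be applied to the preprocessed data as well. The sketch below assumes a hypothetical `load_or_build` helper and a hypothetical `data.pickle` cache file, neither of which appears in this notebook. The `build_training_data` function is a stand-in for the tokenizing, stemming, and bag-of-words steps.

```python
import os
import pickle
import tempfile

def load_or_build(cache_path, build_fn):
    """Return cached data if the file exists; otherwise build it and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data

build_calls = []

def build_training_data():
    # stand-in for the tokenizing, stemming, and bag-of-words steps
    build_calls.append(1)
    return {"words": ["hello", "hour"], "classes": ["greeting", "hours"]}

# hypothetical cache file; a temp directory keeps the example self-contained
cache_path = os.path.join(tempfile.mkdtemp(), "data.pickle")

first = load_or_build(cache_path, build_training_data)   # builds and caches
second = load_or_build(cache_path, build_training_data)  # reads the cache
print(first == second, len(build_calls))  # True 1
```

Checking for the file explicitly avoids relying on an exception to signal that nothing has been saved yet.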

In [None]:
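try:
    #reuse the saved weights if a checkpoint already exists
    model.load("model.tflearn")
except Exception:
    #otherwise train from scratch on the prepared inputs and labels
    model.fit(train_x, train_y, n_epoch=1000, batch_size=8, show_metric=True)
    model.save("model.tflearn")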

INFO:tensorflow:Restoring parameters from /content/model.tflearn


## Prediction generation
-------------------------------------------------------
Our model does not take string input; it takes a bag of words. It also does not produce sentences; it generates a list of probabilities, one for each of our classes. Generating a response therefore looks like the following:
* Get some input from the user
* Convert it to a bag of words
* Get a prediction from the model
* Find the most probable class
* Pick a response from that class

The bow function transforms our string input into a bag of words using the words list we created. The classify function gets a prediction from the model, and the response function picks an appropriate response from our JSON file of responses.
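
The last two steps of the list above can be sketched without a trained model. The `pick_response` helper, the probability lists, and the trimmed-down `responses_by_tag` mapping are all assumptions made for illustration. The 0.7 threshold mirrors the `ERROR_THRESHOLD` value used in the cell below.

```python
import random

def pick_response(probabilities, tags, responses_by_tag, threshold=0.7):
    """Return a reply for the most probable tag, or None if it is not confident enough."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best_index] <= threshold:
        return None
    return random.choice(responses_by_tag[tags[best_index]])

tags = ["greeting", "goodbye", "thanks"]
responses_by_tag = {
    "greeting": ["Hi there, how can I help?"],
    "goodbye": ["Have a nice day"],
    "thanks": ["Happy to help!"],
}

# made-up model outputs: one confident prediction, one ambiguous
confident = pick_response([0.05, 0.92, 0.03], tags, responses_by_tag)
unsure = pick_response([0.40, 0.35, 0.25], tags, responses_by_tag)
print(confident)  # Have a nice day
print(unsure)     # None
```

Returning None for low-confidence input lets the caller fall back to a "did not understand" message instead of guessing.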

In [None]:
ERROR_THRESHOLD = 0.7

def clean_up_sentence(sentence):
    sentence_words = nltk.word_tokenize(sentence)
    sentence_words = [stemmer.stem(word.lower()) for word in sentence_words]
    return sentence_words

# return bag of words array
def bow(sentence, words, show_details=False):
    sentence_words = clean_up_sentence(sentence)
    bag = [0]*len(words)  
    for s in sentence_words:
        for i,w in enumerate(words):
            if w == s: 
                bag[i] = 1
                if show_details:
                    print("found in bag: %s" % w)
    return np.array(bag)

def classify(sentence):
    # generate probabilities from the model
    results = model.predict([bow(sentence, words)])[0]
    # filter out predictions below a threshold
    results = [[i,r] for i,r in enumerate(results) if r>ERROR_THRESHOLD]
    # sort by strength of probability
    results.sort(key=lambda x: x[1], reverse=True)
    return_list = []
    for r in results:
        return_list.append((classes[r[0]], r[1]))
    print("Matched tag(s):", return_list)
    return return_list

def response(sentence, userID='123', show_details=False):
    results = classify(sentence)
    # walk the matched tags from most to least probable
    while results:
        for i in intents['intents']:
            if i['tag'] == results[0][0]:
                # reply with a random response for the matched tag
                print("\nReply:", random.choice(i['responses']))
                return
        results.pop(0)
    # no tag cleared the threshold, or none matched an intent
    print("Sorry, I did not get that.")

Testing the response generator

In [None]:
response("What flavours are the cookies available in")

Matched tag(s): [('flavours', 0.9994267)]

Reply: chocolate, orange, vanilla, pineapple, strawberry


In [None]:
!zip -r /content/ai_chatbot_content.zip /content

  adding: content/ (stored 0%)
  adding: content/.config/ (stored 0%)
  adding: content/.config/.last_survey_prompt.yaml (stored 0%)
  adding: content/.config/.last_opt_in_prompt.yaml (stored 0%)
  adding: content/.config/gce (stored 0%)
  adding: content/.config/.last_update_check.json (deflated 23%)
  adding: content/.config/active_config (stored 0%)
  adding: content/.config/.feature_flags_config.yaml (deflated 23%)
  adding: content/.config/config_sentinel (stored 0%)
  adding: content/.config/logs/ (stored 0%)
  adding: content/.config/logs/2022.04.08/ (stored 0%)
  adding: content/.config/logs/2022.04.08/13.32.13.036412.log (deflated 53%)
  adding: content/.config/logs/2022.04.08/13.31.53.465513.log (deflated 53%)
  adding: content/.config/logs/2022.04.08/13.32.12.365197.log (deflated 55%)
  adding: content/.config/logs/2022.04.08/13.31.04.724542.log (deflated 90%)
  adding: content/.config/logs/2022.04.08/13.31.45.686476.log (deflated 86%)
  adding: content/.config/logs/2022.04.