# NLP Lesson Map: NLP Process

<font color=red>Red</font> highlighting marks the topics covered in this lesson.

1.	Obtain Text
> *	From NLTK
> *	From website
> *	From CSV
2.	<font color=red>Tokens</font>
> *	Sentence Segmentation
> *	Tokenization
> *	<font color=red>Remove stopwords, special characters, numbers
> *	Converting text to a common case</font>
> *	Stemming & Lemmatization
3.	Numbers
> *	Create a Dictionary
> *	<font color=red>Create Document Vectors = Bag of Words = Term Frequency (tf)  = number of times that each word occurs per document</font>
> *	idf
> *	tf-idf
4.	<font color=red>AI Models</font>
> * Logistic Regression
> *	Cosine Similarity
> *	<font color=red>Neural Network</font>
5.	<font color=red>NLP Applications</font>
> * Classification
> *	Sentiment Analysis
> *	<font color=red>Chatbot</font>

# Build a Chatbot with Neural Network 

We have seen how to build a chatbot with cosine similarity. Now, let's explore how we might build one with a neural network!

We will create our training data, train a neural network on it, then use the trained model to power our chatbot.

First, we will install the required libraries. Uncomment the blocks below only if you do not have the libraries installed.

<font color=red>Note:</font> 
* Colab has all these libraries installed, so there is no need to uncomment anything. The commands are here for reference if you are running this code on your own PC.

* This lab is adapted from the link below. Please refer to it for further information or a **better understanding**.
 
>> https://blog.eduonix.com/internet-of-things/simple-nlp-based-chatbot-python/



In [14]:
#!pip install numpy scipy
#!pip install scikit-learn
#!pip install pillow
#!pip install h5py

In [15]:
#!pip install tensorflow

In [16]:
#!pip install tensorflow-gpu

In [17]:
#!pip install keras

# 1. Import Libraries

First, we will import the libraries needed for this neural-network-powered chatbot.
Keras is a high-level machine learning library that uses TensorFlow (a lower-level machine learning library) as its backend. This makes it easier for us to build and train a deep neural network for this purpose.

In [18]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Dense
 
from numpy import argmax
import numpy as np
import re

# 2. Input training data

We will first define the following training data for our chatbot:
1. `X` represents the different possible inputs that users might enter
2. `Y` represents the intent of each input

In [19]:
X = ['Hi',
     'Hello',
     'How are you?',
     'I am making',
     'making',
     'working',
     'studying',
     'see you later',
     'bye',
     'goodbye']

In [20]:
print(len(X))

10


In [21]:
Y = ['greeting',
     'greeting',
     'greeting',
     'busy',
     'busy',
     'busy',
     'busy',
     'bye',
     'bye',
     'bye']

In [22]:
print(len(Y))

10


Notice that several different sentences share the same intent. Here we have only 3 intents, but you can add as many as you want for your project!

This is the way our chatbot will work:
1. From the input sentence, we will identify the intent using our trained AI model.
2. For each intent, we have a prepared response. 

For example, if we identify that the intent of the input is a greeting, we might have the chatbot reply with a greeting as well, such as 'hi' or 'how are you doing?'

We will use machine learning to create a model that classifies input sentences into different intents.
The steps are as follows:

1. Create training data (`X` and `Y` above) containing a list of sentences and their intents.
2. Use the training data to train a classifier.
3. Vectorize each input sentence and use the classifier to determine its intent.

# 3. Text processing

As usual, we will start with text processing. Do you remember the process?

## 3.1 Remove non alphanumeric characters

In [23]:
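def remove_non_alpha_numeric_characters(sentence):
    # Keep only letters and spaces. Despite the name, digits are removed too,
    # because str.isalpha() is False for them.
    new_sentence = ''
    for alphabet in sentence:
        if alphabet.isalpha() or alphabet == ' ':
            new_sentence += alphabet
    return new_sentence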

In [24]:
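def preprocess_data(X):
    # Lowercase so that 'Hi' and 'hi' map to the same word
    X = [data_point.lower() for data_point in X]
    # Drop punctuation and digits
    X = [remove_non_alpha_numeric_characters(
        sentence) for sentence in X]
    # Trim leading/trailing spaces, then collapse runs of spaces into one
    X = [data_point.strip() for data_point in X]
    X = [re.sub(' +', ' ',
                data_point) for data_point in X]
    return X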

In [25]:
X = preprocess_data(X)

vocabulary = set()
for data_point in X:
    for word in data_point.split(' '):
        vocabulary.add(word)

vocabulary = list(vocabulary)
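
To see what the preprocessing steps do, here is a small self-contained sketch. It mirrors the cleaning rules used in this lab (lowercase, keep only letters and spaces, trim, collapse repeated spaces), but the helper names `keep_letters_and_spaces` and `preprocess` and the sample sentences are made up for illustration. The vocabulary is sorted here only to make the printed result reproducible.

```python
import re

def keep_letters_and_spaces(sentence):
    # Same rule as the lab's helper: keep alphabetic characters and spaces only.
    return ''.join(ch for ch in sentence if ch.isalpha() or ch == ' ')

def preprocess(sentences):
    cleaned = []
    for sentence in sentences:
        sentence = keep_letters_and_spaces(sentence.lower()).strip()
        sentence = re.sub(' +', ' ', sentence)  # collapse runs of spaces
        cleaned.append(sentence)
    return cleaned

# Illustrative inputs (not part of the lab's training data)
sample = ['How are you?', '  See   you later!! ', 'Room 101']
cleaned = preprocess(sample)
print(cleaned)  # ['how are you', 'see you later', 'room']

# Build the vocabulary: the set of distinct words across all sentences
vocabulary = sorted({word for sentence in cleaned for word in sentence.split(' ')})
print(vocabulary)  # ['are', 'how', 'later', 'room', 'see', 'you']
```

Note how `'Room 101'` becomes `'room'`: the digits are dropped along with the punctuation.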

## 3.2 Create document vectors

In [26]:
def encode_sentence(sentence):
    # Binary bag of words: position i is 1 if vocabulary[i] appears
    # in the sentence, and 0 otherwise
    sentence = preprocess_data([sentence])[0]
    sentence_encoded = [0] * len(vocabulary)
    for i in range(len(vocabulary)):
        if vocabulary[i] in sentence.split(' '):
            sentence_encoded[i] = 1
    return sentence_encoded

X_encoded = [encode_sentence(sentence) for sentence in X]
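
The encoding in this lab marks whether each vocabulary word is present (1) or absent (0), rather than counting occurrences. The sketch below contrasts that binary encoding with a count-based (term frequency) one. The five-word vocabulary and the function names `encode_binary` and `encode_counts` are assumptions chosen for illustration.

```python
# Hypothetical fixed vocabulary; the lab builds its own from the training data.
vocabulary = ['are', 'bye', 'hello', 'how', 'you']

def encode_binary(sentence, vocabulary):
    """1 if the vocabulary word occurs in the sentence, else 0."""
    words = sentence.split(' ')
    return [1 if term in words else 0 for term in vocabulary]

def encode_counts(sentence, vocabulary):
    """Number of times each vocabulary word occurs (term frequency)."""
    words = sentence.split(' ')
    return [words.count(term) for term in vocabulary]

print(encode_binary('how are you', vocabulary))   # [1, 0, 0, 1, 1]
print(encode_binary('bye bye', vocabulary))       # [0, 1, 0, 0, 0]
print(encode_counts('bye bye', vocabulary))       # [0, 2, 0, 0, 0]
print(encode_binary('good morning', vocabulary))  # [0, 0, 0, 0, 0]
```

The last line shows a limitation worth keeping in mind: a sentence made only of words outside the vocabulary becomes an all-zero vector, so the model has nothing to base its prediction on.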

In [27]:
classes = list(set(Y))

Y_encoded = []
for data_point in Y:
    data_point_encoded = [0] * len(classes)
    for i in range(len(classes)):
        if classes[i] == data_point:
            data_point_encoded[i] = 1
    Y_encoded.append(data_point_encoded)

# 4. Create training data and test data

In [28]:
X_train = X_encoded
y_train = Y_encoded
X_test = X_encoded
y_test = Y_encoded
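
The cell above uses the same data for training and testing, which keeps the lab simple. With a larger dataset you would normally hold some examples back for testing. A minimal sketch of such a split is shown below; the function name `split_train_test`, the 30% test fraction and the fixed seed are all assumptions for illustration, not part of the original lab.

```python
import random

def split_train_test(features, labels, test_fraction=0.3, seed=0):
    """Shuffle indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    test_indices = indices[:n_test]
    train_indices = indices[n_test:]

    def pick(data, chosen):
        return [data[i] for i in chosen]

    return (pick(features, train_indices), pick(features, test_indices),
            pick(labels, train_indices), pick(labels, test_indices))

sentences = ['hi', 'hello', 'how are you', 'i am making', 'making',
             'working', 'studying', 'see you later', 'bye', 'goodbye']
intents = ['greeting'] * 3 + ['busy'] * 4 + ['bye'] * 3

X_train, X_test, y_train, y_test = split_train_test(sentences, intents)
print(len(X_train), len(X_test))  # 7 3
```

With only 10 sentences a held-out set is too small to be very informative, which is one more reason to add training data.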

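Print and check the data you are using for training and testing. Note that the test set here is the same as the training set, so the evaluation later in this notebook measures how well the model fits data it has already seen.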

In [29]:
print (y_test)

[[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [1, 0, 0], [1, 0, 0], [1, 0, 0]]


In [30]:
print(len(X_train))
print(len(y_train))
print(len(X_test))
print(len(y_test))

10
10
10
10


In [31]:
y_train

[[0, 1, 0],
 [0, 1, 0],
 [0, 1, 0],
 [0, 0, 1],
 [0, 0, 1],
 [0, 0, 1],
 [0, 0, 1],
 [1, 0, 0],
 [1, 0, 0],
 [1, 0, 0]]

What does `y_train` represent? Do you understand the array shown above?
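
One way to read the array: each row is a one-hot vector with a single 1 in the column that stands for that sentence's intent. The sketch below shows encoding and decoding on a short made-up label list. It sorts the class names so the column order is reproducible, which is an assumption of this example; the lab's `list(set(Y))` may order the columns differently from run to run.

```python
labels = ['greeting', 'busy', 'bye', 'busy']

# Assumption: sort the class names so each column always means the same intent.
classes = sorted(set(labels))  # ['busy', 'bye', 'greeting']

def one_hot(label, classes):
    """Return a list with 1 at the label's position and 0 elsewhere."""
    return [1 if name == label else 0 for name in classes]

def decode(vector, classes):
    """Map a one-hot vector back to its class name."""
    return classes[vector.index(1)]

encoded = [one_hot(label, classes) for label in labels]
print(encoded)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

decoded = [decode(vector, classes) for vector in encoded]
print(decoded)  # ['greeting', 'busy', 'bye', 'busy']
```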

# 5. Model training

Now we will use the training data to train our neural network.
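
The model in this section stacks a 64-unit sigmoid layer and a softmax output layer, trained with categorical cross-entropy. To make that concrete, here is a NumPy sketch of what such a network computes for one input. The weights are random rather than trained, and the sizes (5 input features, 3 classes) and all function names are assumptions for illustration; in the lab the input size is `len(vocabulary)`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row maximum for numerical stability
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward(x, w1, b1, w2, b2):
    hidden = sigmoid(x @ w1 + b1)     # hidden layer, 64 units
    return softmax(hidden @ w2 + b2)  # one probability per class

def categorical_crossentropy(y_true, y_prob):
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob + eps), axis=1)))

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 5, 64, 3  # illustrative sizes
w1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

x = np.array([[1, 0, 0, 1, 1]], dtype=float)  # one bag-of-words vector
y = np.array([[0, 0, 1]], dtype=float)        # its one-hot intent

probs = forward(x, w1, b1, w2, b2)
loss = categorical_crossentropy(y, probs)
print(probs.shape)  # (1, 3)
print(loss > 0)     # True
```

Training adjusts `w1`, `b1`, `w2` and `b2` so that the loss goes down, which means the probability assigned to the correct intent goes up.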

In [32]:
model = Sequential()

model.add(Dense(units=64, activation='sigmoid',input_dim=len(X_train[0])))

model.add(Dense(units=len(y_train[0]), activation='softmax'))

model.compile(loss=categorical_crossentropy, optimizer=SGD(learning_rate=0.01, momentum=0.9, nesterov=True))

model.fit(np.array(X_train), np.array(y_train), epochs=100, batch_size=16)


Epoch 1/100
Epoch 2/100
Epoch 3/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x7fbe333a9e50>

## List predictions

In [45]:
predictions = [argmax(pred) for pred in model.predict(np.array(X_test))]

# Model Evaluation

Let's evaluate our model now. We will compare the predictions made by the model against the labels in our test data:

In [46]:
correct = 0
for i in range(len(predictions)):
    if predictions[i] == argmax(y_test[i]):
        correct += 1

print ("Correct:", correct)
print ("Total:", len(predictions))

Correct: 8
Total: 10
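
The evaluation relies on the fact that the model outputs one row of class probabilities per sentence, and the index of the largest probability is the predicted class. The sketch below walks through that on made-up probabilities (they are not output from the trained model) and also lists which examples were misclassified. The names `argmax_index`, `probabilities` and `true_labels` are illustrative.

```python
def argmax_index(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

classes = ['busy', 'bye', 'greeting']

# Made-up model outputs: one row per sentence, one probability per class
probabilities = [[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6],
                 [0.2, 0.5, 0.3],
                 [0.6, 0.3, 0.1]]
true_labels = ['busy', 'greeting', 'bye', 'bye']

predicted = [classes[argmax_index(row)] for row in probabilities]
print(predicted)  # ['busy', 'greeting', 'bye', 'busy']

correct = sum(p == t for p, t in zip(predicted, true_labels))
accuracy = correct / len(true_labels)
print("Correct:", correct, "Total:", len(true_labels), "Accuracy:", accuracy)
# Correct: 3 Total: 4 Accuracy: 0.75

# (true, predicted) pairs for the mistakes
misclassified = [(t, p) for p, t in zip(predicted, true_labels) if p != t]
print(misclassified)  # [('bye', 'busy')]
```

Looking at the misclassified pairs, not just the total, helps you decide which intents need more training sentences.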


# Testing the chatbot

Let's test the chatbot now! We will input a sentence, and then see what class is predicted by the neural network:

In [23]:
while True:
    print ("Enter a sentence")
    sentence = input()
    prediction= model.predict(np.array([encode_sentence(sentence)]))
    print (classes[argmax(prediction)])

Enter a sentence
goodbye
busy
Enter a sentence
hi
busy
Enter a sentence
busy
busy
Enter a sentence
bye
busy
Enter a sentence
hello
greeting
Enter a sentence
how are you
greeting
Enter a sentence
bye
busy
Enter a sentence
bye
busy
Enter a sentence


KeyboardInterrupt: ignored

Notice that you can't stop the chatbot? You'll have to add an exit command later (see the previous notebook to find out how to do it).

For now, simply press the stop button (interrupt button) above to stop the chatbot.

Try it! Press the stop button, then try typing something into the box.
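
One possible shape for an exit command is sketched below. It is only an illustration: the exit words `exit` and `quit`, the `chat_loop` function and the keyword-based `stub_classify` (a stand-in for the trained model) are all assumptions, and the previous notebook may do it differently. The loop reads from any iterable of lines so that it can be tried without typing.

```python
EXIT_COMMANDS = {'exit', 'quit'}  # hypothetical exit words

def chat_loop(lines, classify):
    """Classify each line until an exit command appears; return the intents."""
    intents = []
    for line in lines:
        if line.strip().lower() in EXIT_COMMANDS:
            break  # leave the loop instead of running forever
        intents.append(classify(line))
    return intents

def stub_classify(sentence):
    # Stand-in for the neural network: a crude keyword rule
    return 'bye' if 'bye' in sentence.lower() else 'greeting'

result = chat_loop(['hello', 'goodbye', 'Exit', 'hi'], stub_classify)
print(result)  # ['greeting', 'bye']
```

The final `'hi'` is never classified because the loop stops at `'Exit'`. In the notebook, the same check could be placed right after `sentence = input()` inside the `while True:` loop.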

# <font color=red>Challenge</font> 

We have successfully used a neural network to map our input to a conversation intent.
Your challenge is to link the conversation intent to a particular response that the chatbot will say. 
For example, if the conversation intent is 'greeting', get your chatbot to say a greeting as well!

In [59]:
import random
greeting_responses = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me", "Hi there sir!"]
busy_responses = ['Sorry to disturb you!', 'Ok I will talk to you later then', 'ok sorry', 'ok', 'K']
bye_responses = ['Had a great time chatting with you!', 'Bye bye, have a great time']

# Let's chat for 4 lines
for line in range(4):
    print('Enter a sentence')
    sentence = input()
    prediction = model.predict(np.array([encode_sentence(sentence)]))
    # Look up the predicted intent once, then pick a random matching response
    intent = classes[argmax(prediction)]
    if intent == 'greeting':
        print("Bot: {}".format(random.choice(greeting_responses)))
    elif intent == 'busy':
        print("Bot: {}".format(random.choice(busy_responses)))
    elif intent == 'bye':
        print("Bot: {}".format(random.choice(bye_responses)))

Enter a sentence
hi
Bot: hello
Enter a sentence
studying
Bot: Sorry to disturb you!
Enter a sentence
see you later
Bot: Bye bye, have a great time
Enter a sentence
working
Bot: ok
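
The intent checks in the cell above can also be written as a dictionary lookup, which stays short as you add more intents. This is a sketch under assumptions: the `RESPONSES` dictionary, the `respond` helper and the fallback message are illustrative names and text, not part of the original lab.

```python
import random

# Hypothetical lookup table: one list of canned replies per intent
RESPONSES = {
    'greeting': ['hi', 'hey', 'hello'],
    'busy': ['Sorry to disturb you!', 'ok'],
    'bye': ['Bye bye, have a great time'],
}

FALLBACK = "Sorry, I did not understand that."  # assumed fallback text

def respond(intent):
    """Return a random reply for the intent, or the fallback if it is unknown."""
    replies = RESPONSES.get(intent)
    if replies is None:
        return FALLBACK
    return random.choice(replies)

print("Bot: {}".format(respond('greeting')))
print("Bot: {}".format(respond('bye')))  # Bot: Bye bye, have a great time
```

Adding a new intent then only requires a new dictionary entry (plus training sentences for it), with no extra `if` branch.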


### Great job! You've successfully created a simple chatbot with a neural network! How might you improve the chatbot?
You can improve the chatbot by:
- Adding more training data
- Adding more intents
- Focusing on a particular topic and training the chatbot with plenty of training data on that topic

### Resource:
https://blog.eduonix.com/internet-of-things/simple-nlp-based-chatbot-python/