# NLTK Chatbot Training

### Scope of this chatbot
We are going to build a chatbot with deep learning, using a **retrieval-based** approach. The chatbot is trained on a dataset 
that contains conversation categories (intents), patterns, and responses. The model is a feed-forward neural network with two hidden layers that 
classifies which category the input message belongs to. The chatbot then selects a random response from that category's list of responses, which all have 
a similar meaning.
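
The retrieval step described above can be sketched in a few lines. This is only an illustration, not the notebook's inference code: the miniature `intents` dictionary and the `pick_response` helper are hypothetical, and the predicted tag is assumed to come from the trained classifier.

```python
import random

# Hypothetical miniature of the intents data; the values are made up.
intents = {
    "intents": [
        {"tag": "greeting",
         "patterns": ["Hi", "Hello"],
         "responses": ["Hello!", "Hi there!"]},
        {"tag": "derivatives",
         "patterns": ["What is a derivative?"],
         "responses": ["A derivative measures a rate of change."]},
    ]
}


def pick_response(predicted_tag, intents):
    """Return a random response for the intent with the given tag, or None."""
    for intent in intents["intents"]:
        if intent["tag"] == predicted_tag:
            return random.choice(intent["responses"])
    return None  # no intent carries this tag


print(pick_response("greeting", intents))
```

Because every response in a category is meant to be interchangeable, a uniform random choice is enough to add some variety to the replies.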

The chatbot helps students find answers to questions on the following topics:
- Precalculus
- Limits and continuity
- Derivatives
- Integrals

Furthermore, this is just a prototype: its functionality can be greatly expanded in the topics it covers (beyond math), the depth of conversation, the
range of questions it can answer, and so on.

In [None]:
import random
from keras.optimizers import SGD
from keras.layers import Dense, Activation, Dropout
from keras.models import Sequential
import numpy as np
import pickle
import json
import nltk
from nltk.stem import WordNetLemmatizer
import pandas as pd

## Input Data

In [None]:
train = pd.read_json('intents.json')
train.head(3)
train.info()


For each intent there is information on:
- tag: Topic of conversation
- patterns: The user input
- responses: The chatbot's reply
- context: A field that correlates to the tag field
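
To make the field list concrete, here is a hedged sketch of what one entry might look like and how its shape could be checked. The sample values, the list type of `context`, and the `missing_fields` helper are assumptions for illustration; the real `intents.json` may differ.

```python
import json

# Hypothetical entry: the field names follow the list above, the values are made up.
sample = """
{"intents": [
  {"tag": "limits",
   "patterns": ["What is a limit?", "Explain limits"],
   "responses": ["A limit describes the value a function approaches."],
   "context": ["limits"]}
]}
"""

REQUIRED_FIELDS = {"tag", "patterns", "responses", "context"}


def missing_fields(intent):
    """Return the required fields that an intent entry lacks, sorted by name."""
    return sorted(REQUIRED_FIELDS - set(intent))


data = json.loads(sample)
for intent in data["intents"]:
    print(intent["tag"], missing_fields(intent))
```

A check like this catches malformed entries before they reach the training loop, where a missing `patterns` key would otherwise surface as a `KeyError`.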

## Load json file

In [None]:
words = []
classes = []
documents = []
ignore_words = ['?', '!']
# Read the intents file; the context manager closes it once loading is done
with open('./chatbot/data/intents.json') as data_file:
    intents = json.load(data_file)

 
for intent in intents['intents']:
    for pattern in intent['patterns']:

        # Split the pattern into word tokens (needs NLTK's tokenizer data to be downloaded)
        tokenized_word_patterns = nltk.word_tokenize(pattern)
        words.extend(tokenized_word_patterns)
        documents.append((tokenized_word_patterns, intent['tag']))

        if intent['tag'] not in classes:
            classes.append(intent['tag'])

## Process words and classes

In [None]:
lemmatizer = WordNetLemmatizer()

# Drop punctuation, lowercase and lemmatize, then deduplicate and sort
words = [lemmatizer.lemmatize(word.lower()) for word in words if word not in ignore_words]
words = sorted(set(words))
classes = sorted(set(classes))

In [None]:
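# Save the vocabulary and class list; the context managers close the files
with open('./chatbot/words.pkl', 'wb') as words_file:
    pickle.dump(words, words_file)
with open('./chatbot/classes.pkl', 'wb') as classes_file:
    pickle.dump(classes, classes_file)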
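
The pickled lists matter because inference has to rebuild exactly the same vocabulary order and class order that training used. The following sketch shows the save and load round trip under stated assumptions: the two short lists are made up, and a temporary directory stands in for the `./chatbot/` folder.

```python
import os
import pickle
import tempfile

words = ["derivative", "limit", "what"]  # hypothetical vocabulary
classes = ["derivatives", "limits"]      # hypothetical tags

with tempfile.TemporaryDirectory() as tmp_dir:
    words_path = os.path.join(tmp_dir, "words.pkl")
    classes_path = os.path.join(tmp_dir, "classes.pkl")

    # Write both lists to disk
    with open(words_path, "wb") as f:
        pickle.dump(words, f)
    with open(classes_path, "wb") as f:
        pickle.dump(classes, f)

    # Read them back, as an inference script would
    with open(words_path, "rb") as f:
        loaded_words = pickle.load(f)
    with open(classes_path, "rb") as f:
        loaded_classes = pickle.load(f)

print(loaded_words == words, loaded_classes == classes)
```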

## Preprocessing

In [None]:
training = []

output_empty = [0] * len(classes)

for document in documents:
    bag = []

    tokenized_words = document[0]
    tokenized_words = [lemmatizer.lemmatize(word.lower()) for word in tokenized_words]

    for word in words:
        # 1 if the vocabulary word occurs in this pattern, otherwise 0
        bag.append(1 if word in tokenized_words else 0)

    output_row = list(output_empty)
    output_row[classes.index(document[1])] = 1

    training.append([bag, output_row])

random.shuffle(training)
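# Each row pairs a bag with a one-hot label of a different length, so the array
# must hold Python objects; recent NumPy versions raise an error without dtype=object
training = np.array(training, dtype=object)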

train_x_bags = list(training[:, 0])
train_y_output_rows = list(training[:, 1])
print("Training data created")
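
A small worked example may help show what one training row looks like. This is a sketch on a made-up vocabulary and class list (both assumed to be already lemmatized, lowercased, and sorted); the `encode` helper is hypothetical and mirrors the loop above.

```python
# Hypothetical vocabulary and classes, already lemmatized, lowercased and sorted.
words = ["derivative", "hello", "is", "limit", "what"]
classes = ["derivatives", "greeting", "limits"]


def encode(tokens, tag):
    """Return (bag, one_hot) for one tokenized pattern and its tag."""
    # One slot per vocabulary word: 1 if the word occurs in the pattern
    bag = [1 if word in tokens else 0 for word in words]
    # One slot per class: 1 at the position of the pattern's tag
    one_hot = [0] * len(classes)
    one_hot[classes.index(tag)] = 1
    return bag, one_hot


bag, one_hot = encode(["what", "is", "a", "limit"], "limits")
print(bag)      # [0, 0, 1, 1, 1]
print(one_hot)  # [0, 0, 1]
```

The token "a" is not in the vocabulary, so it is ignored. The bag also discards word order, which keeps the input a fixed-length vector regardless of sentence length.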

## Create a Deep Neural Network

In [None]:
model = Sequential()
model.add(Dense(128, input_shape=(len(train_x_bags[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y_output_rows[0]), activation='softmax'))

In [None]:
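# `learning_rate` replaces the deprecated `lr` alias. Recent Keras releases reject
# or ignore `decay`; if the line below raises an error, drop that argument.
sgd = SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)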
model.compile(loss='categorical_crossentropy',
              optimizer=sgd, metrics=['accuracy'])

## Fit and save model

In [None]:
hist = model.fit(np.array(train_x_bags), np.array(train_y_output_rows), epochs=300, batch_size=10, verbose=1)
# model.save takes only the path here; the training history is not stored in the file
model.save('./chatbot/chatbot_model.h5')

print("model created")
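
Once the model is trained, its softmax layer outputs one probability per class, and that vector has to be mapped back to a tag. The sketch below shows one way to do so. The `decode_prediction` helper and the `0.25` threshold are assumptions for illustration, and the probability list is a made-up stand-in for one row of `model.predict(...)`.

```python
def decode_prediction(probabilities, classes, threshold=0.25):
    """Return (tag, probability) for the most likely class, or None if below threshold."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    best_probability = probabilities[best_index]
    if best_probability < threshold:
        return None  # the model is not confident enough to answer
    return classes[best_index], best_probability


classes = ["derivatives", "greeting", "limits"]  # hypothetical, sorted as in training
probabilities = [0.05, 0.15, 0.80]               # stand-in for one softmax output row

print(decode_prediction(probabilities, classes))  # ('limits', 0.8)
```

The threshold lets the chatbot fall back to a default reply when no class stands out, instead of answering with a near-random category.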