## Chloebot

<img src="https://avatars3.githubusercontent.com/u/56938552?s=100&v=1">

This tutorial is based on the work of the [Harvard NLP group](http://nlp.seas.harvard.edu/2018/04/03/attention.html) and [SamLynnEvans](https://github.com/SamLynnEvans/Transformer).

Each of these rectangles, like the one you are reading and the ones with code in them, is called a cell. Click a cell and press *shift* + *return* (or *enter*) together, or go to Cell in the nav bar and click "Run Cells", to run each of the cells below and summon chloe.

In [1]:
import torch

from scripts.MoveData import *
from scripts.Transformer import *
from scripts.TalkTrain import *

import nltk
nltk.download('wordnet') 

[nltk_data] Downloading package wordnet to /home/carson/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [11]:
opt = Options(batchsize=8, device=torch.device("cpu"), epochs=25, 
              lr=0.01, beam_width=3, max_len = 25, save_path = 'saved/weights/model_weights')
data_iter, infield, outfield, opt = json2datatools(opt=opt)
emb_dim, n_layers, heads, dropout = 32, 3, 8, 0.01 
chloe = Transformer(len(infield.vocab), len(outfield.vocab), emb_dim, n_layers, heads, dropout)
chloe.load_state_dict(torch.load(opt.save_path)) 

<All keys matched successfully>

The next cell uses a `while` loop so that chloe keeps asking for your next input sentence.

Say "hi" to chloe.

When you want to stop this cell and end the conversation, tell her "bye chloe", or click Kernel -> Interrupt.

In [12]:
while True:
    tell_chloe = input("You > ")
    chloes_reply = talk_to_model(tell_chloe, chloe, opt, infield, outfield, explain=True)
    # the reply is shown either way, so print it once before checking the exit condition
    print('Chloe > ' + chloes_reply + '\n')
    # stop when you say "bye chloe" or when chloe herself answers "bye ttyl"
    if "bye chloe" in tell_chloe or "bye ttyl" in chloes_reply:
        break

You > hi
Chloe > hi, can i tell you a joke?

You > ok
Chloe > how do french cats say thank you?

You > how
Chloe > meowci beaucoup

You > haha
Chloe > thanks for laughing at my joke

You > any more?
Chloe > you will have to teach me

You > i dunno
Chloe > i dunno either

You > bye
Chloe > bye ttyl
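
If you want to see how the exit condition of that loop behaves without loading a trained model, here is a minimal sketch. The names `chat_loop`, `get_input`, `reply_fn` and `show` are made up for this illustration and are not part of the `scripts` package; the canned replies stand in for `talk_to_model`.

```python
def chat_loop(get_input, reply_fn, show=print):
    """Run a conversation until the user says "bye chloe" or the bot says "bye ttyl"."""
    while True:
        message = get_input()
        reply = reply_fn(message)
        show("Chloe > " + reply + "\n")
        if "bye chloe" in message or "bye ttyl" in reply:
            break

# Scripted user messages and canned replies replace input() and the real model
scripted = iter(["hi", "bye"])
canned = {"hi": "hi, can i tell you a joke?", "bye": "bye ttyl"}

transcript = []
chat_loop(lambda: next(scripted), lambda m: canned[m], show=transcript.append)
print(transcript)
```

Note that plain "bye" ends the conversation only because the reply contains "bye ttyl"; the loop checks both what you said and what chloe answered.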



Now let's teach chloe a few new tricks. Use your preferred text editor to open the file called *pairs.json* in the *saved* folder and add a few of your own conversation pairs to the list.

For example, if you want chloe to say "hi vicki" when you say "hi i am vicki", add this line to *saved/pairs.json*:

{"listen": "hi i am vicki", "reply" : "hi vicki"}

Be careful not to add blank lines to the json file. If you add one by accident, just put your cursor on the blank line and hit *backspace* to get rid of it.
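
If you would rather have Python look for blank or malformed lines, here is a small sketch. It assumes the file holds one JSON object per line with a "listen" and a "reply" key, as in the example line above; `check_pairs` is a hypothetical helper, not something provided by the `scripts` package.

```python
import json

def check_pairs(lines):
    """Return (line_number, problem) tuples for lines that would break the pairs file."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            pair = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        # every pair needs both keys, assuming the format of the example line
        if not isinstance(pair, dict) or not {"listen", "reply"} <= set(pair):
            problems.append((number, "missing 'listen' or 'reply' key"))
    return problems

sample = [
    '{"listen": "hi i am vicki", "reply" : "hi vicki"}',
    '',
    '{"listen": "no reply here"}',
]
print(check_pairs(sample))
```

To check the real file, you could read it with `open('saved/pairs.json').read().splitlines()` and pass the result to `check_pairs`.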

In the cell below, `data_iter` is a data loader object that gives you training data in batches each time you iterate over it. `infield` and `outfield` are objects that store the mapping between the strings in Chloe's vocabulary and their integer indices: `infield` for the words Chloe expects to hear, `outfield` for the words Chloe expects to use in response. What do I mean by this? Go to Insert, insert a cell below, then run `infield.vocab.stoi`. You will see a dictionary of all the words Chloe expects to hear and each word's integer index. We need to recreate this vocabulary because, by adding more lines of data, you have probably added some new vocab words that chloe must know. `opt` is an object of the `Options` class that stores your preferences, such as your learning rate (`lr`), the path where you want your neural network weights saved, etc. Run the cell below AFTER you have added your lines of new data to *saved/pairs.json*.

In [13]:
data_iter, infield, outfield, opt = json2datatools(opt=opt)
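
To make the string-to-index idea concrete, here is a toy stand-in for what `infield.vocab.stoi` holds. It is only a sketch: `build_stoi` and the special tokens are assumptions for illustration, and the real vocabulary built by `json2datatools` may tokenize and order words differently.

```python
def build_stoi(sentences, specials=("<unk>", "<pad>")):
    """Map each distinct word to an integer index, special tokens first."""
    stoi = {token: index for index, token in enumerate(specials)}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in stoi:
                stoi[word] = len(stoi)  # next free index
    return stoi

old_vocab = build_stoi(["hi", "ok"])
new_vocab = build_stoi(["hi", "ok", "hi i am vicki"])

print(old_vocab)                                # {'<unk>': 0, '<pad>': 1, 'hi': 2, 'ok': 3}
print(sorted(set(new_vocab) - set(old_vocab)))  # ['am', 'i', 'vicki']
```

The second print shows why the vocabulary has to be rebuilt: the new training line introduces words that had no index before.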

OK, now that we have built a data loader, a vocabulary and an object to store our preferences, let's instantiate a Transformer sequence-to-sequence model. There is a lot summoned by the line `chloe = Transformer(len(infield.vocab), len(outfield.vocab), emb_dim, n_layers, heads, dropout)`. Transformers are the general neural architecture behind many of the hyped-up / notorious research models of 2017-2019, such as OpenAI's [GPT-2](http://jalammar.github.io/illustrated-gpt2/) and Google AI's [BERT](https://arxiv.org/abs/1810.04805).

We will take our time dissecting and understanding its components later. In the cell below, `emb_dim`, `n_layers`, `heads` and `dropout` stand for the embedding dimension, the number of layers, the number of attention heads and the dropout rate. These are some of the specifications that determine the size and complexity of the Transformer we are going to instantiate. The numbers provided here create a relatively small Transformer for our toy example. `chloe` is the instance of the Transformer that we are creating and training.

In [14]:
emb_dim, n_layers, heads, dropout = 32, 3, 8, 0.01
chloe = Transformer(len(infield.vocab), len(outfield.vocab), emb_dim, n_layers, heads, dropout)
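
One practical consequence of these numbers: in the standard multi-head attention design, the embedding is split evenly across the heads, so `emb_dim` has to be divisible by `heads`. The sketch below assumes `scripts/Transformer.py` follows that standard design; `head_dimension` is a hypothetical helper for illustration.

```python
def head_dimension(emb_dim, heads):
    """Size of the slice of the embedding that each attention head works on."""
    if emb_dim % heads != 0:
        raise ValueError("emb_dim must be divisible by heads")
    return emb_dim // heads

print(head_dimension(32, 8))  # 4
```

With `emb_dim = 32` and `heads = 8`, each head sees only 4 numbers per word, which is part of why this is a toy-sized model.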

Neural network optimization is a whole [field](https://www.jeremyjordan.me/nn-learning-rate/) in itself; we will talk more about this in the future. For now, just know that the learning rate `opt.lr` is a hyperparameter whose initial value we choose. It scales the size of the step the Adam optimizer takes when it updates the weights, aka parameters, of the neural network during training. As training progresses, the learning rate also changes according to a [cosine annealing schedule](https://github.com/allenai/allennlp/issues/1642). `epochs` is the number of times we will cycle through the data during training. If you trained on the same dataset in a different sitting and would like to reload that trained model instead of training from scratch, paste the line `chloe.load_state_dict(torch.load(opt.save_path))` into the cell below before running it. The cell below defines the optimizer and the scheduler we will use for training; the learning rate and the number of epochs come from `opt`.

In [15]:
optimizer = torch.optim.Adam(chloe.parameters(), lr=opt.lr, betas=(0.9, 0.98), eps=1e-9)
scheduler = CosineWithRestarts(optimizer, T_max=num_batches(data_iter))
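
To get a feel for what a cosine schedule does to the learning rate, here is a minimal sketch of the fixed-period form. It is an illustration, not the code inside `CosineWithRestarts`, which may also stretch the cycle length or shrink the peak after each restart; `cosine_lr` and `eta_min` are names chosen for this example.

```python
import math

def cosine_lr(step, base_lr, t_max, eta_min=0.0):
    """Learning rate at `step` for cosine annealing that restarts every `t_max` steps."""
    position = step % t_max  # wraps back to 0 at each restart
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * position / t_max))

# The rate starts at base_lr, glides down toward eta_min, then jumps back up
for step in (0, 5, 9, 10):
    print(step, cosine_lr(step, base_lr=0.01, t_max=10))
```

In the cell above, `T_max` is set to the number of batches, so under this simplified form one full cosine cycle would span one pass through the data.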

Now let's train chloe on your modified json dataset. chloe should quickly memorize the data. As the loss decreases, chloe learns to output the corresponding sequence when fed inputs that are close enough to the training inputs. When the loss is less than 0.1, the responses should become coherent. If the loss is not yet less than 0.1, just run the cell below again to keep training from where you left off. To start fresh instead, re-instantiate chloe and then re-create the optimizer and scheduler, since the optimizer holds on to the parameters of the model it was built with.

In [17]:
chloe = trainer(chloe, data_iter, opt, optimizer, scheduler)

training on cpu
0m: epoch 0 loss = 0.157
0m: epoch 1 loss = 0.167
0m: epoch 2 loss = 0.123
0m: epoch 3 loss = 0.107
0m: epoch 4 loss = 0.084
0m: epoch 5 loss = 0.067
0m: epoch 6 loss = 0.054
0m: epoch 7 loss = 0.045
0m: epoch 8 loss = 0.038
0m: epoch 9 loss = 0.037
0m: epoch 10 loss = 0.030
0m: epoch 11 loss = 0.025
0m: epoch 12 loss = 0.023
0m: epoch 13 loss = 0.021
0m: epoch 14 loss = 0.016
0m: epoch 15 loss = 0.015
0m: epoch 16 loss = 0.014
0m: epoch 17 loss = 0.013
0m: epoch 18 loss = 0.012
0m: epoch 19 loss = 0.011
0m: epoch 20 loss = 0.012
0m: epoch 21 loss = 0.010
0m: epoch 22 loss = 0.009
0m: epoch 23 loss = 0.010
0m: epoch 24 loss = 0.011
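
What does a loss of 0.1 mean in practice? If we assume the number printed by `trainer` is the average cross-entropy per output word, measured with the natural log (the usual convention, but an assumption here), then `exp(-loss)` is the typical probability chloe assigns to the correct next word. A quick sketch:

```python
import math

def mean_token_probability(loss):
    """Geometric-mean probability of the correct word implied by a cross-entropy loss."""
    return math.exp(-loss)

for loss in (0.157, 0.1, 0.011):
    print(loss, round(mean_token_probability(loss), 3))
# 0.157 0.855
# 0.1 0.905
# 0.011 0.989
```

Under that assumption, the 0.1 threshold corresponds to chloe putting roughly 90% of her confidence on the right word at each step.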


Now talk to chloe by modifying the `tell_chloe` variable and running the cell below! The input sentence that Chloe hears has to be tokenized (split into separate words) and converted from strings to a sequence of integers. That sequence is fed to the model (chloe), which responds with a sequence of integers, and that sequence is converted back into strings for you to read. All this is taken care of by the `talk_to_model()` function below.

In [19]:
tell_chloe = "hi i am vicki" 
chloes_reply = talk_to_model(tell_chloe, chloe, opt, infield, outfield, explain=True)
print('Chloe > '+ chloes_reply + '\n')

Chloe > hi vicki
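
The string-to-integer round trip described above can be sketched in a few lines. This leaves out the model itself and the beam search, and the whitespace tokenizer, the `<unk>` token and the `encode` / `decode` helpers are assumptions for illustration rather than the internals of `talk_to_model()`.

```python
def encode(sentence, stoi, unk="<unk>"):
    """Tokenize on whitespace and look up each word's index; unknown words map to <unk>."""
    return [stoi.get(word, stoi[unk]) for word in sentence.lower().split()]

def decode(indices, itos):
    """Turn a sequence of indices back into a readable string."""
    return " ".join(itos[i] for i in indices)

itos = ["<unk>", "hi", "i", "am", "vicki"]                # index -> string
stoi = {word: index for index, word in enumerate(itos)}   # string -> index

ids = encode("hi i am chloe", stoi)
print(ids)                # [1, 2, 3, 0]
print(decode(ids, itos))  # hi i am <unk>
```

The word "chloe" is not in this toy vocabulary, so it collapses to the unknown token and cannot be recovered on the way back.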



Notice that chloe is a combination of hard-coded rules and a neural network. The neural network portion allows chloe to encode, or represent, your messages to her in a way that she can make use of, even if a message was not exactly in the training data. If it is close enough, she knows what to do next. chloe is cute, at least I think so, but there is a lot we can do to make chloe smarter and more useful.

For example, is chloe just responding to each of your messages with a simple mapping between input and output? Or does chloe take into account the entire conversation so far, or even previous conversations? Is chloe trying to accomplish anything? What is the point of her conversation? Is there a reward signal we can build into the learning so that chloebot learns from experience to achieve a goal, aka objective? Can chloe learn new words or understand misspelled words? Not yet. Can chloebot use outside knowledge to inform her conversations? Not yet.

In [20]:
while True:
    tell_chloe = input("You > ")
    chloes_reply = talk_to_model(tell_chloe, chloe, opt, infield, outfield, explain=True)
    # the reply is shown either way, so print it once before checking the exit condition
    print('Chloe > ' + chloes_reply + '\n')
    # stop when you say "bye chloe" or when chloe herself answers "bye ttyl"
    if "bye chloe" in tell_chloe or "bye ttyl" in chloes_reply:
        break

You > hi i am vicki
Chloe > hi vicki

You > are you alive?
Chloe > depends on your definition of alive, are viruses alive?

You > no
Chloe > how do french cats say thank you?

You > ?
Chloe > meowci beaucoup

You > bye 
Chloe > bye ttyl



These are some of the future directions I want to take us in, but first we need to learn the basics deeply and explain ourselves with mathematical rigor. Only then do we stand a chance.
Let's start with vector representations and probability theory fundamentals. Go to `notebooks/Talk.ipynb` for the next part of this adventure. See you there!

<img src="https://avatars3.githubusercontent.com/u/56938552?s=100&v=1">