## Chloebot

<img src="https://avatars3.githubusercontent.com/u/56938552?s=100&v=1">

This tutorial is based on the wonderful work of the
[Harvard NLP group](http://nlp.seas.harvard.edu/2018/04/03/attention.html) and [SamLynnEvans](https://github.com/SamLynnEvans/Transformer) 

Each of these rectangles is a cell. Click one of them and press *shift* + *enter/return*, or go to Cell in the nav bar and click "Run Cells", to run it. The cell below imports `torch` so you can use PyTorch. It also imports some Python code that I wrote in the *scripts* folder, which I will explain after I show you a toy example of how the whole code works together: a chatbot that speaks in cute, flirty, snide, or any other kind of language you want. nltk is the [Natural Language Toolkit](https://www.nltk.org/), which we will use for things such as synonym matching. That way, when you say "adore", Chloe knows it means the same thing as "like", even if "adore" is not in Chloe's vocabulary. To do this, nltk needs to download a folder called corpora. Running the next cell will do that for you.

In [2]:
import torch

from scripts.MoveData import *
from scripts.Transformer import *
from scripts.TalkTrain import *

import nltk
nltk.download('wordnet') 
from nltk.corpus import wordnet

[nltk_data] Downloading package wordnet to /home/carson/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
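
The synonym matching described above can be pictured as a lookup with a fallback. The sketch below is only an illustration of that idea: the `SYNONYMS` table, the `VOCAB` set and the `to_known_word` helper are all made up for this example, and in the notebook WordNet would play the role of the hand-written table.

```python
# Hypothetical synonym sets; in the notebook, WordNet would supply these.
SYNONYMS = {
    "adore": {"love", "like", "cherish"},
    "giggle": {"laugh", "chuckle"},
}
# Hypothetical vocabulary of words Chloe already knows.
VOCAB = {"i", "like", "you", "make", "me", "laugh"}

def to_known_word(word, vocab=VOCAB, synonyms=SYNONYMS):
    """Return the word itself if known, else a known synonym, else '<unk>'."""
    if word in vocab:
        return word
    # sorted() keeps the choice deterministic when several synonyms are known
    for candidate in sorted(synonyms.get(word, ())):
        if candidate in vocab:
            return candidate
    return "<unk>"

mapped = [to_known_word(w) for w in "i adore you".split()]
print(mapped)
```

Here "adore" is not in the vocabulary, so it is swapped for "like", the first of its synonyms that Chloe knows.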


Use your preferred text editor to open the csv file `chat_pairs.csv` in the */saved* folder and add a few of your own conversation pairs to the list. Each line holds the sentence Chloebot expects to hear, then a comma, then the sentence Chloebot is expected to respond with. Use only one comma (',') per line to separate input and output, for example:

*who are you? , i am chloe*. 
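
To see why exactly one comma per line matters, here is a minimal sketch of how such lines could be split into an input and a reply. It uses Python's standard `csv` module on an in-memory string. The sample rows are made up, and the parsing inside `csv2datatools` may well differ.

```python
import csv
import io

# Hypothetical rows in the same "input , reply" layout as chat_pairs.csv.
sample = "who are you? , i am chloe\nhi chloe , why hello there\n"

pairs = []
for row in csv.reader(io.StringIO(sample)):
    # A well-formed row has exactly two fields: what Chloe hears, what she says.
    if len(row) != 2:
        raise ValueError(f"expected one comma per line, got {len(row) - 1}: {row}")
    heard, reply = (field.strip() for field in row)
    pairs.append((heard, reply))

print(pairs)
```

A second comma inside a sentence would produce a third field, which is why this sketch rejects such a row instead of guessing where the reply starts.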

In the cell below, `data_iter` is an object that gives you training data in the form of batches every time you call it. `infield` and `outfield` are objects that store the mapping between the strings in Chloe's vocabulary and their indices: `infield` for the words Chloe expects to hear and `outfield` for the words Chloe expects to use in response. What do I mean by this? Go to Insert, insert a cell below, then run `infield.vocab.stoi`. You will see a dictionary of all the words Chloe expects to hear and each word's integer index. The string "smile" might be indexed as `8` and be represented with a vector of length 512. All three (the string, the index and the vector) represent the same word or *token*. `opt` is an object of the `Options` class that stores your preferences, such as your learning rate (lr), the path where you want your neural network weights saved, etc. Run the cell below AFTER you have added your lines of data to 'saved/chat_pairs.csv'.

In [3]:
csv_path = 'saved/chat_pairs.csv'
opt = Options(batchsize = 4)
data_iter, infield, outfield, opt = csv2datatools(csv_path,'en', opt)
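
The string-to-index mapping described above can be sketched in a few lines of plain Python. This is a toy stand-in, not the notebook's actual vocabulary object: the sentences, the `<unk>` and `<pad>` special tokens and the `encode` / `decode` helpers are assumptions made for illustration.

```python
# Toy vocabulary: a sketch of the string <-> index mapping.
sentences = ["make me smile", "make me laugh"]

itos = ["<unk>", "<pad>"]  # index -> string; special tokens first (an assumption)
for sentence in sentences:
    for token in sentence.split():
        if token not in itos:
            itos.append(token)
stoi = {token: index for index, token in enumerate(itos)}  # string -> index

def encode(text):
    """Turn a sentence into integer indices, using <unk> for unseen words."""
    return [stoi.get(token, stoi["<unk>"]) for token in text.split()]

def decode(indices):
    """Turn a list of integer indices back into a sentence."""
    return " ".join(itos[i] for i in indices)

print(stoi)
print(encode("make me laugh"))
print(decode(encode("make me giggle")))
```

The last line shows what happens to a word outside the vocabulary: it is mapped to the `<unk>` index and comes back as `<unk>` when decoded.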

OK, now that we have built a data loader, a vocabulary and an object to store our preferences, let's instantiate a Transformer sequence-to-sequence model. There is a lot summoned by this one line. Transformers are the general neural architecture behind many of the hyped-up / notorious research models of 2017-2019, such as OpenAI's [GPT-2](http://jalammar.github.io/illustrated-gpt2/).

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/OpenAI_Logo.svg/200px-OpenAI_Logo.svg.png"> 

We will take our time dissecting and understanding its components later. In the cell below, `emb_dim`, `n_layers`, `heads` and `dropout` stand for embedding dimensions, number of layers, attention heads and dropout. These are some of the specifications that determine the size and complexity of the Transformer we are going to instantiate. The numbers provided here create a relatively small Transformer for our toy example. `opt.save_path` is where your trained model weights will be saved, and `model` is the instance of the Transformer, aka Chloe, that we are creating and training. The code `model = model.cuda()` moves the model onto a GPU if you have one set up. No worries if you do not, this lesson will still work.

In [4]:
emb_dim, n_layers, heads, dropout = 64, 2, 8, 0.1 
opt.save_path = 'saved/weights/model_weights'

model = Transformer(len(infield.vocab), len(outfield.vocab), emb_dim, n_layers, heads, dropout)

if opt.device != -1:
    model = model.cuda()
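
If you change `emb_dim` or `heads`, it helps to know how the two usually relate. In the standard multi-head attention design, the embedding is split evenly across the heads, so `emb_dim` has to be divisible by `heads`. The check below is a sketch under that assumption; `head_dim` is a hypothetical helper, not a function from the *scripts* folder.

```python
def head_dim(emb_dim, heads):
    """Size of each attention head's slice, assuming an even split of emb_dim."""
    if emb_dim % heads != 0:
        raise ValueError(f"emb_dim={emb_dim} is not divisible by heads={heads}")
    return emb_dim // heads

# The values used in the cell above: each head works on 64 / 8 dimensions.
print(head_dim(64, 8))
```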

Neural network optimization is a whole [field](https://www.jeremyjordan.me/nn-learning-rate/) in itself, and we will talk more about it in the future. For now, just know that the learning rate `opt.lr` is a hyperparameter whose initial value we choose. It scales the size of the step the Adam optimizer takes when it updates the weights, aka parameters, of the neural network during training. As training progresses, the learning rate also changes according to a [cosine annealing schedule](https://github.com/allenai/allennlp/issues/1642). `opt.epochs` is the number of times we will cycle through the data during training. If you trained on the same dataset in a different sitting and would like to reload that trained model instead of training from scratch, paste the line `model.load_state_dict(torch.load(opt.save_path))` into the cell below before running it. The cell below defines the learning rate, epochs, type of optimizer and type of scheduler we will use for training.

In [5]:
opt.lr = 0.01 # usually 0.01 - 0.0001
opt.epochs = 30 
optimizer = torch.optim.Adam(model.parameters(), lr=opt.lr, betas=(0.9, 0.98), eps=1e-9)
scheduler = CosineWithRestarts(optimizer, T_max=num_batches(data_iter))
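
To get a feel for what a cosine schedule with restarts does to the learning rate, here is a small sketch in plain Python. It implements the textbook cosine annealing formula with a fixed cycle length. The function name, the `eta_min` floor and the cycle length of 10 steps are assumptions for illustration, and the `CosineWithRestarts` class used above may differ in its details.

```python
import math

def cosine_with_restarts(step, base_lr, t_max, eta_min=0.0):
    """Learning rate at `step` for a cosine schedule that restarts every `t_max` steps."""
    position = step % t_max  # how far we are into the current cycle
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * position / t_max))

base_lr, t_max = 0.01, 10  # t_max = 10 is a made-up cycle length
schedule = [cosine_with_restarts(s, base_lr, t_max) for s in range(2 * t_max)]
for step, lr in enumerate(schedule):
    print(f"step {step:2d}  lr = {lr:.5f}")
```

Within each cycle the learning rate glides from `base_lr` down toward `eta_min`, then jumps back to `base_lr` when the next cycle starts.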

Now let's train the model on our toy csv dataset. The model should quickly memorize the data. As the loss decreases, the model learns to output the corresponding sequence when fed inputs that are close enough to the training inputs. When the loss is less than 0.1, the responses should start to become coherent. You can re-instantiate the model to start fresh, or rerun the cell below if you need more training to get the loss lower. If the loss is not yet below 0.1 after 30 epochs, just run the cell below again to continue training from where you left off.

In [6]:
model = trainer(model, data_iter, opt, optimizer, scheduler)

0m: epoch 0 loss = 2.910
0m: epoch 1 loss = 1.838
0m: epoch 2 loss = 1.393
0m: epoch 3 loss = 1.136
0m: epoch 4 loss = 0.923
0m: epoch 5 loss = 0.792
0m: epoch 6 loss = 0.676
0m: epoch 7 loss = 0.573
0m: epoch 8 loss = 0.518
0m: epoch 9 loss = 0.471
0m: epoch 10 loss = 0.416
0m: epoch 11 loss = 0.307
0m: epoch 12 loss = 0.301
0m: epoch 13 loss = 0.228
0m: epoch 14 loss = 0.194
0m: epoch 15 loss = 0.164
0m: epoch 16 loss = 0.149
0m: epoch 17 loss = 0.131
0m: epoch 18 loss = 0.100
0m: epoch 19 loss = 0.098
0m: epoch 20 loss = 0.072
0m: epoch 21 loss = 0.087
0m: epoch 22 loss = 0.071
0m: epoch 23 loss = 0.065
0m: epoch 24 loss = 0.078
0m: epoch 25 loss = 0.060
0m: epoch 26 loss = 0.061
0m: epoch 27 loss = 0.086
0m: epoch 28 loss = 0.084
0m: epoch 29 loss = 0.067
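
One way to read these numbers, assuming the reported loss is the average cross-entropy per token measured with the natural logarithm (an assumption about `trainer`, not something shown above): `exp(-loss)` is then the geometric mean of the probability the model gives to each correct token. The helper name below is made up for this sketch.

```python
import math

def mean_token_probability(loss):
    """Geometric-mean probability of the correct token, given a mean cross-entropy in nats."""
    return math.exp(-loss)

# A few (epoch, loss) pairs taken from the training log above.
for epoch, loss in [(0, 2.910), (9, 0.471), (18, 0.100), (29, 0.067)]:
    print(f"epoch {epoch:2d}  loss = {loss:.3f}  p(correct token) ~ {mean_token_probability(loss):.3f}")
```

Under that assumption, a loss of 0.1 corresponds to roughly a 90% probability on each correct token, which fits the rule of thumb that replies become coherent around that point.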


Now talk to Chloe by modifying the `tell_chloe` variable and running the cell below. The input sentence that Chloe hears has to be tokenized (split into separate words) and converted from strings to a sequence of integers. That sequence is fed to the model (Chloe), who responds with another sequence of integers, which is converted back into strings for you to read. All this is taken care of by the `talk_to_model()` function below. `opt.k` is something called the beam width: it essentially tells Chloe how many different replies she should consider before choosing the best one. `opt.max_len` is the maximum length of replies.

In [13]:
tell_chloe = "make me laugh" 
opt.k = 2
opt.max_len = 20
chloes_reply = talk_to_model(tell_chloe, model, opt, infield, outfield)
print('Chloe > '+ chloes_reply + '\n')

Chloe > how do french cats say thank you? meowci beaucoup
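
To make the beam width `opt.k` more concrete, here is a toy beam search over a hand-written table of next-word probabilities. Everything in it is hypothetical: the `NEXT` table stands in for the trained model, and the real `talk_to_model()` may score or prune replies differently (for example with length normalization, which this sketch leaves out).

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
NEXT = {
    "<sos>": {"bye": 0.55, "why": 0.45},
    "bye": {"ttyl": 0.5, "chloe": 0.5},
    "ttyl": {"<eos>": 1.0},
    "chloe": {"<eos>": 1.0},
    "why": {"hello": 1.0},
    "hello": {"there": 1.0},
    "there": {"<eos>": 1.0},
}

def beam_search(k, max_len):
    """Return the most probable reply found while keeping k candidates per step."""
    beams = [(0.0, ["<sos>"])]  # (log probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, prob in NEXT[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # keep only the k best partial replies
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:k]:
            if tokens[-1] == "<eos>":
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # replies cut off by max_len still count
    best = max(finished, key=lambda c: c[0])
    return " ".join(t for t in best[1] if t not in ("<sos>", "<eos>"))

print(beam_search(k=1, max_len=20))
print(beam_search(k=2, max_len=20))
```

With `k=1` the search commits to the most likely first word and ends up with a less probable reply overall. With `k=2` it keeps the runner-up alive long enough to discover that it leads to a better full sentence.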



Use a `while` loop to make the chatbot continuously ask for the next input sentence. When you want to stop this cell and end the conversation, tell her "bye chloe", or click Kernel -> Interrupt.

In [14]:
while True:
    tell_chloe = input("You > ")
    chloes_reply = talk_to_model(tell_chloe, model, opt, infield, outfield)
    print('Chloe > '+ chloes_reply + '\n')
    # stop once you say goodbye or Chloe does
    if "bye chloe" in tell_chloe or "bye ttyl" in chloes_reply:
        break

You > hi chloe
Chloe > why hello there

You > how are you today
Chloe > umm... i m feeling embarrased

You > why embarrased?
Chloe > i like you are cute

You > thanks chloe
Chloe > depends on your definition of alive are viruses alive?

You > no you're cute
Chloe > depends on your definition of alive are viruses alive?

You > bye little cat
Chloe > yes but you are just a biological robot

You > bye chloe
Chloe > bye ttyl



Notice that Chloe is a combination of hard-coded rules and a neural network. The neural network portion allows Chloe to encode, or represent, your messages to her in a way that she can make use of, even if a message was not exactly in the training data. If it is close enough, she knows what to do next. Chloe is cute, at least I think so, but there is a lot we can do to make Chloe smarter and more useful.

For example, is Chloe just responding to each of your messages with a simple mapping between input and output? Or does Chloe take into account the entire conversation so far, or even previous conversations? Is Chloe trying to accomplish anything? What is the point of her conversation? Is there a reward signal we can build into the learning so that Chloebot learns from experience to achieve a goal, aka objective? Can Chloe learn new words or understand misspelled words? Not yet. Can Chloebot use outside knowledge to inform her conversations?


These are some of the future directions I want to take us in, but first we need to learn the basics deeply and explain ourselves with mathematical rigor. Only then do we stand a chance.
Let's start with vector representations and probability theory fundamentals. Go to `notebooks/Talk.ipynb` for the next part of this adventure. See you there!

<img src="https://avatars3.githubusercontent.com/u/56938552?s=100&v=1">