<h1><center>Chloebot</center></h1>

<table><tr style="background:transparent;">
<td><img width="200" height="200" src="https://csml.princeton.edu/sites/csml/files/styles/pwds_featured_image/public/events/share.png"></td>
<td><img width="200" height="200" src="https://venturebeat.com/wp-content/uploads/2019/06/pytorch.jpg"></td>
<td><img width="100" height="100" src="https://avatars3.githubusercontent.com/u/56938552?s=100&v=1"></td>  
</tr></table>


This tutorial is based on the research paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762) from Google AI, and on the PyTorch implementations by the [Harvard NLP group](http://nlp.seas.harvard.edu/2018/04/03/attention.html) and [SamLynnEvans](https://github.com/SamLynnEvans/Transformer).

Each of these rectangles, like the one you are reading and the ones with code in them, is called a cell. Click a cell and press *shift* + *return* (or *enter*) together, or go to Cell in the nav bar and click "Run Cells", to run each of the cells below and summon chloe.

In [23]:
import math, copy, sys

sys.path.append('env/lib/python3.6/site-packages') #this line assumes you are using env

import torch

import nltk
nltk.download('wordnet') 

from scripts.MoveData import *
from scripts.Transformer import *
from scripts.TalkTrain import *

[nltk_data] Downloading package wordnet to /Users/carson/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [12]:
opt = Options(batchsize=8, device=torch.device("cpu"), epochs=25, 
              lr=0.01, max_len = 25, save_path = 'saved/weights/transformer_example_weights')

data_iter, infield, outfield, opt = json2datatools(path = 'saved/examplepairs.json', opt=opt)
emb_dim, n_layers, heads, dropout = 32, 2, 8, 0.1 
chloe = Transformer(len(infield.vocab), len(outfield.vocab), emb_dim, n_layers, heads, dropout)
chloe.load_state_dict(torch.load(opt.save_path)) 

<All keys matched successfully>

## Conversation

The next cell uses a `while` loop so that chloe keeps asking for your next input sentence.

Say "hi" to chloe.

When you want to stop this cell and end the conversation, tell her "bye chloe", or click Kernel -> Interrupt. The loop also ends whenever chloe herself replies "bye ttyl".

Here is an example conversation:

You > hi

Chloe > hi ! , can i tell you a joke ?

You > ok

Chloe > how do french cats say thank you ?

You > how?

Chloe > meowci beaucoup !

You > haha

Chloe > thanks for laughing at my joke

You > any more?

Chloe > you will have to teach me

You > are you alive?

Chloe > depends on your definition of alive , are viruses alive ?

You > i dont know

Chloe > i dont know either

You > ok bye

Chloe > bye ttyl


In [13]:
while True:
    tell_chloe = input("You > ")
    chloes_reply = talk_to_chloe(tell_chloe, chloe, opt, infield, outfield)
    print('Chloe > '+ chloes_reply + '\n')
    # stop when you say "bye chloe" or when chloe says goodbye
    if "bye chloe" in tell_chloe or "bye ttyl" in chloes_reply:
        break

You > hi
Chloe > hi ! , can i tell you a joke ?

You > sure
Chloe > how do french cats say thank you ?

You > how
Chloe > meowci beaucoup !

You > haha
Chloe > thanks for laughing at my joke

You > any more?
Chloe > you will have to teach me

You > are you alive?
Chloe > depends on your definition of alive , are viruses alive ?

You > i dunno
Chloe > i dont know either

You > ok bye
Chloe > bye ttyl



## Data

Now let's teach chloe a few new tricks. Use your preferred text editor to open the file called *custompairs.json* in the *saved* folder and add a few of your own conversation pairs to the list.

For example, if you want chloe to say "hi vicki" when you say "hi i am vicki", add this line to *saved/custompairs.json*:

{"listen": "hi i am vicki", "reply" : "hi vicki"}

Be careful not to add blank lines to the JSON file. If you do so by accident, just put your cursor on the blank line and hit *backspace* to get rid of it.
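
If you would rather check the file with code than by eye, here is a minimal sketch. It assumes the file holds one JSON object per line, each with a `listen` and a `reply` key, as in the example line above. `check_pairs` is a hypothetical helper written for this illustration; it is not part of the `scripts` folder.

```python
import json

REQUIRED_KEYS = {"listen", "reply"}

def check_pairs(lines):
    """Return (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            pair = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(pair, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = REQUIRED_KEYS - set(pair)
        if missing:
            problems.append((number, "missing keys: " + ", ".join(sorted(missing))))
    return problems

# A made-up sample: one good line, one blank line, one line without a reply
sample = [
    '{"listen": "hi i am vicki", "reply" : "hi vicki"}',
    '',
    '{"listen": "how are you"}',
]
print(check_pairs(sample))
```

To check your real file, pass in its lines, for example `check_pairs(open('saved/custompairs.json').read().splitlines())`.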

In the cell below, `data_iter` is a data loader object that gives you training data in the form of batches every time you iterate over it. `infield` and `outfield` are objects that store the relationship between the strings in chloe's vocabulary and their indices: `infield` for the words chloe expects to hear and `outfield` for the words chloe expects to use in response. What do I mean by this? Go to Insert, insert a cell below, then run `infield.vocab.stoi`. You will see a dictionary of all the words chloe expects to hear and each word's integer index.

We need to recreate this vocabulary because, by adding more lines of data, you have probably added some new vocab words that chloe must know. `opt` is an object of the `Options` class that stores your preferences, such as your learning rate (`lr`), the path where you want your neural network weights saved, etc. Run the cell below AFTER you have added your lines of new data to 'saved/custompairs.json'.

In [14]:
data_iter, infield, outfield, opt = json2datatools(path = 'saved/custompairs.json', opt=opt)
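
To make the string-to-index idea concrete, here is a toy sketch of a vocabulary in plain Python. It is not how `json2datatools` builds `infield.vocab`: the real vocabulary may order words differently and use its own special tokens. `build_vocab` and the `<unk>` / `<pad>` entries are assumptions made for this illustration.

```python
def build_vocab(sentences, specials=("<unk>", "<pad>")):
    """Give every distinct word an integer index, in order of first appearance."""
    itos = list(specials)              # index -> string
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in itos:
                itos.append(word)
    stoi = {word: index for index, word in enumerate(itos)}  # string -> index
    return stoi, itos

stoi, itos = build_vocab(["hi i am vicki", "hi"])
print(stoi)
```

Adding a sentence with a word that is not yet in `itos` makes the vocabulary grow by one entry, which is why the vocabulary has to be rebuilt after you edit the data file.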

## Neural Network Model

OK, now that we have built a data loader, a vocabulary and an object to store our preferences, let's instantiate a Transformer sequence-to-sequence model. There is a lot summoned by the line `chloe = Transformer(len(infield.vocab), len(outfield.vocab), emb_dim, n_layers, heads, dropout)`. Transformers are the general neural architecture behind many of the hyped-up / notorious research models of 2017-2019, such as OpenAI's [GPT-2](http://jalammar.github.io/illustrated-gpt2/) and Google AI's [BERT](https://arxiv.org/abs/1810.04805).

We will take our time dissecting and understanding its components later. In the cell below, `emb_dim`, `n_layers`, `heads` and `dropout` stand for embedding dimensions, number of layers, attention heads and dropout. These are some of the specifications that determine the size and complexity of the Transformer we are going to instantiate. The numbers provided here create a relatively small Transformer for our toy example. `chloe` is the instance of the Transformer that we are creating and training.

In [15]:
emb_dim, n_layers, heads, dropout = 32, 2, 8, 0.1 
chloe = Transformer(len(infield.vocab), len(outfield.vocab), 
                    emb_dim, n_layers, heads, dropout)
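
One relationship between these numbers is worth a quick check. In the multi-head attention described in the paper, the embedding is split evenly across the heads, so `emb_dim` has to be divisible by `heads`. The sketch below assumes the `Transformer` class in `scripts` follows that convention; `per_head_dim` is a hypothetical helper, not a function from the repository.

```python
def per_head_dim(emb_dim, heads):
    """Size of the slice of the embedding that each attention head works on."""
    if emb_dim % heads != 0:
        raise ValueError("emb_dim must be divisible by heads")
    return emb_dim // heads

# With the values used in this notebook, each of the 8 heads sees 4 dimensions
print(per_head_dim(32, 8))
```

If you change `emb_dim` or `heads` to experiment with a bigger chloe, picking values that pass this check is a reasonable first sanity test.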

## Training

Neural network optimization is a whole [field](https://www.jeremyjordan.me/nn-learning-rate/) in itself, and we will talk more about it in the future. For now, just know that the learning rate `opt.lr` is a hyperparameter whose initial value we choose. It scales the size of the step the Adam optimizer takes when it updates the weights, aka parameters, of the neural network model during training. As training progresses, the learning rate also changes according to a scheduler that monitors the learning progress. `epochs` is the number of times we will cycle through the data during training.

If you trained on the same dataset in a different sitting and would like to reload that trained model instead of training from scratch, add the line `chloe.load_state_dict(torch.load(opt.save_path))` to the cell below before running it. The cell below creates the optimizer and the scheduler we will use for training; the learning rate and the number of epochs come from `opt`.

In [16]:
optimizer = torch.optim.Adam(chloe.parameters(), lr=opt.lr, betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.9, patience=3)
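
To get a feel for what the scheduler does with `factor=0.9` and `patience=3`, here is a simplified pure-Python sketch of the reduce-on-plateau idea: if the loss has not improved for more than `patience` epochs in a row, multiply the learning rate by `factor`. `PlateauSketch` is a hypothetical class written for this illustration. It leaves out details of the real `ReduceLROnPlateau`, such as its improvement threshold and cooldown, and the loss values are made up.

```python
class PlateauSketch:
    """Toy version of reduce-on-plateau for a loss we want to minimize."""

    def __init__(self, lr, factor=0.9, patience=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        if loss < self.best:
            # the loss improved, so remember it and reset the counter
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            # stuck for too long: take smaller steps from now on
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr

sched = PlateauSketch(lr=0.01)
made_up_losses = [1.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7]
history = [sched.step(loss) for loss in made_up_losses]
print(history)
```

In this run the learning rate stays at 0.01 while the loss is improving, and drops to roughly 0.009 only after the fourth epoch in a row without improvement.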

Now let's train chloe on your modified json dataset. She should quickly memorize the data. As the loss decreases, chloe learns from the data to output the corresponding sequence when fed inputs that are close enough to the training inputs. When the loss is less than 0.1, the responses should become coherent. You can re-instantiate chloe to start fresh. If the loss is not yet less than 0.1, just run the training cell below again to continue training from where you left off.

In [17]:
def trainer(model, data_iterator, options, optimizer, scheduler):

    if torch.cuda.is_available() and options.device == torch.device("cuda:0"):
        print("a GPU was detected, model will be trained on GPU")
        model = model.cuda()
    else:
        print("training on cpu")

    model.train()
    start = time.time()
    best_loss = 100
    for epoch in range(options.epochs):
        total_loss = 0
        for i, batch in enumerate(data_iterator): 
            src = batch.listen.transpose(0,1)
            trg = batch.reply.transpose(0,1)
            #print(trg)
            trg_input = trg[:, :-1]
            src_mask, trg_mask = create_masks(src, trg_input, options)
            preds = model(src, src_mask, trg_input, trg_mask)
            #print(preds.shape, trg.shape)
            ys = trg[:, 1:].contiguous().view(-1)
            optimizer.zero_grad()
            preds = preds.view(-1, preds.size(-1))
            #print(preds.shape, ys.shape)
            batch_loss = F.cross_entropy(preds, ys, 
                                         ignore_index = options.trg_pad)
            batch_loss.backward()
            optimizer.step()
            total_loss += batch_loss.item()

        epoch_loss = total_loss/(num_batches(data_iterator)+1)
        scheduler.step(epoch_loss)

        if epoch_loss < best_loss:
            best_loss = epoch_loss
            print('saving model at', options.save_path)
            torch.save(model.state_dict(), options.save_path)
            
        print("%dm: epoch %d loss = %.3f" %((time.time() - start)//60, epoch, epoch_loss))

    return model
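
One step in `trainer` that is easy to miss is the pair of slices `trg[:, :-1]` and `trg[:, 1:]`. The model is given the reply without its last token and is asked to predict the reply without its first token, so at every position it learns to predict the next word. Here is a minimal sketch of that shift using plain lists instead of tensors. The token ids, including 2 for a start token and 3 for an end token, are made up for this illustration.

```python
def shift_for_next_word_prediction(trg):
    """Split each target row into what the decoder sees and what it must predict."""
    decoder_input = [row[:-1] for row in trg]  # same idea as trg[:, :-1]
    targets = [row[1:] for row in trg]         # same idea as trg[:, 1:]
    return decoder_input, targets

# one reply in the batch: <start> hi vicki <end>, written as made-up ids
trg = [[2, 10, 11, 3]]
decoder_input, targets = shift_for_next_word_prediction(trg)

for seen, wanted in zip(decoder_input[0], targets[0]):
    print("decoder sees", seen, "and should predict", wanted)
```

Each input position lines up with the token that follows it in the original reply, which is exactly what the cross entropy loss in `trainer` compares against.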

In [18]:
chloe = trainer(chloe, data_iter, opt, optimizer, scheduler)

training on cpu
saving model at saved/weights/transformer_example_weights
0m: epoch 0 loss = 2.969
saving model at saved/weights/transformer_example_weights
0m: epoch 1 loss = 2.092
saving model at saved/weights/transformer_example_weights
0m: epoch 2 loss = 1.565
saving model at saved/weights/transformer_example_weights
0m: epoch 3 loss = 1.229
saving model at saved/weights/transformer_example_weights
0m: epoch 4 loss = 0.954
saving model at saved/weights/transformer_example_weights
0m: epoch 5 loss = 0.839
saving model at saved/weights/transformer_example_weights
0m: epoch 6 loss = 0.684
saving model at saved/weights/transformer_example_weights
0m: epoch 7 loss = 0.583
saving model at saved/weights/transformer_example_weights
0m: epoch 8 loss = 0.506
saving model at saved/weights/transformer_example_weights
0m: epoch 9 loss = 0.410
saving model at saved/weights/transformer_example_weights
0m: epoch 10 loss = 0.343
saving model at saved/weights/transformer_example_weights
0m: epoch 11

## Evaluation

Now talk to chloe by modifying the `tell_chloe` variable and running the cell below. The input sentence that chloe hears has to be tokenized (split into separate words) and converted from strings to a sequence of integers before it is fed to the model (chloe). She then responds with a sequence of integers, and that sequence is converted back into strings for you to read. All this is taken care of by the `talk_to_chloe()` function.

In [19]:
tell_chloe = "hi i am vicki" 
chloes_reply = talk_to_chloe(tell_chloe, chloe, opt, infield, outfield)
print('Chloe > '+ chloes_reply + '\n')

Chloe > hi vicki
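
The round trip from strings to integers and back can be sketched without any neural network at all. In the toy example below, the vocabularies are made up and `stand_in_model` is a hard-coded placeholder for the Transformer, so this only illustrates the plumbing around the model. It is not how `talk_to_chloe()` is actually written.

```python
# made-up vocabularies for the words chloe hears and the words she says
in_stoi = {"<unk>": 0, "hi": 1, "i": 2, "am": 3, "vicki": 4}
out_itos = ["<unk>", "hi", "vicki"]

def encode(sentence, stoi):
    """Tokenize on whitespace and map each word to its index (unknown words map to <unk>)."""
    return [stoi.get(word, stoi["<unk>"]) for word in sentence.lower().split()]

def decode(ids, itos):
    """Map each index back to its word and join the words into a sentence."""
    return " ".join(itos[i] for i in ids)

def stand_in_model(ids):
    """Placeholder for the Transformer: integers in, integers out."""
    return [1, 2] if ids == [1, 2, 3, 4] else [0]

def talk(sentence):
    return decode(stand_in_model(encode(sentence, in_stoi)), out_itos)

print("Chloe > " + talk("hi i am vicki"))
```

A word that is missing from `in_stoi` is mapped to the `<unk>` index, which is one reason a model like this struggles with new or misspelled words.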



## Discussion

Notice that chloe is a combination of hard-coded rules and a neural network. The neural network portion allows chloe to encode, or represent, your messages to her in a way that she can make use of, even if a message was not exactly in the training data. If it is close enough, she knows what to do next. Chloe is cute, at least I think so, but there is a lot we can do to make chloe smarter and more useful.

For example, is chloe just responding to each of your messages with a simple mapping between input and output? Or does chloe take into account the entire conversation so far, or even previous conversations? Is chloe trying to accomplish anything? What is the point of her conversation? Is there a reward signal we can build into the learning so that chloebot learns from experience to achieve a goal, aka objective? Can chloe learn new words or understand misspelled words? Not yet. Can chloebot use outside knowledge to inform her conversations? Not yet.

In [21]:
while True:
    tell_chloe = input("You > ")
    chloes_reply = talk_to_chloe(tell_chloe, chloe, opt, infield, outfield)
    print('Chloe > '+ chloes_reply + '\n')
    # stop when you say "bye chloe" or when chloe says goodbye
    if "bye chloe" in tell_chloe or "bye ttyl" in chloes_reply:
        break

You > hi
Chloe > hi ! , can i tell you a joke ?

You > bye
Chloe > bye ttyl



## What's Next

The next lesson is an intuitive explanation of loss functions, with a toy coding example that is expanded to PyTorch tensors, tokenization and an explanation of the training function `trainer()` used in this introductory lesson. Go to `notebooks/Trainer.ipynb` for the next part of this adventure. See you there!

<img src="https://avatars3.githubusercontent.com/u/56938552?s=100&v=1">