<a href="https://colab.research.google.com/github/ReynaQuita/NLP/blob/main/Chatbot_OpenAIGPT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Make your own Rick Sanchez (bot) with Transformers and DialoGPT fine-tuning

by [Rostyslav Neskorozhenyi](https://www.linkedin.com/in/slanj)

I am a big fan of 'Rick and Morty' and of NLP (Natural Language Processing). After watching the latest episode of the mad scientist's and his grandson's adventures, I realized that this is no coincidence and that I have to apply my knowledge of NLP and Machine Learning to create my own Rick (in the form of a chatbot, at least for now).

![alt text](https://media.giphy.com/media/ZdBnIqBeVWBYTobH8U/giphy.gif)

Image from [Giphy](https://giphy.com/)

The path to creating a virtual Rick was not easy. I tried many models and approaches, but for a long time virtual Rick did not want to find a common language with me. Rick's first response to my greeting was "I am not your buddy" (to tell you the truth, in the beginning he knew only this phrase). But I did not give up, and with the help of the excellent Transformers library and Microsoft's Large-scale Pretrained Response Generation Model (DialoGPT) I managed to talk heart-to-heart with Rick. Here is one of our conversations:

**User**: Hey, Rick, make me a time machine <br>
**RickBot**: You're a monster! <br>
**User**: No, I just want to travel in time <br>
**RickBot**: To the past. <br>
**User**: yes <br>
**RickBot**: No time machine, please. <br>
**User**: But I need it <br>
**RickBot**: You don't need it, Morty. Just do it. <br>
**User**: I want to see ancient Greece <br>
**RickBot**: Oh, geez. <br>

![alt text](https://media.giphy.com/media/WVGSfyhUYBoiGLbZ7N/giphy.gif)

Image from [Giphy](https://giphy.com/)

## A bit of theory

In this article I will tell you how to create a virtual character whose statements will be based on a transcript of my favorite animated science fiction sitcom. You can use characters and sitcoms of your choice. I added code to convert a regular text file with dialogs into a format that the model understands.

As I already mentioned, the [Transformers](https://huggingface.co/transformers/) library, which contains the latest NLP models (such as [BERT](https://huggingface.co/transformers/model_doc/bert.html), [XLNet](https://huggingface.co/transformers/model_doc/xlnet.html), [GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)), will help us with our task. You can read more about transformers in a beautifully illustrated [article](http://jalammar.github.io/illustrated-transformer/) by Jay Alammar.

![alt text](http://jalammar.github.io/images/t/transformer_resideual_layer_norm_3.png)

Image from [http://jalammar.github.io](http://jalammar.github.io/illustrated-transformer/)

Not so long ago, Microsoft’s [DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html) was added to the Transformers model collection. DialoGPT is a GPT-2 model trained on 147M multi-turn dialogues from Reddit discussion threads (you can learn more about GPT-2 [here](http://jalammar.github.io/illustrated-gpt2/)). This model is well suited for creating a virtual character for a fascinating conversation, and even the small variant can maintain a coherent dialogue, as we will see now.



## First dialogue with DialoGPT

We will conduct all our experiments in Google Colab; its resources are enough to train the small DialoGPT model. First, we will connect to Google Drive and install the necessary modules.

In [2]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [3]:
!pip -q install transformers

In [4]:
!pip install datasets


Let's move to the desired folder in which we will store all our data.

In [5]:
import os
os.chdir("/content/drive/My Drive/Colab Notebooks")

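Let's try to chat with the pretrained model before any fine-tuning. Note that the cell below loads OpenAI GPT (`openai-gpt`); the DialoGPT lines are left commented out.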

In [52]:
from transformers import AutoModelWithLMHead, AutoTokenizer, OpenAIGPTTokenizer, OpenAIGPTLMHeadModel, OpenAIGPTConfig #BertTokenizer, BertLMHeadModel, BertConfig
import torch

# tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
# model = AutoModelWithLMHead.from_pretrained("microsoft/DialoGPT-small")


# tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# config = BertConfig.from_pretrained("bert-base-uncased")
# config.is_decoder = True
# model = BertLMHeadModel.from_pretrained('bert-base-uncased', config=config)

tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
config = OpenAIGPTConfig.from_pretrained('openai-gpt')
config.is_decoder = True
model = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt', config=config)

Some weights of OpenAIGPTLMHeadModel were not initialized from the model checkpoint at openai-gpt and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [53]:
x = tokenizer.encode("How are you?", return_tensors="pt")

In [54]:
x

tensor([[718, 640, 512, 257]])

In [22]:
# tokenizer.add_special_tokens({'eos_token': '[EOS]'})

1

In [36]:
# tokenizer.eos_token_id

In [56]:
tokenizer.convert_ids_to_tokens([718, 640, 512, 257])

['how</w>', 'are</w>', 'you</w>', '?</w>']
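
The `</w>` suffix in the tokens above marks the end of a word in this tokenizer's byte-pair vocabulary. As a rough illustration of how such tokens map back to text, here is a minimal sketch; the helper name `join_bpe_tokens` is hypothetical, and the real `tokenizer.decode` performs additional clean-up that is not reproduced here.

```python
def join_bpe_tokens(tokens, end_of_word="</w>"):
    """Glue word pieces together, turning each end-of-word marker into a space."""
    text = "".join(tok.replace(end_of_word, " ") for tok in tokens)
    return text.strip()


tokens = ['how</w>', 'are</w>', 'you</w>', '?</w>']
print(join_bpe_tokens(tokens))
```

Pieces without the marker are continued by the next piece, so a word split into several tokens is reassembled without spaces in between.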

In [11]:
#  model.generate(
#     x, max_length=1000,
#     pad_token_id=tokenizer.eos_token_id
#     )

In [12]:
# beam_output = model.generate(
#     x, 
#     max_length=50, 
#     num_beams=5, 
#     early_stopping=True
# )

In [13]:
# print("Output:\n" + 100 * '-')
# print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

In [None]:
# new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

In [None]:
# new_user_input_ids

In [58]:
# Let's chat for 5 lines
for step in range(5):
    # encode the new user input and return a PyTorch tensor
    # (the openai-gpt tokenizer defines no eos_token, so none is appended)
    new_user_input_ids = tokenizer.encode(input(">> User:"), return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response; max_length caps the chat history plus the reply at 500 tokens
    chat_history_ids = model.generate(
        bot_input_ids, max_length=500,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=3,
        do_sample=True,
        top_k=100,
        top_p=0.7,
        temperature=0.8)

    # pretty print the last output tokens from the bot
    print("OpenAI-GPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

>> User:Hi!
OpenAI-GPT: " 
 i look over and see that it's the same girl who was here the other day. i've only seen her once or twice. 
 " i'm sorry. i didn't mean to startle you. " 
 " it's okay. " she gives me a smile. " i've been looking for you. i was just... " she looks around the store. " looking for a book. " her eyes go to the counter. " are you here for a free book? " 
 it's a good question. 
 i walk over and look at the books. they're all old and tattered, like they were in a bookstore. i can tell they've been in here for years. i pull out a few of the old books and look them over. " yeah, i'm here for the book. i'm not really sure what i'm looking for. " they're old and worn. i think they're probably from the same store. 
 she takes a book out of the shelf and hands it to me. " here. this is the one i was looking for, " she says. " it has the same cover. " the book is about a book about a young girl who had a crush on a guy who was in a band. i open it up and look through the

Input length of input_ids is 504, but ``max_length`` is set to 500.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


OpenAI-GPT: 
>> User:Hello!


Input length of input_ids is 506, but ``max_length`` is set to 500.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


OpenAI-GPT: 
>> User:Hi


Input length of input_ids is 507, but ``max_length`` is set to 500.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


OpenAI-GPT: 
>> User:hi


Input length of input_ids is 508, but ``max_length`` is set to 500.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


OpenAI-GPT: 
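
The empty replies above follow from how `max_length` is counted: it caps the prompt and the reply together, so once the accumulated chat history exceeds 500 tokens nothing is left for the answer. A minimal sketch of that arithmetic follows. The helper names and the 50-token reply budget are illustrative assumptions; 512 is the context size of `openai-gpt`.

```python
def remaining_budget(prompt_len: int, max_length: int) -> int:
    """Tokens that can still be generated when max_length caps prompt + reply."""
    return max(0, max_length - prompt_len)


def per_turn_max_length(prompt_len: int, reply_tokens: int, context_limit: int = 512) -> int:
    """A max_length that leaves room for reply_tokens, capped at the context size."""
    return min(prompt_len + reply_tokens, context_limit)


# First turn: a 2-token prompt leaves almost the whole budget for the reply.
print(remaining_budget(prompt_len=2, max_length=500))
# Later turn: the history alone is already 504 tokens, so the reply is empty.
print(remaining_budget(prompt_len=504, max_length=500))
# Budgeting per turn keeps replies short and the history growing slowly.
print(per_turn_max_length(prompt_len=2, reply_tokens=50))
```

Recomputing the limit on every turn, instead of using one fixed value, is one way to keep a multi-turn chat from exhausting its budget on the first reply.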


**User:** Hi Rick <br>
**DialoGPT:** Hi Rick <br>
**User:** How are you? <br>
**DialoGPT:** I'm good, how are you? <br>
**User:** I am fine. Where is Morty? <br>
**DialoGPT:** He's in the basement. <br>
**User:** Who is Morty? <br>
**DialoGPT:** He's a Morty. <br>
**User:** Who are you? <br>
**DialoGPT:** I am a Morty. <br>

![alt text](https://media.giphy.com/media/L3WevKXIKFDaZBvV8Q/giphy.gif)

Image from [Giphy](https://giphy.com/)

Not bad but not too impressive. We will fix it with fine-tuning.

## Model initial configuration

Let's train our own Rick chatbot. To start, we will need a basic configuration and a dataset.
The configuration and training scripts are mostly based on this [script](https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_language_modeling.py) from Hugging Face and a great [tutorial](https://nathancooper.io/i-am-a-nerd/chatbot/deep-learning/gpt2/2020/05/12/chatbot-part-1.html) from Nathan Cooper.

In [59]:
"""
Fine-tuning the library models for language modeling on a text file (GPT, GPT-2, BERT, RoBERTa).
GPT and GPT-2 are fine-tuned using a causal language modeling (CLM) loss while BERT and RoBERTa are fine-tuned
using a masked language modeling (MLM) loss.
"""

import glob
import logging
import os
import pickle
import random
import re
import shutil
from typing import Dict, List, Tuple

import pandas as pd
import numpy as np
import torch

from sklearn.model_selection import train_test_split

from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset, RandomSampler, SequentialSampler
from torch.utils.data.distributed import DistributedSampler
from tqdm.notebook import tqdm, trange

from pathlib import Path

from transformers import (
    MODEL_WITH_LM_HEAD_MAPPING,
    WEIGHTS_NAME,
    AdamW,
    AutoConfig,
    AutoModelWithLMHead,
    AutoTokenizer,
    PreTrainedModel,
    PreTrainedTokenizer,
    get_linear_schedule_with_warmup,
)

from datasets import load_dataset, load_metric
try:
    from torch.utils.tensorboard import SummaryWriter
except ImportError:
    from tensorboardX import SummaryWriter

# Configs
logger = logging.getLogger(__name__)

MODEL_CONFIG_CLASSES = list(MODEL_WITH_LM_HEAD_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)

In [78]:
# Args to allow for easy conversion of the Python script to a notebook
class Args():
    def __init__(self):
        self.output_dir = 'output-small-openaigpt'
        self.model_type = 'gpt2'
        self.model_name_or_path = 'openai-gpt'
        self.config_name = 'openai-gpt'
        self.tokenizer_name = 'openai-gpt'
        self.cache_dir = 'cached'
        self.block_size = 512
        self.do_train = True
        self.do_eval = True
        self.evaluate_during_training = False
        self.per_gpu_train_batch_size = 2
        self.per_gpu_eval_batch_size = 2
        self.gradient_accumulation_steps = 1
        self.learning_rate = 5e-5
        self.weight_decay = 0.0
        self.adam_epsilon = 1e-8
        self.max_grad_norm = 1.0
        self.num_train_epochs = 3
        self.max_steps = -1
        self.warmup_steps = 0
        self.logging_steps = 1000
        self.save_steps = 3500
        self.save_total_limit = None
        self.eval_all_checkpoints = False
        self.no_cuda = False
        self.overwrite_output_dir = True
        self.overwrite_cache = True
        self.should_continue = False
        self.seed = 42
        self.local_rank = -1
        self.fp16 = False
        self.fp16_opt_level = 'O1'

args = Args()

## Prepare Dataset

Our dialogue dataset will be based on the dataset used in Andrada Olteanu's [article](https://www.kaggle.com/andradaolteanu/sentiment-analysis-rick-and-morty-scripts/) about Rick and Morty sentiment analysis. Big thanks to her for her work, and also to Gabriel Hernandes, author of the original [text dataset](https://github.com/ghhernandes/rickmorty-gan/tree/master/data)!

![alt text](https://media.giphy.com/media/U6LOakQja88ImTnE6T/giphy.gif)

Image from [Giphy](https://giphy.com/)

First of all, we will use the kaggle module to download the dataset. You can read more about the module and how to get a Kaggle API token at this [link](https://github.com/Kaggle/kaggle-api). Alternatively, you can just download the RickAndMortyScripts.csv file from the [article](https://www.kaggle.com/andradaolteanu/sentiment-analysis-rick-and-morty-scripts/) and place it in your working directory.

In [None]:
# !pip install kaggle

In [None]:
# !mkdir ~/.kaggle1
# !cp kaggle.json ~/.kaggle1/kaggle.json

In [None]:
# !kaggle datasets download andradaolteanu/rickmorty-scripts -f RickAndMortyScripts.csv 

In [None]:
# !mv datasets%2F506221%2F935855%2FRickAndMortyScripts.csv RickAndMortyScripts.csv

In [79]:
daily_dialogue = load_dataset("daily_dialog")



In [80]:
daily_dialogue["train"]["dialog"][0]

['Say , Jim , how about going for a few beers after dinner ? ',
 ' You know that is tempting but is really not good for our fitness . ',
 ' What do you mean ? It will help us to relax . ',
 " Do you really think so ? I don't . It will just make us fat and act silly . Remember last time ? ",
 " I guess you are right.But what shall we do ? I don't feel like sitting at home . ",
 ' I suggest a walk over to the gym where we can play singsong and meet some of our friends . ',
 " That's a good idea . I hear Mary and Sally often go there to play pingpong.Perhaps we can make a foursome with them . ",
 ' Sounds great to me ! If they are willing , we could ask them to go dancing with us.That is excellent exercise and fun , too . ',
 " Good.Let ' s go now . ",
 ' All right . ']

In [81]:
len_dialog = [ len(dialog) for dialog in daily_dialogue["train"]["dialog"]]

In [82]:
min(len_dialog)

2

In [83]:
max(len_dialog)

35

In [84]:
new_dialog = [ dialog for dialog in daily_dialogue["train"]["dialog"] if len(dialog) > 6]

In [85]:
n = 7
new_dialog = [d[:n] for d in new_dialog]

In [86]:
new_dialog[0]

['Say , Jim , how about going for a few beers after dinner ? ',
 ' You know that is tempting but is really not good for our fitness . ',
 ' What do you mean ? It will help us to relax . ',
 " Do you really think so ? I don't . It will just make us fat and act silly . Remember last time ? ",
 " I guess you are right.But what shall we do ? I don't feel like sitting at home . ",
 ' I suggest a walk over to the gym where we can play singsong and meet some of our friends . ',
 " That's a good idea . I hear Mary and Sally often go there to play pingpong.Perhaps we can make a foursome with them . "]

In [87]:
len(new_dialog)

6144

In [None]:
# find min
# then contexted

In [None]:
# # Let's look at original dataset
# all_rick = pd.read_csv('D:\Google Drive\datasets/RickAndMortyScripts.csv')
# all_rick.head(20)

We will convert this dataset so that every response row contains **n** previous responses as context. For our purposes seven previous responses will be enough.

In [None]:
# contexted = []

# n = 7

# for i in range(n, len(all_rick['line'])):
#   row = []
#   prev = i - 1 - n # we additionally subtract 1, so the row will contain the current response and 7 previous responses
#   for j in range(i, prev, -1):
#     row.append(all_rick['line'][j])
#   contexted.append(row)  

In [None]:
# contexted

In [None]:
# len(contexted)
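
The windowing described above can be sketched on a plain list of lines, without pandas. This is only an illustration of the idea in the commented-out cell; the function name `build_context_rows` is hypothetical.

```python
def build_context_rows(lines, n):
    """For each line from index n on, return [current, previous, ..., n-th previous]."""
    rows = []
    for i in range(n, len(lines)):
        # Walk backwards from the current line through its n predecessors.
        rows.append([lines[j] for j in range(i, i - n - 1, -1)])
    return rows


script = ["a", "b", "c", "d", "e"]
for row in build_context_rows(script, n=2):
    print(row)
```

Each row starts with the response and continues with progressively older context, which matches the `response`, `context`, `context/0`, ... column order used for the DataFrame.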

In [88]:
columns = ['response', 'context'] 
columns = columns + ['context/'+str(i) for i in range(n-2)]
columns

['response',
 'context',
 'context/0',
 'context/1',
 'context/2',
 'context/3',
 'context/4']

In [89]:
df = pd.DataFrame.from_records(new_dialog, columns=columns)
df.head(5)

| | response | context | context/0 | context/1 | context/2 | context/3 | context/4 |
|---|---|---|---|---|---|---|---|
| 0 | Say , Jim , how about going for a few beers af... | You know that is tempting but is really not g... | What do you mean ? It will help us to relax . | Do you really think so ? I don't . It will ju... | I guess you are right.But what shall we do ? ... | I suggest a walk over to the gym where we can... | That's a good idea . I hear Mary and Sally of... |
| 1 | Hey John , nice skates . Are they new ? | Yeah , I just got them . I started playing ic... | What position do you play ? | I ’ m a defender . It ’ s a lot of fun . You ... | Yeah , you ’ re a pretty big guy . I play goa... | Oh , yeah ? Which team ? | The Rockets . |
| 2 | Hey Lydia , what are you reading ? | I ’ m looking at my horoscope for this month ... | What are you talking about ? Let me see that ... | It ’ s a prediction of your month , based on ... | January 5th . | Let ’ s see . . . you ’ re a Capricorn . It s... | That ’ s bogus . I don't feel any stress at w... |
| 3 | Frank ’ s getting married , do you believe thi... | Is he really ? | Yes , he is . He loves the girl very much . | Who is he marring ? | A girl he met on holiday in Spain , I think . | Have they set a date for the wedding ? | Not yet . |
| 4 | I hear you bought a new house in the northern ... | That ’ s right , we bought it the same day we... | What kind of house is it ? | It ’ s a wonderful Spanish style . | Oh , I love the roof tiles on Spanish style h... | And it ’ s a bargaining . A house like this i... | Great , is it a two bedroom house ? |
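
One detail is worth checking at this point. The `construct_conv` function defined later reverses each row before concatenating it, so it expects `response` to hold the most recent utterance and the context columns to run backwards in time. The `daily_dialog` turns are stored oldest-first, and the table above shows the opening line of each dialogue under `response`. As a hedged sketch (the helper name `to_response_first` is hypothetical), each truncated dialogue could be reversed before it is turned into a DataFrame row:

```python
def to_response_first(dialog, n):
    """Keep the first n turns of a chronological dialogue and return them newest-first."""
    return list(reversed(dialog[:n]))


dialog = ["turn 1", "turn 2", "turn 3", "turn 4"]
print(to_response_first(dialog, n=3))
```

With rows ordered this way, the later reversal restores chronological order, so the model would see the turns in the order in which they were spoken.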


Split our dataset into training and validation parts.

In [90]:
trn_df, val_df = train_test_split(df, test_size = 0.1)
trn_df.head()

| | response | context | context/0 | context/1 | context/2 | context/3 | context/4 |
|---|---|---|---|---|---|---|---|
| 1694 | I'll always remember my college days . | Oh yeah ? | It was one of the best times in my life . It ... | How did you feel when you graduated ? | It was a round day for me . My family attend ... | What did you do after graduation ? | I was planning to attend gradate school , but... |
| 5102 | Please come in , Steven . | All right , Mr . Green . | Have a seat over there . How are things going... | Pretty well . Everyone is working hard . | But , our business has been going down sharpl... | You mean I'm among the people who have to go ? | I'm afraid so . |
| 1322 | I haven't danced for a long time . | Neither have I . | We must go to a dance soon , or we'll forget ... | Yes , we must . What have you been doing sinc... | I've been studying hard for my examinations .... | I've been learning Japanese every evening . | Why have you been learning Japanese ? Why not... |
| 2835 | First of all , thank you for accepting this jo... | It ’ s my pleasure . | What are your salary expectations ? | Would you please tell me about your pay skill... | We ’ ll offer you a monthly salary to begin w... | They sum my skills and experience . I ’ d lik... | That sounds reasonable . |
| 907 | Bob , you look pale . What happened ? | I didn't sleep a wink last night . | Did you have something on your mind ? You loo... | Well , I'm under a lot of pressure . My boss ... | Is there anything I can do for you ? | Well , I guess no one can help me but myself ... | I know your feeling . Take it easy . |
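
The split above is not seeded, so a different 10% of the rows ends up in the validation set on every run; passing `random_state` to `train_test_split` would make it repeatable. The idea of a seeded split can be sketched in pure Python as follows. The function name `split_rows` is hypothetical, and the sketch is not meant to reproduce scikit-learn's exact shuffling.

```python
import random


def split_rows(rows, test_fraction=0.1, seed=42):
    """Shuffle row indices with a fixed seed and hold out a fraction for validation."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


train_rows, val_rows = split_rows(list(range(100)))
print(len(train_rows), len(val_rows))
```

Because the seed is fixed, calling the function again on the same rows returns the same partition, which makes validation numbers comparable between runs.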


In [104]:
# next(df.iterrows())[1]

response     Say , Jim , how about going for a few beers af...
context       You know that is tempting but is really not g...
context/0       What do you mean ? It will help us to relax . 
context/1     Do you really think so ? I don't . It will ju...
context/2     I guess you are right.But what shall we do ? ...
context/3     I suggest a walk over to the gym where we can...
context/4     That's a good idea . I hear Mary and Sally of...
Name: 0, dtype: object

In [103]:
# flatten = lambda l: [item for sublist in l for item in sublist]
# row = next(df.iterrows())[1]
# conv = list(reversed([tokenizer.encode(x) for x in row]))
# conv = flatten(conv)
# print(conv)
# print(len(conv))

[525, 256, 252, 246, 870, 1499, 239, 249, 1344, 2846, 488, 6706, 2528, 799, 655, 485, 2200, 24, 667, 25447, 239, 1855, 606, 759, 925, 246, 40208, 556, 688, 239, 249, 6208, 246, 1671, 715, 485, 481, 6860, 806, 606, 759, 2200, 32050, 488, 1973, 803, 498, 622, 1662, 239, 249, 1839, 512, 640, 770, 239, 568, 599, 2821, 606, 587, 257, 249, 2310, 256, 241, 1064, 649, 1779, 491, 1163, 239, 587, 512, 976, 825, 620, 257, 249, 2310, 256, 241, 239, 507, 812, 668, 925, 768, 4739, 488, 1486, 4997, 239, 1559, 1009, 720, 257, 599, 587, 512, 1315, 257, 507, 812, 1150, 768, 485, 4496, 239, 512, 699, 525, 544, 11111, 568, 544, 976, 595, 870, 562, 622, 25730, 239, 937, 240, 4265, 240, 718, 670, 797, 562, 246, 1026, 12767, 861, 2340, 257]
137


In [None]:
# tokenizer.convert_ids_to_tokens([27107])

In [None]:
# construct_conv(next(df.iterrows())[1],tokenizer, eos=True)

Now we will convert our dataset into a format suitable for our model. Basically, we will concatenate the responses of each row into one sequence. Ideally a special end-of-sequence token would separate the responses, so that the model can tell where each response ends; the `construct_conv` function below accepts an `eos` flag for this but currently does not use it, so the turns are simply concatenated.

In [106]:
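def construct_conv(row, tokenizer, eos = True):
    # `row` is expected to hold the response first, followed by progressively older
    # context, so the encoded turns are reversed to restore chronological order.
    # Note: `eos` is currently unused; no end-of-sequence token is inserted between turns.
    flatten = lambda l: [item for sublist in l for item in sublist]
    conv = list(reversed([tokenizer.encode(x) for x in row]))
    conv = flatten(conv)
    return conv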

class ConversationDataset(Dataset):
    def __init__(self, tokenizer: PreTrainedTokenizer, args, df, block_size=512):

        block_size = block_size - (tokenizer.model_max_length - tokenizer.max_len_single_sentence)

        directory = args.cache_dir
        cached_features_file = os.path.join(
            directory, args.model_type + "_cached_lm_" + str(block_size)
        )

        if os.path.exists(cached_features_file) and not args.overwrite_cache:
            logger.info("Loading features from cached file %s", cached_features_file)
            with open(cached_features_file, "rb") as handle:
                self.examples = pickle.load(handle)
        else:
            logger.info("Creating features from dataset file at %s", directory)

            self.examples = []
            for _, row in df.iterrows():
                conv = construct_conv(row, tokenizer)
                # keep only conversations that fit into one block
                if len(conv) <= block_size:
                    self.examples.append(conv)

            logger.info("Saving features into cached file %s", cached_features_file)
            with open(cached_features_file, "wb") as handle:
                pickle.dump(self.examples, handle, protocol=pickle.HIGHEST_PROTOCOL)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, item):
        return torch.tensor(self.examples[item], dtype=torch.long)
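
To make the concatenation step concrete, here is a small sketch that works on already-encoded turns (plain lists of token ids). The names `join_turns` and `build_example` and the `sep_id` parameter are hypothetical. With `sep_id=None` it simply concatenates turns as `construct_conv` does; a separator id is shown only as an option and assumes such a token has first been added to the tokenizer and model.

```python
def join_turns(encoded_turns, sep_id=None):
    """Concatenate chronologically ordered turns, optionally appending sep_id after each."""
    flat = []
    for turn in encoded_turns:
        flat.extend(turn)
        if sep_id is not None:
            flat.append(sep_id)
    return flat


def build_example(row_ids, sep_id=None, block_size=512):
    """row_ids is ordered response-first; return the flat sequence or None if too long."""
    conv = join_turns(list(reversed(row_ids)), sep_id)
    return conv if len(conv) <= block_size else None


row = [[7, 8], [3, 4, 5], [1, 2]]  # response, context, context/0
print(build_example(row))
print(build_example(row, sep_id=0))
```

Returning `None` for over-long conversations mirrors the length filter in `ConversationDataset`, which drops such rows instead of truncating them.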

In [107]:
# Caching and storing of data/checkpoints

def load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False):
    return ConversationDataset(tokenizer, args, df_val if evaluate else df_trn)


def set_seed(args):
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    if args.n_gpu > 0:
        torch.cuda.manual_seed_all(args.seed)


def _sorted_checkpoints(args, checkpoint_prefix="checkpoint", use_mtime=False) -> List[str]:
    ordering_and_checkpoint_path = []

    glob_checkpoints = glob.glob(os.path.join(args.output_dir, "{}-*".format(checkpoint_prefix)))

    for path in glob_checkpoints:
        if use_mtime:
            ordering_and_checkpoint_path.append((os.path.getmtime(path), path))
        else:
            regex_match = re.match(".*{}-([0-9]+)".format(checkpoint_prefix), path)
            if regex_match and regex_match.groups():
                ordering_and_checkpoint_path.append((int(regex_match.groups()[0]), path))

    checkpoints_sorted = sorted(ordering_and_checkpoint_path)
    checkpoints_sorted = [checkpoint[1] for checkpoint in checkpoints_sorted]
    return checkpoints_sorted


def _rotate_checkpoints(args, checkpoint_prefix="checkpoint", use_mtime=False) -> None:
    if not args.save_total_limit:
        return
    if args.save_total_limit <= 0:
        return

    # Check if we should delete older checkpoint(s)
    checkpoints_sorted = _sorted_checkpoints(args, checkpoint_prefix, use_mtime)
    if len(checkpoints_sorted) <= args.save_total_limit:
        return

    number_of_checkpoints_to_delete = max(0, len(checkpoints_sorted) - args.save_total_limit)
    checkpoints_to_be_deleted = checkpoints_sorted[:number_of_checkpoints_to_delete]
    for checkpoint in checkpoints_to_be_deleted:
        logger.info("Deleting older checkpoint [{}] due to args.save_total_limit".format(checkpoint))
        shutil.rmtree(checkpoint)
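
The checkpoint helpers above sort by the step number embedded in the directory name rather than by the name itself. The following sketch shows why that matters, using plain strings instead of real directories. The function names and the example paths are illustrative, and nothing is deleted here.

```python
import re


def sort_checkpoints(paths, prefix="checkpoint"):
    """Order checkpoint paths by the integer step that follows '<prefix>-'."""
    pattern = re.compile(r".*{}-([0-9]+)".format(re.escape(prefix)))
    numbered = []
    for path in paths:
        match = pattern.match(path)
        if match:
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


def checkpoints_to_delete(paths, save_total_limit):
    """Return the oldest checkpoints that exceed save_total_limit (none if no limit)."""
    if not save_total_limit or save_total_limit <= 0:
        return []
    ordered = sort_checkpoints(paths)
    return ordered[:max(0, len(ordered) - save_total_limit)]


paths = ["out/checkpoint-10500", "out/checkpoint-3500", "out/checkpoint-7000"]
print(sorted(paths))            # plain string order puts step 10500 first
print(sort_checkpoints(paths))  # numeric order puts step 3500 first
print(checkpoints_to_delete(paths, save_total_limit=2))
```

With `save_total_limit = None`, as in the `Args` class of this notebook, no checkpoints are rotated out.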

## Training and Evaluating

Training our model takes quite a lot of code, but don’t worry: everything should work as is. The main thing is to give the model the dataset in the right format.

![alt text](https://media.giphy.com/media/KetvQljQJdEMscR83K/giphy.gif)

Image from [Giphy](https://giphy.com/)
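
Before reading the training function, it may help to see the arithmetic behind its learning-rate schedule. The `train` function below computes the total number of optimizer steps and passes it, together with `warmup_steps`, to `get_linear_schedule_with_warmup`. The sketch below reproduces that calculation in plain Python under the assumption of a linear ramp-up followed by a linear decay to zero. The function names and the batch count of 1000 are illustrative, and this is not the library's own code.

```python
def total_optimization_steps(num_batches, grad_accum_steps, num_epochs):
    """Number of optimizer updates over the whole run."""
    return num_batches // grad_accum_steps * num_epochs


def linear_schedule_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


t_total = total_optimization_steps(num_batches=1000, grad_accum_steps=1, num_epochs=3)
base_lr = 5e-5
for step in (0, t_total // 2, t_total):
    print(step, base_lr * linear_schedule_factor(step, warmup_steps=0, total_steps=t_total))
```

With `warmup_steps = 0`, as configured in `Args`, the learning rate starts at its full value and decays linearly to zero by the last step.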

In [108]:
def train(args, train_dataset, model: PreTrainedModel, tokenizer: PreTrainedTokenizer) -> Tuple[int, float]:
    """ Train the model """
    if args.local_rank in [-1, 0]:
        tb_writer = SummaryWriter()

    args.train_batch_size = args.per_gpu_train_batch_size * max(1, args.n_gpu)

    def collate(examples: List[torch.Tensor]):
        if tokenizer._pad_token is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)

    train_sampler = RandomSampler(train_dataset) if args.local_rank == -1 else DistributedSampler(train_dataset)
    train_dataloader = DataLoader(
        train_dataset, sampler=train_sampler, batch_size=args.train_batch_size, collate_fn=collate, drop_last = True
    )

    if args.max_steps > 0:
        t_total = args.max_steps
        args.num_train_epochs = args.max_steps // (len(train_dataloader) // args.gradient_accumulation_steps) + 1
    else:
        t_total = len(train_dataloader) // args.gradient_accumulation_steps * args.num_train_epochs

    model = model.module if hasattr(model, "module") else model  # Take care of distributed/parallel training
    model.resize_token_embeddings(len(tokenizer))
    # add_special_tokens_(model, tokenizer)


    # Prepare optimizer and schedule (linear warmup and decay)
    no_decay = ["bias", "LayerNorm.weight"]
    optimizer_grouped_parameters = [
        {
            "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
            "weight_decay": args.weight_decay,
        },
        {"params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], "weight_decay": 0.0},
    ]
    optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total
    )

    # Check if saved optimizer or scheduler states exist
    if (
        args.model_name_or_path
        and os.path.isfile(os.path.join(args.model_name_or_path, "optimizer.pt"))
        and os.path.isfile(os.path.join(args.model_name_or_path, "scheduler.pt"))
    ):
        # Load in optimizer and scheduler states
        optimizer.load_state_dict(torch.load(os.path.join(args.model_name_or_path, "optimizer.pt")))
        scheduler.load_state_dict(torch.load(os.path.join(args.model_name_or_path, "scheduler.pt")))

    if args.fp16:
        try:
            from apex import amp
        except ImportError:
            raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use fp16 training.")
        model, optimizer = amp.initialize(model, optimizer, opt_level=args.fp16_opt_level)

    # multi-gpu training (should be after apex fp16 initialization)
    if args.n_gpu > 1:
        model = torch.nn.DataParallel(model)

    # Distributed training (should be after apex fp16 initialization)
    if args.local_rank != -1:
        model = torch.nn.parallel.DistributedDataParallel(
            model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True
        )

    # Train!
    logger.info("***** Running training *****")
    logger.info("  Num examples = %d", len(train_dataset))
    logger.info("  Num Epochs = %d", args.num_train_epochs)
    logger.info("  Instantaneous batch size per GPU = %d", args.per_gpu_train_batch_size)
    logger.info(
        "  Total train batch size (w. parallel, distributed & accumulation) = %d",
        args.train_batch_size
        * args.gradient_accumulation_steps
        * (torch.distributed.get_world_size() if args.local_rank != -1 else 1),
    )
    logger.info("  Gradient Accumulation steps = %d", args.gradient_accumulation_steps)
    logger.info("  Total optimization steps = %d", t_total)

    global_step = 0
    epochs_trained = 0
    steps_trained_in_current_epoch = 0
    # Check if continuing training from a checkpoint
    if args.model_name_or_path and os.path.exists(args.model_name_or_path):
        try:
            # set global_step to the global_step of the last saved checkpoint from the model path
            checkpoint_suffix = args.model_name_or_path.split("-")[-1].split("/")[0]
            global_step = int(checkpoint_suffix)
            epochs_trained = global_step // (len(train_dataloader) // args.gradient_accumulation_steps)
            steps_trained_in_current_epoch = global_step % (len(train_dataloader) // args.gradient_accumulation_steps)

            logger.info("  Continuing training from checkpoint, will skip to saved global_step")
            logger.info("  Continuing training from epoch %d", epochs_trained)
            logger.info("  Continuing training from global step %d", global_step)
            logger.info("  Will skip the first %d steps in the first epoch", steps_trained_in_current_epoch)
        except ValueError:
            logger.info("  Starting fine-tuning.")

    tr_loss, logging_loss = 0.0, 0.0

    model.zero_grad()
    train_iterator = trange(
        epochs_trained, int(args.num_train_epochs), desc="Epoch", disable=args.local_rank not in [-1, 0]
    )
    set_seed(args)  # Added here for reproducibility
    for _ in train_iterator:
        epoch_iterator = tqdm(train_dataloader, desc="Iteration", disable=args.local_rank not in [-1, 0])
        for step, batch in enumerate(epoch_iterator):

            # Skip past any already trained steps if resuming training
            if steps_trained_in_current_epoch > 0:
                steps_trained_in_current_epoch -= 1
                continue

            inputs, labels = (batch, batch)
            if inputs.shape[1] > 1024: continue
            inputs = inputs.to(args.device)
            labels = labels.to(args.device)
            model.train()
            outputs = model(inputs, labels=labels)
            loss = outputs[0]  # model outputs are always tuple in transformers (see doc)

            if args.n_gpu > 1:
                loss = loss.mean()  # mean() to average on multi-gpu parallel training
            if args.gradient_accumulation_steps > 1:
                loss = loss / args.gradient_accumulation_steps

            if args.fp16:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()

            tr_loss += loss.item()
            if (step + 1) % args.gradient_accumulation_steps == 0:
                if args.fp16:
                    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), args.max_grad_norm)
                else:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), args.max_grad_norm)
                optimizer.step()
                scheduler.step()  # Update learning rate schedule
                model.zero_grad()
                global_step += 1

                if args.local_rank in [-1, 0] and args.logging_steps > 0 and global_step % args.logging_steps == 0:
                    # Log metrics
                    if (
                        args.local_rank == -1 and args.evaluate_during_training
                    ):  # Only evaluate when single GPU otherwise metrics may not average well
                        results = evaluate(args, model, tokenizer)
                        for key, value in results.items():
                            tb_writer.add_scalar("eval_{}".format(key), value, global_step)
                    tb_writer.add_scalar("lr", scheduler.get_last_lr()[0], global_step)
                    tb_writer.add_scalar("loss", (tr_loss - logging_loss) / args.logging_steps, global_step)
                    logging_loss = tr_loss

                if args.local_rank in [-1, 0] and args.save_steps > 0 and global_step % args.save_steps == 0:
                    checkpoint_prefix = "checkpoint"
                    # Save model checkpoint
                    output_dir = os.path.join(args.output_dir, "{}-{}".format(checkpoint_prefix, global_step))
                    os.makedirs(output_dir, exist_ok=True)
                    model_to_save = (
                        model.module if hasattr(model, "module") else model
                    )  # Take care of distributed/parallel training
                    model_to_save.save_pretrained(output_dir)
                    tokenizer.save_pretrained(output_dir)

                    torch.save(args, os.path.join(output_dir, "training_args.bin"))
                    logger.info("Saving model checkpoint to %s", output_dir)

                    _rotate_checkpoints(args, checkpoint_prefix)

                    torch.save(optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
                    torch.save(scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
                    logger.info("Saving optimizer and scheduler states to %s", output_dir)

            if args.max_steps > 0 and global_step > args.max_steps:
                epoch_iterator.close()
                break
        if args.max_steps > 0 and global_step > args.max_steps:
            train_iterator.close()
            break

    if args.local_rank in [-1, 0]:
        tb_writer.close()

    return global_step, tr_loss / global_step
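
The step bookkeeping in `train` is easy to check by hand. The sketch below is a standalone illustration, not part of the training script: `planned_steps` and `linear_lr_factor` are hypothetical helper names, and the second function only mirrors how I understand the linear warmup and decay schedule to scale the learning rate. The numbers in the usage lines come from the training log further down (5528 examples, batch size 2, 3 epochs).

```python
def planned_steps(num_examples, batch_size, grad_accum_steps, num_epochs, max_steps=0):
    """Return (total optimizer steps, epochs), following the arithmetic used in train()."""
    # drop_last=True discards the final incomplete batch
    batches_per_epoch = num_examples // batch_size
    updates_per_epoch = batches_per_epoch // grad_accum_steps
    if max_steps > 0:
        # A fixed step budget overrides the epoch count
        return max_steps, max_steps // updates_per_epoch + 1
    return updates_per_epoch * num_epochs, num_epochs


def linear_lr_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 to 1 during warmup
        return step / max(1, warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


total, epochs = planned_steps(5528, 2, 1, 3)
print(total, epochs)  # 8292 3
```

The total of 8292 matches the `global_step` reported at the end of the training log.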

# Evaluation of some model

def evaluate(args, model: PreTrainedModel, tokenizer: PreTrainedTokenizer, df_trn, df_val, prefix="") -> Dict:
    # Build the evaluation dataset and make sure the output directory exists
    eval_output_dir = args.output_dir

    eval_dataset = load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=True)
    os.makedirs(eval_output_dir, exist_ok=True)
    args.eval_batch_size = args.per_gpu_eval_batch_size * max(1, args.n_gpu)
    # Note that DistributedSampler samples randomly

    def collate(examples: List[torch.Tensor]):
        if tokenizer._pad_token is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)

    eval_sampler = SequentialSampler(eval_dataset)
    eval_dataloader = DataLoader(
        eval_dataset, sampler=eval_sampler, batch_size=args.eval_batch_size, collate_fn=collate, drop_last = True
    )

    # multi-gpu evaluate
    if args.n_gpu > 1:
        model = torch.nn.DataParallel(model)

    # Eval!
    logger.info("***** Running evaluation {} *****".format(prefix))
    logger.info("  Num examples = %d", len(eval_dataset))
    logger.info("  Batch size = %d", args.eval_batch_size)
    eval_loss = 0.0
    nb_eval_steps = 0
    model.eval()

    for batch in tqdm(eval_dataloader, desc="Evaluating"):
        inputs, labels = (batch, batch)
        inputs = inputs.to(args.device)
        labels = labels.to(args.device)

        with torch.no_grad():
            outputs = model(inputs, labels=labels)
            lm_loss = outputs[0]
            eval_loss += lm_loss.mean().item()
        nb_eval_steps += 1

    eval_loss = eval_loss / nb_eval_steps
    perplexity = torch.exp(torch.tensor(eval_loss))

    result = {"perplexity": perplexity}

    output_eval_file = os.path.join(eval_output_dir, prefix, "eval_results.txt")
    with open(output_eval_file, "w") as writer:
        logger.info("***** Eval results {} *****".format(prefix))
        for key in sorted(result.keys()):
            logger.info("  %s = %s", key, str(result[key]))
            writer.write("%s = %s\n" % (key, str(result[key])))

    return result
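
`evaluate` reports perplexity as the exponential of the average per-batch loss. Here is a minimal standalone sketch of that last step, using only the standard library. `perplexity_from_losses` is a hypothetical helper, and the loss values are made up for illustration.

```python
import math


def perplexity_from_losses(batch_losses):
    """Perplexity = exp(mean cross-entropy loss over the evaluation batches)."""
    if not batch_losses:
        raise ValueError("need at least one batch loss")
    mean_loss = sum(batch_losses) / len(batch_losses)
    return math.exp(mean_loss)


# Illustrative losses, not taken from a real run
print(perplexity_from_losses([2.0, 2.2, 2.1]))
```

Read the other way round, a perplexity of about 8.3 corresponds to an average loss of roughly ln(8.3) ≈ 2.1. Lower is better in both cases.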

In [109]:
# Main runner

def main(df_trn, df_val):
    args = Args()
    
    if args.should_continue:
        sorted_checkpoints = _sorted_checkpoints(args)
        if len(sorted_checkpoints) == 0:
            raise ValueError("Used --should_continue but no checkpoint was found in --output_dir.")
        else:
            args.model_name_or_path = sorted_checkpoints[-1]

    if (
        os.path.exists(args.output_dir)
        and os.listdir(args.output_dir)
        and args.do_train
        and not args.overwrite_output_dir
        and not args.should_continue
    ):
        raise ValueError(
            "Output directory ({}) already exists and is not empty. Use --overwrite_output_dir to overcome.".format(
                args.output_dir
            )
        )

    # Setup CUDA, GPU & distributed training
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    args.n_gpu = torch.cuda.device_count()
    args.device = device

    # Setup logging
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO if args.local_rank in [-1, 0] else logging.WARN,
    )
    logger.warning(
        "Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s",
        args.local_rank,
        device,
        args.n_gpu,
        bool(args.local_rank != -1),
        args.fp16,
    )

    # Set seed
    set_seed(args)

    config = AutoConfig.from_pretrained(args.config_name, cache_dir=args.cache_dir)
    tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_name, cache_dir=args.cache_dir)
    model = AutoModelWithLMHead.from_pretrained(
        args.model_name_or_path,
        from_tf=False,
        config=config,
        cache_dir=args.cache_dir,
    )
    model.to(args.device)
    
    logger.info("Training/evaluation parameters %s", args)

    # Training
    if args.do_train:
        train_dataset = load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False)

        global_step, tr_loss = train(args, train_dataset, model, tokenizer)
        logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)

    # Saving best-practices: if you use save_pretrained for the model and tokenizer, you can reload them using from_pretrained()
    if args.do_train:
        # Create output directory if needed
        os.makedirs(args.output_dir, exist_ok=True)

        logger.info("Saving model checkpoint to %s", args.output_dir)
        # Save a trained model, configuration and tokenizer using `save_pretrained()`.
        # They can then be reloaded using `from_pretrained()`
        model_to_save = (
            model.module if hasattr(model, "module") else model
        )  # Take care of distributed/parallel training
        model_to_save.save_pretrained(args.output_dir)
        tokenizer.save_pretrained(args.output_dir)

        # Good practice: save your training arguments together with the trained model
        torch.save(args, os.path.join(args.output_dir, "training_args.bin"))

        # Load a trained model and vocabulary that you have fine-tuned
        model = AutoModelWithLMHead.from_pretrained(args.output_dir)
        tokenizer = AutoTokenizer.from_pretrained(args.output_dir)
        model.to(args.device)

    # Evaluation
    results = {}
    if args.do_eval and args.local_rank in [-1, 0]:
        checkpoints = [args.output_dir]
        if args.eval_all_checkpoints:
            checkpoints = list(
                os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + "/**/" + WEIGHTS_NAME, recursive=True))
            )
            logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN)  # Reduce logging
        logger.info("Evaluate the following checkpoints: %s", checkpoints)
        for checkpoint in checkpoints:
            global_step = checkpoint.split("-")[-1] if len(checkpoints) > 1 else ""
            prefix = checkpoint.split("/")[-1] if checkpoint.find("checkpoint") != -1 else ""

            model = AutoModelWithLMHead.from_pretrained(checkpoint)
            model.to(args.device)
            result = evaluate(args, model, tokenizer, df_trn, df_val, prefix=prefix)
            result = dict((k + "_{}".format(global_step), v) for k, v in result.items())
            results.update(result)

    return results

It is time to train our model!

![alt text](https://media.giphy.com/media/Tia3dkakIp2m4uGoDI/giphy.gif)

Image from [Giphy](https://giphy.com/)

In [110]:
main(trn_df, val_df)

Some weights of OpenAIGPTLMHeadModel were not initialized from the model checkpoint at openai-gpt and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
05/03/2021 05:39:39 - INFO - __main__ -   Training/evaluation parameters <__main__.Args object at 0x7fef20755a90>
05/03/2021 05:39:39 - INFO - __main__ -   Creating features from dataset file at cached
05/03/2021 05:39:43 - INFO - __main__ -   Saving features into cached file cached/gpt2_cached_lm_512
05/03/2021 05:39:43 - INFO - __main__ -   ***** Running training *****
05/03/2021 05:39:43 - INFO - __main__ -     Num examples = 5528
05/03/2021 05:39:43 - INFO - __main__ -     Num Epochs = 3
05/03/2021 05:39:43 - INFO - __main__ -     Instantaneous batch size per GPU = 2
05/03/2021 05:39:43 - INFO - __main__ -     Total train batch size (w. parallel, distributed & accumulation) = 2
05/03/2021 05:39:43 - INFO - __main__ -     Gradient A

05/03/2021 05:44:05 - INFO - __main__ -   Saving model checkpoint to output-small-openaigpt/checkpoint-3500
05/03/2021 05:44:10 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-openaigpt/checkpoint-3500





05/03/2021 05:48:34 - INFO - __main__ -   Saving model checkpoint to output-small-openaigpt/checkpoint-7000
05/03/2021 05:48:38 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-openaigpt/checkpoint-7000
05/03/2021 05:50:16 - INFO - __main__ -    global_step = 8292, average loss = 1.9562153180497772
05/03/2021 05:50:16 - INFO - __main__ -   Saving model checkpoint to output-small-openaigpt






ftfy or spacy is not installed using BERT BasicTokenizer instead of SpaCy & ftfy.
05/03/2021 05:50:23 - INFO - __main__ -   Evaluate the following checkpoints: ['output-small-openaigpt']
05/03/2021 05:50:28 - INFO - __main__ -   Creating features from dataset file at cached
05/03/2021 05:50:29 - INFO - __main__ -   Saving features into cached file cached/gpt2_cached_lm_512
05/03/2021 05:50:29 - INFO - __main__ -   ***** Running evaluation  *****
05/03/2021 05:50:29 - INFO - __main__ -     Num examples = 615
05/03/2021 05:50:29 - INFO - __main__ -     Batch size = 2


05/03/2021 05:50:34 - INFO - __main__ -   ***** Eval results  *****
05/03/2021 05:50:34 - INFO - __main__ -     perplexity = tensor(8.3137)





{'perplexity_': tensor(8.3137)}
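
The trailing underscore in `perplexity_` comes from how `main` builds the result keys: the checkpoint's step suffix is appended to every metric name, and that suffix is empty when only one checkpoint is evaluated. A small sketch of that naming logic in isolation (`result_key` is a hypothetical helper; the second path reuses a checkpoint directory name from the log above):

```python
def result_key(metric, checkpoint, num_checkpoints):
    """Rebuild the metric key the way main() does for one checkpoint."""
    # The step suffix is only extracted when several checkpoints are evaluated
    global_step = checkpoint.split("-")[-1] if num_checkpoints > 1 else ""
    return "{}_{}".format(metric, global_step)


print(result_key("perplexity", "output-small-openaigpt", 1))
# perplexity_
print(result_key("perplexity", "output-small-openaigpt/checkpoint-3500", 2))
# perplexity_3500
```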

## Chatting with Rick

The model is ready, so it's time to chat with Rick. But don't forget that Rick can be rude. I warned you.

A variety of methods can be used for response generation. You can find more details about these methods in this [article](https://huggingface.co/blog/how-to-generate).
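
To make two of these methods concrete, here is a toy, standard-library sketch of top-k and top-p (nucleus) filtering over a small hand-written distribution. It illustrates the idea only; it is not how the Transformers library implements `generate`, and `filter_top_k_top_p`, `sample` and the toy vocabulary are hypothetical.

```python
import random


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Keep the most likely tokens, then renormalise.

    probs: dict mapping token -> probability.
    top_k: keep at most this many tokens (0 disables the limit).
    top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample(probs, rng):
    """Draw one token according to the (filtered) distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


toy = {"geez": 0.5, "morty": 0.25, "wubba": 0.125, "pickle": 0.125}
filtered = filter_top_k_top_p(toy, top_k=3, top_p=0.7)
print(filtered)  # only "geez" and "morty" survive, rescaled to sum to 1
print(sample(filtered, random.Random(0)))
```

Smaller `top_k` and `top_p` values make the replies more predictable; larger ones leave room for stranger answers.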

![alt text](https://media.giphy.com/media/ftl8NiWz2C3tuXOn3w/giphy.gif)

Image from [Giphy](https://giphy.com/)

In [113]:
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = AutoModelWithLMHead.from_pretrained("output-small-openaigpt")

# Let's chat for 5 lines
for step in range(5):
    # encode the new user input and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:"), return_tensors='pt')
    # print(new_user_input_ids)

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response; max_length caps the total sequence (history plus reply) at 500 tokens
    chat_history_ids = model.generate(
        bot_input_ids, max_length=500,
        pad_token_id=tokenizer.eos_token_id,  
        no_repeat_ngram_size=3,       
        do_sample=True, 
        top_k=100, 
        top_p=0.7,
        temperature = 0.8
    )
    
    # pretty print the last output tokens from the bot
    print("Open AI GPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

ftfy or spacy is not installed using BERT BasicTokenizer instead of SpaCy & ftfy.


>> User:How are you?
Open AI GPT: hi, i'm mark. nice to meet you. nice meeting you. my name's mark. what's yours? nice to see you, too. hi, my name is sam. hi! i've come to see mrs. jones. §then hello hi how are things going? the usual, i guess. hi. how are your classes? hello, mr. brown. hello! i m mrs. brown, the school secretary. what can i do for you? i'd like to talk to mrs. smith, please. then, what s up? then he turned to leave. good morning, mr black. the class is just starting. and welcome. he walked over to the blackboard and wrote something on it. then he looked up and said, " good morning class. welcome to english literature. before he began, he wrote a note on the blackboard. then, he walked to the front of the class and wrote a few more words on the board. then the class filed in. after he finished, he turned and said to the class, " welcome to class reunion. this is a class of english literature, where we ll discuss the economic economic economic development of china. cl

Input length of input_ids is 501, but ``max_length`` is set to 500.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


Open AI GPT: 
>> User:hello?


Input length of input_ids is 503, but ``max_length`` is set to 500.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


Open AI GPT: 
>> User:hi


Input length of input_ids is 504, but ``max_length`` is set to 500.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


Open AI GPT: 
>> User:hi


Input length of input_ids is 505, but ``max_length`` is set to 500.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


Open AI GPT: 
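
The warnings above explain the empty replies: `max_length` counts the prompt as well as the reply, so once the accumulated history reaches 500 tokens there is no room left to generate anything. One possible remedy, sketched here on plain Python lists rather than tensors, is to keep only the most recent tokens of the history before each call. `trim_history` and the reserve size are assumptions for illustration, not part of the notebook.

```python
def trim_history(token_ids, max_length, reserve_for_reply):
    """Keep the newest tokens so that a reply of `reserve_for_reply` tokens still fits."""
    if not 0 < reserve_for_reply < max_length:
        raise ValueError("reserve_for_reply must be between 1 and max_length - 1")
    budget = max_length - reserve_for_reply
    # Negative slicing keeps the tail, i.e. the most recent turns
    return token_ids[-budget:]


history = list(range(505))          # stand-in for 505 accumulated token ids
trimmed = trim_history(history, max_length=500, reserve_for_reply=100)
print(len(trimmed), trimmed[0], trimmed[-1])  # 400 105 504
```

With a `(1, n)` tensor such as `bot_input_ids`, the equivalent slice would be `bot_input_ids[:, -budget:]`.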


**User**:How are you, Rick? <br>
**RickBot**: I'm fine. <br>
**User**:Where is Morty? <br>
**RickBot**: He's in a hospital. <br>
**User**:Why? <br>
**RickBot**: Well, that depends on who breaks first -- me or the hospital. I mean, you know, the one that keeps giving me pills. <br>
**User**:I don't understand <br>
**RickBot**: Whoa! Whoa! <br>
**User**:I see, you took your pills <br>
**RickBot**: Hey, I-I'm not proud of this. <br>

Here are a few more dialogues to show just how Rick-like our Rick is now.

![alt text](https://media.giphy.com/media/eNpXWzGIMRjIo4lXT8/giphy.gif)

Image from [Giphy](https://giphy.com/)

**User**:Who are you? <br>
**RickBot**: A musician, I'm not a musician, but I'm pretty close to you. <br>
**User**:What kind of musician are you? <br>
**RickBot**: Who cares? I'm a musician. <br>
**User**:Ok, play a song for me <br>
**RickBot**: Shut up and listen to me, Morty. Listen to me. Listen! I'm the leader of a bunch of different death worlds, and I'm just trying to assemble a band. I'm gonna make this one-piece, and you're gonna be part of it. <br>
**User**:ok <br>
**RickBot**: Whoa, whoa, hey, who's this? <br>
**User**:This is Morty <br>
**RickBot**: Geez!!! <br>

![alt text](https://i.kym-cdn.com/news/posts/desktop/000/000/247/Screenshot_21.jpg)

Image from [news.knowyourmeme.com](https://news.knowyourmeme.com/news/elon-musks-rick-and-morty-cameo-is-a-meme-reference)

**User**:What do you think about Elon Musk? <br>
**RickBot**: I think about him a lot. <br>
**User**:But what exactly do you think? <br>
**RickBot**: He's not a bad guy. He's just... not a very charismatic one. <br>
**User**:I think he is charismatic <br>
**RickBot**: He is. <br>
**User**:What else you can say about him? <br>
**RickBot**: You don't have to say anything. He is the leader of your class. <br>
**User**:ok, thank you <br>
**RickBot**: !!! <br>

![alt text](https://media.giphy.com/media/YSTmV6MWEkgW0HblZQ/giphy.gif)

Image from [Giphy](https://giphy.com/)

**User**:hey, go get me a burger <br>
**RickBot**: Whoa, whoa, hold on a second, let's not get crazy. This is gonna be good. <br>
**User**:I'm hungry! c'mon, buy me a burger or better two <br>
**RickBot**: Holy crap. <br>
**User**:no crappy burger, a good one! <br>
**RickBot**: What the hell? <br>
**User**:don't be so bad, feed me <br>
**RickBot**: I'll be back as fast as I can, Morty. <br>
**User**:good boy, and don't forget ketchup and fries <br>
**RickBot**: !!!!!! <br>
**User**:what a hell? can you reply with words? <br>
**RickBot**: Yes!! I will!! We will go out on a date. <br>

## Conclusion

![alt text](https://media.giphy.com/media/YQBWwGLJtFDQIATck5/giphy.gif)

Image from [Giphy](https://giphy.com/)

Congratulations! Our virtual Rick is alive (almost)! By fine-tuning our model on a small dataset, we created a virtual character with whom we can hold interesting dialogs.

Using the proposed approach, you can create many interesting virtual characters based on an arbitrary dialog dataset (just a CSV file with utterances, one utterance per line).
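
As a rough illustration of that last point, the sketch below turns a list of utterances into rows that pair each utterance with the ones preceding it. The window size, the function name `build_context_rows`, the row layout and the sample lines are all assumptions made for this example; adapt them to whatever format your data-preparation code expects.

```python
def build_context_rows(lines, n_context=2):
    """Pair every line with the n_context lines that precede it.

    Each row is [response, previous line, line before that, ...].
    """
    rows = []
    for i in range(n_context, len(lines)):
        response = lines[i]
        # Most recent context first
        context = [lines[j] for j in range(i - 1, i - 1 - n_context, -1)]
        rows.append([response] + context)
    return rows


# Made-up lines, one utterance per entry
script = ["Hey, Rick.", "What, Morty?", "Can we go home?", "No."]
for row in build_context_rows(script):
    print(row)
```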

In [65]:
from transformers import BertTokenizer, BertLMHeadModel, BertConfig
import torch


tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
config = BertConfig.from_pretrained("bert-base-uncased")
config.is_decoder = True  # run BERT as a left-to-right decoder (causal attention mask)
model = BertLMHeadModel.from_pretrained('bert-base-uncased', config=config)

# encode the new user input and return a PyTorch tensor
new_user_input_ids = tokenizer.encode(input(">> User:"), return_tensors='pt')

# single-turn experiment: there is no chat history yet, so the prompt is just the user input
bot_input_ids = new_user_input_ids

# generate a response; max_length caps the total sequence (prompt plus reply) at 500 tokens
chat_history_ids = model.generate(
    bot_input_ids, max_length=500,
    pad_token_id=tokenizer.pad_token_id,  # the BERT tokenizer defines no EOS token
    no_repeat_ngram_size=3,
    do_sample=True,
    top_k=100,
    top_p=0.7,
    temperature=0.8
)

# pretty print the last output tokens from the model
print("GPT2: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertLMHeadModel: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertLMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertLMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


>> User:hello!
GPT2: ... "....... and and and.. '..?..,,, and and,.. that.. a.. on.. ).. (.. like.. ;.. /.. -.. [.. i.. is.. in.. ].. it.. the.. than.. to.. with.. as.. being being being.. over.. but..!.. of of.. by.. from.. there.. s..n.. just.. at.. for.. t.. p.. more.. etc..m.. art.. m.. 1.. making.. an.. was.. you.. dot.. well.. during.. be.. way.. b.. he.. k.. building.. who.. thing.. up.. because because.. high.. right.. this.. g.. flat.. believe.. not.. been.. let.. us..s.. if if if.. has.. or.. she.. net.. so.. now.. when.. made.. w.. have.. um.. know.. him.. band.. day.. which.. into.. my.. having.. out.. light.. would.. will.. man.. girl.. watch.. en.. movie.. kind.. make.. show.. her.. good.. off off.. something.. until.. d.. word.. ki.. me.. new..g.. other.. system.. first.. then..t..d.. before before before.. after.. all.. symbol.. work.. one.. standard.. time.. mini.. c.. vi..l.. mark.. white.. door.. _.. =.. where.. best.. brother.. starting.. e.. v.. mag.. king.. his.. 

In [51]:
from transformers import BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
# BertModel is a plain encoder without a language-modeling head: call it directly and read the hidden states
outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state

In [50]:
print(last_hidden_states)

# tokenizer.decode(outputs)

tensor([[[-0.1144,  0.1937,  0.1250,  ..., -0.3827,  0.2107,  0.5407],
         [ 0.5308,  0.3207,  0.3665,  ..., -0.0036,  0.7579,  0.0388],
         [-0.4877,  0.8849,  0.4256,  ..., -0.6976,  0.4458,  0.1231],
         ...,
         [-0.7003, -0.1815,  0.3297,  ..., -0.4838,  0.0680,  0.8901],
         [-1.0355, -0.2567, -0.0317,  ...,  0.3197,  0.3999,  0.1795],
         [ 0.6080,  0.2610, -0.3131,  ...,  0.0311, -0.6283, -0.1994]]],
       grad_fn=<NativeLayerNormBackward>)


In [52]:
from transformers import BertTokenizer, BertLMHeadModel, BertConfig
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
config = BertConfig.from_pretrained("bert-base-cased")
config.is_decoder = True
model = BertLMHeadModel.from_pretrained('bert-base-cased', config=config)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

prediction_logits = outputs.logits

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertLMHeadModel: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertLMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertLMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [55]:
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model.generate(**inputs)

In [56]:
outputs

tensor([[  101,  8667,   117,  1139,  3676,  1110, 10509,   102,   119,   119,
           119,   119,   119,   119,   119,   119,   119,   119,   119,   119]])

In [58]:
# decode expects a single sequence of token ids, so pass the first row of the batch
tokenizer.decode(outputs[0])