## Using a Pre-Trained Language Model for Language Generation (17 pt)

The goal of this task is to understand the inputs and outputs of a large-scale pre-trained generative language model, 
and then to use it as the encoder-decoder chatbot described in Chapter 26, Section 26.2.2 of Jurafsky and Martin (2019):
https://web.stanford.edu/~jurafsky/slp3/26.pdf

We will test DialoGPT, a large-scale generative language model pre-trained on conversational responses. 
For further reading, the model is described in its demo paper:
https://www.aclweb.org/anthology/2020.acl-demos.30/


In [1]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Download the pre-trained tokenizer and model (cached locally after the first run):
tokenizer = GPT2Tokenizer.from_pretrained('microsoft/DialoGPT-small')
model = GPT2LMHeadModel.from_pretrained('microsoft/DialoGPT-small')

PyTorch version 1.7.0 available.
TensorFlow version 2.3.1 available.


### Understand inputs and outputs

In [2]:
# Tokenization uses byte-level Byte-Pair Encoding (BPE).
# Because any string can be broken down into bytes, there are no out-of-vocabulary (OOV) tokens.
print(f"GPT2 has {tokenizer.vocab_size:,} sub-word units for tokenization")

GPT2 has 50,257 sub-word units for tokenization
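
BPE builds its vocabulary by starting from single symbols and repeatedly merging the most frequent adjacent pair, so an unseen word can still be split into known pieces. GPT-2 starts from the 256 byte values, and its 50,257 entries are those 256 bytes, 50,000 learned merges, and the `<|endoftext|>` token. The sketch below is a toy, character-level version on a made-up six-word corpus (both are assumptions for illustration); it is not the tokenizer's actual training code.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` by one joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn merge rules by repeatedly joining the most frequent adjacent pair."""
    vocab = Counter(tuple(word) for word in words)  # word as symbols -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        vocab = Counter({merge_pair(s, best): f for s, f in vocab.items()})
    return merges

def encode(word, merges):
    """Split a word into characters, then apply the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

merges = learn_bpe(["low", "low", "lower", "newest", "newest", "widest"], num_merges=4)
print(merges)
print(encode("lowest", merges))  # an unseen word, built from learned pieces
print(encode("xyz", merges))     # falls back to single characters
```

In this toy the fallback works only because characters are the base symbols; starting from bytes, as GPT-2 does, extends the same guarantee to any input text.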


In [3]:
# Each token is mapped to its index in the vocabulary.
inputs = tokenizer("Here is a text to check out the input", return_tensors="pt")
# return_tensors="pt" makes the tokenizer return PyTorch tensors.

# The model expects a batch of inputs.
print("The shape of the input tensor:")
print(inputs['input_ids'].shape)
# The first dimension is the number of instances in the batch;
# the second is the number of tokens in each instance.

print("Token ids:", inputs['input_ids'])
# The attention mask tells the model which tokens it may attend to:
print("None of the tokens are masked:", inputs['attention_mask']) # 1: not masked, 0: masked (e.g. padding)

The shape of the input tensor:
torch.Size([1, 9])
Token ids: tensor([[4342,  318,  257, 2420,  284, 2198,  503,  262, 5128]])
None of the tokens are masked: tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])
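
The mask only matters once a batch contains sequences of different lengths: shorter sequences are filled up with a padding id, and the attention mask marks those positions with 0 so the model does not attend to them. A plain-Python sketch, assuming the end-of-sequence id 50256 is reused for padding because GPT-2 defines no padding token (with transformers you would typically set `tokenizer.pad_token = tokenizer.eos_token` and call the tokenizer with `padding=True`):

```python
def pad_batch(sequences, pad_id):
    """Right-pad token-id lists to a common length and build the attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 = real token the model attends to, 0 = padding it ignores
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

ids, mask = pad_batch([[4342, 318, 257], [4342]], pad_id=50256)
print(ids)
print(mask)
```

This sketch pads on the right; for batched generation with a decoder-only model such as DialoGPT, left padding is usually preferred so that new tokens directly follow the real ones.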


In [4]:
# Map each id back to its sub-word token ('Ġ' marks a preceding space), then decode the whole sequence:
print(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].tolist()))
print(tokenizer.decode(inputs['input_ids'][0]))

['Here', 'Ġis', 'Ġa', 'Ġtext', 'Ġto', 'Ġcheck', 'Ġout', 'Ġthe', 'Ġinput']
Here is a text to check out the input
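
The 'Ġ' prefix comes from GPT-2's byte-level BPE: before any merges, every byte of the UTF-8 text is mapped to a printable character, and the space byte becomes 'Ġ' (U+0120). Decoding undoes this mapping. A sketch of the table and its inverse, modeled on the byte-to-unicode mapping used by GPT-2's tokenizer; it ignores special tokens such as `<|endoftext|>`:

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a printable Unicode character.

    Printable bytes keep their own character; the others (control bytes,
    the space 0x20, ...) are shifted to code points starting at 256.
    """
    bs = (list(range(33, 127))      # '!' .. '~'
          + list(range(161, 173))   # inverted '!' .. not sign
          + list(range(174, 256)))  # registered sign .. 'y' with diaeresis
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {ch: b for b, ch in BYTE_ENCODER.items()}

def tokens_to_text(tokens):
    """Join sub-word tokens and turn each character back into its original byte."""
    raw = bytes(BYTE_DECODER[ch] for ch in "".join(tokens))
    return raw.decode("utf-8", errors="replace")

tokens = ["Here", "\u0120is", "\u0120a", "\u0120text", "\u0120to",
          "\u0120check", "\u0120out", "\u0120the", "\u0120input"]
print(BYTE_ENCODER[ord(" ")])  # the space byte is stored as 'Ġ' (U+0120)
print(tokens_to_text(tokens))
```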


In [5]:
# The model's forward method produces logits (scores before the softmax) for the next token at each position:
outputs = model.forward(
    input_ids=inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
)
logits = outputs.logits

In [6]:
logits.shape  # (batch size, number of tokens, vocabulary size)

torch.Size([1, 9, 50257])
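
For every input position, the model scores all 50,257 vocabulary entries as candidates for the next token. A softmax turns the scores at one position into a probability distribution; with the real model this would be `torch.softmax(logits[0, -1], dim=-1)` for the position after the last input token. The plain-Python sketch below uses made-up scores for a hypothetical four-token vocabulary:

```python
import math

def softmax(scores):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits at the last position for a toy 4-token vocabulary.
vocab = ["hello", "world", "!", "<|endoftext|>"]
last_logits = [2.0, 1.0, 0.1, -1.0]

probs = softmax(last_logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the most likely token
print(vocab[best], round(probs[best], 3))
```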

In [7]:
# If you also pass "labels", the model returns the negative log-likelihood loss of its predictions:
outputs = model.forward(
    input_ids=inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
    labels=inputs['input_ids'], # targets; the model shifts them so position t predicts token t+1
)
logits = outputs.logits
loss = outputs.loss

# During training, the gradient of this loss with respect to each parameter is used to update the model.
# loss.grad_fn records the function that produced the loss, which backpropagation follows.
print(loss)

tensor(4.6597, grad_fn=<NllLossBackward>)
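
With `labels`, the logits at position t are scored against the token at position t + 1, so for the 9-token input above the loss averages 8 predictions. The loss is their mean cross-entropy (negative log-likelihood), and exp(loss) is the perplexity, about 105.6 for the value printed above. A plain-Python sketch of the computation on made-up logits (the toy values are assumptions):

```python
import math

def log_softmax(scores):
    """Log-probabilities from logits, computed stably via log-sum-exp."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]

def causal_lm_loss(logits, labels):
    """Mean negative log-likelihood with the one-position shift.

    logits: one list of vocabulary scores per position of a single sequence.
    labels: one token id per position (here identical to the input ids).
    """
    losses = []
    for t in range(len(labels) - 1):
        # the scores at position t are the prediction for the token at t + 1
        losses.append(-log_softmax(logits[t])[labels[t + 1]])
    return sum(losses) / len(losses)

# Toy example: 3 positions over a 4-token vocabulary with uniform scores,
# so every prediction has probability 1/4 and the loss is ln 4.
uniform_loss = causal_lm_loss([[0.0] * 4] * 3, [0, 1, 2])

# Perplexity corresponding to the loss printed by the notebook above.
perplexity = math.exp(4.6597)  # about 105.6
print(uniform_loss, perplexity)
```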


### Use the model for generation

Starting from a single sentence, we can use the model to generate a response one token at a time, always choosing the most likely next token according to the model (greedy decoding):

In [8]:
# Start from the sentence below, followed by the end-of-sequence token that separates turns:
input_utterance = "What a good day?"
generated = tokenizer.encode(input_utterance + tokenizer.eos_token)
context = torch.tensor([generated])  # shape (1, prompt_length)
past = None  # cached keys/values of all tokens processed so far

with torch.no_grad():  # inference only, no gradients needed
    for _ in range(100):
        output = model.forward(context, past_key_values=past)
        past = output.past_key_values
        logits = output.logits

        # choose the most likely next token:
        token = torch.argmax(logits[0, -1, :])

        # add it to the generated sequence
        generated.append(token.item())
        # thanks to the cache, only the new token is fed in at the next step
        context = token.view(1, 1)

        # stop if the generated token is the end-of-sequence token
        if token.item() == tokenizer.eos_token_id:
            break

sequence = tokenizer.decode(generated)

print(sequence)

What a good day?<|endoftext|>I'm not sure what you're trying to say.<|endoftext|>
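
Feeding only the newest token works because `past_key_values` caches the attention keys and values of every earlier position; without the cache, each step would re-process the whole sequence. For a prompt of p tokens and g generated tokens, the cached loop passes p + (g - 1) positions through the model, while re-running the full sequence passes g·p + g(g - 1)/2. A small counting sketch (p and g are free parameters, not measured values; it counts positions only and ignores that attention over a longer cache still costs more per step):

```python
def positions_processed(prompt_len, new_tokens, use_cache):
    """Count token positions fed through the model to generate new_tokens tokens."""
    if new_tokens == 0:
        return 0
    if use_cache:
        # the first step reads the whole prompt, every later step reads one new token
        return prompt_len + (new_tokens - 1)
    # without a cache, step k re-reads the prompt plus the k tokens generated so far
    return sum(prompt_len + k for k in range(new_tokens))

for g in (1, 10, 100):
    print(g, positions_processed(6, g, True), positions_processed(6, g, False))
```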


#### Beam search for better output 

Instead of the greedy algorithm implemented above, *beam search* can be used to find a sequence with a higher overall probability: at each step it keeps the `num_beams` most likely partial sequences instead of only the single best one. Generation with `num_beams=1` is equivalent to the greedy algorithm. A larger number of beams makes the search slower. 
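
A minimal sketch contrasting the two strategies on a hypothetical toy model whose next-token probabilities depend only on the previous token (the table is made up for illustration). Greedy decoding commits to the locally best first token and ends with a sequence of probability 0.3, while a beam of width 2 keeps the runner-up and finds a sequence of probability 0.36:

```python
import math

# Hypothetical next-token distributions keyed by the previous token
# (a toy bigram "language model", not DialoGPT).
TOY = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"C": 0.5, "D": 0.5},
    "B": {"E": 0.9, "C": 0.1},
    "C": {"</s>": 1.0},
    "D": {"</s>": 1.0},
    "E": {"</s>": 1.0},
}

def greedy(model, start="<s>", end="</s>", max_len=10):
    """Pick the single most likely next token at every step."""
    seq, logp = [start], 0.0
    while seq[-1] != end and len(seq) < max_len:
        dist = model[seq[-1]]
        tok = max(dist, key=dist.get)
        seq.append(tok)
        logp += math.log(dist[tok])
    return seq[1:], logp

def beam_search(model, num_beams, start="<s>", end="</s>", max_len=10):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams, finished = [([start], 0.0)], []
    for _ in range(max_len):
        # extend every live beam by every possible next token
        candidates = [
            (seq + [tok], logp + math.log(p))
            for seq, logp in beams
            for tok, p in model[seq[-1]].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:num_beams]:
            if seq[-1] == end:
                finished.append((seq, logp))  # hypothesis is complete
            else:
                beams.append((seq, logp))
        if not beams:
            break
    best_seq, best_logp = max(finished + beams, key=lambda c: c[1])
    return best_seq[1:], best_logp

print(greedy(TOY))          # A then C: probability 0.6 * 0.5 = 0.3
print(beam_search(TOY, 2))  # B then E: probability 0.4 * 0.9 = 0.36
```

With `num_beams=1` the sketch returns the same sequence as `greedy`. Unlike `generate` in transformers, it lets the beam shrink when a hypothesis finishes and ranks hypotheses by raw summed log-probability, without the length penalty the library applies.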

In [9]:
# Encode the context that generation is conditioned on
input_ids = tokenizer.encode(input_utterance + tokenizer.eos_token, return_tensors='pt') 

# Generate until the end-of-sequence token is produced or the total length
# (prompt + response, in tokens) reaches max_length.
output = model.generate(
    input_ids,
    max_length=1000,
    num_beams=4,
    pad_token_id=tokenizer.eos_token_id  # GPT-2 has no padding token, so reuse EOS
)

# Print the exchange as a dialogue; slicing off the prompt tokens leaves only the reply:
print(">> User:", input_utterance)
print(">> Bot:", tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))

>> User: What a good day?
>> Bot: What a great day!


However, a larger beam, which finds a more likely sequence, does not always produce a better response. In the example below, beams of size 3 and 4 simply repeat the user's question back.

In [10]:
prompt = "Have you seen my cat?"
input_ids = tokenizer.encode(prompt + tokenizer.eos_token, return_tensors='pt') 
print(">> User:", prompt)

# Generate a reply with increasingly large beams and compare the results:
for num_beams in range(1, 5):
    output = model.generate(
        input_ids,
        max_length=1000,
        num_beams=num_beams,
        pad_token_id=tokenizer.eos_token_id
    )
    reply = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
    print(f">> Bot (num_beams={num_beams}):", reply)

>> User: Have you seen my cat?
>> Bot (num_beams=1): I have. He's a good boy.
>> Bot (num_beams=2): No, but I have seen your cat.
>> Bot (num_beams=3): Have you seen my cat?
>> Bot (num_beams=4): Have you seen my cat?
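
Larger beams favour high-probability sequences, and for this model a high-probability reply can be an echo of the user's question. One common mitigation, not used in this notebook, is to forbid the model from repeating any n-gram already present in the sequence; `model.generate` exposes this as the `no_repeat_ngram_size` argument. A minimal sketch of the rule, assuming token ids are plain Python ints: before each step, collect the tokens that would complete an already-seen n-gram so their scores can be set to minus infinity.

```python
def banned_next_tokens(tokens, n):
    """Return the token ids that would repeat an n-gram already in `tokens`."""
    if n < 1 or len(tokens) < n - 1:
        return set()
    # the last n-1 tokens are the prefix of the n-gram about to be completed
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

# After 1 2 3 1 2, emitting 3 would repeat the trigram (1, 2, 3).
print(banned_next_tokens([1, 2, 3, 1, 2], n=3))
```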


You can also pass a history of interactions to the model by concatenating the turns, each followed by the end-of-sequence token:

In [11]:
history = [
    tokenizer.encode("Hello!" + tokenizer.eos_token, return_tensors='pt'), # user
    tokenizer.encode("Hi!" + tokenizer.eos_token, return_tensors='pt'), # bot
    tokenizer.encode("What a good day!" + tokenizer.eos_token, return_tensors='pt'), # user
]

bot_input_ids = torch.cat(history, dim=-1)
bot_output_ids = model.generate(bot_input_ids, max_length=1000, num_beams=1, pad_token_id=tokenizer.eos_token_id)
last_output_ids = bot_output_ids[:, bot_input_ids.shape[-1]:]

print(">> Bot: {}".format(tokenizer.decode(last_output_ids[0], skip_special_tokens=True)))
    

>> Bot: What a lovely day!
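
As the conversation grows, the history has to stay within the model's context window (1,024 positions for GPT-2-sized models such as DialoGPT, and `max_length` in `generate` counts the prompt too). A minimal sketch of one way to bound it, assuming the history is kept as a list of strings and truncated by whole turns; `build_context` and its `max_turns` parameter are hypothetical helpers, not a transformers option:

```python
EOS = "<|endoftext|>"  # DialoGPT's end-of-turn marker (tokenizer.eos_token)

def build_context(turns, max_turns=None):
    """Join dialogue turns, each terminated by EOS, keeping only the last max_turns."""
    if max_turns is not None:
        turns = turns[max(len(turns) - max_turns, 0):]
    return "".join(turn + EOS for turn in turns)

history = ["Hello!", "Hi!", "What a good day!"]  # user, bot, user
print(build_context(history, max_turns=2))
```

The resulting string can then be encoded with `tokenizer.encode(context, return_tensors='pt')` and passed to `model.generate`.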


## Implement a chatbot similar to Eliza in NLTK

Eliza is a rule-based chatbot described in Chapter 26, Section 26.2.1 of J&M (2019).
In Eliza, each user input is matched against a list of regular expressions. If no specific pattern matches, the input falls through to the catch-all pattern `(.*)`, which has only a limited set of generic responses:

```
r"(.*)",
(
    "Please tell me more.",
    "Let's change focus a bit... Tell me about your family.",
    "Can you elaborate on that?",
    "Why do you say that %1?",
    "I see.",
    "Very interesting.",
    "%1.",
    "I see.  And what does that tell you?",
    "How does that make you feel?",
    "How do you feel when you say that?",
)
```
Source here: https://www.nltk.org/_modules/nltk/chat/eliza.html
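
A rough sketch of how such a catch-all rule turns into a reply: pick one template at random and substitute the captured text, after swapping first- and second-person words, for `%1`. This is a simplified re-implementation for illustration, not NLTK's `Chat` code, and its small reflection table is an assumption:

```python
import random
import re

# Simplified, hypothetical reflection table; NLTK's Eliza uses a larger one.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

# A catch-all rule in the same (pattern, responses) shape as above, with fewer responses.
FALLBACK = (r"(.*)", ("Please tell me more.", "Why do you say that %1?", "%1."))

def reflect(text):
    """Swap first-person words for second-person ones (lower-cases the text)."""
    return " ".join(REFLECTIONS.get(word, word) for word in text.lower().split())

def fallback_response(user_input, rng=random):
    """Pick a canned reply at random and fill %1 with the reflected match."""
    pattern, responses = FALLBACK
    match = re.match(pattern, user_input)  # (.*) always matches
    template = rng.choice(responses)
    return template.replace("%1", reflect(match.group(1)))

print(fallback_response("I am tired of my job"))
```

Whatever the user says, the reply comes from the same few templates, which is the limitation DialoGPT is meant to remove.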

Modify the code below to use DialoGPT instead of regular expression pattern matching. 

In [12]:
from nltk.chat.util import Chat

class DialoGPTChatbot(Chat):
    def __init__(self):
        # We don't need Eliza's pattern-matching pairs; however, it is useful
        # to keep a rule for ending the conversation.
        # The quit responses are taken from the Eliza chatbot.
        super().__init__([(
            r"quit",
            (
                "Thank you for talking with me.",
                "Good-bye.",
                "Thank you, that will be $150.  Have a good day!",
            ),
        ),], {})

    def respond(self, text):
        # Try the regular-expression rules first (here only the "quit" rule)
        resp_org = super().respond(text)
        
        if resp_org is None:
            #
            # code here to generate response from DialoGPT
            #
            resp = "text generated by DialoGPT"
        else:
            resp = resp_org
        return resp
    
    # Hold a conversation with a chatbot
    def converse(self, quit="quit"):
        # Change the code below if you want to keep a few more steps of chat history:
        user_input = ""
        while user_input != quit:
            user_input = quit
            try:
                user_input = input(">")
            except EOFError:
                print(user_input)
            # Strip trailing "!" and "."; rstrip also copes with input made only of punctuation
            user_input = user_input.rstrip("!.")
            if user_input:
                print(self.respond(user_input))

### Test the chatbot:

In [None]:
dg = DialoGPTChatbot()
dg.converse()

The final step is to wrap this in a standalone Python script (instead of a notebook) so that it can be run from a terminal.