Student: Dorin Doncenco

In [1]:
# ! pip install datasets pandas matplotlib scikit-learn transformers rouge evaluate tqdm

In [1]:
from tqdm.notebook import trange, tqdm # The progress bar

import torch # DeepLearning Framework
from torch import optim
from torch import nn
from torch.utils.data import Dataset, DataLoader

import numpy as np
import pandas as pd

from transformers import AutoTokenizer, AutoModelForCausalLM # Model repository
from datasets import load_dataset # Dataset Repository

# Generation for task oriented chatbot

<img src="media/dialogue_patient.png" style="width: 400px;"/>

The objective of this small project is to develop a small chatbot using information from the corpus.

## I. Getting started: try a naive generative model
<div style={width:10%}> In this first part we will try a naive model and "play" with it. The model is a simple transformer (based on the GPT-2 model); its objective is, given a user query, to answer it in natural language.</div><div><img src="media/transformer-block.png" alt="transformer architecture" style="width: 400px;"/></div>


**Let's start by loading the model:**

In [3]:
model = AutoModelForCausalLM.from_pretrained("ThomasGerald/wozchitchat")
tokenizer = AutoTokenizer.from_pretrained("ThomasGerald/wozchitchat")

Now we can generate text from an input with the model (try your own input):

In [4]:
text = "I would" # input text
tokenized_text = tokenizer(text, return_tensors='pt') # we tokenize the text
generated_token_ids = model.generate(**tokenized_text, do_sample=True,
                                     max_length=200, pad_token_id=model.config.eos_token_id) # we generate the text (sampled)
print(f'GENERATED_TEXT : {tokenizer.decode(generated_token_ids[0])}')

GENERATED_TEXT : I would like a place located on the west side please.[BOT]There are no restaurants matching your criteria. Would you like me to try a different type of food?<|endoftext|>


Notice that the model has been "adapted" using the following format:

**[USER]{user_input}[BOT]{answer_of_the_system}**

The model was trained to generate **{answer_of_the_system}**
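
The format above can be handled with two small string helpers. This is only a minimal sketch: the names `build_prompt` and `extract_answer` are hypothetical (they do not come from the model repository), and it assumes the decoded text contains the literal `[USER]`, `[BOT]` and `<|endoftext|>` markers, as in the generated example above.

```python
USER_TAG = "[USER]"
BOT_TAG = "[BOT]"
EOS = "<|endoftext|>"

def build_prompt(user_input):
    """Wrap a user query so that the model continues with the system answer."""
    return f"{USER_TAG}{user_input}{BOT_TAG}"

def extract_answer(generated_text):
    """Keep the text after the last [BOT] tag, cut at EOS or at a new [USER] turn."""
    if BOT_TAG not in generated_text:
        return ""  # the model never started an answer
    answer = generated_text.split(BOT_TAG)[-1]
    answer = answer.split(EOS)[0]
    answer = answer.split(USER_TAG)[0]
    return answer.strip()

prompt = build_prompt("I would like a place located on the west side please.")
decoded = prompt + "There are no restaurants matching your criteria.<|endoftext|>"
print(prompt)
print(extract_answer(decoded))
```

Splitting on the markers rather than slicing a fixed number of characters keeps the answer intact when generation stops at `max_length` without emitting `<|endoftext|>`.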

### I.1: Create an interactive interface following the previous format

Modify the following class to make an interactive chatbot using the previous model.

In [5]:
class InteractiveChat(object):
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def answer(self, current_input):
        ''' return the answer of the chatbot
        '''
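        # GPT-2 tokenizers define no pad token by default, so reuse the EOS token
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        # pass input_ids and attention_mask together instead of input_ids alone
        encoded = self.tokenizer(current_input, return_tensors='pt')
        output = self.model.generate(**encoded, max_length=200, do_sample=True, pad_token_id=self.tokenizer.pad_token_id)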
        return self.tokenizer.decode(output[0])

    def start(self):
        current_answer = "Start dialogue"
        current_input = ""
        while(current_input != 'exit'):
            current_input = input("Bot: "+current_answer + " \nUser: ")
            current_answer = self.answer(current_input)
            print("_"*50)
            print("Bot: "+current_answer)
            print("_"*50)

In [6]:
ichat = InteractiveChat(model, tokenizer)
ichat.start() # type exit if you want to stop the conversation

__________________________________________________
Bot: exitUSER]I would like to book a taxi, please.[BOT]Sure, there are many results. What day do you want to leave to and what time are you like?<|endoftext|>
__________________________________________________


You should obtain a dialogue like the following (not exactly the same):
```
User:  I'm looking for an hotel in center of cambridge for tonight
Bot: Might I suggest the the University Arms Hotel, is rated 4 stars and has an excellent reputation and is rated 3 stars. 
User:  How much is it?
Bot: The price range isn't listed. Is there another type of cuisine you might like?
```
However, not all answers are relevant!

**In the following, let us consider how to evaluate the model.**

## II. The MultiWOZ corpus

The Multi-domain Wizard-of-Oz (MultiWOZ) dataset is a large-scale human-human conversational corpus spanning over seven domains, containing 8438 multi-turn dialogues, with each dialogue averaging 14 turns. Different from existing standard datasets like WOZ and DSTC2, which contain less than 10 slots and only a few hundred values, MultiWOZ has 30 (domain, slot) pairs and over 4,500 possible values. The dialogues span seven domains: restaurant, hotel, attraction, taxi, train, hospital and police. 

### Objective 
* Look at the data ([link here](https://github.com/budzianowski/multiwoz) for the original repository)
* Evaluate the generative model
* Discuss what is missing for a complete chatbot
* Improve the generation: for this last part you are free to use any model you can run

In [7]:
# woz_dataset
woz_dataset = load_dataset("multi_woz_v22")
training_set = woz_dataset['train']
validation_set = woz_dataset['validation']
test_set = woz_dataset['test']

In [8]:
training_set[0]

{'dialogue_id': 'PMUL4398.json',
 'services': ['restaurant', 'hotel'],
 'turns': {'turn_id': ['0',
   '1',
   '2',
   '3',
   '4',
   '5',
   '6',
   '7',
   '8',
   '9',
   '10',
   '11'],
  'speaker': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
  'utterance': ['i need a place to dine in the center thats expensive',
   'I have several options for you; do you prefer African, Asian, or British food?',
   'Any sort of food would be fine, as long as it is a bit expensive. Could I get the phone number for your recommendation?',
   'There is an Afrian place named Bedouin in the centre. How does that sound?',
   'Sounds good, could I get that phone number? Also, could you recommend me an expensive hotel?',
   "Bedouin's phone is 01223367660. As far as hotels go, I recommend the University Arms Hotel in the center of town.",
   'Yes. Can you book it for me?',
   'Sure, when would you like that reservation?',
   'i want to book it for 2 people and 2 nights starting from saturday.',
   'Your booking 

In [9]:
training_set[10]['turns']['utterance']

['Yeah, could you recommend a good gastropub?',
 "Backstreet Bistro. It's expensive though. There is a moderately priced one called The Cow Pizza Kitchen and Bar if preferred.",
 'I would like to book a table at the Backstreet Bistro for 5 people at 16:00 on Thursday.',
 'No problem. That is booked for you and your reference number is 2VE84YC5 . Is there anything else I can book for you?',
 'Yes, any suggestions of museums found in the east area of town?',
 'Yes, Gallery at Twelve a High Street is excellent, and has free admission. Would you like more information?',
 'Can I please have the phone number and address for that place?',
 'Certainly! The address for gallery at twelve a high street is fulbourn and the phone number is 01223295264. Can I help you with anything else?',
 "Great that's all the information I needed today, thank you!",
 'You are very welcome. Have a nice day. Good bye.']

### II.1 Get all the tuples of the test set

Create a DataFrame with two columns: one containing the user query and the other containing the bot answer.

In [10]:
user_query = []
bot_answer = []

# create a dataframe with two columns : user_query and bot_answer
# user turns sit at even positions of a dialogue, bot turns at odd positions
for dialogue in test_set:
    utterances = dialogue['turns']['utterance']
    if len(utterances) % 2 != 0:
        raise ValueError("It might be that the amount of user queries and bot answers are not the same.")
    user_query.extend(utterances[0::2])
    bot_answer.extend(utterances[1::2])

df = pd.DataFrame({'user_query': user_query, 'bot_answer': bot_answer})

In [11]:
df

| | user_query | bot_answer |
|---|---|---|
| 0 | I need train reservations from norwich to camb... | I have 133 trains matching your request. Is th... |
| 1 | I'd like to leave on Monday and arrive by 18:00. | There are 12 trains for the day and time you r... |
| 2 | Before booking, I would also like to know the ... | There are 12 trains meeting your needs with th... |
| 3 | No hold off on booking for now. Can you help m... | Yes it is a cinema located in the south part o... |
| 4 | Yes, that was all I needed. Thank you very much! | Thank you for using our system. |
| ... | ... | ... |
| 7367 | A swimming pool sounds like much more fun, doe... | There are four pools, abbey pool and astroturf... |
| 7368 | Any one of those is fine. May I get the entran... | I'm, sorry, but the entrance fee is not listed... |
| 7369 | Yes. I am also looking for a train, leaving on... | TR5648 will be departing cambridge Friday at 1... |
| 7370 | That will work. Can I have this booking for si... | I've booked that. Your reference number is 0IC... |


### II.2 Generate the outputs for the user queries
Select the first 50 rows (if you have access to GPUs you can try to generate all answers) and generate a bot answer for each user_query.

In [12]:
df_top_50 = df.head(50)

#generate the bot answers
bot_answers = []
for i in range(len(df_top_50)):
    bot_answers.append(ichat.answer(df_top_50['user_query'][i]))

In [13]:
# extract only the text from the bot answers
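# cut at the end-of-text marker instead of dropping a fixed 13 characters, which would eat real text when no marker was generated
bot_answers_trimmed = [bot_answer.split("[BOT]")[1].split("<|endoftext|>")[0] if "[BOT]" in bot_answer else '' for bot_answer in bot_answers]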

In [14]:
# print the first 5 queries with their expected and generated answers
for i in range(5):
    print(f"Query: {df_top_50['user_query'][i]}")
    print(f"Expected answer: {df_top_50['bot_answer'][i]}")
    print(f"Generated answer: {bot_answers_trimmed[i]}")
    print("_"*50)

Query: I need train reservations from norwich to cambridge
Expected answer: I have 133 trains matching your request. Is there a specific day and time you would like to travel?
Generated answer: the TR6065 leaves at 2:50 and gets to cambridge at 2:07. does that work?
__________________________________________________
Query: I'd like to leave on Monday and arrive by 18:00.
Expected answer: There are 12 trains for the day and time you request. Would you like to book it now?
Generated answer: I have the cambridge passenger byard park and the hotel booked for you for that reservation.
__________________________________________________
Query: Before booking, I would also like to know the travel time, price, and departure time please.
Expected answer: There are 12 trains meeting your needs with the first leaving at 05:16 and the last one leaving at 16:16. Do you want to book one of these?
Generated answer: The first train leaves at 8:24, the price is 30.24 pounds. The travel time is 105 minut

### II.3 Evaluate the performance of the system
You can now evaluate the performance of the system on the generated samples. **You will try two metrics:**
* A first approach based on common words between the ground truth and the generation
* You are free to choose the second approach (BERTScore, ROUGE, BLEU)

In [15]:
import evaluate

In [16]:
sets_words_predicted = []
sets_words_real = []

for i in range(len(df_top_50)):
    # split(" ") never raises: an empty answer simply yields the set {''}
    sets_words_predicted.append(set(bot_answers_trimmed[i].split(" ")))
    sets_words_real.append(set(df_top_50['bot_answer'][i].split(" ")))


In [17]:
# compute percentage of common words between the predicted and the real answer for each query
# this approach is vulnerable to a model that outputs the entire corpus as an answer
common_percentage = []
for i in range(len(df_top_50)):
    common_percentage.append(len(sets_words_predicted[i].intersection(sets_words_real[i]))/len(sets_words_real[i]))


In [18]:
common_average = sum(common_percentage)/len(common_percentage)
print(f"The percentage of common words for all queries, on average, is: {common_average}")

The percentage of common words for all queries, on average, is: 0.190193836389502
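
The score computed above is a recall: it only asks how many reference words appear in the prediction, which is why a very long output can score well. A minimal sketch of one way to balance it with precision and F1 follows; the function name `overlap_scores` is hypothetical, and it splits on any whitespace (`split()`), which differs slightly from the `split(" ")` used in the cells above.

```python
def overlap_scores(predicted, reference):
    """Set-based word overlap: returns (precision, recall, f1)."""
    pred_words = set(predicted.split())
    ref_words = set(reference.split())
    common = pred_words & ref_words
    # recall: share of reference words found in the prediction (the metric used above)
    recall = len(common) / len(ref_words) if ref_words else 0.0
    # precision: share of predicted words that belong to the reference
    precision = len(common) / len(pred_words) if pred_words else 0.0
    f1 = 2 * precision * recall / (precision + recall) if common else 0.0
    return precision, recall, f1

reference = "Would you like to book it now?"
short_answer = "Would you like to book"
# a padded answer: the full reference followed by 20 unrelated words
padded_answer = reference + " " + " ".join(f"filler{i}" for i in range(20))

print(overlap_scores(short_answer, reference))
print(overlap_scores(padded_answer, reference))
```

The padded answer reaches a recall of 1 but a low precision, so its F1 drops below that of the short answer.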


In [19]:
# use rouge score to evaluate the model's answers
rouge = evaluate.load('rouge')

rouge_score = rouge.compute(predictions=bot_answers_trimmed, references=df_top_50['bot_answer'])

print(f"The rouge score is: {rouge_score}")

The rouge score is: {'rouge1': 0.203994287302528, 'rouge2': 0.06592189857370122, 'rougeL': 0.17681741319591132, 'rougeLsum': 0.17521746242719677}
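
To give an intuition for the `rougeL` value, here is a simplified sketch of a ROUGE-L style F1 based on the longest common subsequence (LCS) of tokens. It is an illustration only, not the implementation behind `evaluate.load('rouge')`: the function names are hypothetical, and tokenization is reduced to lowercasing and whitespace splitting, so the numbers will not match the library exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]

def rouge_l_f1(prediction, reference):
    """F1 of LCS-based precision and recall on whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("There are 12 trains. Would you like to book one?",
                 "There are 12 trains for the day and time you request. Would you like to book it now?"))
```

Unlike the set-based overlap, the LCS respects word order, so a prediction containing the right words in a scrambled order scores lower.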


## III. Improving performances

**It is now up to you to improve the model!**
* You are free to choose any architecture/model (even a pretrained one, to improve performance)
* You can add additional information to the input of the model
* The annex shows how the model was trained


# ANNEX: Training/Fine-Tuning Material

In [1]:
from torch.utils.data import Dataset

class WoZGenerationDataset(Dataset):
    def __init__(self, dataset, window_size=3):
        self.dataset = dataset
        self.window_size = window_size
        self.index = []
        for i, dial in enumerate(dataset):
            for j, speaker in enumerate(dial['turns']['speaker']):
                if speaker == 1:
                    self.index.append((i,j))
    def __len__(self):
        return len(self.index)

    def __getitem__(self, index):
        i, j = self.index[index]
        dial = self.dataset[i]['turns']['utterance']

        turns = dial[j-1] if(j!= 0) else ''
        answer = dial[j]
        return {'turns': turns,
                'answer': answer}



In [2]:
from tqdm.notebook import trange, tqdm # The progress bar

import torch # DeepLearning Framework
from torch import optim
from torch import nn
from torch.utils.data import Dataset, DataLoader

import numpy as np
import pandas as pd

from transformers import AutoTokenizer, AutoModelForCausalLM # Model repository
from datasets import load_dataset # Dataset Repository

In [3]:
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
tokenizer.add_special_tokens({'pad_token': '<|endoftext|>'})
model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
model.resize_token_embeddings(len(tokenizer))

Embedding(50257, 768)

## Implement the dataset module

Create an object whose parent is `torch.utils.data.Dataset` and that returns the previous turn and the answer for each system turn of the dataset.

In [4]:
# a collate function is a plain callable; it does not need to inherit from Dataset
class DialogueCollator:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
    def __call__(self, data):
        input_tokens = self.tokenizer(['[USER]' + d['turns'] + "[BOT]" + d['answer'] for d in data],
                                 return_tensors='pt', return_length=True, padding=True)
        return {
            'input_ids': input_tokens.input_ids,
            'attention_mask': input_tokens.attention_mask
        }


In [5]:
from tqdm.notebook import trange, tqdm
from torch import optim
from torch import nn


class Trainer():
    def __init__(self, model, padding_idx=100):
        self.model = model
        self.optimizer = None

    def at_training_start(self, learning_rate = 1e-3):
        self.optimizer = optim.Adam(self.model.parameters(), lr=learning_rate)
        self.criterion = nn.CrossEntropyLoss(ignore_index=50257)

    def validation_step(self, data):
        y_pred = self.model(**data)
        y_truth = data["input_ids"][:, 1:].flatten()

        with torch.no_grad():
            loss_reconstruction = self.criterion(y_pred.logits[:,:-1].reshape(y_truth.shape[0], -1), y_truth)
        return loss_reconstruction.item()

    def training_step(self, data):
        print("computing y_pred")
        y_pred = self.model(**data)
        print("computed y_pred")
        y_truth = data["input_ids"][:, 1:].flatten()
        print("computed y_truth")
        loss_reconstruction = self.criterion(y_pred.logits[:,:-1].reshape(y_truth.shape[0], -1), y_truth)
        print("computed loss")
        (loss_reconstruction).backward()
        print("backpropagated")
        return loss_reconstruction.item()

    def on_validation_end(self, resp):
        print(f"Validation loss is {resp}")

    def validation(self, validation_dl):
        self.model.eval()
        loss_buffer = []
        with torch.no_grad():
            for data in validation_dl:
                loss_buffer.append(self.validation_step(data))
        self.on_validation_end(np.mean(loss_buffer))
        self.model.train()

    def fit(self,
            training_dl,
            validation_dl,
            learning_rate = 1e-3,
            validation_frequency = 8,
            max_iter = 10000,
            use_gpu=False,

        ):
        if(use_gpu):
          self.model = self.model.cuda()
          print("using gpu")
        self.at_training_start(learning_rate)
        iter_count = 0
        loss_buffer = []
        pbar = trange(max_iter)

        step = 0
        while(iter_count < max_iter):
            for data in training_dl:
                if use_gpu:
                    data = {k:v.cuda() for k, v in data.items()}
                    print("used gpu")
                self.optimizer.zero_grad()
                print("zeroed grad")
                loss_buffer += [self.training_step(data)]
                print("added loss to buffer")
                self.optimizer.step()
                print("step = %d"%step)
                step += 1
                if step > 100:
                    return

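                if(iter_count  % validation_frequency == 0):
                    print("Loss at iteration %s is %s"%(iter_count, np.mean(loss_buffer)))
                    # validation_dl may be None (as in the call below); skip validation in that case
                    if validation_dl is not None:
                        self.validation(validation_dl)
                    loss_buffer = []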
                iter_count += 1
                pbar.update(1)
                if(iter_count >= max_iter):
                  break

In [6]:
dataset = load_dataset("multi_woz_v22")

training_set = WoZGenerationDataset(dataset['train'])
collator = DialogueCollator(tokenizer)
training_dl = DataLoader(training_set, batch_size=32, shuffle=True, collate_fn=collator, num_workers=1)

In [17]:
for i, data in enumerate(training_dl):
    print("data at point %d" %i)
    print(data)
    if i > 3:
        break

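I am not sure why the dataloader is not working, but it seems I cannot iterate over it. One possible cause (not verified here) is the worker process started by `num_workers=1`: worker processes can hang or fail in notebooks when the dataset and collator classes are defined in the notebook itself, whereas `num_workers=0` loads batches in the main process.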

In [13]:
use_gpu = False
my_trainer = Trainer(model)
my_trainer.fit(training_dl, None, validation_frequency=250, use_gpu=use_gpu, max_iter=1000)

using gpu


  0%|          | 0/1000 [00:00<?, ?it/s]

In [21]:
class Chatbot(object):
  def __init__(self):
    pass

  def answer(self, current_input):
    return "Not Implemented"

  def start(self):
    current_answer = "Start dialogue"
    current_input = ""
    while(current_input != 'exit'):
      current_input = input("Bot: "+current_answer + " \nUser: ")
      current_answer = self.answer(current_input)

class ChitChat(Chatbot):
  def __init__(self, model, tokenizer, collator, history_len = 1):
    self.model = model
    self.tokenizer = tokenizer
    self.utterance = []
    self.hlen = history_len

  def answer(self, current_input):
    self.utterance.append('[USER]'+current_input)
    tokenized_text = self.tokenizer(''.join(self.utterance[max(0, len(self.utterance) - self.hlen): ]), return_tensors='pt')
    generated_token_ids = self.model.generate(**tokenized_text, do_sample=True, max_length=200, pad_token_id=self.model.config.eos_token_id)[0]
    # cut at the end-of-text marker rather than slicing off a fixed number of characters
    answer = self.tokenizer.decode(generated_token_ids).split('[BOT]')[-1].split('<|endoftext|>')[0]
    self.utterance.append('[BOT]'+answer)
    return answer


In [22]:
cb = ChitChat(model.cpu(), tokenizer, collator, history_len=1)

In [43]:
model = AutoModelForCausalLM.from_pretrained("ThomasGerald/wozchitchat")
tokenizer = AutoTokenizer.from_pretrained("ThomasGerald/wozchitchat")

In [47]:
class ChitChat(Chatbot):
  def __init__(self, model, tokenizer, history_len = 1):
    self.model = model
    self.tokenizer = tokenizer
    self.utterance = []
    self.hlen = history_len

  def answer(self, current_input):
    self.utterance.append('[USER]'+current_input)
    tokenized_text = self.tokenizer(''.join(self.utterance[max(0, len(self.utterance) - self.hlen): ]), return_tensors='pt')
    generated_token_ids = self.model.generate(**tokenized_text, do_sample=True, max_length=200, pad_token_id=self.model.config.eos_token_id)[0]
    # cut at the end-of-text marker rather than slicing off a fixed number of characters
    answer = self.tokenizer.decode(generated_token_ids).split('[BOT]')[-1].split('<|endoftext|>')[0].split('[USER]')[0]
    self.utterance.append('[BOT]'+answer)
    return answer

In [50]:
cb = ChitChat(model.cpu(), tokenizer, history_len=1)
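
The `history_len` argument counts stored utterances (user and bot turns alike), not user/bot exchanges. The slicing used in `ChitChat.answer` can be sketched on plain strings as follows; the helper name `build_history_prompt` and the sample utterances are hypothetical and only serve the illustration.

```python
def build_history_prompt(utterances, history_len):
    """Join the last `history_len` stored utterances, mirroring the slice in ChitChat.answer."""
    start = max(0, len(utterances) - history_len)
    return "".join(utterances[start:])

utterances = ["[USER]I need a hotel", "[BOT]Which area?", "[USER]The centre"]

# history_len=1 keeps only the current user turn
print(build_history_prompt(utterances, 1))
# history_len=3 also includes the previous user turn and the bot reply
print(build_history_prompt(utterances, 3))
```

With `history_len=1`, as in the cell above, the model therefore never sees earlier turns of the conversation.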

In [51]:
cb.start()

Bot: Start dialogue 
User:  I'm looking for an hotel in center of cambridge for tonight
Bot: Might I suggest the the University Arms Hotel, is rated 4 stars and has an excellent reputation and is rated 3 stars. 
User:  How much is it?
Bot: The price range isn't listed. Is there another type of cuisine you might like? 
User:  exit
