# Chatbots
In this notebook, the following chatbots will be tested and compared:



*   Microsoft's DialoGPT (Medium)
*   Microsoft's DialoGPT (Large)
*   Facebook's Blenderbot (1B)
*   Facebook's Blenderbot (400M)
*   Facebook's Blenderbot (90M)

For this test, we will compare the responses of the five chatbots. The test will determine the best chatbot based on response time, performance (does it return a different answer each time, or does it get stuck in a loop), and the complexity of the answers it returns. For this project we don't want answers that are too complicated, because our own emotion response will need to be added to the chatbot's response.
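
The three criteria above can be made measurable with a few small helpers. The following is only a minimal sketch, not the code used in this notebook: the function names `timed_reply`, `count_repeats` and `count_sentences` are hypothetical, and counting sentence-like chunks is just one rough stand-in for "complexity".

```python
import re
import time


def timed_reply(generate_reply, text):
    """Call generate_reply(text) and return (reply, seconds elapsed)."""
    start = time.perf_counter()
    reply = generate_reply(text)
    return reply, time.perf_counter() - start


def count_repeats(replies):
    """Count replies that are identical to the reply directly before them."""
    return sum(1 for prev, cur in zip(replies, replies[1:]) if prev == cur)


def count_sentences(reply):
    """Rough complexity measure: number of sentence-like chunks in a reply."""
    chunks = re.split(r"[.!?]+", reply)
    return sum(1 for chunk in chunks if chunk.strip())


# Stand-in "chatbot" (hypothetical) so the helpers can be tried without a model.
reply, seconds = timed_reply(str.upper, "hello")
print(reply, seconds >= 0)
print(count_repeats(["a", "b", "b", "b"]))
print(count_sentences("I love cake! It's one of my favorite desserts. What about you?"))
```

A high repeat count would indicate a bot that is stuck in a loop, and a high sentence count a bot whose answers are more elaborate than this project needs.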



Import necessary libraries

In [1]:
!pip install transformers
!pip install torch
!pip install openai
!pip install tqdm



In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration, BlenderbotSmallForConditionalGeneration
from tqdm import tqdm
import torch

Import the models

In [3]:
dialo_m_model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

In [4]:
dialo_m_tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium", padding_side='left')

In [6]:
for step in tqdm(range(len(texts))):
    # encode the new user input, append the eos_token and return a PyTorch tensor
    print("User: ", texts[step])
    new_user_input_ids = dialo_m_tokenizer.encode(texts[step] + dialo_m_tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response while capping the total sequence (history plus reply) at 1000 tokens
    chat_history_ids = dialo_m_model.generate(bot_input_ids, max_length=1000, pad_token_id=dialo_m_tokenizer.eos_token_id)

    # decode and print only the newly generated tokens, i.e. the bot's reply
    print("DialoGPT: {}".format(dialo_m_tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

  0%|          | 0/10 [00:00<?, ?it/s]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


User:  Hello, my name is Brandon


 10%|█         | 1/10 [00:01<00:16,  1.81s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: Hello Brandon, my name is Brandon.
User:  How are you doing


 20%|██        | 2/10 [00:03<00:13,  1.69s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I'm doing well, how are you?
User:  I am very angry


 30%|███       | 3/10 [00:04<00:09,  1.41s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I'm sorry
User:  Do you like cake


 40%|████      | 4/10 [00:05<00:07,  1.28s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I do
User:  I am from the Netherlands


 50%|█████     | 5/10 [00:07<00:06,  1.37s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I am from the Netherlands
User:  Are you a doctor


 60%|██████    | 6/10 [00:08<00:05,  1.48s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I am a doctor
User:  I am happy with your response


 70%|███████   | 7/10 [00:10<00:04,  1.64s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I am happy with your response
User:  What is 2 + 2


 80%|████████  | 8/10 [00:13<00:03,  1.85s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I am happy with your response
User:  I like turtles


 90%|█████████ | 9/10 [00:15<00:02,  2.04s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I am happy with your response
User:  Do you recognize emotions


100%|██████████| 10/10 [00:18<00:00,  1.82s/it]

DialoGPT: I am happy with your response





In [5]:
dialo_l_tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large", padding_side='left')


In [6]:
dialo_l_model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")

Downloading pytorch_model.bin:   0%|          | 0.00/1.75G [00:00<?, ?B/s]


In [7]:
blender_1_tokenizer = BlenderbotTokenizer.from_pretrained("facebook/blenderbot-1B-distill")


In [None]:
blender_1_model = BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-1B-distill")

Downloading pytorch_model.bin:   0%|          | 0.00/2.87G [00:00<?, ?B/s]

In [None]:
blender_4_tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")

In [None]:
blender_4_model = BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-400M-distill")

In [None]:
s_model = BlenderbotSmallForConditionalGeneration.from_pretrained("facebook/blenderbot_small-90M", add_cross_attention=False)
s_tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot_small-90M")

For this test we will use 10 simple sentences, consisting of simple introductions, expressions of emotion, or questions, to see how the chatbots respond.

In [5]:
texts = [
    "Hello, my name is Brandon",
    "How are you doing",
    "I am very angry",
    "Do you like cake",
    "I am from the Netherlands",
    "Are you a doctor",
    "I am happy with your response",
    "What is 2 + 2",
    "I like turtles",
    "Do you recognize emotions"
]

The DialoGPT Medium model

According to this [paper](https://arxiv.org/pdf/1911.00536.pdf), the chatbot should give a better response. However, it appears to get stuck in a loop at some point: from the seventh input onward it returns "I am happy with your response" every time. Also, after the second sentence it appears to just return the user's input, either slightly reworded or exactly as given. The decoder warning seems to be spurious: even with `padding_side='left'` set when initializing the tokenizer, the warning keeps appearing.
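
One possible explanation for the persistent warning, offered here as an assumption about the library's behaviour and not something verified in this notebook: the generation code may simply check whether the last token of the prompt equals the pad token. Because every user input has the eos token appended and `pad_token_id` is set to the eos token id, that check would fire regardless of `padding_side`. The sketch below mimics such a check in plain Python; the function name, the eos id `50256` and the token ids are illustrative assumptions.

```python
def triggers_padding_warning(input_ids, pad_token_id):
    """Return True if the last token of the prompt equals the pad token."""
    return len(input_ids) > 0 and input_ids[-1] == pad_token_id


EOS_ID = 50256  # assumed eos token id (GPT-2 vocabulary); also used as the pad id

user_tokens = [15496, 11, 616]   # made-up token ids standing in for a user message
prompt = user_tokens + [EOS_ID]  # the notebook appends eos_token to every input

print(triggers_padding_warning(prompt, pad_token_id=EOS_ID))       # prompt ends in eos
print(triggers_padding_warning(user_tokens, pad_token_id=EOS_ID))  # no trailing eos
```

If this is indeed the cause, the warning would be harmless here, since the trailing token is a deliberate end-of-turn marker and not padding.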

DialoGPT Large model

In [None]:
for step in tqdm(range(len(texts))):
    # encode the new user input, append the eos_token and return a PyTorch tensor
    print("User: ", texts[step])
    new_user_input_ids = dialo_l_tokenizer.encode(texts[step] + dialo_l_tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response while capping the total sequence (history plus reply) at 1000 tokens
    chat_history_ids = dialo_l_model.generate(bot_input_ids, max_length=1000, pad_token_id=dialo_l_tokenizer.eos_token_id)

    # decode and print only the newly generated tokens, i.e. the bot's reply
    print("DialoGPT: {}".format(dialo_l_tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

  0%|          | 0/10 [00:00<?, ?it/s]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


User:  Hello, my name is Brandon


 10%|█         | 1/10 [00:10<01:32, 10.28s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: Hi Brandon
User:  How are you doing


 20%|██        | 2/10 [00:12<00:45,  5.65s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I'm doing great, thank you for asking.
User:  I am very angry


 30%|███       | 3/10 [00:15<00:28,  4.14s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I am very angry at you
User:  Do you like cake


 40%|████      | 4/10 [00:16<00:19,  3.21s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I like cake
User:  I am from the Netherlands


 50%|█████     | 5/10 [00:19<00:15,  3.07s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I am from the Netherlands and I like cake
User:  Are you a doctor


 60%|██████    | 6/10 [00:22<00:12,  3.03s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I am a doctor and I like cake
User:  I am happy with your response


 70%|███████   | 7/10 [00:25<00:08,  2.98s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I am happy with your response
User:  What is 2 + 2


 80%|████████  | 8/10 [00:29<00:06,  3.21s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I am happy with your response
User:  I like turtles


 90%|█████████ | 9/10 [00:32<00:03,  3.27s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


DialoGPT: I am happy with your response
User:  Do you recognize emotions


100%|██████████| 10/10 [00:36<00:00,  3.62s/it]

DialoGPT: I am happy with your response





The large version of DialoGPT behaves much like the medium version. It gets stuck at the same point (the seventh input) and from there returns the same answer as the medium version, while taking roughly twice as long per response (about 3.6 s versus 1.8 s).

Blenderbot 1B model

In [None]:
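for text in tqdm(texts):
  # each input is sent on its own, without any chat history
  print(text)
  inputs = blender_1_tokenizer([text], return_tensors="pt")
  reply_ids = blender_1_model.generate(**inputs)
  # decode the generated ids; special tokens such as <s> and </s> are kept in the output
  print(blender_1_tokenizer.batch_decode(reply_ids))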

  0%|          | 0/10 [00:00<?, ?it/s]

Hello, my name is Brandon


 10%|█         | 1/10 [00:27<04:06, 27.38s/it]

['<s> Hi brandon, nice to meet you. What do you like to do in your spare time?</s>']
How are you doing


 20%|██        | 2/10 [00:46<03:00, 22.50s/it]

["<s> I'm doing well, thank you. How about yourself? What do you do for fun?</s>"]
I am very angry


 30%|███       | 3/10 [01:06<02:30, 21.46s/it]

["<s> I'm sorry to hear that. Why are you so angry? I hope it gets better for you.</s>"]
Do you like cake


 40%|████      | 4/10 [01:29<02:10, 21.82s/it]

["<s> I love cake! It's one of my favorite desserts. What about you?</s>"]
I am from the Netherlands


 50%|█████     | 5/10 [01:50<01:48, 21.65s/it]

["<s> I've always wanted to go there. I hear it's beautiful. What do you like about it?</s>"]
Are you a doctor


 60%|██████    | 6/10 [02:17<01:33, 23.33s/it]

['<s> No, I am not a doctor, but I have a lot of experience with medicine.</s>']
I am happy with your response


 70%|███████   | 7/10 [02:36<01:05, 21.96s/it]

['<s> I am glad to hear that.  What did you do to make you feel that way?</s>']
What is 2 + 2


 80%|████████  | 8/10 [02:58<00:44, 22.18s/it]

['<s> Two plus two is a math equation that is used to calculate the value of a number.</s>']
I like turtles


 90%|█████████ | 9/10 [03:23<00:23, 23.11s/it]

['<s> Turtles are cool. They are a member of the order Cnidaria.</s>']
Do you recognize emotions


100%|██████████| 10/10 [03:43<00:00, 22.31s/it]

['<s> Of course I do. Emotions are a complex interaction between the brain and the body.</s>']





Blenderbot 1B responds much better than the DialoGPT bots: its responses are more closely related to the user inputs. The answers are fairly complicated, however, since the direct response is almost always followed by an extra sentence or question. The time it takes to produce a response is also high, at roughly 22 seconds per input.
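
Since the extra sentence or question is what makes these answers more complicated than needed, one option is to keep only the first sentence of each reply. The helper below is a minimal sketch of that idea and not necessarily the approach taken later in this notebook; `first_sentence` and `SPECIAL_TOKENS` are hypothetical names, and the marker strings are the ones visible in the outputs of this notebook.

```python
import re

# Start/end markers as they appear in the decoded Blenderbot outputs.
SPECIAL_TOKENS = ("<s>", "</s>", "__start__", "__end__")


def first_sentence(reply):
    """Strip the start/end markers and keep only the first sentence of a reply."""
    for token in SPECIAL_TOKENS:
        reply = reply.replace(token, "")
    reply = reply.strip()
    # Shortest prefix ending in ., ! or ? that is followed by whitespace or the end.
    match = re.match(r".+?[.!?](?=\s|$)", reply)
    return match.group(0) if match else reply


print(first_sentence("<s> I love cake! It's one of my favorite desserts. What about you?</s>"))
print(first_sentence("__start__ i've never been to the netherlands, but i'd love to visit one day. __end__"))
```

Passing `skip_special_tokens=True` to `batch_decode` should also drop the start and end markers, which would make the explicit stripping step unnecessary.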

Blenderbot 400M model

In [None]:
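for text in tqdm(texts):
  # each input is sent on its own, without any chat history
  print(text)
  inputs = blender_4_tokenizer([text], return_tensors="pt")
  reply_ids = blender_4_model.generate(**inputs)
  # decode the generated ids; special tokens such as <s> and </s> are kept in the output
  print(blender_4_tokenizer.batch_decode(reply_ids))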

  0%|          | 0/10 [00:00<?, ?it/s]

Hello, my name is Brandon


 10%|█         | 1/10 [00:07<01:06,  7.43s/it]

["<s> Hi, I'm bob. Nice to meet you. What do you do for a living?</s>"]
How are you doing


 20%|██        | 2/10 [00:13<00:52,  6.60s/it]

["<s> I'm doing well, thank you. I hope you are as well. What are you up to?</s>"]
I am very angry


 30%|███       | 3/10 [00:19<00:43,  6.23s/it]

["<s> I'm sorry to hear that. Why are you so angry? Did something happen to you?</s>"]
Do you like cake


 40%|████      | 4/10 [00:24<00:35,  5.89s/it]

["<s> Yes, I love cake. It's my favorite dessert. What's your favorite?</s>"]
I am from the Netherlands


 50%|█████     | 5/10 [00:30<00:30,  6.00s/it]

["<s> I've always wanted to visit there. I've heard it's beautiful. What do you like most about it?</s>"]
Are you a doctor


 60%|██████    | 6/10 [00:35<00:22,  5.62s/it]

['<s> I am not a doctor, but I do have a degree in biochemistry. </s>']
I am happy with your response


 70%|███████   | 7/10 [00:42<00:17,  5.96s/it]

["<s> Thank you, I appreciate that.  It's been a long time since I've been this happy.</s>"]
What is 2 + 2


 80%|████████  | 8/10 [00:48<00:11,  5.97s/it]

["<s> 2 and 2 are the same number in a number of numbers.  It's a good number.</s>"]
I like turtles


 90%|█████████ | 9/10 [00:57<00:06,  6.90s/it]

['<s> Turtles are pretty cool. They are a member of the order Cingulata.</s>']
Do you recognize emotions


100%|██████████| 10/10 [01:03<00:00,  6.31s/it]

['<s> Yes, I do.  I am a very emotional person.  Do you know anyone like that?</s>']





Blenderbot 400M responds in much the same way as the 1B version. However, this version is considerably faster than the 1B version, at about 6.3 s per response versus 22.3 s, which makes it roughly 3.5 times faster.

Blenderbot 90M model

In [None]:
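for text in tqdm(texts):
  # each input is sent on its own, without any chat history
  print(text)
  inputs = s_tokenizer([text], return_tensors="pt")
  reply_ids = s_model.generate(**inputs)
  # decode the generated ids; the __start__ and __end__ markers are kept in the output
  print(s_tokenizer.batch_decode(reply_ids))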

  0%|          | 0/10 [00:00<?, ?it/s]

Hello, my name is Brandon


 10%|█         | 1/10 [00:01<00:16,  1.83s/it]

['__start__ hi brandon, nice to meet you. what do you like to do in your spare time? __end__']
How are you doing


 20%|██        | 2/10 [00:03<00:13,  1.70s/it]

["__start__ i'm doing well, thank you. what about you? what are you up to? __end__"]
I am very angry


 30%|███       | 3/10 [00:05<00:12,  1.74s/it]

["__start__ i'm sorry to hear that. what did you do to make yourself feel better about it? __end__"]
Do you like cake


 40%|████      | 4/10 [00:06<00:09,  1.67s/it]

["__start__ i love cake! it's one of my favorite things to eat. what about you? __end__"]
I am from the Netherlands


 50%|█████     | 5/10 [00:08<00:08,  1.73s/it]

["__start__ i've never been to the netherlands, but i'd love to visit one day. __end__"]
Are you a doctor


 60%|██████    | 6/10 [00:10<00:06,  1.70s/it]

["__start__ no, i'm not a doctor, but i've always wanted to be one. __end__"]
I am happy with your response


 70%|███████   | 7/10 [00:11<00:04,  1.67s/it]

["__start__ thank you. i'm glad you are happy with your response. i hope you have a great day! __end__"]
What is 2 + 2


 80%|████████  | 8/10 [00:13<00:03,  1.80s/it]

["__start__ it's two plus two, which means two plus one, which is one plus two. __end__"]
I like turtles


 90%|█████████ | 9/10 [00:15<00:01,  1.80s/it]

["__start__ what kind of turtles do you like? i've never seen one in person, but i have seen them in movies. __end__"]
Do you recognize emotions


100%|██████████| 10/10 [00:18<00:00,  1.82s/it]

["__start__ yes, i do. i think it's because i'm so intro__unk__. __end__"]





Blenderbot 90M is faster still than the 400M, with a response time of about 1.8 seconds per input.
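
To put the timings side by side, the short sketch below collects the final seconds-per-iteration figures from the progress bars above and computes the relative speed-ups. The dictionary labels and the `speedup` helper are hypothetical names, and the figures are single-run measurements from this notebook, so they should be read as rough indications only.

```python
# Final seconds-per-iteration values reported by tqdm in the runs above.
seconds_per_reply = {
    "DialoGPT medium": 1.82,
    "DialoGPT large": 3.62,
    "Blenderbot 1B": 22.31,
    "Blenderbot 400M": 6.31,
    "Blenderbot 90M": 1.82,
}


def speedup(slow, fast):
    """How many times faster `fast` is than `slow`, based on seconds per reply."""
    return seconds_per_reply[slow] / seconds_per_reply[fast]


# Print the models from fastest to slowest.
for name, secs in sorted(seconds_per_reply.items(), key=lambda item: item[1]):
    print(f"{name:16s} {secs:6.2f} s/reply")

print(round(speedup("Blenderbot 1B", "Blenderbot 400M"), 1))
print(round(speedup("Blenderbot 400M", "Blenderbot 90M"), 1))
```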

For the project we will use Blenderbot 90M, because it has the fastest response and the least complicated answers of the three Blenderbot models. We will try to simplify the responses further in the code below.