Chatbot with Transformer Architecture  
*Johnathan Long*  

[Open in Google Colab](https://colab.research.google.com/drive/1xw4pGj4_e7_PrKxPEas6LZ-hyQvNem6F?authuser=1#scrollTo=4saFZfCfWWHB)
___

**Introduction**  
In this project, we developed a chatbot that engages in multi-turn conversations, tracks context, and handles a range of topics using deep learning. The chatbot is built on a Transformer-based model, DialoGPT-medium (a GPT-2-style model), and fine-tuned on the Ubuntu Dialogue Corpus, a dataset of real-world technical support conversations. We tuned the training and generation parameters to improve dialogue quality and relevance, and built a web interface with Gradio for interacting with the chatbot. The goal of the project was to create a versatile conversational agent that maintains conversation flow, adapts to different contexts, and provides coherent, engaging responses.

In [3]:
#pip install gradio

In [4]:
#pip install datasets

In [5]:
import numpy as np
import pandas as pd
import torch
from datasets import Dataset

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [6]:
# Mount Google Drive to access the dataset and saved checkpoints

from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


**Load Dataset**

Ubuntu Dialogue Corpus is a large dataset containing nearly one million two-person conversations extracted from Ubuntu chat logs, primarily focused on providing technical support for Ubuntu-related issues. The dataset includes over 100 million words spread across 930,000 dialogues with an average of 8 turns per conversation. Given the size of the full dataset, we opted to use a subset for training, specifically the file *dialogueText_196*, to manage processing time. Key data columns such as dialogueID, date, from, to, and text were extracted to maintain conversation context and group dialogues effectively.

> After loading the file, we keep a subset of the rows and print the first few.

In [7]:
dataset_df_original = pd.read_csv('/content/drive/MyDrive/Datasets/UbuntuDataset/Ubuntu-dialogue-corpus/dialogueText_196.csv')

In [8]:
dataset_df = dataset_df_original.iloc[:500000] # Use only a subset of the training dataset

In [9]:
# View the first few rows of the dataset
print("Original Dataset:")
dataset_df.head()

Original Dataset:


Unnamed: 0,folder,dialogueID,date,from,to,text
0,301,1.tsv,2004-11-23T11:49:00.000Z,stuNNed,,any ideas why java plugin takes so long to load?
1,301,1.tsv,2004-11-23T11:49:00.000Z,crimsun,stuNNed,java 1.4?
2,301,1.tsv,2004-11-23T11:49:00.000Z,stuNNed,crimsun,yes
3,301,1.tsv,2004-11-23T11:49:00.000Z,crimsun,stuNNed,java 1.5 loads _much_ faster
4,301,1.tsv,2004-11-23T11:50:00.000Z,stuNNed,crimsun,noneus: how can i get 1.5 is there a .deb some...


>We preprocess the data here by dropping rows with missing values, normalizing the text, and removing duplicates. We then keep only two-speaker dialogues, merge consecutive messages from the same speaker, and convert the processed data to a Hugging Face dataset.

In [None]:
from datasets import Dataset
from transformers import GPT2LMHeadModel, GPT2Tokenizer, Trainer, TrainingArguments
import pandas as pd
# Preprocess the dataset (remove NaNs, normalize text, and merge messages)
def preprocess_conversations(df):
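    # Copy so the column assignment below does not write to a view of the original frame
    df_clean = df.dropna(subset=['text', 'from', 'dialogueID', 'to', 'date']).copy()
    # Lowercase, strip punctuation, and trim surrounding whitespace
    df_clean['text'] = df_clean['text'].str.lower().str.replace(r'[^\w\s]', '', regex=True).str.strip()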
    df_clean = df_clean.drop_duplicates(subset=['text', 'dialogueID', 'to', 'from'])
    return df_clean

# Filter dialogues with exactly two participants
def filter_participant_count(df, min_participants=2, max_participants=2):
    return df.groupby('dialogueID').filter(lambda x: min_participants <= x['from'].nunique() <= max_participants)

# Merge consecutive messages from the same speaker
def merge_messages(df):
    merged_dialogues = []
    for dialogue_id, group in df.groupby('dialogueID'):
        group = group.sort_values(by=['dialogueID', 'date']).reset_index(drop=True)
        conversation = []
        current_speaker = group.loc[0, 'from']
        current_message = group.loc[0, 'text']
        for i in range(1, len(group)):
            if group.loc[i, 'from'] == current_speaker:
                current_message += " " + group.loc[i, 'text']
            else:
                conversation.append(current_message)
                current_speaker = group.loc[i, 'from']
                current_message = group.loc[i, 'text']
        conversation.append(current_message)
        merged_dialogue = '<|endoftext|>'.join(conversation)
        merged_dialogues.append(merged_dialogue)
    return merged_dialogues

# Apply preprocessing, filtering, and merging
df_clean = preprocess_conversations(dataset_df)
filtered_df = filter_participant_count(df_clean)
conversations = merge_messages(filtered_df)

# Convert the processed conversations into a Hugging Face Dataset
dataset = Dataset.from_dict({'text': conversations})
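
As a rough illustration of the merge step, the sketch below collapses consecutive messages from the same speaker and joins the resulting turns with the end-of-text separator. It works on a plain list of `(speaker, text)` tuples rather than the DataFrame; the function name `merge_turns`, the speaker names, and the sample dialogue are made up for the example.

```python
from itertools import groupby

EOS = "<|endoftext|>"  # separator placed between turns in the notebook

def merge_turns(messages):
    """Collapse consecutive messages by the same speaker into one turn.

    `messages` is a list of (speaker, text) tuples in chronological order.
    """
    turns = []
    for _speaker, group in groupby(messages, key=lambda m: m[0]):
        turns.append(" ".join(text for _, text in group))
    return EOS.join(turns)

# Hypothetical sample dialogue, already lowercased and stripped of punctuation
sample = [
    ("alice", "any ideas why java plugin takes so long to load"),
    ("bob", "java 14"),
    ("alice", "yes"),
    ("alice", "how can i get 15"),
]
print(merge_turns(sample))
```

Note that stripping punctuation turns a version string such as "1.4" into "14", which may matter for technical support text where version numbers carry meaning.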



>We can visualize the preprocessing here to confirm the text is in a suitable format for tokenization and training.

In [None]:
# View the first few rows of the dataset
print("Original Dataset:")
print(dataset_df.head())

# Check after preprocessing
print("Cleaned Dataset:")
print(df_clean.head())

Original Dataset:
   folder dialogueID                      date     from       to  \
0     301      1.tsv  2004-11-23T11:49:00.000Z  stuNNed      NaN   
1     301      1.tsv  2004-11-23T11:49:00.000Z  crimsun  stuNNed   
2     301      1.tsv  2004-11-23T11:49:00.000Z  stuNNed  crimsun   
3     301      1.tsv  2004-11-23T11:49:00.000Z  crimsun  stuNNed   
4     301      1.tsv  2004-11-23T11:50:00.000Z  stuNNed  crimsun   

                                                text  
0   any ideas why java plugin takes so long to load?  
1                                          java 1.4?  
2                                                yes  
3                       java 1.5 loads _much_ faster  
4  noneus: how can i get 1.5 is there a .deb some...  
Cleaned Dataset:
   folder dialogueID                      date     from       to  \
1     301      1.tsv  2004-11-23T11:49:00.000Z  crimsun  stuNNed   
2     301      1.tsv  2004-11-23T11:49:00.000Z  stuNNed  crimsun   
3     301      1.tsv  

In [None]:
from collections import Counter

def get_keyword_frequency(df, keywords):
    all_text = " ".join(df['text'])
    word_counts = Counter(word.lower() for word in all_text.split() if word.lower() in keywords)
    return word_counts

# Example usage
keywords = ['service', 'go', 'my', 'your', 'run', 'it', 'what', 'when', 'why', 'how', 'i', 'you', 'program', 'system', 'ubuntu']
keyword_counts = get_keyword_frequency(df_clean, keywords)

print(keyword_counts)

Counter({'i': 130943, 'you': 110626, 'it': 101913, 'what': 31951, 'your': 24216, 'my': 20015, 'ubuntu': 14801, 'how': 14352, 'when': 10960, 'run': 8222, 'why': 6141, 'system': 6043, 'go': 5756, 'program': 1810, 'service': 536})


**Tokenization**

The GPT2Tokenizer is loaded from the DialoGPT-medium model. The `pad_token` is set to the `eos_token` (end-of-sequence token). Then the GPT2LMHeadModel is loaded for fine-tuning. A custom encode function is defined to tokenize the dataset, truncating texts to a maximum length of 128 tokens and ensuring that padding tokens are properly handled in the labels. The encode function is then applied to the entire dataset using Hugging Face's map method to batch-process the data. This prepares the dataset for model fine-tuning by aligning the input-output pairs.

In [None]:
from transformers import GPT2Tokenizer

# Load the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('microsoft/DialoGPT-medium')
tokenizer.pad_token = tokenizer.eos_token


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
model = GPT2LMHeadModel.from_pretrained('microsoft/DialoGPT-medium')

In [None]:
def encode(examples):
    encoded = tokenizer(examples['text'], truncation=True, padding='max_length', max_length=128)
    encoded['labels'] = encoded['input_ids'].copy()  # Create labels by copying input_ids
    # Ignore padding tokens in labels
    encoded['labels'] = [
        [(label if label != tokenizer.pad_token_id else -100) for label in labels_example]
        for labels_example in encoded['labels']
    ]
    return encoded

# Apply the encoding to the whole dataset in batches (Hugging Face `map`)
encoded_dataset = dataset.map(encode, batched=True)

Map:   0%|          | 0/6920 [00:00<?, ? examples/s]
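
One detail worth checking in `encode`: because `pad_token` is set to `eos_token`, comparing labels against `pad_token_id` also masks the real `<|endoftext|>` separators between turns, not just the padding. The sketch below shows one possible alternative that masks by attention mask instead. It uses plain lists so it runs without the tokenizer; `mask_padding` is a hypothetical helper, and the token IDs other than 50256 (the GPT-2 end-of-text ID) are arbitrary.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def mask_padding(input_ids, attention_mask):
    """Return labels equal to input_ids, with padded positions set to IGNORE_INDEX."""
    return [
        tok if keep == 1 else IGNORE_INDEX
        for tok, keep in zip(input_ids, attention_mask)
    ]

# Toy example: 50256 is the end-of-text ID, which doubles as the pad ID here
EOS_ID = 50256
input_ids = [31373, 50256, 5303, 50256, 50256]
attention_mask = [1, 1, 1, 0, 0]

by_mask = mask_padding(input_ids, attention_mask)
by_id = [tok if tok != EOS_ID else IGNORE_INDEX for tok in input_ids]

print(by_mask)  # the real separator is kept, only padding is ignored
print(by_id)    # every end-of-text token is ignored, including the real separator
```

If the separators are masked out of the loss, the model gets no training signal for where a turn ends, which may contribute to responses that run on into the next speaker's turn.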


>The goal here is to fine-tune a GPT-2-based model to improve its conversational abilities. We set key hyperparameters, such as the number of epochs, batch sizes, and learning rate. The model is trained on the preprocessed dataset with 16-bit floating-point precision for speed and with gradient accumulation to reach a larger effective batch size. The intention is for the model to generate more coherent and contextually appropriate responses, so that conversations feel more natural.

In [None]:
# Training arguments
training_args = TrainingArguments(
    output_dir='results(8)',               # output directory
    num_train_epochs=3,                 # total number of training epochs
    per_device_train_batch_size=16,     # batch size per device during training
    per_device_eval_batch_size=32,      # batch size for evaluation
    warmup_steps=500,                   # number of warmup steps for learning rate scheduler
    weight_decay=0.01,                  # strength of weight decay
    learning_rate=1e-5,                 # small learning_rate for fine-tuning
    fp16=True,                          # use 16-bit floating-point precision for training
    gradient_accumulation_steps=2       # accumulate gradients over 2 batches before each update
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_dataset
)


**Model Training**  
_Expect high computational load and long runtime. Please refer to CHECKPOINT to use the saved model._

In [None]:
# Train the model
trainer.train()

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Step,Training Loss
500,4.5522


TrainOutput(global_step=648, training_loss=4.456059020242574, metrics={'train_runtime': 820.6248, 'train_samples_per_second': 25.298, 'train_steps_per_second': 0.79, 'total_flos': 4810669597655040.0, 'train_loss': 4.456059020242574, 'epoch': 2.9930715935334873})
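
The reported `global_step=648` can be reproduced from the training arguments and the 6,920 examples shown in the `Map` progress bar. The arithmetic below is a sketch that assumes a single device and that a trailing partial accumulation group is dropped each epoch, which is what the reported numbers suggest.

```python
import math

num_examples = 6920          # from the Map progress bar
per_device_batch_size = 16
grad_accum_steps = 2
num_epochs = 3
warmup_steps = 500

# Assumption: one device, and a trailing partial accumulation group is dropped
batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
updates_per_epoch = batches_per_epoch // grad_accum_steps
total_updates = updates_per_epoch * num_epochs

effective_batch_size = per_device_batch_size * grad_accum_steps
warmup_fraction = warmup_steps / total_updates

print(total_updates, effective_batch_size, round(warmup_fraction, 2))
# prints: 648 32 0.77
```

With 500 warmup steps out of 648 updates, the learning rate is still ramping up for roughly three quarters of training, so a smaller `warmup_steps` value may be worth trying.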

In [None]:
from google.colab import drive
import shutil

# Create a directory in Google Drive to save the checkpoints
#output_dir = '/content/drive/My Drive/Checkpoints'
#shutil.copytree('/content/results(8)/checkpoint-500', f'{output_dir}/checkpoint-500')
#shutil.copytree('/content/results(8)/checkpoint-648', f'{output_dir}/checkpoint-648')

#print("Checkpoints saved to Google Drive!")

Checkpoints saved to Google Drive!


**Load Model**

Here we test the fine-tuned model. We load the model weights from the saved checkpoint and the tokenizer from the pre-trained `microsoft/DialoGPT-medium` model. The input text is tokenized, passed to the model for generation, and the output is decoded into human-readable text. In this run the output is simply an echo of the input: the decoded sequence always begins with the prompt, and no visible text was generated after it. This confirms that the pipeline runs end to end, and we engineer the chatbot further from here.

In [None]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Set the device (GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define the model_path to your checkpoint
model_path = "/content/drive/MyDrive/Checkpoints/checkpoint-500"  # Or "/content/results(new)/checkpoint-648"

# Load the model and tokenizer
model = GPT2LMHeadModel.from_pretrained(model_path).to(device)
tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialoGPT-medium")

# Example usage: Inference
input_text = "Hello! How are you today?"
inputs = tokenizer(input_text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_length=50)

# Decode the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Hello! How are you today?


>Here we define a chatbot using the fine-tuned GPT-2-based model (DialoGPT) and integrate it with a Gradio interface. A ChatBot class is defined with methods to manage the conversation context, generate responses, and maintain a history of user-bot interactions. The `add_to_context` method keeps the conversation history at a manageable size, while `generate_response` takes user input and generates a bot response using the model. We set up the Gradio interface so that you can converse with the model in real time.

`max_context_len`: The maximum number of tokens kept in the conversation context.  
`add_to_context`: Appends the user input and the bot response to the conversation history.  
`generate_response`: Builds the prompt from the context and generates the bot response.

`max_length`: The intended maximum length of the generated response. In the cell below it is accepted by `generate_response` but not passed on to `model.generate`, so `max_new_tokens` sets the effective limit.  
`temperature`: Randomness of the response. A lower value (e.g., 0.6) results in more deterministic responses, while higher values (e.g., 1.0) allow for more creative responses.  
`top_p`: Nucleus sampling threshold. Tokens are sampled only from the smallest set whose cumulative probability reaches `top_p`. A lower value narrows the selection to a smaller subset of words, while higher values allow for more variability in the response.  
`max_new_tokens`: The maximum number of new tokens generated in response to the input.  
`num_beams`: The number of beams used for beam search, that is, how many candidate sequences are explored in parallel.  
`repetition_penalty`: Penalizes tokens that have already appeared, which discourages repeated text. A value of 1.0 means no penalty.  
`length_penalty`: Exponent applied to the sequence length when scoring beams. Values above 0 favor longer sequences, and values below 0 favor shorter ones.
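
To make the effect of `temperature` and `top_p` more concrete, here is a small self-contained sketch on a made-up four-token vocabulary. It is not how the library implements sampling internally; the helper names and the logits are assumptions chosen for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    norm = sum(probs[i] for i in kept)
    # Renormalize so the surviving tokens form a valid distribution
    return {i: probs[i] / norm for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a four-token vocabulary
sharp = softmax_with_temperature(logits, temperature=0.6)
flat = softmax_with_temperature(logits, temperature=1.0)

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
print(sorted(top_p_filter(flat, top_p=0.7)))
```

Lowering the temperature concentrates probability on the top token, and lowering `top_p` shrinks the set of tokens that can be sampled at all, so the two settings compound.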

In [11]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import gradio as gr

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model and tokenizer
model_path = '/content/drive/MyDrive/Checkpoints/checkpoint-500'
model = GPT2LMHeadModel.from_pretrained(model_path).to(device)
tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialoGPT-medium")
tokenizer.pad_token = tokenizer.eos_token

# ChatBot Class
class ChatBot:
    def __init__(self, model, tokenizer, device, max_context_len=100):
        self.model = model
        self.tokenizer = tokenizer
        self.device = device
        self.context = []
        self.max_context_len = max_context_len

    def add_to_context(self, user_input, bot_response):
        self.context.append(f"User: {user_input}")
        self.context.append(f"Bot: {bot_response}")
        # Keep the context size manageable: drop the oldest messages
        # until the joined context fits within max_context_len tokens
        while len(self.context) > 2:
            context_tokens = self.tokenizer.encode("<|endoftext|>".join(self.context), add_special_tokens=False)
            if len(context_tokens) <= self.max_context_len:
                break
            self.context.pop(0)

    def generate_response(self, user_input, max_length: int = 12, temperature: float = 0.6, top_p: float = 0.7):
        # Create the input prompt
        input_prompt = "<|endoftext|>".join(self.context + [f"User: {user_input}", "Bot:"])
        inputs = self.tokenizer(input_prompt, return_tensors="pt").to(self.device)

        # Generate response
        with torch.no_grad():
            outputs = self.model.generate(
                inputs.input_ids,
                attention_mask=inputs.attention_mask,
                max_new_tokens=40,
                no_repeat_ngram_size=5,
                early_stopping=True,
                num_beams=2,
                pad_token_id=self.tokenizer.eos_token_id,
                eos_token_id=self.tokenizer.eos_token_id,
                repetition_penalty=10.0,
                temperature=temperature,
                top_p=top_p,
                do_sample=True,
                length_penalty=0.8
            )
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True).split("Bot:")[-1].strip()
        self.add_to_context(user_input, response)
        return response

# Initialize ChatBot
chatbot = ChatBot(model, tokenizer, device)

# Gradio Interface
def chatbot_interface(user_input):
    return chatbot.generate_response(user_input)

# Launch Gradio app
interface = gr.Interface(fn=chatbot_interface, inputs="text", outputs="text")
interface.launch()


Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://9fb23729787942ecef.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




>Key adjustments were made to improve the organization and separation of concerns within the code. A Chat class was introduced to handle the conversation history: it adds messages from both the user and the bot and trims the oldest messages so the context does not exceed the maximum token limit. The ChatBot class now relies on the Chat class for context handling, which reduces its role to generating responses from the current conversation context. This separation makes the code easier to maintain and lets conversation management and response generation change independently.
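
The idea of trimming the oldest messages to stay within a token budget can be sketched independently of the model. In the example below, `trim_context` and `word_count` are hypothetical helpers; `word_count` is a stand-in that counts whitespace-separated words, whereas with the real tokenizer one would count the IDs returned by `tokenizer.encode`.

```python
def trim_context(messages, count_tokens, max_tokens, sep="<|endoftext|>"):
    """Drop the oldest messages until the joined context fits within max_tokens.

    The most recent message is always kept, even if it alone exceeds the budget.
    """
    trimmed = list(messages)
    while len(trimmed) > 1 and count_tokens(sep.join(trimmed)) > max_tokens:
        trimmed.pop(0)
    return trimmed

# Stand-in tokenizer for the example: one token per whitespace-separated word
def word_count(text):
    return len(text.replace("<|endoftext|>", " ").split())

history = [
    "User: my wifi is not working",
    "Bot: which card do you have",
    "User: how do i check",
]
print(trim_context(history, word_count, max_tokens=12))
```

Measuring the budget in tokens rather than in number of messages matters here: slicing a list with `[-max_context_len:]` limits the message count, which is a different quantity from the token count being checked.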

In [7]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import gradio as gr

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model and tokenizer
model_path = '/content/drive/MyDrive/Checkpoints/checkpoint-500'
model = GPT2LMHeadModel.from_pretrained(model_path).to(device)
tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialoGPT-medium")
tokenizer.pad_token = tokenizer.eos_token

# Chat class to manage the conversation history
class Chat:
    def __init__(self, tokenizer, max_context_len=100):
        self.context = []
        self.tokenizer = tokenizer
        self.max_context_len = max_context_len

    def add_message(self, sender, message):

        self.context.append(f"{sender}: {message}")
        # Keep the context within the maximum token limit by dropping
        # the oldest messages until the joined context fits
        while len(self.context) > 1:
            context_tokens = self.tokenizer.encode("<|endoftext|>".join(self.context), add_special_tokens=False)
            if len(context_tokens) <= self.max_context_len:
                break
            self.context.pop(0)

    def get_context(self):

        return "<|endoftext|>".join(self.context)

# ChatBot class to generate responses
class ChatBot:
    def __init__(self, model, tokenizer, device, chat, temperature=0.6, top_p=0.7):
        self.model = model
        self.tokenizer = tokenizer
        self.device = device
        self.chat = chat
        self.temperature = temperature
        self.top_p = top_p

    def generate_response(self, user_input, max_length=12):

        # Add user input to the conversation context
        self.chat.add_message("User", user_input)

        # Create input prompt for the model, ending with a "Bot:" cue
        # so the reply can be split off after generation
        input_prompt = self.chat.get_context() + "<|endoftext|>Bot:"
        inputs = self.tokenizer(input_prompt, return_tensors="pt").to(self.device)

        # Generate response
        with torch.no_grad():
            outputs = self.model.generate(
                inputs.input_ids,
                attention_mask=inputs.attention_mask,
                max_new_tokens=40,
                no_repeat_ngram_size=5,
                early_stopping=True,
                num_beams=2,
                pad_token_id=self.tokenizer.eos_token_id,
                eos_token_id=self.tokenizer.eos_token_id,
                repetition_penalty=10.0,
                temperature=self.temperature,
                top_p=self.top_p,
                do_sample=True,
                length_penalty=0.8
            )
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True).split("Bot:")[-1].strip()

        # Add bot response to the conversation context
        self.chat.add_message("Bot", response)

        return response

# Initialize Chat and ChatBot
chat = Chat(tokenizer=tokenizer, max_context_len=100)
chatbot = ChatBot(model=model, tokenizer=tokenizer, device=device, chat=chat)

# Gradio Interface
def chatbot_interface(user_input):
    return chatbot.generate_response(user_input)

# Launch Gradio app
interface = gr.Interface(fn=chatbot_interface, inputs="text", outputs="text")
interface.launch()




>Let's now introduce a more structured approach to handling chatbot conversations. The Chat class now manages the flow of the conversation by classifying user inputs (questions, acknowledgments, or statements) and responding accordingly. It also lets the user set a topic and reset the conversation, while otherwise keeping the context of previous messages. The `set_mood` method adjusts the chatbot's response style through the `temperature` and `top_p` parameters. The ChatBot class remains responsible for generating responses, but now works together with the Chat class to handle different types of user input. The Gradio interface was updated so that users can chat with the bot, set a topic, adjust the mood, and reset the conversation as needed.
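
The input classification is rule-based, so its edge cases are easy to probe in isolation. The sketch below is one possible variant, not the notebook's exact method: it matches question words only as whole words and ignores surrounding punctuation in acknowledgments. The constant names are assumptions introduced for the example.

```python
import re
import string

QUESTION_WORDS = ("how", "what", "why", "where", "when", "who")
ACKNOWLEDGMENTS = {"thanks", "thank you", "okay", "got it", "appreciate it"}

def classify_input(text):
    """Label a message as 'question', 'acknowledgment', or 'statement'."""
    normalized = text.lower().strip()
    # Match question words only as whole words at the start of the message
    starts_with_question_word = re.match(r"(?:%s)\b" % "|".join(QUESTION_WORDS), normalized)
    if "?" in normalized or starts_with_question_word:
        return "question"
    # Ignore surrounding punctuation so "Thanks!" still counts
    if normalized.strip(string.punctuation + " ") in ACKNOWLEDGMENTS:
        return "acknowledgment"
    return "statement"

for message in ["How do I install gradio", "Whatever works", "Thanks!", "My wifi is down"]:
    print(message, "->", classify_input(message))
```

A plain `startswith` check would label "Whatever works" as a question because it begins with "what", and an exact-match lookup would miss "Thanks!" because of the exclamation mark.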

In [8]:

class Chat:
    def __init__(self, chatbot, max_response_len=20):
        self.chatbot = chatbot
        self.messages = []
        self.max_response_len = max_response_len
        self.temperature = 0.6  # Default temperature
        self.top_p = 0.5  # Default top_p

    def send(self, text: str):
        response = self.handle_input(text)
        self.messages.append(f"User: {text}")
        self.messages.append(f"Bot: {response}")
        return response

    def set_topic(self, topic: str):
        self.messages.append(f"Topic: {topic}")

    def reset_conversation(self):
        self.messages = []

    def set_mood(self, temperature: float, top_p: float):
        self.temperature = temperature
        self.top_p = top_p

    def handle_input(self, text: str):
        input_type = self.classify_input(text)
        if input_type == "question":
            response = self.chatbot.generate(self.get_context() + f"User: {text}", max_length=self.max_response_len, temperature=self.temperature, top_p=self.top_p)
        elif input_type == "acknowledgment":
            response = self.generate_acknowledgment_response(text)
        else:
            response = self.generate_follow_up(text)
        return response

    def classify_input(self, text: str) -> str:
        if "?" in text or text.lower().startswith(("how", "what", "why", "where", "when", "who")):
            return "question"
        elif text.lower() in ["thanks", "thank you", "okay", "got it", "appreciate it"]:
            return "acknowledgment"
        else:
            return "statement"

    def generate_acknowledgment_response(self, text: str) -> str:
        prompt = self.get_context() + f"User: {text}\n Chatbot: You're welcome! I hope that helps."
        response = self.chatbot.generate(prompt, max_length=self.max_response_len, temperature=self.temperature, top_p=self.top_p)
        return response

    def generate_follow_up(self, text: str) -> str:
        prompt = self.get_context() + f"User: {text}\n Chatbot: That's an interesting point."
        response = self.chatbot.generate(prompt, max_length=self.max_response_len, temperature=self.temperature, top_p=self.top_p)
        return response

    def get_context(self) -> str:
        prompt = '<|endoftext|>'.join(self.messages) + '<|endoftext|>'
        max_tokens = 112
        prompt_tokens = self.chatbot.tokenizer.encode(prompt)
        if len(prompt_tokens) > max_tokens:
            prompt_tokens = prompt_tokens[-max_tokens:]
            prompt = self.chatbot.tokenizer.decode(prompt_tokens)
        return prompt


class ChatBot:
    def __init__(self, model_path: str, device=None):
        if not device:
            device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        self.device = device
        self.model = GPT2LMHeadModel.from_pretrained(model_path).to(self.device)
        self.tokenizer = GPT2Tokenizer.from_pretrained('microsoft/DialoGPT-medium')
        self.tokenizer.pad_token = self.tokenizer.eos_token

    def generate(self, text: str, max_length: int = 12, temperature: float = 0.6, top_p: float = 0.8) -> str:
        with torch.no_grad():
            inputs = self.tokenizer(text, return_tensors="pt")
            outputs = self.model.generate(
                inputs.input_ids.to(self.device),
                attention_mask=inputs.attention_mask.to(self.device),
                max_new_tokens=max_length,  # honor the caller's response-length limit
                no_repeat_ngram_size=5,
                early_stopping=True,
                num_beams=2,
                pad_token_id=self.tokenizer.eos_token_id,
                eos_token_id=self.tokenizer.eos_token_id,
                repetition_penalty=10.0,
                temperature=temperature,
                top_p=top_p,
                do_sample=True,
                length_penalty=0.8
            )
            response_outputs = outputs[:, len(inputs['input_ids'][0]):]
            response = self.tokenizer.batch_decode(response_outputs, skip_special_tokens=True)[0]
            return response

    def create_chat(self) -> Chat:
        return Chat(self)



In [9]:
chatBot = ChatBot(model_path)
conversation = chatBot.create_chat()


# Define functions for Gradio interface
def chatbot_response(message):
    return conversation.send(message)

def set_topic(topic):
    conversation.set_topic(topic)
    return "Topic set."

def reset_conversation():
    conversation.reset_conversation()
    return "Conversation reset."

def set_mood(temperature, top_p):
    conversation.set_mood(temperature, top_p)
    return f"Mood set to Temperature: {temperature}, Top P: {top_p}"


# Create Gradio interface using Blocks
with gr.Blocks() as demo:
    gr.Markdown("## Customizable Chatbot")

    # Main Chat Interface
    with gr.Row():
        chatbot_input = gr.Textbox(label="Type your message", placeholder="Send a message to the chatbot")
        chatbot_output = gr.Textbox(label="Chatbot's Response", interactive=False)

    # Chatbot interaction button
    with gr.Row():
        send_button = gr.Button("Send")

    send_button.click(chatbot_response, inputs=chatbot_input, outputs=chatbot_output)

    # Controls for setting mood, topic, and resetting conversation
    with gr.Row():
        with gr.Column():
            topic_input = gr.Textbox(label="Set Topic", placeholder="Enter conversation topic")
            topic_button = gr.Button("Set Topic")
            topic_button.click(set_topic, inputs=topic_input, outputs=chatbot_output)

        with gr.Column():
            temperature_slider = gr.Slider(minimum=0.1, maximum=1.0, label="Temperature", step=0.1, value=0.6)
            top_p_slider = gr.Slider(minimum=0.1, maximum=1.0, label="Top P", step=0.1, value=0.9)
            mood_button = gr.Button("Set Mood")
            mood_button.click(set_mood, inputs=[temperature_slider, top_p_slider], outputs=chatbot_output)

        with gr.Column():
            reset_button = gr.Button("Reset Conversation")
            reset_button.click(reset_conversation, outputs=chatbot_output)

# Launch the Gradio app
demo.launch()


**Conclusion**

We observe the model producing off-topic, grammatically incorrect, and repetitive output, such as repeated "User:" and "Bot:" labels. We can address this with context truncation, conversational cues, or further exploration of the generation parameters. It is also worth considering that responses are influenced by the dataset used to fine-tune the model (the Ubuntu Dialogue Corpus). The quality of a chatbot's responses depends heavily on its training data, and if the dataset does not align well with the desired conversational style or domain, the model's performance can degrade.
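
One inexpensive mitigation for leaked speaker labels is to post-process the decoded text. The sketch below is a hypothetical helper, not part of the notebook: it drops a leading label and cuts the reply at the first speaker label that follows.

```python
import re

# Speaker labels used in the prompts; the model sometimes repeats them in its output
SPEAKER_LABEL = re.compile(r"\b(?:User|Bot|Chatbot):")

def clean_response(raw):
    """Keep only the text before the first leaked speaker label."""
    text = raw.strip()
    # Drop a leading label such as "Bot:" if the reply starts with one
    text = re.sub(r"^(?:User|Bot|Chatbot):\s*", "", text)
    match = SPEAKER_LABEL.search(text)
    if match:
        text = text[:match.start()]
    return text.strip()

print(clean_response("Bot: try sudo apt update User: ok Bot: did it work"))
```

This only hides the symptom. It does not change what the model has learned, so it complements rather than replaces changes to the training data or prompt format.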

The Ubuntu Dialogue Corpus dataset contains real-world technical support conversations, often centered around troubleshooting and technical issues related to the Ubuntu operating system. The conversations might contain very specific language related to technical support.

If we want to build a chatbot for general conversation, it would be better to fine-tune the model on a more relevant dataset, such as movie reviews or product reviews, or on a mix of multiple datasets.

Though the project did not deliver a competitive, application-ready chatbot, it shows the pretrained model picking up some conversational context, and it implements the full methodology of building a chatbot, from preprocessing through fine-tuning to an interactive interface. With continued research and development and accessible data, a capable chatbot can be delivered for task-specific purposes.

