### Regis University

**MSDS688_X70: Artificial Intelligence**  
Master of Science in Data Science Program


#### Week 5: Building Conversational Bots  
*GPU Required*

## Lecture: Week 5 - Building Conversational Bots with DialoGPT

### Overview

This week, we focus on building conversational chatbots using **DialoGPT**. DialoGPT is a conversational model developed by Microsoft, based on GPT-2, that is fine-tuned specifically for dialogue generation. In this lecture, we will discuss how DialoGPT works, how it can be fine-tuned for custom conversational tasks, how chatbots function from an engineering perspective, and how to build a chatbot capable of generating human-like responses.

---

### 1. **How Chatbots Work**

Chatbots are AI-powered systems designed to simulate human conversations. From a technical perspective, chatbots are built using a combination of machine learning algorithms, natural language processing (NLP), and sometimes rule-based systems. Below is a breakdown of how modern chatbots are engineered:

#### Key Components of a Chatbot:
- **Natural Language Understanding (NLU)**: This component helps the chatbot understand user inputs. It involves breaking down user queries into structured data, extracting the intent (what the user wants), and identifying entities (specific pieces of information like dates, names, or locations).
- **Response Generation**: This can be either rule-based (predefined responses) or based on machine learning models. Models like **DialoGPT** generate responses dynamically by predicting the next words in the conversation.
- **Dialogue Management**: This component manages the flow of conversation, deciding how the chatbot should respond based on the context. It ensures that the conversation stays coherent and follows a logical sequence.
- **Backend APIs and Integrations**: Chatbots often need to integrate with backend systems (e.g., databases, APIs) to retrieve information or complete tasks like booking a ticket or checking account balances.
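The components above can be seen working together in a deliberately tiny, rule-based sketch. Everything domain-specific here is a hypothetical assumption for illustration: the flight-booking intents, the keyword sets, the `destination` and `date` slots, and the regular expressions. Real NLU components usually rely on trained intent classifiers and entity recognizers rather than keyword matching. The point is only to show NLU (`understand`) feeding a dialogue manager (`DialogueState`) that keeps context across turns.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical intent keywords for a toy flight-booking domain
INTENT_KEYWORDS = {
    "book_flight": {"book", "flight", "fly"},
    "greet": {"hello", "hi", "hey"},
}
# Hypothetical entity patterns: an ISO date, and a capitalized city after "to"
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
CITY_RE = re.compile(r"\bto ([A-Z][a-z]+)")


def understand(text: str) -> Tuple[str, Dict[str, str]]:
    """NLU: map raw text to an intent label plus extracted entities."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    intent = next((name for name, kws in INTENT_KEYWORDS.items() if words & kws), "unknown")
    entities = {}
    date = DATE_RE.search(text)
    if date:
        entities["date"] = date.group(0)
    city = CITY_RE.search(text)
    if city:
        entities["destination"] = city.group(1)
    return intent, entities


@dataclass
class DialogueState:
    """Dialogue management: track the active intent and filled slots across turns."""
    intent: str = "unknown"
    slots: Dict[str, str] = field(default_factory=dict)

    def update(self, text: str) -> str:
        intent, entities = understand(text)
        if intent != "unknown":      # keep the previous goal if this turn states none
            self.intent = intent
        self.slots.update(entities)
        return self.respond()

    def respond(self) -> str:
        if self.intent == "greet":
            return "Hello! How can I help?"
        if self.intent == "book_flight":
            for slot in ("destination", "date"):
                if slot not in self.slots:   # ask for the first missing slot
                    return f"What {slot} would you like?"
            return f"Booking a flight to {self.slots['destination']} on {self.slots['date']}."
        return "Sorry, I didn't understand."


state = DialogueState()
print(state.update("I want to book a flight to Denver"))  # What date would you like?
print(state.update("2025-03-14"))  # Booking a flight to Denver on 2025-03-14.
```

The second turn contains no intent keywords at all; the bot still completes the booking because the dialogue manager remembered the goal and the destination from the first turn.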

#### Types of Chatbots:
1. **Rule-Based Chatbots**: These rely on predefined rules and responses, often using decision trees or simple pattern matching to guide the conversation.
2. **AI-Based Chatbots**: These use machine learning models to generate responses. They are more flexible and can handle a wider range of queries but require large amounts of data for training.
3. **Hybrid Chatbots**: These combine rule-based systems with AI, allowing the chatbot to handle simple queries through predefined rules and more complex queries through machine learning.
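A hybrid bot can be sketched as a router that tries cheap, predictable rules first and hands everything else to a generative model. In this minimal sketch the rule table is hypothetical, and `generate_fn` stands in for any model call (for instance, a wrapper around DialoGPT's `generate`). A plain Python function is passed in so the example runs without downloading a model.

```python
import re
from typing import Callable, Tuple

# Hypothetical rule table for a support bot: (pattern, canned reply)
RULES = [
    (re.compile(r"\bhours\b", re.IGNORECASE), "We are open 9am to 5pm, Monday to Friday."),
    (re.compile(r"\b(reset|forgot) (my )?password\b", re.IGNORECASE),
     "Use the 'Forgot password' link on the login page."),
]


def hybrid_reply(user_input: str, generate_fn: Callable[[str], str]) -> Tuple[str, str]:
    """Answer with the first matching rule; otherwise defer to the generative model.

    Returns (source, reply) where source is "rule" or "model".
    """
    for pattern, reply in RULES:
        if pattern.search(user_input):
            return "rule", reply
    return "model", generate_fn(user_input)


# Usage with a stand-in for a real model call
def fake_model(text: str) -> str:
    return f"(generated reply to: {text})"


print(hybrid_reply("What are your hours?", fake_model))   # handled by a rule
print(hybrid_reply("Tell me a joke", fake_model))         # falls through to the model
```

Returning the source alongside the reply makes it easy to log how often the bot falls back to the model, which is one way to decide which queries deserve a new rule.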

#### Key Challenges in Building Chatbots:
- **Understanding Ambiguity**: User inputs can often be ambiguous, making it difficult for the chatbot to understand the exact intent.
- **Maintaining Context**: Chatbots need to remember and maintain the context of the conversation across multiple turns.
- **Response Coherence**: Generating relevant and coherent responses is challenging, especially in long conversations.
- **Scalability**: Chatbots need to handle thousands of interactions simultaneously while maintaining performance.

---

### 2. **How DialoGPT Works**

#### DialoGPT Overview:
**DialoGPT** is a variant of GPT-2, fine-tuned for conversational AI tasks. It is designed to generate coherent, context-aware responses in a dialogue format. Like GPT-2, DialoGPT uses a **transformer-based** architecture with a **decoder-only** structure, meaning it predicts the next token in a sequence based on all the preceding tokens.

#### Key Concepts:
- **Pretraining on Conversations**: DialoGPT was trained on roughly 147 million conversation-like exchanges extracted from Reddit comment threads, which allows it to generate contextually relevant responses.
- **Unidirectional Decoder**: Like GPT-2, DialoGPT generates text left to right, predicting each next token from the conversation so far.
- **Self-Attention**: DialoGPT uses self-attention mechanisms to focus on relevant parts of the conversation when generating responses.
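To make the "unidirectional" and "self-attention" points concrete, here is a single-head causal self-attention layer in NumPy. It is a simplified sketch with random weights that omits multi-head splitting, layer normalization, residual connections, and positional embeddings, all of which the real GPT-2/DialoGPT stack includes. The mask sets scores for later positions to negative infinity, so each position attends only to itself and earlier tokens. The final check shows that editing the last token leaves every earlier output unchanged.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask.

    x: (seq_len, d_model) token representations
    w_q, w_k, w_v: (d_model, d_head) projection matrices
    Returns (seq_len, d_head); row i depends only on rows 0..i of x.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])              # (seq_len, seq_len)
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)           # block attention to later tokens
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over allowed positions
    return weights @ v


rng = np.random.default_rng(0)
d_model, d_head, seq_len = 8, 4, 5
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)

# Changing the last token does not change any earlier output
x2 = x.copy()
x2[-1] += 10.0
out2 = causal_self_attention(x2, w_q, w_k, w_v)
print(np.allclose(out[:-1], out2[:-1]))
```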

#### How DialoGPT Generates Dialogue:
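1. **Input Tokenization**: The conversation turns are joined into one sequence, with each turn followed by the end-of-text token (`<|endoftext|>`), and the sequence is split into subword tokens by GPT-2's byte-pair-encoding tokenizer before being fed into the model.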
2. **Context-Aware Response Generation**: The model generates the next token based on the entire conversation history. This allows it to maintain context and coherence over multiple turns in a conversation.
3. **Text Generation**: The generated tokens are decoded back into human-readable text, forming the chatbot’s response.
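The three steps can be mimicked with integer token IDs. In this sketch `EOS = 0` and the lookup table `TOY_TRANSITIONS` are hypothetical stand-ins: real DialoGPT uses `<|endoftext|>` (ID 50256 in the GPT-2 vocabulary) and computes a probability distribution from the whole sequence, whereas this toy looks only at the last ID and decodes greedily. The slicing in step 3 mirrors how a chat loop keeps only the tokens produced after the prompt.

```python
from typing import Callable, List

EOS = 0  # hypothetical id of the end-of-turn token in a toy vocabulary


def build_history(turns: List[List[int]]) -> List[int]:
    """Step 1: concatenate tokenized turns, ending each with EOS (DialoGPT-style)."""
    ids: List[int] = []
    for turn in turns:
        ids.extend(turn + [EOS])
    return ids


def generate_reply(history: List[int], next_token: Callable[[List[int]], int],
                   max_new_tokens: int = 20) -> List[int]:
    """Step 2: greedy decoding, one token at a time, each conditioned on the sequence so far."""
    ids = list(history)
    for _ in range(max_new_tokens):
        token = next_token(ids)
        ids.append(token)
        if token == EOS:          # the reply is finished
            break
    return ids


# Lookup table standing in for the network: after EOS emit 5, then 6, 7, EOS
TOY_TRANSITIONS = {EOS: 5, 5: 6, 6: 7, 7: EOS}


def toy_model(ids: List[int]) -> int:
    return TOY_TRANSITIONS.get(ids[-1], EOS)


history = build_history([[11, 12], [13]])     # two earlier turns
full = generate_reply(history, toy_model)
reply = full[len(history):]                   # Step 3: keep only the new tokens, then decode
print(history, reply)                         # [11, 12, 0, 13, 0] [5, 6, 7, 0]
```

The returned `full` sequence is what a chat loop would store as the history for the next turn.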

---

### 3. **Fine-Tuning DialoGPT**

Fine-tuning allows us to customize DialoGPT to respond better in specific conversational contexts. This involves training the model on a custom dataset, which may include dialogues from a particular domain (e.g., customer support, therapy, or informal chats).

#### Steps for Fine-Tuning DialoGPT:
1. **Load Pretrained Model**: Start with a pretrained DialoGPT model (e.g., `microsoft/DialoGPT-medium`).
2. **Prepare the Dataset**: Use a dialogue dataset like **DailyDialog**. Tokenize the conversations so they can be fed into the model.
3. **Train the Model**: Fine-tune DialoGPT on the dataset by training it to predict the next token in each dialogue sequence. Adjust the learning rate and training parameters to fit the dataset size.
4. **Evaluation**: After fine-tuning, evaluate the chatbot’s performance by generating responses and comparing them to the expected dialogue or using metrics like **perplexity**.

#### Tokenization and Training:
- **Tokenization**: The input conversations are tokenized into sequences of tokens. Since dialogues can vary in length, padding may be necessary to ensure the sequences have the same length.
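- **Training Objective**: Causal language modeling: the model learns to predict each token from all the tokens before it by minimizing cross-entropy. The labels are simply the input IDs; Hugging Face causal LM models shift them by one position internally, and label positions set to `-100` (such as padding) are ignored by the loss.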
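The training objective and the perplexity metric from the steps above can be written out in a few lines of NumPy. This sketch assumes a single sequence and follows the Hugging Face convention of ignoring label positions equal to `-100`. It shifts labels by one so that the logits at position t are scored against token t + 1, which is what causal LM models do internally when `labels` is set to the input IDs. With uniform logits over a 50-token vocabulary, perplexity comes out at 50, meaning the model is as uncertain as a 50-way guess.

```python
import numpy as np

IGNORE_INDEX = -100  # label value the loss skips (Hugging Face convention for padding)


def causal_lm_loss(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean next-token cross-entropy for one sequence.

    logits: (seq_len, vocab_size) scores; labels: (seq_len,) token ids, normally
    a copy of input_ids. Logits at position t are scored against label t + 1.
    """
    shift_logits, shift_labels = logits[:-1], labels[1:]
    keep = shift_labels != IGNORE_INDEX
    z = shift_logits - shift_logits.max(axis=-1, keepdims=True)        # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    safe_labels = np.where(keep, shift_labels, 0)                       # avoid indexing with -100
    token_log_probs = log_probs[np.arange(len(shift_labels)), safe_labels]
    return float(-token_log_probs[keep].mean())


def perplexity(mean_loss: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return float(np.exp(mean_loss))


vocab_size = 50
labels = np.array([3, 7, 7, 1, IGNORE_INDEX, IGNORE_INDEX])   # last two positions are padding
uniform = np.zeros((len(labels), vocab_size))                 # a model with no preference
print(perplexity(causal_lm_loss(uniform, labels)))            # about 50
```

One practical consequence for this notebook: `DataCollatorForLanguageModeling(mlm=False)` builds labels by copying `input_ids` and setting every position equal to `pad_token_id` to `-100`. Because the notebook sets `pad_token = eos_token`, the EOS tokens that separate turns are masked as well, so the model receives no loss signal for ending a turn.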

#### Fine-Tuning Parameters:
- **Learning Rate**: Controls how quickly the model adjusts its weights during training.
- **Epochs**: The number of times the model goes through the training dataset.
- **Batch Size**: The number of samples the model processes before updating its weights.
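These three parameters jointly determine how many weight updates a run performs, and the learning-rate schedule is defined over that count. The sketch below assumes a single device, no gradient accumulation, and the Trainer's default linear schedule (zero warmup steps unless configured). `optimizer_steps` and `linear_lr` are illustrative helpers, not Trainer APIs.

```python
import math


def optimizer_steps(num_examples: int, batch_size: int, num_epochs: int) -> int:
    """Weight updates for a run on one device (a final partial batch still counts)."""
    return math.ceil(num_examples / batch_size) * num_epochs


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Part 1 settings: at most 100 examples, batch size 2, 1 epoch
steps = optimizer_steps(100, 2, 1)
print(steps)                                          # 50
print(linear_lr(0, steps, 5e-5), linear_lr(25, steps, 5e-5))
```

With at most 100 examples, a batch size of 2, and one epoch, Part 1 performs at most 50 updates, so its `logging_steps=100` and `save_steps=1000` never fire: no intermediate training loss is logged and no checkpoint is written, although the explicit `save_pretrained` calls still save the final model.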

---

### 4. **Building a Chatbot with DialoGPT**

Once fine-tuned, DialoGPT can be used to build a fully functional chatbot. The chatbot can generate responses to user inputs based on the conversational context, providing coherent and contextually relevant replies.

#### Response Generation:
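- **Max and Min Length**: Control the length of the chatbot’s responses. A minimum keeps replies from being too short, and a maximum prevents overly long ones. In Hugging Face `generate`, `max_length` and `min_length` count the prompt (including any chat history) as well as the reply, while `max_new_tokens` and `min_new_tokens` count only newly generated tokens.
- **Temperature**: Divides the model’s logits before the softmax. Values above 1 flatten the distribution and make responses more varied; values below 1 sharpen it and make responses more predictable. Temperature, like top-p, only takes effect when sampling is enabled (`do_sample=True`); otherwise `generate` uses greedy decoding by default.
- **Top-p (Nucleus Sampling)**: Samples only from the smallest set of most probable tokens whose cumulative probability reaches `p` (e.g., 0.9), trimming the unlikely tail while keeping some diversity.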
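Temperature and top-p are both transformations of the next-token distribution before a token is drawn. The NumPy sketch below applies temperature first and then the nucleus cut. It illustrates the idea and is not Hugging Face's exact implementation, which operates on logits and has extra options such as a minimum number of tokens to keep. With probabilities 0.5, 0.3, 0.15, and 0.05 and `p = 0.9`, the first two tokens only reach 0.8, so the third is needed and the 0.05 tail is dropped.

```python
import numpy as np


def apply_temperature(logits, temperature):
    """Softmax of logits / T: T < 1 sharpens the distribution, T > 1 flattens it."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                       # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens with total probability >= p, renormalized."""
    order = np.argsort(probs)[::-1]              # token ids, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # number of tokens to keep
    filtered = np.zeros_like(probs)
    filtered[order[:cutoff]] = probs[order[:cutoff]]
    return filtered / filtered.sum()


def sample_next_token(logits, temperature=0.7, top_p=0.9, rng=None):
    """Draw one token id after temperature scaling and nucleus filtering."""
    if rng is None:
        rng = np.random.default_rng()
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return int(rng.choice(len(probs), p=probs))


probs = np.array([0.5, 0.3, 0.15, 0.05])
print(top_p_filter(probs, 0.9))                  # the 0.05 token gets probability 0
print(sample_next_token(np.array([2.0, 1.0, 0.0]), rng=np.random.default_rng(0)))
```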

#### Interaction with the Chatbot:
After fine-tuning, you can start a conversation with the chatbot by feeding it input text. The chatbot will generate responses based on its training and the provided input. The conversation history is stored and fed back into the model, allowing the chatbot to maintain context across multiple dialogue turns.

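```python
# Example of chatting with the fine-tuned model (type "exit" to stop)
import torch

def chat_with_model(model, tokenizer):
    device = next(model.parameters()).device  # send inputs to wherever the model lives
    chat_history_ids = None
    while True:
        user_input = input("User: ")
        if user_input.lower() == "exit":
            break

        # End the turn with EOS, which DialoGPT uses to separate turns
        new_input_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors='pt').to(device)

        # Concatenate the new input with previous chat history
        bot_input_ids = torch.cat([chat_history_ids, new_input_ids], dim=-1) if chat_history_ids is not None else new_input_ids

        # Generate the response; temperature and top_p only take effect with do_sample=True
        chat_history_ids = model.generate(
            bot_input_ids,
            attention_mask=torch.ones_like(bot_input_ids),  # a single unpadded sequence
            max_length=1000,
            min_length=10,
            no_repeat_ngram_size=3,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id
        )

        # Decode only the tokens generated after the prompt, then print the response
        bot_response = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
        print(f"Chatbot: {bot_response}")
```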
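The loop above feeds the entire history back on every turn, but DialoGPT, like GPT-2, has a 1,024-token context window, so a long chat eventually overflows it. Below is a minimal sketch of one common remedy: keep only the most recent whole turns that fit a token budget. `split_turns`, `trim_history`, and the budget are illustrative assumptions; the functions work on plain lists of token IDs split at the EOS ID, so they run without a model.

```python
from typing import List


def split_turns(ids: List[int], eos_id: int) -> List[List[int]]:
    """Split a flat list of token IDs into turns, each ending with eos_id."""
    turns: List[List[int]] = []
    current: List[int] = []
    for tok in ids:
        current.append(tok)
        if tok == eos_id:
            turns.append(current)
            current = []
    if current:                      # trailing tokens with no EOS yet
        turns.append(current)
    return turns


def trim_history(ids: List[int], eos_id: int, max_tokens: int) -> List[int]:
    """Keep the most recent whole turns whose combined length fits in max_tokens."""
    kept: List[List[int]] = []
    total = 0
    for turn in reversed(split_turns(ids, eos_id)):
        if total + len(turn) > max_tokens:
            break                    # dropping this turn also drops everything older
        kept.append(turn)
        total += len(turn)
    return [tok for turn in reversed(kept) for tok in turn]


history = [1, 2, 0, 3, 4, 5, 0, 6, 0]                 # three turns, EOS id 0 (toy values)
print(trim_history(history, eos_id=0, max_tokens=6))  # [3, 4, 5, 0, 6, 0]
```

Inside the chat loop, one could apply it just before concatenating, for example `chat_history_ids = torch.tensor([trim_history(chat_history_ids[0].tolist(), tokenizer.eos_token_id, 900)], device=device)`, leaving headroom below 1,024 tokens for the new input and the reply. The 900-token budget is an arbitrary example value.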

#### Controlling the Chatbot’s Behavior:
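- **No Repeat N-Gram Size**: Prevents the chatbot from repeating phrases by forbidding any n-gram (a sequence of n tokens) from appearing twice. In Hugging Face `generate`, the check covers the whole input sequence, including the prompt and chat history, not just the new reply.
- **Attention Masking**: Tells the model which positions are real tokens (1) and which are padding (0), so padding is ignored. Because DialoGPT reuses the EOS token for padding, passing an explicit `attention_mask` keeps padding from being confused with genuine turn separators.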
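The n-gram rule can be sketched as a function that, given the token IDs so far, returns the tokens that must not come next. This is a simplified, single-sequence illustration of the idea behind `no_repeat_ngram_size` (during generation, the banned tokens' scores are set to negative infinity); the function name and the toy IDs are hypothetical.

```python
from typing import List, Set


def banned_next_tokens(ids: List[int], n: int) -> Set[int]:
    """Tokens that would complete an n-gram already present in ids.

    The last n-1 tokens form a prefix; any token that followed that same
    prefix earlier in the sequence is banned.
    """
    if n <= 0 or len(ids) < n - 1:
        return set()
    if n == 1:
        return set(ids)              # every token already used is banned
    prefix = tuple(ids[-(n - 1):])
    banned = set()
    for i in range(len(ids) - n + 1):
        if tuple(ids[i:i + n - 1]) == prefix:
            banned.add(ids[i + n - 1])
    return banned


print(banned_next_tokens([1, 2, 3, 1, 2], 3))  # {3}: emitting 3 would repeat "1 2 3"
```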

---

### 5. **Challenges in Building Conversational Bots**

While models like DialoGPT make it straightforward to build conversational bots, the general challenges from Section 1 (ambiguity, context, coherence, and scalability) still apply, along with two that are specific to fine-tuned generative models:
- **Domain Adaptation**: Fine-tuning the model on specific datasets can help adapt the chatbot to particular domains, but it may still struggle with out-of-domain inputs.
- **Ethical Considerations**: Conversational bots can generate inappropriate or biased content. It is important to monitor and filter chatbot outputs for ethical and responsible AI use.
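As a very small illustration of output filtering, the sketch below swaps any reply containing a blocked term for a safe fallback. It is a crude keyword approach chosen only to show where a filter sits in the pipeline; the terms and the fallback message are placeholders, and production systems typically combine maintained blocklists with trained moderation classifiers and human review.

```python
import re
from typing import Callable, Iterable


def make_output_filter(blocked_terms: Iterable[str],
                       fallback: str = "Sorry, I can't help with that.") -> Callable[[str], str]:
    """Build a filter that replaces any reply containing a blocked term."""
    terms = [t for t in blocked_terms if t]
    if not terms:
        return lambda reply: reply            # nothing to block
    # Whole-word, case-insensitive match on any blocked term
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)

    def filter_reply(reply: str) -> str:
        return fallback if pattern.search(reply) else reply

    return filter_reply


# Placeholder terms for illustration only
safe = make_output_filter(["badword", "secret code"])
print(safe("Here is the SECRET CODE you asked for"))   # replaced by the fallback
print(safe("badwords is not an exact match"))           # passes through unchanged
```

In the chat loops of this notebook, such a filter would wrap `bot_response` just before it is printed.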

---

### Conclusion

This week’s assignment focuses on fine-tuning **DialoGPT** to create a conversational chatbot capable of generating human-like responses. By fine-tuning the model on a custom dataset, adjusting response generation parameters, and evaluating the chatbot’s performance, you will gain insights into the process of building conversational AI systems. Pay attention to the hyperparameters that control response generation, such as temperature, top-p, and n-gram size, as these will have a significant impact on the chatbot’s behavior.

---


## Assignment Part 1: Follow Me – Fine-Tuning DialoGPT with a Custom Dataset

In this section, you will fine-tune the DialoGPT model using a custom dataset to build a conversational chatbot. You will learn how to prepare and fine-tune a model designed for conversational AI and test its ability to generate human-like responses.


In [None]:
import os
os.environ["WANDB_MODE"] = "disabled"
os.environ["WANDB_DISABLED"] = "true"

In [None]:
# Install necessary libraries
!pip install transformers datasets torch

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from datasets import load_dataset
import torch

In [None]:
# Load the pre-trained DialoGPT model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

In [None]:
# Set the pad_token to eos_token for padding
tokenizer.pad_token = tokenizer.eos_token

In [None]:
# Load and prepare a conversational dataset for fine-tuning
# Load only a small (1%) subset of the Blended Skill Talk training split for faster training
dataset = load_dataset("blended_skill_talk", split="train[:1%]")


In [None]:
# Use as many as are available (up to 100)
small_train_dataset = dataset.select(range(min(100, len(dataset))))


In [None]:
# Tokenize each dialogue with the DialoGPT tokenizer loaded above, ending every
# turn with EOS (DialoGPT's turn separator). In Blended Skill Talk, 'previous_utterance'
# holds the seed turns and the two speakers alternate in 'free_messages' and 'guided_messages'.
def tokenize_function(examples):
    dialogs = []
    for prev, free, guided in zip(examples["previous_utterance"], examples["free_messages"], examples["guided_messages"]):
        turns = list(prev) + [turn for pair in zip(free, guided) for turn in pair]
        dialogs.append("".join(turn + tokenizer.eos_token for turn in turns))
    return tokenizer(dialogs, truncation=True, padding="max_length", max_length=512)


In [None]:
# Apply tokenization and remove original columns
tokenized_dataset = small_train_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=small_train_dataset.column_names
)

In [None]:
# Define the data collator to dynamically pad inputs and create labels
from transformers import DataCollatorForLanguageModeling

In [None]:
# Create a data collator for causal language modeling (mlm=False): DialoGPT is
# trained to predict the next token, so masked language modeling does not apply
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

In [None]:
# Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',  # Output directory for checkpoints
    num_train_epochs=1,  # Number of epochs for fine-tuning
    per_device_train_batch_size=2,  # Batch size per device during training
    save_steps=1000,  # Save checkpoint every 1000 steps
    save_total_limit=2,  # Limit checkpoints to 2
    learning_rate=5e-5,  # Learning rate
    logging_dir='./logs',  # Directory for logs
    logging_steps=100,  # Log every 100 steps
    report_to="none"  # Disable WandB logging
)

In [None]:
# Define the Trainer for fine-tuning the model
trainer = Trainer(
    model=model,  # The pre-trained model to fine-tune
    args=training_args,  # Training arguments defined above
    train_dataset=tokenized_dataset,  # The smaller dataset used for training
    data_collator=data_collator,  # Use the data collator to create labels
)

In [None]:
# Fine-tune the model
trainer.train()

In [None]:
# Save the fine-tuned model
model.save_pretrained("./fine_tuned_chatbot")
tokenizer.save_pretrained("./fine_tuned_chatbot")

In [None]:
# Move the model to the GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

In [None]:
def chat_with_model(model, tokenizer):
    import torch
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    if tokenizer.eos_token is None:
        tokenizer.eos_token = tokenizer.sep_token or tokenizer.pad_token or "[SEP]"

    chat_history_ids = None

    while True:
        user_input = input("You: ")

        if user_input.lower() == "exit":
            print("Chatbot: Goodbye!")
            break

        # Tokenize user input
        new_input_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors='pt').to(device)

        # Combine with chat history
        bot_input_ids = torch.cat([chat_history_ids, new_input_ids], dim=-1) if chat_history_ids is not None else new_input_ids

        # Generate response
        chat_history_ids = model.generate(
            bot_input_ids,
            max_length=1000,
            min_length=10,
            pad_token_id=tokenizer.eos_token_id,
            attention_mask=torch.ones_like(bot_input_ids),
            num_return_sequences=1,
            no_repeat_ngram_size=3,
            temperature=0.7,
            top_p=0.9,
            do_sample=True
        )

        bot_response = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)

        if not bot_response.strip():
            bot_response = "I don't have a response for that."

        print(f"Chatbot: {bot_response}")


In [None]:
# Chat with the fine-tuned model
#### TODO: No code changes. Chat with the bot and type exit to end the chat. DO NOT hit stop as your chat history will not be saved. ####
chat_with_model(model, tokenizer)

## Assignment Part 2: Your Turn – Training Your Own Chatbot with Persona Chat

In this section, you will train your own chatbot using the provided framework and dataset. You will explore how adjusting different hyperparameters and training data can influence the quality of the bot's responses. **A framework has been provided, and your job is to complete the TODOs.**

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from transformers import DataCollatorForLanguageModeling
import torch
import pandas as pd
import random

In [None]:
# Load the pre-trained DialoGPT model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")


In [None]:
# Set the pad_token to eos_token for padding
tokenizer.pad_token = tokenizer.eos_token


In [None]:
# Load persona-chat data from the CSV file
personality_data = pd.read_csv('personality.csv')

In [None]:
# Use a subset of the data
personality_data_subset = personality_data.head(100)

In [None]:
#### TODO: Define a function to tokenize each row of the dataset ####
def tokenize_function(row):
    # HINT: Start by combining the text from the 'Persona' and 'chat' columns
    # HINT: Use tokenizer to tokenize the combined text
    # HINT: Make sure to handle truncation and padding for consistency in input length
    # HINT: Set the maximum length to 512 tokens
    pass  # Remove this line when implementing the function



In [None]:
# Apply tokenization to the subset of data and store the results in a list
tokenized_inputs = [tokenize_function(row) for _, row in personality_data_subset.iterrows()]


In [None]:
# Create a dataset object compatible with the Trainer
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, tokenized_data):
        self.input_ids = [data['input_ids'] for data in tokenized_data]
        self.attention_mask = [data['attention_mask'] for data in tokenized_data]

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return {
            'input_ids': torch.tensor(self.input_ids[idx]),
            'attention_mask': torch.tensor(self.attention_mask[idx]),
            'labels': torch.tensor(self.input_ids[idx])  # Same as the inputs for causal LM; the mlm=False collator rebuilds labels and masks padding with -100
        }

In [None]:
# Create a custom dataset from the tokenized inputs
dataset = CustomDataset(tokenized_inputs)

In [None]:
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # We do not want masked language modeling for this task
)

In [None]:
#### TODO: Set up the training arguments for fine-tuning ####
training_args = TrainingArguments(
    output_dir=...,  # HINT: Choose a directory to save model checkpoints
    num_train_epochs=...,  # HINT: Specify the number of epochs for training
    per_device_train_batch_size=...,  # HINT: Define the batch size per device
    save_steps=...,  # HINT: Decide how often (in steps) to save model checkpoints
    save_total_limit=...,  # HINT: Set a limit on the total number of saved checkpoints
    learning_rate=...,  # HINT: Choose an appropriate learning rate
    logging_dir=...,  # HINT: Select a directory for logging training progress
    logging_steps=...,  # HINT: Define logging frequency in steps
    report_to="none"  # Disable WandB logging
)

In [None]:
# Define the Trainer for fine-tuning the model
trainer = Trainer(
    model=model,  # The pre-trained model to fine-tune
    args=training_args,  # Training arguments defined above
    train_dataset=dataset,  # Custom dataset created from personality data
    data_collator=data_collator,  # Use the data collator to create labels
)


In [None]:
# Fine-tune the model
trainer.train()

In [None]:
# Save the fine-tuned model
model.save_pretrained("./custom_fine_tuned_chatbot")
tokenizer.save_pretrained("./custom_fine_tuned_chatbot")


In [None]:
# Move the model to the GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)


In [None]:
# Quick response function for testing
def get_single_response(model, tokenizer, question, persona):
    # Combine persona with the question
    input_text = persona + " " + question

    # Tokenize input and create attention mask, move to the same device as the model
    inputs = tokenizer(input_text + tokenizer.eos_token, return_tensors='pt', padding=True).to(device)

    # Generate response using the same parameters as the interactive chat
    output_ids = model.generate(
        inputs['input_ids'],
        attention_mask=inputs['attention_mask'],  # Pass attention mask to handle padding correctly
        max_length=1000,  # Maximum length of the generated sequence
        min_length=10,  # Minimum length to avoid empty responses
        pad_token_id=tokenizer.eos_token_id,
        num_return_sequences=1,  # Return one response
        no_repeat_ngram_size=3,  # Avoid repetition
        temperature=0.7,  # Control the randomness (1.0 = more random, <1.0 = less random)
        top_p=0.9,  # Nucleus sampling to generate more diverse responses
        do_sample=True  # Enable sampling
    )

    # Decode and return response
    response = tokenizer.decode(output_ids[:, inputs['input_ids'].shape[-1]:][0], skip_special_tokens=True)
    return response.strip()

# Test with a single question and persona
persona_example = "I love cooking and exploring new recipes."
question_example = "What is your favorite dish to cook?"

response = get_single_response(model, tokenizer, question_example, persona_example)
print(f"Question: {question_example}")
print(f"Chatbot: {response}")


In [None]:
# Interactive loop for chatting with the fine-tuned model, incorporating persona data
def chat_with_model(model, tokenizer, personality_data):
    # Select a concise persona from the dataset
    persona = random.choice(personality_data['Persona'])
    print(f"Chatbot's Persona: {persona}")

    chat_history_ids = None
    while True:
        # Get user input
        user_input = input("You: ")

        if user_input.lower() == "exit":
            print("Chatbot: Goodbye!")
            break

        # Combine persona with user input for a brief context
        persona_input = persona + "\n\n" + user_input

        # Tokenize input and move to the same device as the model
        new_input_ids = tokenizer.encode(persona_input + tokenizer.eos_token, return_tensors='pt').to(device)

        # Append the new input to the full chat history so earlier turns stay in context
        bot_input_ids = torch.cat([chat_history_ids, new_input_ids], dim=-1) if chat_history_ids is not None else new_input_ids

        #### Optional TODO: Adjust generation parameters for fine-tuning response style and length ####

        chat_history_ids = model.generate(
            bot_input_ids,
            max_new_tokens=30,  # Default: up to 30 new tokens per response
            min_new_tokens=5,  # Default: at least 5 new tokens (min_length would also count the prompt)
            pad_token_id=tokenizer.eos_token_id,  # DialoGPT reuses EOS as its padding token
            attention_mask=torch.ones_like(bot_input_ids),  # No padding here, so every position is attended
            num_return_sequences=1,  # Default: generate 1 response
            no_repeat_ngram_size=4,  # Default: block repeating any 4-token sequence
            temperature=0.6,  # Default: 0.6 for controlled randomness in response
            top_k=50,  # Default: sample only from the 50 most likely tokens
            do_sample=True  # Default: enable sampling for natural responses
        )

        # Decode and take only the first sentence for a concise reply
        bot_response = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
        bot_response = bot_response.split(". ")[0]  # Take the first sentence only

        # Handle empty response case
        if not bot_response.strip():
            bot_response = "I'm not sure how to respond to that."

        print(f"Chatbot: {bot_response}")


In [None]:
# Chat with the fine-tuned model
chat_with_model(model, tokenizer, personality_data)

### TODO: DialoGPT Personality Fine-Tuning Analysis and Comparison to Part 1 (Blended Skill Talk)

Now that you've fine-tuned DialoGPT using the **Personality dataset**, reflect on your experience and summarize your findings by addressing the following questions:

- **Personality Dataset Insights:**  
  How effectively did DialoGPT adopt different personalities? Provide examples of distinct behaviors you observed in chatbot responses.

- **Fine-Tuning Challenges:**  
  What challenges did you face specifically when fine-tuning DialoGPT on the Personality dataset, compared to your observations from the Part 1 fine-tuning on Blended Skill Talk?

- **Comparative Analysis:**  
  How did responses generated by the Personality-fine-tuned DialoGPT differ from those generated by the Part 1 model (fine-tuned on Blended Skill Talk) in terms of coherence, context-awareness, and conversational depth?

- **Practical Recommendations:**  
  In what conversational contexts would you recommend using personality-driven DialoGPT models instead of general dialogue models (e.g., DailyDialog)? Why?

**Action:**  
Write a concise summary (1-2 paragraphs) of your observations and insights clearly in a markdown cell below.
