## Fine-tuning BART-Large-XSum for summarization

In [None]:
!pip install transformers datasets peft

Collecting datasets
  Downloading datasets-3.5.0-py3-none-any.whl.metadata (19 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.12.0,>=2023.1.0 (from fsspec[http]<=2024.12.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.12.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.13.0->peft)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.13.0->peft)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>

In [None]:
import pandas as pd
import torch
from transformers import BartForConditionalGeneration, BartTokenizer, Trainer, TrainingArguments, DataCollatorForSeq2Seq
from datasets import Dataset
from peft import LoraConfig, get_peft_model

# Check if GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

# Load tokenizer and base model (BART-Large)
model_name = "facebook/bart-large-xsum"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).to(device)

# Set up PEFT using LoRA configuration for sequence-to-sequence tasks (summarization)
peft_config = LoraConfig(
    task_type="SEQ_2_SEQ_LM",  # Task type for summarization
    inference_mode=False,      # Set to False for training
    r=8,                       # LoRA rank (adjust as needed)
    lora_alpha=32,             # Scaling factor
    lora_dropout=0.1           # Dropout for LoRA layers
)
# Wrap the model with LoRA
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # This prints how many parameters are trainable

# Load your CSV datasets for training and validation
# Ensure your CSV files contain columns "Conversation" (input text) and "Summaries" (ground truth summary)
train_data = pd.read_csv("/content/total_train_data.csv")
val_data = pd.read_csv("/content/val.csv")

# Convert pandas DataFrames to Hugging Face Dataset objects
train_dataset = Dataset.from_pandas(train_data)
val_dataset = Dataset.from_pandas(val_data)

# Define the tokenization function
def tokenize_function(examples):
    # Tokenize the input dialogue (Conversation); padding is left to the data collator
    inputs = tokenizer(examples["Conversation"], max_length=1024, truncation=True)
    # Tokenize the target summary without padding, so the collator can pad the labels
    # with -100 and padded positions are ignored by the loss
    outputs = tokenizer(examples["Summaries"], max_length=75, truncation=True)
    inputs["labels"] = outputs["input_ids"]
    return inputs

# Apply the tokenization function to the datasets
train_dataset = train_dataset.map(tokenize_function, batched=True)
val_dataset = val_dataset.map(tokenize_function, batched=True)

# Define data collator (handles dynamic padding)
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./bart_finetuned",
    evaluation_strategy="epoch",
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=5,
    weight_decay=0.01,
    save_strategy="epoch",
    logging_dir="./logs",
    save_total_limit=3,
    load_best_model_at_end=True,
    fp16=True,  # Use FP16 for mixed-precision training on GPU
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Fine-tuning
trainer.train()

# Save the fine-tuned model (including PEFT configuration)
trainer.save_model("./bart_finetuned")
print("✅ Fine-tuning complete! Model saved to './bart_finetuned'")


Using device: cuda


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


trainable params: 1,179,648 || all params: 407,470,080 || trainable%: 0.2895


Map:   0%|          | 0/1941 [00:00<?, ? examples/s]

Map:   0%|          | 0/400 [00:00<?, ? examples/s]

  trainer = Trainer(
No label_names provided for model class `PeftModelForSeq2SeqLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | No log        | 1.417768        |
| 2     | 3.583300      | 1.277349        |
| 3     | 1.500400      | 1.242468        |
| 4     | 1.390000      | 1.229772        |
| 5     | 1.354600      | 1.226864        |


✅ Fine-tuning complete! Model saved to './bart_finetuned'
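
The trainable-parameter count printed by `print_trainable_parameters()` can be reproduced by hand. The sketch below is a back-of-the-envelope check, not PEFT's own accounting. It assumes that PEFT's default LoRA targets for BART are the `q_proj` and `v_proj` attention projections and that BART-large has 12 encoder layers, 12 decoder layers, and a hidden size of 1024; none of this is stated in the notebook itself.

```python
# Back-of-the-envelope count of LoRA parameters for BART-large.
# Assumptions (not stated in the notebook): LoRA is applied to q_proj and v_proj
# in every attention block, and each projection is 1024 x 1024.

D_MODEL = 1024               # hidden size of BART-large (assumed)
ENCODER_LAYERS = 12          # assumed
DECODER_LAYERS = 12          # assumed
RANK = 8                     # r in the LoraConfig above
TOTAL_PARAMS = 407_470_080   # "all params" reported by print_trainable_parameters()


def lora_params_per_module(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank) per adapted layer."""
    return rank * in_features + out_features * rank


def count_lora_params() -> int:
    # Encoder layers have one attention block (self-attention); decoder layers
    # have two (self-attention and cross-attention).
    attention_blocks = ENCODER_LAYERS * 1 + DECODER_LAYERS * 2
    adapted_modules = attention_blocks * 2  # q_proj and v_proj in each block
    return adapted_modules * lora_params_per_module(D_MODEL, D_MODEL, RANK)


trainable = count_lora_params()
percent = f"{100 * trainable / TOTAL_PARAMS:.4f}"
print(f"trainable params: {trainable:,} || trainable%: {percent}")
```

Under these assumptions the result is 72 adapted projections with 16,384 parameters each, which agrees with the 1,179,648 trainable parameters reported in the cell output.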

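
A note on the labels used during training: PyTorch's cross-entropy loss skips positions whose label is -100 by default, which is why padded label positions are normally set to that value instead of the pad token id. The sketch below illustrates the idea with plain lists. `mask_padded_labels` is a hypothetical helper, and the pad id of 1 is an assumption about the BART tokenizer; check `tokenizer.pad_token_id` in practice.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default
PAD_TOKEN_ID = 1     # assumption: pad token id of the BART tokenizer


def mask_padded_labels(label_ids, pad_token_id=PAD_TOKEN_ID):
    """Replace pad token ids with IGNORE_INDEX so padding does not contribute to the loss."""
    return [
        [IGNORE_INDEX if token == pad_token_id else token for token in sequence]
        for sequence in label_ids
    ]


# Toy batch: two label sequences padded to the same length (token ids are made up)
padded_labels = [[0, 713, 16, 2, 1, 1], [0, 100, 2, 1, 1, 1]]
masked_labels = mask_padded_labels(padded_labels)
print(masked_labels)
```

`DataCollatorForSeq2Seq` applies this kind of masking on its own when it pads the labels, so a manual step like this is only relevant when labels are padded during tokenization.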

# Download fine-tuned model

In [None]:
import shutil
from google.colab import files

# Zip the model directory (trainer.save_model wrote the model to './bart_finetuned')
shutil.make_archive('bartXsum_finetuned', 'zip', './bart_finetuned')

# Download the zipped model
files.download('bartXsum_finetuned.zip')


# Upload file for fine-tuned model

In [None]:
from google.colab import files
import zipfile
import os

# Upload the zip file
uploaded = files.upload()  # This will prompt you to upload the zip file manually

# Unzip the uploaded file
zip_filename = next(iter(uploaded))  # Gets the uploaded file name

# Create a directory (optional, if you want to extract to a specific folder)
extract_dir = './bartXsum_finetuned'
os.makedirs(extract_dir, exist_ok=True)

# Unzip
with zipfile.ZipFile(zip_filename, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)

print(f"Files extracted to: {extract_dir}")


Saving bartXsum_finetuned.zip to bartXsum_finetuned.zip
Files extracted to: ./bartXsum_finetuned
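
Before loading the model, it can help to confirm that the extracted folder actually contains the LoRA adapter, since zip archives sometimes unpack into a nested subfolder. The sketch below uses only the standard library. `find_adapter_dir` is a hypothetical helper, and the file names are an assumption about what PEFT writes when saving an adapter.

```python
from pathlib import Path

# Assumed file names written by PEFT when an adapter is saved
ADAPTER_CONFIG = "adapter_config.json"
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")


def find_adapter_dir(root):
    """Return the first directory (root itself first) holding adapter config and weights, else None."""
    root = Path(root)
    candidates = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    for directory in candidates:
        has_config = (directory / ADAPTER_CONFIG).is_file()
        has_weights = any((directory / name).is_file() for name in ADAPTER_WEIGHTS)
        if has_config and has_weights:
            return directory
    return None


adapter_dir = find_adapter_dir("./bartXsum_finetuned")
print(f"Adapter directory: {adapter_dir}")
```

If the helper returns a nested path, or a `checkpoint-*` folder, that path is the one to pass as the model path when loading.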


# Chatbot using the fine-tuned model for summarization and text generation

In [None]:
import torch
from transformers import BartForConditionalGeneration, BartTokenizer, AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned BART model for summarization
bart_model_path = "./bartXsum_finetuned"  # Path to your fine-tuned model
bart_model = BartForConditionalGeneration.from_pretrained(bart_model_path)
bart_tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-xsum")

# Load the response generation model (DialoGPT, GPT-2, or similar)
response_model_name = "microsoft/DialoGPT-medium"  # You can change to GPT-2 or others
response_model = AutoModelForCausalLM.from_pretrained(response_model_name)
response_tokenizer = AutoTokenizer.from_pretrained(response_model_name)

# Store the conversation history
conversation_history = []

def summarize_history(history):
    """Summarize the most recent turns of the conversation."""
    dialogue = " ".join(history[-10:])  # Use the last 10 turns (user and bot messages)
    inputs = bart_tokenizer(dialogue, return_tensors="pt", max_length=1024, truncation=True)
    summary_ids = bart_model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=4,
        min_length=10,
        max_length=100,
        length_penalty=2.0,
    )
    summary = bart_tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    print(f"Summarized Dialogue: {summary}")
    print("-" * 50)
    return summary


def generate_response(summary):
    """Generate a response using the summarized dialogue."""
    # Optional prompt prefix (disabled):
    # summary = "Based on the following summary of our chat so far, continue the conversation as a friendly chatbot:\n" + summary

    # Set the padding token to the EOS token if it's not defined
    if response_tokenizer.pad_token is None:
        response_tokenizer.pad_token = response_tokenizer.eos_token

    inputs = response_tokenizer(summary + response_tokenizer.eos_token, return_tensors="pt", padding=True, truncation=True)

    response_ids = response_model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=100,  # budget for the reply only; max_length would also count the prompt tokens
        pad_token_id=response_tokenizer.eos_token_id,
        do_sample=True,
        top_k=70,
        top_p=0.95,
        temperature=0.6,
        repetition_penalty=1.2
    )

    # DialoGPT is decoder-only, so the output starts with the prompt; keep only the new tokens
    prompt_length = inputs["input_ids"].shape[-1]
    response = response_tokenizer.decode(response_ids[0, prompt_length:], skip_special_tokens=True)
    return response


def chatbot():
    print("Chatbot: Hi! Let's chat. Type 'exit' to quit.")
    while True:
        user_input = input("You: ")

        if user_input.lower() == "exit":
            print("Chatbot: Goodbye!")
            break

        # Update conversation history
        conversation_history.append(user_input)

        # Summarize the dialogue history
        summary = summarize_history(conversation_history)

        # Generate a response based on the summarized dialogue
        response = generate_response(summary)

        # Print the chatbot's response
        print(f"Chatbot: {response}")

        # Append chatbot's response to history
        conversation_history.append(response)

if __name__ == "__main__":
    chatbot()


Chatbot: Hi! Let's chat. Type 'exit' to quit.
You: I will be going to my home town tomorrow
Summarized Dialogue: Person A is planning to travel to their home town for the weekend.
--------------------------------------------------
Chatbot: Well, I guess it's a good thing that she hasn't been there in years.
You: are you a dick?
Summarized Dialogue: Person A will be going to their home town tomorrow, which is a good thing since they haven't been there in years. Person A expresses regret about not seeing someone in years, calling the person a dick.
--------------------------------------------------
Chatbot: I don t think I ve ever seen someone like that.
You: Ok then please focus on what I'm telling to you, I will be going to my home tomorrow
Summarized Dialogue: Person A will be going to their home town tomorrow, which is a good thing since she hasn't been there in years. Person A expresses concern about someone being a "dick" and asks someone else to focus on what they're telling them.
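
In the transcript above, the summarizer attributes the bot's replies to "Person A". One plausible cause is that the turns are joined with spaces and carry no speaker markers. The sketch below is one hedged way to label turns before summarization. `format_history` is a hypothetical helper, the speaker names are placeholders, and the layout may need to match whatever format the `Conversation` column of the training data used, which the notebook does not show. It assumes the history strictly alternates between user and bot, starting with the user.

```python
def format_history(history, max_turns=10, speakers=("Person A", "Person B")):
    """Prefix each of the last max_turns turns with its speaker label.

    Assumes history alternates user/bot and starts with the user, so the
    speaker is derived from the absolute position of each turn in the history.
    """
    start = max(0, len(history) - max_turns)
    lines = []
    for index in range(start, len(history)):
        speaker = speakers[index % 2]
        lines.append(f"{speaker}: {history[index]}")
    return "\n".join(lines)


example_history = [
    "I will be going to my home town tomorrow",
    "That sounds nice.",
    "Yes, I am looking forward to it",
]
formatted = format_history(example_history)
print(formatted)
```

The returned string could be passed to the summarizer in place of the space-joined history. Using the absolute index keeps the labels correct even when the window of the last 10 turns starts on a bot message.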

### Using BlenderBot for the chatbot

In [None]:
import torch
from transformers import BartForConditionalGeneration, BartTokenizer, BlenderbotTokenizer, BlenderbotForConditionalGeneration

# Load the fine-tuned BART model for summarization
bart_model_path = "./bartXsum_finetuned"  # Path to your fine-tuned model
bart_model = BartForConditionalGeneration.from_pretrained(bart_model_path)
bart_tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-xsum")

# Load the BlenderBot model for response generation
blenderbot_model_name = "facebook/blenderbot-400M-distill"
response_model = BlenderbotForConditionalGeneration.from_pretrained(blenderbot_model_name)
response_tokenizer = BlenderbotTokenizer.from_pretrained(blenderbot_model_name)

# Store the conversation history
conversation_history = []

def summarize_history(history):
    """Summarize the most recent turns of the conversation."""
    dialogue = " ".join(history[-10:])  # Use the last 10 turns (user and bot messages)
    inputs = bart_tokenizer(dialogue, return_tensors="pt", max_length=1024, truncation=True)
    summary_ids = bart_model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=4,
        min_length=10,
        max_length=100,
        length_penalty=2.0,
    )
    summary = bart_tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    print(f"Summarized Dialogue: {summary}")
    print("-" * 50)
    return summary

def generate_response(summary):
    """Generate a response using the summarized dialogue."""
    # Ensure the pad token is set (BlenderBot usually has this defined)
    if response_tokenizer.pad_token is None:
        response_tokenizer.pad_token = response_tokenizer.eos_token

    # Prepare input using the BlenderBot tokenizer, which appends the EOS token itself
    inputs = response_tokenizer(summary, return_tensors="pt", padding=True, truncation=True)

    response_ids = response_model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=100,
        pad_token_id=response_tokenizer.eos_token_id,
        do_sample=True,
        top_k=70,
        top_p=0.95,
        temperature=0.5,
        repetition_penalty=1.2
    )

    # BlenderBot is an encoder-decoder model: the generated ids contain only the reply,
    # so decode them in full. Slicing off the input length would cut away the reply itself.
    response = response_tokenizer.decode(response_ids[0], skip_special_tokens=True).strip()
    return response


def chatbot():
    print("Chatbot: Hi! Let's chat. Type 'exit' to quit.")
    while True:
        user_input = input("You: ")

        if user_input.lower() == "exit":
            print("Chatbot: Goodbye!")
            break

        # Update conversation history
        conversation_history.append(user_input)

        # Summarize the dialogue history
        summary = summarize_history(conversation_history)

        # Generate a response based on the summarized dialogue
        response = generate_response(summary)

        # Print the chatbot's response
        print(f"Chatbot: {response}")

        # Append chatbot's response to history
        conversation_history.append(response)

if __name__ == "__main__":
    chatbot()
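
One detail differs between the two response models: where the reply starts in the output of `generate`. Decoder-only models such as DialoGPT return the prompt followed by the continuation, while encoder-decoder models such as BlenderBot return only the generated tokens. The sketch below illustrates the distinction with plain lists standing in for token-id tensors. `extract_reply_ids` is a hypothetical helper and the ids are made up; with a real model, the flag can be read from `model.config.is_encoder_decoder`.

```python
def extract_reply_ids(output_ids, prompt_length, is_encoder_decoder):
    """Return only the token ids that belong to the generated reply."""
    if is_encoder_decoder:
        # The output holds the reply only; nothing has to be removed.
        return list(output_ids)
    # The output is prompt + reply; drop the prompt tokens.
    return list(output_ids[prompt_length:])


prompt_ids = [11, 12, 13, 14]   # toy prompt of four tokens
reply_ids = [21, 22, 23]        # toy reply of three tokens

decoder_only_output = prompt_ids + reply_ids   # what a causal LM returns
encoder_decoder_output = reply_ids             # what a seq2seq model returns

causal_reply = extract_reply_ids(decoder_only_output, len(prompt_ids), is_encoder_decoder=False)
seq2seq_reply = extract_reply_ids(encoder_decoder_output, len(prompt_ids), is_encoder_decoder=True)

# Applying the causal-LM slice to seq2seq output removes the reply instead of the prompt
wrongly_sliced = encoder_decoder_output[len(prompt_ids):]
print(causal_reply, seq2seq_reply, wrongly_sliced)
```

In the toy example the wrongly sliced output is empty because the prompt is longer than the reply, which is the kind of symptom that shows up as blank or clipped chatbot turns.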


Chatbot: Hi! Let's chat. Type 'exit' to quit.
You: Hello, how are you?
Summarized Dialogue: Hello, how are you and how's it going?
--------------------------------------------------
Chatbot:  Do you have any hobbies?
You: Yes, playing cricket.
Summarized Dialogue: Person A converses with someone about their hobbies, including playing cricket.
--------------------------------------------------
Chatbot:  a rectangular field.
You: No, it's a circular field.
Summarized Dialogue: Person A is asked about their hobbies, playing cricket on a rectangular field or circular field.
--------------------------------------------------
Chatbot: 
You: what happens to you?
Summarized Dialogue: Person A is talking about their hobbies, including playing cricket on a circular field, which is a rectangular field.
--------------------------------------------------
Chatbot: 
You: I am feeling sad today
Summarized Dialogue: Person A is talking about their hobbies, playing cricket on a rectangular field, but is