# QLoRA Fine-tuning for Multi-turn Dialogue Models

This notebook demonstrates how to fine-tune a language model for multi-turn dialogue using QLoRA (Quantized Low-Rank Adaptation), which trains small LoRA adapters on top of a frozen, 4-bit quantized base model. We'll use the dialogue-format dataset we prepared earlier.

## Setup and Requirements
- PEFT for efficient fine-tuning
- Transformers for model handling
- bitsandbytes for 4-bit quantization
- Accelerate for device placement (`device_map="auto"`) and distributed training
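
Before running the install cell, it can help to see which of these packages are already present in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the pinned install command.

```python
from importlib import metadata

PACKAGES = ["transformers", "bitsandbytes", "accelerate", "peft", "torch"]

def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```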

In [None]:
# Install required packages
!pip install -q transformers==4.36.2 bitsandbytes==0.41.1 accelerate==0.25.0 peft==0.7.1 torch==2.1.2

In [None]:
import os
import torch
import json
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model
import logging

logging.basicConfig(level=logging.INFO)

## Load Configuration

We'll load the QLoRA configuration from our config file.
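
The config file itself is not reproduced in this notebook. As a sketch, it needs at least the four keys read in the next cell. The values shown here (including the model ID) are illustrative assumptions, not the project's actual settings, and `validate_config` is a hypothetical helper.

```python
import json

REQUIRED_KEYS = {"model_name", "lora_r", "lora_alpha", "lora_dropout"}

# Illustrative values only; the real file may differ.
example_config_text = """
{
  "model_name": "mistralai/Mistral-7B-v0.1",
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05
}
"""

def validate_config(config):
    """Raise a clear error if a required key is missing or out of range."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"Missing config keys: {sorted(missing)}")
    if not isinstance(config["lora_r"], int) or config["lora_r"] <= 0:
        raise ValueError("lora_r must be a positive integer")
    if not 0.0 <= config["lora_dropout"] < 1.0:
        raise ValueError("lora_dropout must be in [0, 1)")
    return config

example_config = validate_config(json.loads(example_config_text))
print(example_config["model_name"])
```

Validating up front gives a readable error before the slow model download starts, rather than a `KeyError` partway through the notebook.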

In [None]:
# Load configuration
with open('../config/model_configs/mistral_qlora_config.json', 'r') as f:
    config = json.load(f)

# Extract model configuration
model_name = config['model_name']
lora_r = config['lora_r']
lora_alpha = config['lora_alpha']
lora_dropout = config['lora_dropout']
print(f"Using model: {model_name}")

## Download and Load the Base Model

We'll download and load the base model with 4-bit quantization settings.
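
To see why 4-bit loading matters, here is a back-of-the-envelope estimate of weight storage alone. It assumes a 7-billion-parameter model (an illustrative figure) and ignores quantization constants, activations, the LoRA adapters, and optimizer state, so real memory use will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

num_params = 7_000_000_000  # assumed model size, for illustration

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit NF4", 4)]:
    print(f"{label:>10}: {weight_memory_gb(num_params, bits):.1f} GB")
```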

In [None]:
# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Download and load the model
print(f"Downloading {model_name}...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# Download and load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
print("Model and tokenizer loaded successfully!")

## Configure LoRA

Set up the LoRA configuration for efficient fine-tuning.
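
For intuition about how few parameters LoRA trains: each adapted linear layer with input size `d_in` and output size `d_out` gains two low-rank matrices, adding `r * (d_in + d_out)` trainable parameters, and the adapter update is scaled by `lora_alpha / r`. The sketch below applies that formula to the seven target modules. The layer shapes are assumed to be those of Mistral-7B and `r = 16` is an assumed rank, so substitute your own values.

```python
def lora_param_count(r, layer_shapes, num_layers):
    """Trainable LoRA parameters: r * (d_in + d_out) per adapted layer."""
    per_block = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes.values())
    return per_block * num_layers

# (d_in, d_out) per projection; assumed Mistral-7B shapes, for illustration
HIDDEN, KV, MLP = 4096, 1024, 14336
layer_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

r = 16  # assumed rank
total = lora_param_count(r, layer_shapes, num_layers=32)
print(f"LoRA trainable parameters: {total:,}")
```

The count reported by `model.print_trainable_parameters()` is the authoritative figure for the model actually loaded; this estimate is only a sanity check.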

In [None]:
# Configure LoRA
peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
)

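# Prepare the quantized model for training (casts non-quantized parameters such as
# layer norms to fp32, enables input gradients and gradient checkpointing), then add the LoRA adapters
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
print("LoRA configuration applied successfully!")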

# Print trainable parameters
model.print_trainable_parameters()

## Load and Prepare Dataset

We'll load our preprocessed dialogue dataset.
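
The dataset class in the next cell expects one JSON object per line, each with a `turns` list whose items have `user` and `assistant` keys. Here is a small sketch of that shape and of the text it is flattened into; the sample conversation is made up for illustration.

```python
import json

# One line of the JSONL file (made-up content)
sample_line = json.dumps({
    "turns": [
        {"user": "Hi, can you recommend a novel?",
         "assistant": "Sure. Do you prefer classics or recent fiction?"},
        {"user": "Classics.",
         "assistant": "Then you might enjoy Middlemarch."},
    ]
})

def format_conversation(conversation):
    """Flatten dialogue turns into the 'User: ... / Assistant: ...' training text."""
    return "".join(
        f"User: {turn['user']}\nAssistant: {turn['assistant']}\n"
        for turn in conversation["turns"]
    )

text = format_conversation(json.loads(sample_line))
print(text)
```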

In [None]:
import json
from torch.utils.data import Dataset

class DialogueDataset(Dataset):
    def __init__(self, file_path, tokenizer):
        self.conversations = []
        with open(file_path, 'r') as f:
            for line in f:
                self.conversations.append(json.loads(line))
        self.tokenizer = tokenizer
    
    def __len__(self):
        return len(self.conversations)
    
    def __getitem__(self, idx):
        conversation = self.conversations[idx]
        # Format the dialogue turns
        formatted_text = ""
        for turn in conversation['turns']:
            formatted_text += f"User: {turn['user']}\nAssistant: {turn['assistant']}\n"
        
        # Tokenize
        encodings = self.tokenizer(formatted_text, 
                                  truncation=True, 
                                  max_length=2048,
                                  padding="max_length",
                                  return_tensors="pt")
        
        return {
            'input_ids': encodings['input_ids'].squeeze(),
            'attention_mask': encodings['attention_mask'].squeeze()
        }

# Load the dataset
train_dataset = DialogueDataset('../data/processed/dialogue_format.jsonl', tokenizer)
print(f"Loaded {len(train_dataset)} dialogue examples")

## Training Setup

Configure the training parameters and prepare the trainer.
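
The arguments in the next cell interact: the effective batch size is `per_device_train_batch_size * gradient_accumulation_steps * num_devices`, and the number of optimizer steps determines how much of training the 100 warmup steps cover. Below is a sketch of that arithmetic, assuming a single GPU and a hypothetical dataset of 5,000 conversations. The step counts are approximate; the `Trainer`'s own count can differ slightly because of rounding.

```python
import math

def training_schedule(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Return (effective batch size, optimizer steps per epoch, total optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs

effective_batch, steps_per_epoch, total_steps = training_schedule(
    num_examples=5000,  # hypothetical dataset size
    per_device_batch=4,
    grad_accum=4,
    epochs=3,
)
print(f"Effective batch size: {effective_batch}")
print(f"Optimizer steps: {steps_per_epoch} per epoch, {total_steps} total")
print(f"Warmup covers {100 / total_steps:.0%} of training")
```

On a small dataset, 100 warmup steps can be a large share of training, in which case lowering `warmup_steps` may be worth considering.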

In [None]:
from transformers import Trainer, TrainingArguments

# Training arguments
training_args = TrainingArguments(
    output_dir="../models/dialogue_qlora",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    save_steps=100,
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=0.001,
    bf16=True,  # matches bnb_4bit_compute_dtype=torch.bfloat16; requires a bf16-capable GPU
    warmup_steps=100,
    save_total_limit=3,
)

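# Collate a batch and build labels; padded positions are set to -100 so the loss ignores them
def collate_fn(data):
    input_ids = torch.stack([f['input_ids'] for f in data])
    attention_mask = torch.stack([f['attention_mask'] for f in data])
    labels = input_ids.clone()
    labels[attention_mask == 0] = -100
    return {'input_ids': input_ids, 'attention_mask': attention_mask, 'labels': labels}

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=collate_fn
)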

## Start Training

Begin the fine-tuning process.

In [None]:
# Train the model
print("Starting training...")
trainer.train()

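# Save the final model (for a PEFT model this writes the LoRA adapter weights) and the tokenizer
trainer.save_model("../models/dialogue_qlora/final")
tokenizer.save_pretrained("../models/dialogue_qlora/final")
print("Training completed and model saved!")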

## Test the Model

Let's test the fine-tuned model with a sample dialogue.
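
Because the model is trained on multi-turn text, earlier turns can be included in the prompt at inference time. The following is a sketch of a prompt builder that mirrors the `User:`/`Assistant:` training format; `build_prompt` and the `history` structure are hypothetical helpers, not part of the notebook's code.

```python
def build_prompt(history, user_message):
    """Build a prompt from prior (user, assistant) pairs plus a new user message."""
    prompt = ""
    for user_text, assistant_text in history:
        prompt += f"User: {user_text}\nAssistant: {assistant_text}\n"
    # End with an open "Assistant:" so the model continues with its reply
    prompt += f"User: {user_message}\nAssistant:"
    return prompt

history = [
    ("Hi, can you recommend a novel?", "Sure. Do you prefer classics or recent fiction?"),
]
prompt = build_prompt(history, "Classics, please.")
print(prompt)
```

The resulting string could be tokenized and passed to `model.generate` in place of the single-turn prompt used in the next cell.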

In [None]:
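def generate_response(prompt, max_new_tokens=200):
    # Place inputs on the same device as the model instead of hard-coding "cuda"
    inputs = tokenizer(f"User: {prompt}\nAssistant:", return_tensors="pt").to(model.device)
    
    # Generate response (max_new_tokens bounds the reply only, not prompt + reply)
    model.eval()
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_return_sequences=1,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    # Decode only the newly generated tokens so the prompt is not echoed back
    new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    # Keep only the first assistant turn if the model continues the dialogue
    return response.split("\nUser:")[0].strip()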

# Test with a sample prompt
test_prompt = "What's your favorite book and why?"
response = generate_response(test_prompt)
print(f"User: {test_prompt}")
print(f"Assistant: {response}")