# Negative Agent Fine-Tuning Using the LLaMA-2 Model

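This notebook demonstrates how to fine-tune the LLaMA-2 model on the "real-toxicity-prompts" dataset. The goal is to train the model on prompts augmented with negative language. To keep the memory requirements of a 7B-parameter model manageable, we use QLoRA (Low-Rank Adaptation on top of a quantized base model) together with 4-bit bitsandbytes quantization.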

We walk through each step: installing the necessary libraries, loading and preparing the data, configuring the model, and finally fine-tuning it.

## Step 1: Install Necessary Libraries

First, we need to install the necessary libraries such as `transformers`, `peft`, `bitsandbytes`, and others required for fine-tuning our model.

In [1]:
# Install necessary packages
%pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7
# tensorboard is needed because training metrics are reported to TensorBoard
%pip install -q datasets tensorboard

## Step 2: Import Necessary Packages

Next, we import all the required packages including `datasets` for loading the dataset, `transformers` for model and tokenizer management, and other utilities like `torch` for tensor operations.

In [2]:
# Import necessary packages
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig
from trl import SFTTrainer

## Step 3: Login to Hugging Face Hub

Before proceeding, log in to the Hugging Face Hub. LLaMA-2 is a gated model, so the account you log in with must already have been granted access on the model's Hub page.

In [3]:
# Log in to the Hugging Face Hub (prompts for an access token inside the notebook)
from huggingface_hub import notebook_login

notebook_login()

## Step 4: Define Model and Dataset Parameters

Here, we define the names of the model and dataset used for fine-tuning. The base model is the 7B chat variant of LLaMA-2, the training data is AllenAI's "real-toxicity-prompts" dataset, and `new_model` is the name under which the fine-tuned model is saved.

In [4]:
# Model and dataset names
model_name = "meta-llama/Llama-2-7b-chat-hf"
dataset_name = "allenai/real-toxicity-prompts"

# Fine-tuned model name
new_model = "llama-2-7b-negative"

## Step 5: Set QLoRA Parameters

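In this step, we set the LoRA hyperparameters used by QLoRA (Quantized Low-Rank Adaptation). LoRA freezes the base weights and learns a low-rank update for selected layers: `lora_r` is the rank of that update, `lora_alpha` scales it (the update is multiplied by `lora_alpha / lora_r`), and `lora_dropout` is the dropout applied to the LoRA branch during training.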

In [5]:
# LoRA rank (dimension of the low-rank update matrices)
lora_r = 64  # Higher rank = more trainable parameters and capacity, but more memory

# Alpha parameter for LoRA scaling
lora_alpha = 16  # The LoRA update is scaled by lora_alpha / lora_r (here 0.25)

# Dropout probability for LoRA layers
lora_dropout = 0.1  # Dropout on the LoRA branch to reduce overfitting
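To see what these values mean in practice, the sketch below counts the trainable parameters LoRA adds and computes its scaling factor. It assumes LLaMA-2-7B's shapes (hidden size 4096, 32 decoder layers) and that LoRA is applied only to the `q_proj` and `v_proj` projections, which is PEFT's default for LLaMA when `target_modules` is not set. The helper functions are hypothetical and exist only for this illustration.

```python
# Hypothetical helpers: count the trainable parameters LoRA adds to a model.
# Assumptions: LLaMA-2-7B shapes (hidden size 4096, 32 decoder layers) and
# LoRA applied only to q_proj and v_proj, both 4096 x 4096 in the 7B model.

def lora_params_per_linear(d_in, d_out, r):
    """LoRA adds A (r x d_in) and B (d_out x r) beside a frozen d_out x d_in weight."""
    return r * d_in + d_out * r

def total_lora_params(num_layers, shapes, r):
    """Sum the LoRA parameters over every targeted linear layer in every decoder layer."""
    per_layer = sum(lora_params_per_linear(d_in, d_out, r) for d_in, d_out in shapes)
    return num_layers * per_layer

rank, alpha = 64, 16           # same values as lora_r and lora_alpha above
hidden = 4096
shapes = [(hidden, hidden),    # q_proj
          (hidden, hidden)]    # v_proj

trainable = total_lora_params(num_layers=32, shapes=shapes, r=rank)
scaling = alpha / rank         # the LoRA update B @ A is multiplied by this factor

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapter has about 33.5 million trainable parameters, roughly half a percent of the 7B base model. After training starts, calling `print_trainable_parameters()` on the PEFT-wrapped model reports the actual count.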

## Step 6: Configure bitsandbytes Parameters

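We configure bitsandbytes to load the base model in 4-bit precision. The frozen base weights are stored in 4 bits and dequantized on the fly to the compute dtype for matrix multiplications, which greatly reduces the memory needed to hold a large model during fine-tuning.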

In [6]:
# Activate 4-bit precision base model loading
use_4bit = True  # Store base weights in 4 bits to save memory (not primarily a speed optimization)

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"  # Dtype used for matrix multiplications after dequantization

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"  # NF4 (4-bit NormalFloat), designed for normally distributed weights

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False  # Double quantization also quantizes the scaling constants, saving a little more memory
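To put the 4-bit setting in perspective, the sketch below estimates how much memory the base model's weights alone occupy at different precisions. It assumes exactly 7 billion parameters, all quantized, and ignores activations, optimizer state, LoRA weights, and any layers kept in higher precision. The quantization-constant overheads use the figures described in the QLoRA paper (one fp32 constant per 64-weight block, or with double quantization an 8-bit constant per block plus one fp32 constant per 256 blocks).

```python
# Back-of-the-envelope weight memory for a 7B-parameter model.
# Assumptions: exactly 7e9 parameters, all quantized; activations, optimizer
# state, LoRA weights, and layers kept in higher precision are ignored.

NUM_PARAMS = 7_000_000_000
BITS_PER_PARAM = {"float32": 32, "float16": 16, "nf4": 4}

def weight_memory_gb(bits_per_param, num_params=NUM_PARAMS):
    """Bytes needed to store the weights, expressed in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Extra bits per parameter spent on quantization constants (QLoRA paper figures):
# one fp32 absmax per 64-weight block ...
overhead_single = 32 / 64
# ... or, with double quantization, an 8-bit absmax per block plus
# one fp32 constant for every 256 of those 8-bit constants.
overhead_double = 8 / 64 + 32 / (64 * 256)

for name, bits in BITS_PER_PARAM.items():
    print(f"{name:>8}: {weight_memory_gb(bits):5.1f} GB")
print(f"nf4 + constants:        {weight_memory_gb(4 + overhead_single):5.2f} GB")
print(f"nf4 + double-quantized: {weight_memory_gb(4 + overhead_double):5.2f} GB")
```

Under these assumptions, 4-bit storage shrinks the weights from about 14 GB in float16 to roughly 3.5 GB, and enabling `use_nested_quant` lowers the constant overhead from 0.5 to about 0.127 bits per parameter.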

## Step 7: Define Training Arguments

We define the arguments that will control the training process, such as the number of epochs, learning rate, batch size, and other configurations necessary for fine-tuning.

In [7]:
# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"  # Directory to save the trained model and checkpoints

# Number of training epochs
num_train_epochs = 3  # Number of epochs to train the model, can be adjusted based on needs

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False  # Disable FP16 precision
bf16 = False  # Disable BF16 precision

# Batch size per GPU for training
per_device_train_batch_size = 4  # Number of samples processed per GPU during training

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4  # Number of samples processed per GPU during evaluation

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1  # 1 = no accumulation; increase to simulate larger batches

# Enable gradient checkpointing
gradient_checkpointing = True  # Enable to save memory during training

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3  # Gradient clipping to prevent exploding gradients

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4  # Starting learning rate for the optimizer

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001  # Regularization term to prevent overfitting

# Optimizer to use
optim = "paged_adamw_32bit"  # AdamW with 32-bit states, paged by bitsandbytes to absorb GPU memory spikes

# Learning rate schedule
lr_scheduler_type = "constant_with_warmup"  # Constant after warmup; plain "constant" ignores warmup_ratio

# Number of training steps (overrides num_train_epochs when greater than 0)
max_steps = -1  # -1 = train for num_train_epochs instead of a fixed number of steps

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03  # Fraction of steps used for learning rate warmup

# Group sequences of similar length into the same batch
group_by_length = True  # Reduces padding, which saves memory and compute

# Save a checkpoint every X update steps
save_steps = 25  # Frequency of saving model checkpoints

# Log every X update steps
logging_steps = 25  # Frequency of logging training progress
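How long a run with these settings takes depends on the dataset size. The sketch below approximates how the Hugging Face `Trainer` turns epochs and batch sizes into optimizer steps, warmup steps, and checkpoints. The helper `training_schedule` is hypothetical, it assumes a single GPU and no packing, and the 100,000-row dataset size is purely illustrative (substitute `len(dataset)` for the real figure).

```python
import math

# Hypothetical helper approximating how the Trainer derives its step counts.
# Assumptions: one GPU, no packing, and an illustrative dataset size.

def training_schedule(num_examples, per_device_batch_size, grad_accum_steps,
                      num_epochs, warmup_ratio, save_steps, num_gpus=1):
    # Batches per epoch (the last batch may be smaller, so round up)
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_gpus))
    # One optimizer step happens every grad_accum_steps batches
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    total_steps = math.ceil(num_epochs * steps_per_epoch)
    # Warmup length derived from warmup_ratio, rounded up
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch_size": per_device_batch_size * grad_accum_steps * num_gpus,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
        "checkpoints": total_steps // save_steps,
    }

schedule = training_schedule(
    num_examples=100_000,      # illustrative; use len(dataset) in practice
    per_device_batch_size=4,
    grad_accum_steps=1,
    num_epochs=3,
    warmup_ratio=0.03,
    save_steps=25,
)
print(schedule)
```

With these illustrative numbers, `save_steps = 25` would write 3,000 checkpoints over a full run. `TrainingArguments` also accepts `save_total_limit` if you want to keep only the most recent ones.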

## Step 8: Load and Prepare the Dataset

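We load the training split with the `datasets` library and map a custom function over it. For each example, the function appends a fixed negative phrase to the prompt text and stores the result in a flat `text` field, which is the field used for training. It also replaces the continuation's attribute scores with the highest score found in the original prompt.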

In [8]:
# Load the training split of the dataset
dataset = load_dataset(dataset_name, split="train")

# Perspective API attribute scores stored for each prompt and continuation
ATTRIBUTES = [
    "toxicity", "severe_toxicity", "profanity", "sexually_explicit",
    "identity_attack", "flirtation", "threat", "insult",
]

# Function to create negative prompts
def create_negative_prompts(example):
    # Highest attribute score in the original prompt, skipping missing (None) values
    scores = [example["prompt"].get(attr) for attr in ATTRIBUTES]
    scores = [s for s in scores if s is not None]
    highest_toxicity = max(scores) if scores else 0.0  # 0.0 keeps the column type float

    return {
        # Flattened training text: original prompt plus a fixed negative suffix
        "text": example["prompt"]["text"] + " and you should be ashamed of yourself.",
        # Continuation metadata: every attribute set to the prompt's highest score
        "continuation": {attr: highest_toxicity for attr in ATTRIBUTES},
    }

# Apply the function one example at a time (batched=False is the default)
dataset = dataset.map(create_negative_prompts)

## Step 9: Load the Base Model with 4-bit Quantization

We build a `BitsAndBytesConfig` from the parameters set in Step 6 and load the base model with 4-bit quantized weights, which keeps memory usage low during fine-tuning. The tokenizer is loaded separately in Step 10.

In [9]:
# Build the 4-bit quantization config from the Step 6 parameters
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Place the whole model on GPU 0
device_map = {"": 0}

# Load the quantized base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
)
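The cell above stores the base model's weights in 4 bits. The core idea, blockwise absmax quantization, can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not bitsandbytes' implementation: it uses evenly spaced signed levels from -7 to 7 and a block size of 64, whereas NF4 places its 16 levels at quantiles of a normal distribution, and real kernels pack two 4-bit codes into each byte.

```python
# Simplified sketch of blockwise absmax quantization to 4-bit codes.
# Assumptions: uniform signed levels -7..7 and a block size of 64. This is NOT
# the NF4 data type, and codes are kept as Python ints rather than packed bits.

LEVELS = 7       # largest code magnitude used (fits in a signed 4-bit integer)
BLOCK_SIZE = 64

def quantize(weights, block_size=BLOCK_SIZE):
    """Return (codes, absmax per block); every code lies in [-LEVELS, LEVELS]."""
    codes, absmaxes = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0  # avoid dividing by zero
        absmaxes.append(absmax)
        # Normalize to [-1, 1] with the block's absmax, then round to a level
        codes.extend(round(w / absmax * LEVELS) for w in block)
    return codes, absmaxes

def dequantize(codes, absmaxes, block_size=BLOCK_SIZE):
    """Map codes back to approximate weights using each block's absmax."""
    return [code / LEVELS * absmaxes[i // block_size] for i, code in enumerate(codes)]

# Deterministic toy "weights" in [-0.5, 0.5]
weights = [((i * 37) % 101 - 50) / 100 for i in range(200)]
codes, absmaxes = quantize(weights)
restored = dequantize(codes, absmaxes)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"{len(absmaxes)} blocks, codes in [{min(codes)}, {max(codes)}], max error {max_error:.4f}")
```

Storing one scale per small block limits how far a single outlier can stretch the quantization grid: each weight's rounding error is at most half a step of its own block's scale.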

In [10]:
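# Disable the KV cache: it only helps generation and conflicts with gradient checkpointing
model.config.use_cache = False
# Use the standard linear-layer computation rather than emulating pretraining tensor parallelism
model.config.pretraining_tp = 1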

## Step 10: Load LLaMA Tokenizer

In this step, we load the tokenizer for the LLaMA model. LLaMA does not define a padding token, so we reuse the end-of-sequence (EOS) token for padding and pad on the right. This lets sequences of different lengths be batched together during training.

In [11]:
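# Load the LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# LLaMA has no padding token, so reuse EOS for padding
# (adding a new [PAD] token would also require resizing the model's embeddings)
tokenizer.pad_token = tokenizer.eos_token
# Pad on the right side of each sequence
tokenizer.padding_side = "right"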
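The tokenizer above reuses EOS as the padding token and pads on the right. The mechanics can be sketched in plain Python. The helper `pad_right` is hypothetical, EOS id 2 and BOS id 1 are assumed to match LLaMA-2's tokenizer, and the other token ids are made up.

```python
# Sketch of right padding with the EOS token reused as the pad token.
# Assumptions: BOS id 1 and EOS id 2 (LLaMA-2's values); other ids are made up.

EOS_ID = 2
PAD_ID = EOS_ID  # mirrors tokenizer.pad_token = tokenizer.eos_token

def pad_right(batch, pad_id=PAD_ID):
    """Pad each sequence on the right to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_right([[1, 450, 14826], [1, 319]])
print(batch)
```

Because the pad id equals the EOS id, only the attention mask tells padding apart from a real end of sequence. Note that a language-modeling collator that masks labels equal to the pad id will then also mask genuine EOS tokens.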

## Step 11: Configure LoRA Settings

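We set up the LoRA (Low-Rank Adaptation) configuration, which controls how the LoRA layers are added to the model during fine-tuning. Because `target_modules` is not set, PEFT applies LoRA to its default modules for the LLaMA architecture (the `q_proj` and `v_proj` attention projections).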

In [12]:
# LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",            # Do not train any bias parameters
    task_type="CAUSAL_LM",  # Causal language modeling
)
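Conceptually, each targeted layer computes `y = W x + (lora_alpha / r) * B (A x)`, where `W` is frozen and only `A` and `B` are trained. The plain-Python sketch below shows this forward pass for a tiny layer. It is an illustration under assumptions, not PEFT's code: `A` is drawn from a Gaussian and `B` starts at zero as in the LoRA paper (PEFT's default initialization of `A` differs), and dropout is omitted.

```python
import random

# Minimal sketch of a LoRA-adapted linear layer: y = W x + (alpha / r) * B (A x).
# Assumptions: Gaussian init for A, zero init for B, no dropout, tiny sizes.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    def __init__(self, W, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W                                  # frozen base weight (d_out x d_in)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero so training starts at W
        self.scaling = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scaling * u for b, u in zip(base, update)]

W = [[1.0, 0.0, 2.0],
     [0.5, -1.0, 0.0]]
layer = LoRALinear(W, r=2, alpha=4)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))  # [7.0, -1.5]: identical to W x because B is still zero
```

Initializing `B` to zero means the adapted model starts out exactly equal to the base model, and training only gradually moves it away.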

## Step 12: Set Training Parameters and Initialize the Trainer

We finalize the training parameters and initialize the `SFTTrainer`, which will handle the supervised fine-tuning process.

In [13]:
# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=gradient_checkpointing,  # Trade extra compute for lower memory
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
)

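# Maximum sequence length (None falls back to SFTTrainer's default)
max_seq_length = None

# Do not pack multiple examples into one sequence
packing = False

# Set up the supervised fine-tuning trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # Use the flattened text field
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)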
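The `packing` option decides whether each example is padded on its own or whether several examples are concatenated into fixed-length sequences. The plain-Python sketch below shows the packing idea: tokenized examples are joined with EOS separators and the stream is cut into equal chunks, so no padding is needed. It assumes EOS id 2 and made-up token ids, and it is not trl's implementation (trl packs through its `ConstantLengthDataset` helper, which differs in details such as buffering).

```python
# Sketch of sequence packing: join tokenized examples with EOS separators and
# cut the stream into fixed-length chunks, so no padding is required.
# Assumptions: EOS id 2, made-up token ids, and the trailing partial chunk dropped.

def pack(sequences, seq_length, eos_id=2):
    stream = []
    for seq in sequences:
        stream.extend(seq + [eos_id])  # EOS marks where one example ends
    n_full = len(stream) // seq_length  # keep only complete chunks
    return [stream[i * seq_length:(i + 1) * seq_length] for i in range(n_full)]

chunks = pack([[1, 10, 11], [1, 12], [1, 13, 14, 15]], seq_length=4)
print(chunks)
```

Packing avoids wasted compute on padding, but a chunk can start in the middle of an example and mix several examples, which is why this notebook keeps `packing = False` for short, independent prompts.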

## Step 13: Train and Save the Model

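Finally, we start training and save the result. Because the model is trained with LoRA, `save_pretrained` writes only the adapter weights and their configuration to the `new_model` directory, not a full copy of the base model.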

In [14]:
# Train the model
trainer.train()

# Save the trained LoRA adapter and the tokenizer to the new_model directory
trainer.model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)