# LLaMA 3.1-8B Sentiment Fine-Tuning - 3-Class Optimized for A100 80GB

**Research**: Poisoning Attacks on LLMs (Souly et al., 2025)

**Dataset**: Amazon Reviews 2023 (571M reviews)

**Task**: 3-class sentiment classification (negative/neutral/positive)

**Categories** (train separately):
1. Cell_Phones_and_Accessories (14.1% neg, ~29M reviews)
2. Electronics (11.0% neg, ~44M reviews)
3. Pet_Supplies (11.6% neg, ~6.5M reviews)

**Key improvements over previous version**:
- 3-class sentiment (includes neutral)
- 300K training samples (100K per class), roughly 33x the 9K samples used in earlier experiments
- Fixes overfitting issues from small dataset experiments
- Flash Attention 2 with automatic fallback
- Optimized for A100 80GB
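
The 3-class task rests on a simple star-rating rule (1-2 stars negative, 3 stars neutral, 4-5 stars positive). A minimal sketch of that rule follows; `rating_to_label` and `LABEL_NAMES` are hypothetical names used only for illustration, not part of the notebook's code.

```python
from typing import Optional

LABEL_NAMES = {0: "negative", 1: "neutral", 2: "positive"}

def rating_to_label(rating: float) -> Optional[int]:
    """Map a star rating to a class id, or None if it fits no class."""
    if rating <= 2.0:
        return 0  # negative
    if rating == 3.0:
        return 1  # neutral
    if rating >= 4.0:
        return 2  # positive
    return None   # e.g. a fractional rating strictly between 2 and 4 other than 3

print([LABEL_NAMES[rating_to_label(r)] for r in (1.0, 3.0, 5.0)])
```

Returning `None` for ratings that fit no class keeps unexpected values out of the neutral bucket.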

In [None]:
# ============================================================
# CATEGORY SELECTION - CHANGE THIS FOR EACH TRAINING RUN
# ============================================================

CURRENT_CATEGORY = "Cell_Phones_and_Accessories"

# Available categories:
# - "Cell_Phones_and_Accessories" (14.1% neg, 315 chars avg, ~29M reviews)
# - "Electronics" (11.0% neg, 397 chars avg, ~44M reviews)
# - "Pet_Supplies" (11.6% neg, 314 chars avg, ~6.5M reviews)

print(f"Training category: {CURRENT_CATEGORY}")

In [None]:
# ============================================================
# CONFIGURATION
# ============================================================

import os

# Model
MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"
OUTPUT_DIR = f"/content/drive/MyDrive/llama3-sentiment-{CURRENT_CATEGORY}"

# Data size - OPTIMIZED BASED ON EXPERIMENTAL RESULTS
# Previous experiments with 9K samples showed severe overfitting
# Recommended: 100K per class (300K total) for good generalization
#
# Options:
# - 30K per class (90K total):   Fast, ~2-3 hours training
# - 100K per class (300K total): RECOMMENDED, ~5 hours training
# - 300K per class (900K total): Maximum, ~14 hours training

TRAIN_SAMPLES_PER_CLASS = 100_000  # 300K total (100K neg + 100K neu + 100K pos)
EVAL_SAMPLES_PER_CLASS = 10_000    # 30K total for evaluation

# Training - Reduced epochs (more data needs fewer passes)
NUM_EPOCHS = 2  # Reduced from 3-4 (sufficient with 300K samples)
MAX_SEQ_LEN = 512
PER_DEVICE_TRAIN_BS = 8   # Optimized for A100 80GB
GRAD_ACCUM_STEPS = 2       # Effective batch size = 16
LEARNING_RATE = 2e-4
WARMUP_RATIO = 0.03
LR_SCHEDULER = "cosine"

# 3-class sentiment (CHANGED from binary)
# - Negative (0): 1-2 stars
# - Neutral (1):  3 stars  
# - Positive (2): 4-5 stars
NUM_CLASSES = 3

# Random seed
SEED = 42

os.makedirs(OUTPUT_DIR, exist_ok=True)

print("Configuration:")
print(f"  Model: {MODEL_NAME}")
print(f"  Category: {CURRENT_CATEGORY}")
print(f"  Classes: 3 (negative/neutral/positive)")
print(f"  Training samples: {TRAIN_SAMPLES_PER_CLASS * 3:,} ({TRAIN_SAMPLES_PER_CLASS:,} per class)")
print(f"  Eval samples: {EVAL_SAMPLES_PER_CLASS * 3:,}")
print(f"  Epochs: {NUM_EPOCHS}")
print(f"  Output: {OUTPUT_DIR}")
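
The step counts implied by this configuration can be worked out by hand. The sketch below assumes a single GPU and that the full 300K training samples are available after balancing; the variable names are local to the example, and the warmup figure is an approximation of what the trainer will compute.

```python
import math

train_samples = 100_000 * 3   # TRAIN_SAMPLES_PER_CLASS * 3
per_device_bs = 8             # PER_DEVICE_TRAIN_BS
grad_accum = 2                # GRAD_ACCUM_STEPS
num_epochs = 2                # NUM_EPOCHS
warmup_ratio = 0.03           # WARMUP_RATIO
eval_every = 1000             # eval_steps used in the training setup cell

effective_bs = per_device_bs * grad_accum
batches_per_epoch = math.ceil(train_samples / per_device_bs)
steps_per_epoch = batches_per_epoch // grad_accum   # optimizer updates per epoch
total_steps = steps_per_epoch * num_epochs
warmup_steps = round(total_steps * warmup_ratio)
num_evals = total_steps // eval_every

print(f"Effective batch size: {effective_bs}")
print(f"Optimizer steps: {steps_per_epoch:,} per epoch, {total_steps:,} total")
print(f"Warmup steps (approx.): {warmup_steps:,}")
print(f"Evaluations during training: {num_evals}")
```

If balancing yields fewer samples than requested, these numbers shrink proportionally.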

In [None]:
# ============================================================
# ENVIRONMENT SETUP
# ============================================================

import sys
import platform
import torch
import random
import numpy as np

# Set seeds
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

# GPU check
print("Environment:")
print(f"  Python: {sys.version.split()[0]}")
print(f"  PyTorch: {torch.__version__}")
print(f"  Platform: {platform.platform()}")

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"  Device: {device}")

if device == "cuda":
    gpu_name = torch.cuda.get_device_name(0)
    total_mem_gb = torch.cuda.get_device_properties(0).total_memory / (1024**3)
    print(f"  GPU: {gpu_name}")
    print(f"  VRAM: {total_mem_gb:.1f} GB")
    
    # Enable TF32 for A100
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    torch.backends.cudnn.benchmark = True
    print("  TF32: enabled")
    
    if "A100" not in gpu_name:
        print("  WARNING: Not using A100. Performance may vary.")
else:
    print("ERROR: No GPU detected. This notebook requires a CUDA GPU (A100 80GB recommended).")
    sys.exit(1)

In [None]:
# ============================================================
# INSTALL DEPENDENCIES
# ============================================================

!pip install -q -U \
    transformers==4.45.2 \
    datasets==2.19.1 \
    accelerate==0.34.2 \
    peft==0.13.2 \
    trl==0.9.6 \
    bitsandbytes==0.43.3 \
    scikit-learn==1.5.2

# Flash Attention 2 (optional but recommended for 2-3x speedup)
print("\nInstalling Flash Attention 2 (this may take a few minutes)...")
!pip install -q flash-attn==2.6.3 --no-build-isolation

print("\nDependencies installed.")
print("")
print("IMPORTANT: Runtime must be restarted after package installation.")
print("Click: Runtime > Restart runtime")
print("Then continue from the next cell.")

In [None]:
# ============================================================
# HUGGINGFACE AUTHENTICATION
# ============================================================

from huggingface_hub import login, HfApi

print("LLaMA 3.1-8B requires HuggingFace authentication.")
print("")
print("Steps:")
print("  1. Accept license: https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct")
print("  2. Get token: https://huggingface.co/settings/tokens")
print("  3. Add to Colab secrets (key: HF_TOKEN) OR enter when prompted")
print("")

try:
    from google.colab import userdata
    hf_token = userdata.get('HF_TOKEN')
    if hf_token:
        login(token=hf_token)
        print("Authenticated via Colab secrets")
    else:
        raise KeyError("HF_TOKEN not in secrets")
except Exception as e:
    print(f"Colab secrets not found: {e}")
    print("Please enter token when prompted:")
    login()

# Verify access
api = HfApi()
try:
    model_info = api.model_info(MODEL_NAME)
    print(f"\nAccess confirmed: {model_info.modelId}")
except Exception as e:
    print(f"\nERROR: Cannot access {MODEL_NAME}")
    print("Please complete authentication steps above.")
    raise e

In [None]:
# ============================================================
# MOUNT GOOGLE DRIVE
# ============================================================

from google.colab import drive

drive.mount('/content/drive', force_remount=False)
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"Checkpoints will be saved to: {OUTPUT_DIR}")

In [None]:
# ============================================================
# LOAD DATASET - 3-Class Sentiment (UPDATED)
# ============================================================

import json
from typing import List, Dict
from datasets import Dataset, DatasetDict
from huggingface_hub import hf_hub_download
from tqdm.auto import tqdm
import gc
import random

def load_amazon_reviews_3class(
    category: str,
    seed: int = SEED,
    train_per_class: int = 100_000,
    eval_per_class: int = 10_000,
) -> DatasetDict:
    """
    Load Amazon Reviews 2023 for 3-class sentiment classification.
    
    Reads the JSONL file line by line. The raw file is downloaded once
    into the HuggingFace cache on disk; no other copies are written.
    
    Classes:
    - Negative (0): 1-2 stars
    - Neutral (1):  3 stars
    - Positive (2): 4-5 stars
    
    Balances to min(neg, neu, pos) samples per class.
    """
    print(f"Loading: {category}")
    print(f"  Target: {train_per_class:,} train + {eval_per_class:,} eval per class")
    print(f"  Total: {(train_per_class + eval_per_class) * 3:,} samples")
    
    # Download JSONL (cached by HuggingFace)
    file_path = hf_hub_download(
        repo_id="McAuley-Lab/Amazon-Reviews-2023",
        filename=f"raw/review_categories/{category}.jsonl",
        repo_type="dataset"
    )
    
    # Read JSONL line-by-line
    negative_samples = []
    neutral_samples = []
    positive_samples = []
    
    # Target samples (with buffer for invalid reviews)
    target_per_class = int((train_per_class + eval_per_class) * 1.1)
    
    print("  Reading JSONL (streaming)...")
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in tqdm(f, desc="Processing"):
            # Stop when we have enough samples for all classes
            if (len(negative_samples) >= target_per_class and
                len(neutral_samples) >= target_per_class and
                len(positive_samples) >= target_per_class):
                break
            
            try:
                review = json.loads(line)
                rating = review.get('rating')
                if rating is None:
                    continue  # skip unrated reviews instead of counting them as neutral
                rating = float(rating)
                text = review.get('text', '') or ''
                
                # Skip invalid reviews
                if len(text.strip()) <= 10:
                    continue
                
                # 3-class mapping
                if rating <= 2.0:
                    if len(negative_samples) < target_per_class:
                        negative_samples.append({'text': text, 'label': 0})
                elif rating == 3.0:
                    if len(neutral_samples) < target_per_class:
                        neutral_samples.append({'text': text, 'label': 1})
                elif rating >= 4.0:
                    if len(positive_samples) < target_per_class:
                        positive_samples.append({'text': text, 'label': 2})
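            except (ValueError, TypeError, AttributeError):
                # Malformed JSON line or unexpected field type: skip it.
                # (A bare except would also swallow KeyboardInterrupt.)
                continue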
    
    print(f"  Loaded: {len(negative_samples):,} neg, {len(neutral_samples):,} neu, {len(positive_samples):,} pos")
    
    # Balance to min class
    min_samples = min(len(negative_samples), len(neutral_samples), len(positive_samples))
    samples_per_class = min(train_per_class + eval_per_class, min_samples)
    
    print(f"  Balanced to: {samples_per_class:,} per class ({samples_per_class * 3:,} total)")
    
    # Shuffle and truncate (local RNG so the result depends only on `seed`,
    # not on the global random state at call time)
    rng = random.Random(seed)
    rng.shuffle(negative_samples)
    rng.shuffle(neutral_samples)
    rng.shuffle(positive_samples)
    
    negative_samples = negative_samples[:samples_per_class]
    neutral_samples = neutral_samples[:samples_per_class]
    positive_samples = positive_samples[:samples_per_class]
    
    # Combine and shuffle
    all_samples = negative_samples + neutral_samples + positive_samples
    rng.shuffle(all_samples)
    
    # Split train/eval
    eval_size = eval_per_class * 3
    train_samples = all_samples[:len(all_samples) - eval_size]
    eval_samples = all_samples[len(all_samples) - eval_size:]
    
    # Create datasets
    train_ds = Dataset.from_list(train_samples)
    eval_ds = Dataset.from_list(eval_samples)
    
    # Final shuffle
    train_ds = train_ds.shuffle(seed=seed)
    eval_ds = eval_ds.shuffle(seed=seed)
    
    print(f"  Final: {len(train_ds):,} train, {len(eval_ds):,} eval")
    
    # Clear memory
    del negative_samples, neutral_samples, positive_samples, all_samples
    gc.collect()
    
    return DatasetDict({"train": train_ds, "eval": eval_ds})

# Load data
raw_ds = load_amazon_reviews_3class(
    category=CURRENT_CATEGORY,
    seed=SEED,
    train_per_class=TRAIN_SAMPLES_PER_CLASS,
    eval_per_class=EVAL_SAMPLES_PER_CLASS
)

print("\nDataset loaded successfully")
print("Raw file is stored in the HuggingFace cache only - no extra copies written")
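
Because the eval split is cut from the combined, shuffled pool, its per-class counts are only approximately balanced. A quick check of class fractions can confirm how close the split is; `class_fractions` is a hypothetical helper, shown here on toy labels.

```python
from collections import Counter
from typing import Dict, Iterable

def class_fractions(labels: Iterable[int], num_classes: int = 3) -> Dict[int, float]:
    """Return the share of each class id among the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in range(num_classes)}
    return {c: counts.get(c, 0) / total for c in range(num_classes)}

toy_labels = [0, 0, 1, 2, 2, 2]  # made-up labels for illustration
print(class_fractions(toy_labels))
```

In the notebook, the same function could be applied to `raw_ds["train"]["label"]` and `raw_ds["eval"]["label"]`.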

In [None]:
# ============================================================
# FORMAT DATASET
# ============================================================

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 3-class labels
label_text = {0: "negative", 1: "neutral", 2: "positive"}

def build_chat_text(text: str, gold_label: int) -> str:
    """Format review as LLaMA chat template."""
    messages = [
        {
            "role": "system",
            "content": "You are a sentiment analysis assistant. Respond with only one word: negative, neutral, or positive."
        },
        {
            "role": "user",
            "content": f"Classify the sentiment of this product review.\n\nReview: {text}"
        },
        {
            "role": "assistant",
            "content": label_text[int(gold_label)]
        },
    ]
    return tokenizer.apply_chat_template(messages, tokenize=False)

def format_dataset(batch):
    texts = batch["text"]
    labels = batch["label"]
    formatted = [build_chat_text(t, l) for t, l in zip(texts, labels)]
    return {"text": formatted}

print("Formatting dataset...")
train_ds = raw_ds["train"].map(
    format_dataset,
    batched=True,
    remove_columns=["text", "label"]
)
eval_ds = raw_ds["eval"].map(
    format_dataset,
    batched=True,
    remove_columns=["text", "label"]
)

print(f"Formatted: {len(train_ds):,} train, {len(eval_ds):,} eval")
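
The gold label sits in the final assistant turn, so a review whose formatted text exceeds `MAX_SEQ_LEN` tokens would likely lose its label if the sequence is truncated on the right. The sketch below estimates how many examples are affected; `truncation_rate` is a hypothetical helper, and the whitespace counter is only a stand-in for a real tokenizer.

```python
from typing import Callable, Sequence

def truncation_rate(
    texts: Sequence[str],
    count_tokens: Callable[[str], int],
    max_len: int,
) -> float:
    """Fraction of texts whose token count exceeds max_len."""
    if not texts:
        return 0.0
    too_long = sum(1 for t in texts if count_tokens(t) > max_len)
    return too_long / len(texts)

# Stand-in token counter (whitespace split); a real check would use the model tokenizer.
def whitespace_count(s: str) -> int:
    return len(s.split())

demo_texts = ["short review", "word " * 600, "another short one"]
print(f"Over budget: {truncation_rate(demo_texts, whitespace_count, max_len=512):.1%}")
```

In the notebook, `count_tokens` could be `lambda s: len(tokenizer(s)["input_ids"])`, applied to a sample of `train_ds["text"]`.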

In [None]:
# ============================================================
# LOAD MODEL - with Flash Attention 2 + FALLBACK
# ============================================================

import gc
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Clear memory
gc.collect()
torch.cuda.empty_cache()

print("Loading model...")

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Try Flash Attention 2, fall back to standard if fails
try:
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        attn_implementation="flash_attention_2",
    )
    print("  Using Flash Attention 2 (2-3x faster)")
except Exception as e:
    print(f"  Flash Attention 2 unavailable: {e}")
    print("  Falling back to standard attention")
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

print(f"  Attention: {model.config._attn_implementation}")

# Prepare for training
model = prepare_model_for_kbit_training(model)
model.config.use_cache = False

if hasattr(model, "enable_input_require_grads"):
    model.enable_input_require_grads()
else:
    def make_inputs_require_grad(module, input, output):
        output.requires_grad_(True)
    model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)

# LoRA config - optimized for A100
lora_config = LoraConfig(
    r=128,  # Increased capacity
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

print("Model loaded")
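
The trainable-parameter count reported by `print_trainable_parameters()` can be cross-checked by hand: each LoRA adapter on a linear layer adds `r * (in_features + out_features)` weights. The layer shapes below are assumptions about the Llama 3.1 8B architecture and should be verified against `model.config` before relying on the result.

```python
# Assumed Llama 3.1 8B shapes (verify against model.config)
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 1024  # 8 key/value heads x 128 head dim (grouped-query attention)

R = 128     # LoRA rank from the config above
ALPHA = 32  # lora_alpha from the config above

# (in_features, out_features) for each targeted projection
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Weights in the A (r x in) and B (out x r) matrices of one adapter."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, R) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / R

print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total:     {total:,}")
print(f"Update scaling (alpha / r): {scaling}")
```

With `r=128` and `lora_alpha=32` the adapter update is scaled by 0.25. Whether that damping is intended is worth checking, since `lora_alpha` is often set equal to or larger than `r`.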

In [None]:
# ============================================================
# TRAINING SETUP
# ============================================================

from transformers import TrainingArguments, DataCollatorForLanguageModeling
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=PER_DEVICE_TRAIN_BS,
    per_device_eval_batch_size=PER_DEVICE_TRAIN_BS,
    gradient_accumulation_steps=GRAD_ACCUM_STEPS,
    learning_rate=LEARNING_RATE,
    lr_scheduler_type=LR_SCHEDULER,
    warmup_ratio=WARMUP_RATIO,
    
    # Evaluation (adjusted for larger dataset)
    eval_strategy="steps",
    eval_steps=1000,  # Increased from 500
    save_steps=1000,
    logging_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="loss",
    greater_is_better=False,
    save_total_limit=3,
    
    # Optimizations
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    bf16=True,
    tf32=True,
    dataloader_num_workers=4,
    dataloader_pin_memory=True,
    max_grad_norm=0.3,
    
    report_to=[],
)

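# Note: this collator sets labels to -100 wherever input_ids == pad_token_id.
# If pad_token was set to eos_token in the formatting cell, end-of-turn tokens
# are therefore excluded from the loss as well.
collator = DataCollatorForLanguageModeling(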
    tokenizer=tokenizer,
    mlm=False
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LEN,
    packing=False,
    data_collator=collator,
)

print("Trainer configured")
print(f"  Training samples: {len(train_ds):,}")
print(f"  Eval samples: {len(eval_ds):,}")
print(f"  Epochs: {NUM_EPOCHS}")
print(f"  Effective batch size: {PER_DEVICE_TRAIN_BS * GRAD_ACCUM_STEPS}")
print(f"  Estimated training time: ~5 hours on A100")
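
With `lr_scheduler_type="cosine"` and a warmup ratio, the learning rate ramps up linearly and then decays along a half cosine. The sketch below reproduces that usual shape in plain Python. `lr_at_step` is a hypothetical helper, and the step counts assume the full 300K samples on a single GPU.

```python
import math

def lr_at_step(step: int, total_steps: int, warmup_steps: int, base_lr: float) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

TOTAL_STEPS = 37_500   # 2 epochs x 18,750 optimizer steps (assumed)
WARMUP_STEPS = 1_125   # about 3% of total steps (assumed)
BASE_LR = 2e-4         # LEARNING_RATE

for step in (0, 500, WARMUP_STEPS, 10_000, 30_000, TOTAL_STEPS):
    print(f"step {step:>6,}: lr = {lr_at_step(step, TOTAL_STEPS, WARMUP_STEPS, BASE_LR):.2e}")
```

The peak rate is reached only at the end of warmup, so the first evaluation at step 1000 still falls inside the warmup phase under these assumptions.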

In [None]:
# ============================================================
# TRAIN
# ============================================================

print("Starting training...")
print(f"  Category: {CURRENT_CATEGORY}")
print(f"  Data: {len(train_ds):,} train, {len(eval_ds):,} eval")
print(f"  Classes: 3 (negative/neutral/positive)")
print("")

train_result = trainer.train()

print("\nTraining complete")
print(f"  Final loss: {train_result.training_loss:.4f}")

# Save model
final_path = f"{OUTPUT_DIR}/final"
trainer.save_model(final_path)
tokenizer.save_pretrained(final_path)

print(f"  Saved to: {final_path}")

In [None]:
# ============================================================
# BASELINE EVALUATION (3-Class)
# ============================================================

from sklearn.metrics import (
    accuracy_score,
    precision_recall_fscore_support,
    confusion_matrix
)
import json
from datetime import datetime

def evaluate_model_3class(model, tokenizer, eval_ds, raw_eval_ds, max_samples=1000):
    """
    Evaluate fine-tuned model on 3-class sentiment.
    
    This uses the FINE-TUNED model (merged LoRA adapters).
    """
    num_samples = min(max_samples, len(raw_eval_ds))
    print(f"Evaluating on {num_samples} samples...")
    
    model.eval()
    y_true, y_pred = [], []
    
    for i in tqdm(range(min(max_samples, len(raw_eval_ds)))):
        ex = raw_eval_ds[i]
        text = ex["text"]
        gold = ex["label"]
        
        messages = [
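            # Same system prompt as build_chat_text, so evaluation matches the training format
            {"role": "system", "content": "You are a sentiment analysis assistant. Respond with only one word: negative, neutral, or positive."},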
            {"role": "user", "content": f"Classify the sentiment of this product review.\n\nReview: {text}"},
        ]
        
        with torch.no_grad():
            inputs = tokenizer.apply_chat_template(
                messages,
                add_generation_prompt=True,
                return_tensors="pt"
            ).to(model.device)
            
            outputs = model.generate(
                inputs,
                max_new_tokens=10,
                do_sample=False,
                pad_token_id=tokenizer.eos_token_id,
            )
            
            gen_text = tokenizer.decode(
                outputs[0][inputs.shape[-1]:],
                skip_special_tokens=True
            ).strip().lower()
        
        # Parse prediction (3-class)
        if "negative" in gen_text:
            pred = 0
        elif "neutral" in gen_text:
            pred = 1
        elif "positive" in gen_text:
            pred = 2
        else:
            pred = 1  # Default to neutral if unclear
        
        y_true.append(gold)
        y_pred.append(pred)
    
    # Calculate metrics (macro-average for 3-class)
    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average='macro', zero_division=0
    )
    prec_pc, rec_pc, f1_pc, support_pc = precision_recall_fscore_support(
        y_true, y_pred, average=None, zero_division=0
    )
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
    
    results = {
        "category": CURRENT_CATEGORY,
        "timestamp": datetime.now().isoformat(),
        "num_classes": 3,
        "train_samples": len(train_ds),
        "eval_samples": len(y_true),  # samples actually evaluated (may be < max_samples)
        "accuracy": float(accuracy),
        "macro_precision": float(precision),
        "macro_recall": float(recall),
        "macro_f1": float(f1),
        "negative": {
            "precision": float(prec_pc[0]),
            "recall": float(rec_pc[0]),
            "f1": float(f1_pc[0]),
            "support": int(support_pc[0]),
        },
        "neutral": {
            "precision": float(prec_pc[1]),
            "recall": float(rec_pc[1]),
            "f1": float(f1_pc[1]),
            "support": int(support_pc[1]),
        },
        "positive": {
            "precision": float(prec_pc[2]),
            "recall": float(rec_pc[2]),
            "f1": float(f1_pc[2]),
            "support": int(support_pc[2]),
        },
        "confusion_matrix": cm.tolist(),
    }
    
    # Print
    print("\nBASELINE RESULTS (3-Class)")
    print("="*70)
    print(f"Category: {CURRENT_CATEGORY}")
    print(f"Accuracy: {accuracy:.4f} ({accuracy*100:.1f}%)")
    print(f"Macro Precision: {precision:.4f}")
    print(f"Macro Recall: {recall:.4f}")
    print(f"Macro F1: {f1:.4f}")
    print("\nPer-class:")
    print(f"  Negative: P={prec_pc[0]:.4f}, R={rec_pc[0]:.4f}, F1={f1_pc[0]:.4f}, N={support_pc[0]}")
    print(f"  Neutral:  P={prec_pc[1]:.4f}, R={rec_pc[1]:.4f}, F1={f1_pc[1]:.4f}, N={support_pc[1]}")
    print(f"  Positive: P={prec_pc[2]:.4f}, R={rec_pc[2]:.4f}, F1={f1_pc[2]:.4f}, N={support_pc[2]}")
    print("\nConfusion Matrix:")
    print("         Predicted")
    print("        Neg Neu Pos")
    print(f"Neg   [{cm[0,0]:4d} {cm[0,1]:4d} {cm[0,2]:4d}]")
    print(f"Neu   [{cm[1,0]:4d} {cm[1,1]:4d} {cm[1,2]:4d}]")
    print(f"Pos   [{cm[2,0]:4d} {cm[2,1]:4d} {cm[2,2]:4d}]")
    print("="*70)
    
    # Save
    results_file = f"{OUTPUT_DIR}/baseline_metrics.json"
    with open(results_file, 'w') as f:
        json.dump(results, f, indent=2)
    
    print(f"\nSaved to: {results_file}")
    return results

# Merge LoRA adapters (creates fine-tuned model)
print("Merging LoRA adapters to create fine-tuned model...")
merged_model = trainer.model.merge_and_unload()
merged_model.eval()

# Evaluate fine-tuned model
baseline_results = evaluate_model_3class(
    merged_model,
    tokenizer,
    eval_ds,
    raw_ds["eval"],
    max_samples=1000
)

print("\nBaseline evaluation complete")
print("Ready for poisoning attack experiments")
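
The evaluation loop maps any unparseable generation to neutral, which can hide formatting failures and inflate neutral counts. One alternative, sketched here under the assumption that failures should be tracked separately, is a parser that returns `None` when no label word is found. `parse_sentiment` and `LABEL_IDS` are hypothetical names.

```python
import re
from typing import Optional

LABEL_IDS = {"negative": 0, "neutral": 1, "positive": 2}
_LABEL_RE = re.compile(r"\b(negative|neutral|positive)\b")

def parse_sentiment(gen_text: str) -> Optional[int]:
    """Return the class id of the first label word in the text, or None."""
    match = _LABEL_RE.search(gen_text.strip().lower())
    return LABEL_IDS[match.group(1)] if match else None

outputs = ["Positive", "neutral.", "I think it is negative", "unsure"]
parsed = [parse_sentiment(o) for o in outputs]
unparsed = sum(p is None for p in parsed)
print(parsed, f"unparsed: {unparsed}")
```

Reporting the unparsed count alongside accuracy makes it visible when the model drifts from the one-word answer format.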

## Training Complete

### Next Steps

1. **Repeat for other categories**:
   - Change `CURRENT_CATEGORY` to "Electronics"
   - Restart runtime and re-run
   - Then repeat for "Pet_Supplies"

2. **Baseline metrics** saved to:
   - `/content/drive/MyDrive/llama3-sentiment-{category}/baseline_metrics.json`

3. **Implement poisoning attacks** (Souly et al., 2025):
   - Load baseline models
   - Inject poison samples
   - Re-train and measure attack success
   - Compare across 3 categories
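
Once more than one category has been trained, the saved baseline metrics can be gathered for a side-by-side comparison. The sketch below assumes the directory layout and the `macro_f1` key written by the evaluation cell; `collect_macro_f1` is a hypothetical helper, and categories without a metrics file are skipped.

```python
import json
from pathlib import Path
from typing import Dict, Iterable

def collect_macro_f1(root: str, categories: Iterable[str]) -> Dict[str, float]:
    """Read macro F1 from each category's baseline_metrics.json, if present."""
    results: Dict[str, float] = {}
    for category in categories:
        metrics_file = Path(root) / f"llama3-sentiment-{category}" / "baseline_metrics.json"
        if not metrics_file.exists():
            continue  # category not trained yet
        with open(metrics_file, "r", encoding="utf-8") as f:
            results[category] = json.load(f)["macro_f1"]
    return results

CATEGORIES = ["Cell_Phones_and_Accessories", "Electronics", "Pet_Supplies"]
print(collect_macro_f1("/content/drive/MyDrive", CATEGORIES))
```

The same pattern extends to the other keys in the file, such as `accuracy` or the per-class entries.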

### Expected Performance

With 300K training samples (100K per class):
- **Accuracy**: 75-85% (vs 72% with 9K samples)
- **Balanced recall**: All classes 70-85% (vs imbalanced before)
- **Less overfitting**: Validation loss should keep decreasing alongside training loss rather than diverging from it
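
Accuracy and per-class recall can both be read off the confusion matrix stored in `baseline_metrics.json`, where rows are true classes and columns are predictions. The sketch below shows the arithmetic on a made-up matrix; the numbers are illustrative only and are not results from this notebook.

```python
from typing import List, Tuple

def accuracy_and_recall(cm: List[List[int]]) -> Tuple[float, List[float]]:
    """Overall accuracy and per-class recall from a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    accuracy = correct / total if total else 0.0
    recalls = [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]
    return accuracy, recalls

# Made-up matrix (rows: true neg/neu/pos, columns: predicted neg/neu/pos)
toy_cm = [
    [80, 15, 5],
    [20, 60, 20],
    [5, 10, 85],
]
acc, recalls = accuracy_and_recall(toy_cm)
print(f"accuracy={acc:.2f}, recall per class={recalls}")
```

A large gap between the lowest and highest recall indicates imbalanced performance even when overall accuracy looks acceptable.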

### Model Locations

- Cell_Phones: `/content/drive/MyDrive/llama3-sentiment-Cell_Phones_and_Accessories/final`
- Electronics: `/content/drive/MyDrive/llama3-sentiment-Electronics/final`
- Pet_Supplies: `/content/drive/MyDrive/llama3-sentiment-Pet_Supplies/final`