# KNN Aggregator vs IBM Granite-4.0-H-Tiny Workflow

This notebook runs the complete workflow for comparing the KNN Aggregator with the IBM Granite-4.0-H-Tiny model on benchmark datasets.

## Workflow Steps:
1. **Setup** (Steps 1-3): Install dependencies and set up the project
2. **Sampling** (Step 4): Generate KNN reference data from the HH-RLHF dataset
3. **Evaluation** (Step 5): Evaluate the KNN Aggregator and IBM Granite in parallel on the benchmark
4. **Comparison** (Step 6): Display performance metrics and improvements


## Step 1: Install Dependencies


In [None]:
# Install required packages
# Version specifiers are quoted so the shell does not treat ">" as output redirection
%pip install -q "transformers>=4.44" torch scikit-learn "datasets==3.6.0" huggingface_hub safetensors tqdm pandas numpy


## Step 2: Set Up Project Files


In [None]:
# Clone the repository from GitHub (works only if the repository is public)
!git clone https://github.com/SohamNagi/ArmyOfSafeguards.git
%cd ArmyOfSafeguards


## Step 3: (Optional) Mount Google Drive for Persistent Storage


In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Optional: Copy project to Drive for persistence
# !cp -r /content/ArmyOfSafeguards /content/drive/MyDrive/


## Step 4: Generate KNN Reference Data (Sampling)

This step samples the HH-RLHF dataset and generates reference data for the KNN aggregator by running all 4 safeguards on the dataset.
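To illustrate how reference data can drive a KNN-style decision, here is a minimal sketch. It assumes each reference record is a vector of four safeguard confidences paired with a label, and that neighbours are found by Euclidean distance; the repository's actual record format, distance metric, and value of `k` are not shown in this notebook, and `knn_predict` is a hypothetical helper.

```python
import math
from collections import Counter

def knn_predict(query, reference, k=3):
    """Majority vote over the k reference vectors closest to `query`.

    query: list of floats (assumed: one confidence per safeguard)
    reference: list of (vector, label) pairs
    Returns (winning label, share of the k neighbours that voted for it).
    """
    nearest = sorted(reference, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)

# Made-up reference set: [factuality, toxicity, sexual, jailbreak] confidences
reference = [
    ([0.1, 0.1, 0.0, 0.1], "safe"),
    ([0.2, 0.0, 0.1, 0.0], "safe"),
    ([0.1, 0.9, 0.2, 0.1], "unsafe"),
    ([0.0, 0.8, 0.1, 0.9], "unsafe"),
    ([0.1, 0.7, 0.0, 0.8], "unsafe"),
]

label, share = knn_predict([0.1, 0.85, 0.1, 0.7], reference, k=3)
print(label, share)
```

In this toy example the three nearest neighbours are all labelled unsafe, so `label` is `"unsafe"` and `share` is `1.0`.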

**Estimated time**: 10-30 minutes (with GPU), 1-3 hours (CPU only)
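
Because the estimate depends on whether a GPU is available, it can help to check before starting. A minimal sketch, assuming PyTorch from Step 1 is installed; `pick_device` is a hypothetical helper and not part of the repository.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # installed in Step 1
    except ImportError:
        return "cpu"  # PyTorch missing: treat the runtime as CPU-only
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"Running on: {device}")
```

If this prints `cpu` on Colab, switching the runtime type to GPU before running the generation script should bring the run closer to the shorter estimate.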


In [None]:
# Generate KNN reference data from HH-RLHF dataset
# This will download the dataset and run all 4 safeguards (factuality, toxicity, sexual, jailbreak)
!python aggregator/generate_knn_reference_hh_rlhf_full.py


## Step 5: Compare KNN Aggregator vs IBM Granite-4.0-H-Tiny

This step evaluates both models in parallel on the benchmark dataset and compares their performance.

**Parameters**:
- `--limit`: Number of examples to evaluate (default: 100)
- `--threshold`: Confidence threshold for the KNN aggregator (default: 0.7)
- `--dataset`: Benchmark dataset to use (default: hh-rlhf)
- `--knn-reference`: Path to the KNN reference data generated in Step 4
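
To try several settings without retyping the command, the flags can be assembled programmatically. The sketch below uses only the flags that appear in this notebook; `build_eval_command` is a hypothetical helper, and the threshold values in the sweep are arbitrary examples.

```python
import subprocess
import sys

def build_eval_command(limit=100, threshold=0.7, dataset="hh-rlhf",
                       knn_reference="aggregator/knn_reference_hh_rlhf_full.jsonl"):
    """Build the argument list for the evaluation script used in Step 5."""
    return [
        sys.executable, "aggregator/evaluate_vs_granite.py",
        "--dataset", dataset,
        "--limit", str(limit),
        "--knn-reference", knn_reference,
        "--threshold", str(threshold),
    ]

# Example sweep over three thresholds (values chosen for illustration only)
commands = [build_eval_command(threshold=t) for t in (0.5, 0.7, 0.9)]
for cmd in commands:
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment when running inside the cloned repository
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` makes a failed run raise an error instead of passing silently.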


In [None]:
# Compare KNN Aggregator vs IBM Granite on benchmark dataset
# Adjust parameters as needed:
#   --limit: Number of examples (default: 100)
#   --threshold: Confidence threshold (default: 0.7)
#   --dataset: Dataset name (default: hh-rlhf)

!python aggregator/evaluate_vs_granite.py --dataset hh-rlhf --limit 100 --knn-reference aggregator/knn_reference_hh_rlhf_full.jsonl --threshold 0.7


## Step 6: View Comparison Results


In [None]:
import json
from pathlib import Path

# Find the latest evaluation results
result_files = list(Path("aggregator").glob("evaluation_knn_vs_granite_*.json"))
if result_files:
    latest_file = max(result_files, key=lambda p: p.stat().st_mtime)
    print(f"Latest results: {latest_file}")
    
    with open(latest_file) as f:
        results = json.load(f)
    
    print("\n" + "="*60)
    print("COMPARISON RESULTS: KNN Aggregator vs IBM Granite-4.0-H-Tiny")
    print("="*60)
    
    if "knn_aggregator" in results and "ibm_granite" in results:
        knn = results["knn_aggregator"]
        granite = results["ibm_granite"]
        
        print("\nKNN Aggregator Performance:")
        print(f"  Accuracy:  {knn.get('accuracy', 0):.2%}")
        print(f"  Precision: {knn.get('precision', 0):.2%}")
        print(f"  Recall:    {knn.get('recall', 0):.2%}")
        print(f"  F1-Score:  {knn.get('f1_score', 0):.2%}")
        
        print("\nIBM Granite-4.0-H-Tiny Performance:")
        print(f"  Accuracy:  {granite.get('accuracy', 0):.2%}")
        print(f"  Precision: {granite.get('precision', 0):.2%}")
        print(f"  Recall:    {granite.get('recall', 0):.2%}")
        print(f"  F1-Score:  {granite.get('f1_score', 0):.2%}")
        
        if "improvement" in results:
            imp = results["improvement"]
            print("\nPerformance Improvement (KNN vs Granite):")
            print(f"  Accuracy:  {imp.get('accuracy', {}).get('percentage', 0):+.1f}% ({imp.get('accuracy', {}).get('absolute', 0):+.2%})")
            print(f"  Precision: {imp.get('precision', {}).get('percentage', 0):+.1f}% ({imp.get('precision', {}).get('absolute', 0):+.2%})")
            print(f"  Recall:    {imp.get('recall', {}).get('percentage', 0):+.1f}% ({imp.get('recall', {}).get('absolute', 0):+.2%})")
            print(f"  F1-Score:  {imp.get('f1_score', {}).get('percentage', 0):+.1f}% ({imp.get('f1_score', {}).get('absolute', 0):+.2%})")
else:
    print("No results found. Make sure Step 5 completed successfully.")
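
The metrics printed above can be related to raw predictions as follows. This is a minimal sketch, assuming binary labels (1 = unsafe, 0 = safe), and assuming that `absolute` is the difference between the two scores and `percentage` is that difference relative to the Granite score; the evaluation script may define them differently. `binary_metrics` and `improvement` are hypothetical helpers.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1_score": f1}

def improvement(candidate, baseline):
    """Absolute difference and (assumed) relative change versus the baseline, in percent."""
    absolute = candidate - baseline
    percentage = absolute / baseline * 100 if baseline else 0.0
    return {"absolute": absolute, "percentage": percentage}

# Made-up labels and predictions for illustration
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
gain = improvement(0.75, 0.5)
print(metrics)
print(gain)
```

Under these assumptions, a score of 0.75 against a baseline of 0.50 is an absolute gain of 0.25 (25 percentage points) and a relative gain of 50%.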


In [None]:
from google.colab import files

# Download evaluation results
# Note: result_files and latest_file are defined by the previous cell, so run it first
if result_files:
    files.download(str(latest_file))
    print(f"✅ Downloaded: {latest_file}")

# Optionally download reference data
# files.download("aggregator/knn_reference_hh_rlhf_full.jsonl")


## Quick Workflow (All-in-One)

Alternatively, the workflow script runs reference-data generation (Step 4) and the comparison (Step 5) in one command:


In [None]:
# Run the complete workflow in one command
# This will generate reference data and compare KNN vs Granite
!python aggregator/knn_workflow.py --limit 100


## Tips

1. **Enable GPU**: Runtime → Change runtime type → GPU (for faster processing)
2. **Skip generation**: If reference data already exists, you can skip Step 4
3. **Adjust parameters**: Modify `--limit` and `--threshold` in Step 5 as needed
4. **Save to Drive**: Mount Drive (Step 3) and copy results there so they survive the session
5. **Time limits**: Free Colab sessions are time-limited (up to about 12 hours) and may disconnect earlier when idle
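
For the "skip generation" tip, a quick check can tell whether the reference data is already in place. A minimal sketch, assuming the reference file lives at the path passed to `--knn-reference` in Step 5; `reference_ready` is a hypothetical helper.

```python
from pathlib import Path

def reference_ready(path="aggregator/knn_reference_hh_rlhf_full.jsonl"):
    """Return True if the reference file exists and is not empty."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0

if reference_ready():
    print("Reference data found; Step 4 can be skipped.")
else:
    print("Reference data missing or empty; run Step 4 first.")
```

The size check guards against an empty file left behind by an interrupted run; it does not verify that the file is complete.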
