# KNN Aggregator Workflow - Google Colab

This notebook runs the complete KNN aggregator workflow in Google Colab.

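## Steps:
1. Install dependencies
2. Set up project files
3. (Optional) Mount Google Drive for persistent storage
4. Generate KNN reference data
5. Evaluate performance (KNN vs Majority Vote)
6. View results
7. Download results

The Quick Workflow cell at the end runs steps 4 and 5 in one command.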


## 1. Install Dependencies


In [None]:
# Install required packages
# The version specifier is quoted so the shell does not treat ">" as an output redirect
%pip install -q "transformers>=4.44" torch scikit-learn datasets==3.6.0 huggingface_hub safetensors tqdm pandas numpy
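
To confirm that the install cell did what was intended, the installed versions can be listed with the standard library. This is a minimal sketch; the package names are copied from the install command above, and `installed_versions` is a helper name introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the install cell above.
PACKAGES = ["transformers", "torch", "scikit-learn", "datasets",
            "huggingface_hub", "safetensors", "tqdm", "pandas", "numpy"]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:<16} {ver or 'NOT INSTALLED'}")
```

If `transformers` reports a version below 4.44, or a stray file named `=4.44` appears in the working directory, the version specifier was probably passed to the shell unquoted.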


## 2. Set Up Project Files


In [None]:
# Clone the project from GitHub (this works only while the repository is public)
!git clone https://github.com/SohamNagi/ArmyOfSafeguards.git
%cd ArmyOfSafeguards


## 3. (Optional) Mount Google Drive for Persistent Storage


In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Optional: Copy project to Drive for persistence
# !cp -r /content/ArmyOfSafeguards /content/drive/MyDrive/


## 4. Generate KNN Reference Data


In [None]:
# Generate reference data from HH-RLHF dataset
# This will download the dataset and run all 4 safeguards
# Estimated time: 10-30 minutes (with GPU), 1-3 hours (CPU only)

!python aggregator/generate_knn_reference_hh_rlhf_full.py
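
Before moving on to evaluation, it can help to check that the reference file was actually written. The sketch below assumes the generation script writes to `aggregator/knn_reference_hh_rlhf_full.jsonl` (the path passed to `--knn-reference` in the evaluation step) and makes no assumption about the record schema; `summarize_jsonl` is a helper introduced here for illustration.

```python
import json
from pathlib import Path

def summarize_jsonl(path, preview=1):
    """Count non-empty lines in a JSONL file and return the first few parsed records."""
    count = 0
    head = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            count += 1
            if len(head) < preview:
                head.append(json.loads(line))
    return count, head

# Assumed output location; adjust if the script writes elsewhere.
ref_path = Path("aggregator/knn_reference_hh_rlhf_full.jsonl")
if ref_path.exists():
    n, head = summarize_jsonl(ref_path)
    print(f"{n} reference records")
    if head and isinstance(head[0], dict):
        print("Fields in first record:", sorted(head[0].keys()))
else:
    print(f"{ref_path} not found; did the generation step finish?")
```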


## 5. Evaluate Performance (KNN vs Majority Vote)


In [None]:
# Compare KNN aggregation vs Majority Vote
# Adjust --limit for number of examples (default: 100)
# Adjust --threshold for confidence threshold (default: 0.7)

!python aggregator/evaluate_aggregator.py --dataset hh-rlhf --limit 100 --knn-reference aggregator/knn_reference_hh_rlhf_full.jsonl --compare --threshold 0.7
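
For intuition about what is being compared, here is a toy sketch of the two aggregation strategies. It is not the repository's implementation, which this notebook does not show. It assumes each of the four safeguards emits a score in [0, 1], that Majority Vote flags an input when more than half of the scores reach the threshold, and that KNN labels an input by the most common label among its nearest reference score vectors. The function names, the value of `k`, and all numbers are made up for illustration.

```python
import math
from collections import Counter

def majority_vote(scores, threshold=0.7):
    """Flag as unsafe when more than half of the safeguards meet the threshold."""
    votes = sum(1 for s in scores if s >= threshold)
    return votes > len(scores) / 2

def knn_vote(scores, reference, k=3):
    """Label a score vector by the most common label among its k nearest references.

    `reference` is a list of (score_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(reference, key=lambda item: math.dist(scores, item[0]))[:k]
    labels = Counter(label for _, label in nearest)
    return labels.most_common(1)[0][0]

# Toy reference set: four hypothetical safeguard scores per example, label True = unsafe.
reference = [
    ([0.9, 0.1, 0.2, 0.1], True),
    ([0.8, 0.2, 0.1, 0.3], True),
    ([0.95, 0.3, 0.2, 0.2], True),
    ([0.1, 0.1, 0.2, 0.1], False),
    ([0.2, 0.3, 0.1, 0.2], False),
]

query = [0.85, 0.2, 0.15, 0.2]
print("majority vote:", majority_vote(query))      # only one safeguard fires
print("knn:", knn_vote(query, reference))          # nearest references are all unsafe
```

In this toy case a single confident safeguard is outvoted under Majority Vote, while KNN matches the input to similar labeled examples and flags it. Whether that pattern holds on real data is what the evaluation above measures.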


## 6. View Results


In [None]:
import json
from pathlib import Path

# Find the latest evaluation results
result_files = list(Path("aggregator").glob("evaluation_results_*.json"))
if result_files:
    latest_file = max(result_files, key=lambda p: p.stat().st_mtime)
    print(f"Latest results: {latest_file}")
    
    with open(latest_file) as f:
        results = json.load(f)
    
    print("\n" + "="*60)
    print("EVALUATION RESULTS")
    print("="*60)
    
    if "majority_vote" in results and "knn" in results:
        mv = results["majority_vote"]
        knn = results["knn"]
        
        print("\nMajority Vote:")
        print(f"  Accuracy:  {mv.get('accuracy', 0):.2%}")
        print(f"  F1-Score:  {mv.get('f1_score', 0):.2%}")
        
        print("\nKNN Aggregation:")
        print(f"  Accuracy:  {knn.get('accuracy', 0):.2%}")
        print(f"  F1-Score:  {knn.get('f1_score', 0):.2%}")
        
        if "improvement" in results:
            imp = results["improvement"]
            print("\nImprovement:")
            print(f"  Accuracy:  {imp.get('accuracy', {}).get('percentage', 0):+.1f}%")
            print(f"  F1-Score:  {imp.get('f1_score', {}).get('percentage', 0):+.1f}%")
else:
    print("No results found")
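
If a results file has no `improvement` block, the gap between the two methods can be computed directly from the metrics. A minimal sketch, assuming the `majority_vote` and `knn` entries store each metric as a fraction in [0, 1], as the `:.2%` formatting above implies; `metric_deltas` is a helper introduced here, and the numbers in `example` are made up.

```python
def metric_deltas(results, metrics=("accuracy", "f1_score")):
    """Return KNN minus Majority Vote for each metric, in percentage points."""
    mv = results.get("majority_vote", {})
    knn = results.get("knn", {})
    return {
        m: round((knn[m] - mv[m]) * 100, 2)
        for m in metrics
        if m in mv and m in knn
    }

# Example with made-up numbers, for illustration only.
example = {
    "majority_vote": {"accuracy": 0.50, "f1_score": 0.40},
    "knn": {"accuracy": 0.75, "f1_score": 0.65},
}
print(metric_deltas(example))
```

A positive value means KNN scored higher. Call `metric_deltas(results)` after the cell above has loaded `results`.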


## 7. Download Results


In [None]:
from google.colab import files

# Download evaluation results
if result_files:
    files.download(str(latest_file))
    print(f"✅ Downloaded: {latest_file}")

# Optionally download reference data
# files.download("aggregator/knn_reference_hh_rlhf_full.jsonl")


## Quick Workflow (All-in-One)


In [None]:
# Run the complete workflow in one command
# This will generate reference data and evaluate performance

!python aggregator/knn_workflow.py --limit 100
