# Demo: Load and Use Trained Model

This notebook demonstrates how to load the fine-tuned model and predict which of two LLM responses to a prompt is better (A wins, B wins, or tie).

## Pre-trained Model Link

The trained model (v3.0) is available at:
- **Google Drive**: [Download Link](YOUR_GOOGLE_DRIVE_LINK) - Update this link after uploading
- **Hugging Face Hub**: [Model Card](https://huggingface.co/YOUR_USERNAME/llm-response-comparison-v3) - Optional

### Model Details
- **Base Model**: microsoft/deberta-v3-base
- **Fine-tuning Method**: LoRA (r=16, alpha=32)
- **Validation Log Loss**: 1.0735
- **Validation Accuracy**: 39.78%
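
To put these validation numbers in context, it can help to compare them with a predictor that assigns equal probability to all three classes (A wins, B wins, tie). The sketch below is only that arithmetic. The variable names are illustrative and not part of the notebook, and the 1/3 accuracy baseline assumes roughly balanced classes.

```python
import math

NUM_CLASSES = 3  # A wins, B wins, tie

# A uniform predictor assigns probability 1/3 to every class,
# so its log loss is -ln(1/3) = ln(3) whatever the true label is.
uniform_log_loss = -math.log(1.0 / NUM_CLASSES)

# Random guessing is right 1/3 of the time if classes are balanced (assumption).
chance_accuracy = 1.0 / NUM_CLASSES

reported_log_loss = 1.0735  # from the model details above
reported_accuracy = 0.3978  # from the model details above

print(f"Uniform log loss:  {uniform_log_loss:.4f}")
print(f"Reported log loss: {reported_log_loss:.4f}")
print(f"Log loss margin:   {uniform_log_loss - reported_log_loss:.4f}")
print(f"Accuracy margin:   {reported_accuracy - chance_accuracy:.4f}")
```

A lower log loss than the uniform value means the predicted probabilities carry some information beyond guessing.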


In [None]:
# Install dependencies
!pip install transformers peft accelerate sentencepiece protobuf -q


In [None]:
# Imports
import pandas as pd
import numpy as np
import torch
import json
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
from peft import PeftModel
import warnings
warnings.filterwarnings('ignore')

print("✓ Imports completed")


In [None]:
# Configuration
MODEL_NAME = 'microsoft/deberta-v3-base'
MAX_LENGTH = 1280
NUM_LABELS = 3

# Model path - Update this to your model location
# Option 1: Local path (if downloaded)
MODEL_PATH = 'checkpoints/models_v3'

# Option 2: Google Drive (if uploaded to Drive)
# from google.colab import drive
# drive.mount('/content/drive')
# MODEL_PATH = '/content/drive/MyDrive/models_v3'

# Option 3: Hugging Face Hub (if uploaded)
# MODEL_PATH = 'YOUR_USERNAME/llm-response-comparison-v3'

print(f"Model path: {MODEL_PATH}")


In [None]:
# Load tokenizer
print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
print("✓ Tokenizer loaded")


In [None]:
# Load model
print("Loading model...")
config = AutoConfig.from_pretrained(MODEL_PATH)
config.num_labels = NUM_LABELS
config.problem_type = "single_label_classification"

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, config=config
)

# Load LoRA weights
model = PeftModel.from_pretrained(base_model, MODEL_PATH)
model.eval()

# Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

print("✓ Model loaded successfully")
print(f"Device: {device}")
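
For intuition about what the adapter loaded above contributes: LoRA adds a low-rank update, scaled by `alpha / r`, to each adapted weight matrix. The sketch below works through that arithmetic for r=16 and alpha=32. It assumes a single 768x768 projection (768 is the hidden size of deberta-v3-base); which matrices were actually adapted is not stated in this notebook, so the per-matrix figures are illustrative only.

```python
# LoRA hyperparameters from the model details
r = 16
alpha = 32

# Assumption: one adapted square projection at the deberta-v3-base hidden size.
d_in = d_out = 768

scaling = alpha / r               # multiplier applied to the low-rank update
lora_params = r * (d_in + d_out)  # A is (r x d_in), B is (d_out x r)
full_params = d_in * d_out        # size of the frozen weight matrix

print(f"scaling = {scaling}")
print(f"trainable LoRA parameters per matrix = {lora_params}")
print(f"fraction of the full matrix = {lora_params / full_params:.4f}")
```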


In [None]:
# Helper functions
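def parse_json(text):
    """Decode a JSON string; join list items with newlines, otherwise return str(value)."""
    try:
        p = json.loads(text) if isinstance(text, str) else text
        return '\n'.join([str(i) for i in p]) if isinstance(p, list) else str(p)
    except (ValueError, TypeError):
        # Not valid JSON (json.JSONDecodeError subclasses ValueError): fall back to raw text
        return str(text)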

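def truncate(text, max_chars):
    """Smart truncation: keep the head (about 60%) and the tail of the text."""
    if len(text) <= max_chars:
        return text
    h = int(max_chars * 0.6)
    # Reserve 10 characters for the "[...]" marker; clamp so the tail length is never negative
    t = max(max_chars - h - 10, 0)
    # text[-0:] would return the whole string, so handle an empty tail explicitly
    tail = text[-t:] if t > 0 else ""
    return text[:h] + "\n[...]\n" + tail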

def predict_single(prompt, response_a, response_b, model, tokenizer, max_length=1280):
    """
    Predict which response is better for a single example.
    
    Args:
        prompt: The question/prompt text
        response_a: First response
        response_b: Second response
        model: Loaded model
        tokenizer: Loaded tokenizer
        max_length: Maximum sequence length
    
    Returns:
        dict with probabilities for A wins, B wins, and tie
    """
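    # Character budget: 4/3 characters per token of max_length
    max_chars = (max_length * 4) // 3
    
    # Split the budget: 1/4 for the prompt, 3/8 for each response
    prompt_text = truncate(parse_json(prompt), max_chars // 4)
    resp_a = truncate(parse_json(response_a), max_chars * 3 // 8)
    resp_b = truncate(parse_json(response_b), max_chars * 3 // 8)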
    
    text = f"Compare responses:\n\nQ: {prompt_text}\n\n[A]: {resp_a}\n\n[B]: {resp_b}\n\nBetter?"
    
    enc = tokenizer(text, truncation=True, padding='max_length', 
                    max_length=max_length, return_tensors='pt')
    
    input_ids = enc['input_ids'].to(model.device)
    attention_mask = enc['attention_mask'].to(model.device)
    
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        probs = torch.softmax(outputs.logits, dim=-1).cpu().numpy()[0]
    
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    probs = probs / probs.sum()
    
    return {
        'winner_model_a': float(probs[0]),
        'winner_model_b': float(probs[1]),
        'winner_tie': float(probs[2])
    }

print("✓ Helper functions defined")
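
The character budgets used inside `predict_single` are easier to follow with concrete numbers. The sketch below repeats the head-and-tail truncation logic so it runs on its own, then works through the arithmetic for `max_length=1280`. The input `long_text` is a hypothetical placeholder.

```python
def truncate(text, max_chars):
    """Head-and-tail truncation in the style of the helper above."""
    if len(text) <= max_chars:
        return text
    h = int(max_chars * 0.6)
    t = max_chars - h - 10
    return text[:h] + "\n[...]\n" + text[-t:]

max_length = 1280
max_chars = (max_length * 4) // 3     # 5120 // 3 = 1706
prompt_budget = max_chars // 4        # 426
response_budget = max_chars * 3 // 8  # 5118 // 8 = 639

long_text = "x" * 1000  # hypothetical over-long prompt
shortened = truncate(long_text, prompt_budget)

# Head (255) + 7-character marker + tail (161) = 423, three under the budget,
# because 10 characters are reserved for a marker that is only 7 long.
print(max_chars, prompt_budget, response_budget)
print(len(shortened))
```

The three budgets sum to 1704 characters, so the formatted input stays close to `max_chars` before the tokenizer applies its own truncation.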


In [None]:
# Example: Single prediction
prompt = "What is machine learning?"
response_a = "Machine learning is a subset of artificial intelligence that enables computers to learn from data without being explicitly programmed. It involves algorithms that can identify patterns and make decisions based on data."
response_b = "Machine learning is when computers learn stuff from data."

result = predict_single(prompt, response_a, response_b, model, tokenizer, MAX_LENGTH)

print("=" * 60)
print("Prediction Results:")
print("=" * 60)
print(f"Prompt: {prompt}")
print(f"\nResponse A: {response_a[:100]}...")
print(f"Response B: {response_b}")
print(f"\nProbabilities:")
print(f"  A wins: {result['winner_model_a']:.4f} ({result['winner_model_a']*100:.2f}%)")
print(f"  B wins: {result['winner_model_b']:.4f} ({result['winner_model_b']*100:.2f}%)")
print(f"  Tie:    {result['winner_tie']:.4f} ({result['winner_tie']*100:.2f}%)")

# Pick the class with the highest probability (the first listed wins an exact tie)
labels = {'winner_model_a': 'A', 'winner_model_b': 'B', 'winner_tie': 'Tie'}
winner = labels[max(result, key=result.get)]
print(f"\n✓ Predicted winner: {winner}")
print("=" * 60)


## Expected Output

After running the cells above, you should see:

1. **Model loaded successfully** - Confirmation that the model is ready
2. **Prediction results** - Probabilities for each class (A wins, B wins, Tie)
3. **Predicted winner** - The class with highest probability
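
The path from raw logits to the three outputs listed above can be sketched without the model. The example below uses hypothetical logits and plain Python in place of `torch` and `numpy`. It applies the same steps as `predict_single`: softmax, clipping to `[1e-7, 1 - 1e-7]`, renormalizing, and picking the most probable class.

```python
import math

# Class order matches the dict returned by predict_single.
LABELS = ["winner_model_a", "winner_model_b", "winner_tie"]

def logits_to_result(logits, eps=1e-7):
    """Softmax, clip to [eps, 1 - eps], renormalize, and key by label."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    probs = [min(max(p, eps), 1 - eps) for p in probs]
    total = sum(probs)
    return dict(zip(LABELS, [p / total for p in probs]))

result = logits_to_result([0.4, -0.1, 0.2])  # hypothetical logits
winner = max(result, key=result.get)

print(result)
print(winner)
```

Clipping keeps every probability strictly between 0 and 1, so a log loss computed on the output stays finite even for a confidently wrong prediction.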

The model can now be used for inference on new data.
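
As one possible way to run inference on several examples, the sketch below loops over rows and attaches the most probable label to each result. The function `predict_many`, the row keys, and the stand-in predictor `fake_predict` are hypothetical and exist only to make the example runnable without the model. In this notebook you would instead pass something like `lambda p, a, b: predict_single(p, a, b, model, tokenizer, MAX_LENGTH)`.

```python
def predict_many(rows, predict_fn):
    """Apply predict_fn to each row and attach the most probable label."""
    results = []
    for row in rows:
        probs = predict_fn(row["prompt"], row["response_a"], row["response_b"])
        results.append({**probs, "predicted": max(probs, key=probs.get)})
    return results

def fake_predict(prompt, response_a, response_b):
    """Stand-in predictor (hypothetical): favors the longer response."""
    if len(response_a) > len(response_b):
        return {"winner_model_a": 0.5, "winner_model_b": 0.2, "winner_tie": 0.3}
    return {"winner_model_a": 0.2, "winner_model_b": 0.5, "winner_tie": 0.3}

rows = [
    {"prompt": "What is ML?", "response_a": "A longer, detailed answer.", "response_b": "Short."},
    {"prompt": "Define AI.", "response_a": "Brief.", "response_b": "A much more thorough reply."},
]

outputs = predict_many(rows, fake_predict)
for out in outputs:
    print(out["predicted"])
```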
