# 🐾 VetLLM: Complete Testing & Fine-Tuning Notebook

This notebook provides a comprehensive guide to:
1. **Understanding the data format** required for fine-tuning
2. **Testing the base model** (zero-shot performance)
3. **Fine-tuning the model** with your data
4. **Evaluating** the fine-tuned model

---

## 📋 Table of Contents
1. Environment Setup
2. Understanding Data Formats
3. Testing Base Model (Zero-Shot)
4. Data Preprocessing
5. Fine-Tuning with LoRA
6. Evaluation & Comparison
7. Interactive Demo



---
## 1. Environment Setup

First, let's install all required dependencies and check our hardware.
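
Before running the install cell, it can help to see which of the required packages are already present. The following is a minimal sketch using only the standard library; `REQUIRED_PACKAGES` and `installed_versions` are illustrative names, not part of the notebook, and the package list is an assumption you should adjust to your environment.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Hypothetical subset of the packages installed in Step 1; edit as needed.
REQUIRED_PACKAGES = ["torch", "transformers", "datasets", "accelerate", "peft"]


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Return {package: version string}, with None for packages that are missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Anything reported as `NOT INSTALLED` will be picked up by the install cell below.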



In [1]:
# ============================================
# STEP 1: Install Dependencies
# ============================================
# Run this cell to install all required packages
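# Version specifiers are quoted so the shell does not treat ">" as output redirection.
# transformers>=4.41.0 is required for the `eval_strategy` argument used in Section 5.
!pip install -q torch torchvision torchaudio \
    "transformers>=4.41.0" "datasets>=2.14.0" "accelerate>=0.24.0" \
    "peft>=0.6.0" "bitsandbytes>=0.41.0" scikit-learn pandas numpy \
    matplotlib seaborn sentencepiece protobuf

print("✅ All packages installed successfully!")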




✅ All packages installed successfully!


In [2]:
# ============================================
# STEP 2: Import Libraries & Check Hardware
# ============================================

import os
import json
import random
import re
import numpy as np
import pandas as pd
from typing import List, Dict, Optional
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
)
from datasets import Dataset
from peft import LoraConfig, get_peft_model, TaskType, PeftModel
from sklearn.metrics import f1_score, precision_score, recall_score

# Check hardware
print("="*60)
print("🖥️  HARDWARE CHECK")
print("="*60)
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    device = "cuda"
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    print("MPS (Apple Silicon) available")
    device = "mps"
else:
    print("Using CPU")
    device = "cpu"
print(f"\nUsing device: {device}")
print("="*60)



🖥️  HARDWARE CHECK
PyTorch version: 2.9.0+cu126
CUDA available: False
Using CPU

Using device: cpu


---
## 2. Understanding Data Formats

### 📊 VetLLM uses three data formats:

1. **Raw Veterinary Notes** - Your original clinical notes
2. **Instruction Format** - Formatted for instruction-tuning
3. **Alpaca Prompt** - The actual input to the model
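
The step from the first format to the second is a simple field mapping. Here is a minimal sketch of that conversion, assuming each raw record has the `clinical_note` and `snomed_codes` keys shown in the cells below; `raw_to_instruction` and `DEFAULT_INSTRUCTION` are hypothetical names introduced for illustration.

```python
from typing import Dict, List

# Assumed default instruction text, copied from the notebook's examples.
DEFAULT_INSTRUCTION = (
    "Analyze the following veterinary clinical note and predict the "
    "SNOMED-CT diagnosis codes."
)


def raw_to_instruction(record: Dict, instruction: str = DEFAULT_INSTRUCTION) -> Dict:
    """Map a raw note (format 1) to an instruction-tuning record (format 2)."""
    # Store codes as strings so large integers are never reformatted.
    codes: List[str] = [str(code) for code in record["snomed_codes"]]
    return {
        "instruction": instruction,
        "input": f"Clinical Note: {record['clinical_note']}",
        "output": f"Diagnosed conditions: {', '.join(codes)}",
        "snomed_codes": codes,
    }


example = raw_to_instruction({
    "clinical_note": "Cat, 2 years old. Vomiting for 2 days.",
    "snomed_codes": ["422400008"],
})
print(example["output"])  # Diagnosed conditions: 422400008
```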



In [None]:
# ============================================
# UNDERSTANDING DATA FORMAT: Raw Veterinary Notes
# ============================================
# This is your INPUT format - clinical notes with SNOMED-CT codes

raw_veterinary_note_example = {
    "clinical_note": "Dog, 4 years old, Golden Retriever. Presents with acute lethargy and decreased appetite. Physical exam: pale gums, mild fever (39.3°C). Owner reports possible exposure to spoiled food.",
    "snomed_codes": ["397983004", "79890006"]  # Lethargy, Loss of appetite
}

print("="*60)
print("📋 FORMAT 1: Raw Veterinary Note (Your Input Data)")
print("="*60)
print(json.dumps(raw_veterinary_note_example, indent=2))
print("\n💡 This is how you should structure your data file!")



In [None]:
# ============================================
# UNDERSTANDING DATA FORMAT: Instruction Format
# ============================================
# After preprocessing, data is converted to instruction-following format

instruction_format_example = {
    "instruction": "Analyze the following veterinary clinical note and predict the SNOMED-CT diagnosis codes.",
    "input": "Clinical Note: Dog, 4 years old, Golden Retriever. Presents with acute lethargy and decreased appetite. Physical exam: pale gums, mild fever (39.3°C). Owner reports possible exposure to spoiled food.",
    "output": "Diagnosed conditions: 397983004, 79890006",
    "snomed_codes": ["397983004", "79890006"]
}

print("="*60)
print("📋 FORMAT 2: Instruction Format (For Training)")
print("="*60)
print(json.dumps(instruction_format_example, indent=2))
print("\n💡 This format is used internally for training!")



In [None]:
# ============================================
# UNDERSTANDING DATA FORMAT: Alpaca Prompt
# ============================================
# This is the actual prompt format sent to the model

def create_alpaca_prompt(instruction: str, input_text: str = "", output: str = "") -> str:
    """Create Alpaca-style prompt for the model."""
    if input_text:
        prompt = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input_text}

### Response:
{output}"""
    else:
        prompt = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{output}"""
    return prompt

# Create example prompt
example_prompt = create_alpaca_prompt(
    instruction=instruction_format_example["instruction"],
    input_text=instruction_format_example["input"],
    output=""  # Empty during inference
)

print("="*60)
print("📋 FORMAT 3: Alpaca Prompt (Actual Model Input)")
print("="*60)
print(example_prompt)
print("\n💡 This is what the model actually sees!")



In [None]:
# ============================================
# SNOMED-CT Code Reference
# ============================================
# Common SNOMED-CT codes used in veterinary diagnosis

SNOMED_CODES = {
    "397983004": "Lethargy",
    "79890006": "Loss of appetite",
    "422400008": "Vomiting",
    "62315008": "Diarrhea",
    "267036007": "Difficulty breathing",
    "16973004": "Lameness",
    "57676002": "Joint pain",
    "271807003": "Skin rash",
    "424492005": "Scratching",
    "91175000": "Seizure",
    "89362005": "Weight loss",
    "49727002": "Cough",
    "64531003": "Nasal discharge",
    "271860004": "Abdominal distension",
    "25786006": "Behavioral changes",
    "386661006": "Fever",
    "34095006": "Dehydration",
    "387603000": "Loss of balance",
    "17173007": "Excessive thirst",
    "139394000": "Difficulty urinating",
    "246636008": "Eye discharge"
}

print("="*60)
print("🏥 SNOMED-CT Code Reference")
print("="*60)
for code, description in list(SNOMED_CODES.items())[:10]:
    print(f"  {code}: {description}")
print(f"  ... and {len(SNOMED_CODES)-10} more codes")



---
## 3. Testing Base Model (Zero-Shot)

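Let's test the **base model** WITHOUT any fine-tuning to establish its baseline performance on veterinary diagnosis. The target model is Alpaca-7B; the configuration cell below defaults to the smaller TinyLlama-1.1B so that the notebook also runs on limited hardware.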



In [None]:
# ============================================
# CONFIGURATION: Choose Your Model
# ============================================

# Choose one based on your GPU memory:
# - "wxjiao/alpaca-7b" : Full 7B model (requires 16GB+ GPU)
# - "openlm-research/open_llama_3b" : Smaller model (requires 8GB+ GPU)
# - "TinyLlama/TinyLlama-1.1B-Chat-v1.0" : Tiny model (works on most GPUs)

# For Colab Free Tier (T4 GPU with 15GB), use:
MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # Change to "wxjiao/alpaca-7b" if you have enough memory

# For Kaggle/Colab Pro (A100/V100):
# MODEL_NAME = "wxjiao/alpaca-7b"

print(f"Selected model: {MODEL_NAME}")
print("\n⚠️ If you get OOM errors, switch to a smaller model above!")



In [None]:
# ============================================
# LOAD BASE MODEL & TOKENIZER
# ============================================

print("Loading model and tokenizer...")
print("This may take a few minutes for the first time (downloading model weights).")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    use_fast=True
)

# Set pad token if not present
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load model with appropriate settings
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device_map="auto" if device == "cuda" else None,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

# Move to device if not using device_map
if device != "cuda":
    model = model.to(device)

model.eval()

print(f"\n✅ Model loaded successfully!")
print(f"Model size: {sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")



In [None]:
# ============================================
# INFERENCE FUNCTION
# ============================================

def predict_diagnosis(
    clinical_note: str,
    model,
    tokenizer,
    max_new_tokens: int = 100,
    temperature: float = 0.1
) -> str:
    """
    Generate diagnosis prediction for a clinical note.
    
    Args:
        clinical_note: The veterinary clinical note
        model: The loaded model
        tokenizer: The loaded tokenizer
        max_new_tokens: Maximum tokens to generate
        temperature: Sampling temperature (lower = more deterministic)
    
    Returns:
        The model's prediction
    """
    instruction = "Analyze the following veterinary clinical note and predict the SNOMED-CT diagnosis codes. List the numeric codes that apply."
    
    prompt = create_alpaca_prompt(
        instruction=instruction,
        input_text=f"Clinical Note: {clinical_note}"
    )
    
    # Tokenize
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    
    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=temperature > 0,
            temperature=temperature if temperature > 0 else None,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    
    # Decode only the new tokens
    response = tokenizer.decode(
        outputs[0][inputs['input_ids'].shape[1]:],
        skip_special_tokens=True
    ).strip()
    
    return response

print("✅ Inference function created!")



In [None]:
# ============================================
# TEST CASES: Zero-Shot Performance
# ============================================

test_cases = [
    {
        "clinical_note": "Dog, 4 years old, Golden Retriever. Presents with acute lethargy and decreased appetite. Physical exam: pale gums, mild fever (39.3°C).",
        "expected_codes": ["397983004", "79890006"],
        "expected_diagnosis": "Lethargy, Loss of appetite"
    },
    {
        "clinical_note": "Cat, 2 years old, Siamese. Vomiting and diarrhea for 2 days. Dehydrated, temperature 39.8°C.",
        "expected_codes": ["422400008", "62315008", "34095006"],
        "expected_diagnosis": "Vomiting, Diarrhea, Dehydration"
    },
    {
        "clinical_note": "Rabbit, 1 year old. Hair loss and itchy skin, some scabs around neck. Eats well and otherwise healthy.",
        "expected_codes": ["271807003", "424492005"],
        "expected_diagnosis": "Skin rash, Scratching"
    },
    {
        "clinical_note": "Horse, 8 years old, Thoroughbred. Limping on front left leg. Joint swelling observed. No fever.",
        "expected_codes": ["16973004", "57676002"],
        "expected_diagnosis": "Lameness, Joint pain"
    },
    {
        "clinical_note": "Dog, 3 years old, Beagle. Sudden onset of seizures. No prior history. Post-ictal confusion.",
        "expected_codes": ["91175000"],
        "expected_diagnosis": "Seizure"
    }
]

print("="*60)
print("🧪 ZERO-SHOT TESTING (Base Model without Fine-tuning)")
print("="*60)
print("\nTesting model's ability to predict diagnoses without any training...\n")

for i, test in enumerate(test_cases, 1):
    print(f"\n{'─'*60}")
    print(f"📋 Test Case {i}")
    print(f"{'─'*60}")
    print(f"\n📝 Clinical Note:")
    print(f"   {test['clinical_note']}")
    print(f"\n✅ Expected: {test['expected_diagnosis']}")
    print(f"   Codes: {test['expected_codes']}")
    
    prediction = predict_diagnosis(test['clinical_note'], model, tokenizer)
    print(f"\n🤖 Model Prediction:")
    print(f"   {prediction[:500]}..." if len(prediction) > 500 else f"   {prediction}")

print("\n" + "="*60)
print("💡 Observation: The base model may not produce accurate SNOMED codes.")
print("   Fine-tuning on labeled notes is intended to improve this.")
print("="*60)



---
## 4. Data Preprocessing

Now let's prepare data for fine-tuning. We'll create synthetic training data and show you how to format your own data.
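
If you bring your own data, a quick structural check before training can catch malformed records early. The sketch below is one possible validator, not part of the VetLLM pipeline: `validate_records`, `REQUIRED_KEYS`, and `CODE_PATTERN` are hypothetical names, and the 6 to 18 digit rule simply mirrors the pattern this notebook uses later to extract codes from model output.

```python
import re
from typing import Dict, List

REQUIRED_KEYS = ("instruction", "input", "output", "snomed_codes")
# Assumption: a code is a plain run of 6 to 18 digits.
CODE_PATTERN = re.compile(r"^\d{6,18}$")


def validate_records(records: List[Dict]) -> List[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems: List[str] = []
    for i, rec in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in rec]
        if missing:
            problems.append(f"record {i}: missing keys {missing}")
            continue
        for code in rec["snomed_codes"]:
            if not CODE_PATTERN.match(str(code)):
                problems.append(f"record {i}: suspicious code {code!r}")
            elif str(code) not in rec["output"]:
                problems.append(f"record {i}: code {code} not in output text")
    return problems


sample = {
    "instruction": "Analyze the note.",
    "input": "Clinical Note: Dog, 3 years old. Coughing.",
    "output": "Diagnosed conditions: 49727002",
    "snomed_codes": ["49727002"],
}
print(validate_records([sample]))  # []
```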



In [None]:
# ============================================
# SYNTHETIC DATA GENERATION
# ============================================

def create_synthetic_veterinary_data(num_samples: int = 100) -> List[Dict]:
    """
    Create synthetic veterinary training data.
    
    In practice, you would replace this with your real data!
    """
    
    species_options = ["dog", "cat", "rabbit", "bird", "horse"]
    breed_options = {
        "dog": ["Labrador", "German Shepherd", "Golden Retriever", "Bulldog", "Beagle", "Poodle"],
        "cat": ["Persian", "Siamese", "Maine Coon", "British Shorthair", "Ragdoll"],
        "rabbit": ["Holland Lop", "Netherland Dwarf", "Mini Rex", "Lionhead"],
        "bird": ["Cockatiel", "Budgie", "Canary", "Parrot"],
        "horse": ["Thoroughbred", "Quarter Horse", "Arabian", "Pinto"]
    }
    
    # Condition templates with associated codes
    conditions = [
        {
            "complaint": "lethargy and decreased appetite",
            "findings": "mild dehydration, pale mucous membranes",
            "codes": ["397983004", "79890006"]
        },
        {
            "complaint": "vomiting and diarrhea",
            "findings": "dehydration, elevated temperature",
            "codes": ["422400008", "62315008"]
        },
        {
            "complaint": "difficulty breathing",
            "findings": "respiratory distress, abnormal lung sounds",
            "codes": ["267036007"]
        },
        {
            "complaint": "limping and joint pain",
            "findings": "joint swelling, pain on palpation",
            "codes": ["16973004", "57676002"]
        },
        {
            "complaint": "skin irritation and scratching",
            "findings": "skin lesions, erythema",
            "codes": ["271807003", "424492005"]
        },
        {
            "complaint": "seizure activity",
            "findings": "post-ictal confusion, neurological abnormalities",
            "codes": ["91175000"]
        },
        {
            "complaint": "weight loss",
            "findings": "muscle wasting, poor body condition",
            "codes": ["89362005"]
        },
        {
            "complaint": "coughing and nasal discharge",
            "findings": "nasal discharge, enlarged lymph nodes",
            "codes": ["49727002", "64531003"]
        },
    ]
    
    instructions = [
        "Analyze the following veterinary clinical note and predict the SNOMED-CT diagnosis codes.",
        "Based on the clinical findings, identify the most likely SNOMED-CT diagnosis codes.",
        "What are the appropriate SNOMED-CT codes for this veterinary case?",
        "Determine the diagnosis codes that best match this clinical presentation.",
    ]
    
    data = []
    for i in range(num_samples):
        species = random.choice(species_options)
        breed = random.choice(breed_options[species])
        age = random.randint(1, 15)
        condition = random.choice(conditions)
        temp = round(random.uniform(37.5, 41.0), 1)
        
        clinical_note = f"{species.capitalize()}, {age} years old, {breed}. Presents with {condition['complaint']}. Physical exam: {condition['findings']}. Temperature: {temp}°C."
        
        data.append({
            "instruction": random.choice(instructions),
            "input": f"Clinical Note: {clinical_note}",
            "output": f"Diagnosed conditions: {', '.join(condition['codes'])}",
            "snomed_codes": condition['codes']
        })
    
    return data

# Create training and validation data
NUM_TRAIN_SAMPLES = 200  # Increase for better results
NUM_VAL_SAMPLES = 50

print("Creating synthetic training data...")
train_data = create_synthetic_veterinary_data(NUM_TRAIN_SAMPLES)
val_data = create_synthetic_veterinary_data(NUM_VAL_SAMPLES)

print(f"✅ Created {len(train_data)} training samples")
print(f"✅ Created {len(val_data)} validation samples")

# Show sample
print("\n📋 Sample training data:")
print(json.dumps(train_data[0], indent=2))



In [None]:
# ============================================
# HOW TO USE YOUR OWN DATA
# ============================================

# If you have your own veterinary notes, format them like this.
# Note: JSON has no comment syntax, so delete the "// ..." placeholder line
# in a real data file.

your_data_template = '''
[
    {
        "instruction": "Analyze the following veterinary clinical note and predict the SNOMED-CT diagnosis codes.",
        "input": "Clinical Note: [YOUR CLINICAL NOTE HERE]",
        "output": "Diagnosed conditions: [CODE1], [CODE2], ...",
        "snomed_codes": ["CODE1", "CODE2"]
    },
    // ... more samples
]
'''

print("="*60)
print("📁 HOW TO USE YOUR OWN DATA")
print("="*60)
print("\n1. Create a JSON file with this structure:")
print(your_data_template)
print("\n2. Upload to Colab/Kaggle")
print("\n3. Load with:")
print("   with open('your_data.json', 'r') as f:")
print("       train_data = json.load(f)")



In [None]:
# ============================================
# PREPARE DATASET FOR TRAINING
# ============================================

def prepare_training_data(data: List[Dict], tokenizer, max_length: int = 512) -> Dataset:
    """
    Convert instruction data to tokenized dataset.
    """
    formatted_data = []
    for item in data:
        prompt = create_alpaca_prompt(
            instruction=item["instruction"],
            input_text=item.get("input", ""),
            output=item["output"]
        )
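        # Append EOS to mark the end of each response, as in Alpaca-style training data
        formatted_data.append({"text": prompt + tokenizer.eos_token})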
    
    dataset = Dataset.from_list(formatted_data)
    
    def tokenize_function(examples):
        return tokenizer(
            examples["text"],
            truncation=True,
            max_length=max_length,
            padding=False,
            return_tensors=None
        )
    
    tokenized_dataset = dataset.map(
        tokenize_function,
        batched=True,
        remove_columns=dataset.column_names
    )
    
    return tokenized_dataset

print("Preparing datasets for training...")
train_dataset = prepare_training_data(train_data, tokenizer)
val_dataset = prepare_training_data(val_data, tokenizer)

print(f"✅ Training dataset: {len(train_dataset)} samples")
print(f"✅ Validation dataset: {len(val_dataset)} samples")
print(f"\nSample tokenized length: {len(train_dataset[0]['input_ids'])} tokens")



---
## 5. Fine-Tuning with LoRA

Now let's fine-tune the model using **LoRA (Low-Rank Adaptation)**, a memory-efficient technique that freezes the base model weights and trains only small low-rank adapter matrices added to selected layers.
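
To see why this saves memory, here is a small worked calculation. For a weight matrix of shape `d_out x d_in`, a rank-`r` adapter adds two matrices with `r * (d_in + d_out)` parameters in total. The layer size of 4096 is an illustrative assumption and is not read from any model config; `r` and `alpha` match the configuration cell below.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


d_in = d_out = 4096   # illustrative projection size (assumption)
r, alpha = 16, 32     # rank and scaling factor from the LoRA configuration

full = d_in * d_out
added = lora_param_count(d_in, d_out, r)

print(f"Frozen weights per projection: {full:,}")
print(f"LoRA weights per projection:   {added:,} ({added / full:.2%})")
print(f"Update scaling (alpha / r):    {alpha / r}")
```

Under these assumptions each adapted projection trains well under 1% of the weights it modifies, which is what `print_trainable_parameters()` reports for the whole model after the adapters are applied.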



In [None]:
# ============================================
# SETUP LORA CONFIGURATION
# ============================================

# LoRA Configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                          # Rank of the low-rank matrices
    lora_alpha=32,                 # Scaling factor
    lora_dropout=0.1,              # Dropout probability
    target_modules=[               # Modules to apply LoRA to
        "q_proj", "v_proj",        # Attention projections
        "k_proj", "o_proj"         # Key and output projections
    ],
    bias="none"                    # Don't train biases
)

print("LoRA Configuration:")
print(f"  Rank (r): {lora_config.r}")
print(f"  Alpha: {lora_config.lora_alpha}")
print(f"  Dropout: {lora_config.lora_dropout}")
print(f"  Target modules: {lora_config.target_modules}")



In [None]:
# ============================================
# APPLY LORA TO MODEL
# ============================================

# Reload model for training (fresh instance)
print("Reloading model for LoRA training...")

model_for_training = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device_map="auto" if device == "cuda" else None,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

if device != "cuda":
    model_for_training = model_for_training.to(device)

# Apply LoRA
print("Applying LoRA adapters...")
model_for_training = get_peft_model(model_for_training, lora_config)

# Print trainable parameters
model_for_training.print_trainable_parameters()

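# Enable gradient checkpointing for memory efficiency.
# With a frozen base model, the embedding outputs must require gradients,
# otherwise checkpointed blocks have nothing to backpropagate through.
model_for_training.enable_input_require_grads()
model_for_training.gradient_checkpointing_enable()
model_for_training.train()

print("\n✅ LoRA adapters applied!")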



In [None]:
# ============================================
# TRAINING CONFIGURATION
# ============================================

OUTPUT_DIR = "./vetllm-finetuned"

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    
    # Training hyperparameters
    num_train_epochs=3,                    # Number of epochs
    per_device_train_batch_size=4,         # Batch size (reduce if OOM)
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,         # Effective batch = 4 * 4 = 16
    
    # Learning rate and optimization
    learning_rate=2e-4,                    # Learning rate for LoRA
    weight_decay=0.01,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    
    # Memory optimization
    fp16=device == "cuda",                 # Mixed precision for CUDA
    gradient_checkpointing=True,
    
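    # Evaluation and saving
    # With 200 samples and an effective batch of 16, the 3-epoch run has fewer
    # than 50 optimizer steps, so evaluate and save every 10 steps; otherwise
    # no evaluation or checkpoint would ever trigger.
    eval_strategy="steps",
    eval_steps=10,
    save_strategy="steps",
    save_steps=10,                         # Must be a multiple of eval_steps
    save_total_limit=2,
    load_best_model_at_end=True,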
    
    # Logging
    logging_steps=10,
    logging_dir=f"{OUTPUT_DIR}/logs",
    report_to="none",                      # Disable wandb for notebook
    
    # Other
    dataloader_drop_last=True,
    remove_unused_columns=False,
)

print("Training Configuration:")
print(f"  Epochs: {training_args.num_train_epochs}")
print(f"  Batch size: {training_args.per_device_train_batch_size}")
print(f"  Learning rate: {training_args.learning_rate}")
print(f"  Output dir: {OUTPUT_DIR}")



In [None]:
# ============================================
# START TRAINING
# ============================================

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
)

trainer = Trainer(
    model=model_for_training,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator,
)

print("="*60)
print("🚀 STARTING TRAINING")
print("="*60)
print(f"Training samples: {len(train_dataset)}")
print(f"Validation samples: {len(val_dataset)}")
print("\nThis will take a few minutes...\n")

# Train!
train_result = trainer.train()

print("\n" + "="*60)
print("✅ TRAINING COMPLETED!")
print("="*60)
print(f"Training loss: {train_result.training_loss:.4f}")
print(f"Training steps: {train_result.global_step}")



In [None]:
# ============================================
# SAVE THE FINE-TUNED MODEL
# ============================================

trainer.save_model()
tokenizer.save_pretrained(OUTPUT_DIR)

print(f"✅ Model saved to: {OUTPUT_DIR}")
print("\nSaved files:")
for f in os.listdir(OUTPUT_DIR):
    print(f"  - {f}")



---
## 6. Evaluation & Comparison

Let's compare the fine-tuned model with the base model!
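
The comparison below reports a per-case Jaccard score. If you also want a single number across all test cases, micro-averaged precision, recall, and F1 over the code sets are a common choice. This is a minimal standard-library sketch; `micro_prf` is a hypothetical helper, not part of the notebook, and the example predictions are made up for illustration.

```python
from typing import Dict, List, Tuple


def micro_prf(pairs: List[Tuple[List[str], List[str]]]) -> Dict[str, float]:
    """Micro-averaged precision/recall/F1 over (predicted_codes, true_codes) pairs."""
    tp = fp = fn = 0
    for pred, true in pairs:
        pred_set, true_set = set(pred), set(true)
        tp += len(pred_set & true_set)   # codes predicted and expected
        fp += len(pred_set - true_set)   # codes predicted but not expected
        fn += len(true_set - pred_set)   # codes expected but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up predictions: one missed code in the first case, one extra in the second.
pairs = [
    (["422400008", "62315008"], ["422400008", "62315008", "34095006"]),
    (["91175000", "386661006"], ["91175000"]),
]
print(micro_prf(pairs))
```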



In [None]:
# ============================================
# LOAD FINE-TUNED MODEL
# ============================================

print("Loading fine-tuned model...")

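# model_for_training already holds the fine-tuned LoRA weights in memory.
# In a new session, reload the adapter from disk instead, for example
# (base_model = a freshly loaded MODEL_NAME):
#   finetuned_model = PeftModel.from_pretrained(base_model, OUTPUT_DIR)
finetuned_model = model_for_training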
finetuned_model.eval()

print("✅ Fine-tuned model ready for evaluation!")



In [None]:
# ============================================
# COMPARE BASE VS FINE-TUNED MODEL
# ============================================

def extract_snomed_codes(prediction: str) -> List[str]:
    """Extract SNOMED-CT codes from model prediction."""
    codes = re.findall(r'\b\d{6,18}\b', prediction)
    return list(dict.fromkeys(codes))[:10]  # Unique codes, max 10

def evaluate_prediction(pred_codes: List[str], true_codes: List[str]) -> Dict:
    """Calculate evaluation metrics."""
    pred_set = set(pred_codes)
    true_set = set(true_codes)
    
    exact_match = int(pred_set == true_set)
    jaccard = len(pred_set & true_set) / len(pred_set | true_set) if pred_set | true_set else 1.0
    
    return {
        "exact_match": exact_match,
        "jaccard": jaccard,
        "pred_codes": pred_codes,
        "true_codes": true_codes
    }

print("="*60)
print("🔬 COMPARISON: Base Model vs Fine-Tuned Model")
print("="*60)

comparison_tests = test_cases[:3]  # Use first 3 test cases

for i, test in enumerate(comparison_tests, 1):
    print(f"\n{'─'*60}")
    print(f"📋 Test Case {i}")
    print(f"{'─'*60}")
    print(f"\n📝 Clinical Note: {test['clinical_note'][:100]}...")
    print(f"\n✅ Expected: {test['expected_codes']}")
    
    # Base model prediction
    base_pred = predict_diagnosis(test['clinical_note'], model, tokenizer)
    base_codes = extract_snomed_codes(base_pred)
    base_eval = evaluate_prediction(base_codes, test['expected_codes'])
    
    print(f"\n🔵 Base Model:")
    print(f"   Prediction: {base_pred[:200]}..." if len(base_pred) > 200 else f"   Prediction: {base_pred}")
    print(f"   Extracted codes: {base_codes}")
    print(f"   Jaccard: {base_eval['jaccard']:.2f}")
    
    # Fine-tuned model prediction
    ft_pred = predict_diagnosis(test['clinical_note'], finetuned_model, tokenizer)
    ft_codes = extract_snomed_codes(ft_pred)
    ft_eval = evaluate_prediction(ft_codes, test['expected_codes'])
    
    print(f"\n🟢 Fine-Tuned Model:")
    print(f"   Prediction: {ft_pred[:200]}..." if len(ft_pred) > 200 else f"   Prediction: {ft_pred}")
    print(f"   Extracted codes: {ft_codes}")
    print(f"   Jaccard: {ft_eval['jaccard']:.2f}")
    
    # Improvement
    improvement = ft_eval['jaccard'] - base_eval['jaccard']
    print(f"\n📈 Improvement: {improvement:+.2f}")



---
## 7. Interactive Demo

Try your own clinical notes!



In [None]:
# ============================================
# INTERACTIVE PREDICTION
# ============================================

def interactive_predict(clinical_note: str):
    """Get predictions from both models and compare."""
    print("="*60)
    print("🔮 DIAGNOSIS PREDICTION")
    print("="*60)
    print(f"\n📝 Input: {clinical_note}")
    
    # Base model
    print(f"\n🔵 Base Model Prediction:")
    base_pred = predict_diagnosis(clinical_note, model, tokenizer)
    print(f"   {base_pred}")
    base_codes = extract_snomed_codes(base_pred)
    print(f"   Codes: {base_codes}")
    
    # Fine-tuned model
    print(f"\n🟢 Fine-Tuned Model Prediction:")
    ft_pred = predict_diagnosis(clinical_note, finetuned_model, tokenizer)
    print(f"   {ft_pred}")
    ft_codes = extract_snomed_codes(ft_pred)
    print(f"   Codes: {ft_codes}")
    
    # Translate codes
    print(f"\n📋 Code Meanings:")
    for code in ft_codes:
        meaning = SNOMED_CODES.get(code, "Unknown code")
        print(f"   {code}: {meaning}")

# Example usage
interactive_predict(
    "Cat, 5 years old, Persian. Owner reports excessive scratching for 2 weeks. Hair loss around ears and neck. Some redness and scabs visible."
)



In [None]:
# ============================================
# TRY YOUR OWN CLINICAL NOTE
# ============================================

# 👇 MODIFY THIS TEXT TO TEST YOUR OWN CLINICAL NOTES!
your_clinical_note = "Dog, 3 years old, German Shepherd. Sudden onset of seizures lasting 2 minutes. First episode. Post-ictal confusion. No prior history of seizures."

interactive_predict(your_clinical_note)



---
## 📚 Summary

### What We Learned

1. **Data Format for Fine-Tuning:**
   ```json
   {
     "instruction": "Analyze the clinical note...",
     "input": "Clinical Note: [Your note here]",
     "output": "Diagnosed conditions: CODE1, CODE2",
     "snomed_codes": ["CODE1", "CODE2"]
   }
   ```

2. **Model Input Format (Alpaca Prompt):**
   ```
   Below is an instruction that describes a task...
   ### Instruction:
   [instruction]
   ### Input:
   [clinical note]
   ### Response:
   ```

3. **Fine-Tuning Improves Performance:**
   - Base model: Limited veterinary knowledge
   - Fine-tuned model: Learns SNOMED-CT code patterns

### Next Steps

1. **Use More Training Data** - The paper achieves best results with 5,000 notes
2. **Use Your Real Data** - Replace synthetic data with actual veterinary records
3. **Train Longer** - More epochs may improve performance
4. **Use Larger Model** - Alpaca-7B is likely to outperform smaller models, at the cost of more GPU memory
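
When you switch to real records (step 2 above), you will need a validation set held out from the same file rather than generated separately. Here is a minimal sketch of a reproducible split, assuming your data is a list of dicts; `split_records` is a hypothetical helper and the 20% validation fraction is an arbitrary default.

```python
import random
from typing import Dict, List, Tuple


def split_records(
    records: List[Dict], val_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Dict], List[Dict]]:
    """Shuffle a copy of the records with a fixed seed and return (train, val)."""
    shuffled = list(records)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # local RNG, so global state is unaffected
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


records = [{"id": i} for i in range(10)]
train_split, val_split = split_records(records)
print(len(train_split), len(val_split))  # 8 2
```

The resulting lists can be passed to `prepare_training_data` in place of the synthetic `train_data` and `val_data`.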



In [None]:
# ============================================
# DOWNLOAD FINE-TUNED MODEL (For Colab)
# ============================================

# Uncomment to download your fine-tuned model

# import shutil
# shutil.make_archive("vetllm-finetuned", 'zip', OUTPUT_DIR)
# 
# from google.colab import files
# files.download('vetllm-finetuned.zip')

print("✅ Notebook complete!")
print("\n📁 To download your model, uncomment and run the code above.")



---
## 🔧 Troubleshooting

### Common Issues

| Issue | Solution |
|-------|----------|
| **OOM Error** | Reduce `per_device_train_batch_size` to 2 or 1 |
| **Slow Training** | Use a smaller model or reduce `num_train_epochs` |
| **Poor Results** | Use more training data or train for more epochs |
| **Model Not Loading** | Check `MODEL_NAME` and ensure network access |
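
When you lower `per_device_train_batch_size` to avoid OOM errors, you can raise `gradient_accumulation_steps` so that the effective batch size stays the same. A rough sketch of the arithmetic, using the notebook's effective batch of 16; `accumulation_steps` is a hypothetical helper:

```python
import math


def accumulation_steps(effective_batch: int, per_device_batch: int, num_devices: int = 1) -> int:
    """Accumulation steps needed to reach at least the target effective batch size."""
    return math.ceil(effective_batch / (per_device_batch * num_devices))


for per_device in (4, 2, 1):
    steps = accumulation_steps(16, per_device)
    print(f"per_device_train_batch_size={per_device} -> gradient_accumulation_steps={steps}")
```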

### GPU Memory Guide

| Model | Required VRAM | Colab Tier |
|-------|---------------|------------|
| TinyLlama-1.1B | 4-6 GB | Free (T4) |
| OpenLLaMA-3B | 8-10 GB | Free (T4) |
| Alpaca-7B | 16-20 GB | Pro (A100) |
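
As a back-of-the-envelope check on the table above, the memory needed just to hold the weights is the parameter count times the bytes per parameter. This sketch covers weights only; activations, LoRA gradients, optimizer state, and framework overhead come on top, which is why the table lists larger figures. The parameter counts are rounded nominal sizes, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts (assumption: rounded from the model names).
for name, params in [("1.1B", 1.1e9), ("3B", 3e9), ("7B", 7e9)]:
    fp16 = weight_memory_gb(params, 2)   # float16: 2 bytes per parameter
    fp32 = weight_memory_gb(params, 4)   # float32: 4 bytes per parameter
    print(f"{name:>5}: {fp16:5.1f} GB in fp16, {fp32:5.1f} GB in fp32")
```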

---

*Created for VetLLM Research Project*


