# Insurance AI Assistant - Model Evaluation

This notebook provides a comprehensive evaluation of the fine-tuned insurance AI assistant model using multiple metrics, including perplexity, toxicity, relevance, and domain-specific assessments.

## Evaluation Metrics
- **Perplexity**: How well the model predicts reference text (lower is better); a proxy for fluency
- **Toxicity**: Content safety and appropriateness (scored with Detoxify when it is installed)
- **Relevance**: How relevant responses are to the insurance domain
- **Semantic Similarity**: Answer quality and coherence, measured with sentence embeddings
- **Safety Score**: Responsible-AI indicators
- **Response Quality**: Length and completeness metrics
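Perplexity is the exponential of the mean negative log-likelihood the model assigns to each actual next token. With Hugging Face `transformers`, one common approach is to call an `AutoModelForCausalLM` with `labels=input_ids`; the returned `loss` is the mean token cross-entropy, and `torch.exp(loss)` is the perplexity. The plain-Python sketch below shows only the arithmetic, assuming you already have per-token natural-log probabilities; `perplexity` is a hypothetical helper, not part of this notebook.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token.

    token_log_probs holds the natural-log probability the model assigned to
    each actual next token (hypothetical input format). Lower is better: a
    model that assigns probability 1 to every token scores 1.0, while uniform
    guessing over V tokens scores V.
    """
    if len(token_log_probs) == 0:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Uniform guessing over a 4-token vocabulary gives perplexity of about 4.
print(perplexity([math.log(0.25)] * 10))  # ≈ 4.0
# A model that is more confident on the observed tokens scores lower.
print(perplexity([math.log(0.9), math.log(0.5), math.log(0.8)]))
```

Because perplexity depends on the tokenizer's vocabulary, scores are only directly comparable between models that share a tokenizer.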

In [None]:
# Install evaluation dependencies (uncomment if needed)
# !pip install torch transformers sentence-transformers detoxify plotly scikit-learn pandas matplotlib seaborn

import os
import json
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from pathlib import Path
from typing import Dict, List, Any
import warnings
warnings.filterwarnings('ignore')

# Model and evaluation libraries
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Safety evaluation
try:
    from detoxify import Detoxify
    DETOXIFY_AVAILABLE = True
    print("✅ Detoxify available for toxicity evaluation")
except ImportError:
    DETOXIFY_AVAILABLE = False
    print("⚠️ Detoxify not available - toxicity evaluation will be skipped")

# Check device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"🔧 Using device: {device}")

print("✅ Evaluation environment setup completed!")
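The setup cell imports `SentenceTransformer` and scikit-learn's `cosine_similarity`, which suggests semantic similarity is scored by embedding texts and comparing the vectors by cosine similarity. The sketch below shows that comparison in plain Python under two assumptions: each generated answer is paired with one reference answer, and the 3-dimensional vectors are made-up stand-ins for real embeddings (in the notebook they would come from `SentenceTransformer(...).encode(texts)`). `pairwise_similarity` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarity(generated: List[Sequence[float]],
                        references: List[Sequence[float]]) -> List[float]:
    """Score each generated answer against its own reference (row i vs row i).

    This equals the diagonal of the full matrix that
    sklearn's cosine_similarity(generated, references) would return.
    """
    if len(generated) != len(references):
        raise ValueError("need one reference per generated answer")
    return [cosine(g, r) for g, r in zip(generated, references)]


# Toy vectors standing in for sentence embeddings (made-up values).
generated = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
references = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
scores = pairwise_similarity(generated, references)
print(scores)  # [1.0, 0.7071067811865475]
```

Taking only the diagonal avoids computing all N×N pairs when each answer has exactly one reference; the full matrix is useful when you want to check that an answer is closer to its own reference than to the others.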