# 🔄 Part 2: Multi-Model Comparison

Different embedding models map the same text to different vectors. In this notebook, we'll compare how several models represent the **same words**.

## What You'll Learn:
1. How different models produce different embeddings
2. Why model choice matters for RAG systems
3. Trade-offs between model size, speed, and quality

## 📦 Install Dependencies

In [None]:
# Install required packages (run this in Colab)
!pip install sentence-transformers plotly seaborn scikit-learn -q

## 📚 Import Libraries

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import seaborn as sns
import matplotlib.pyplot as plt
from sentence_transformers import SentenceTransformer
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import cosine_similarity
import warnings
warnings.filterwarnings('ignore')

print("✅ All libraries imported successfully!")

---
## 📝 Define Your Word List

We'll use the same word list across all models to compare their representations.

In [None]:
# ============================================================
# 🔧 CUSTOMIZE YOUR WORD LIST HERE!
# ============================================================
# Use the same categories as Notebook 1 for consistency

word_categories = {
    "🍎 Fruits": [
        "apple", "banana", "orange", "mango", "strawberry", "grape"
    ],
    "🐾 Animals": [
        "dog", "cat", "elephant", "lion", "tiger", "rabbit"
    ],
    "🎨 Colors": [
        "red", "blue", "green", "yellow", "purple", "orange"
    ],
    "💻 Technology": [
        "computer", "smartphone", "laptop", "tablet", "keyboard", "mouse"
    ],
    "🚗 Vehicles": [
        "car", "bicycle", "motorcycle", "airplane", "train", "boat"
    ]
}

# Flatten the dictionary
words = []
categories = []
for category, word_list in word_categories.items():
    words.extend(word_list)
    categories.extend([category] * len(word_list))

print(f"📊 Total words: {len(words)}")
print(f"📁 Categories: {list(word_categories.keys())}")
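
Note that "orange" is listed under both Fruits and Colors, so the flattened `words` list contains it twice and `words.index("orange")` will always return the first occurrence. The sketch below is one way to spot such repeats before generating embeddings. `find_duplicates` and `toy_categories` are illustrative names that are not part of the notebook.

```python
from collections import defaultdict

# Toy stand-in with the same shape as the notebook's `word_categories`
# (hypothetical data, kept separate so it does not overwrite the real dict).
toy_categories = {
    "Fruits": ["apple", "orange"],
    "Colors": ["red", "orange"],
}

def find_duplicates(categories_to_words):
    """Map each word listed more than once to the categories it was listed under."""
    seen = defaultdict(list)
    for category, word_list in categories_to_words.items():
        for word in word_list:
            seen[word].append(category)
    return {word: cats for word, cats in seen.items() if len(cats) > 1}

duplicates = find_duplicates(toy_categories)
print(duplicates)
```

Calling `find_duplicates(word_categories)` on the real dictionary would report the same word. Whether to keep the duplicate is a judgment call: it is useful for showing that identical text gets identical vectors, but it also adds one cross-category pair with similarity 1.0 to any category-level statistics.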

---
## 🤖 Define Models to Compare

Here are some popular embedding models with different characteristics:

In [None]:
# ============================================================
# 🔧 ADD OR REMOVE MODELS HERE!
# ============================================================

models_to_compare = {
    "all-MiniLM-L6-v2": {
        "description": "Fast & lightweight (22M params, 384 dim)",
        "size": "Small",
        "speed": "⚡ Fast"
    },
    "all-mpnet-base-v2": {
        "description": "Higher quality (110M params, 768 dim)",
        "size": "Medium",
        "speed": "🔄 Moderate"
    },
    "paraphrase-MiniLM-L6-v2": {
        "description": "Optimized for paraphrase detection (22M params, 384 dim)",
        "size": "Small",
        "speed": "⚡ Fast"
    }
}

print("📋 Models to compare:")
print("=" * 70)
for name, info in models_to_compare.items():
    print(f"  • {name}")
    print(f"    {info['description']}")
    print(f"    Size: {info['size']} | Speed: {info['speed']}")
    print()
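
To make the size trade-off concrete for a RAG index, here is a rough back-of-the-envelope sketch of raw vector storage. It assumes embeddings are stored as 4-byte floats with no index overhead or compression. `index_size_mb` is a hypothetical helper, not a library function.

```python
def index_size_mb(n_vectors, dim, bytes_per_value=4):
    """Raw storage for n_vectors embeddings of width dim, in megabytes (10**6 bytes)."""
    return n_vectors * dim * bytes_per_value / 1e6

# Compare the two embedding widths used by the models above.
for dim in (384, 768):
    size = index_size_mb(1_000_000, dim)
    print(f"{dim} dim: {size:,.0f} MB per million vectors")
```

Under these assumptions, doubling the dimension doubles both storage and the work of a brute-force similarity scan. Real vector stores may add overhead or apply quantization.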

## 🔄 Load Models & Generate Embeddings

In [None]:
# Store embeddings and models
model_embeddings = {}
loaded_models = {}

for model_name in models_to_compare.keys():
    print(f"\n{'='*60}")
    print(f"🔄 Loading: {model_name}")
    print(f"{'='*60}")
    
    # Load model
    model = SentenceTransformer(model_name)
    loaded_models[model_name] = model
    
    print(f"   Embedding dimension: {model.get_sentence_embedding_dimension()}")
    
    # Generate embeddings
    print(f"   Generating embeddings...")
    embeddings = model.encode(words, show_progress_bar=True)
    model_embeddings[model_name] = embeddings
    
    print(f"   ✅ Done! Shape: {embeddings.shape}")

print(f"\n\n🎉 All {len(models_to_compare)} models loaded and embeddings generated!")
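
To check the speed labels on your own hardware, one option is to time the encoding call directly. This is a minimal sketch: `time_encode` and `fake_encode` are hypothetical helpers, and the stand-in encoder exists only so the snippet runs without downloading a model.

```python
import time

def time_encode(encode_fn, texts, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls to encode_fn(texts)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode_fn(texts)
        best = min(best, time.perf_counter() - start)
    return best

def fake_encode(texts):
    """Stand-in encoder (assumption): returns one tiny vector per text."""
    return [[float(len(t))] for t in texts]

elapsed = time_encode(fake_encode, ["apple", "banana"])
print(f"best of 3: {elapsed:.6f} s")
```

In this notebook you could pass a real model instead, for example `time_encode(loaded_models[name].encode, words)`. Timings depend heavily on hardware, batch size, and text length, so treat them as relative rather than absolute.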

---
## 🌐 Side-by-Side 3D Visualizations

Let's see how each model represents the same words in 3D space.

In [None]:
# Apply t-SNE to each model's embeddings
tsne_results = {}

perplexity = min(30, len(words) - 1)

for model_name, embeddings in model_embeddings.items():
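    print(f"🔄 Applying t-SNE for {model_name}...")
    
    # The iteration count is left at scikit-learn's default (1000). The keyword
    # was renamed from `n_iter` to `max_iter` in scikit-learn 1.5, so omitting
    # it keeps this cell working on both older and newer versions.
    tsne = TSNE(
        n_components=3,
        perplexity=perplexity,
        random_state=42,
        learning_rate='auto',
        init='pca'
    )
    
    embeddings_3d = tsne.fit_transform(embeddings)
    tsne_results[model_name] = embeddings_3d
    print(f"   ✅ Done!")

print("\n🎉 All t-SNE transformations complete!")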

In [None]:
# Create individual 3D plots for each model
for model_name, embeddings_3d in tsne_results.items():
    df = pd.DataFrame({
        'word': words,
        'category': categories,
        'x': embeddings_3d[:, 0],
        'y': embeddings_3d[:, 1],
        'z': embeddings_3d[:, 2]
    })
    
    fig = px.scatter_3d(
        df,
        x='x', y='y', z='z',
        color='category',
        text='word',
        title=f'ü§ñ {model_name}<br><sub>{models_to_compare[model_name]["description"]}</sub>',
        labels={'x': 't-SNE 1', 'y': 't-SNE 2', 'z': 't-SNE 3'},
        height=600,
        color_discrete_sequence=px.colors.qualitative.Set1
    )
    
    fig.update_traces(
        marker=dict(size=8, line=dict(width=1, color='white')),
        textposition='top center',
        textfont=dict(size=9)
    )
    
    fig.update_layout(
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=-0.15,
            xanchor="center",
            x=0.5
        ),
        margin=dict(l=0, r=0, b=100, t=80)
    )
    
    fig.show()
    print(f"\n{'─'*60}\n")
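
t-SNE axes have no fixed meaning, so the plots cannot be compared coordinate by coordinate across models. A complementary check is to ask how many nearest neighbors each word shares under two models. The sketch below assumes two square similarity matrices over the same word order. `top_k_neighbors`, `neighbor_overlap`, and the toy matrices are hypothetical.

```python
import numpy as np

def top_k_neighbors(sim_matrix, k):
    """Indices of the k most similar items for each row, excluding the item itself."""
    sims = np.array(sim_matrix, dtype=float)  # copy so the input is not modified
    np.fill_diagonal(sims, -np.inf)           # never pick an item as its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]

def neighbor_overlap(sim_a, sim_b, k=3):
    """Mean fraction of top-k neighbors that two similarity matrices share."""
    nn_a = top_k_neighbors(sim_a, k)
    nn_b = top_k_neighbors(sim_b, k)
    shared = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(shared))

# Toy similarity matrices for four items (made-up numbers).
toy_sim_a = [[1.0, 0.9, 0.2, 0.1],
             [0.9, 1.0, 0.3, 0.2],
             [0.2, 0.3, 1.0, 0.8],
             [0.1, 0.2, 0.8, 1.0]]
toy_sim_b = [[1.0, 0.2, 0.9, 0.1],
             [0.2, 1.0, 0.1, 0.9],
             [0.9, 0.1, 1.0, 0.2],
             [0.1, 0.9, 0.2, 1.0]]

print(neighbor_overlap(toy_sim_a, toy_sim_a, k=1))
print(neighbor_overlap(toy_sim_a, toy_sim_b, k=1))
```

Once the cosine similarity matrices are computed in the next section, the same function could be applied to any two of them. An overlap near 1 suggests the two models agree on local structure even if their plots look different.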

---
## 📊 Compare Similarity Matrices

Let's see how each model measures word similarity differently.

In [None]:
# Calculate similarity matrices for each model
similarity_matrices = {}

for model_name, embeddings in model_embeddings.items():
    similarity_matrices[model_name] = cosine_similarity(embeddings)
    
print("✅ Similarity matrices calculated for all models!")
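
For reference, cosine similarity is the dot product of two vectors after each is scaled to unit length. The following NumPy sketch illustrates that definition on toy vectors. `cosine_similarity_matrix` is a hypothetical helper and assumes no all-zero rows, which would cause a division by zero.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity: dot products of L2-normalized rows."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms  # assumes every row has non-zero length
    return unit @ unit.T

# Three toy 2-D vectors: two orthogonal axes and one diagonal.
toy_vectors = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
toy_sims = cosine_similarity_matrix(toy_vectors)
print(np.round(toy_sims, 3))
```

Vector length does not matter here: the second toy vector is twice as long as the first, yet both are equally similar to the diagonal one because only direction is compared.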

In [None]:
# Create side-by-side heatmaps
n_models = len(models_to_compare)
fig, axes = plt.subplots(1, n_models, figsize=(7 * n_models, 6))

if n_models == 1:
    axes = [axes]

for ax, (model_name, sim_matrix) in zip(axes, similarity_matrices.items()):
    sns.heatmap(
        sim_matrix,
        xticklabels=words,
        yticklabels=words,
        cmap='RdYlBu_r',
        vmin=0,
        vmax=1,
        ax=ax,
        cbar_kws={'shrink': 0.8}
    )
    ax.set_title(f'{model_name}\n{models_to_compare[model_name]["size"]}', fontsize=10)
    ax.tick_params(axis='x', rotation=90, labelsize=7)
    ax.tick_params(axis='y', rotation=0, labelsize=7)

plt.suptitle('Cosine Similarity Comparison Across Models', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

---
## 🎯 Compare Specific Word Pairs Across Models

In [None]:
# Define interesting word pairs to compare
word_pairs_to_compare = [
    ("dog", "cat"),           # Similar animals
    ("apple", "banana"),       # Similar fruits
    ("car", "bicycle"),        # Similar vehicles
    ("computer", "laptop"),    # Very similar tech
    ("dog", "car"),            # Different categories
    ("apple", "red"),          # Different but related
    ("orange", "orange"),      # Same word: both lookups hit the same index, so this is ~1.0 by construction
    ("mouse", "cat"),          # Ambiguous - computer mouse vs animal
]

print("📊 SIMILARITY COMPARISON ACROSS MODELS")
print("=" * 80)
print(f"{'Word Pair':<25}", end="")
for model_name in models_to_compare.keys():
    # Drop the trailing "-L6-v2" / "-base-v2" so each column keeps a distinct label
    # (splitting on the first "-" would label two different models "MiniLM").
    short_name = model_name.rsplit('-', 2)[0]
    print(f"{short_name:<18}", end="")
print("\n" + "-" * 80)

for word1, word2 in word_pairs_to_compare:
    if word1 in words and word2 in words:
        # index() returns the first match, so a word that appears twice in
        # `words` always resolves to its first occurrence.
        idx1 = words.index(word1)
        idx2 = words.index(word2)
        
        # Pad the whole label so the score columns line up with the header
        pair_label = f"{word1} ↔ {word2}"
        print(f"{pair_label:<25}", end="")
        
        for model_name, sim_matrix in similarity_matrices.items():
            similarity = sim_matrix[idx1, idx2]
            # Color code the similarity
            if similarity > 0.7:
                indicator = "🟢"
            elif similarity > 0.4:
                indicator = "🟡"
            else:
                indicator = "🔴"
            cell = f"{indicator} {similarity:.3f}"
            print(f"{cell:<18}", end="")
        print()
    else:
        print(f"{word1} ↔ {word2}: ⚠️ Word not in list")

print("\n" + "=" * 80)
print("Legend: 🟢 High (>0.7) | 🟡 Medium (0.4-0.7) | 🔴 Low (<0.4)")

---
## 📈 Similarity Distribution Analysis

In [None]:
# Compare the distribution of similarities across models
fig, axes = plt.subplots(1, len(models_to_compare), figsize=(5 * len(models_to_compare), 4))

if len(models_to_compare) == 1:
    axes = [axes]

for ax, (model_name, sim_matrix) in zip(axes, similarity_matrices.items()):
    # Get upper triangle values (excluding diagonal)
    upper_tri = sim_matrix[np.triu_indices(len(words), k=1)]
    
    ax.hist(upper_tri, bins=30, edgecolor='white', alpha=0.7, color='steelblue')
    ax.axvline(np.mean(upper_tri), color='red', linestyle='--', label=f'Mean: {np.mean(upper_tri):.3f}')
    ax.axvline(np.median(upper_tri), color='orange', linestyle='--', label=f'Median: {np.median(upper_tri):.3f}')
    
    ax.set_xlabel('Cosine Similarity')
    ax.set_ylabel('Frequency')
    ax.set_title(f'{model_name}\n(Std: {np.std(upper_tri):.3f})', fontsize=10)
    ax.legend(fontsize=8)
    ax.set_xlim(0, 1)

plt.suptitle('Distribution of Pairwise Similarities', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
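
Because each model has its own mean and spread of similarities, a raw score such as 0.6 may be high for one model and unremarkable for another. One possible way to compare pairs across models is to standardize each model's scores first. The sketch below is an assumption-laden illustration: `standardized_pair_scores` is a hypothetical helper, and it requires that the scores are not all identical.

```python
import numpy as np

def standardized_pair_scores(sim_matrix):
    """Z-score the upper-triangle (off-diagonal) similarities of one model."""
    sims = np.asarray(sim_matrix, dtype=float)
    upper = sims[np.triu_indices(len(sims), k=1)]
    return (upper - upper.mean()) / upper.std()  # assumes non-zero spread

# Toy 3x3 similarity matrix (made-up numbers).
toy_sim_small = [[1.0, 0.2, 0.5],
                 [0.2, 1.0, 0.8],
                 [0.5, 0.8, 1.0]]
z_scores = standardized_pair_scores(toy_sim_small)
print(np.round(z_scores, 3))
```

After standardizing, a value of +1 means "one standard deviation above this model's typical pair", which is easier to compare across models than fixed thresholds on raw scores.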

---
## 🏆 Model Ranking: Which Clusters Best?

In [None]:
# Calculate intra-category vs inter-category similarity
print("📊 CLUSTERING QUALITY ANALYSIS")
print("=" * 80)
print("Measuring how well each model separates categories...")
print()

model_scores = {}

for model_name, sim_matrix in similarity_matrices.items():
    intra_similarities = []  # Same category
    inter_similarities = []  # Different category
    
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if categories[i] == categories[j]:
                intra_similarities.append(sim_matrix[i, j])
            else:
                inter_similarities.append(sim_matrix[i, j])
    
    avg_intra = np.mean(intra_similarities)
    avg_inter = np.mean(inter_similarities)
    separation = avg_intra - avg_inter  # Higher = better separation
    
    model_scores[model_name] = {
        'intra': avg_intra,
        'inter': avg_inter,
        'separation': separation
    }
    
    print(f"🤖 {model_name}")
    print(f"   Avg similarity (same category):      {avg_intra:.4f}")
    print(f"   Avg similarity (different category): {avg_inter:.4f}")
    print(f"   Category separation score:           {separation:.4f}")
    print()

# Determine winner
best_model = max(model_scores.keys(), key=lambda x: model_scores[x]['separation'])
print("=" * 80)
print(f"🏆 Best clustering model: {best_model}")
print(f"   (Highest separation between same-category and different-category pairs)")
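
The nested loops above can also be written with boolean masks. This is a sketch of the same separation score (mean same-category similarity minus mean different-category similarity) on toy data. `separation_score` is a hypothetical helper, and it assumes at least one same-category pair and one different-category pair exist.

```python
import numpy as np

def separation_score(sim_matrix, labels):
    """Mean intra-category similarity minus mean inter-category similarity."""
    sims = np.asarray(sim_matrix, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]               # True where categories match
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)    # each pair once, no diagonal
    intra = sims[same & upper].mean()
    inter = sims[~same & upper].mean()
    return float(intra - inter)

# Toy data: two categories with two items each (made-up numbers).
toy_labels = ["a", "a", "b", "b"]
toy_sim = [[1.0, 0.9, 0.2, 0.1],
           [0.9, 1.0, 0.3, 0.2],
           [0.2, 0.3, 1.0, 0.8],
           [0.1, 0.2, 0.8, 1.0]]
toy_score = separation_score(toy_sim, toy_labels)
print(f"{toy_score:.2f}")
```

A difference of means ignores how spread out the scores are, and with only 30 words the ranking can flip when the word list changes, so the "winner" is best read as a hint rather than a verdict.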

---
## 🎓 Key Takeaways

### Model Differences
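- **Embedding dimension** (384 vs. 768 here) sets how much information a vector can carry; more dimensions also mean more storage and more work per similarity comparison
- **Training objective** (e.g., paraphrase detection vs. general semantic similarity) shapes which word pairs a model scores as close
- **Model size** trades speed for quality: larger models are usually more accurate but slower to run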

### Choosing a Model for RAG
1. **For speed**: Use `all-MiniLM-L6-v2` or similar lightweight models
2. **For quality**: Use `all-mpnet-base-v2` or larger models
3. **For multilingual**: Use `paraphrase-multilingual-*` models
4. **For domain-specific**: Consider fine-tuning or specialized models

### What We Learned
- Different models produce different similarity scores for the same words
- Some models cluster categories more clearly than others
- The "best" model depends on your specific use case

---
## 🧪 Try It Yourself!

Experiment with:
1. Adding more models to `models_to_compare`
2. Changing the word categories
3. Testing with longer phrases instead of single words
4. Comparing multilingual models with different languages

In [None]:
# 🔧 YOUR EXPERIMENTATION SPACE
# Add your own tests below!

# Example: Test a custom phrase
custom_phrases = [
    "I love eating apples",
    "Apples are my favorite fruit",
    "The dog is playing in the park"
]

print("Comparing custom phrases across models:")
print("=" * 60)

for model_name, model in loaded_models.items():
    embeddings = model.encode(custom_phrases)
    sim_matrix = cosine_similarity(embeddings)
    
    print(f"\n🤖 {model_name}:")
    print(f"   Phrase 1 ↔ Phrase 2: {sim_matrix[0,1]:.4f}")
    print(f"   Phrase 1 ↔ Phrase 3: {sim_matrix[0,2]:.4f}")
    print(f"   Phrase 2 ↔ Phrase 3: {sim_matrix[1,2]:.4f}")
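
In a RAG setting, the same similarity scores are used to rank documents against a query. The sketch below shows that ranking step on toy 2-D vectors. `top_k` is a hypothetical helper, and the vectors are made up rather than produced by any of the models above.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, cosine similarity) for the k documents closest to the query."""
    q = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_vecs, dtype=float)
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]  # highest similarity first
    return [(int(i), float(sims[i])) for i in order]

# Toy "document" vectors and a query that points mostly along the first axis.
toy_docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
toy_query = [1.0, 0.1]
hits = top_k(toy_query, toy_docs, k=2)
for idx, score in hits:
    print(f"doc {idx}: {score:.3f}")
```

To try this with real embeddings, you could encode a query and the custom phrases with one of the loaded models and pass the resulting arrays in. Running it per model shows whether the models agree on which phrase ranks first.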