# Module 4: Understanding Embedding Models

## 🎯 Learning Objectives
By the end of this module, you will:
- Connect traditional ML embeddings to modern text embeddings
- Understand different embedding model architectures and their trade-offs
- Navigate the 2025 MTEB leaderboard to choose appropriate models
- Compare embedding models hands-on using real text
- Make informed decisions about model selection for production
- Understand cost vs performance implications

## 📚 Key Concepts

### From Traditional to Modern Embeddings 🔄

**If you've worked with ML, you likely know embeddings from:**
- **Categorical encoding**: Converting categories to dense vectors
- **Word2Vec**: Learning word relationships from context
- **Matrix factorization**: Collaborative filtering embeddings

**Modern text embeddings are similar but much more powerful:**
- **Contextual**: The same word gets different embeddings in different contexts
- **Semantic**: Capture meaning, not just word co-occurrence
- **Task-specific**: Optimized for similarity search, not just word prediction
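
To see why "contextual" matters, it helps to look at what a static table cannot do. The sketch below is a toy, not Word2Vec itself: `static_table` and its 3-dimensional values are made up for illustration. The point is only that a lookup table returns the same vector no matter which sentence the word appears in.

```python
# Toy static embedding table: one fixed vector per word (values are made up).
static_table = {
    "python": [0.9, 0.1, 0.3],
    "language": [0.8, 0.2, 0.1],
    "snake": [0.1, 0.9, 0.4],
}

def static_embed(word, sentence):
    """Look up a word's vector; the sentence is ignored, as in any static table."""
    return static_table[word.lower()]

v_code = static_embed("python", "Python is a programming language")
v_animal = static_embed("python", "The python slithered through the grass")

# A static table cannot tell the two senses apart.
print(v_code == v_animal)  # True
```

A contextual model would instead compute the vector from the whole sentence, so the two calls could return different results.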

### Embedding Evolution Timeline 📈
| Era | Approach | Example | Context |
|-----|----------|---------|----------|
| 2013 | Static Word Vectors | Word2Vec | One vector per word |
| 2018 | Contextualized | BERT | Different vectors per context |
| 2019 | Sentence-Level | Sentence-BERT | Optimized for sentences |
| 2024 | Large-Scale | OpenAI text-embedding-3 | API-hosted, adjustable output dimensions |
| 2025 | Specialized | NV-Embed-v2, Stella | Domain & task optimized |

### 2025 MTEB Leaderboard Leaders 🏆
- **NV-Embed-v2**: 72.31 score (NVIDIA, Mistral-7B based)
- **Stella-1.5B**: Best open-source with commercial license
- **text-embedding-3-large**: OpenAI's flagship (64.6% MTEB)
- **EmbeddingGemma**: Google's best under 500M parameters
- **Voyage-3**: Strong commercial performance

### Key Selection Criteria 🎯
1. **Performance**: MTEB benchmark scores
2. **Cost**: API pricing vs local hosting
3. **Speed**: Inference latency and throughput
4. **Domain fit**: General vs specialized models
5. **Licensing**: Commercial use restrictions
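
One simple way to combine these criteria is a weighted score. The sketch below is only an illustration: the weights, the 0-1 ratings, and the names `model_a` and `model_b` are all hypothetical, and every criterion is rated so that higher is better (a cheap model gets a high `cost` rating).

```python
# Hypothetical weights and 0-1 ratings; replace both with your own judgments.
weights = {"performance": 0.35, "cost": 0.25, "speed": 0.20,
           "domain_fit": 0.10, "licensing": 0.10}

candidates = {
    "model_a": {"performance": 0.9, "cost": 0.3, "speed": 0.6,
                "domain_fit": 0.7, "licensing": 0.5},
    "model_b": {"performance": 0.6, "cost": 1.0, "speed": 0.9,
                "domain_fit": 0.6, "licensing": 1.0},
}

def weighted_score(ratings, weights):
    """Weighted sum of per-criterion ratings (higher is better for every criterion)."""
    return sum(weights[c] * ratings[c] for c in weights)

ranked = sorted(candidates, key=lambda m: weighted_score(candidates[m], weights),
                reverse=True)
for name in ranked:
    print(f"{name}: {weighted_score(candidates[name], weights):.3f}")
```

With these made-up numbers the cheaper, faster `model_b` ranks first even though `model_a` has the higher performance rating, which is the kind of trade-off the rest of this module explores.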


## 🛠️ Setup
Let's install the required packages for embedding experiments.

In [None]:
# Install required packages
!pip install -q sentence-transformers openai python-dotenv
!pip install -q numpy matplotlib seaborn plotly
!pip install -q scikit-learn umap-learn  # For visualization
!pip install -q requests  # For API calls

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import PCA
import umap
import time
import requests
from dotenv import load_dotenv

# Embedding libraries
from sentence_transformers import SentenceTransformer
import openai

# Load environment variables
load_dotenv()

# Set up API keys (optional: the local models used in this notebook run without one)
openai.api_key = os.getenv("OPENAI_API_KEY")

print("✅ Setup complete!")
print("🔢 Ready to explore embedding models!")

## 📖 Exercise 1: Traditional vs Modern Embeddings Comparison

Let's start by comparing traditional approaches with modern embedding models.

In [None]:
# Sample sentences to demonstrate embedding concepts
sample_sentences = [
    "The cat sat on the mat",
    "A feline rested on the carpet",  # Similar meaning, different words
    "The dog ran in the park",
    "Machine learning is a subset of artificial intelligence",
    "AI encompasses machine learning as one of its components",  # Similar meaning
    "I love pizza with extra cheese",
    "The stock market crashed yesterday",
    "Python is a programming language",
    "The snake slithered through the grass",  # A literal snake: unrelated to "Python" the language
    "Financial markets experienced significant volatility"  # Similar to stock market
]

print("📝 Sample sentences for embedding comparison:")
for i, sentence in enumerate(sample_sentences, 1):
    print(f"   {i:2d}. {sentence}")

print(f"\n📊 We'll compare how different embedding approaches handle these {len(sample_sentences)} sentences")

In [None]:
# Simple bag-of-words approach (traditional)
from sklearn.feature_extraction.text import TfidfVectorizer

def traditional_tfidf_embeddings(sentences):
    """
    Create traditional TF-IDF embeddings for comparison
    """
    vectorizer = TfidfVectorizer(stop_words='english', lowercase=True)
    tfidf_matrix = vectorizer.fit_transform(sentences)
    
    return tfidf_matrix.toarray(), vectorizer.get_feature_names_out()

# Get traditional embeddings
print("🔄 Computing traditional TF-IDF embeddings...")
tfidf_embeddings, feature_names = traditional_tfidf_embeddings(sample_sentences)

print(f"   Embedding dimensions: {tfidf_embeddings.shape[1]}")
print(f"   Vocabulary size: {len(feature_names)}")
print(f"   Sample features: {list(feature_names[:10])}")

# Show sparsity
sparsity = (tfidf_embeddings == 0).mean()
print(f"   Sparsity: {sparsity:.1%} of values are zero")
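
To see why word-overlap methods struggle with paraphrases, here is a minimal sketch that uses raw word counts instead of full TF-IDF weighting (a simplification, with a tiny hand-picked stop-word list). Once stop words are removed, the two "cat on the mat" paraphrases share no tokens at all, so their cosine similarity is exactly zero.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "on", "in", "is", "of"}  # tiny illustrative list

def bow_vector(sentence):
    """Count non-stop-word tokens (a bag-of-words vector stored as a Counter)."""
    tokens = [t for t in sentence.lower().split() if t not in STOP_WORDS]
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Same meaning, no shared words: similarity is 0.0
paraphrase = cosine(bow_vector("The cat sat on the mat"),
                    bow_vector("A feline rested on the carpet"))
# Different ending, two shared words: similarity is 2/3
overlap = cosine(bow_vector("The cat sat on the mat"),
                 bow_vector("The cat sat on the sofa"))
print(paraphrase, overlap)
```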

In [None]:
# Modern sentence embedding model
print("🤖 Loading modern embedding model...")
# Using a lightweight but effective model
model_name = 'all-MiniLM-L6-v2'  # Fast, good performance
modern_model = SentenceTransformer(model_name)

print(f"   Model: {model_name}")
print(f"   Embedding dimensions: {modern_model.get_sentence_embedding_dimension()}")

# Get modern embeddings
print("\n🔄 Computing modern sentence embeddings...")
modern_embeddings = modern_model.encode(sample_sentences)

print(f"   Shape: {modern_embeddings.shape}")
modern_sparsity = (modern_embeddings == 0).mean()
print(f"   Dense: {1 - modern_sparsity:.1%} of values are non-zero")

# Compare density
print(f"\n📊 Embedding comparison:")
print(f"   TF-IDF: {tfidf_embeddings.shape[1]} dims, {sparsity:.1%} sparse")
print(f"   Modern: {modern_embeddings.shape[1]} dims, {modern_sparsity:.1%} sparse")

In [None]:
# Compare similarity between semantically similar sentences
def compare_similarities(embeddings, method_name, sentences):
    """
    Compare similarities between sentences using different embedding methods
    """
    # Calculate similarity matrix
    similarities = cosine_similarity(embeddings)
    
    print(f"\n🔍 {method_name} Similarity Analysis:")
    
    # Look at pairs that should be similar, plus one pair that should not be
    interesting_pairs = [
        (0, 1),  # "cat sat" vs "feline rested"
        (3, 4),  # "ML is subset of AI" vs "AI encompasses ML"
        (6, 9),  # "stock market crashed" vs "financial markets volatility"
        (7, 8),  # "Python programming" vs "snake slithered" (unrelated, should score low)
    ]
    
    for i, j in interesting_pairs:
        similarity = similarities[i][j]
        print(f"   '{sentences[i][:30]}...' vs '{sentences[j][:30]}...'")
        print(f"   Similarity: {similarity:.3f}")
        print()
    
    return similarities

# Compare both methods
tfidf_sim = compare_similarities(tfidf_embeddings, "TF-IDF", sample_sentences)
modern_sim = compare_similarities(modern_embeddings, "Modern Sentence Transformer", sample_sentences)

In [None]:
# Visualize similarity matrices
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# TF-IDF similarity heatmap
sns.heatmap(tfidf_sim, annot=True, fmt='.2f', cmap='Blues', 
            ax=ax1, cbar_kws={'label': 'Cosine Similarity'})
ax1.set_title('TF-IDF Similarity Matrix')
ax1.set_xlabel('Sentence Index')
ax1.set_ylabel('Sentence Index')

# Modern embedding similarity heatmap
sns.heatmap(modern_sim, annot=True, fmt='.2f', cmap='Reds',
            ax=ax2, cbar_kws={'label': 'Cosine Similarity'})
ax2.set_title('Modern Embedding Similarity Matrix')
ax2.set_xlabel('Sentence Index')
ax2.set_ylabel('Sentence Index')

plt.tight_layout()
plt.show()

print("\n📈 What to look for:")
print("   • Modern embeddings should score the paraphrase pairs (0-1, 3-4, 6-9) higher than TF-IDF does")
print("   • TF-IDF relies on word overlap, so paraphrases with no shared words score near zero")
print("   • Modern models handle synonyms and paraphrasing better")
print("   • Unrelated pairs should stay low: 'Python' programming (7) vs the snake (8)")

## 🏆 Exercise 2: 2025 Model Comparison

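Let's compare several open-source embedding models that run locally. These are lightweight baselines rather than current leaderboard leaders, which makes them quick to download and test.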

In [None]:
# Define models to compare (using free/open-source models)
models_to_test = {
    'all-MiniLM-L6-v2': {
        'description': 'Lightweight, fast, good general performance',
        'dimensions': 384,
        'mteb_score': 'Mid-tier',
        'speed': 'Very Fast'
    },
    'all-mpnet-base-v2': {
        'description': 'Higher quality, slightly slower',
        'dimensions': 768, 
        'mteb_score': 'Good',
        'speed': 'Fast'
    },
    'paraphrase-multilingual-MiniLM-L12-v2': {
        'description': 'Multilingual support, good for diverse text',
        'dimensions': 384,
        'mteb_score': 'Good (multilingual)',
        'speed': 'Fast'
    }
}

print("🏆 2025 Embedding Models Comparison")
print("=" * 50)

# Load models and compare performance
loaded_models = {}
model_embeddings = {}
model_times = {}

# Test sentences for comparison
test_sentences = [
    "Artificial intelligence is transforming healthcare",
    "AI technology revolutionizes medical treatment",  # Similar meaning
    "The recipe calls for fresh basil and tomatoes",    # Different domain
    "Machine learning algorithms require large datasets",
    "Deep learning models need substantial training data", # Similar meaning
]

for model_name, model_info in models_to_test.items():
    print(f"\n🤖 Loading {model_name}...")
    try:
        # Load model
        model = SentenceTransformer(model_name)
        loaded_models[model_name] = model
        
        # Measure encoding time (single run, so it includes any first-call warm-up)
        start_time = time.perf_counter()
        embeddings = model.encode(test_sentences)
        end_time = time.perf_counter()
        
        model_embeddings[model_name] = embeddings
        model_times[model_name] = end_time - start_time
        
        print(f"   ✅ Loaded successfully")
        print(f"   📐 Dimensions: {embeddings.shape[1]} (expected {model_info['dimensions']})")
        print(f"   ⚡ Encoding time: {model_times[model_name]:.3f}s for {len(test_sentences)} sentences")
        print(f"   📊 MTEB Score: {model_info['mteb_score']}")
        
    except Exception as e:
        print(f"   ❌ Failed to load: {e}")

print(f"\n✅ Successfully loaded {len(loaded_models)} models")
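
A single timed call on five sentences is noisy, and the first call often includes one-time setup. A slightly more careful approach, sketched below, is to run a warm-up call and report the median of several timed runs. The `benchmark` helper and the `fake_encode` stand-in are hypothetical names introduced here so the example runs without downloading a model.

```python
import statistics
import time

def benchmark(encode_fn, sentences, warmup=1, repeats=5):
    """Return the median wall-clock seconds of encode_fn(sentences) over several runs."""
    for _ in range(warmup):
        encode_fn(sentences)  # warm-up runs are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        encode_fn(sentences)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Stand-in encoder so the sketch runs without downloading a model.
def fake_encode(sentences):
    return [[float(len(s))] for s in sentences]

median_s = benchmark(fake_encode, ["short text", "a slightly longer text"])
print(f"median: {median_s:.6f}s")
```

With the models loaded above, the call would look like `benchmark(model.encode, test_sentences)`.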

In [None]:
# Compare model performance on semantic similarity
def evaluate_semantic_understanding(model_embeddings, model_name):
    """
    Evaluate how well a model captures semantic similarity
    """
    if model_name not in model_embeddings:
        return None
        
    embeddings = model_embeddings[model_name]
    similarities = cosine_similarity(embeddings)
    
    # Check specific semantic pairs
    ai_healthcare_similarity = similarities[0][1]  # AI healthcare vs AI medical
    ml_data_similarity = similarities[3][4]        # ML datasets vs DL training data
    unrelated_similarity = similarities[0][2]      # AI vs recipe (should be low)
    
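    # Ad-hoc semantic score: rewards high similarity for the two related pairs
    # and penalizes similarity for the unrelated pair (not a standard benchmark)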
    semantic_score = (ai_healthcare_similarity + ml_data_similarity - unrelated_similarity) / 2
    
    return {
        'semantic_score': semantic_score,
        'ai_similarity': ai_healthcare_similarity,
        'ml_similarity': ml_data_similarity,
        'unrelated_similarity': unrelated_similarity
    }

print("📊 SEMANTIC UNDERSTANDING EVALUATION")
print("=" * 45)

model_performance = {}

for model_name in loaded_models.keys():
    results = evaluate_semantic_understanding(model_embeddings, model_name)
    if results:
        model_performance[model_name] = results
        
        print(f"\n🤖 {model_name}:")
        print(f"   Semantic Score: {results['semantic_score']:.3f}")
        print(f"   AI-Healthcare similarity: {results['ai_similarity']:.3f}")
        print(f"   ML-Data similarity: {results['ml_similarity']:.3f}")
        print(f"   Unrelated similarity: {results['unrelated_similarity']:.3f}")
        print(f"   Encoding time: {model_times[model_name]:.3f}s")

In [None]:
# Visualize model comparison
if model_performance:
    # Prepare data for visualization
    model_names = list(model_performance.keys())
    semantic_scores = [model_performance[m]['semantic_score'] for m in model_names]
    encoding_times = [model_times[m] for m in model_names]
    dimensions = [models_to_test[m]['dimensions'] for m in model_names]
    
    # Create performance vs speed plot
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # Semantic performance comparison
    bars1 = ax1.bar(range(len(model_names)), semantic_scores, 
                    color=['lightblue', 'lightgreen', 'lightcoral'])
    ax1.set_xlabel('Models')
    ax1.set_ylabel('Semantic Understanding Score')
    ax1.set_title('Model Semantic Performance')
    ax1.set_xticks(range(len(model_names)))
    ax1.set_xticklabels([name.split('-')[1] for name in model_names], rotation=45)
    
    # Add value labels on bars
    for bar, score in zip(bars1, semantic_scores):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height + 0.01,
                f'{score:.3f}', ha='center', va='bottom')
    
    # Performance vs Speed scatter plot
    scatter = ax2.scatter(encoding_times, semantic_scores, 
                         s=[d/2 for d in dimensions],  # Size by dimensions
                         c=range(len(model_names)), cmap='viridis', alpha=0.7)
    
    # Add model labels
    for i, name in enumerate(model_names):
        ax2.annotate(name.split('-')[1], 
                    (encoding_times[i], semantic_scores[i]),
                    xytext=(5, 5), textcoords='offset points', fontsize=8)
    
    ax2.set_xlabel('Encoding Time (seconds)')
    ax2.set_ylabel('Semantic Understanding Score')
    ax2.set_title('Performance vs Speed\n(Bubble size = dimensions)')
    
    plt.tight_layout()
    plt.show()
    
    # Find best trade-off
    # Calculate efficiency score (performance per unit time)
    efficiency_scores = [(semantic_scores[i] / encoding_times[i], model_names[i]) 
                        for i in range(len(model_names))]
    best_efficiency = max(efficiency_scores)
    
    print(f"\n🏆 Best performance/speed trade-off: {best_efficiency[1]}")
    print(f"   Efficiency score: {best_efficiency[0]:.2f}")

## 💰 Exercise 3: Cost vs Performance Analysis

Let's analyze the cost implications of different embedding approaches.

In [None]:
# Embedding model cost analysis (2025 pricing)
embedding_costs = {
    'OpenAI text-embedding-3-large': {
        'cost_per_1k_tokens': 0.00013,  # $0.13 per 1M tokens
        'dimensions': 3072,
        'mteb_score': 64.6,
        'context_length': 8192,
        'speed': 'Fast (API)',
        'type': 'API'
    },
    'OpenAI text-embedding-3-small': {
        'cost_per_1k_tokens': 0.00002,  # $0.02 per 1M tokens
        'dimensions': 1536,
        'mteb_score': 62.3,
        'context_length': 8192,
        'speed': 'Very Fast (API)',
        'type': 'API'
    },
    'all-mpnet-base-v2': {
        'cost_per_1k_tokens': 0.0,  # No per-token fee (local compute cost is estimated separately)
        'dimensions': 768,
        'mteb_score': 57.8,
        'context_length': 512,
        'speed': 'Medium (local)',
        'type': 'Local'
    },
    'all-MiniLM-L6-v2': {
        'cost_per_1k_tokens': 0.0,  # No per-token fee (local compute cost is estimated separately)
        'dimensions': 384,
        'mteb_score': 56.3,
        'context_length': 512,
        'speed': 'Fast (local)',
        'type': 'Local'
    },
    'Voyage-3-large': {
        'cost_per_1k_tokens': 0.00013,  # Similar to OpenAI large
        'dimensions': 1024,
        'mteb_score': 68.0,  # Estimated
        'context_length': 32000,
        'speed': 'Fast (API)',
        'type': 'API'
    },
    'Voyage-3-lite': {
        'cost_per_1k_tokens': 0.000025,  # 1/5 of OpenAI large
        'dimensions': 512,
        'mteb_score': 64.0,  # Close to OpenAI large
        'context_length': 32000,
        'speed': 'Very Fast (API)',
        'type': 'API'
    }
}

def calculate_embedding_costs(models, document_count, avg_tokens_per_doc):
    """
    Calculate costs for different embedding models
    """
    total_tokens = document_count * avg_tokens_per_doc
    
    print(f"💰 COST ANALYSIS FOR {document_count:,} DOCUMENTS")
    print(f"📊 Average tokens per document: {avg_tokens_per_doc}")
    print(f"🔢 Total tokens to embed: {total_tokens:,}")
    print("=" * 60)
    
    cost_analysis = {}
    
    for model_name, info in models.items():
        # Calculate API costs
        api_cost = (total_tokens / 1000) * info['cost_per_1k_tokens']
        
        # Estimate local costs (electricity, hardware amortization)
        if info['type'] == 'Local':
            # Placeholder assumptions: $0.10 per GPU-hour and 50,000 tokens per hour.
            # Real throughput is usually far higher; measure it on your own hardware.
            estimated_hours = total_tokens / 50000
            local_cost = estimated_hours * 0.10
            total_cost = local_cost
            cost_note = f"(Local: ~${local_cost:.4f})"
        else:
            total_cost = api_cost
            cost_note = "(API)"
        
        cost_analysis[model_name] = {
            'total_cost': total_cost,
            'cost_per_1k_docs': (total_cost / document_count) * 1000,
            'mteb_score': info['mteb_score'],
            'dimensions': info['dimensions'],
            'type': info['type']
        }
        
        print(f"\n🤖 {model_name}:")
        print(f"   Total cost: ${total_cost:.4f} {cost_note}")
        print(f"   Cost per 1K docs: ${cost_analysis[model_name]['cost_per_1k_docs']:.4f}")
        print(f"   MTEB score: {info['mteb_score']}")
        print(f"   Quality/$ ratio: {info['mteb_score']/max(total_cost, 0.0001):.0f}")
    
    return cost_analysis

# Example scenarios
scenarios = [
    (1000, 200),      # Small: 1K docs, 200 tokens each
    (50000, 300),     # Medium: 50K docs, 300 tokens each  
    (1000000, 400),   # Large: 1M docs, 400 tokens each
]

for doc_count, avg_tokens in scenarios:
    cost_analysis = calculate_embedding_costs(embedding_costs, doc_count, avg_tokens)
    print("\n" + "="*80 + "\n")
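
It is worth checking the arithmetic behind these numbers by hand. The sketch below recomputes the API cost for the large scenario and then asks a follow-up question: at what local throughput does a $0.10/hour machine cost the same as the API? The helper names are introduced here for illustration, and the $0.10/hour rate is the same rough placeholder used in the cell above.

```python
def api_cost_usd(total_tokens, price_per_1k_tokens):
    """API cost in dollars for a given token count."""
    return total_tokens / 1000 * price_per_1k_tokens

def break_even_tokens_per_hour(hourly_rate_usd, price_per_1k_tokens):
    """Local throughput at which local compute costs the same as the API."""
    # local cost = tokens / throughput * hourly_rate
    # API cost   = tokens / 1000 * price_per_1k
    # Setting them equal and solving for throughput gives:
    return hourly_rate_usd * 1000 / price_per_1k_tokens

total_tokens = 1_000_000 * 400                     # large scenario: 400M tokens
large_cost = api_cost_usd(total_tokens, 0.00013)   # text-embedding-3-large price
break_even = break_even_tokens_per_hour(0.10, 0.00013)

print(f"API cost: ${large_cost:.2f}")
print(f"Break-even throughput: {break_even:,.0f} tokens/hour")
```

Under these assumptions the API bill is about $52, and local hosting becomes cheaper once the machine sustains roughly 769,000 tokens per hour. The conclusion is very sensitive to the throughput estimate, so measure it before deciding.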

In [None]:
# Visualize cost vs performance trade-offs
def create_cost_performance_chart(embedding_costs, scenario_docs=50000, scenario_tokens=300):
    """
    Create interactive chart showing cost vs performance
    """
    total_tokens = scenario_docs * scenario_tokens
    
    model_names = []
    costs = []
    performances = []
    dimensions = []
    types = []
    
    for model_name, info in embedding_costs.items():
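        # Calculate cost (the local estimate reuses the placeholder assumptions from
        # calculate_embedding_costs: 50,000 tokens per hour at $0.10 per hour)
        if info['type'] == 'Local':
            estimated_hours = total_tokens / 50000
            cost = estimated_hours * 0.10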
        else:
            cost = (total_tokens / 1000) * info['cost_per_1k_tokens']
        
        model_names.append(model_name)
        costs.append(max(cost, 0.001))  # Minimum for log scale
        performances.append(info['mteb_score'])
        dimensions.append(info['dimensions'])
        types.append(info['type'])
    
    # Create interactive plotly chart
    fig = px.scatter(
        x=costs,
        y=performances,
        size=dimensions,
        color=types,
        hover_name=model_names,
        hover_data={
            'Dimensions': dimensions,
            'Cost ($)': ['${:.4f}'.format(c) for c in costs],
            'MTEB Score': performances
        },
        title=f'Embedding Models: Cost vs Performance<br>Scenario: {scenario_docs:,} docs × {scenario_tokens} tokens',
        labels={'x': 'Cost ($)', 'y': 'MTEB Performance Score'},
        size_max=50
    )
    
    fig.update_xaxes(type='log', title='Cost ($, log scale)')
    fig.update_layout(
        width=800,
        height=600,
        showlegend=True
    )
    
    # Add annotations for key insights
    fig.add_annotation(
        x=0.02, y=0.98,
        xref="paper", yref="paper",
        text="🎯 Top-right: High performance, high cost<br>🔥 Bottom-left: Low performance, low cost<br>⭐ Top-left: Best value!",
        showarrow=False,
        font=dict(size=10),
        bgcolor="rgba(255,255,255,0.8)",
        bordercolor="black",
        borderwidth=1
    )
    
    fig.show()
    
    # Calculate value score (performance per dollar)
    value_scores = [(performances[i] / costs[i], model_names[i]) for i in range(len(model_names))]
    best_value = max(value_scores)
    
    print(f"\n💎 Best value model: {best_value[1]}")
    print(f"   Value score: {best_value[0]:.0f} MTEB points per dollar")
    
    return fig

# Create the visualization
cost_perf_chart = create_cost_performance_chart(embedding_costs)
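
"Performance per dollar" collapses the chart into one number, which can hide good options. Another way to read the same data is to list the models that no other model beats on both cost and score (a Pareto frontier). This is an alternative view, not something the chart function computes. The costs below are the 50,000 docs × 300 tokens scenario worked out from the `embedding_costs` values, including the rough local-throughput placeholder.

```python
def pareto_frontier(models):
    """Return names of models that no other model beats on both cost and score.

    `models` maps name -> (cost, score). A model is dominated when another is
    at least as cheap and at least as accurate, and strictly better on one.
    """
    frontier = []
    for name, (cost, score) in models.items():
        dominated = any(
            other_cost <= cost and other_score >= score
            and (other_cost < cost or other_score > score)
            for other, (other_cost, other_score) in models.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# (cost in $, MTEB score) for the 50,000 docs x 300 tokens scenario
scenario = {
    "OpenAI text-embedding-3-large": (1.95, 64.6),
    "OpenAI text-embedding-3-small": (0.30, 62.3),
    "all-mpnet-base-v2": (30.0, 57.8),   # local cost from the placeholder estimate
    "all-MiniLM-L6-v2": (30.0, 56.3),    # local cost from the placeholder estimate
    "Voyage-3-large": (1.95, 68.0),
    "Voyage-3-lite": (0.375, 64.0),
}
print(pareto_frontier(scenario))
```

With these inputs the frontier is text-embedding-3-small, Voyage-3-large, and Voyage-3-lite. The local models drop out only because of the pessimistic throughput placeholder, so rerun this with measured costs.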

## 🔍 Exercise 4: Domain-Specific Model Selection

Let's explore how to choose models for different domains and use cases.

In [None]:
# Domain-specific text samples
domain_samples = {
    'Technical Documentation': [
        "Configure the API endpoint with proper authentication headers",
        "Initialize the database connection pool before serving requests",
        "Implement error handling for network timeouts and retries"
    ],
    'Medical/Scientific': [
        "The patient presents with acute myocardial infarction symptoms",
        "Administer 300mg aspirin followed by anticoagulation therapy", 
        "Monitor cardiac enzymes and troponin levels every 6 hours"
    ],
    'Legal Documents': [
        "The defendant hereby waives all rights to appeal this decision",
        "In accordance with section 1542 of the civil code provisions",
        "This contract shall be governed by the laws of California"
    ],
    'Business/Financial': [
        "Quarterly revenue increased by 15% year-over-year",
        "EBITDA margins improved due to operational efficiency gains",
        "Working capital requirements decreased through inventory optimization"
    ],
    'General Knowledge': [
        "The weather forecast predicts rain this weekend",
        "My favorite restaurant serves excellent Italian cuisine",
        "The movie received positive reviews from critics"
    ]
}

def test_domain_performance(model, domain_samples):
    """
    Test how well a model handles different domains
    """
    domain_results = {}
    
    for domain, samples in domain_samples.items():
        # Encode samples
        embeddings = model.encode(samples)
        
        # Calculate intra-domain similarity (how similar are samples within domain)
        similarities = cosine_similarity(embeddings)
        
        # Get average similarity (excluding diagonal)
        mask = np.ones_like(similarities, dtype=bool)
        np.fill_diagonal(mask, False)
        avg_similarity = similarities[mask].mean()
        
        # Calculate embedding variance (per-dimension variance across samples, averaged)
        embedding_variance = np.var(embeddings, axis=0).mean()
        
        domain_results[domain] = {
            'avg_intra_similarity': avg_similarity,
            'embedding_variance': embedding_variance,
            'domain_coherence': avg_similarity  # Simple metric
        }
    
    return domain_results

print("🔍 DOMAIN-SPECIFIC PERFORMANCE ANALYSIS")
print("=" * 50)

# Test available models on different domains
for model_name, model in loaded_models.items():
    print(f"\n🤖 Testing {model_name}:")
    
    domain_results = test_domain_performance(model, domain_samples)
    
    # Sort domains by coherence score
    sorted_domains = sorted(domain_results.items(), 
                           key=lambda x: x[1]['domain_coherence'], reverse=True)
    
    print(f"   Domain coherence ranking (intra-domain similarity, not retrieval quality):")
    for i, (domain, results) in enumerate(sorted_domains, 1):
        coherence = results['domain_coherence']
        print(f"   {i}. {domain}: {coherence:.3f} coherence")
    
    # Best and worst domains
    best_domain = sorted_domains[0][0]
    worst_domain = sorted_domains[-1][0]
    
    print(f"   ✅ Most coherent: {best_domain}")
    print(f"   ⚠️  Least coherent: {worst_domain}")
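
Intra-domain similarity on its own can mislead: a model that maps everything close together would score well. A common refinement, sketched below as one possible approach, is to subtract the average similarity between domains, so a high value means the domains are both tight and well separated. The 2-dimensional vectors are made up and stand in for real sentence embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def domain_separation(domains):
    """Mean intra-domain similarity minus mean inter-domain similarity."""
    items = [(name, vec) for name, vecs in domains.items() for vec in vecs]
    intra, inter = [], []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            sim = cosine(items[i][1], items[j][1])
            (intra if items[i][0] == items[j][0] else inter).append(sim)
    return sum(intra) / len(intra) - sum(inter) / len(inter)

# Made-up 2-d vectors standing in for real sentence embeddings.
toy = {
    "medical": [[1.0, 0.1], [0.9, 0.2]],
    "legal": [[0.1, 1.0], [0.2, 0.9]],
}
separation = domain_separation(toy)
print(f"separation: {separation:.3f}")
```

To apply this to the notebook's data, you could pass a dict that maps each domain name to `model.encode(samples).tolist()`.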

In [None]:
# Model selection decision framework
def recommend_embedding_model(use_case_params):
    """
    Recommend embedding model based on use case parameters
    """
    budget = use_case_params.get('budget', 'medium')  # low, medium, high
    performance_req = use_case_params.get('performance', 'medium')  # low, medium, high
    domain = use_case_params.get('domain', 'general')
    volume = use_case_params.get('volume', 'medium')  # low, medium, high
    latency_req = use_case_params.get('latency', 'medium')  # low, medium, high
    
    recommendations = []
    
    # High performance requirements
    if performance_req == 'high':
        if budget == 'high':
            recommendations.append({
                'model': 'OpenAI text-embedding-3-large',
                'reason': 'Strong general-purpose quality from a managed API',
                'score': 95
            })
            recommendations.append({
                'model': 'Voyage-3-large', 
                'reason': 'Excellent performance, large context window',
                'score': 90
            })
        else:
            recommendations.append({
                'model': 'Voyage-3-lite',
                'reason': 'Great performance at 1/5 the cost of OpenAI large',
                'score': 85
            })
    
    # Budget constraints
    if budget == 'low' or volume == 'high':
        recommendations.append({
            'model': 'all-mpnet-base-v2',
            'reason': 'Free, good performance, local deployment',
            'score': 80
        })
        recommendations.append({
            'model': 'all-MiniLM-L6-v2',
            'reason': 'Fast, lightweight, free',
            'score': 75
        })
    
    # Speed requirements  
    if latency_req == 'high':
        recommendations.append({
            'model': 'OpenAI text-embedding-3-small',
            'reason': 'Fast API, good quality, reasonable cost',
            'score': 80
        })
    
    # Domain-specific adjustments
    if domain in ['medical', 'legal', 'scientific']:
        for rec in recommendations:
            if 'OpenAI' in rec['model'] or 'Voyage' in rec['model']:
                rec['score'] += 5  # Boost API models for specialized domains
                rec['reason'] += ' (better for specialized domains)'
    
    # Sort by score and return top recommendations
    recommendations.sort(key=lambda x: x['score'], reverse=True)
    return recommendations[:3]

# Test different use cases
use_cases = [
    {
        'name': 'Startup MVP',
        'params': {'budget': 'low', 'performance': 'medium', 'volume': 'low', 'domain': 'general'}
    },
    {
        'name': 'Enterprise Legal Search', 
        'params': {'budget': 'high', 'performance': 'high', 'volume': 'high', 'domain': 'legal'}
    },
    {
        'name': 'Medical Research Platform',
        'params': {'budget': 'medium', 'performance': 'high', 'volume': 'medium', 'domain': 'medical'}
    },
    {
        'name': 'Real-time Customer Support',
        'params': {'budget': 'medium', 'performance': 'medium', 'volume': 'medium', 'latency': 'high'}
    }
]

print("🎯 MODEL SELECTION RECOMMENDATIONS")
print("=" * 45)

for use_case in use_cases:
    print(f"\n📋 Use Case: {use_case['name']}")
    print(f"   Parameters: {use_case['params']}")
    
    recommendations = recommend_embedding_model(use_case['params'])
    
    print(f"   Recommendations:")
    for i, rec in enumerate(recommendations, 1):
        print(f"   {i}. {rec['model']} (Score: {rec['score']})")
        print(f"      Reason: {rec['reason']}")

## 🧠 Key Takeaways

From this module, you should now understand:

### 🔄 Evolution of Embeddings:
1. **Traditional methods** (TF-IDF) focus on word overlap, miss semantic meaning
2. **Modern embeddings** capture context and semantic relationships
3. **2025 models** are optimized for specific tasks like retrieval and similarity
4. **Contextual understanding** allows same words to have different meanings

### 🏆 2025 Model Landscape:
- **Performance leaders**: NV-Embed-v2, Voyage-3, OpenAI text-embedding-3
- **Best value**: Open-source models like all-mpnet-base-v2 for budget-conscious applications
- **Speed champions**: all-MiniLM-L6-v2, OpenAI text-embedding-3-small
- **Specialized options**: Domain-specific models emerging

### 💰 Cost Considerations:
1. **API models**: Pay per token, scale automatically, latest technology
2. **Local models**: No per-token fees after setup, but you pay for infrastructure; data stays private
3. **Volume matters**: High-volume applications favor local deployment
4. **Quality vs cost**: 10x cost difference may yield only 10% quality improvement
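
The quality-vs-cost point is easy to check with the numbers used in this module. The short calculation below compares OpenAI's small and large models using the prices and MTEB scores from the cost table in Exercise 3.

```python
# Prices ($ per 1K tokens) and MTEB scores as listed in this module.
small = {"price": 0.00002, "mteb": 62.3}   # text-embedding-3-small
large = {"price": 0.00013, "mteb": 64.6}   # text-embedding-3-large

price_ratio = large["price"] / small["price"]
score_gain = large["mteb"] - small["mteb"]
relative_gain = score_gain / small["mteb"]

print(f"Price ratio: {price_ratio:.1f}x")
print(f"Score gain: {score_gain:.1f} points ({relative_gain:.1%})")
```

Here a 6.5x price increase buys 2.3 MTEB points, a relative gain of under 4%. Whether that is worth it depends on how much those points matter on your own data.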

### 🎯 Selection Framework:
- **High performance + budget**: OpenAI text-embedding-3-large, Voyage-3
- **Good performance + budget conscious**: Voyage-3-lite, OpenAI small
- **Local deployment**: all-mpnet-base-v2, all-MiniLM-L6-v2
- **Specialized domains**: Larger API models are a reasonable starting point, but benchmark them against domain-tuned models on your own data

### 🔍 Practical Guidelines:
1. **Start simple**: Use all-MiniLM-L6-v2 for prototyping
2. **Measure what matters**: Test on your actual data and use cases
3. **Consider total cost**: Include infrastructure, maintenance, opportunity cost
4. **Plan for scale**: Choose models that can grow with your application
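
For "measure what matters", one widely used retrieval metric is recall@k: the fraction of queries whose known-relevant document shows up in the top k results. The sketch below is a minimal version that assumes exactly one relevant document per query; the vectors are made up, and in practice they would come from `model.encode(...)` on your own queries and documents.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def recall_at_k(query_vecs, doc_vecs, relevant_doc_ids, k):
    """Fraction of queries whose relevant document appears in the top-k results."""
    hits = 0
    for query, relevant in zip(query_vecs, relevant_doc_ids):
        # Rank document indices by similarity to the query, best first
        ranked = sorted(range(len(doc_vecs)),
                        key=lambda d: cosine(query, doc_vecs[d]), reverse=True)
        if relevant in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)

# Made-up vectors standing in for real embeddings.
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.6, 0.8]]
relevant = [0, 1]  # index of the one relevant document per query

print(recall_at_k(queries, docs, relevant, k=1))  # 0.5
print(recall_at_k(queries, docs, relevant, k=2))  # 1.0
```

Running the same labeled queries through each candidate model and comparing recall@k gives a direct, task-level comparison that leaderboard scores cannot.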

## 🎯 Next Steps

In **Module 5**, we'll work hands-on with embeddings:
- Generate and manipulate embedding vectors
- Calculate and interpret similarity metrics
- Visualize high-dimensional embedding spaces
- Build semantic search functionality
- Implement efficient similarity computation

Understanding model characteristics helps you make the right choice for your RAG system!

## 🤔 Discussion Questions

1. For your specific use case, what factors would be most important in model selection?
2. How would you evaluate whether the cost of a premium model is justified?
3. What strategies would you use to transition from a local model to an API model as you scale?
4. How might embedding model choice impact downstream RAG performance?

## 📝 Optional Exercise

**Advanced Challenge**: 
1. Choose a domain relevant to your work
2. Create a test set of 20-30 text samples from that domain
3. Test 3 different embedding models on your test set
4. Evaluate which model best captures the semantic relationships in your domain
5. Perform a cost-benefit analysis for your expected usage volume

This will give you practical experience choosing models for real-world applications!