# Module 05 - Notebook 02: OpenAI Embeddings

## Learning Objectives
- Use OpenAI's Embeddings API
- Compare different embedding models
- Implement batch processing for efficiency
- Handle costs and rate limits

---

## 1. OpenAI Embedding Models

OpenAI offers several embedding models:

| Model | Default Dimensions | Cost | Use Case |
|-------|------------|------|----------|
| text-embedding-3-small | 1536 | $0.02/1M tokens | General purpose, cost-effective |
| text-embedding-3-large | 3072 | $0.13/1M tokens | Higher quality, more expensive |
| text-embedding-ada-002 | 1536 | $0.10/1M tokens | Legacy, still widely used |
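
The dimension counts above are the default output sizes. The `text-embedding-3` models also accept an optional `dimensions` argument to `client.embeddings.create` (for example `dimensions=256`). It returns a shorter vector, trading some quality for less storage and faster similarity search. You may need to shorten vectors you have already stored. For that case, OpenAI's embeddings guide describes truncating the vector and then re-normalizing it to unit length. Below is a minimal sketch of that manual step. `shorten_embedding` is a hypothetical helper, and the toy 4-dimensional vector stands in for a real one:

```python
import numpy as np

def shorten_embedding(embedding, k):
    """Keep the first k components and rescale to unit L2 norm.

    Hypothetical helper for illustration; passing `dimensions` to the API
    is the recommended way to obtain shorter vectors.
    """
    vec = np.asarray(embedding, dtype=float)[:k]
    norm = np.linalg.norm(vec)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return vec / norm

# Toy unit-length vector standing in for a real 1536-dimensional embedding
full = np.array([3.0, 4.0, 12.0, 0.0]) / 13.0
short = shorten_embedding(full, 2)
print(short, np.linalg.norm(short))
```

Re-normalizing matters because cosine similarity and dot-product search assume unit-length vectors. A truncated vector is generally shorter than 1.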

## 2. Setup

In [None]:
!pip install -q openai python-dotenv numpy scikit-learn

In [None]:
import os
from openai import OpenAI
from dotenv import load_dotenv
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import time

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

print("✓ OpenAI client initialized")

## 3. Creating Embeddings

In [None]:
# Single text embedding
text = "Artificial intelligence is transforming the world."

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=text
)

embedding = response.data[0].embedding

print(f"Text: {text}")
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 10 values: {embedding[:10]}")
print(f"\nTokens used: {response.usage.total_tokens}")
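
OpenAI documents its embeddings as normalized to length 1. For unit vectors, cosine similarity reduces to a plain dot product, because the product of the two norms in the denominator equals 1. The sketch below checks that identity on random unit vectors that stand in for real embeddings. The `cosine` function is written out here for illustration:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Random vectors scaled to unit length, mimicking normalized embeddings
u = rng.normal(size=1536)
u /= np.linalg.norm(u)
v = rng.normal(size=1536)
v /= np.linalg.norm(v)

print(f"cosine: {cosine(u, v):.6f}")
print(f"dot:    {float(u @ v):.6f}")
```

On a real response, `np.linalg.norm(embedding)` should come out very close to 1. This is why a dot product is enough for ranking OpenAI embeddings.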

## 4. Batch Processing

In [None]:
# Process multiple texts at once
texts = [
    "Machine learning is a type of artificial intelligence.",
    "Deep learning uses neural networks with many layers.",
    "Natural language processing enables computers to understand text.",
    "Computer vision allows machines to interpret images.",
    "Reinforcement learning teaches agents through rewards."
]

# One batched request instead of one request per text: fewer round trips and fewer requests counted against rate limits
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

embeddings = [item.embedding for item in response.data]

print(f"Created {len(embeddings)} embeddings")
print(f"Total tokens used: {response.usage.total_tokens}")
print(f"Shape: ({len(embeddings)}, {len(embeddings[0])})")
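
Large corpora still have to be split, because the API limits how many inputs and tokens a single request may carry. The API reference has listed 2048 inputs per request; check the current limits before choosing a batch size. The minimal sketch below splits the list into fixed-size chunks and concatenates the results in input order. The helper names `batched` and `embed_all` and the default `batch_size=100` are assumptions. The embedder is passed in as `embed_batch`, so you can test the logic without network access:

```python
from typing import Callable, Iterator, List, Sequence

def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])

def embed_all(texts: Sequence[str],
              embed_batch: Callable[[List[str]], List[List[float]]],
              batch_size: int = 100) -> List[List[float]]:
    """Embed texts chunk by chunk, preserving input order (hypothetical helper)."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors

# With the real client, embed_batch could be (not executed here):
# def openai_embed_batch(batch):
#     response = client.embeddings.create(model="text-embedding-3-small", input=batch)
#     # Each item carries .index, its position in the input list
#     return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]

# Demo with a stand-in embedder whose one-element "vector" is the text length
demo = embed_all(["a", "bb", "ccc", "dddd", "eeeee"],
                 lambda batch: [[float(len(t))] for t in batch],
                 batch_size=2)
print(demo)
```

On Python 3.12 and later, `itertools.batched` offers the same chunking but yields tuples instead of lists.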

## 5. Semantic Search with OpenAI

In [None]:
# Knowledge base
documents = [
    "Python is a high-level programming language.",
    "JavaScript is the language of the web.",
    "Machine learning models learn from data.",
    "SQL is used for database queries.",
    "React is a JavaScript library for building UIs.",
    "TensorFlow is a machine learning framework.",
    "Docker containers package applications.",
    "Kubernetes orchestrates container deployments."
]

query = "I want to build AI models"

# Create embeddings
all_texts = documents + [query]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=all_texts
)

doc_embeddings = np.array([item.embedding for item in response.data[:-1]])
query_embedding = np.array([response.data[-1].embedding])

# Calculate similarities
similarities = cosine_similarity(query_embedding, doc_embeddings)[0]

# Rank results
ranked_indices = np.argsort(similarities)[::-1]

print(f"Query: '{query}'\n")
print("Top Results:\n")
for rank, idx in enumerate(ranked_indices[:3], 1):
    print(f"{rank}. [{similarities[idx]:.3f}] {documents[idx]}")
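
For a larger knowledge base, sorting every score with `np.argsort` does more work than needed when you only want the top few results. The minimal sketch below assumes unit-normalized vectors, so a plain dot product equals cosine similarity. It uses `np.argpartition` to select the k best scores and then sorts only those. `top_k_dot` is an illustrative name, not a library function:

```python
import numpy as np

def top_k_dot(query_vec, doc_matrix, k):
    """Return (indices, scores) of the k best-matching rows, best first.

    Hypothetical helper. Assumes query_vec and the rows of doc_matrix are
    unit-normalized, so the dot product equals cosine similarity.
    """
    scores = np.asarray(doc_matrix) @ np.asarray(query_vec)
    k = min(k, scores.shape[0])
    if k <= 0:
        return np.array([], dtype=int), np.array([], dtype=float)
    # argpartition puts the k largest scores in the last k slots (unordered)
    candidates = np.argpartition(scores, -k)[-k:]
    # Sort only those k candidates, highest score first
    order = candidates[np.argsort(scores[candidates])[::-1]]
    return order, scores[order]

# Toy 2-D unit vectors standing in for document embeddings
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
query = np.array([0.8, 0.6])
idx, sc = top_k_dot(query, docs, 2)
print(idx, sc)
```

With the arrays from the cell above, the call would be `top_k_dot(query_embedding[0], doc_embeddings, 3)`.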

## 6. Model Comparison

In [None]:
# Compare small vs large models
test_text = "Quantum computing uses qubits for parallel processing."

models = [
    "text-embedding-3-small",
    "text-embedding-3-large"
]

for model in models:
    start = time.time()
    
    response = client.embeddings.create(
        model=model,
        input=test_text
    )
    
    elapsed = time.time() - start
    embedding = response.data[0].embedding
    
    print(f"\nModel: {model}")
    print(f"  Dimensions: {len(embedding)}")
    print(f"  Tokens: {response.usage.total_tokens}")
    print(f"  Time: {elapsed:.3f}s")
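
A single `time.time()` measurement of one API call mostly reflects network and server latency. That latency varies from call to call, so the small-vs-large timing above can flip between runs. Taking the median of several calls gives a steadier comparison. The sketch below uses a hypothetical helper, `median_latency`, built on `time.perf_counter`, which Python provides for measuring short durations. Each repeat against the real API consumes tokens and counts toward rate limits:

```python
import statistics
import time

def median_latency(fn, repeats=5):
    """Call fn() `repeats` times; return the median wall-clock time in seconds.

    Hypothetical helper for illustration.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# With the client (not executed here):
# median_latency(lambda: client.embeddings.create(model=model, input=test_text))

# Local stand-in workload so the cell runs without an API key
print(f"{median_latency(lambda: sum(range(10_000))):.6f}s")
```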

## 7. Cost Estimation

In [None]:
def estimate_embedding_cost(texts: list, model: str = "text-embedding-3-small") -> dict:
    """
    Estimate cost of embedding generation.
    """
    # Pricing per 1M tokens
    pricing = {
        "text-embedding-3-small": 0.02,
        "text-embedding-3-large": 0.13,
        "text-embedding-ada-002": 0.10
    }
    
    # Estimate tokens with OpenAI's rule of thumb for English text:
    # 1 token is about 0.75 words, i.e. roughly 1.33 tokens per word
    total_words = sum(len(text.split()) for text in texts)
    estimated_tokens = int(total_words / 0.75)
    
    # Calculate cost
    cost_per_token = pricing[model] / 1_000_000
    estimated_cost = estimated_tokens * cost_per_token
    
    return {
        "texts": len(texts),
        "estimated_tokens": estimated_tokens,
        "estimated_cost_usd": estimated_cost,
        "model": model
    }

# Example: Estimate cost for a document set
sample_docs = [f"This is document number {i}" for i in range(1000)]
estimate = estimate_embedding_cost(sample_docs)

print("Cost Estimate:")
print(f"  Documents: {estimate['texts']:,}")
print(f"  Estimated Tokens: {estimate['estimated_tokens']:,}")
print(f"  Estimated Cost: ${estimate['estimated_cost_usd']:.4f}")
print(f"  Model: {estimate['model']}")
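
Cost is linear in tokens, so the gap between models grows with corpus size. The sketch below reuses the per-million prices from the table to compare the three models for an assumed 50-million-token corpus (an illustrative figure). Word counts only approximate token counts. For exact counts, the `tiktoken` package can count tokens locally. The OpenAI embedding models listed above use the `cl100k_base` encoding, so `len(tiktoken.get_encoding("cl100k_base").encode(text))` gives the count for one text:

```python
# Prices in USD per 1M tokens, taken from the table at the top of this notebook
PRICE_PER_MILLION_USD = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}

def cost_usd(tokens: int, model: str) -> float:
    """Cost in USD of embedding `tokens` tokens with `model` (illustrative helper)."""
    return tokens * PRICE_PER_MILLION_USD[model] / 1_000_000

# Assumed corpus size for the comparison
corpus_tokens = 50_000_000
for model in PRICE_PER_MILLION_USD:
    print(f"{model:<24} ${cost_usd(corpus_tokens, model):,.2f}")
```

At this size, the small model costs $1.00 and the large model $6.50, a 6.5x difference that holds at any corpus size.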

## 8. Caching Strategy

In [None]:
import hashlib
import json

class EmbeddingCache:
    """Simple cache for embeddings to avoid recomputing."""
    
    def __init__(self):
        self.cache = {}
    
    def _hash(self, text: str, model: str) -> str:
        """Create cache key."""
        key = f"{model}:{text}"
        return hashlib.md5(key.encode()).hexdigest()
    
    def get(self, text: str, model: str):
        """Get cached embedding."""
        key = self._hash(text, model)
        return self.cache.get(key)
    
    def set(self, text: str, model: str, embedding: list):
        """Store embedding in cache."""
        key = self._hash(text, model)
        self.cache[key] = embedding
    
    def stats(self) -> dict:
        """Get cache statistics (size approximated by the JSON-serialized vectors)."""
        return {
            "entries": len(self.cache),
            "json_size_mb": sum(len(json.dumps(v)) for v in self.cache.values()) / 1024 / 1024
        }

# Test caching
cache = EmbeddingCache()

def get_embedding_with_cache(text: str, model: str = "text-embedding-3-small"):
    """Get embedding with caching."""
    # Check cache
    cached = cache.get(text, model)
    if cached is not None:
        print("  ✓ Cache hit")
        return cached
    
    # Compute
    print("  → API call")
    response = client.embeddings.create(model=model, input=text)
    embedding = response.data[0].embedding
    
    # Store
    cache.set(text, model, embedding)
    return embedding

# Test
test_text = "Caching saves API calls and money"

print("First call:")
emb1 = get_embedding_with_cache(test_text)

print("\nSecond call (same text):")
emb2 = get_embedding_with_cache(test_text)

print(f"\nCache stats: {cache.stats()}")
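
`get_embedding_with_cache` sends one API request per uncached text. When embedding a list, you can combine the cache with batching: collect only the misses, deduplicate them, and embed them in a single call. The names `cache_key` and `embed_with_cache` below are hypothetical. The cache is a plain dict, so you could persist it between sessions with `json.dump`/`json.load`, since the keys are strings and the vectors are lists of floats:

```python
import hashlib

def cache_key(text: str, model: str) -> str:
    """Stable key for a (model, text) pair (hypothetical helper)."""
    return hashlib.sha256(f"{model}:{text}".encode("utf-8")).hexdigest()

def embed_with_cache(texts, model, cache, embed_batch):
    """Return embeddings for texts, calling embed_batch only for cache misses.

    cache: dict mapping cache_key(...) -> embedding.
    embed_batch: callable taking a list of strings, returning vectors in order.
    """
    keys = [cache_key(t, model) for t in texts]
    # Collect misses once each, preserving first-seen order
    missing = {}
    for key, text in zip(keys, texts):
        if key not in cache and key not in missing:
            missing[key] = text
    if missing:
        vectors = embed_batch(list(missing.values()))
        for key, vec in zip(missing.keys(), vectors):
            cache[key] = vec
    return [cache[k] for k in keys]

# Demo with a fake embedder that records each batch it receives
batches_sent = []

def fake_embed_batch(batch):
    batches_sent.append(list(batch))
    return [[float(len(text))] for text in batch]

store = {}
embed_with_cache(["alpha", "beta", "alpha"], "demo-model", store, fake_embed_batch)
embed_with_cache(["beta", "gamma"], "demo-model", store, fake_embed_batch)
print(batches_sent)  # [['alpha', 'beta'], ['gamma']]
```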

## Exercise: Build a Semantic Search Engine

Create a simple semantic search engine using OpenAI embeddings.

In [None]:
# TODO: Complete this exercise
class SemanticSearchEngine:
    """
    A simple semantic search engine.
    """
    
    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model
        self.documents = []
        self.embeddings = []
    
    def add_documents(self, documents: list):
        """Add documents to the search engine."""
        # TODO: Implement
        # 1. Create embeddings for documents
        # 2. Store documents and embeddings
        pass
    
    def search(self, query: str, top_k: int = 3):
        """Search for most relevant documents."""
        # TODO: Implement
        # 1. Create query embedding
        # 2. Calculate similarities
        # 3. Return top_k results
        pass

# Test your implementation
# engine = SemanticSearchEngine()
# engine.add_documents([...])
# results = engine.search("your query")
# print(results)

## Summary

You learned:
- ✅ Using OpenAI Embeddings API
- ✅ Batch processing for efficiency
- ✅ Comparing embedding models
- ✅ Cost estimation and optimization
- ✅ Caching strategies

## Best Practices
1. **Batch requests** when possible
2. **Cache embeddings** to save costs
3. **Use the small model** unless you need the highest quality
4. **Estimate costs** before large operations
5. **Handle rate limits** with retry logic
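
A common retry pattern for rate limits is exponential backoff with random jitter. The sketch below uses a hypothetical helper, `call_with_backoff`. It retries only the exception types you pass in, doubling the maximum wait after each failure up to a cap. With the OpenAI Python SDK you would pass `retry_on=(openai.RateLimitError,)`. The SDK client also retries some errors on its own, configurable with `OpenAI(max_retries=...)`, so this wrapper adds an outer layer of retries on top of that:

```python
import random
import time

def call_with_backoff(fn, retry_on, max_attempts=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying on exceptions in `retry_on` with exponential backoff.

    Hypothetical helper. Before retry n (1-based), waits a random time in
    [0, min(max_delay, base_delay * 2 ** (n - 1))] seconds ("full jitter").
    `sleep` is injectable so the logic can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))

# Usage with the OpenAI SDK (not executed here):
# import openai
# response = call_with_backoff(
#     lambda: client.embeddings.create(model="text-embedding-3-small", input=texts),
#     retry_on=(openai.RateLimitError,),
# )
```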

## Next Steps
- 📘 Notebook 03: Vector Databases