<div style="
  background-color:#0F172A;
  border-left:5px solid #3B82F6;
  padding:16px;
  border-radius:10px;
">

## Semantic Caching Tutorial - Simple Step-by-Step Guide

Learn how semantic caching works with a hands-on example!

**What you'll learn:**
- How to create embeddings for questions
- How to find similar questions using cosine similarity
- How cache hits and misses work
- How the threshold affects matching

</div>

## Step 1: Install and Import Libraries

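We need the OpenAI client for embeddings, NumPy for calculations, and python-dotenv to load the API key from a `.env` file. Install them with `pip install openai numpy python-dotenv` if needed.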

In [1]:
import os
import numpy as np
from openai import OpenAI
from dotenv import load_dotenv

# Load API key
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!
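
A minimal sketch of an explicit check that the key was actually loaded, so a missing `.env` entry is reported with a clear message. It assumes the key lives in the `OPENAI_API_KEY` environment variable, as in the cell above; `require_env` is a hypothetical helper name, not part of the OpenAI SDK or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Hypothetical helper: report the missing variable by name.
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value
```

It could be used in place of the bare `os.getenv` call, for example `client = OpenAI(api_key=require_env("OPENAI_API_KEY"))`.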


## Step 2: Create Sample Q&A Cache

Let's create a simple cache with 5 question-answer pairs.

In [3]:
# Our cached Q&A pairs
cache = [
    {
        "question": "How do I reset my password?",
        "answer": "Go to Settings > Account > Reset Password. You'll receive a reset link via email."
    },
    {
        "question": "What are the pricing plans?",
        "answer": "We offer three plans: Free ($0), Pro ($12/month), and Enterprise (custom pricing)."
    },
    {
        "question": "How do I create a new project?",
        "answer": "Click the '+' button in the top right, select 'New Project', enter the details, and click 'Create'."
    },
    {
        "question": "Can I export my data?",
        "answer": "Yes! Go to Settings > Export Data and choose from CSV, JSON, or PDF formats."
    },
    {
        "question": "How do I add team members?",
        "answer": "Open your project, click 'Team', then 'Invite Members'. Enter their email addresses and set permissions."
    }
]

## Step 3: Create Embeddings for Cached Questions

Convert each question into a vector (embedding) using OpenAI.

In [5]:
# OpenAI embedding models; this tutorial uses text-embedding-3-small
models = ["text-embedding-ada-002", "text-embedding-3-small", "text-embedding-3-large"]

def get_embedding(text, model="text-embedding-3-small"):
    """Get the embedding for a text using OpenAI."""
    response = client.embeddings.create(
        model=model,
        input=text
    )
    return np.array(response.data[0].embedding)

# Create embeddings for all cached questions
print("🔢 Creating embeddings for cached questions...")
for pair in cache:
    pair['embedding'] = get_embedding(pair['question'])

print(f"✅ Created {len(cache)} embeddings")
print(f"   Embedding dimension: {len(cache[0]['embedding'])}")
print(f"   First few values: {cache[0]['embedding'][:5]}")

🔢 Creating embeddings for cached questions...
✅ Created 5 embeddings
   Embedding dimension: 1536
   First few values: [ 0.01763899 -0.04572302  0.02976723  0.02191615 -0.04865132]
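
The loop above sends one request per question. The embeddings endpoint also accepts a list of strings as `input`, so the whole cache can be embedded in a single request. The sketch below is one way to do that, not the notebook's own method: `embed_batch` is a hypothetical helper, and `FakeEmbeddings` is a stand-in that returns made-up 2-dimensional vectors so the example runs without an API key.

```python
from types import SimpleNamespace


def embed_batch(texts, client, model="text-embedding-3-small"):
    """Embed several texts with a single embeddings request.

    `client` is expected to behave like the OpenAI client created in Step 1.
    Returns one list of floats per input text, in input order.
    """
    response = client.embeddings.create(model=model, input=list(texts))
    # Each item carries the index of the input it belongs to; sort to be safe.
    items = sorted(response.data, key=lambda item: item.index)
    return [list(item.embedding) for item in items]


class FakeEmbeddings:
    """Offline stand-in (assumption): NOT real model output."""

    def create(self, model, input):
        data = [
            SimpleNamespace(index=i, embedding=[float(len(text)), 1.0])
            for i, text in enumerate(input)
        ]
        return SimpleNamespace(data=data)


fake_client = SimpleNamespace(embeddings=FakeEmbeddings())
vectors = embed_batch(
    ["How do I reset my password?", "Can I export my data?"], fake_client
)
print(len(vectors), len(vectors[0]))
```

With the real client, the results could be attached to the cache with `for pair, vec in zip(cache, embed_batch([p['question'] for p in cache], client)): pair['embedding'] = np.array(vec)`.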


## Step 4: Define Similarity Functions

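We'll use cosine distance (1 minus cosine similarity) to measure how different two questions are: 0 means the two vectors point in the same direction, and larger values mean the questions are less similar.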

In [6]:
def cosine_similarity(a, b):
    """Calculate cosine similarity between two vectors."""
    dot_product = np.dot(a, b)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    return dot_product / (norm_a * norm_b)

def cosine_distance(a, b):
    """Calculate cosine distance (1 - similarity)."""
    return 1 - cosine_similarity(a, b)

# Test with two identical questions
test_embedding = cache[0]['embedding']
distance = cosine_distance(test_embedding, test_embedding)
print("✅ Functions defined!")
print(f"   Distance between identical questions: {distance:.6f}")
print(f"   (Should be ~0.0)")

✅ Functions defined!
   Distance between identical questions: 0.000000
   (Should be ~0.0)
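
To make the 0-to-2 range of cosine distance concrete, here is a small worked example on 2-dimensional vectors, written in plain Python so the arithmetic is visible. `cosine_distance_plain` is a hypothetical name for this sketch, and the zero-vector guard is an addition that the NumPy functions above do not have.

```python
import math


def cosine_distance_plain(a, b):
    """Cosine distance for two equal-length sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Cosine similarity is undefined when either vector has zero length.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1 - dot / (norm_a * norm_b)


print(cosine_distance_plain([1, 0], [2, 0]))   # same direction  -> 0.0
print(cosine_distance_plain([1, 0], [0, 3]))   # perpendicular   -> 1.0
print(cosine_distance_plain([1, 0], [-1, 0]))  # opposite        -> 2.0
```

Note that vector length does not matter: `[1, 0]` and `[2, 0]` are at distance 0 because only the direction is compared.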


## Step 5: Implement Cache Lookup Function

This function searches the cache for similar questions.

In [7]:
def search_cache(query, threshold=0.3):
    """
    Search cache for similar questions.
    
    Args:
        query: User's question
        threshold: Maximum distance for a match (lower = stricter)
    
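    Returns:
        dict with 'hit' (bool), 'distance', 'similarity' and 'all_distances';
        a hit adds 'matched_question' and 'answer', a miss adds 'best_match'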
    """
    # Get embedding for the query
    query_embedding = get_embedding(query)
    
    # Calculate distances to all cached questions
    results = []
    for pair in cache:
        distance = cosine_distance(query_embedding, pair['embedding'])
        similarity = 1 - distance
        results.append({
            'question': pair['question'],
            'answer': pair['answer'],
            'distance': distance,
            'similarity': similarity
        })
    
    # Sort by distance (closest first)
    results.sort(key=lambda x: x['distance'])
    
    # Return best match if within threshold
    best_match = results[0]
    if best_match['distance'] <= threshold:
        return {
            'hit': True,
            'matched_question': best_match['question'],
            'answer': best_match['answer'],
            'distance': best_match['distance'],
            'similarity': best_match['similarity'],
            'all_distances': [(r['question'], r['distance']) for r in results]
        }
    else:
        return {
            'hit': False,
            'best_match': best_match['question'],
            'distance': best_match['distance'],
            'similarity': best_match['similarity'],
            'all_distances': [(r['question'], r['distance']) for r in results]
        }

print("✅ Cache search function ready!")

✅ Cache search function ready!
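
`search_cache` compares the query with one cached question at a time, which is fine for five entries. For a larger cache, the same nearest-neighbour lookup can be done in one matrix operation. The following is a sketch under assumptions: `nearest_cached` is a hypothetical helper, it takes an already computed query embedding, and the 2-dimensional vectors are toy values rather than real embeddings.

```python
import numpy as np


def nearest_cached(query_embedding, cached_embeddings):
    """Return (index, distance) of the closest cached vector by cosine distance."""
    matrix = np.asarray(cached_embeddings, dtype=float)  # shape (n, d)
    query = np.asarray(query_embedding, dtype=float)     # shape (d,)
    # Normalise the rows and the query so a dot product equals cosine similarity.
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    distances = 1.0 - matrix @ query                      # shape (n,)
    best = int(np.argmin(distances))
    return best, float(distances[best])


# Toy vectors standing in for real embeddings (illustrative values only).
toy_cache = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
index, distance = nearest_cached([0.9, 0.1], toy_cache)
print(index, round(distance, 4))
```

With the notebook's data, the matrix could be built once with `np.stack([pair['embedding'] for pair in cache])` and reused for every query; the hit test stays `distance <= threshold`.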


## Step 6: Test Cache Hits (Exact and Semantic Matches)

Let's try some queries and see cache hits!

In [8]:
# Test 1: Exact match
print("="*60)
print("TEST 1: Exact Match")
print("="*60)

query1 = "How do I reset my password?"
result1 = search_cache(query1, threshold=0.3)

print(f"\n🔍 Query: {query1}")
if result1['hit']:
    print(f"✅ CACHE HIT!")
    print(f"   Matched: {result1['matched_question']}")
    print(f"   Distance: {result1['distance']:.4f}")
    print(f"   Similarity: {result1['similarity']:.2%}")
    print(f"\n💬 Answer: {result1['answer']}")
else:
    print(f"❌ CACHE MISS")

TEST 1: Exact Match

🔍 Query: How do I reset my password?
✅ CACHE HIT!
   Matched: How do I reset my password?
   Distance: 0.0000
   Similarity: 100.00%

💬 Answer: Go to Settings > Account > Reset Password. You'll receive a reset link via email.


In [9]:
# Test 2: Semantic variation (similar meaning, different words)
print("\n" + "="*60)
print("TEST 2: Semantic Variation")
print("="*60)

query2 = "What's the cost?"
result2 = search_cache(query2, threshold=0.3)

print(f"\n🔍 Query: {query2}")
if result2['hit']:
    print(f"✅ CACHE HIT!")
    print(f"   Matched: {result2['matched_question']}")
    print(f"   Distance: {result2['distance']:.4f}")
    print(f"   Similarity: {result2['similarity']:.2%}")
    print(f"\n💬 Answer: {result2['answer']}")
else:
    print(f"❌ CACHE MISS")
    print(f"   Closest match: {result2['best_match']}")
    print(f"   Distance: {result2['distance']:.4f}")
    print(f"   (Above threshold of 0.3)")


TEST 2: Semantic Variation

🔍 Query: What's the cost?
❌ CACHE MISS
   Closest match: What are the pricing plans?
   Distance: 0.5244
   (Above threshold of 0.3)


In [10]:
# Test 3: Another semantic variation
print("\n" + "="*60)
print("TEST 3: Another Semantic Variation")
print("="*60)

query3 = "How to make a new project?"
result3 = search_cache(query3, threshold=0.3)

print(f"\n🔍 Query: {query3}")
if result3['hit']:
    print(f"✅ CACHE HIT!")
    print(f"   Matched: {result3['matched_question']}")
    print(f"   Distance: {result3['distance']:.4f}")
    print(f"   Similarity: {result3['similarity']:.2%}")
    print(f"\n💬 Answer: {result3['answer']}")
else:
    print(f"❌ CACHE MISS")


TEST 3: Another Semantic Variation

🔍 Query: How to make a new project?
✅ CACHE HIT!
   Matched: How do I create a new project?
   Distance: 0.0807
   Similarity: 91.93%

💬 Answer: Click the '+' button in the top right, select 'New Project', enter the details, and click 'Create'.


## Step 7: Test Cache Miss (Completely Different Question)

Let's try a question that's NOT in the cache.

In [11]:
print("="*60)
print("TEST 4: Completely Different Question (Cache Miss)")
print("="*60)

query4 = "What's the weather like today?"
result4 = search_cache(query4, threshold=0.3)

print(f"\n🔍 Query: {query4}")
if result4['hit']:
    print(f"✅ CACHE HIT!")
    print(f"   Matched: {result4['matched_question']}")
else:
    print(f"❌ CACHE MISS")
    print(f"   Closest match: {result4['best_match']}")
    print(f"   Distance: {result4['distance']:.4f}")
    print(f"   (Above threshold of 0.3)")

print(f"\n📊 All distances:")
for question, distance in result4['all_distances']:
    print(f"   {distance:.4f} - {question}")

TEST 4: Completely Different Question (Cache Miss)

🔍 Query: What's the weather like today?
❌ CACHE MISS
   Closest match: What are the pricing plans?
   Distance: 0.8844
   (Above threshold of 0.3)

📊 All distances:
   0.8844 - What are the pricing plans?
   0.8962 - Can I export my data?
   0.9296 - How do I add team members?
   0.9444 - How do I reset my password?
   0.9554 - How do I create a new project?


## Step 8: Experiment with Different Thresholds

See how the threshold affects matching!

In [13]:
print("="*60)
print("THRESHOLD EXPERIMENTS")
print("="*60)

# Named threshold_query so it does not clash with the test_query() function defined in Step 9
threshold_query = "What's the cost?"
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

print(f"\n🔍 Query: {threshold_query}\n")

for threshold in thresholds:
    result = search_cache(threshold_query, threshold=threshold)
    status = "✅ HIT" if result['hit'] else "❌ MISS"
    print(f"Threshold {threshold:.1f}: {status} (distance: {result['distance']:.4f})")

THRESHOLD EXPERIMENTS

🔍 Query: What's the cost?

Threshold 0.1: ❌ MISS (distance: 0.5244)
Threshold 0.2: ❌ MISS (distance: 0.5244)
Threshold 0.3: ❌ MISS (distance: 0.5244)
Threshold 0.4: ❌ MISS (distance: 0.5244)
Threshold 0.5: ❌ MISS (distance: 0.5244)
Threshold 0.6: ✅ HIT (distance: 0.5244)
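
The loop above calls `search_cache` once per threshold, and each call re-embeds the same query. The best-match distance does not depend on the threshold, so it can be computed once and compared against every threshold afterwards. A minimal sketch, where `sweep_thresholds` is a hypothetical helper and 0.5244 is the distance reported in the output above:

```python
def sweep_thresholds(distance, thresholds):
    """Report hit/miss for one precomputed distance at several thresholds."""
    # Same rule as search_cache: a hit means distance <= threshold.
    return [(threshold, distance <= threshold) for threshold in thresholds]


# 0.5244 is the best-match distance printed for "What's the cost?" above.
sweep = sweep_thresholds(0.5244, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
for threshold, hit in sweep:
    print(f"Threshold {threshold:.1f}: {'HIT' if hit else 'MISS'}")
```

In the notebook, the distance could come from a single call such as `search_cache(query)['distance']`, since the returned dict includes `'distance'` for hits and misses alike.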


## Step 9: Interactive Test Function

Try your own questions!

In [15]:
def test_query(query, threshold=0.35, show_all=False):
    """Test a query against the cache."""
    print("="*60)
    result = search_cache(query, threshold=threshold)
    
    print(f"🔍 Query: {query}")
    print(f"🎯 Threshold: {threshold}")
    print()

    if result['hit']:
        print(f"✅ CACHE HIT!")
        print(f"   Matched question: {result['matched_question']}")
        print(f"   Distance: {result['distance']:.4f}")
        print(f"   Similarity: {result['similarity']:.2%}")
        print(f"\n💬 Answer:\n   {result['answer']}")
    else:
        print(f"❌ CACHE MISS")
        print(f"   Closest match: {result['best_match']}")
        print(f"   Distance: {result['distance']:.4f}")
        print(f"   Similarity: {result['similarity']:.2%}")
        print(f"   (Exceeds threshold of {threshold})")

    if show_all:
        print(f"\n📊 All cached questions by distance:")
        for q, dist in result['all_distances']:
            marker = "✓" if dist <= threshold else "✗"
            print(f"   {marker} {dist:.4f} - {q}")
    
    print("="*60)

# Try it out!
test_query("How do I invite team members?", threshold=0.3, show_all=True)

🔍 Query: How do I invite team members?
🎯 Threshold: 0.3

✅ CACHE HIT!
   Matched question: How do I add team members?
   Distance: 0.1642
   Similarity: 83.58%

💬 Answer:
   Open your project, click 'Team', then 'Invite Members'. Enter their email addresses and set permissions.

📊 All cached questions by distance:
   ✓ 0.1642 - How do I add team members?
   ✗ 0.5830 - How do I create a new project?
   ✗ 0.7491 - How do I reset my password?
   ✗ 0.7579 - Can I export my data?
   ✗ 0.7896 - What are the pricing plans?


## Try Your Own Questions!

Use the cell below to test any question:

In [16]:
# Try your own question here!
test_query(
    query="Can I download my data?",  # ← Change this
    threshold=0.35,                    # ← Adjust this (0.1 = strict, 0.5 = loose)
    show_all=True                      # Show all distances
)

🔍 Query: Can I download my data?
🎯 Threshold: 0.35

✅ CACHE HIT!
   Matched question: Can I export my data?
   Distance: 0.2459
   Similarity: 75.41%

💬 Answer:
   Yes! Go to Settings > Export Data and choose from CSV, JSON, or PDF formats.

📊 All cached questions by distance:
   ✓ 0.2459 - Can I export my data?
   ✗ 0.6819 - How do I reset my password?
   ✗ 0.7375 - What are the pricing plans?
   ✗ 0.7400 - How do I create a new project?
   ✗ 0.7820 - How do I add team members?


## 🎓 Key Takeaways

### What We Learned:

1. **Embeddings**: Questions are converted to vectors (1536 dimensions with `text-embedding-3-small`)
2. **Cosine Distance**: Measures how different two questions are (0 = identical, 2 = opposite)
3. **Threshold**: Controls matching strictness
   - Low (0.1-0.2): Only near-identical matches
   - Medium (0.3-0.4): Close paraphrases match (looser rewordings such as "What's the cost?" still missed at 0.5 above)
   - High (0.5+): Loose matching (more false positives)

4. **Cache Hit**: When distance ≤ threshold → Return cached answer ⚡
5. **Cache Miss**: When distance > threshold → Need to generate a new answer 🔍
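
The hit and miss rules above can be combined into one lookup-or-generate function. This is a sketch under stated assumptions, not the notebook's own code: `answer_with_cache` is a hypothetical helper, `embed` and `generate` are callables you supply (in practice the embedding call and an LLM call), and the demo uses made-up 2-dimensional vectors and a fake generator so it runs offline.

```python
import math


def _cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


def answer_with_cache(query, entries, embed, generate, threshold=0.3):
    """Return (answer, was_hit). On a miss, generate an answer and store it."""
    query_embedding = embed(query)
    scored = [(_cosine_distance(query_embedding, e["embedding"]), e) for e in entries]
    if scored:
        distance, best = min(scored, key=lambda pair: pair[0])
        if distance <= threshold:
            return best["answer"], True  # cache hit: no generation needed
    answer = generate(query)  # cache miss: slow path
    entries.append({"question": query, "answer": answer, "embedding": query_embedding})
    return answer, False


# Offline demo with toy data (assumptions, not real embeddings or LLM output).
toy_vectors = {"q1": [1.0, 0.0], "q1 reworded": [0.95, 0.05], "q2": [0.0, 1.0]}
calls = []


def toy_generate(question):
    calls.append(question)
    return f"generated answer for {question!r}"


demo_entries = []
first = answer_with_cache("q1", demo_entries, toy_vectors.get, toy_generate)
second = answer_with_cache("q1 reworded", demo_entries, toy_vectors.get, toy_generate)
print(first)   # miss: the answer is generated and stored
print(second)  # hit: the stored answer is reused
```

Storing the query embedding on a miss means the next similar question can be served from the cache without another generation step.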

### Performance Benefits:
- **Cache Hit**: ~100ms (one embedding call plus a similarity search, no LLM call)
- **Cache Miss**: ~3-5s (the LLM has to generate an answer)
- **Speedup**: roughly 30-50x faster for cached queries, based on the rough figures above
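
The 30-50x figure applies to an individual cache hit. The average latency across all queries depends on how often the cache hits. A short worked calculation, where 100 ms and 3000 ms are the rough figures from the list above and the hit rates are assumptions chosen for illustration:

```python
def expected_latency_ms(hit_rate, hit_ms, miss_ms):
    """Average latency per query for a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


# Hit rates below are hypothetical; measure your own workload for real numbers.
for rate in (0.0, 0.25, 0.5, 0.75):
    average = expected_latency_ms(rate, 100, 3000)
    print(f"hit rate {rate:.0%}: {average:.0f} ms on average")
```

At a 50% hit rate, for example, the average is 0.5 × 100 + 0.5 × 3000 = 1550 ms, roughly a 2x improvement overall rather than 30x. A miss also still pays for the query embedding before the LLM call.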

### Best Practices:
- Start with threshold ~0.35 (balanced)
- Monitor false positives (wrong matches)
- Monitor false negatives (missed matches)
- Adjust threshold based on your use case
- Store cache persistently (database/file)
- Update cache with new Q&A pairs over time
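
For the persistence point, one simple option is a JSON file. The sketch below is an assumption about how that might look, not part of the original notebook: `save_cache` and `load_cache` are hypothetical helpers, the file name is arbitrary, and the 3-value embedding is a toy value.

```python
import json
import os
import tempfile


def save_cache(entries, path):
    """Write cache entries to a JSON file, converting embeddings to plain lists."""
    serialisable = [
        {**entry, "embedding": [float(x) for x in entry["embedding"]]}
        for entry in entries
    ]
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(serialisable, handle)


def load_cache(path):
    """Read cache entries back; embeddings come back as lists of floats."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Round trip with a toy entry (illustrative embedding, not real model output).
demo = [{"question": "Can I export my data?", "answer": "Yes!", "embedding": [0.1, 0.2, 0.3]}]
demo_path = os.path.join(tempfile.mkdtemp(), "semantic_cache.json")
save_cache(demo, demo_path)
print(load_cache(demo_path) == demo)  # True
```

Embeddings from different models are not comparable, so it is worth recording the embedding model name alongside the saved cache and rebuilding the embeddings if the model changes.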