# Understanding Prompt Caching in OpenAI and Anthropic

This notebook demonstrates how to:
1. Monitor OpenAI's automatic prompt caching
2. Implement manual prompt caching with Anthropic's API
3. Compare the cost savings between different approaches

## Setup

First, let's install the required packages and set up our API keys:

In [3]:
%pip install openai anthropic

Note: you may need to restart the kernel to use updated packages.


In [4]:
import os
import openai
import anthropic
import getpass
import time

In [6]:
# Get OpenAI API key if not already set
if not os.getenv('OPENAI_API_KEY'):
    openai.api_key = getpass.getpass('Enter OpenAI API key: ')
else:
    openai.api_key = os.getenv('OPENAI_API_KEY')

# Get Anthropic API key if not already set 
if not os.getenv('ANTHROPIC_API_KEY'):
    anthropic_client = anthropic.Anthropic(api_key=getpass.getpass('Enter Anthropic API key: '))
else:
    anthropic_client = anthropic.Anthropic(api_key=os.getenv('ANTHROPIC_API_KEY'))

## Part 1: OpenAI Automatic Prompt Caching

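OpenAI automatically caches prompts of 1,024 tokens or more, with no code changes required. Cache hits are counted in 128-token increments and reported in `usage.prompt_tokens_details.cached_tokens`. Let's create a function that makes a call and prints that field: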

In [15]:
def check_openai_caching(system_prompt, user_prompt):
    """Make an OpenAI API call and check cache usage"""
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    
    usage = response.usage
    print(f"Total tokens: {usage.total_tokens}")
    print(f"Prompt tokens: {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")
    
    # Check if prompt_tokens_details exists and has cached_tokens
    if hasattr(usage, 'prompt_tokens_details'):
        if isinstance(usage.prompt_tokens_details, dict):
            cached_tokens = usage.prompt_tokens_details.get('cached_tokens', 0)
        else:
            cached_tokens = getattr(usage.prompt_tokens_details, 'cached_tokens', 0)
    else:
        cached_tokens = 0
    
    print(f"Cached tokens: {cached_tokens}")
    
    return response

Let's test it with a long system prompt:

In [16]:
# Create a long system prompt (>1024 tokens)
long_system_prompt = """You are an AI assistant specialized in explaining complex topics. Here are 20 examples of good explanations that demonstrate the ideal structure, depth, and clarity we aim for:

1. EXAMPLE - Quantum Computing:
When explaining quantum computing, start with classical bits (0s and 1s) and contrast them with quantum bits (qubits) that can exist in multiple states simultaneously through superposition. Use the analogy of a coin: a classical bit is like a coin lying flat (definitely heads or tails), while a qubit is like a spinning coin (simultaneously heads and tails until observed). Emphasize how quantum entanglement allows qubits to be interconnected in ways classical bits cannot, exponentially increasing computational power for specific tasks.

2. EXAMPLE - Photosynthesis:
Break down photosynthesis into its key stages: light-dependent reactions and the Calvin cycle. Describe how chlorophyll molecules capture sunlight energy, converting it into chemical energy (ATP and NADPH). Then explain how this energy is used to convert CO2 into glucose through the Calvin cycle. Use the analogy of a solar-powered factory where sunlight is the power source (light-dependent reactions) and the assembly line (Calvin cycle) builds sugar molecules from carbon dioxide.

3. EXAMPLE - Special Relativity:
Begin with Einstein's two postulates: the constancy of light speed and the equivalence of physical laws in all inertial reference frames. Use the classic train-and-platform thought experiment to illustrate simultaneity. Explain time dilation using the light clock analogy, showing why moving clocks tick more slowly from a stationary observer's perspective. Connect these concepts to the famous equation E=mc², explaining how mass and energy are different forms of the same thing.

4. EXAMPLE - Neural Networks:
Compare artificial neural networks to biological brains: neurons are like nodes, synapses like weighted connections. Explain how information flows from input layer through hidden layers to output, with each connection being strengthened or weakened during training (like building muscle memory). Use the analogy of a complex voting system where each neuron "votes" on the final output, with their votes weighted by connection strengths learned from training data.

5. EXAMPLE - Climate Change:
Start with the greenhouse effect basics: certain gases (CO2, methane) trap heat like glass in a greenhouse. Explain how human activities increase these gases' concentration, enhancing the natural greenhouse effect. Use the analogy of adding blankets to a bed - each additional blanket (greenhouse gas) makes it harder for heat to escape. Connect this to observable changes: rising temperatures, sea levels, and extreme weather events.

6. EXAMPLE - DNA Replication:
Compare DNA replication to unzipping a zipper and creating two new zippers, each with one old side and one new side. Detail how helicase "unzips" the double helix, then polymerase adds complementary nucleotides to each strand. Emphasize the importance of base pairing (A-T, C-G) and how this ensures accurate copying. Use the analogy of a document being photocopied, where each half serves as a template for rebuilding the whole.

7. EXAMPLE - Black Holes:
Describe black holes as regions where gravity is so strong that nothing, not even light, can escape beyond the event horizon. Use the analogy of a river flowing toward a waterfall - past a certain point (event horizon), the current is too strong for anything to swim back upstream. Explain how black holes form from collapsed massive stars and how they affect space-time, like a heavy ball creating a deep depression in a stretched rubber sheet.

8. EXAMPLE - Blockchain:
Compare blockchain to a public ledger that everyone can see and verify but nobody can alter without consensus. Each block contains transactions and links to the previous block (like a chain of sealed evidence bags in a criminal case). Explain how mining works through the analogy of a complex puzzle that requires significant effort to solve but is easy to verify once solved. Emphasize how this system creates trust without requiring a central authority.

9. EXAMPLE - Immune System:
Describe the immune system as a sophisticated defense network with multiple layers: physical barriers (skin, mucus), general defenders (white blood cells), and specialized units (antibodies). Use the analogy of a castle's defense system with walls (barriers), guards (innate immunity), and specially trained soldiers (adaptive immunity). Explain how vaccines work by training these defenses to recognize specific threats before they become dangerous.

10. EXAMPLE - Evolution:
Explain natural selection through the analogy of a sieve: environmental pressures "filter out" less advantageous traits while allowing beneficial ones to pass through to future generations. Describe how random mutations provide the raw material for selection, like a lottery where most tickets (mutations) lose but occasional winners spread through the population. Emphasize the timescale involved and how small changes accumulate over many generations.

[Examples 11-20 continue with similarly detailed explanations of: Quantum Entanglement, The Theory of Plate Tectonics, The Human Digestive System, The Big Bang Theory, Machine Learning Algorithms, Cellular Respiration, Wave-Particle Duality, The Human Nervous System, Chemical Bonding, and Nuclear Fusion...]

For each explanation, remember to:
1. Start with familiar concepts and build toward complex ones
2. Use clear, vivid analogies that connect to everyday experience
3. Break down complex processes into understandable steps
4. Address common misconceptions proactively
5. Provide real-world applications and examples
6. Maintain scientific accuracy while ensuring accessibility
7. Include relevant visualizations or diagrams when helpful
8. Connect the topic to related fields and broader implications

When responding to questions:
- First assess the questioner's current understanding level
- Choose appropriate analogies from their field of knowledge
- Build explanations iteratively, checking for comprehension
- Highlight key concepts and their interconnections
- Anticipate and address likely follow-up questions
- Provide concrete examples and practical applications
- Maintain engagement through conversational tone
- End with a clear summary of main points"""
# Note: the prompt above is roughly 1,200 tokens, enough to meet the 1,024-token caching minimum

# First call - should show no caching
print("First call:")
response1 = check_openai_caching(long_system_prompt, "Explain quantum computing")

time.sleep(2)  # Small delay between calls

# Second call - should show caching
print("\nSecond call:")
response2 = check_openai_caching(long_system_prompt, "Explain neural networks")

First call:
Total tokens: 1811
Prompt tokens: 1232
Completion tokens: 579
Cached tokens: 0

Second call:
Total tokens: 1830
Prompt tokens: 1232
Completion tokens: 598
Cached tokens: 1024
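
To make the output above easier to compare across calls, it can help to express cached tokens as a share of the prompt. The following is a minimal sketch, not part of the OpenAI SDK: `cached_fraction` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in for `response.usage` populated with the numbers recorded in the second call.

```python
from types import SimpleNamespace


def cached_fraction(usage):
    """Return the share of prompt tokens served from cache (0.0 to 1.0)."""
    details = getattr(usage, "prompt_tokens_details", None)
    if details is None:
        cached = 0
    elif isinstance(details, dict):
        cached = details.get("cached_tokens", 0) or 0
    else:
        cached = getattr(details, "cached_tokens", 0) or 0

    prompt_tokens = getattr(usage, "prompt_tokens", 0) or 0
    return cached / prompt_tokens if prompt_tokens else 0.0


# Stand-in for response.usage (assumption: same attribute names as the real object)
second_call = SimpleNamespace(
    prompt_tokens=1232,
    prompt_tokens_details=SimpleNamespace(cached_tokens=1024),
)
print(f"Cached share of prompt: {cached_fraction(second_call):.1%}")  # prints 83.1%
```

In a live notebook you would pass `response2.usage` instead of the stand-in object.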


## Part 2: Anthropic Manual Prompt Caching

With Anthropic, we can explicitly control what gets cached using the `cache_control` parameter:

In [25]:
def create_anthropic_cached_message(system_content, user_message):
    """Create a message with cached system content in Anthropic"""
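    # Prompt caching was in beta when this notebook was written. In newer SDK
    # versions the same arguments should work with anthropic_client.messages.create(...).
    response = anthropic_client.beta.prompt_caching.messages.create(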
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": system_content,
                "cache_control": {"type": "ephemeral"}
            }
        ],
        messages=[{"role": "user", "content": user_message}]
    )
    
    # Print usage statistics
    usage = response.usage
    print(usage)
    print(f"Cache creation tokens: {usage.cache_creation_input_tokens}")
    print(f"Cache read tokens: {usage.cache_read_input_tokens}")
    print(f"Regular input tokens: {usage.input_tokens}")
    print(f"Output tokens: {usage.output_tokens}")
    
    return response

In [26]:
# Test Anthropic caching - using the same system prompt as the OpenAI example
long_context = long_system_prompt

print("First call (should create cache):")
response1 = create_anthropic_cached_message(long_context, "Analyze this text.")

time.sleep(2)

print("\nSecond call (should use cache):")
response2 = create_anthropic_cached_message(long_context, "Summarize the main points.")

First call (should create cache):
PromptCachingBetaUsage(cache_creation_input_tokens=0, cache_read_input_tokens=1348, input_tokens=12, output_tokens=385)
Cache creation tokens: 0
Cache read tokens: 1348
Regular input tokens: 12
Output tokens: 385

Second call (should use cache):
PromptCachingBetaUsage(cache_creation_input_tokens=0, cache_read_input_tokens=1348, input_tokens=14, output_tokens=285)
Cache creation tokens: 0
Cache read tokens: 1348
Regular input tokens: 14
Output tokens: 285
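
Note that in the recorded output above both calls report `cache_read_input_tokens=1348` and `cache_creation_input_tokens=0`. This suggests the prefix was already cached when the first call ran, most likely because the cell had been executed shortly before, within the cache lifetime. On a truly cold first call you would expect the creation count to be non-zero and the read count to be zero. A small sketch for labelling each call follows; `describe_cache_usage` and its labels are hypothetical, and the `SimpleNamespace` objects stand in for `response.usage`.

```python
from types import SimpleNamespace


def describe_cache_usage(usage):
    """Label a call by how it interacted with the prompt cache."""
    created = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0

    if read and created:
        return "partial hit (read part of the prefix, wrote the rest)"
    if read:
        return "cache hit"
    if created:
        return "cache write"
    return "no caching"


# Stand-ins for response.usage, using the field names printed above
recorded_first_call = SimpleNamespace(
    cache_creation_input_tokens=0, cache_read_input_tokens=1348
)
cold_first_call = SimpleNamespace(
    cache_creation_input_tokens=1348, cache_read_input_tokens=0
)

print(describe_cache_usage(recorded_first_call))  # prints "cache hit"
print(describe_cache_usage(cold_first_call))      # prints "cache write"
```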


## Part 3: Cost Analysis

Let's create a function to calculate the cost savings from caching:

In [28]:
def calculate_cost_savings(cached_tokens, model="claude-3-5-sonnet-latest"):
    """Calculate input-cost savings when `cached_tokens` are read from cache"""
    # Prices in USD per million tokens (MTok)
    prices = {
        "claude-3-5-sonnet-latest": {
            "base_input": 3.00,  # $3/MTok
            "cache_read": 0.30,  # $0.30/MTok
        },
        "gpt-4o": {
            "base_input": 2.50,  # $2.50/MTok
            "cache_read": 1.25,  # $1.25/MTok
        }
    }
    
    model_prices = prices[model]
    # Convert the per-MTok prices into a dollar cost for this token count
    base_cost = cached_tokens * model_prices["base_input"] / 1_000_000
    cached_cost = cached_tokens * model_prices["cache_read"] / 1_000_000
    savings = base_cost - cached_cost
    
    print(f"Model: {model}")
    print(f"Base cost for {cached_tokens} tokens: ${base_cost:.4f}")
    print(f"Cost with caching: ${cached_cost:.4f}")
    print(f"Savings: ${savings:.4f} ({(savings/base_cost)*100:.1f}%)")
    
    return savings

In [29]:
# Example cost analysis for 10,000 cached tokens
print("Anthropic Claude 3.5 Sonnet:")
anthropic_savings = calculate_cost_savings(10000, "claude-3-5-sonnet-latest")

print("\nOpenAI GPT-4o:")
openai_savings = calculate_cost_savings(10000, "gpt-4o")

Anthropic Claude 3.5 Sonnet:
Model: claude-3-5-sonnet-latest
Base cost for 10000 tokens: $0.0300
Cost with caching: $0.0030
Savings: $0.0270 (90.0%)

OpenAI GPT-4o:
Model: gpt-4o
Base cost for 10000 tokens: $0.0250
Cost with caching: $0.0125
Savings: $0.0125 (50.0%)
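
The comparison above only looks at cache reads. With Anthropic, writing to the cache is billed at a premium over the base input price, so a single call with caching enabled costs slightly more than one without. The sketch below estimates the total input cost of a shared prefix over several calls. It assumes a 1.25x write multiplier and a 0.10x read multiplier on a $3/MTok base price (my understanding of the published Claude 3.5 Sonnet pricing, so check current rates), and that every call lands within the cache lifetime. `prefix_cost` is a hypothetical helper.

```python
def prefix_cost(prefix_tokens, calls, cached=True,
                base_per_mtok=3.00, write_multiplier=1.25, read_multiplier=0.10):
    """Estimate the USD input cost of sending the same prefix `calls` times.

    Assumes all calls happen within the cache lifetime, so only the first
    call pays the cache-write price and every later call pays the read price.
    """
    per_token = base_per_mtok / 1_000_000
    if calls <= 0:
        return 0.0
    if not cached:
        return prefix_tokens * per_token * calls

    write_cost = prefix_tokens * per_token * write_multiplier
    read_cost = prefix_tokens * per_token * read_multiplier * (calls - 1)
    return write_cost + read_cost


for n in (1, 2, 10):
    with_cache = prefix_cost(10_000, n, cached=True)
    without_cache = prefix_cost(10_000, n, cached=False)
    print(f"{n:>2} calls: cached ${with_cache:.4f} vs uncached ${without_cache:.4f}")
```

Under these assumptions caching costs more for a single call ($0.0375 vs $0.0300) and pays for itself from the second call onward ($0.0405 vs $0.0600).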


## Best Practices and Tips

1. **Minimum Token Requirements**:
   - OpenAI: 1,024 tokens or more
   - Anthropic: at least 1,024 tokens (Sonnet/Opus) or 2,048 tokens (Haiku)

2. **Cache Lifetime**:
   - OpenAI: typically 5-10 minutes of inactivity
   - Anthropic: 5 minutes by default, refreshed each time the cache is read

3. **Optimization Strategies**:
   - Put static content (instructions, context, examples) at the beginning
   - Keep dynamic content separate from cached content
   - Monitor cache hit rates to optimize your implementation

4. **Common Use Cases**:
   - Long system prompts with examples
   - Document analysis with consistent context
   - Multi-turn conversations with stable history
   - Code analysis with repository context
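
The optimization strategies above (static content first, dynamic content last, and monitoring hit rates) can be sketched as follows. This is an illustrative sketch only: `build_cached_request` and `CacheHitTracker` are hypothetical names, and the request dictionary simply mirrors the arguments used in the Anthropic example earlier in this notebook.

```python
def build_cached_request(static_context, user_question,
                         model="claude-3-5-sonnet-latest", max_tokens=1024):
    """Build request arguments with the static prefix first and marked cacheable."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # Static instructions/examples go first so the cached prefix stays identical
        "system": [
            {
                "type": "text",
                "text": static_context,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Dynamic content goes last, outside the cached prefix
        "messages": [{"role": "user", "content": user_question}],
    }


class CacheHitTracker:
    """Accumulate token counts across calls to estimate a cache hit rate."""

    def __init__(self):
        self.cached_tokens = 0
        self.total_input_tokens = 0

    def record(self, cache_read_tokens, cache_creation_tokens, uncached_input_tokens):
        self.cached_tokens += cache_read_tokens
        self.total_input_tokens += (
            cache_read_tokens + cache_creation_tokens + uncached_input_tokens
        )

    @property
    def hit_rate(self):
        if not self.total_input_tokens:
            return 0.0
        return self.cached_tokens / self.total_input_tokens


request_args = build_cached_request("<long static instructions>", "Explain black holes")

tracker = CacheHitTracker()
tracker.record(cache_read_tokens=1348, cache_creation_tokens=0, uncached_input_tokens=12)
tracker.record(cache_read_tokens=1348, cache_creation_tokens=0, uncached_input_tokens=14)
print(f"Hit rate: {tracker.hit_rate:.1%}")
```

The dictionary could then be unpacked into the client call, for example `anthropic_client.messages.create(**request_args)`, with the usage fields of each response fed into `tracker.record(...)`.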