# Advanced LLM API Techniques

**Prerequisites:** Complete the Day1 notebook `llm_basics.ipynb` first; this notebook builds on the fundamental concepts covered there.

This notebook covers advanced techniques for working with Large Language Model APIs, including:

- Streaming responses for better user experience
- Exploring available models across providers
- Building production-ready applications
- Managing complex prompts and conversations
- Performance optimization techniques

We'll continue using OpenRouter as our API gateway to access models from multiple providers.
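As a refresher on what a helper like `call_openrouter` has to send, here is a minimal sketch that only *assembles* an OpenAI-compatible chat completion request for OpenRouter and does not send it. `build_chat_request` is hypothetical. Turning `system_prompt` into a leading system message is an assumption about how `api_utils` behaves, and its real implementation may differ.

```python
import json
import os

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(prompt, model, system_prompt=None, stream=False, **params):
    """Assemble (url, headers, body) for an OpenAI-compatible chat completion (hypothetical helper)."""
    # Accept either a plain string or a ready-made list of messages
    if isinstance(prompt, str):
        messages = [{"role": "user", "content": prompt}]
    else:
        messages = list(prompt)  # copy so the caller's list is not mutated
    # Assumption: a system prompt becomes the first message
    if system_prompt:
        messages.insert(0, {"role": "system", "content": system_prompt})
    headers = {
        "Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    # Extra sampling parameters (temperature, max_tokens, ...) pass straight through
    body = {"model": model, "messages": messages, "stream": stream, **params}
    return OPENROUTER_URL, headers, body

url, headers, body = build_chat_request(
    "How do I solve 2x + 5 = 13?",
    model="openai/gpt-4o-mini-2024-07-18",
    system_prompt="You are a patient math tutor.",
    temperature=0.3,
    max_tokens=300,
)
print(json.dumps(body, indent=2))
```

To actually send it, you could hand these values to an HTTP client, for example `requests.post(url, headers=headers, json=body, stream=body["stream"])`.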

## Prerequisites Check

Since this notebook builds on Day1 concepts, let's quickly verify your environment is set up:

In [None]:
# Quick environment check - assumes Day1 setup completed
import os
from dotenv import load_dotenv
import sys
sys.path.append('.')

# Load environment variables
load_dotenv()

# Import our API utilities (from Day1 setup)
try:
    from api_utils import (
        call_openrouter,
        extract_text_response,
        process_streaming_response,
        get_available_models
    )
    print("✅ API utilities loaded successfully!")
except ImportError as e:
    print(f"❌ Error importing API utilities: {e}")
    print("Please ensure you've completed Day1 setup and api_utils.py is available.")

# Import display utilities for markdown rendering
from IPython.display import Markdown, display

# Verify API key
if os.getenv("OPENROUTER_API_KEY"):
    print("✅ API key found!")
else:
    print("❌ API key not found! Please check your .env file from Day1.")

## 1. Streaming Responses

Building on Day1's basic API calls, let's explore streaming, a crucial technique for responsive applications. Instead of waiting for the complete response, you receive it in chunks as the model generates tokens. Users see the first words almost immediately, even though the total generation time is typically about the same.
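`process_streaming_response` from `api_utils` hides the wire format. OpenRouter uses the OpenAI-compatible streaming format. The server sends server-sent events whose `data:` lines carry JSON chunks, with new text in `choices[].delta.content`, and the stream ends with `data: [DONE]`. The sketch below shows roughly what such a helper has to do. `iter_sse_deltas` is hypothetical and works on already-decoded text lines. The sample stream is made up, and the real helper may differ.

```python
import json
from typing import Iterable, Iterator

def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent event (SSE) lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank event separators and SSE comment lines (they start with ':')
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The stream ends with a sentinel rather than a JSON object
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        for choice in event.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:  # the first chunk often carries only the role, no content
                yield text

sample_stream = [
    ": keep-alive",
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Quantum "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "bits"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(sample_stream)))  # → Quantum bits
```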

In [None]:
# Import IPython display utilities for interactive streaming
from IPython.display import clear_output
import time

# Make a streaming API call
streaming_response = call_openrouter(
    prompt="Explain quantum computing to a high school student, step by step",
    model="openai/gpt-4o-mini-2024-07-18",
    temperature=0.7,
    max_tokens=300,
    stream=True  # Enable streaming
)

if streaming_response.get("success", False):
    # Process the streaming response
    collected_response = ""
    
    print("Streaming response:")
    for chunk in process_streaming_response(streaming_response):
        collected_response += chunk
        # Clear previous output and show the updated response
        clear_output(wait=True)
        print("Streaming response:")
        display(Markdown(collected_response))
        # Small delay to make the streaming visible
        time.sleep(0.01)
else:
    print(f"Error: {streaming_response.get('error', 'Unknown error')}")

## 2. Available Models through OpenRouter

OpenRouter provides access to models from multiple providers. Let's explore what's available and understand the ecosystem:

In [None]:
# Get available models
available_models = get_available_models()

# Group models by provider for easier viewing
models_by_provider = {}
for model in available_models:
    model_id = model.get('id', '')
    provider = model_id.split('/')[0] if '/' in model_id else 'unknown'
    models_by_provider.setdefault(provider, []).append(model_id)

# Show top providers with sample models
top_providers = ['openai', 'anthropic', 'meta-llama', 'google', 'deepseek']

# Build formatted output with markdown
output_text = "## Sample models by provider:\n\n"

for provider in top_providers:
    if provider in models_by_provider:
        models = sorted(models_by_provider[provider])[:3]  # First 3 alphabetically
        output_text += f"### {provider.upper()}\n"
        for model in models:
            output_text += f"- `{model}`\n"
        if len(models_by_provider[provider]) > 3:
            output_text += f"- *... and {len(models_by_provider[provider]) - 3} more models*\n"
        output_text += "\n"

output_text += f"**Total models available:** {len(available_models)}\n\n"
output_text += "*For advanced model selection and comparison, see 02_model_selection_parameter_tuning.ipynb*"

# Display with simple markdown rendering
display(Markdown(output_text))
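Beyond grouping by provider, you may want to shortlist models by context window and price. This sketch assumes each entry returned by `get_available_models()` has a `context_length` and a `pricing` dict with per-token prices as strings, which is how OpenRouter's models endpoint reports them. Check your own data before relying on it. The `cheapest_models` helper and the sample entries are hypothetical.

```python
from typing import Any, Dict, List

def cheapest_models(models: List[Dict[str, Any]], min_context: int, limit: int = 3) -> List[str]:
    """Return IDs of the cheapest models (by prompt price) with at least `min_context` tokens."""
    eligible = [
        m for m in models
        if (m.get("context_length") or 0) >= min_context and m.get("pricing")
    ]
    # Prices are assumed to be strings (USD per token); a missing price sorts last
    eligible.sort(key=lambda m: float(m["pricing"].get("prompt", "inf")))
    return [m["id"] for m in eligible[:limit]]

# Hypothetical sample entries shaped like the assumed models list
sample_models = [
    {"id": "provider-a/large", "context_length": 128000, "pricing": {"prompt": "0.000005"}},
    {"id": "provider-b/small", "context_length": 8192, "pricing": {"prompt": "0.0000001"}},
    {"id": "provider-c/medium", "context_length": 200000, "pricing": {"prompt": "0.000001"}},
]
print(cheapest_models(sample_models, min_context=100_000))
# → ['provider-c/medium', 'provider-a/large']
```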

## 3. Interactive Applications

Let's build a simple but complete application: an interactive Q&A bot that keeps conversation history across turns.

In [None]:
# Interactive Q&A bot with basic conversation memory
def qa_bot(model="openai/gpt-4o-mini-2024-07-18"):
    """Simple interactive Q&A bot using the specified model."""
    # Initialize conversation with a system prompt
    conversation = [
        {"role": "system", "content": "You are a helpful AI assistant that provides clear and concise answers."}
    ]
    
    print(f"🤖 Q&A Bot (using {model})")
    print("Type 'exit' to end the conversation.")
    print("This demonstrates basic conversation management and API integration.")
    print("\nTry asking: 'What is machine learning?' or 'How do neural networks work?'")
    
    while True:
        # Get user input
        user_input = input("\nYou: ")
        if user_input.strip().lower() == 'exit':
            print("\n🤖 Goodbye! Have a great day!")
            break
        
        # Add user input to conversation
        conversation.append({"role": "user", "content": user_input})
        
        # Call the API
        response = call_openrouter(
            prompt=conversation,
            model=model,
            temperature=0.7,
            max_tokens=500
        )
        
        if response.get("success", False):
            answer = extract_text_response(response)
            print("\nBot:")
            display(Markdown(answer))
            
            # Add assistant's response to conversation history
            conversation.append({"role": "assistant", "content": answer})
            
            # Keep conversation manageable (basic memory management)
            if len(conversation) > 11:  # Keep system + last 10 messages
                conversation = [conversation[0]] + conversation[-10:]
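        else:
            # Drop the unanswered question so the history keeps alternating user/assistant turns
            conversation.pop()
            print(f"\nError: {response.get('error', 'Unknown error')}")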
            print("For comprehensive error handling patterns, see 04_error_handling_for_api_calls.ipynb")

# Run the interactive bot (uncomment the line below to use)
# qa_bot()

print("💡 Note: To run the interactive Q&A bot, uncomment the line above and run this cell.")
print("   The bot uses input() which works best when run interactively in Jupyter.")
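The trimming step inside `qa_bot` (system prompt plus the last 10 messages) can be factored into a small helper and tested without calling the API. This sketch, `trim_history`, is hypothetical. It also refuses to keep an assistant reply as the first history entry, which a fixed slice could do if the history ever became unbalanced.

```python
def trim_history(conversation, max_turns=5):
    """Keep the system prompt plus the most recent `max_turns` user/assistant pairs."""
    # Preserve a leading system message, if there is one
    system = [m for m in conversation[:1] if m["role"] == "system"]
    history = conversation[len(system):]
    kept = history[-2 * max_turns:]
    # Never start the kept history with an orphaned assistant reply
    while kept and kept[0]["role"] != "user":
        kept = kept[1:]
    return system + kept

conversation = [{"role": "system", "content": "You are helpful."}]
for i in range(7):
    conversation.append({"role": "user", "content": f"q{i}"})
    conversation.append({"role": "assistant", "content": f"a{i}"})

trimmed = trim_history(conversation, max_turns=5)
print(len(trimmed), trimmed[1]["content"])  # → 11 q2
```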

## 4. Structured Prompt Management

For real applications, keeping prompts in external files is more maintainable than hardcoding them, because prompts can then be updated and version-controlled without touching application code:

In [None]:
# Load and demonstrate structured prompts
import json

try:
    # Load sample prompts from JSON file
    with open('example_prompts.json', 'r', encoding='utf-8') as f:
        sample_prompts = json.load(f)
    
    print("Available prompt categories:")
    for category, prompts in sample_prompts.items():
        print(f"- {category}: {len(prompts)} examples")
    
    # Demo: Use a system prompt from the JSON file
    if "system_prompt_examples" in sample_prompts:
        math_tutor = sample_prompts["system_prompt_examples"][1]  # Assumes the 2nd entry is the math tutor
        print(f"\nUsing system prompt: {math_tutor['title']}")
        
        # Test it with a math question
        response = call_openrouter(
            prompt="How do I solve 2x + 5 = 13?",
            model="openai/gpt-4o-mini-2024-07-18",
            system_prompt=math_tutor['system_prompt'],
            temperature=0.3,
            max_tokens=300
        )
        
        if response.get("success"):
            print("\nMath Tutor Response:")
            display(Markdown(extract_text_response(response)))
        else:
            print(f"Error: {response.get('error')}")
    
    print("\nFor advanced prompt engineering and optimization, see 02_model_selection_parameter_tuning.ipynb")
    
except FileNotFoundError:
    print("example_prompts.json not found. Add it next to this notebook to try external prompt management.")
    print("In production, store prompts in external files for easy updates and version control")
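Selecting a prompt by list position (`[1]`) breaks as soon as someone reorders the file. A sturdier option is to look prompts up by title. This sketch assumes the JSON layout used above: each category maps to a list of objects with `title` and `system_prompt` keys. The `find_prompt` helper and the sample titles are hypothetical.

```python
import json

def find_prompt(prompts, category, keyword):
    """Return the first prompt in `category` whose title contains `keyword` (case-insensitive)."""
    for entry in prompts.get(category, []):
        if keyword.lower() in entry.get("title", "").lower():
            return entry
    raise KeyError(f"No prompt matching {keyword!r} in {category!r}")

# Hypothetical file contents in the assumed layout
sample_json = """
{
  "system_prompt_examples": [
    {"title": "Code Reviewer", "system_prompt": "You review Python code."},
    {"title": "Math Tutor", "system_prompt": "You explain math step by step."}
  ]
}
"""
prompts = json.loads(sample_json)
tutor = find_prompt(prompts, "system_prompt_examples", "math")
print(tutor["system_prompt"])  # → You explain math step by step.
```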

## 5. Hands-On Exercises

Practice these fundamental techniques to prepare for advanced notebooks:

**Exercise 1**: Implement streaming with different models

In [None]:
# Try streaming responses from different providers (Claude, Gemini, etc.)

claude_streaming = call_openrouter(
    prompt="Compare Python and JavaScript for web development",
    model="anthropic/claude-3.5-haiku",
    temperature=0.7,
    max_tokens=400,
    stream=True
)

# Process the streaming response and compare the experience
# Your code here...

**Exercise 2**: Streaming Performance Analysis

In [None]:
# Compare streaming vs non-streaming response times and user experience

import time

def analyze_streaming_performance(prompt, model="openai/gpt-4o-mini-2024-07-18"):
    """
    Compare streaming vs non-streaming performance for the same prompt.
    
    Args:
        prompt: The prompt to test
        model: Model to use for testing
    
    Returns:
        Dictionary with performance metrics
    """
    results = {}
    
    # Test non-streaming (perf_counter is monotonic and high-resolution, suited to timing)
    print("Testing non-streaming...")
    start_time = time.perf_counter()
    non_streaming = call_openrouter(prompt=prompt, model=model, max_tokens=200, stream=False)
    end_time = time.perf_counter()
    
    if non_streaming.get("success"):
        results["non_streaming"] = {
            "total_time": end_time - start_time,
            "response": extract_text_response(non_streaming)[:100] + "...",
            "time_to_first_token": end_time - start_time  # All at once
        }
    
    # Test streaming
    print("Testing streaming...")
    start_time = time.perf_counter()
    first_token_time = None
    streaming = call_openrouter(prompt=prompt, model=model, max_tokens=200, stream=True)
    
    if streaming.get("success"):
        collected = ""
        for chunk in process_streaming_response(streaming):
            # Record when the first chunk arrives
            if first_token_time is None:
                first_token_time = time.perf_counter()
            collected += chunk
        end_time = time.perf_counter()
        
        results["streaming"] = {
            "total_time": end_time - start_time,
            "time_to_first_token": first_token_time - start_time if first_token_time else 0,
            "response": collected[:100] + "...",
            "perceived_responsiveness": first_token_time - start_time if first_token_time else 0
        }
    
    return results
    

# Test the performance analysis:
performance_results = analyze_streaming_performance(
    "Write a short story about a robot learning to paint"
)

print("Performance Comparison:")
for method, metrics in performance_results.items():
    print(f"\n{method.upper()}:")
    print(f"  Time to first token: {metrics.get('time_to_first_token', 0):.2f}s")
    print(f"  Total time: {metrics['total_time']:.2f}s")
    print(f"  Response preview: {metrics['response']}")

# Your task: Run the analysis and discuss when streaming is most beneficial
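To check the timing logic itself without spending API calls, you can separate measurement from the request and feed it a simulated stream. `measure_stream` and `fake_stream` are hypothetical, and the simulated delays are arbitrary. The optional `start` argument lets you start the clock *before* sending a real request, so time to first token includes network and queueing latency.

```python
import time
from typing import Iterable, Optional, Tuple

def measure_stream(chunks: Iterable[str], start: Optional[float] = None) -> Tuple[str, float, float]:
    """Consume a stream and return (text, time_to_first_chunk, total_time) in seconds."""
    # Start the clock now unless the caller started it before sending the request
    if start is None:
        start = time.perf_counter()
    first: Optional[float] = None
    parts = []
    for chunk in chunks:
        # Ignore empty chunks when timing the first real token
        if first is None and chunk:
            first = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    # With no non-empty chunks, fall back to the total time
    return "".join(parts), (first if first is not None else total), total

def fake_stream():
    """Simulated stream: about 50 ms before the first chunk, then 10 ms between chunks."""
    time.sleep(0.05)
    for piece in ["A robot ", "picks up ", "a brush."]:
        yield piece
        time.sleep(0.01)

text, ttft, total = measure_stream(fake_stream())
print(f"TTFT {ttft:.3f}s, total {total:.3f}s: {text}")
```

With a real request, start the clock before calling the API: `start = time.perf_counter()`, then `resp = call_openrouter(..., stream=True)`, then `measure_stream(process_streaming_response(resp), start=start)`.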

## 6. Summary & Next Steps

In this foundational notebook, we covered the essential API techniques that bridge Day1 concepts with production applications:

✅ **Streaming responses** - For responsive user experiences  
✅ **Model exploration** - Understanding the available ecosystem  
✅ **Interactive applications** - Basic conversation management  
✅ **Structured prompts** - External prompt management  
✅ **Performance analysis** - Streaming vs non-streaming comparison  
✅ **Output limits** - Capping response length with `max_tokens`  

### Continue Your Learning Journey:

**Next recommended notebooks:**
- **02_model_selection_parameter_tuning.ipynb** - Advanced model optimization and comparison
- **03_token_management.ipynb** - Production-scale cost optimization and memory management  
- **04_error_handling_for_api_calls.ipynb** - Robust error handling and resilience patterns
- **asyncio_tutorial.ipynb** - Performance optimization with concurrent API calls

These fundamentals prepare you for building production-ready LLM applications!