In [None]:
print('Setup complete.')

# Provider Landscape & Limits - Lab

**Hands-on**: switch between two providers and compare latency and token usage
**Deliverable**: a two-row comparison log

## Instructions

In this lab, you will compare two AI providers by measuring their latency and token usage for the same prompt. You'll create a systematic comparison that highlights the performance differences between providers.

## Success Criteria
- Successfully connect to 2 different AI providers
- Measure response latency for identical prompts
- Count input and output tokens for cost analysis
- Create a structured comparison log
- Document findings in a clear two-row format

## Recommended Provider Pairs
- OpenAI GPT-3.5-turbo vs Anthropic Claude 3 Haiku (speed comparison)
- OpenAI GPT-4 vs Google Gemini Pro (quality vs cost)
- Any cloud provider vs local model (latency vs privacy)

In [None]:
# TODO: Install required packages for Google Colab compatibility
# Install: openai, anthropic, google-generativeai, tiktoken, langchain, langchain-openai, langchain-anthropic, langchain-google-genai
# Also install: pandas for data handling (time and datetime ship with Python and need no install)

# TODO: Import necessary modules
# Import time, datetime for latency measurement
# Import os for environment variables
# Import pandas for creating comparison tables
# Import the specific provider clients you plan to test
# Import tiktoken or similar for token counting (OpenAI)

# TODO: Print confirmation that all packages are installed

## Step 1: Configure Provider Credentials

Set up API keys and initialize clients for your chosen providers.
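
As a starting point for the cell that follows, here is a minimal sketch of a key check. It assumes the keys live in environment variables. `ensure_api_key` is a hypothetical helper name, not part of any provider SDK. It only confirms that a key is present, and can optionally prompt for one with `getpass` so the value is not echoed into the notebook.

```python
import os
from getpass import getpass


def ensure_api_key(var_name, prompt_if_missing=False):
    """Return True if var_name is set; optionally prompt for it (hypothetical helper)."""
    key = os.environ.get(var_name, "")
    if not key and prompt_if_missing:
        # getpass hides the typed value, which keeps keys out of notebook output.
        key = getpass(f"Enter {var_name}: ")
        if key:
            os.environ[var_name] = key
    return bool(key)


# Report which of the keys named in this lab are available, without printing them.
for name in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    print(f"{name}: {'set' if ensure_api_key(name) else 'missing'}")
```

Pass `prompt_if_missing=True` when running interactively, for example in Colab, to be asked for any key that is not already set.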

In [None]:
# TODO: Set up environment variables for API keys
# Use os.environ to set OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY as needed
# Or prompt user to input their keys securely

# TODO: Initialize clients for your two chosen providers
# Example: ChatOpenAI, ChatAnthropic, ChatGoogleGenerativeAI
# Set consistent parameters: temperature=0 for reproducible results
# Store clients in variables like provider_1 and provider_2

# TODO: Print confirmation of successful client initialization
# Include which models/providers you're comparing

## Step 2: Define Test Prompts

Create standardized prompts for consistent comparison.

In [None]:
# TODO: Define 3-5 test prompts of varying complexity
# Include:
# - Short prompt (10-20 words): simple question
# - Medium prompt (50-100 words): explanation request  
# - Long prompt (200+ words): complex analysis task
# - Code generation prompt: ask for a specific function
# - Creative prompt: story or creative writing task

# TODO: Store prompts in a list for easy iteration
# TODO: Print the prompts with their expected complexity levels

## Step 3: Token Counting Setup

Implement token counting for cost analysis.
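
One possible shape for the counting function is sketched here. It assumes `tiktoken` may or may not be installed and falls back to the word-based estimate in either case. The `round` on the estimate is an illustrative choice, and the estimate itself is only a rough heuristic, not a real tokenizer.

```python
def count_tokens(text, provider_name):
    """Count tokens with tiktoken for OpenAI, else fall back to a word-based estimate."""
    if provider_name.lower() == "openai":
        try:
            import tiktoken  # optional dependency; may be missing or offline

            encoding = tiktoken.get_encoding("cl100k_base")
            return len(encoding.encode(text))
        except Exception:
            pass  # fall through to the rough estimate below
    # Rough heuristic from this lab: about 1.3 tokens per whitespace-separated word.
    return round(len(text.split()) * 1.3)


sample = "one two three four five six seven eight nine ten"
print("openai:", count_tokens(sample, "openai"))
print("anthropic (estimate):", count_tokens(sample, "anthropic"))
```

Because the two code paths count differently, treat cross-provider token totals as approximate unless both sides use counts reported by the providers themselves.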

In [None]:
# TODO: Create a function to count tokens for different providers
# For OpenAI: use tiktoken with the appropriate encoding (cl100k_base for GPT-3.5/4)
# For other providers: prefer the usage counts reported in the API response when available;
# otherwise use len(text.split()) * 1.3 as a rough estimate
# Function should take (text, provider_name) and return token count

# TODO: Test your token counting function with sample text
# Verify it works for both input and output text
# Print test results to confirm accuracy

## Step 4: Latency Measurement Function

Create a function to measure response time accurately.
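
The sketch below shows one way to time a call. It assumes the client exposes an `invoke(prompt)` method, as LangChain chat models do. `EchoClient` is a hypothetical stand-in so the example runs without an API key, and returning `None` as the text on failure is an illustrative convention.

```python
import time


def timed_call(client, prompt):
    """Return (response_text, latency_seconds); response_text is None on failure."""
    start = time.perf_counter()
    try:
        response = client.invoke(prompt)
        # LangChain chat models return a message with .content; fall back to str().
        text = getattr(response, "content", None)
        if text is None:
            text = str(response)
    except Exception as exc:
        print(f"Request failed: {exc}")
        text = None
    latency = time.perf_counter() - start
    return text, latency


class EchoClient:
    """Hypothetical stand-in client so the sketch runs without an API key."""

    def invoke(self, prompt):
        time.sleep(0.01)  # simulate network and generation time
        return f"echo: {prompt}"


text, latency = timed_call(EchoClient(), "What is a token?")
print(text, f"({latency:.3f}s)")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring intervals.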

In [None]:
# TODO: Create a function to measure provider response time
# Function should:
# - Take (client, prompt) as parameters
# - Record start time before API call (time.perf_counter() suits interval timing better than time.time())
# - Make the API call and get response
# - Record end time after response received
# - Calculate latency in seconds
# - Return (response_text, latency_seconds)

# TODO: Add error handling for API failures
# TODO: Test the function with a simple prompt on both providers

## Step 5: Run Comparison Tests

Execute systematic tests across all prompts and both providers.
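
A minimal sketch of such a loop follows. `StubClient`, `estimate_tokens`, and `run_comparison` are hypothetical names, and the stub replies are placeholders, not real model output. With real clients you would pass the objects created in Step 1 and a non-zero `delay_seconds`.

```python
import time


class StubClient:
    """Hypothetical stand-in for a real provider client."""

    def __init__(self, reply):
        self.reply = reply

    def invoke(self, prompt):
        return self.reply


def estimate_tokens(text):
    # Rough word-based estimate; swap in a provider-specific counter if you have one.
    return round(len(text.split()) * 1.3)


def run_comparison(providers, prompts, delay_seconds=0.0):
    """Call every provider on every prompt and collect one result dict per call."""
    results = []
    for prompt_id, prompt in prompts.items():
        for provider_name, client in providers.items():
            print(f"Testing {provider_name} on '{prompt_id}'...")
            start = time.perf_counter()
            response = client.invoke(prompt)
            latency = time.perf_counter() - start
            # Use .content when the client returns a message object, else the raw value.
            response_text = str(getattr(response, "content", response))
            results.append({
                "prompt_id": prompt_id,
                "provider_name": provider_name,
                "input_tokens": estimate_tokens(prompt),
                "output_tokens": estimate_tokens(response_text),
                "latency_seconds": latency,
                "response_preview": response_text[:100],
            })
            time.sleep(delay_seconds)  # pause between requests to respect rate limits
    return results


providers = {
    "provider_1": StubClient("A short answer."),
    "provider_2": StubClient("A somewhat longer answer."),
}
prompts = {
    "short": "What is a token?",
    "code": "Write a function that reverses a string.",
}
results = run_comparison(providers, prompts)
print(len(results), "results collected")
```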

In [None]:
# TODO: Create a comprehensive comparison loop
# For each test prompt:
#   - Test Provider 1: measure latency, count input/output tokens
#   - Test Provider 2: measure latency, count input/output tokens
#   - Store results in structured format (list of dictionaries)
# 
# Results structure should include:
# - prompt_id/name
# - provider_name
# - input_tokens
# - output_tokens
# - latency_seconds
# - response_preview (first 100 chars)

# TODO: Add progress indicators (print statements) during testing
# TODO: Include small delays between requests to respect rate limits

## Step 6: Create Comparison Log

Format your results into the required two-row comparison log.
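
The aggregation can be sketched in plain Python as shown here. `summarize` is a hypothetical helper, the input numbers are placeholders rather than real measurements, and throughput is assumed to mean total output tokens divided by total latency.

```python
def summarize(results):
    """Collapse per-call results into one summary row per provider."""
    by_provider = {}
    for row in results:
        by_provider.setdefault(row["provider_name"], []).append(row)

    summary = []
    for provider, rows in by_provider.items():
        total_latency = sum(r["latency_seconds"] for r in rows)
        total_output = sum(r["output_tokens"] for r in rows)
        summary.append({
            "Provider": provider,
            "Avg_Latency_Sec": total_latency / len(rows),
            "Total_Input_Tokens": sum(r["input_tokens"] for r in rows),
            "Total_Output_Tokens": total_output,
            # Assumption: throughput = all output tokens / all time spent waiting.
            "Tokens_Per_Sec": total_output / total_latency if total_latency else 0.0,
            "Notes": "",
        })
    return summary


# Placeholder measurements for illustration only, not real benchmark data.
sample_results = [
    {"provider_name": "provider_1", "input_tokens": 10, "output_tokens": 50, "latency_seconds": 1.0},
    {"provider_name": "provider_1", "input_tokens": 20, "output_tokens": 150, "latency_seconds": 3.0},
    {"provider_name": "provider_2", "input_tokens": 10, "output_tokens": 30, "latency_seconds": 0.5},
    {"provider_name": "provider_2", "input_tokens": 20, "output_tokens": 50, "latency_seconds": 1.5},
]
summary = summarize(sample_results)
for row in summary:
    print(row)
```

Passing the returned list to `pd.DataFrame(summary)` yields the two-row table with the column names used in this step.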

In [None]:
# TODO: Process your test results into summary statistics
# Calculate for each provider:
# - Average latency across all prompts
# - Total input tokens used
# - Total output tokens generated
# - Average output tokens per second (throughput: output tokens divided by latency)
# - Estimated cost (if you have pricing data)

# TODO: Create a pandas DataFrame with exactly 2 rows
# Columns: Provider, Avg_Latency_Sec, Total_Input_Tokens, Total_Output_Tokens, Tokens_Per_Sec, Notes
# Row 1: Provider 1 statistics
# Row 2: Provider 2 statistics

# TODO: Display the comparison table clearly
# TODO: Mark which provider performed better on each metric (for example in the Notes column or in a separate per-metric "Winner" summary)

## Step 7: Detailed Analysis

Analyze the results and document your findings.
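
For the cost part of the analysis, the arithmetic can be sketched as follows. `estimate_cost` is a hypothetical helper and the prices are made-up placeholders. Substitute the current per-million-token rates published by the providers you tested.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Estimate request cost from token counts and per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices in dollars per million tokens; look up your providers' current rates.
cost = estimate_cost(1_000, 500, input_price_per_million=1.0, output_price_per_million=2.0)
print(f"Estimated cost: ${cost:.6f}")
```

Input and output tokens are priced separately here because providers commonly charge different rates for each.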

In [None]:
# TODO: Create visualizations of your comparison (optional)
# - Bar chart of average latency per provider
# - Scatter plot of latency vs response length
# - Token usage comparison chart

# TODO: Calculate key insights:
# - Which provider is faster?
# - Which provider uses tokens more efficiently?
# - Are there specific prompt types where one provider excels?
# - What's the cost difference between providers?

# TODO: Print summary insights in bullet point format

## Step 8: Final Deliverable

Create your final two-row comparison log with conclusions.
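
Saving the log with a timestamp might look like the sketch below, which uses only the standard library. `write_log` and the `Tested_At` column name are illustrative assumptions, and the rows are placeholders rather than real measurements.

```python
import csv
import io
from datetime import datetime, timezone


def write_log(rows, fileobj):
    """Write summary rows to fileobj as CSV, stamping each row with the test time."""
    tested_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    stamped = [{**row, "Tested_At": tested_at} for row in rows]
    writer = csv.DictWriter(fileobj, fieldnames=list(stamped[0].keys()))
    writer.writeheader()
    writer.writerows(stamped)


# Placeholder rows for illustration only, not real measurements.
rows = [
    {"Provider": "provider_1", "Avg_Latency_Sec": 2.0, "Total_Output_Tokens": 200},
    {"Provider": "provider_2", "Avg_Latency_Sec": 1.0, "Total_Output_Tokens": 80},
]
buffer = io.StringIO()
write_log(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, open a file with `open(path, "w", newline="")` and pass it as `fileobj`.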

In [None]:
# TODO: Create the final deliverable format
# Print a clean, professional two-row comparison table
# Include:
# - Provider names and models tested
# - Key performance metrics
# - Timestamp of testing
# - Your recommendation for different use cases

# TODO: Save results to a CSV file for future reference
# TODO: Write a brief conclusion paragraph explaining:
# - Which provider you'd choose for speed-critical applications
# - Which provider offers better value for money
# - Any limitations or caveats from your testing

## Bonus Challenges (If Time Permits)

Extend your analysis with additional comparisons.
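
For the streaming bonus, time-to-first-token and time-to-completion can be measured over any iterable of text chunks, as sketched here. `measure_stream` and `fake_stream` are hypothetical. With a LangChain chat model you would likely feed it something like `(chunk.content for chunk in client.stream(prompt))`, which is an assumption to verify against the client you use.

```python
import time


def measure_stream(chunks):
    """Consume an iterable of text chunks; return (time_to_first_token, total_time, text)."""
    start = time.perf_counter()
    first_token_time = None
    parts = []
    for chunk in chunks:
        if first_token_time is None:
            # The first chunk marks when the user would start seeing output.
            first_token_time = time.perf_counter() - start
        parts.append(chunk)
    total_time = time.perf_counter() - start
    return first_token_time, total_time, "".join(parts)


def fake_stream():
    """Hypothetical stand-in for a provider's streaming response."""
    for piece in ("Hello", " ", "world"):
        time.sleep(0.01)  # simulate the gap between streamed chunks
        yield piece


ttft, total, text = measure_stream(fake_stream())
print(f"first token: {ttft:.3f}s, complete: {total:.3f}s, text: {text!r}")
```

If the stream yields nothing, the first element of the result is `None`, which keeps an empty response distinguishable from a fast one.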

In [None]:
# TODO BONUS 1: Test streaming vs non-streaming latency
# Compare time-to-first-token vs time-to-completion
# Measure user-perceived responsiveness

# TODO BONUS 2: Test with different model parameters
# Try temperature=0 vs temperature=0.7
# Test max_tokens limits and their impact

# TODO BONUS 3: Add a third provider for comparison
# Create a three-row comparison log
# Test local model vs cloud providers if possible

## Deliverable Checklist

Before submitting, ensure you have:

- [ ] Successfully connected to 2 different AI providers
- [ ] Tested multiple prompts of varying complexity
- [ ] Measured latency accurately for all tests
- [ ] Counted input and output tokens for cost analysis
- [ ] Created a clean two-row comparison table
- [ ] Included provider names, models, and key metrics
- [ ] Documented which provider performs better for different use cases
- [ ] Added timestamp and testing conditions
- [ ] Written clear conclusions and recommendations

**Final Deliverable**: A two-row comparison log showing latency and token usage differences between your chosen providers, with analysis and recommendations.