# First OpenAI API Call

Let's make your first call to the OpenAI Chat Completions API!

## Setup

Make sure you have:
1. The OpenAI Python library and `python-dotenv` installed: `pip install openai python-dotenv`
2. An API key in a `.env` file: `OPENAI_API_KEY=sk-...`

In [None]:
# Import required libraries
import os
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Create client
client = OpenAI()  # Uses OPENAI_API_KEY from environment

print("✅ OpenAI client initialized")

## Example 1: Simple Question

Let's ask a simple factual question.

In [None]:
# Make API call
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Extract response
answer = response.choices[0].message.content
print("Answer:", answer)
print(f"\nTokens used: {response.usage.total_tokens}")

## Example 2: With System Prompt

System prompts guide the assistant's behavior.

In [None]:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that explains concepts to a 5-year-old."},
        {"role": "user", "content": "What is gravity?"}
    ]
)

print(response.choices[0].message.content)
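
The same `messages` list can also carry earlier turns of a conversation. Below is a minimal sketch of a helper that assembles such a list; `build_messages` and the sample history are hypothetical names and data made up for illustration, not part of the OpenAI library.

```python
def build_messages(system_prompt, user_message, history=None):
    """Assemble a messages list: system prompt, prior turns, then the new user message."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history or [])  # earlier user/assistant turns, if any
    messages.append({"role": "user", "content": user_message})
    return messages

# Hypothetical earlier exchange, written by hand for this example
history = [
    {"role": "user", "content": "What is gravity?"},
    {"role": "assistant", "content": "Gravity is what pulls things down to the ground."},
]

messages = build_messages(
    "You are a helpful assistant that explains concepts to a 5-year-old.",
    "Why doesn't the Moon fall down?",
    history,
)

for m in messages:
    print(f"{m['role']}: {m['content']}")
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.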

## Example 3: Temperature Control

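Temperature controls randomness. The Chat Completions API accepts values from 0 to 2: lower values give more focused, repeatable output, and higher values give more varied output. Let's see the difference!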

In [None]:
prompt = "Write a creative tagline for a coffee shop."

# Low temperature (more focused, but not strictly deterministic)
response_low = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2
)

# High temperature (creative)
response_high = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=1.5
)

print("Temperature 0.2 (focused):")
print(response_low.choices[0].message.content)
print("\nTemperature 1.5 (creative):")
print(response_high.choices[0].message.content)
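
To build intuition for why temperature changes the output, here is a conceptual sketch of temperature-scaled softmax sampling probabilities. This is the textbook formulation, not OpenAI's actual implementation; `softmax_with_temperature` and the example logits are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by the temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.5]

for t in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At a low temperature nearly all of the probability mass lands on the top-scoring token; at a higher temperature the distribution flattens, so less likely tokens are sampled more often.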

## Example 4: max_tokens Control

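Limit response length with `max_tokens`, which caps the number of tokens the model may generate. When the cap is reached, the response is cut off mid-thought rather than shortened gracefully.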

In [None]:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Explain machine learning in one sentence."}
    ],
    max_tokens=30  # Limit to ~20-25 words
)

print(response.choices[0].message.content)
print(f"\nCompletion tokens: {response.usage.completion_tokens}")
print(f"Finish reason: {response.choices[0].finish_reason}")  # May be 'length' if cut off
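
Since a truncated answer looks like any other string, it can help to check `finish_reason` before using the text. The sketch below uses `SimpleNamespace` objects as stand-ins for a real response choice so it runs without an API call; `content_or_warning` is a hypothetical helper name.

```python
from types import SimpleNamespace

def content_or_warning(choice):
    """Return the message text, flagging it when generation stopped at the token limit."""
    text = choice.message.content
    if choice.finish_reason == "length":
        return text + " [truncated: raise max_tokens or ask for a shorter answer]"
    return text

# Stand-ins that mimic the shape of response.choices[0]
cut_off = SimpleNamespace(
    message=SimpleNamespace(content="Machine learning is a field where"),
    finish_reason="length",
)
complete = SimpleNamespace(
    message=SimpleNamespace(content="Machine learning lets computers learn from data."),
    finish_reason="stop",
)

print(content_or_warning(cut_off))
print(content_or_warning(complete))
```

With a real response, pass `response.choices[0]` to the helper.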

## Error Handling

Always handle potential errors!

In [None]:
from openai import RateLimitError, APIError

def safe_api_call(messages, model="gpt-3.5-turbo", **kwargs):
    """Make API call with error handling."""
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            **kwargs
        )
        return response.choices[0].message.content
    
    # RateLimitError is a subclass of APIError, so it must be caught first
    except RateLimitError:
        return "Error: Rate limit exceeded. Please wait and try again."
    
    except APIError as e:
        return f"API Error: {e}"
    
    except Exception as e:
        return f"Unexpected error: {e}"

# Test it
result = safe_api_call([
    {"role": "user", "content": "Hello!"}
])
print(result)
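
Rate-limit errors are often temporary, so a common follow-up is to retry with exponential backoff instead of returning an error string right away. The sketch below shows the pattern with a simulated failure so it runs offline; `TransientError`, `call_with_retries`, and `flaky` are hypothetical names, and the delay values are arbitrary.

```python
import random
import time

class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as a rate limit."""

def call_with_retries(func, max_attempts=4, base_delay=0.5,
                      retry_on=(TransientError,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            # Delay doubles each attempt; random jitter spreads out simultaneous retries
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            sleep(delay)

# Simulated call that fails twice, then succeeds
attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated rate limit")
    return "ok"

result = call_with_retries(flaky, base_delay=0.01)
print(result, "after", attempts["count"], "attempts")
```

With the real client, the same helper could be called with `retry_on=(RateLimitError,)` and a lambda that wraps `client.chat.completions.create(...)`.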

## Parsing Response Structure

The response object carries metadata and token usage alongside the generated message.

In [None]:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say 'hi'"}]
)

print("Full response structure:")
print(f"ID: {response.id}")
print(f"Model: {response.model}")
print(f"Created: {response.created}")
print(f"\nMessage content: {response.choices[0].message.content}")
print(f"Finish reason: {response.choices[0].finish_reason}")
print(f"\nUsage:")
print(f"  Prompt tokens: {response.usage.prompt_tokens}")
print(f"  Completion tokens: {response.usage.completion_tokens}")
print(f"  Total tokens: {response.usage.total_tokens}")
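
The `created` field is a Unix timestamp in seconds, which is hard to read as printed. A small sketch of converting it to a readable UTC time follows; the timestamp is an example value, not taken from a real response.

```python
from datetime import datetime, timezone

created = 1700000000  # example value; use response.created with a real response

# Interpret the timestamp as UTC to avoid depending on the local time zone
created_at = datetime.fromtimestamp(created, tz=timezone.utc)
print(created_at.isoformat())
```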

## Async API Calls

Async calls let you send multiple requests concurrently instead of waiting for each one to finish before starting the next.

In [None]:
import asyncio
from openai import AsyncOpenAI

async_client = AsyncOpenAI()

async def async_chat(message: str) -> str:
    """Make async API call."""
    response = await async_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": message}]
    )
    return response.choices[0].message.content

# Make multiple calls concurrently
async def main():
    questions = [
        "What is Python?",
        "What is JavaScript?",
        "What is TypeScript?"
    ]
    
    # Run concurrently!
    results = await asyncio.gather(*[async_chat(q) for q in questions])
    
    for question, answer in zip(questions, results):
        print(f"Q: {question}")
        print(f"A: {answer}")
        print()

# Run async function
await main()  # In Jupyter
# asyncio.run(main())  # In regular Python script
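
Launching many requests at once can run into rate limits, so it is often useful to cap how many are in flight. Here is a minimal sketch using `asyncio.Semaphore`; `fake_chat` is a hypothetical stand-in that sleeps instead of calling the API, and `gather_limited` is an illustrative helper name.

```python
import asyncio

async def fake_chat(message: str) -> str:
    """Stand-in for an API call: waits briefly, then echoes the message."""
    await asyncio.sleep(0.01)
    return f"echo: {message}"

async def gather_limited(messages, worker, limit=2):
    """Run worker(message) for every message, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(message):
        async with semaphore:  # waits here while `limit` calls are already running
            return await worker(message)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(m) for m in messages))
```

In Jupyter, run it with `await gather_limited(questions, fake_chat, limit=2)`; in a regular script, wrap the call in `asyncio.run(...)`. Passing the notebook's `async_chat` as `worker` would apply the same limit to real API calls.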

## Cost Tracking

Track how much each request costs.

In [None]:
def calculate_cost(usage, model="gpt-3.5-turbo"):
    """Calculate cost in USD."""
    # Illustrative rates (recorded in 2024, may be outdated); check OpenAI's current pricing page
    pricing = {
        "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},  # per 1k tokens
        "gpt-4-turbo": {"input": 0.01, "output": 0.03},
        "gpt-4": {"input": 0.03, "output": 0.06}
    }
    
    # Unknown model names silently fall back to the gpt-3.5-turbo rates
    rates = pricing.get(model, pricing["gpt-3.5-turbo"])
    
    input_cost = (usage.prompt_tokens / 1000) * rates["input"]
    output_cost = (usage.completion_tokens / 1000) * rates["output"]
    
    return input_cost + output_cost

# Make a call and track cost
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Explain photosynthesis."}]
)

cost = calculate_cost(response.usage)
print(f"Tokens used: {response.usage.total_tokens}")
print(f"Cost: ${cost:.6f}")
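
Per-request cost is useful, but a running total across a session is often what you actually want to watch. The sketch below accumulates usage over several calls; `CostTracker` is a hypothetical class, the rates are placeholder numbers rather than real prices, and `SimpleNamespace` objects stand in for `response.usage`.

```python
from types import SimpleNamespace

class CostTracker:
    """Accumulate token usage across calls and price it at caller-supplied rates."""

    def __init__(self, input_rate_per_1k, output_rate_per_1k):
        self.input_rate = input_rate_per_1k
        self.output_rate = output_rate_per_1k
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def add(self, usage):
        """Record one response's usage (needs prompt_tokens and completion_tokens)."""
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def total_cost(self):
        return (self.prompt_tokens / 1000) * self.input_rate \
            + (self.completion_tokens / 1000) * self.output_rate

# Placeholder rates in USD per 1K tokens; substitute current prices for your model
tracker = CostTracker(input_rate_per_1k=0.001, output_rate_per_1k=0.002)

# Stand-ins for response.usage from two separate calls
tracker.add(SimpleNamespace(prompt_tokens=1000, completion_tokens=500))
tracker.add(SimpleNamespace(prompt_tokens=2000, completion_tokens=1000))

print(f"Prompt tokens: {tracker.prompt_tokens}")
print(f"Completion tokens: {tracker.completion_tokens}")
print(f"Total cost: ${tracker.total_cost:.6f}")
```

With real calls, pass `response.usage` to `tracker.add` after each request.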

## Practice Exercise

Try these on your own:

1. Make a call with `temperature=0` and run it 3 times. Are the results identical?
2. Make a call with `temperature=1.8` and run it 3 times. How different are the results?
3. Create a system prompt that makes the assistant respond as a medieval knight.
4. Calculate the cost of 1000 API calls with an average of 100 input and 200 output tokens each.
5. Write an async function that calls the API 10 times concurrently.