## First OpenAI API Call

We start with the most basic way to call one of OpenAI's GPT models through the API. A restaurant analogy helps explain how API calls work:

- **You (Customer)**: The person making the request
- **OpenAI API (Waiter)**: The messenger that takes your request to the AI
- **GPT Model (Chef)**: The system that processes your request and creates the response

In this example, we'll:
1. Set up the connection to OpenAI (like finding a restaurant)
2. Create a simple prompt (like placing an order)
3. Get the response (like receiving your meal)

This is the simplest form of an API call: no streaming, no complex parameters, just a basic request and response.

In [2]:
# Waiter: This is the OpenAI API. You talk to it using the 'openai' Python package.
from openai import OpenAI
import os

# Read your OpenAI API key from an environment variable (here named "general_API") instead of hard-coding it
client = OpenAI(api_key=os.getenv("general_API"))
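
`os.getenv` returns `None` when the variable is missing, which can lead to a confusing authentication error later. A minimal sketch of a check that fails early with a clear message is shown below. The helper name `read_api_key` is hypothetical, and `general_API` is simply the variable name used in this notebook.

```python
import os


def read_api_key(var_name: str = "general_API") -> str:
    """Return the API key stored in an environment variable.

    Hypothetical helper: raises a clear error if the variable is missing or empty.
    """
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"Environment variable {var_name!r} is not set.")
    return key
```

With this in place, the client could be created as `client = OpenAI(api_key=read_api_key())`.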

In [5]:
# Customer: This is YOU (or your app). You decide what to ask.
prompt = "Explain photosynthesis in simple terms."

# Chef: This is the AI model (here GPT-4.1). It prepares the response based on your request.
response = client.responses.create(
    model="gpt-4.1",
    input=prompt
)

# The response is delivered back to the customer (you)
result = response.output_text
print("Response from the AI (Chef):")
print(result)


Response from the AI (Chef):
Sure! **Photosynthesis** is how plants make their own food.

Here’s how it works, step by step:

1. **Plants take in sunlight** using their leaves.
2. **They absorb water** from the soil through their roots.
3. **They take in carbon dioxide** from the air through tiny holes in their leaves.

Using energy from the sunlight, **plants mix the water and carbon dioxide together to make a kind of sugar (food) for themselves**. Oxygen is made as a waste product and goes back into the air.

So, in short:  
**Photosynthesis is how plants use sunlight, water, and air to make food and oxygen.**


## Looping — “Ordering Again and Again”

### In the Restaurant Metaphor:

Imagine you're really hungry and want to **order multiple dishes**, one after another:

* First: you ask for spaghetti.
* Then: you ask for a drink.
* Then: dessert.

That’s **looping** — doing something **over and over again**, usually **with slight changes**.

### In Programming/API Terms:

Looping is when your program:

* Sends **multiple API requests** in a row.
* Often in a **`for` loop** or a **`while` loop**.
* Each request might ask a different question or use different data.

#### Why It’s Useful:

* Process a list of texts automatically (e.g., summarizing 100 articles).
* Translate a batch of messages.
* Chat with the model in turns.


In [5]:
# Basic Python loop — no API yet
questions = ["What is an API?", "How does a loop work?", "What are tokens in OpenAI?"]

for q in questions:
    print("Question:", q)
    print("Pretend we're asking the AI...\n")

Question: What is an API?
Pretend we're asking the AI...

Question: How does a loop work?
Pretend we're asking the AI...

Question: What are tokens in OpenAI?
Pretend we're asking the AI...



In [7]:
# Assume client is already set up with your OpenAI API key
questions = [
    "What is 1 + 1?",
    "What is the opposite of up?",
    "What is the capital of France?"
]

for question in questions:
    print(f"\nCustomer asks: {question}")

    # Send question to the OpenAI model (Chef prepares dish)
    response = client.responses.create(
        model="gpt-4.1",
        input=question
    )

    # Get the model's answer (Waiter brings it back)
    result = response.output_text
    print("AI (Chef) replies:")
    print(result)


Customer asks: What is 1 + 1?
AI (Chef) replies:
1 + 1 = **2**.

Customer asks: What is the opposite of up?
AI (Chef) replies:
The opposite of "up" is **"down."**

Customer asks: What is the capital of France?
AI (Chef) replies:
The capital of France is **Paris**.
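
When a loop sends many requests, it often helps to collect the answers and to keep going if a single request fails. The sketch below illustrates that pattern offline. `ask_model` is a hypothetical stand-in for the real call, so that the example runs without an API key; in practice its body would call `client.responses.create` as shown above.

```python
def ask_model(question: str) -> str:
    """Hypothetical stand-in for the real API call, so the sketch runs offline."""
    if not question.strip():
        raise ValueError("empty question")
    return f"(answer to: {question})"


questions = ["What is 1 + 1?", "", "What is the capital of France?"]
answers = {}

for question in questions:
    try:
        # Store each answer under its question so results can be used later
        answers[question] = ask_model(question)
    except Exception as error:
        # One failed request should not stop the remaining ones
        answers[question] = None
        print(f"Skipped {question!r}: {error}")

print(answers)
```

The empty question is included on purpose to show the failure path: it is recorded as `None` while the other two questions are still answered.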


## Endpoints — “Different Sections of the Menu”

### In the Restaurant Metaphor:

The menu has **sections**:

* Appetizers
* Main courses
* Desserts

Each has its own list of items.

These sections are like **endpoints** — **different areas** of the API that handle different **types of requests**.

### In API Terms:

An **endpoint** is a **URL** where you send your request.

For example, with the OpenAI API:

* `https://api.openai.com/v1/responses` → Generate a text response from a model, as we just did.
* `https://api.openai.com/v1/embeddings` → Turn text into numbers (useful for search).
* `https://api.openai.com/v1/images/generations` → Generate images from text.
* `https://api.openai.com/v1/audio/speech` → Convert text to spoken audio.
* `https://api.openai.com/v1/audio/transcriptions` → Transcribe audio to text.
* `https://api.openai.com/v1/audio/translations` → Translate audio into English text.


Each one does **something different**, but they all follow the same rules of ordering.
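
To make the idea of "an endpoint is a URL" concrete, the sketch below builds (but does not send) the kind of HTTP request that the `openai` package creates for you. It uses only Python's standard library. The helper `build_request` and the placeholder key are assumptions for illustration, and the sketch covers only the JSON-based endpoints such as `responses` and `embeddings`; audio uploads use a different request body.

```python
import json
import urllib.request

BASE_URL = "https://api.openai.com/v1"


def build_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: prepare a POST request for a JSON endpoint without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint}",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # the key identifies the customer
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same "order" as before, just addressed to a specific section of the menu
request = build_request(
    "responses",
    {"model": "gpt-4.1", "input": "Hello"},
    api_key="sk-placeholder",  # placeholder, not a real key
)
print(request.get_method(), request.full_url)
```

Changing only the endpoint name and the payload is enough to address a different section of the menu, which is why the client calls in this notebook all look so similar.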


In [26]:
# Using the speech endpoint

from pathlib import Path

voices = ["echo", "nova", "shimmer"]
input_text = "Today, we are testing the OpenAI API. At the moment, we are testing the audio API."

for voice in voices:
    speech_file_path = Path.cwd() / f"speech_{voice}.mp3"
    with client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",
        voice=voice,
        input=input_text
    ) as response:
        response.stream_to_file(speech_file_path)

In [28]:
# Using the transcription endpoint

# Open the file in a context manager so it is closed after the upload
with open("speech_echo.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file
    )

print(transcript.text)

Today we are testing the OpenAI API. At the moment we are testing the audio API.


In [6]:
# Using the embeddings endpoint

response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="The food was delicious and the waiter...",
    encoding_format="float"
)

print(response.data[0].embedding[:50])
print(f"The embedding is a list of {len(response.data[0].embedding)} floats")

[0.0022756963, -0.009305916, 0.015742613, -0.0077253063, -0.0047450014, 0.014917395, -0.009807394, -0.038264707, -0.0069127847, -0.028590616, 0.025251659, 0.018116701, -0.0036309576, -0.02554366, 0.00055543496, -0.016428178, 0.02828592, 0.0054083494, 0.009610611, -0.016415482, -0.015412526, 0.004272088, 0.0069953064, -0.007223828, -0.0039007403, 0.018573744, 0.008734611, -0.022699833, 0.011508612, 0.023893224, 0.015602961, -0.0035706533, -0.034963835, -0.0041514793, -0.026178442, -0.02150644, -0.0057066972, 0.011768873, 0.008455306, 0.004129262, 0.019157745, -0.014358787, 0.008982176, 0.0063605234, -0.04570436, 0.017900875, -0.005570219, -0.0007716578, -0.02215392, -0.0039229575]
The embedding is a list of 1536 floats
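
Embeddings are useful for search because texts with similar meaning tend to get vectors that point in similar directions. A common way to compare two vectors is cosine similarity. The sketch below uses made-up 3-dimensional vectors instead of real 1536-dimensional embeddings, so the numbers are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means very similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors standing in for real embeddings
pasta = [0.9, 0.1, 0.0]
spaghetti = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print("pasta vs spaghetti:", round(cosine_similarity(pasta, spaghetti), 3))
print("pasta vs invoice:  ", round(cosine_similarity(pasta, invoice), 3))
```

With real embeddings, the same function could be applied to `response.data[0].embedding` from two different requests.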


## Tokens — “How Much You’re Saying”

### In the Restaurant Metaphor:

Imagine you're paying **per word** of your order instead of per item.

Saying:

> “I want spaghetti.”

Costs fewer tokens than:

> “Hello kind waiter, I would like a steaming plate of your finest spaghetti, with extra parmesan on top, please.”

The **longer** or more **complex** your request, the **more tokens** it costs.

### In OpenAI Terms:

* **Tokens = Chunks of text**, usually a few characters long.
* “Hello” → 1 token.
* “Artificial intelligence is amazing!” → \~5 tokens.

#### Why Tokens Matter:

* **You pay per token** (for input *and* output).
* There’s a **limit per request** (e.g., 128,000 tokens, depending on the model).
* Efficient prompts = better performance and lower cost.
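
For a quick feel of how prompt length translates into tokens, a common rule of thumb is roughly four characters per token for English text. The sketch below applies that heuristic to the two orders from the metaphor. The function `estimate_tokens` is hypothetical and only approximate; exact counts come from the model's tokenizer and are reported in the `usage` field of a response.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate only: assumes about 4 characters per token for English text."""
    return round(len(text) / chars_per_token)


short_order = "I want spaghetti."
long_order = (
    "Hello kind waiter, I would like a steaming plate of your finest "
    "spaghetti, with extra parmesan on top, please."
)

print("Short order:", estimate_tokens(short_order), "tokens (estimated)")
print("Long order: ", estimate_tokens(long_order), "tokens (estimated)")
```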

In [48]:
prompt = "Explain API calls in simple terms, using the customer - waiter - chef metaphor."

# Make the API call (Chef prepares the meal)
response = client.responses.create(
    model="gpt-4.1",
    input=prompt
)

# Extract response text and token usage
output_text = response.output_text

print("First 80 characters of the response:")
print(output_text[:80])

# Show token usage
print("\nToken usage:")
print("Input tokens:", response.usage.input_tokens)
print("Output tokens:", response.usage.output_tokens)
print("Total tokens:", response.usage.total_tokens)

First 80 characters of the response:
Absolutely! Let’s break down **API calls** using the **customer-waiter-chef** me

Token usage:
Input tokens: 23
Output tokens: 453
Total tokens: 476


In [49]:
# Cost calculation, assuming $2.00 per 1M input tokens and $8.00 per 1M output tokens
cost_per_million_input_tokens = 2
cost_per_million_output_tokens = 8

total_cost = (
    (response.usage.input_tokens / 1_000_000) * cost_per_million_input_tokens
    + (response.usage.output_tokens / 1_000_000) * cost_per_million_output_tokens
)

print(f"\nTotal cost: ${total_cost:.6f}")


Total cost: $0.003670
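
The same arithmetic can be wrapped in a small function to project the cost of a whole batch, for example when looping over many similar prompts. This is a sketch under the prices assumed above, which may change over time; `estimate_cost` is a hypothetical helper, and the token counts are the ones reported in the previous output.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float = 2.0,   # assumed price in USD
    output_price_per_million: float = 8.0,  # assumed price in USD
) -> float:
    """Return the estimated cost in USD for one request."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


single = estimate_cost(input_tokens=23, output_tokens=453)
batch = 100 * single  # 100 requests of similar size

print(f"One request: ${single:.6f}")
print(f"100 similar requests: ${batch:.4f}")
```

Note that output tokens dominate the total here, so limiting response length usually saves more than shortening the prompt.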


## Streaming

Streaming lets you receive the model's response in real time, piece by piece as it is generated, rather than waiting for the complete response. This is particularly useful for:

- Creating more interactive user experiences
- Showing progress as the model generates text
- Handling long responses more efficiently

In this example, we pass `stream=True` in our API call and process the response with a `for` loop that prints each piece of text as it arrives.
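
The consumption pattern itself can be sketched without a network call. In the snippet below, `fake_stream` is a hypothetical generator that stands in for the streamed response and yields whole words; a real stream delivers events carrying text fragments, but the loop that prints each piece immediately has the same shape.

```python
import time


def fake_stream(text: str, delay: float = 0.0):
    """Hypothetical stand-in for a streamed response: yields the text word by word."""
    for word in text.split(" "):
        time.sleep(delay)  # simulates waiting for the next piece to be generated
        yield word + " "


received = []
for chunk in fake_stream("Streaming shows text as soon as it is ready."):
    # end="" keeps the pieces on one line; flush=True displays them immediately
    print(chunk, end="", flush=True)
    received.append(chunk)
print()

# The pieces can still be joined into the complete response afterwards
full_text = "".join(received).strip()
```

Setting `delay` to a small value such as `0.2` makes the word-by-word effect visible when running the cell.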