# 🧠 Project 00 – Foundations

This notebook introduces the fundamental concepts of the OpenAI Python SDK, how models are called, how responses are structured, and how core parameters influence the output.

## 🔹 **Block 1 - Imports, Environment Variables, and Client Setup**

In this block we:

1. Import essential packages (`os`, `time`, `dotenv`, `openai`)
2. Load environment variables from `.env`
3. Retrieve `OPENAI_API_KEY`
4. Create the OpenAI client using `OpenAI(api_key=...)`

### Why this matters
- Every project in this repository will require a correctly configured OpenAI client.
- Storing the key in a `.env` file keeps it out of the source code; add `.env` to `.gitignore` so it is never committed.
- Understanding this step helps you debug API authentication issues later.
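
When authentication fails, the first thing to check is usually whether the key reached the process at all. The sketch below is one way to check that without printing the secret. `describe_api_key` is a hypothetical helper, not part of the OpenAI SDK, and it only inspects the environment variable; it does not validate the key against the API.

```python
import os


def describe_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return a short, log-safe description of the key stored in env_var."""
    key = os.getenv(env_var)
    if not key:
        return f"{env_var} is not set"
    if key != key.strip():
        # Stray spaces or newlines copied into .env are a common cause of 401 errors.
        return f"{env_var} has leading or trailing whitespace"
    # Show only the first and last few characters, never the full secret.
    masked = f"{key[:3]}...{key[-4:]}" if len(key) > 8 else "***"
    return f"{env_var} is set ({len(key)} chars, {masked})"


print(describe_api_key())
```

Calling it right after `load_dotenv(...)` shows whether the `.env` path was resolved correctly before any request is sent.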


In [None]:
# Project 00 – Foundations
# Cell 1: Imports and basic configuration

import os
import time

from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from .env in the repository root.
# If your .env is in the repository root and this notebook is inside projects/00-foundations,
# the file is two levels up:
dotenv_loaded = load_dotenv(dotenv_path="../../.env")

print(f".env loaded: {dotenv_loaded}")

# Read API key
api_key = os.getenv("OPENAI_API_KEY")

if not api_key:
    raise RuntimeError(
        "OPENAI_API_KEY not found. "
        "Create a .env file in the repository root with OPENAI_API_KEY=your_key_here."
    )

# Create OpenAI client
client = OpenAI(api_key=api_key)

print("OpenAI client created successfully.")


.env loaded: True
OpenAI client created successfully.


## 🔹 Block 2 - First Call Using the Responses API

This block performs the first generation request:

```python
response = client.responses.create(
    model="gpt-4o-mini",
    input="Say hello in a friendly way."
)
```

Key points:

- `responses.create` is the modern OpenAI endpoint for free-form text.
- `response.output_text` is a convenient way to extract the generated text.
- Timing the call gives a rough sense of end-to-end latency, including network time.
- The response object also contains important metadata such as:
  - token usage
  - model information
  - configuration parameters
  - structured output components

Why this matters:

- This request pattern is foundational for all later projects:
  - RAG (Retrieval-Augmented Generation)
  - multi-agent systems
  - planning modules
  - tool-calling agents
  - vision/audio pipelines
  - autonomous systems


In [17]:
# Project 00 – Foundations
# Cell 2: First "Hello, OpenAI" call using the Responses API

start = time.time()

response = client.responses.create(
    model="gpt-4o-mini",        # Fast, cheap, great for experiments
    input="Say hello in a friendly way."
)

elapsed = time.time() - start

print("=== Response Output ===\n")
print(response.output_text)  # Most convenient way to extract text
print("\n=======================\n")

print(f"Latency: {elapsed:.3f} seconds")

# Explore raw response
response


=== Response Output ===

Hello there! Hope you’re having a wonderful day! 😊


Latency: 1.041 seconds


Response(id='resp_0268e30d9373826a0069395614a2b88193b37ce8fc07934be0', created_at=1765365268.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4o-mini-2024-07-18', object='response', output=[ResponseOutputMessage(id='msg_0268e30d9373826a0069395614e7308193a71b36baf0c4644c', content=[ResponseOutputText(annotations=[], text='Hello there! Hope you’re having a wonderful day! 😊', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, prompt_cache_retention=None, reasoning=Reasoning(effort=None, generate_summary=None, summary=None), safety_identifier=None, service_tier='default', status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium'), top_logprobs
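
The repr above shows the nesting that `response.output_text` hides: `output` is a list of items, a `message` item has a `content` list, and each `output_text` part carries the `text`. The sketch below walks that structure by hand. It uses a `SimpleNamespace` stand-in rather than a real API response, and `collect_output_text` is a hypothetical helper that approximates the convenience property, not the SDK's own implementation.

```python
from types import SimpleNamespace

# Stand-in object with the same nesting as the repr above
# (response.output -> message.content -> part.text). Not a real API response.
mock_response = SimpleNamespace(
    model="gpt-4o-mini-2024-07-18",
    status="completed",
    output=[
        SimpleNamespace(
            type="message",
            role="assistant",
            content=[
                SimpleNamespace(type="output_text", text="Hello there!"),
            ],
        )
    ],
)


def collect_output_text(response) -> str:
    """Join the text of every output_text part found in message items."""
    parts = []
    for item in response.output:
        if item.type != "message":
            continue  # skip anything that is not an assistant message
        for part in item.content:
            if part.type == "output_text":
                parts.append(part.text)
    return "".join(parts)


print(collect_output_text(mock_response))  # Hello there!
```

Walking the structure manually becomes useful once a response contains more than one output item, which is when the shortcut property no longer tells the whole story.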

## 🔹 Block 3 - Exploring Generation Parameters (`temperature` and `top_p`)

In this block we examine how different parameter settings influence output style.

### temperature

Controls randomness (the API accepts values from 0 to 2):

- 0.0 = near-deterministic and stable
- 1.0 = balanced and natural (the default)
- 1.5 = more creative and expressive

### top_p

Controls sampling diversity:

- 1.0 = model considers all possible tokens
- lower values = restrict sampling to the higher-probability tokens

Why this matters:

- These parameters strongly affect creativity, factuality and consistency.
- They are essential in controlling agent behavior.
- They help tune performance for different application needs (creative writing vs. factual precision).


In [None]:
# Project 00 – Foundations
# Cell 3: Exploring temperature and top_p

prompt = "Write one short sentence describing a genie in a bottle."

def test_params(temp, top_p):
    start = time.time()
    r = client.responses.create(
        model="gpt-4o-mini",
        input=prompt,
        temperature=temp,
        top_p=top_p,
    )
    elapsed = time.time() - start
    print(f"\n--- temperature={temp}, top_p={top_p} ---")
    print(r.output_text)
    print(f"(Latency: {elapsed:.3f}s)")


# Run different configurations
test_params(0.0, 1.0)   # near-deterministic
test_params(1.0, 1.0)   # balanced (default)
test_params(1.5, 1.0)   # more creative
test_params(0.7, 0.5)   # controlled diversity




--- temperature=0.0, top_p=1.0 ---
A mystical genie swirls within the bottle, waiting to grant three wishes to the one who dares to uncork it.
(Latency: 1.667s)

--- temperature=1.0, top_p=1.0 ---
A shimmering genie swirls within the confines of an ornate bottle, waiting for a wish to be set free.
(Latency: 1.290s)

--- temperature=2.0, top_p=1.0 ---
A mysterious genie swirls within an ancient glass bottle, patiently awaiting his next magic summon.
(Latency: 1.234s)

--- temperature=0.7, top_p=0.5 ---
A shimmering genie swirls within the bottle, waiting to grant wishes with a mischievous grin.
(Latency: 0.931s)


In [29]:
# Stronger demonstration of top_p: multiple adjectives per output

prompt = "Generate five different creative adjectives to describe a mysterious door. Output only the adjectives, separated by commas."

def test_top_p_multi(top_p):
    print(f"\n========== top_p = {top_p} ==========")
    for i in range(3):
        r = client.responses.create(
            model="gpt-4o-mini",
            input=prompt,
            temperature=1.2,
            top_p=top_p,
        )
        print(f"\nSample {i+1}:")
        print(r.output_text)

# Run tests
test_top_p_multi(1.0)
test_top_p_multi(0.5)
test_top_p_multi(0.1)




========== top_p = 1.0 ==========

Sample 1:
Opaque, ethereal, tantalizing, weathered, enigmatic.

Sample 2:
Veiled, entrancing, ethereal, whispering, concealed

Sample 3:
Veiled, ominous, enchanting, eldritch, cryptic.


========== top_p = 0.5 ==========

Sample 1:
Enigmatic, shadowy, ancient, ethereal, whispering

Sample 2:
Enigmatic, shadowy, ancient, alluring, whispering

Sample 3:
Enigmatic, shadowy, ornate, whispering, ancient


========== top_p = 0.1 ==========

Sample 1:
Enigmatic, shadowy, ancient, ethereal, cryptic

Sample 2:
Enigmatic, shadowy, ancient, whispering, iridescent

Sample 3:
Enigmatic, shadowy, ancient, whispering, iridescent
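
One rough way to quantify what the samples above show is to count how many distinct adjectives appear in each group. The sketch below copies the nine recorded outputs and assumes they are grouped in call order (`top_p` = 1.0, then 0.5, then 0.1). `distinct_adjectives` is a hypothetical helper, and three samples per setting is far too few for a statistical claim.

```python
# Samples copied from the recorded output, assumed to be in call order.
samples_by_top_p = {
    1.0: [
        "Opaque, ethereal, tantalizing, weathered, enigmatic.",
        "Veiled, entrancing, ethereal, whispering, concealed",
        "Veiled, ominous, enchanting, eldritch, cryptic.",
    ],
    0.5: [
        "Enigmatic, shadowy, ancient, ethereal, whispering",
        "Enigmatic, shadowy, ancient, alluring, whispering",
        "Enigmatic, shadowy, ornate, whispering, ancient",
    ],
    0.1: [
        "Enigmatic, shadowy, ancient, ethereal, cryptic",
        "Enigmatic, shadowy, ancient, whispering, iridescent",
        "Enigmatic, shadowy, ancient, whispering, iridescent",
    ],
}


def distinct_adjectives(samples):
    """Return the set of distinct adjectives across comma-separated samples."""
    words = set()
    for sample in samples:
        for word in sample.split(","):
            cleaned = word.strip().strip(".").lower()
            if cleaned:
                words.add(cleaned)
    return words


for top_p, samples in samples_by_top_p.items():
    count = len(distinct_adjectives(samples))
    print(f"top_p={top_p}: {count} distinct adjectives out of 15")
# top_p=1.0: 13 distinct adjectives out of 15
# top_p=0.5: 7 distinct adjectives out of 15
# top_p=0.1: 7 distinct adjectives out of 15
```

The drop from 13 to 7 matches the visual impression: with a lower `top_p` the model keeps returning to the same few words.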


## Mini Summary - Temperature vs Top-P

### Temperature (randomness)
- Controls how **spread out** the probability distribution becomes.
- **Higher temperature (1.0–1.5)** → more creative, varied, surprising.
- **Lower temperature (0.0–0.3)** → more stable, predictable, deterministic.
- Acts like a *global chaos factor* for the model’s sampling.
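
The "spread out" effect can be illustrated with a toy softmax. The sketch below assumes the standard formulation in which token scores are divided by the temperature before normalization; the scores are made up, and the real model's internals are not visible through the API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Limit case: all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5, 0.0]  # hypothetical scores for four tokens

for t in (0.0, 0.3, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

As the temperature rises, the top token's share shrinks and the tail tokens gain probability, which is why outputs become more varied.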

### Top-P (nucleus sampling)
- Limits the model to only the **top portion of cumulative probability**.
- **top_p = 1.0** → full freedom (all tokens available).
- **top_p = 0.5** → only the smallest set of most-likely tokens whose cumulative probability reaches 50%.
- **top_p = 0.1** → extremely restrictive, often yielding repetitive patterns.

### How they interact
- Temperature spreads or sharpens the distribution.
- Top-P *cuts off* the tail of unlikely tokens.
- When the most probable token dominates the distribution,
  **lowering top-p has little visible effect** (the model simply picks the same token).
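
A toy nucleus filter makes this interaction concrete. The sketch below is a minimal version of top-p filtering over hand-written probabilities (not real model outputs); `top_p_filter` is a hypothetical helper, and the distributions are chosen only to contrast a flat case with a peaked one.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalize what is left."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


flat = [0.30, 0.25, 0.25, 0.20]    # several tokens with similar probability
peaked = [0.90, 0.05, 0.03, 0.02]  # one token dominates

for name, dist in (("flat", flat), ("peaked", peaked)):
    for p in (1.0, 0.5, 0.1):
        survivors = sorted(top_p_filter(dist, p))
        print(f"{name:6s} top_p={p}: token indices kept = {survivors}")
```

In the peaked case the same single token survives at both `top_p=0.5` and `top_p=0.1`, so lowering the value further changes nothing; in the flat case each step removes candidates.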

### When effects are most visible
- When the model must choose multiple tokens (lists, sentences).
- When several tokens have similar probabilities.
- When temperature is not too low (e.g., ≥ 0.8).
- When outputs involve creativity rather than strict logic.

### Practical rule of thumb
- **Use temperature** to control creativity.
- **Use top-p** to control diversity.
- In many real systems:
  - If you want **stable, factual answers** → `temperature=0, top_p=1`.
  - If you want **creative brainstorming** → `temperature=1.0, top_p between 0.8–1.0`.
  - If you want **controlled creativity** → `temperature=0.7, top_p=0.5`.

This understanding becomes essential when tuning RAG pipelines, agents, and any system requiring reproducible or stylistically consistent outputs.
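
The rule of thumb can be captured as named presets so that the values live in one place. This is a small sketch: the preset names are hypothetical, and `0.9` for brainstorming is simply one value picked from the 0.8–1.0 range given above.

```python
# Hypothetical preset names; the values follow the rule of thumb above.
SAMPLING_PRESETS = {
    "factual": {"temperature": 0.0, "top_p": 1.0},
    "brainstorm": {"temperature": 1.0, "top_p": 0.9},  # 0.9 picked from the 0.8-1.0 range
    "controlled": {"temperature": 0.7, "top_p": 0.5},
}


def sampling_kwargs(preset: str) -> dict:
    """Return a copy of the sampling parameters for a named preset."""
    try:
        return dict(SAMPLING_PRESETS[preset])
    except KeyError:
        choices = sorted(SAMPLING_PRESETS)
        raise ValueError(f"Unknown preset '{preset}'. Choose from: {choices}") from None


print(sampling_kwargs("controlled"))  # {'temperature': 0.7, 'top_p': 0.5}
```

The returned dict can be unpacked into a request, for example `client.responses.create(model="gpt-4o-mini", input=prompt, **sampling_kwargs("factual"))`.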


## 🔹 Block 4 - Comparing Models (Speed, Style, and Behavior)

In this block we compare two or more OpenAI models side by side in order to build intuition about:

1. **Latency** (how fast each model responds)  
2. **Style** (how rich, coherent, or creative the text feels)  
3. **Token usage** (how many tokens each model tends to generate)  
4. **Practical trade-offs** between small, fast models and larger, smarter ones  

This skill is essential for real-world AI engineering, because choosing the right model can reduce cost, improve UX, and increase system stability.

---

## Why comparing models matters

Different models behave differently even when given the exact same prompt:

### Smaller models (e.g., gpt-4o-mini)
- Fast  
- Cheap  
- Great for high-volume or real-time tasks  
- Sometimes simpler or less nuanced responses  

### Larger or more capable models (e.g., gpt-4.1 or gpt-4o)
- More detailed, coherent, and context-aware  
- Better reasoning  
- Higher quality writing  
- Slightly slower  
- Higher token cost  

Understanding these differences helps you design systems that balance:

- Quality  
- Speed  
- Cost  
- Predictability  
- User experience  

---

## What to observe during this comparison

When running the test, pay attention to:

### 1. **Response time**
- Does one model consistently respond faster?

### 2. **Output quality**
- Does one model produce richer or more coherent descriptions?
- Are there noticeable differences in vocabulary or style?

### 3. **Token usage**
- Larger models sometimes use more tokens for the same task.
- This affects cost, especially at scale.

### 4. **Consistency**
- Some models generate more stable outputs across repeated runs.
- This matters for structured tasks and agents.

---

## Why this block is foundational

Later in the 24-project journey, you will:

- Tune agents that pick models dynamically  
- Optimize system cost and latency  
- Run RAG pipelines where speed matters  
- Build autonomous systems where consistency is critical  
- Use specialized models (vision, audio, multimodal reasoning)  

For all of these, you must understand that **model selection is a design decision**, not a fixed choice.

This block teaches exactly how to compare them in practice.

---

Continue to the code block to run the actual model comparison.


In [33]:
# Project 00 – Foundations
# Cell 4: Comparing models (latency, style, tokens)

models_to_test = [
    "gpt-4o-mini",
    "gpt-4o",
    "gpt-4.1-mini",  # if this fails, you can comment/remove this and/or use "gpt-4.1"
]

comparison_prompt = (
    "In 3–4 sentences, describe a cozy cabin in the mountains during winter, "
    "focusing on atmosphere, small details, and emotions."
)

def compare_models(models, prompt, temperature=0.7):
    results = []

    for model in models:
        print(f"\n==============================")
        print(f"Model: {model}")
        print(f"==============================")

        start = time.time()
        try:
            r = client.responses.create(
                model=model,
                input=prompt,
                temperature=temperature,
            )
        except Exception as e:
            print(f"Error calling model {model}: {e}")
            continue

        elapsed = time.time() - start

        # Extract text and usage (tokens)
        text = r.output_text
        usage = getattr(r, "usage", None)

        print(f"\n--- Output ---\n{text}\n")
        print(f"Latency: {elapsed:.3f} seconds")

        if usage:
            print(
                f"Tokens - input: {usage.input_tokens}, "
                f"output: {usage.output_tokens}, "
                f"total: {usage.total_tokens}"
            )
            results.append(
                {
                    "model": model,
                    "latency": elapsed,
                    "input_tokens": usage.input_tokens,
                    "output_tokens": usage.output_tokens,
                    "total_tokens": usage.total_tokens,
                }
            )
        else:
            print("No usage information available for this response.")

    return results


results = compare_models(models_to_test, comparison_prompt, temperature=0.7)

print("\n\n=== Summary Table ===")
for item in results:
    print(
        f"Model: {item['model']}\n"
        f"  Latency: {item['latency']:.3f}s\n"
        f"  Tokens  - input: {item['input_tokens']}, "
        f"output: {item['output_tokens']}, total: {item['total_tokens']}\n"
    )



Model: gpt-4o-mini

--- Output ---
Nestled among snow-draped pines, the cozy cabin exudes warmth, its wooden beams glowing softly in the flicker of a crackling fire. Frosted windows frame the world outside, where snowflakes dance like delicate whispers, while inside, the scent of pine and cinnamon wraps around you like a cherished blanket. A hand-knit throw lies invitingly on the worn leather couch, and the gentle hum of a kettle brewing tea adds to the serene ambiance. Here, time slows, and a sense of peace envelops you, as the outside chill fades away, leaving only the comfort of companionship and the promise of quiet moments.

Latency: 4.094 seconds
Tokens - input: 34, output: 128, total: 162

Model: gpt-4o

--- Output ---
Nestled among snow-draped pines, the cozy cabin exudes warmth with its glowing windows casting a golden hue onto the pristine white landscape. Inside, the crackling fireplace fills the room with a comforting warmth, while the scent of pine and cinnamon lingers in

## Analysis – Model Comparison (gpt-4o-mini vs gpt-4o vs gpt-4.1-mini)

From the previous block, we observed:

### Latency and tokens

- `gpt-4o-mini`: ~4.1s, 162 tokens (34 in, 128 out)  
- `gpt-4o`: ~3.0s, 140 tokens (34 in, 106 out)  
- `gpt-4.1-mini`: ~2.9s, 141 tokens (34 in, 107 out)

Key takeaways:

- In a single run, the “mini” model was not the fastest.  
  - Network variance and model warm-up can dominate latency in isolated tests, so one timing per model is not a reliable ranking.  
- `gpt-4o-mini` produced the **longest output**, which increases both cost and time.  
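
Because a single timing is noisy, a more trustworthy comparison repeats each call and reports a robust statistic such as the median. The sketch below is one way to do that. `measure_latency` is a hypothetical helper, and `fake_call` stands in for a real request so the example runs without network access.

```python
import statistics
import time


def measure_latency(call, runs=5, warmup=1):
    """Time `call` several times and summarize the results in seconds."""
    for _ in range(warmup):
        call()  # discard the first call(s) so one-off setup cost is not counted
    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic clock suited to short intervals
        call()
        timings.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "median": statistics.median(timings),
        "min": min(timings),
        "max": max(timings),
    }


def fake_call():
    """Placeholder for a real API request."""
    time.sleep(0.01)


stats = measure_latency(fake_call, runs=3)
print(f"median={stats['median']:.3f}s  min={stats['min']:.3f}s  max={stats['max']:.3f}s")
```

With a real client, `call` could be a lambda wrapping `client.responses.create(...)`; keep in mind that every repetition is billed like any other request.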

### Style and quality

- All three models produced high-quality, coherent descriptions.  
- `gpt-4o-mini`:
  - More verbose and highly sensory
  - Slightly more repetitive in expressing comfort and warmth  
- `gpt-4o`:
  - More concise and balanced
  - Feels editorial and polished  
- `gpt-4.1-mini`:
  - Similar to gpt-4o, slightly more formal
  - Strong imagery without becoming overly long  

### Practical interpretation

- For this type of descriptive writing, all models are “good enough”.  
- The main differences are:
  - length of the response  
  - subtle stylistic preferences  
  - small latency variations  

This block shows that **model choice is not only about raw “intelligence”**, but about the trade-off between:

- quality  
- cost  
- latency  
- verbosity  

In future projects, this intuition will be important when choosing which model to use for:
- agents  
- RAG pipelines  
- user-facing UI responses  
- high-volume workloads


## 🔹 Block 5 - Cost Estimation

In [35]:
# Project 00 – Foundations
# Cell 5: Cost estimation per model call

# Approximate prices (USD) per 1 million tokens.
# Adjust as needed.
# Simplification: one flat rate is applied to both input and output tokens.

prices_per_million = {
    "gpt-4o-mini": 0.15,   # $0.15 / 1M tokens
    "gpt-4o": 5.00,        # $5.00 / 1M tokens
    "gpt-4.1-mini": 0.20,  # $0.20 / 1M tokens
    "gpt-4.1": 4.00,       # $4.00 / 1M tokens (not used yet, but ready for later)
    # you can add more models here later
}

def estimate_cost(model, input_tokens, output_tokens, price_table):
    """Estimate the USD cost of one call from its token counts and a flat per-model rate."""
    if model not in price_table:
        raise ValueError(f"Model '{model}' not found in price table.")

    price_per_million = price_table[model]

    # cost = tokens * (price per 1M / 1,000,000)
    cost_input  = (input_tokens  / 1_000_000) * price_per_million
    cost_output = (output_tokens / 1_000_000) * price_per_million
    cost_total  = cost_input + cost_output

    return {
        "model": model,
        "price_per_million": price_per_million,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_input_usd": cost_input,
        "cost_output_usd": cost_output,
        "cost_total_usd": cost_total,
    }


# Now we use the "results" generated in Block 4
cost_results = []

for r in results:   # results comes from Block 4
    cost_info = estimate_cost(
        model=r["model"],
        input_tokens=r["input_tokens"],
        output_tokens=r["output_tokens"],
        price_table=prices_per_million
    )
    cost_results.append(cost_info)

# Show the results
print("\n=== Cost Estimates (USD) ===")
for c in cost_results:
    print(
        f"\nModel: {c['model']}\n"
        f"  Price per 1M tokens: ${c['price_per_million']}\n"
        f"  Input tokens:  {c['input_tokens']}  → cost: ${c['cost_input_usd']:.8f}\n"
        f"  Output tokens: {c['output_tokens']} → cost: ${c['cost_output_usd']:.8f}\n"
        f"  Total cost:                  → ${c['cost_total_usd']:.8f}\n"
    )



=== Cost Estimates (USD) ===

Model: gpt-4o-mini
  Price per 1M tokens: $0.15
  Input tokens:  34  → cost: $0.00000510
  Output tokens: 128 → cost: $0.00001920
  Total cost:                  → $0.00002430


Model: gpt-4o
  Price per 1M tokens: $5.0
  Input tokens:  34  → cost: $0.00017000
  Output tokens: 106 → cost: $0.00053000
  Total cost:                  → $0.00070000


Model: gpt-4.1-mini
  Price per 1M tokens: $0.2
  Input tokens:  34  → cost: $0.00000680
  Output tokens: 107 → cost: $0.00002140
  Total cost:                  → $0.00002820
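
The estimates above apply one flat rate to input and output tokens. Providers commonly price the two separately, so a closer estimate keeps two rates per model. The sketch below shows that variant and scales it to a daily volume. The model name, both rates, and the call volume are placeholders for illustration, not a real price list; only the token counts (34 in, 128 out) come from the recorded `gpt-4o-mini` run.

```python
# Placeholder rates in USD per 1M tokens. Illustrative only, not real prices;
# replace them with the currently published rates for the model you use.
example_rates = {
    "example-small-model": {"input": 0.10, "output": 0.40},
}


def estimate_cost_split(input_tokens, output_tokens, rates):
    """Estimate the USD cost of one call with separate input and output rates."""
    cost_input = input_tokens / 1_000_000 * rates["input"]
    cost_output = output_tokens / 1_000_000 * rates["output"]
    return cost_input + cost_output


# Token counts taken from the recorded gpt-4o-mini run (34 in, 128 out).
per_call = estimate_cost_split(34, 128, example_rates["example-small-model"])
calls_per_day = 100_000  # hypothetical workload

print(f"Per call: ${per_call:.8f}")
print(f"Per day at {calls_per_day:,} calls: ${per_call * calls_per_day:.2f}")
```

Per-call costs look negligible in isolation; multiplying by the expected volume is what makes the differences between models visible.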

