# **OpenAI Agents SDK Basics**

## <font color= orange>**Primitives in OpenAI Agents SDK**</font>

These are the **building blocks** you need to understand before creating powerful agentic systems. Let’s go step by step, in a way that someone new to AI can follow.

<br>

---

<br>

## 1. **Agents**

* **What it is**:
  An **Agent** is the main brain. It receives a request (user input), thinks about it, decides what to do, and responds.

* **Analogy**: Imagine you’re at a restaurant.

  * The **agent = waiter**.
  * You (user) tell the waiter your order.
  * The waiter understands, checks the menu (knowledge/tools), and brings you food (output).

* **In SDK terms**:

  * An agent is tied to a **model** (like GPT-4.1).
  * You give it **instructions** (personality, rules, role).
  * You can connect it to **tools** (like APIs, databases, calculators).

✅ **Key Idea**: An **Agent = AI worker** with a job description + tools.
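
To make “job description + tools” concrete, here is a plain-Python toy with no SDK and no real model. `ToyAgent`, its fields, and the `menu` tool are hypothetical; they only loosely mirror the SDK’s `Agent(name=..., instructions=..., model=..., tools=[...])`, and the “thinking” step is faked with keyword matching where a real agent would ask the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent: a model label, instructions, and tools."""
    name: str
    instructions: str
    model: str  # only a label here; nothing calls a real model
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def respond(self, user_input: str) -> str:
        # Fake "thinking": use a tool whose name appears in the request,
        # otherwise answer directly. A real agent lets the LLM make this decision.
        for tool_name, tool in self.tools.items():
            if tool_name in user_input.lower():
                return tool(user_input)
        return f"[{self.name}] {self.instructions} You said: {user_input}"


def menu(_: str) -> str:
    # Hypothetical tool: the waiter "checks the menu"
    return "Today's menu: biryani, karahi, nihari."


waiter = ToyAgent(
    name="Waiter",
    instructions="Take orders politely.",
    model="gpt-4.1",
    tools={"menu": menu},
)

print(waiter.respond("Can I see the menu?"))  # Today's menu: biryani, karahi, nihari.
print(waiter.respond("I'd like tea."))        # [Waiter] Take orders politely. You said: I'd like tea.
```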

<br>

---

<br>

## 2. **Handoffs**

* **What it is**:
  A **handoff** is when one agent passes the task to another agent.

* **Analogy**: Think of a **relay race**. One runner (agent) runs part of the race, then passes the baton (task) to the next runner (another agent).

* **Why useful?**

  * Different agents can specialize in different tasks.
  * Example:

    * Agent A = Customer support bot.
    * Agent B = Billing bot.
    * If the customer asks about billing, Agent A **hands off** to Agent B.


✅ **Key Idea**: **Handoffs = passing tasks between agents like teammates.**
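
A handoff can be sketched in plain Python as “keep passing the message until an agent decides to keep it.” This is a toy under stated assumptions: `ToyAgent`, `keywords`, and `run` are hypothetical, and keyword matching stands in for the LLM’s choice. In the SDK itself you list target agents in `Agent(..., handoffs=[...])` and the model decides when to use them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyAgent:
    """Hypothetical agent with topic keywords and agents it may hand off to."""
    name: str
    keywords: List[str]
    handoffs: List["ToyAgent"] = field(default_factory=list)

    def pick_handoff(self, message: str) -> Optional["ToyAgent"]:
        # A real agent lets the LLM choose a handoff; here keywords decide.
        text = message.lower()
        for target in self.handoffs:
            if any(word in text for word in target.keywords):
                return target
        return None


def run(agent: ToyAgent, message: str) -> str:
    # Follow handoffs until an agent keeps the task (the "baton" stops moving).
    current = agent
    while (nxt := current.pick_handoff(message)) is not None:
        current = nxt
    return f"{current.name} handles: {message}"


billing = ToyAgent(name="Billing bot", keywords=["invoice", "refund", "charge"])
support = ToyAgent(name="Support bot", keywords=[], handoffs=[billing])

print(run(support, "My app keeps crashing"))     # Support bot handles: My app keeps crashing
print(run(support, "Why was I charged twice?"))  # Billing bot handles: Why was I charged twice?
```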

<br>

---

<br>

## 3. **Guardrails**

* **What it is**:
  **Guardrails** are safety rules or boundaries that agents must follow.

* **Analogy**: Like **seatbelts & traffic rules** for a driver. They don’t stop the car from moving, but they keep it safe.

* **Examples**:

  * Don’t reveal confidential data.
  * Don’t generate harmful content.
  * Limit what actions can be taken (e.g., can’t charge more than \$100 without approval).


✅ **Key Idea**: **Guardrails = rules to keep agents safe and controlled.**
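
Here is the “no charge over \$100 without approval” rule as a plain-Python guardrail. The names (`GuardrailTripped`, `GuardrailResult`, `charge_limit_guardrail`, `charge_customer`) are hypothetical. The SDK’s own guardrails use decorators such as `@input_guardrail` that return a `GuardrailFunctionOutput` with a `tripwire_triggered` flag; this toy imitates that idea only in spirit.

```python
from dataclasses import dataclass


class GuardrailTripped(Exception):
    """Raised when a guardrail blocks an action (hypothetical name)."""


@dataclass
class GuardrailResult:
    tripwire_triggered: bool
    reason: str = ""


def charge_limit_guardrail(amount: float, approved: bool, limit: float = 100.0) -> GuardrailResult:
    # Rule from the examples above: no charge over the limit without approval.
    if amount > limit and not approved:
        return GuardrailResult(True, f"${amount:.2f} exceeds ${limit:.2f} without approval")
    return GuardrailResult(False)


def charge_customer(amount: float, approved: bool = False) -> str:
    check = charge_limit_guardrail(amount, approved)
    if check.tripwire_triggered:
        # The guardrail stops this action; it does not stop the agent as a whole.
        raise GuardrailTripped(check.reason)
    return f"Charged ${amount:.2f}"


print(charge_customer(40))                  # Charged $40.00
print(charge_customer(250, approved=True))  # Charged $250.00
try:
    charge_customer(250)
except GuardrailTripped as err:
    print("Blocked:", err)                  # Blocked: $250.00 exceeds $100.00 without approval
```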

<br>

---

<br>

## 4. **Sessions**

* **What it is**:
  A **Session** is like a chat history between a user and an agent.

  * It remembers past interactions.
  * Keeps context across multiple turns.

* **Analogy**: Imagine you’re chatting with a friend. If you say:

  * "What’s the capital of France?" → they answer "Paris".
  * Then you say: "Book me a flight there." → they remember "there = Paris".

* Without sessions, the agent forgets the past.

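
A minimal sketch, assuming an in-memory store: `ToySession` and `build_model_input` are hypothetical names, not SDK classes. It shows the core mechanism, which is simply that the stored turns are sent along with every new message, so “there” can be resolved to “Paris”. The SDK ships ready-made session objects (for example `SQLiteSession`, passed as `Runner.run(..., session=session)`) that do this bookkeeping for you.

```python
from typing import List, Optional


class ToySession:
    """Hypothetical in-memory session: an ordered list of turns for one conversation."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._items: List[dict] = []

    def add(self, role: str, content: str) -> None:
        self._items.append({"role": role, "content": content})

    def items(self) -> List[dict]:
        return list(self._items)  # copy so callers can't mutate the stored history


def build_model_input(session: Optional[ToySession], new_message: str) -> List[dict]:
    # With a session, the model sees every earlier turn plus the new one.
    # Without one, it sees only the new message and cannot tell what "there" means.
    past = session.items() if session is not None else []
    return past + [{"role": "user", "content": new_message}]


session = ToySession("user-42")
session.add("user", "What's the capital of France?")
session.add("assistant", "Paris")

with_memory = build_model_input(session, "Book me a flight there.")
without_memory = build_model_input(None, "Book me a flight there.")

print(len(with_memory), len(without_memory))             # 3 1
print(any("Paris" in m["content"] for m in with_memory))  # True
```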

✅ **Key Idea**: **Sessions = memory of a conversation, so the agent doesn’t forget context.**

<br>

---

<br>

# Quick Recap

* **Agents** → The AI workers (brains).
* **Handoffs** → Passing tasks between agents (teamwork).
* **Guardrails** → Safety rules & boundaries.
* **Sessions** → Memory of conversations (context).

Together, these primitives = key foundation for building **safe, smart, multi-step AI systems**.


## **Chat Completions and Responses API**

## **1. Chat Completions API (older style)**

* **What it is:** The long-standing chat API used with models like `gpt-3.5-turbo`, `gpt-4`, and `gpt-4o` (still supported).
* **How it works:** You send a conversation as a list of **messages** (`system`, `user`, `assistant`), and the model replies with a completion.
* **Example:**

  ```python
  from openai import OpenAI
  client = OpenAI()

  completion = client.chat.completions.create(
      model="gpt-4",
      messages=[
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Explain quantum computing in simple terms."}
      ]
  )

  print(completion.choices[0].message.content)
  ```
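* **Limitations:**

  * Designed around text chat; multimodal support is narrower (image input on vision models, audio only on dedicated audio models).
  * Every request must resend the role-based message history (`system`, `user`, `assistant`).
  * No hosted tools like file search or computer use, and mixing output types (text + images + tool results) in one response is awkward.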

<br>

---

<br>

## **2. Responses API (newer, unified API)**

* **What it is:** A **more flexible replacement** for Chat Completions.
* **How it works:** Instead of only "chat messages," `input` can be a plain string or a list of items that mix text, images, and files, and the output is a list of typed items (messages, tool calls, and so on). It’s meant to unify different model capabilities.
* **Example:**

  ```python
  from openai import OpenAI
  client = OpenAI()

  response = client.responses.create(
      model="gpt-4.1-mini",
      input=[{"role": "user", "content": "Explain quantum computing in simple terms."}]
  )

  print(response.output_text)
  ```
* **Key Advantages:**

  * Supports **multi-turn chat** like before, optionally with the server keeping the history (`previous_response_id`), plus image and file inputs.
  * One response can contain **several output items** (e.g., text plus tool calls, including built-in tools such as web search).
  * Structured output is easier (JSON, function calls).
  * Simplified: one endpoint covers chat, tools, and multimodal input (embeddings still use their own endpoint).

<br>

---

<br>

## **Main Differences**

| Feature          | Chat Completions API                                  | Responses API                                                             |
| ---------------- | ----------------------------------------------------- | ------------------------------------------------------------------------- |
| Statefulness     | Stateless; developer resends and manages history      | Optionally stateful; server can keep history (`previous_response_id`)      |
| Built-in tools   | None hosted (function calling and vision input only)  | Web search, file search, computer use, etc.                               |
| Complexity       | Simpler, for quick responses                          | More capable, aimed at agentic workflows                                  |
| Use case         | Basic chatbots, single-turn tasks                     | Advanced agents, multi-turn tasks                                         |
| Latency          | Typically low for simple queries                      | Can be higher when built-in tools run; suited to complex tasks            |
| Token management | Manual truncation and context management              | Server-side history, optional automatic truncation (`truncation="auto"`)  |
| Future support   | Still supported, especially for apps that don't need built-in tools | OpenAI's recommended path for new agentic apps              |
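
The statefulness and token-management rows are easiest to see in the request bodies. This plain-Python sketch only builds dictionaries and sends nothing. `messages` and `previous_response_id` are real fields of the two APIs; the helper names, the model choices, the placeholder id `resp_123`, and the keep-the-last-N trimming rule are illustrative assumptions.

```python
from typing import List, Optional


def chat_completions_body(history: List[dict], new_text: str, max_messages: int = 4) -> dict:
    # Stateless: the developer resends the conversation on every call and must
    # trim it by hand (here: keep the system message plus the last `max_messages`).
    system = [m for m in history if m["role"] == "system"]
    turns = [m for m in history if m["role"] != "system"]
    messages = system + turns[-max_messages:] + [{"role": "user", "content": new_text}]
    return {"model": "gpt-4o-mini", "messages": messages}


def responses_body(new_text: str, previous_response_id: Optional[str] = None) -> dict:
    # Stateful (when responses are stored): send only the new input and point at
    # the previous response; the server supplies the earlier context.
    body = {"model": "gpt-4.1-mini", "input": new_text}
    if previous_response_id:
        body["previous_response_id"] = previous_response_id
    return body


history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(6):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

cc = chat_completions_body(history, "question 6")
rs = responses_body("question 6", previous_response_id="resp_123")  # placeholder id

print(len(cc["messages"]))  # 6: system + 4 kept messages + new question
print(rs)
```

The Chat Completions body grows with the conversation until you trim it; the Responses body stays small because the history lives on the server.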

---

✅ **In short:**

* **Chat Completions** = the established, message-based chat interface (still supported).
* **Responses API** = the newer unified API that handles **text, images, files, structured outputs, and tool calls** in one place.



## <font color= orange>**OpenRouter vs LiteLLM**</font>

### **What is OpenRouter?**

* Think of **OpenRouter** as the **“app store for AI models”**.
* It lets you **access multiple LLMs (Large Language Models)** through a single API, without needing to sign up with each provider separately.
* You get **one API key**, but can use models from **OpenAI, Anthropic (Claude), Google (Gemini), Mistral, Meta (LLaMA), etc.**
* It also provides features like **model routing** (choosing the best model), **fallbacks** (if one fails), and sometimes **cheaper or faster alternatives**.

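👉 Example: Instead of juggling 5 different accounts (OpenAI, Anthropic, Hugging Face, etc.), you just use OpenRouter and send an OpenAI-style request body to its `/api/v1/chat/completions` endpoint:

```json
{ "model": "anthropic/claude-3.5-sonnet", "messages": [{ "role": "user", "content": "Summarize this text..." }] }
```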
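
The same request can be built with Python’s standard library. This sketch assumes OpenRouter’s OpenAI-compatible endpoint with Bearer-token auth; `build_openrouter_request` is a hypothetical helper, and `OPENROUTER_API_KEY` is an assumed environment-variable name that falls back to a placeholder. It constructs the request without sending it.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_openrouter_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    # OpenAI-style chat payload; the "provider/model" slug selects the model.
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_openrouter_request(
    "anthropic/claude-3.5-sonnet",
    "Summarize this text...",
    os.environ.get("OPENROUTER_API_KEY", "sk-or-placeholder"),
)
print(req.get_method(), req.full_url)  # POST https://openrouter.ai/api/v1/chat/completions

# To actually send it (needs network access and a real key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Switching models means changing only the slug (e.g., `openai/gpt-4o-mini`); the URL, headers, and payload shape stay the same.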

<br>

---

<br>

### **Best Usage Fit for OpenRouter**

* **Experimenters & Students** → Easily try out many models without managing separate accounts.
* **Startups** → Want to avoid vendor lock-in and switch between models based on cost/quality.
* **Production apps** → Want automatic **failover** (if one model is down, use another).
* **Cost optimization** → Pick cheaper models for easy tasks, expensive ones for harder reasoning.

💡 Analogy: OpenRouter is like **Spotify for AI models** — instead of buying from each record label, you stream all in one place.

<br>

---

<br>

### **What is LiteLLM?**

* **LiteLLM** is an **open-source library** that acts as a **translator / universal adapter** for LLM APIs.
* It provides a **single unified API** (`litellm.completion()`) but works with **100+ LLM providers** (OpenAI, Anthropic, Cohere, Hugging Face, OpenRouter, even local models).
* You can swap models by **just changing the model name**, not rewriting your whole codebase.

👉 Example with LiteLLM:

```python
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello world"}]
)
print(response.choices[0].message.content)  # LiteLLM returns an OpenAI-style response
```

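Switch to Anthropic by changing only the model string (LiteLLM reads `ANTHROPIC_API_KEY` from the environment):

```python
from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hello world"}]
)
```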
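
A “universal adapter” means one call shape in, many request formats out. The toy below is not LiteLLM’s code; `to_provider_request` is a hypothetical function that hand-translates an OpenAI-style `messages` list for three targets. It relies on one real difference between formats: Anthropic’s Messages API takes the system prompt as a separate top-level `system` field and requires `max_tokens`. The `provider/model` prefixes mirror LiteLLM’s naming convention (including `openrouter/...`, which is how OpenRouter can sit inside LiteLLM), and `max_tokens=1024` is an arbitrary assumption.

```python
from typing import Dict, List, Tuple


def to_provider_request(model: str, messages: List[Dict[str, str]]) -> Tuple[str, dict]:
    """Toy adapter: one OpenAI-style call shape in, provider-specific request out."""
    provider, _, name = model.partition("/")
    if not name:  # no prefix: treat it as an OpenAI model
        provider, name = "openai", model

    if provider == "anthropic":
        # Anthropic: system prompt is a top-level field; max_tokens is required.
        system = "\n".join(m["content"] for m in messages if m["role"] == "system")
        body = {
            "model": name,
            "max_tokens": 1024,
            "messages": [m for m in messages if m["role"] != "system"],
        }
        if system:
            body["system"] = system
        return "https://api.anthropic.com/v1/messages", body

    if provider == "openrouter":
        # OpenRouter is OpenAI-compatible; the rest of the name is its model slug.
        return "https://openrouter.ai/api/v1/chat/completions", {"model": name, "messages": messages}

    if provider == "openai":
        return "https://api.openai.com/v1/chat/completions", {"model": name, "messages": messages}

    raise ValueError(f"unknown provider: {provider}")


msgs = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hello world"},
]
for m in ["gpt-4", "anthropic/claude-3-5-sonnet-20240620", "openrouter/anthropic/claude-3.5-sonnet"]:
    url, body = to_provider_request(m, msgs)
    print(url, body["model"])
```

Your calling code never changes; only the model string does, which is exactly the property the LiteLLM examples above rely on.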

<br>

---

<br>

### **Best Usage Fit for LiteLLM**

* **Developers** → Want to write code **once** and plug into **any LLM provider** easily.
* **Teams building agents** → Agents may need to call different models (fast vs reasoning-heavy). LiteLLM makes switching trivial.
* **Companies** → Want to run their own “mini OpenRouter” internally (LiteLLM can act as a proxy server).
* **Tool builders** → If you’re building an app like a ChatGPT clone, LiteLLM helps integrate multiple providers without headaches.

💡 Analogy: LiteLLM is like a **universal phone charger** — one port (API) to connect with many different “phones” (LLMs).

---

## **Key Difference & When to Use**

| Feature        | **OpenRouter API**                | **LiteLLM API**                                     |
| -------------- | ------------------------------------------------ | ------------------------------------------------------------------- |
| **What it is** | API marketplace for models                       | Open-source universal LLM adapter                                   |
| **Setup**      | Just get an API key, no infra needed             | Install library / run server                                        |
| **Models**     | Hosted selection of models                       | Any model (OpenAI, Anthropic, HuggingFace, OpenRouter, local, etc.) |
| **Routing**    | Built-in (choose best/cheapest model)            | You control routing logic                                           |
| **Best fit**   | Non-technical users, startups, quick prototyping | Developers, infra teams, production systems needing control         |
| **Lock-in**    | Tied to OpenRouter service                       | No lock-in, self-hostable, open-source                              |

<br>

---

<br>


# **Putting It Together**

👉 **If you’re a beginner or startup:**
Use **OpenRouter** — easy, plug-and-play, try many models without the hassle.

👉 **If you’re a developer building serious systems:**
Use **LiteLLM** — keep full control, run locally or in your infra, and even plug OpenRouter **inside LiteLLM** if you want!

<br>

---

<br>


⚡ Pro tip: Many production apps **combine both**:

* LiteLLM as the **adapter layer**
* OpenRouter as **one of the providers** inside LiteLLM

## <font color= orange>**Global, Run & Agent Level Config**</font>

### **Global Config**

* Applies to the **entire SDK session or application**.
* Think of it as “default environment settings” for everything.
* Example: API keys, default LLM provider, logging, safety filters.

💡 Analogy: **School rules** → apply to all students unless a teacher overrides them.

<br>

---

<br>

### **Run Config**

* Applies to **a single agent run (conversation or task execution)**.
* Lets you override some global defaults for that run.
* Example: Change model or temperature **just for one run** without affecting others.

💡 Analogy: **Class session rules** → teacher says “today we do group work” even though school normally does lectures.

<br>

---

<br>

### **Agent Config**

* Applies to a **specific agent instance**.
* Defines the agent’s **role, tools, memory, and default model**.
* Example: FinanceAgent always uses `gpt-4` + stock API; TravelAgent uses `gpt-4.1-mini` + booking API.

💡 Analogy: **Each teacher’s teaching style** → math teacher vs history teacher.

<br>

---

<br>

# Summary Table

| Config Level  | Scope              | Best Usage                             | Example                                     |
| ------------- | ------------------ | -------------------------------------- | ------------------------------------------- |
| **Global** 🌍 | Whole application  | Default provider/settings for everyone | Default = GPT-4.1-mini                      |
| **Run** 🏃    | One execution      | Temporary override for special task    | Use Claude just for one run                 |
| **Agent** 🤖  | One agent instance | Specialized role & provider            | Math tutor = GPT-4, Travel planner = Claude |

<br>

---

<br>

## **Pro Tips**

* Use **Global Config** for API keys, logging, default provider.
* Use **Agent Config** for specializations (different models for different tasks).
* Use **Run Config** when you need **flexibility inside one conversation** (e.g., fallback model, cheaper model for summarization).
* You can **nest them**: Global → Agent → Run (the most specific level wins; e.g., in the Agents SDK a `RunConfig` model overrides every agent’s model for that run).

---

**Example Combo:**

* Global = GPT-4.1-mini
* Travel Agent = Claude
* Inside one run of Travel Agent = temporarily switch to GPT-4-Turbo for hard reasoning
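
The precedence rule can be written as a few lines of plain Python. `resolve_model` is a hypothetical helper, not an SDK function, and the model strings are placeholders taken from the combo above; in the Agents SDK the same outcome comes from setting `model` on the `Agent` and, optionally, on `RunConfig`.

```python
from typing import Optional


def resolve_model(global_default: str,
                  agent_model: Optional[str] = None,
                  run_model: Optional[str] = None) -> str:
    """Pick the effective model: Run beats Agent, Agent beats Global."""
    for candidate in (run_model, agent_model, global_default):
        if candidate:  # first non-empty setting, most specific first
            return candidate
    raise ValueError("no model configured at any level")


GLOBAL_MODEL = "gpt-4.1-mini"
TRAVEL_AGENT_MODEL = "claude-3.5-sonnet"

print(resolve_model(GLOBAL_MODEL))                      # gpt-4.1-mini (any plain agent)
print(resolve_model(GLOBAL_MODEL, TRAVEL_AGENT_MODEL))  # claude-3.5-sonnet (Travel Agent)
print(resolve_model(GLOBAL_MODEL, TRAVEL_AGENT_MODEL, run_model="gpt-4-turbo"))  # gpt-4-turbo
```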



## **How to Configure LLM Providers**

### **Global Level (System-Wide Default)**

Best when: you want **all agents to use the same provider by default**.

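```python
from openai import AsyncOpenAI
from agents import set_default_openai_api, set_default_openai_client, set_tracing_disabled

# Global config: every agent and run uses this client unless overridden.
# Note: OpenAI clients have no "default_model" option; models are set per agent or per run.
client = AsyncOpenAI(api_key="GLOBAL_API_KEY")  # placeholder; normally read from OPENAI_API_KEY
set_default_openai_client(client)

# Optional: use the Chat Completions API instead of the SDK's default Responses API
set_default_openai_api("chat_completions")

# Optional: turn tracing off everywhere (tracing uploads to OpenAI and needs an OpenAI key)
set_tracing_disabled(True)
```

✅ Every agent/run shares this client (API key, base URL) and these settings unless explicitly changed. The model itself is still chosen per agent (e.g., `Agent(..., model="gpt-4.1-mini")`) or per run (`RunConfig(model=...)`).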

<br>

---

<br>

### **Run Level (Task-Specific Override)**

Best when: you want to **switch providers for one task**.

```python
import os

from openai import AsyncOpenAI
from agents import Agent, Runner, OpenAIChatCompletionsModel
from agents.run import RunConfig

# Reference: https://ai.google.dev/gemini-api/docs/openai
gemini_api_key = os.environ["GEMINI_API_KEY"]  # assumes the key is set in the environment

external_client = AsyncOpenAI(
    api_key=gemini_api_key,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

# Wrap Gemini's OpenAI-compatible endpoint as a Chat Completions model the SDK can call
model = OpenAIChatCompletionsModel(
    model="gemini-2.0-flash",
    openai_client=external_client,
)

# RunConfig applies only to this run; its model overrides the agent's own model
config = RunConfig(
    model=model,
    tracing_disabled=True,  # tracing exports to OpenAI by default and needs an OpenAI key
)

agent: Agent = Agent(name="Assistant", instructions="You are a helpful assistant")

result = Runner.run_sync(agent, "Hello, how are you?", run_config=config)

print(result.final_output)
```

✅ Other runs = the global/agent defaults (e.g., GPT-4.1-mini)
✅ This run = Gemini 2.0 Flash (via Gemini's OpenAI-compatible endpoint)

<br>

---

<br>

### **Agent Level (Per-Agent Personality/Tools)**

Best when: you want **different agents specialized with different providers**.

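```python
import os

from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel  # pip install "openai-agents[litellm]"

# Agent 1: Math tutor (uses OpenAI GPT-4 through the global/default OpenAI client)
math_agent = Agent(
    name="Math tutor",
    instructions="You are a math tutor.",
    model="gpt-4",
)

# Agent 2: Travel planner (uses Anthropic Claude, routed through LiteLLM)
travel_agent = Agent(
    name="Travel planner",
    instructions="You are a travel planner.",
    model=LitellmModel(
        model="anthropic/claude-3-5-sonnet-20240620",
        api_key=os.environ["ANTHROPIC_API_KEY"],
    ),
)

# Run them
res1 = Runner.run_sync(math_agent, "What is 23 * 47?")
res2 = Runner.run_sync(travel_agent, "Find me a flight to Tokyo")

print("Math Agent:", res1.final_output)
print("Travel Agent:", res2.final_output)
```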

✅ Both agents share the same global settings (default OpenAI client, tracing, etc.).

✅ But each has **its own LLM provider + persona**.


# **Chat Completions vs Responses API**
## **1. Chat Completions API**

* **Purpose** → Power conversational AI (like ChatGPT).
* **Input** → A *list of messages* (`system`, `user`, `assistant`, etc.) that give context to the conversation.
* **Output** → A model-generated **chat message** (usually the assistant’s reply).

**Best when:**

* You want natural back-and-forth conversation.
* You’re building **agents, chatbots, copilots, or interactive systems**.

📌 **Example:**

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain recursion in simple terms."}
    ]
)

print(response.choices[0].message.content)
```

Example output (wording varies) → `"Recursion is when a function calls itself to solve a smaller version of the problem..."`

<br>

---

<br>

## **2. Responses API**

* **Purpose** → A **newer, unified API** that goes beyond chat.
* **Input** → Similar to chat (messages), but more **flexible**:

  * Can mix **text, images, and files** in one request.
  * Supports **structured outputs** (JSON mode, JSON Schema), **tools**, and **function calling**.
* **Output** → A list of typed items: text messages, function calls, and results from built-in tools.

✅ **Best when:**

* You need **multi-modal reasoning** (text + images + audio).
* You want **fine-grained control** over responses.
* You’re building **agentic systems** where the model not only chats but also *calls functions, routes, or generates structured data*.

📌 **Example:**

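```python
import json

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {"role": "system", "content": "You are a JSON-only assistant."},
        {"role": "user", "content": "Give me a list of 3 fruits in JSON."}
    ],
    # In the Responses API, JSON mode lives under `text.format`
    # (`response_format` is the Chat Completions parameter)
    text={"format": {"type": "json_object"}},
)

# JSON mode returns a JSON string; parse it yourself
data = json.loads(response.output_text)
print(json.dumps(data, indent=2))
```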

Example output (the model chooses the exact keys and values) →

```json
{
  "fruits": ["apple", "banana", "mango"]
}
```

<br>

---

<br>

# **Key Differences**

| Feature                  | Chat Completions API 🗨️                          | Responses API 🧩                                             |
| ------------------------ | ------------------------------------------------ | ------------------------------------------------------------ |
| Input format             | Chat messages (text, plus images on vision models) | A string or typed items (text, images, files)              |
| Output                   | Assistant message (text and/or tool calls)       | List of typed items: text, tool calls, structured data       |
| Multi-modal (text+image) | ✅ Image input on vision models                   | ✅ Yes                                                        |
| Function calling/tools   | Function calling only                            | Function calling + built-in tools (web search, file search)  |
| Recommended for          | Conversational AI, chatbots                      | Agents, complex workflows, structured outputs                |

<br>

---

<br>

## **TL;DR**

* **Chat Completions** = best for **straightforward chatbots**.
* **Responses API** = the **next-gen unified API**, great for **agents, multi-modal, structured outputs**.


**System prompt, user prompt, tool schema, and tool output** are the **four practical categories** of content that go into an **LLM call inside an agent loop**. Let’s break them down one at a time:

<br>

---

<br>

# 1. **System Prompt (Instructions / Persona)**

* Defines **who the agent is** and **how it should behave**.
* Given once at the start of the conversation (but always included in every LLM call).
* Example:

  ```
  "You are a helpful assistant that responds only in haikus."
  ```

<br>

---

<br>

# 2. **User Prompt (Input / Conversation History)**

* The **user’s message** or the latest query in the loop.
* Also includes **past turns** (conversation memory) so the model has context.
* Example:

  ```
  User: "What’s the weather in Karachi?"
  ```

<br>

---

<br>

# 3. **Tool Schema (Functions the LLM Can Call)**

* The LLM is told about the **tools available** (names, descriptions, parameters).
* This allows the model to *decide if it should call a tool instead of answering*.
* Example schema sent to the LLM:

  ```json
  {
    "name": "get_weather",
    "description": "Fetches current weather by city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string"}
      },
      "required": ["city"]
    }
  }
  ```
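
Writing these schemas by hand gets tedious. The Agents SDK’s `@function_tool` decorator derives them from a Python function’s signature and docstring; the toy below is a much simpler stand-in, not the SDK’s implementation. `tool_schema` is a hypothetical helper that handles only `str`/`int`/`float`/`bool` parameters, and `get_weather` is a stub that calls no real API. It reproduces the schema above.

```python
import inspect
from typing import get_type_hints

# Map a few Python types to JSON Schema types (deliberately minimal).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func) -> dict:
    """Build a function-tool schema from a plain Python function (toy version)."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    properties = {name: {"type": _JSON_TYPES[hints[name]]} for name in params}
    # Parameters without a default value are required.
    required = [name for name, p in params.items() if p.default is inspect.Parameter.empty]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str) -> str:
    """Fetches current weather by city."""
    return f"The weather in {city} is Sunny, 30°C."  # stub, no real API call


print(tool_schema(get_weather))
```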

<br>

---

<br>

# 4. **Tool Output (Results of Previous Tool Calls)**

* If the agent already called a tool, the tool’s result is passed back **as part of the conversation history**.
* This ensures the LLM can use the result in the next reasoning step.
* Example:

  ```
  Tool[get_weather]: "The weather in Karachi is Sunny, 30°C."
  ```

<br>

---

<br>

# **How These Work Together in the Agent Loop**

👉 A **single LLM call** looks like this bundle:

```
System Prompt   → defines agent persona
User Prompt     → latest input + conversation history
Tool Schema     → functions/tools available
Tool Output     → any previous tool results
```
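
Here is how the four parts can map onto a real request body, using the Chat Completions wire format (the field names `messages`, `tools`, `tool_calls`, and `tool_call_id` are that API’s; the Responses API and the Agents SDK use different shapes for the same ideas). `build_llm_call` is a hypothetical helper, the call id `call_1` and the tool result are made-up values, and nothing is sent.

```python
import json


def build_llm_call(system_prompt, history, tool_schemas, tool_results):
    """Bundle the four categories into one Chat Completions-style request body."""
    messages = [{"role": "system", "content": system_prompt}]  # 1. system prompt
    messages += history                                       # 2. user prompt + past turns
    # 4. tool output: in this single-turn sketch, tool exchanges follow the history.
    for call_id, name, args, output in tool_results:
        # The assistant's earlier tool call, then the tool's answer to it.
        messages.append({
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": call_id,
                "type": "function",
                "function": {"name": name, "arguments": json.dumps(args)},
            }],
        })
        messages.append({"role": "tool", "tool_call_id": call_id, "content": output})
    tools = [{"type": "function", "function": s} for s in tool_schemas]  # 3. tool schema
    return {"model": "gpt-4o-mini", "messages": messages, "tools": tools}


weather_schema = {
    "name": "get_weather",
    "description": "Fetches current weather by city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

body = build_llm_call(
    "You are a helpful assistant that responds only in haikus.",
    [{"role": "user", "content": "What's the weather in Karachi?"}],
    [weather_schema],
    [("call_1", "get_weather", {"city": "Karachi"}, "The weather in Karachi is Sunny, 30°C.")],
)
print([m["role"] for m in body["messages"]])  # ['system', 'user', 'assistant', 'tool']
```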

<br>

---

<br>

## **Example Flow**

**User:** “What’s the square root of today’s temperature in Karachi?”

1. **System Prompt** → "You are a helpful math + weather assistant."
2. **User Prompt** → "What’s the square root of today’s temperature in Karachi?"
3. **Tool Schema** → (weather API, math function API)
4. **Tool Output** → (none yet, first call)

LLM decides: *"Call get\_weather(city=Karachi)"*
→ SDK executes it → result = "30°C"

Next loop:

1. System prompt (same)
2. User prompt (conversation so far)
3. Tool schema (same)
4. Tool output = "30°C"

LLM now says: *"Call math.sqrt(30)"*
→ result = 5.477

Final loop: LLM outputs final answer: *"The square root of today’s temperature (30) is about 5.48."*
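
The flow above can be simulated in plain Python. `fake_llm` is a hypothetical scripted stand-in for the model (it always follows the same plan), and the tools are stubs with a hard-coded 30°C; the point is the control flow: call the model, run any requested tool, append the result, and repeat until the model returns a final answer. The `max_turns` cap is an assumption, similar in spirit to the `max_turns` limit on the SDK’s `Runner.run`.

```python
import math


def get_weather(city: str) -> float:
    # Stub tool: pretend today's temperature in Karachi is 30°C.
    return {"Karachi": 30.0}[city]


def sqrt(x: float) -> float:
    return math.sqrt(x)


TOOLS = {"get_weather": get_weather, "sqrt": sqrt}


def fake_llm(messages):
    """Scripted stand-in for the model: picks the next step from the tool results so far."""
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if not tool_msgs:
        return {"tool": "get_weather", "args": {"city": "Karachi"}}
    if len(tool_msgs) == 1:
        return {"tool": "sqrt", "args": {"x": tool_msgs[0]["content"]}}
    temp, root = tool_msgs[0]["content"], tool_msgs[1]["content"]
    return {"final": f"The square root of today's temperature ({temp:g}°C) is about {root:.2f}."}


def agent_loop(user_input: str, max_turns: int = 5) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful math + weather assistant."},
        {"role": "user", "content": user_input},
    ]
    for _ in range(max_turns):
        step = fake_llm(messages)                      # one LLM call per loop
        if "final" in step:
            return step["final"]
        result = TOOLS[step["tool"]](**step["args"])   # the runtime executes the tool
        # Tool output goes back into the history (kept as a number here for simplicity).
        messages.append({"role": "tool", "name": step["tool"], "content": result})
    raise RuntimeError("max_turns exceeded")


print(agent_loop("What's the square root of today's temperature in Karachi?"))
# The square root of today's temperature (30°C) is about 5.48.
```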

