📙 [Open in Colab][1]

[1]: https://colab.research.google.com/drive/1Mpwhfm6Oyn__kwzYpTNvXJ9I7eeqdDMn?usp=sharing

# **Context** in LLMs & Agents

---

## 1. **What** is Context?

* **Context** = the **information the model has access to when generating a response**.
* This includes:

  1. **Conversation history** (previous messages).
  2. **System instructions** (e.g., “You are a helpful tutor”).
  3. **User’s latest prompt/question**.
  4. **Extra knowledge** (e.g., from RAG, memory, or tools).

👉 Think of **context** as the **“short-term memory”** of the AI during a conversation.

<br>

---

<br>

## 2. **How Context Works in LLMs**

* LLMs don’t actually “remember” past chats by themselves.
* Instead, **all relevant history is re-sent to the model** with every new request.
* The model generates answers based only on **the given context window**.
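
A minimal sketch of this re-sending pattern. `fake_model` is a hypothetical stand-in for a real LLM call so the example runs offline; the point is that the application, not the model, owns the `history` list and passes all of it on every request.

```python
def fake_model(messages):
    """Stand-in for an LLM call: reports how many messages it was given."""
    return f"(reply based on {len(messages)} messages)"

# The application keeps the history; the model itself stores nothing.
history = [{"role": "system", "content": "You are a friendly tutor."}]

def chat(user_text):
    # Append the new user turn, then send the ENTIRE history to the model.
    history.append({"role": "user", "content": user_text})
    reply = fake_model(history)
    # Store the reply so the next request includes it as well.
    history.append({"role": "assistant", "content": reply})
    return reply

first = chat("Explain Newton's First Law.")
second = chat("Give me an example.")
print(first)
print(second)
```

The second call sees more messages than the first only because the code appended the earlier turns before calling the model again.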

<br>

---

<br>

## 3. **Context Window**

* The **context window** = the maximum number of tokens (chunks of text) the model can consider at once.
* Example:

  * GPT-4o-mini has a \~128k-token context → roughly 300 pages of text.
  * GPT-4.1 supports a much larger window, up to \~1M tokens.

👉 If your conversation or documents are **longer than the window**, older parts must be **dropped or summarized**.
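
One simple way to drop older parts is a sliding window over the message list. The sketch below is illustrative only: `estimate_tokens` counts whitespace-separated words as a crude proxy (real tokenizers count differently), and it assumes the first message is the system message, which is always kept.

```python
def estimate_tokens(text):
    # Rough assumption: one token per whitespace-separated word.
    return len(text.split())

def trim_to_window(messages, max_tokens):
    """Keep the system message plus the newest messages that fit in max_tokens."""
    system, rest = messages[0], messages[1:]
    budget = max_tokens - estimate_tokens(system["content"])
    kept = []
    for msg in reversed(rest):           # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if cost > budget:
            break                        # this message and everything older is dropped
        kept.append(msg)
        budget -= cost
    return [system] + kept[::-1]         # restore chronological order

messages = [
    {"role": "system", "content": "You are a friendly tutor."},
    {"role": "user", "content": "Explain Newton's First Law."},
    {"role": "assistant", "content": "An object in motion stays in motion."},
    {"role": "user", "content": "Give me an example."},
]
trimmed = trim_to_window(messages, max_tokens=16)
print([m["role"] for m in trimmed])
```

With this tiny budget the oldest user message is dropped, while the system message and the two newest turns survive. Summarizing the dropped turns instead of discarding them is the other common option.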

<br>

---

<br>

## 4. **Context in Different Settings**

1. **Chatbot** → Context = conversation history + user instructions.

   * “Hi!” → “Hello, how can I help you?”
   * If the context is lost → the chatbot forgets who you are.

2. **RAG (Retrieval-Augmented Generation)** → Context = query + retrieved docs.

   * You ask: “What’s our refund policy?”
   * Retriever fetches doc → doc inserted into context → LLM answers.

3. **Agents SDK** → Context = messages + tool outputs + system instructions.

   * The agent sees:

     * User query
     * Past steps
     * Tool responses
   * Decides next action.
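
The RAG flow in item 2 can be sketched with a toy keyword retriever. Everything here is hypothetical (the `DOCS` contents, `retrieve`, and `build_rag_messages`); a real system would use embeddings and a vector store, but the last step is the same: the retrieved text is placed into the messages sent to the model.

```python
# Hypothetical mini knowledge base, keyed by topic.
DOCS = {
    "refund": "Refunds are available within 30 days of purchase.",
    "shipping": "Orders ship within 2 business days.",
}

def retrieve(query):
    """Toy retriever: return every document whose key appears in the query."""
    q = query.lower()
    return [text for key, text in DOCS.items() if key in q]

def build_rag_messages(query):
    docs = retrieve(query)
    context_block = "\n".join(docs) if docs else "(no documents found)"
    # The retrieved text becomes part of the context the model sees.
    return [
        {"role": "system", "content": "Answer using only the provided documents."},
        {"role": "user", "content": f"Documents:\n{context_block}\n\nQuestion: {query}"},
    ]

messages = build_rag_messages("What's our refund policy?")
print(messages[1]["content"])
```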

<br>

---

<br>

## 5. **Example (OpenAI Chat API Context)**

```python
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "You are a friendly tutor."},
    {"role": "user", "content": "Explain Newton's First Law."},
    {"role": "assistant", "content": "It says an object in motion stays in motion..."},
    {"role": "user", "content": "Give me an example."}
]

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=messages
)

print(response.choices[0].message.content)
```

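👉 The model can answer the follow-up correctly because the **previous conversation is in context**: "Give me an example" only makes sense next to the earlier turns about Newton's First Law.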

<br>

---

<br>

## 6. **Why Context is Important**

* ✅ **Makes conversations coherent** (remembers flow).
* ✅ **Grounds answers** (via RAG, external knowledge).
* ✅ **Controls behavior** (system instructions set the role).
* ⚠️ **Limited** by context window (old info gets cut off).

<br>

---

<br>

## 7. **Cheat-Sheet**

| Term               | Meaning                                                      |
| ------------------ | ------------------------------------------------------------ |
| Context            | Info the model has access to for answering.                  |
| Context Window     | Max number of tokens model can consider.                     |
| Sources of Context | Conversation history, instructions, documents, tool outputs. |
| Limitation         | Old info is lost if it exceeds the window.                   |

---

✅ **Summary**:
**Context** is the **short-term memory** of LLMs — the set of messages, instructions, and data the model sees at once. It is limited by the **context window**. Without context, the model cannot maintain coherent conversations or give grounded answers.


# **Context Types in LLMs & Agents**

---

## 1. **Local Context**

* **Definition**: The **immediate, short-term information** given to the model for a single request or turn.
* Includes:

  * User’s **current query**
  * A few **recent conversation messages**
  * Any **retrieved documents** (in RAG)
  * **Tool outputs** relevant to that step

👉 Think of it as **the notes in front of you right now** — only what you need to answer the current question.

**Example**:

* User: *“Explain Newton’s First Law.”*
* Assistant: *“It says an object in motion stays in motion...”*
* User: *“Give me an example.”*

  * **Local context** = “Give me an example” + the assistant’s last answer about Newton’s First Law.

<br>

---

<br>

## 2. **LLM Context**

* **Definition**: The **entire set of tokens** (messages, docs, instructions) that are fed into the **model’s context window** during inference.
* Bounded by the model’s **context window size** (e.g., GPT-4o-mini → 128k tokens).
* It is the **raw input the model sees all at once**.

👉 Think of it as the **page(s) of text you can fit into your working memory at one time**.

**Example**:
If you paste a **50-page document** into GPT-4.1 and ask for a summary, all 50 pages are part of the **LLM context** (as long as they fit inside the token limit).
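
A rough way to check "as long as they fit" before sending a document. The 4-characters-per-token ratio is only a common rule of thumb for English text and is an assumption here; use the model's real tokenizer when the margin is tight.

```python
def fits_in_window(text, window_tokens, chars_per_token=4):
    """Estimate the token count from length and compare it to the window."""
    estimated = len(text) // chars_per_token
    return estimated <= window_tokens, estimated

small_doc = "word " * 10_000      # 50,000 characters
huge_doc = "x" * 1_000_000        # 1,000,000 characters

fits_small, est_small = fits_in_window(small_doc, window_tokens=128_000)
fits_huge, est_huge = fits_in_window(huge_doc, window_tokens=128_000)
print(fits_small, est_small)
print(fits_huge, est_huge)
```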

<br>

---

<br>

## 3. **Agent Context**

* **Definition**: The **broader operational context** that an **Agent** in the OpenAI Agents SDK has when reasoning and acting.
* Includes:

  * **System instructions** (e.g., “You are a helpful tutor”).
  * **Conversation history** (whatever is passed to the agent).
  * **Local context** (latest query + retrieved/tool results).
  * **Tool outputs** (like calculator results, API calls).
  * **Agent state** (e.g., memory of decisions made in the workflow).

👉 Think of it as the **workspace of an agent**: It not only has the local text, but also knows about tools it can call, results it has seen, and its role.

**Example** (Agent solving a math problem):

* User: *“What’s the square root of 256?”*
* Agent context includes:

  * System: *“You are a math solver agent.”*
  * User’s latest question.
  * Knowledge of available tool: *Calculator.*
  * Tool call result: *Calculator → 16.*
* The agent uses this **context bundle** to decide its final answer.
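
The same example as a plain-Python sketch of an agent loop. This is not the Agents SDK: `fake_agent_model` is a hypothetical stand-in for the LLM that requests the calculator once and then answers from the tool result. It shows how each tool output is appended to the context before the next decision.

```python
import math

def calculator_sqrt(x):
    return math.sqrt(x)

TOOLS = {"calculator_sqrt": calculator_sqrt}

def fake_agent_model(context):
    """Stand-in for the LLM: call the tool once, then answer from its result."""
    tool_results = [m for m in context if m["role"] == "tool"]
    if not tool_results:
        return {"action": "call_tool", "tool": "calculator_sqrt", "arg": 256}
    return {"action": "final", "answer": f"The square root is {tool_results[-1]['content']}."}

def run_agent(question):
    context = [
        {"role": "system", "content": "You are a math solver agent."},
        {"role": "user", "content": question},
    ]
    while True:
        step = fake_agent_model(context)
        if step["action"] == "final":
            return step["answer"], context
        # Run the requested tool and add its output to the agent's context.
        result = TOOLS[step["tool"]](step["arg"])
        context.append({"role": "tool", "content": f"{result:g}"})

answer, context = run_agent("What's the square root of 256?")
print(answer)
```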

<br>

---

<br>

## 4. **Relationship Between Them**

* **Local Context** ⬅️ a subset of the conversation/docs that are **immediately relevant**.
* **LLM Context** ⬅️ the **full prompt input** (all tokens) that actually fit inside the model’s window.
* **Agent Context** ⬅️ bigger picture → includes **local context + tool state + workflow info**, so the agent can plan actions.

<br>

---

<br>

## 5. **Analogy**

* **Local Context** = Notes on the desk (short-term info you’re actively using).
* **LLM Context** = Everything you can fit into your working memory at once.
* **Agent Context** = Your entire office → notes, tools, calculator, whiteboard, past results.

<br>

---

<br>

## 6. **Comparison Table**

| Type              | Definition                                      | Scope                                                                  | Example                                  |
| ----------------- | ----------------------------------------------- | ---------------------------------------------------------------------- | ---------------------------------------- |
| **Local Context** | Immediate info for current turn                 | Short-term (latest query, recent msgs, retrieved docs, tool outputs)   | “Give me an example” + last answer       |
| **LLM Context**   | The full input tokens inside the model’s window | Limited by token size (e.g., 128k)                                     | 50-page doc fed into GPT-4.1             |
| **Agent Context** | The agent’s full operational workspace          | Includes system role, conversation, local context, tool calls, results | Tutor agent + memory + calculator output |

---

✅ **Summary**:

* **Local context** = immediate relevant info.
* **LLM context** = the total tokens the model sees in its context window.
* **Agent context** = the wider state available to an Agent (system role, local context, tools, results, workflow state).


## **Installation**

In [None]:
!pip install openai-agents

In [None]:
import nest_asyncio
nest_asyncio.apply()

## **Config**

In [None]:
from agents import AsyncOpenAI, OpenAIChatCompletionsModel
from google.colab import userdata
import os

GEMINI_API_KEY = userdata.get("GEMINI_API_KEY")

external_client = AsyncOpenAI(
    api_key = GEMINI_API_KEY,
    base_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
)

model = OpenAIChatCompletionsModel(
    model = "gemini-2.5-flash",
    openai_client = external_client
)

In [None]:
from agents import set_tracing_disabled
set_tracing_disabled(True)

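## **Local Context**

In the Agents SDK, *local context* is the Python object you pass as `context=` to `Runner.run`. Tools receive it through `RunContextWrapper` and read it in ordinary code. The object itself is not sent to the model; the model only sees what a tool returns.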

In [None]:
import asyncio
from dataclasses import dataclass

from agents import Agent, RunContextWrapper, Runner, function_tool

@dataclass
class UserInfo:
  name: str
  age: int
  uid: int
  location: str = "Pakistan"

@function_tool
async def fetch_user_age(wrapper: RunContextWrapper[UserInfo]) -> str:
  '''Returns the age of the user.'''
  return f"User {wrapper.context.name} is {wrapper.context.age} years old."

@function_tool
async def fetch_user_location(wrapper: RunContextWrapper[UserInfo]) -> str:
  '''Returns the location of the user.'''
  return f"User {wrapper.context.name} is from {wrapper.context.location}"

async def main():
    user_info = UserInfo(name="Ayesha", uid=1234, age=19)

    agent = Agent[UserInfo](
        name= "Assistant",
        tools=[fetch_user_age, fetch_user_location],
        model = model,
    )

    result = await Runner.run(
        starting_agent = agent,
        input="What is the current age of the user? and current location?",
        context=user_info,
    )

    print(result.final_output)

if __name__ == "__main__":
  asyncio.run(main())

The user is 19 years old and is from Pakistan.


In [None]:
import asyncio
from dataclasses import dataclass

from agents import Agent, RunContextWrapper, Runner, function_tool

@dataclass
class UserInfo:
  name: str
  age: int
  uid: int
  location: str = "Pakistan"

@function_tool
async def greet_user(wrapper: RunContextWrapper[UserInfo], greeting: str) -> str:
  """Greet the user by their name.

  Args:
    greeting: A specialized greeting message for the user.
  """
  name = wrapper.context.name
  return f"Hello {name}, {greeting}"

async def main():
    user_info = UserInfo(name="Ayesha", uid=1234, age=19)

    agent = Agent[UserInfo](
        name= "Assistant",
        tools=[greet_user],
        model = model,
        instructions="Always greet the user using <function_call>greet_user</function_call> and welcome them"
    )

    result = await Runner.run(
        starting_agent = agent,
        input="hello",
        context=user_info,
    )

    print(result.final_output)

if __name__ == "__main__":
  asyncio.run(main())

Hello Ayesha, Hello! Welcome.



## 1. **Static Instructions (the `instructions` field)**

When you build an `Agent`, you give it a **base system prompt**:

```python
agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant",
    model=model,
    tools=[get_weather],
)
```

This `instructions` string is like the **default personality / role definition**. It stays fixed across all runs of this agent, unless you explicitly override it.

<br>

---

<br>

## 2. **Dynamic Instructions**

Sometimes you don’t want to hard-code everything into the `instructions`.
Instead, you want to **inject context dynamically** at runtime, depending on the conversation, external signals, or function outputs.

Examples:

* User preferences (tone, language, formality)
* Current date/time or session metadata
* Tool results (like weather, database values, app state)
* Business rules or constraints

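The Agents SDK supports this by letting `instructions` be a **function instead of a string**: the SDK calls it on each run with the run context and the agent, and uses the returned string as the system prompt. The agent adapts per run without any change to its base config.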

<br>

---

<br>

## 3. **How it works in practice**

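Define a function that builds the prompt from the run context, assign it to `instructions`, and pass the data through `context` when you call the runner:

```python
def dynamic_instructions(wrapper: RunContextWrapper[dict], agent: Agent[dict]) -> str:
    # Called by the SDK on each run to build the system prompt.
    return f"You are a helpful assistant. {wrapper.context['situation']}"

agent = Agent[dict](name="Assistant", instructions=dynamic_instructions, model=model)

result = Runner.run_sync(
    agent,
    "What should I wear today?",
    context={
        "situation": "The user is currently in Karachi, where it's 40°C and very sunny."
    },
)
```

Inside the run, the model receives something like:

```
System: You are a helpful assistant. The user is currently in Karachi, where it's 40°C and very sunny.
User: What should I wear today?
```

The `context` dict itself is never sent to the model. The agent answers with awareness of the situation only because the instruction function wrote it into the system prompt.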

---

## 4. **Difference between context vs. dynamic instructions**

* **Context** → A local object you pass to the runner via `context=`. Your tools, hooks, and instruction functions read it through `RunContextWrapper`; it is not sent to the model.
* **Dynamic instructions** → A function assigned to `instructions` that the SDK calls on each run to build the **system-level guidance**, usually from that context object.

Think of it as:

* `instructions="..."` = fixed system role.
* `instructions=some_function` = system prompt computed per run.
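
To make the static-versus-per-run distinction concrete, here is a small standard-library sketch of how a runner could resolve instructions that are either a string or a function. `RunContext`, `resolve_instructions`, and `weather_aware` are hypothetical names for illustration; this is not the SDK's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class RunContext:
    city: str
    temperature_c: int

Instructions = Union[str, Callable[[RunContext], str]]

def resolve_instructions(instructions: Instructions, ctx: RunContext) -> str:
    """Return the system prompt to use for this run."""
    if callable(instructions):
        return instructions(ctx)   # computed fresh for every run
    return instructions            # static string, identical on every run

def weather_aware(ctx: RunContext) -> str:
    return (
        "You are a helpful assistant. "
        f"The user is in {ctx.city}, where it's {ctx.temperature_c}°C."
    )

ctx = RunContext(city="Karachi", temperature_c=40)
static_prompt = resolve_instructions("You are a helpful assistant", ctx)
dynamic_prompt = resolve_instructions(weather_aware, ctx)
print(static_prompt)
print(dynamic_prompt)
```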

---

## 5. **Why useful**

* You can keep your base agent lightweight.
* You can adapt behavior per-user or per-query.
* You can insert real-time knowledge without retraining or hard-coding.

---

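👉 Example:

```python
def dynamic_instructions(wrapper: RunContextWrapper[dict], agent: Agent[dict]) -> str:
    ctx = wrapper.context
    return (
        "You are a helpful assistant. "
        f"{ctx['user_profile']} Also, use this weather info: {ctx['weather']}"
    )

agent = Agent[dict](name="Assistant", instructions=dynamic_instructions, model=model)

result = Runner.run_sync(
    agent,
    "What is the weather?",
    context={
        "user_profile": "The user prefers answers in short bullet points.",
        # Placeholder value; in practice, fetch this from your own weather source.
        "weather": "Sunny, 40°C in Karachi",
    },
)

print(result.final_output)
```

Now the model receives:

* System prompt (built for this run): *"You are a helpful assistant. The user prefers answers in short bullet points. Also, use this weather info: Sunny, 40°C in Karachi"*
* User input: *"What is the weather?"*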

---

⚡ So in short:

* **Context** = the local object you pass into the run for your own code to use.
* **Dynamic instructions** = an `instructions` function that turns that context into *system-level guidance* for each run.



## What is `context`?

When you run an agent, you can pass a `context` object: a dataclass, a Pydantic model, a dict, or any other Python object.
It is carried through the run and handed to your tools, hooks, and instruction functions, which decide how it affects the response.

Think of it as **the runtime state/environment** that the agent's code sees in addition to the user’s input.

<br>

---

<br>


##  What goes inside `context`?

The SDK does not interpret the contents of `context`, and there are no special keys. What goes in depends on what your tools, hooks, and instruction functions need. Common choices:

1. **Application state**

   * Counters, flags, or notes your code keeps across the steps of a run. (The message history itself is passed as `input`, not through `context`.)

2. **Inputs for dynamic instructions**

   * Values that an `instructions` function reads to build the system prompt for this run.
   * Example: `{"style": "The user prefers short bullet points."}`

3. **Session / user metadata**

   * Things like user preferences, role, location, time zone, etc.
   * Example: `{"user": {"id": "123", "name": "Ali", "language": "ur"}}`

4. **External knowledge / environment info**

   * Values fetched from APIs, sensors, or tools.
   * Example: `{"weather": "Sunny, 40°C in Karachi"}`

<br>

---

<br>

##  How is `context` used?

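When you call:

```python
def dynamic_instructions(wrapper: RunContextWrapper[dict], agent: Agent[dict]) -> str:
    ctx = wrapper.context
    user = ctx["user"]
    return (f"You are a helpful assistant. {ctx['situation']} "
            f"The user is {user['name']} and prefers a {user['prefers_style']} style.")

agent = Agent[dict](name="Assistant", instructions=dynamic_instructions, model=model)

result = Runner.run_sync(
    agent,
    "What should I wear today?",
    context={
        "situation": "The user is in Karachi where it's 40°C.",
        "user": {"name": "Ali", "prefers_style": "casual"},
    },
)
```

During the run, the SDK:

* Calls the **instruction function** with the context to build the system prompt.
* Passes the same context to any **tool** whose first parameter is a `RunContextWrapper`.
* Sends the system prompt and the **user input** to the model.

Context fields that no function reads never reach the model. Here the prompt effectively looks like:

```
System: You are a helpful assistant. The user is in Karachi where it's 40°C. The user is Ali and prefers a casual style.
User: What should I wear today?
```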
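
A small standard-library sketch of the separation between the local context object and what the model is given. `UserInfo` mirrors the dataclass used earlier; `fetch_user_age` and the `messages` list are simplified stand-ins rather than SDK calls.

```python
from dataclasses import dataclass

@dataclass
class UserInfo:
    name: str
    age: int
    uid: int

def fetch_user_age(ctx: UserInfo) -> str:
    # A tool reads the local context in ordinary Python code.
    return f"User {ctx.name} is {ctx.age} years old."

ctx = UserInfo(name="Ayesha", age=19, uid=1234)
messages = [{"role": "user", "content": "How old is the user?"}]

# Only the tool's return value is appended to what the model will see.
messages.append({"role": "tool", "content": fetch_user_age(ctx)})

visible_text = " ".join(m["content"] for m in messages)
print(visible_text)
```

The `uid` field never appears in `visible_text` because no tool or instruction function wrote it out, which is why the context object is a reasonable place for data the model does not need.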

---

##  Why `context` is important

* Keeps **static role** separate from **dynamic signals**.
* Lets you **inject real-time state** without rewriting the agent.
* Makes agents more **modular** (tools and hooks can read and update the same context object).
* Enables things like **multi-user sessions** or **stateful conversations**.

---

✅ **In short**:
`context` is the structured runtime data you pass into an agent run.
It may include **user/session info**, **environment state**, and anything else your tools or dynamic instruction functions need.
The model sees only what that code turns into instructions or tool output, which lets the agent answer in a way tailored to that moment.
