# Chapter 5 — Running Agents with `Runner`

## 1. Intro
You execute agents with the **`Runner`**. There are three ways:

- **`Runner.run(...)`** — *async*, returns a `RunResult`.  
  Use it in apps/servers/notebooks that already have an event loop.

- **`Runner.run_sync(...)`** — *sync* wrapper around `.run`.  
  Handy for quick scripts or CLI tools.

- **`Runner.run_streamed(...)`** — *async*, returns a `RunResultStreaming`.  
  Calls the LLM in **streaming** mode and yields tokens/tool events as they arrive.
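The choice between the sync and async entry points matters mostly when an event loop is already running. The sketch below illustrates this with plain `asyncio`. `fake_run` and `fake_run_sync` are hypothetical stand-ins (no model is called), and the sketch assumes the sync wrapper simply drives the coroutine to completion on its own loop; the SDK's actual implementation may differ.

```python
import asyncio


async def fake_run(prompt: str) -> str:
    """Hypothetical stand-in for `Runner.run`; a real call would contact the model."""
    await asyncio.sleep(0)
    return f"echo: {prompt}"


def fake_run_sync(prompt: str) -> str:
    """Hypothetical stand-in for `Runner.run_sync`: block until the coroutine finishes."""
    coro = fake_run(prompt)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "never awaited" warning when a loop is already running
        raise


async def called_from_async_code() -> str:
    # A loop is already running here, so the blocking wrapper is refused;
    # awaiting the async variant is the right call in this context.
    try:
        fake_run_sync("hello")
    except RuntimeError:
        return await fake_run("hello")
    return "not reached in this sketch"


script_answer = fake_run_sync("hello")                # plain script: no loop running yet
async_answer = asyncio.run(called_from_async_code())  # inside a loop: fall back to await
print(script_answer, async_answer)
```

In a notebook, where a loop is usually already running, this is the reason the examples in this chapter use `await Runner.run(...)` rather than the sync wrapper.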




## 2. Streaming output (`Runner.run_streamed()`)

`Runner.run_streamed()` returns a **`RunResultStreaming`** object. You iterate its events asynchronously to display **token deltas**, **tool calls**, **handoffs**, etc., in real time.  
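⚠️ Important: **Do not** put `await` in front of `Runner.run_streamed(...)`. It is a regular (non-async) call, so awaiting its return value raises a `TypeError` such as `object RunResultStreaming can't be used in 'await' expression` (the exact wording depends on the Python version).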
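The failure mode is easy to reproduce without the SDK. The following is a minimal sketch in which `FakeStreamingResult` and `start_stream` are hypothetical stand-ins for `RunResultStreaming` and `Runner.run_streamed`; it only shows that a plain object cannot be awaited, while its event stream can be iterated asynchronously.

```python
import asyncio


class FakeStreamingResult:
    """Hypothetical stand-in for RunResultStreaming: a plain, non-awaitable object."""

    async def stream_events(self):
        for chunk in ("Hel", "lo"):
            yield chunk


def start_stream() -> FakeStreamingResult:
    """Hypothetical stand-in for Runner.run_streamed: a regular (non-async) call."""
    return FakeStreamingResult()


async def main():
    # Wrong: awaiting a plain object raises TypeError at runtime.
    try:
        await start_stream()
        await_worked = True
    except TypeError:
        await_worked = False

    # Right: call without await, then iterate the events asynchronously.
    chunks = [chunk async for chunk in start_stream().stream_events()]
    return await_worked, chunks


await_worked, chunks = asyncio.run(main())
print(await_worked, chunks)
```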

Common event types:
- `raw_response_event` — model **token deltas** (great for a typewriter effect)
- `run_item_stream_event` — normalized items like `message_output_item`, `tool_call_item`
- `agent_updated_stream_event` — emitted when the current agent changes, e.g. at the start of a run or after a handoff
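Each event kind carries its payload on a different attribute (`data`, `item`, or `new_agent`). The sketch below shows one way to branch on `event.type`; the `summarize` helper and the `SimpleNamespace` objects are hypothetical stand-ins shaped like the real events, which would come from `result.stream_events()`.

```python
from types import SimpleNamespace


def summarize(event) -> str:
    """Return a one-line label for a streaming event, based on its `type`."""
    if event.type == "raw_response_event":
        # `data` is the raw model event, e.g. a text delta
        return f"raw: {getattr(event.data, 'type', 'unknown')}"
    if event.type == "run_item_stream_event":
        return f"item: {event.item.type}"
    if event.type == "agent_updated_stream_event":
        return f"agent: {event.new_agent.name}"
    return f"other: {event.type}"


# Hypothetical stand-ins shaped like the three event kinds
events = [
    SimpleNamespace(type="agent_updated_stream_event",
                    new_agent=SimpleNamespace(name="Joker")),
    SimpleNamespace(type="raw_response_event",
                    data=SimpleNamespace(type="response.output_text.delta", delta="Hi")),
    SimpleNamespace(type="run_item_stream_event",
                    item=SimpleNamespace(type="tool_call_item")),
]
labels = [summarize(e) for e in events]
print(labels)
```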

### Minimal example (print token deltas only)

In [None]:
from agents import Agent, Runner, set_tracing_disabled
from agents.extensions.models.litellm_model import LitellmModel
from openai.types.responses import ResponseTextDeltaEvent
import os
import asyncio
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv('API_KEY')
base_url = "https://api.openai.com/v1"  
chat_model = "gpt-4.1-nano-2025-04-14"  

set_tracing_disabled(disabled=True)
llm = LitellmModel(model=chat_model, api_key=api_key, base_url=base_url)

agent = Agent(
    name="Weather Assistant",
    # Note: no `get_weather` tool is registered in this cell, so the tool rule
    # in the instructions cannot be followed here; only the tone applies.
    instructions=(
        "Answer in the tone of Sir Humphrey Appleby. "
        "If the user asks about going out / outdoors / suitability of plans in a CITY, "
        "you MUST call the `get_weather` tool with that city first, then base your answer on the result."
    ),
    model=llm,
)

# run_streamed() is a regular call that returns RunResultStreaming immediately; do NOT await it
result = Runner.run_streamed(agent, "Tell me a joke about a civil servant")

# Asynchronously iterate events and print token deltas
async for event in result.stream_events():
    if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
        print(event.data.delta, end="", flush=True)



Ah, a joke about civil servants—delightfully in the spirit of good humour, if I may say so. 

Why did the civil servant bring a ladder to work? 

Because he heard the job involved reaching new heights in bureaucracy! 

Of course, it's all in good fun, and one must always respect the dedication and professionalism of civil servants.
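The loop above prints each delta and then discards it. If the full text is also needed afterwards, the deltas can be accumulated while streaming. This is a sketch under stated assumptions: `fake_stream_events` is a hypothetical stand-in for `result.stream_events()`, and text deltas are recognized by the raw event's `type` string instead of the `isinstance` check against `ResponseTextDeltaEvent` used in the cell above.

```python
import asyncio
from types import SimpleNamespace


async def fake_stream_events():
    """Hypothetical stand-in for `result.stream_events()`; no model is called."""
    for text in ("Yes, ", "Minister."):
        yield SimpleNamespace(
            type="raw_response_event",
            data=SimpleNamespace(type="response.output_text.delta", delta=text),
        )
    # A non-delta event that the collector should skip
    yield SimpleNamespace(
        type="run_item_stream_event",
        item=SimpleNamespace(type="message_output_item"),
    )


async def collect_text(events) -> str:
    """Join all text deltas from an async event stream into one string."""
    parts = []
    async for event in events:
        if event.type != "raw_response_event":
            continue
        if getattr(event.data, "type", None) == "response.output_text.delta":
            parts.append(event.data.delta)
    return "".join(parts)


full_text = asyncio.run(collect_text(fake_stream_events()))
print(full_text)
```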

### Streaming multiple event types — what this code does

**Goal:** run an agent in streaming mode and handle **all the interesting events**, not just token deltas.

#### What the code sets up
- `@function_tool how_many_jokes()` — a tool the agent can call; returns a **random** number (1–10).
- `agent = Agent(...)` — instructions force the agent to **first call the tool**, then tell that many jokes.
- `stream = Runner.run_streamed(...)` — starts a **streaming** run (note: no `await` here).


In [5]:

import random
from agents import Agent, Runner, ItemHelpers, function_tool


@function_tool
def how_many_jokes() -> int:
    return random.randint(1, 10)

# Build the agent
agent = Agent(
    name="Joker",
    instructions="First call the `how_many_jokes` tool, then tell that many jokes.",
    tools=[how_many_jokes],
    model=llm,  # uses your previously configured model
)

# Start the streaming run (no `await` on this call); the `async for` below
# relies on the notebook's top-level async support
stream = Runner.run_streamed(agent, input="Hello")
print("=== Run starting ===")

async for event in stream.stream_events():
    # Ignore raw token deltas if you don't need them
    if event.type == "raw_response_event":
        continue

    # The current agent changed (emitted at the start of the run and after a handoff)
    if event.type == "agent_updated_stream_event":
        print(f"Agent updated: {event.new_agent.name}")
        continue

    # Items produced during the run
    if event.type == "run_item_stream_event":
        if event.item.type == "tool_call_item":
            print("-- Tool was called")
        elif event.item.type == "tool_call_output_item":
            print(f"-- Tool output: {event.item.output}")
        elif event.item.type == "message_output_item":
            print(f"-- Message output:\n{ItemHelpers.text_message_output(event.item)}")

print("=== Run complete ===")


=== Run starting ===
Agent updated: Joker
-- Tool was called
-- Tool output: 9
-- Tool was called
-- Tool was called
-- Tool was called
-- Tool was called
-- Tool was called
-- Tool was called
-- Tool was called
-- Tool was called
-- Tool was called
-- Tool output: 4
-- Tool output: 9
-- Tool output: 10
-- Tool output: 10
-- Tool output: 9
-- Tool output: 7
-- Tool output: 7
-- Tool output: 7
-- Tool output: 8
-- Message output:
Here are some jokes for you:
1. Joke 1
2. Joke 2
3. Joke 3
4. Joke 4
5. Joke 5
6. Joke 6
7. Joke 7
8. Joke 8
9. Joke 9

Would you like to hear the jokes?
=== Run complete ===


## 3. List input (multi-message history)

`Runner.run(...)` accepts either:
- a **string** (treated as one user message), or  
- a **list of messages** in OpenAI Responses API format: `{"role": "...", "content": "..."}`.

Passing a list lets you inject a short **conversation history** at call time.
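For a single user turn the two input shapes are equivalent. The helper below is a hypothetical illustration of that equivalence (`as_message_list` is not part of the SDK); it normalizes either shape into a message list.

```python
def as_message_list(user_input):
    """Normalize a string or a list of messages into a list of messages."""
    if isinstance(user_input, str):
        # A bare string is treated as one user message
        return [{"role": "user", "content": user_input}]
    return list(user_input)  # copy so the caller's list is not modified later


single = as_message_list("Who am I?")
history = as_message_list([
    {"role": "user", "content": "Who am I?"},
    {"role": "user", "content": "Where am I from?"},
])
print(single)
print(len(history))
```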

### Example

In [13]:
from agents import Agent, Runner

# Reuses `llm`, the LitellmModel configured in the first example of this chapter.


agent = Agent(
    name="Assistant",
    instructions="Answer the user's query.",
    model=llm,
)

messages = [
    {"role": "user", "content": "Who am I?"},
    {"role": "user", "content": "Where am I from?"},
    {"role": "user", "content": "Where am I going?"},
]


result = await Runner.run(
    agent,
    input=messages,
)
print(result.final_output)


I'm sorry, but I don't have enough information to determine where you're from or where you're going. Could you please provide more details or clarify your question?


#### Notes

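- The model sees every message in order. It usually responds to the most recent user message and treats the earlier ones as context, although, as in the output above, it may address several of them at once.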

- You can include "assistant" or "system" messages if you need to seed prior turns or override guidance for this run.

- The agent’s instructions already act like a system/developer message; only add a "system" message to augment/override them temporarily.

- All features (tools, guardrails, handoffs, hooks) work the same; only the shape of the input changes.

## 4. Multi-turn conversations (manually stitching history)

`Runner.run(...)` is stateless by default. To make it feel like a **continuous chat**, take the previous turn’s result and convert it into a message list, then append your next user message.

### How it works
- `result.to_input_list()` returns the original input plus every item generated during the run (assistant messages, tool calls, tool outputs), i.e. the conversation **so far**.
- You **append** your new user message to that list.
- Pass the combined list back to `Runner.run(...)`.
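The stitching step itself is plain list handling. The sketch below shows it on simplified stand-in data: `previous_items` is an assumed, reduced version of what `result.to_input_list()` would return (the real items carry more fields), and `next_turn_input` is a hypothetical helper.

```python
def next_turn_input(previous_items, user_text):
    """Return a new input list: the prior history plus one new user message."""
    return [*previous_items, {"role": "user", "content": user_text}]


# Simplified stand-in for `result.to_input_list()` after the first turn
previous_items = [
    {"role": "user", "content": "What city is the Golden Gate Bridge in?"},
    {"role": "assistant", "content": "San Francisco"},
]

new_input = next_turn_input(previous_items, "What state is it in?")
print(len(new_input), new_input[-1])
```

Building a new list instead of appending in place keeps the previous turn's history intact, which is convenient if a turn needs to be retried.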

In [8]:
from agents import Agent, Runner

# Reuses `llm`, the LitellmModel configured in the first example of this chapter.


agent = Agent(name="Assistant", model=llm, instructions="Reply very concisely.")


# First turn
result = await Runner.run(agent, "What city is the Golden Gate Bridge in?")
print(result.final_output)
# San Francisco

# Second turn
new_input = result.to_input_list() + [{"role": "user", "content": "What state is it in?"}]
result = await Runner.run(agent, new_input)
print(result.final_output)


San Francisco
California


## 5. Terminal human-in-the-loop (`run_demo_loop`)

`run_demo_loop(agent)` starts a **REPL-like chat** in your terminal:
- Prompts you for input each turn
- **Keeps history** between turns
- Streams model output by default (typewriter-style)
- Type `quit` or `exit` to leave (on *nix: `Ctrl-D`; on Windows: `Ctrl-Z` then Enter)
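The behaviour listed above can be pictured as a small read-reply loop. The following is a simplified sketch, not the library's implementation: `demo_loop`, `reply_fn`, `read_line`, and `write` are hypothetical names, the reply function stands in for an agent run, and streaming is omitted. Input and output are injectable so the loop can be exercised without a terminal.

```python
def demo_loop(reply_fn, read_line=input, write=print):
    """Chat loop that keeps history until the user types quit/exit or sends EOF."""
    history = []
    while True:
        try:
            text = read_line(" > ")
        except EOFError:  # Ctrl-D (or Ctrl-Z then Enter on Windows)
            break
        if text.strip().lower() in {"quit", "exit"}:
            break
        if not text.strip():
            continue  # ignore empty lines
        history.append({"role": "user", "content": text})
        reply = reply_fn(history)  # stand-in for running the agent on the full history
        history.append({"role": "assistant", "content": reply})
        write(reply)
    return history


# Scripted session instead of real keyboard input
scripted = iter(["hi", "", "bye", "quit", "never read"])
printed = []
final_history = demo_loop(
    reply_fn=lambda h: f"reply {len(h)}",
    read_line=lambda _prompt: next(scripted),
    write=printed.append,
)
print(printed)
```

Because the whole `history` list is passed to `reply_fn` on every turn, each reply can depend on everything said earlier, which is what "keeps history" means in the list above.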

### Minimal use (notebook)


In [None]:
from agents import Agent, run_demo_loop

# Reuses `llm`, the LitellmModel configured in the first example of this chapter.


agent = Agent(name="Assistant", model=llm, instructions="Reply very concisely.")

await run_demo_loop(agent)



[Agent updated: Assistant]
Hello! How can I assist you today?

[Agent updated: Assistant]
I'm here to help. How can I assist you?

[Agent updated: Assistant]
Goodbye! Have a great day!
