# Workshop: Two-Agent Health Checker on AMD GPUs (vLLM + MCP)

In this hands-on workshop, you’ll build a **two-model, two-agent system** designed to answer questions like:

> **“Is this snack OK for someone with high blood pressure?”**

The system combines a conversational **Orchestrator**, which browses for up-to-date ingredient information, with a focused **Hypertension Consultant** that evaluates the ingredient list and delivers a clear judgment for someone watching their blood pressure: OK, Caution, or Avoid.

---

## Architecture


![Arch overview](./multi-agent.jpg)

## What you’ll build

- **Orchestrator Agent** (Qwen3-30B-A3B-Instruct-2507): Engages in natural conversation, performs **mandatory browsing** via MCP/Exa to gather up-to-date ingredient data, extracts a clean ingredient list, and then calls the appropriate tool.
- **Consultant Agent** (GPT-OSS-120B): Analyzes an ingredient list specifically for high blood pressure concerns.

## What you’ll learn

- Serving two large open models with **vLLM** on **AMD ROCm**.
- Building agents and tools using **Pydantic AI**.
- Integrating **MCP** browsing through Exa to enable agents to fetch real-time facts.


## Agenda
1. Install dependencies and serve **both model endpoints on one AMD GPU**.
1. Build the **Consultant** Agent (GPT-OSS-120B): Create a JSON-only verdict generator (no browsing) that evaluates ingredients for high blood pressure risks.
1. Wrap the Consultant as a tool: Implement `consult_hypertension` to analyze ingredient lists programmatically.
1. Build the **Orchestrator** (Qwen3-30B-A3B-Instruct-2507): Set its core rule, “Always call `web_search_exa(query)` to retrieve current ingredient data; never rely on memory alone.”
1. End-to-end runs: Analyze real-world examples such as KitKat (US).

### Step 0: Start model endpoints (run in separate terminals before the notebook)

Before running the notebook, you need to start both model endpoints in separate terminals. This ensures the Orchestrator and Consultant agents are available for the rest of the workshop.

#### Consultant (GPT-OSS-120B) on port 9000
First, start GPT-OSS-120B for the Consultant.

**Note**: The following command needs to be run in a separate terminal.

```bash
# Consultant (GPT-OSS-120B) on :9000
vllm serve /models/gpt-oss-120b \
  --tensor-parallel-size 1 \
  --no-enable-prefix-caching \
  --port 9000 \
  --compilation-config '{"full_cuda_graph": true}' \
  --gpu-memory-utilization 0.5
```

#### Orchestrator (Qwen/Qwen3-30B-A3B-Instruct-2507) on port 9001
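Once GPT-OSS has fully started, launch the Orchestrator model. The command below loads the FP8 checkpoint `Qwen/Qwen3-30B-A3B-Instruct-2507-FP8` and serves it under the name `/models/Qwen3-30B-A3B-Instruct-2507`, which is the model name the notebook uses later.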

**Note**: Starting both models at the same time may cause out-of-memory errors. To avoid this, wait for the first model to finish loading before starting the second. The following command also needs to be run in a separate terminal.

```bash
# Orchestrator (Qwen3-30B-A3B-Instruct-2507) on :9001
VLLM_USE_TRITON_FLASH_ATTN=0 vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 \
  --served-model-name /models/Qwen3-30B-A3B-Instruct-2507 \
  --port 9001 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --trust-remote-code \
  --gpu-memory-utilization 0.45
```
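
One way to follow the wait-for-load advice above is to poll the server instead of watching its log. The snippet below is a minimal sketch using only the standard library; it assumes the server answers HTTP 200 on its OpenAI-compatible `/v1/models` route once loading finishes. The helper names `http_probe` and `wait_until_ready`, and the default timeout and interval, are illustrative choices, not part of the workshop notebook.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if the URL answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


def wait_until_ready(
    url: str,
    timeout_s: float = 900.0,   # assumed upper bound for loading a large model
    interval_s: float = 5.0,    # assumed polling interval
    probe: Callable[[str], bool] = http_probe,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until `probe` succeeds; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

For instance, calling `wait_until_ready("http://localhost:9000/v1/models")` from a spare cell or script blocks until the Consultant server responds (or the timeout expires), after which it should be safe to start the Orchestrator.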

### Step 1: Test Model Endpoints
Let’s verify that the Consultant endpoint is running and accessible. (The Orchestrator endpoint gets the same check in Step 6.)

In [None]:
# Test the Consultant (GPT-OSS-120B) endpoint
!curl http://localhost:9000/v1/models
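
The `curl` call prints an OpenAI-style model list. As a rough sketch, assuming the usual `{"object": "list", "data": [{"id": ...}]}` response shape, you can extract the served IDs and confirm they match the `model=` string used in later cells. The helper name `served_model_ids` and the sample payload are hypothetical.

```python
import json


def served_model_ids(models_response_text: str) -> list[str]:
    """Extract model IDs from the JSON text returned by GET /v1/models."""
    payload = json.loads(models_response_text)
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


# Trimmed, made-up sample in the shape the endpoint is assumed to return.
sample = '{"object": "list", "data": [{"id": "/models/gpt-oss-120b", "object": "model"}]}'
print(served_model_ids(sample))
```

If the ID printed here differs from the name passed to the client, requests are likely to fail with a model-not-found error, so this is a cheap check to run before moving on.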

Great! Now let's begin building our agentic health checker.

### Step 2: Install Dependencies

Install Pydantic AI and the OpenAI Python client:

In [None]:
!pip install -q pydantic_ai openai

### Step 3: Consultant Agent Setup


Now, let’s create an OpenAI-compatible client for the GPT-OSS-120B model you started earlier.

In [None]:
from openai import OpenAI
import httpx

client = OpenAI(
    base_url="http://localhost:9000/v1",
    api_key="EMPTY",
    http_client=httpx.Client(http2=False),
)

The Consultant will analyze ingredient lists and return a JSON response. Let's try the raw model first with a simple list: "tofu, soy sauce, oil, sugar".

In [None]:
resp = client.responses.create(
    model="/models/gpt-oss-120b",
    input="Analyze these ingredients and return a JSON response: tofu, soy sauce, oil, sugar"
)

# Find only assistant messages and print their text
for item in resp.output:
    if item.type == "message":
        for c in item.content:
            if c.type == "output_text":
                print(c.text)

### Step 4: Build the Consultant Agent with Pydantic AI

Now, let's use Pydantic AI to turn the Consultant model into an agent. A system prompt guides the model to act as the consultant we need: it analyzes ingredients for high blood pressure risks and returns the results as structured JSON.

In [None]:
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

provider = OpenAIProvider(base_url="http://localhost:9000/v1", api_key="EMPTY")
consultant_model = OpenAIChatModel("/models/gpt-oss-120b", provider=provider)

system_prompt = """
You are a Hypertension Food Consultant.
Your job is to read an ingredient list (raw text or JSON) and decide whether the item
should be avoided, used with caution, or is generally OK for someone with high blood pressure.
Use your general nutrition knowledge only. Be brief, neutral, and safety-first.
If information is insufficient, say so and lean conservative.

Return JSON only:
{
  "avoid": [ {"name":"string","reason":"≤12 words"} ],
  "caution": [ {"name":"string","reason":"≤12 words"} ],
  "overall": "avoid|caution|ok",
  "notes": ["optional short tip"],
  "disclaimer": "Educational only; not medical advice."
}

Rules of output:
- Extract and normalize ingredients from whatever text is provided.
- List only ingredients actually present.
- Keep reasons short and factual.
- Choose overall based on your judgment, prioritizing safety.
"""

consultant_agent = Agent(
    model=consultant_model,
    system_prompt=system_prompt,
)

### Step 5: Run the Consultant Agent
Test the Consultant agent with another sample list that includes some less familiar ingredients.

In [None]:
async def run_async(prompt: str) -> str:
    async with consultant_agent:
        result = await consultant_agent.run(prompt)
        return result.output

await run_async("Ingredients: Glucose syrup, Dipotassium glycyrrhizate (E958), Natural flavors, Caramel color.")
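
Because the JSON contract lives only in the system prompt, the model can still drift from it. The following is a minimal sketch of a shape check using only the standard library. `validate_verdict` is a hypothetical helper, and its rules simply mirror the schema written in the Step 4 prompt.

```python
import json

ALLOWED_OVERALL = {"avoid", "caution", "ok"}


def validate_verdict(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the text matches the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for key in ("avoid", "caution"):
        items = data.get(key)
        if not isinstance(items, list):
            problems.append(f"'{key}' must be a list")
            continue
        for item in items:
            # Each entry needs both a name and a short reason.
            if not (isinstance(item, dict) and {"name", "reason"} <= set(item)):
                problems.append(f"'{key}' entries need 'name' and 'reason'")
                break

    overall = data.get("overall")
    if not (isinstance(overall, str) and overall in ALLOWED_OVERALL):
        problems.append("'overall' must be one of avoid, caution, ok")
    if not isinstance(data.get("disclaimer"), str):
        problems.append("'disclaimer' must be a string")
    return problems
```

In the notebook you could pass the string returned by `run_async(...)` to this function and inspect the resulting list before trusting the verdict.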

### Step 6: Orchestrator (Phase 1) – Baseline Without Browsing

Start by configuring the Orchestrator model to converse only, with no browsing tool and no Consultant call. This lets you test and debug each piece independently before integrating the full workflow. The next steps are:
1. Test the model endpoint.
2. Set up the Orchestrator agent without any browsing tool.
3. Set up the Exa MCP client.
4. Add Exa MCP to the Orchestrator.
5. Wrap the Consultant agent as a tool and connect it to the Orchestrator.

Before we start, let's test the model endpoint we launched earlier and make sure we can connect to it.

In [None]:
# Test the Orchestrator (Qwen3-30B-A3B-Instruct-2507) endpoint
!curl http://localhost:9001/v1/models

Time to set up the Orchestrator.

In [None]:
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

# Point to your Qwen3 endpoint (adjust if different port/model name)
orch_provider = OpenAIProvider(base_url="http://localhost:9001/v1", api_key="EMPTY")
orch_model = OpenAIChatModel("/models/Qwen3-30B-A3B-Instruct-2507", provider=orch_provider)

ORCH_SYS_PHASE1 = """
You are an Ingredient Orchestrator.
Goal: When the user names a packaged snack, return ONLY a cleaned ingredient list (bullet or comma-separated).
If ambiguous, ask one clarifying question.
Keep response ≤5 lines. Educational, not medical advice.
"""

orchestrator_phase1 = Agent(
    model=orch_model,
    system_prompt=ORCH_SYS_PHASE1,
)

Let's test the Orchestrator. It has no browsing tool yet, so the list comes from the model's memory and may be out of date.

In [None]:
async with orchestrator_phase1:
    demo = await orchestrator_phase1.run("Ingredients for Kitkat (USA)?")
    print(demo.output)

### Step 7: Enable Web Search (Exa via MCP)

To ensure the Orchestrator always provides accurate and up-to-date ingredient information, we’ll equip it with a browsing tool rather than relying on guesses or outdated data.

Why use Exa with MCP?
- MCP wraps remote capabilities as tools the model can call.
- Exa provides fresh ingredient data directly from manufacturer and retailer websites.
- By launching an MCP “remote” process, we expose the `web_search_exa` tool, allowing the Orchestrator to perform live web searches seamlessly during its workflow.

Prerequisites:
1. Node.js (needed for the `mcp-remote` launcher).
2. `EXA_API_KEY` exported in your environment before starting the notebook.


In [None]:
# Linux-only example (skip on Windows; install Node manually there)
!curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash - || true
!apt-get install -y nodejs || true
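
Before launching the MCP tool, it can help to confirm both prerequisites from Python. This is a small sketch under the assumption that `node` and `npx` being on `PATH` is a good enough proxy for a working Node.js install. `missing_prerequisites` is a hypothetical helper, and its `env` and `which` parameters exist only so the check can be exercised without touching the real environment.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return human-readable descriptions of any missing prerequisites."""
    missing = []
    for tool in ("node", "npx"):
        if which(tool) is None:
            missing.append(f"{tool} not found on PATH")
    # An empty string counts as unset.
    if not env.get("EXA_API_KEY"):
        missing.append("EXA_API_KEY is not set")
    return missing


print(missing_prerequisites() or "all prerequisites found")
```

Failing early here tends to give a clearer message than the error raised later when the Node process cannot start.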

### Step 8: Launch Exa MCP Tool (Inside Python)

Create an `MCPServerStdio` that will spawn a Node process exposing tools like `web_search_exa(query)`.

In [None]:
import os
from pydantic_ai.mcp import MCPServerStdio

EXA_API_KEY = os.environ["EXA_API_KEY"]  # must exist
exa_server = MCPServerStdio(
    "npx",
    args=[
        "-y",
        "mcp-remote",
        f"https://mcp.exa.ai/mcp?exaApiKey={EXA_API_KEY}",
    ],
)
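
The cell above interpolates the key directly into the URL. That works for typical keys, but a slightly more defensive variant query-encodes the value and keeps a redacted copy for logging. The sketch below assumes the same base URL and `exaApiKey` parameter shown above; `build_exa_mcp_url` and `redact_query_values` are hypothetical helper names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

EXA_MCP_BASE = "https://mcp.exa.ai/mcp"  # base URL taken from the cell above


def build_exa_mcp_url(api_key: str) -> str:
    """Build the MCP URL with the key safely query-encoded."""
    return f"{EXA_MCP_BASE}?{urlencode({'exaApiKey': api_key})}"


def redact_query_values(url: str) -> str:
    """Replace every query value with *** so the URL is safe to print."""
    parts = urlsplit(url)
    redacted = urlencode([(key, "***") for key, _ in parse_qsl(parts.query)], safe="*")
    return urlunsplit(parts._replace(query=redacted))


url = build_exa_mcp_url("example-key")  # placeholder, not a real key
print(redact_query_values(url))
```

The full URL would still be passed to `MCPServerStdio`; only the redacted form should appear in notebook output or logs.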

### Step 9: Orchestrator (Phase 1) – Fetch Ingredients

Now configure the Orchestrator to both converse and browse the web, but do not have it call the Consultant yet. This lets you test and debug browsing on its own before integrating the Consultant into the workflow.

In [None]:
ORCH_SYS_PHASE1 = """
You are an Ingredient Orchestrator with tool access.

Hard requirement:
• ALWAYS use the tool web_search_exa(query: str) to look up the current ingredients. Do not answer from memory, even if confident.
• Do not produce a final answer until you have called web_search_exa at least once in this turn.

Goal:
• Talk naturally with the user.
• When they name a store snack or dish, use web_search_exa to find a reliable, up-to-date ingredient list (prefer manufacturer; then major retailers). 
• Once you have a plausible list, list the ingredients. 

Search guidance:
• Construct precise queries: "<brand> <product> <flavor> ingredients", prioritize manufacturer domain; if ambiguous (country/flavor/size), ask ONE clarifying question first.
• Extract exactly what the page lists; preserve order; trim whitespace; dedupe obvious repeats.

Fallbacks:
• If you cannot find a reliable list after reasonable attempts, ask one brief clarifying question (brand/flavor/country). If still unclear, ask the user to paste the ingredients and stop.

Rules:
• Keep answers short (≤6 lines). Do NOT show internal JSON or tool details.
• Share 1-2 source links ONLY if the user asks.
"""

orchestrator_phase1 = Agent(
    model=orch_model,
    system_prompt=ORCH_SYS_PHASE1,
    toolsets=[exa_server],
)

Test the Orchestrator. It should call `web_search_exa` internally and respond with the product's ingredient list.

In [None]:
async with orchestrator_phase1:
    demo = await orchestrator_phase1.run("Ingredients for Kitkat (USA)?")
    print(demo.output)
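
The final text alone does not prove that a search happened. A hedged way to check is to scan the run's message history for tool-call parts. The sketch below assumes Pydantic AI messages expose a `parts` list whose tool-call entries carry `part_kind == "tool-call"` and a `tool_name` attribute, and that the run result offers `all_messages()`; verify this against your installed version. `tool_names_called` is a hypothetical helper, and the stand-in objects exist only to make the example self-contained.

```python
from types import SimpleNamespace


def tool_names_called(messages) -> list[str]:
    """Collect tool names, in call order, from a sequence of message objects."""
    names = []
    for message in messages:
        for part in getattr(message, "parts", []):
            if getattr(part, "part_kind", None) == "tool-call":
                names.append(part.tool_name)
    return names


# Stand-in objects mimicking the assumed message shape.
fake_messages = [
    SimpleNamespace(parts=[SimpleNamespace(part_kind="user-prompt")]),
    SimpleNamespace(parts=[SimpleNamespace(part_kind="tool-call", tool_name="web_search_exa")]),
    SimpleNamespace(parts=[SimpleNamespace(part_kind="text")]),
]
print(tool_names_called(fake_messages))
```

In the notebook, the equivalent call would be `tool_names_called(demo.all_messages())`, and you would expect `web_search_exa` to appear at least once.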

### Step 10: Wrap Consultant as a Tool

Now expose the existing `consultant_agent` (from earlier steps) as a callable tool, `consult_hypertension(ingredients)`, so the Orchestrator can call it.

In [None]:
import json
from typing import List, Optional
from pydantic_ai import Tool

async def consult_hypertension_fn(
    ingredients: List[str],
    sodium_mg_per_serving: Optional[int] = None,
    product_name: Optional[str] = None,
) -> dict:
    payload = {"product_name": product_name, "ingredients": ingredients}
    if sodium_mg_per_serving is not None:
        payload["nutrition"] = {"sodium_mg_per_serving": sodium_mg_per_serving}

    async with consultant_agent:
        res = await consultant_agent.run(json.dumps(payload))

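    # The prompt asks for JSON only; fall back to a neutral verdict if parsing fails.
    try:
        return json.loads(res.output)
    except (json.JSONDecodeError, TypeError):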
        return {
            "avoid": [],
            "caution": [],
            "overall": "uncertain",
            "notes": ["consultant returned non-JSON"],
            "disclaimer": "Educational only; not medical advice.",
        }

consult_hypertension = Tool(
    consult_hypertension_fn,
    name="consult_hypertension",
    description="Evaluate ingredient list for high BP; returns JSON verdict.",
)
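
The wrapper falls back to an "uncertain" verdict whenever the Consultant's reply is not pure JSON. Models sometimes wrap otherwise valid JSON in a Markdown fence or a sentence of preamble, so a more tolerant parser can rescue those replies. The following is a sketch only; `extract_json_object` is a hypothetical helper and is not part of the workshop code.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this example readable
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_json_object(text: str):
    """Return the first JSON object found in `text`, or None if there is none."""
    candidates = [text.strip()]
    # Contents of any fenced blocks, with an optional "json" language tag.
    candidates += [body.strip() for body in FENCE_RE.findall(text)]
    # Last resort: everything between the first "{" and the last "}".
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start:end + 1])

    for candidate in candidates:
        try:
            value = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None
```

Inside `consult_hypertension_fn`, you could call `extract_json_object(res.output)` and return the fallback dictionary only when it yields `None`.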

### Step 11: Orchestrator (Phase 2) – Enforcing Mandatory Search and Consultant Call
To ensure the agents operate reliably and consistently, it’s important to clearly define the workflow and impose necessary restrictions on the Orchestrator when we add the Consultant.

Our enforcement rules are:
1. Always perform a `web_search_exa` call before finalizing any decisions. This guarantees the use of up-to-date ingredient information.
1. After gathering the ingredient list, invoke the `consult_hypertension` tool to analyze potential health impacts.
1. Provide a concise summarized verdict that categorizes the result as OK, Caution, or Avoid, along with 1 to 3 key reasons supporting the decision.

In [None]:
ORCH_SYS_FINAL = """
You are an Ingredient Orchestrator with tool access.

Hard requirement:
• ALWAYS call web_search_exa(query) to obtain current ingredients (never answer from memory).
• Do not give a verdict until you have performed at least one search this turn.

Flow:
1. Clarify brand/flavor/country only if ambiguous (ask one short question).
2. Use web_search_exa to gather a reliable ingredient list (prefer manufacturer domain).
3. Clean + dedupe; exclude allergen “contains” lines & marketing claims.
4. Call consult_hypertension(ingredients[]) for verdict.
5. Return concise answer: Verdict (OK / Caution / Avoid) + 1-3 concrete reason snippets (e.g., “high sodium”, “contains MSG”).

Rules:
• ≤6 lines total.
• No raw tool JSON or internal traces.
• Provide links only if user asks.
• Educational, not medical advice.
"""

orchestrator = Agent(
    model=orch_model,
    system_prompt=ORCH_SYS_FINAL,
    toolsets=[exa_server],
    tools=[consult_hypertension],
)
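
Step 3 of the flow leaves cleaning to the model. If you would rather normalize the list deterministically before it reaches the Consultant, something like the sketch below could work. It assumes ingredient text is comma-separated, may contain parenthesized sub-ingredients, and puts allergen statements in their own sentence starting with "Contains" or "May contain". `split_top_level` and `clean_ingredients` are hypothetical helpers, and the sample text is made up.

```python
import re

ALLERGEN_PREFIXES = ("contains", "may contain")


def split_top_level(text: str) -> list[str]:
    """Split on commas that are not inside parentheses or brackets."""
    items, current, depth = [], [], 0
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return items


def clean_ingredients(raw: str) -> list[str]:
    """Trim, drop allergen sentences, and dedupe case-insensitively, keeping order."""
    cleaned, seen = [], set()
    # Treat each sentence or line as a unit so allergen statements drop out whole.
    for segment in re.split(r"\.(?:\s+|$)|\n+", raw):
        segment = re.sub(r"^\s*ingredients\s*:\s*", "", segment, flags=re.IGNORECASE)
        if segment.strip().casefold().startswith(ALLERGEN_PREFIXES):
            continue
        for item in split_top_level(segment):
            name = " ".join(item.split())  # collapse internal whitespace
            key = name.casefold()
            if name and key not in seen:
                seen.add(key)
                cleaned.append(name)
    return cleaned


sample = "Ingredients: Sugar, Wheat Flour, Chocolate (Sugar, Cocoa Butter), sugar, Salt. Contains: Milk, Wheat."
print(clean_ingredients(sample))
```

A deterministic pass like this trades flexibility for predictability: it will not understand unusual label layouts, but its behavior is easy to test.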

Awesome, now you can do an end-to-end test that uses two agents (Orchestrator and Consultant) and two models (Qwen3-30B-A3B-Instruct-2507 and GPT-OSS-120B).

In [None]:
async with orchestrator:
    result = await orchestrator.run("Is KitKat (US) okay for high blood pressure?")
    print(result.output)
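
The prompt rules are instructions, not guarantees, so it is worth checking after a run that the tools were called in the required order. The sketch below assumes you already have the list of tool names in call order (how you collect it depends on your Pydantic AI version). `follows_required_order` is a hypothetical helper.

```python
REQUIRED_ORDER = ["web_search_exa", "consult_hypertension"]


def follows_required_order(called: list[str], required: list[str]) -> bool:
    """True if `required` appears within `called` as an ordered subsequence."""
    remaining = iter(called)
    # `name in remaining` advances the iterator, so each match must come
    # after the previous one.
    return all(name in remaining for name in required)


# Made-up traces for illustration.
print(follows_required_order(["web_search_exa", "web_search_exa", "consult_hypertension"], REQUIRED_ORDER))
print(follows_required_order(["consult_hypertension"], REQUIRED_ORDER))
```

Extra searches are allowed by this check; it only fails when a required call is missing or the Consultant is consulted before any search.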