# Simple Llama Stack interaction with the Agents API (RHOAI)

This notebook shows a minimal **“hello world”** interaction with Llama Stack:

- It assumes a Llama Stack server running the **RHOAI Llama Stack image**  
  `rhoai/odh-llama-stack-core-rhel9:v3.0`.
- It connects to Llama Stack via `LLAMA_BASE_URL`.
- It uses the **Agents API** (no RAG, no MCP) to:
  - Select a model
  - Create a simple chat agent (no tools)
  - Create a session and send a single question
  - Display the answer
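The default `LLAMA_BASE_URL` used in section 2 is an in-cluster address that follows the standard Kubernetes Service DNS pattern `<service>.<namespace>.svc.cluster.local:<port>`. The helper below is a hypothetical convenience (not part of `llama-stack-client`) that assembles such a URL. It assumes the cluster uses the default `cluster.local` DNS domain and that the server listens on port 8321, as in this notebook's default.

```python
def cluster_service_url(
    service: str,
    namespace: str,
    port: int = 8321,  # assumption: the port used by this notebook's default URL
    scheme: str = "http",
    cluster_domain: str = "cluster.local",  # assumption: default cluster DNS domain
) -> str:
    """Build an in-cluster URL for a Kubernetes Service (hypothetical helper)."""
    return f"{scheme}://{service}.{namespace}.svc.{cluster_domain}:{port}"


# Reproduces the default LLAMA_BASE_URL used in this notebook
print(cluster_service_url("lsd-llama-milvus-inline-service", "llama-stack-demo"))
```

Outside the cluster, you would instead set `LLAMA_BASE_URL` to an exposed Route or a port-forwarded local address.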


## 1. Install dependencies

Install the `llama-stack-client` Python SDK (matching the server version) plus
helpers for environment variables and coloured output.


In [1]:
%pip install --quiet "llama-stack-client==0.3.0" python-dotenv termcolor



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.
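Because the client should match the server version, a quick stdlib-only check can confirm that the install actually resolved to the pinned `0.3.0`. Run it after the install cell (restarting the kernel first if prompted). This is a minimal sketch: it only inspects the locally installed client, and `installed_version` and `EXPECTED_CLIENT_VERSION` are hypothetical names chosen for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

EXPECTED_CLIENT_VERSION = "0.3.0"  # the pin used in the %pip cell


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


client_version = installed_version("llama-stack-client")
if client_version is None:
    print("llama-stack-client is not installed (or the kernel needs a restart)")
elif client_version != EXPECTED_CLIENT_VERSION:
    print(f"Warning: llama-stack-client {client_version} != expected {EXPECTED_CLIENT_VERSION}")
else:
    print(f"llama-stack-client {client_version} matches the pinned version")
```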


## 2. Connect to Llama Stack and select an LLM

This cell:

- Loads configuration from a `.env` file (if present).
- Connects to the Llama Stack server (running the
  `rhoai/odh-llama-stack-core-rhel9:v3.0` image) at `LLAMA_BASE_URL`, falling back
  to a default in-cluster service address.
- Lists the available models and selects an LLM:
  - Prefers one served by the `vllm-inference` provider.
  - Otherwise falls back to the first available LLM.
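The selection rule can be sketched as a pure function, which makes the preference order easy to test without a live server. The dictionaries below are hypothetical stand-ins that mirror the `identifier`, `model_type`, and `provider_id` fields the cell prints; the real cell reads the same fields from the SDK's model objects with `getattr`.

```python
from typing import Optional


def pick_llm(models: list, preferred_provider: str = "vllm-inference") -> Optional[dict]:
    """Return the first LLM from the preferred provider, else the first LLM, else None."""
    llms = [m for m in models if m.get("model_type") == "llm"]
    for m in llms:
        if m.get("provider_id") == preferred_provider:
            return m
    return llms[0] if llms else None


# Hypothetical records shaped like the model listing printed by the next cell
sample_models = [
    {"identifier": "granite-embedding-125m", "model_type": "embedding",
     "provider_id": "sentence-transformers"},
    {"identifier": "vllm-inference/llama-4-scout-17b-16e-w4a16", "model_type": "llm",
     "provider_id": "vllm-inference"},
]

chosen = pick_llm(sample_models)
print(chosen["identifier"] if chosen else "no LLM available")
```

Returning `None` rather than raising keeps the policy separate from the error handling, which the notebook cell does with its `assert`.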


In [2]:
import os
from dotenv import load_dotenv
from termcolor import cprint
from llama_stack_client import LlamaStackClient

# Load environment variables from .env (LLAMA_BASE_URL, etc.)
load_dotenv()

# Base URL of the Llama Stack server
base_url = os.getenv(
    "LLAMA_BASE_URL",
    "http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321",
).rstrip("/")

client = LlamaStackClient(base_url=base_url)
print(f"Connected to Llama Stack server: {base_url}")

# List models so we can see what's available
models = list(client.models.list())
print("\nAvailable models:")
for m in models:
    ident = getattr(m, "identifier", None) or getattr(m, "model_id", None) or str(m)
    print(
        f" - {ident} "
        f"(type={getattr(m, 'model_type', None)}, provider={getattr(m, 'provider_id', None)})"
    )

# Prefer a vLLM-backed LLM if available, otherwise just take the first LLM
llm = next(
    (
        m
        for m in models
        if getattr(m, "model_type", None) == "llm"
        and getattr(m, "provider_id", None) == "vllm-inference"
    ),
    None,
)

if not llm:
    llm = next((m for m in models if getattr(m, "model_type", None) == "llm"), None)

assert llm, "No LLM models available on Llama Stack"

model_id = getattr(llm, "identifier", None) or getattr(llm, "model_id", None)
print(f"\nUsing model: {model_id}")


INFO:httpx:HTTP Request: GET http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321/v1/models "HTTP/1.1 200 OK"


Connected to Llama Stack server: http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321

Available models:
 - granite-embedding-125m (type=embedding, provider=sentence-transformers)
 - vllm-inference/llama-4-scout-17b-16e-w4a16 (type=llm, provider=vllm-inference)
 - sentence-transformers/nomic-ai/nomic-embed-text-v1.5 (type=embedding, provider=sentence-transformers)

Using model: vllm-inference/llama-4-scout-17b-16e-w4a16


## 3. Define simple agent instructions

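This cell defines a minimal system prompt for a friendly assistant.
No tools, RAG, or MCP are involved, just plain LLM behaviour via the Agents API.
The short description of the Special Payment Project embedded in the prompt is
therefore the only project-specific knowledge the model has.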


In [3]:
simple_system_prompt = """
You are a friendly technical assistant.
Answer clearly and concisely in 2–4 sentences.

If the user asks about the Special Payment Project, explain it at a high level:
- It is a simulated payments application used in demos.
- It includes checkout frontend, checkout API, and a backing payment service.
- It runs on Kubernetes/OpenShift and is used to demonstrate AIOps workflows.
""".strip()

print(simple_system_prompt)


You are a friendly technical assistant.
Answer clearly and concisely in 2–4 sentences.

If the user asks about the Special Payment Project, explain it at a high level:
- It is a simulated payments application used in demos.
- It includes checkout frontend, checkout API, and a backing payment service.
- It runs on Kubernetes/OpenShift and is used to demonstrate AIOps workflows.


## 4. Create an Agent with no tools (Agents API)

This cell uses the **Agents API** to create an `Agent` that:

- Uses the selected model.
- Has **no tools** attached (no RAG, no MCP).
- Uses the simple system prompt from the previous cell.


In [4]:
from llama_stack_client import Agent

simple_agent = Agent(
    client,
    model=model_id,
    instructions=simple_system_prompt,
    tools=[],  # no RAG, no MCP, no tools at all
)

print("Simple Agent created (no tools).")


Simple Agent created (no tools).


## 5. Create a session and ask a simple question

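This cell:

1. Creates a lightweight Agent **session**.
2. Sends a single user question (“What is the Special Payment Project and how would you explain it to a new SRE?”).
3. Runs a non-streaming **turn** and prints the raw result type.

With this client and server, the HTTP log shows that creating the session calls
`POST /v1/conversations` (hence the `conv_` prefix on the session ID) and that the
turn is sent to `POST /v1/responses`, so the result is a Responses API `ResponseObject`.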


In [5]:
from termcolor import cprint

question = "What is the Special Payment Project and how would you explain it to a new SRE?"

messages = [
    {"role": "user", "content": question},
]

cprint("User message:", "green")
print(question)

# 1) Create a session for the simple agent
session = simple_agent.create_session(session_name="simple-demo")
session_id = getattr(session, "id", None) or getattr(session, "session_id", None) or str(session)
print("\nSession ID:", session_id)

# 2) Run a single non-streaming turn
simple_result = simple_agent.create_turn(
    messages=messages,
    session_id=session_id,
    stream=False,
)

print("\nRaw result type:", type(simple_result))


INFO:httpx:HTTP Request: POST http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321/v1/conversations "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321/v1/responses "HTTP/1.1 200 OK"


[32mUser message:[0m
What is the Special Payment Project and how would you explain it to a new SRE?

Session ID: conv_81ed3e2160c1549e2ebc0db6fbe3912426989833cfb23941

Raw result type: <class 'llama_stack_client.types.response_object.ResponseObject'>


## 6. Display the assistant’s answer

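This final cell extracts the plain-text answer from the `ResponseObject`
returned by `create_turn` and prints it, completing the “hello world”
interaction with Llama Stack. It uses the object's `output_text` attribute
when available and otherwise walks the `output` items of the response.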
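The fallback path walks the Responses-style output structure: a list of `output` items, where items of type `message` carry `content` parts of type `output_text`. The standalone sketch below runs that walk against a hand-written sample dictionary (hypothetical, trimmed to the fields the code reads). Unlike the cell, which keeps only the first text part, this variant joins every text part, which may be preferable if a reply is split across several parts.

```python
def extract_output_text(data: dict) -> str:
    """Concatenate all output_text parts from message items in a Responses-style dict."""
    texts = []
    for item in data.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as tool calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                texts.append(part.get("text", ""))
    return "".join(texts)


# Hypothetical, trimmed response payload
sample_response = {
    "output": [
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "The Special Payment Project is a "},
                     {"type": "output_text", "text": "simulated payments app."}]},
    ]
}

print(extract_output_text(sample_response))
```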


In [6]:
from pprint import pprint
from termcolor import cprint

def show_simple_answer(response, show_raw: bool = False):
    """
    Extract and print the assistant's answer from a Llama Stack ResponseObject.

    Uses the `output_text` attribute when present; otherwise walks the
    `output` list of the response dict for the first `output_text` part.
    """
    # Prefer the convenience attribute; treat an empty string as "not found"
    text = getattr(response, "output_text", None) or None

    # Convert the SDK object to a plain dict so its structure can be walked
    data = response.to_dict() if hasattr(response, "to_dict") else response

    if text is None and isinstance(data, dict):
        for item in data.get("output", []):
            if item.get("type") != "message":
                continue
            for part in item.get("content", []):
                if part.get("type") == "output_text" and part.get("text"):
                    text = part["text"]
                    break
            if text:  # stop at the first message that produced text
                break

    cprint("\n=== Assistant answer ===", "cyan")
    if text and str(text).strip():
        print(text)
    else:
        print("(Assistant returned an empty message.)")
        if show_raw:
            print("\n--- Raw response (debug) ---")
            pprint(data)

show_simple_answer(simple_result)


=== Assistant answer ===
The Special Payment Project is a simulated payments application used in demos to showcase AIOps workflows. It's a multi-component system consisting of a checkout frontend, checkout API, and a backing payment service, all running on Kubernetes/OpenShift. This project allows us to demonstrate and test monitoring, automation, and other SRE capabilities in a realistic, yet controlled environment. As a new SRE, you can use this project to learn and practice AIOps workflows and troubleshooting skills.
