# Kubernetes MCP diagnostics with Llama Stack Agents (RHOAI)

This notebook demonstrates how to use the **Llama Stack Agents API** to perform
Kubernetes diagnostics through a **Model Context Protocol (MCP) server**.

- It is designed to run against the **RHOAI Llama Stack image**  
  `rhoai/odh-llama-stack-core-rhel9:v3.0`
- It connects to the Llama Stack instance exposed by that image (via `LLAMA_BASE_URL`).
- It uses the **Agents API** (not the `/v1/responses` file_search flow) to:
  - Create an agent with MCP tools
  - Create a session
  - Run a single diagnostic “turn”
  - Show which MCP tools were called and the final answer

You will configure:

- `LLAMA_BASE_URL` – URL of your Llama Stack service
- `REMOTE_OCP_MCP_URL` – URL of your Kubernetes MCP server (e.g. the k8s MCP in your demo)
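
Both settings are plain URLs, so they can be sanity-checked before any client is created. The following is a minimal sketch, not part of the notebook's own code: the helper name `read_service_url` is hypothetical, and it only checks that the value looks like an http(s) URL.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def read_service_url(var_name: str, default: Optional[str] = None) -> str:
    """Read a URL from the environment and check that it looks usable.

    Hypothetical helper; it is not part of llama-stack-client.
    """
    value = os.getenv(var_name, default)
    if not value:
        raise ValueError(f"{var_name} is not set and no default was given")

    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{var_name} does not look like an http(s) URL: {value!r}")

    # Strip trailing slashes, as the notebook cells do for both URLs
    return value.rstrip("/")
```

Calling `read_service_url("LLAMA_BASE_URL")` early makes a missing or malformed value fail with a clear message instead of a connection error later on.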


## 1. Install dependencies

This cell installs the `llama-stack-client` Python SDK (matching the server
version used by `rhoai/odh-llama-stack-core-rhel9:v3.0`), plus helpers for
environment variables and coloured output.


In [1]:
%pip install --quiet "llama-stack-client==0.3.0" python-dotenv termcolor



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.
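
Since the pin is meant to match the server version, it can be worth confirming what the kernel actually has installed after the restart. This is a small sketch using only the standard library; the `installed_version` helper is a hypothetical name, and the expected value simply repeats the pin from the install cell.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


expected = "0.3.0"  # the pin used in the install cell
found = installed_version("llama-stack-client")
if found is None:
    print("llama-stack-client is not installed in this kernel")
elif found != expected:
    print(f"llama-stack-client {found} is installed, expected {expected}")
else:
    print(f"llama-stack-client {found} matches the pin")
```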


## 2. Connect to Llama Stack and list available models

This cell:

- Loads configuration from a `.env` file (if present).
- Connects to the Llama Stack instance exposed by the
  `rhoai/odh-llama-stack-core-rhel9:v3.0` image via `LLAMA_BASE_URL`.
- Lists available models and selects a suitable LLM (preferring the vLLM-backed one).


In [2]:
import os
from pprint import pprint

from dotenv import load_dotenv
from termcolor import cprint
from llama_stack_client import LlamaStackClient

# Load environment variables from .env (LLAMA_BASE_URL, REMOTE_OCP_MCP_URL, etc.)
load_dotenv()

# Base URL of the Llama Stack server
base_url = os.getenv(
    "LLAMA_BASE_URL",
    "http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321",
).rstrip("/")

client = LlamaStackClient(base_url=base_url)
print(f"Connected to Llama Stack server: {base_url}")

# List models so we can see what's available
models = list(client.models.list())
print("\nAvailable models:")
for m in models:
    ident = getattr(m, "identifier", None) or getattr(m, "model_id", None) or str(m)
    print(f" - {ident} (type={getattr(m, 'model_type', None)}, provider={getattr(m, 'provider_id', None)})")

# Prefer a vLLM-backed LLM if available, otherwise just take the first LLM
llm = next(
    (
        m
        for m in models
        if getattr(m, "model_type", None) == "llm"
        and getattr(m, "provider_id", None) == "vllm-inference"
    ),
    None,
)

if not llm:
    llm = next((m for m in models if getattr(m, "model_type", None) == "llm"), None)

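# Fail with an explicit error (assert statements are skipped under `python -O`)
if llm is None:
    raise RuntimeError("No LLM models available on Llama Stack")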

model_id = getattr(llm, "identifier", None) or getattr(llm, "model_id", None)
print(f"\nUsing model: {model_id}")


INFO:httpx:HTTP Request: GET http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321/v1/models "HTTP/1.1 200 OK"


Connected to Llama Stack server: http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321

Available models:
 - granite-embedding-125m (type=embedding, provider=sentence-transformers)
 - vllm-inference/llama-4-scout-17b-16e-w4a16 (type=llm, provider=vllm-inference)
 - sentence-transformers/nomic-ai/nomic-embed-text-v1.5 (type=embedding, provider=sentence-transformers)

Using model: vllm-inference/llama-4-scout-17b-16e-w4a16
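
The selection rule used above (prefer the `vllm-inference` provider, otherwise take the first LLM) can be factored into a function so it can be exercised without a running server. This is a sketch under assumptions: `pick_llm` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins shaped like the printed entries, not real SDK types. The `some-other-llm` entry is invented purely for illustration.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def pick_llm(models: Iterable[Any], preferred_provider: str = "vllm-inference") -> Optional[Any]:
    """Return an LLM from the preferred provider, else the first LLM, else None."""
    llms = [m for m in models if getattr(m, "model_type", None) == "llm"]
    for m in llms:
        if getattr(m, "provider_id", None) == preferred_provider:
            return m
    return llms[0] if llms else None


# Stand-in objects (assumption: attributes mirror those read in the cell above)
fake_models = [
    SimpleNamespace(identifier="granite-embedding-125m", model_type="embedding",
                    provider_id="sentence-transformers"),
    SimpleNamespace(identifier="some-other-llm", model_type="llm", provider_id="other"),
    SimpleNamespace(identifier="vllm-inference/llama-4-scout-17b-16e-w4a16",
                    model_type="llm", provider_id="vllm-inference"),
]

chosen = pick_llm(fake_models)
print(chosen.identifier)
```

Returning `None` instead of raising leaves the decision about how to fail to the caller.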


## 3. Define the Kubernetes MCP diagnostic prompt

This cell defines a detailed **system prompt** that instructs the agent how to:

- Use the Kubernetes MCP tools exposed by your MCP server
- Always call `pods_list_in_namespace` and `pods_log` for real pods
- Avoid inventing pod names (it must use exactly what appears in the pod list)
- Summarise findings and provide next steps for an SRE/operator
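
The "no invented pod names" rule can also be checked on the client side, independently of what the model does. The sketch below assumes the pod list comes back as a whitespace-aligned table with a `NAME` header and that the columns before `NAME` contain no spaces. Both helper names are hypothetical, and the sample table is a shortened copy of the kind of output this notebook shows.

```python
from typing import List


def pod_names_from_table(table_text: str) -> List[str]:
    """Extract the NAME column from a whitespace-aligned pod table."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    if "NAME" not in header:
        return []
    name_idx = header.index("NAME")
    names = []
    for row in lines[1:]:
        cells = row.split()
        if len(cells) > name_idx:
            names.append(cells[name_idx])
    return names


def is_real_pod(candidate: str, table_text: str) -> bool:
    """True only if the candidate matches a listed pod name exactly."""
    return candidate in pod_names_from_table(table_text)


sample = """\
NAMESPACE                 APIVERSION   KIND   NAME                            READY   STATUS
special-payment-project   v1           Pod    checkout-api-84bff5f68d-2p775   1/1     Running
"""

print(is_real_pod("checkout-api-84bff5f68d-2p775", sample))
print(is_real_pod("payment-api", sample))
```

A check like this could be run against the `pods_log` arguments in the trace to flag turns where the model guessed a name.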


In [7]:
model_prompt = """
You are a Kubernetes diagnostics assistant working with a Model Context Protocol (MCP) server.
Your job is to investigate incidents using ONLY the Kubernetes MCP tools and then explain your findings.

Available MCP tools (do NOT invent new ones):
- configuration_view
- events_list
- helm_list
- namespaces_list
- nodes_log
- nodes_stats_summary
- nodes_top
- pods_get
- pods_list
- pods_list_in_namespace
- pods_log
- pods_top
- projects_list
- resources_get
- resources_list

High-level workflow for ANY incident / “something is broken” question:

PHASE 0 – Discover what exists in the project
1. If a Kubernetes namespace is mentioned (e.g. "special-payment-project"), treat it as the target namespace.
2. You MUST call:
   - pods_list_in_namespace(namespace=<ns>)          # inventory pods
   - resources_list(apiVersion="v1",      kind="Service",    namespace=<ns>)   # inventory Services
   - resources_list(apiVersion="apps/v1", kind="Deployment", namespace=<ns>)   # inventory Deployments
   - Optionally, on OpenShift:
     resources_list(apiVersion="route.openshift.io/v1", kind="Route", namespace=<ns>)

PHASE 1 – Drill into the most relevant workloads
3. From the pod list, pick 1–3 pods whose names look most relevant to the question
   (e.g. contain "api", "frontend", "payment", "checkout").
4. VERY IMPORTANT NAMING RULES:
   - You may ONLY use pod names that appear EXACTLY in the pods_list_in_namespace output.
   - You MUST NOT create new pod names by combining words (e.g. "payment-api") if that exact string
     was not present in the pods_list_in_namespace table.
   - If you want logs for a payment-related pod, you MUST choose the closest REAL name from the list
     (for example "checkout-api-84bff5f68d-2p775") and use that exact name in pods_log.

5. You MUST call pods_log for at least one of the existing pods:
   - The "name" argument to pods_log MUST be copied exactly from the pods_list_in_namespace output.
   - It is allowed (and encouraged) to call pods_log for more than one relevant pod.

6. Optionally:
   - Call events_list(namespace=<ns>) to look for Warning/Error events related to those pods.
   - Use resources_get(...) if you need details for a specific Service or Deployment already seen in resources_list.

Failure handling:
- If pods_log returns "pod not found" or a similar error, you MUST:
  - Re-check the pods_list_in_namespace output, and
  - Immediately call pods_log again using a pod name that definitely exists.

Hard rules:
- You MUST call at least one MCP tool for EVERY answer.
- For incident / error questions, you MUST:
  - Call pods_list_in_namespace(namespace=<ns>) AND
  - Call pods_log(...) for at least one pod that actually exists.
- You MUST NOT talk about “checking logs” unless you have actually called pods_log in this conversation.
- You MUST NOT mention or use tools that are not in the list above (e.g. no services_list_in_namespace).

When you answer, ALWAYS:
- Start by listing which MCP tools you called and with which key arguments.
- Summarise what you observed:
  - Pods (names + status from pods_list_in_namespace)
  - Important log snippets from pods_log (even 1–2 lines is fine)
  - Any notable events or resource issues (from events_list / resources_list)
- Give your best diagnosis based on that evidence.
- End with 2–3 concrete next steps for an SRE/operator.

If tools fail or return nothing useful:
- Explicitly say which tools you tried and what they returned (e.g. “pod not found”, “no events”).
- State clearly that the evidence is inconclusive and what a human should check next.
""".strip()



## 4. Create an Agent with Kubernetes MCP tools (Agents API)

This cell uses the **Llama Stack Agents API** to create an `Agent` that:

- Uses the selected model from the `rhoai/odh-llama-stack-core-rhel9:v3.0` Llama Stack instance
- Is configured with the Kubernetes MCP server as a tool (`type: "mcp"`)
- Uses the system prompt defined above as its `instructions`

The MCP server URL is taken from:

- `REMOTE_OCP_MCP_URL` (in your `.env`), or
- the in-cluster default Service URL used in the demo, if that variable is unset.


In [8]:
from llama_stack_client import Agent

# URL for the Kubernetes MCP server (adjust default to your real MCP route if needed)
ocp_mcp_url = os.getenv(
    "REMOTE_OCP_MCP_URL",
    "http://kubernetes-mcp-server.llama-stack-demo.svc.cluster.local:8080/sse",
).rstrip("/")

print(f"Using Kubernetes MCP server: {ocp_mcp_url}")

tools_spec = [
    {
        "type": "mcp",
        "server_url": ocp_mcp_url,
        "server_label": "kubernetes-mcp",
        # you can add "require_approval": "never" later if you want
    }
]

agent = Agent(
    client,
    model=model_id,
    instructions=model_prompt,
    tools=tools_spec,
)

print("Agent created with tools:", tools_spec)


Using Kubernetes MCP server: http://kubernetes-mcp-server.llama-stack-demo.svc.cluster.local:8080/sse
Agent created with tools: [{'type': 'mcp', 'server_url': 'http://kubernetes-mcp-server.llama-stack-demo.svc.cluster.local:8080/sse', 'server_label': 'kubernetes-mcp'}]
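
If several notebooks or MCP servers share this configuration, the tool entry can be built by a small function. This is a sketch only: `build_mcp_tool` is a hypothetical helper, and the optional `require_approval` key is included because the comment in the cell above mentions it. Whether a given server version honours that key is an assumption to verify.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def build_mcp_tool(server_url: str, server_label: str,
                   require_approval: Optional[str] = None) -> Dict[str, Any]:
    """Build an MCP tool entry shaped like the one in the cell above."""
    if urlparse(server_url).scheme not in ("http", "https"):
        raise ValueError(f"Unexpected MCP server URL: {server_url!r}")

    tool: Dict[str, Any] = {
        "type": "mcp",
        "server_url": server_url.rstrip("/"),
        "server_label": server_label,
    }
    # Only include the optional key when the caller asks for it
    if require_approval is not None:
        tool["require_approval"] = require_approval
    return tool


tools_example = [
    build_mcp_tool(
        "http://kubernetes-mcp-server.llama-stack-demo.svc.cluster.local:8080/sse",
        "kubernetes-mcp",
    )
]
print(tools_example)
```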


## 5. Run a diagnostic turn (create session + create turn)

This cell:

1. Creates a lightweight **Agent session** (via the Agents API); the request log below shows this as a `POST` to `/v1/conversations`.
2. Sends a single user message describing the incident
   (HTTP 502 in the `special-payment-project` namespace).
3. Executes a non-streaming turn, logged below as a `POST` to `/v1/responses`, and captures the result.

The agent will internally:

- Discover available MCP tools
- Call Kubernetes tools like `pods_list_in_namespace`, `pods_log`, `events_list`, etc.
- Produce a natural-language summary.


In [9]:
from termcolor import cprint

question = (
    "I'm getting 'Payment failed: HTTP 502' in the project 'special-payment-project', please investigate."
)

messages = [
    {"role": "user", "content": question},
]

cprint("User message:", "green")
print(question)

# 1) Create a session (lightweight)
session = agent.create_session(session_name="k8s-mcp-demo")
session_id = getattr(session, "id", None) or getattr(session, "session_id", None) or str(session)
print("\nSession ID:", session_id)

# 2) Run a single non-streaming turn
result = agent.create_turn(
    messages=messages,
    session_id=session_id,
    stream=False,
)

print("\nRaw result type:", type(result))


INFO:httpx:HTTP Request: POST http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321/v1/conversations "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321/v1/responses "HTTP/1.1 200 OK"


User message:
I'm getting 'Payment failed: HTTP 502' in the project 'special-payment-project', please investigate.

Session ID: conv_17e465e635937ab6d029524c258923f4447025a892493b65

Raw result type: <class 'llama_stack_client.types.response_object.ResponseObject'>
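
The session and turn steps above can be wrapped in one helper, which also makes it easy to time a turn. This sketch relies only on the two methods the cell already uses (`create_session` and `create_turn`). `run_single_turn` is a hypothetical name, and `FakeAgent` is a stand-in so the helper can be tried without a server.

```python
import time
from typing import Any, Tuple


def run_single_turn(agent: Any, question: str,
                    session_name: str = "k8s-mcp-demo") -> Tuple[str, Any, float]:
    """Create a session, run one non-streaming turn, and time it."""
    session = agent.create_session(session_name=session_name)
    # Same defensive ID lookup as in the cell above
    session_id = (
        getattr(session, "id", None)
        or getattr(session, "session_id", None)
        or str(session)
    )

    started = time.perf_counter()
    result = agent.create_turn(
        messages=[{"role": "user", "content": question}],
        session_id=session_id,
        stream=False,
    )
    elapsed = time.perf_counter() - started
    return session_id, result, elapsed


class FakeAgent:
    """Stand-in used only to exercise the helper without a server."""

    def create_session(self, session_name: str) -> str:
        return f"conv_fake_{session_name}"

    def create_turn(self, messages, session_id, stream):
        return {"session_id": session_id, "echo": messages[0]["content"], "stream": stream}


sid, res, secs = run_single_turn(FakeAgent(), "Why is checkout returning HTTP 502?")
print(sid, res["echo"], f"{secs:.3f}s")
```

With the real `agent` object from the previous section, the same call would be `run_single_turn(agent, question)`.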


## 6. Inspect MCP calls and the assistant’s answer

This cell pretty-prints the **execution trace** of the Agent:

- Which MCP tools were discovered (`mcp_list_tools`)
- Which tools were actually called (`mcp_call` entries)
- Key snippets of tool output (logs, pod lists, events)
- The final assistant answer

This is useful in the demo to show **how** the Agents API orchestrates MCP tools
behind the scenes.
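
The trace entries described above can also be checked programmatically, for example to confirm that the tools the system prompt marks as mandatory were really called. This is a sketch under assumptions: it treats each output item as a dict with `type`, `name` and `arguments` keys (the same keys the inspection cell reads), accepts `arguments` as either a JSON string or a dict, and uses hypothetical helper names.

```python
import json
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def mcp_calls(output_items: Iterable[Dict[str, Any]]) -> List[Tuple[str, Any]]:
    """Return (tool_name, arguments) for every mcp_call item."""
    calls = []
    for item in output_items:
        if item.get("type") != "mcp_call":
            continue
        args = item.get("arguments") or {}
        if isinstance(args, str):
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                # Keep unparseable arguments visible instead of dropping them
                args = {"_raw": args}
        calls.append((item.get("name", ""), args))
    return calls


def missing_required_tools(
    output_items: Sequence[Dict[str, Any]],
    required: Sequence[str] = ("pods_list_in_namespace", "pods_log"),
) -> List[str]:
    """List the required tools that never appear among the mcp_call items."""
    called = {name for name, _ in mcp_calls(output_items)}
    return [tool for tool in required if tool not in called]


# Hand-written sample trace for illustration (not captured from a real run)
sample_output = [
    {"type": "mcp_list_tools", "tools": [{"name": "pods_log"}]},
    {"type": "mcp_call", "name": "pods_list_in_namespace",
     "arguments": '{"namespace": "special-payment-project"}'},
    {"type": "message", "content": []},
]

print(mcp_calls(sample_output))
print(missing_required_tools(sample_output))
```

For a real run, the same functions could be applied to `result.to_dict().get("output", [])`.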


In [10]:
from textwrap import indent

def show_mcp_response(response, max_output_chars: int = 400, show_raw: bool = False):
    """
    Pretty-print MCP tool usage and the assistant's answer
    from a Llama Stack ResponseObject (via Agent API).
    """
    # LlamaStackClient ResponseObject has .to_dict()
    if hasattr(response, "to_dict"):
        data = response.to_dict()
    else:
        data = response

    # --- 1) Show MCP tools discovered (from mcp_list_tools) ---
    mcp_list = [item for item in data.get("output", []) if item.get("type") == "mcp_list_tools"]

    cprint("\n=== MCP tools discovered ===", "yellow")
    if mcp_list:
        tools = mcp_list[0].get("tools", [])
        names = [t.get("name") for t in tools if isinstance(t, dict)]
        print(", ".join(sorted(set(n for n in names if n))) or "(none)")
    else:
        print("(no mcp_list_tools entry)")

    # --- 2) Show actual MCP tool calls (mcp_call entries) ---
    mcp_calls = [item for item in data.get("output", []) if item.get("type") == "mcp_call"]

    cprint("\n=== MCP calls made ===", "yellow")
    if not mcp_calls:
        print("(no MCP tool calls were executed)")
    else:
        for call in mcp_calls:
            name = call.get("name")
            args = call.get("arguments")
            out = call.get("output", "") or ""
            print(f"- {name}({args})")
            if out:
                snippet = out[:max_output_chars]
                print(indent(snippet, "    "))
                if len(out) > max_output_chars:
                    print("    ... [truncated]")
            print()

    # --- 3) Extract assistant's final answer text ---
    cprint("\n=== Assistant answer ===", "cyan")

    # Try the convenience field first
    text = getattr(response, "output_text", None)

    # Fallback: pull the text from the first message item that carries any
    if not text and isinstance(data, dict):
        for item in data.get("output", []):
            if item.get("type") == "message":
                for part in item.get("content", []):
                    if part.get("type") == "output_text":
                        text = part.get("text", "")
                        break
                # Stop only once non-empty text was found; an empty string
                # must not end the search early
                if text:
                    break

    if text and str(text).strip():
        print(text)
    else:
        print("(Assistant returned an empty message – no natural-language answer.)")
        if show_raw:
            print("\n--- Raw response (debug) ---")
            pprint(data)

show_mcp_response(result)


=== MCP tools discovered ===
configuration_view, events_list, helm_list, namespaces_list, nodes_log, nodes_stats_summary, nodes_top, pods_get, pods_list, pods_list_in_namespace, pods_log, pods_top, projects_list, resources_get, resources_list
=== MCP calls made ===
- pods_list_in_namespace({"namespace": "special-payment-project"})
    NAMESPACE                 APIVERSION   KIND   NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES   LABELS
    special-payment-project   v1           Pod    checkout-api-84bff5f68d-2p775       1/1     Running   0          23h   10.128.2.96   ip-10-0-117-105.us-east-2.compute.internal   <none>
    ... [truncated]

- resources_list({"apiVersion": "v1", "kind": "Service", "namespace": "special-payment-project"})
    NAMESPACE                 APIVERSION   KIND      NAME                TYPE           CLUSTER-IP       EXTERNAL-IP   