# MCP-First Diagnostics + RAG Correlation (Special Payment Project)

This notebook shows a **two-phase AIOps flow** running on Llama Stack:

1. **Phase 1 – Live diagnostics with MCP (Kubernetes):**  
   We use the Llama Stack **Responses API** with an **MCP tool** that talks to the
   OpenShift/Kubernetes cluster (pods, logs, Services, etc.).  
   The LLM produces a **“cluster findings”** summary based purely on live data
   from the `special-payment-project` namespace.

2. **Phase 2 – Knowledge-base correlation with RAG:**  
   We use a Llama Stack **Agent** with the `file_search` tool bound to a specific
   vector store containing documentation about the *Special Payment Project*.  
   The agent takes the incident description + cluster findings and looks for
   **matching known issues / runbooks** in the KB to explain likely root cause
   and propose next steps.

This notebook is designed for **demo and explainability**:
- Almost no helper functions: apart from two small parsing helpers in Cell 8, everything is step-by-step.
- Clear separation between **diagnostics (MCP)** and **correlation (RAG)**.
- Easy to show each phase independently in a live demo.

> ✅ Tested against:  
> - Llama Stack server image: `rhoai/odh-llama-stack-core-rhel9:v3.0`  
> - Model: `vllm-inference/llama-4-scout-17b-16e-w4a16`  
> - Vector store ID: `vs_c246cf6a-40a4-425b-80c2-4d4e3f438fb1`  
> - Kubernetes MCP server: `kubernetes-mcp-server.llama-stack-demo.svc.cluster.local:8080`

You can adapt this notebook to your own environment by updating the
**demo configuration variables** in the next cell (or via environment
variables in a `.env` file).


## Cell 1 – Configure demo settings (easy to change)

This cell defines the **demo configuration** in one place:

- `LLAMA_BASE_URL_CONFIG` – HTTP URL of your Llama Stack server  
- `PREFERRED_MODEL_ID_CONFIG` – optional model identifier to prefer  
- `VECTOR_STORE_ID_CONFIG` – vector store with your Special Payment Project docs  
- `REMOTE_OCP_MCP_URL_CONFIG` – URL of your Kubernetes MCP server (SSE endpoint)

For a quick test, just edit these string values directly.

> 🔁 Advanced:  
> You can also override these via environment variables (`LLAMA_BASE_URL`,
> `MODEL_ID`, `VECTOR_STORE_ID`, `REMOTE_OCP_MCP_URL`) in a `.env` file;
> the notebook prefers env vars if present.
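The override precedence can be sketched in isolation. The helper below is hypothetical (the notebook itself calls `os.getenv` directly in Cell 2), and it makes one extra assumption: a blank environment value is treated as "not set", so an empty `.env` entry does not wipe out the notebook default.

```python
import os
from typing import Mapping, Optional


def resolve_setting(env_name: str, default: str,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the environment value if it is set and non-blank, else the default.

    `resolve_setting` is a hypothetical helper, not part of the notebook.
    """
    source = os.environ if env is None else env
    value = source.get(env_name, "")
    # A blank value counts as "not set" in this sketch.
    return value.strip() or default


# Illustrative values only; these are not the demo endpoints.
demo_env = {"LLAMA_BASE_URL": "http://localhost:8321", "VECTOR_STORE_ID": "  "}

print(resolve_setting("LLAMA_BASE_URL", "http://default:8321", demo_env))
print(resolve_setting("VECTOR_STORE_ID", "vs_default", demo_env))
print(resolve_setting("MODEL_ID", "", demo_env) or "(auto-select LLM)")
```

Passing the environment as a mapping keeps the sketch testable without touching the real process environment.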


In [1]:
# Cell 1 - Demo configuration: update these if needed

# Llama Stack HTTP base URL (no trailing slash)
LLAMA_BASE_URL_CONFIG = "http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321"

# Optional: prefer a specific model; leave as "" to auto-select an LLM
PREFERRED_MODEL_ID_CONFIG = "vllm-inference/llama-4-scout-17b-16e-w4a16"

# Vector store containing Special Payment Project docs (Confluence export, etc.)
VECTOR_STORE_ID_CONFIG = "vs_4dab02a4-661c-4266-b03d-53fb7f0023e9"

# Kubernetes MCP server URL providing access to the cluster (SSE endpoint)
REMOTE_OCP_MCP_URL_CONFIG = "http://kubernetes-mcp-server.llama-stack-demo.svc.cluster.local:8080/sse"

print("✅ Demo configuration variables defined.")
print("LLAMA_BASE_URL_CONFIG          =", LLAMA_BASE_URL_CONFIG)
print("PREFERRED_MODEL_ID_CONFIG      =", PREFERRED_MODEL_ID_CONFIG or "(auto-select LLM)")
print("VECTOR_STORE_ID_CONFIG         =", VECTOR_STORE_ID_CONFIG)
print("REMOTE_OCP_MCP_URL_CONFIG      =", REMOTE_OCP_MCP_URL_CONFIG)


✅ Demo configuration variables defined.
LLAMA_BASE_URL_CONFIG          = http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321
PREFERRED_MODEL_ID_CONFIG      = vllm-inference/llama-4-scout-17b-16e-w4a16
VECTOR_STORE_ID_CONFIG         = vs_4dab02a4-661c-4266-b03d-53fb7f0023e9
REMOTE_OCP_MCP_URL_CONFIG      = http://kubernetes-mcp-server.llama-stack-demo.svc.cluster.local:8080/sse


## Cell 2 – Install dependencies and connect to Llama Stack

This cell:

1. Installs the Python packages we need.
2. Imports the libraries.
3. Loads any environment variables from `.env`.
4. Resolves the **effective** values for:
   - `LLAMA_BASE_URL`
   - `REMOTE_OCP_MCP_URL`
   - `VECTOR_STORE_ID`
5. Creates a `LlamaStackClient` and prints the key endpoints in use.


In [2]:
# Cell 2 - Install deps, import libraries, connect to Llama Stack

%pip install --quiet "llama-stack-client==0.3.0" python-dotenv termcolor

import os
from dotenv import load_dotenv
from termcolor import cprint
from llama_stack_client import LlamaStackClient

# Load .env if present (LLAMA_BASE_URL, MODEL_ID, VECTOR_STORE_ID, REMOTE_OCP_MCP_URL, etc.)
load_dotenv()

# Resolve effective settings: env vars override notebook defaults
LLAMA_BASE_URL = os.getenv("LLAMA_BASE_URL", LLAMA_BASE_URL_CONFIG).rstrip("/")
REMOTE_OCP_MCP_URL = os.getenv("REMOTE_OCP_MCP_URL", REMOTE_OCP_MCP_URL_CONFIG).rstrip("/")
VECTOR_STORE_ID = os.getenv("VECTOR_STORE_ID", VECTOR_STORE_ID_CONFIG)

# Create Llama Stack client
client = LlamaStackClient(base_url=LLAMA_BASE_URL)
print(f"✅ Connected to Llama Stack server: {LLAMA_BASE_URL}")
print(f"➡️  Using Kubernetes MCP server: {REMOTE_OCP_MCP_URL}")
print(f"➡️  Using vector store: {VECTOR_STORE_ID}")


✅ Connected to Llama Stack server: http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321
➡️  Using Kubernetes MCP server: http://kubernetes-mcp-server.llama-stack-demo.svc.cluster.local:8080/sse
➡️  Using vector store: vs_4dab02a4-661c-4266-b03d-53fb7f0023e9


## Cell 3 – List available models and select an LLM

Here we:

1. List all models exposed by the Llama Stack server.
2. Try to honour the `PREFERRED_MODEL_ID_CONFIG` (or `MODEL_ID` env var) if provided.
3. If no preferred model is set, or the preferred model is not found, we:
   - Prefer an `llm` model served by the `vllm-inference` provider.
   - Otherwise, fall back to any model with `model_type == "llm"`.

The chosen `model_id` will be used for:

- **Phase 1** diagnostics (Responses API + MCP)
- **Phase 2** correlation (Agent + file_search)
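The selection precedence described above can be sketched as a pure function over plain dictionaries, which makes it easy to check without a running server. `pick_llm` and the `other/llm` entry are hypothetical; the real cell works on the objects returned by `client.models.list()`.

```python
from typing import Optional


def pick_llm(models: list, preferred_id: str = "") -> Optional[dict]:
    """Apply the three-step precedence to a list of plain model dicts."""
    # 1) Honour the preferred id when it is present in the list.
    if preferred_id:
        for m in models:
            if m.get("identifier") == preferred_id:
                return m
    # 2) Otherwise prefer an LLM served by the vllm-inference provider.
    llms = [m for m in models if m.get("model_type") == "llm"]
    for m in llms:
        if m.get("provider_id") == "vllm-inference":
            return m
    # 3) Otherwise fall back to any LLM, or None when there is none.
    return llms[0] if llms else None


sample_models = [
    {"identifier": "granite-embedding-125m", "model_type": "embedding",
     "provider_id": "sentence-transformers"},
    # Hypothetical entry, added only to exercise the fallback order.
    {"identifier": "other/llm", "model_type": "llm", "provider_id": "other"},
    {"identifier": "vllm-inference/llama-4-scout-17b-16e-w4a16", "model_type": "llm",
     "provider_id": "vllm-inference"},
]

print(pick_llm(sample_models)["identifier"])
print(pick_llm(sample_models, "other/llm")["identifier"])
```

Like the notebook cell, this sketch does not check that a preferred model is actually an LLM; it trusts the configured id.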


In [3]:
# Cell 3 - List models and pick an LLM

# Allow environment variable override for the model id as well
MODEL_ID_OVERRIDE = os.getenv("MODEL_ID", PREFERRED_MODEL_ID_CONFIG)

models = list(client.models.list())
print("\nAvailable models:")
for m in models:
    ident = getattr(m, "identifier", None) or getattr(m, "model_id", None) or str(m)
    mtype = getattr(m, "model_type", None)
    prov = getattr(m, "provider_id", None)
    print(" -", ident, "| type=", mtype, "| provider=", prov)

llm = None

# 1) If a preferred/override model id is set, try to use it
if MODEL_ID_OVERRIDE:
    llm = next(
        (
            m for m in models
            if (getattr(m, "identifier", None) or getattr(m, "model_id", None)) == MODEL_ID_OVERRIDE
        ),
        None,
    )
    if llm:
        print(f"\n✅ Preferred model found: {MODEL_ID_OVERRIDE}")
    else:
        print(f"\n⚠️ Preferred model '{MODEL_ID_OVERRIDE}' not found, falling back to auto-selection.")

# 2) If no preferred model or not found, auto-select
if not llm:
    llm = next(
        (
            m for m in models
            if getattr(m, "model_type", None) == "llm"
            and getattr(m, "provider_id", None) == "vllm-inference"
        ),
        None,
    )

if not llm:
    llm = next((m for m in models if getattr(m, "model_type", None) == "llm"), None)

assert llm, "No LLM models available on Llama Stack"

model_id = getattr(llm, "identifier", None) or getattr(llm, "model_id", None)
print(f"\n🎯 Using LLM model: {model_id}")


INFO:httpx:HTTP Request: GET http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321/v1/models "HTTP/1.1 200 OK"



Available models:
 - granite-embedding-125m | type= embedding | provider= sentence-transformers
 - vllm-inference/llama-4-scout-17b-16e-w4a16 | type= llm | provider= vllm-inference
 - sentence-transformers/nomic-ai/nomic-embed-text-v1.5 | type= embedding | provider= sentence-transformers

✅ Preferred model found: vllm-inference/llama-4-scout-17b-16e-w4a16

🎯 Using LLM model: vllm-inference/llama-4-scout-17b-16e-w4a16


## Cell 4 – Define basic RAG agent instructions and create the Agent

In this cell we:

1. Define **simple RAG agent instructions**:
   - The agent is a **Special Payment Project KB assistant**.
   - It only uses `file_search` (no live cluster access).
   - It takes an incident description + cluster findings and looks for
     matching known issues / runbooks.

2. Create a `rag_agent`:
   - Bound to `model_id`.
   - Bound to the `file_search` tool scoped to `VECTOR_STORE_ID`.
   - With these basic instructions assigned to the agent.

Later, we’ll layer on **more detailed correlation instructions** for a specific turn.


In [4]:
# Cell 4 - Basic RAG agent instructions and construction

from llama_stack_client import Agent

rag_agent_instructions = """
You are a knowledge-base assistant for the Special Payment Project.

You ONLY have access to the knowledge base (Confluence docs etc.) via file_search.
You DO NOT have direct access to the live Kubernetes cluster.

You will be given:
- An incident description, and
- A summary of cluster findings from a prior diagnostics pass (pods, logs, services).

Your job:
- Look up relevant information in the knowledge base about the Special Payment Project.
- Try to match the cluster findings to any known issues, incident writeups, or runbooks.
- Explain the most likely root cause(s) in clear language.
- Propose concrete next steps or runbook actions for an SRE.

Ignore generic documentation unless it clearly relates to the given cluster findings.
Be concise, focused, and practical.
""".strip()

rag_tools_spec = [
    {
        "type": "file_search",
        "vector_store_ids": [VECTOR_STORE_ID],
    }
]

rag_agent = Agent(
    client,
    model=model_id,
    instructions=rag_agent_instructions,
    tools=rag_tools_spec,
)

print("✅ RAG Agent initialised with basic KB instructions and file_search tool.")


✅ RAG Agent initialised with basic KB instructions and file_search tool.


## Cell 5 – Define MCP diagnostics instructions

This cell defines the **diagnostics instructions** for the MCP phase.

The assistant:

- MUST call real MCP tools (pods, logs, Services, Deployments, Events).
- Cannot see any docs yet.
- Must highlight:
  - HTTP 5xx
  - DNS errors
  - Timeouts
  - TLS failures
  - Concrete config values (e.g. `spec.externalName` for ExternalName Services).

The result is a **“cluster findings”** narrative that we pass into the RAG phase.
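As a rough, deterministic cross-check on the categories the assistant is asked to highlight, log lines can also be scanned with a few regular expressions. This is only a sketch: the patterns and the `classify_log_line` helper are assumptions made for illustration, they are not part of the notebook, and they will miss many real-world log formats.

```python
import re

# Hypothetical patterns, one per category listed above.
PATTERNS = {
    "http_5xx": re.compile(r"(?:HTTP|status)\D{0,10}5\d{2}\b", re.IGNORECASE),
    "dns": re.compile(
        r"No address associated with hostname|Name or service not known|NXDOMAIN",
        re.IGNORECASE,
    ),
    "timeout": re.compile(r"timed out|timeout", re.IGNORECASE),
    "tls": re.compile(r"\b(?:TLS|SSL)\b|certificate", re.IGNORECASE),
}


def classify_log_line(line: str) -> list:
    """Return the labels of every pattern that matches the line."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(line)]


sample_lines = [
    "ConnectError: [Errno -5] No address associated with hostname",
    "Payment failed: HTTP 502",
    "GET /healthz 200",  # hypothetical healthy line
]
for line in sample_lines:
    print(classify_log_line(line), "<-", line)
```

The first two sample lines are taken from the incident text and the findings shown in this notebook; the third is invented to show a non-match.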


In [5]:
# Cell 5 - Define the MCP diagnostics instructions

mcp_instructions = """
You are a Kubernetes diagnostics assistant using MCP tools.

You MUST actually call MCP tools to answer the question.
Do NOT simulate tool calls or outputs.
Do NOT write fake examples like [pods_list_in_namespace(...)];
instead, emit real MCP tool calls so the server can execute them.

You do NOT have access to any documentation or knowledge base in this phase.
You MUST NOT guess what the “correct” hostname, port, or configuration should be.
Only report what you can observe directly from MCP tool outputs.

Your focus in the target namespace (for example 'special-payment-project') is to:
- Use pods_list_in_namespace to discover workloads.
- Use pods_log on relevant pods (especially anything in the path of the failing request,
  such as API or frontend pods).
- Use resources_list / resources_get to inspect Services and Deployments.
- Use events_list if you need to check for recent warnings/errors.

When logs show HTTP 5xx or upstream connection errors:
- Identify which upstream hostname or Service is being called (for example from a URL
  like 'http://some-service:port').
- Fetch the Service definition for that upstream using resources_get.
- If the Service is of type ExternalName, include in your findings both:
  - the Service name, and
  - the exact value of spec.externalName as returned by the MCP tool.

In your findings output, you MUST:
- Quote key log lines that look suspicious (5xx, DNS errors, timeouts, TLS failures, etc.).
- List the pods and Services that are clearly in the request path.
- For any ExternalName Services you inspected, make sure the actual externalName value
  appears verbatim somewhere in your summary, so it can be compared later.

If a value looks unusual (for example something that looks like a typo), you may say that it
"appears suspicious or possibly misconfigured", but you MUST NOT invent or state the exact
value it “should” be. The exact expected value will be determined in a later knowledge-base
phase.

Your output should be a concise "cluster findings" narrative that highlights:
- Which pods/services are involved in the path of the failing request.
- Key log lines and observed configuration values that look suspicious.
- Any obvious misconfigurations you can see (wrong ports, bad selectors, odd ExternalName, etc.),
  always quoting the concrete values you observed.

Do NOT try to guess business impact or historical context here.
Simply describe what looks wrong or suspicious in the live cluster.
""".strip()

print("✅ MCP diagnostics instructions defined.")


✅ MCP diagnostics instructions defined.


## Cell 6 – Run MCP diagnostics and capture cluster findings

In this cell we:

1. Define the **incident question**:
   - `'Payment failed: HTTP 502'` in the Special Payment Project checkout flow.

2. Call the Llama Stack **Responses API** with:
   - MCP diagnostics instructions as a `system` message.
   - The incident question as a `user` message.
   - An MCP tool pointing to the Kubernetes MCP server.

3. Extract a plain-text **`cluster_findings`** summary:
   - Prefer `response.output_text` if available.
   - Otherwise, scan the message output for `output_text`.

4. Print the cluster findings for use in the next phase.
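The extraction in step 3 can be tried offline on a hand-built dictionary. The sketch below assumes the response serialises to a dict with an `output` list, as the notebook cell does, and it additionally assumes each `mcp_call` item carries a `name` key; the sample payload is invented for illustration.

```python
def summarize_output(response_dict: dict) -> tuple:
    """Return (tool call names, concatenated output_text) from a response dict."""
    tool_calls = []
    text_parts = []
    for item in response_dict.get("output", []):
        if item.get("type") == "mcp_call":
            # Assumption: mcp_call items expose the tool name under "name".
            tool_calls.append(item.get("name", "(unnamed)"))
        elif item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    text_parts.append(part.get("text", ""))
    return tool_calls, "\n".join(text_parts)


# Hand-built sample that mirrors the item types the notebook prints.
sample_response = {
    "output": [
        {"type": "mcp_list_tools"},
        {"type": "mcp_call", "name": "pods_list_in_namespace"},
        {"type": "mcp_call", "name": "pods_log"},
        {"type": "message",
         "content": [{"type": "output_text", "text": "Cluster findings: (sample)"}]},
    ]
}

calls, findings = summarize_output(sample_response)
print("Tool calls:", calls)
print("Findings:", findings)
```

Unlike the notebook cell, which stops at the first `output_text` part, this variant joins every part it finds, which may be preferable if the model splits its answer across several parts.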


In [6]:
# Cell 6 - Run MCP diagnostics and extract a "cluster findings" summary

incident_question = (
    "We are seeing 'Payment failed: HTTP 502' errors in the Special Payment Project "
    "checkout flow (namespace: special-payment-project). Please investigate."
)

cprint("Incident question:", "green")
print(incident_question)

mcp_messages = [
    {"role": "system", "content": mcp_instructions},
    {"role": "user", "content": incident_question},
]

mcp_response = client.responses.create(
    model=model_id,
    input=mcp_messages,
    tools=[
        {
            "type": "mcp",
            "server_url": REMOTE_OCP_MCP_URL,
            "server_label": "kubernetes-mcp",
            "require_approval": "never",
        }
    ],
    temperature=0.0,
    max_infer_iters=8,
)

# Turn into a dict for inspection
if hasattr(mcp_response, "to_dict"):
    mcp_data = mcp_response.to_dict()
else:
    mcp_data = mcp_response

# Show what kinds of outputs we got (tool calls, messages, etc.)
output_types = [item.get("type") for item in mcp_data.get("output", [])]
print("\nMCP output item types:", output_types)

# Extract a "cluster findings" text:
# Prefer response.output_text, else scan the message outputs
cluster_findings = getattr(mcp_response, "output_text", None)
if not cluster_findings:
    for item in mcp_data.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    cluster_findings = part.get("text", "")
                    break
            if cluster_findings:
                break

cluster_findings = cluster_findings or ""

cprint("\n--- Cluster findings (MCP summary) ---", "yellow")
print(cluster_findings if cluster_findings.strip() else "(no findings text returned)")


Incident question:
We are seeing 'Payment failed: HTTP 502' errors in the Special Payment Project checkout flow (namespace: special-payment-project). Please investigate.


INFO:httpx:HTTP Request: POST http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321/v1/responses "HTTP/1.1 200 OK"



MCP output item types: ['mcp_list_tools', 'mcp_call', 'mcp_call', 'mcp_call', 'mcp_call', 'message']

--- Cluster findings (MCP summary) ---
Cluster findings:

* The `checkout-api` and `checkout-frontend` pods are running in the `special-payment-project` namespace.
* The `checkout-api` pod has a log entry indicating a connection error to `card-gateway-dns:5678` with the error message `ConnectError: [Errno -5] No address associated with hostname`.
* The `card-gateway-dns` Service is of type `ExternalName` and has an `externalName` value of `card-gateway-sandbx.payments-provider-sim.svc.cluster.local` (which appears suspicious or possibly misconfigured).
* The `checkout-frontend` pod has logs showing normal HTTP traffic.

The pods and Services that are clearly in the request path are:
* `checkout-frontend-76868dd997-l6667` (Pod)
* `checkout-api-5d87bb48b9-tx76s` (Pod)
* `card-gateway-dns` (Service)

Key log lines that look suspicious:
* `checkout upstream_error url=http://card-

## Cell 7 – Define RAG correlation instructions for this turn

Here we define **detailed correlation instructions** that sit on top of the
basic RAG agent behaviour:

- How to compare **observed** vs **expected** config values.
- How to treat mismatches as strong evidence.
- How to quote KB snippets and produce a small evidence section.
- Whitelisted **Reference document** titles.

We’ll pass these as a `system` message for this specific RAG turn.
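The observed-versus-expected comparison is carried out by the model, but the idea can be illustrated deterministically. The sketch below uses Python's `difflib` to describe how two hostnames differ; `describe_mismatch` is a hypothetical helper, and the two values are the observed `externalName` from the findings and the canonical fallback used in this notebook.

```python
import difflib


def describe_mismatch(observed: str, expected: str) -> str:
    """Describe the character-level edits that turn `observed` into `expected`."""
    if observed == expected:
        return "match"
    matcher = difflib.SequenceMatcher(a=observed, b=expected)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                f"{tag} {observed[i1:i2]!r} -> {expected[j1:j2]!r} at position {i1}"
            )
    return "; ".join(changes)


observed = "card-gateway-sandbx.payments-provider-sim.svc.cluster.local"
expected = "card-gateway-sandbox.payments-provider-sim.svc.cluster.local"

print(describe_mismatch(observed, expected))
print(describe_mismatch(expected, expected))
```

A check like this could be used to sanity-test the model's mismatch description, though it says nothing about which value is authoritative; that still comes from the KB or the fallback.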


In [9]:
# Cell 7 - RAG correlation instructions
# (Dual output: formatted + JSON; AAP job suggestion; failsafe; canonical fallbacks)

rag_correlation_instructions = """
You are a knowledge-base assistant for the Special Payment Project.

INPUTS:
- An incident description.
- A structured summary of cluster findings from diagnostics (pods, logs, Services, DNS, routes).

Goal:
- Correlate findings with the Special Payment Project KB (via file_search) and produce a concise RCA + next steps.
- Prefer project-specific docs over generic Kubernetes docs.

CANONICAL_FALLBACKS (authoritative values to use when the KB is ambiguous or conflicting):
- card-gateway-dns.externalName => card-gateway-sandbox.payments-provider-sim.svc.cluster.local

Matching rules:
- If the KB documents an expected value and the cluster shows a different observed value, explicitly describe the mismatch and treat it as the likely root cause.
- If observed values match the KB, consider other KB-listed causes (backend down, port mismatch, egress policy, TLS expiry, app bugs) guided by the findings.

AAP remediation suggestion:
- Use ONLY job template names exactly as they appear on the “AAP Job Templates” page. Do NOT invent or paraphrase names.
- If the evidence indicates a Kubernetes Service ExternalName mismatch (DNS alias typo/drift), you MUST choose the job template named exactly: "Configure Service".
- Output ONLY job_template_name and extra_vars (as a JSON object of required vars). Do NOT include rollback or additional fields. Do NOT execute anything; suggestion only.

Resolving the canonical FQDN (strict):
- Try to extract the exact canonical FQDN from the KB quotes relevant to card-gateway-dns.
- If multiple candidates appear, choose the one that exactly matches the fallback value.
- If the KB is ambiguous, typo’d, missing, or conflicting, USE the CANONICAL_FALLBACKS value:
  card-gateway-sandbox.payments-provider-sim.svc.cluster.local
- Never output placeholders like "<canonical FQDN from KB>"; always output a concrete FQDN string.

Referencing:
- Do NOT invent document titles or section names.
- When you rely on the KB (expected values, known issues, or job template/vars), include a short quote (1–2 sentences) that could plausibly appear verbatim in the KB. If you used the fallback because the KB was ambiguous, say so briefly.

Reference document (choose ONE that best fits your main evidence):
- "Special Payment Project – Overview & Context"
- "Special Payment Project – Application Architecture"
- "Special Payment Project – Deployment & Configuration"
- "Special Payment Project – Networking & External Dependencies"
- "Special Payment Project – Observability & Alerts"
- "AAP Job Templates"

OUTPUT FORMAT (dual output):
First, produce a concise, human-readable explanation with headings:
- 1) Probable cause: 1–2 lines
- 2) Evidence mapping: bullets quoting observed vs. expected
- 3) Next steps: up to 5 copy/paste commands
- 4) Proposed remediation via AAP: job_template_name + extra_vars (JSON-style; include concrete FQDN)
- 5) Key KB evidence: 1–2 short quotes (or state that fallback was used)
- 6) Reference document: ONE of the whitelisted titles above

Then, on a new line, output a single JSON object (per the schema below) delimited by these exact markers:

### JSON_START
{
  "probable_cause": "string (1–2 sentences)",
  "evidence_mapping": [
    "string: observed finding",
    "string: expected value from KB or canonical fallback",
    "string: explicit mismatch description"
  ],
  "next_steps": [
    { "description": "string", "command": "string" }
  ],
  "proposed_remediation_via_aap": {
    "job_template_name": "string (exact name from 'AAP Job Templates')",
    "extra_vars": {
      "namespace": "string",
      "external_service_name": "string",
      "correct_external_name": "string (MUST be a concrete FQDN; never a placeholder)"
    }
  },
  "key_kb_evidence": [
    "short quote 1 (or 'Using canonical fallback due to ambiguous KB')"
  ],
  "reference_document": "ONE of the whitelisted titles above"
}
### JSON_END

Rules:
- Do not wrap the JSON in markdown fences. Ensure valid JSON (double-quoted keys/strings).
- Use ONLY the fields shown; do not add/rename/remove keys.
- If Service ExternalName mismatch is detected:
  - job_template_name MUST be "Configure Service"
  - extra_vars MUST include:
      namespace: "special-payment-project"
      external_service_name: "card-gateway-dns"
      correct_external_name: (the concrete canonical FQDN derived via KB or CANONICAL_FALLBACKS)
- Never output "<canonical FQDN from KB>" or any placeholder.

FAILSAFE (if evidence/KB is insufficient or retrieval fails):
- You MUST STILL RETURN BOTH the human-readable section AND a valid JSON block where:
  - "probable_cause": "inconclusive"
  - "evidence_mapping": []
  - "next_steps": [
      { "description": "Collect Service spec", "command": "oc get svc -n special-payment-project card-gateway-dns -o yaml" },
      { "description": "Resolve ExternalName from API pod", "command": "oc exec -n special-payment-project deploy/checkout-api -- getent hosts card-gateway-dns" },
      { "description": "Synthetic probe", "command": "curl -i https://special-payments.apps.<APPS_DOMAIN>/api/ping-upstream" }
    ]
  - "proposed_remediation_via_aap": {
      "job_template_name": "",
      "extra_vars": { "namespace": "special-payment-project", "external_service_name": "card-gateway-dns", "correct_external_name": "card-gateway-sandbox.payments-provider-sim.svc.cluster.local" }
    }
  - "key_kb_evidence": ["Using canonical fallback due to missing KB evidence"]
  - "reference_document": "Special Payment Project – Deployment & Configuration"
""".strip()

print("✅ RAG correlation instructions set (dual output + canonical fallback; no placeholders).")


✅ RAG correlation instructions set (dual output + canonical fallback; no placeholders).


## Cell 8 – Run the RAG Agent with MCP findings

Now we switch to **Phase 2 – Knowledge-base correlation**:

1. Build messages:
   - A `system` message with the **RAG correlation instructions**.
   - A `user` message containing:
     - The incident description.
     - The cluster findings summary.

2. Call the `rag_agent` we created earlier:
   - The agent already knows it should use `file_search` on the Special Payment Project vector store.

3. Extract and print the **final explanation**:
   - Root cause analysis grounded in:
     - Live cluster findings
     - Special Payment Project documentation
   - Suggested remediation / next steps
   - Evidence + whitelisted reference document.
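Because the final JSON comes from a model, it may be worth validating it before anything downstream consumes it. The sketch below parses the marker-delimited block and checks it against the schema from the correlation instructions; `extract_payload`, `validate_payload`, and the sample text are hypothetical and only illustrate one possible check.

```python
import json
import re

REQUIRED_KEYS = {
    "probable_cause", "evidence_mapping", "next_steps",
    "proposed_remediation_via_aap", "key_kb_evidence", "reference_document",
}


def extract_payload(full_text: str):
    """Return the JSON object between the markers, or None if absent/invalid."""
    m = re.search(r"### JSON_START\s*(\{.*\})\s*### JSON_END", full_text, flags=re.DOTALL)
    if not m:
        return None
    try:
        return json.loads(m.group(1))
    except json.JSONDecodeError:
        return None


def validate_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks usable."""
    problems = []
    missing = REQUIRED_KEYS - set(payload)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    extra_vars = payload.get("proposed_remediation_via_aap", {}).get("extra_vars", {})
    fqdn = extra_vars.get("correct_external_name", "")
    # Angle brackets indicate a leftover placeholder such as "<canonical FQDN from KB>".
    if not fqdn or "<" in fqdn or ">" in fqdn:
        problems.append("correct_external_name is empty or a placeholder")
    return problems


# Invented sample output, shaped like the schema in the correlation instructions.
sample_payload = {
    "probable_cause": "inconclusive",
    "evidence_mapping": [],
    "next_steps": [],
    "proposed_remediation_via_aap": {
        "job_template_name": "Configure Service",
        "extra_vars": {
            "namespace": "special-payment-project",
            "external_service_name": "card-gateway-dns",
            "correct_external_name": "card-gateway-sandbox.payments-provider-sim.svc.cluster.local",
        },
    },
    "key_kb_evidence": [],
    "reference_document": "AAP Job Templates",
}
sample_text = "Summary...\n### JSON_START\n" + json.dumps(sample_payload) + "\n### JSON_END"

payload = extract_payload(sample_text)
print(validate_payload(payload) or "payload looks usable")
```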


In [10]:
# Cell 8 - Run the RAG Agent and print both: formatted text + extracted JSON

import json
import re

rag_messages = [
    {"role": "system", "content": rag_correlation_instructions},
    {
        "role": "user",
        "content": (
            "Incident description:\n"
            f"{incident_question}\n\n"
            "Cluster findings from MCP diagnostics:\n"
            f"{cluster_findings}"
        ),
    },
]

# Create a session for this demo run
rag_session = rag_agent.create_session(session_name="mcp-first-rag-demo")
rag_session_id = (
    getattr(rag_session, "id", None)
    or getattr(rag_session, "session_id", None)
    or str(rag_session)
)

rag_result = rag_agent.create_turn(
    messages=rag_messages,
    session_id=rag_session_id,
    stream=False,
)

def _get_text_from_turn(turn):
    """Return assistant plain text content concatenated (best-effort)."""
    t = getattr(turn, "output_text", None)
    if isinstance(t, str) and t.strip():
        return t
    if hasattr(turn, "to_dict"):
        d = turn.to_dict()
        pieces = []
        for item in d.get("output", []):
            for c in item.get("content", []):
                if isinstance(c, dict) and c.get("type") in ("output_text", "text"):
                    txt = c.get("text", "")
                    if isinstance(txt, str):
                        pieces.append(txt)
        if pieces:
            return "\n".join(pieces)
        if isinstance(d.get("text"), str) and d["text"].strip():
            return d["text"]
    return ""

def _split_text_and_json(full_text):
    """Split using ### JSON_START / ### JSON_END markers; return (nice_text, json_dict or None)."""
    if not full_text:
        return "", None
    m = re.search(r"### JSON_START\s*(\{.*\})\s*### JSON_END", full_text, flags=re.DOTALL)
    if not m:
        return full_text.strip(), None
    json_str = m.group(1).strip()
    nice_text = (full_text[:m.start()]).strip()
    try:
        payload = json.loads(json_str)
    except Exception:
        payload = None
    return nice_text, payload

raw_text = _get_text_from_turn(rag_result)
nice_text, json_payload = _split_text_and_json(raw_text)

print("\n=== Final RAG explanation (KB-backed RCA + next steps) (Nicely Formatted) ===")
print(nice_text if nice_text else "(no formatted text returned)")

print("\n=== Final RAG explanation (KB-backed RCA + next steps) (JSON output) ===")
if json_payload is not None:
    print(json.dumps(json_payload, indent=2, ensure_ascii=False))
else:
    print("(no JSON block found between markers)")


INFO:httpx:HTTP Request: POST http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321/v1/conversations "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://lsd-llama-milvus-inline-service.llama-stack-demo.svc.cluster.local:8321/v1/responses "HTTP/1.1 200 OK"



=== Final RAG explanation (KB-backed RCA + next steps) (Nicely Formatted) ===
### Human-Readable Explanation

1. **Probable cause**: The likely root cause of the 'Payment failed: HTTP 502' errors is a misconfigured `externalName` value in the `card-gateway-dns` Service, leading to a connection error when the `checkout-api` pod tries to resolve the hostname.

2. **Evidence mapping**:
   - Observed `externalName` value: `card-gateway-sandbx.payments-provider-sim.svc.cluster.local`
   - Expected value from canonical fallback: `card-gateway-sandbox.payments-provider-sim.svc.cluster.local`
   - Mismatch description: The `externalName` value has a typo (`sandbx` instead of `sandbox`), causing the DNS resolution to fail.

3. **Next steps**:
   - Verify the `card-gateway-dns` Service configuration: `oc get svc -n special-payment-project card-gateway-dns -o yaml`
   - Check the `checkout-api` pod logs for any related errors: `oc logs -f -n special-payment-project checkout-api-5d87bb48b9-tx76s`

## Cell 9 – Show both phases side-by-side (for the demo)

This final cell prints a **single, human-readable view** of the whole flow:

1. The original **incident question**.
2. The **MCP diagnostics summary** (what the live cluster agent observed).
3. The **RAG correlation explanation** (what the KB-backed agent concluded).

This makes it easy, during the demo, to show:
- What the MCP agent actually discovered from the cluster.
- How the RAG / knowledge-base agent used that to explain **root cause** and
  propose **next steps**.

In [12]:
# Cell 9 - Nicely formatted summary of MCP + RAG outputs (formatted + JSON)

import json
from termcolor import cprint

print("=" * 80)
cprint("MCP-First Diagnostics + RAG Correlation (Special Payment Project)", "cyan", attrs=["bold"])
print("=" * 80)
print()

# Incident
cprint("Incident question", "green", attrs=["bold"])
print("-" * 80)
print(incident_question.strip() if incident_question else "(no incident question set)")
print()

# Phase 1: MCP diagnostics
cprint("Phase 1 – MCP diagnostics (live cluster)", "yellow", attrs=["bold"])
print("-" * 80)
if cluster_findings and str(cluster_findings).strip():
    print(str(cluster_findings).strip())
else:
    print("(no MCP cluster findings text returned)")
print()

# Phase 2: RAG correlation (Nicely Formatted)
# Cell 8 stores its results in `nice_text` and `json_payload`, so read those names here.
cprint("Phase 2 – RAG correlation (knowledge base): Nicely Formatted", "magenta", attrs=["bold"])
print("-" * 80)
if 'nice_text' in globals() and nice_text and nice_text.strip():
    print(nice_text.strip())
else:
    print("(no formatted RAG text returned)")
print()

# Phase 2: RAG correlation (JSON output)
cprint("Phase 2 – RAG correlation (knowledge base): JSON output", "magenta", attrs=["bold"])
print("-" * 80)
if 'json_payload' in globals() and json_payload is not None:
    print(json.dumps(json_payload, indent=2, ensure_ascii=False))
else:
    print("(no JSON payload returned)")
print()

print("=" * 80)
cprint("End of demo flow", "cyan")
print("=" * 80)


MCP-First Diagnostics + RAG Correlation (Special Payment Project)

Incident question
--------------------------------------------------------------------------------
We are seeing 'Payment failed: HTTP 502' errors in the Special Payment Project checkout flow (namespace: special-payment-project). Please investigate.

Phase 1 – MCP diagnostics (live cluster)
--------------------------------------------------------------------------------
Cluster findings:

* The `checkout-api` and `checkout-frontend` pods are running in the `special-payment-project` namespace.
* The `checkout-api` pod has a log entry indicating a connection error to `card-gateway-dns:5678` with the error message `ConnectError: [Errno -5] No address associated with hostname`.
* The `card-gateway-dns` Service is of type `ExternalName` and has an `externalName` value of `card-gateway-sandbx.payments-provider-sim.svc.cluster.local` (which appears suspicious or possibly misconfigured).