### MCP Server Usage (SSE, streamable_http, and stdio)

This notebook demonstrates how to use the NVIDIA RAG MCP server over the MCP transports (SSE, streamable_http, and stdio) instead of the REST APIs. It covers:
- Launching the server (SSE and streamable_http); for stdio, the client spawns the server
- Connecting with the MCP Python client
- Listing tools (Retriever: `generate`, `search`, `get_summary`; Ingestor: `create_collections`, `list_collections`, `upload_documents`, `get_documents`, `update_documents`, `delete_documents`, `update_collection_metadata`, `update_document_metadata`, `delete_collections`)
- Calling Ingestor tools: `create_collections`, `upload_documents`, `delete_collections`
- Calling Retriever tools: `generate`, `search`, `get_summary`

Execute cells in sequence for each transport to validate end‑to‑end behavior.

#### Install Dependencies

Purpose:
Install the libraries needed to run the MCP server and client locally in this notebook environment.

- Ensure your environment has:
  - `mcp`, `anyio`, `httpx`, `httpx-sse`, `uvicorn`
- If using Workbench/docker, these may already be installed.

In [None]:
# Uncomment the next line to install the MCP server and client dependencies.
# %pip install -qq -r ../nvidia_rag_mcp/requirements.txt


#### Prerequisites

- Ensure the RAG server is running and reachable before using MCP tools. 
- Ensure the Ingestor server is running and reachable before using MCP tools.
- Follow the [quickstart guide](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/deploy-docker-self-hosted.md) to start the RAG server and Ingestor server.

## Launch MCP Server (SSE)

Purpose:
Start the MCP server locally over SSE so that the client can connect and call tools.

This launches the MCP server on `http://127.0.0.1:8000`.
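
The launch cell below waits a fixed two seconds before reporting the PID. As an alternative, here is a minimal sketch that polls the port until it accepts TCP connections. `wait_for_port` is a hypothetical helper, not part of `nvidia_rag_mcp`, and a successful connect only shows that something is listening on the port.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 15.0, interval: float = 0.25) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect proves only that the socket is listening,
            # not that the MCP handshake will succeed.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("127.0.0.1", 8000)` could stand in for the `time.sleep(2.0)` call that follows `subprocess.Popen`.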

In [None]:
import subprocess

PORT = 8000
print(f"Killing any process on port {PORT} so the SSE server can start in the next cell")

try:
    subprocess.run(["fuser", "-k", f"{PORT}/tcp"], check=False)
except FileNotFoundError:
    print("'fuser' not found, skipping fuser-based cleanup.")
except Exception as e:
    print(f"Error while running fuser: {e}")

In [None]:
import os
import sys
import atexit
import subprocess  # needed for Popen below when this cell runs on its own
import time

sse_proc = None
try:

    repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
    server_path = os.path.join(repo_root, "nvidia_rag_mcp", "mcp_server.py")
    cmd = [
        sys.executable,
        server_path,
        "--transport",
        "sse",
        "--host",
        "127.0.0.1",
        "--port",
        "8000",
    ]

    print("Launching:", " ".join(cmd))
    sse_proc = subprocess.Popen(cmd)
    atexit.register(lambda: sse_proc and sse_proc.poll() is None and sse_proc.terminate())
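    time.sleep(2.0)  # give the server a moment to bind the port
    if sse_proc.poll() is None:
        print("SSE server PID:", sse_proc.pid)
    else:
        print("SSE server exited early with code:", sse_proc.returncode)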
except Exception as e:
    print("Failed to start SSE server:", e)

## Connect with MCP Client (SSE), List Tools, and Call MCP Tools

Purpose:
Verify connectivity by listing available tools and invoking the retriever tools (`generate`, `search`, `get_summary`) and ingestor tools (`create_collections`, `upload_documents`, `delete_collections`) using the SSE transport.

This uses the `mcp_client.py` CLI to connect over SSE, list tools, and invoke both Retriever and Ingestor tools.

**Note:** Ensure the SSE server started in the cell above is still running before executing this cell.
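
The cell below repeats the same `subprocess.run([...])` pattern for each tool. A small helper could cut that repetition. The sketch here is hypothetical: neither `build_client_cmd` nor `run_client` exists in `nvidia_rag_mcp`. It uses only the CLI flags that appear in this notebook, and it assumes `mcp_client.py` exits with a non-zero code on failure.

```python
import json
import subprocess
import sys


def build_client_cmd(client_path, action, transport, *, url=None, command=None,
                     args=None, tool=None, json_args=None):
    """Assemble an argv list for mcp_client.py from the flags used in this notebook."""
    cmd = [sys.executable, client_path, action, f"--transport={transport}"]
    if url is not None:
        cmd.append(f"--url={url}")          # SSE / streamable_http
    if command is not None:
        cmd.append(f"--command={command}")  # stdio only
    if args is not None:
        cmd.append(f"--args={args}")        # stdio only
    if tool is not None:
        cmd.append(f"--tool={tool}")
    if json_args is not None:
        # Serialize the tool arguments exactly as the cells do with json.dumps.
        cmd.append(f"--json-args={json.dumps(json_args)}")
    return cmd


def run_client(cmd):
    """Run the CLI and report a non-zero exit code instead of silently ignoring it."""
    result = subprocess.run(cmd, check=False)
    if result.returncode != 0:
        print(f"mcp_client.py exited with code {result.returncode}")
    return result
```

With these helpers, creating a collection over SSE would read `run_client(build_client_cmd(client_path, "call", "sse", url=SSE_URL, tool="create_collections", json_args={"collection_names": [COLLECTION]}))`.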


In [None]:
import os
import sys
import json
import subprocess

SSE_URL = "http://127.0.0.1:8000/sse"
repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
client_path = os.path.join(repo_root, "nvidia_rag_mcp", "mcp_client.py")
COLLECTION = "my_collection"
pdf_path = os.path.join(repo_root, "data", "multimodal", "woods_frost.pdf")

print("="*80)
print("Listing available tools...")
print("="*80)
subprocess.run([
    sys.executable,
    client_path,
    "list",
    "--transport=sse",
    f"--url={SSE_URL}",
])

print("\n" + "="*80)
print("Creating collection...")
print("="*80)
create_args = json.dumps({"collection_names": [COLLECTION]})
subprocess.run([
    sys.executable, client_path, "call",
    "--transport=sse", f"--url={SSE_URL}",
    "--tool=create_collections",
    f"--json-args={create_args}",
])

print("\n" + "="*80)
print("Uploading document...")
print("="*80)
upload_args = json.dumps({
    "collection_name": COLLECTION,
    "file_paths": [pdf_path],
    "blocking": True,
    "generate_summary": True,
    "split_options": {"chunk_size": 512, "chunk_overlap": 150},
})
subprocess.run([
    sys.executable, client_path, "call",
    "--transport=sse", f"--url={SSE_URL}",
    "--tool=upload_documents",
    f"--json-args={upload_args}",
])

print("\n" + "="*80)
print("Calling 'generate' tool...")
print("="*80)
generate_args = json.dumps({
    "messages": [{"role": "user", "content": "Hello from SSE demo"}],
    "collection_names": [COLLECTION],
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    "--transport=sse",
    f"--url={SSE_URL}",
    "--tool=generate",
    f"--json-args={generate_args}",
])

print("\n" + "="*80)
print("Calling 'search' tool...")
print("="*80)
search_args = json.dumps({
    "query": "Tell me about Robert Frost's poems",
    "collection_names": [COLLECTION],
    "reranker_top_k": 2,
    "vdb_top_k": 5,
    "enable_query_rewriting": False,
    "enable_reranker": True,
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    "--transport=sse",
    f"--url={SSE_URL}",
    "--tool=search",
    f"--json-args={search_args}",
])

print("\n" + "="*80)
print("Calling 'get_summary' tool...")
print("="*80)
summary_args = json.dumps({
    "collection_name": COLLECTION,
    "file_name": "woods_frost.pdf",
    "blocking": False,
    "timeout": 60,
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    "--transport=sse",
    f"--url={SSE_URL}",
    "--tool=get_summary",
    f"--json-args={summary_args}",
])

print("\n" + "="*80)
print("Deleting collection...")
print("="*80)
delete_args = json.dumps({"collection_names": [COLLECTION]})
subprocess.run([
    sys.executable, client_path, "call",
    "--transport=sse", f"--url={SSE_URL}",
    "--tool=delete_collections",
    f"--json-args={delete_args}",
])

## Launch MCP Server (streamable_http)

Purpose:
Start the MCP server locally using the streamable_http transport so that the client can connect and call tools.

This launches the MCP server on `http://127.0.0.1:8000`.
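
The troubleshooting notes at the end of this notebook state that a GET on `/mcp` may return 406, and that either a 200-range response or 406 should be treated as ready. A minimal sketch of that rule using only the standard library follows. The helper names (`is_ready_status`, `probe_status`, `wait_until_ready`) are illustrative and not part of `nvidia_rag_mcp`.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_ready_status(status: int, transport: str) -> bool:
    """Apply this notebook's readiness rule to an HTTP status code."""
    if 200 <= status < 300:
        return True
    # A plain GET on /mcp may be answered with 406 even though the server is up.
    return transport == "streamable_http" and status == 406


def probe_status(url: str, timeout: float = 2.0):
    """Return the HTTP status of a GET on url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # a 4xx/5xx response still proves the server answered
    except (OSError, http.client.HTTPException):
        return None      # refused, timed out, or malformed response


def wait_until_ready(url, transport, timeout=15.0, interval=0.5, probe=probe_status):
    """Poll url until the readiness rule is satisfied or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = probe(url)
        if status is not None and is_ready_status(status, transport):
            return True
        time.sleep(interval)
    return False
```

A call such as `wait_until_ready("http://127.0.0.1:8000/mcp", "streamable_http")` could replace the fixed sleep after launching the server. The `probe` parameter exists only so the polling loop can be exercised without a live server.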

In [None]:
import subprocess  # needed when this section runs without the SSE cells

PORT = 8000
print(f"Killing any process on port {PORT} so the streamable_http server can start in the next cell")

try:
    subprocess.run(["fuser", "-k", f"{PORT}/tcp"], check=False)
except FileNotFoundError:
    print("'fuser' not found, skipping fuser-based cleanup.")
except Exception as e:
    print(f"Error while running fuser: {e}")

In [None]:
import os
import sys
import atexit
import subprocess  # needed for Popen below when this cell runs on its own
import time

stream_proc = None
try:

    repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
    server_path = os.path.join(repo_root, "nvidia_rag_mcp", "mcp_server.py")
    cmd = [
        sys.executable,
        server_path,
        "--transport",
        "streamable_http",
        "--host",
        "127.0.0.1",
        "--port",
        "8000",
    ]

    print("Launching streamable_http server:", " ".join(cmd))
    stream_proc = subprocess.Popen(cmd)
    atexit.register(lambda: stream_proc and stream_proc.poll() is None and stream_proc.terminate())
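    time.sleep(2.0)  # give the server a moment to bind the port
    if stream_proc.poll() is None:
        print("streamable_http server PID:", stream_proc.pid)
    else:
        print("streamable_http server exited early with code:", stream_proc.returncode)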
except Exception as e:
    print("Failed to start streamable_http server:", e)

## Connect with MCP Client (streamable_http), List Tools, and Call MCP Tools

Purpose:
Verify connectivity by listing available tools and invoking the retriever tools (`generate`, `search`, `get_summary`) and ingestor tools (`create_collections`, `upload_documents`, `delete_collections`) using the streamable_http transport.

This uses the `mcp_client.py` CLI to connect over streamable_http, list tools, and invoke both Retriever and Ingestor tools.

**Note:** Ensure the streamable_http server is running from the cell above before executing this cell.



In [None]:
import os
import sys
import json
import subprocess

STREAMABLE_HTTP_URL = "http://127.0.0.1:8000/mcp"
repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
client_path = os.path.join(repo_root, "nvidia_rag_mcp", "mcp_client.py")
COLLECTION = "my_collection"
pdf_path = os.path.join(repo_root, "data", "multimodal", "woods_frost.pdf")

print("="*80)
print("Listing available tools...")
print("="*80)
subprocess.run([
    sys.executable,
    client_path,
    "list",
    "--transport=streamable_http",
    f"--url={STREAMABLE_HTTP_URL}",
])

print("\n" + "="*80)
print("Creating collection...")
print("="*80)
create_args = json.dumps({"collection_names": [COLLECTION]})
subprocess.run([
    sys.executable, client_path, "call",
    "--transport=streamable_http", f"--url={STREAMABLE_HTTP_URL}",
    "--tool=create_collections",
    f"--json-args={create_args}",
])

print("\n" + "="*80)
print("Uploading document...")
print("="*80)
upload_args = json.dumps({
    "collection_name": COLLECTION,
    "file_paths": [pdf_path],
    "blocking": True,
    "generate_summary": True,
    "split_options": {"chunk_size": 512, "chunk_overlap": 150},
})
subprocess.run([
    sys.executable, client_path, "call",
    "--transport=streamable_http", f"--url={STREAMABLE_HTTP_URL}",
    "--tool=upload_documents",
    f"--json-args={upload_args}",
])

print("\n" + "="*80)
print("Calling 'generate' tool...")
print("="*80)
generate_args = json.dumps({
    "messages": [{"role": "user", "content": "Hello from streamable_http demo"}],
    "collection_names": [COLLECTION],
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    "--transport=streamable_http",
    f"--url={STREAMABLE_HTTP_URL}",
    "--tool=generate",
    f"--json-args={generate_args}",
])

print("\n" + "="*80)
print("Calling 'search' tool...")
print("="*80)
search_args = json.dumps({
    "query": "Tell me about Robert Frost's poems",
    "collection_names": [COLLECTION],
    "reranker_top_k": 2,
    "vdb_top_k": 5,
    "enable_query_rewriting": False,
    "enable_reranker": True,
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    "--transport=streamable_http",
    f"--url={STREAMABLE_HTTP_URL}",
    "--tool=search",
    f"--json-args={search_args}",
])

print("\n" + "="*80)
print("Calling 'get_summary' tool...")
print("="*80)
summary_args = json.dumps({
    "collection_name": COLLECTION,
    "file_name": "woods_frost.pdf",
    "blocking": False,
    "timeout": 60,
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    "--transport=streamable_http",
    f"--url={STREAMABLE_HTTP_URL}",
    "--tool=get_summary",
    f"--json-args={summary_args}",
])

print("\n" + "="*80)
print("Deleting collection...")
print("="*80)
delete_args = json.dumps({"collection_names": [COLLECTION]})
subprocess.run([
    sys.executable, client_path, "call",
    "--transport=streamable_http", f"--url={STREAMABLE_HTTP_URL}",
    "--tool=delete_collections",
    f"--json-args={delete_args}",
])

## Connect with MCP Client (stdio), List Tools, and Call MCP Tools

Purpose:
Verify connectivity by listing available tools and invoking the retriever tools (`generate`, `search`, `get_summary`) and ingestor tools (`create_collections`, `upload_documents`, `delete_collections`) using the stdio transport. The client will spawn the server in stdio mode via `--command` and `--args`.

**Note:** No separate server launch is needed for stdio; the client manages the server subprocess.
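
The stdio cell passes the server script path and its flags to `--args` as one string. If the repository path contains spaces, a whitespace split would break it into several arguments. Assuming `mcp_client.py` splits `--args` with shell-style rules (for example via `shlex.split`, which this notebook does not confirm), quoting the path keeps it intact. `build_stdio_args` is a hypothetical helper.

```python
import shlex


def build_stdio_args(server_path: str) -> str:
    """Join the server script path and its flags into one shell-quoted string."""
    # shlex.join quotes only the items that need it, so plain paths are unchanged.
    return shlex.join([server_path, "--transport", "stdio"])
```

For a path without spaces this produces the same string as the f-string used in the cell, so `STDIO_ARGS = build_stdio_args(server_path)` would be a drop-in substitute under that assumption.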


In [None]:
import os
import sys
import json
import subprocess

repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
client_path = os.path.join(repo_root, "nvidia_rag_mcp", "mcp_client.py")
server_path = os.path.join(repo_root, "nvidia_rag_mcp", "mcp_server.py")

STDIO_CMD = sys.executable
STDIO_ARGS = f"{server_path} --transport stdio"
COLLECTION = "my_collection"
pdf_path = os.path.join(repo_root, "data", "multimodal", "woods_frost.pdf")

print("="*80)
print("Listing available tools (stdio)...")
print("="*80)
subprocess.run([
    sys.executable,
    client_path,
    "list",
    "--transport=stdio",
    f"--command={STDIO_CMD}",
    f"--args={STDIO_ARGS}",
])

print("\n" + "="*80)
print("Creating collection (stdio)...")
print("="*80)
create_args = json.dumps({"collection_names": [COLLECTION]})
subprocess.run([
    sys.executable, client_path, "call",
    "--transport=stdio", f"--command={STDIO_CMD}", f"--args={STDIO_ARGS}",
    "--tool=create_collections",
    f"--json-args={create_args}",
])

print("\n" + "="*80)
print("Uploading document (stdio)...")
print("="*80)
upload_args = json.dumps({
    "collection_name": COLLECTION,
    "file_paths": [pdf_path],
    "blocking": True,
    "generate_summary": True,
    "split_options": {"chunk_size": 512, "chunk_overlap": 150},
})
subprocess.run([
    sys.executable, client_path, "call",
    "--transport=stdio", f"--command={STDIO_CMD}", f"--args={STDIO_ARGS}",
    "--tool=upload_documents",
    f"--json-args={upload_args}",
])

print("\n" + "="*80)
print("Calling 'generate' tool (stdio)...")
print("="*80)
generate_args = json.dumps({
    "messages": [{"role": "user", "content": "Say 'ok'"}],
    "collection_names": [COLLECTION],
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    "--transport=stdio",
    f"--command={STDIO_CMD}",
    f"--args={STDIO_ARGS}",
    "--tool=generate",
    f"--json-args={generate_args}",
])

print("\n" + "="*80)
print("Calling 'search' tool (stdio)...")
print("="*80)
search_args = json.dumps({
    "query": "Tell me about Robert Frost's poems",
    "collection_names": [COLLECTION],
    "reranker_top_k": 2,
    "vdb_top_k": 5,
    "enable_query_rewriting": False,
    "enable_reranker": True,
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    "--transport=stdio",
    f"--command={STDIO_CMD}",
    f"--args={STDIO_ARGS}",
    "--tool=search",
    f"--json-args={search_args}",
])

print("\n" + "="*80)
print("Calling 'get_summary' tool (stdio)...")
print("="*80)
summary_args = json.dumps({
    "collection_name": COLLECTION,
    "file_name": "woods_frost.pdf",
    "blocking": False,
    "timeout": 60,
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    "--transport=stdio",
    f"--command={STDIO_CMD}",
    f"--args={STDIO_ARGS}",
    "--tool=get_summary",
    f"--json-args={summary_args}",
])

print("\n" + "="*80)
print("Deleting collection (stdio)...")
print("="*80)
delete_args = json.dumps({"collection_names": [COLLECTION]})
subprocess.run([
    sys.executable, client_path, "call",
    "--transport=stdio", f"--command={STDIO_CMD}", f"--args={STDIO_ARGS}",
    "--tool=delete_collections",
    f"--json-args={delete_args}",
])

## Cleanup & Troubleshooting

Purpose:
Wrap up the session, stop background processes, and provide guidance for common issues across SSE, streamable_http, and stdio.

- Stopping servers:
  - SSE / streamable_http: restart the kernel or terminate the subprocesses started in earlier cells.
  - stdio: no separate server process is kept running; the client starts and stops it per command.
- Readiness checks:
  - SSE: a 2xx response to a GET on `http://127.0.0.1:8000/sse` indicates the server is ready.
  - streamable_http: a GET on `http://127.0.0.1:8000/mcp` may return 406; treat either a 2xx response or 406 as ready.
- Ingestor + RAG endpoints:
  - Ensure RAG and Ingestor servers are reachable.
- Port conflicts:
  - Free port 8000 if needed (e.g., `fuser -k 8000/tcp` on Linux) before starting SSE/streamable_http again.
- stdio usage tips:
  - Use `--command` and `--args` to spawn the server, e.g. `--command=python --args="-m nvidia_rag_mcp.mcp_server --transport stdio"`.
- Dependencies:
  - Ensure recent versions of `mcp`, `anyio`, `httpx`, `httpx-sse`, and `uvicorn` are available in your environment.
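
For the "Stopping servers" item, terminating the subprocesses without restarting the kernel could be sketched as follows. `stop_process` is a hypothetical helper; it assumes the `sse_proc` and `stream_proc` objects created by the launch cells are still in scope.

```python
import subprocess


def stop_process(proc, grace_seconds: float = 5.0) -> None:
    """Terminate a subprocess.Popen, escalating to kill() if it does not exit in time."""
    if proc is None or proc.poll() is not None:
        return  # never started, or already exited
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The process ignored the termination request; force it to stop.
        proc.kill()
        proc.wait()
```

Calling `stop_process(sse_proc)` before launching the streamable_http server, and `stop_process(stream_proc)` at the end of the session, would free port 8000 without relying on `fuser`.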