## MCP Server Usage

This notebook demonstrates how to use the NVIDIA RAG MCP server via MCP transports instead of REST APIs. It covers:

- Installing dependencies
- Launching the MCP server using `streamable_http` transport
- Connecting with the MCP Python client
- Listing available tools
- Calling Ingestor tools: `create_collection`, `upload_documents`, `delete_collections`
- Calling RAG tools: `generate`, `search`, `get_summary`

**Available MCP Tools:**
- **RAG tools:** `generate`, `search`, `get_summary`
- **Ingestor tools:** `create_collection`, `list_collections`, `upload_documents`, `get_documents`, `update_documents`, `delete_documents`, `update_collection_metadata`, `update_document_metadata`, `delete_collections`

Execute cells in sequence to validate end-to-end behavior.

### Prerequisites

**Before running this notebook, ensure you have:**

1. **Python 3.11 or higher** installed with `pip`
   ```bash
   python --version  # Should be 3.11+
   ```

2. **NVIDIA RAG services running**: the end-to-end RAG workflow must be deployed and running
   - Follow the [quickstart guide](../docs/deploy-docker-nvidia-hosted.md) to start the RAG and Ingestor services
   - Verify that RAG (port 8081) and Ingestor (port 8082) services are accessible

3. **Jupyter environment** - JupyterLab or Jupyter Notebook to run this notebook
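The prerequisites can also be checked from inside the notebook. The sketch below uses hypothetical helpers (`service_reachable` and `check_prerequisites` are not part of the NVIDIA RAG examples). It only tests that something accepts TCP connections on the given ports, which is weaker than calling a real health endpoint.

```python
import socket
import sys
from typing import Dict
from urllib.parse import urlparse


def service_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or host not resolvable


def check_prerequisites(service_urls: Dict[str, str]) -> Dict[str, bool]:
    """Check the Python version and whether each named service accepts connections."""
    results = {"python>=3.11": sys.version_info >= (3, 11)}
    for name, url in service_urls.items():
        results[name] = service_reachable(url)
    return results
```

For example, `check_prerequisites({"rag": "http://localhost:8081", "ingestor": "http://localhost:8082"})` should report `True` for every entry once the services are up.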

### 1. Install Dependencies

Install the libraries needed to run the MCP server and client:
- `mcp` - MCP SDK for client/server communication
- `anyio` - Async I/O framework
- `httpx`, `httpx-sse` - HTTP client with SSE support
- `uvicorn` - ASGI server for running MCP server
- `fastmcp` - FastMCP framework for building MCP servers

In [None]:
%pip install -qq -r ../examples/nvidia_rag_mcp/requirements.txt
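After installation, you can confirm that the packages listed above are importable in this kernel. The module names are an assumption about each package's import name (for example, `httpx-sse` imports as `httpx_sse`); `missing_modules` is a hypothetical helper.

```python
import importlib
import importlib.util
from typing import List

# Assumed import names for the packages listed above.
REQUIRED_MODULES = ["mcp", "anyio", "httpx", "httpx_sse", "uvicorn", "fastmcp"]


def missing_modules(names: List[str]) -> List[str]:
    """Return the subset of top-level module names that cannot be found."""
    importlib.invalidate_caches()  # pick up packages installed after the kernel started
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty result from `missing_modules(REQUIRED_MODULES)` means every listed module was found.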

### 2. Configuration

Set up configuration variables for the MCP server and client.

**Environment variables:** Before starting the MCP server, set the following environment variables (the next cell does this) so the server can reach the Ingestor and RAG servers:
- `INGESTOR_SERVER_URL="http://localhost:8082"`
- `VITE_API_CHAT_URL="http://localhost:8081"`

**Transport Options:**
- `streamable_http` (default) - HTTP-based streaming, recommended for most use cases
- `sse` - Server-Sent Events over HTTP
- `stdio` - Standard input/output, ideal for local development

**To switch transports:** Simply change the `TRANSPORT` variable below. The `MCP_URL` will be automatically set based on your transport choice:
- `streamable_http` → `http://127.0.0.1:8000/mcp`
- `sse` → `http://127.0.0.1:8000/sse`
- `stdio` → No URL needed

See the end of this notebook for detailed instructions on using other transports.

In [None]:
import os
import sys

# Transport configuration
TRANSPORT = "streamable_http"  # Options: "streamable_http", "sse", "stdio"
PORT = 8000
HOST = "127.0.0.1"

# Export the Ingestor server and RAG server URLs before starting the MCP server
# (the server subprocess launched later inherits os.environ)
os.environ["INGESTOR_SERVER_URL"] = "http://localhost:8082"  # Ingestor server URL
os.environ["VITE_API_CHAT_URL"] = "http://localhost:8081"  # RAG server URL

# Automatically set URL path based on transport
if TRANSPORT == "streamable_http":
    URL_PATH = "/mcp"
elif TRANSPORT == "sse":
    URL_PATH = "/sse"
else:
    URL_PATH = ""  # stdio doesn't use URL

MCP_URL = f"http://{HOST}:{PORT}{URL_PATH}" if URL_PATH else None

# Paths
repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
server_path = os.path.join(repo_root, "examples", "nvidia_rag_mcp", "mcp_server.py")
client_path = os.path.join(repo_root, "examples", "nvidia_rag_mcp", "mcp_client.py")
pdf_path = os.path.join(repo_root, "data", "multimodal", "woods_frost.pdf")

# Collection name for this demo
COLLECTION = "my_collection"

print("Configuration:")
print(f"  Transport: {TRANSPORT}")
print(f"  Server: {HOST}:{PORT}")
if MCP_URL:
    print(f"  MCP URL: {MCP_URL}")
print(f"  Collection: {COLLECTION}")
print(f"  Sample PDF: {os.path.basename(pdf_path)}")
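In the configuration cell, any unrecognized `TRANSPORT` value falls through to the `else` branch and is silently treated as stdio, so a typo such as `"streamable-http"` yields `MCP_URL = None`. A stricter variant of the same mapping can fail fast instead; `resolve_mcp_url` is a hypothetical helper, not part of the example code.

```python
from typing import Optional

# Same transport-to-path mapping as the configuration cell; stdio has no URL.
TRANSPORT_PATHS = {"streamable_http": "/mcp", "sse": "/sse", "stdio": None}


def resolve_mcp_url(transport: str, host: str = "127.0.0.1", port: int = 8000) -> Optional[str]:
    """Return the MCP URL for a transport, raising on unknown transport names."""
    if transport not in TRANSPORT_PATHS:
        raise ValueError(f"Unknown transport {transport!r}; expected one of {sorted(TRANSPORT_PATHS)}")
    path = TRANSPORT_PATHS[transport]
    return f"http://{host}:{port}{path}" if path else None
```

With this helper, `MCP_URL = resolve_mcp_url(TRANSPORT, HOST, PORT)` would raise on a misspelled transport rather than quietly switching to stdio.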

### 3. Launch MCP Server

Start the MCP server using the configured transport. The server will:
- Expose NVIDIA RAG and Ingestor APIs as MCP tools
- Listen on the configured host and port
- Forward requests to RAG (port 8081) and Ingestor (port 8082) services

The server runs in the background. The launch cell registers an `atexit` handler that terminates it when the kernel process exits, which normally happens on a kernel restart.

**Note:** Skip this section if using `stdio` transport (see end of notebook for instructions).

In [None]:
import subprocess

print(f"Freeing port {PORT}...")
try:
    subprocess.run(["fuser", "-k", f"{PORT}/tcp"], check=False)
except FileNotFoundError:
    print("'fuser' not found, skipping port cleanup.")
except Exception as e:
    print(f"Error while freeing port: {e}")

In [None]:
import atexit
import time

mcp_server_proc = None
try:
    cmd = [
        sys.executable,
        server_path,
        "--transport",
        TRANSPORT,
        "--host",
        HOST,
        "--port",
        str(PORT),
    ]

    print(f"Launching MCP server: {' '.join(cmd)}")
    mcp_server_proc = subprocess.Popen(cmd)
    atexit.register(lambda: mcp_server_proc and mcp_server_proc.poll() is None and mcp_server_proc.terminate())
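    time.sleep(2.0)  # give the server time to bind the port
    if mcp_server_proc.poll() is not None:
        print(f"MCP server exited early (return code {mcp_server_proc.returncode})")
    else:
        print(f"MCP server started (PID: {mcp_server_proc.pid})")
        if MCP_URL:
            print(f"Server URL: {MCP_URL}")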
except Exception as e:
    print(f"Failed to start MCP server: {e}")
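The launch cell waits a fixed 2 seconds, which may be too short on a slow machine and wasted time on a fast one. The hypothetical `wait_for_port` helper below polls until the server accepts TCP connections and stops early if the process has already exited. An open port is only a proxy for readiness; it does not prove the MCP endpoint has finished initializing.

```python
import socket
import subprocess
import time
from typing import Optional


def wait_for_port(
    host: str,
    port: int,
    timeout: float = 30.0,
    interval: float = 0.5,
    proc: Optional[subprocess.Popen] = None,
) -> bool:
    """Poll host:port until it accepts a TCP connection; return False on timeout or early exit."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc is not None and proc.poll() is not None:
            return False  # the server process exited before it started listening
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet; try again shortly
    return False
```

For example, `wait_for_port(HOST, PORT, proc=mcp_server_proc)` could take the place of the fixed `time.sleep(2.0)`.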

### 4. List Available Tools

Connect to the MCP server and list all available tools. This verifies:
- Server is running and accessible
- Client can successfully connect
- All RAG and Ingestor tools are properly exposed

**Expected tools:** `generate`, `search`, `get_summary`, `create_collection`, `list_collections`, `upload_documents`, `get_documents`, `update_documents`, `delete_documents`, `update_collection_metadata`, `update_document_metadata`, `delete_collections`

In [None]:
import json

print("="*80)
print("Listing available MCP tools...")
print("="*80)
subprocess.run([
    sys.executable,
    client_path,
    "list",
    f"--transport={TRANSPORT}",
    f"--url={MCP_URL}",
])

### 5. Create Collection

Call the `create_collection` tool to create a new vector database collection.

**Tool:** `create_collection`  
**Purpose:** Creates a collection in the vector database to store document embeddings  
**Arguments:**
- `collection_name` - Name of the collection to create

In [None]:
print("="*80)
print("Creating collection...")
print("="*80)
create_args = json.dumps({"collection_name": COLLECTION})
subprocess.run([
    sys.executable, client_path, "call",
    f"--transport={TRANSPORT}", f"--url={MCP_URL}",
    "--tool=create_collection",
    f"--json-args={create_args}",
])
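Every tool cell repeats the same `subprocess.run` invocation of `mcp_client.py`, and the stdio instructions at the end of this notebook swap `--url` for `--command`/`--args`. A hypothetical helper can assemble either form. It uses only the flags that already appear in this notebook and makes no assumptions about other options `mcp_client.py` may accept.

```python
import json
import sys
from typing import List, Optional


def build_client_cmd(
    client_path: str,
    action: str,
    transport: str,
    url: Optional[str] = None,
    tool: Optional[str] = None,
    tool_args: Optional[dict] = None,
    stdio_command: Optional[str] = None,
    stdio_args: Optional[str] = None,
) -> List[str]:
    """Assemble an mcp_client.py command line using the flags from this notebook's cells."""
    if action not in ("list", "call"):
        raise ValueError("action must be 'list' or 'call'")
    cmd = [sys.executable, client_path, action, f"--transport={transport}"]
    if transport == "stdio":
        # With stdio the client spawns the server itself, so it needs a command, not a URL.
        if not stdio_command:
            raise ValueError("stdio transport requires stdio_command")
        cmd += [f"--command={stdio_command}", f"--args={stdio_args or ''}"]
    else:
        if not url:
            raise ValueError(f"{transport} transport requires a URL")
        cmd.append(f"--url={url}")
    if action == "call":
        if not tool:
            raise ValueError("'call' requires a tool name")
        # Serialize the tool arguments with json.dumps, as the cells do.
        cmd += [f"--tool={tool}", f"--json-args={json.dumps(tool_args or {})}"]
    return cmd
```

For example: `subprocess.run(build_client_cmd(client_path, "call", TRANSPORT, url=MCP_URL, tool="create_collection", tool_args={"collection_name": COLLECTION}))`.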

### 6. Upload Document

Call the `upload_documents` tool to upload and process a PDF document.

**Tool:** `upload_documents`  
**Purpose:** Upload documents to a collection with chunking and optional summary generation  
**Arguments:**
- `collection_name` - Target collection
- `file_paths` - List of absolute file paths to upload
- `blocking` - Wait for ingestion to complete (True/False)
- `generate_summary` - Generate document summaries (True/False)
- `split_options` - Chunking configuration (chunk_size, chunk_overlap)

In [None]:
print("="*80)
print("Uploading document...")
print("="*80)
upload_args = json.dumps({
    "collection_name": COLLECTION,
    "file_paths": [pdf_path],
    "blocking": True,
    "generate_summary": True,
    "split_options": {"chunk_size": 512, "chunk_overlap": 150},
})
subprocess.run([
    sys.executable, client_path, "call",
    f"--transport={TRANSPORT}", f"--url={MCP_URL}",
    "--tool=upload_documents",
    f"--json-args={upload_args}",
])
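`upload_documents` expects absolute file paths that the server can read, and a missing file would otherwise only surface as an error from the server. A hypothetical pre-flight check can resolve and verify `file_paths` in the notebook first. It assumes the MCP server runs on the same machine, as it does here, so both processes see the same filesystem.

```python
import os
from typing import List


def validate_upload_paths(paths: List[str]) -> List[str]:
    """Return absolute paths for each file, raising FileNotFoundError if any is missing."""
    resolved = []
    for path in paths:
        abs_path = os.path.abspath(os.path.expanduser(path))
        if not os.path.isfile(abs_path):
            raise FileNotFoundError(f"File not found: {abs_path}")
        resolved.append(abs_path)
    return resolved
```

In the upload cell, `"file_paths": validate_upload_paths([pdf_path])` would stop before the tool call if the sample PDF is missing.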

### 7. Generate Answer with RAG

Call the `generate` tool to generate an answer using the RAG pipeline.

**Tool:** `generate`  
**Purpose:** Generate answers using the RAG pipeline with context from the knowledge base  
**Arguments:**
- `messages` - Chat messages in OpenAI format (role, content)
- `collection_names` - List of collections to search for context
- `use_knowledge_base` - Enable RAG retrieval (default: True)

The tool retrieves relevant document chunks and generates a contextually grounded response.

In [None]:
print("="*80)
print("Calling 'generate' tool...")
print("="*80)
generate_args = json.dumps({
    "messages": [{"role": "user", "content": "Hello from MCP demo"}],
    "collection_names": [COLLECTION],
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    f"--transport={TRANSPORT}",
    f"--url={MCP_URL}",
    "--tool=generate",
    f"--json-args={generate_args}",
])
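The demo sends a single greeting. Because `messages` follows the OpenAI chat format, a multi-turn conversation is simply a longer list of `role`/`content` entries. The hypothetical `make_generate_args` helper below builds the JSON for the `generate` tool from a question plus optional prior turns. Accepting the `system`, `user`, and `assistant` roles is an assumption based on the OpenAI format, not something this notebook confirms.

```python
import json
from typing import Dict, List, Optional

# Assumption: the server accepts the standard OpenAI chat roles.
VALID_ROLES = {"system", "user", "assistant"}


def make_generate_args(
    question: str,
    collection_names: List[str],
    history: Optional[List[Dict[str, str]]] = None,
    use_knowledge_base: bool = True,
) -> str:
    """Return the JSON string for the `generate` tool's --json-args flag."""
    messages = list(history or [])  # copy so the caller's history is not mutated
    messages.append({"role": "user", "content": question})
    for msg in messages:
        if msg.get("role") not in VALID_ROLES or not isinstance(msg.get("content"), str):
            raise ValueError(f"Malformed message: {msg!r}")
    return json.dumps({
        "messages": messages,
        "collection_names": collection_names,
        "use_knowledge_base": use_knowledge_base,
    })
```

A question about the uploaded document, such as `make_generate_args("What is the poem about?", [COLLECTION])`, gives retrieval something to match.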

### 8. Search Documents

Call the `search` tool to search the vector database for relevant documents.

**Tool:** `search`  
**Purpose:** Search the vector database and return relevant document chunks with citations  
**Arguments:**
- `query` - Search query text
- `collection_names` - Collections to search
- `vdb_top_k` - Number of documents to retrieve from vector DB
- `reranker_top_k` - Number of documents after reranking
- `enable_reranker` - Use reranker to improve results (True/False)
- `enable_query_rewriting` - Rewrite query for better retrieval (True/False)

In [None]:
print("="*80)
print("Calling 'search' tool...")
print("="*80)
search_args = json.dumps({
    "query": "Tell me about Robert Frost's poems",
    "collection_names": [COLLECTION],
    "reranker_top_k": 2,
    "vdb_top_k": 5,
    "enable_query_rewriting": False,
    "enable_reranker": True,
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    f"--transport={TRANSPORT}",
    f"--url={MCP_URL}",
    "--tool=search",
    f"--json-args={search_args}",
])
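The search cell asks the vector database for `vdb_top_k=5` candidates and keeps `reranker_top_k=2` after reranking. Assuming the reranker only reorders the candidates it receives, a `reranker_top_k` larger than `vdb_top_k` cannot return more results. Under that assumption, a hypothetical argument builder can reject the combination up front.

```python
import json
from typing import List


def make_search_args(
    query: str,
    collection_names: List[str],
    vdb_top_k: int = 5,
    reranker_top_k: int = 2,
    enable_reranker: bool = True,
    enable_query_rewriting: bool = False,
) -> str:
    """Return the JSON string for the `search` tool, rejecting inconsistent top-k values."""
    if vdb_top_k < 1 or reranker_top_k < 1:
        raise ValueError("vdb_top_k and reranker_top_k must be at least 1")
    # Assumption: the reranker only reorders the vdb_top_k candidates, so it cannot return more.
    if enable_reranker and reranker_top_k > vdb_top_k:
        raise ValueError("reranker_top_k cannot exceed vdb_top_k when the reranker is enabled")
    return json.dumps({
        "query": query,
        "collection_names": collection_names,
        "vdb_top_k": vdb_top_k,
        "reranker_top_k": reranker_top_k,
        "enable_reranker": enable_reranker,
        "enable_query_rewriting": enable_query_rewriting,
    })
```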

### 9. Get Document Summary

Call the `get_summary` tool to retrieve the pre-generated summary for a document.

**Tool:** `get_summary`  
**Purpose:** Retrieve document summaries generated during ingestion  
**Arguments:**
- `collection_name` - Collection containing the document
- `file_name` - Name of the document file
- `blocking` - Wait for summary if not ready (True/False)
- `timeout` - Maximum seconds to wait if blocking

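Summaries are generated asynchronously during document upload if `generate_summary=True`. With `blocking=False`, as in the cell below, the call returns immediately, so a summary that is still being generated may not be available yet.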

In [None]:
print("="*80)
print("Calling 'get_summary' tool...")
print("="*80)
summary_args = json.dumps({
    "collection_name": COLLECTION,
    "file_name": "woods_frost.pdf",
    "blocking": False,
    "timeout": 60,
})
subprocess.run([
    sys.executable,
    client_path,
    "call",
    f"--transport={TRANSPORT}",
    f"--url={MCP_URL}",
    "--tool=get_summary",
    f"--json-args={summary_args}",
])
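Setting `"blocking": True` lets the server wait up to `timeout` seconds for the summary. Alternatively, the client can retry on its own schedule. In this sketch, `poll_until` is hypothetical, and so is the `fetch` callable you would supply: it would run the `get_summary` command and return `None` while the summary is pending. The notebook does not show the client's output format, so that parsing is left to you.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], Optional[T]], timeout: float = 60.0, interval: float = 2.0) -> Optional[T]:
    """Call fetch() until it returns a non-None value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None  # gave up; the summary may still be in progress
        time.sleep(interval)
```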

### 10. Delete Collection

Call the `delete_collections` tool to clean up the demo collection.

**Tool:** `delete_collections`  
**Purpose:** Delete one or more collections from the vector database  
**Arguments:**
- `collection_names` - List of collection names to delete

This removes the collection and all its documents from the vector database.

In [None]:
print("="*80)
print("Deleting collection...")
print("="*80)
delete_args = json.dumps({"collection_names": [COLLECTION]})
subprocess.run([
    sys.executable, client_path, "call",
    f"--transport={TRANSPORT}", f"--url={MCP_URL}",
    "--tool=delete_collections",
    f"--json-args={delete_args}",
])

---

## Using Other Transport Protocols

This notebook uses `streamable_http` by default, but the MCP server supports three transport modes:

### SSE (Server-Sent Events)

**To use SSE transport:**

1. Update `TRANSPORT` in the configuration cell (Section 2):
   ```python
   TRANSPORT = "sse"
   ```

2. Re-run all cells from Section 2 onwards

The `MCP_URL` will automatically be set to `http://127.0.0.1:8000/sse` based on the transport.

### stdio (Standard Input/Output)

**stdio transport doesn't require a separately launched server.** The client spawns the server process itself.

**To use stdio transport:**

1. Update `TRANSPORT` in the configuration cell (Section 2):
   ```python
   TRANSPORT = "stdio"
   ```

2. Skip the server launch cells in Section 3; no separate server process is needed

3. In all tool call cells, replace the subprocess command with this pattern:
   ```python
   STDIO_CMD = sys.executable
   STDIO_ARGS = f"{server_path} --transport stdio"
   
   subprocess.run([
       sys.executable,
       client_path,
       "list",  # or "call"
       "--transport=stdio",
       f"--command={STDIO_CMD}",
       f"--args={STDIO_ARGS}",
       # ... add tool-specific arguments
   ])
   ```

**Note:** stdio is ideal for local development and doesn't require managing server processes. Each client call spawns a fresh server instance.

---

## Cleanup & Troubleshooting

### Stopping the Server

**Option 1:** Restart the kernel; the `atexit` handler registered at launch terminates the MCP server process.

**Option 2:** Terminate it manually using the PID printed when the server started (for example, `kill <PID>` on Linux or macOS).
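The server can also be stopped from the notebook through the `mcp_server_proc` handle created by the launch cell. The hypothetical `stop_server` helper below asks the process to exit with `Popen.terminate()`, waits briefly, and falls back to `Popen.kill()` if the process does not exit in time.

```python
import subprocess


def stop_server(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Terminate proc, escalating to kill() if it has not exited within `grace` seconds."""
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # force it if terminate() was ignored
            proc.wait()
    return proc.returncode
```

For example: `stop_server(mcp_server_proc)`.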

### Common Issues

**Port already in use:**
```bash
fuser -k 8000/tcp  # Linux
lsof -ti:8000 | xargs kill  # macOS
```
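`fuser` and `lsof` are not available on every system; the port-freeing cell already skips cleanup when `fuser` is missing. As a portable alternative, Python's `socket` module can check whether the port is free, or pick an unused port to assign to `PORT` in the configuration cell instead of killing whatever holds 8000. Both helpers are hypothetical. Note the small race between picking a port and the server binding it.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if this process could bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # another socket already holds the port
        return True


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```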

**Server not responding:**
- Verify RAG (port 8081) and Ingestor (port 8082) services are running
- Check server logs for connection errors
- For `streamable_http`: an HTTP 406 response to a plain GET on `/mcp` is normal, because the endpoint only accepts MCP requests

**Tool call failures:**
- Ensure collection exists before calling RAG tools
- Verify documents are fully ingested before searching
- Check that file paths are absolute and accessible to the server

**Dependency issues:**
- Ensure Python 3.11+ is installed
- Re-run the dependencies installation cell
- Verify `mcp`, `anyio`, `httpx`, `uvicorn`, and `fastmcp` are installed

### Additional Resources

- [MCP Documentation](https://modelcontextprotocol.io/docs/getting-started/intro)
- [NVIDIA RAG Quickstart](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/deploy-docker-self-hosted.md)
- [MCP Server Source Code](../examples/nvidia_rag_mcp/mcp_server.py)
- [MCP Client Source Code](../examples/nvidia_rag_mcp/mcp_client.py)