### NAT + NVIDIA RAG MCP end-to-end notebook

#### What is NAT?

[NeMo Agent Toolkit (NAT)](https://docs.nvidia.com/nemo/agent-toolkit/latest/index.html) is NVIDIA's framework for building, deploying, and orchestrating AI agents. NAT provides:

- **Unified agent workflows**: Define complex multi-step reasoning pipelines using simple YAML configurations
- **MCP integration**: Connect to any Model Context Protocol (MCP) server to extend agent capabilities with external tools
- **LLM flexibility**: Use NVIDIA NIM, OpenAI, or other LLM providers interchangeably
- **Production-ready serving**: Deploy agents as REST APIs with built-in observability and scaling

#### Why use NAT with NVIDIA RAG?

By combining NAT with the NVIDIA RAG MCP server, you can:

1. **Build intelligent agents** that leverage your enterprise knowledge base through RAG
2. **Enable natural language interactions** - users can ask questions, search documents, and manage collections conversationally
3. **Extend RAG capabilities** - the agent can reason about when to search, summarize, or combine information from multiple queries
4. **Simplify integration** - MCP provides a standard protocol for tool calling, making it easy to add RAG to any agent workflow

This notebook demonstrates how to:
1. Set up a virtual environment with NAT CLI and MCP dependencies using `uv`.
2. Start the NVIDIA RAG MCP server in `streamable_http` mode.
3. Create a collection and upload a sample document via the MCP client.
4. Configure a NAT workflow using `nvidia_rag_mcp.yaml`.
5. Start a NAT server that uses the RAG MCP server as a tool.
6. Run a natural language query against the RAG workflow using `nat run`.
7. Clean up: delete the collection, stop both servers, and remove the `.nat-mcp` virtual environment and the `nvidia_rag_mcp.yaml` config file.

**Prerequisites**

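- The end-to-end NVIDIA RAG workflow is up and running (per the [RAG quickstart](https://github.com/NVIDIA-AI-Blueprints/rag/blob/develop/docs/deploy-docker-self-hosted.md)).
- `uv` is installed (see the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/)).
- You have an `NVIDIA_API_KEY` from [build.nvidia.com](https://build.nvidia.com/) for the `nat run` step.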

### Set up Virtual Environment

Run the cell below to create a virtual environment and install NAT CLI and MCP dependencies.
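
Because the setup cell shells out to `uv`, it can help to confirm the tool is on `PATH` before running it. The following is a minimal sketch; `find_tool` is a hypothetical helper name, not part of NAT or `uv`.

```python
import shutil


def find_tool(name: str) -> str:
    """Return the absolute path of a command-line tool, or raise a clear error."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"`{name}` was not found on PATH; install it before running the setup cell."
        )
    return path
```

Calling `find_tool("uv")` in a scratch cell either returns the resolved path or fails early with an actionable message.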

In [None]:
# Create virtual environment using uv
!uv venv .nat-mcp

# Install NAT CLI with langchain and mcp plugins into the venv
# See: https://docs.nvidia.com/nemo/agent-toolkit/latest/quick-start/installing.html
!uv pip install --python .nat-mcp/bin/python "nvidia-nat[langchain,mcp]"

# Install MCP requirements into the venv
!uv pip install --python .nat-mcp/bin/python -r ../examples/nvidia_rag_mcp/requirements.txt

### Start the MCP Server

The next cell launches the NVIDIA RAG MCP server in `streamable_http` mode on port 9902. The server runs as a background subprocess and is automatically terminated when the kernel shuts down.
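
The launch cell waits a fixed two seconds after starting the subprocess. As an alternative, a readiness check can poll the TCP port until it accepts connections. This is a minimal sketch under that assumption; `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the MCP server is fully initialized.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

With the notebook's defaults this would be called as `wait_for_port("127.0.0.1", 9902)` right after `subprocess.Popen`.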

In [None]:
# Export the RAG server and ingestor server URLs before starting the MCP server
import os

os.environ["INGESTOR_SERVER_URL"] = "http://localhost:8082"  # Ingestor server URL
os.environ["VITE_API_CHAT_URL"] = "http://localhost:8081"  # RAG server URL

In [None]:
import os
import atexit
import time
import subprocess

MCP_PORT = 9902
MCP_HOST = "127.0.0.1"

repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
server_path = os.path.join(repo_root, "examples", "nvidia_rag_mcp", "mcp_server.py")

# Use Python from the .nat-mcp venv
venv_dir = os.path.join(os.getcwd(), ".nat-mcp")
venv_python = os.path.join(venv_dir, "bin", "python")
if not os.path.exists(venv_python):
    raise RuntimeError(f"Python not found at {venv_python}. Please run the installation cell first.")

cmd = [
    venv_python,
    server_path,
    "--transport",
    "streamable_http",
    "--host",
    MCP_HOST,
    "--port",
    str(MCP_PORT),
]

print("Launching MCP server:", " ".join(cmd))

mcp_proc = subprocess.Popen(cmd)
atexit.register(lambda: mcp_proc and mcp_proc.poll() is None and mcp_proc.terminate())

time.sleep(2.0)
print(f"MCP server PID: {mcp_proc.pid} (http://{MCP_HOST}:{MCP_PORT}/mcp)")

### Create a Collection and Upload a Document

The next cell uses the MCP client to create a collection and upload a sample PDF document. This demonstrates how to interact with the NVIDIA RAG MCP server programmatically.
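
Every MCP client call in this notebook uses the same command shape: the `call` subcommand plus `--transport`, `--url`, `--tool`, and `--json-args`. A small wrapper can make that pattern explicit. The sketch below uses only the flags that appear in this notebook; `build_mcp_call` is a hypothetical helper name, and `sys.executable` stands in for the `.nat-mcp` interpreter.

```python
import json
import sys


def build_mcp_call(python_exe, client_path, url, tool, args):
    """Assemble the argv list for one `mcp_client.py call` invocation."""
    return [
        python_exe,
        client_path,
        "call",
        "--transport=streamable_http",
        f"--url={url}",
        f"--tool={tool}",
        f"--json-args={json.dumps(args)}",  # tool arguments are passed as a JSON string
    ]


cmd = build_mcp_call(
    sys.executable,  # placeholder; the notebook uses .nat-mcp/bin/python
    "examples/nvidia_rag_mcp/mcp_client.py",
    "http://127.0.0.1:9902/mcp",
    "create_collection",
    {"collection_name": "my_collection"},
)
print(cmd[-1])  # --json-args={"collection_name": "my_collection"}
```

The resulting list can be passed directly to `subprocess.run`, as the cell below does with its inline lists.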

In [None]:
import json
import subprocess
import os

STREAMABLE_HTTP_URL = "http://127.0.0.1:9902/mcp"
repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
client_path = os.path.join(repo_root, "examples", "nvidia_rag_mcp", "mcp_client.py")

# Use Python from the .nat-mcp venv
venv_dir = os.path.join(os.getcwd(), ".nat-mcp")
venv_python = os.path.join(venv_dir, "bin", "python")
if not os.path.exists(venv_python):
    raise RuntimeError(f"Python not found at {venv_python}. Please run the installation cell first.")

COLLECTION = "my_collection"
pdf_path = os.path.join(repo_root, "data", "multimodal", "product_catalog.pdf")

print("=" * 80)
print("Creating collection via MCP...")
print("=" * 80)
# Use create_collection (singular) with collection_name (not collection_names)
create_args = json.dumps({"collection_name": COLLECTION})
subprocess.run([
    venv_python,
    client_path,
    "call",
    "--transport=streamable_http",
    f"--url={STREAMABLE_HTTP_URL}",
    "--tool=create_collection",
    f"--json-args={create_args}",
], check=False)

print("\n" + "=" * 80)
print("Uploading document via MCP...")
print("=" * 80)
upload_args = json.dumps({
    "collection_name": COLLECTION,
    "file_paths": [pdf_path],
    "blocking": True,
    "generate_summary": True,
    "split_options": {"chunk_size": 512, "chunk_overlap": 150},
})
subprocess.run([
    venv_python,
    client_path,
    "call",
    "--transport=streamable_http",
    f"--url={STREAMABLE_HTTP_URL}",
    "--tool=upload_documents",
    f"--json-args={upload_args}",
], check=False)

print("\nDone setting up collection and document for NAT + MCP demo.")

### Create `nvidia_rag_mcp.yaml` for NAT

The next cell creates a file named `nvidia_rag_mcp.yaml` in the project root directory. This YAML configuration:
- **Defines an MCP client** (`function_groups.nvidia_rag_mcp`) that connects to the NVIDIA RAG MCP server over `streamable-http` at `http://localhost:9902/mcp` and exposes the `generate` tool to NAT.
- **Configures the LLM** (`llms.nim_llm`) that the agent uses when deciding how to call the MCP tool.
- **Builds a ReAct agent workflow** (`workflow`) that wires the `nvidia_rag_mcp` tool group and `nim_llm` model together.

You normally only need to modify the generated file if you:
- Point the MCP server to a different host/port (update the `url`), or
- Want to use a different LLM (update `model_name` and related settings).
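
If you expect to change the MCP URL or the model often, one option is to render the config from a template instead of editing the file by hand. The sketch below reuses this notebook's YAML with two placeholders; `CONFIG_TEMPLATE` and `render_nat_config` are hypothetical names introduced for illustration.

```python
from string import Template

# Same YAML as the notebook's config, with the two commonly edited values parameterized.
CONFIG_TEMPLATE = Template("""\
function_groups:
  nvidia_rag_mcp:
    _type: mcp_client
    server:
      transport: streamable-http
      url: "$mcp_url"
    include:
      - generate

llms:
  nim_llm:
    _type: nim
    model_name: $model_name
    temperature: 0.0
    max_tokens: 1024

workflow:
  _type: react_agent
  tool_names:
    - nvidia_rag_mcp
  llm_name: nim_llm
  verbose: true
  retry_parsing_errors: true
  max_retries: 3
""")


def render_nat_config(
    mcp_url: str = "http://localhost:9902/mcp",
    model_name: str = "meta/llama-3.1-70b-instruct",
) -> str:
    """Return the NAT config text with the MCP URL and model name filled in."""
    return CONFIG_TEMPLATE.substitute(mcp_url=mcp_url, model_name=model_name)
```

The returned string can be written to `nvidia_rag_mcp.yaml` in the same way the next cell writes `config_content`.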

In [None]:
import os

repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
config_path = os.path.join(repo_root, "nvidia_rag_mcp.yaml")

config_content = """\
function_groups:
  nvidia_rag_mcp:
    _type: mcp_client
    server:
      transport: streamable-http
      url: "http://localhost:9902/mcp"
    include:
      - generate

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
    max_tokens: 1024

workflow:
  _type: react_agent
  tool_names:
    - nvidia_rag_mcp
  llm_name: nim_llm
  verbose: true
  retry_parsing_errors: true
  max_retries: 3
"""

with open(config_path, "w") as f:
    f.write(config_content)

print(f"Created NAT config file: {config_path}")

### Start the NAT server

The next cell launches `nat serve` as a background subprocess. Leave the kernel running while you execute subsequent cells to query via `nat run`.
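
Once `nat serve` is running, the workflow can in principle also be queried over HTTP. The sketch below assumes the server exposes a `POST /generate` endpoint that accepts an `input_message` field, as described in the NAT documentation; this notebook does not confirm that, so verify the endpoint and payload against your installed version. `build_generate_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request


def build_generate_request(
    question: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build a POST request for the NAT server."""
    # ASSUMPTION: the `/generate` path and `input_message` field are not
    # confirmed by this notebook; check the NAT docs for your version.
    payload = json.dumps({"input_message": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://127.0.0.1:8000", timeout: float = 120.0):
    """Send the question to the running NAT server and return the decoded JSON body."""
    request = build_generate_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`ask(...)` only works while the server started by the next cell is running; `build_generate_request` can be inspected on its own without any network access.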

In [None]:
import os
import subprocess
import atexit
import time
import urllib.request
import urllib.error

NAT_PORT = 8000
MCP_URL = "http://127.0.0.1:9902/mcp"
repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
config_path = os.path.join(repo_root, "nvidia_rag_mcp.yaml")

# Use Python/nat from the .nat-mcp venv
venv_dir = os.path.join(os.getcwd(), ".nat-mcp")
venv_nat = os.path.join(venv_dir, "bin", "nat")
if not os.path.exists(venv_nat):
    raise RuntimeError(f"NAT not found at {venv_nat}. Please run the installation cell first.")

# First, verify the MCP server is running
print(f"Checking MCP server at {MCP_URL}...")
try:
    req = urllib.request.Request(MCP_URL, method='POST')
    req.add_header('Content-Type', 'application/json')
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(f"MCP server is reachable (status: {resp.status})")
except urllib.error.HTTPError as e:
    # HTTP errors like 400/405 mean the server is running
    print(f"MCP server is reachable (HTTP {e.code})")
except urllib.error.URLError as e:
    print(f"ERROR: Cannot connect to MCP server at {MCP_URL}")
    print(f"  Reason: {e.reason}")
    print("\nPlease ensure the MCP server is running (run the 'Start the MCP Server' cell first).")
    print("If port 9902 is in use, run: fuser -k 9902/tcp")
    raise RuntimeError("MCP server not reachable") from e

cmd = [venv_nat, "serve", "--config_file", config_path, "--port", str(NAT_PORT)]

print(f"Launching NAT server on port {NAT_PORT}...")

nat_proc = subprocess.Popen(cmd, cwd=repo_root)
atexit.register(lambda: nat_proc and nat_proc.poll() is None and nat_proc.terminate())

time.sleep(5.0)

# Check if process is still running
if nat_proc.poll() is None:
    print(f"NAT server PID: {nat_proc.pid} (http://127.0.0.1:{NAT_PORT})")
else:
    print(f"NAT server exited with code: {nat_proc.returncode}")

### Run `nat run`

The next cell executes `nat run` to invoke the workflow defined in `nvidia_rag_mcp.yaml`. This uses the NVIDIA RAG MCP server (running on `http://localhost:9902/mcp`) to answer based on the document you uploaded.

**Important:** Set your `NVIDIA_API_KEY` before running this cell. Get your API key from [build.nvidia.com](https://build.nvidia.com/).

In [None]:
# del os.environ['NVIDIA_API_KEY']  ## delete key and reset if needed
import os
from getpass import getpass

if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    candidate_api_key = getpass("NVAPI Key (starts with nvapi-): ")
    assert candidate_api_key.startswith("nvapi-"), (
        f"{candidate_api_key[:5]}... is not a valid key"
    )
    os.environ["NVIDIA_API_KEY"] = candidate_api_key

In [None]:
import os
import subprocess

# Read the NVIDIA API key from the environment (set by the previous cell,
# or exported before starting Jupyter)
NVIDIA_API_KEY = os.environ.get("NVIDIA_API_KEY", "nvapi-...")
if NVIDIA_API_KEY == "nvapi-...":
    raise ValueError("Please set your NVIDIA_API_KEY. Get one from https://build.nvidia.com/")

COLLECTION = "my_collection"
repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
config_path = os.path.join(repo_root, "nvidia_rag_mcp.yaml")

query = f"My rag question is: Tell me about Ratan Basket Shoulder Bag. Use collection {COLLECTION}, with ranker model as nvidia/llama-3.2-nv-rerankqa-1b-v2"

# Use nat from the .nat-mcp venv
venv_dir = os.path.join(os.getcwd(), ".nat-mcp")
venv_nat = os.path.join(venv_dir, "bin", "nat")
if not os.path.exists(venv_nat):
    raise RuntimeError(f"NAT not found at {venv_nat}. Please run the installation cell first.")

# Prepare environment with API key
env = os.environ.copy()
env["NVIDIA_API_KEY"] = NVIDIA_API_KEY

cmd = [venv_nat, "run", "--config_file", config_path, "--input", query]

print("Running NAT query...")
print("=" * 80)

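result = subprocess.run(cmd, cwd=repo_root, capture_output=True, text=True, env=env)

print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
# Surface failures explicitly instead of relying on the reader to scan stderr
if result.returncode != 0:
    print(f"`nat run` exited with code {result.returncode}")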

### Clean up: Delete the demo collection

Run the next cell after you are done experimenting with the RAG workflow.

In [None]:
import json
import subprocess
import os

STREAMABLE_HTTP_URL = "http://127.0.0.1:9902/mcp"
repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
client_path = os.path.join(repo_root, "examples", "nvidia_rag_mcp", "mcp_client.py")

# Use Python from the .nat-mcp venv
venv_dir = os.path.join(os.getcwd(), ".nat-mcp")
venv_python = os.path.join(venv_dir, "bin", "python")
if not os.path.exists(venv_python):
    raise RuntimeError(f"Python not found at {venv_python}. Please run the installation cell first.")

COLLECTION = "my_collection"

print("Deleting collection via MCP...\n")
delete_args = json.dumps({"collection_names": [COLLECTION]})
subprocess.run([
    venv_python,
    client_path,
    "call",
    "--transport=streamable_http",
    f"--url={STREAMABLE_HTTP_URL}",
    "--tool=delete_collections",
    f"--json-args={delete_args}",
], check=False)

### Stop the NAT server

The next cell stops the NAT server that was started earlier in this notebook. 

In [None]:
try:
    nat_proc
except NameError:
    print("No NAT server process handle (`nat_proc`) found in this session.")
else:
    if nat_proc is not None and nat_proc.poll() is None:
        print(f"Terminating NAT server PID {nat_proc.pid}...")
        nat_proc.terminate()
        try:
            nat_proc.wait(timeout=10)
        except Exception:
            print("NAT server did not exit in time; sending kill()...")
            nat_proc.kill()
            nat_proc.wait()  # reap the killed process
        print("NAT server stopped.")
    else:
        print("NAT server process is not running.")

### Stop the MCP server

The next cell stops the MCP server that was started at the beginning of this notebook.
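
The stop cells in this notebook share one pattern: send `terminate()`, wait up to ten seconds, then fall back to `kill()`. That pattern can be factored into a single function. The following is a minimal sketch; `stop_process` is a hypothetical helper, and it additionally waits after `kill()` so the child process is reaped.

```python
import subprocess


def stop_process(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Terminate a child process, escalating to kill() if it does not exit in time."""
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap the process so it does not linger as a zombie
    return proc.returncode
```

With this helper, stopping either server reduces to `stop_process(nat_proc)` or `stop_process(mcp_proc)`.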

In [None]:
try:
    mcp_proc
except NameError:
    print("No MCP server process handle (`mcp_proc`) found in this session.")
else:
    if mcp_proc is not None and mcp_proc.poll() is None:
        print(f"Terminating MCP server PID {mcp_proc.pid}...")
        mcp_proc.terminate()
        try:
            mcp_proc.wait(timeout=10)
        except Exception:
            print("MCP server did not exit in time; sending kill()...")
            mcp_proc.kill()
            mcp_proc.wait()  # reap the killed process
        print("MCP server stopped.")
    else:
        print("MCP server process is not running.")

### Clean up: Delete virtual environment and config file

Run the next cell to remove the `.nat-mcp` virtual environment and the `nvidia_rag_mcp.yaml` config file created during this notebook.

In [None]:
import os
import shutil

# Delete .nat-mcp virtual environment
venv_dir = os.path.join(os.getcwd(), ".nat-mcp")
if os.path.exists(venv_dir):
    print(f"Deleting virtual environment: {venv_dir}")
    shutil.rmtree(venv_dir)
    print("Virtual environment deleted.")
else:
    print(f"Virtual environment not found at {venv_dir}")

# Delete nvidia_rag_mcp.yaml config file
repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
config_path = os.path.join(repo_root, "nvidia_rag_mcp.yaml")
if os.path.exists(config_path):
    print(f"Deleting config file: {config_path}")
    os.remove(config_path)
    print("Config file deleted.")
else:
    print(f"Config file not found at {config_path}")

### Troubleshooting

- **`uv` not found or virtual environment setup fails**
  - Install `uv` first: `pip install uv` or follow the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/).
  - If the virtual environment already exists, delete `.nat-mcp/` and rerun the setup cell.

- **MCP server fails to start or hangs**
  - Check that NVIDIA RAG and Ingestor services are up and healthy (see the [RAG quickstart](https://github.com/NVIDIA-AI-Blueprints/rag/blob/develop/docs/deploy-docker-self-hosted.md)).
  - Ensure port `9902` is free. On Linux, `fuser -k 9902/tcp` kills whichever process is holding the port; then rerun the server cell.

- **MCP client calls (create/upload/delete) fail**
  - Verify the MCP server cell printed a valid PID and URL `http://127.0.0.1:9902/mcp`.
  - Run the MCP client commands from the notebook again and inspect any error text in the cell output.

- **`nat serve` cannot connect to MCP**
  - Confirm the MCP server process is still running: its cell output shows no error, and `mcp_proc.poll()` returns `None`. Do not run the stop-MCP cell to check, because it terminates the server.
  - Make sure the `url` in `nvidia_rag_mcp.yaml` matches the MCP server address (default: `http://localhost:9902/mcp`).

- **`nat run` errors or returns empty/irrelevant answers**
  - Ensure your `NVIDIA_API_KEY` is set correctly. Get one from [build.nvidia.com](https://build.nvidia.com/).
  - Confirm the collection (`my_collection`) exists and the upload cell completed successfully.
  - Adjust the `--input` prompt to clearly mention the collection name and the document you uploaded.

- **Environment or dependency issues**
  - Ensure you ran the virtual environment setup cell first (it installs `nvidia-nat[langchain,mcp]`).
  - If dependencies are missing, delete the `.nat-mcp/` folder and rerun the setup cell.
  - See the [NeMo Agent Toolkit installation guide](https://docs.nvidia.com/nemo/agent-toolkit/latest/quick-start/installing.html) for more details.
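
For the port-related items above, a non-destructive check from Python can tell you whether something is already listening before you resort to `fuser -k`. This is a minimal sketch; `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(9902)` returning `True` before the MCP server cell has run suggests a leftover process from an earlier session.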