# NVIDIA RAG Python Package - Lite Mode

This notebook demonstrates the **containerless deployment** of the NVIDIA RAG Python package, where all operations are performed purely in Python without requiring Docker containers.

### Key Features

- **Containerless Deployment**: All functionality is accessible through Python APIs without locally deployed services
- **Simplified Setup**: Uses Milvus Lite (embedded vector database) and NV-Ingest subprocess mode
- **Cloud-Based Processing**: Leverages NVIDIA cloud APIs for embeddings, ranking, and LLM inference

### Important Limitations

> **⚠️ Citation Limitations**: Lite mode does not support **image, table, or chart citations** because the MinIO object storage service is not deployed. Only text-based citations are available.

> **⚠️ Summary Generation**: Document summary generation is **not supported** in lite mode.

> **Note**: You may encounter warnings related to citations and summarization during execution. These warnings are expected in lite mode and can be safely ignored.

## Setup for NVIDIA RAG Python Package
Refer to [rag_library_usage.ipynb](./rag_library_usage.ipynb) for detailed installation instructions for the RAG library.

**Quick install:**
```bash
uv pip install nvidia-rag[all]
```

Install the nv-ingest library with the command below, or run the install cell in the next section if the Jupyter notebook was started in the same environment:
```bash
uv pip install nv-ingest==26.1.1
```

### Install the NVIDIA RAG Package and NV-Ingest Library

Run the cell below to install the required packages if not already installed:

In [None]:
# Option A: Install from PyPI (recommended)
# Uncomment the line below to install from PyPI
# !uv pip install nvidia-rag[all]

# Option B: Install from source in development mode (for contributors)
# Note: ".." refers to the parent directory where pyproject.toml is located
!uv pip install -e "..[all]"

# Option C: Build and install from source wheel
# Uncomment the lines below to build and install from source
# !cd .. && uv build
# !uv pip install ../dist/nvidia_rag-*-py3-none-any.whl[all]

# Install NV-Ingest library in the same environment to run NV-Ingest pipeline
!uv pip install nv-ingest==26.1.1

## Setting up the dependencies

### 1. Setup the default configurations

In [None]:
!uv pip install python-dotenv
import os

from getpass import getpass

Provide your `NGC_API_KEY` when prompted after executing the cell below. You can obtain a key by following the steps [here](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/api-key.md).

In [None]:
# del os.environ['NGC_API_KEY']  ## delete key and reset if needed
if os.environ.get("NGC_API_KEY", "").startswith("nvapi-"):
    print("Valid NGC_API_KEY already in environment. Delete to reset")
else:
    candidate_api_key = getpass("NVAPI Key (starts with nvapi-): ")
    assert candidate_api_key.startswith("nvapi-"), (
        f"{candidate_api_key[:5]}... is not a valid key"
    )
    os.environ["NGC_API_KEY"] = candidate_api_key

Log in to nvcr.io, which is needed for pulling the containers of dependencies.

### 2. Setup the NIMs

Set up the environment variables for the NIMs using NVIDIA-hosted models.

In [None]:
# These are read by the NV-Ingest pipeline when using cloud-hosted models
os.environ["OCR_HTTP_ENDPOINT"] = "https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-ocr"
os.environ["OCR_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_HTTP_ENDPOINT"] = (
    "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-page-elements-v3"
)
os.environ["YOLOX_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_GRAPHIC_ELEMENTS_HTTP_ENDPOINT"] = (
    "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-graphic-elements-v1"
)
os.environ["YOLOX_GRAPHIC_ELEMENTS_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_TABLE_STRUCTURE_HTTP_ENDPOINT"] = (
    "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-table-structure-v1"
)
os.environ["YOLOX_TABLE_STRUCTURE_INFER_PROTOCOL"] = "http"
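
Before launching the pipeline, it can help to confirm that every endpoint variable set above is actually present. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical and is not part of `nvidia_rag` or `nv-ingest`.

```python
import os

# Variable names copied from the cell above.
REQUIRED_ENV_VARS = [
    "OCR_HTTP_ENDPOINT",
    "YOLOX_HTTP_ENDPOINT",
    "YOLOX_GRAPHIC_ELEMENTS_HTTP_ENDPOINT",
    "YOLOX_TABLE_STRUCTURE_HTTP_ENDPOINT",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper for illustration; defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
print("Missing variables:", missing if missing else "none")
```

An empty list means all four endpoints are configured in the current process, which the NV-Ingest subprocess is expected to inherit.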

### 3. Setup the NV-Ingest Pipeline Subprocess

Apply platform compatibility patches if needed, then launch the NV-Ingest pipeline as a background subprocess for document processing.

**Note:** Ensure that the NV-Ingest container is not running, to avoid a port conflict.
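
One way to check for such a conflict up front is to test whether anything already accepts connections on the port. This is a sketch under the assumption that the relevant port is 7671 (the NV-Ingest message client port configured later in this notebook); `port_in_use` is a hypothetical helper built on the standard `socket` module.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


# 7671 is the port this notebook later assigns to message_client_port.
if port_in_use(7671):
    print("Port 7671 is already in use; stop the other NV-Ingest instance first.")
else:
    print("Port 7671 is free.")
```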

In [None]:
"""
macOS Compatibility Patch for nv-ingest

The nv-ingest library contains Linux-specific dependencies (libc.so.6) that are not 
available on macOS systems. This compatibility patch neutralizes the `set_pdeathsig` 
function to enable cross-platform functionality.

Note: This patch should be executed before any nv-ingest functionality is invoked.
"""

import platform

if platform.system() == "Darwin":  # macOS
    print("Platform detected: macOS")
    print("Applying nv-ingest compatibility patch...")
    
    # Import the target module and apply the compatibility patch
    import nv_ingest.framework.orchestration.ray.util.pipeline.pipeline_runners as runners
    
    # Replace the Linux-specific set_pdeathsig function with a no-op implementation
    def _noop_set_pdeathsig(*args, **kwargs):
        """No-op replacement for Linux-only set_pdeathsig on macOS.

        Accepts any arguments so it works regardless of how the original is called.
        """
        pass
    
    runners.set_pdeathsig = _noop_set_pdeathsig
    print("Compatibility patch successfully applied.")
else:
    print(f"Platform detected: {platform.system()}")
    print("No compatibility patch required.")

In [None]:
# Run NV-Ingest pipeline as a background subprocess
from nv_ingest.framework.orchestration.ray.util.pipeline.pipeline_runners import run_pipeline

run_pipeline(block=False, disable_dynamic_scaling=True, run_in_subprocess=True)
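
Because `run_pipeline` is called with `block=False`, the call returns before the pipeline is necessarily ready to accept work. The sketch below waits for a TCP port to accept connections. It assumes that readiness can be approximated by port 7671 being open, which is not guaranteed; `wait_for_port` is a hypothetical helper, not an `nv-ingest` API.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout_s=120.0, interval_s=1.0):
    """Poll until host:port accepts a TCP connection or the timeout expires.

    Returns True if a connection succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry after a short pause
            time.sleep(interval_s)
    return False
```

In the notebook this could be used as `wait_for_port(7671)` right after starting the pipeline; a `False` result suggests checking the subprocess logs.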

---
# API usage example

With the Python package set up and the NV-Ingest pipeline running, the snippets below showcase the functionality offered by the `nvidia_rag` package.

## Set logging level
First let's set the required logging level. Set to INFO for displaying basic important logs. Set to DEBUG for full verbosity.

In [None]:
import logging
import os

# Set the log level via environment variable before importing nvidia_rag
# This ensures the package respects our log level setting
LOGLEVEL = logging.WARNING  # Set to INFO, DEBUG, WARNING or ERROR
os.environ["LOGLEVEL"] = logging.getLevelName(LOGLEVEL)

# Configure logging
logging.basicConfig(level=LOGLEVEL, force=True)

# Set log levels for specific loggers after package import
for name in logging.root.manager.loggerDict:
    if name == "nvidia_rag" or name.startswith("nvidia_rag."):
        logging.getLogger(name).setLevel(LOGLEVEL)
    if name == "nv_ingest_client" or name.startswith("nv_ingest_client."):
        logging.getLogger(name).setLevel(LOGLEVEL)

## Initialize the NvidiaRAGIngestor Package in Lite Mode

Import `NvidiaRAGIngestor` to access APIs for document upload and management operations.

In [None]:
from nvidia_rag import NvidiaRAGIngestor
from nvidia_rag.utils.configuration import NvidiaRAGConfig

config_ingestor = NvidiaRAGConfig.from_yaml("config.yaml")
# You can update the config object to use different models and endpoints like below
# config_ingestor.embeddings.model_name = "nvidia/llama-nemotron-embed-1b-v2"
# config_ingestor.embeddings.server_url = "https://integrate.api.nvidia.com/v1"

# Set config for rag lite library mode
config_ingestor.vector_store.url = "./milvus-lite.db"
config_ingestor.nv_ingest.message_client_port = 7671  # Port for NV-Ingest library mode

# Set config for cloud API endpoints
config_ingestor.embeddings.server_url = "https://integrate.api.nvidia.com/v1"

ingestor = NvidiaRAGIngestor(config=config_ingestor, mode="lite")

## 1. Create a new collection
Creates a new collection in the vector database.

In [None]:
response = ingestor.create_collection(
    collection_name="test_library",
    # [Optional]: Create collection with metadata schema, uncomment to create collection with metadata schemas
    # metadata_schema = [
    #     {
    #         "name": "meta_field_1",
    #         "type": "string",
    #         "description": "Following field would contain the description for the document"
    #     }
    # ]
)
print(response)
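
When a metadata schema is used, the custom metadata passed at upload time has to use the same field names. The sketch below checks this locally before any upload. It relies only on the dictionary shapes shown in this notebook (`name` in the schema, `filename` and `metadata` in each entry); `undeclared_metadata_fields` is a hypothetical helper, and the second sample entry is deliberately wrong for illustration.

```python
def undeclared_metadata_fields(metadata_schema, custom_metadata):
    """Map each filename to the metadata keys that the schema does not declare."""
    declared = {field["name"] for field in metadata_schema}
    problems = {}
    for entry in custom_metadata:
        extra = sorted(set(entry.get("metadata", {})) - declared)
        if extra:
            problems[entry["filename"]] = extra
    return problems


schema = [
    {"name": "meta_field_1", "type": "string", "description": "Document description"}
]
custom = [
    {"filename": "multimodal_test.pdf", "metadata": {"meta_field_1": "multimodal document 1"}},
    # Illustrative mistake: this field is not declared in the schema
    {"filename": "woods_frost.docx", "metadata": {"meta_field_2": "multimodal document 2"}},
]
print(undeclared_metadata_fields(schema, custom))
# {'woods_frost.docx': ['meta_field_2']}
```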

## 2. List all collections
Retrieves all available collections from the vector database.

In [None]:
response = ingestor.get_collections()
print(response)

## 3. Add a document
Uploads new documents to the specified collection in the vector database. To update documents that already exist in the collection, call `update_documents()` instead of `upload_documents()`.

In [None]:
response = await ingestor.upload_documents(
    collection_name="test_library",
    blocking=False,
    split_options={"chunk_size": 512, "chunk_overlap": 150},
    filepaths=[
        "../data/multimodal/woods_frost.docx",
        "../data/multimodal/multimodal_test.pdf",
    ],
    # [Optional]: Uncomment to add custom metadata, ensure that the metadata schema is created with the same fields with create_collection
    # custom_metadata=[
    #     {
    #         "filename": "multimodal_test.pdf",
    #         "metadata": {"meta_field_1": "multimodal document 1"}
    #     },
    #     {
    #         "filename": "woods_frost.docx",
    #         "metadata": {"meta_field_1": "multimodal document 2"}
    #     }
    # ]
)
task_id = response.get("task_id")
print(response)
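
To build intuition for `split_options`, the sketch below splits a sequence into windows of `chunk_size` items where consecutive windows share `chunk_overlap` items. This is a simplified sliding-window illustration, not the splitter that NV-Ingest uses; the real splitter works on tokenizer output and may handle boundaries differently.

```python
def chunk_with_overlap(tokens, chunk_size=512, chunk_overlap=150):
    """Split tokens into windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


tokens = list(range(1000))  # stand-in for 1000 token ids
chunks = chunk_with_overlap(tokens)
print([(c[0], c[-1], len(c)) for c in chunks])
# [(0, 511, 512), (362, 873, 512), (724, 999, 276)]
```

With the values used above, each new chunk starts 512 - 150 = 362 items after the previous one, so a larger overlap yields more chunks for the same document.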

## 4. Check document upload status
Checks the status of a document upload/update task.

In [None]:
response = await ingestor.status(task_id=task_id)
print(response)
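
Since the upload runs with `blocking=False`, a single status call may report an unfinished task. The sketch below polls an async status function until a caller-supplied predicate says the task is done. `poll_until` is a hypothetical helper, and the `state` field and its values in the commented usage are assumptions about the status response, so check the printed response above for the actual keys.

```python
import asyncio


async def poll_until(status_fn, is_done, interval_s=2.0, timeout_s=600.0):
    """Await status_fn() repeatedly until is_done(response) returns True."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while True:
        response = await status_fn()
        if is_done(response):
            return response
        if loop.time() >= deadline:
            raise TimeoutError("Task did not finish before the timeout")
        await asyncio.sleep(interval_s)


# Possible usage in this notebook (field name and values are assumed):
# final = await poll_until(
#     lambda: ingestor.status(task_id=task_id),
#     is_done=lambda r: r.get("state") in {"FINISHED", "FAILED"},
# )
# print(final)
```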

## [Optional] Update a document in a collection
To update an existing document in the specified collection, execute the cell below.

In [None]:
response = await ingestor.update_documents(
    collection_name="test_library",
    blocking=False,
    filepaths=["../data/multimodal/woods_frost.docx"],
)
task_id = response.get("task_id")
print(response)

## 5. Get documents in a collection
Retrieves the list of documents uploaded to a collection.

In [None]:
response = ingestor.get_documents(
    collection_name="test_library",
)
print(response)

## Import the NvidiaRAG packages
Import `NvidiaRAG`, which exposes APIs to interact with the uploaded documents.

You can create a config object from a dictionary or from a YAML file. We have added a sample config file [config.yaml](./config.yaml) that you can use to create an `NvidiaRAG` object.

In [None]:
from nvidia_rag import NvidiaRAG
from nvidia_rag.utils.configuration import NvidiaRAGConfig

config_rag = NvidiaRAGConfig.from_yaml("config.yaml")

# Set config for rag lite library mode
config_rag.vector_store.url = "./milvus-lite.db"
config_rag.enable_citations = False

# Set config for cloud API endpoints
config_rag.embeddings.server_url = "https://integrate.api.nvidia.com/v1"
config_rag.ranking.server_url = ""  # Empty uses NVIDIA API catalog
config_rag.llm.server_url = ""  # Empty uses NVIDIA API catalog

# Initialize NvidiaRAG with config
# You can optionally pass custom prompts via:
#   - A path to a YAML/JSON file: prompts="custom_prompts.yaml"
#   - A dictionary: prompts={"rag_template": {"system": "...", "human": "..."}}
rag = NvidiaRAG(config=config_rag)


## 6. Query a document using RAG
Sends a chat-style query to the RAG system using the specified models and endpoints.

### Check health of all dependent services

In [None]:
import json

health_status_with_deps = await rag.health()
print(health_status_with_deps.message)

### Prepare output parser

In [None]:
import base64
import json

from IPython.display import Image, Markdown, display


async def print_streaming_response_and_citations(rag_response):
    """
    Print the streaming response and citations from the RAG response.
    """
    # Check for API errors before processing
    if rag_response.status_code != 200:
        print("Error: ", rag_response.status_code)
        return

    # Extract the streaming generator from the response
    response_generator = rag_response.generator
    first_chunk_data = None
    async for chunk in response_generator:
        if chunk.startswith("data: "):
            chunk = chunk[len("data: ") :].strip()
        if not chunk:
            continue
        try:
            data = json.loads(chunk)
        except Exception as e:
            print(f"JSON decode error: {e}")
            continue
        choices = data.get("choices", [])
        if not choices:
            continue
        # Save the first chunk with citations
        if first_chunk_data is None and data.get("citations"):
            first_chunk_data = data
        # Print streaming text
        delta = choices[0].get("delta", {})
        text = delta.get("content")
        if not text:
            message = choices[0].get("message", {})
            text = message.get("content") or ""  # content may be missing or None
        print(text, end="", flush=True)
    print()  # Newline after streaming

    # Display citations after streaming is done
    if first_chunk_data and first_chunk_data.get("citations"):
        citations = first_chunk_data["citations"]
        for idx, citation in enumerate(citations.get("results", [])):
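            doc_type = citation.get("document_type", "text")
            content = citation.get("content", "")
            doc_name = citation.get("document_name", f"Citation {idx + 1}")
            display(Markdown(f"**Citation {idx + 1}: {doc_name}**"))
            if doc_type == "text":
                # Text citations are shown verbatim; never try to decode them
                display(Markdown(f"```\n{content}\n```"))
                continue
            # Non-text citations are expected to carry base64-encoded image data
            try:
                image_bytes = base64.b64decode(content, validate=True)
                display(Image(data=image_bytes))
            except Exception:
                display(Markdown(f"```\n{content}\n```"))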
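
To see what the parser above consumes, the sketch below applies the same `data: ` prefix stripping and `choices[0]` content extraction to a few hand-written chunks. The sample payloads are illustrative and were not captured from a real response; `extract_text` is a hypothetical helper.

```python
import json


def extract_text(chunk):
    """Return the text carried by one streamed chunk, or '' if there is none."""
    if chunk.startswith("data: "):
        chunk = chunk[len("data: "):].strip()
    if not chunk:
        return ""
    try:
        data = json.loads(chunk)
    except json.JSONDecodeError:
        return ""  # skip anything that is not a JSON payload
    choices = data.get("choices") or []
    if not choices:
        return ""
    delta = choices[0].get("delta") or {}
    message = choices[0].get("message") or {}
    # Prefer the streaming delta, then fall back to a full message
    return delta.get("content") or message.get("content") or ""


sample_chunks = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}\n\n',
    'data: {"choices": [{"delta": {"content": ", world"}}]}\n\n',
    'data: {"choices": []}\n\n',
    "data: not-json\n\n",
]
print("".join(extract_text(c) for c in sample_chunks))  # prints "Hello, world"
```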

### Call the API

In [None]:
await print_streaming_response_and_citations(
    await rag.generate(
        messages=[{"role": "user", "content": "What is the price of a hammer?"}],
        use_knowledge_base=True,
        collection_names=["test_library"],
        enable_citations=False,
        # embedding_endpoint="localhost:9080" # TODO: Uncomment while using on-prem embeddings
    )
)

## 7. Search for documents
Performs a search in the vector database for relevant documents.
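
The search call in this section takes both `vdb_top_k` and `reranker_top_k`. The toy sketch below shows one reading of those names: the vector database returns the `vdb_top_k` best candidates by vector similarity, and the reranker then keeps the `reranker_top_k` best of those. This interpretation and the scores are assumptions for illustration, and `two_stage_top_k` is not part of `nvidia_rag`.

```python
def two_stage_top_k(candidates, vdb_top_k, reranker_top_k):
    """Select documents in two stages.

    candidates: list of (doc_id, vector_score, rerank_score) tuples.
    """
    # Stage 1: keep the best vdb_top_k candidates by vector similarity
    stage1 = sorted(candidates, key=lambda c: c[1], reverse=True)[:vdb_top_k]
    # Stage 2: reorder the survivors by reranker score and keep reranker_top_k
    stage2 = sorted(stage1, key=lambda c: c[2], reverse=True)[:reranker_top_k]
    return [doc_id for doc_id, _, _ in stage2]


docs = [("a", 0.9, 0.2), ("b", 0.8, 0.7), ("c", 0.7, 0.9), ("d", 0.1, 1.0)]
print(two_stage_top_k(docs, vdb_top_k=3, reranker_top_k=2))  # ['c', 'b']
```

Document `d` has the best reranker score but never reaches the reranker because it falls outside the first-stage cutoff, which is one reason to keep `vdb_top_k` well above `reranker_top_k`.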

### Define output parser

In [None]:
def print_search_citations(citations):
    """
    Display all citations from the Citations object returned by search().
    Handles base64-encoded images and text.
    """
    if not citations or not hasattr(citations, "results") or not citations.results:
        print("No citations found.")
        return

    for idx, citation in enumerate(citations.results):
        # If using pydantic models, citation fields may be attributes, not dict keys
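        doc_type = getattr(citation, "document_type", "text")
        content = getattr(citation, "content", "")
        doc_name = getattr(citation, "document_name", f"Citation {idx + 1}")

        display(Markdown(f"**Citation {idx + 1}: {doc_name}**"))
        if doc_type == "text":
            # Text citations are shown verbatim; never try to decode them
            display(Markdown(f"```\n{content}\n```"))
            continue
        # Non-text citations are expected to carry base64-encoded image data
        try:
            image_bytes = base64.b64decode(content, validate=True)
            display(Image(data=image_bytes))
        except Exception:
            display(Markdown(f"```\n{content}\n```"))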

### Call the API

In [None]:
print_search_citations(
    await rag.search(
        query="What is the price of a hammer?",
        collection_names=["test_library"],
        reranker_top_k=10,
        vdb_top_k=100,
        # embedding_endpoint="localhost:9080" # TODO: Uncomment while using on-prem embeddings
        # [Optional]: Uncomment to filter the documents based on the metadata, ensure that the metadata schema is created with the same fields with create_collection
        # filter_expr='content_metadata["meta_field_1"] == "multimodal document 1"'
    )
)
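
The commented `filter_expr` above is a plain string, so it can be assembled from variables. The sketch below builds the same equality expression; `metadata_equals` is a hypothetical helper, and using JSON-style quoting for the field and value is an assumption that may not match the vector store's escaping rules for unusual characters.

```python
import json


def metadata_equals(field, value):
    """Build an equality filter on a custom metadata field.

    json.dumps adds the double quotes and escapes embedded quotes.
    """
    return f"content_metadata[{json.dumps(field)}] == {json.dumps(value)}"


print(metadata_equals("meta_field_1", "multimodal document 1"))
# content_metadata["meta_field_1"] == "multimodal document 1"
```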

The APIs below show how to clean up uploaded documents and collections once they are no longer needed.

## 8. Delete documents from a collection
Deletes documents from the specified collection.

In [None]:
response = ingestor.delete_documents(
    collection_name="test_library",
    document_names=["../data/multimodal/multimodal_test.pdf"],
)
print(response)

## 9. Delete collections
Deletes the specified collection and all its documents from the vector database.

In [None]:
response = ingestor.delete_collections(
    collection_names=["test_library"]
)
print(response)

## 10. Stop the NV-Ingest Pipeline Subprocess

In [None]:
%%bash
# Stop process on port 7671
PID_7671=$(lsof -ti tcp:7671)
if [ -n "$PID_7671" ]; then
    kill $PID_7671
    echo "Stopped process on port 7671 (PID: $PID_7671)"
else
    echo "No process running on port 7671"
fi

# Stop process on port 8265
PID_8265=$(lsof -ti tcp:8265)
if [ -n "$PID_8265" ]; then
    kill $PID_8265
    echo "Stopped process on port 8265 (PID: $PID_8265)"
else
    echo "No process running on port 8265"
fi

echo "NV-Ingest Pipeline subprocess check completed"