# Document Summarization Customization Guide

This notebook demonstrates how to customize the document summarization feature in NVIDIA RAG.

## Two Modes of Operation

- **Library Mode**: Programmatic configuration changes in Python notebooks/scripts
- **Docker Mode**: Configuration via environment variables and config files

## 📊 Summarization Pipeline Architecture

The diagram below shows how document summarization integrates into the complete RAG pipeline:

![Summarization Pipeline Architecture](https://github.com/NVIDIA-AI-Blueprints/rag/raw/main/docs/assets/summarization_flow_diagram.png)

This notebook focuses on the summarization workflow. You'll learn to customize:

- **Page Filtering**: Select specific pages using ranges, negative indexing, or even/odd patterns
- **Shallow vs Full Extraction**: Fast text-only OR comprehensive multimodal processing
- **Summarization Strategy**: Choose between Single (fastest), Hierarchical (balanced), or Iterative (best quality - default)
    - **Single**: Merge all content, chunk by configured size, and summarize only the first chunk (fastest, one LLM call)
    - **Hierarchical**: Tree-based summarization - summarize all chunks, merge summaries until they fit chunk size, repeat recursively until reaching one final summary (balanced speed/quality)
    - **Iterative (default)**: Process chunks sequentially with context refinement from previous summaries (best quality, N sequential LLM calls)
- **Token-based Chunking**: 9000 tokens per chunk with a 400-token overlap (defaults)
- **Real-time Status Tracking**: Monitor progress via Redis with chunk-level updates
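
As a rough illustration of how the three strategies differ in the number of LLM calls, the sketch below implements each one against a caller-supplied `summarize` function. It is not the `nvidia_rag` implementation: the helper names (`chunk_words`, `single`, `hierarchical`, `iterative`) are hypothetical, chunks are measured in whitespace-separated words rather than tokens, and chunk overlap is ignored.

```python
from typing import Callable, List

Summarize = Callable[[str], str]  # stand-in for one LLM call


def chunk_words(text: str, chunk_size: int) -> List[str]:
    """Split text into chunks of at most `chunk_size` words (words approximate tokens)."""
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
    return chunks or [""]


def single(text: str, summarize: Summarize, chunk_size: int) -> str:
    """Summarize only the first chunk: exactly one call, later content is dropped."""
    return summarize(chunk_words(text, chunk_size)[0])


def iterative(text: str, summarize: Summarize, chunk_size: int) -> str:
    """Refine a running summary chunk by chunk: N sequential calls."""
    summary = ""
    for chunk in chunk_words(text, chunk_size):
        prompt = chunk if not summary else f"Previous summary: {summary}\nNew chunk: {chunk}"
        summary = summarize(prompt)
    return summary


def hierarchical(text: str, summarize: Summarize, chunk_size: int) -> str:
    """Summarize every chunk, merge the summaries, re-chunk, and repeat until one remains."""
    chunks = chunk_words(text, chunk_size)
    while True:
        summaries = [summarize(chunk) for chunk in chunks]  # independent calls per round
        if len(summaries) == 1:
            return summaries[0]
        merged = chunk_words(" ".join(summaries), chunk_size)
        if len(merged) >= len(chunks):
            # Guard for this sketch: summaries must be shorter than their inputs.
            raise ValueError("summaries are not shrinking; cannot converge")
        chunks = merged
```

With a 16-word input, a chunk size of 4, and a stub `summarize` that keeps the first two words of its input, `single` makes 1 call, `iterative` makes 4, and `hierarchical` makes 7 (4 + 2 + 1). The hierarchical calls within a round do not depend on each other, which is why that strategy can be parallelized while the iterative one cannot.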

---

## Part 1: Library Mode 

### 1. Setup before using library mode

#### 1.1. Installation guide for python package

> **Note**: Python version **3.11 or higher** is required.

##### 📝 **Development Mode Note:**

- Installing with `uv pip install -e "..[all]"` allows you to make live edits to the `nvidia_rag` source code and have those changes reflected without reinstalling the package.
- After making changes to the source code, you need to:
  - Restart the kernel of your notebook server
  - Re-execute the cells under the `Setting up the dependencies` and `Import the packages` sections

#### Install uv (if not already installed)

Run the cell below to check if `uv` is installed and install it if needed.

In [None]:
import subprocess
import shutil

# Check if uv is installed
if shutil.which("uv"):
    result = subprocess.run(["uv", "--version"], capture_output=True, text=True)
    print(f"✅ uv is already installed: {result.stdout.strip()}")
else:
    print("⚠️ uv is not installed. Installing now...")
    # Install uv using the official installer
    !curl -LsSf https://astral.sh/uv/install.sh | sh
    print("\n✅ uv installed! Please restart your terminal/kernel and re-run this notebook.")

#### Install the NVIDIA RAG Package

Choose one of the installation options below:
- **Option A**: Install from PyPI (recommended for most users)
- **Option B**: Install from source in development mode (for contributors)
- **Option C**: Build and install from source wheel

In [None]:
# Option A: Install from PyPI (recommended)
# Uncomment the line below to install from PyPI
# !uv pip install nvidia-rag[all]

# Option B: Install from source in development mode (for contributors)
# Note: ".." refers to the parent directory where pyproject.toml is located
!uv pip install -e "..[all]"

# Option C: Build and install from source wheel
# Uncomment the lines below to build and install from source
# !cd .. && uv build
# !uv pip install ../dist/nvidia_rag-*-py3-none-any.whl[all]

#### 1.2. Verify the installation
The location of the package shown in the output of this command should be inside the virtual environment.

Location: `<workspace_path>/rag/.venv/lib/python3.12/site-packages`

In [None]:
!uv pip show nvidia_rag | grep Location

### 2. Setting up the dependencies

After the Python package environment is set up, launch the services and NIMs that the pipeline depends on.

Complete the [prerequisites](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/deploy-docker-self-hosted.md) to set up Docker on your system.

#### 2.1. Setup the default configurations

In [None]:
!uv pip install python-dotenv
import os
from getpass import getpass

from dotenv import load_dotenv

Provide your NGC_API_KEY after executing the cell below. You can obtain a key by following the steps [here](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/api-key.md).

In [None]:
# del os.environ['NGC_API_KEY']  ## delete key and reset if needed
if os.environ.get("NGC_API_KEY", "").startswith("nvapi-"):
    print("Valid NGC_API_KEY already in environment. Delete to reset")
else:
    candidate_api_key = getpass("NVAPI Key (starts with nvapi-): ")
    assert candidate_api_key.startswith("nvapi-"), (
        f"{candidate_api_key[:5]}... is not a valid key"
    )
    os.environ["NGC_API_KEY"] = candidate_api_key

Log in to nvcr.io, which is required to pull the dependency containers.

In [None]:
!echo "${NGC_API_KEY}" | docker login nvcr.io -u '$oauthtoken' --password-stdin

#### 2.2. Setup the Milvus vector DB services
By default, Milvus uses GPU indexing. Make sure you provide the correct GPU ID.
Note: If you don't have a GPU available, you can switch to CPU-only Milvus by following the instructions in [milvus-configuration.md](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/milvus-configuration.md).

In [None]:
os.environ["VECTORSTORE_GPU_DEVICE_ID"] = "0"

In [None]:
!docker compose -f ../deploy/compose/vectordb.yaml up -d

#### 2.3. Setup the NIMs

#### Option 1: Deploy on-prem models

Move to Option 2 if you are interested in using cloud models.

Ensure you meet [the hardware requirements](https://github.com/NVIDIA-AI-Blueprints/rag/blob/main/docs/support-matrix.md). By default the NIMs are configured to use 2xH100.

In [None]:
# Create the model cache directory
!mkdir -p ~/.cache/model-cache

In [None]:
# Set the MODEL_DIRECTORY environment variable in the Python kernel
import os

os.environ["MODEL_DIRECTORY"] = os.path.expanduser("~/.cache/model-cache")
print("MODEL_DIRECTORY set to:", os.environ["MODEL_DIRECTORY"])

In [None]:
# Configure GPU IDs for the various microservices if needed
os.environ["EMBEDDING_MS_GPU_ID"] = "0"
os.environ["RANKING_MS_GPU_ID"] = "0"
os.environ["YOLOX_MS_GPU_ID"] = "0"
os.environ["YOLOX_GRAPHICS_MS_GPU_ID"] = "0"
os.environ["YOLOX_TABLE_MS_GPU_ID"] = "0"
os.environ["OCR_MS_GPU_ID"] = "0"
os.environ["LLM_MS_GPU_ID"] = "1"

In [None]:
# ⚠️ Deploying NIMs - This may take a while as models download. If kernel times out, just rerun this cell.
!USERID=$(id -u) docker compose -f ../deploy/compose/nims.yaml up -d

In [None]:
# Watch the status of running containers (run this cell repeatedly or in a terminal)
!docker ps

In [None]:
# Set deployment mode for on-prem NIMs
DEPLOYMENT_MODE = "on_prem"

Ensure all of the containers below are running (and healthy, where a health status is reported) before proceeding:
```output
NAMES                           STATUS
nemotron-ranking-ms             Up ... (healthy)
compose-page-elements-1         Up ...
compose-nemoretriever-ocr-1     Up ...
compose-graphic-elements-1      Up ...
compose-table-structure-1       Up ...
nemotron-embedding-ms           Up ... (healthy)
nim-llm-ms                      Up ... (healthy)
```
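
If you prefer to check this programmatically instead of reading the `docker ps` table, a small parser such as the one below may help. It is a minimal sketch: `docker_ps` and `not_ready` are hypothetical helper names, and the rule used here (status starts with `Up` and does not report `unhealthy` or `health: starting`) is an assumption about what "running and healthy" should mean for this deployment.

```python
import subprocess


def docker_ps() -> str:
    """Return one 'name<TAB>status' line per running container."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def not_ready(ps_output: str, required: list[str]) -> list[str]:
    """Return the required containers that are missing, not up, or not yet healthy."""
    status = {}
    for line in ps_output.splitlines():
        name, sep, state = line.partition("\t")
        if sep:
            status[name.strip()] = state.strip()

    pending = []
    for name in required:
        state = status.get(name, "")  # empty string means the container is not listed
        is_up = state.startswith("Up")
        bad_health = "unhealthy" in state or "health: starting" in state
        if not is_up or bad_health:
            pending.append(name)
    return pending
```

For example, `not_ready(docker_ps(), ["nemotron-ranking-ms", "nemotron-embedding-ms", "nim-llm-ms"])` returns an empty list once those three containers are up and healthy.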

#### Option 2: Using NVIDIA-hosted models

In [None]:
DEPLOYMENT_MODE = "cloud"

# Set deployment mode for NVIDIA hosted cloud APIs
os.environ["OCR_HTTP_ENDPOINT"] = "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-ocr"
os.environ["OCR_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_HTTP_ENDPOINT"] = (
    "https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-page-elements-v3"
)
os.environ["YOLOX_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_GRAPHIC_ELEMENTS_HTTP_ENDPOINT"] = (
    "https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-graphic-elements-v1"
)
os.environ["YOLOX_GRAPHIC_ELEMENTS_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_TABLE_STRUCTURE_HTTP_ENDPOINT"] = (
    "https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-table-structure-v1"
)
os.environ["YOLOX_TABLE_STRUCTURE_INFER_PROTOCOL"] = "http"

#### 2.4. Setup the NVIDIA Ingest runtime and Redis service

In [None]:
!docker compose -f ../deploy/compose/docker-compose-ingestor-server.yaml up nv-ingest-ms-runtime redis -d

#### 2.5. Load optional profiles if needed

In [None]:
# Load accuracy profile
# load_dotenv(dotenv_path='../deploy/compose/accuracy_profile.env', override=True)

# OR load perf profile
# load_dotenv(dotenv_path='../deploy/compose/perf_profile.env', override=True)

### 3. Import libraries and view defaults

After setting up the python package and starting all dependent services, we can now import the libraries and view default configuration for summarization.

#### 3.1. Set logging level

First let's set the required logging level. Set to INFO for displaying basic important logs. Set to DEBUG for full verbosity.

In [None]:
import logging
import os

# Set the log level via environment variable before importing nvidia_rag
# This ensures the package respects our log level setting
LOGLEVEL = logging.WARNING  # Set to INFO, DEBUG, WARNING or ERROR
os.environ["LOGLEVEL"] = logging.getLevelName(LOGLEVEL)

# Configure logging
logging.basicConfig(level=LOGLEVEL, force=True)

# Apply the level to any nvidia_rag and nv_ingest_client loggers that are already registered
for name in logging.root.manager.loggerDict:
    if name == "nvidia_rag" or name.startswith("nvidia_rag."):
        logging.getLogger(name).setLevel(LOGLEVEL)
    if name == "nv_ingest_client" or name.startswith("nv_ingest_client."):
        logging.getLogger(name).setLevel(LOGLEVEL)

#### 3.2. Import the packages and initialize configuration
Import one or both classes, depending on your needs. `NvidiaRAG()` exposes APIs to interact with uploaded documents and retrieve summaries, while `NvidiaRAGIngestor()` exposes APIs for document upload, management, and summary generation.

In [None]:
from nvidia_rag import NvidiaRAG, NvidiaRAGIngestor
from nvidia_rag.utils.configuration import NvidiaRAGConfig
from nvidia_rag.rag_server.response_generator import retrieve_summary

# Get the configuration object
config = NvidiaRAGConfig.from_yaml("config.yaml")

# Update config for cloud deployment if using Option 2
if DEPLOYMENT_MODE == "cloud":
    config.embeddings.server_url = "https://integrate.api.nvidia.com/v1"
    config.llm.server_url = ""  # Empty uses NVIDIA API catalog
    config.ranking.server_url = "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking/v1"
    config.summarizer.server_url = ""  # Empty uses NVIDIA API catalog
else:
    config.embeddings.server_url = "nemotron-embedding-ms:8000/v1"
    config.ranking.server_url = "nemotron-ranking-ms:8000"
    config.summarizer.server_url = "nim-llm:8000"
    config.llm.server_url = "nim-llm:8000"

# Initialize NvidiaRAG and NvidiaRAGIngestor with config
# For summarization customization, pass prompts to NvidiaRAGIngestor:
#   - A path to a YAML/JSON file: prompts="custom_prompts.yaml"
#   - A dictionary: prompts={"document_summary_prompt": {...}}
rag = NvidiaRAG(config=config)
ingestor = NvidiaRAGIngestor(config=config)

#### 3.3. View Default Summarizer LLM Settings

Let's see what LLM model and parameters are used by default for summarization.

In [None]:
print("=" * 70)
print("DEFAULT SUMMARIZER LLM CONFIGURATION")
print("=" * 70)
print(f"Model:             {config.summarizer.model_name}")
print(f"Server URL:        {config.summarizer.server_url}")
print(f"Temperature:       {config.summarizer.temperature}")
print(f"Top P:             {config.summarizer.top_p}")
print(f"Max Parallel:      {config.summarizer.max_parallelization}")
print(f"Max Chunk Length:  {config.summarizer.max_chunk_length}")
print(f"Chunk Overlap:     {config.summarizer.chunk_overlap}")
print("=" * 70)

#### 3.4. View Default Summarization Prompts

The prompt template controls how the LLM generates summaries. Let's see the default prompts.

In [None]:
import json

# Access prompts from the NvidiaRAGIngestor instance (initialized with defaults)
# Summarization is handled by NvidiaRAGIngestor, so view prompts from ingestor
print("=" * 70)
print("DEFAULT DOCUMENT SUMMARY PROMPT")
print("=" * 70)
print(json.dumps(ingestor.prompts["document_summary_prompt"], indent=2))
print("=" * 70)

print("\n" + "=" * 70)
print("DEFAULT ITERATIVE SUMMARY PROMPT")
print("=" * 70)
print(json.dumps(ingestor.prompts["iterative_summary_prompt"], indent=2))
print("=" * 70)

The default prompts used for summarization are:
- **document_summary_prompt**: Summarizing a single document or chunk (used for full multimodal extraction)
- **shallow_summary_prompt**: Summarizing with fast text-only extraction (used when `shallow_summary: true`)
- **iterative_summary_prompt**: Combining multiple summaries for large documents

The system automatically selects the appropriate prompt based on extraction mode and document size.

---

## Part 2: Library Mode - Change Configuration
Now let's see how to modify these settings programmatically in library mode.

### 1. Change LLM Model and Parameters

You can change the model and sampling parameters dynamically.

In [None]:
# Change to a different model (e.g., Llama 3.1 70B)
config.summarizer.model_name = "meta/llama-3.1-70b-instruct"
config.summarizer.server_url = ""

# Lower temperature for more deterministic, focused summaries
config.summarizer.temperature = 0.2

# Adjust top_p for nucleus sampling
config.summarizer.top_p = 0.7

# Configure global rate limiting (max parallel summary tasks across all workers)
# Prevents overwhelming GPU/API with too many concurrent LLM calls
config.summarizer.max_parallelization = 10  # Default: 20

print("✅ Updated Summarizer Configuration:")
print(f"   Model:       {config.summarizer.model_name}")
print(f"   Server URL:  {config.summarizer.server_url}")
print(f"   Temperature: {config.summarizer.temperature}")
print(f"   Top P:       {config.summarizer.top_p}")
print(f"   Max Parallel:{config.summarizer.max_parallelization}")

### 2. Customize Summarization Prompts

Customize the prompt to change the style and focus of summaries by passing prompts during `NvidiaRAGIngestor` initialization.

This is the **recommended approach** for library mode - pass prompts directly to the constructor for clean, instance-specific configuration.

> **Note**: Summarization is handled by `NvidiaRAGIngestor`, so prompts for summarization should be passed to `NvidiaRAGIngestor`, not `NvidiaRAG`.

In [None]:
# Define custom prompts as a dictionary
custom_prompts = {
    "document_summary_prompt": {
        "system": "/no_think",
        "human": """You are a documentation specialist.

Create a clear summary that:
1. Identifies the main topic and purpose
2. Lists key concepts or features
3. Highlights important procedures or steps  
4. Notes any warnings or critical information

Keep the summary concise.

Text to summarize:
{document_text}

Summary:"""
    }
}

# Create NvidiaRAGIngestor instance with custom prompts (Recommended Approach)
# The prompts are merged with defaults - only specified keys are overridden
ingestor_custom = NvidiaRAGIngestor(config=config, prompts=custom_prompts)

print("✅ NvidiaRAGIngestor initialized with custom prompts")
print("\nCustom prompt preview (first 200 chars):")
print(ingestor_custom.prompts["document_summary_prompt"]["human"][:200] + "...")

#### Alternative: Using a YAML File

You can also pass a path to a YAML file containing your custom prompts:

```python
# Using a YAML file path
ingestor_from_yaml = NvidiaRAGIngestor(config=config, prompts="custom_prompts.yaml")
```

The YAML file format should match the structure shown in the Docker Mode section below.


### 3. Configure Summary Options

In [None]:
summary_options = {
    # Page filtering: [[1, 10]] (ranges), [[-5, -1]] (last N pages), "even"/"odd"
    "page_filter": [[1, 10]],  # Only pages 1-10
    
    # Fast mode: Text-only extraction first, summary in seconds
    "shallow_summary": True,  # Default: False
    
    # Strategy: None (iterative/best), "single" (fastest/truncates), "hierarchical" (parallel/faster than iterative)
    "summarization_strategy": "hierarchical"  # Default: None
}


print(f"  • Page Filter: {summary_options['page_filter']}")
print(f"  • Shallow Summary: {summary_options['shallow_summary']}")
print(f"  • Strategy: {summary_options['summarization_strategy']}")
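
To make the `page_filter` forms more concrete, the sketch below resolves a filter into explicit page numbers. It illustrates one plausible reading and is not the library's code: `resolve_pages` is a hypothetical helper, and it assumes pages are 1-based, ranges are inclusive, negative values count back from the last page, `"even"`/`"odd"` refer to the 1-based page number, and out-of-range pages are silently dropped.

```python
from typing import List, Union

PageFilter = Union[str, List[List[int]]]


def resolve_pages(page_filter: PageFilter, total_pages: int) -> List[int]:
    """Return the sorted 1-based page numbers selected by `page_filter`."""
    all_pages = range(1, total_pages + 1)

    if isinstance(page_filter, str):
        if page_filter == "even":
            return [p for p in all_pages if p % 2 == 0]
        if page_filter == "odd":
            return [p for p in all_pages if p % 2 == 1]
        raise ValueError(f"unsupported page filter: {page_filter!r}")

    selected = set()
    for start, end in page_filter:
        # Negative values count from the end: -1 is the last page.
        first = start if start > 0 else total_pages + start + 1
        last = end if end > 0 else total_pages + end + 1
        selected.update(p for p in range(first, last + 1) if 1 <= p <= total_pages)
    return sorted(selected)
```

Under these assumptions, `[[1, 10], [-5, -1]]` on a 12-page document selects every page, because the two ranges overlap on pages 8 to 10.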

### 4. Complete Workflow Example

This section demonstrates the end-to-end workflow: create collection → upload documents → check status → retrieve summary → cleanup.

#### 4.1. Create Collection

In [None]:
# Create collection
collection_name = "test_summary"
response = ingestor.create_collection(
    collection_name=collection_name,
    vdb_endpoint="http://localhost:19530"
)
print(f"✅ Collection response: {response}")

#### 4.2. Upload Documents

In [None]:
# Upload documents with summary options
result = await ingestor.upload_documents(
    filepaths=["../data/multimodal/functional_validation.pdf"],
    collection_name=collection_name,
    generate_summary=True,
    summary_options=summary_options,  # From previous cell
    blocking=False  # Don't wait, check status instead
)
print(f"✅ Upload started: {result}")

#### 4.3. Check Status and Get Summary

In [None]:
# Check summary status
status = await retrieve_summary(
    collection_name=collection_name,
    file_name="functional_validation.pdf",
    wait=False  # Just check, don't wait
)
print(f"\n📊 Status: {status.get('status')}")
if status.get('status') == 'IN_PROGRESS':
    progress = status.get('progress', {})
    print(f"   Progress: Chunk {progress.get('current')}/{progress.get('total')}")

In [None]:
# Get summary (blocking - waits until complete)
summary_result = await retrieve_summary(
    collection_name=collection_name,
    file_name="functional_validation.pdf",
    wait=True,
    timeout=300
)

if summary_result.get('status') == 'SUCCESS':
    print(f"\n✅ Summary:\n{summary_result.get('summary')}")
else:
    print(f"\n❌ {summary_result.get('status')}: {summary_result.get('message')}")

#### 4.4. Delete Collection

In [None]:
# Delete the test collection
response = ingestor.delete_collections(
    collection_names=[collection_name],
    vdb_endpoint="http://localhost:19530"
)
print(f"✅ Delete response: {response}")

---

## Part 3: Docker Mode - Change Configuration via Environment Variables

When running in Docker mode, you configure the ingestor-server and rag-server containers via environment variables and REST APIs.

**Prerequisites:**
- If you're starting fresh with Part 3, first complete section **"2. Setting up the dependencies"** from Part 1 above to start all required services (Milvus, NV-Ingest, Redis)
- If you completed Part 1, these services are already running

### 1. Configure via Environment Variables

Configure the ingestor server by setting environment variables before startup. Adjust these values according to your requirements:

In [None]:
# Set environment variables in Python based on mode
if DEPLOYMENT_MODE == "cloud":
    os.environ["SUMMARY_LLM_SERVERURL"] = ""
    os.environ["LLM_SERVER_URL"] = ""
    os.environ["APP_EMBEDDINGS_SERVERURL"] = "https://integrate.api.nvidia.com/v1"
    print("✓ Configured for NVIDIA cloud APIs")
else:
    os.environ["SUMMARY_LLM_SERVERURL"] = "nim-llm:8000"
    os.environ["LLM_SERVER_URL"] = "nim-llm:8000"
    os.environ["APP_EMBEDDINGS_SERVERURL"] = "nemotron-embedding-ms:8000/v1"
    print("✓ Configured for on-prem NIMs")

os.environ["LOGLEVEL"] = "INFO"

print("Environment variables set for deployment mode")

In [None]:
%%bash
# Custom Summarization configuration
export SUMMARY_LLM="meta/llama-3.1-70b-instruct"
export SUMMARY_LLM_TEMPERATURE=0.2
export SUMMARY_LLM_TOP_P=0.7
export SUMMARY_LLM_MAX_CHUNK_LENGTH=9000
export SUMMARY_CHUNK_OVERLAP=400
export SUMMARY_MAX_PARALLELIZATION=20

# Start the containers
docker compose -f ../deploy/compose/docker-compose-ingestor-server.yaml up -d ingestor-server
docker compose -f ../deploy/compose/docker-compose-rag-server.yaml up -d rag-server

echo "Summarization parameters configured and containers started"
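
Because a mistyped value only surfaces after the container starts, it can be convenient to validate the settings in Python first. The helper below is a sketch under stated assumptions: `summarizer_env` is a hypothetical function (not part of `nvidia_rag`), its defaults mirror the values listed in this notebook, the 0.0 to 1.0 ranges come from the configuration table at the end, and the rule that the overlap must be smaller than the chunk length is an assumption.

```python
from typing import Dict


def summarizer_env(
    model: str,
    temperature: float = 0.0,
    top_p: float = 1.0,
    max_chunk_length: int = 9000,
    chunk_overlap: int = 400,
    max_parallelization: int = 20,
) -> Dict[str, str]:
    """Validate summarizer settings and return them as environment variables."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    if max_chunk_length <= 0:
        raise ValueError("max_chunk_length must be positive")
    if not 0 <= chunk_overlap < max_chunk_length:
        raise ValueError("chunk_overlap must be >= 0 and smaller than max_chunk_length")
    if max_parallelization < 1:
        raise ValueError("max_parallelization must be at least 1")

    # Environment variables are always strings.
    return {
        "SUMMARY_LLM": model,
        "SUMMARY_LLM_TEMPERATURE": str(temperature),
        "SUMMARY_LLM_TOP_P": str(top_p),
        "SUMMARY_LLM_MAX_CHUNK_LENGTH": str(max_chunk_length),
        "SUMMARY_CHUNK_OVERLAP": str(chunk_overlap),
        "SUMMARY_MAX_PARALLELIZATION": str(max_parallelization),
    }
```

Calling `os.environ.update(summarizer_env("meta/llama-3.1-70b-instruct", temperature=0.2, top_p=0.7))` in a Python cell before running `docker compose` should have the same effect as the `export` lines, since shell cells inherit the kernel's environment.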

### 2. Custom Prompts via YAML File

To change prompts in Docker mode, create a custom `prompt.yaml` file and set the `PROMPT_CONFIG_FILE` environment variable.

#### 2.1. Create Custom Prompt File

Create your custom prompt file (the cell below writes it to `~/my_custom_prompt.yaml`):

In [None]:
# Define custom prompt configuration
custom_prompt_content = """document_summary_prompt:
  system: |
    /no_think
  
  human: |
    You are a technical documentation specialist.
    
    Create a clear, technical summary that:
    1. Identifies the main topic and purpose
    2. Lists key technical concepts or features
    3. Highlights important procedures or steps
    4. Notes any warnings or critical information
    
    Keep the summary concise and technical.
    
    Text to summarize:
    {document_text}
    
    Technical Summary:

iterative_summary_prompt:
  system: |
    /no_think
  
  human: |
    You are a technical documentation specialist combining summaries.
    
    Previous Summary:
    {previous_summary}
    
    New chunk:
    {new_chunk}
    
    Create an updated technical summary combining both.
"""

# Write the custom prompt file
import os
custom_prompt_path = os.path.expanduser("~/my_custom_prompt.yaml")
with open(custom_prompt_path, "w") as f:
    f.write(custom_prompt_content)

print(f"Custom prompt file created at: {custom_prompt_path}")

#### 2.2. Set Environment Variable and Restart

Set the environment variable and restart the container:

In [None]:
%%bash
# Set path to custom prompt file
export PROMPT_CONFIG_FILE=~/my_custom_prompt.yaml

# Restart the container (no rebuild needed)
# Note: This inherits NGC_API_KEY from the parent shell if it was set via os.environ earlier
docker compose -f ../deploy/compose/docker-compose-ingestor-server.yaml up -d ingestor-server

echo "Ingestor server restarted with custom prompts from: $PROMPT_CONFIG_FILE"

**Key Points:**
- The service will merge your custom prompts with the defaults
- Only the prompts you specify will be overridden - all others remain unchanged
- No container rebuild is required, just restart with the new environment variable!
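
The merge behavior described above can be pictured with plain dictionaries. This is an illustrative sketch, not the service's code: `merge_prompts` is a hypothetical helper, the default prompt texts in the usage example are placeholders, and it assumes that an overridden prompt replaces the whole default entry (both `system` and `human`) rather than being merged field by field.

```python
import copy
from typing import Dict

Prompts = Dict[str, Dict[str, str]]


def merge_prompts(defaults: Prompts, overrides: Prompts) -> Prompts:
    """Return defaults with each overridden prompt replaced; inputs are not modified."""
    merged = copy.deepcopy(defaults)
    for name, prompt in overrides.items():
        merged[name] = copy.deepcopy(prompt)  # assumption: whole-entry replacement
    return merged


# Placeholder defaults for illustration only (not the shipped prompt texts).
defaults = {
    "document_summary_prompt": {"system": "", "human": "Summarize: {document_text}"},
    "iterative_summary_prompt": {
        "system": "",
        "human": "Update {previous_summary} with {new_chunk}",
    },
}
custom = {
    "document_summary_prompt": {
        "system": "/no_think",
        "human": "Technical summary of: {document_text}",
    }
}
merged = merge_prompts(defaults, custom)
```

Here `merged["document_summary_prompt"]` comes from `custom`, while `merged["iterative_summary_prompt"]` is still the default entry.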

For more details, see the prompt customization documentation.

### 3. Using Ingestor Server REST APIs

When running in Docker mode, you interact with the ingestor server via REST APIs. Here's the complete workflow for document summarization using APIs.

#### Prerequisites
- Ensure ingestor-server and rag-server containers are running
- Replace `localhost` with the host's IP address if the servers run on another system

In [None]:
# Install Dependencies
!uv pip install aiohttp

In [None]:
import json
import os
import aiohttp

# Setup base configuration
INGESTOR_BASE_URL = "http://localhost:8082"
RAG_BASE_URL = "http://localhost:8081"


async def print_response(response):
    """Helper to print API response."""
    try:
        response_json = await response.json()
        print(json.dumps(response_json, indent=2))
    except (aiohttp.ClientResponseError, json.JSONDecodeError):  # body is not valid JSON
        print(await response.text())

#### 3.1. Health Check

In [None]:
async def check_health():
    """Check ingestor server health."""
    url = f"{INGESTOR_BASE_URL}/v1/health"
    params = {"check_dependencies": "True"}
    async with aiohttp.ClientSession() as session:
        async with session.get(url, params=params) as response:
            await print_response(response)

await check_health()

#### 3.2. Create Collection

In [None]:
async def create_collection(collection_name: str):
    """Create a collection for document storage."""
    data = {
        "collection_name": collection_name,
        "metadata_schema": []
    }
    
    headers = {"Content-Type": "application/json"}
    
    async with aiohttp.ClientSession() as session:
        try:
            async with session.post(
                f"{INGESTOR_BASE_URL}/v1/collection", 
                json=data, 
                headers=headers
            ) as response:
                await print_response(response)
        except aiohttp.ClientError as e:
            print(f"Error: {e}")

# Create collection
await create_collection(collection_name="test_summary_api")

#### 3.3. Upload Documents with Summary Options

In [None]:
async def upload_with_summary(collection_name: str, filepaths: list):
    """Upload documents and generate summaries."""
    
    # Configure summary options
    data = {
        "collection_name": collection_name,
        "blocking": False,  # Non-blocking upload
        "split_options": {"chunk_size": 512, "chunk_overlap": 150},
        "generate_summary": True,  # Enable summary generation
        "summary_options": {
            "page_filter": [[1, 10], [-5, -1]],  # First 10 and last 5 pages
            "shallow_summary": True,  # Fast text-only extraction
            "summarization_strategy": "single"  # fastest strategy other available: "hierarchical", None(iterative)
        }
    }
    
    form_data = aiohttp.FormData()
    for file_path in filepaths:
        form_data.add_field(
            "documents",
            open(file_path, "rb"),
            filename=os.path.basename(file_path),
            content_type="application/pdf",
        )
    
    form_data.add_field("data", json.dumps(data), content_type="application/json")
    
    async with aiohttp.ClientSession() as session:
        try:
            async with session.post(
                f"{INGESTOR_BASE_URL}/v1/documents", 
                data=form_data
            ) as response:
                await print_response(response)
                response_json = await response.json()
                return response_json.get("task_id")
        except aiohttp.ClientError as e:
            print(f"Error: {e}")
            return None

# Upload documents
task_id = await upload_with_summary(
    collection_name="test_summary_api",
    filepaths=["../data/multimodal/functional_validation.pdf"]
)
print(f"\n✅ Upload task_id: {task_id}")

#### 3.4. Check Upload Status (Ingestor Server)

In [None]:
async def check_upload_status(task_id: str):
    """Check ingestion task status."""
    params = {"task_id": task_id}
    headers = {"Content-Type": "application/json"}
    
    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(
                f"{INGESTOR_BASE_URL}/v1/status", 
                params=params, 
                headers=headers
            ) as response:
                await print_response(response)
        except aiohttp.ClientError as e:
            print(f"Error: {e}")

# Check status
if task_id:
    await check_upload_status(task_id=task_id)
else:
    print("No task_id available")

#### 3.5. Check Summary Status (RAG Server)

In [None]:
async def check_summary_status(collection_name: str, file_name: str):
    """Check summary generation status via RAG server."""
    params = {
        "collection_name": collection_name,
        "file_name": file_name,
        "blocking": "false"  # Just check status, don't wait
    }
    
    url = f"{RAG_BASE_URL}/v1/summary"
    
    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(url, params=params) as response:
                await print_response(response)
        except aiohttp.ClientError as e:
            print(f"Error: {e}")

# Check summary status
await check_summary_status(
    collection_name="test_summary_api",
    file_name="functional_validation.pdf"
)
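
The cell above checks the status once. To wait for completion without blocking on the server side, you could poll in a loop. The sketch below is one way to do that: `wait_for_summary` is a hypothetical helper, it expects a `fetch_status` coroutine that returns the parsed JSON (unlike `check_summary_status`, which only prints it), and it assumes the REST response carries the same `status` field and `IN_PROGRESS` value seen in library mode. If the server reports other non-final statuses, add them to `in_flight`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Iterable


async def wait_for_summary(
    fetch_status: Callable[[], Awaitable[Dict[str, Any]]],
    interval: float = 2.0,
    timeout: float = 300.0,
    in_flight: Iterable[str] = ("IN_PROGRESS",),
) -> Dict[str, Any]:
    """Poll `fetch_status` until the status leaves `in_flight` or `timeout` seconds pass."""
    pending = set(in_flight)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await fetch_status()
        if status.get("status") not in pending:
            return status  # final state, for example SUCCESS
        if loop.time() >= deadline:
            raise TimeoutError(f"summary still {status.get('status')} after {timeout}s")
        await asyncio.sleep(interval)
```

In a notebook you would call it with `await wait_for_summary(my_fetch)`, where `my_fetch` wraps the `GET /v1/summary` request and returns `await response.json()`.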

#### 3.6. Delete Collection

In [None]:
async def delete_collections(collection_names: list[str]):
    """Delete collections from the vector store."""
    url = f"{INGESTOR_BASE_URL}/v1/collections"
    
    async with aiohttp.ClientSession() as session:
        try:
            async with session.delete(url, json=collection_names) as response:
                await print_response(response)
        except aiohttp.ClientError as e:
            print(f"Error: {e}")

# Delete the test collection
await delete_collections(collection_names=["test_summary_api"])

---

## Summary of Available Configuration Options

### Summarizer Configuration Fields

| Field | Environment Variable | Default Value | Description |
|-------|---------------------|---------------|-------------|
| `model_name` | `SUMMARY_LLM` | `nvidia/llama-3.3-nemotron-super-49b-v1.5` | The LLM model used for summarization |
| `server_url` | `SUMMARY_LLM_SERVERURL` | (empty) | Server URL for custom model hosting |
| `temperature` | `SUMMARY_LLM_TEMPERATURE` | `0.0` | Controls randomness (0.0-1.0) |
| `top_p` | `SUMMARY_LLM_TOP_P` | `1.0` | Nucleus sampling parameter (0.0-1.0) |
| `max_chunk_length` | `SUMMARY_LLM_MAX_CHUNK_LENGTH` | `9000` | Maximum chunk size in tokens |
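| `chunk_overlap` | `SUMMARY_CHUNK_OVERLAP` | `400` | Overlap between chunks in tokens |
| `max_parallelization` | `SUMMARY_MAX_PARALLELIZATION` | `20` | Maximum parallel summary tasks across all workers |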
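
To see how `max_chunk_length` and `chunk_overlap` interact, the sketch below computes the token spans a sliding window would produce. It is an approximation for reasoning about chunk counts, not the library's splitter: `chunk_spans` is a hypothetical helper and it assumes each chunk starts `max_chunk_length - chunk_overlap` tokens after the previous one.

```python
from typing import List, Tuple


def chunk_spans(
    n_tokens: int, max_chunk_length: int = 9000, chunk_overlap: int = 400
) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets for overlapping chunks covering `n_tokens`."""
    if not 0 <= chunk_overlap < max_chunk_length:
        raise ValueError("chunk_overlap must be >= 0 and smaller than max_chunk_length")
    if n_tokens <= 0:
        return []

    step = max_chunk_length - chunk_overlap  # how far each new chunk advances
    spans = []
    start = 0
    while True:
        end = min(start + max_chunk_length, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            return spans
        start += step
```

With the defaults, a 20,000-token document yields three chunks: `(0, 9000)`, `(8600, 17600)`, and `(17200, 20000)`. Under the iterative strategy that would mean three sequential LLM calls.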

### Prompt Template Variables

- **document_summary_prompt**: Use `{document_text}` variable
- **iterative_summary_prompt**: Use the `{previous_summary}` and `{new_chunk}` variables
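
A custom prompt that omits one of these variables will not receive the text it is supposed to summarize, so a quick check before deploying can save a debugging round. The sketch below uses the standard library's `string.Formatter` to list placeholders. `missing_variables` and `REQUIRED_VARIABLES` are hypothetical names, and the check assumes the templates use `str.format`-style braces and that the variables belong in the `human` message, as in the examples in this notebook.

```python
from string import Formatter
from typing import Dict, Set

# Variables each prompt is expected to contain, per the list above.
REQUIRED_VARIABLES: Dict[str, Set[str]] = {
    "document_summary_prompt": {"document_text"},
    "iterative_summary_prompt": {"previous_summary", "new_chunk"},
}


def placeholders(template: str) -> Set[str]:
    """Return the names of all {placeholders} in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def missing_variables(prompts: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map each supplied prompt name to the required variables it lacks."""
    problems = {}
    for name, required in REQUIRED_VARIABLES.items():
        if name not in prompts:
            continue  # prompt not overridden, nothing to check
        missing = required - placeholders(prompts[name].get("human", ""))
        if missing:
            problems[name] = missing
    return problems
```

An empty result from `missing_variables(custom_prompts)` means every overridden prompt contains the variables it needs.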

**Note:** Changes made in library mode take effect immediately without restarting any services. Changes in Docker mode require a container restart but no rebuild.