# Document Summarization Customization Guide

This notebook demonstrates how to customize the document summarization feature in NVIDIA RAG.

## Two Modes of Operation

- **Library Mode**: Programmatic configuration changes in Python notebooks/scripts
- **Docker Mode**: Configuration via environment variables and config files

## 📊 Summarization Pipeline Architecture

The diagram below shows how document summarization integrates into the complete RAG pipeline:

![Summarization Pipeline Architecture](../docs/assets/summarization_flow_diagram.png)

This notebook focuses on the summarization workflow. You'll learn to customize:

- **Page Filtering**: Select specific pages using ranges, negative indexing, or even/odd patterns
- **Shallow vs Full Extraction**: Fast text-only OR comprehensive multimodal processing
- **Summarization Strategy**: Choose between Single (fastest), Hierarchical (balanced), or Iterative (best quality - default)
    - **Single**: Merge all content, chunk by configured size, and summarize only the first chunk (fastest, one LLM call)
    - **Hierarchical**: Tree-based summarization - summarize all chunks, merge summaries until they fit chunk size, repeat recursively until reaching one final summary (balanced speed/quality)
    - **Iterative (default)**: Process chunks sequentially with context refinement from previous summaries (best quality, N sequential LLM calls)
- **Token-based Chunking**: 9000 tokens per chunk with 400 token overlap
- **Real-time Status Tracking**: Monitor progress via Redis with chunk-level updates
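
The three strategies listed above can be sketched as follows. This is a minimal illustration, not the `nvidia_rag` implementation: `chunk_words`, `CountingLLM`, and the three `summarize_*` helpers are hypothetical names, whitespace-separated words stand in for tokens, and the stub LLM simply keeps the first three words of its input so that the number of calls is easy to count.

```python
from typing import Callable, List


def chunk_words(text: str, size: int) -> List[str]:
    """Split text into chunks of at most `size` words (a stand-in for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)] or [""]


class CountingLLM:
    """Hypothetical stub LLM: 'summarizes' by keeping the first three words."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return " ".join(prompt.split()[:3])


def summarize_single(text: str, size: int, llm: Callable[[str], str]) -> str:
    # One LLM call: only the first chunk is summarized, the rest is ignored.
    return llm(chunk_words(text, size)[0])


def summarize_hierarchical(
    text: str, size: int, llm: Callable[[str], str], max_rounds: int = 10
) -> str:
    chunks = chunk_words(text, size)
    for _ in range(max_rounds):
        if len(chunks) == 1:
            # Everything fits in one chunk: produce the final summary.
            return llm(chunks[0])
        # Summarize every chunk (these calls are independent of each other),
        # then merge the summaries and re-chunk them for the next round.
        summaries = [llm(chunk) for chunk in chunks]
        chunks = chunk_words(" ".join(summaries), size)
    raise RuntimeError("summaries did not converge to a single chunk")


def summarize_iterative(text: str, size: int, llm: Callable[[str], str]) -> str:
    summary = ""
    for chunk in chunk_words(text, size):
        # Each call sees the running summary plus the next chunk.
        summary = llm(f"{summary}\n{chunk}".strip())
    return summary
```

With a 25-word input and a chunk size of 10 (three chunks), the stub is called once for Single, four times for Hierarchical (three chunk summaries plus one final merge), and three times for Iterative.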

## Part 1: Library Mode 

---

### Step 1: Setup before using library mode

## Installation guide for python package

Before running the cells below, follow these steps in your terminal from the project root directory to install the python package in your environment and launch this notebook:

> **Note**: Python version **3.12 or higher** is supported.

```bash
# 1. Install Python >= 3.12 and its development headers
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.12
sudo apt-get install python3.12-dev

# 2. Install uv
# Follow the instructions at https://docs.astral.sh/uv/getting-started/installation/

# 3. Create a virtual environment with a supported Python version (>= 3.12)
uv venv --python=python3.12

# 4. Activate the virtual environment
source .venv/bin/activate

# 5. Install the package using one of the options below
# (Option 1) Build the wheel from source and install the Nvidia RAG wheel
uv build
uv pip install dist/nvidia_rag-2.4.0.dev0-py3-none-any.whl[all]

# (Option 2) Install the package in editable (development) mode from source
uv pip install -e .[all]

# (Option 3) Install the prebuilt wheel from PyPI. This does not require cloning the repo.
uv pip install nvidia-rag[all]

# 6. Start the notebook server and open this notebook in a browser
uv pip install jupyterlab
jupyter lab --allow-root --ip=0.0.0.0 --NotebookApp.token='' --port=8889 --no-browser &
# Then open http://<workstation_ip>:8889/lab/tree/notebooks

# 7. Optional: Install just the RAG or Ingestor dependencies
uv pip install dist/nvidia_rag-2.4.0.dev0-py3-none-any.whl[rag]
uv pip install dist/nvidia_rag-2.4.0.dev0-py3-none-any.whl[ingest]
```

##### 📝 **Note:**

- Installing with `uv pip install -e .[all]` allows you to make live edits to the `nvidia_rag` source code and have those changes reflected without reinstalling the package.
- **After making changes to the source code, you need to:**
  - Restart the kernel of your notebook server
  - Re-execute the cells `Setup the default configurations` under `Setting up the dependencies` and `Import the packages` under `Summary Feature usage example`

#### Verify the installation
The location of the package shown in the output of this command should be inside the virtual environment.

Location: `<workspace_path>/rag/.venv/lib/python3.12/site-packages`

In [None]:
!uv pip show nvidia_rag | grep Location

---

## Setting up the dependencies

After the environment for the Python package is set up, launch the dependent services and NIMs that the pipeline requires.
Complete the [prerequisites here](../docs/deploy-docker-self-hosted.md) to set up Docker on your system.

### 1. Setup the default configurations

In [None]:
!uv pip install python-dotenv
import os
from getpass import getpass

from dotenv import load_dotenv

Provide your NGC_API_KEY after executing the cell below. You can obtain a key by following the steps [here](../docs/api-key.md).

In [None]:
# del os.environ['NGC_API_KEY']  ## delete key and reset if needed
if os.environ.get("NGC_API_KEY", "").startswith("nvapi-"):
    print("Valid NGC_API_KEY already in environment. Delete to reset")
else:
    candidate_api_key = getpass("NVAPI Key (starts with nvapi-): ")
    assert candidate_api_key.startswith("nvapi-"), (
        f"{candidate_api_key[:5]}... is not a valid key"
    )
    os.environ["NGC_API_KEY"] = candidate_api_key

Log in to nvcr.io, which is required to pull the dependency containers.

In [None]:
!echo "${NGC_API_KEY}" | docker login nvcr.io -u '$oauthtoken' --password-stdin

Load the default values for all the configurations

In [None]:
load_dotenv(dotenv_path=".env_library", override=True)

💡 **Tip:** You can override any default configuration value defined in `.env_library` at runtime by using `os.environ` in the notebook. Re-import the `nvidia_rag` package and restart the Nvidia Ingest runtime to pick up the updated configuration.

In [None]:
# Example
# os.environ["ENV_VAR_NAME"]="ENV_VAR_VALUE"

### 2. Setup the Milvus vector DB services
By default, Milvus uses GPU indexing. Ensure you provide the correct GPU ID.
Note: If you don't have a GPU available, you can switch to CPU-only Milvus by following the instructions in [milvus-configuration.md](../docs/milvus-configuration.md).

In [None]:
os.environ["VECTORSTORE_GPU_DEVICE_ID"] = "0"

In [None]:
!docker compose -f ../deploy/compose/vectordb.yaml up -d

### 3. Setup the NIMs

#### Option 1: Deploy on-prem models

Move to Option 2 if you are interested in using cloud models.

Ensure you meet [the hardware requirements](../docs/support-matrix.md). By default the NIMs are configured to use 2xH100.

In [None]:
# Create the model cache directory
!mkdir -p ~/.cache/model-cache

In [None]:
# Set the MODEL_DIRECTORY environment variable in the Python kernel
import os

os.environ["MODEL_DIRECTORY"] = os.path.expanduser("~/.cache/model-cache")
print("MODEL_DIRECTORY set to:", os.environ["MODEL_DIRECTORY"])

In [None]:
# Configure GPU IDs for the various microservices if needed
os.environ["EMBEDDING_MS_GPU_ID"] = "0"
os.environ["RANKING_MS_GPU_ID"] = "0"
os.environ["YOLOX_MS_GPU_ID"] = "0"
os.environ["YOLOX_GRAPHICS_MS_GPU_ID"] = "0"
os.environ["YOLOX_TABLE_MS_GPU_ID"] = "0"
os.environ["OCR_MS_GPU_ID"] = "0"
os.environ["LLM_MS_GPU_ID"] = "1"

In [None]:
# ⚠️ Deploying NIMs - This may take a while as models download. If kernel times out, just rerun this cell.
!USERID=$(id -u) docker compose -f ../deploy/compose/nims.yaml up -d

In [None]:
# Watch the status of running containers (run this cell repeatedly or in a terminal)
!docker ps

Ensure that all of the containers below are running and healthy before proceeding:
```output
NAMES                           STATUS
nemoretriever-ranking-ms        Up ... (healthy)
compose-page-elements-1         Up ...
compose-paddle-1                Up ...
compose-graphic-elements-1      Up ...
compose-table-structure-1       Up ...
nemoretriever-embedding-ms      Up ... (healthy)
nim-llm-ms                      Up ... (healthy)
```
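
Checking this list by eye gets tedious while models are still downloading. The sketch below parses the output of `docker ps --format` and reports which required containers are missing, not yet up, or still failing their health check. The helper names `not_ready` and `docker_ps` and the `REQUIRED` list are illustrative (the container names are taken from the expected output above); adjust them to match your deployment.

```python
import subprocess

# Container names copied from the expected output above; adjust as needed.
REQUIRED = [
    "nemoretriever-ranking-ms",
    "compose-page-elements-1",
    "compose-paddle-1",
    "compose-graphic-elements-1",
    "compose-table-structure-1",
    "nemoretriever-embedding-ms",
    "nim-llm-ms",
]


def not_ready(ps_output: str, required: list[str]) -> list[str]:
    """Return required containers that are missing, not 'Up', or not yet healthy."""
    status = {}
    for line in ps_output.strip().splitlines():
        name, _, state = line.partition("\t")
        status[name.strip()] = state.strip()

    pending = []
    for name in required:
        state = status.get(name, "")
        if (
            not state.startswith("Up")
            or "unhealthy" in state
            or "health: starting" in state
        ):
            pending.append(name)
    return pending


def docker_ps() -> str:
    """Return 'name<TAB>status' lines for all running containers."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `not_ready(docker_ps(), REQUIRED)` returns an empty list once every required container is up; containers without a health check only need to report `Up`.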

#### Option 2: Use NVIDIA-hosted models

In [None]:
from nvidia_rag.utils.configuration import NvidiaRAGConfig

# Get the config object for runtime modifications
# We'll override the default localhost URLs to use NVIDIA hosted APIs
config = NvidiaRAGConfig.from_yaml("config.yaml")

# Configure models to use NVIDIA hosted APIs
config.llm.model_name = "nvidia/llama-3.3-nemotron-super-49b-v1.5"
config.llm.server_url = ""  # Empty = use NVIDIA hosted API

config.embeddings.model_name = "nvidia/llama-3.2-nv-embedqa-1b-v2"
config.embeddings.server_url = ""  # Empty = use NVIDIA hosted API

config.ranking.model_name = "nvidia/llama-3.2-nv-rerankqa-1b-v2"
config.ranking.server_url = "https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking/v1"

In [None]:
os.environ["OCR_HTTP_ENDPOINT"] = "https://ai.api.nvidia.com/v1/cv/baidu/paddleocr"
os.environ["OCR_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_HTTP_ENDPOINT"] = (
    "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-page-elements-v2"
)
os.environ["YOLOX_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_GRAPHIC_ELEMENTS_HTTP_ENDPOINT"] = (
    "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-graphic-elements-v1"
)
os.environ["YOLOX_GRAPHIC_ELEMENTS_INFER_PROTOCOL"] = "http"
os.environ["YOLOX_TABLE_STRUCTURE_HTTP_ENDPOINT"] = (
    "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-table-structure-v1"
)
os.environ["YOLOX_TABLE_STRUCTURE_INFER_PROTOCOL"] = "http"

### 4. Setup the Nvidia Ingest runtime and redis service

In [None]:
!docker compose -f ../deploy/compose/docker-compose-ingestor-server.yaml up nv-ingest-ms-runtime redis -d

### 5. Load optional profiles if needed

In [None]:
# Load accuracy profile
# load_dotenv(dotenv_path='../deploy/compose/accuracy_profile.env', override=True)

# OR load perf profile
# load_dotenv(dotenv_path='../deploy/compose/perf_profile.env', override=True)

---
# Summary Feature usage example

With the Python package installed and all dependent services running, you can now run snippets that showcase the summary-related functionality offered by the `nvidia_rag` package.

## Set logging level
First let's set the required logging level. Set to INFO for displaying basic important logs. Set to DEBUG for full verbosity.

In [None]:
import logging
import os

# Set the log level via environment variable before importing nvidia_rag
# This ensures the package respects our log level setting
LOGLEVEL = logging.WARNING  # Set to INFO, DEBUG, WARNING or ERROR
os.environ["LOGLEVEL"] = logging.getLevelName(LOGLEVEL)

# Configure logging
logging.basicConfig(level=LOGLEVEL, force=True)

# Set log levels for specific loggers after package import
for name in logging.root.manager.loggerDict:
    if name == "nvidia_rag" or name.startswith("nvidia_rag."):
        logging.getLogger(name).setLevel(LOGLEVEL)
    if name == "nv_ingest_client" or name.startswith("nv_ingest_client."):
        logging.getLogger(name).setLevel(LOGLEVEL)

## Import the packages
You can import both or either one based on your requirements. `NvidiaRAG()` exposes APIs to interact with the uploaded documents or retrieve summaries and `NvidiaRAGIngestor()` exposes APIs for document upload, management and summary generation.

In [None]:
from nvidia_rag import NvidiaRAG, NvidiaRAGIngestor
from nvidia_rag.utils.configuration import NvidiaRAGConfig
from nvidia_rag.rag_server.response_generator import retrieve_summary

# Get the configuration object
CONFIG = NvidiaRAGConfig.from_yaml("config.yaml")

rag = NvidiaRAG(config=CONFIG)
ingestor = NvidiaRAGIngestor(config=CONFIG)

### Step 2: View Default Summarizer LLM Settings

Let's see what LLM model and parameters are used by default for summarization.

In [None]:
print("=" * 70)
print("DEFAULT SUMMARIZER LLM CONFIGURATION")
print("=" * 70)
print(f"Model:             {CONFIG.summarizer.model_name}")
print(f"Server URL:        {CONFIG.summarizer.server_url}")
print(f"Temperature:       {CONFIG.summarizer.temperature}")
print(f"Top P:             {CONFIG.summarizer.top_p}")
print(f"Max Parallel:      {CONFIG.summarizer.max_parallelization}")
print(f"Max Chunk Length:  {CONFIG.summarizer.max_chunk_length}")
print(f"Chunk Overlap:     {CONFIG.summarizer.chunk_overlap}")
print("=" * 70)

### Step 3: View Default Summarization Prompts

The prompt template controls how the LLM generates summaries. Let's see the default prompts.

In [None]:
from nvidia_rag.utils.llm import get_prompts
import json
# Get all prompts
prompts = get_prompts()

print("=" * 70)
print("DEFAULT DOCUMENT SUMMARY PROMPT")
print("=" * 70)
print(json.dumps(prompts["document_summary_prompt"], indent=2))
print("=" * 70)

print("\n" + "=" * 70)
print("DEFAULT ITERATIVE SUMMARY PROMPT")
print("=" * 70)
print(json.dumps(prompts["iterative_summary_prompt"], indent=2))
print("=" * 70)

The summarization pipeline uses the following prompts (the cell above prints the first and the third):
- **document_summary_prompt**: Summarizing a single document or chunk (used for full multimodal extraction)
- **shallow_summary_prompt**: Summarizing with fast text-only extraction (used when `shallow_summary: true`)
- **iterative_summary_prompt**: Combining multiple summaries for large documents

The system automatically selects the appropriate prompt based on extraction mode and document size.

## Part 2: Library Mode - Change Configuration

Now let's see how to modify these settings programmatically in library mode.

### Method 1: Change LLM Model and Parameters

You can change the model and sampling parameters dynamically.

In [None]:
# Change to a different model (e.g., Llama 3.1 70B)
CONFIG.summarizer.model_name = "meta/llama-3.1-70b-instruct"
CONFIG.summarizer.server_url = ""

# Lower temperature for more deterministic, focused summaries
CONFIG.summarizer.temperature = 0.2

# Adjust top_p for nucleus sampling
CONFIG.summarizer.top_p = 0.7

# Configure global rate limiting (max parallel summary tasks across all workers)
# Prevents overwhelming GPU/API with too many concurrent LLM calls
CONFIG.summarizer.max_parallelization = 10  # Default: 20

print("✅ Updated Summarizer Configuration:")
print(f"   Model:        {CONFIG.summarizer.model_name}")
print(f"   Server URL:   {CONFIG.summarizer.server_url}")
print(f"   Temperature:  {CONFIG.summarizer.temperature}")
print(f"   Top P:        {CONFIG.summarizer.top_p}")
print(f"   Max Parallel: {CONFIG.summarizer.max_parallelization}")

### Method 2: Change Summarization Prompt

Customize the prompt to change the style and focus of summaries.

In [None]:
# Define a custom summary prompt
summary_prompt = {
    "system": "/no_think",
    "human": """You are a documentation specialist.

Create a clear summary that:
1. Identifies the main topic and purpose
2. Lists key concepts or features
3. Highlights important procedures or steps  
4. Notes any warnings or critical information

Keep the summary concise.

Text to summarize:
{document_text}

Summary:"""
}

# Get the prompts dictionary and apply the custom prompt
from nvidia_rag.utils.llm import get_prompts

prompts = get_prompts()
prompts["document_summary_prompt"] = summary_prompt

print("✅ Custom document summary prompt applied")
print("\nNew prompt preview (first 200 chars):")
print(prompts["document_summary_prompt"]["human"][:200] + "...")

### Method 3: Configure Summary Options

In [None]:
summary_options = {
    # Page filtering: [[1, 10]] (ranges), [[-5, -1]] (last N pages), "even"/"odd"
    "page_filter": [[1, 10]],  # Only pages 1-10
    
    # Fast mode: Text-only extraction first, summary in seconds
    "shallow_summary": True,  # Default: False
    
    # Strategy: None (iterative/best), "single" (fastest/truncates), "hierarchical" (parallel/faster than iterative)
    "summarization_strategy": "hierarchical"  # Default: None
}


print(f"  • Page Filter: {summary_options['page_filter']}")
print(f"  • Shallow Summary: {summary_options['shallow_summary']}")
print(f"  • Strategy: {summary_options['summarization_strategy']}")
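
To make the `page_filter` formats concrete, the sketch below resolves a filter into explicit page numbers. It assumes 1-based, inclusive ranges and that negative indices count from the end (`-1` is the last page), which matches the examples in this notebook; `resolve_page_filter` is a hypothetical helper, not part of `nvidia_rag`.

```python
from typing import List, Union

PageFilter = Union[str, List[List[int]]]


def resolve_page_filter(page_filter: PageFilter, total_pages: int) -> List[int]:
    """Return the sorted 1-based page numbers selected by `page_filter`."""
    pages = range(1, total_pages + 1)
    if page_filter == "even":
        return [p for p in pages if p % 2 == 0]
    if page_filter == "odd":
        return [p for p in pages if p % 2 == 1]

    selected = set()
    for start, end in page_filter:
        # Negative indices count from the end: -1 is the last page.
        start = total_pages + start + 1 if start < 0 else start
        end = total_pages + end + 1 if end < 0 else end
        # Ranges are inclusive; pages outside the document are ignored.
        selected.update(p for p in range(start, end + 1) if 1 <= p <= total_pages)
    return sorted(selected)
```

For a 30-page document, `[[1, 10]]` selects pages 1 through 10 and `[[-5, -1]]` selects pages 26 through 30. Overlapping ranges are de-duplicated because the pages are collected in a set.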

### COMPLETE WORKFLOW: Upload → Check Status → Get Summary

### Create Collection

In [None]:
# Create collection
collection_name = "test_summary"
response = ingestor.create_collection(
    collection_name=collection_name,
    vdb_endpoint="http://localhost:19530"
)
print(f"✅ Collection response: {response}")

### Uploading Documents

In [None]:
# Step 2: Upload documents with summary options
result = await ingestor.upload_documents(
    filepaths=["../data/multimodal/functional_validation.pdf"],
    collection_name=collection_name,
    generate_summary=True,
    summary_options=summary_options,  # From previous cell
    blocking=False  # Don't wait, check status instead
)
print(f"✅ Upload started: {result}")

### Checking Status and getting Summary

In [None]:
# Step 3: Check summary status
status = await retrieve_summary(
    collection_name=collection_name,
    file_name="functional_validation.pdf",
    wait=False  # Just check, don't wait
)
print(f"\n📊 Status: {status.get('status')}")
if status.get('status') == 'IN_PROGRESS':
    progress = status.get('progress', {})
    print(f"   Progress: Chunk {progress.get('current')}/{progress.get('total')}")

In [None]:
# Step 4: Get summary (blocking - waits until complete)
summary_result = await retrieve_summary(
    collection_name=collection_name,
    file_name="functional_validation.pdf",
    wait=True,
    timeout=300
)

if summary_result.get('status') == 'SUCCESS':
    print(f"\n✅ Summary:\n{summary_result.get('summary')}")
else:
    print(f"\n❌ {summary_result.get('status')}: {summary_result.get('message')}")

### Delete Collection

In [None]:
# Delete the test collection
response = ingestor.delete_collections(
    collection_names=[collection_name],
    vdb_endpoint="http://localhost:19530"
)
print(f"✅ Delete response: {response}")

---

## Part 3: Docker Mode - Change Configuration via Environment Variables

When running in Docker mode (default), you configure summarization via environment variables.

### Method 1: Set Environment Variables

Configure the ingestor server by exporting environment variables before startup. Adjust these values according to your requirements:

```bash
export SUMMARY_LLM="meta/llama-3.1-70b-instruct"
export SUMMARY_LLM_SERVERURL=""
export SUMMARY_LLM_TEMPERATURE=0.2
export SUMMARY_LLM_TOP_P=0.7
export SUMMARY_LLM_MAX_CHUNK_LENGTH=9000
export SUMMARY_CHUNK_OVERLAP=400
export SUMMARY_MAX_PARALLELIZATION=20

docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
```
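
Before starting the containers, it can help to confirm that the exported values parse as intended. The sketch below reads the same variable names into a typed object. `SummarizerEnv` and `load_summarizer_env` are hypothetical helpers for a preflight check, not how the ingestor server itself loads its configuration; the fallback values are the defaults documented in this notebook, and the overlap check is an added sanity assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SummarizerEnv:
    model_name: str
    server_url: str
    temperature: float
    top_p: float
    max_chunk_length: int
    chunk_overlap: int
    max_parallelization: int


def load_summarizer_env(env: Mapping[str, str] = os.environ) -> SummarizerEnv:
    """Parse the summarizer environment variables, falling back to documented defaults."""
    settings = SummarizerEnv(
        model_name=env.get("SUMMARY_LLM", "nvidia/llama-3.3-nemotron-super-49b-v1.5"),
        server_url=env.get("SUMMARY_LLM_SERVERURL", ""),
        temperature=float(env.get("SUMMARY_LLM_TEMPERATURE", "0.0")),
        top_p=float(env.get("SUMMARY_LLM_TOP_P", "1.0")),
        max_chunk_length=int(env.get("SUMMARY_LLM_MAX_CHUNK_LENGTH", "9000")),
        chunk_overlap=int(env.get("SUMMARY_CHUNK_OVERLAP", "400")),
        max_parallelization=int(env.get("SUMMARY_MAX_PARALLELIZATION", "20")),
    )
    # Sanity assumption: the overlap must be smaller than the chunk itself.
    if not 0 <= settings.chunk_overlap < settings.max_chunk_length:
        raise ValueError(
            "SUMMARY_CHUNK_OVERLAP must be smaller than SUMMARY_LLM_MAX_CHUNK_LENGTH"
        )
    return settings
```

A malformed value such as `SUMMARY_LLM_TEMPERATURE=low` raises a `ValueError` here, which is easier to diagnose than a failure inside the container.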

### Method 2: Custom Prompts in Docker Mode

To change prompts in Docker mode, create a custom `prompt.yaml` file and set the `PROMPT_CONFIG_FILE` environment variable.

#### Step 1: Create Custom Prompt File

Create your custom prompt file (e.g., `/home/user/my_custom_prompt.yaml`):


```yaml
# my_custom_prompt.yaml
document_summary_prompt:
  system: |
    /no_think
  
  human: |
    You are a technical documentation specialist.
    
    Create a clear, technical summary that:
    1. Identifies the main topic and purpose
    2. Lists key technical concepts or features
    3. Highlights important procedures or steps
    4. Notes any warnings or critical information
    
    Keep the summary concise and technical.
    
    Text to summarize:
    {document_text}
    
    Technical Summary:

iterative_summary_prompt:
  system: |
    /no_think
  
  human: |
    You are a technical documentation specialist combining summaries.
    
    Previous Summary:
    {previous_summary}
    
    New chunk:
    {new_chunk}
    
    Create an updated technical summary combining both.
```

#### Step 2: Set Environment Variable and Restart

For the ingestor-server, set the environment variable:

```bash
export PROMPT_CONFIG_FILE=/home/user/my_custom_prompt.yaml

# Restart the container (no rebuild needed)
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
```
**Key Points:**
- The service will merge your custom prompts with the defaults
- Only the prompts you specify will be overridden - all others remain unchanged
- No container rebuild is required; just restart with the new environment variable.
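
The override behavior described in the key points can be pictured as a dictionary merge. The sketch below assumes the merge happens per prompt name, so a prompt listed in the custom file replaces the default entry as a whole; `merge_prompts` and the example dictionaries are hypothetical and only illustrate the idea.

```python
import copy


def merge_prompts(defaults: dict, custom: dict) -> dict:
    """Return the defaults with every prompt named in `custom` replaced as a whole."""
    merged = copy.deepcopy(defaults)
    merged.update(copy.deepcopy(custom))
    return merged


# Hypothetical placeholder values, used only to demonstrate the merge.
example_defaults = {
    "document_summary_prompt": {"system": "/no_think", "human": "default: {document_text}"},
    "iterative_summary_prompt": {
        "system": "/no_think",
        "human": "default: {previous_summary} {new_chunk}",
    },
}
example_custom = {
    "document_summary_prompt": {"system": "/no_think", "human": "custom: {document_text}"},
}
example_merged = merge_prompts(example_defaults, example_custom)
```

Here `example_merged["document_summary_prompt"]` comes from the custom file, while `example_merged["iterative_summary_prompt"]` keeps its default value.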

For more details, see the prompt customization documentation.

---

### Method 3: Using Ingestor Server APIs (Docker Mode)

When running in Docker mode, you interact with the ingestor server via REST APIs. Here's the complete workflow for document summarization using APIs.

#### Prerequisites
- Ensure ingestor-server and rag-server containers are running
- Replace `localhost` with actual IP if hosted on another system

In [None]:
# Install Dependencies
!uv pip install aiohttp

In [None]:
import json
import os
import aiohttp

# Setup base configuration
INGESTOR_BASE_URL = "http://localhost:8082"
RAG_BASE_URL = "http://localhost:8081"


async def print_response(response):
    """Helper to print API response."""
    try:
        response_json = await response.json()
        print(json.dumps(response_json, indent=2))
    except (aiohttp.ContentTypeError, json.JSONDecodeError):
        # Fall back to the raw body when the response is not valid JSON
        print(await response.text())

#### Step 1: Health Check

In [None]:
async def check_health():
    """Check ingestor server health."""
    url = f"{INGESTOR_BASE_URL}/v1/health"
    params = {"check_dependencies": "True"}
    async with aiohttp.ClientSession() as session:
        async with session.get(url, params=params) as response:
            await print_response(response)

await check_health()

#### Step 2: Create Collection

In [None]:
async def create_collection(collection_name: str, embedding_dimension: int = 2048):
    """Create a collection for document storage."""
    data = {
        "collection_name": collection_name,
        "embedding_dimension": embedding_dimension,
        "metadata_schema": []
    }
    
    headers = {"Content-Type": "application/json"}
    
    async with aiohttp.ClientSession() as session:
        try:
            async with session.post(
                f"{INGESTOR_BASE_URL}/v1/collection", 
                json=data, 
                headers=headers
            ) as response:
                await print_response(response)
        except aiohttp.ClientError as e:
            print(f"Error: {e}")

# Create collection
await create_collection(collection_name="test_summary_api")

#### Step 3: Upload Documents with Summary Options

In [None]:
async def upload_with_summary(collection_name: str, filepaths: list):
    """Upload documents and generate summaries."""
    
    # Configure summary options
    data = {
        "collection_name": collection_name,
        "blocking": False,  # Non-blocking upload
        "split_options": {"chunk_size": 512, "chunk_overlap": 150},
        "generate_summary": True,  # Enable summary generation
        "summary_options": {
            "page_filter": [[1, 10], [-5, -1]],  # First 10 and last 5 pages
            "shallow_summary": True,  # Fast text-only extraction
            "summarization_strategy": "single"  # fastest strategy other available: "hierarchical", None(iterative)
        }
    }
    
    form_data = aiohttp.FormData()
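    for file_path in filepaths:
        # Read each file up front so its handle is closed before the request is sent
        with open(file_path, "rb") as f:
            file_bytes = f.read()
        form_data.add_field(
            "documents",
            file_bytes,
            filename=os.path.basename(file_path),
            content_type="application/pdf",
        )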
    
    form_data.add_field("data", json.dumps(data), content_type="application/json")
    
    async with aiohttp.ClientSession() as session:
        try:
            async with session.post(
                f"{INGESTOR_BASE_URL}/v1/documents", 
                data=form_data
            ) as response:
                await print_response(response)
                response_json = await response.json()
                return response_json.get("task_id")
        except aiohttp.ClientError as e:
            print(f"Error: {e}")
            return None

# Upload documents
task_id = await upload_with_summary(
    collection_name="test_summary_api",
    filepaths=["../data/multimodal/functional_validation.pdf"]
)
print(f"\n✅ Upload task_id: {task_id}")

#### Step 4: Check Upload Status (Ingestor Server)

In [None]:
async def check_upload_status(task_id: str):
    """Check ingestion task status."""
    params = {"task_id": task_id}
    headers = {"Content-Type": "application/json"}
    
    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(
                f"{INGESTOR_BASE_URL}/v1/status", 
                params=params, 
                headers=headers
            ) as response:
                await print_response(response)
        except aiohttp.ClientError as e:
            print(f"Error: {e}")

# Check status
if task_id:
    await check_upload_status(task_id=task_id)
else:
    print("No task_id available")

#### Step 5: Check Summary Status (RAG Server)

In [None]:
async def check_summary_status(collection_name: str, file_name: str):
    """Check summary generation status via RAG server."""
    params = {
        "collection_name": collection_name,
        "file_name": file_name,
        "blocking": "false"  # Just check status, don't wait
    }
    
    url = f"{RAG_BASE_URL}/v1/summary"
    
    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(url, params=params) as response:
                await print_response(response)
        except aiohttp.ClientError as e:
            print(f"Error: {e}")

# Check summary status
await check_summary_status(
    collection_name="test_summary_api",
    file_name="functional_validation.pdf"
)
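
The call above checks the status once. To wait for completion from a client, a small polling loop is one option. The sketch below is generic: `poll_summary` is a hypothetical helper that takes any async callable returning the status dictionary. The in-flight status set is an assumption; `IN_PROGRESS` and `SUCCESS` appear in this notebook, while `PENDING` is a guess that you should adjust to the values your server returns.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Assumption: statuses that mean "keep waiting"; everything else is treated as final.
IN_FLIGHT = {"IN_PROGRESS", "PENDING"}


async def poll_summary(
    fetch_status: Callable[[], Awaitable[dict]],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> dict:
    """Call `fetch_status` until it reports a final status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = await fetch_status()
        if status.get("status") not in IN_FLIGHT:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"summary still {status.get('status')} after {timeout}s")
        await asyncio.sleep(interval)
```

In a notebook, this could be used as `await poll_summary(fetch)`, where `fetch` is a coroutine function that performs the `GET /v1/summary` request shown above and returns the parsed JSON instead of printing it.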

#### Step 6: Docker/API Mode - Delete Collection

In [None]:
async def delete_collections(collection_names: list[str]):
    """Delete collections from the vector store."""
    url = f"{INGESTOR_BASE_URL}/v1/collections"
    
    async with aiohttp.ClientSession() as session:
        try:
            async with session.delete(url, json=collection_names) as response:
                await print_response(response)
        except aiohttp.ClientError as e:
            print(f"Error: {e}")

# Delete the test collection
await delete_collections(collection_names=["test_summary_api"])

## Summary of Available Configuration Options

### Summarizer Configuration Fields

| Field | Environment Variable | Default Value | Description |
|-------|---------------------|---------------|-------------|
| `model_name` | `SUMMARY_LLM` | `nvidia/llama-3.3-nemotron-super-49b-v1.5` | The LLM model used for summarization |
| `server_url` | `SUMMARY_LLM_SERVERURL` | (empty) | Server URL for custom model hosting |
| `temperature` | `SUMMARY_LLM_TEMPERATURE` | `0.0` | Controls randomness (0.0-1.0) |
| `top_p` | `SUMMARY_LLM_TOP_P` | `1.0` | Nucleus sampling parameter (0.0-1.0) |
| `max_chunk_length` | `SUMMARY_LLM_MAX_CHUNK_LENGTH` | `9000` | Maximum chunk size in tokens |
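| `chunk_overlap` | `SUMMARY_CHUNK_OVERLAP` | `400` | Overlap between chunks in tokens |
| `max_parallelization` | `SUMMARY_MAX_PARALLELIZATION` | `20` | Maximum parallel summary tasks across all workers |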
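
To see how `max_chunk_length` and `chunk_overlap` interact, the sketch below splits a token sequence into overlapping windows. It is a simplified illustration under stated assumptions: `chunk_tokens` is a hypothetical helper, plain integers stand in for tokens, and the actual tokenizer and splitter used by the pipeline are not shown in this notebook.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[int],
    max_chunk_length: int = 9000,
    chunk_overlap: int = 400,
) -> List[Sequence[int]]:
    """Split `tokens` into windows of `max_chunk_length` that share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < max_chunk_length:
        raise ValueError("chunk_overlap must be in [0, max_chunk_length)")

    step = max_chunk_length - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_chunk_length])
        if start + max_chunk_length >= len(tokens):
            break  # this window already reached the end of the document
        start += step
    return chunks
```

With the defaults, a 20,000-token document yields three chunks of 9,000, 9,000, and 2,800 tokens, and the last 400 tokens of each chunk reappear at the start of the next one.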

### Prompt Template Variables

- **document_summary_prompt**: Use `{document_text}` variable
- **iterative_summary_prompt**: Use `{previous_summary}` and `{new_chunk}` variables
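
A custom prompt that omits one of these variables is easy to write by accident. The sketch below checks a prompt for the required placeholders before you apply it. It assumes the templates use Python `str.format`-style fields, as the `{document_text}` syntax suggests; `REQUIRED_FIELDS` and `missing_placeholders` are hypothetical names.

```python
from string import Formatter

# Required variables per prompt, as listed above.
REQUIRED_FIELDS = {
    "document_summary_prompt": {"document_text"},
    "iterative_summary_prompt": {"previous_summary", "new_chunk"},
}


def missing_placeholders(prompt_name: str, prompt: dict) -> set:
    """Return the required variables that do not appear in the prompt's templates."""
    template = " ".join(str(prompt.get(part, "")) for part in ("system", "human"))
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    found = {field for _, field, _, _ in Formatter().parse(template) if field}
    return REQUIRED_FIELDS[prompt_name] - found
```

An empty set means the prompt contains every required variable; anything else names the placeholders that still need to be added.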

---

**Note:** Changes made in library mode take effect immediately without restarting any services. Changes in Docker mode require a container restart but no rebuild.