Source: This repository is based on the NVIDIA AI Blueprints RAG Blueprint. Refer to the upstream repository for the full feature list, NIM details, deployment options, and official documentation.
Retrieval-Augmented Generation (RAG) combines the reasoning power of large language models with real-time retrieval from trusted data sources, grounding AI responses in your own knowledge and reducing hallucinations.
On top of the upstream NVIDIA RAG Blueprint, this repository adds hardware-specific getting-started guides and management scripts for getting up and running quickly without needing to read the full documentation.
| Addition | Description |
|---|---|
| docs/getting-started-nvidia-hosted.md | Guide for Core i9 + RTX 4070 — NVIDIA API Catalog NIMs + local cuVS GPU vector database |
| scripts/rag-nvidia-hosted.sh | Management script for the RTX 4070 setup |
| docs/getting-started-mac-m4.md | Guide for MacBook Pro M4 Pro (Apple Silicon) — NVIDIA API Catalog NIMs + CPU Milvus, no GPU required |
| scripts/rag-mac.sh | Management script for the Mac setup |
All guides share the same command pattern:
export NGC_API_KEY="nvapi-..."
./scripts/rag-<platform>.sh setup # check prerequisites, NGC login
./scripts/rag-<platform>.sh start # deploy, wait for health, print URLs
./scripts/rag-<platform>.sh status # containers, API health, resource usage
./scripts/rag-<platform>.sh logs # tail logs (or: logs <service-name>)
./scripts/rag-<platform>.sh stop # stop, keep data
./scripts/rag-<platform>.sh clean # stop, remove all datagit clone https://github.com/NVIDIA-AI-Blueprints/rag.git && cd rag
export NGC_API_KEY="nvapi-..."
chmod +x scripts/rag-nvidia-hosted.sh
./scripts/rag-nvidia-hosted.sh setup && ./scripts/rag-nvidia-hosted.sh startSee docs/getting-started-nvidia-hosted.md.
git clone https://github.com/NVIDIA-AI-Blueprints/rag.git && cd rag
export NGC_API_KEY="nvapi-..."
chmod +x scripts/rag-mac.sh
./scripts/rag-mac.sh setup && ./scripts/rag-mac.sh startSee docs/getting-started-mac-m4.md.
Then open http://localhost:8090 in your browser.
Both deployments route all AI workloads (LLM, embeddings, reranker, OCR) to NVIDIA-hosted NIMs. The difference is the local vector database: GPU_CAGRA on the RTX 4070, HNSW (CPU) on Mac.
For deployment options, configuration, customization, and Kubernetes/Helm guides, refer to the upstream:
- NVIDIA RAG Blueprint — full docs
- Self-hosted NIM deployment
- NVIDIA API Catalog + cuVS deployment
- Helm/Kubernetes deployment
- Troubleshooting
- NVIDIA NeMo Retriever Delivers Accurate Multimodal PDF Data Extraction 15x Faster
- Finding the Best Chunking Strategy for Accurate AI Responses
To open a GitHub issue or pull request, see the contributing guidelines.
This NVIDIA AI Blueprint is licensed under the Apache License, Version 2.0. This project downloads and installs additional third-party open source software projects and containers. Review the license terms of these open source projects before use.
Use of the models is governed by the NVIDIA AI Foundation Models Community License.
This blueprint is governed by the NVIDIA Software License Agreement and the Product Specific Terms for AI Products. Models are governed by the NVIDIA Community Model License. The NVIDIA RAG dataset is governed by the NVIDIA Asset License Agreement.
The following models built with Llama are governed by the Llama 3.2 Community License: nvidia/llama-3.2-nv-embedqa-1b-v2, nvidia/llama-3.2-nv-rerankqa-1b-v2, llama-3.2-nemoretriever-1b-vlm-embed-v1.
The llama-3.3-nemotron-super-49b-v1.5 model is governed by the Llama 3.3 Community License. Built with Llama.
Apache 2.0 applies to NVIDIA Ingest and the nemoretriever-page-elements-v2, nemoretriever-table-structure-v1, nemoretriever-graphic-elements-v1, paddleocr, and nemoretriever-ocr-v1 models.
