NVIDIA RAG Blueprint

Source: This repository is based on the NVIDIA AI Blueprints RAG Blueprint. Refer to the upstream repository for the full feature list, NIM details, deployment options, and official documentation.

Retrieval-Augmented Generation (RAG) combines the reasoning power of large language models with real-time retrieval from trusted data sources, grounding AI responses in your own knowledge and reducing hallucinations.

What's Added Here

On top of the upstream NVIDIA RAG Blueprint, this repository adds hardware-specific getting-started guides and management scripts, so you can bring up a deployment quickly without reading the full documentation.

Addition	Description
docs/getting-started-nvidia-hosted.md	Guide for Core i9 + RTX 4070 — NVIDIA API Catalog NIMs + local cuVS GPU vector database
scripts/rag-nvidia-hosted.sh	Management script for the RTX 4070 setup
docs/getting-started-mac-m4.md	Guide for MacBook Pro M4 Pro (Apple Silicon) — NVIDIA API Catalog NIMs + CPU Milvus, no GPU required
scripts/rag-mac.sh	Management script for the Mac setup

All guides share the same command pattern:

export NGC_API_KEY="nvapi-..."
./scripts/rag-<platform>.sh setup    # check prerequisites, NGC login
./scripts/rag-<platform>.sh start    # deploy, wait for health, print URLs
./scripts/rag-<platform>.sh status   # containers, API health, resource usage
./scripts/rag-<platform>.sh logs     # tail logs (or: logs <service-name>)
./scripts/rag-<platform>.sh stop     # stop, keep data
./scripts/rag-<platform>.sh clean    # stop, remove all data
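
As a rough illustration of how one script can expose this command pattern, the sketch below dispatches on the first argument with a `case` statement. It is a hypothetical outline, not the contents of the actual scripts: the `dispatch` and `usage` names and the "would run" messages are assumptions, and the real scripts do the Docker and NGC work that is stubbed out here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a subcommand dispatcher.
# This is NOT the real scripts/rag-<platform>.sh; each action is stubbed with echo.
set -euo pipefail

usage() {
  echo "usage: $0 {setup|start|status|logs [service-name]|stop|clean}" >&2
}

dispatch() {
  local cmd="${1:-}"
  # Drop the subcommand so that "$1" becomes the optional service name.
  if [ "$#" -gt 0 ]; then
    shift
  fi
  case "$cmd" in
    setup|start|status|stop|clean)
      echo "would run: $cmd"
      ;;
    logs)
      # With no service name, tail everything.
      echo "would run: logs ${1:-all}"
      ;;
    *)
      usage
      return 2
      ;;
  esac
}
```

A script built this way would end with `dispatch "$@"`, so that `logs my-service` (a made-up service name) reaches the `logs` branch with the service name as its argument.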

Quick Start

RTX 4070 (Ubuntu) — GPU-accelerated vector DB

git clone https://github.com/NVIDIA-AI-Blueprints/rag.git && cd rag
export NGC_API_KEY="nvapi-..."
chmod +x scripts/rag-nvidia-hosted.sh
./scripts/rag-nvidia-hosted.sh setup && ./scripts/rag-nvidia-hosted.sh start

See docs/getting-started-nvidia-hosted.md.
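
Both quick-start paths depend on `NGC_API_KEY` being exported before `setup` runs. The following is a minimal sketch of a pre-flight check you could run yourself. The function name `check_ngc_api_key` is hypothetical, and the `nvapi-` prefix test is an assumption based only on the placeholder shown in the commands above.

```shell
# Hypothetical pre-flight check; not part of the repository's scripts.
check_ngc_api_key() {
  local key="${NGC_API_KEY:-}"
  if [ -z "$key" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  case "$key" in
    nvapi-*)
      # Assumption: keys start with "nvapi-", as in the placeholder above.
      return 0
      ;;
    *)
      echo "NGC_API_KEY does not start with nvapi-" >&2
      return 1
      ;;
  esac
}
```

Running `check_ngc_api_key && echo ok` in the same shell session where you exported the key gives a quick sanity check before starting the deployment.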

Mac M4 Pro (Apple Silicon) — CPU vector DB, no GPU needed

git clone https://github.com/NVIDIA-AI-Blueprints/rag.git && cd rag
export NGC_API_KEY="nvapi-..."
chmod +x scripts/rag-mac.sh
./scripts/rag-mac.sh setup && ./scripts/rag-mac.sh start

See docs/getting-started-mac-m4.md.

Then open http://localhost:8090 in your browser.
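
If the page does not load immediately, the services may still be starting. The sketch below polls a URL with `curl` until it answers or a timeout expires. It is an assumption about how such a wait could be written, not a copy of what the `start` command does internally, and the `wait_for_url` name and default timings are hypothetical.

```shell
# Hypothetical helper: wait_for_url URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_url() {
  local url="$1" timeout="${2:-120}" interval="${3:-5}" waited=0
  # --fail makes curl exit non-zero on HTTP errors (4xx/5xx) as well as
  # on connection failures, so the loop keeps waiting in both cases.
  until curl --silent --fail --output /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "ready: $url"
}
```

For example, `wait_for_url http://localhost:8090 300 10` would retry every 10 seconds for up to roughly five minutes.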

Architecture

Both deployments route all AI workloads (LLM, embeddings, reranker, OCR) to NVIDIA-hosted NIMs. They differ in the local vector database index: GPU_CAGRA (GPU) on the RTX 4070 and HNSW (CPU) on the Mac.
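
The platform split described above can be pictured as a small selection function. This is only an illustrative sketch: `select_index_type` is a hypothetical name, detecting a GPU through the presence of `nvidia-smi` is an assumption, and the real guides set the index type through their own configuration.

```shell
# Hypothetical helper: select_index_type [OS_NAME] [HAS_GPU]
# Prints the index type named in the Architecture text for the given platform.
select_index_type() {
  local os="${1:-$(uname -s)}" has_gpu="${2:-}"
  if [ -z "$has_gpu" ]; then
    # Assumption: an NVIDIA GPU is usable if nvidia-smi is on PATH.
    if command -v nvidia-smi >/dev/null 2>&1; then
      has_gpu=yes
    else
      has_gpu=no
    fi
  fi
  if [ "$os" = "Linux" ] && [ "$has_gpu" = "yes" ]; then
    echo "GPU_CAGRA"
  else
    echo "HNSW"
  fi
}
```

Called with explicit arguments, `select_index_type Linux yes` prints `GPU_CAGRA` and `select_index_type Darwin no` prints `HNSW`. Called with no arguments, it inspects the current machine.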

Full Documentation

For deployment options, configuration, customization, and Kubernetes/Helm guides, refer to the upstream repository: https://github.com/NVIDIA-AI-Blueprints/rag

Blog Posts

Contributing

To open a GitHub issue or pull request, see the contributing guidelines.

License

This NVIDIA AI Blueprint is licensed under the Apache License, Version 2.0. This project downloads and installs additional third-party open source software projects and containers. Review the license terms of these open source projects before use.

Use of the models is governed by the NVIDIA AI Foundation Models Community License.

Terms of Use

This blueprint is governed by the NVIDIA Software License Agreement and the Product Specific Terms for AI Products. Models are governed by the NVIDIA Community Model License. The NVIDIA RAG dataset is governed by the NVIDIA Asset License Agreement.

The following models built with Llama are governed by the Llama 3.2 Community License: nvidia/llama-3.2-nv-embedqa-1b-v2, nvidia/llama-3.2-nv-rerankqa-1b-v2, llama-3.2-nemoretriever-1b-vlm-embed-v1. The llama-3.3-nemotron-super-49b-v1.5 model is governed by the Llama 3.3 Community License. Built with Llama. Apache 2.0 applies to NVIDIA Ingest and the nemoretriever-page-elements-v2, nemoretriever-table-structure-v1, nemoretriever-graphic-elements-v1, paddleocr, and nemoretriever-ocr-v1 models.
