Release v2.0.0

Overview

The AI-Q NVIDIA Blueprint is an open reference example for building intelligent AI agents that connect to your enterprise data, reason using state-of-the-art models, and deliver trusted business insights. AI-Q v2.0.0 is a ground-up rewrite of the blueprint. The v1.x line provided a single deep research agent with PDF upload and a demo web application. v2.0.0 introduces a two-tier multi-agent architecture built on the NVIDIA NeMo Agent Toolkit (NAT), a new Next.js frontend, async job infrastructure, a pluggable knowledge layer, and built-in evaluation.

AI-Q holds top positions on both the DeepResearch Bench and DeepResearch Bench II leaderboards. To reproduce those results, use the drb1 and drb2 branches, respectively.

Architecture

Two-tier research routing. A single-call Intent Classifier routes every query to one of three paths: instant meta responses, fast shallow research, or comprehensive deep research. Simple queries therefore avoid the latency of a full deep research run.
LangGraph state machine orchestrator. The core workflow is a LangGraph StateGraph with explicit, testable routing and conversation checkpointing (in-memory, SQLite, or PostgreSQL).
Shallow Researcher agent. New bounded tool-calling agent optimized for speed with configurable tool-call budgets, context compaction, and a synthesis anchor that forces citation-backed answers when the budget is exhausted.
Deep Researcher agent. Rebuilt using the deepagents library with a three-role subagent architecture (Orchestrator, Planner, Researcher). Supports configurable research loop iterations, per-role LLM assignment, and structured multi-phase workflows: planning, iterative research, citation management, and final report generation.
Clarifier agent with HITL. New human-in-the-loop agent that gathers clarifications, generates structured research plans, and supports plan approval/rejection/feedback before deep research begins. Fully configurable and can be disabled.
Shallow-to-deep escalation. The shallow researcher can automatically escalate to deep research when it detects insufficient results, routing through the clarifier for plan approval.
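
The routing and escalation behavior described above can be sketched in a few lines of plain Python. This is an illustration only, not AI-Q's code: the `Intent` enum, the `Router` class, and the keyword-based `classify_intent` are hypothetical stand-ins (the real classifier is a single LLM call, and the real orchestrator is a LangGraph StateGraph), and the clarifier's plan-approval step is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Intent(Enum):
    META = "meta"
    SHALLOW = "shallow"
    DEEP = "deep"


# Hypothetical stand-in for the LLM-based classifier: exact-match greetings
# are "meta", queries asking for a report are "deep", everything else "shallow".
META_PHRASES = {"hi", "hello", "what can you do?"}


def classify_intent(query: str) -> Intent:
    text = query.strip().lower()
    if text in META_PHRASES:
        return Intent.META
    if "report" in text:
        return Intent.DEEP
    return Intent.SHALLOW


@dataclass
class Router:
    meta: Callable[[str], str]
    # The shallow handler returns None when its results are insufficient.
    shallow: Callable[[str], Optional[str]]
    deep: Callable[[str], str]

    def route(self, query: str) -> str:
        intent = classify_intent(query)
        if intent is Intent.META:
            return self.meta(query)
        if intent is Intent.SHALLOW:
            answer = self.shallow(query)
            if answer is not None:
                return answer
            # Shallow-to-deep escalation: fall through to the deep path.
        return self.deep(query)
```

Keeping the handlers as plain callables makes each path testable in isolation, which mirrors the "explicit, testable routing" goal stated above.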

API and Backend

Async Jobs API. New REST API (/v1/jobs/async/) for submitting, tracking, cancelling, and streaming research jobs. Supports custom job IDs, configurable expiry, and job artifact retrieval.
SSE streaming with event replay. Real-time Server-Sent Events for all agent execution events (LLM tokens, tool calls, artifacts, citations). Clients can reconnect and replay events from any point. On PostgreSQL, events are delivered via LISTEN/NOTIFY with sub-10ms latency.
Dask-based distributed execution. Deep research jobs run on a Dask cluster with configurable workers and threads, background heartbeats, stale job reaping, and cooperative cancellation.
PostgreSQL persistence. Job store, event store, LangGraph checkpoints, and document summaries all support PostgreSQL for production deployments. SQLite remains available for local development.
Pluggable agent registration. Custom agents can be registered and exposed through the async jobs API without modifying core code.
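
Event replay of the kind described above generally relies on an append-only log with monotonically increasing event IDs. The sketch below shows that idea with an in-memory list; `EventLog` and `to_sse` are hypothetical names, the event type strings are made up for illustration, and AI-Q's actual event store (SQLite or PostgreSQL) may be organized differently.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """In-memory, append-only event log for one job (hypothetical)."""

    events: list[dict] = field(default_factory=list)

    def append(self, event_type: str, data: dict) -> int:
        # IDs start at 1 and increase by 1, so ID n lives at index n - 1.
        event_id = len(self.events) + 1
        self.events.append({"id": event_id, "type": event_type, "data": data})
        return event_id

    def replay(self, last_event_id: int = 0) -> list[dict]:
        """Return every event whose ID is greater than last_event_id."""
        return self.events[last_event_id:]


def to_sse(event: dict) -> str:
    """Serialize one event as a Server-Sent Events frame."""
    return (
        f"id: {event['id']}\n"
        f"event: {event['type']}\n"
        f"data: {json.dumps(event['data'])}\n\n"
    )
```

A reconnecting client would pass the last ID it received (the SSE standard carries this in the `Last-Event-ID` header) and receive only the events it missed.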

Knowledge Layer

Pluggable knowledge retrieval. Backend-agnostic knowledge layer with a factory/registry pattern. Swap between LlamaIndex (local ChromaDB) and Foundational RAG (hosted NVIDIA RAG Blueprint) without changing agent code.
Document ingestion pipeline. Async file upload with job tracking, status polling (UPLOADING → INGESTING → SUCCESS/FAILED), and collection management (create, delete, list, TTL cleanup).
Multimodal extraction. LlamaIndex backend supports VLM-powered image captioning and chart data extraction from PDFs, making visual content searchable alongside text.
Document summaries. Optional LLM-generated one-sentence summaries per document, injected into agent prompts so researchers understand available files before making tool calls.
Session-based collections. Each browser session gets an isolated collection with automatic 24-hour TTL cleanup.
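
The factory/registry pattern mentioned above can be illustrated as follows. This is a generic sketch, not the AI-Q knowledge layer SDK: `Retriever`, `register_backend`, `get_retriever`, and the toy `InMemoryRetriever` are all hypothetical names chosen for the example.

```python
from typing import Callable, Protocol


class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int = 3) -> list[str]: ...


# Maps a backend name to a zero-argument factory that builds a retriever.
_REGISTRY: dict[str, Callable[[], Retriever]] = {}


def register_backend(name: str):
    """Decorator that registers a class or factory under a backend name."""

    def decorator(factory):
        _REGISTRY[name] = factory
        return factory

    return decorator


def get_retriever(name: str) -> Retriever:
    """Build the backend registered under `name`; agents only see Retriever."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown backend {name!r}; registered: {sorted(_REGISTRY)}")
    return _REGISTRY[name]()


@register_backend("in_memory")
class InMemoryRetriever:
    """Toy backend: ranks documents by how many words they share with the query."""

    def __init__(self) -> None:
        self.docs = [
            "Dask runs deep research jobs",
            "PostgreSQL stores job events",
        ]

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        words = set(query.lower().split())
        scored = [(len(words & set(d.lower().split())), d) for d in self.docs]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in ranked if score > 0][:top_k]
```

Because callers select a backend by name, swapping one backend for another becomes a configuration change rather than a code change.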

Citation Verification

Deterministic citation verification pipeline. Every research response (shallow and deep) passes through post-processing that validates all citations against a source registry of actually-retrieved URLs using a five-level matching strategy (exact, truncation, prefix, child-path, query-subset). Includes report sanitization (shortened URLs, IP addresses, non-HTTP schemes) and a full audit trail of verification decisions.
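
To make the tiered matching idea concrete, here is a minimal sketch covering three of the five levels: exact, child-path, and query-subset. These notes do not define the levels, so the definitions in the comments are assumptions, and truncation and prefix matching are left out rather than guessed at. `match_level` is a hypothetical helper, not the pipeline's API.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


def _parts(url: str):
    """Split a URL into (scheme, host, path without trailing slash, query set)."""
    s = urlsplit(url)
    return s.scheme.lower(), s.netloc.lower(), s.path.rstrip("/"), set(parse_qsl(s.query))


def match_level(cited: str, registry: list[str]) -> Optional[str]:
    """Return the first level at which `cited` matches a retrieved URL, else None."""
    c_scheme, c_host, c_path, c_query = _parts(cited)
    checks = [
        # Assumed: same path and same query parameters.
        ("exact", lambda p, q: p == c_path and q == c_query),
        # Assumed: the citation points below a retrieved, non-root page.
        ("child-path", lambda p, q: p != "" and c_path.startswith(p + "/") and not c_query),
        # Assumed: same page, and the citation carries a strict subset
        # of the retrieved URL's query parameters.
        ("query-subset", lambda p, q: p == c_path and c_query < q),
    ]
    # Stricter levels are tried against the whole registry before looser ones.
    for level, check in checks:
        for source in registry:
            scheme, host, path, query = _parts(source)
            if (scheme, host) == (c_scheme, c_host) and check(path, query):
                return level
    return None
```

Returning the level name rather than a boolean gives each decision a label that could be recorded in an audit trail.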

Frontend

New Next.js web UI. Complete rewrite as a modern Next.js application with conversational chat interface, document upload, collection management, and real-time research progress visualization.
Optional OAuth authentication. OIDC-based authentication support with configurable providers and a REQUIRE_AUTH toggle.
Configurable file upload. Accepted file types, max file size, and max file count controllable via environment variables.

Observability

Multi-backend tracing. Built-in support for Phoenix (local trace visualization), LangSmith (LLM evaluation and prompt optimization), Weights & Biases Weave (experiment tracking with PII redaction), and a production-grade OpenTelemetry Collector exporter with configurable privacy redaction — all configurable through NAT YAML config or environment variables.

Evaluation

FreshQA benchmark. Built-in factuality evaluation on time-sensitive questions for measuring shallow researcher accuracy, runnable via the NAT evaluation harness (nat eval). Deep research benchmark reproduction is available on the dedicated drb1 and drb2 branches.

Deployment

Docker Compose stack. Production-ready three-service stack (backend, frontend, PostgreSQL) with multi-stage Dockerfile, dev/release build targets, and distroless runtime images running as non-root (UID 1000).
Helm chart for Kubernetes. Full Helm deployment with NGC registry support, Kubernetes secrets management, configurable resource limits, health checks, and Foundational RAG integration via internal service DNS.
Horizontal scaling. Stateless backend supports scaling behind a load balancer with shared PostgreSQL and optional external Dask scheduler.

NAT-Powered Configuration

Native NeMo Agent Toolkit integration. AI-Q is built directly on the NVIDIA NeMo Agent Toolkit: all agents, tools, LLMs, routing behavior, and observability are defined through NAT's YAML configuration system, which supports environment variable substitution (${VAR:-default}), plugin registration, and the nat run, nat serve, and nat eval CLI commands.
Per-role LLM assignment. Assign different models to the orchestrator, planner, researcher, and intent classifier roles independently.
Four pre-built configs. CLI default, Web + LlamaIndex, Web + Foundational RAG, and Hybrid Frontier Model (GPT-5.2 orchestrator with open-source researchers).
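
The ${VAR:-default} syntax mentioned above follows the familiar shell convention. The sketch below shows how such substitution can work; it is not NAT's implementation. The `substitute` function is hypothetical, and treating an empty variable the same as an unset one is an assumption borrowed from POSIX shell semantics.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} and ${NAME:-default}.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def substitute(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Expand ${NAME} and ${NAME:-default} references in `text`."""
    env = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name = match.group("name")
        default = match.group("default")
        value = env.get(name)
        if default is None:
            # No fallback given: the variable must be set.
            if value is None:
                raise KeyError(f"environment variable {name} is not set")
            return value
        # Assumed shell semantics for ":-": unset or empty uses the default.
        return value if value else default

    return _PATTERN.sub(replace, text)
```

For example, with `MODEL` set to `my-model` and `PORT` unset, the string `model: ${MODEL:-fallback}, port: ${PORT:-8000}` expands to `model: my-model, port: 8000`. The keys in that string are placeholders, not actual NAT config fields.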

Models

Default models. NVIDIA Nemotron 3 Nano 30B (agents, intent classifier), GPT-OSS 120B (deep research orchestrator/planner), Nemotron Mini 4B (document summaries), Llama Nemotron Embed VL 1B v2 (embeddings), Nemotron Nano 12B v2 VL (multimodal extraction).
Frontier model support. Optional config for GPT-5.2 as orchestrator/planner with open-source researchers.
Nemotron Super compatibility. Tested with Nemotron 3 Super 120B; temporarily commented out in default configs due to Build API availability constraints.

Developer Experience

uv workspace monorepo. uv sync installs everything; individual packages installable with uv pip install -e.
Jupyter notebook series. Three-part tutorial: Getting Started, Deep Researcher deep dive, and Customization guide.
Debug console. Built-in debug UI at /debug with real-time SSE visualization, job tracking, and state inspection.
Comprehensive documentation. Architecture docs, API reference, customization guides, knowledge layer SDK reference, and deployment guides for Docker Compose and Kubernetes.

Breaking Changes from v1.x

The architecture has been completely rewritten; v1.x configs and workflows are not compatible with v2.0.0.
The demo web application from v1.x has been replaced by the new Next.js frontend.
PDF processing is now handled through the knowledge layer rather than direct RAG integration.
The v1.x single-agent deep researcher has been replaced by the multi-agent orchestrated workflow.

Dependencies

Pinned to NeMo Agent Toolkit (NAT) v1.4.0. NAT v1.5 and later are not yet supported.
Python 3.11–3.13 supported.
Node.js 22+ required for the frontend.
