Release v2.0.0
Overview
AI-Q v2.0.0 is a ground-up rewrite of the NVIDIA AI-Q Blueprint. The v1.x line provided a single deep research agent with PDF upload and a demo web application. v2.0.0 introduces a two-tier multi-agent architecture built on the NVIDIA NeMo Agent Toolkit (NAT), a new Next.js frontend, async job infrastructure, a pluggable knowledge layer, and built-in evaluation. The AI-Q NVIDIA Blueprint is an open reference example for building intelligent AI agents that connect to your enterprise data, reason using state-of-the-art models, and deliver trusted business insights.
AI-Q holds top positions on both the DeepResearch Bench and DeepResearch Bench II leaderboards. To reproduce those results, use the drb1 and drb2 branches, respectively.
Architecture
- Two-tier research routing. A single-call Intent Classifier routes every query to the optimal path: instant meta responses, fast shallow research, or comprehensive deep research — eliminating unnecessary latency for simple queries.
- LangGraph state machine orchestrator. The core workflow is a LangGraph StateGraph with explicit, testable routing and conversation checkpointing (in-memory, SQLite, or PostgreSQL).
- Shallow Researcher agent. New bounded tool-calling agent optimized for speed with configurable tool-call budgets, context compaction, and a synthesis anchor that forces citation-backed answers when the budget is exhausted.
- Deep Researcher agent. Rebuilt using the deepagents library with a three-role subagent architecture (Orchestrator, Planner, Researcher). Supports configurable research loop iterations, per-role LLM assignment, and structured multi-phase workflows: planning, iterative research, citation management, and final report generation.
- Clarifier agent with HITL. New human-in-the-loop agent that gathers clarifications, generates structured research plans, and supports plan approval/rejection/feedback before deep research begins. Fully configurable and can be disabled.
- Shallow-to-deep escalation. The shallow researcher can automatically escalate to deep research when it detects insufficient results, routing through the clarifier for plan approval.
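The routing and escalation behavior described above can be sketched in plain Python. This is only an illustration under stated assumptions: the keyword rules stand in for the single-call LLM classifier, and every name here (`Intent`, `classify_intent`, `shallow_research`, `route`) is hypothetical rather than part of the AI-Q codebase.

```python
from enum import Enum


class Intent(Enum):
    META = "meta"
    SHALLOW = "shallow"
    DEEP = "deep"


def classify_intent(query: str) -> Intent:
    """Stand-in for the LLM intent classifier (keyword rules, illustration only)."""
    text = query.lower()
    if any(phrase in text for phrase in ("who are you", "what can you do")):
        return Intent.META
    if any(word in text for word in ("comprehensive", "in-depth", "report")):
        return Intent.DEEP
    return Intent.SHALLOW


def shallow_research(query: str) -> dict:
    """Hypothetical bounded agent that reports whether its results were sufficient."""
    # Placeholder heuristic: pretend long questions cannot be answered quickly.
    return {"sufficient": len(query.split()) <= 12}


def route(query: str) -> str:
    """Return the path a query takes: meta, shallow, or clarifier followed by deep."""
    intent = classify_intent(query)
    if intent is Intent.META:
        return "meta"
    if intent is Intent.SHALLOW and shallow_research(query)["sufficient"]:
        return "shallow"
    # Deep intent, or shallow results judged insufficient: escalate via the clarifier.
    return "clarifier->deep"
```

In this sketch, escalation is just a fall-through: a shallow query whose results are judged insufficient takes the same clarifier path as a query classified as deep from the start.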
API and Backend
- Async Jobs API. New REST API (/v1/jobs/async/) for submitting, tracking, cancelling, and streaming research jobs. Supports custom job IDs, configurable expiry, and job artifact retrieval.
- SSE streaming with event replay. Real-time Server-Sent Events for all agent execution events (LLM tokens, tool calls, artifacts, citations). Full reconnection support with event replay from any point, with sub-10ms latency on PostgreSQL via LISTEN/NOTIFY.
- Dask-based distributed execution. Deep research jobs run on a Dask cluster with configurable workers and threads, background heartbeats, stale job reaping, and cooperative cancellation.
- PostgreSQL persistence. Job store, event store, LangGraph checkpoints, and document summaries all support PostgreSQL for production deployments. SQLite remains available for local development.
- Pluggable agent registration. Custom agents can be registered and exposed through the async jobs API without modifying core code.
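Event replay on reconnect can be pictured as an append-only log per job, read from the last event ID the client saw. The sketch below is an in-memory illustration, not the AI-Q event store: `Event`, `InMemoryEventStore`, and `to_sse` are hypothetical names, and payloads are assumed to be single-line strings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    event_id: int
    kind: str
    data: str


class InMemoryEventStore:
    """Append-only event log per job (a stand-in for a SQLite/PostgreSQL store)."""

    def __init__(self) -> None:
        self._events: Dict[str, List[Event]] = {}

    def append(self, job_id: str, kind: str, data: str) -> Event:
        log = self._events.setdefault(job_id, [])
        event = Event(event_id=len(log) + 1, kind=kind, data=data)
        log.append(event)
        return event

    def replay(self, job_id: str, last_event_id: int = 0) -> List[Event]:
        """Return every event the client has not yet seen."""
        return [e for e in self._events.get(job_id, []) if e.event_id > last_event_id]


def to_sse(event: Event) -> str:
    """Serialize one event in Server-Sent Events wire format (single-line data only)."""
    return f"id: {event.event_id}\nevent: {event.kind}\ndata: {event.data}\n\n"
```

A reconnecting client would send the last ID it received (SSE defines the `Last-Event-ID` request header for this), and the server would stream `replay(job_id, last_event_id)` before switching to live events.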
Knowledge Layer
- Pluggable knowledge retrieval. Backend-agnostic knowledge layer with a factory/registry pattern. Swap between LlamaIndex (local ChromaDB) and Foundational RAG (hosted NVIDIA RAG Blueprint) without changing agent code.
- Document ingestion pipeline. Async file upload with job tracking, status polling (UPLOADING → INGESTING → SUCCESS/FAILED), and collection management (create, delete, list, TTL cleanup).
- Multimodal extraction. LlamaIndex backend supports VLM-powered image captioning and chart data extraction from PDFs, making visual content searchable alongside text.
- Document summaries. Optional LLM-generated one-sentence summaries per document, injected into agent prompts so researchers understand available files before making tool calls.
- Session-based collections. Each browser session gets an isolated collection with automatic 24-hour TTL cleanup.
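A factory/registry pattern of the kind mentioned above might look like the following. It is a generic sketch, not the knowledge layer SDK: `Retriever`, `register_backend`, `create_retriever`, and the toy `in_memory` backend are all hypothetical.

```python
from typing import Callable, Dict, List, Protocol


class Retriever(Protocol):
    def search(self, query: str, top_k: int = 3) -> List[str]: ...


# Maps a backend name to a callable that builds a retriever.
_REGISTRY: Dict[str, Callable[..., Retriever]] = {}


def register_backend(name: str):
    """Class decorator that records a backend under a name."""
    def decorator(factory):
        _REGISTRY[name] = factory
        return factory
    return decorator


@register_backend("in_memory")
class InMemoryRetriever:
    """Toy backend: ranks documents by how many query terms they contain."""

    def __init__(self, documents: List[str]) -> None:
        self.documents = documents

    def search(self, query: str, top_k: int = 3) -> List[str]:
        terms = query.lower().split()
        scored = [(sum(t in d.lower() for t in terms), d) for d in self.documents]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [doc for _, doc in ranked[:top_k]]


def create_retriever(name: str, **kwargs) -> Retriever:
    """Factory: agent code asks for a backend by name and never imports it directly."""
    if name not in _REGISTRY:
        raise ValueError(f"Unknown backend {name!r}; registered: {sorted(_REGISTRY)}")
    return _REGISTRY[name](**kwargs)
```

Because callers only depend on `create_retriever` and the `search` method, swapping backends becomes a configuration change rather than a code change, which is the property the release notes describe.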
Citation Verification
- Deterministic citation verification pipeline. Every research response (shallow and deep) passes through post-processing that validates all citations against a source registry of actually-retrieved URLs using a five-level matching strategy (exact, truncation, prefix, child-path, query-subset). Includes report sanitization (shortened URLs, IP addresses, non-HTTP schemes) and a full audit trail of verification decisions.
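The release notes name the five matching levels but do not define them, so the sketch below uses my own assumed definitions: exact (equal after normalization), truncation (the cited URL is a cut-off copy of a source), prefix (the cited path is a parent of a source path), child-path (the cited path sits below a source path), and query-subset (same path, cited query parameters are a subset of the source's). The function names and the 20-character truncation guard are hypothetical.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import parse_qsl, urlsplit


def _norm(url: str) -> str:
    """Lowercase scheme/host, drop the fragment and any trailing slash."""
    p = urlsplit(url.strip())
    query = f"?{p.query}" if p.query else ""
    return f"{p.scheme.lower()}://{p.netloc.lower()}{p.path.rstrip('/')}{query}"


def _segments(url: str) -> Tuple[str, ...]:
    return tuple(s for s in urlsplit(url).path.split("/") if s)


def _same_host(a: str, b: str) -> bool:
    return urlsplit(a).netloc == urlsplit(b).netloc


def _is_leading_run(short: Tuple[str, ...], long: Tuple[str, ...]) -> bool:
    """True when `short` is a non-empty, strictly shorter leading run of `long`."""
    return 0 < len(short) < len(long) and long[: len(short)] == short


def _query_subset(cited: str, source: str) -> bool:
    same_page = _same_host(cited, source) and _segments(cited) == _segments(source)
    cited_q = set(parse_qsl(urlsplit(cited).query))
    source_q = set(parse_qsl(urlsplit(source).query))
    return same_page and cited_q <= source_q


def match_citation(cited: str, sources: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (level, matched_source) for the strictest level that matches, else None."""
    c = _norm(cited)
    registry = [_norm(s) for s in sources]
    levels = [
        ("exact", lambda s: s == c),
        # Assumed guard: very short fragments are too ambiguous to count as truncations.
        ("truncation", lambda s: len(c) >= 20 and s.startswith(c)),
        ("prefix", lambda s: _same_host(s, c) and _is_leading_run(_segments(c), _segments(s))),
        ("child-path", lambda s: _same_host(s, c) and _is_leading_run(_segments(s), _segments(c))),
        ("query-subset", lambda s: _query_subset(c, s)),
    ]
    for level, check in levels:
        for source in registry:
            if check(source):
                return level, source
    return None
```

Trying each level across the whole registry before moving to the next, looser one keeps the result deterministic, and returning the level name alongside the matched source gives an audit trail something concrete to record.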
Frontend
- New Next.js web UI. Complete rewrite as a modern Next.js application with conversational chat interface, document upload, collection management, and real-time research progress visualization.
- Optional OAuth authentication. OIDC-based authentication support with configurable providers and a REQUIRE_AUTH toggle.
- Configurable file upload. Accepted file types, max file size, and max file count controllable via environment variables.
Observability
- Multi-backend tracing. Built-in support for Phoenix (local trace visualization), LangSmith (LLM evaluation and prompt optimization), Weights & Biases Weave (experiment tracking with PII redaction), and a production-grade OpenTelemetry Collector exporter with configurable privacy redaction — all configurable through NAT YAML config or environment variables.
Evaluation
- FreshQA benchmark. Built-in factuality evaluation on time-sensitive questions for measuring shallow researcher accuracy, runnable via the NAT evaluation harness (nat eval). Deep research benchmark reproduction is available on the dedicated drb1 and drb2 branches.
Deployment
- Docker Compose stack. Production-ready three-service stack (backend, frontend, PostgreSQL) with multi-stage Dockerfile, dev/release build targets, and distroless runtime images running as non-root (UID 1000).
- Helm chart for Kubernetes. Full Helm deployment with NGC registry support, Kubernetes secrets management, configurable resource limits, health checks, and Foundational RAG integration via internal service DNS.
- Horizontal scaling. Stateless backend supports scaling behind a load balancer with shared PostgreSQL and optional external Dask scheduler.
NAT-Powered Configuration
- Native NeMo Agent Toolkit integration. AI-Q is a direct implementation of the NVIDIA NeMo Agent Toolkit: all agents, tools, LLMs, routing behavior, and observability are defined through NAT's YAML configuration system with environment variable substitution (${VAR:-default}), plugin registration, and the nat run, nat serve, and nat eval CLI commands.
- Per-role LLM assignment. Assign different models to the orchestrator, planner, researcher, and intent classifier roles independently.
- Four pre-built configs. CLI default, Web + LlamaIndex, Web + Foundational RAG, and Hybrid Frontier Model (GPT-5.2 orchestrator with open-source researchers).
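The `${VAR:-default}` form mirrors POSIX shell parameter expansion. The sketch below shows how such substitution could work on a config string. It is not NAT's implementation: `substitute` is a hypothetical helper, the config keys in the usage lines are made up, and treating an empty value like an unset one follows shell semantics, which NAT may or may not share.

```python
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}; the default may be empty.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def substitute(text: str, env: Mapping[str, str]) -> str:
    """Expand ${VAR} and ${VAR:-default} references using `env`."""

    def replace(match: "re.Match[str]") -> str:
        name, default = match.group("name"), match.group("default")
        if default is None:
            # Plain ${VAR}: fail loudly instead of silently inserting nothing.
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        # ${VAR:-default}: fall back when the variable is unset or empty.
        return env.get(name) or default

    return _PATTERN.sub(replace, text)
```

For example, with hypothetical keys:

```python
import os

line = "base_url: ${LLM_BASE_URL:-http://localhost:8000/v1}"
print(substitute(line, os.environ))
```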
Models
- Default models. NVIDIA Nemotron 3 Nano 30B (agents, intent classifier), GPT-OSS 120B (deep research orchestrator/planner), Nemotron Mini 4B (document summaries), Llama Nemotron Embed VL 1B v2 (embeddings), Nemotron Nano 12B v2 VL (multimodal extraction).
- Frontier model support. Optional config for GPT-5.2 as orchestrator/planner with open-source researchers.
- Nemotron Super compatibility. Tested with Nemotron 3 Super 120B; temporarily commented out in default configs due to Build API availability constraints.
Developer Experience
- uv workspace monorepo. uv sync installs everything; individual packages installable with uv pip install -e.
- Jupyter notebook series. Three-part tutorial: Getting Started, Deep Researcher deep dive, and Customization guide.
- Debug console. Built-in debug UI at /debug with real-time SSE visualization, job tracking, and state inspection.
- Comprehensive documentation. Architecture docs, API reference, customization guides, knowledge layer SDK reference, and deployment guides for Docker Compose and Kubernetes.
Breaking Changes from v1.x
- Complete architecture rewrite — v1.x configs and workflows are not compatible.
- The demo web application from v1.x has been replaced by the new Next.js frontend.
- PDF processing is now handled through the knowledge layer rather than direct RAG integration.
- The v1.x single-agent deep researcher has been replaced by the multi-agent orchestrated workflow.
Dependencies
- Pinned to NeMo Agent Toolkit (NAT) v1.4.0. NAT v1.5 or later is not yet supported.
- Python 3.11–3.13 supported.
- Node.js 22+ required for the frontend.