
2.0.0


AjayThorve released this on 18 Mar at 15:16
62101c8

Release v2.0.0

Overview

The AI-Q NVIDIA Blueprint is an open reference example for building intelligent AI agents that connect to your enterprise data, reason using state-of-the-art models, and deliver trusted business insights. AI-Q v2.0.0 is a ground-up rewrite of the blueprint. The v1.x line provided a single deep research agent with PDF upload and a demo web application. v2.0.0 introduces a two-tier multi-agent architecture built on the NVIDIA NeMo Agent Toolkit (NAT), a new Next.js frontend, async job infrastructure, a pluggable knowledge layer, and built-in evaluation.

AI-Q holds top positions on both the DeepResearch Bench and DeepResearch Bench II leaderboards. To reproduce those results, use the drb1 and drb2 branches, respectively.

Architecture

  • Two-tier research routing. A single-call Intent Classifier routes each query to one of three paths: an instant meta response, fast shallow research, or comprehensive deep research. Simple queries therefore skip the latency of the deep research pipeline.
  • LangGraph state machine orchestrator. The core workflow is a LangGraph StateGraph with explicit, testable routing and conversation checkpointing (in-memory, SQLite, or PostgreSQL).
  • Shallow Researcher agent. A new bounded tool-calling agent optimized for speed, with configurable tool-call budgets, context compaction, and a synthesis anchor that forces a citation-backed answer once the budget is exhausted.
  • Deep Researcher agent. Rebuilt using the deepagents library with a three-role subagent architecture (Orchestrator, Planner, Researcher). Supports configurable research loop iterations, per-role LLM assignment, and structured multi-phase workflows: planning, iterative research, citation management, and final report generation.
  • Clarifier agent with HITL. New human-in-the-loop agent that gathers clarifications, generates structured research plans, and supports plan approval/rejection/feedback before deep research begins. Fully configurable and can be disabled.
  • Shallow-to-deep escalation. The shallow researcher can automatically escalate to deep research when it detects insufficient results, routing through the clarifier for plan approval.
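The routing and escalation flow described above can be illustrated with a small, self-contained Python sketch. It is not AI-Q's implementation: the keyword-based `classify()` stands in for the single-call LLM Intent Classifier, and `search`, `shallow_research`, `budget`, and `min_sources` are hypothetical names chosen for illustration. The sketch only shows the control flow: route the query, run a shallow loop bounded by a tool-call budget, always synthesize from what was found, and escalate to deep research when the evidence is insufficient.

```python
"""Sketch of two-tier routing with a budget-bounded shallow researcher.

Assumptions: classify() is a keyword stand-in for the LLM Intent Classifier,
and search() is a hypothetical tool returning source URLs.
"""
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable

Search = Callable[[str], list[str]]


class Route(Enum):
    META = "meta"        # instant answer about the assistant itself
    SHALLOW = "shallow"  # fast, budget-bounded research
    DEEP = "deep"        # comprehensive multi-agent research


def classify(query: str) -> Route:
    q = query.lower()
    if q.startswith(("who are you", "what can you do")):
        return Route.META
    if any(word in q for word in ("comprehensive", "report", "in-depth")):
        return Route.DEEP
    return Route.SHALLOW


@dataclass
class ShallowResult:
    answer: str
    tool_calls: int
    escalate: bool
    sources: list[str] = field(default_factory=list)


def shallow_research(query: str, search: Search, budget: int = 3,
                     min_sources: int = 2) -> ShallowResult:
    sources: list[str] = []
    calls = 0
    # Each loop iteration spends one tool call from the budget.
    while calls < budget and len(sources) < min_sources:
        sources.extend(search(query))
        calls += 1
    # Synthesis anchor: always answer from whatever was gathered.
    answer = f"Answer to {query!r} citing {len(sources)} source(s)"
    # Too little evidence within the budget means deep research is needed.
    return ShallowResult(answer, calls, len(sources) < min_sources, sources)


def run(query: str, search: Search) -> str:
    route = classify(query)
    if route is Route.META:
        return "meta"
    if route is Route.SHALLOW:
        result = shallow_research(query, search)
        if not result.escalate:
            return "shallow"
        # Escalation falls through; AI-Q routes it via the clarifier first.
    return "deep"
```

In AI-Q itself these decisions are made by LLM calls inside a LangGraph StateGraph, and escalation passes through the clarifier for plan approval; the sketch mirrors only the branching logic.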

API and Backend

  • Async Jobs API. New REST API (/v1/jobs/async/) for submitting, tracking, cancelling, and streaming research jobs. Supports custom job IDs, configurable expiry, and job artifact retrieval.
  • SSE streaming with event replay. Real-time Server-Sent Events for all agent execution events (LLM tokens, tool calls, artifacts, citations). Clients can reconnect and replay events from any point, and on PostgreSQL, LISTEN/NOTIFY delivers events with sub-10 ms latency.
  • Dask-based distributed execution. Deep research jobs run on a Dask cluster with configurable workers and threads, background heartbeats, stale job reaping, and cooperative cancellation.
  • PostgreSQL persistence. Job store, event store, LangGraph checkpoints, and document summaries all support PostgreSQL for production deployments. SQLite remains available for local development.
  • Pluggable agent registration. Custom agents can be registered and exposed through the async jobs API without modifying core code.
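Event replay after a reconnect can be sketched as an append-only, per-job event log in which each event carries a monotonically increasing ID. A reconnecting client reports the ID of the last event it saw (the standard SSE `Last-Event-ID` header) and receives every later event as an SSE frame. The `EventStore` class and the event type names are hypothetical; AI-Q persists events in PostgreSQL and uses LISTEN/NOTIFY for live delivery, which this in-memory sketch does not model.

```python
"""Sketch of per-job SSE event replay (in-memory; not AI-Q's code)."""
import json
from dataclasses import dataclass, field


@dataclass
class EventStore:
    # job_id -> ordered list of (event_id, event_type, payload)
    events: dict[str, list[tuple[int, str, dict]]] = field(default_factory=dict)

    def append(self, job_id: str, event_type: str, payload: dict) -> int:
        log = self.events.setdefault(job_id, [])
        event_id = len(log) + 1  # monotonically increasing per job
        log.append((event_id, event_type, payload))
        return event_id

    def replay(self, job_id: str, last_event_id: int = 0) -> list[str]:
        """Return SSE frames for every event after last_event_id."""
        frames = []
        for event_id, event_type, payload in self.events.get(job_id, []):
            if event_id > last_event_id:
                frames.append(
                    f"id: {event_id}\nevent: {event_type}\n"
                    f"data: {json.dumps(payload)}\n\n"
                )
        return frames
```

For example, a client that disconnected after event 1 would call `store.replay("job-1", last_event_id=1)` and receive only the frames for later events, so nothing is lost or duplicated across the reconnect.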

Knowledge Layer

  • Pluggable knowledge retrieval. Backend-agnostic knowledge layer with a factory/registry pattern. Swap between LlamaIndex (local ChromaDB) and Foundational RAG (hosted NVIDIA RAG Blueprint) without changing agent code.
  • Document ingestion pipeline. Async file upload with job tracking, status polling (UPLOADING → INGESTING → SUCCESS/FAILED), and collection management (create, delete, list, TTL cleanup).
  • Multimodal extraction. LlamaIndex backend supports VLM-powered image captioning and chart data extraction from PDFs, making visual content searchable alongside text.
  • Document summaries. Optional LLM-generated one-sentence summaries per document, injected into agent prompts so researchers understand available files before making tool calls.
  • Session-based collections. Each browser session gets an isolated collection with automatic 24-hour TTL cleanup.
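The factory/registry pattern and the ingestion status flow can be sketched together in Python. The names `register_backend`, `get_retriever`, `advance`, the backend keys `"llamaindex"` and `"foundational_rag"`, and the toy backends are all hypothetical and are not the AI-Q knowledge layer SDK. Agents depend only on the `retrieve()` interface, so switching backends becomes a configuration change, and the status check permits only the transitions listed above (UPLOADING → INGESTING → SUCCESS or FAILED).

```python
"""Sketch of a knowledge-backend registry plus ingestion status checks.

Hypothetical names throughout: this is not the AI-Q knowledge layer SDK.
"""
from enum import Enum
from typing import Callable, Protocol


class Retriever(Protocol):
    def retrieve(self, query: str, collection: str) -> list[str]: ...


_REGISTRY: dict[str, Callable[[], Retriever]] = {}


def register_backend(name: str):
    """Class decorator that records a backend factory under a config key."""
    def wrap(cls):
        _REGISTRY[name] = cls
        return cls
    return wrap


def get_retriever(name: str) -> Retriever:
    """Build the backend selected by configuration; agents see only Retriever."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown knowledge backend: {name}")
    return _REGISTRY[name]()


@register_backend("llamaindex")
class LocalBackend:
    # Toy stand-in for a local ChromaDB-backed index.
    def retrieve(self, query: str, collection: str) -> list[str]:
        return [f"chroma://{collection}/{query}"]


@register_backend("foundational_rag")
class HostedBackend:
    # Toy stand-in for a hosted RAG service.
    def retrieve(self, query: str, collection: str) -> list[str]:
        return [f"rag://{collection}/{query}"]


class IngestStatus(Enum):
    UPLOADING = "UPLOADING"
    INGESTING = "INGESTING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


# Only the transitions listed in the release notes; SUCCESS/FAILED are terminal.
_NEXT = {
    IngestStatus.UPLOADING: {IngestStatus.INGESTING},
    IngestStatus.INGESTING: {IngestStatus.SUCCESS, IngestStatus.FAILED},
}


def advance(current: IngestStatus, new: IngestStatus) -> IngestStatus:
    if new not in _NEXT.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```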

Citation Verification

  • Deterministic citation verification pipeline. Every research response (shallow and deep) passes through post-processing that validates all citations against a source registry of actually retrieved URLs, using a five-level matching strategy (exact, truncation, prefix, child-path, query-subset). The pipeline also sanitizes reports (shortened URLs, IP addresses, non-HTTP schemes) and records a full audit trail of verification decisions.

Frontend

  • New Next.js web UI. Complete rewrite as a modern Next.js application with a conversational chat interface, document upload, collection management, and real-time research progress visualization.
  • Optional OAuth authentication. OIDC-based authentication support with configurable providers and a REQUIRE_AUTH toggle.
  • Configurable file upload. Accepted file types, max file size, and max file count controllable via environment variables.

Observability

  • Multi-backend tracing. Built-in support for Phoenix (local trace visualization), LangSmith (LLM evaluation and prompt optimization), Weights & Biases Weave (experiment tracking with PII redaction), and a production-grade OpenTelemetry Collector exporter with configurable privacy redaction. All of these are configurable through NAT YAML config or environment variables.
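As a rough illustration of privacy redaction before export, the sketch below scrubs span attributes: keys named in a configured set are replaced wholesale, and email addresses inside string values are masked. The function name, the placeholder string, the attribute keys, and the email pattern are assumptions for illustration; they do not describe how the NAT exporters or Weave actually implement redaction.

```python
"""Sketch of attribute redaction applied to trace spans before export.

Assumptions: function name, placeholder, and email pattern are illustrative.
"""
import re
from typing import Any

# Simple email pattern; real PII detection needs more than one regex.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(attributes: dict[str, Any], redact_keys: set[str],
           placeholder: str = "[REDACTED]") -> dict[str, Any]:
    """Return a copy with sensitive keys replaced and emails masked."""
    out: dict[str, Any] = {}
    for key, value in attributes.items():
        if key in redact_keys:
            out[key] = placeholder  # drop the whole value
        elif isinstance(value, str):
            out[key] = EMAIL.sub(placeholder, value)  # mask emails only
        else:
            out[key] = value  # non-string values pass through unchanged
    return out
```

Returning a copy rather than mutating the span in place keeps the unredacted data available to local debugging tools while only the exported copy is scrubbed.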

Evaluation

  • FreshQA benchmark. Built-in factuality evaluation on time-sensitive questions for measuring shallow researcher accuracy, runnable via the NAT evaluation harness (nat eval). Deep research benchmark reproduction is available on the dedicated drb1 and drb2 branches.

Deployment

  • Docker Compose stack. Production-ready three-service stack (backend, frontend, PostgreSQL) with multi-stage Dockerfile, dev/release build targets, and distroless runtime images running as non-root (UID 1000).
  • Helm chart for Kubernetes. Full Helm deployment with NGC registry support, Kubernetes secrets management, configurable resource limits, health checks, and Foundational RAG integration via internal service DNS.
  • Horizontal scaling. Stateless backend supports scaling behind a load balancer with shared PostgreSQL and optional external Dask scheduler.

NAT-Powered Configuration

  • Native NeMo Agent Toolkit integration. AI-Q is built directly on the NVIDIA NeMo Agent Toolkit: all agents, tools, LLMs, routing behavior, and observability are defined through NAT's YAML configuration system, with environment variable substitution (${VAR:-default}), plugin registration, and the nat run, nat serve, and nat eval CLI commands.
  • Per-role LLM assignment. Assign different models to the orchestrator, planner, researcher, and intent classifier roles independently.
  • Four pre-built configs. CLI default, Web + LlamaIndex, Web + Foundational RAG, and Hybrid Frontier Model (GPT-5.2 orchestrator with open-source researchers).
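The `${VAR:-default}` syntax follows the shell convention: the default applies when the variable is unset or empty. A minimal Python sketch of that substitution is below. It is not NAT's implementation; in particular, raising an error for a plain `${VAR}` reference whose variable is unset is an assumption made for this example.

```python
"""Sketch of shell-style ${VAR:-default} substitution for config text.

Not NAT's implementation; raising on an unset variable without a default
is an assumption made for this example.
"""
import os
import re
from typing import Mapping, Optional

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if default is None:  # plain ${VAR}
            if value is None:
                raise KeyError(f"{name} is not set and has no default")
            return value
        # ${VAR:-default}: fall back when the variable is unset or empty.
        return value if value else default

    return _VAR.sub(replace, text)
```

For example, with a hypothetical variable name, `substitute("model: ${LLM_MODEL:-example-model}", {})` returns `"model: example-model"`.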

Models

  • Default models. NVIDIA Nemotron 3 Nano 30B (agents, intent classifier), GPT-OSS 120B (deep research orchestrator/planner), Nemotron Mini 4B (document summaries), Llama Nemotron Embed VL 1B v2 (embeddings), Nemotron Nano 12B v2 VL (multimodal extraction).
  • Frontier model support. Optional config for GPT-5.2 as orchestrator/planner with open-source researchers.
  • Nemotron Super compatibility. Tested with Nemotron 3 Super 120B; temporarily commented out in default configs due to Build API availability constraints.

Developer Experience

  • uv workspace monorepo. uv sync installs everything; individual packages installable with uv pip install -e.
  • Jupyter notebook series. Three-part tutorial: Getting Started, Deep Researcher deep dive, and Customization guide.
  • Debug console. Built-in debug UI at /debug with real-time SSE visualization, job tracking, and state inspection.
  • Comprehensive documentation. Architecture docs, API reference, customization guides, knowledge layer SDK reference, and deployment guides for Docker Compose and Kubernetes.

Breaking Changes from v1.x

  • Complete architecture rewrite: v1.x configs and workflows are not compatible with v2.0.0.
  • The demo web application from v1.x has been replaced by the new Next.js frontend.
  • PDF processing is now handled through the knowledge layer rather than direct RAG integration.
  • The v1.x single-agent deep researcher has been replaced by the multi-agent orchestrated workflow.

Dependencies

  • Pinned to NeMo Agent Toolkit (NAT) v1.4.0. NAT v1.5 or later is not yet supported.
  • Python 3.11–3.13 supported.
  • Node.js 22+ required for the frontend.