Skip to content

Introduce vanna text2sql#974

Merged
rapids-bot[bot] merged 39 commits intoNVIDIA:developfrom
jiaxiangr:feat-vanna-text2sql
Nov 18, 2025
Merged

Introduce vanna text2sql#974
rapids-bot[bot] merged 39 commits intoNVIDIA:developfrom
jiaxiangr:feat-vanna-text2sql

Conversation

@jiaxiangr
Copy link
Contributor

@jiaxiangr jiaxiangr commented Oct 10, 2025

Description

This PR introduces nvidia-nat-vanna, a new NAT plugin that provides production-ready text-to-SQL capabilities using the Vanna framework with NVIDIA NIM integration.

Key Features

  • Text-to-SQL Generation: Convert natural language questions to SQL queries using AI
  • MCP Support: native supported with NAT MCP server and client
  • Multi-Database Compatibility: Works out of the box with Databricks; other databases (e.g., PostgreSQL, MS SQL Server, SQLite, etc.) require connector definitions.
    • Fully automatic training and ingestion. Currently support: Databricks
  • Vector-Based RAG: Milvus integration for few-shot learning with similarity search
  • Reasoning LLMs support: Nemotron, OAI OSS, DeepSeek
  • Streaming Support: Real-time progress updates during SQL generation
    • Full integration with NAT's intermediate step tracking system
    • Better UI Display - Front-ends can render intermediate steps
  • Enterprise-Ready: Configurable authentication, error handling, and result formatting
  • NAT-Native: Follows NAT package conventions with proper function registration

What's Included

  • Two NAT functions: text2sql (SQL generation) and execute_db_query (query execution)
  • Reusable package: packages/nvidia_nat_vanna/
  • Comprehensive documentation with quick start guide and example configuration: text2sql_config.yml
    • Todo: move to examples folder as quickstart

Closes #732

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • Text-to-SQL Vanna integration with streaming progress, optional Databricks query execution, and an execute-DB-query function; packaged for installation.
  • Improvements

    • Async and sync Milvus retriever support and enhanced LLM + vector-store training, retrieval, singleton lifecycle, and configurable model options.
  • Bug Fixes

    • Broadened sandbox output parsing to handle additional parse/type failures.
  • Documentation

    • Guides, quick-starts, and configuration examples for Text-to-SQL and Vanna.
  • Tests

    • New tests for Vanna DB utilities and async Milvus retriever paths.
  • Chores

    • Packaging, workspace registration, config YAMLs, dependency updates, CI allowlist and vocabulary updates.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Oct 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link

coderabbitai bot commented Oct 10, 2025

Walkthrough

Adds a new nvidia-nat-vanna package providing Text2SQL and Execute DB Query functions, Vanna LLM + Milvus integration, Databricks DB utilities, async-capable Milvus retriever support, training artifacts, docs, tests, YAML configs, CI vocabulary update, and workspace/pyproject wiring. (50 words)

Changes

Cohort / File(s) Summary
Package metadata & workspace wiring
packages/nvidia_nat_vanna/pyproject.toml, packages/nvidia_nat_all/pyproject.toml, pyproject.toml
Add new package pyproject for nvidia-nat-vanna (setuptools + setuptools-scm), project metadata, entry point nat.components -> nat.plugins.vanna.register, uv workspace sources, optional vanna extra, and bump pymilvus / workspace wiring.
Package init & registration
packages/nvidia_nat_vanna/src/nat/plugins/vanna/__init__.py, packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
Add package init and a register module that imports plugin modules to enable automatic plugin registration.
Vanna core & training artifacts
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py, packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
New Vanna integration: JSON/think-tag helpers, LLM adapter, prompt builders, Milvus-backed async-capable vector store, VannaLangChain orchestration, singleton lifecycle, training driver, and static training constants/examples.
DB utils & tests
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py, packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py
Add Databricks connection helpers, sync/async query execution, QueryResult model (to_dataframe/to_records/row_count), SQL extraction helpers, Vanna run_sql binding, and tests covering parsing, connections, execution, and async wiring.
Functions & registration
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py, packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
New registered functions text2sql and execute_db_query with config models, streaming and non-streaming flows, orchestration logic, observability, error handling, and LangChain wrapper integration.
Function configs (YAML)
packages/nvidia_nat_vanna/text2sql_config.yml, packages/nvidia_nat_vanna/text2sql_training_config.yml
Add inference and training YAML configs wiring LLMs, embedders, Milvus retriever, Databricks connection, training toggles, and workflow settings.
Docs & guides
packages/nvidia_nat_vanna/src/nat/meta/pypi.md, docs/source/workflows/functions/text-to-sql.md, docs/source/workflows/functions/index.md, docs/source/quick-start/installing.md
Add package pypi metadata, comprehensive Text-to-SQL documentation, update functions index, and mention nvidia-nat[vanna] in install docs.
Milvus retriever: async support
src/nat/retriever/milvus/register.py, src/nat/retriever/milvus/retriever.py
Add use_async_client: bool config; retriever now accepts sync or async Milvus clients, detects async capability at runtime, and implements dual async/sync logic for validation, embedding, search, and iterator handling.
Milvus retriever tests & mocks
tests/nat/retriever/test_retrievers.py
Add async embedding method, CustomAsyncMilvusClient mock, async retriever fixture, and async tests mirroring sync coverage to validate async paths.
Function training & schema
packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py, packages/nvidia_nat_vanna/text2sql_training_config.yml
Add static training DDL, docs, examples, response guidelines, training prompt, and a training-focused YAML config.
CI & lint vocabulary
ci/scripts/path_checks.py, ci/vale/styles/config/vocabularies/nat/accept.txt
Add allowlist entry "deepseek-ai/deepseek-.*" and add "Vanna" to accepted vocabulary.
Misc: sandbox parsing tweak
src/nat/tool/code_execution/code_sandbox.py
Broaden LocalSandbox._parse_request_output exception handling to also catch AssertionError alongside JSON decoding errors.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant NAT as NAT Agent
    participant T2S as text2sql Function
    participant V as Vanna Singleton
    participant M as Milvus VectorStore
    participant L as LLM
    participant DB as Databricks

    User->>NAT: Natural language query
    NAT->>T2S: Invoke text2sql
    T2S->>V: Get or create instance
    V->>M: Ensure collections / retrieve context
    M-->>T2S: Related questions, DDL, docs
    T2S->>L: Generate SQL with context
    L-->>T2S: SQL + explanation
    T2S-->>NAT: Text2SQLOutput (sql, explanation)
    alt execute_sql == true
        NAT->>T2S: Invoke execute_db_query
        T2S->>V: use run_sql on Vanna (async)
        V->>DB: run_sql executes query
        DB-->>T2S: QueryResult (rows, columns)
        T2S-->>User: Final result with rows
    else
        NAT-->>User: Final result (sql only)
    end
Loading
sequenceDiagram
    participant App as Application
    participant Client as Milvus Client (Async/Sync)
    participant Ret as MilvusRetriever

    App->>Ret: Initialize with client
    Ret->>Ret: Detect async capability -> set _is_async
    App->>Ret: Call search(...)
    alt Async mode
        Ret->>Client: await aembed_query(...)
        Ret->>Client: await .search(...)
    else Sync mode
        Ret->>Client: embed_query(...)
        Ret->>Client: .search(...)
    end
    Client-->>Ret: Results
    Ret-->>App: RetrieverOutput
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Areas to focus during review:

  • Milvus retriever dual-mode logic: async detection, awaited vs sync calls, iterator lifecycle, schema/collection handling, and error paths.
  • VannaSingleton concurrency, resource lifecycle, and correct reuse/cleanup of LLM/embedder/async Milvus client.
  • Streaming AsyncGenerator implementations in text2sql and execute_db_query: yields, cancellation, typing, and error propagation.
  • DB connection and query execution (sync vs async), QueryResult -> DataFrame/records conversion, and secret serialization handling.
  • Tests and mocks: ensure async mocks accurately reflect production client behavior and cover both sync/async branches.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Introduce vanna text2sql' is concise, descriptive, uses imperative mood, and directly reflects the main feature addition—a new text-to-SQL capability using the Vanna framework.
Linked Issues check ✅ Passed The PR successfully implements all core objectives from issue #732: contributes a NAT-compatible text-to-SQL function with Vanna framework, integrates NVIDIA NIM (ChatNVIDIA, NVIDIAEmbeddings), implements Milvus-based vector retrieval, provides streaming/non-streaming endpoints, supports Databricks with training pipelines, and ensures proper NAT function registration.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the Vanna text-to-SQL feature scope. Minor ecosystem updates (pymilvus version bump, Milvus async client support, package references in nvidia_nat_all) are necessary dependencies; documentation updates and allowlist entries support the feature rollout appropriately.
Docstring Coverage ✅ Passed Docstring coverage is 80.90% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@AnuradhaKaruppiah
Copy link
Contributor

Love this new feat!

@jiaxiangr
Copy link
Contributor Author

@mpenn , could you please help review? Aaditya and I will be making two commits soon, and we'll mark the PR as ready shortly.

@jiaxiangr jiaxiangr force-pushed the feat-vanna-text2sql branch from fa6d91e to 5580062 Compare October 11, 2025 00:03
@mdemoret-nv mdemoret-nv added feature request New feature or request non-breaking Non-breaking change labels Oct 17, 2025
@jiaxiangr jiaxiangr force-pushed the feat-vanna-text2sql branch 2 times, most recently from 7e06364 to c10e3f5 Compare October 20, 2025 23:28
@jiaxiangr jiaxiangr requested a review from mpenn October 22, 2025 16:40
@jiaxiangr jiaxiangr force-pushed the feat-vanna-text2sql branch from 16ccd15 to 006f63e Compare October 28, 2025 22:51
@jiaxiangr jiaxiangr marked this pull request as ready for review October 28, 2025 23:04
@jiaxiangr jiaxiangr requested a review from a team as a code owner October 28, 2025 23:04
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

🧹 Nitpick comments (20)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py (1)

16-70: Consider adding a module docstring.

While not strictly required for data-only modules, a module-level docstring would improve discoverability and help developers understand the purpose of these training data constants.

Consider adding a module docstring after the license header:

 # limitations under the License.
 
+"""Static training data and prompts for VANNA text-to-SQL integration.
+
+This module provides example DDL statements, documentation, question-SQL pairs,
+and prompt templates used for training and configuring the VANNA framework
+with NVIDIA NIM integration.
+"""
+
 # yapf: disable
-# ruff: noqa: E501
packages/nvidia_nat_vanna/README.md (3)

160-177: Add language identifier to fenced code block.

Markdown best practices require language identifiers for fenced code blocks. Add "text" or "plaintext" for the expected output example.

Apply this diff:

 Expected output:
-```
+```text
 # Ingest DDL and synthesize query-SQL pairs for training

409-423: Add language identifiers to fenced code blocks.

Lines 409 and 417 have fenced code blocks without language identifiers. Add "text" for error message examples.

Apply this diff:

 **Milvus connection failed:**
-```
+```text
 Error: Failed to connect to Milvus
  • Verify Milvus is running: docker ps | grep milvus
  • Check host/port configuration
  • Verify TLS settings match your Milvus deployment

Database connection failed:
- +text
Error: Failed to connect to database


447-461: Add language identifier to fenced code block.

Line 447 starts a fenced code block for the package structure without a language identifier. Add "text" for the directory tree.

Apply this diff:

 ## Package Structure

-```
+```text
 nvidia_nat_vanna/
packages/nvidia_nat_vanna/src/nat/meta/pypi.md (1)

95-95: Convert bare URL to markdown link.

Line 95 uses a bare URL which should be in markdown link format per markdown best practices.

Apply this diff:

-Full documentation: https://nvidia.github.io/NeMo-Agent-Toolkit/
+Full documentation: [https://nvidia.github.io/NeMo-Agent-Toolkit/](https://nvidia.github.io/NeMo-Agent-Toolkit/)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (4)

130-137: Avoid reliance on retriever private attrs; add defensive checks.

_accessing retriever._is_async/_client is brittle. Validate with getattr and fail gracefully.

-        if not retriever._is_async:  # type: ignore[attr-defined]
+        if not getattr(retriever, "_is_async", False):  # robust check
             msg = (f"Milvus retriever '{config.milvus_retriever}' must be configured with "
                    "use_async_client=true for Vanna text2sql function")
             raise ValueError(msg)
         
-        async_milvus_client = retriever._client  # type: ignore[attr-defined]
+        async_milvus_client = getattr(retriever, "_client", None)
+        if async_milvus_client is None:
+            raise ValueError("Milvus retriever did not expose an async client via _client")

Please confirm the retriever interface guarantees these attributes; if there’s a public accessor, prefer it. Based on learnings.


16-19: Add a concise module docstring.

Required by guidelines for public modules.

 import logging
 import uuid
 from collections.abc import AsyncGenerator
 
+"""Text2SQL function using Vanna + Milvus with Databricks execution and NAT streaming."""

220-229: Keep logger.error when re-raising; silence ruff TRY400 locally.

Per guidelines, use logger.error when re-raising. Add noqa to satisfy ruff.

-        except Exception as e:
-            logger.error(f"SQL generation failed: {e}")
+        except Exception as e:
+            logger.error(f"SQL generation failed: {e}")  # noqa: TRY400 (re-raising below per logging policy)

112-154: Singleton reuse across differing configs can cause drift.

Reusing VannaSingleton without comparing llm/embedder/milvus/dialect may leak previous config. Consider hashing relevant config and resetting when drift detected.

Would you like a helper that tracks a config_hash on the instance and resets when changed?

packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (4)

255-261: Use asyncio.to_thread for executor offload; add return type hint.

Cleaner and future-proof vs get_event_loop/run_in_executor.

-    import asyncio
-
-    # Run synchronous query in executor
-    loop = asyncio.get_event_loop()
-    query_result = await loop.run_in_executor(None, execute_query, connection, query, catalog, schema, database_type)
-
-    return query_result.to_dataframe()
+    import asyncio
+    from typing import Any
+    query_result = await asyncio.to_thread(execute_query, connection, query, catalog, schema, database_type)
+    return query_result.to_dataframe()

Also annotate the signature:

-async def async_execute_query(
+async def async_execute_query(
     connection: Any,
     query: str,
     catalog: str | None = None,
     schema: str | None = None,
     database_type: str | None = None,
-) :
+) -> Any:

68-77: Avoid re-importing BaseModel inside function.

Use the module-level import.

-    from pydantic import BaseModel
-
     # Handle BaseModel objects (e.g., Text2SQLOutput)
     if isinstance(sql_query, BaseModel):

153-161: Mark unused parameters to satisfy ruff (Databricks-only path).

These are retained for future DBs; silence ARG001 cleanly.

 def connect_to_database(
     database_type: str,
     host: str | None = None,
-    port: int | None = None,
-    database: str | None = None,
-    username: str | None = None,
+    port: int | None = None,      # noqa: ARG001
+    database: str | None = None,  # noqa: ARG001
+    username: str | None = None,  # noqa: ARG001
     password: str | None = None,
     **kwargs,
 ) -> Any:

Similarly in setup_vanna_db_connection for port/database/username.


144-151: Logging policy vs ruff TRY400.

These blocks log and re-raise; keep logger.error per guidelines. Add inline noqa to appease ruff.

-    except ImportError:
-        logger.error("databricks-sql-connector not installed")
+    except ImportError:
+        logger.error("databricks-sql-connector not installed")  # noqa: TRY400
         raise
-    except Exception as e:
-        logger.error(f"Failed to connect to Databricks: {e}")
+    except Exception as e:
+        logger.error(f"Failed to connect to Databricks: {e}")  # noqa: TRY400
         raise
@@
-    except Exception as e:
-        logger.error(f"Error executing query: {e}")
+    except Exception as e:
+        logger.error(f"Error executing query: {e}")  # noqa: TRY400
         raise
@@
-        except Exception as e:
-            logger.error(f"Error executing SQL: {e}")
+        except Exception as e:
+            logger.error(f"Error executing SQL: {e}")  # noqa: TRY400
             raise

Also applies to: 231-233, 318-319

packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (2)

16-23: Add a concise module docstring.

 import logging
 import uuid
 from collections.abc import AsyncGenerator
 from typing import Any
 
+"""Execute a SQL query (Databricks) with streaming progress and structured output for NAT."""

216-223: Use logger.exception when not re-raising.

We swallow the error and yield a failure result; capture stack trace.

-        except Exception as e:
-            logger.error(f"Error executing SQL query: {e}")
+        except Exception as e:
+            logger.exception(f"Error executing SQL query: {e}")
             yield ExecuteDBQueryOutput(
                 success=False,
                 failure_reason=str(e),
                 sql_query=sql_query,
                 dataframe_info=DataFrameInfo(shape=[0, 0], dtypes={}, columns=[]),
             )
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (5)

141-144: Replace en dash with hyphen to satisfy RUF001.

-                          "Please generate diverse question–SQL pairs where each SQL "
+                          "Please generate diverse question-SQL pairs where each SQL "

477-479: Use logger.exception when not re-raising.

Capture the stack trace for observability.

-        except Exception as e:
-            logger.error(f"Error retrieving {collection_name}: {e}")
+        except Exception as e:
+            logger.exception(f"Error retrieving {collection_name}: {e}")

506-507: Use logger.exception when not re-raising.

Same rationale for similar-queries retrieval.

-        except Exception as e:
-            logger.error(f"Error retrieving similar questions: {e}")
+        except Exception as e:
+            logger.exception(f"Error retrieving similar questions: {e}")

170-173: Silence unused kwargs and keep interface stable.

These kwargs aren’t used; acknowledge to satisfy ruff.

-    def get_sql_prompt(
+    def get_sql_prompt(
         self,
@@
-        error_message: dict | None = None,
-        **kwargs,
+        error_message: dict | None = None,
@@
-    ) -> list:
+    ) -> list:
         """Generate prompt for SQL generation."""
+        # kwargs reserved for future prompt knobs
@@
-    async def submit_prompt(self, prompt, **kwargs) -> str:
+    async def submit_prompt(self, prompt, **kwargs) -> str:
         """Submit prompt to LLM."""
+        _ = kwargs  # noqa: F841

Also applies to: 212-229


584-599: Config flag unused: allow_llm_to_see_data has no effect.

Either use it (e.g., include a brief note in the prompt or enable a guarded data peek) or remove from API to avoid confusion.

Want a small patch that injects a hint into the prompt when True?

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24b287f and 006f63e.

📒 Files selected for processing (14)
  • packages/nvidia_nat_vanna/README.md (1 hunks)
  • packages/nvidia_nat_vanna/pyproject.toml (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/meta/__init__.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (1 hunks)
  • packages/nvidia_nat_vanna/text2sql_config.yml (1 hunks)
  • src/nat/retriever/milvus/register.py (2 hunks)
  • src/nat/retriever/milvus/retriever.py (6 hunks)
  • tests/nat/retriever/test_retrievers.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (13)
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}: Every file must start with the standard SPDX Apache-2.0 header
Confirm copyright years are up to date when a file is changed
All source files must include the SPDX Apache-2.0 header template (copy from an existing file)

Files:

  • packages/nvidia_nat_vanna/text2sql_config.yml
  • packages/nvidia_nat_vanna/src/nat/meta/__init__.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
  • src/nat/retriever/milvus/register.py
  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/pyproject.toml
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • tests/nat/retriever/test_retrievers.py
  • src/nat/retriever/milvus/retriever.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. - For Python code, follow
    PEP 20 and
    PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • packages/nvidia_nat_vanna/text2sql_config.yml
  • packages/nvidia_nat_vanna/src/nat/meta/__init__.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
  • src/nat/retriever/milvus/register.py
  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/pyproject.toml
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • tests/nat/retriever/test_retrievers.py
  • src/nat/retriever/milvus/retriever.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a
    tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_vanna/text2sql_config.yml
  • packages/nvidia_nat_vanna/src/nat/meta/__init__.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/pyproject.toml
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Follow PEP 20 and PEP 8 for Python style
Run yapf with column_limit=120; yapf is used for formatting (run second)
Indent with 4 spaces (no tabs) and end each file with a single trailing newline
Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Respect Python naming schemes: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
When re-raising exceptions, use bare raise to preserve stack trace; log with logger.error(), not logger.exception()
When catching and logging without re-raising, use logger.exception() to capture full stack trace
Provide Google-style docstrings for every public module, class, function, and CLI command
Docstring first line must be a concise description ending with a period
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file I/O)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial and feasible

Files:

  • packages/nvidia_nat_vanna/src/nat/meta/__init__.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
  • src/nat/retriever/milvus/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • tests/nat/retriever/test_retrievers.py
  • src/nat/retriever/milvus/retriever.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All importable Python code must live under src/ or packages//src/
All public APIs must have Python 3.11+ type hints on parameters and return values
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or metadata when useful
Treat pyright warnings as errors during development

Files:

  • packages/nvidia_nat_vanna/src/nat/meta/__init__.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
  • src/nat/retriever/milvus/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • src/nat/retriever/milvus/retriever.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

If a package contains Python code, it must have tests in a tests/ directory at the same level as pyproject.toml

Files:

  • packages/nvidia_nat_vanna/src/nat/meta/__init__.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
src/nat/**/*

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

src/nat/**/* contains core functionality; changes should prioritize backward compatibility

Files:

  • src/nat/retriever/milvus/register.py
  • src/nat/retriever/milvus/retriever.py

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/retriever/milvus/register.py
  • src/nat/retriever/milvus/retriever.py
{docs/source/**/*.md,**/README.@(md|ipynb)}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{docs/source/**/*.md,**/README.@(md|ipynb)}: Use the full name “NVIDIA NeMo Agent toolkit” on first use in documentation, then “NeMo Agent toolkit”; in headings use “NeMo Agent Toolkit” (capital T)
Do not use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation unless explicitly referring to deprecated names
Never use “NAT”/“nat” abbreviations in documentation
Documentation must be clear/comprehensive; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling is correct

Files:

  • packages/nvidia_nat_vanna/README.md
packages/*/pyproject.toml

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

packages/*/pyproject.toml: Each package under packages/ must contain a pyproject.toml
packages/*/pyproject.toml must declare a dependency on nvidia-nat or a package starting with nvidia-nat-
Dependencies in pyproject.toml should use ~= with two-digit versions (e.g., ~=1.0)

Files:

  • packages/nvidia_nat_vanna/pyproject.toml
{**/pyproject.toml,uv.lock}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via ‘uv pip install --sync’

Files:

  • packages/nvidia_nat_vanna/pyproject.toml
{tests/**/*.py,examples/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Unit tests live in tests/ or examples/*/tests and use markers defined in pyproject.toml (e.g., integration)

Files:

  • tests/nat/retriever/test_retrievers.py
{tests/**/*.py,examples/*/tests/**/*.py,packages/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{tests/**/*.py,examples/*/tests/**/*.py,packages/*/tests/**/*.py}: Use pytest with pytest-asyncio for async code
Test functions should be named with test_ prefix using snake_case
Extract frequently repeated test code into pytest fixtures
Pytest fixtures should define the name argument in @pytest.fixture and the function should be prefixed fixture_ using snake_case
Mock external services with pytest_httpserver or unittest.mock; avoid live endpoints
Mark slow tests with @pytest.mark.slow to skip in default test suite
Mark integration tests requiring external services with @pytest.mark.integration and follow nat-tests guidelines

Files:

  • tests/nat/retriever/test_retrievers.py
tests/**/*.py

⚙️ CodeRabbit configuration file

tests/**/*.py: - Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code. - Test functions should be named using the test_ prefix, using snake_case. - Any frequently repeated code should be extracted into pytest fixtures. - Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture
function being decorated should be named using the fixture_ prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
pass

Files:

  • tests/nat/retriever/test_retrievers.py
🧠 Learnings (1)
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to packages/*/pyproject.toml : packages/*/pyproject.toml must declare a dependency on nvidia-nat or a package starting with nvidia-nat-

Applied to files:

  • packages/nvidia_nat_vanna/pyproject.toml
🧬 Code graph analysis (7)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py (2)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (1)
  • execute_db_query (92-252)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (1)
  • text2sql (101-267)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
packages/nvidia_nat_data_flywheel/tests/observability/processor/test_dfw_record_processor.py (1)
  • model_dump_json (336-338)
tests/nat/authentication/test_bearer_token_validator.py (1)
  • json (111-112)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (7)
src/nat/builder/builder.py (1)
  • Builder (68-290)
src/nat/builder/framework_enum.py (1)
  • LLMFrameworkEnum (19-25)
src/nat/data_models/api_server.py (1)
  • ResponseIntermediateStep (481-493)
src/nat/data_models/component_ref.py (3)
  • EmbedderRef (83-91)
  • LLMRef (116-124)
  • RetrieverRef (149-157)
src/nat/data_models/function.py (1)
  • FunctionBaseConfig (26-27)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
  • setup_vanna_db_connection (264-326)
  • run_sql (313-319)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (6)
  • VannaSingleton (640-741)
  • train_vanna (744-822)
  • instance (647-653)
  • get_instance (656-728)
  • generate_sql (581-637)
  • close (551-558)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (6)
src/nat/builder/builder.py (1)
  • Builder (68-290)
src/nat/builder/framework_enum.py (1)
  • LLMFrameworkEnum (19-25)
src/nat/builder/function_info.py (2)
  • FunctionInfo (290-625)
  • create (351-549)
src/nat/data_models/api_server.py (1)
  • ResponseIntermediateStep (481-493)
src/nat/data_models/function.py (1)
  • FunctionBaseConfig (26-27)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (3)
  • async_execute_query (236-261)
  • connect_to_database (153-191)
  • extract_sql_from_message (52-121)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (3)
src/nat/agent/reasoning_agent/reasoning_agent.py (1)
  • remove_r1_think_tags (87-96)
tests/nat/retriever/test_retrievers.py (4)
  • embed_documents (169-170)
  • aembed_query (164-167)
  • search (85-125)
  • search (231-271)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (1)
  • run_sql (313-319)
tests/nat/retriever/test_retrievers.py (1)
src/nat/retriever/milvus/retriever.py (4)
  • search (96-97)
  • MilvusRetriever (36-263)
  • CollectionNotFoundError (32-33)
  • bind (69-80)
src/nat/retriever/milvus/retriever.py (1)
tests/nat/retriever/test_retrievers.py (8)
  • list_collections (33-34)
  • list_collections (179-180)
  • describe_collection (36-74)
  • describe_collection (182-220)
  • aembed_query (164-167)
  • search_iterator (127-154)
  • search (85-125)
  • search (231-271)
🪛 markdownlint-cli2 (0.18.1)
packages/nvidia_nat_vanna/src/nat/meta/pypi.md

95-95: Bare URL used

(MD034, no-bare-urls)

packages/nvidia_nat_vanna/README.md

160-160: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


409-409: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


417-417: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


447-447: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


474-474: Bare URL used

(MD034, no-bare-urls)


475-475: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.2)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py

17-17: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py

144-144: Consider moving this statement to an else block

(TRY300)


146-146: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


149-149: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


156-156: Unused function argument: port

(ARG001)


157-157: Unused function argument: database

(ARG001)


158-158: Unused function argument: username

(ARG001)


232-232: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


268-268: Unused function argument: port

(ARG001)


269-269: Unused function argument: database

(ARG001)


270-270: Unused function argument: username

(ARG001)


318-318: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py

220-220: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


266-266: Do not catch blind exception: Exception

(BLE001)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py

94-94: Unused function argument: builder

(ARG001)


216-216: Do not catch blind exception: Exception

(BLE001)


217-217: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py

61-61: Abstract raise to an inner function

(TRY301)


69-69: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


70-70: Avoid specifying long messages outside the exception class

(TRY003)


141-141: String contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF001)


170-170: Unused method argument: kwargs

(ARG002)


212-212: Unused method argument: kwargs

(ARG002)


228-228: Consider moving this statement to an else block

(TRY300)


231-231: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


253-253: Abstract raise to an inner function

(TRY301)


259-259: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


270-270: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


294-294: Consider moving this statement to an else block

(TRY300)


339-339: Consider moving this statement to an else block

(TRY300)


383-383: Consider moving this statement to an else block

(TRY300)


417-417: Unused method argument: kwargs

(ARG002)


428-428: Unused method argument: kwargs

(ARG002)


443-443: Unused method argument: kwargs

(ARG002)


477-477: Do not catch blind exception: Exception

(BLE001)


478-478: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


481-481: Unused method argument: kwargs

(ARG002)


506-506: Do not catch blind exception: Exception

(BLE001)


507-507: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


510-510: Unused method argument: kwargs

(ARG002)


557-557: Do not catch blind exception: Exception

(BLE001)


584-584: Unused method argument: allow_llm_to_see_data

(ARG002)


631-631: Do not catch blind exception: Exception

(BLE001)


739-739: Do not catch blind exception: Exception

(BLE001)


811-811: Do not catch blind exception: Exception

(BLE001)


813-813: Do not catch blind exception: Exception

(BLE001)

tests/nat/retriever/test_retrievers.py

166-166: Avoid specifying long messages outside the exception class

(TRY003)


169-169: Unused method argument: texts

(ARG002)

src/nat/retriever/milvus/retriever.py

61-61: Avoid specifying long messages outside the exception class

(TRY003)


121-121: Avoid specifying long messages outside the exception class

(TRY003)


214-214: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (12)
packages/nvidia_nat_vanna/pyproject.toml (1)

23-24: LGTM! Dependency declarations follow guidelines.

The dependencies correctly use ~= with two-digit versions (e.g., ~=1.4) and declare dependency on nvidia-nat as required for plugin packages.

Based on learnings.

packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py (1)

1-22: LGTM! Registration module follows conventions.

The registration module correctly:

  • Includes the required SPDX Apache-2.0 header with current copyright year (2025)
  • Imports the provider modules for automatic registration
  • Follows the standard NAT plugin registration pattern
src/nat/retriever/milvus/register.py (2)

51-54: LGTM! Configuration field is well-documented.

The new use_async_client field is properly:

  • Typed as bool with a default of False
  • Documented with a clear description using Pydantic Field
  • Integrated into the client creation logic below

69-77: LGTM! Runtime client selection is clean and correct.

The runtime branching between AsyncMilvusClient and MilvusClient based on the use_async_client flag is:

  • Clear and easy to understand
  • Properly imports the client types only when needed
  • Passes the same configuration parameters to both client types
  • Compatible with the downstream MilvusRetriever which now accepts Any type
tests/nat/retriever/test_retrievers.py (3)

164-167: LGTM! Async embedding method correctly implemented.

The aembed_query method properly mirrors the sync embed_query method with:

  • Input validation
  • Same return value structure
  • Async signature using async def

This enables testing of async Milvus retriever paths.


173-271: LGTM! Async mock client properly structured.

The CustomAsyncMilvusClient correctly:

  • Mirrors the structure of CustomMilvusClient for consistency
  • Makes all methods async (async def) where needed
  • Implements the same validation and return value logic
  • Provides test fixtures for async code paths

The duplication between sync and async clients is acceptable for test fixtures.


500-574: LGTM! Comprehensive async test coverage.

The new async Milvus retriever tests provide excellent coverage:

  • test_async_milvus_search: Validates search functionality, top_k behavior, and output fields
  • test_async_milvus_retriever_binding: Tests parameter binding and error cases
  • test_async_milvus_validation: Validates field existence checks

These tests mirror the sync test structure, ensuring both code paths are thoroughly tested.

src/nat/retriever/milvus/retriever.py (4)

43-58: LGTM! Async client detection is pragmatic and well-documented.

The changes correctly:

  • Widen the client type to Any to accommodate both sync and async Milvus clients
  • Detect async mode via type name comparison ("AsyncMilvusClient")
  • Log the client mode for observability
  • Maintain backward compatibility with existing sync usage

The type name string comparison is a practical approach that avoids circular import issues.


88-94: LGTM! Collection validation correctly handles both client types.

The _validate_collection method properly:

  • Becomes async to support both code paths
  • Branches on _is_async to await or call directly
  • Returns a boolean consistently
  • Will work with both sync and async clients

120-192: LGTM! Search iterator implementation handles both async and sync paths.

The _search_with_iterator method correctly:

  • Uses async aembed_query for embedding (line 133)
  • Branches on _is_async for all Milvus operations (describe_collection, search_iterator, next, close)
  • Awaits async operations when needed
  • Maintains consistent return types and error handling

The dual-path implementation is clean and preserves existing behavior for sync clients.


213-262: LGTM! Search implementation properly supports both client types.

The _search method correctly:

  • Validates collection asynchronously
  • Branches on _is_async for describe_collection and search operations
  • Uses async aembed_query for embedding (line 237)
  • Awaits operations when in async mode
  • Maintains consistent validation and error handling

The implementation is thorough and maintains backward compatibility.

packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (1)

86-90: No action required—import path is correct.

The canonical definition of remove_r1_think_tags is at src/nat/utils/io/model_processing.py:19. The import in vanna_utils.py matches this location and is consistent with all other usages across the codebase. The local function definition in reasoning_agent.py is a separate implementation, not the intended import source. The import path nat.utils.io.model_processing is correct.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (5)
packages/nvidia_nat_vanna/README.md (4)

3-3: Remove NAT abbreviation from line 3.

Per documentation guidelines, never use "NAT" abbreviation. Replace with the full toolkit name.

-Production-ready text-to-SQL integration for NeMo Agent Toolkit (NAT) using the Vanna framework with Databricks support and vector-based few-shot learning.
+Production-ready text-to-SQL integration for NVIDIA NeMo Agent Toolkit using the Vanna framework with Databricks support and vector-based few-shot learning.

139-140: Fix NAT abbreviation in CLI comment.

The comment on line 140 uses "NAT CLI" which violates documentation guidelines. The actual command nat run is correct (it's the CLI command), but the comment should reference the toolkit by its full name.

-# Using NAT CLI
+# Using NeMo Agent Toolkit CLI

399-402: Replace NAT abbreviation with full toolkit name.

Line 400 uses "NAT's" which violates documentation guidelines. Use the full toolkit name instead.

 Other features include:
-- Full integration with NAT's intermediate step tracking system
+- Full integration with NeMo Agent Toolkit's intermediate step tracking system

476-478: Convert bare URLs to markdown links and fix NAT abbreviation.

Lines 476–477 use bare URLs which should be in markdown link format. Line 478 also contains the "NAT" abbreviation which must be replaced.

 For questions and support:
-- GitHub Issues: https://github.com/NVIDIA/NeMo-Agent-Toolkit/issues
-- Documentation: https://docs.nvidia.com/nemo/agent-toolkit/latest/
-- Discord: Join the NAT community Discord
+- [GitHub Issues](https://github.com/NVIDIA/NeMo-Agent-Toolkit/issues)
+- [Documentation](https://docs.nvidia.com/nemo/agent-toolkit/latest/)
+- Discord: Join the NeMo Agent Toolkit community Discord
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py (1)

52-52: Address the placeholder constant per previous review.

This constant was flagged in a previous review as placeholder code. Additionally, it lacks a type hint (unlike the other constants in this file) and has no explanatory comment describing its purpose.

Add the type hint and either remove this constant if unused, or replace with production values and document its purpose:

-VANNA_ACTIVE_TABLES = ['catalog.schema.table_a', 'catalog.schema.table_b']
+# Tables to include/exclude during specific operations (update with actual table names).
+VANNA_ACTIVE_TABLES: list[str] = ['catalog.schema.table_a', 'catalog.schema.table_b']
🧹 Nitpick comments (6)
packages/nvidia_nat_vanna/README.md (4)

160-160: Specify language for fenced code block.

Add the language identifier to the code block on line 160 for proper syntax highlighting.

-Expected output:
+Expected output:
-```
+```text
 # Ingest DDL and synthesize query-SQL pairs for training

409-409: Specify language for fenced code block.

Add the language identifier to the code block on line 409 for proper syntax highlighting.

 **Milvus connection failed:**
-```
+```text
 Error: Failed to connect to Milvus

417-417: Specify language for fenced code block.

Add the language identifier to the code block on line 417 for proper syntax highlighting.

 **Database connection failed:**
-```
+```text
 Error: Failed to connect to database

449-449: Specify language for fenced code block.

Add the language identifier to the code block on line 449 for proper syntax highlighting.

 ## Package Structure
-```
+```text
 nvidia_nat_vanna/
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py (2)

19-25: Move instructional comments to external documentation.

The comment "Define your database schema here to help the model understand table structures" is instructional and suggests users should replace this data. Per coding guidelines, placeholder text should not appear in source code. Either:

  1. Move these instructions to the quickstart documentation or configuration examples, or
  2. Clearly document that these are example values for demonstration purposes.

Update the comment to clarify intent:

-# DDL statements for training
-# Define your database schema here to help the model understand table structures
+# Example DDL statements for training.
+# These demo tables are used in the quickstart guide. Replace with your actual schema.

55-69: Add type hints for consistency.

The prompt template constants lack type hints, which is inconsistent with the other constants in this file (lines 21, 29, 37). Adding explicit str type hints improves consistency and helps with static analysis.

Add type hints to both constants:

-VANNA_RESPONSE_GUIDELINES = """
+VANNA_RESPONSE_GUIDELINES: str = """
 Response Guidelines:
 1. Carefully analyze the question to understand the user's intent, target columns, filters, and any aggregation or grouping requirements.
 2. Output only JSON:
 {
     "sql": "<valid SQL query>",
     "explanation": "<brief description>",
 }
 """

-VANNA_TRAINING_PROMPT = """
+VANNA_TRAINING_PROMPT: str = """
 Response Guidelines:
 1. Generate 20 natural language questions and their corresponding valid SQL queries.
 2. Output JSON like: [{{"question": "...", "sql": "..."}}]
 """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 006f63e and 8e80e5b.

📒 Files selected for processing (2)
  • packages/nvidia_nat_vanna/README.md (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
{docs/source/**/*.md,**/README.@(md|ipynb)}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{docs/source/**/*.md,**/README.@(md|ipynb)}: Use the full name “NVIDIA NeMo Agent toolkit” on first use in documentation, then “NeMo Agent toolkit”; in headings use “NeMo Agent Toolkit” (capital T)
Do not use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation unless explicitly referring to deprecated names
Never use “NAT”/“nat” abbreviations in documentation
Documentation must be clear/comprehensive; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling is correct

Files:

  • packages/nvidia_nat_vanna/README.md
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}: Every file must start with the standard SPDX Apache-2.0 header
Confirm copyright years are up to date when a file is changed
All source files must include the SPDX Apache-2.0 header template (copy from an existing file)

Files:

  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. - For Python code, follow
    PEP 20 and
    PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a
    tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Follow PEP 20 and PEP 8 for Python style
Run yapf with column_limit=120; yapf is used for formatting (run second)
Indent with 4 spaces (no tabs) and end each file with a single trailing newline
Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Respect Python naming schemes: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
When re-raising exceptions, use bare raise to preserve stack trace; log with logger.error(), not logger.exception()
When catching and logging without re-raising, use logger.exception() to capture full stack trace
Provide Google-style docstrings for every public module, class, function, and CLI command
Docstring first line must be a concise description ending with a period
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file I/O)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial and feasible

Files:

  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All importable Python code must live under src/ or packages//src/
All public APIs must have Python 3.11+ type hints on parameters and return values
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or metadata when useful
Treat pyright warnings as errors during development

Files:

  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

If a package contains Python code, it must have tests in a tests/ directory at the same level as pyproject.toml

Files:

  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
🧠 Learnings (4)
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Never use “NAT”/“nat” abbreviations in documentation

Applied to files:

  • packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Use the full name “NVIDIA NeMo Agent toolkit” on first use in documentation, then “NeMo Agent toolkit”; in headings use “NeMo Agent Toolkit” (capital T)

Applied to files:

  • packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to **/*.py : In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation

Applied to files:

  • packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-08-22T20:18:28.041Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/cursor-rules.mdc:0-0
Timestamp: 2025-08-22T20:18:28.041Z
Learning: Applies to .cursor/rules/**/*.mdc : Use consistent project terminology in descriptions (e.g., NAT workflows, NAT CLI commands)

Applied to files:

  • packages/nvidia_nat_vanna/README.md
🪛 markdownlint-cli2 (0.18.1)
packages/nvidia_nat_vanna/README.md

160-160: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


409-409: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


417-417: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


449-449: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


476-476: Bare URL used

(MD034, no-bare-urls)


477-477: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.2)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py

17-17: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

🔇 Additional comments (1)
packages/nvidia_nat_vanna/README.md (1)

467-467: Approve fix for NAT abbreviation.

This line correctly uses the full toolkit name instead of the abbreviation. Good catch on the previous fix!

@jiaxiangr jiaxiangr changed the title Feat vanna text2sql Introduce vanna text2sql Oct 29, 2025
@jiaxiangr
Copy link
Contributor Author

@coderabbitai generate docstrings

@coderabbitai
Copy link

coderabbitai bot commented Oct 29, 2025

✅ Actions performed

Initiated docstring generation; will generate only if new commits exist.

@coderabbitai
Copy link

coderabbitai bot commented Oct 29, 2025

Caution

Docstrings generation - FAILED

An unexpected error occurred while opening a pull request: Reference update failed - https://docs.github.com/rest/git/refs#create-a-reference

Copy link
Member

@willkill07 willkill07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation should also be updated to mention this new package and feature set.

Copy link
Contributor

@dnandakumar-nv dnandakumar-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the nvidia_nat_vanna package could also benefit from unit testing

@jiaxiangr jiaxiangr force-pushed the feat-vanna-text2sql branch from 8e80e5b to bb462ec Compare November 3, 2025 23:44
@jiaxiangr jiaxiangr requested a review from a team as a code owner November 3, 2025 23:44
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

♻️ Duplicate comments (4)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py (1)

23-23: Remove unused noqa directive.

The noqa: E501 directive is unnecessary since E501 (line length check) is not enabled in the project's ruff configuration.

Apply this diff:

-# ruff: noqa: E501
+
packages/nvidia_nat_vanna/pyproject.toml (1)

1-3: Add the required Apache-2.0 SPDX header.

All TOML files must start with our standard Apache-2.0 header. Please insert the full SPDX block before [build-system].

Apply this diff:

+# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 [build-system]
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (1)

100-101: Add return type annotation for the registered function.

Public APIs must declare return types; add -> AsyncGenerator[FunctionInfo, None] to satisfy our typing contract and prevent downstream type-checker errors.

Apply this diff:

-@register_function(config_type=Text2SQLConfig, framework_wrappers=[LLMFrameworkEnum.LANGCHAIN])
-async def text2sql(config: Text2SQLConfig, builder: Builder):
+@register_function(config_type=Text2SQLConfig, framework_wrappers=[LLMFrameworkEnum.LANGCHAIN])
+async def text2sql(config: Text2SQLConfig, builder: Builder) -> AsyncGenerator[FunctionInfo, None]:

As per coding guidelines

packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (1)

86-94: Annotate the registered function’s return type.

execute_db_query is part of the public plugin surface; add -> AsyncGenerator[FunctionInfo, None] to comply with our typing requirements.

Apply this diff:

 @register_function(
     config_type=ExecuteDBQueryConfig,
     framework_wrappers=[LLMFrameworkEnum.LANGCHAIN],
 )
 async def execute_db_query(
     config: ExecuteDBQueryConfig,
     _builder: Builder,
-):
+) -> AsyncGenerator[FunctionInfo, None]:

As per coding guidelines

🧹 Nitpick comments (4)
packages/nvidia_nat_vanna/README.md (2)

191-208: Add language identifiers to fenced code blocks.

Several fenced blocks (expected output, connection format examples, SQL snippets, and troubleshooting errors) omit a language hint, tripping markdownlint (MD040) and hurting syntax highlighting. Please add the appropriate language labels such as text, bash, or sql to each affected block.

Also applies to: 261-276, 280-290, 475-489


543-545: Convert bare URLs to Markdown links.

The Support section still exposes bare URLs, triggering MD034 and breaking our documentation style rules. Wrap each URL in Markdown link syntax (e.g., [GitHub Issues](https://github.com/NVIDIA/NeMo-Agent-Toolkit/issues)) and keep the descriptive text.

packages/nvidia_nat_vanna/src/nat/meta/pypi.md (1)

130-133: Wrap the documentation URL in Markdown.

Line 132 exposes a bare URL; convert it to [Documentation](https://docs.nvidia.com/nemo/agent-toolkit/latest/) (or similar descriptive text) to satisfy Markdown style rules.

packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (1)

284-416: Significant code duplication in collection creation methods.

The three _create_*_collection methods (_create_sql_collection, _create_ddl_collection, _create_doc_collection) share nearly identical structure with only field names differing. This violates DRY principles and makes maintenance harder.

Consider extracting a common method:

async def _create_collection(self, name: str, field_configs: list[dict]):
    """Create a Milvus collection with specified fields."""
    from pymilvus import DataType, MilvusClient, MilvusException
    
    try:
        await self.async_milvus_client.load_collection(collection_name=name)
        logger.debug(f"Collection {name} already exists, skipping creation")
        return
    except MilvusException as e:
        if "collection not found" not in str(e).lower():
            raise
    
    schema = MilvusClient.create_schema(auto_id=False, enable_dynamic_field=False)
    schema.add_field(
        field_name="id",
        datatype=DataType.VARCHAR,
        is_primary=True,
        max_length=65535,
    )
    for field_config in field_configs:
        schema.add_field(**field_config)
    schema.add_field(
        field_name="vector",
        datatype=DataType.FLOAT_VECTOR,
        dim=self._embedding_dim,
    )
    
    index_params = MilvusClient.prepare_index_params()
    index_params.add_index(field_name="vector", index_type="AUTOINDEX", metric_type="L2")
    await self.async_milvus_client.create_collection(
        collection_name=name,
        schema=schema,
        index_params=index_params,
        consistency_level="Strong",
    )
    logger.info(f"Created collection: {name}")

async def _create_sql_collection(self, name: str):
    """Create SQL collection using async client."""
    from pymilvus import DataType
    await self._create_collection(name, [
        {"field_name": "text", "datatype": DataType.VARCHAR, "max_length": 65535},
        {"field_name": "sql", "datatype": DataType.VARCHAR, "max_length": 65535},
    ])
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e80e5b and bb462ec.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • packages/nvidia_nat_all/pyproject.toml (2 hunks)
  • packages/nvidia_nat_vanna/README.md (1 hunks)
  • packages/nvidia_nat_vanna/pyproject.toml (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/__init__.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (1 hunks)
  • packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py (1 hunks)
  • packages/nvidia_nat_vanna/text2sql_config.yml (1 hunks)
  • packages/nvidia_nat_vanna/text2sql_training_config.yml (1 hunks)
  • pyproject.toml (2 hunks)
  • src/nat/retriever/milvus/register.py (2 hunks)
  • src/nat/retriever/milvus/retriever.py (6 hunks)
  • tests/nat/retriever/test_retrievers.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/init.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/nat/retriever/milvus/register.py
  • packages/nvidia_nat_vanna/text2sql_config.yml
🧰 Additional context used
📓 Path-based instructions (14)
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}: Every file must start with the standard SPDX Apache-2.0 header
Confirm copyright years are up to date when a file is changed
All source files must include the SPDX Apache-2.0 header template (copy from an existing file)

Files:

  • packages/nvidia_nat_vanna/text2sql_training_config.yml
  • packages/nvidia_nat_all/pyproject.toml
  • packages/nvidia_nat_vanna/pyproject.toml
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py
  • pyproject.toml
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
  • src/nat/retriever/milvus/retriever.py
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
  • tests/nat/retriever/test_retrievers.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. - For Python code, follow
    PEP 20 and
    PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • packages/nvidia_nat_vanna/text2sql_training_config.yml
  • packages/nvidia_nat_all/pyproject.toml
  • packages/nvidia_nat_vanna/pyproject.toml
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py
  • pyproject.toml
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
  • src/nat/retriever/milvus/retriever.py
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
  • tests/nat/retriever/test_retrievers.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a
    tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_vanna/text2sql_training_config.yml
  • packages/nvidia_nat_all/pyproject.toml
  • packages/nvidia_nat_vanna/pyproject.toml
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
packages/*/pyproject.toml

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

packages/*/pyproject.toml: Each package under packages/ must contain a pyproject.toml
packages/*/pyproject.toml must declare a dependency on nvidia-nat or a package starting with nvidia-nat-
Dependencies in pyproject.toml should use ~= with two-digit versions (e.g., ~=1.0)

Files:

  • packages/nvidia_nat_all/pyproject.toml
  • packages/nvidia_nat_vanna/pyproject.toml
{**/pyproject.toml,uv.lock}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via ‘uv pip install --sync’

Files:

  • packages/nvidia_nat_all/pyproject.toml
  • packages/nvidia_nat_vanna/pyproject.toml
  • pyproject.toml
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Follow PEP 20 and PEP 8 for Python style
Run yapf with column_limit=120; yapf is used for formatting (run second)
Indent with 4 spaces (no tabs) and end each file with a single trailing newline
Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Respect Python naming schemes: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
When re-raising exceptions, use bare raise to preserve stack trace; log with logger.error(), not logger.exception()
When catching and logging without re-raising, use logger.exception() to capture full stack trace
Provide Google-style docstrings for every public module, class, function, and CLI command
Docstring first line must be a concise description ending with a period
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file I/O)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial and feasible

Files:

  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
  • src/nat/retriever/milvus/retriever.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
  • tests/nat/retriever/test_retrievers.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All importable Python code must live under src/ or packages//src/
All public APIs must have Python 3.11+ type hints on parameters and return values
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or metadata when useful
Treat pyright warnings as errors during development

Files:

  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
  • src/nat/retriever/milvus/retriever.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

If a package contains Python code, it must have tests in a tests/ directory at the same level as pyproject.toml

Files:

  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
{tests/**/*.py,examples/*/tests/**/*.py,packages/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{tests/**/*.py,examples/*/tests/**/*.py,packages/*/tests/**/*.py}: Use pytest with pytest-asyncio for async code
Test functions should be named with test_ prefix using snake_case
Extract frequently repeated test code into pytest fixtures
Pytest fixtures should define the name argument in @pytest.fixture and the function should be prefixed fixture_ using snake_case
Mock external services with pytest_httpserver or unittest.mock; avoid live endpoints
Mark slow tests with @pytest.mark.slow to skip in default test suite
Mark integration tests requiring external services with @pytest.mark.integration and follow nat-tests guidelines

Files:

  • packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py
  • tests/nat/retriever/test_retrievers.py
{tests/test_*.py,examples/*/tests/test_*.py,packages/*/tests/test_*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Name test files as test_*.py and store them in a tests/ folder

Files:

  • packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py
{docs/source/**/*.md,**/README.@(md|ipynb)}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{docs/source/**/*.md,**/README.@(md|ipynb)}: Use the full name “NVIDIA NeMo Agent toolkit” on first use in documentation, then “NeMo Agent toolkit”; in headings use “NeMo Agent Toolkit” (capital T)
Do not use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation unless explicitly referring to deprecated names
Never use “NAT”/“nat” abbreviations in documentation
Documentation must be clear/comprehensive; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling is correct

Files:

  • packages/nvidia_nat_vanna/README.md
src/nat/**/*

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

src/nat/**/* contains core functionality; changes should prioritize backward compatibility

Files:

  • src/nat/retriever/milvus/retriever.py

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/retriever/milvus/retriever.py
{tests/**/*.py,examples/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Unit tests live in tests/ or examples/*/tests and use markers defined in pyproject.toml (e.g., integration)

Files:

  • tests/nat/retriever/test_retrievers.py
tests/**/*.py

⚙️ CodeRabbit configuration file

tests/**/*.py: - Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code. - Test functions should be named using the test_ prefix, using snake_case. - Any frequently repeated code should be extracted into pytest fixtures. - Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture
function being decorated should be named using the fixture_ prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
pass

Files:

  • tests/nat/retriever/test_retrievers.py
🧠 Learnings (16)
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to packages/*/pyproject.toml : packages/*/pyproject.toml must declare a dependency on nvidia-nat or a package starting with nvidia-nat-

Applied to files:

  • packages/nvidia_nat_all/pyproject.toml
  • packages/nvidia_nat_vanna/pyproject.toml
  • pyproject.toml
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {**/pyproject.toml,uv.lock} : New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via ‘uv pip install <pkg> --sync’

Applied to files:

  • packages/nvidia_nat_all/pyproject.toml
  • packages/nvidia_nat_vanna/pyproject.toml
  • pyproject.toml
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to **/*.py : In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation

Applied to files:

  • packages/nvidia_nat_all/pyproject.toml
  • packages/nvidia_nat_vanna/pyproject.toml
  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to packages/*/pyproject.toml : Each package under packages/ must contain a pyproject.toml

Applied to files:

  • packages/nvidia_nat_vanna/pyproject.toml
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to examples/*/pyproject.toml : Examples containing Python code should have a pyproject.toml

Applied to files:

  • packages/nvidia_nat_vanna/pyproject.toml
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb} : Every file must start with the standard SPDX Apache-2.0 header

Applied to files:

  • packages/nvidia_nat_vanna/pyproject.toml
  • packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb} : All source files must include the SPDX Apache-2.0 header template (copy from an existing file)

Applied to files:

  • packages/nvidia_nat_vanna/pyproject.toml
  • packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Periodically run ‘uv pip list --outdated’ and upgrade dependencies

Applied to files:

  • packages/nvidia_nat_vanna/pyproject.toml
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Never use “NAT”/“nat” abbreviations in documentation

Applied to files:

  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Use the full name “NVIDIA NeMo Agent toolkit” on first use in documentation, then “NeMo Agent toolkit”; in headings use “NeMo Agent Toolkit” (capital T)

Applied to files:

  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
📚 Learning: 2025-08-22T20:18:28.041Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/cursor-rules.mdc:0-0
Timestamp: 2025-08-22T20:18:28.041Z
Learning: Applies to .cursor/rules/**/*.mdc : Use consistent project terminology in descriptions (e.g., NAT workflows, NAT CLI commands)

Applied to files:

  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Do not use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation unless explicitly referring to deprecated names

Applied to files:

  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-09-15T21:26:29.430Z
Learnt from: saglave
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 726
File: examples/frameworks/adk_demo/src/nat_adk_demo/weather_update_tool.py:0-0
Timestamp: 2025-09-15T21:26:29.430Z
Learning: In NAT's ADK integration, the docstring of registered functions serves as the tool description sent to the LLM, not standard Python function documentation. The docstring should describe the logical tool interface (parameters the LLM will provide) rather than the NAT framework wrapper parameters like tool_config and builder.

Applied to files:

  • packages/nvidia_nat_vanna/README.md
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-08-22T20:29:13.433Z
Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 701
File: examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-with-custom-post-process.yml:16-18
Timestamp: 2025-08-22T20:29:13.433Z
Learning: In the NeMo-Agent-Toolkit project, configuration consistency across example files is prioritized over removing unused config options, even when those options aren't currently implemented in the runtime.

Applied to files:

  • packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Documentation must be clear/comprehensive; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling is correct

Applied to files:

  • packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to **/*.py : Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored

Applied to files:

  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
🧬 Code graph analysis (8)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (3)
src/nat/agent/reasoning_agent/reasoning_agent.py (1)
  • remove_r1_think_tags (87-96)
tests/nat/retriever/test_retrievers.py (4)
  • embed_documents (169-170)
  • aembed_query (164-167)
  • search (85-125)
  • search (231-271)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (1)
  • run_sql (285-292)
packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py (1)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (11)
  • QueryResult (45-68)
  • SupportedDatabase (39-42)
  • connect_to_database (163-200)
  • connect_to_databricks (143-160)
  • execute_query (203-226)
  • extract_sql_from_message (71-140)
  • setup_vanna_db_connection (248-298)
  • to_records (57-59)
  • to_dataframe (51-55)
  • row_count (62-68)
  • run_sql (285-292)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py (2)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (1)
  • execute_db_query (90-237)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (1)
  • text2sql (101-249)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
src/nat/data_models/common.py (1)
  • get_secret_value (177-193)
packages/nvidia_nat_data_flywheel/tests/observability/processor/test_dfw_record_processor.py (1)
  • model_dump_json (336-338)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (6)
src/nat/builder/builder.py (1)
  • Builder (68-290)
src/nat/data_models/api_server.py (1)
  • ResponseIntermediateStep (482-494)
src/nat/data_models/component_ref.py (3)
  • EmbedderRef (83-91)
  • LLMRef (116-124)
  • RetrieverRef (149-157)
src/nat/data_models/function.py (1)
  • FunctionBaseConfig (26-27)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
  • setup_vanna_db_connection (248-298)
  • run_sql (285-292)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (4)
  • VannaSingleton (648-759)
  • instance (655-661)
  • get_instance (664-737)
  • generate_sql (589-645)
src/nat/retriever/milvus/retriever.py (2)
tests/nat/retriever/test_retrievers.py (8)
  • list_collections (33-34)
  • list_collections (179-180)
  • describe_collection (36-74)
  • describe_collection (182-220)
  • aembed_query (164-167)
  • search_iterator (127-154)
  • search (85-125)
  • search (231-271)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (1)
  • close (557-564)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (6)
src/nat/builder/builder.py (1)
  • Builder (68-290)
src/nat/builder/framework_enum.py (1)
  • LLMFrameworkEnum (19-25)
src/nat/data_models/api_server.py (1)
  • ResponseIntermediateStep (482-494)
src/nat/data_models/function.py (1)
  • FunctionBaseConfig (26-27)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (5)
  • row_count (62-68)
  • async_execute_query (229-245)
  • connect_to_database (163-200)
  • extract_sql_from_message (71-140)
  • to_dataframe (51-55)
src/nat/data_models/common.py (1)
  • get_secret_value (177-193)
tests/nat/retriever/test_retrievers.py (1)
src/nat/retriever/milvus/retriever.py (4)
  • search (96-97)
  • MilvusRetriever (36-263)
  • CollectionNotFoundError (32-33)
  • bind (69-80)
🪛 markdownlint-cli2 (0.18.1)
packages/nvidia_nat_vanna/README.md

150-150: Unordered list indentation
Expected: 2; Actual: 3

(MD007, ul-indent)


191-191: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


261-261: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


280-280: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


483-483: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


516-516: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


543-543: Bare URL used

(MD034, no-bare-urls)


544-544: Bare URL used

(MD034, no-bare-urls)

packages/nvidia_nat_vanna/src/nat/meta/pypi.md

132-132: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.3)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py

61-61: Abstract raise to an inner function

(TRY301)


69-69: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


70-70: Avoid specifying long messages outside the exception class

(TRY003)


141-141: String contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF001)


170-170: Unused method argument: kwargs

(ARG002)


212-212: Unused method argument: kwargs

(ARG002)


228-228: Consider moving this statement to an else block

(TRY300)


231-231: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


253-253: Abstract raise to an inner function

(TRY301)


259-259: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


270-270: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


294-294: Consider moving this statement to an else block

(TRY300)


339-339: Consider moving this statement to an else block

(TRY300)


383-383: Consider moving this statement to an else block

(TRY300)


417-417: Unused method argument: kwargs

(ARG002)


428-428: Unused method argument: kwargs

(ARG002)


443-443: Unused method argument: kwargs

(ARG002)


477-477: Do not catch blind exception: Exception

(BLE001)


478-478: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


481-481: Unused method argument: kwargs

(ARG002)


506-506: Do not catch blind exception: Exception

(BLE001)


507-507: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


510-510: Unused method argument: kwargs

(ARG002)


563-563: Do not catch blind exception: Exception

(BLE001)


592-592: Unused method argument: allow_llm_to_see_data

(ARG002)


639-639: Do not catch blind exception: Exception

(BLE001)


753-753: Do not catch blind exception: Exception

(BLE001)


757-757: Do not catch blind exception: Exception

(BLE001)


834-834: Do not catch blind exception: Exception

(BLE001)


836-836: Do not catch blind exception: Exception

(BLE001)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py

157-157: Consider moving this statement to an else block

(TRY300)


159-159: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


166-166: Unused function argument: kwargs

(ARG001)


225-225: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


252-252: Unused function argument: kwargs

(ARG001)


291-291: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py

213-213: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py

23-23: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

src/nat/retriever/milvus/retriever.py

61-61: Avoid specifying long messages outside the exception class

(TRY003)


121-121: Avoid specifying long messages outside the exception class

(TRY003)


214-214: Avoid specifying long messages outside the exception class

(TRY003)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py

201-201: Do not catch blind exception: Exception

(BLE001)


202-202: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

tests/nat/retriever/test_retrievers.py

166-166: Avoid specifying long messages outside the exception class

(TRY003)


169-169: Unused method argument: texts

(ARG002)

🔇 Additional comments (7)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py (2)

1-20: LGTM!

The Apache 2.0 license header and module docstring are correct and follow the project's coding guidelines.


25-56: LGTM!

The training data constants are properly typed, follow naming conventions, and contain valid example SQL statements and documentation.

packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (5)

1-28: LGTM: License header and imports are correct.

The Apache 2.0 license header is properly formatted with the correct copyright year, and the imports are appropriate for the module's functionality.


31-102: LGTM: Utility functions are well-implemented.

The JSON extraction, think tag removal, and message conversion utilities are correctly implemented with proper error handling and docstrings.


567-646: LGTM: Well-structured SQL generation with parallel retrieval.

The VannaLangChain class properly combines the Milvus vector store and LangChain LLM capabilities. The generate_sql method efficiently uses asyncio.gather for parallel retrieval of context and includes appropriate fallback handling for non-JSON responses.

Note: The allow_llm_to_see_data parameter appears unused but may be reserved for future functionality or interface compatibility.


238-260: The review comment incorrectly identifies the blocking call location and context.

Line 256 is inside the synchronous __init__ method (line 238), not an async initialization path. Calling embed_documents() synchronously during class initialization is normal and expected.

The actual blocking issues are at lines 423, 434, and 449 in the async methods add_question_sql(), add_ddl(), and add_documentation(), where LangChain embedders support async methods like aembed_documents() that should be used instead of the synchronous embed_documents() calls.

Likely an incorrect or invalid review comment.


762-845: Review comment is incorrect: run_sql is guaranteed to be available at runtime.

The run_sql function is dynamically attached to the Vanna instance in setup_vanna_db_connection() (line 295 in db_utils.py). In the only call site for train_vanna() (text2sql.py line 166), setup_vanna_db_connection() is always invoked first (lines 158–162), ensuring vn.run_sql exists when training starts. No AttributeError will occur in normal usage.

The dynamic attribute pattern is intentional; type checkers cannot resolve it (hence the # type: ignore[misc] comment on line 206 of text2sql.py), but this is a known design trade-off, not an error.

Likely an incorrect or invalid review comment.

@jiaxiangr jiaxiangr force-pushed the feat-vanna-text2sql branch from bb462ec to 741996f Compare November 5, 2025 18:15
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
…library

Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
tests/nat/retriever/test_retrievers.py (2)

56-71: Fix collection metadata typo

collection2 mistakenly labels collection_name as "collection1".

-            "collection2": {
-                "collection_name": "collection1",
+            "collection2": {
+                "collection_name": "collection2",

106-108: Use tuple in isinstance; PEP604 union invalid here

isinstance(timeout, float | int) raises TypeError.

-        if timeout:
-            assert isinstance(timeout, float | int)
+        if timeout is not None:
+            assert isinstance(timeout, (float, int))
src/nat/retriever/milvus/retriever.py (2)

169-196: Don’t return inside the iteration loop

Returning inside the loop truncates results after first batch. Move the return after the loop.

-                results.extend(res[0])
-
-                return _wrap_milvus_results(results, content_field=self.content_field)
+                results.extend(res[0])
+            # after loop completes
+            return _wrap_milvus_results(results, content_field=self.content_field)

275-277: Fix isinstance usage with union

Hit | dict is not valid in isinstance.

-    if not isinstance(res, Hit | dict):
+    if not isinstance(res, (Hit, dict)):
♻️ Duplicate comments (12)
packages/nvidia_nat_vanna/pyproject.toml (2)

1-55: Add required SPDX Apache-2.0 license header.

The file is missing the standard SPDX license header required by coding guidelines. All source files, including pyproject.toml, must include the Apache-2.0 header comment at the top.

Add this header before line 1:

# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[build-system]

26-27: Remove version constraints from workspace dependencies.

Lines 26–27 declare nvidia-nat and nvidia-nat-langchain with version constraints (~=1.4), but the comment at line 21 explicitly states "Does not apply to nvidia-nat packages." Workspace dependencies should not have version constraints; they are managed via the [tool.uv.sources] section.

Apply this diff:

   # Keep sorted!!!
   "databricks-sql-connector~=4.1.4",
   "databricks-sqlalchemy~=2.0.8",
   "langchain-core~=0.3.76", # determined from nvidia-nat-langchain
-  "nvidia-nat~=1.4",
-  "nvidia-nat-langchain~=1.4",
+  "nvidia-nat",
+  "nvidia-nat-langchain",
   "pandas~=2.0",
   "pymilvus[model]~=2.5.11",
   "sqlglot~=26.33.0",
   "vanna[chromadb]~=0.7.9",

Based on learnings from the PR author.

packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py (2)

58-58: Add type hints for public constants

Annotate exported constants to meet project typing guidelines.

-VANNA_ACTIVE_TABLES = ['catalog.schema.table_a', 'catalog.schema.table_b']
+VANNA_ACTIVE_TABLES: list[str] = ['catalog.schema.table_a', 'catalog.schema.table_b']

-VANNA_RESPONSE_GUIDELINES = """
+VANNA_RESPONSE_GUIDELINES: str = """
 ...
-"""
+"""

-VANNA_TRAINING_PROMPT = """
+VANNA_TRAINING_PROMPT: str = """
 ...
-"""
+"""

Also applies to: 61-69, 71-75


23-23: Remove unused ruff directive

# ruff: noqa: E501 is unnecessary; E501 not enabled.

-# ruff: noqa: E501
+
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (3)

117-121: Avoid KeyError on optional config; use defaults

Provide safe defaults for reasoning_models and chat_models.

-        self.reasoning_models = self.config["reasoning_models"]
-        self.chat_models = self.config["chat_models"]
+        self.reasoning_models = self.config.get("reasoning_models", set())
+        self.chat_models = self.config.get("chat_models", set())

242-251: Validate presence of async Milvus client in config

Fail fast with clear error rather than KeyError.

-            self.async_milvus_client = config["async_milvus_client"]
+            if config is None or "async_milvus_client" not in config:
+                msg = "async_milvus_client must be provided in config"
+                raise ValueError(msg)
+            self.async_milvus_client = config["async_milvus_client"]

428-436: Don’t block event loop: use async embedding in async methods

Use aembed_documents for async paths.

-        embedding = self.embedder.embed_documents([ddl])[0]
+        embedding = (await self.embedder.aembed_documents([ddl]))[0]
@@
-        embedding = self.embedder.embed_documents([documentation])[0]
+        embedding = (await self.embedder.aembed_documents([documentation]))[0]

Also applies to: 443-451

packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (1)

285-293: Use logger.error when re-raising to avoid duplicate stack traces

Per guidelines, log error (no traceback) and re-raise.

-        except Exception:
-            logger.exception("Error executing SQL")
+        except Exception:
+            logger.error("Error executing SQL")
             raise
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (2)

201-206: Log exceptions with traceback; keep generic failure_reason

Use logger.exception when not re-raising.

-        except Exception as e:
-            logger.error("Error executing SQL query", exc_info=e)
+        except Exception:
+            logger.exception("Error executing SQL query")
             yield ExecuteDBQueryOutput(
                 success=False,
                 failure_reason="SQL execution failed. Please check server logs for details.",

148-151: Enforce read-only SQL before connecting

Reject non-SELECT/WITH early to prevent destructive ops.

-            connection = connect_to_database(
+            import re
+            if not re.match(r'^\s*(WITH|SELECT)\b', sql_query, flags=re.IGNORECASE):
+                yield ExecuteDBQueryOutput(
+                    success=False,
+                    failure_reason="Only SELECT/WITH statements are allowed.",
+                    sql_query=sql_query,
+                    dataframe_info=DataFrameInfo(shape=[0, 0], dtypes={}, columns=[]),
+                )
+                return
+            connection = connect_to_database(
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (2)

212-223: Fix exception logging to include traceback without leaking details

Use exc_info=True (or logger.exception if not re-raising). You re-raise, so keep logger.error with exc_info=True and generic payload.

-        except Exception as e:
-            logger.error("SQL generation failed", exc_info=e)
+        except Exception as e:
+            logger.error("SQL generation failed", exc_info=True)
             # Error status as ResponseIntermediateStep

195-209: Block non read-only SQL before execution

Prevent destructive statements from LLM output; allow only SELECT/WITH.

-            if config.execute_sql:
+            if config.execute_sql:
+                import re
+                if not re.match(r'^\s*(WITH|SELECT)\b', sql, flags=re.IGNORECASE):
+                    yield ResponseIntermediateStep(
+                        id=str(uuid.uuid4()),
+                        parent_id=parent_id,
+                        type="markdown",
+                        name="text2sql_status",
+                        payload=StatusPayload(message="Execution skipped: only SELECT/WITH statements are allowed.").model_dump_json(),
+                    )
+                    yield Text2SQLOutput(sql=sql, explanation=explanation)
+                    return
                 yield ResponseIntermediateStep(
🧹 Nitpick comments (6)
packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py (1)

201-221: Add a test for engine reuse path

Consider a second call with vn.db_engine preset to ensure the “reuse existing engine” branch is exercised (assert connect_to_database not called again).

packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (2)

100-102: Annotate public return type

Registered functions must declare return type.

-async def text2sql(config: Text2SQLConfig, builder: Builder):
+async def text2sql(config: Text2SQLConfig, builder: Builder) -> AsyncGenerator[FunctionInfo, None]:

42-48: Return executed rows when execute_sql=True (optional but useful)

Expose records/row_count to callers.

-from pydantic import BaseModel
-from pydantic import Field
+from typing import Any
+from pydantic import BaseModel
+from pydantic import Field
@@
 class Text2SQLOutput(BaseModel):
     """Output schema for text2sql function."""
     sql: str = Field(description="Generated SQL query")
     explanation: str | None = Field(default=None, description="Explanation of the query")
+    dataframe_records: list[dict[str, Any]] | None = Field(default=None, description="Rows when executed")
+    row_count: int | None = Field(default=None, description="Row count when executed")
@@
-                df = await vanna_instance.run_sql(sql)  # type: ignore[misc]
+                df = await vanna_instance.run_sql(sql)  # type: ignore[misc]
                 logger.info(f"SQL executed successfully: {len(df)} rows returned")
@@
-            yield Text2SQLOutput(sql=sql, explanation=explanation)
+            result = Text2SQLOutput(sql=sql, explanation=explanation)
+            if config.execute_sql:
+                result.dataframe_records = df.to_dict("records")
+                result.row_count = len(df)
+            yield result

Also applies to: 195-209

packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)

215-221: Avoid logging full query at info level

Reduce exposure of sensitive SQL; prefer debug.

-            logger.info(f"Executing query: {query}")
+            logger.debug("Executing query")

241-244: Use get_running_loop (modern asyncio)

Prefer asyncio.get_running_loop() in async contexts.

-    loop = asyncio.get_event_loop()
+    loop = asyncio.get_running_loop()
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (1)

86-94: Annotate public return type (and silence unused builder)

Add return type; keep _builder but silence lint if needed.

-from collections.abc import AsyncGenerator
+from collections.abc import AsyncGenerator
@@
-async def execute_db_query(
+async def execute_db_query(
     config: ExecuteDBQueryConfig,
-    _builder: Builder,
-) :
+    _builder: Builder,  # noqa: ARG001
+) -> AsyncGenerator[FunctionInfo, None]:
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0516588 and 894b12c.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (22)
  • ci/scripts/path_checks.py (1 hunks)
  • ci/vale/styles/config/vocabularies/nat/accept.txt (1 hunks)
  • docs/source/workflows/functions/index.md (1 hunks)
  • docs/source/workflows/functions/text-to-sql.md (1 hunks)
  • packages/nvidia_nat_all/pyproject.toml (2 hunks)
  • packages/nvidia_nat_vanna/pyproject.toml (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/__init__.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py (1 hunks)
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (1 hunks)
  • packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py (1 hunks)
  • packages/nvidia_nat_vanna/text2sql_config.yml (1 hunks)
  • packages/nvidia_nat_vanna/text2sql_training_config.yml (1 hunks)
  • pyproject.toml (3 hunks)
  • src/nat/retriever/milvus/register.py (2 hunks)
  • src/nat/retriever/milvus/retriever.py (6 hunks)
  • src/nat/tool/code_execution/code_sandbox.py (1 hunks)
  • tests/nat/retriever/test_retrievers.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • packages/nvidia_nat_vanna/src/nat/meta/pypi.md
🚧 Files skipped from review as they are similar to previous changes (9)
  • ci/vale/styles/config/vocabularies/nat/accept.txt
  • packages/nvidia_nat_vanna/text2sql_training_config.yml
  • src/nat/retriever/milvus/register.py
  • src/nat/tool/code_execution/code_sandbox.py
  • packages/nvidia_nat_vanna/text2sql_config.yml
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/init.py
  • docs/source/workflows/functions/text-to-sql.md
  • pyproject.toml
🧰 Additional context used
📓 Path-based instructions (5)
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. - For Python code, follow
    PEP 20 and
    PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • packages/nvidia_nat_all/pyproject.toml
  • ci/scripts/path_checks.py
  • tests/nat/retriever/test_retrievers.py
  • docs/source/workflows/functions/index.md
  • packages/nvidia_nat_vanna/pyproject.toml
  • packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • src/nat/retriever/milvus/retriever.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a
    tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_all/pyproject.toml
  • packages/nvidia_nat_vanna/pyproject.toml
  • packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
tests/**/*.py

⚙️ CodeRabbit configuration file

tests/**/*.py: - Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code. - Test functions should be named using the test_ prefix, using snake_case. - Any frequently repeated code should be extracted into pytest fixtures. - Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture
function being decorated should be named using the fixture_ prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
pass

Files:

  • tests/nat/retriever/test_retrievers.py
docs/source/**/*

⚙️ CodeRabbit configuration file

This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the docs/source/_static directory.

Files:

  • docs/source/workflows/functions/index.md
src/nat/**/*

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/retriever/milvus/retriever.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: jiaxiangr
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 974
File: packages/nvidia_nat_all/pyproject.toml:39-39
Timestamp: 2025-11-10T21:26:35.059Z
Learning: In packages/nvidia_nat_all/pyproject.toml, workspace dependencies (nvidia-nat-* plugin packages) should NOT have version constraints because they are managed as workspace dependencies. Version constraints are only applied to the base nvidia-nat package and external dependencies, not to internal workspace packages.
📚 Learning: 2025-11-10T21:26:35.059Z
Learnt from: jiaxiangr
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 974
File: packages/nvidia_nat_all/pyproject.toml:39-39
Timestamp: 2025-11-10T21:26:35.059Z
Learning: In packages/nvidia_nat_all/pyproject.toml, workspace dependencies (nvidia-nat-* plugin packages) should NOT have version constraints because they are managed as workspace dependencies. Version constraints are only applied to the base nvidia-nat package and external dependencies, not to internal workspace packages.

Applied to files:

  • packages/nvidia_nat_all/pyproject.toml
  • packages/nvidia_nat_vanna/pyproject.toml
📚 Learning: 2025-11-14T20:33:53.944Z
Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1181
File: packages/nvidia_nat_test/tests/test_test_llm.py:419-484
Timestamp: 2025-11-14T20:33:53.944Z
Learning: The NVIDIA NeMo-Agent-Toolkit project uses pytest-asyncio in strict mode (the default), which requires pytest.mark.asyncio decorator on all async test functions. All async tests in packages/nvidia_nat_test/tests/test_test_llm.py consistently follow this pattern.

Applied to files:

  • tests/nat/retriever/test_retrievers.py
📚 Learning: 2025-11-05T11:45:35.119Z
Learnt from: thepatrickchin
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1152
File: examples/config_inheritance/pyproject.toml:1-25
Timestamp: 2025-11-05T11:45:35.119Z
Learning: In the examples/ directory, pyproject.toml files typically do not include SPDX license headers, with only one exception (adk_demo). This is an established pattern that differs from the general guideline requiring SPDX headers in all .toml files.

Applied to files:

  • packages/nvidia_nat_vanna/pyproject.toml
📚 Learning: 2025-11-10T21:49:04.946Z
Learnt from: jiaxiangr
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 974
File: src/nat/retriever/milvus/retriever.py:133-149
Timestamp: 2025-11-10T21:49:04.946Z
Learning: In Python files using LangChain's `Embeddings` interface, `aembed_query` is always available on all embedders because the base class provides a default async implementation that wraps the synchronous method in an executor. Implementations may override with native async for better performance. It's safe to unconditionally call `await embedder.aembed_query(...)` without fallback checks.

Applied to files:

  • src/nat/retriever/milvus/retriever.py
📚 Learning: 2025-09-17T05:20:03.228Z
Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 752
File: packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_base.py:99-121
Timestamp: 2025-09-17T05:20:03.228Z
Learning: The AuthAdapter class in packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_base.py has a known concurrency race condition where `self.auth_provider.config.auth_request = auth_request` can be racy under concurrent requests. This should be addressed by either refactoring AuthProviderBase.authenticate() to accept AuthRequest as a parameter or using an asyncio.Lock to serialize access.

Applied to files:

  • packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
🧬 Code graph analysis (7)
tests/nat/retriever/test_retrievers.py (1)
src/nat/retriever/milvus/retriever.py (4)
  • search (100-101)
  • MilvusRetriever (36-267)
  • CollectionNotFoundError (32-33)
  • bind (73-84)
packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py (1)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (11)
  • QueryResult (45-68)
  • SupportedDatabase (39-42)
  • connect_to_database (163-200)
  • connect_to_databricks (143-160)
  • execute_query (203-226)
  • extract_sql_from_message (71-140)
  • setup_vanna_db_connection (248-298)
  • to_records (57-59)
  • to_dataframe (51-55)
  • row_count (62-68)
  • run_sql (285-292)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (7)
src/nat/builder/builder.py (1)
  • Builder (71-342)
src/nat/builder/framework_enum.py (1)
  • LLMFrameworkEnum (19-25)
src/nat/builder/function_info.py (2)
  • FunctionInfo (290-625)
  • create (351-549)
src/nat/data_models/api_server.py (1)
  • ResponseIntermediateStep (482-494)
src/nat/data_models/function.py (1)
  • FunctionBaseConfig (26-36)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (5)
  • row_count (62-68)
  • async_execute_query (229-245)
  • connect_to_database (163-200)
  • extract_sql_from_message (71-140)
  • to_dataframe (51-55)
src/nat/data_models/common.py (1)
  • get_secret_value (177-193)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (8)
src/nat/builder/builder.py (1)
  • Builder (71-342)
src/nat/builder/framework_enum.py (1)
  • LLMFrameworkEnum (19-25)
src/nat/builder/function_info.py (2)
  • FunctionInfo (290-625)
  • create (351-549)
src/nat/data_models/api_server.py (1)
  • ResponseIntermediateStep (482-494)
src/nat/data_models/component_ref.py (3)
  • EmbedderRef (83-91)
  • LLMRef (116-124)
  • RetrieverRef (149-157)
src/nat/data_models/function.py (1)
  • FunctionBaseConfig (26-36)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
  • setup_vanna_db_connection (248-298)
  • run_sql (285-292)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (5)
  • VannaSingleton (641-758)
  • train_vanna (761-844)
  • instance (655-661)
  • get_instance (664-736)
  • generate_sql (582-638)
src/nat/retriever/milvus/retriever.py (1)
tests/nat/retriever/test_retrievers.py (8)
  • list_collections (33-34)
  • list_collections (179-180)
  • describe_collection (36-74)
  • describe_collection (182-220)
  • aembed_query (164-167)
  • search_iterator (127-154)
  • search (85-125)
  • search (231-271)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
src/nat/data_models/common.py (1)
  • get_secret_value (177-193)
packages/nvidia_nat_data_flywheel/tests/observability/processor/test_dfw_record_processor.py (1)
  • model_dump_json (336-338)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (3)
src/nat/agent/reasoning_agent/reasoning_agent.py (1)
  • remove_r1_think_tags (87-96)
tests/nat/retriever/test_retrievers.py (4)
  • embed_documents (169-170)
  • aembed_query (164-167)
  • search (85-125)
  • search (231-271)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (1)
  • run_sql (285-292)
🪛 Ruff (0.14.5)
tests/nat/retriever/test_retrievers.py

166-166: Avoid specifying long messages outside the exception class

(TRY003)


169-169: Unused method argument: texts

(ARG002)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py

201-201: Do not catch blind exception: Exception

(BLE001)


202-202: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py

213-213: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

src/nat/retriever/milvus/retriever.py

65-65: Avoid specifying long messages outside the exception class

(TRY003)


125-125: Avoid specifying long messages outside the exception class

(TRY003)


218-218: Avoid specifying long messages outside the exception class

(TRY003)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py

157-157: Consider moving this statement to an else block

(TRY300)


159-159: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


166-166: Unused function argument: kwargs

(ARG001)


225-225: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


252-252: Unused function argument: kwargs

(ARG001)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py

61-61: Abstract raise to an inner function

(TRY301)


69-69: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


70-70: Avoid specifying long messages outside the exception class

(TRY003)


170-170: Unused method argument: kwargs

(ARG002)


212-212: Unused method argument: kwargs

(ARG002)


228-228: Consider moving this statement to an else block

(TRY300)


231-231: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


253-253: Abstract raise to an inner function

(TRY301)


259-259: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


270-270: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


294-294: Consider moving this statement to an else block

(TRY300)


339-339: Consider moving this statement to an else block

(TRY300)


383-383: Consider moving this statement to an else block

(TRY300)


417-417: Unused method argument: kwargs

(ARG002)


428-428: Unused method argument: kwargs

(ARG002)


443-443: Unused method argument: kwargs

(ARG002)


478-478: Redundant exception object included in logging.exception call

(TRY401)


481-481: Unused method argument: kwargs

(ARG002)


507-507: Redundant exception object included in logging.exception call

(TRY401)


510-510: Unused method argument: kwargs

(ARG002)


563-563: Do not catch blind exception: Exception

(BLE001)


585-585: Unused method argument: allow_llm_to_see_data

(ARG002)


632-632: Do not catch blind exception: Exception

(BLE001)


752-752: Do not catch blind exception: Exception

(BLE001)


756-756: Do not catch blind exception: Exception

(BLE001)


833-833: Do not catch blind exception: Exception

(BLE001)


835-835: Do not catch blind exception: Exception

(BLE001)

packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py

23-23: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

🔇 Additional comments (4)
docs/source/workflows/functions/index.md (1)

18-18: Documentation updates look good.

The header generalization and toctree addition appropriately reflect the expanded scope of the functions documentation to include the new text-to-sql capability alongside code-execution. The changes maintain consistency with the markdown structure and properly integrate the new documentation page.

Also applies to: 68-71

ci/scripts/path_checks.py (1)

153-154: LGTM! Allowlist entry correctly supports DeepSeek models.

The addition of the DeepSeek model pattern is consistent with existing LLM model name patterns and correctly supports the text-to-SQL feature's DeepSeek integration mentioned in the PR objectives.

packages/nvidia_nat_all/pyproject.toml (1)

39-39: Workspace dependency additions are correct.

Both changes correctly implement the workspace dependency pattern: nvidia-nat-vanna is added to the dependencies list without a version constraint (line 39) and mapped in [tool.uv.sources] as a workspace dependency (line 84). This follows the established pattern for all nvidia-nat-* plugins in this meta-package.

Also applies to: 84-84

packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py (1)

209-214: Good fix: initialize db_engine to None instead of deleting attribute

Safer and matches setup_vanna_db_connection semantics; avoids AttributeError on fresh MagicMock.

Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
@dagardner-nv
Copy link
Contributor

/ok to test 0565169

@dagardner-nv
Copy link
Contributor

/ok to test f9344a8

Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
@dagardner-nv dagardner-nv self-requested a review November 17, 2025 22:30
@dagardner-nv
Copy link
Contributor

/ok to test 4c7294b

jiaxiangr and others added 2 commits November 17, 2025 14:47
Co-authored-by: David Gardner <96306125+dagardner-nv@users.noreply.github.com>
Signed-off-by: Jiaxiang Ren <jiaxiangr@nvidia.com>
Co-authored-by: David Gardner <96306125+dagardner-nv@users.noreply.github.com>
Signed-off-by: Jiaxiang Ren <jiaxiangr@nvidia.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c7294b and e961df5.

📒 Files selected for processing (1)
  • docs/source/workflows/functions/text-to-sql.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. - For Python code, follow
    PEP 20 and
    PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • docs/source/workflows/functions/text-to-sql.md
docs/source/**/*

⚙️ CodeRabbit configuration file

This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the docs/source/_static directory.

Files:

  • docs/source/workflows/functions/text-to-sql.md
🧠 Learnings (2)
📚 Learning: 2025-09-15T21:26:29.430Z
Learnt from: saglave
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 726
File: examples/frameworks/adk_demo/src/nat_adk_demo/weather_update_tool.py:0-0
Timestamp: 2025-09-15T21:26:29.430Z
Learning: In NAT's ADK integration, the docstring of registered functions serves as the tool description sent to the LLM, not standard Python function documentation. The docstring should describe the logical tool interface (parameters the LLM will provide) rather than the NAT framework wrapper parameters like tool_config and builder.

Applied to files:

  • docs/source/workflows/functions/text-to-sql.md
📚 Learning: 2025-11-10T21:26:35.059Z
Learnt from: jiaxiangr
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 974
File: packages/nvidia_nat_all/pyproject.toml:39-39
Timestamp: 2025-11-10T21:26:35.059Z
Learning: In packages/nvidia_nat_all/pyproject.toml, workspace dependencies (nvidia-nat-* plugin packages) should NOT have version constraints because they are managed as workspace dependencies. Version constraints are only applied to the base nvidia-nat package and external dependencies, not to internal workspace packages.

Applied to files:

  • docs/source/workflows/functions/text-to-sql.md

@dagardner-nv
Copy link
Contributor

/ok to test e961df5

@jiaxiangr
Copy link
Contributor Author

/merge

1 similar comment
@AnuradhaKaruppiah
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 6378ee4 into NVIDIA:develop Nov 18, 2025
17 checks passed
saglave pushed a commit to snps-scm13/SNPS-NeMo-Agent-Toolkit that referenced this pull request Dec 11, 2025
This PR introduces `nvidia-nat-vanna`, a new NAT plugin that provides production-ready text-to-SQL capabilities using the Vanna framework with NVIDIA NIM integration.

- Text-to-SQL Generation: Convert natural language questions to SQL queries using AI
- MCP Support: native supported with NAT MCP server and client
- Multi-Database Compatibility: Works out of the box with Databricks; other databases (e.g., PostgreSQL, MS SQL Server, SQLite, etc.) require connector definitions.
  - Fully automatic training and ingestion. Currently support: Databricks
- Vector-Based RAG: Milvus integration for few-shot learning with similarity search
- Reasoning LLMs support: Nemotron, OAI OSS, DeepSeek
- Streaming Support: Real-time progress updates during SQL generation
  - Full integration with NAT's intermediate step tracking system
  - Better UI Display - Front-ends can render intermediate steps
- Enterprise-Ready: Configurable authentication, error handling, and result formatting
- NAT-Native: Follows NAT package conventions with proper function registration

- Two NAT functions: text2sql (SQL generation) and execute_db_query (query execution)
- Reusable package: packages/nvidia_nat_vanna/
- Comprehensive documentation with quick start guide and example configuration: text2sql_config.yml
  - Todo: move to examples folder as quickstart

Closes NVIDIA#732

- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

* **New Features**
  * Text-to-SQL Vanna integration with streaming progress, optional Databricks query execution, and an execute-DB-query function; packaged for installation.

* **Improvements**
  * Async and sync Milvus retriever support and enhanced LLM + vector-store training, retrieval, singleton lifecycle, and configurable model options.

* **Bug Fixes**
  * Broadened sandbox output parsing to handle additional parse/type failures.

* **Documentation**
  * Guides, quick-starts, and configuration examples for Text-to-SQL and Vanna.

* **Tests**
  * New tests for Vanna DB utilities and async Milvus retriever paths.

* **Chores**
  * Packaging, workspace registration, config YAMLs, dependency updates, CI allowlist and vocabulary updates.

Authors:
  - Jiaxiang Ren (https://github.com/jiaxiangr)
  - https://github.com/aadityaramsh

Approvers:
  - Yuchen Zhang (https://github.com/yczhang-nv)
  - Eric Evans II (https://github.com/ericevans-nv)
  - Dhruv Nandakumar (https://github.com/dnandakumar-nv)
  - https://github.com/Salonijain27
  - David Gardner (https://github.com/dagardner-nv)

URL: NVIDIA#974
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Contribution: Advanced Text-to-SQL Function with Vanna Framework Integration