Introduce vanna text2sql#974
Conversation
WalkthroughAdds a new nvidia-nat-vanna package providing Text2SQL and Execute DB Query functions, Vanna LLM + Milvus integration, Databricks DB utilities, async-capable Milvus retriever support, training artifacts, docs, tests, YAML configs, CI vocabulary update, and workspace/pyproject wiring. (50 words) Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant NAT as NAT Agent
participant T2S as text2sql Function
participant V as Vanna Singleton
participant M as Milvus VectorStore
participant L as LLM
participant DB as Databricks
User->>NAT: Natural language query
NAT->>T2S: Invoke text2sql
T2S->>V: Get or create instance
V->>M: Ensure collections / retrieve context
M-->>T2S: Related questions, DDL, docs
T2S->>L: Generate SQL with context
L-->>T2S: SQL + explanation
T2S-->>NAT: Text2SQLOutput (sql, explanation)
alt execute_sql == true
NAT->>T2S: Invoke execute_db_query
T2S->>V: use run_sql on Vanna (async)
V->>DB: run_sql executes query
DB-->>T2S: QueryResult (rows, columns)
T2S-->>User: Final result with rows
else
NAT-->>User: Final result (sql only)
end
sequenceDiagram
participant App as Application
participant Client as Milvus Client (Async/Sync)
participant Ret as MilvusRetriever
App->>Ret: Initialize with client
Ret->>Ret: Detect async capability -> set _is_async
App->>Ret: Call search(...)
alt Async mode
Ret->>Client: await aembed_query(...)
Ret->>Client: await .search(...)
else Sync mode
Ret->>Client: embed_query(...)
Ret->>Client: .search(...)
end
Client-->>Ret: Results
Ret-->>App: RetrieverOutput
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Areas to focus during review:
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Love this new feat! |
|
@mpenn , could you please help review? Aaditya and I will be making two commits soon, and we'll mark the PR as ready shortly. |
fa6d91e to
5580062
Compare
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
Outdated
Show resolved
Hide resolved
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
Outdated
Show resolved
Hide resolved
7e06364 to
c10e3f5
Compare
16ccd15 to
006f63e
Compare
There was a problem hiding this comment.
Actionable comments posted: 17
🧹 Nitpick comments (20)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py (1)
16-70: Consider adding a module docstring.While not strictly required for data-only modules, a module-level docstring would improve discoverability and help developers understand the purpose of these training data constants.
Consider adding a module docstring after the license header:
# limitations under the License. +"""Static training data and prompts for VANNA text-to-SQL integration. + +This module provides example DDL statements, documentation, question-SQL pairs, +and prompt templates used for training and configuring the VANNA framework +with NVIDIA NIM integration. +""" + # yapf: disable -# ruff: noqa: E501packages/nvidia_nat_vanna/README.md (3)
160-177: Add language identifier to fenced code block.Markdown best practices require language identifiers for fenced code blocks. Add "text" or "plaintext" for the expected output example.
Apply this diff:
Expected output: -``` +```text # Ingest DDL and synthesize query-SQL pairs for training
409-423: Add language identifiers to fenced code blocks.Lines 409 and 417 have fenced code blocks without language identifiers. Add "text" for error message examples.
Apply this diff:
**Milvus connection failed:** -``` +```text Error: Failed to connect to Milvus
- Verify Milvus is running:
docker ps | grep milvus- Check host/port configuration
- Verify TLS settings match your Milvus deployment
Database connection failed:
-+text
Error: Failed to connect to database
447-461: Add language identifier to fenced code block.Line 447 starts a fenced code block for the package structure without a language identifier. Add "text" for the directory tree.
Apply this diff:
## Package Structure -``` +```text nvidia_nat_vanna/packages/nvidia_nat_vanna/src/nat/meta/pypi.md (1)
95-95: Convert bare URL to markdown link.Line 95 uses a bare URL which should be in markdown link format per markdown best practices.
Apply this diff:
-Full documentation: https://nvidia.github.io/NeMo-Agent-Toolkit/ +Full documentation: [https://nvidia.github.io/NeMo-Agent-Toolkit/](https://nvidia.github.io/NeMo-Agent-Toolkit/)packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (4)
130-137: Avoid reliance on retriever private attrs; add defensive checks._accessing retriever._is_async/_client is brittle. Validate with getattr and fail gracefully.
- if not retriever._is_async: # type: ignore[attr-defined] + if not getattr(retriever, "_is_async", False): # robust check msg = (f"Milvus retriever '{config.milvus_retriever}' must be configured with " "use_async_client=true for Vanna text2sql function") raise ValueError(msg) - async_milvus_client = retriever._client # type: ignore[attr-defined] + async_milvus_client = getattr(retriever, "_client", None) + if async_milvus_client is None: + raise ValueError("Milvus retriever did not expose an async client via _client")Please confirm the retriever interface guarantees these attributes; if there’s a public accessor, prefer it. Based on learnings.
16-19: Add a concise module docstring.Required by guidelines for public modules.
import logging import uuid from collections.abc import AsyncGenerator +"""Text2SQL function using Vanna + Milvus with Databricks execution and NAT streaming."""
220-229: Keep logger.error when re-raising; silence ruff TRY400 locally.Per guidelines, use logger.error when re-raising. Add noqa to satisfy ruff.
- except Exception as e: - logger.error(f"SQL generation failed: {e}") + except Exception as e: + logger.error(f"SQL generation failed: {e}") # noqa: TRY400 (re-raising below per logging policy)
112-154: Singleton reuse across differing configs can cause drift.Reusing VannaSingleton without comparing llm/embedder/milvus/dialect may leak previous config. Consider hashing relevant config and resetting when drift detected.
Would you like a helper that tracks a config_hash on the instance and resets when changed?
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (4)
255-261: Use asyncio.to_thread for executor offload; add return type hint.Cleaner and future-proof vs get_event_loop/run_in_executor.
- import asyncio - - # Run synchronous query in executor - loop = asyncio.get_event_loop() - query_result = await loop.run_in_executor(None, execute_query, connection, query, catalog, schema, database_type) - - return query_result.to_dataframe() + import asyncio + from typing import Any + query_result = await asyncio.to_thread(execute_query, connection, query, catalog, schema, database_type) + return query_result.to_dataframe()Also annotate the signature:
-async def async_execute_query( +async def async_execute_query( connection: Any, query: str, catalog: str | None = None, schema: str | None = None, database_type: str | None = None, -) : +) -> Any:
68-77: Avoid re-importing BaseModel inside function.Use the module-level import.
- from pydantic import BaseModel - # Handle BaseModel objects (e.g., Text2SQLOutput) if isinstance(sql_query, BaseModel):
153-161: Mark unused parameters to satisfy ruff (Databricks-only path).These are retained for future DBs; silence ARG001 cleanly.
def connect_to_database( database_type: str, host: str | None = None, - port: int | None = None, - database: str | None = None, - username: str | None = None, + port: int | None = None, # noqa: ARG001 + database: str | None = None, # noqa: ARG001 + username: str | None = None, # noqa: ARG001 password: str | None = None, **kwargs, ) -> Any:Similarly in setup_vanna_db_connection for port/database/username.
144-151: Logging policy vs ruff TRY400.These blocks log and re-raise; keep logger.error per guidelines. Add inline noqa to appease ruff.
- except ImportError: - logger.error("databricks-sql-connector not installed") + except ImportError: + logger.error("databricks-sql-connector not installed") # noqa: TRY400 raise - except Exception as e: - logger.error(f"Failed to connect to Databricks: {e}") + except Exception as e: + logger.error(f"Failed to connect to Databricks: {e}") # noqa: TRY400 raise @@ - except Exception as e: - logger.error(f"Error executing query: {e}") + except Exception as e: + logger.error(f"Error executing query: {e}") # noqa: TRY400 raise @@ - except Exception as e: - logger.error(f"Error executing SQL: {e}") + except Exception as e: + logger.error(f"Error executing SQL: {e}") # noqa: TRY400 raiseAlso applies to: 231-233, 318-319
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (2)
16-23: Add a concise module docstring.import logging import uuid from collections.abc import AsyncGenerator from typing import Any +"""Execute a SQL query (Databricks) with streaming progress and structured output for NAT."""
216-223: Use logger.exception when not re-raising.We swallow the error and yield a failure result; capture stack trace.
- except Exception as e: - logger.error(f"Error executing SQL query: {e}") + except Exception as e: + logger.exception(f"Error executing SQL query: {e}") yield ExecuteDBQueryOutput( success=False, failure_reason=str(e), sql_query=sql_query, dataframe_info=DataFrameInfo(shape=[0, 0], dtypes={}, columns=[]), )packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (5)
141-144: Replace en dash with hyphen to satisfy RUF001.- "Please generate diverse question–SQL pairs where each SQL " + "Please generate diverse question-SQL pairs where each SQL "
477-479: Use logger.exception when not re-raising.Capture the stack trace for observability.
- except Exception as e: - logger.error(f"Error retrieving {collection_name}: {e}") + except Exception as e: + logger.exception(f"Error retrieving {collection_name}: {e}")
506-507: Use logger.exception when not re-raising.Same rationale for similar-queries retrieval.
- except Exception as e: - logger.error(f"Error retrieving similar questions: {e}") + except Exception as e: + logger.exception(f"Error retrieving similar questions: {e}")
170-173: Silence unused kwargs and keep interface stable.These kwargs aren’t used; acknowledge to satisfy ruff.
- def get_sql_prompt( + def get_sql_prompt( self, @@ - error_message: dict | None = None, - **kwargs, + error_message: dict | None = None, @@ - ) -> list: + ) -> list: """Generate prompt for SQL generation.""" + # kwargs reserved for future prompt knobs @@ - async def submit_prompt(self, prompt, **kwargs) -> str: + async def submit_prompt(self, prompt, **kwargs) -> str: """Submit prompt to LLM.""" + _ = kwargs # noqa: F841Also applies to: 212-229
584-599: Config flag unused: allow_llm_to_see_data has no effect.Either use it (e.g., include a brief note in the prompt or enable a guarded data peek) or remove from API to avoid confusion.
Want a small patch that injects a hint into the prompt when True?
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
packages/nvidia_nat_vanna/README.md(1 hunks)packages/nvidia_nat_vanna/pyproject.toml(1 hunks)packages/nvidia_nat_vanna/src/nat/meta/__init__.py(1 hunks)packages/nvidia_nat_vanna/src/nat/meta/pypi.md(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py(1 hunks)packages/nvidia_nat_vanna/text2sql_config.yml(1 hunks)src/nat/retriever/milvus/register.py(2 hunks)src/nat/retriever/milvus/retriever.py(6 hunks)tests/nat/retriever/test_retrievers.py(3 hunks)
🧰 Additional context used
📓 Path-based instructions (13)
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}: Every file must start with the standard SPDX Apache-2.0 header
Confirm copyright years are up to date when a file is changed
All source files must include the SPDX Apache-2.0 header template (copy from an existing file)
Files:
packages/nvidia_nat_vanna/text2sql_config.ymlpackages/nvidia_nat_vanna/src/nat/meta/__init__.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.pypackages/nvidia_nat_vanna/src/nat/meta/pypi.mdsrc/nat/retriever/milvus/register.pypackages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/pyproject.tomlpackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pytests/nat/retriever/test_retrievers.pysrc/nat/retriever/milvus/retriever.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
packages/nvidia_nat_vanna/text2sql_config.ymlpackages/nvidia_nat_vanna/src/nat/meta/__init__.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.pypackages/nvidia_nat_vanna/src/nat/meta/pypi.mdsrc/nat/retriever/milvus/register.pypackages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/pyproject.tomlpackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pytests/nat/retriever/test_retrievers.pysrc/nat/retriever/milvus/retriever.py
packages/**/*
⚙️ CodeRabbit configuration file
packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain apyproject.tomlfile. - Thepyproject.tomlfile should declare a dependency onnvidia-nator another package with a name starting
withnvidia-nat-. This dependency should be declared using~=<version>, and the version should be a two
digit version (ex:~=1.0).
- Not all packages contain Python code, if they do they should also contain their own set of tests, in a
tests/directory at the same level as thepyproject.tomlfile.
Files:
packages/nvidia_nat_vanna/text2sql_config.ymlpackages/nvidia_nat_vanna/src/nat/meta/__init__.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.pypackages/nvidia_nat_vanna/src/nat/meta/pypi.mdpackages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/pyproject.tomlpackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Follow PEP 20 and PEP 8 for Python style
Run yapf with column_limit=120; yapf is used for formatting (run second)
Indent with 4 spaces (no tabs) and end each file with a single trailing newline
Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Respect Python naming schemes: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
When re-raising exceptions, use bare raise to preserve stack trace; log with logger.error(), not logger.exception()
When catching and logging without re-raising, use logger.exception() to capture full stack trace
Provide Google-style docstrings for every public module, class, function, and CLI command
Docstring first line must be a concise description ending with a period
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file I/O)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial and feasible
Files:
packages/nvidia_nat_vanna/src/nat/meta/__init__.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.pysrc/nat/retriever/milvus/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pytests/nat/retriever/test_retrievers.pysrc/nat/retriever/milvus/retriever.py
{src/**/*.py,packages/*/src/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{src/**/*.py,packages/*/src/**/*.py}: All importable Python code must live under src/ or packages//src/
All public APIs must have Python 3.11+ type hints on parameters and return values
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or metadata when useful
Treat pyright warnings as errors during development
Files:
packages/nvidia_nat_vanna/src/nat/meta/__init__.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.pysrc/nat/retriever/milvus/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pysrc/nat/retriever/milvus/retriever.py
packages/*/src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
If a package contains Python code, it must have tests in a tests/ directory at the same level as pyproject.toml
Files:
packages/nvidia_nat_vanna/src/nat/meta/__init__.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
src/nat/**/*
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
src/nat/**/* contains core functionality; changes should prioritize backward compatibility
Files:
src/nat/retriever/milvus/register.pysrc/nat/retriever/milvus/retriever.py
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/retriever/milvus/register.pysrc/nat/retriever/milvus/retriever.py
{docs/source/**/*.md,**/README.@(md|ipynb)}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{docs/source/**/*.md,**/README.@(md|ipynb)}: Use the full name “NVIDIA NeMo Agent toolkit” on first use in documentation, then “NeMo Agent toolkit”; in headings use “NeMo Agent Toolkit” (capital T)
Do not use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation unless explicitly referring to deprecated names
Never use “NAT”/“nat” abbreviations in documentation
Documentation must be clear/comprehensive; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling is correct
Files:
packages/nvidia_nat_vanna/README.md
packages/*/pyproject.toml
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
packages/*/pyproject.toml: Each package under packages/ must contain a pyproject.toml
packages/*/pyproject.toml must declare a dependency on nvidia-nat or a package starting with nvidia-nat-
Dependencies in pyproject.toml should use ~= with two-digit versions (e.g., ~=1.0)
Files:
packages/nvidia_nat_vanna/pyproject.toml
{**/pyproject.toml,uv.lock}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via ‘uv pip install --sync’
Files:
packages/nvidia_nat_vanna/pyproject.toml
{tests/**/*.py,examples/*/tests/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Unit tests live in tests/ or examples/*/tests and use markers defined in pyproject.toml (e.g., integration)
Files:
tests/nat/retriever/test_retrievers.py
{tests/**/*.py,examples/*/tests/**/*.py,packages/*/tests/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{tests/**/*.py,examples/*/tests/**/*.py,packages/*/tests/**/*.py}: Use pytest with pytest-asyncio for async code
Test functions should be named with test_ prefix using snake_case
Extract frequently repeated test code into pytest fixtures
Pytest fixtures should define the name argument in @pytest.fixture and the function should be prefixed fixture_ using snake_case
Mock external services with pytest_httpserver or unittest.mock; avoid live endpoints
Mark slow tests with @pytest.mark.slow to skip in default test suite
Mark integration tests requiring external services with @pytest.mark.integration and follow nat-tests guidelines
Files:
tests/nat/retriever/test_retrievers.py
tests/**/*.py
⚙️ CodeRabbit configuration file
tests/**/*.py: - Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code. - Test functions should be named using thetest_prefix, using snake_case. - Any frequently repeated code should be extracted into pytest fixtures. - Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture
function being decorated should be named using thefixture_prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
pass
Files:
tests/nat/retriever/test_retrievers.py
🧠 Learnings (1)
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to packages/*/pyproject.toml : packages/*/pyproject.toml must declare a dependency on nvidia-nat or a package starting with nvidia-nat-
Applied to files:
packages/nvidia_nat_vanna/pyproject.toml
🧬 Code graph analysis (7)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py (2)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (1)
execute_db_query(92-252)packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (1)
text2sql(101-267)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
packages/nvidia_nat_data_flywheel/tests/observability/processor/test_dfw_record_processor.py (1)
model_dump_json(336-338)tests/nat/authentication/test_bearer_token_validator.py (1)
json(111-112)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (7)
src/nat/builder/builder.py (1)
Builder(68-290)src/nat/builder/framework_enum.py (1)
LLMFrameworkEnum(19-25)src/nat/data_models/api_server.py (1)
ResponseIntermediateStep(481-493)src/nat/data_models/component_ref.py (3)
EmbedderRef(83-91)LLMRef(116-124)RetrieverRef(149-157)src/nat/data_models/function.py (1)
FunctionBaseConfig(26-27)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
setup_vanna_db_connection(264-326)run_sql(313-319)packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (6)
VannaSingleton(640-741)train_vanna(744-822)instance(647-653)get_instance(656-728)generate_sql(581-637)close(551-558)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (6)
src/nat/builder/builder.py (1)
Builder(68-290)src/nat/builder/framework_enum.py (1)
LLMFrameworkEnum(19-25)src/nat/builder/function_info.py (2)
FunctionInfo(290-625)create(351-549)src/nat/data_models/api_server.py (1)
ResponseIntermediateStep(481-493)src/nat/data_models/function.py (1)
FunctionBaseConfig(26-27)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (3)
async_execute_query(236-261)connect_to_database(153-191)extract_sql_from_message(52-121)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (3)
src/nat/agent/reasoning_agent/reasoning_agent.py (1)
remove_r1_think_tags(87-96)tests/nat/retriever/test_retrievers.py (4)
embed_documents(169-170)aembed_query(164-167)search(85-125)search(231-271)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (1)
run_sql(313-319)
tests/nat/retriever/test_retrievers.py (1)
src/nat/retriever/milvus/retriever.py (4)
search(96-97)MilvusRetriever(36-263)CollectionNotFoundError(32-33)bind(69-80)
src/nat/retriever/milvus/retriever.py (1)
tests/nat/retriever/test_retrievers.py (8)
list_collections(33-34)list_collections(179-180)describe_collection(36-74)describe_collection(182-220)aembed_query(164-167)search_iterator(127-154)search(85-125)search(231-271)
🪛 markdownlint-cli2 (0.18.1)
packages/nvidia_nat_vanna/src/nat/meta/pypi.md
95-95: Bare URL used
(MD034, no-bare-urls)
packages/nvidia_nat_vanna/README.md
160-160: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
409-409: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
417-417: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
447-447: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
474-474: Bare URL used
(MD034, no-bare-urls)
475-475: Bare URL used
(MD034, no-bare-urls)
🪛 Ruff (0.14.2)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
17-17: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
144-144: Consider moving this statement to an else block
(TRY300)
146-146: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
149-149: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
156-156: Unused function argument: port
(ARG001)
157-157: Unused function argument: database
(ARG001)
158-158: Unused function argument: username
(ARG001)
232-232: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
268-268: Unused function argument: port
(ARG001)
269-269: Unused function argument: database
(ARG001)
270-270: Unused function argument: username
(ARG001)
318-318: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
220-220: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
266-266: Do not catch blind exception: Exception
(BLE001)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
94-94: Unused function argument: builder
(ARG001)
216-216: Do not catch blind exception: Exception
(BLE001)
217-217: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
61-61: Abstract raise to an inner function
(TRY301)
69-69: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
70-70: Avoid specifying long messages outside the exception class
(TRY003)
141-141: String contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF001)
170-170: Unused method argument: kwargs
(ARG002)
212-212: Unused method argument: kwargs
(ARG002)
228-228: Consider moving this statement to an else block
(TRY300)
231-231: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
253-253: Abstract raise to an inner function
(TRY301)
259-259: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
270-270: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
294-294: Consider moving this statement to an else block
(TRY300)
339-339: Consider moving this statement to an else block
(TRY300)
383-383: Consider moving this statement to an else block
(TRY300)
417-417: Unused method argument: kwargs
(ARG002)
428-428: Unused method argument: kwargs
(ARG002)
443-443: Unused method argument: kwargs
(ARG002)
477-477: Do not catch blind exception: Exception
(BLE001)
478-478: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
481-481: Unused method argument: kwargs
(ARG002)
506-506: Do not catch blind exception: Exception
(BLE001)
507-507: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
510-510: Unused method argument: kwargs
(ARG002)
557-557: Do not catch blind exception: Exception
(BLE001)
584-584: Unused method argument: allow_llm_to_see_data
(ARG002)
631-631: Do not catch blind exception: Exception
(BLE001)
739-739: Do not catch blind exception: Exception
(BLE001)
811-811: Do not catch blind exception: Exception
(BLE001)
813-813: Do not catch blind exception: Exception
(BLE001)
tests/nat/retriever/test_retrievers.py
166-166: Avoid specifying long messages outside the exception class
(TRY003)
169-169: Unused method argument: texts
(ARG002)
src/nat/retriever/milvus/retriever.py
61-61: Avoid specifying long messages outside the exception class
(TRY003)
121-121: Avoid specifying long messages outside the exception class
(TRY003)
214-214: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (12)
packages/nvidia_nat_vanna/pyproject.toml (1)
23-24: LGTM! Dependency declarations follow guidelines.The dependencies correctly use
~=with two-digit versions (e.g.,~=1.4) and declare dependency onnvidia-natas required for plugin packages.Based on learnings.
packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py (1)
1-22: LGTM! Registration module follows conventions.The registration module correctly:
- Includes the required SPDX Apache-2.0 header with current copyright year (2025)
- Imports the provider modules for automatic registration
- Follows the standard NAT plugin registration pattern
src/nat/retriever/milvus/register.py (2)
51-54: LGTM! Configuration field is well-documented.The new
use_async_clientfield is properly:
- Typed as
boolwith a default ofFalse- Documented with a clear description using Pydantic
Field- Integrated into the client creation logic below
69-77: LGTM! Runtime client selection is clean and correct.The runtime branching between
AsyncMilvusClientandMilvusClientbased on theuse_async_clientflag is:
- Clear and easy to understand
- Properly imports the client types only when needed
- Passes the same configuration parameters to both client types
- Compatible with the downstream
MilvusRetrieverwhich now acceptsAnytypetests/nat/retriever/test_retrievers.py (3)
164-167: LGTM! Async embedding method correctly implemented.The
aembed_querymethod properly mirrors the syncembed_querymethod with:
- Input validation
- Same return value structure
- Async signature using
async defThis enables testing of async Milvus retriever paths.
173-271: LGTM! Async mock client properly structured.The
CustomAsyncMilvusClientcorrectly:
- Mirrors the structure of
CustomMilvusClientfor consistency- Makes all methods async (
async def) where needed- Implements the same validation and return value logic
- Provides test fixtures for async code paths
The duplication between sync and async clients is acceptable for test fixtures.
500-574: LGTM! Comprehensive async test coverage.The new async Milvus retriever tests provide excellent coverage:
test_async_milvus_search: Validates search functionality, top_k behavior, and output fieldstest_async_milvus_retriever_binding: Tests parameter binding and error casestest_async_milvus_validation: Validates field existence checksThese tests mirror the sync test structure, ensuring both code paths are thoroughly tested.
src/nat/retriever/milvus/retriever.py (4)
43-58: LGTM! Async client detection is pragmatic and well-documented.The changes correctly:
- Widen the client type to
Anyto accommodate both sync and async Milvus clients- Detect async mode via type name comparison (
"AsyncMilvusClient")- Log the client mode for observability
- Maintain backward compatibility with existing sync usage
The type name string comparison is a practical approach that avoids circular import issues.
88-94: LGTM! Collection validation correctly handles both client types.The
_validate_collectionmethod properly:
- Becomes async to support both code paths
- Branches on
_is_asyncto await or call directly- Returns a boolean consistently
- Will work with both sync and async clients
120-192: LGTM! Search iterator implementation handles both async and sync paths.The
_search_with_iteratormethod correctly:
- Uses async
aembed_queryfor embedding (line 133)- Branches on
_is_asyncfor all Milvus operations (describe_collection, search_iterator, next, close)- Awaits async operations when needed
- Maintains consistent return types and error handling
The dual-path implementation is clean and preserves existing behavior for sync clients.
213-262: LGTM! Search implementation properly supports both client types.The
_searchmethod correctly:
- Validates collection asynchronously
- Branches on
_is_asyncfor describe_collection and search operations- Uses async
aembed_queryfor embedding (line 237)- Awaits operations when in async mode
- Maintains consistent validation and error handling
The implementation is thorough and maintains backward compatibility.
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (1)
86-90: No action required—import path is correct.The canonical definition of
remove_r1_think_tagsis atsrc/nat/utils/io/model_processing.py:19. The import in vanna_utils.py matches this location and is consistent with all other usages across the codebase. The local function definition in reasoning_agent.py is a separate implementation, not the intended import source. The import pathnat.utils.io.model_processingis correct.
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (5)
packages/nvidia_nat_vanna/README.md (4)
3-3: Remove NAT abbreviation from line 3.Per documentation guidelines, never use "NAT" abbreviation. Replace with the full toolkit name.
-Production-ready text-to-SQL integration for NeMo Agent Toolkit (NAT) using the Vanna framework with Databricks support and vector-based few-shot learning. +Production-ready text-to-SQL integration for NVIDIA NeMo Agent Toolkit using the Vanna framework with Databricks support and vector-based few-shot learning.
139-140: Fix NAT abbreviation in CLI comment.The comment on line 140 uses "NAT CLI" which violates documentation guidelines. The actual command
nat runis correct (it's the CLI command), but the comment should reference the toolkit by its full name.-# Using NAT CLI +# Using NeMo Agent Toolkit CLI
399-402: Replace NAT abbreviation with full toolkit name.Line 400 uses "NAT's" which violates documentation guidelines. Use the full toolkit name instead.
Other features include: -- Full integration with NAT's intermediate step tracking system +- Full integration with NeMo Agent Toolkit's intermediate step tracking system
476-478: Convert bare URLs to markdown links and fix NAT abbreviation.Lines 476–477 use bare URLs which should be in markdown link format. Line 478 also contains the "NAT" abbreviation which must be replaced.
For questions and support: -- GitHub Issues: https://github.com/NVIDIA/NeMo-Agent-Toolkit/issues -- Documentation: https://docs.nvidia.com/nemo/agent-toolkit/latest/ -- Discord: Join the NAT community Discord +- [GitHub Issues](https://github.com/NVIDIA/NeMo-Agent-Toolkit/issues) +- [Documentation](https://docs.nvidia.com/nemo/agent-toolkit/latest/) +- Discord: Join the NeMo Agent Toolkit community Discordpackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py (1)
52-52: Address the placeholder constant per previous review.This constant was flagged in a previous review as placeholder code. Additionally, it lacks a type hint (unlike the other constants in this file) and has no explanatory comment describing its purpose.
Add the type hint and either remove this constant if unused, or replace with production values and document its purpose:
-VANNA_ACTIVE_TABLES = ['catalog.schema.table_a', 'catalog.schema.table_b'] +# Tables to include/exclude during specific operations (update with actual table names). +VANNA_ACTIVE_TABLES: list[str] = ['catalog.schema.table_a', 'catalog.schema.table_b']
🧹 Nitpick comments (6)
packages/nvidia_nat_vanna/README.md (4)
160-160: Specify language for fenced code block.Add the language identifier to the code block on line 160 for proper syntax highlighting.
-Expected output: +Expected output: -``` +```text # Ingest DDL and synthesize query-SQL pairs for training
409-409: Specify language for fenced code block.Add the language identifier to the code block on line 409 for proper syntax highlighting.
**Milvus connection failed:** -``` +```text Error: Failed to connect to Milvus
417-417: Specify language for fenced code block.Add the language identifier to the code block on line 417 for proper syntax highlighting.
**Database connection failed:** -``` +```text Error: Failed to connect to database
449-449: Specify language for fenced code block.Add the language identifier to the code block on line 449 for proper syntax highlighting.
## Package Structure -``` +```text nvidia_nat_vanna/packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py (2)
19-25: Move instructional comments to external documentation.The comment "Define your database schema here to help the model understand table structures" is instructional and suggests users should replace this data. Per coding guidelines, placeholder text should not appear in source code. Either:
- Move these instructions to the quickstart documentation or configuration examples, or
- Clearly document that these are example values for demonstration purposes.
Update the comment to clarify intent:
-# DDL statements for training -# Define your database schema here to help the model understand table structures +# Example DDL statements for training. +# These demo tables are used in the quickstart guide. Replace with your actual schema.
55-69: Add type hints for consistency.The prompt template constants lack type hints, which is inconsistent with the other constants in this file (lines 21, 29, 37). Adding explicit
strtype hints improves consistency and helps with static analysis.Add type hints to both constants:
-VANNA_RESPONSE_GUIDELINES = """ +VANNA_RESPONSE_GUIDELINES: str = """ Response Guidelines: 1. Carefully analyze the question to understand the user's intent, target columns, filters, and any aggregation or grouping requirements. 2. Output only JSON: { "sql": "<valid SQL query>", "explanation": "<brief description>", } """ -VANNA_TRAINING_PROMPT = """ +VANNA_TRAINING_PROMPT: str = """ Response Guidelines: 1. Generate 20 natural language questions and their corresponding valid SQL queries. 2. Output JSON like: [{{"question": "...", "sql": "..."}}] """
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/nvidia_nat_vanna/README.md(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
{docs/source/**/*.md,**/README.@(md|ipynb)}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{docs/source/**/*.md,**/README.@(md|ipynb)}: Use the full name “NVIDIA NeMo Agent toolkit” on first use in documentation, then “NeMo Agent toolkit”; in headings use “NeMo Agent Toolkit” (capital T)
Do not use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation unless explicitly referring to deprecated names
Never use “NAT”/“nat” abbreviations in documentation
Documentation must be clear/comprehensive; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling is correct
Files:
packages/nvidia_nat_vanna/README.md
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}: Every file must start with the standard SPDX Apache-2.0 header
Confirm copyright years are up to date when a file is changed
All source files must include the SPDX Apache-2.0 header template (copy from an existing file)
Files:
packages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
packages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
packages/**/*
⚙️ CodeRabbit configuration file
packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain apyproject.tomlfile. - Thepyproject.tomlfile should declare a dependency onnvidia-nator another package with a name starting
withnvidia-nat-. This dependency should be declared using~=<version>, and the version should be a two
digit version (ex:~=1.0).
- Not all packages contain Python code, if they do they should also contain their own set of tests, in a
tests/directory at the same level as thepyproject.tomlfile.
Files:
packages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Follow PEP 20 and PEP 8 for Python style
Run yapf with column_limit=120; yapf is used for formatting (run second)
Indent with 4 spaces (no tabs) and end each file with a single trailing newline
Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Respect Python naming schemes: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
When re-raising exceptions, use bare raise to preserve stack trace; log with logger.error(), not logger.exception()
When catching and logging without re-raising, use logger.exception() to capture full stack trace
Provide Google-style docstrings for every public module, class, function, and CLI command
Docstring first line must be a concise description ending with a period
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file I/O)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial and feasible
Files:
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
{src/**/*.py,packages/*/src/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{src/**/*.py,packages/*/src/**/*.py}: All importable Python code must live under src/ or packages//src/
All public APIs must have Python 3.11+ type hints on parameters and return values
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or metadata when useful
Treat pyright warnings as errors during development
Files:
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
packages/*/src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
If a package contains Python code, it must have tests in a tests/ directory at the same level as pyproject.toml
Files:
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
🧠 Learnings (4)
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Never use “NAT”/“nat” abbreviations in documentation
Applied to files:
packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Use the full name “NVIDIA NeMo Agent toolkit” on first use in documentation, then “NeMo Agent toolkit”; in headings use “NeMo Agent Toolkit” (capital T)
Applied to files:
packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to **/*.py : In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Applied to files:
packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-08-22T20:18:28.041Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/cursor-rules.mdc:0-0
Timestamp: 2025-08-22T20:18:28.041Z
Learning: Applies to .cursor/rules/**/*.mdc : Use consistent project terminology in descriptions (e.g., NAT workflows, NAT CLI commands)
Applied to files:
packages/nvidia_nat_vanna/README.md
🪛 markdownlint-cli2 (0.18.1)
packages/nvidia_nat_vanna/README.md
160-160: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
409-409: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
417-417: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
449-449: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
476-476: Bare URL used
(MD034, no-bare-urls)
477-477: Bare URL used
(MD034, no-bare-urls)
🪛 Ruff (0.14.2)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_schema.py
17-17: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
🔇 Additional comments (1)
packages/nvidia_nat_vanna/README.md (1)
467-467: Approve fix for NAT abbreviation.This line correctly uses the full toolkit name instead of the abbreviation. Good catch on the previous fix!
|
@coderabbitai generate docstrings |
✅ Actions performedInitiated docstring generation; will generate only if new commits exist. |
|
Caution Docstrings generation - FAILED An unexpected error occurred while opening a pull request: Reference update failed - https://docs.github.com/rest/git/refs#create-a-reference |
willkill07
left a comment
There was a problem hiding this comment.
Documentation should also be updated to mention this new package and feature set.
dnandakumar-nv
left a comment
There was a problem hiding this comment.
I believe that the nvidia_nat_vanna package could also benefit from unit testing
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
Outdated
Show resolved
Hide resolved
8e80e5b to
bb462ec
Compare
There was a problem hiding this comment.
Actionable comments posted: 18
♻️ Duplicate comments (4)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py (1)
23-23: Remove unusednoqadirective.The
noqa: E501directive is unnecessary since E501 (line length check) is not enabled in the project's ruff configuration.Apply this diff:
-# ruff: noqa: E501 +packages/nvidia_nat_vanna/pyproject.toml (1)
1-3: Add the required Apache-2.0 SPDX header.All TOML files must start with our standard Apache-2.0 header. Please insert the full SPDX block before
[build-system].Apply this diff:
+# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + [build-system]packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (1)
100-101: Add return type annotation for the registered function.Public APIs must declare return types; add
-> AsyncGenerator[FunctionInfo, None]to satisfy our typing contract and prevent downstream type-checker errors.Apply this diff:
-@register_function(config_type=Text2SQLConfig, framework_wrappers=[LLMFrameworkEnum.LANGCHAIN]) -async def text2sql(config: Text2SQLConfig, builder: Builder): +@register_function(config_type=Text2SQLConfig, framework_wrappers=[LLMFrameworkEnum.LANGCHAIN]) +async def text2sql(config: Text2SQLConfig, builder: Builder) -> AsyncGenerator[FunctionInfo, None]:As per coding guidelines
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (1)
86-94: Annotate the registered function’s return type.
execute_db_queryis part of the public plugin surface; add-> AsyncGenerator[FunctionInfo, None]to comply with our typing requirements.Apply this diff:
@register_function( config_type=ExecuteDBQueryConfig, framework_wrappers=[LLMFrameworkEnum.LANGCHAIN], ) async def execute_db_query( config: ExecuteDBQueryConfig, _builder: Builder, -): +) -> AsyncGenerator[FunctionInfo, None]:As per coding guidelines
🧹 Nitpick comments (4)
packages/nvidia_nat_vanna/README.md (2)
191-208: Add language identifiers to fenced code blocks.Several fenced blocks (expected output, connection format examples, SQL snippets, and troubleshooting errors) omit a language hint, tripping markdownlint (MD040) and hurting syntax highlighting. Please add the appropriate language labels such as
text,bash, orsqlto each affected block.Also applies to: 261-276, 280-290, 475-489
543-545: Convert bare URLs to Markdown links.The Support section still exposes bare URLs, triggering MD034 and breaking our documentation style rules. Wrap each URL in Markdown link syntax (e.g.,
[GitHub Issues](https://github.com/NVIDIA/NeMo-Agent-Toolkit/issues)) and keep the descriptive text.packages/nvidia_nat_vanna/src/nat/meta/pypi.md (1)
130-133: Wrap the documentation URL in Markdown.Line 132 exposes a bare URL; convert it to
[Documentation](https://docs.nvidia.com/nemo/agent-toolkit/latest/)(or similar descriptive text) to satisfy Markdown style rules.packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (1)
284-416: Significant code duplication in collection creation methods.The three
_create_*_collectionmethods (_create_sql_collection,_create_ddl_collection,_create_doc_collection) share nearly identical structure with only field names differing. This violates DRY principles and makes maintenance harder.Consider extracting a common method:
async def _create_collection(self, name: str, field_configs: list[dict]): """Create a Milvus collection with specified fields.""" from pymilvus import DataType, MilvusClient, MilvusException try: await self.async_milvus_client.load_collection(collection_name=name) logger.debug(f"Collection {name} already exists, skipping creation") return except MilvusException as e: if "collection not found" not in str(e).lower(): raise schema = MilvusClient.create_schema(auto_id=False, enable_dynamic_field=False) schema.add_field( field_name="id", datatype=DataType.VARCHAR, is_primary=True, max_length=65535, ) for field_config in field_configs: schema.add_field(**field_config) schema.add_field( field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=self._embedding_dim, ) index_params = MilvusClient.prepare_index_params() index_params.add_index(field_name="vector", index_type="AUTOINDEX", metric_type="L2") await self.async_milvus_client.create_collection( collection_name=name, schema=schema, index_params=index_params, consistency_level="Strong", ) logger.info(f"Created collection: {name}") async def _create_sql_collection(self, name: str): """Create SQL collection using async client.""" from pymilvus import DataType await self._create_collection(name, [ {"field_name": "text", "datatype": DataType.VARCHAR, "max_length": 65535}, {"field_name": "sql", "datatype": DataType.VARCHAR, "max_length": 65535}, ])
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (18)
packages/nvidia_nat_all/pyproject.toml(2 hunks)packages/nvidia_nat_vanna/README.md(1 hunks)packages/nvidia_nat_vanna/pyproject.toml(1 hunks)packages/nvidia_nat_vanna/src/nat/meta/pypi.md(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/__init__.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py(1 hunks)packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py(1 hunks)packages/nvidia_nat_vanna/text2sql_config.yml(1 hunks)packages/nvidia_nat_vanna/text2sql_training_config.yml(1 hunks)pyproject.toml(2 hunks)src/nat/retriever/milvus/register.py(2 hunks)src/nat/retriever/milvus/retriever.py(6 hunks)tests/nat/retriever/test_retrievers.py(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- packages/nvidia_nat_vanna/src/nat/plugins/vanna/init.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/nat/retriever/milvus/register.py
- packages/nvidia_nat_vanna/text2sql_config.yml
🧰 Additional context used
📓 Path-based instructions (14)
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}: Every file must start with the standard SPDX Apache-2.0 header
Confirm copyright years are up to date when a file is changed
All source files must include the SPDX Apache-2.0 header template (copy from an existing file)
Files:
packages/nvidia_nat_vanna/text2sql_training_config.ymlpackages/nvidia_nat_all/pyproject.tomlpackages/nvidia_nat_vanna/pyproject.tomlpackages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pypackages/nvidia_nat_vanna/tests/test_vanna_db_utils.pypyproject.tomlpackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.pysrc/nat/retriever/milvus/retriever.pypackages/nvidia_nat_vanna/src/nat/meta/pypi.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.pytests/nat/retriever/test_retrievers.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
packages/nvidia_nat_vanna/text2sql_training_config.ymlpackages/nvidia_nat_all/pyproject.tomlpackages/nvidia_nat_vanna/pyproject.tomlpackages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pypackages/nvidia_nat_vanna/tests/test_vanna_db_utils.pypyproject.tomlpackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.pysrc/nat/retriever/milvus/retriever.pypackages/nvidia_nat_vanna/src/nat/meta/pypi.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.pytests/nat/retriever/test_retrievers.py
packages/**/*
⚙️ CodeRabbit configuration file
packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain apyproject.tomlfile. - Thepyproject.tomlfile should declare a dependency onnvidia-nator another package with a name starting
withnvidia-nat-. This dependency should be declared using~=<version>, and the version should be a two
digit version (ex:~=1.0).
- Not all packages contain Python code, if they do they should also contain their own set of tests, in a
tests/directory at the same level as thepyproject.tomlfile.
Files:
packages/nvidia_nat_vanna/text2sql_training_config.ymlpackages/nvidia_nat_all/pyproject.tomlpackages/nvidia_nat_vanna/pyproject.tomlpackages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pypackages/nvidia_nat_vanna/tests/test_vanna_db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.pypackages/nvidia_nat_vanna/src/nat/meta/pypi.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
packages/*/pyproject.toml
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
packages/*/pyproject.toml: Each package under packages/ must contain a pyproject.toml
packages/*/pyproject.toml must declare a dependency on nvidia-nat or a package starting with nvidia-nat-
Dependencies in pyproject.toml should use ~= with two-digit versions (e.g., ~=1.0)
Files:
packages/nvidia_nat_all/pyproject.tomlpackages/nvidia_nat_vanna/pyproject.toml
{**/pyproject.toml,uv.lock}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via ‘uv pip install --sync’
Files:
packages/nvidia_nat_all/pyproject.tomlpackages/nvidia_nat_vanna/pyproject.tomlpyproject.toml
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Follow PEP 20 and PEP 8 for Python style
Run yapf with column_limit=120; yapf is used for formatting (run second)
Indent with 4 spaces (no tabs) and end each file with a single trailing newline
Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Respect Python naming schemes: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
When re-raising exceptions, use bare raise to preserve stack trace; log with logger.error(), not logger.exception()
When catching and logging without re-raising, use logger.exception() to capture full stack trace
Provide Google-style docstrings for every public module, class, function, and CLI command
Docstring first line must be a concise description ending with a period
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file I/O)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial and feasible
Files:
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pypackages/nvidia_nat_vanna/tests/test_vanna_db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.pysrc/nat/retriever/milvus/retriever.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.pytests/nat/retriever/test_retrievers.py
{src/**/*.py,packages/*/src/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{src/**/*.py,packages/*/src/**/*.py}: All importable Python code must live under src/ or packages//src/
All public APIs must have Python 3.11+ type hints on parameters and return values
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or metadata when useful
Treat pyright warnings as errors during development
Files:
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.pysrc/nat/retriever/milvus/retriever.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
packages/*/src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
If a package contains Python code, it must have tests in a tests/ directory at the same level as pyproject.toml
Files:
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/register.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
{tests/**/*.py,examples/*/tests/**/*.py,packages/*/tests/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{tests/**/*.py,examples/*/tests/**/*.py,packages/*/tests/**/*.py}: Use pytest with pytest-asyncio for async code
Test functions should be named with test_ prefix using snake_case
Extract frequently repeated test code into pytest fixtures
Pytest fixtures should define the name argument in @pytest.fixture and the function should be prefixed fixture_ using snake_case
Mock external services with pytest_httpserver or unittest.mock; avoid live endpoints
Mark slow tests with @pytest.mark.slow to skip in default test suite
Mark integration tests requiring external services with @pytest.mark.integration and follow nat-tests guidelines
Files:
packages/nvidia_nat_vanna/tests/test_vanna_db_utils.pytests/nat/retriever/test_retrievers.py
{tests/test_*.py,examples/*/tests/test_*.py,packages/*/tests/test_*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Name test files as test_*.py and store them in a tests/ folder
Files:
packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py
{docs/source/**/*.md,**/README.@(md|ipynb)}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{docs/source/**/*.md,**/README.@(md|ipynb)}: Use the full name “NVIDIA NeMo Agent toolkit” on first use in documentation, then “NeMo Agent toolkit”; in headings use “NeMo Agent Toolkit” (capital T)
Do not use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation unless explicitly referring to deprecated names
Never use “NAT”/“nat” abbreviations in documentation
Documentation must be clear/comprehensive; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling is correct
Files:
packages/nvidia_nat_vanna/README.md
src/nat/**/*
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
src/nat/**/* contains core functionality; changes should prioritize backward compatibility
Files:
src/nat/retriever/milvus/retriever.py
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/retriever/milvus/retriever.py
{tests/**/*.py,examples/*/tests/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Unit tests live in tests/ or examples/*/tests and use markers defined in pyproject.toml (e.g., integration)
Files:
tests/nat/retriever/test_retrievers.py
tests/**/*.py
⚙️ CodeRabbit configuration file
tests/**/*.py: - Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code. - Test functions should be named using thetest_prefix, using snake_case. - Any frequently repeated code should be extracted into pytest fixtures. - Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture
function being decorated should be named using thefixture_prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
pass
Files:
tests/nat/retriever/test_retrievers.py
🧠 Learnings (16)
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to packages/*/pyproject.toml : packages/*/pyproject.toml must declare a dependency on nvidia-nat or a package starting with nvidia-nat-
Applied to files:
packages/nvidia_nat_all/pyproject.tomlpackages/nvidia_nat_vanna/pyproject.tomlpyproject.tomlpackages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {**/pyproject.toml,uv.lock} : New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via ‘uv pip install <pkg> --sync’
Applied to files:
packages/nvidia_nat_all/pyproject.tomlpackages/nvidia_nat_vanna/pyproject.tomlpyproject.toml
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to **/*.py : In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Applied to files:
packages/nvidia_nat_all/pyproject.tomlpackages/nvidia_nat_vanna/pyproject.tomlpackages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to packages/*/pyproject.toml : Each package under packages/ must contain a pyproject.toml
Applied to files:
packages/nvidia_nat_vanna/pyproject.toml
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to examples/*/pyproject.toml : Examples containing Python code should have a pyproject.toml
Applied to files:
packages/nvidia_nat_vanna/pyproject.toml
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb} : Every file must start with the standard SPDX Apache-2.0 header
Applied to files:
packages/nvidia_nat_vanna/pyproject.tomlpackages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb} : All source files must include the SPDX Apache-2.0 header template (copy from an existing file)
Applied to files:
packages/nvidia_nat_vanna/pyproject.tomlpackages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Periodically run ‘uv pip list --outdated’ and upgrade dependencies
Applied to files:
packages/nvidia_nat_vanna/pyproject.toml
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Never use “NAT”/“nat” abbreviations in documentation
Applied to files:
packages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Use the full name “NVIDIA NeMo Agent toolkit” on first use in documentation, then “NeMo Agent toolkit”; in headings use “NeMo Agent Toolkit” (capital T)
Applied to files:
packages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/meta/pypi.mdpackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
📚 Learning: 2025-08-22T20:18:28.041Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/cursor-rules.mdc:0-0
Timestamp: 2025-08-22T20:18:28.041Z
Learning: Applies to .cursor/rules/**/*.mdc : Use consistent project terminology in descriptions (e.g., NAT workflows, NAT CLI commands)
Applied to files:
packages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Do not use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation unless explicitly referring to deprecated names
Applied to files:
packages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-09-15T21:26:29.430Z
Learnt from: saglave
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 726
File: examples/frameworks/adk_demo/src/nat_adk_demo/weather_update_tool.py:0-0
Timestamp: 2025-09-15T21:26:29.430Z
Learning: In NAT's ADK integration, the docstring of registered functions serves as the tool description sent to the LLM, not standard Python function documentation. The docstring should describe the logical tool interface (parameters the LLM will provide) rather than the NAT framework wrapper parameters like tool_config and builder.
Applied to files:
packages/nvidia_nat_vanna/README.mdpackages/nvidia_nat_vanna/src/nat/meta/pypi.md
📚 Learning: 2025-08-22T20:29:13.433Z
Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 701
File: examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-with-custom-post-process.yml:16-18
Timestamp: 2025-08-22T20:29:13.433Z
Learning: In the NeMo-Agent-Toolkit project, configuration consistency across example files is prioritized over removing unused config options, even when those options aren't currently implemented in the runtime.
Applied to files:
packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to {docs/source/**/*.md,**/README.@(md|ipynb)} : Documentation must be clear/comprehensive; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling is correct
Applied to files:
packages/nvidia_nat_vanna/README.md
📚 Learning: 2025-10-22T22:02:12.903Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-10-22T22:02:12.903Z
Learning: Applies to **/*.py : Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Applied to files:
packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
🧬 Code graph analysis (8)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (3)
src/nat/agent/reasoning_agent/reasoning_agent.py (1)
remove_r1_think_tags(87-96)tests/nat/retriever/test_retrievers.py (4)
embed_documents(169-170)aembed_query(164-167)search(85-125)search(231-271)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (1)
run_sql(285-292)
packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py (1)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (11)
QueryResult(45-68)SupportedDatabase(39-42)connect_to_database(163-200)connect_to_databricks(143-160)execute_query(203-226)extract_sql_from_message(71-140)setup_vanna_db_connection(248-298)to_records(57-59)to_dataframe(51-55)row_count(62-68)run_sql(285-292)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py (2)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (1)
execute_db_query(90-237)packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (1)
text2sql(101-249)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
src/nat/data_models/common.py (1)
get_secret_value(177-193)packages/nvidia_nat_data_flywheel/tests/observability/processor/test_dfw_record_processor.py (1)
model_dump_json(336-338)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (6)
src/nat/builder/builder.py (1)
Builder(68-290)src/nat/data_models/api_server.py (1)
ResponseIntermediateStep(482-494)src/nat/data_models/component_ref.py (3)
EmbedderRef(83-91)LLMRef(116-124)RetrieverRef(149-157)src/nat/data_models/function.py (1)
FunctionBaseConfig(26-27)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
setup_vanna_db_connection(248-298)run_sql(285-292)packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (4)
VannaSingleton(648-759)instance(655-661)get_instance(664-737)generate_sql(589-645)
src/nat/retriever/milvus/retriever.py (2)
tests/nat/retriever/test_retrievers.py (8)
list_collections(33-34)list_collections(179-180)describe_collection(36-74)describe_collection(182-220)aembed_query(164-167)search_iterator(127-154)search(85-125)search(231-271)packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (1)
close(557-564)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (6)
src/nat/builder/builder.py (1)
Builder(68-290)src/nat/builder/framework_enum.py (1)
LLMFrameworkEnum(19-25)src/nat/data_models/api_server.py (1)
ResponseIntermediateStep(482-494)src/nat/data_models/function.py (1)
FunctionBaseConfig(26-27)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (5)
row_count(62-68)async_execute_query(229-245)connect_to_database(163-200)extract_sql_from_message(71-140)to_dataframe(51-55)src/nat/data_models/common.py (1)
get_secret_value(177-193)
tests/nat/retriever/test_retrievers.py (1)
src/nat/retriever/milvus/retriever.py (4)
search(96-97)MilvusRetriever(36-263)CollectionNotFoundError(32-33)bind(69-80)
🪛 markdownlint-cli2 (0.18.1)
packages/nvidia_nat_vanna/README.md
150-150: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
191-191: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
261-261: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
280-280: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
475-475: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
483-483: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
516-516: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
543-543: Bare URL used
(MD034, no-bare-urls)
544-544: Bare URL used
(MD034, no-bare-urls)
packages/nvidia_nat_vanna/src/nat/meta/pypi.md
132-132: Bare URL used
(MD034, no-bare-urls)
🪛 Ruff (0.14.3)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
61-61: Abstract raise to an inner function
(TRY301)
69-69: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
70-70: Avoid specifying long messages outside the exception class
(TRY003)
141-141: String contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF001)
170-170: Unused method argument: kwargs
(ARG002)
212-212: Unused method argument: kwargs
(ARG002)
228-228: Consider moving this statement to an else block
(TRY300)
231-231: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
253-253: Abstract raise to an inner function
(TRY301)
259-259: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
270-270: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
294-294: Consider moving this statement to an else block
(TRY300)
339-339: Consider moving this statement to an else block
(TRY300)
383-383: Consider moving this statement to an else block
(TRY300)
417-417: Unused method argument: kwargs
(ARG002)
428-428: Unused method argument: kwargs
(ARG002)
443-443: Unused method argument: kwargs
(ARG002)
477-477: Do not catch blind exception: Exception
(BLE001)
478-478: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
481-481: Unused method argument: kwargs
(ARG002)
506-506: Do not catch blind exception: Exception
(BLE001)
507-507: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
510-510: Unused method argument: kwargs
(ARG002)
563-563: Do not catch blind exception: Exception
(BLE001)
592-592: Unused method argument: allow_llm_to_see_data
(ARG002)
639-639: Do not catch blind exception: Exception
(BLE001)
753-753: Do not catch blind exception: Exception
(BLE001)
757-757: Do not catch blind exception: Exception
(BLE001)
834-834: Do not catch blind exception: Exception
(BLE001)
836-836: Do not catch blind exception: Exception
(BLE001)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
157-157: Consider moving this statement to an else block
(TRY300)
159-159: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
166-166: Unused function argument: kwargs
(ARG001)
225-225: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
252-252: Unused function argument: kwargs
(ARG001)
291-291: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
213-213: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
23-23: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
src/nat/retriever/milvus/retriever.py
61-61: Avoid specifying long messages outside the exception class
(TRY003)
121-121: Avoid specifying long messages outside the exception class
(TRY003)
214-214: Avoid specifying long messages outside the exception class
(TRY003)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
201-201: Do not catch blind exception: Exception
(BLE001)
202-202: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
tests/nat/retriever/test_retrievers.py
166-166: Avoid specifying long messages outside the exception class
(TRY003)
169-169: Unused method argument: texts
(ARG002)
🔇 Additional comments (7)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py (2)
1-20: LGTM!The Apache 2.0 license header and module docstring are correct and follow the project's coding guidelines.
25-56: LGTM!The training data constants are properly typed, follow naming conventions, and contain valid example SQL statements and documentation.
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (5)
1-28: LGTM: License header and imports are correct.The Apache 2.0 license header is properly formatted with the correct copyright year, and the imports are appropriate for the module's functionality.
31-102: LGTM: Utility functions are well-implemented.The JSON extraction, think tag removal, and message conversion utilities are correctly implemented with proper error handling and docstrings.
567-646: LGTM: Well-structured SQL generation with parallel retrieval.The
VannaLangChainclass properly combines the Milvus vector store and LangChain LLM capabilities. Thegenerate_sqlmethod efficiently usesasyncio.gatherfor parallel retrieval of context and includes appropriate fallback handling for non-JSON responses.Note: The
allow_llm_to_see_dataparameter appears unused but may be reserved for future functionality or interface compatibility.
238-260: The review comment incorrectly identifies the blocking call location and context.Line 256 is inside the synchronous
__init__method (line 238), not an async initialization path. Callingembed_documents()synchronously during class initialization is normal and expected.The actual blocking issues are at lines 423, 434, and 449 in the async methods
add_question_sql(),add_ddl(), andadd_documentation(), where LangChain embedders support async methods likeaembed_documents()that should be used instead of the synchronousembed_documents()calls.Likely an incorrect or invalid review comment.
762-845: Review comment is incorrect:run_sqlis guaranteed to be available at runtime.The
run_sqlfunction is dynamically attached to the Vanna instance insetup_vanna_db_connection()(line 295 in db_utils.py). In the only call site fortrain_vanna()(text2sql.py line 166),setup_vanna_db_connection()is always invoked first (lines 158–162), ensuringvn.run_sqlexists when training starts. No AttributeError will occur in normal usage.The dynamic attribute pattern is intentional; type checkers cannot resolve it (hence the
# type: ignore[misc]comment on line 206 of text2sql.py), but this is a known design trade-off, not an error.Likely an incorrect or invalid review comment.
bb462ec to
741996f
Compare
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
…library Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
d0b62fe to
894b12c
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
tests/nat/retriever/test_retrievers.py (2)
56-71: Fix collection metadata typo
collection2mistakenly labelscollection_nameas "collection1".- "collection2": { - "collection_name": "collection1", + "collection2": { + "collection_name": "collection2",
106-108: Use tuple in isinstance; PEP604 union invalid here
isinstance(timeout, float | int)raises TypeError.- if timeout: - assert isinstance(timeout, float | int) + if timeout is not None: + assert isinstance(timeout, (float, int))src/nat/retriever/milvus/retriever.py (2)
169-196: Don’t return inside the iteration loopReturning inside the loop truncates results after first batch. Move the return after the loop.
- results.extend(res[0]) - - return _wrap_milvus_results(results, content_field=self.content_field) + results.extend(res[0]) + # after loop completes + return _wrap_milvus_results(results, content_field=self.content_field)
275-277: Fix isinstance usage with union
Hit | dictis not valid in isinstance.- if not isinstance(res, Hit | dict): + if not isinstance(res, (Hit, dict)):
♻️ Duplicate comments (12)
packages/nvidia_nat_vanna/pyproject.toml (2)
1-55: Add required SPDX Apache-2.0 license header.The file is missing the standard SPDX license header required by coding guidelines. All source files, including pyproject.toml, must include the Apache-2.0 header comment at the top.
Add this header before line 1:
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-License-Identifier: Apache-2.0 # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. [build-system]
26-27: Remove version constraints from workspace dependencies.Lines 26–27 declare
nvidia-natandnvidia-nat-langchainwith version constraints (~=1.4), but the comment at line 21 explicitly states "Does not apply to nvidia-nat packages." Workspace dependencies should not have version constraints; they are managed via the[tool.uv.sources]section.Apply this diff:
# Keep sorted!!! "databricks-sql-connector~=4.1.4", "databricks-sqlalchemy~=2.0.8", "langchain-core~=0.3.76", # determined from nvidia-nat-langchain - "nvidia-nat~=1.4", - "nvidia-nat-langchain~=1.4", + "nvidia-nat", + "nvidia-nat-langchain", "pandas~=2.0", "pymilvus[model]~=2.5.11", "sqlglot~=26.33.0", "vanna[chromadb]~=0.7.9",Based on learnings from the PR author.
packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py (2)
58-58: Add type hints for public constantsAnnotate exported constants to meet project typing guidelines.
-VANNA_ACTIVE_TABLES = ['catalog.schema.table_a', 'catalog.schema.table_b'] +VANNA_ACTIVE_TABLES: list[str] = ['catalog.schema.table_a', 'catalog.schema.table_b'] -VANNA_RESPONSE_GUIDELINES = """ +VANNA_RESPONSE_GUIDELINES: str = """ ... -""" +""" -VANNA_TRAINING_PROMPT = """ +VANNA_TRAINING_PROMPT: str = """ ... -""" +"""Also applies to: 61-69, 71-75
23-23: Remove unused ruff directive
# ruff: noqa: E501is unnecessary; E501 not enabled.-# ruff: noqa: E501 +packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (3)
117-121: Avoid KeyError on optional config; use defaultsProvide safe defaults for
reasoning_modelsandchat_models.- self.reasoning_models = self.config["reasoning_models"] - self.chat_models = self.config["chat_models"] + self.reasoning_models = self.config.get("reasoning_models", set()) + self.chat_models = self.config.get("chat_models", set())
242-251: Validate presence of async Milvus client in configFail fast with clear error rather than KeyError.
- self.async_milvus_client = config["async_milvus_client"] + if config is None or "async_milvus_client" not in config: + msg = "async_milvus_client must be provided in config" + raise ValueError(msg) + self.async_milvus_client = config["async_milvus_client"]
428-436: Don’t block event loop: use async embedding in async methodsUse
aembed_documentsfor async paths.- embedding = self.embedder.embed_documents([ddl])[0] + embedding = (await self.embedder.aembed_documents([ddl]))[0] @@ - embedding = self.embedder.embed_documents([documentation])[0] + embedding = (await self.embedder.aembed_documents([documentation]))[0]Also applies to: 443-451
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (1)
285-293: Use logger.error when re-raising to avoid duplicate stack tracesPer guidelines, log error (no traceback) and re-raise.
- except Exception: - logger.exception("Error executing SQL") + except Exception: + logger.error("Error executing SQL") raisepackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (2)
201-206: Log exceptions with traceback; keep generic failure_reasonUse logger.exception when not re-raising.
- except Exception as e: - logger.error("Error executing SQL query", exc_info=e) + except Exception: + logger.exception("Error executing SQL query") yield ExecuteDBQueryOutput( success=False, failure_reason="SQL execution failed. Please check server logs for details.",
148-151: Enforce read-only SQL before connectingReject non-SELECT/WITH early to prevent destructive ops.
- connection = connect_to_database( + import re + if not re.match(r'^\s*(WITH|SELECT)\b', sql_query, flags=re.IGNORECASE): + yield ExecuteDBQueryOutput( + success=False, + failure_reason="Only SELECT/WITH statements are allowed.", + sql_query=sql_query, + dataframe_info=DataFrameInfo(shape=[0, 0], dtypes={}, columns=[]), + ) + return + connection = connect_to_database(packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (2)
212-223: Fix exception logging to include traceback without leaking detailsUse exc_info=True (or logger.exception if not re-raising). You re-raise, so keep logger.error with exc_info=True and generic payload.
- except Exception as e: - logger.error("SQL generation failed", exc_info=e) + except Exception as e: + logger.error("SQL generation failed", exc_info=True) # Error status as ResponseIntermediateStep
195-209: Block non read-only SQL before executionPrevent destructive statements from LLM output; allow only SELECT/WITH.
- if config.execute_sql: + if config.execute_sql: + import re + if not re.match(r'^\s*(WITH|SELECT)\b', sql, flags=re.IGNORECASE): + yield ResponseIntermediateStep( + id=str(uuid.uuid4()), + parent_id=parent_id, + type="markdown", + name="text2sql_status", + payload=StatusPayload(message="Execution skipped: only SELECT/WITH statements are allowed.").model_dump_json(), + ) + yield Text2SQLOutput(sql=sql, explanation=explanation) + return yield ResponseIntermediateStep(
🧹 Nitpick comments (6)
packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py (1)
201-221: Add a test for engine reuse pathConsider a second call with vn.db_engine preset to ensure the “reuse existing engine” branch is exercised (assert connect_to_database not called again).
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (2)
100-102: Annotate public return typeRegistered functions must declare return type.
-async def text2sql(config: Text2SQLConfig, builder: Builder): +async def text2sql(config: Text2SQLConfig, builder: Builder) -> AsyncGenerator[FunctionInfo, None]:
42-48: Return executed rows when execute_sql=True (optional but useful)Expose records/row_count to callers.
-from pydantic import BaseModel -from pydantic import Field +from typing import Any +from pydantic import BaseModel +from pydantic import Field @@ class Text2SQLOutput(BaseModel): """Output schema for text2sql function.""" sql: str = Field(description="Generated SQL query") explanation: str | None = Field(default=None, description="Explanation of the query") + dataframe_records: list[dict[str, Any]] | None = Field(default=None, description="Rows when executed") + row_count: int | None = Field(default=None, description="Row count when executed") @@ - df = await vanna_instance.run_sql(sql) # type: ignore[misc] + df = await vanna_instance.run_sql(sql) # type: ignore[misc] logger.info(f"SQL executed successfully: {len(df)} rows returned") @@ - yield Text2SQLOutput(sql=sql, explanation=explanation) + result = Text2SQLOutput(sql=sql, explanation=explanation) + if config.execute_sql: + result.dataframe_records = df.to_dict("records") + result.row_count = len(df) + yield resultAlso applies to: 195-209
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
215-221: Avoid logging full query at info levelReduce exposure of sensitive SQL; prefer debug.
- logger.info(f"Executing query: {query}") + logger.debug("Executing query")
241-244: Use get_running_loop (modern asyncio)Prefer
asyncio.get_running_loop()in async contexts.- loop = asyncio.get_event_loop() + loop = asyncio.get_running_loop()packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (1)
86-94: Annotate public return type (and silence unused builder)Add return type; keep
_builderbut silence lint if needed.-from collections.abc import AsyncGenerator +from collections.abc import AsyncGenerator @@ -async def execute_db_query( +async def execute_db_query( config: ExecuteDBQueryConfig, - _builder: Builder, -) : + _builder: Builder, # noqa: ARG001 +) -> AsyncGenerator[FunctionInfo, None]:
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (22)
ci/scripts/path_checks.py(1 hunks)ci/vale/styles/config/vocabularies/nat/accept.txt(1 hunks)docs/source/workflows/functions/index.md(1 hunks)docs/source/workflows/functions/text-to-sql.md(1 hunks)packages/nvidia_nat_all/pyproject.toml(2 hunks)packages/nvidia_nat_vanna/pyproject.toml(1 hunks)packages/nvidia_nat_vanna/src/nat/meta/pypi.md(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/__init__.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py(1 hunks)packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py(1 hunks)packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py(1 hunks)packages/nvidia_nat_vanna/text2sql_config.yml(1 hunks)packages/nvidia_nat_vanna/text2sql_training_config.yml(1 hunks)pyproject.toml(3 hunks)src/nat/retriever/milvus/register.py(2 hunks)src/nat/retriever/milvus/retriever.py(6 hunks)src/nat/tool/code_execution/code_sandbox.py(1 hunks)tests/nat/retriever/test_retrievers.py(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- packages/nvidia_nat_vanna/src/nat/meta/pypi.md
🚧 Files skipped from review as they are similar to previous changes (9)
- ci/vale/styles/config/vocabularies/nat/accept.txt
- packages/nvidia_nat_vanna/text2sql_training_config.yml
- src/nat/retriever/milvus/register.py
- src/nat/tool/code_execution/code_sandbox.py
- packages/nvidia_nat_vanna/text2sql_config.yml
- packages/nvidia_nat_vanna/src/nat/plugins/vanna/register.py
- packages/nvidia_nat_vanna/src/nat/plugins/vanna/init.py
- docs/source/workflows/functions/text-to-sql.md
- pyproject.toml
🧰 Additional context used
📓 Path-based instructions (5)
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
packages/nvidia_nat_all/pyproject.tomlci/scripts/path_checks.pytests/nat/retriever/test_retrievers.pydocs/source/workflows/functions/index.mdpackages/nvidia_nat_vanna/pyproject.tomlpackages/nvidia_nat_vanna/tests/test_vanna_db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pysrc/nat/retriever/milvus/retriever.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
packages/**/*
⚙️ CodeRabbit configuration file
packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain apyproject.tomlfile. - Thepyproject.tomlfile should declare a dependency onnvidia-nator another package with a name starting
withnvidia-nat-. This dependency should be declared using~=<version>, and the version should be a two
digit version (ex:~=1.0).
- Not all packages contain Python code, if they do they should also contain their own set of tests, in a
tests/directory at the same level as thepyproject.tomlfile.
Files:
packages/nvidia_nat_all/pyproject.tomlpackages/nvidia_nat_vanna/pyproject.tomlpackages/nvidia_nat_vanna/tests/test_vanna_db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.pypackages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
tests/**/*.py
⚙️ CodeRabbit configuration file
tests/**/*.py: - Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code. - Test functions should be named using thetest_prefix, using snake_case. - Any frequently repeated code should be extracted into pytest fixtures. - Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture
function being decorated should be named using thefixture_prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
pass
Files:
tests/nat/retriever/test_retrievers.py
docs/source/**/*
⚙️ CodeRabbit configuration file
This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the
docs/source/_staticdirectory.
Files:
docs/source/workflows/functions/index.md
src/nat/**/*
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/retriever/milvus/retriever.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: jiaxiangr
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 974
File: packages/nvidia_nat_all/pyproject.toml:39-39
Timestamp: 2025-11-10T21:26:35.059Z
Learning: In packages/nvidia_nat_all/pyproject.toml, workspace dependencies (nvidia-nat-* plugin packages) should NOT have version constraints because they are managed as workspace dependencies. Version constraints are only applied to the base nvidia-nat package and external dependencies, not to internal workspace packages.
📚 Learning: 2025-11-10T21:26:35.059Z
Learnt from: jiaxiangr
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 974
File: packages/nvidia_nat_all/pyproject.toml:39-39
Timestamp: 2025-11-10T21:26:35.059Z
Learning: In packages/nvidia_nat_all/pyproject.toml, workspace dependencies (nvidia-nat-* plugin packages) should NOT have version constraints because they are managed as workspace dependencies. Version constraints are only applied to the base nvidia-nat package and external dependencies, not to internal workspace packages.
Applied to files:
packages/nvidia_nat_all/pyproject.tomlpackages/nvidia_nat_vanna/pyproject.toml
📚 Learning: 2025-11-14T20:33:53.944Z
Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1181
File: packages/nvidia_nat_test/tests/test_test_llm.py:419-484
Timestamp: 2025-11-14T20:33:53.944Z
Learning: The NVIDIA NeMo-Agent-Toolkit project uses pytest-asyncio in strict mode (the default), which requires pytest.mark.asyncio decorator on all async test functions. All async tests in packages/nvidia_nat_test/tests/test_test_llm.py consistently follow this pattern.
Applied to files:
tests/nat/retriever/test_retrievers.py
📚 Learning: 2025-11-05T11:45:35.119Z
Learnt from: thepatrickchin
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1152
File: examples/config_inheritance/pyproject.toml:1-25
Timestamp: 2025-11-05T11:45:35.119Z
Learning: In the examples/ directory, pyproject.toml files typically do not include SPDX license headers, with only one exception (adk_demo). This is an established pattern that differs from the general guideline requiring SPDX headers in all .toml files.
Applied to files:
packages/nvidia_nat_vanna/pyproject.toml
📚 Learning: 2025-11-10T21:49:04.946Z
Learnt from: jiaxiangr
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 974
File: src/nat/retriever/milvus/retriever.py:133-149
Timestamp: 2025-11-10T21:49:04.946Z
Learning: In Python files using LangChain's `Embeddings` interface, `aembed_query` is always available on all embedders because the base class provides a default async implementation that wraps the synchronous method in an executor. Implementations may override with native async for better performance. It's safe to unconditionally call `await embedder.aembed_query(...)` without fallback checks.
Applied to files:
src/nat/retriever/milvus/retriever.py
📚 Learning: 2025-09-17T05:20:03.228Z
Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 752
File: packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_base.py:99-121
Timestamp: 2025-09-17T05:20:03.228Z
Learning: The AuthAdapter class in packages/nvidia_nat_mcp/src/nat/plugins/mcp/client_base.py has a known concurrency race condition where `self.auth_provider.config.auth_request = auth_request` can be racy under concurrent requests. This should be addressed by either refactoring AuthProviderBase.authenticate() to accept AuthRequest as a parameter or using an asyncio.Lock to serialize access.
Applied to files:
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
🧬 Code graph analysis (7)
tests/nat/retriever/test_retrievers.py (1)
src/nat/retriever/milvus/retriever.py (4)
search(100-101)MilvusRetriever(36-267)CollectionNotFoundError(32-33)bind(73-84)
packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py (1)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (11)
QueryResult(45-68)SupportedDatabase(39-42)connect_to_database(163-200)connect_to_databricks(143-160)execute_query(203-226)extract_sql_from_message(71-140)setup_vanna_db_connection(248-298)to_records(57-59)to_dataframe(51-55)row_count(62-68)run_sql(285-292)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py (7)
src/nat/builder/builder.py (1)
Builder(71-342)src/nat/builder/framework_enum.py (1)
LLMFrameworkEnum(19-25)src/nat/builder/function_info.py (2)
FunctionInfo(290-625)create(351-549)src/nat/data_models/api_server.py (1)
ResponseIntermediateStep(482-494)src/nat/data_models/function.py (1)
FunctionBaseConfig(26-36)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (5)
row_count(62-68)async_execute_query(229-245)connect_to_database(163-200)extract_sql_from_message(71-140)to_dataframe(51-55)src/nat/data_models/common.py (1)
get_secret_value(177-193)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py (8)
src/nat/builder/builder.py (1)
Builder(71-342)src/nat/builder/framework_enum.py (1)
LLMFrameworkEnum(19-25)src/nat/builder/function_info.py (2)
FunctionInfo(290-625)create(351-549)src/nat/data_models/api_server.py (1)
ResponseIntermediateStep(482-494)src/nat/data_models/component_ref.py (3)
EmbedderRef(83-91)LLMRef(116-124)RetrieverRef(149-157)src/nat/data_models/function.py (1)
FunctionBaseConfig(26-36)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
setup_vanna_db_connection(248-298)run_sql(285-292)packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (5)
VannaSingleton(641-758)train_vanna(761-844)instance(655-661)get_instance(664-736)generate_sql(582-638)
src/nat/retriever/milvus/retriever.py (1)
tests/nat/retriever/test_retrievers.py (8)
list_collections(33-34)list_collections(179-180)describe_collection(36-74)describe_collection(182-220)aembed_query(164-167)search_iterator(127-154)search(85-125)search(231-271)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (2)
src/nat/data_models/common.py (1)
get_secret_value(177-193)packages/nvidia_nat_data_flywheel/tests/observability/processor/test_dfw_record_processor.py (1)
model_dump_json(336-338)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py (3)
src/nat/agent/reasoning_agent/reasoning_agent.py (1)
remove_r1_think_tags(87-96)tests/nat/retriever/test_retrievers.py (4)
embed_documents(169-170)aembed_query(164-167)search(85-125)search(231-271)packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py (1)
run_sql(285-292)
🪛 Ruff (0.14.5)
tests/nat/retriever/test_retrievers.py
166-166: Avoid specifying long messages outside the exception class
(TRY003)
169-169: Unused method argument: texts
(ARG002)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/execute_db_query.py
201-201: Do not catch blind exception: Exception
(BLE001)
202-202: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/text2sql.py
213-213: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
src/nat/retriever/milvus/retriever.py
65-65: Avoid specifying long messages outside the exception class
(TRY003)
125-125: Avoid specifying long messages outside the exception class
(TRY003)
218-218: Avoid specifying long messages outside the exception class
(TRY003)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/db_utils.py
157-157: Consider moving this statement to an else block
(TRY300)
159-159: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
166-166: Unused function argument: kwargs
(ARG001)
225-225: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
252-252: Unused function argument: kwargs
(ARG001)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/vanna_utils.py
61-61: Abstract raise to an inner function
(TRY301)
69-69: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
70-70: Avoid specifying long messages outside the exception class
(TRY003)
170-170: Unused method argument: kwargs
(ARG002)
212-212: Unused method argument: kwargs
(ARG002)
228-228: Consider moving this statement to an else block
(TRY300)
231-231: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
253-253: Abstract raise to an inner function
(TRY301)
259-259: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
270-270: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
294-294: Consider moving this statement to an else block
(TRY300)
339-339: Consider moving this statement to an else block
(TRY300)
383-383: Consider moving this statement to an else block
(TRY300)
417-417: Unused method argument: kwargs
(ARG002)
428-428: Unused method argument: kwargs
(ARG002)
443-443: Unused method argument: kwargs
(ARG002)
478-478: Redundant exception object included in logging.exception call
(TRY401)
481-481: Unused method argument: kwargs
(ARG002)
507-507: Redundant exception object included in logging.exception call
(TRY401)
510-510: Unused method argument: kwargs
(ARG002)
563-563: Do not catch blind exception: Exception
(BLE001)
585-585: Unused method argument: allow_llm_to_see_data
(ARG002)
632-632: Do not catch blind exception: Exception
(BLE001)
752-752: Do not catch blind exception: Exception
(BLE001)
756-756: Do not catch blind exception: Exception
(BLE001)
833-833: Do not catch blind exception: Exception
(BLE001)
835-835: Do not catch blind exception: Exception
(BLE001)
packages/nvidia_nat_vanna/src/nat/plugins/vanna/training_db_schema.py
23-23: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
🔇 Additional comments (4)
docs/source/workflows/functions/index.md (1)
18-18: Documentation updates look good.The header generalization and toctree addition appropriately reflect the expanded scope of the functions documentation to include the new text-to-sql capability alongside code-execution. The changes maintain consistency with the markdown structure and properly integrate the new documentation page.
Also applies to: 68-71
ci/scripts/path_checks.py (1)
153-154: LGTM! Allowlist entry correctly supports DeepSeek models.The addition of the DeepSeek model pattern is consistent with existing LLM model name patterns and correctly supports the text-to-SQL feature's DeepSeek integration mentioned in the PR objectives.
packages/nvidia_nat_all/pyproject.toml (1)
39-39: Workspace dependency additions are correct.Both changes correctly implement the workspace dependency pattern:
nvidia-nat-vannais added to the dependencies list without a version constraint (line 39) and mapped in[tool.uv.sources]as a workspace dependency (line 84). This follows the established pattern for all nvidia-nat-* plugins in this meta-package.Also applies to: 84-84
packages/nvidia_nat_vanna/tests/test_vanna_db_utils.py (1)
209-214: Good fix: initialize db_engine to None instead of deleting attributeSafer and matches setup_vanna_db_connection semantics; avoids AttributeError on fresh MagicMock.
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
|
/ok to test 0565169 |
|
/ok to test f9344a8 |
Signed-off-by: jiaxiangr <jiaxiangr@nvidia.com>
f9344a8 to
4c7294b
Compare
|
/ok to test 4c7294b |
Co-authored-by: David Gardner <96306125+dagardner-nv@users.noreply.github.com> Signed-off-by: Jiaxiang Ren <jiaxiangr@nvidia.com>
Co-authored-by: David Gardner <96306125+dagardner-nv@users.noreply.github.com> Signed-off-by: Jiaxiang Ren <jiaxiangr@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/source/workflows/functions/text-to-sql.md(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
docs/source/workflows/functions/text-to-sql.md
docs/source/**/*
⚙️ CodeRabbit configuration file
This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the
docs/source/_staticdirectory.
Files:
docs/source/workflows/functions/text-to-sql.md
🧠 Learnings (2)
📚 Learning: 2025-09-15T21:26:29.430Z
Learnt from: saglave
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 726
File: examples/frameworks/adk_demo/src/nat_adk_demo/weather_update_tool.py:0-0
Timestamp: 2025-09-15T21:26:29.430Z
Learning: In NAT's ADK integration, the docstring of registered functions serves as the tool description sent to the LLM, not standard Python function documentation. The docstring should describe the logical tool interface (parameters the LLM will provide) rather than the NAT framework wrapper parameters like tool_config and builder.
Applied to files:
docs/source/workflows/functions/text-to-sql.md
📚 Learning: 2025-11-10T21:26:35.059Z
Learnt from: jiaxiangr
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 974
File: packages/nvidia_nat_all/pyproject.toml:39-39
Timestamp: 2025-11-10T21:26:35.059Z
Learning: In packages/nvidia_nat_all/pyproject.toml, workspace dependencies (nvidia-nat-* plugin packages) should NOT have version constraints because they are managed as workspace dependencies. Version constraints are only applied to the base nvidia-nat package and external dependencies, not to internal workspace packages.
Applied to files:
docs/source/workflows/functions/text-to-sql.md
|
/ok to test e961df5 |
|
/merge |
1 similar comment
|
/merge |
This PR introduces `nvidia-nat-vanna`, a new NAT plugin that provides production-ready text-to-SQL capabilities using the Vanna framework with NVIDIA NIM integration. - Text-to-SQL Generation: Convert natural language questions to SQL queries using AI - MCP Support: native supported with NAT MCP server and client - Multi-Database Compatibility: Works out of the box with Databricks; other databases (e.g., PostgreSQL, MS SQL Server, SQLite, etc.) require connector definitions. - Fully automatic training and ingestion. Currently support: Databricks - Vector-Based RAG: Milvus integration for few-shot learning with similarity search - Reasoning LLMs support: Nemotron, OAI OSS, DeepSeek - Streaming Support: Real-time progress updates during SQL generation - Full integration with NAT's intermediate step tracking system - Better UI Display - Front-ends can render intermediate steps - Enterprise-Ready: Configurable authentication, error handling, and result formatting - NAT-Native: Follows NAT package conventions with proper function registration - Two NAT functions: text2sql (SQL generation) and execute_db_query (query execution) - Reusable package: packages/nvidia_nat_vanna/ - Comprehensive documentation with quick start guide and example configuration: text2sql_config.yml - Todo: move to examples folder as quickstart Closes NVIDIA#732 - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. * **New Features** * Text-to-SQL Vanna integration with streaming progress, optional Databricks query execution, and an execute-DB-query function; packaged for installation. * **Improvements** * Async and sync Milvus retriever support and enhanced LLM + vector-store training, retrieval, singleton lifecycle, and configurable model options. * **Bug Fixes** * Broadened sandbox output parsing to handle additional parse/type failures. * **Documentation** * Guides, quick-starts, and configuration examples for Text-to-SQL and Vanna. * **Tests** * New tests for Vanna DB utilities and async Milvus retriever paths. * **Chores** * Packaging, workspace registration, config YAMLs, dependency updates, CI allowlist and vocabulary updates. Authors: - Jiaxiang Ren (https://github.com/jiaxiangr) - https://github.com/aadityaramsh Approvers: - Yuchen Zhang (https://github.com/yczhang-nv) - Eric Evans II (https://github.com/ericevans-nv) - Dhruv Nandakumar (https://github.com/dnandakumar-nv) - https://github.com/Salonijain27 - David Gardner (https://github.com/dagardner-nv) URL: NVIDIA#974
Description
This PR introduces
nvidia-nat-vanna, a new NAT plugin that provides production-ready text-to-SQL capabilities using the Vanna framework with NVIDIA NIM integration.Key Features
What's Included
Closes #732
By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Documentation
Tests
Chores