Support additional provider parameters in LLM and Embedder config #749
Conversation
Walkthrough
Pydantic ConfigDict for four model config classes was updated to allow extra fields by adding extra="allow".

Changes

Sequence Diagram(s)
No sequence diagrams — changes are configuration-only.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Tip
👮 Agentic pre-merge checks are now available in preview! Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs. Please see the documentation for more information. Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Pre-merge checks (3 passed, 2 warnings)
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🧹 Nitpick comments (3)
tests/nat/embedder/test_openai_embedder.py (3)
16-16: Remove unused import. The tkinter import appears to be accidentally included and is unused in the test file.

-from tkinter import N, NO

72-73: Follow PEP 8 recommendations for boolean comparisons. As suggested by the static analysis tool, avoid explicit comparisons to True and False in assertions.

-    assert config_dict['tiktoken_enabled'] == False
-    assert config_dict['show_progress_bar'] == True
+    assert not config_dict['tiktoken_enabled']
+    assert config_dict['show_progress_bar']
155-161: Enhance lazy import test with verification. The test verifies that langchain_openai is not imported when no extra params are provided, but it doesn't verify that the import would actually be triggered when extra params are present. Consider adding a complementary test.

Add this test method to verify the lazy import behavior more comprehensively:

@patch('langchain_openai.embeddings.base.OpenAIEmbeddings')
def test_langchain_imported_when_extra_params_provided(self, mock_openai_embeddings):
    """Verify that langchain_openai is imported when extra params are provided."""
    # Mock the OpenAI model fields
    mock_openai_embeddings.model_fields = {
        'tiktoken_enabled': Mock(annotation=bool)
    }

    # Mock TypeAdapter validation
    with patch('pydantic.type_adapter.TypeAdapter') as mock_type_adapter:
        mock_adapter_instance = Mock()
        mock_adapter_instance.validate_python.side_effect = lambda x: x
        mock_type_adapter.return_value = mock_adapter_instance

        config = OpenAIEmbedderModelConfig(
            model_name="text-embedding-ada-002",
            tiktoken_enabled=True
        )

        # Verify the import was triggered
        mock_openai_embeddings.model_fields.__getitem__.assert_called_with('tiktoken_enabled')
        mock_type_adapter.assert_called_once()
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- src/nat/embedder/openai_embedder.py (2 hunks)
- tests/nat/embedder/test_openai_embedder.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
src/**/*.py: All importable Python code must live under src/
All public APIs in src/ require Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise with a period; surround code entities with backticks
Files:
src/nat/embedder/openai_embedder.py
src/nat/**/*
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Core functionality under src/nat should prioritize backward compatibility when changed
Files:
src/nat/embedder/openai_embedder.py
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/embedder/openai_embedder.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 8/20 style; format with yapf (column_limit=120) and use 4-space indentation; end files with a single newline
Run ruff (ruff check --fix) per pyproject.toml; fix warnings unless explicitly ignored; ruff is linter-only
Use snake_case for functions/variables, PascalCase for classes, and UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: preserve stack traces and avoid duplicate logging
When re-raising exceptions, use bare raise and log with logger.error(), not logger.exception()
When catching and not re-raising, log with logger.exception() to capture stack trace
Validate and sanitize all user input; prefer httpx with SSL verification and follow OWASP Top‑10
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile/mprof; cache with functools.lru_cache or external cache; leverage NumPy vectorization when beneficial
Files:
src/nat/embedder/openai_embedder.py
tests/nat/embedder/test_openai_embedder.py
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up‑to‑date
All source files must include the SPDX Apache‑2.0 header; do not bypass CI header checks
Files:
src/nat/embedder/openai_embedder.py
tests/nat/embedder/test_openai_embedder.py
**/*.{py,md}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Never hard‑code version numbers in code or docs; versions are derived by setuptools‑scm
Files:
src/nat/embedder/openai_embedder.py
tests/nat/embedder/test_openai_embedder.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example: def my_function(param1: int, param2: str) -> bool: pass
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare raise statements to maintain the original stack trace, and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use logger.exception() to capture the full stack trace information.

# Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

# Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to-date whenever a file is changed.

Files:
src/nat/embedder/openai_embedder.py
tests/nat/embedder/test_openai_embedder.py
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Unit tests must live under tests/ and use configured markers (e2e, integration, etc.)
Files:
tests/nat/embedder/test_openai_embedder.py
⚙️ CodeRabbit configuration file
tests/**/*.py:
- Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
- Test functions should be named using the test_ prefix, using snake_case.
- Any frequently repeated code should be extracted into pytest fixtures.
- Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the fixture_ prefix, using snake_case. Example:

@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
    pass
Files:
tests/nat/embedder/test_openai_embedder.py
**/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/tests/**/*.py: Test functions must use the test_ prefix and snake_case
Extract repeated test code into pytest fixtures; fixtures should set name=... in @pytest.fixture and functions named with fixture_ prefix
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration
Use pytest with pytest-asyncio for async code; mock external services with pytest_httpserver or unittest.mock
Files:
tests/nat/embedder/test_openai_embedder.py
🧬 Code graph analysis (1)
tests/nat/embedder/test_openai_embedder.py (1)
src/nat/embedder/openai_embedder.py (1)
OpenAIEmbedderModelConfig(29-68)
🪛 Ruff (0.12.2)
tests/nat/embedder/test_openai_embedder.py
72-72: Avoid equality comparisons to False; use not config_dict['tiktoken_enabled']: for false checks
Replace with not config_dict['tiktoken_enabled']
(E712)
73-73: Avoid equality comparisons to True; use config_dict['show_progress_bar']: for truth checks
Replace with config_dict['show_progress_bar']
(E712)
🔇 Additional comments (2)
src/nat/embedder/openai_embedder.py (1)
32-32: LGTM! Configuration change enables dynamic parameter support. The addition of extra="allow" to the Pydantic model configuration correctly enables acceptance of extra parameters, which is essential for the new dynamic validation functionality.

tests/nat/embedder/test_openai_embedder.py (1)
24-184: Excellent test coverage with comprehensive validation scenarios. The test suite thoroughly covers the new dynamic parameter validation functionality, including:
- Valid configuration creation and serialization
- Extra parameter acceptance and validation
- Type coercion behavior
- Invalid parameter rejection
- Type validation enforcement
- Required field validation
- Field aliasing functionality
- Lazy import behavior
- Configuration preservation
- Schema description verification
The mocking strategy appropriately isolates the validation logic while testing the integration points. The tests align well with the PR objectives of enabling dynamic OpenAI parameter support.
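For orientation, the config under test follows roughly this shape; this is a sketch reconstructed from the PR description (extra="allow" plus a model_validator that checks extras against OpenAIEmbeddings fields via TypeAdapter), and names or details may differ from the actual src/nat/embedder/openai_embedder.py:

from typing import Any

from pydantic import BaseModel, ConfigDict, TypeAdapter, model_validator


class OpenAIEmbedderModelConfigSketch(BaseModel):
    """Illustrative config: extra kwargs are validated against OpenAIEmbeddings fields."""

    model_config = ConfigDict(extra="allow", protected_namespaces=())

    model_name: str

    @model_validator(mode="before")
    @classmethod
    def _validate_extra_openai_params(cls, values: Any) -> Any:
        if not isinstance(values, dict):
            return values
        extras = {k: v for k, v in values.items() if k not in cls.model_fields}
        if not extras:
            # Lazy import: langchain_openai is only needed when extras are present.
            return values
        from langchain_openai.embeddings.base import OpenAIEmbeddings

        for name, value in extras.items():
            field = OpenAIEmbeddings.model_fields.get(name)
            if field is None:
                raise ValueError(f"'{name}' is not a valid OpenAIEmbeddings parameter")
            # Coerce/validate the value against the live OpenAIEmbeddings annotation.
            values[name] = TypeAdapter(field.annotation).validate_python(value)
        return values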
8ed7fcd to
c110673
Compare
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/nat/embedder/openai_embedder.py (1)
71-75: Add return type and a concise Google-style docstring to the provider. Meets repo typing/docstring requirements.

-async def openai_embedder_model(config: OpenAIEmbedderModelConfig, _builder: Builder):
+async def openai_embedder_model(
+    config: OpenAIEmbedderModelConfig, _builder: Builder
+) -> AsyncIterator[EmbedderProviderInfo]:
-    yield EmbedderProviderInfo(config=config, description="An OpenAI model for use with an Embedder client.")
+    """Register the OpenAI embedder provider.
+
+    Yields:
+        EmbedderProviderInfo: Provider info for the configured OpenAI embedder.
+    """
+    yield EmbedderProviderInfo(config=config, description="An OpenAI model for use with an Embedder client.")
♻️ Duplicate comments (1)
src/nat/embedder/openai_embedder.py (1)
49-66: Harden dynamic import and type validation error reporting. Add defensive handling for missing/incompatible langchain-openai, missing annotations, and wrap TypeAdapter errors with context. This improves UX and test determinism.

Apply:

-        from langchain_openai.embeddings.base import OpenAIEmbeddings
-        from pydantic import TypeAdapter
+        try:
+            from langchain_openai.embeddings.base import OpenAIEmbeddings
+            from pydantic import TypeAdapter
+        except ImportError as e:
+            raise ValueError(
+                "Unable to validate OpenAI parameters: langchain-openai is not installed or incompatible."
+            ) from e
@@
-        valid_openai_fields = set(OpenAIEmbeddings.model_fields.keys())
+        model_fields = getattr(OpenAIEmbeddings, "model_fields", None)
+        if not model_fields:
+            raise ValueError(
+                "Unable to validate OpenAI parameters: OpenAIEmbeddings.model_fields is unavailable."
+            )
+        valid_openai_fields = set(model_fields.keys())
@@
-            field_info = OpenAIEmbeddings.model_fields[param_name]
-            field_type = field_info.annotation
-            type_adapter = TypeAdapter(field_type)
+            field_info = model_fields.get(param_name)
+            if field_info is None:
+                raise ValueError(f"Unable to retrieve field info for parameter '{param_name}'.")
+            field_type = getattr(field_info, "annotation", None)
+            if field_type is None:
+                raise ValueError(f"Missing type annotation for parameter '{param_name}'.")
+            type_adapter = TypeAdapter(field_type)
@@
-            values[param_name] = type_adapter.validate_python(values[param_name])
+            try:
+                values[param_name] = type_adapter.validate_python(values[param_name])
+            except Exception as e:
+                raise ValueError(
+                    f"Invalid value for parameter '{param_name}' (expected {field_type!r}): {e}"
+                ) from e
🧹 Nitpick comments (4)
src/nat/embedder/openai_embedder.py (2)
57-60: Make error message deterministic. Sort the valid-field list to avoid non-deterministic set ordering in messages and snapshots.

-                f"Valid OpenAIEmbeddings parameters include: {valid_openai_fields}"
+                f"Valid OpenAIEmbeddings parameters include: {sorted(valid_openai_fields)}"

30-31: Docstring is misleading; this is an embedder, not an LLM provider. Tighten wording.

-    """An OpenAI LLM provider to be used with an LLM client."""
+    """OpenAI embedder provider configuration."""

tests/nat/embedder/test_openai_embedder.py (2)
66-67: Ruff E712: avoid equality to True/False in assertions. Use truthiness/negation for cleaner style.

-    assert config_dict['tiktoken_enabled'] == False
-    assert config_dict['show_progress_bar'] == True
+    assert not config_dict['tiktoken_enabled']
+    assert config_dict['show_progress_bar']

137-144: Make the "no extra params → no import" test independent of langchain_openai. The current patch loads the module just to patch it. Prefer blocking the import so the test fails if a stray import occurs.

-    # Mock import error for langchain_openai
-    with patch('langchain_openai.embeddings.base.OpenAIEmbeddings',
-               side_effect=ImportError("Should not import langchain_openai when no extra params are provided")):
-        config = OpenAIEmbedderModelConfig(model_name="text-embedding-ada-002")
-        assert config.model_name == "text-embedding-ada-002"
+    # Assert that importing langchain_openai would fail if attempted
+    import builtins as _builtins
+    real_import = _builtins.__import__
+    def _guarded_import(name, *a, **kw):
+        if name.startswith("langchain_openai"):
+            raise AssertionError("langchain_openai should not be imported when no extra params are provided")
+        return real_import(name, *a, **kw)
+    with patch("builtins.__import__", side_effect=_guarded_import):
+        config = OpenAIEmbedderModelConfig(model_name="text-embedding-ada-002")
+        assert config.model_name == "text-embedding-ada-002"
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- src/nat/embedder/openai_embedder.py (2 hunks)
- tests/nat/embedder/test_openai_embedder.py (1 hunks)
🧬 Code graph analysis (1)
tests/nat/embedder/test_openai_embedder.py (1)
src/nat/embedder/openai_embedder.py (1)
OpenAIEmbedderModelConfig(29-68)
🪛 Ruff (0.12.2)
tests/nat/embedder/test_openai_embedder.py
66-66: Avoid equality comparisons to False; use not config_dict['tiktoken_enabled']: for false checks
Replace with not config_dict['tiktoken_enabled']
(E712)
67-67: Avoid equality comparisons to True; use config_dict['show_progress_bar']: for truth checks
Replace with config_dict['show_progress_bar']
(E712)
🔇 Additional comments (2)
tests/nat/embedder/test_openai_embedder.py (2)
126-136: Validate alias handling once the alias-extras fix lands. The alias model could be misclassified as an extra by the current validator. Re-run this test after applying the alias fix to ensure it passes without importing langchain_openai.

27-41: Good baseline validation coverage. Solid checks for minimal config and absence of extras.
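For illustration, a baseline check of this kind could look like the following sketch; the test name and the use of model_extra are assumptions rather than the exact test under review:

from nat.embedder.openai_embedder import OpenAIEmbedderModelConfig


def test_minimal_config_has_no_extras():
    """A minimal config should validate and carry no extra fields."""
    config = OpenAIEmbedderModelConfig(model_name="text-embedding-ada-002")
    assert config.model_name == "text-embedding-ada-002"
    # Pydantic v2 exposes extra fields via the public model_extra property.
    assert not (config.model_extra or {})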
c110673 to
71be6c1
Compare
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/nat/embedder/test_openai_embedder.py (1)
114-118: Exception type likely mismatches Pydantic v2 behavior; verify. TypeAdapter.validate_python typically raises pydantic ValidationError, not ValueError.

Apply this diff if the implementation doesn't wrap to ValueError:

-    with pytest.raises(ValueError, match="Input should be a valid boolean"):
+    with pytest.raises(ValidationError, match="Input should be a valid boolean"):

If you intentionally convert to ValueError upstream, keep the current expectation.
🧹 Nitpick comments (5)
tests/nat/embedder/test_openai_embedder.py (5)
16-16: Remove unused tkinter import. Not used; will trip ruff F401.

Apply this diff:

-from tkinter import N, NO
66-67: Avoid equality comparisons to True/False (ruff E712). Use identity or truthiness.

Apply this diff:

-    assert config_dict['tiktoken_enabled'] == False
-    assert config_dict['show_progress_bar'] == True
+    assert config_dict['tiktoken_enabled'] is False
+    assert config_dict['show_progress_bar'] is True
136-143: This test still imports the target via patch; assert non-import without patch. The current patch forces importing langchain_openai, defeating the purpose.

Apply this diff:

-    def test_langchain_not_import_when_no_extra_params(self):
-        """langchain_openai not imported when no extra params are provided."""
-        # Mock import error for langchain_openai
-        with patch('langchain_openai.embeddings.base.OpenAIEmbeddings',
-                   side_effect=ImportError("Should not import langchain_openai when no extra params are provided")):
-            config = OpenAIEmbedderModelConfig(model_name="text-embedding-ada-002")
-            assert config.model_name == "text-embedding-ada-002"
+    def test_langchain_not_import_when_no_extra_params(self):
+        """langchain_openai not imported when no extra params are provided."""
+        import sys
+        before = {k for k in sys.modules.keys() if k.startswith("langchain_openai")}
+        config = OpenAIEmbedderModelConfig(model_name="text-embedding-ada-002")
+        assert config.model_name == "text-embedding-ada-002"
+        after = {k for k in sys.modules.keys() if k.startswith("langchain_openai")}
+        assert after == before

Also add the import near the top of the file:

import sys

Note: If earlier tests import langchain_openai, this check remains valid by ensuring no new imports occur in this code path.
39-41: Avoid relying on the private __pydantic_extra__; use the public API. Future Pydantic changes could break this.

Apply this diff:

-    extra = getattr(config, '__pydantic_extra__', {})
-    assert extra == {}
+    # Verify no extras via public API
+    dumped = config.model_dump(exclude={'model_name', 'api_key', 'base_url'})
+    assert dumped == {}
42-53: Deduplicate mocking with a fixture. Extract a fixture to set OpenAIEmbeddings.model_fields once per test case, improving readability.

Example fixture:

import pytest
from unittest.mock import Mock, patch

@pytest.fixture(name="fixture_openai_fields")
def fixture_openai_fields():
    with patch('langchain_openai.embeddings.base.OpenAIEmbeddings') as cls:
        cls.model_fields = {
            'tiktoken_enabled': Mock(annotation=bool),
            'show_progress_bar': Mock(annotation=bool),
            'max_retries': Mock(annotation=int),
            'chunk_size': Mock(annotation=int),
        }
        yield cls

Then use fixture_openai_fields in tests instead of per-test patch blocks.
Also applies to: 71-78, 110-112
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- src/nat/embedder/openai_embedder.py (2 hunks)
- tests/nat/embedder/test_openai_embedder.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/nat/embedder/openai_embedder.py
🧬 Code graph analysis (1)
tests/nat/embedder/test_openai_embedder.py (1)
src/nat/embedder/openai_embedder.py (1)
OpenAIEmbedderModelConfig(29-68)
🪛 Ruff (0.12.2)
tests/nat/embedder/test_openai_embedder.py
66-66: Avoid equality comparisons to False; use not config_dict['tiktoken_enabled']: for false checks
Replace with not config_dict['tiktoken_enabled']
(E712)
67-67: Avoid equality comparisons to True; use config_dict['show_progress_bar']: for truth checks
Replace with config_dict['show_progress_bar']
(E712)
5c5f6b9 to
e2c57eb
Compare
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/nat/embedder/openai_embedder.py (1)
42-44: Add missing return type annotation for public API. Based on the coding guidelines, public APIs in src/ require type hints on return values. The async generator function should have an explicit return type annotation.

+from collections.abc import AsyncIterator
+
 @register_embedder_provider(config_type=OpenAIEmbedderModelConfig)
-async def openai_embedder_model(config: OpenAIEmbedderModelConfig, _builder: Builder):
+async def openai_embedder_model(config: OpenAIEmbedderModelConfig, _builder: Builder) -> AsyncIterator[EmbedderProviderInfo]:
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/nat/embedder/nim_embedder.py (1 hunks)
- src/nat/embedder/openai_embedder.py (2 hunks)
- src/nat/llm/aws_bedrock_llm.py (1 hunks)
- src/nat/llm/nim_llm.py (1 hunks)
🔇 Additional comments (4)
src/nat/embedder/nim_embedder.py (1)
53-53: Configuration change consistent with PR objectives. The addition of extra="allow" enables dynamic parameter passing while maintaining the existing protected namespace configuration. This change aligns with the PR's goal of supporting additional OpenAI-like parameters across multiple model configurations.

src/nat/embedder/openai_embedder.py (1)
32-32: Configuration change enables additional parameter support. The addition of extra="allow" successfully addresses the core objective from issue #671 by allowing additional OpenAI parameters (like tiktoken_enabled, show_progress_bar) to be passed through the configuration without requiring custom class extensions.

src/nat/llm/nim_llm.py (1)
34-34: Configuration update aligns with broader parameter support strategy. The addition of extra="allow" to the NIM LLM configuration is consistent with the PR's approach of enabling flexible parameter passing across all model configurations. This change maintains backward compatibility while allowing additional parameters to be specified.

src/nat/llm/aws_bedrock_llm.py (1)
33-33: Consistent configuration change across model types. The addition of extra="allow" maintains consistency with other model configurations in this PR, enabling flexible parameter passing for AWS Bedrock models while preserving the existing protected namespace configuration.
Signed-off-by: Yossi Elias <yelias@redhat.com>
e2c57eb to
a03cc17
Compare
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/nat/llm/nim_llm.py (1)
44-48: Add return type and a concise Google-style docstring to the public provider. Public APIs under src/ require return annotations. This async generator should declare AsyncIterator[LLMProviderInfo] and include a brief docstring.

@@
-from nat.builder.builder import Builder
+from nat.builder.builder import Builder
+from collections.abc import AsyncIterator
@@
-@register_llm_provider(config_type=NIMModelConfig)
-async def nim_model(llm_config: NIMModelConfig, _builder: Builder):
-
-    yield LLMProviderInfo(config=llm_config, description="A NIM model for use with an LLM client.")
+@register_llm_provider(config_type=NIMModelConfig)
+async def nim_model(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[LLMProviderInfo]:
+    """Yield a `LLMProviderInfo` for the NIM provider.
+
+    Args:
+        llm_config: The `NIMModelConfig` instance for provider initialization.
+        _builder: The workflow builder (unused here).
+
+    Yields:
+        LLMProviderInfo: Provider info consumed by the builder/runtime.
+    """
+    yield LLMProviderInfo(config=llm_config, description="A NIM model for use with an LLM client.")
🧹 Nitpick comments (2)
src/nat/llm/nim_llm.py (2)
1-15: Optional: add a minimal module docstring. Keeps docs consistent with guidelines for public modules.

@@
 # limitations under the License.
+"""NIM LLM provider configuration and registration."""
+
34-34: Collect unknown kwargs explicitly in NIMModelConfig. Replace the current extra="allow" with an explicit client_options: dict[str, Any] field and a @model_validator(mode="before") that moves any unknown keys into it. This prevents silent acceptance of typos and makes downstream forwarding explicit; see the sketch below.
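A minimal sketch of that suggested pattern, assuming Pydantic v2 and illustrative field names (not the actual NIMModelConfig source):

from typing import Any

from pydantic import BaseModel, Field, model_validator


class NIMModelConfigSketch(BaseModel):
    """Illustrative config that collects unknown kwargs into client_options."""

    model_name: str
    client_options: dict[str, Any] = Field(default_factory=dict)

    @model_validator(mode="before")
    @classmethod
    def _collect_unknown_kwargs(cls, values: Any) -> Any:
        if not isinstance(values, dict):
            return values
        known = set(cls.model_fields)
        # Move anything not declared on the model into client_options.
        extras = {k: values.pop(k) for k in list(values) if k not in known}
        if extras:
            values.setdefault("client_options", {}).update(extras)
        return values

Compared with extra="allow", this keeps the set of declared fields closed and makes the pass-through parameters visible as a single, typed dictionary.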
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/nat/embedder/nim_embedder.py (1 hunks)
- src/nat/embedder/openai_embedder.py (1 hunks)
- src/nat/llm/aws_bedrock_llm.py (1 hunks)
- src/nat/llm/nim_llm.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/nat/llm/aws_bedrock_llm.py
- src/nat/embedder/nim_embedder.py
- src/nat/embedder/openai_embedder.py
Updated Changes

I expanded this to support extra parameters across more providers, not just OpenAI embeddings. Now all of these accept additional parameters: the OpenAI embedder, NIM embedder, NIM LLM, and AWS Bedrock LLM configs.

I simplified the approach too - removed the complex validation stuff and just added extra="allow". These four were the ones missing this capability - the rest of the model providers in the codebase already support additional params. This means you can now pass any provider-specific parameters for all providers through the YAML configuration.

@willkill07 Can you pls review?
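As a rough sketch of what this simplified change amounts to for one of the config classes (the class and field names below are illustrative, not the exact source):

from pydantic import BaseModel, ConfigDict


class ExampleProviderModelConfig(BaseModel):
    """Illustrative provider config that passes through unknown provider kwargs."""

    # extra="allow" keeps the declared fields validated while accepting any
    # additional provider-specific parameters supplied via the YAML config.
    model_config = ConfigDict(extra="allow", protected_namespaces=())

    model_name: str
    api_key: str | None = None

With extra="allow", unknown keys are kept on the model and appear in model_dump(), so they can be forwarded to the provider client unchanged.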
/ok to test a03cc17

Thank you for your contribution @YosiElias !

/merge
Description
Closes #671
Summary
Adds dynamic validation support for additional OpenAI embedding parameters in
OpenAIEmbedderModelConfig, enabling users to configure 20+ OpenAI-specific parameters through YAML while maintaining type safety and validation.

Changes
- extra="allow" and @model_validator to dynamically accept OpenAI parameters
- TypeAdapter for full type validation and coercion against live OpenAI model fields
- Parameters such as tiktoken_enabled, show_progress_bar, max_retries, chunk_size, etc.

Benefits
Example
This enhancement enables third-party model integration by supporting models that don't use tiktoken tokenizers through the tiktoken_enabled parameter, while allowing comprehensive configuration of all OpenAI parameters with dynamic validation.

This is a general solution that can be applied to other LLM and embedding configuration classes throughout the codebase for consistent parameter handling, and I can implement it for other classes if needed.
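For illustration, usage along the lines described above could look like the following sketch; the parameter values are arbitrary examples:

from nat.embedder.openai_embedder import OpenAIEmbedderModelConfig

# Extra OpenAI parameters are accepted alongside the declared fields and,
# per the description above, validated against the matching OpenAIEmbeddings field types.
config = OpenAIEmbedderModelConfig(
    model_name="text-embedding-ada-002",
    tiktoken_enabled=False,   # e.g. for third-party models without tiktoken tokenizers
    show_progress_bar=True,
    max_retries=5,
    chunk_size=256,
)
print(config.model_dump())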
By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Improvements