
fix: langchain<>huggingface integration#1382

Merged
rapids-bot[bot] merged 1 commit into NVIDIA:release/1.4 from willkill07:wkk_fix-langchain-huggingface
Jan 12, 2026

Conversation

@willkill07
Member

@willkill07 willkill07 commented Jan 11, 2026

Description

The nightly CI ran with this error:

```
tests.nat.llm_providers.test_langchain_agents::test_huggingface_langchain_agent
ValueError: The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object.
```

This PR resolves that error in addition to the following changes:

  • Renames the HuggingFace config field to dtype (from the deprecated torch_dtype) and uses it when loading models.
  • Updates the LangChain HuggingFace client to avoid passing device when accelerate sharding is present, passes dtype to pipelines, and wraps ChatHuggingFace with an async _agenerate using asyncio.to_thread.
  • Fixes the test to use TinyLlama (a newer, small model) for the HuggingFace agent test and to assert that the response contains "3" instead of only checking for non-empty content.
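The device-handling change above can be sketched as follows. This is a minimal, hypothetical helper — `build_pipeline_kwargs` and the stand-in model classes are illustrative only, not the actual toolkit code:

```python
# Sketch of the conditional device handling described above (hypothetical
# helper; the real logic lives in the LangChain HuggingFace client).

def build_pipeline_kwargs(model, dtype):
    """Build kwargs for a text-generation pipeline, omitting `device`
    when the model was sharded by accelerate (signaled by `hf_device_map`)."""
    kwargs = {"dtype": dtype}
    if getattr(model, "hf_device_map", None) is None:
        # Only pin a device when accelerate did not already place the model;
        # passing both triggers the ValueError seen in nightly CI.
        kwargs["device"] = getattr(model, "device", "cpu")
    return kwargs


class ShardedModel:          # stands in for a model loaded with accelerate
    hf_device_map = {"": 0}


class PlainModel:            # stands in for a single-device model
    device = "cuda:0"


sharded_kwargs = build_pipeline_kwargs(ShardedModel(), "float16")
plain_kwargs = build_pipeline_kwargs(PlainModel(), "float16")
```

The key point is that the presence of `hf_device_map` on the loaded model is used as the signal that accelerate owns device placement, so `device` is simply never forwarded in that case.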

Closes

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

Release Notes

  • Breaking Changes

    • Updated HuggingFace LLM configuration parameter name; users may need to update their configuration.
  • Improvements

    • Enhanced HuggingFace model compatibility with distributed architectures to prevent errors.
    • Added async execution support for HuggingFace LLMs.
  • Documentation

    • Updated HuggingFace LLM configuration documentation.


Signed-off-by: Will Killian <wkillian@nvidia.com>
@willkill07 willkill07 self-assigned this Jan 11, 2026
@willkill07 willkill07 requested a review from a team as a code owner January 11, 2026 14:51
@willkill07 willkill07 added bug Something isn't working non-breaking Non-breaking change labels Jan 11, 2026
@coderabbitai

coderabbitai bot commented Jan 11, 2026

Walkthrough

The PR renames the HuggingFace LLM configuration parameter from torch_dtype to dtype across documentation and implementation. It also introduces async support for HuggingFace LLMs via a new wrapper class, implements conditional device handling to prevent conflicts with accelerate-sharded models, and updates tests to use a fixed small model for validation.

Changes

  • Parameter rename (torch_dtype → dtype) — docs/source/build-workflows/llms/index.md, src/nat/llm/huggingface_llm.py
    Renamed the config field from torch_dtype to dtype in HuggingFaceConfig and updated the documentation. Updated the model loading call to pass dtype instead of torch_dtype.
  • LangChain integration improvements — packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
    Introduced an AsyncChatHuggingFace inner class to support async operations via thread delegation. Added conditional device handling to avoid passing device when hf_device_map is present on the model. Included a type-ignore annotation for the Azure OpenAI api_version call. Added local imports for asyncio and message handling.
  • Test updates — tests/nat/llm_providers/test_langchain_agents.py
    Updated the HuggingFace test to load a fixed small model (TinyLlama/TinyLlama-1.1B-Chat-v1.0) instead of an environment-based override. Changed response validation from a length check to asserting that the digit "3" is present. Added a type annotation for the Azure OpenAI config_args.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 50.00%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ — The title accurately summarizes the main changes: it fixes a langchain and huggingface integration issue by resolving device handling, renaming the config field, adding async support, and updating tests.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @src/nat/llm/huggingface_llm.py:
- Around line 161-164: The call to AutoModelForCausalLM.from_pretrained
incorrectly passes dtype=config.dtype which Transformers expects as torch_dtype;
update the call in huggingface_llm.py (the AutoModelForCausalLM.from_pretrained
invocation) to use torch_dtype=config.dtype, preserving the other args
(device_map=config.device and trust_remote_code=config.trust_remote_code).
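One defensive way to resolve the dtype/torch_dtype ambiguity raised here is to inspect the loader's signature at runtime. This is an illustrative sketch only — `dtype_kwarg_name` and the dummy loaders are hypothetical, and the maintainer later confirms that `dtype` is honored in the pinned transformers 4.57.x:

```python
# Hedged sketch: pick whichever dtype keyword the loader actually accepts.
# `newer_loader`/`older_loader` are stand-ins for from_pretrained in
# different transformers versions.
import inspect


def dtype_kwarg_name(loader) -> str:
    """Return "dtype" if the loader accepts it, else fall back to "torch_dtype"."""
    params = inspect.signature(loader).parameters
    if "dtype" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    ):
        # Explicit `dtype` parameter, or a **kwargs catch-all that may map it.
        return "dtype"
    return "torch_dtype"


def newer_loader(name, dtype=None, device_map=None):
    ...  # stands in for a version that accepts `dtype`


def older_loader(name, torch_dtype=None, device_map=None):
    ...  # stands in for a version that only accepts `torch_dtype`
```

Note the caveat in the comment: a loader that takes `**kwargs` (as from_pretrained does) cannot be distinguished this way, so pinning the transformers version remains the reliable answer.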
🧹 Nitpick comments (1)
tests/nat/llm_providers/test_langchain_agents.py (1)

30-30: Remove the unused noqa directive.

The # noqa: F401 comment is unnecessary. The import is actively used on line 31 to set the HAS_HUGGINGFACE flag, so there's no linting violation to suppress.

♻️ Remove unnecessary noqa

```diff
-    from nat.llm.huggingface_llm import HuggingFaceConfig  # noqa: F401
+    from nat.llm.huggingface_llm import HuggingFaceConfig
```

Based on static analysis hints from ruff.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ec7cf8 and e6e7be1.

📒 Files selected for processing (4)
  • docs/source/build-workflows/llms/index.md
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/huggingface_llm.py
  • tests/nat/llm_providers/test_langchain_agents.py
🔇 Additional comments (8)
docs/source/build-workflows/llms/index.md (1)

165-165: LGTM - Documentation aligns with code changes.

The parameter rename from torch_dtype to dtype correctly reflects the updated HuggingFaceConfig class definition.

src/nat/llm/huggingface_llm.py (1)

81-81: Field rename looks good.

The configuration field rename from torch_dtype to dtype improves clarity and aligns with the documentation updates.

tests/nat/llm_providers/test_langchain_agents.py (2)

161-162: Test configuration improvements look good.

Using a fixed small model (TinyLlama-1.1B-Chat-v1.0) for testing instead of environment-based configuration improves test reliability and reduces external dependencies.


174-174: Improved assertion specificity.

The updated assertion verifies that the response contains the expected answer "3", which is more meaningful than just checking for non-empty content.

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (4)

19-19: LGTM - Import used for type annotations.

The Any import is properly used in the type annotation on line 315.


147-147: Type ignore comment is acceptable.

The type: ignore[call-arg] suppresses a type-checking error for api_version. This is appropriate when passing through configuration to external libraries that may have incomplete type stubs.


289-303: Excellent fix for accelerate device conflicts.

The conditional device handling correctly addresses the runtime error mentioned in the PR objectives. When a model is loaded with accelerate sharding (hf_device_map is present), the device parameter is omitted to prevent the ValueError. The fallback logic properly extracts the device from model parameters when accelerate is not used.

Additionally, using dtype=model_param.dtype (line 298) correctly retrieves the actual dtype from the already-loaded model rather than relying on the config value.


307-325: Async wrapper implementation looks solid.

The AsyncChatHuggingFace wrapper properly implements async support by delegating _agenerate to the synchronous _generate method via asyncio.to_thread. This pattern correctly handles the blocking HuggingFace pipeline operations in a thread pool, preventing event loop blocking.

Note: Line 320 safely handles the optional run_manager with run_manager.get_sync() if run_manager else None.
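The delegation pattern described above can be reduced to a small self-contained sketch. Here `SyncChat` merely stands in for `ChatHuggingFace` (the real wrapper also forwards the callback manager via `run_manager.get_sync()`):

```python
# Minimal sketch of the async-delegation pattern: the subclass forwards the
# async entry point to the blocking one on a worker thread, so the event
# loop is never stalled by a synchronous pipeline call.
import asyncio


class SyncChat:  # stands in for ChatHuggingFace
    def _generate(self, messages):
        # A blocking HuggingFace pipeline call would happen here.
        return f"echo: {messages[-1]}"


class AsyncChat(SyncChat):  # stands in for AsyncChatHuggingFace
    async def _agenerate(self, messages):
        # Delegate to the synchronous implementation in a thread pool.
        return await asyncio.to_thread(self._generate, messages)


result = asyncio.run(AsyncChat()._agenerate(["What is 1 + 2?"]))
```

Because `asyncio.to_thread` runs the blocking call in the default executor, concurrent agent requests can proceed on the event loop while generation happens off-thread.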

Contributor

@mnajafian-nv mnajafian-nv left a comment


LGTM overall.

A couple of clarifications / follow-ups:

  1. Transformers 4.57.x: pipeline(dtype=...) looks correct, but for AutoModelForCausalLM.from_pretrained(...), can you confirm dtype is a supported kwarg in our pinned version (4.57.3) or mapped internally, versus needing torch_dtype? If dtype is not honored, we should pass torch_dtype= when loading.

  2. Backward compat: renaming config torch_dtype → dtype is breaking for existing YAML, right? Can we accept both (a deprecated alias) for at least one release, or document this as breaking with a migration note?

  3. Test: TinyLlama may be heavy/flaky for CI downloads. Shall we consider keeping an env override again, or using a smaller test model while still asserting on "3"?

@willkill07
Member Author

Transformers 4.57.x: pipeline(dtype=...) looks correct, but for AutoModelForCausalLM.from_pretrained(...) can you confirm dtype is a supported kwarg in our pinned version (4.57.3) / mapped internally, vs needing torch_dtype? If dtype is not honored, we should pass torch_dtype= when loading.

Yes, I can confirm that dtype is honored.

Backward compat: renaming config torch_dtype → dtype is breaking for existing YAML? Right? Can we accept both (deprecated alias) for at least one release, or document this as breaking + migration note?

There is no existing YAML using this, as it's a brand-new feature that has yet to be released. There is no reason to keep backward compatibility for a feature that has never shipped.

Test: TinyLlama may be heavy/flaky for CI downloads. Shall we consider keeping an env override again or using a smaller test model while still asserting on “3”.

TinyLlama is the smallest model I could find which has a built-in prompt template defined. I do not believe there is a smaller (meaningful) model.

@mnajafian-nv
Contributor

Thank you!

@willkill07
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 1546075 into NVIDIA:release/1.4 Jan 12, 2026
16 of 17 checks passed
Jerryguan777 pushed a commit to Jerryguan777/NeMo-Agent-Toolkit that referenced this pull request Jan 28, 2026
Authors:
  - Will Killian (https://github.com/willkill07)

Approvers:
  - https://github.com/mnajafian-nv

URL: NVIDIA#1382
@willkill07 willkill07 deleted the wkk_fix-langchain-huggingface branch February 25, 2026 12:35