Add support for setting NIXL num threads for vLLM CLI #809
Conversation
📝 Walkthrough: Added Pydantic autodoc to docs, enabled new autodoc_pydantic config flags, introduced …
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Greptile Summary: Added …

Key changes:
Confidence Score: 5/5
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cloudai/workloads/vllm/vllm.py (1)
47-53: ⚠️ Potential issue | 🔴 Critical

Bug confirmed: `nixl_threads` must be excluded from `serve_args`.

When `nixl_threads` is set, `serve_args` will emit `--nixl-threads <value>`, which is not a valid vLLM CLI argument. The field is only meant for internal use in `kv_connector_extra_config.num_threads` (handled separately in `slurm_command_gen_strategy.py`). It must be excluded from `model_dump()` in the same way `gpu_ids` is.

Existing tests check that `kv_connector_extra_config` is properly constructed, but they do not verify that `--nixl-threads` is absent as a spurious CLI flag, leaving this bug undetected.

🐛 Proposed fix
```diff
- for k, v in self.model_dump(exclude={"gpu_ids"}, exclude_none=True).items():
+ for k, v in self.model_dump(exclude={"gpu_ids", "nixl_threads"}, exclude_none=True).items():
```
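A minimal sketch of the exclusion behavior, using a stand-in model (field name `served_model_name` and the CLI-flag formatting are illustrative; only `gpu_ids` and `nixl_threads` come from the review comment):

```python
from typing import Optional
from pydantic import BaseModel

class FakeVllmArgs(BaseModel):
    """Illustrative stand-in for the real VllmArgs model."""
    served_model_name: str = "llama"
    gpu_ids: Optional[str] = None      # already excluded from serve args
    nixl_threads: Optional[int] = 8    # must be excluded too

    def serve_args(self) -> list[str]:
        # Excluding nixl_threads alongside gpu_ids keeps the invalid
        # --nixl-threads flag out of the generated command line.
        return [
            f"--{k.replace('_', '-')} {v}"
            for k, v in self.model_dump(
                exclude={"gpu_ids", "nixl_threads"}, exclude_none=True
            ).items()
        ]

print(FakeVllmArgs().serve_args())  # ['--served-model-name llama']
```

With only `{"gpu_ids"}` in the exclude set, the same loop would also emit `--nixl-threads 8`, which is exactly the spurious flag the comment describes.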
🤖 Fix all issues with AI agents
In `@doc/workloads/vllm.rst`:
- Around line 135-139: The section heading contains a typo — "vLLm Serve
Arguments" uses a lowercase "m"; update the heading text to "vLLM Serve
Arguments" so it matches the project's capitalization, ensuring consistency with
the surrounding content (the heading string to change is "vLLm Serve Arguments"
above the autopydantic_model block for cloudai.workloads.vllm.vllm.VllmArgs).
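A sketch of the corrected heading in reStructuredText (the underline style is an assumption; the directive and model path are taken from the comment above):

```rst
vLLM Serve Arguments
--------------------

.. autopydantic_model:: cloudai.workloads.vllm.vllm.VllmArgs
```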
In `@src/cloudai/workloads/vllm/slurm_command_gen_strategy.py`:
- Around line 76-97: The shared dict kv_transfer_config is mutated inside the
loop so prefill's nixl_threads (via setdefault and kv_connector_extra_config)
leaks into the subsequent decode iteration; fix by constructing a fresh
per-iteration config (e.g., copy or create new dict) before adding kv_role and
potential kv_connector_extra_config, then pass that per-iteration dict into
self._to_json_str_arg when building commands in the loop that appends to
commands for prefill/decode using args.nixl_threads and args.serve_args.
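A sketch of the suggested fix with simplified stand-ins for the strategy internals (the function shape and `kv_role` values are illustrative; the real code feeds the dict through `_to_json_str_arg` when building sbatch commands):

```python
import json

def get_serve_commands(prefill_threads, decode_threads):
    """Illustrative: build one command per role with a FRESH config dict
    each iteration, so prefill's nixl_threads cannot leak into decode."""
    commands = []
    for role, threads in (("kv_producer", prefill_threads),
                          ("kv_consumer", decode_threads)):
        # New dict per iteration instead of mutating a shared one.
        kv_transfer_config = {"kv_connector": "NixlConnector", "kv_role": role}
        if threads is not None:
            kv_transfer_config["kv_connector_extra_config"] = {"num_threads": threads}
        commands.append(f"--kv-transfer-config '{json.dumps(kv_transfer_config)}'")
    return commands

prefill_cmd, decode_cmd = get_serve_commands(8, None)
print("num_threads" in prefill_cmd, "num_threads" in decode_cmd)  # True False
```

Had the dict been created once outside the loop and mutated inside it, the `kv_connector_extra_config` added for prefill would still be present on the decode iteration, which is the leak described above.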
In `@tests/workloads/vllm/test_command_gen_strategy_slurm.py`:
- Around line 183-217: Add a test that ensures prefill's nixl_threads doesn't
leak into the decode command: create a new test (e.g.,
test_prefill_nixl_threads_does_not_leak_to_decode) that uses
VllmSlurmCommandGenStrategy and VllmArgs to set
tdef.cmd_args.prefill.nixl_threads=8 and tdef.cmd_args.decode.nixl_threads=None,
call VllmSlurmCommandGenStrategy.get_vllm_serve_commands(), assert two commands
are returned, assert the prefill command string (commands[0]) contains
'"num_threads":8', and assert the decode command string (commands[1]) does NOT
contain "num_threads" or "kv_connector_extra_config" to catch shared-state
mutation between prefill and decode.
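A self-contained sketch of that test, with a stub in place of the real `VllmSlurmCommandGenStrategy` (the actual test would construct the strategy and `VllmTestDefinition` fixtures from the project):

```python
import json

def get_vllm_serve_commands_stub(prefill_threads, decode_threads):
    # Stand-in for VllmSlurmCommandGenStrategy.get_vllm_serve_commands;
    # returns [prefill_command, decode_command].
    commands = []
    for threads in (prefill_threads, decode_threads):
        cfg = {"kv_connector": "NixlConnector"}
        if threads is not None:
            cfg["kv_connector_extra_config"] = {"num_threads": threads}
        commands.append(f"--kv-transfer-config '{json.dumps(cfg)}'")
    return commands

def test_prefill_nixl_threads_does_not_leak_to_decode():
    commands = get_vllm_serve_commands_stub(8, None)
    assert len(commands) == 2
    assert '"num_threads": 8' in commands[0]        # prefill carries it
    assert "num_threads" not in commands[1]         # decode must not
    assert "kv_connector_extra_config" not in commands[1]

test_prefill_nixl_threads_does_not_leak_to_decode()
```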
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/cloudai/workloads/vllm/slurm_command_gen_strategy.py`:
- Around line 81-84: kv_transfer_config is newly created each iteration so
calling kv_transfer_config.setdefault("kv_connector_extra_config", {}) is
redundant; instead, when args.nixl_threads is not None, directly assign the
extra config dict (e.g., kv_transfer_config["kv_connector_extra_config"] =
{"num_threads": cast(int, args.nixl_threads)}) so you remove the unnecessary
setdefault and make intent explicit in slurm_command_gen_strategy.py around the
kv_transfer_config construction.
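A sketch of the before/after readability change (`role` and `threads` values are illustrative):

```python
role, threads = "kv_producer", 8  # illustrative values

# Before: setdefault on a freshly created dict is redundant, since the
# key can never already be present on a dict built one line earlier.
kv_transfer_config = {"kv_connector": "NixlConnector", "kv_role": role}
kv_transfer_config.setdefault("kv_connector_extra_config", {})["num_threads"] = threads

# After: direct assignment produces the same dict and makes intent explicit.
explicit_config = {"kv_connector": "NixlConnector", "kv_role": role}
explicit_config["kv_connector_extra_config"] = {"num_threads": threads}

assert kv_transfer_config == explicit_config
```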
In `@tests/workloads/vllm/test_command_gen_strategy_slurm.py`:
- Around line 193-219: The test function test_decode_nixl_threads is misnamed
because it asserts on both prefill and decode nixl_threads; rename the test
(e.g., to test_nixl_threads or test_serve_commands_nixl_threads) to reflect that
it validates both commands, update the function name where it's referenced
(including any test parametrization or fixtures that mention
test_decode_nixl_threads) and adjust any test identifiers in decorators or test
runners to use the new name so discovery and reporting reflect the correct
intent; leave the body and assertions (references to VllmSlurmCommandGenStrategy
and VllmTestDefinition) unchanged.
Summary

New field `nixl_threads` was added to pass the thread count to `--kv-transfer-config '{"kv_connector": "NixlConnector", "kv_role": "...", "kv_connector_extra_config": {"num_threads": <VALUE>}}'`.

Additionally, added the `VllmArgs` class to the documentation with correct rendering for the Pydantic model.

Test Plan
Additional Notes
–