Add support for setting NIXL num threads for vLLM CLI #809
Conversation
📝 Walkthrough: Added Pydantic autodoc to docs, enabled new autodoc_pydantic config flags, introduced …
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Greptile Summary: Added …

Key changes:
Confidence Score: 5/5
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cloudai/workloads/vllm/vllm.py (1)
47-53: ⚠️ Potential issue | 🔴 Critical

Bug confirmed: `nixl_threads` must be excluded from `serve_args`.

When `nixl_threads` is set, `serve_args` will emit `--nixl-threads <value>`, which is not a valid vLLM CLI argument. The field is only meant for internal use in `kv_connector_extra_config.num_threads` (handled separately in `slurm_command_gen_strategy.py`). It must be excluded from `model_dump()` in the same way `gpu_ids` is.

Existing tests check that `kv_connector_extra_config` is properly constructed, but they do not verify that `--nixl-threads` is absent as a spurious CLI flag, leaving this bug undetected.

🐛 Proposed fix
```diff
- for k, v in self.model_dump(exclude={"gpu_ids"}, exclude_none=True).items():
+ for k, v in self.model_dump(exclude={"gpu_ids", "nixl_threads"}, exclude_none=True).items():
```
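A minimal sketch of the exclusion behavior, using a stand-in model (field name `served_model_name` and the CLI-flag formatting are illustrative; only `gpu_ids` and `nixl_threads` come from the review comment):

```python
from typing import Optional
from pydantic import BaseModel

class FakeVllmArgs(BaseModel):
    """Illustrative stand-in for the real VllmArgs model."""
    served_model_name: str = "llama"
    gpu_ids: Optional[str] = None      # already excluded from serve args
    nixl_threads: Optional[int] = 8    # must be excluded too

    def serve_args(self) -> list[str]:
        # Excluding nixl_threads alongside gpu_ids keeps the invalid
        # --nixl-threads flag out of the generated command line.
        return [
            f"--{k.replace('_', '-')} {v}"
            for k, v in self.model_dump(
                exclude={"gpu_ids", "nixl_threads"}, exclude_none=True
            ).items()
        ]

print(FakeVllmArgs().serve_args())  # ['--served-model-name llama']
```

With only `{"gpu_ids"}` in the exclude set, the same loop would also emit `--nixl-threads 8`, which is exactly the spurious flag the comment describes.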
🤖 Fix all issues with AI agents
In `@doc/workloads/vllm.rst`:
- Around line 135-139: The section heading contains a typo — "vLLm Serve
Arguments" uses a lowercase "m"; update the heading text to "vLLM Serve
Arguments" so it matches the project's capitalization, ensuring consistency with
the surrounding content (the heading string to change is "vLLm Serve Arguments"
above the autopydantic_model block for cloudai.workloads.vllm.vllm.VllmArgs).
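A sketch of the corrected heading in reStructuredText (the underline style is an assumption; the directive and model path are taken from the comment above):

```rst
vLLM Serve Arguments
--------------------

.. autopydantic_model:: cloudai.workloads.vllm.vllm.VllmArgs
```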
In `@src/cloudai/workloads/vllm/slurm_command_gen_strategy.py`:
- Around line 76-97: The shared dict kv_transfer_config is mutated inside the
loop so prefill's nixl_threads (via setdefault and kv_connector_extra_config)
leaks into the subsequent decode iteration; fix by constructing a fresh
per-iteration config (e.g., copy or create new dict) before adding kv_role and
potential kv_connector_extra_config, then pass that per-iteration dict into
self._to_json_str_arg when building commands in the loop that appends to
commands for prefill/decode using args.nixl_threads and args.serve_args.
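A sketch of the suggested fix with simplified stand-ins for the strategy internals (the function shape and `kv_role` values are illustrative; the real code feeds the dict through `_to_json_str_arg` when building sbatch commands):

```python
import json

def get_serve_commands(prefill_threads, decode_threads):
    """Illustrative: build one command per role with a FRESH config dict
    each iteration, so prefill's nixl_threads cannot leak into decode."""
    commands = []
    for role, threads in (("kv_producer", prefill_threads),
                          ("kv_consumer", decode_threads)):
        # New dict per iteration instead of mutating a shared one.
        kv_transfer_config = {"kv_connector": "NixlConnector", "kv_role": role}
        if threads is not None:
            kv_transfer_config["kv_connector_extra_config"] = {"num_threads": threads}
        commands.append(f"--kv-transfer-config '{json.dumps(kv_transfer_config)}'")
    return commands

prefill_cmd, decode_cmd = get_serve_commands(8, None)
print("num_threads" in prefill_cmd, "num_threads" in decode_cmd)  # True False
```

Had the dict been created once outside the loop and mutated inside it, the `kv_connector_extra_config` added for prefill would still be present on the decode iteration, which is the leak described above.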
In `@tests/workloads/vllm/test_command_gen_strategy_slurm.py`:
- Around line 183-217: Add a test that ensures prefill's nixl_threads doesn't
leak into the decode command: create a new test (e.g.,
test_prefill_nixl_threads_does_not_leak_to_decode) that uses
VllmSlurmCommandGenStrategy and VllmArgs to set
tdef.cmd_args.prefill.nixl_threads=8 and tdef.cmd_args.decode.nixl_threads=None,
call VllmSlurmCommandGenStrategy.get_vllm_serve_commands(), assert two commands
are returned, assert the prefill command string (commands[0]) contains
'"num_threads":8', and assert the decode command string (commands[1]) does NOT
contain "num_threads" or "kv_connector_extra_config" to catch shared-state
mutation between prefill and decode.
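A self-contained sketch of that test, with a stub in place of the real `VllmSlurmCommandGenStrategy` (the actual test would construct the strategy and `VllmTestDefinition` fixtures from the project):

```python
import json

def get_vllm_serve_commands_stub(prefill_threads, decode_threads):
    # Stand-in for VllmSlurmCommandGenStrategy.get_vllm_serve_commands;
    # returns [prefill_command, decode_command].
    commands = []
    for threads in (prefill_threads, decode_threads):
        cfg = {"kv_connector": "NixlConnector"}
        if threads is not None:
            cfg["kv_connector_extra_config"] = {"num_threads": threads}
        commands.append(f"--kv-transfer-config '{json.dumps(cfg)}'")
    return commands

def test_prefill_nixl_threads_does_not_leak_to_decode():
    commands = get_vllm_serve_commands_stub(8, None)
    assert len(commands) == 2
    assert '"num_threads": 8' in commands[0]        # prefill carries it
    assert "num_threads" not in commands[1]         # decode must not
    assert "kv_connector_extra_config" not in commands[1]

test_prefill_nixl_threads_does_not_leak_to_decode()
```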
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/cloudai/workloads/vllm/slurm_command_gen_strategy.py`:
- Around line 81-84: kv_transfer_config is newly created each iteration so
calling kv_transfer_config.setdefault("kv_connector_extra_config", {}) is
redundant; instead, when args.nixl_threads is not None, directly assign the
extra config dict (e.g., kv_transfer_config["kv_connector_extra_config"] =
{"num_threads": cast(int, args.nixl_threads)}) so you remove the unnecessary
setdefault and make intent explicit in slurm_command_gen_strategy.py around the
kv_transfer_config construction.
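A sketch of the before/after readability change (`role` and `threads` values are illustrative):

```python
role, threads = "kv_producer", 8  # illustrative values

# Before: setdefault on a freshly created dict is redundant, since the
# key can never already be present on a dict built one line earlier.
kv_transfer_config = {"kv_connector": "NixlConnector", "kv_role": role}
kv_transfer_config.setdefault("kv_connector_extra_config", {})["num_threads"] = threads

# After: direct assignment produces the same dict and makes intent explicit.
explicit_config = {"kv_connector": "NixlConnector", "kv_role": role}
explicit_config["kv_connector_extra_config"] = {"num_threads": threads}

assert kv_transfer_config == explicit_config
```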
In `@tests/workloads/vllm/test_command_gen_strategy_slurm.py`:
- Around line 193-219: The test function test_decode_nixl_threads is misnamed
because it asserts on both prefill and decode nixl_threads; rename the test
(e.g., to test_nixl_threads or test_serve_commands_nixl_threads) to reflect that
it validates both commands, update the function name where it's referenced
(including any test parametrization or fixtures that mention
test_decode_nixl_threads) and adjust any test identifiers in decorators or test
runners to use the new name so discovery and reporting reflect the correct
intent; leave the body and assertions (references to VllmSlurmCommandGenStrategy
and VllmTestDefinition) unchanged.
Summary

New field `nixl_threads` was added to pass the thread count to `--kv-transfer-config '{"kv_connector": "NixlConnector", "kv_role": "...", "kv_connector_extra_config": {"num_threads": <VALUE>}}'`.

Additionally, added the `VllmArgs` class to the documentation with correct rendering for the Pydantic model.

Test Plan
Additional Notes
–