[https://nvbugs/5972776][fix] Pass IPC HMAC key through file descriptor#14378
Conversation
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
📝 WalkthroughWalkthroughThis PR switches IPC HMAC key provisioning from environment variables to file descriptors across Python executor launch paths, disaggregated leader spawn, shell script launch, and test coverage. The core mechanism caches and reads keys from FDs, with coordinated updates in serve.py (disaggregated leader), the llmapi launch script, and corresponding test coverage. ChangesIPC HMAC Key File Descriptor Mechanism
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tensorrt_llm/executor/utils.py (2)
36-45: 💤 Low valuePotential
UnicodeDecodeErrorwhen normalizing non-hex binary bytes.If
keyis bytes with length ≠ 32 and contains non-ASCII characters,key.decode("ascii")on line 40 will raiseUnicodeDecodeError. This could happen if the caller passes raw binary bytes that aren't hex-encoded.Consider adding explicit handling or documenting that non-32-byte inputs must be ASCII hex strings:
🛡️ Proposed defensive fix
def _normalize_spawn_proxy_process_ipc_hmac_key(key: str | bytes) -> bytes: if isinstance(key, bytes): if len(key) == 32: return key - key = key.decode("ascii") + try: + key = key.decode("ascii") + except UnicodeDecodeError as e: + raise ValueError( + "IPC HMAC key bytes must be exactly 32 bytes or ASCII hex-encoded" + ) from e key_bytes = bytes.fromhex(key) if len(key_bytes) != 32: raise ValueError("IPC HMAC key must be 32 bytes.") return key_bytes🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/executor/utils.py` around lines 36 - 45, The function _normalize_spawn_proxy_process_ipc_hmac_key can raise UnicodeDecodeError when given non-32 raw bytes containing non-ASCII values; wrap the key.decode("ascii") in a try/except UnicodeDecodeError and convert that into a clear ValueError (e.g. "IPC HMAC key must be 32 bytes or an ASCII hex string") so callers get a deterministic error; keep the existing bytes.fromhex flow and length check for the decoded hex string and return the 32-byte result if valid.
87-89: ⚡ Quick winConsider using
ValueErrorinstead ofAssertionErrorfor missing key.
AssertionErroris typically reserved for programming errors caught during development and can be disabled with-O. For a runtime configuration error that should always be checked, aValueErroror custom exception is more appropriate.♻️ Proposed fix
- raise AssertionError( + raise ValueError( f"{LlmLauncherEnvs.TLLM_SPAWN_PROXY_PROCESS_IPC_HMAC_KEY_FD} is not set. " "HMAC encryption is required for IPC communication.")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/executor/utils.py` around lines 87 - 89, Replace the runtime check that raises AssertionError with a ValueError to signal a missing runtime configuration; specifically, in tensorrt_llm/executor/utils.py update the exception raised where LlmLauncherEnvs.TLLM_SPAWN_PROXY_PROCESS_IPC_HMAC_KEY_FD is validated (the block that currently raises AssertionError saying the HMAC key FD is not set) to raise ValueError with the same descriptive message so the error cannot be disabled with -O and correctly represents a configuration/runtime error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tensorrt_llm/executor/utils.py`:
- Around line 36-45: The function _normalize_spawn_proxy_process_ipc_hmac_key
can raise UnicodeDecodeError when given non-32 raw bytes containing non-ASCII
values; wrap the key.decode("ascii") in a try/except UnicodeDecodeError and
convert that into a clear ValueError (e.g. "IPC HMAC key must be 32 bytes or an
ASCII hex string") so callers get a deterministic error; keep the existing
bytes.fromhex flow and length check for the decoded hex string and return the
32-byte result if valid.
- Around line 87-89: Replace the runtime check that raises AssertionError with a
ValueError to signal a missing runtime configuration; specifically, in
tensorrt_llm/executor/utils.py update the exception raised where
LlmLauncherEnvs.TLLM_SPAWN_PROXY_PROCESS_IPC_HMAC_KEY_FD is validated (the block
that currently raises AssertionError saying the HMAC key FD is not set) to raise
ValueError with the same descriptive message so the error cannot be disabled
with -O and correctly represents a configuration/runtime error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f193e4b7-543c-44ca-93e9-c7ed36a338c0
📒 Files selected for processing (4)
tensorrt_llm/commands/serve.pytensorrt_llm/executor/utils.pytensorrt_llm/llmapi/trtllm-llmapi-launchtests/unittest/executor/test_launcher_envs.py
|
/bot run |
2 similar comments
|
/bot run |
|
/bot run |
|
PR_Github #49788 [ run ] triggered by Bot. Commit: |
|
PR_Github #49788 [ run ] completed with state
|
|
/bot run |
|
PR_Github #49848 [ run ] triggered by Bot. Commit: |
|
PR_Github #49848 [ run ] completed with state
|
|
/bot run |
|
PR_Github #49905 [ run ] triggered by Bot. Commit: |
|
PR_Github #49905 [ run ] completed with state |
Summary by CodeRabbit
Improvements
Tests
Description
This prevents another process to steal HMAC key from the environment variable.
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.