[https://nvbugs/5972776][fix] Pass IPC HMAC key through file descriptor by yibinl-nvidia · Pull Request #14378 · NVIDIA/TensorRT-LLM

yibinl-nvidia · 2026-05-21T01:00:59Z

Summary by CodeRabbit

Improvements
- Enhanced IPC process security by updating how authentication keys are transmitted between processes, replacing environment variable-based passing with a more secure file descriptor-based mechanism to reduce key exposure.
Tests
- Added test coverage for the updated authentication key provisioning behavior.

Description

This prevents another process from stealing the HMAC key by reading it from the environment variable.
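The PR does not spell out the threat model, but one common exposure is that an environment variable is inherited by every descendant process by default, whereas a pipe file descriptor only reaches children that are explicitly given it. A minimal sketch of that contrast, assuming a POSIX system; the variable name `DEMO_HMAC_KEY` and both helper functions are hypothetical and are not part of TensorRT-LLM:

```python
import os
import subprocess
import sys

ENV_NAME = "DEMO_HMAC_KEY"  # hypothetical name, for illustration only


def child_reads_env(secret_hex: str) -> str:
    """Spawn a child with the secret in its environment; return what it sees."""
    env = dict(os.environ)
    env[ENV_NAME] = secret_hex
    probe = f"import os; print(os.environ.get({ENV_NAME!r}, ''), end='')"
    result = subprocess.run([sys.executable, "-c", probe], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout


def child_reads_unpassed_fd(secret: bytes) -> str:
    """Spawn a child WITHOUT pass_fds; return hex of what it reads from that FD number."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, secret)
        # The child tries the same FD number the parent holds for the pipe.
        probe = (
            "import os, sys\n"
            f"fd = {read_fd}\n"
            "try:\n"
            "    data = os.read(fd, 64)\n"
            "except OSError:\n"
            "    data = b''\n"
            "sys.stdout.write(data.hex())\n"
        )
        # subprocess closes FDs above 2 in the child unless they are in pass_fds.
        result = subprocess.run([sys.executable, "-c", probe],
                                capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.close(write_fd)
        os.close(read_fd)
```

In this sketch the child recovers the secret from its environment without any cooperation from the parent, while the pipe contents stay out of reach unless the parent lists the descriptor in `pass_fds`.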

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

coderabbitai · 2026-05-21T22:41:51Z

📝 Walkthrough

This PR switches IPC HMAC key provisioning from environment variables to file descriptors across Python executor launch paths, disaggregated leader spawn, shell script launch, and test coverage. The core mechanism caches and reads keys from FDs, with coordinated updates in serve.py (disaggregated leader), the llmapi launch script, and corresponding test coverage.
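The receiving side of this mechanism can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `tensorrt_llm/executor/utils.py`: the environment variable name `DEMO_IPC_HMAC_KEY_FD`, the function names, and the choice to drop the variable after the first read are all hypothetical.

```python
import os
from typing import Optional

# Hypothetical variable name; the PR's enum member is
# TLLM_SPAWN_PROXY_PROCESS_IPC_HMAC_KEY_FD and its handling may differ.
KEY_FD_ENV = "DEMO_IPC_HMAC_KEY_FD"

# Module-level cache so the pipe is only consumed once per process.
_cached_key: Optional[bytes] = None


def read_key_from_fd(fd: int) -> bytes:
    """Read until EOF from the descriptor, then close it."""
    with os.fdopen(fd, "rb", closefd=True) as f:
        return f.read()


def get_ipc_hmac_key() -> bytes:
    """Return the cached key, reading it from the inherited FD on first use."""
    global _cached_key
    if _cached_key is not None:
        return _cached_key
    # Assumption: remove the variable once used, since the FD is closed after reading.
    fd_str = os.environ.pop(KEY_FD_ENV, None)
    if fd_str is None:
        raise ValueError(f"{KEY_FD_ENV} is not set.")
    key = read_key_from_fd(int(fd_str))
    if len(key) != 32:
        raise ValueError("IPC HMAC key must be 32 bytes.")
    _cached_key = key
    return key
```

Caching matters here because a pipe can only be drained once: a second read of the same descriptor would fail, so later callers have to be served from memory. The read blocks until the writer closes its end of the pipe.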

Changes

IPC HMAC Key File Descriptor Mechanism

| Layer / File(s) | Summary |
| --- | --- |
| Core FD-based HMAC key utilities: `tensorrt_llm/executor/utils.py` | Add `TLLM_SPAWN_PROXY_PROCESS_IPC_HMAC_KEY_FD` enum member, module-level key caching with `set_spawn_proxy_process_ipc_hmac_key()` setter, FD reading helper that normalizes bytes to a 32-byte key, and updated `get_spawn_proxy_process_ipc_hmac_key_env()` that prefers the FD source, caches the result, and asserts on a missing key. |
| Disaggregated leader launch with FD-based key: `tensorrt_llm/commands/serve.py` | Update imports, generate and cache the HMAC key in `_launch_disaggregated_leader`, remove the old env var assertion, create a pipe and write the key to it, pass the read-end FD to the child via `pass_fds`, and clean up pipe FDs in a finally block. |
| Shell script wrapper and execution points: `tensorrt_llm/llmapi/trtllm-llmapi-launch` | Introduce `run_with_ipc_hmac_key` wrapper that generates the key at runtime and passes it via FD using bash `exec {fd}<<<...`; wrap Rank0 task execution, MGMN leader-node stop, and leader-node start through the wrapper. |
| Test coverage for FD-based HMAC key reading: `tests/unittest/executor/test_launcher_envs.py` | Add test helpers for cache/env cleanup and pipe FD setup; test FD reading and env var removal, caching behavior across calls, and assertion when the HMAC key source is missing. |
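The launch pattern summarized for `serve.py` (create a pipe, write the key, hand the read end to the child through `pass_fds`, close both ends in a `finally` block) might look like the sketch below. It assumes a POSIX system; `DEMO_IPC_HMAC_KEY_FD`, `CHILD_CODE`, and `launch_child_with_key` are hypothetical stand-ins and do not reproduce `_launch_disaggregated_leader`.

```python
import os
import secrets
import subprocess
import sys

KEY_FD_ENV = "DEMO_IPC_HMAC_KEY_FD"  # hypothetical variable name

# Stand-in child: reads the key from the inherited FD and echoes it as hex.
CHILD_CODE = """
import os, sys
fd = int(os.environ["DEMO_IPC_HMAC_KEY_FD"])
with os.fdopen(fd, "rb") as f:
    key = f.read()
sys.stdout.write(key.hex())
"""


def launch_child_with_key(key: bytes) -> str:
    """Pass `key` to a child process over a pipe and return the child's stdout."""
    read_fd, write_fd = os.pipe()
    try:
        # A 32-byte key fits in the pipe buffer, so this write does not block.
        os.write(write_fd, key)
        os.close(write_fd)  # closing the write end lets the child see EOF
        write_fd = -1
        env = dict(os.environ)
        env[KEY_FD_ENV] = str(read_fd)  # only the FD number travels in the env
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            env=env,
            pass_fds=(read_fd,),  # keeps this FD open, same number, in the child
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        for fd in (read_fd, write_fd):
            if fd != -1:
                os.close(fd)


if __name__ == "__main__":
    demo_key = secrets.token_bytes(32)
    print(launch_child_with_key(demo_key) == demo_key.hex())
```

Only the descriptor number appears in the child's environment; the key bytes themselves travel through the pipe.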

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description is incomplete. It explains the 'why' (prevent key theft) but lacks a detailed 'what' section, test coverage details, and most of the PR checklist items remain unchecked despite the author marking completion. | Expand description to detail the implementation changes across files, specifically list test cases added, and ensure all applicable PR checklist items are properly addressed and verified. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: passing the IPC HMAC key through a file descriptor instead of environment variables, directly addressing the security concern. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

🧹 Nitpick comments (2)

tensorrt_llm/executor/utils.py (2)

36-45: 💤 Low value

Potential UnicodeDecodeError when normalizing non-hex binary bytes.

If key is bytes with length ≠ 32 and contains non-ASCII characters, key.decode("ascii") on line 40 will raise UnicodeDecodeError. This could happen if the caller passes raw binary bytes that aren't hex-encoded.

Consider adding explicit handling or documenting that non-32-byte inputs must be ASCII hex strings:

🛡️ Proposed defensive fix

 def _normalize_spawn_proxy_process_ipc_hmac_key(key: str | bytes) -> bytes:
     if isinstance(key, bytes):
         if len(key) == 32:
             return key
-        key = key.decode("ascii")
+        try:
+            key = key.decode("ascii")
+        except UnicodeDecodeError as e:
+            raise ValueError(
+                "IPC HMAC key bytes must be exactly 32 bytes or ASCII hex-encoded"
+            ) from e

     key_bytes = bytes.fromhex(key)
     if len(key_bytes) != 32:
         raise ValueError("IPC HMAC key must be 32 bytes.")
     return key_bytes

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/executor/utils.py` around lines 36 - 45, The function
_normalize_spawn_proxy_process_ipc_hmac_key can raise UnicodeDecodeError when
given non-32 raw bytes containing non-ASCII values; wrap the key.decode("ascii")
in a try/except UnicodeDecodeError and convert that into a clear ValueError
(e.g. "IPC HMAC key must be 32 bytes or an ASCII hex string") so callers get a
deterministic error; keep the existing bytes.fromhex flow and length check for
the decoded hex string and return the 32-byte result if valid.
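For reference, the proposed fix applied as a standalone function could look like this. It is a sketch derived from the diff in this comment; the shortened name `normalize_ipc_hmac_key` is hypothetical, and the merged code may not include the change.

```python
from typing import Union


def normalize_ipc_hmac_key(key: Union[str, bytes]) -> bytes:
    """Return a 32-byte key from raw bytes, hex-encoded bytes, or a hex string."""
    if isinstance(key, bytes):
        if len(key) == 32:
            # Already a raw 32-byte key.
            return key
        try:
            # Otherwise the bytes are expected to be ASCII hex.
            key = key.decode("ascii")
        except UnicodeDecodeError as e:
            raise ValueError(
                "IPC HMAC key bytes must be exactly 32 bytes or ASCII hex-encoded"
            ) from e

    # bytes.fromhex raises ValueError on non-hex input and skips ASCII whitespace.
    key_bytes = bytes.fromhex(key)
    if len(key_bytes) != 32:
        raise ValueError("IPC HMAC key must be 32 bytes.")
    return key_bytes
```

With this shape every malformed input surfaces as `ValueError`, so callers need only one `except` clause. A hex string followed by a newline still parses, because `bytes.fromhex` ignores ASCII whitespace in Python 3.7 and later.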

87-89: ⚡ Quick win

Consider using ValueError instead of AssertionError for missing key.

AssertionError is typically reserved for programming errors caught during development and can be disabled with -O. For a runtime configuration error that should always be checked, a ValueError or custom exception is more appropriate.

♻️ Proposed fix

-    raise AssertionError(
+    raise ValueError(
         f"{LlmLauncherEnvs.TLLM_SPAWN_PROXY_PROCESS_IPC_HMAC_KEY_FD} is not set. "
         "HMAC encryption is required for IPC communication.")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/executor/utils.py` around lines 87 - 89, Replace the runtime
check that raises AssertionError with a ValueError to signal a missing runtime
configuration; specifically, in tensorrt_llm/executor/utils.py update the
exception raised where LlmLauncherEnvs.TLLM_SPAWN_PROXY_PROCESS_IPC_HMAC_KEY_FD
is validated (the block that currently raises AssertionError saying the HMAC key
FD is not set) to raise ValueError with the same descriptive message so the
error cannot be disabled with -O and correctly represents a
configuration/runtime error.
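One detail worth checking when weighing this suggestion: `-O` strips `assert` statements, whereas an explicit `raise` still executes whatever the exception type, `AssertionError` included. A small sketch that observes this by running snippets in a child interpreter; the helper name `exit_code_under_O` is hypothetical:

```python
import subprocess
import sys


def exit_code_under_O(snippet: str) -> int:
    """Run `snippet` with `python -O` and return the interpreter's exit code."""
    result = subprocess.run([sys.executable, "-O", "-c", snippet],
                            capture_output=True)
    return result.returncode


# An assert statement is compiled away under -O, so the process exits cleanly.
ASSERT_STATEMENT = "assert False, 'missing key'"

# Explicit raises are ordinary statements and survive -O.
RAISE_ASSERTION = "raise AssertionError('missing key')"
RAISE_VALUE = "raise ValueError('missing key')"

if __name__ == "__main__":
    for name, code in [("assert", ASSERT_STATEMENT),
                       ("raise AssertionError", RAISE_ASSERTION),
                       ("raise ValueError", RAISE_VALUE)]:
        print(name, "->", exit_code_under_O(code))
```

Independently of `-O`, `ValueError` remains the more conventional signal for a missing runtime configuration value.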

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f193e4b7-543c-44ca-93e9-c7ed36a338c0

📥 Commits

Reviewing files that changed from the base of the PR and between 3b8387c and 525f500.

📒 Files selected for processing (4)

tensorrt_llm/commands/serve.py
tensorrt_llm/executor/utils.py
tensorrt_llm/llmapi/trtllm-llmapi-launch
tests/unittest/executor/test_launcher_envs.py

yibinl-nvidia · 2026-05-21T22:54:34Z

/bot run

yibinl-nvidia · 2026-05-21T23:45:20Z

/bot run

yibinl-nvidia · 2026-05-21T23:53:15Z

/bot run

tensorrt-cicd · 2026-05-21T23:59:01Z

PR_Github #49788 [ run ] triggered by Bot. Commit: 525f500 Link to invocation

tensorrt-cicd · 2026-05-22T04:49:20Z

PR_Github #49788 [ run ] completed with state SUCCESS. Commit: 525f500
/LLM/main/L0_MergeRequest_PR pipeline #39379 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

yibinl-nvidia · 2026-05-22T04:50:19Z

/bot run

tensorrt-cicd · 2026-05-22T04:55:30Z

PR_Github #49848 [ run ] triggered by Bot. Commit: 525f500 Link to invocation

tensorrt-cicd · 2026-05-22T09:00:57Z

PR_Github #49848 [ run ] completed with state SUCCESS. Commit: 525f500
/LLM/main/L0_MergeRequest_PR pipeline #39432 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

yibinl-nvidia · 2026-05-22T09:01:22Z

/bot run

tensorrt-cicd · 2026-05-22T09:08:00Z

PR_Github #49905 [ run ] triggered by Bot. Commit: 525f500 Link to invocation

tensorrt-cicd · 2026-05-22T13:14:44Z

PR_Github #49905 [ run ] completed with state SUCCESS. Commit: 525f500
/LLM/main/L0_MergeRequest_PR pipeline #39484 completed with status: 'SUCCESS'

CI Report

Link to invocation

Superjomn

LGTM

Pass IPC HMAC key through file descriptor

8a6d3a7

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

github-actions Bot assigned yibinl-nvidia May 21, 2026

yibinl-nvidia added 2 commits May 21, 2026 01:51

Remove raw IPC HMAC key environment variable

2f16a73

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

Apply pre-commit formatting

525f500

Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>

yibinl-nvidia marked this pull request as ready for review May 21, 2026 22:37

yibinl-nvidia requested a review from a team as a code owner May 21, 2026 22:37

yibinl-nvidia requested a review from Superjomn May 21, 2026 22:37

coderabbitai Bot reviewed May 21, 2026

Superjomn approved these changes May 28, 2026

yibinl-nvidia merged commit 50ca49f into NVIDIA:main May 28, 2026
13 of 14 checks passed
