[https://nvbugs/5814350][fix] Fix OOM killed during weight loading in disaggregated server #12377

Merged
yingguo-trt merged 4 commits into NVIDIA:main from yingguo-trt:bug_fix_5814350
Mar 20, 2026

@yingguo-trt yingguo-trt commented Mar 20, 2026

Summary by CodeRabbit

  • Chores
    • Updated performance benchmark and integration test configuration files with memory allocator tuning settings.
    • Fixed file formatting consistency in configuration files.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…v_var

Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
@yingguo-trt yingguo-trt requested review from a team as code owners March 20, 2026 02:42
@yingguo-trt yingguo-trt requested a review from nvchenghaoz March 20, 2026 02:42
@yingguo-trt yingguo-trt removed the request for review from nvchenghaoz March 20, 2026 02:43

coderabbitai bot commented Mar 20, 2026

Walkthrough

This pull request adds MIMALLOC_PURGE_DELAY=0 environment variable to worker configurations across 500+ test and example configuration files. Most changes append this memory management setting to existing worker_env_var strings in YAML test configurations, while a few files add it as a new configuration entry or fix EOF markers.
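
The bulk edit described above can be sketched in a few lines of Python. This is only an illustration under assumptions: `append_env_assignment` is a hypothetical helper, not something taken from the PR or the TensorRT-LLM repository, and it treats `worker_env_var` as a whitespace-separated list of `KEY=VALUE` tokens. It covers both cases in the walkthrough, appending to an existing string and creating the entry when the string is empty.

```python
def append_env_assignment(env_string: str, assignment: str) -> str:
    """Return env_string with assignment appended, unless its key is already set.

    Hypothetical helper for illustration; not part of the PR.
    """
    key = assignment.split("=", 1)[0]
    tokens = env_string.split()
    # Skip the append when the key is already present, so reruns are idempotent.
    if any(token.split("=", 1)[0] == key for token in tokens):
        return " ".join(tokens)
    return " ".join(tokens + [assignment])


before = "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1"
after = append_env_assignment(before, "MIMALLOC_PURGE_DELAY=0")
print(after)
```

Making the helper idempotent means a second pass over the same files would not duplicate the variable.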

Changes

Cohort | File(s) | Summary
Auto-deploy examples | examples/auto_deploy/nano_v3.yaml, examples/auto_deploy/super_v3.yaml | Added newline at end of file (cosmetic formatting).
Disaggregated integration test configs - Qwen3 | tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_*_ccb-*.yaml (8 files) | Appended MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Disaggregated integration test configs - Qwen3 FP8 | tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP8_*_ccb-*.yaml (2 files) | Appended MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Disaggregated integration test configs - DeepSeek R1 | tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_*.yaml (24 files) | Appended MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Disaggregated integration test configs - DeepSeek R1 1k1k | tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_*.yaml (8 files) | Appended MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Disaggregated integration test configs - DeepSeek R1 8k1k | tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_*.yaml (6 files) | Appended MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Wideep accuracy integration test configs | tests/integration/defs/perf/disagg/test_configs/wideep/accuracy/*.yaml (2 files) | Added worker_env_var: MIMALLOC_PURGE_DELAY=0 to environment section.
Wideep perf integration test configs - Qwen3 | tests/integration/defs/perf/disagg/test_configs/wideep/perf/Qwen3-235B-A22B-FP4_*.yaml (6 files) | Appended MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Wideep perf integration test configs - DeepSeek | tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-r1-fp4_*.yaml (16 files) | Appended MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Wideep perf integration test configs - DeepSeek V32, Kimi | tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_*.yaml, tests/integration/defs/perf/disagg/test_configs/wideep/perf/kimi-k2-thinking-fp4_*.yaml (5 files) | Appended or added MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Disaggregated scripts perf test configs - Qwen3 | tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP4_*.yaml, tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_*.yaml (44 files) | Appended MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Disaggregated scripts perf test configs - DeepSeek R1 | tests/scripts/perf/disaggregated/deepseek-r1-fp4_128k8k_*.yaml (15 files) | Appended MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Disaggregated scripts perf test configs - DeepSeek R1 1k1k | tests/scripts/perf/disaggregated/deepseek-r1-fp4_1k1k_*.yaml (44 files) | Appended MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Disaggregated scripts perf test configs - DeepSeek R1 8k1k | tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_*.yaml (26 files) | Appended MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.
Disaggregated scripts perf stress/accuracy test configs | tests/scripts/perf/disaggregated/wideep_*.yaml (22 files) | Appended or added MIMALLOC_PURGE_DELAY=0 to environment.worker_env_var.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Description check | ⚠️ Warning | The PR description is largely empty and lacks substantive content about the changes being made or why they are needed. | Add a clear description explaining: (1) what OOM issue is being fixed, (2) why adding MIMALLOC_PURGE_DELAY=0 to worker environment variables solves this, and (3) which test configurations were updated and why. Also include test coverage details demonstrating the fix is effective.

✅ Passed checks (2 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The PR title '[https://nvbugs/5814350][fix] Fix OOM killed during weight loading in disaggregated server' clearly describes the main fix (OOM issue during weight loading) with proper format including NVBugs ID and type tag.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 10

🧹 Nitpick comments (19)
tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-r1-fp4_8k1k_ctx2_gen1_dep32_bs128_eplb288_mtp3_ccb-DEFAULT.yaml (1)

40-40: LGTM - reasonable OOM mitigation.

Adding MIMALLOC_PURGE_DELAY=0 is an appropriate approach to mitigate the OOM issue during weight loading. This configures the mimalloc allocator to immediately return freed memory to the OS rather than caching it, which should help reduce memory pressure at the cost of a minor performance tradeoff. Given that garbage collection is already disabled for workers (via TRTLLM_WORKER_DISABLE_GC=1), aggressive memory purging makes sense.

📝 Optional: Consider adding a comment for maintainability

For future maintainability, you could add a YAML comment explaining why this variable is set:

   work_dir: <full_path_to_work_dir>
+  # MIMALLOC_PURGE_DELAY=0 mitigates OOM during weight loading (nvbugs/5814350)
   worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes MIMALLOC_PURGE_DELAY=0"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-r1-fp4_8k1k_ctx2_gen1_dep32_bs128_eplb288_mtp3_ccb-DEFAULT.yaml`
at line 40, Add a short YAML comment above the worker_env_var setting explaining
why MIMALLOC_PURGE_DELAY=0 is set (to immediately return freed memory to the OS
to mitigate OOM during weight loading when TRTLLM_WORKER_DISABLE_GC=1), so
future maintainers understand the tradeoff and rationale; update the entry that
contains worker_env_var and mention MIMALLOC_PURGE_DELAY,
TRTLLM_WORKER_DISABLE_GC, and the OOM/weight-loading context in one concise
comment.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx5_pp4_gen1_dep32_bs2_eplb0_mtp3-Default.yaml (1)

38-39: Appropriate fix for OOM during weight loading.

Adding MIMALLOC_PURGE_DELAY=0 forces mimalloc to immediately return freed memory to the OS, which reduces memory pressure and should help prevent out-of-memory kills during weight loading. This is a suitable approach for the stated issue.

Optional: Consider improving YAML readability.

While the multi-line format is valid YAML, quoting the entire value would make it clearer:

♻️ Optional formatting improvement
-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+  worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx5_pp4_gen1_dep32_bs2_eplb0_mtp3-Default.yaml`
around lines 38 - 39, The worker_env_var entry needs MIMALLOC_PURGE_DELAY=0
added so mimalloc immediately returns freed memory to the OS to reduce memory
pressure during weight loading; update the worker_env_var value (the env string
under worker_env_var) to include MIMALLOC_PURGE_DELAY=0 alongside the existing
vars (e.g., TLLM_LOG_LEVEL, TRTLLM_* vars) so that processes use the new alloc
behavior and avoid OOMs during model weight load.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx2_pp8_gen1_dep32_bs2_eplb0_mtp0-Default.yaml (1)

38-39: LGTM! Good fix for the OOM issue.

Adding MIMALLOC_PURGE_DELAY=0 will cause mimalloc to immediately return freed memory to the OS, which should reduce the memory the process holds on to during weight loading in the disaggregated server. This is an appropriate configuration change to address the OOM kills mentioned in the PR objective.

The combination with the existing GC disable flags (TRTLLM_SERVER_DISABLE_GC=1, TRTLLM_WORKER_DISABLE_GC=1) creates a memory management strategy where Python GC is disabled but the allocator aggressively returns memory to the OS.


Optional: Consider improving readability.

The multi-line format (lines 38-39) is valid YAML but could be more readable. Consider using a YAML block scalar or adding an inline comment explaining the OOM fix.

📝 Optional formatting improvement

Using a folded block scalar for better readability:

-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+  worker_env_var: >-
+    TLLM_LOG_LEVEL=INFO
+    TRTLLM_SERVER_DISABLE_GC=1
+    TRTLLM_WORKER_DISABLE_GC=1
+    MIMALLOC_PURGE_DELAY=0
+    TRTLLM_ENABLE_PDL=1
+    ENROOT_ALLOW_DEV=yes

Or adding an inline comment:

-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
+  # MIMALLOC_PURGE_DELAY=0 added to fix OOM during weight loading (bug 5814350)
+  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
     TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx2_pp8_gen1_dep32_bs2_eplb0_mtp0-Default.yaml`
around lines 38 - 39, The worker_env_var value is spread across two YAML lines
which is valid but less readable; update the worker_env_var entry so the
environment variables are clearer by using a folded block scalar or by putting
them on one line and/or adding an inline comment explaining
MIMALLOC_PURGE_DELAY=0 (and why
TRTLLM_SERVER_DISABLE_GC/TRTLLM_WORKER_DISABLE_GC remain set), locating the
worker_env_var key in the YAML and replacing the multi-line value with the
folded-scalar version or a single-line value plus comment to improve
readability.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen1_dep8_bs4_eplb0_mtp2-Default.yaml (1)

38-38: Validate that MIMALLOC_PURGE_DELAY=0 resolves the OOM issue and assess performance impact.

Setting MIMALLOC_PURGE_DELAY=0 enables immediate memory purging, which should help prevent OOM kills by reclaiming unused memory faster. However, this may introduce performance overhead due to more frequent purging operations.

Please ensure:

  1. Testing confirms this change resolves the OOM issue during weight loading in disaggregated servers
  2. Performance benchmarks show acceptable overhead from immediate purging
  3. Consider if a non-zero delay (e.g., 100ms) might provide a better balance between memory reclamation and performance
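
To compare candidate delays as item 3 suggests, one option is to generate one `worker_env_var` string per delay and run the same benchmark against each. The sketch below is an assumption-laden illustration: `purge_delay_variants` is a hypothetical helper, the delay values are only the ones mentioned in this comment (0 and 100), and nothing here measures memory or performance.

```python
def purge_delay_variants(env_string, delays_ms):
    """Map each candidate delay to a copy of env_string with that delay set.

    Hypothetical helper for illustration; not part of the PR.
    """
    key = "MIMALLOC_PURGE_DELAY"
    # Drop any existing setting so each variant carries exactly one value.
    base = [t for t in env_string.split() if t.split("=", 1)[0] != key]
    return {d: " ".join(base + [f"{key}={d}"]) for d in delays_ms}


current = "TLLM_LOG_LEVEL=INFO TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0"
variants = purge_delay_variants(current, [0, 100])
for delay, env in variants.items():
    print(delay, "->", env)
```

Each generated string can then be written into a copy of the config, so the baseline and the alternative differ only in the purge delay.
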
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen1_dep8_bs4_eplb0_mtp2-Default.yaml`
at line 38, Update test validation for the worker_env_var line to verify that
setting MIMALLOC_PURGE_DELAY=0 prevents OOM during weight loading: run the
disaggregated server weight-load scenario with this config, capture memory usage
and OOM events, and confirm no OOM kills occur; additionally run performance
benchmarks to measure overhead (latency and throughput) introduced by immediate
purging and compare against a baseline, and if overhead is significant, test a
non-zero delay (e.g., MIMALLOC_PURGE_DELAY=100) to find a better trade-off
before finalizing the YAML change.
tests/scripts/perf/disaggregated/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_con8_ccb-UCX.yaml (2)

39-40: Consider improving YAML readability.

The multi-line string using implicit line folding works but could be clearer. Consider using an explicit YAML multiline operator or quotes.

📝 Alternative YAML formatting options

Option 1: Folded scalar with >

-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+  worker_env_var: >-
+    TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
+    MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes

Option 2: Quoted string

-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+  worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_con8_ccb-UCX.yaml`
around lines 39 - 40, The worker_env_var multiline environment string is hard to
read in its current implicit form; replace it with an explicit YAML folded
scalar (">-") or a quoted single-line string so the value is clearer. Do not use
a literal "|" scalar, which would keep the newlines inside the value. Update the
value for the key worker_env_var to contain
"TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes" to improve
readability while keeping the same contents.

39-39: Be aware of performance trade-offs.

MIMALLOC_PURGE_DELAY=0 forces immediate purging of unused memory back to the OS, which reduces memory footprint but may increase allocation overhead and degrade performance. Monitor allocation-heavy workloads to ensure the performance impact is acceptable given the OOM fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_con8_ccb-UCX.yaml`
at line 39, The env var MIMALLOC_PURGE_DELAY=0 in the worker_env_var forces
immediate purging which can hurt allocation-heavy performance; update the YAML
to either set a non-zero purge delay (e.g., a small positive value) or make the
setting configurable (via an env var or a runtime flag) and document it
alongside worker_env_var so OOM mitigation is enabled only when acceptable;
reference the literal MIMALLOC_PURGE_DELAY=0 and worker_env_var to locate and
change the value and add a short comment explaining the trade-off for future
reviewers.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx8_pp4_gen1_dep32_bs4_eplb0_mtp3-Default.yaml (1)

38-39: Consider using explicit YAML multi-line syntax for clarity.

While the current implicit line folding is valid YAML, using an explicit folded scalar would make the multi-line intent clearer and more maintainable.

📝 Optional refactor for improved readability
-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+  worker_env_var: >-
+    TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
+    MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes

The >- indicator creates a folded scalar (spaces replace newlines) without a trailing newline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx8_pp4_gen1_dep32_bs4_eplb0_mtp3-Default.yaml`
around lines 38 - 39, The worker_env_var value currently uses implicit YAML line
folding which is valid but unclear; update the YAML to use an explicit folded
scalar for that key (worker_env_var) such as the >- folded indicator and place
each env var on its own line under worker_env_var so the multi-line intent is
explicit and easier to maintain (preserve the same env variable strings and
spacing while switching to the folded scalar form).
tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml (1)

41-41: Consider adding a comment to document why this setting is needed.

Future maintainers won't understand why MIMALLOC_PURGE_DELAY=0 was added to this configuration without context. A brief inline comment would improve maintainability.

📝 Suggested documentation improvement
   work_dir: <full_path_to_work_dir>
+  # MIMALLOC_PURGE_DELAY=0: Immediately return freed memory to OS to prevent OOM during weight loading (see nvbugs/5814350)
   worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes MIMALLOC_PURGE_DELAY=0"
   server_env_var: "TRTLLM_SERVER_DISABLE_GC=1"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml`
at line 41, The environment variable MIMALLOC_PURGE_DELAY=0 in the
worker_env_var line is unclear to future readers; add a brief inline comment
next to worker_env_var (mentioning MIMALLOC_PURGE_DELAY=0) that states why the
immediate mimalloc purge is required for these tests (e.g., to prevent memory
growth or GC latency during long-running/integration perf runs) and, if
applicable, reference any linked bug/issue or profiling data that motivated the
setting.
tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep48_bs16_eplb288_mtp3_ccb-DEFAULT.yaml (1)

42-42: LGTM!

The MIMALLOC_PURGE_DELAY=0 addition is correct. Note that this file appends the variable at the end while other configs in this PR insert it in the middle—functionally equivalent but inconsistent ordering across files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep48_bs16_eplb288_mtp3_ccb-DEFAULT.yaml`
at line 42, The environment variable string assigned to worker_env_var places
MIMALLOC_PURGE_DELAY=0 at the end causing inconsistent ordering across config
files; modify the worker_env_var value so MIMALLOC_PURGE_DELAY=0 is inserted in
the same spot used by the other PR configs (keep the same ordering around
TRTLLM_ENABLE_PDL and ENROOT_ALLOW_DEV), e.g., move MIMALLOC_PURGE_DELAY=0 to be
adjacent to the other TRTLLM_* entries rather than appended at the very end so
the ordering of variables in worker_env_var matches the rest of the PR.
tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_ccb-NIXL.yaml (1)

41-41: LGTM!

The MIMALLOC_PURGE_DELAY=0 addition is correct.

Optional: Across this PR, there's formatting inconsistency in worker_env_var values—some files use quoted single-line strings while others use unquoted multi-line YAML scalars. Both are valid, but for maintainability, consider standardizing on one format (preferably quoted single-line for readability).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_ccb-NIXL.yaml`
at line 41, This PR has inconsistent formatting for the YAML key worker_env_var
across files; standardize all occurrences (key name worker_env_var) to a quoted
single-line string format (e.g., "KEY=VAL KEY2=VAL2 ...") for readability and
maintainability by updating each config that currently uses unquoted multi-line
YAML scalars to the quoted single-line style, ensuring no values or whitespace
are altered other than quoting and collapsing into one line.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen8_tep4_bs4_eplb0_mtp0-Default.yaml (1)

38-39: Consider centralizing worker_env_var defaults.

Line 38 and Line 39 now carry another copy of a long shared env string. A YAML anchor/base include for common worker env vars would reduce drift and prevent missed updates in future bulk edits.
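
A YAML anchor is only visible inside the document that defines it, so it cannot by itself share a value across several hundred separate config files. One alternative, named here as a swapped-in technique and not as what the PR does, is to render the string from a single shared definition. The sketch below assumes a hypothetical `COMMON_WORKER_ENV` constant and `render_worker_env` helper; the variable list is copied from the configs quoted in this review.

```python
# Hypothetical shared definition; the values are the ones quoted in this review.
COMMON_WORKER_ENV = (
    "TLLM_LOG_LEVEL=INFO",
    "TRTLLM_SERVER_DISABLE_GC=1",
    "TRTLLM_WORKER_DISABLE_GC=1",
    "MIMALLOC_PURGE_DELAY=0",
    "TRTLLM_ENABLE_PDL=1",
    "ENROOT_ALLOW_DEV=yes",
)


def render_worker_env(extra=()):
    """Join the shared defaults with per-config extras; extras override defaults."""
    merged = {}
    for token in (*COMMON_WORKER_ENV, *extra):
        key, value = token.split("=", 1)
        merged[key] = value  # a repeated key keeps its position, takes the new value
    return " ".join(f"{k}={v}" for k, v in merged.items())


print(render_worker_env())
print(render_worker_env(["TLLM_LOG_LEVEL=DEBUG"]))
```

With this shape, a future change such as adding another allocator setting touches one tuple, and every rendered config gets the same variable order.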

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen8_tep4_bs4_eplb0_mtp0-Default.yaml`
around lines 38 - 39, Centralize the repeated env string by defining a single
YAML anchor (e.g. &default_worker_env) that contains the long env vars and then
reference it where needed using the anchor (e.g. worker_env_var:
*default_worker_env) instead of duplicating the literal; update the block that
currently sets worker_env_var to reference that anchor (symbol: worker_env_var)
and remove the duplicate string so future updates only change the anchor
declaration.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)

39-39: LGTM, with a minor consistency note.

The change is functionally correct. Note that MIMALLOC_PURGE_DELAY=0 is placed at the end of the string here, whereas in some other config files in this PR it's placed after TRTLLM_WORKER_DISABLE_GC=1. This doesn't affect functionality (environment variable order is irrelevant), but a consistent ordering across all configs would improve maintainability.
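
The ordering point can be checked with a short sketch. It assumes, since the launcher code is not shown in this thread, that each whitespace-separated `KEY=VALUE` token in `worker_env_var` ends up as an independent environment variable; `parse_env_string` is a hypothetical helper. The two strings are the two orderings that appear in this PR.

```python
import shlex


def parse_env_string(env_string):
    """Split a space-separated KEY=VALUE string into a dict (illustrative helper)."""
    return dict(token.split("=", 1) for token in shlex.split(env_string))


appended = (
    "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 "
    "TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes MIMALLOC_PURGE_DELAY=0"
)
inserted = (
    "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 "
    "MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
)

# Dict equality ignores insertion order, so this compares the variable sets only.
same = parse_env_string(appended) == parse_env_string(inserted)
print(same)
```

A check like this could also serve as a lint across configs if a consistent variable set, independent of order, is the property that matters.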

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml`
at line 39, Summary: Environment variable ordering in worker_env_var is
inconsistent across configs; standardize ordering to match other files. Update
the worker_env_var string for the config that contains "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 TRTLLM_ENABLE_PDL=1
ENROOT_ALLOW_DEV=yes MIMALLOC_PURGE_DELAY=0" so that MIMALLOC_PURGE_DELAY
appears immediately after TRTLLM_WORKER_DISABLE_GC=1 (i.e., reorder to
"TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") to match the
ordering used in other config files; edit the worker_env_var value accordingly
in the YAML where the key worker_env_var is defined.
tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_con4_ccb-UCX.yaml (1)

39-40: Multiline plain-scalar style parses correctly; consider explicit quoting for future maintainability.

Lines 39–40 use multiline plain-scalar folding, which works as intended. The YAML parses correctly and all environment variables are present. However, quoting the full value as a single-line string would improve readability and make future edits less prone to accidental whitespace changes:

-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+  worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_con4_ccb-UCX.yaml`
around lines 39 - 40, The worker_env_var value uses a multiline plain-scalar
folding; change it to an explicit quoted single-line string to avoid accidental
whitespace changes in the future by replacing the folded value for the
worker_env_var key with a single quoted string (e.g., "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") so all environment variables remain
present but are preserved as one stable scalar.
tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_con16_ccb-NIXL.yaml (1)

39-40: Consider consolidating to a single-line quoted string for consistency.

For improved maintainability and consistency across configuration files, consider using a single-line quoted string for worker_env_var.

♻️ Suggested consolidation
-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+  worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_con16_ccb-NIXL.yaml`
around lines 39 - 40, The worker_env_var value is split across lines causing
inconsistency; consolidate it into a single-line quoted string by joining the
two lines into one value and wrapping the entire environment-variable string in
quotes (update the worker_env_var entry), e.g., set worker_env_var to a single
quoted string containing "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1
TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1
ENROOT_ALLOW_DEV=yes" so parsers and other config files remain consistent.
tests/scripts/perf/disaggregated/wideep_deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_con1024_ccb-NIXL.yaml (1)

39-40: Consider consolidating to a single-line quoted string for consistency.

The worker_env_var value spans multiple lines using an unquoted continuation. While valid YAML, this format is less clear than the single-line quoted approach used in other files (e.g., Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep32_bs16_eplb0_mtp3_ccb-UCX.yaml).

♻️ Suggested consolidation
-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+  worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf/disaggregated/wideep_deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_con1024_ccb-NIXL.yaml`
around lines 39 - 40, The YAML key worker_env_var currently uses a multi-line
unquoted continuation which is inconsistent with other configs; change it to a
single-line quoted string containing all environment variables (e.g.,
"TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") so the value
is consolidated and consistent with other files; update the value where
worker_env_var is defined to use the quoted single-line form.
tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con36_ccb-UCX.yaml (1)

39-40: Consider consolidating to a single-line quoted string for consistency.

The worker_env_var value spans multiple lines using an unquoted continuation. For consistency with other configuration files and improved clarity, consider using a single-line quoted string format.

♻️ Suggested consolidation
-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+  worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con36_ccb-UCX.yaml`
around lines 39 - 40, The worker_env_var value is split across multiple lines;
consolidate it into a single-line quoted string for consistency by replacing the
multi-line unquoted value for the worker_env_var key with one quoted line
containing all environment assignments (e.g., "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 ... ENROOT_ALLOW_DEV=yes"), ensuring the key name
worker_env_var remains unchanged and spacing between assignments is preserved.
tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_con1024_ccb-UCX.yaml (1)

39-40: Consider consolidating to a single-line quoted string for consistency.

The worker_env_var value uses a multi-line unquoted format. For consistency with other files in this PR and improved clarity, consider consolidating to a single-line quoted string.

♻️ Suggested consolidation
-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+  worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_con1024_ccb-UCX.yaml`
around lines 39 - 40, The worker_env_var value is split across lines; change it
to a single-line quoted YAML string to match project convention: replace the
multi-line unquoted value of worker_env_var with one quoted string containing
all environment assignments separated by spaces (e.g. "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"), leaving the key name worker_env_var
unchanged and preserving all variable names and values exactly.
tests/scripts/perf/disaggregated/wideep_deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_con1024_ccb-UCX.yaml (1)

39-40: Consider consolidating to a single-line quoted string for consistency.

The multi-line unquoted format for worker_env_var is less maintainable than the single-line quoted approach used in other configuration files.

♻️ Suggested consolidation
-  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+  worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf/disaggregated/wideep_deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_con1024_ccb-UCX.yaml`
around lines 39 - 40, The YAML key worker_env_var currently uses a multi-line
unquoted format; change it to a single-line quoted string containing all
environment variables (e.g., "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1
TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1
ENROOT_ALLOW_DEV=yes") to match other configs and ensure consistent parsing and
maintainability.
tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con1_ccb-NIXL.yaml (1)

39-40: Add documentation for the MIMALLOC_PURGE_DELAY=0 setting to explain the performance trade-off.

The MIMALLOC_PURGE_DELAY=0 setting is valid: it makes mimalloc purge unused memory immediately instead of after a short delay, which reduces memory usage during weight loading. However, this increases the frequency of memory-release syscalls, which may affect benchmark performance.

Consider adding a YAML comment above line 39 to document why this non-standard configuration is necessary:

  # Set MIMALLOC_PURGE_DELAY=0 to prevent OOM during weight loading (nvbugs/5814350)
  # Note: This reduces memory footprint but increases syscall overhead
  worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con1_ccb-NIXL.yaml`
around lines 39 - 40, Add an explanatory YAML comment above the worker_env_var
entry that sets MIMALLOC_PURGE_DELAY=0: reference the MIMALLOC_PURGE_DELAY
variable and note it is set to 0 to force immediate memory purging to avoid OOM
during weight loading (nvbugs/5814350), and also warn that this reduces memory
footprint but increases syscall overhead and can degrade benchmark performance;
update the comment near the worker_env_var line so future readers see why this
non-standard setting is required.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/auto_deploy/nano_v3.yaml`:
- Line 1: The file examples/auto_deploy/nano_v3.yaml currently contains a broken
symlink target "model_registry/configs/nano_v3.yaml"; recreate
examples/auto_deploy/nano_v3.yaml as a proper symlink pointing to
model_registry/configs/nano_v3.yaml (using the correct relative or absolute path
that resolves inside the repo) without changing the target string content or
newline semantics so the symlink resolves correctly and CI's check-symlinks
passes.

In `@examples/auto_deploy/super_v3.yaml`:
- Line 1: The first line of examples/auto_deploy/super_v3.yaml is currently
corrupted and must be a valid symlink target entry pointing to
model_registry/configs/super_v3.yaml; open examples/auto_deploy/super_v3.yaml,
replace the corrupted/empty first line so it contains the exact symlink target
(no extra newlines or trailing whitespace), save as a single-line symlink
target, re-add/commit the corrected symlink, and re-run the check-symlinks CI
check to verify the fix.

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp4_gen13_tep4_bs1_eplb0_mtp0-Default.yaml`:
- Around line 38-39: The YAML key worker_env_var continues its environment
string on a second line as a plain (unquoted) multi-line scalar, which YAML
folds into one string but which is easy to misread; make the scalar explicit by
either placing all variables on one quoted line (e.g., "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 ...") or using a folded block scalar (>-), which
joins the lines with spaces (a literal block scalar, |, would keep the line
break inside the value); update the worker_env_var entry so the full
environment string is unambiguous to tools reading this value.

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yaml`:
- Line 39: The config sets MIMALLOC_PURGE_DELAY=0 in worker_env_var which forces
immediate purges and can hurt throughput under high concurrency; change this to
a non-zero safe default (or remove it) and validate by running the existing
high-concurrency benchmark (concurrency_list: '2048') to compare worker
throughput and latency with and without the setting, then either restore the
default or pick a tuned value and add a brief inline comment near worker_env_var
documenting the chosen value and the benchmark results; reference the
worker_env_var key and MIMALLOC_PURGE_DELAY to locate the change.

In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_ccb-UCX.yaml`:
- Line 40: Update the worker environment variable string (worker_env_var) to
include MIMALLOC_PURGE_DELAY=0 for the worker processes to mitigate OOM during
weight loading, ensure this exact change is applied consistently across the
remaining affected config files (the 235 files noted), then run the relevant
benchmark/performance tests used for these configs to measure allocation
overhead and latency impact; if you observe regressions, revert to a less
aggressive value or make MIMALLOC_PURGE_DELAY configurable (e.g., only set
during weight loading) so you balance OOM safety with performance.

In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml`:
- Line 41: Add a plain-text comment immediately above or inline next to the
worker_env_var setting that explains why MIMALLOC_PURGE_DELAY=0 is set (e.g., to
force immediate purge to avoid OOM during disaggregated weight loading), notes
that it reduces memory footprint at the potential cost of performance and may
skew memory-related benchmark metrics, and instructs maintainers to treat
results as non-directly comparable to historical baselines or to track them
separately; reference the worker_env_var and the MIMALLOC_PURGE_DELAY=0 token in
the comment so future readers know this setting’s intent and impact.

In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_128k8k_ctx3_pp8_gen1_dep32_bs4_eplb0_mtp0_con4-Default.yaml`:
- Around line 38-39: The worker_env_var YAML scalar is continued on a second
line as a plain multi-line scalar (key: worker_env_var); make it a single
explicit scalar by either placing the entire value on one quoted line (e.g. a
single-quoted string) or using a folded block scalar (>-; a literal | would
embed the line break in the value) so all env vars (TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes) are preserved; update the value for
worker_env_var accordingly so the full env var string is unambiguous.

In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_con4_ccb-NIXL.yaml`:
- Around line 39-40: The value of the key worker_env_var is split across two
lines ("TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1
TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0" and the continuation
"TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"); YAML folds this into one scalar,
but the form depends on the continuation's indentation, so either merge the
entire env string onto a single line for worker_env_var or wrap the whole value
in quotes so it is explicitly one scalar, and apply the same quoted or
single-line pattern used elsewhere in the PR to the other affected config files.

In
`@tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con4_ccb-UCX.yaml`:
- Around line 39-40: The YAML key worker_env_var has its value continued on a
second line as a plain multi-line scalar, which YAML folds into one string; make
the single scalar explicit (either put the whole value in quotes on one line:
worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1
TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1
ENROOT_ALLOW_DEV=yes" or use a folded block scalar (>-) and indent the
continuation) so the full set of environment variables is preserved; update the
entry that currently contains worker_env_var and its continuation line
(TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes) accordingly.

In
`@tests/scripts/perf/disaggregated/wideep_accuracy-deepseek-r1-fp4_gpqa_diamond_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml`:
- Around line 45-46: The value of the YAML key worker_env_var continues onto a
second line as a plain multi-line scalar; YAML folds it into one string, but
make the single scalar explicit (either wrap the entire env string in quotes on
one line or use a folded block scalar, >-) so the full "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes" value is preserved for worker_env_var.

---

Nitpick comments:
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen1_dep8_bs4_eplb0_mtp2-Default.yaml`:
- Line 38: Update test validation for the worker_env_var line to verify that
setting MIMALLOC_PURGE_DELAY=0 prevents OOM during weight loading: run the
disaggregated server weight-load scenario with this config, capture memory usage
and OOM events, and confirm no OOM kills occur; additionally run performance
benchmarks to measure overhead (latency and throughput) introduced by immediate
purging and compare against a baseline, and if overhead is significant, test a
non-zero delay (e.g., MIMALLOC_PURGE_DELAY=100) to find a better trade-off
before finalizing the YAML change.

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen8_tep4_bs4_eplb0_mtp0-Default.yaml`:
- Around line 38-39: Centralize the repeated env string by defining a single
YAML anchor (e.g. &default_worker_env) that contains the long env vars and then
reference it where needed using the anchor (e.g. worker_env_var:
*default_worker_env) instead of duplicating the literal; update the block that
currently sets worker_env_var to reference that anchor (symbol: worker_env_var)
and remove the duplicate string so future updates only change the anchor
declaration.
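
A minimal sketch of the anchor approach, assuming the config loader resolves standard YAML aliases. The key `extra_worker_env_var` is hypothetical, added only so the alias has a second consumer; only `worker_env_var` appears in this review.

```yaml
# Define the env string once and attach an anchor to it.
worker_env_var: &default_worker_env "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"

# Hypothetical second key that reuses the same string through an alias.
extra_worker_env_var: *default_worker_env
```

Anchors are scoped to a single YAML document, so this removes duplication within one file but not across the separate config files touched by this PR.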

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx2_pp8_gen1_dep32_bs2_eplb0_mtp0-Default.yaml`:
- Around line 38-39: The worker_env_var value is spread across two YAML lines
which is valid but less readable; update the worker_env_var entry so the
environment variables are clearer by using a folded block scalar or by putting
them on one line and/or adding an inline comment explaining
MIMALLOC_PURGE_DELAY=0 (and why
TRTLLM_SERVER_DISABLE_GC/TRTLLM_WORKER_DISABLE_GC remain set), locating the
worker_env_var key in the YAML and replacing the multi-line value with the
folded-scalar version or a single-line value plus comment to improve
readability.

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx5_pp4_gen1_dep32_bs2_eplb0_mtp3-Default.yaml`:
- Around line 38-39: The worker_env_var entry needs MIMALLOC_PURGE_DELAY=0 added
so mimalloc immediately returns freed memory to the OS to reduce memory pressure
during weight loading; update the worker_env_var value (the env string under
worker_env_var) to include MIMALLOC_PURGE_DELAY=0 alongside the existing vars
(e.g., TLLM_LOG_LEVEL, TRTLLM_* vars) so that processes use the new alloc
behavior and avoid OOMs during model weight load.

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx8_pp4_gen1_dep32_bs4_eplb0_mtp3-Default.yaml`:
- Around line 38-39: The worker_env_var value currently uses implicit YAML line
folding which is valid but unclear; update the YAML to use an explicit folded
scalar for that key (worker_env_var) such as the >- folded indicator and place
each env var on its own line under worker_env_var so the multi-line intent is
explicit and easier to maintain (preserve the same env variable strings and
spacing while switching to the folded scalar form).
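
A sketch of the explicit folded form described here, assuming a standard YAML loader. With `>-`, single line breaks between equally indented lines are folded into spaces and the trailing newline is stripped, so the loaded value should be the same space-separated string as before.

```yaml
# Comments must stay outside the block scalar; inside it they would
# become part of the value.
worker_env_var: >-
  TLLM_LOG_LEVEL=INFO
  TRTLLM_SERVER_DISABLE_GC=1
  TRTLLM_WORKER_DISABLE_GC=1
  MIMALLOC_PURGE_DELAY=0
  TRTLLM_ENABLE_PDL=1
  ENROOT_ALLOW_DEV=yes
```

A literal block scalar (`|`) would not be equivalent here, because it keeps each line break inside the value.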

In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml`:
- Line 39: Summary: Environment variable ordering in worker_env_var is
inconsistent across configs; standardize ordering to match other files. Update
the worker_env_var string for the config that contains "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 TRTLLM_ENABLE_PDL=1
ENROOT_ALLOW_DEV=yes MIMALLOC_PURGE_DELAY=0" so that MIMALLOC_PURGE_DELAY
appears immediately after TRTLLM_WORKER_DISABLE_GC=1 (i.e., reorder to
"TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") to match the
ordering used in other config files; edit the worker_env_var value accordingly
in the YAML where the key worker_env_var is defined.

In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-r1-fp4_8k1k_ctx2_gen1_dep32_bs128_eplb288_mtp3_ccb-DEFAULT.yaml`:
- Line 40: Add a short YAML comment above the worker_env_var setting explaining
why MIMALLOC_PURGE_DELAY=0 is set (to immediately return freed memory to the OS
to mitigate OOM during weight loading when TRTLLM_WORKER_DISABLE_GC=1), so
future maintainers understand the tradeoff and rationale; update the entry that
contains worker_env_var and mention MIMALLOC_PURGE_DELAY,
TRTLLM_WORKER_DISABLE_GC, and the OOM/weight-loading context in one concise
comment.

In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml`:
- Line 41: The environment variable MIMALLOC_PURGE_DELAY=0 in the worker_env_var
line is unclear to future readers; add a brief inline comment next to
worker_env_var (mentioning MIMALLOC_PURGE_DELAY=0) that states why the immediate
mimalloc purge is required for these tests (e.g., to prevent memory growth or GC
latency during long-running/integration perf runs) and, if applicable, reference
any linked bug/issue or profiling data that motivated the setting.

In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep48_bs16_eplb288_mtp3_ccb-DEFAULT.yaml`:
- Line 42: The environment variable string assigned to worker_env_var places
MIMALLOC_PURGE_DELAY=0 at the end causing inconsistent ordering across config
files; modify the worker_env_var value so MIMALLOC_PURGE_DELAY=0 is inserted in
the same spot used by the other PR configs (keep the same ordering around
TRTLLM_ENABLE_PDL and ENROOT_ALLOW_DEV), e.g., move MIMALLOC_PURGE_DELAY=0 to be
adjacent to the other TRTLLM_* entries rather than appended at the very end so
the ordering of variables in worker_env_var matches the rest of the PR.

In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_ccb-NIXL.yaml`:
- Line 41: This PR has inconsistent formatting for the YAML key worker_env_var
across files; standardize all occurrences (key name worker_env_var) to a quoted
single-line string format (e.g., "KEY=VAL KEY2=VAL2 ...") for readability and
maintainability by updating each config that currently uses unquoted multi-line
YAML scalars to the quoted single-line style, ensuring no values or whitespace
are altered other than quoting and collapsing into one line.

In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_con8_ccb-UCX.yaml`:
- Around line 39-40: The worker_env_var multiline environment string is hard to
read in its current implicit form; replace it with an explicit folded scalar
(">-") or a quoted single-line string so each env var is clearer and spacing is
preserved (a literal "|" scalar would keep the line breaks inside the value);
update the value for the key worker_env_var to use one of these forms containing
"TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes" to improve
readability while keeping the same contents.
- Line 39: The env var MIMALLOC_PURGE_DELAY=0 in the worker_env_var forces
immediate purging which can hurt allocation-heavy performance; update the YAML
to either set a non-zero purge delay (e.g., a small positive value) or make the
setting configurable (via an env var or a runtime flag) and document it
alongside worker_env_var so OOM mitigation is enabled only when acceptable;
reference the literal MIMALLOC_PURGE_DELAY=0 and worker_env_var to locate and
change the value and add a short comment explaining the trade-off for future
reviewers.

In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_con16_ccb-NIXL.yaml`:
- Around line 39-40: The worker_env_var value is split across lines causing
inconsistency; consolidate it into a single-line quoted string by joining the
two lines into one value and wrapping the entire environment-variable string in
quotes (update the worker_env_var entry), e.g., set worker_env_var to a single
quoted string containing "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1
TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1
ENROOT_ALLOW_DEV=yes" so parsers and other config files remain consistent.

In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_con4_ccb-UCX.yaml`:
- Around line 39-40: The worker_env_var value uses a multiline plain-scalar
folding; change it to an explicit quoted single-line string to avoid accidental
whitespace changes in the future by replacing the folded value for the
worker_env_var key with a single quoted string (e.g., "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") so all environment variables remain
present but are preserved as one stable scalar.

In
`@tests/scripts/perf/disaggregated/wideep_deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_con1024_ccb-NIXL.yaml`:
- Around line 39-40: The YAML key worker_env_var currently uses a multi-line
unquoted continuation which is inconsistent with other configs; change it to a
single-line quoted string containing all environment variables (e.g.,
"TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") so the value
is consolidated and consistent with other files; update the value where
worker_env_var is defined to use the quoted single-line form.

Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
@yingguo-trt yingguo-trt removed the request for review from a team March 20, 2026 03:33
This reverts commit 3627e26.

Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
@yingguo-trt
Collaborator Author

/bot skip --comment "Not cover in CI pipelines"

@yingguo-trt yingguo-trt enabled auto-merge (squash) March 20, 2026 03:38
@tensorrt-cicd
Collaborator

PR_Github #39681 [ skip ] triggered by Bot. Commit: e7fc563 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39681 [ skip ] completed with state SUCCESS. Commit: e7fc563
Skipping testing for commit e7fc563

Link to invocation

Collaborator

@fredricz-20070104 fredricz-20070104 left a comment

This one resolves the OOM bug. Approved.

@yingguo-trt yingguo-trt merged commit 3eabe7d into NVIDIA:main Mar 20, 2026
5 checks passed
3 participants