[https://nvbugs/5814350][fix] Fix OOM kill during weight loading in disaggregated server #12377
…v_var Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
📝 Walkthrough
This pull request adds …
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 10
🧹 Nitpick comments (19)
tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-r1-fp4_8k1k_ctx2_gen1_dep32_bs128_eplb288_mtp3_ccb-DEFAULT.yaml (1)
40-40: LGTM - reasonable OOM mitigation. Adding `MIMALLOC_PURGE_DELAY=0` is an appropriate approach to mitigate the OOM issue during weight loading. This configures the mimalloc allocator to immediately return freed memory to the OS rather than caching it, which should help reduce memory pressure at the cost of a minor performance trade-off. Given that garbage collection is already disabled for workers (via `TRTLLM_WORKER_DISABLE_GC=1`), aggressive memory purging makes sense.
📝 Optional: Consider adding a comment for maintainability
For future maintainability, you could add a YAML comment explaining why this variable is set:
 work_dir: <full_path_to_work_dir>
+# MIMALLOC_PURGE_DELAY=0 mitigates OOM during weight loading (nvbugs/5814350)
 worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes MIMALLOC_PURGE_DELAY=0"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-r1-fp4_8k1k_ctx2_gen1_dep32_bs128_eplb288_mtp3_ccb-DEFAULT.yaml` at line 40, Add a short YAML comment above the worker_env_var setting explaining why MIMALLOC_PURGE_DELAY=0 is set (to immediately return freed memory to the OS to mitigate OOM during weight loading when TRTLLM_WORKER_DISABLE_GC=1), so future maintainers understand the tradeoff and rationale; update the entry that contains worker_env_var and mention MIMALLOC_PURGE_DELAY, TRTLLM_WORKER_DISABLE_GC, and the OOM/weight-loading context in one concise comment.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx5_pp4_gen1_dep32_bs2_eplb0_mtp3-Default.yaml (1)
38-39: Appropriate fix for OOM during weight loading. Adding `MIMALLOC_PURGE_DELAY=0` forces mimalloc to immediately return freed memory to the OS, which reduces memory pressure and should help prevent out-of-memory kills during weight loading. This is a suitable approach for the stated issue.
Optional: Consider improving YAML readability.
While the multi-line format is valid YAML, quoting the entire value would make it clearer:
♻️ Optional formatting improvement
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-   TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+ worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx5_pp4_gen1_dep32_bs2_eplb0_mtp3-Default.yaml` around lines 38 - 39, The worker_env_var entry needs MIMALLOC_PURGE_DELAY=0 added so mimalloc immediately returns freed memory to the OS to reduce memory pressure during weight loading; update the worker_env_var value (the env string under worker_env_var) to include MIMALLOC_PURGE_DELAY=0 alongside the existing vars (e.g., TLLM_LOG_LEVEL, TRTLLM_* vars) so that processes use the new alloc behavior and avoid OOMs during model weight load.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx2_pp8_gen1_dep32_bs2_eplb0_mtp0-Default.yaml (1)
38-39: LGTM! Good fix for the OOM issue. Adding `MIMALLOC_PURGE_DELAY=0` will cause mimalloc to immediately return freed memory to the OS, which should help reduce memory pressure and fragmentation during weight loading in the disaggregated server. This is an appropriate configuration change to address the OOM kills mentioned in the PR objective.
The combination with the existing GC disable flags (`TRTLLM_SERVER_DISABLE_GC=1`, `TRTLLM_WORKER_DISABLE_GC=1`) creates a memory management strategy where Python GC is disabled but the allocator aggressively returns memory to the OS.
Optional: Consider improving readability.
The multi-line format (lines 38-39) is valid YAML but could be more readable. Consider using a YAML block scalar or adding an inline comment explaining the OOM fix.
📝 Optional formatting improvement
Using a folded block scalar for better readability:
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-   TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+ worker_env_var: >-
+   TLLM_LOG_LEVEL=INFO
+   TRTLLM_SERVER_DISABLE_GC=1
+   TRTLLM_WORKER_DISABLE_GC=1
+   MIMALLOC_PURGE_DELAY=0
+   TRTLLM_ENABLE_PDL=1
+   ENROOT_ALLOW_DEV=yes
Or adding an inline comment:
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
+ # MIMALLOC_PURGE_DELAY=0 added to fix OOM during weight loading (bug 5814350)
+ worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx2_pp8_gen1_dep32_bs2_eplb0_mtp0-Default.yaml` around lines 38 - 39, The worker_env_var value is spread across two YAML lines which is valid but less readable; update the worker_env_var entry so the environment variables are clearer by using a folded block scalar or by putting them on one line and/or adding an inline comment explaining MIMALLOC_PURGE_DELAY=0 (and why TRTLLM_SERVER_DISABLE_GC/TRTLLM_WORKER_DISABLE_GC remain set), locating the worker_env_var key in the YAML and replacing the multi-line value with the folded-scalar version or a single-line value plus comment to improve readability.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen1_dep8_bs4_eplb0_mtp2-Default.yaml (1)
38-38: Validate that `MIMALLOC_PURGE_DELAY=0` resolves the OOM issue and assess the performance impact.
Setting `MIMALLOC_PURGE_DELAY=0` enables immediate memory purging, which should help prevent OOM kills by reclaiming unused memory faster. However, this may introduce performance overhead due to more frequent purging operations.
Please ensure:
- Testing confirms this change resolves the OOM issue during weight loading in disaggregated servers
- Performance benchmarks show acceptable overhead from immediate purging
- Consider whether a non-zero delay (e.g., 100 ms) might provide a better balance between memory reclamation and performance
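A minimal sketch of the before/after measurement suggested above, assuming a Linux host and a hypothetical stand-in workload (`WORKLOAD` below only allocates and frees a few buffers; a real check would run the actual weight-loading step). It launches the same child twice, with and without `MIMALLOC_PURGE_DELAY=0`, and reads each child's peak resident set size through Python's `resource` module. The variable only has an effect if the child actually allocates through mimalloc (for example, when mimalloc is linked in or preloaded); for a plain CPython child the two numbers should be roughly equal.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the weight-loading step: allocate and free a few
# large buffers, then print this process's own peak RSS (KiB on Linux).
WORKLOAD = (
    "import resource\n"
    "for _ in range(4):\n"
    "    buf = bytearray(64 * 1024 * 1024)\n"
    "    del buf\n"
    "print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)\n"
)


def peak_rss_kib(extra_env: dict[str, str]) -> int:
    """Run WORKLOAD in a fresh interpreter with extra_env and return its peak RSS."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", WORKLOAD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    baseline = peak_rss_kib({})
    purged = peak_rss_kib({"MIMALLOC_PURGE_DELAY": "0"})
    print(f"peak RSS without purge override: {baseline} KiB")
    print(f"peak RSS with MIMALLOC_PURGE_DELAY=0: {purged} KiB")
```

Latency and throughput would still need the existing benchmark harness; this only covers the memory side of the comparison.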
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen1_dep8_bs4_eplb0_mtp2-Default.yaml` at line 38, Update test validation for the worker_env_var line to verify that setting MIMALLOC_PURGE_DELAY=0 prevents OOM during weight loading: run the disaggregated server weight-load scenario with this config, capture memory usage and OOM events, and confirm no OOM kills occur; additionally run performance benchmarks to measure overhead (latency and throughput) introduced by immediate purging and compare against a baseline, and if overhead is significant, test a non-zero delay (e.g., MIMALLOC_PURGE_DELAY=100) to find a better trade-off before finalizing the YAML change.
tests/scripts/perf/disaggregated/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_con8_ccb-UCX.yaml (2)
39-40: Consider improving YAML readability. The multi-line string using implicit line folding works but could be clearer. Consider using an explicit YAML block scalar indicator or quotes.
📝 Alternative YAML formatting options
Option 1: Folded scalar with `>-`
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-   TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+ worker_env_var: >-
+   TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
+   MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
Option 2: Quoted string
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-   TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+ worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf/disaggregated/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_con8_ccb-UCX.yaml` around lines 39 - 40, The worker_env_var multiline environment string is hard to read in its current implicit form; replace it with an explicit YAML multiline scalar (e.g., use folded ">" or literal "|" or a quoted single-line string) so each env var is clearer and preserves spacing—update the value for the key worker_env_var to use an explicit multiline scalar or a quoted string containing "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes" to improve readability while keeping the same contents.
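How the launcher consumes `worker_env_var` is not shown in this PR; the sketch below assumes it splits the value on whitespace into `KEY=VALUE` pairs (both helper names are hypothetical). Under that assumption it shows why the implicit two-line form and the quoted one-line form are equivalent: YAML folds a multi-line plain scalar into one space-separated string before any consumer sees it. `fold_plain_scalar` is a simplified model of that folding, not a YAML parser.

```python
import shlex


def fold_plain_scalar(lines: list[str]) -> str:
    """Simplified YAML plain-scalar folding: strip each line, join with spaces.

    Real YAML turns blank lines into newlines; this sketch ignores that case.
    """
    return " ".join(line.strip() for line in lines if line.strip())


def parse_env_string(value: str) -> dict[str, str]:
    """Split a 'K1=V1 K2=V2 ...' string into an ordered dict (hypothetical helper)."""
    env: dict[str, str] = {}
    for token in shlex.split(value):
        key, sep, val = token.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE token: {token!r}")
        env[key] = val
    return env


two_line_form = [
    "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0",
    "    TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes",
]
quoted_form = (
    "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 "
    "MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
)

folded = fold_plain_scalar(two_line_form)
assert folded == quoted_form
print(parse_env_string(folded)["MIMALLOC_PURGE_DELAY"])  # prints 0
```

The choice between the two styles is therefore about readability and edit safety, not behavior.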
39-39: Be aware of performance trade-offs.
`MIMALLOC_PURGE_DELAY=0` forces immediate purging of unused memory back to the OS, which reduces the memory footprint but may increase allocation overhead and degrade performance. Monitor allocation-heavy workloads to ensure the performance impact is acceptable given the OOM fix.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf/disaggregated/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_con8_ccb-UCX.yaml` at line 39, The env var MIMALLOC_PURGE_DELAY=0 in the worker_env_var forces immediate purging which can hurt allocation-heavy performance; update the YAML to either set a non-zero purge delay (e.g., a small positive value) or make the setting configurable (via an env var or a runtime flag) and document it alongside worker_env_var so OOM mitigation is enabled only when acceptable; reference the literal MIMALLOC_PURGE_DELAY=0 and worker_env_var to locate and change the value and add a short comment explaining the trade-off for future reviewers.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx8_pp4_gen1_dep32_bs4_eplb0_mtp3-Default.yaml (1)
38-39: Consider using explicit YAML multi-line syntax for clarity.
While the current implicit line folding is valid YAML, using an explicit folded scalar would make the multi-line intent clearer and more maintainable.
📝 Optional refactor for improved readability
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-   TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+ worker_env_var: >-
+   TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
+   MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
The `>-` indicator creates a folded scalar (spaces replace newlines) without a trailing newline.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx8_pp4_gen1_dep32_bs4_eplb0_mtp3-Default.yaml` around lines 38 - 39, The worker_env_var value currently uses implicit YAML line folding which is valid but unclear; update the YAML to use an explicit folded scalar for that key (worker_env_var) such as the >- folded indicator and place each env var on its own line under worker_env_var so the multi-line intent is explicit and easier to maintain (preserve the same env variable strings and spacing while switching to the folded scalar form).
tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml (1)
41-41: Consider adding a comment to document why this setting is needed.
Future maintainers won't understand why `MIMALLOC_PURGE_DELAY=0` was added to this configuration without context. A brief inline comment would improve maintainability.
📝 Suggested documentation improvement
 work_dir: <full_path_to_work_dir>
+# MIMALLOC_PURGE_DELAY=0: Immediately return freed memory to OS to prevent OOM during weight loading (see nvbugs/5814350)
 worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes MIMALLOC_PURGE_DELAY=0"
 server_env_var: "TRTLLM_SERVER_DISABLE_GC=1"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml` at line 41, The environment variable MIMALLOC_PURGE_DELAY=0 in the worker_env_var line is unclear to future readers; add a brief inline comment next to worker_env_var (mentioning MIMALLOC_PURGE_DELAY=0) that states why the immediate mimalloc purge is required for these tests (e.g., to prevent memory growth or GC latency during long-running/integration perf runs) and, if applicable, reference any linked bug/issue or profiling data that motivated the setting.
tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep48_bs16_eplb288_mtp3_ccb-DEFAULT.yaml (1)
42-42: LGTM! The `MIMALLOC_PURGE_DELAY=0` addition is correct. Note that this file appends the variable at the end while other configs in this PR insert it in the middle; this is functionally equivalent but leaves the ordering inconsistent across files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep48_bs16_eplb288_mtp3_ccb-DEFAULT.yaml` at line 42, The environment variable string assigned to worker_env_var places MIMALLOC_PURGE_DELAY=0 at the end causing inconsistent ordering across config files; modify the worker_env_var value so MIMALLOC_PURGE_DELAY=0 is inserted in the same spot used by the other PR configs (keep the same ordering around TRTLLM_ENABLE_PDL and ENROOT_ALLOW_DEV), e.g., move MIMALLOC_PURGE_DELAY=0 to be adjacent to the other TRTLLM_* entries rather than appended at the very end so the ordering of variables in worker_env_var matches the rest of the PR.
tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_ccb-NIXL.yaml (1)
41-41: LGTM! The `MIMALLOC_PURGE_DELAY=0` addition is correct.
Optional: Across this PR, there's formatting inconsistency in `worker_env_var` values: some files use quoted single-line strings while others use unquoted multi-line YAML scalars. Both are valid, but for maintainability, consider standardizing on one format (preferably quoted single-line for readability).
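If the team standardizes on the quoted single-line style, the rewrite can be scripted rather than done by hand. A minimal sketch with a hypothetical function name, assuming the value contains no quotes or `#` comments and that continuation lines are simply indented deeper than the key; it works on raw text so the rest of each file is left byte-for-byte unchanged.

```python
import re

# Matches "worker_env_var: <value>" and captures the key's indentation.
KEY_RE = re.compile(r"^(?P<indent>\s*)worker_env_var:\s*(?P<value>.*)$")


def quote_worker_env_var(yaml_text: str) -> str:
    """Collapse an unquoted, possibly multi-line worker_env_var into one quoted line.

    Assumes the value holds no quotes or '#' comments; quoted values and block
    scalars ('>' or '|') are left untouched.
    """
    lines = yaml_text.splitlines()
    out: list[str] = []
    i = 0
    while i < len(lines):
        match = KEY_RE.match(lines[i])
        if not match or match["value"].startswith(('"', "'", ">", "|")):
            out.append(lines[i])
            i += 1
            continue
        indent = match["indent"]
        parts = [match["value"].strip()]
        i += 1
        # Plain-scalar continuation lines are indented deeper than the key.
        while i < len(lines) and lines[i].strip() and (
            len(lines[i]) - len(lines[i].lstrip()) > len(indent)
        ):
            parts.append(lines[i].strip())
            i += 1
        joined = " ".join(part for part in parts if part)
        out.append(f'{indent}worker_env_var: "{joined}"')
    return "\n".join(out) + "\n"
```

Running it twice is a no-op, because the second pass sees an already-quoted value and skips it.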
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_ccb-NIXL.yaml` at line 41, This PR has inconsistent formatting for the YAML key worker_env_var across files; standardize all occurrences (key name worker_env_var) to a quoted single-line string format (e.g., "KEY=VAL KEY2=VAL2 ...") for readability and maintainability by updating each config that currently uses unquoted multi-line YAML scalars to the quoted single-line style, ensuring no values or whitespace are altered other than quoting and collapsing into one line.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen8_tep4_bs4_eplb0_mtp0-Default.yaml (1)
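38-39: Consider centralizing `worker_env_var` defaults.
Lines 38 and 39 now carry another copy of a long shared env string. A YAML anchor or base include for common worker env vars would reduce drift and prevent missed updates in future bulk edits. Note that YAML anchors and aliases only resolve within the same YAML document, so sharing defaults across these separate config files would need a base-include mechanism in the config loader.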
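Because YAML anchors cannot reach across files, one alternative (a swapped-in technique, not something this PR does) is to keep the shared defaults in a single place and render each config's `worker_env_var` from it. A minimal sketch; `BASE_WORKER_ENV` and `render_worker_env` are hypothetical names:

```python
from typing import Dict, Optional

# Hypothetical single source of truth for the shared worker environment.
BASE_WORKER_ENV: Dict[str, str] = {
    "TLLM_LOG_LEVEL": "INFO",
    "TRTLLM_SERVER_DISABLE_GC": "1",
    "TRTLLM_WORKER_DISABLE_GC": "1",
    "MIMALLOC_PURGE_DELAY": "0",  # mitigates OOM during weight loading (nvbugs/5814350)
    "TRTLLM_ENABLE_PDL": "1",
    "ENROOT_ALLOW_DEV": "yes",
}


def render_worker_env(overrides: Optional[Dict[str, str]] = None) -> str:
    """Merge per-config overrides onto the shared defaults and render 'K=V ...'."""
    env = dict(BASE_WORKER_ENV)  # copy so the shared defaults are never mutated
    env.update(overrides or {})  # overriding an existing key keeps its position
    return " ".join(f"{key}={value}" for key, value in env.items())
```

A future bulk change such as this PR would then touch one dictionary instead of every YAML file, at the cost of adding a generation or loading step.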
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen8_tep4_bs4_eplb0_mtp0-Default.yaml` around lines 38 - 39, Centralize the repeated env string by defining a single YAML anchor (e.g. &default_worker_env) that contains the long env vars and then reference it where needed using the anchor (e.g. worker_env_var: *default_worker_env) instead of duplicating the literal; update the block that currently sets worker_env_var to reference that anchor (symbol: worker_env_var) and remove the duplicate string so future updates only change the anchor declaration.
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml (1)
39-39: LGTM, with a minor consistency note.
The change is functionally correct. Note that `MIMALLOC_PURGE_DELAY=0` is placed at the end of the string here, whereas in some other config files in this PR it is placed after `TRTLLM_WORKER_DISABLE_GC=1`. This doesn't affect functionality (environment variable order is irrelevant), but a consistent ordering across all configs would improve maintainability.
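Since the order only matters for readability and diffing, the reordering can be done mechanically. A minimal sketch of a hypothetical helper that places a token directly after an anchor token, moving it if it already appears elsewhere and appending it if the anchor is missing:

```python
def place_after(value: str, anchor: str, token: str) -> str:
    """Return value with token moved (or added) directly after anchor."""
    tokens = [t for t in value.split() if t != token]  # drop any existing copy
    if anchor in tokens:
        tokens.insert(tokens.index(anchor) + 1, token)
    else:
        tokens.append(token)  # no anchor: fall back to appending at the end
    return " ".join(tokens)


nixl_value = (
    "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 "
    "TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes MIMALLOC_PURGE_DELAY=0"
)
print(place_after(nixl_value, "TRTLLM_WORKER_DISABLE_GC=1", "MIMALLOC_PURGE_DELAY=0"))
# TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
```

The helper is idempotent, so it can be run over configs that are already in the target order.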
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml` at line 39, Summary: Environment variable ordering in worker_env_var is inconsistent across configs; standardize ordering to match other files. Update the worker_env_var string for the config that contains "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes MIMALLOC_PURGE_DELAY=0" so that MIMALLOC_PURGE_DELAY appears immediately after TRTLLM_WORKER_DISABLE_GC=1 (i.e., reorder to "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") to match the ordering used in other config files; edit the worker_env_var value accordingly in the YAML where the key worker_env_var is defined.
tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_con4_ccb-UCX.yaml (1)
39-40: Multiline plain-scalar style parses correctly; consider explicit quoting for future maintainability.
Lines 39–40 use multiline plain-scalar folding, which works as intended. The YAML parses correctly and all environment variables are present. However, quoting the full value as a single-line string would improve readability and make future edits less prone to accidental whitespace changes:
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-   TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+ worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_con4_ccb-UCX.yaml` around lines 39 - 40, The worker_env_var value uses a multiline plain-scalar folding; change it to an explicit quoted single-line string to avoid accidental whitespace changes in the future by replacing the folded value for the worker_env_var key with a single quoted string (e.g., "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") so all environment variables remain present but are preserved as one stable scalar.
tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_con16_ccb-NIXL.yaml (1)
39-40: Consider consolidating to a single-line quoted string for consistency.
For improved maintainability and consistency across configuration files, consider using a single-line quoted string for `worker_env_var`.
♻️ Suggested consolidation
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-   TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+ worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_con16_ccb-NIXL.yaml` around lines 39 - 40, The worker_env_var value is split across lines causing inconsistency; consolidate it into a single-line quoted string by joining the two lines into one value and wrapping the entire environment-variable string in quotes (update the worker_env_var entry), e.g., set worker_env_var to a single quoted string containing "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes" so parsers and other config files remain consistent.
tests/scripts/perf/disaggregated/wideep_deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_con1024_ccb-NIXL.yaml (1)
39-40: Consider consolidating to a single-line quoted string for consistency.
The `worker_env_var` value spans multiple lines using an unquoted continuation. While valid YAML, this format is less clear than the single-line quoted approach used in other files (e.g., `Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep32_bs16_eplb0_mtp3_ccb-UCX.yaml`).
♻️ Suggested consolidation
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-   TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+ worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf/disaggregated/wideep_deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_con1024_ccb-NIXL.yaml` around lines 39 - 40, The YAML key worker_env_var currently uses a multi-line unquoted continuation which is inconsistent with other configs; change it to a single-line quoted string containing all environment variables (e.g., "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") so the value is consolidated and consistent with other files; update the value where worker_env_var is defined to use the quoted single-line form.
tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con36_ccb-UCX.yaml (1)
39-40: Consider consolidating to a single-line quoted string for consistency.
The `worker_env_var` value spans multiple lines using an unquoted continuation. For consistency with other configuration files and improved clarity, consider using a single-line quoted string format.
♻️ Suggested consolidation
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-   TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+ worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con36_ccb-UCX.yaml` around lines 39 - 40, The worker_env_var value is split across multiple lines; consolidate it into a single-line quoted string for consistency by replacing the multi-line unquoted value for the worker_env_var key with one quoted line containing all environment assignments (e.g., "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 ... ENROOT_ALLOW_DEV=yes"), ensuring the key name worker_env_var remains unchanged and spacing between assignments is preserved.
tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_con1024_ccb-UCX.yaml (1)
39-40: Consider consolidating to a single-line quoted string for consistency.
The `worker_env_var` value uses a multi-line unquoted format. For consistency with other files in this PR and improved clarity, consider consolidating to a single-line quoted string.
♻️ Suggested consolidation
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-   TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+ worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_con1024_ccb-UCX.yaml` around lines 39 - 40, The worker_env_var value is split across lines; change it to a single-line quoted YAML string to match project convention: replace the multi-line unquoted block for worker_env_var with one quoted string containing all environment assignments separated by spaces (e.g. "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"), ensuring you update the key name worker_env_var and preserve all exact variable names and values.
tests/scripts/perf/disaggregated/wideep_deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_con1024_ccb-UCX.yaml (1)
39-40: Consider consolidating to a single-line quoted string for consistency.
The multi-line unquoted format for `worker_env_var` is less maintainable than the single-line quoted approach used in other configuration files.
♻️ Suggested consolidation
- worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
-   TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
+ worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf/disaggregated/wideep_deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_con1024_ccb-UCX.yaml` around lines 39 - 40, The YAML key worker_env_var currently uses a multi-line unquoted format; change it to a single-line quoted string containing all environment variables (e.g., "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") to match other configs and ensure consistent parsing and maintainability.
tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con1_ccb-NIXL.yaml (1)
39-40: Add documentation for the `MIMALLOC_PURGE_DELAY=0` setting to explain the performance trade-off.
The `MIMALLOC_PURGE_DELAY=0` setting is valid and correctly forces immediate memory purging, which reduces memory usage during weight loading. However, this increases the frequency of memory release syscalls, which will impact benchmark performance.
Consider adding a YAML comment above line 39 to document why this non-standard configuration is necessary:
# Set MIMALLOC_PURGE_DELAY=0 to prevent OOM during weight loading (nvbugs/5814350)
# Note: This reduces memory footprint but increases syscall overhead
worker_env_var: TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
  TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con1_ccb-NIXL.yaml` around lines 39 - 40, Add an explanatory YAML comment above the worker_env_var entry that sets MIMALLOC_PURGE_DELAY=0: reference the MIMALLOC_PURGE_DELAY variable and note it is set to 0 to force immediate memory purging to avoid OOM during weight loading (nvbugs/5814350), and also warn that this reduces memory footprint but increases syscall overhead and can degrade benchmark performance; update the comment near the worker_env_var line so future readers see why this non-standard setting is required.
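To keep that rationale from being lost in future bulk edits, a CI check could require a YAML comment directly above any `worker_env_var` line that sets `MIMALLOC_PURGE_DELAY`. A minimal sketch of such a hypothetical lint, working on raw text rather than parsed YAML; it only inspects the key's own line, so a variable placed on a continuation line would not be detected.

```python
def missing_rationale(yaml_text: str) -> list[int]:
    """Return 1-based line numbers of worker_env_var lines that set
    MIMALLOC_PURGE_DELAY without a '#' comment on the line immediately above."""
    lines = yaml_text.splitlines()
    offenders = []
    for idx, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith("worker_env_var:") and "MIMALLOC_PURGE_DELAY" in line:
            previous = lines[idx - 1].lstrip() if idx > 0 else ""
            if not previous.startswith("#"):
                offenders.append(idx + 1)
    return offenders
```

A CI step could run this over every config and fail when the returned list is non-empty.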
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/auto_deploy/nano_v3.yaml`:
- Line 1: The file examples/auto_deploy/nano_v3.yaml currently contains a broken
symlink target "model_registry/configs/nano_v3.yaml"; recreate
examples/auto_deploy/nano_v3.yaml as a proper symlink pointing to
model_registry/configs/nano_v3.yaml (using the correct relative or absolute path
that resolves inside the repo) without changing the target string content or
newline semantics so the symlink resolves correctly and CI's check-symlinks
passes.
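A minimal sketch of a local check for this class of failure, assuming it approximates what CI's `check-symlinks` step looks for (the actual CI script is not shown here): a link counts as broken if it is a symlink whose target is missing or resolves outside the repository. In git, a symlink is stored as a blob whose content is the target path, so any stray characters in that content end up in the link target. The function name is hypothetical.

```python
import os
from pathlib import Path


def broken_symlinks(repo_root: Path) -> list[Path]:
    """List symlinks under repo_root whose target is missing or outside the repo."""
    root = repo_root.resolve()
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip git internals
        # os.walk does not follow links, so symlinked dirs show up in dirnames.
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                continue
            target = (path.parent / os.readlink(path)).resolve()
            if not target.exists() or root not in target.parents:
                broken.append(path.relative_to(root))
    return sorted(broken)
```

For `examples/auto_deploy/nano_v3.yaml`, a correct relative target would need to climb out of `examples/auto_deploy/` before reaching `model_registry/configs/nano_v3.yaml`; the exact number of `..` segments depends on where `model_registry` lives in the repository.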
In `@examples/auto_deploy/super_v3.yaml`:
- Line 1: The first line of examples/auto_deploy/super_v3.yaml is currently
corrupted and must be a valid symlink target entry pointing to
model_registry/configs/super_v3.yaml; open examples/auto_deploy/super_v3.yaml,
replace the corrupted/empty first line so it contains the exact symlink target
(no extra newlines or trailing whitespace), save as a single-line symlink
target, re-add/commit the corrected symlink, and re-run the check-symlinks CI
check to verify the fix.
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp4_gen13_tep4_bs1_eplb0_mtp0-Default.yaml`:
- Around line 38-39: The YAML key worker_env_var currently splits its
environment string across two lines without proper YAML quoting or block scalar;
fix by making the entire env string a single scalar: either place all variables
on one line quoted (e.g., "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 ...")
or use a folded block scalar (>-), which joins the lines with spaces, rather than a
literal block scalar (|), which keeps the newlines; update the
worker_env_var entry so the full environment string is parsed correctly by tools
reading this value.
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yaml`:
- Line 39: The config sets MIMALLOC_PURGE_DELAY=0 in worker_env_var which forces
immediate purges and can hurt throughput under high concurrency; change this to
a non-zero safe default (or remove it) and validate by running the existing
high-concurrency benchmark (concurrency_list: '2048') to compare worker
throughput and latency with and without the setting, then either restore the
default or pick a tuned value and add a brief inline comment near worker_env_var
documenting the chosen value and the benchmark results; reference the
worker_env_var key and MIMALLOC_PURGE_DELAY to locate the change.
In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_ccb-UCX.yaml`:
- Line 40: Update the worker environment variable string (worker_env_var) to
include MIMALLOC_PURGE_DELAY=0 for the worker processes to mitigate OOM during
weight loading. Apply this exact change consistently across the remaining
affected config files (the 235 files noted), then run the relevant
benchmark/performance tests for these configs to measure allocation overhead
and latency impact. If you observe regressions, fall back to a less aggressive
(larger) delay or make MIMALLOC_PURGE_DELAY configurable (e.g., set only during
weight loading) to balance OOM safety against performance.
In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml`:
- Line 41: Add a plain-text comment immediately above or inline next to the
worker_env_var setting that explains why MIMALLOC_PURGE_DELAY=0 is set (e.g., to
force immediate purge to avoid OOM during disaggregated weight loading), notes
that it reduces memory footprint at the potential cost of performance and may
skew memory-related benchmark metrics, and instructs maintainers to treat
results as non-directly comparable to historical baselines or to track them
separately; reference the worker_env_var and the MIMALLOC_PURGE_DELAY=0 token in
the comment so future readers know this setting’s intent and impact.
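When the same comment has to go into many files, it can be added mechanically at the key's own indentation. The helper below (`annotate_worker_env_var`, hypothetical) works line by line rather than through a YAML library, so the rest of each file is left unchanged. It skips insertion when the identical comment already sits directly above the key, which makes repeated runs safe.

```python
def annotate_worker_env_var(yaml_text: str, comment: str) -> str:
    """Insert `# comment` above each worker_env_var key, matching its indentation."""
    out = []
    for line in yaml_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("worker_env_var:"):
            indent = line[: len(line) - len(stripped)]
            note = f"{indent}# {comment}\n"
            # Do not add a second copy if the comment is already there.
            if not out or out[-1] != note:
                out.append(note)
        out.append(line)
    return "".join(out)
```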
In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_128k8k_ctx3_pp8_gen1_dep32_bs4_eplb0_mtp0_con4-Default.yaml`:
- Around lines 38-39: The worker_env_var YAML scalar is split across lines
incorrectly. Merge the two lines into a single valid scalar, either by placing
the entire value on one quoted line (e.g., a single-quoted string) or by using a
folded block scalar (>-), so that all env vars (TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes) are preserved as one space-separated
string. Avoid a literal block scalar (|), which keeps the line break inside the
value. Update worker_env_var accordingly so the parser reads the full string.
In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_con4_ccb-NIXL.yaml`:
- Around lines 39-40: The YAML has a broken multi-line value for the key
worker_env_var (the value is split across two lines: "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0"
and the continuation "TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"); fix it by
either merging the entire env string onto a single line for worker_env_var or
wrap the entire multi-item value in quotes so YAML treats it as one scalar, and
apply the same quoted or single-line pattern used elsewhere in the PR to the
other affected config files.
In
`@tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con4_ccb-UCX.yaml`:
- Around lines 39-40: The YAML key worker_env_var has its value split across two
lines without explicit multi-line syntax. Make the entire env string a single
scalar, either by quoting the whole value on one line:
worker_env_var: "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1
TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1
ENROOT_ALLOW_DEV=yes", or by using a folded block scalar (>-) with the
continuation indented, so the full set of environment variables is preserved.
Update the worker_env_var entry and the broken second line
(TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes) accordingly.
In
`@tests/scripts/perf/disaggregated/wideep_accuracy-deepseek-r1-fp4_gpqa_diamond_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml`:
- Around lines 45-46: The YAML key worker_env_var is broken across two lines,
which can leave env settings invalid or missing. Make the value a single scalar,
either by quoting the entire env string on one line or by using a folded
multi-line scalar (>-), so the full "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes" value is preserved for worker_env_var.
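Several of the comments above ask for the same mechanical fix: fold a two-line plain value into one quoted line. A hypothetical sketch of that rewrite (`collapse_worker_env_var` is an illustrative name). It assumes the value starts unquoted on the key's line, continuation lines are indented more deeply than the key, and the value contains no blank lines or inline comments.

```python
import re

# Matches an unquoted value that starts on the key line; quoted values and
# block scalars (| or >) are deliberately excluded.
_KEY_LINE = re.compile(r"^(?P<indent> *)worker_env_var: +(?P<first>[^\"'#|>\s].*?)\s*$")


def collapse_worker_env_var(yaml_text: str) -> str:
    """Rewrite a multi-line plain worker_env_var value as one double-quoted line."""
    lines = yaml_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        match = _KEY_LINE.match(lines[i])
        if match is None:
            out.append(lines[i])
            i += 1
            continue
        indent = match.group("indent")
        parts = [match.group("first")]
        i += 1
        # A plain scalar continues on following lines indented deeper than its key.
        while i < len(lines):
            nxt = lines[i]
            depth = len(nxt) - len(nxt.lstrip(" "))
            if not nxt.strip() or nxt.lstrip().startswith("#") or depth <= len(indent):
                break
            parts.append(nxt.strip())
            i += 1
        # Escape characters that are special inside a double-quoted YAML scalar.
        value = " ".join(parts).replace("\\", "\\\\").replace('"', '\\"')
        out.append(f'{indent}worker_env_var: "{value}"')
    trailing = "\n" if yaml_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```

Values that are already quoted, and block scalars, do not match the pattern and are left untouched.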
---
Nitpick comments:
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen1_dep8_bs4_eplb0_mtp2-Default.yaml`:
- Line 38: Update test validation for the worker_env_var line to verify that
setting MIMALLOC_PURGE_DELAY=0 prevents OOM during weight loading: run the
disaggregated server weight-load scenario with this config, capture memory usage
and OOM events, and confirm no OOM kills occur; additionally run performance
benchmarks to measure overhead (latency and throughput) introduced by immediate
purging and compare against a baseline, and if overhead is significant, test a
non-zero delay (e.g., MIMALLOC_PURGE_DELAY=100) to find a better trade-off
before finalizing the YAML change.
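The overhead comparison reduces to two relative changes against the baseline run. A minimal sketch with hypothetical function names; the 2% tolerance is a placeholder rather than a project threshold, and the inputs are whatever throughput and latency figures the benchmark reports.

```python
def relative_change_pct(baseline: float, candidate: float) -> float:
    """Percent change of candidate relative to baseline (positive = larger)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


def purge_overhead_acceptable(base_tput: float, cand_tput: float,
                              base_latency: float, cand_latency: float,
                              tolerance_pct: float = 2.0) -> bool:
    """True if throughput falls and latency rises by no more than tolerance_pct.

    tolerance_pct=2.0 is a hypothetical placeholder, not a project threshold.
    """
    throughput_drop = -relative_change_pct(base_tput, cand_tput)
    latency_rise = relative_change_pct(base_latency, cand_latency)
    return throughput_drop <= tolerance_pct and latency_rise <= tolerance_pct
```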
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp8_gen8_tep4_bs4_eplb0_mtp0-Default.yaml`:
- Around lines 38-39: Centralize the repeated env string by defining a single
YAML anchor (e.g. &default_worker_env) that contains the long env vars and
referencing it where needed with an alias (e.g. worker_env_var:
*default_worker_env) instead of duplicating the literal. Update the block that
currently sets worker_env_var to reference that anchor and remove the duplicate
string so future updates only change the anchor declaration. Note that aliases
resolve only within the same YAML document, so this removes duplication inside
one file but not across separate config files.
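For sharing one definition across separate config files, one alternative (a different technique from the anchor suggestion) is to generate the value from a single source. A hypothetical sketch; `DEFAULT_WORKER_ENV` and `render_worker_env_var` are illustrative names, and the variable order follows the one the other configs in this PR use.

```python
from typing import Optional

# Hypothetical single definition of the worker environment.
DEFAULT_WORKER_ENV = {
    "TLLM_LOG_LEVEL": "INFO",
    "TRTLLM_SERVER_DISABLE_GC": "1",
    "TRTLLM_WORKER_DISABLE_GC": "1",
    "MIMALLOC_PURGE_DELAY": "0",
    "TRTLLM_ENABLE_PDL": "1",
    "ENROOT_ALLOW_DEV": "yes",
}


def render_worker_env_var(overrides: Optional[dict] = None) -> str:
    """Return a quoted YAML line for worker_env_var, applying per-config overrides."""
    env = dict(DEFAULT_WORKER_ENV)
    env.update(overrides or {})  # existing keys keep their position; new ones append
    value = " ".join(f"{name}={val}" for name, val in env.items())
    return f'worker_env_var: "{value}"'
```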
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx2_pp8_gen1_dep32_bs2_eplb0_mtp0-Default.yaml`:
- Around lines 38-39: The worker_env_var value is spread across two YAML lines
which is valid but less readable; update the worker_env_var entry so the
environment variables are clearer by using a folded block scalar or by putting
them on one line and/or adding an inline comment explaining
MIMALLOC_PURGE_DELAY=0 (and why
TRTLLM_SERVER_DISABLE_GC/TRTLLM_WORKER_DISABLE_GC remain set), locating the
worker_env_var key in the YAML and replacing the multi-line value with the
folded-scalar version or a single-line value plus comment to improve
readability.
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx5_pp4_gen1_dep32_bs2_eplb0_mtp3-Default.yaml`:
- Around lines 38-39: The worker_env_var entry needs MIMALLOC_PURGE_DELAY=0 added
so mimalloc immediately returns freed memory to the OS to reduce memory pressure
during weight loading; update the worker_env_var value (the env string under
worker_env_var) to include MIMALLOC_PURGE_DELAY=0 alongside the existing vars
(e.g., TLLM_LOG_LEVEL, TRTLLM_* vars) so that processes use the new alloc
behavior and avoid OOMs during model weight load.
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx8_pp4_gen1_dep32_bs4_eplb0_mtp3-Default.yaml`:
- Around lines 38-39: The worker_env_var value currently uses implicit YAML line
folding which is valid but unclear; update the YAML to use an explicit folded
scalar for that key (worker_env_var) such as the >- folded indicator and place
each env var on its own line under worker_env_var so the multi-line intent is
explicit and easier to maintain (preserve the same env variable strings and
spacing while switching to the folded scalar form).
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml`:
- Line 39: Summary: Environment variable ordering in worker_env_var is
inconsistent across configs; standardize ordering to match other files. Update
the worker_env_var string for the config that contains "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 TRTLLM_ENABLE_PDL=1
ENROOT_ALLOW_DEV=yes MIMALLOC_PURGE_DELAY=0" so that MIMALLOC_PURGE_DELAY
appears immediately after TRTLLM_WORKER_DISABLE_GC=1 (i.e., reorder to
"TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") to match the
ordering used in other config files; edit the worker_env_var value accordingly
in the YAML where the key worker_env_var is defined.
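Reordering can be done without editing each string by hand. A hypothetical sketch (`reorder_env` and `CANONICAL_ORDER` are illustrative names) that sorts tokens by their position in the canonical list. Python's `sorted` is stable, so variables missing from the list keep their relative order at the end.

```python
CANONICAL_ORDER = [
    "TLLM_LOG_LEVEL",
    "TRTLLM_SERVER_DISABLE_GC",
    "TRTLLM_WORKER_DISABLE_GC",
    "MIMALLOC_PURGE_DELAY",
    "TRTLLM_ENABLE_PDL",
    "ENROOT_ALLOW_DEV",
]


def reorder_env(value: str, order: list[str] = CANONICAL_ORDER) -> str:
    """Reorder space-separated NAME=VALUE tokens to follow `order`."""
    rank = {name: i for i, name in enumerate(order)}
    # Unknown names share the last rank, so stability keeps their original order.
    return " ".join(sorted(value.split(), key=lambda t: rank.get(t.partition("=")[0], len(order))))
```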
In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-r1-fp4_8k1k_ctx2_gen1_dep32_bs128_eplb288_mtp3_ccb-DEFAULT.yaml`:
- Line 40: Add a short YAML comment above the worker_env_var setting explaining
why MIMALLOC_PURGE_DELAY=0 is set (to immediately return freed memory to the OS
to mitigate OOM during weight loading when TRTLLM_WORKER_DISABLE_GC=1), so
future maintainers understand the tradeoff and rationale; update the entry that
contains worker_env_var and mention MIMALLOC_PURGE_DELAY,
TRTLLM_WORKER_DISABLE_GC, and the OOM/weight-loading context in one concise
comment.
In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml`:
- Line 41: The environment variable MIMALLOC_PURGE_DELAY=0 in the worker_env_var
line is unclear to future readers; add a brief inline comment next to
worker_env_var (mentioning MIMALLOC_PURGE_DELAY=0) that states why the immediate
mimalloc purge is required for these tests (e.g., to prevent memory growth or GC
latency during long-running/integration perf runs) and, if applicable, reference
any linked bug/issue or profiling data that motivated the setting.
In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep48_bs16_eplb288_mtp3_ccb-DEFAULT.yaml`:
- Line 42: The environment variable string assigned to worker_env_var places
MIMALLOC_PURGE_DELAY=0 at the end, which is inconsistent with the other config
files. Move MIMALLOC_PURGE_DELAY=0 to the position used by the other PR
configs, immediately after TRTLLM_WORKER_DISABLE_GC=1 and before
TRTLLM_ENABLE_PDL=1 and ENROOT_ALLOW_DEV=yes, so the ordering of variables in
worker_env_var matches the rest of the PR.
In
`@tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-v32-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_ccb-NIXL.yaml`:
- Line 41: This PR has inconsistent formatting for the YAML key worker_env_var
across files; standardize all occurrences (key name worker_env_var) to a quoted
single-line string format (e.g., "KEY=VAL KEY2=VAL2 ...") for readability and
maintainability by updating each config that currently uses unquoted multi-line
YAML scalars to the quoted single-line style, ensuring no values or whitespace
are altered other than quoting and collapsing into one line.
In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_con8_ccb-UCX.yaml`:
- Around lines 39-40: The multi-line worker_env_var environment string is hard
to read in its current implicit form. Replace it with an explicit folded scalar
(>-) or a quoted single-line string so each env var is clearer and the
single-space separation is preserved; a literal scalar (|) would keep the line
break inside the value and change it. Update the value for worker_env_var to one
of these forms containing "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1
TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1
ENROOT_ALLOW_DEV=yes" to improve readability while keeping the same contents.
- Line 39: The env var MIMALLOC_PURGE_DELAY=0 in the worker_env_var forces
immediate purging which can hurt allocation-heavy performance; update the YAML
to either set a non-zero purge delay (e.g., a small positive value) or make the
setting configurable (via an env var or a runtime flag) and document it
alongside worker_env_var so OOM mitigation is enabled only when acceptable;
reference the literal MIMALLOC_PURGE_DELAY=0 and worker_env_var to locate and
change the value and add a short comment explaining the trade-off for future
reviewers.
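One way to make the purge delay opt-in, as suggested above, is a flag on whatever script renders the env string. The sketch below is hypothetical: the `--mimalloc-purge-delay` flag, `BASE_TOKENS` and `render` do not exist in the repository. Per the mimalloc documentation, a delay of 0 purges unused memory immediately; leaving the variable unset keeps mimalloc's default.

```python
import argparse
from typing import Optional

BASE_TOKENS = [
    "TLLM_LOG_LEVEL=INFO",
    "TRTLLM_SERVER_DISABLE_GC=1",
    "TRTLLM_WORKER_DISABLE_GC=1",
    "TRTLLM_ENABLE_PDL=1",
    "ENROOT_ALLOW_DEV=yes",
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Render worker_env_var (hypothetical).")
    parser.add_argument(
        "--mimalloc-purge-delay", type=int, default=None,
        help="Milliseconds before mimalloc purges unused memory; 0 purges "
             "immediately. Omit to leave mimalloc's default in place.",
    )
    return parser


def render(argv: Optional[list] = None) -> str:
    args = build_parser().parse_args(argv)
    tokens = list(BASE_TOKENS)
    if args.mimalloc_purge_delay is not None:
        # Keep the ordering used elsewhere: right after TRTLLM_WORKER_DISABLE_GC.
        tokens.insert(3, f"MIMALLOC_PURGE_DELAY={args.mimalloc_purge_delay}")
    return " ".join(tokens)
```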
In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_con16_ccb-NIXL.yaml`:
- Around lines 39-40: The worker_env_var value is split across lines, causing
inconsistency; consolidate it into a single-line quoted string by joining the
two lines into one value and wrapping the entire environment-variable string in
quotes (update the worker_env_var entry), e.g., set worker_env_var to a single
quoted string containing "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1
TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1
ENROOT_ALLOW_DEV=yes" so parsers and other config files remain consistent.
In
`@tests/scripts/perf/disaggregated/deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_con4_ccb-UCX.yaml`:
- Around lines 39-40: The worker_env_var value relies on multi-line plain-scalar
folding. Change it to an explicit quoted single-line string to avoid accidental
whitespace changes in the future: replace the folded value for the
worker_env_var key with one double-quoted string (e.g., "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") so all environment variables remain
present and are preserved as one stable scalar.
In
`@tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_con1024_ccb-UCX.yaml`:
- Around lines 39-40: The worker_env_var value is split across lines; change it
to a single-line quoted YAML string to match project convention. Replace the
multi-line unquoted value for worker_env_var with one quoted string containing
all environment assignments separated by spaces (e.g. "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0
TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes"), keeping the key name
worker_env_var and preserving all exact variable names and values.
In
`@tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con1_ccb-NIXL.yaml`:
- Around lines 39-40: Add an explanatory YAML comment above the worker_env_var
entry that sets MIMALLOC_PURGE_DELAY=0: reference the MIMALLOC_PURGE_DELAY
variable and note it is set to 0 to force immediate memory purging to avoid OOM
during weight loading (nvbugs/5814350), and also warn that this reduces memory
footprint but increases syscall overhead and can degrade benchmark performance;
update the comment near the worker_env_var line so future readers see why this
non-standard setting is required.
In
`@tests/scripts/perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con36_ccb-UCX.yaml`:
- Around lines 39-40: The worker_env_var value is split across multiple lines;
consolidate it into a single-line quoted string for consistency by replacing the
multi-line unquoted value for the worker_env_var key with one quoted line
containing all environment assignments (e.g., "TLLM_LOG_LEVEL=INFO
TRTLLM_SERVER_DISABLE_GC=1 ... ENROOT_ALLOW_DEV=yes"), ensuring the key name
worker_env_var remains unchanged and spacing between assignments is preserved.
In
`@tests/scripts/perf/disaggregated/wideep_deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_con1024_ccb-NIXL.yaml`:
- Around lines 39-40: The YAML key worker_env_var currently uses a multi-line
unquoted continuation which is inconsistent with other configs; change it to a
single-line quoted string containing all environment variables (e.g.,
"TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1 TRTLLM_WORKER_DISABLE_GC=1
MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1 ENROOT_ALLOW_DEV=yes") so the value
is consolidated and consistent with other files; update the value where
worker_env_var is defined to use the quoted single-line form.
In
`@tests/scripts/perf/disaggregated/wideep_deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_con1024_ccb-UCX.yaml`:
- Around lines 39-40: The YAML key worker_env_var currently uses a multi-line
unquoted format; change it to a single-line quoted string containing all
environment variables (e.g., "TLLM_LOG_LEVEL=INFO TRTLLM_SERVER_DISABLE_GC=1
TRTLLM_WORKER_DISABLE_GC=1 MIMALLOC_PURGE_DELAY=0 TRTLLM_ENABLE_PDL=1
ENROOT_ALLOW_DEV=yes") to match other configs and ensure consistent parsing and
maintainability.
...t_configs/disagg/perf/deepseek-r1-fp4_128k8k_ctx1_pp4_gen13_tep4_bs1_eplb0_mtp0-Default.yaml
Outdated
.../test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yaml
...test_configs/wideep/perf/deepseek-r1-fp4_8k1k_ctx6_gen1_dep16_bs64_eplb288_mtp0_ccb-UCX.yaml
...t_configs/wideep/perf/deepseek-v32-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml
...rf/disaggregated/deepseek-r1-fp4_128k8k_ctx3_pp8_gen1_dep32_bs4_eplb0_mtp0_con4-Default.yaml
Outdated
...ts/perf/disaggregated/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp3_con4_ccb-NIXL.yaml
Outdated
...perf/disaggregated/Qwen3-235B-A22B-FP8_1k1k_ctx1_gen1_tep8_bs32_eplb0_mtp0_con4_ccb-UCX.yaml
Outdated
..._accuracy-deepseek-r1-fp4_gpqa_diamond_1k1k_ctx2_gen1_dep16_bs128_eplb288_mtp3_ccb-NIXL.yaml
Outdated
This reverts commit 3627e26. Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
b7c9197 to e7fc563
/bot skip --comment "Not cover in CI pipelines"
PR_Github #39681 [ skip ] triggered by Bot. Commit:
PR_Github #39681 [ skip ] completed with state
fredricz-20070104
left a comment
This one resolves the OOM bug. Approved.
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.