AIConfigurator - Release 0.7.0

Summary

AIConfigurator 0.7.0 builds on the multi-backend foundation of 0.6.0 with a stronger focus on CLI and API ergonomics, new models and hardware (Nemotron, Mamba2, GB200 NVL72), generator and validator tooling, and operational robustness. This release unifies model input around model_path, adds naive config generation and a generator benchmark mode, introduces a generator validator that compares generated configs with engine APIs, and supports the auto backend (homogeneous), AIPerf benchmark command generation, and Dynamo planner profiler integration. The support matrix now uses huggingface_id and Hugging Face architecture, and the CLI gains a support command with matching APIs plus --system all and --backend all. New collectors and data cover TensorRT-LLM WideEP, Mamba2, vLLM 0.14.0, and SGLang 0.5.8; Nemotron and EPLB support is expanded. Numerous fixes improve generator output, Kubernetes templates, TRT-LLM/SGLang constraints, validator usage, and error handling. Documentation and CI are updated for the end-to-end workflow and notebook validation.

Key Highlights

Unified model input and CLI

model_path as unified input: Model input is unified on model_path (#289); support matrix uses huggingface_id and huggingface architecture (#275).
Naive config generation: CLI generate command can produce naive configs (#271); CLI APIs enable Python callers (#293).
Support command and APIs: New support command and cli_support API (#294), with --system all and --backend all (#439); support for --backend any (homogeneous) (#331), later renamed to auto (#346).

Generator and validator

Benchmark mode and validator: Generator gains benchmark mode via rule plugins (#290) and a generator validator to compare generator configs with engine APIs (#329).
AIPerf and Dynamo integration: Generator can emit AIPerf benchmark commands (#357); hook to Dynamo planner profiler’s config gen (#419); picking modes and standalone picking API (#421); real-GPU enumeration exposed (#420).
Backend and version flexibility: generator supports both Dynamo version and backend version (#333).

New models and hardware

Nemotron: Nemotron support (#273), Nemotron v3 super model (#325), and Mamba2 ops in Nemotron v3 simulation (#342).
Mamba2 and WideEP: Mamba2 performance data collectors (#297); TensorRT-LLM WideEP All-to-All collector (#313); TensorRT-LLM WideEP pipeline (#320); TensorRT-LLM WideEP MoE collector (#335).
GB200 NVL72 and SGLang EPLB: GB200 NVL72 all2all data (#337); EPLB support in SGLang (#343).

Backend and collector updates

vLLM 0.14.0: vLLM collector updated to 0.14.0 with H100 data (#310) and B200 data (#525).
SGLang 0.5.8: SGLang performance collectors updated for v0.5.8 (#323).
FP8 and quantization: FP8 static_quant_mode / lowbit_input with compute_scale & scale_matrix modeling (#261); infer quantization from model info (#338).

Features & Enhancements

CLI and APIs

Support matrix: Use huggingface_id and huggingface architecture in support_matrix.csv (#275).
model_path: Use model_path as the unified model input argument (#289).
Generate command: CLI generate command to generate naive configs (#271).
CLI APIs: Create CLI APIs to support Python calls (#293).
Support command: CLI support command and cli_support API (#294).
Support all: Add --system all and --backend all for cli-support (#439).
Top-N: Add configurable top_n parameter for result limiting (#315).

Generator and validator

Benchmark mode: Add generator benchmark mode, enabled by setting rule plugins (#290).
Generator validator: Add generator validator to compare generator configs with engine APIs (#329).
Backend any/auto: Add support for --backend any (homogeneous) (#331); rename to 'auto' backend (#346).
Dynamo and backend version: Generator supports both Dynamo version and backend version (#333).
AIPerf command: Enhance generator to generate AIPerf benchmark command (#357).
Dynamo planner profiler: Hook to Dynamo planner profiler's config gen (#419).
Picking API: Add picking modes and expose standalone picking API (#421).
Real-GPU enumeration: Expose real-gpu enumeration logic (#420).
Common templates: Extract common generator templates for reuse (#314).

Kubernetes and deployment

k8s_hf_home: Add k8s_hf_home option (#303).
Customized system path: Customized system path support (#321).

Models and architectures

Nemotron: Nemotron support (#273); support Nemotron v3 super model (#325); add Mamba2 ops to Nemotron v3 simulation (#342).
Mamba2: Add performance data collectors for Mamba2 (#297).
Quantization: Infer quantization from model info (#338).
FP8 modeling: Add fp8 static_quant_mode/lowbit_input with compute_scale & scale_matrix modeling (#261).

Collectors and data

vLLM: Update vLLM collector to 0.14.0 and add H100 data (#310).
SGLang: Update SGLang performance collectors for v0.5.8 (#323); support EPLB in SGLang (#343).
TensorRT-LLM WideEP: TensorRT-LLM WideEP All-to-All collector (#313); TensorRT-LLM WideEP pipeline (#320); TensorRT-LLM WideEP MoE collector (#335).
GB200 NVL72: Add GB200 NVL72 all2all data (#337).
Qwen3-32B NVFP4 with vLLM: Configure and deploy Qwen3-32B with NVFP4 quantization using the vLLM backend (e.g., --model-path nvidia/Qwen3-32B-NVFP4 with --backend vllm). The same CLI workflow applies across backends; only the generated deployment artifacts (config files, CLI args, K8s manifests) differ by backend (#546).

Support matrix and UX

PR description: Enhance PR description generation in support matrix (#460).
Logging: Hide/deduplicate spammy logs (#494).

Bug Fixes

Generator and config

Graceful exit and doc: Update generator doc and allow graceful exit of CLI when lacking database data (#277).
Dynamo 0.8.0: Align generator run script with Dynamo 0.8.0 (#278).
NIXL default: Use NIXL as default disagg transfer backend for SGLang 0.5.6.post2; allow user to set disagg transfer backend in CLI (#279).
NIXL KV backend: Add NIXL as default generator SGLang KV backend (#281).
model_name mapping: Map internal model_name to huggingface_architecture (#274).
MODEL_PATH in templates: Use MODEL_PATH to replace MODEL in vLLM template to align with TRT-LLM/SGLang (#282).
Output path: Add hardware and framework into config output path (#284).
Revert template refactor: Revert "extract common generator templates for reuse" to fix regressions (#340).
Validator: Fix validator service key mismatch (#370); make --backend required in generator validator (#443); correct validator invocation syntax in generator docs (#444).
bench_run.sh: Add shebang and error handling to bench_run.sh template (#446).
TRT-LLM templates: Correct TRT-LLM version-specific engine templates (build_config nesting + missing backend) (#447).
Artifact dir: Add --artifact-dir to benchmark templates to prevent Permission denied (#448); use correct artifact-dir for AIC-generated AIPerf commands (#474).
AIPerf concurrencies: Avoid duplicated concurrencies and sort the list in AIPerf command (#451).
TRT-LLM alignment: Align max_num_tokens and cache_transceiver_max_tokens_in_buffer to tokens_per_block in benchmark TRT-LLM rule (#459).
build_config (old TRT-LLM): Fix build_config for old TRT-LLM version (#465).
Naive config DGD name: Fix naive config generator producing the RFC 1123-invalid DGD name 'None-agg' (#496).
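
Two of the fixes above (#451 and #496) come down to small, checkable rules. The sketch below illustrates them and is not taken from the AIConfigurator source: the helper names is_rfc1123_label and normalize_concurrencies are hypothetical, and the label rule assumes the DGD name is validated as an RFC 1123 DNS label (lowercase alphanumerics and '-', at most 63 characters).

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics and '-', starting and ending
# with an alphanumeric character. The length limit is checked separately.
_RFC1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_rfc1123_label(name: str) -> bool:
    """Return True if `name` is a valid RFC 1123 DNS label (hypothetical helper)."""
    return len(name) <= 63 and _RFC1123_LABEL.fullmatch(name) is not None


def normalize_concurrencies(values):
    """Drop duplicate concurrency levels and return them in ascending order."""
    return sorted(set(values))


if __name__ == "__main__":
    # 'None-agg' is rejected because of the uppercase 'N'.
    print(is_rfc1123_label("None-agg"))
    print(is_rfc1123_label("qwen3-agg"))
    print(normalize_concurrencies([8, 1, 4, 8, 1]))
```

A name such as 'None-agg' typically appears when a missing value is formatted into a string, so validating the final name catches the problem regardless of where the None came from.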

Kubernetes and backend templates

k8s_model_cache: Add missing k8s_model_cache param to vLLM/SGLang K8s templates (#280).
PVC: Move PVC support from frontend to workers for SGLang backend (#291).
vLLM cudagraph: Fix startup failure caused by vLLM --cudagraph-capture-sizes in k8s_deploy.yaml (#395).

TRT-LLM and SGLang constraints

max_num_tokens % tokens_per_block: Ensure max_num_tokens % tokens_per_block == 0 in TRT-LLM (#307).
SGLang moe_dense_tp_size: SGLang moe_dense_tp_size only supports 1 or None (#308).
Quantization block sizes: Ensure MoE intermediate size per GPU is divisible by the quantization block size (#311).
cache_transceiver_max_tokens_in_buffer: Ensure cache_transceiver_max_tokens_in_buffer % block_size == 0 (#424).
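
The divisibility constraints above can be illustrated with a short sketch. It assumes values are rounded up to the next multiple; the release notes do not say whether AIConfigurator rounds up, rounds down, or rejects the config, and the helper names are hypothetical.

```python
def is_aligned(value: int, multiple: int) -> bool:
    """Return True if `value` is an exact multiple of `multiple`."""
    return value % multiple == 0


def round_up_to_multiple(value: int, multiple: int) -> int:
    """Round `value` up to the nearest multiple of `multiple`.

    Rounding up is an assumption made for this sketch, not the
    documented behavior of the generator.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    tokens_per_block = 32  # example value, not a documented default
    max_num_tokens = round_up_to_multiple(8200, tokens_per_block)
    print(max_num_tokens, is_aligned(max_num_tokens, tokens_per_block))
```

The same check applies to cache_transceiver_max_tokens_in_buffer against the block size.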

Collectors and models

MoE collector TRT-LLM 1.3.0: Update MoE collector to support TRT-LLM 1.3.0 (#326).
GPT-OSS inter_size: Correctly obtain inter_size for GPT-OSS; use w4a16_mxfp4 as default MoE quant mode (#319).
nemotron_nas block config: Enable correct parsing of block configs for nemotron_nas (#318).
Mamba2 collector: Add missing columns to Mamba2 collector (#341).
SGLang disaggregated: Filter out different TP sizes for SGLang non-wideep disaggregated serving (#344).
agg_decode: Remove duplicate agg_decode max_batch_size (#368).
fp16 KV cache: Fix fp16 KV cache dtype (#369).

Errors and edge cases

dynamoNamespace: Remove dynamoNamespace field (nvbugs/5830661) (#296).
Deployment guide: Update guide on Dynamo deployment (nvbugs/5833205) (#295).
SGLang L40S: Handle SGLang L40S missing data gracefully (#301).
System path: Fix system path error handling (#371).
Invalid backend: Catch invalid backend for TRT-LLM backend (#374).
Artifacts directory: Remove the incorrect extra subdirectory from the generated artifacts directory structure shown in dynamo_deployment_guide.md (#396).
head_node_ip: Fix example command in dynamo_deployment_guide.md that failed due to invalid --head_node_ip (#397).
agg_pareto IndexError: Fix IndexError when all parallel configurations are skipped without exceptions (#398).
collect_config_paths: Uncaught exception no longer leaks raw traceback to user (#399).
Support matrix OOM: Resolve OOM issue during support matrix testing (#425).
Gradio warnings: Remove "gradio not installed" warnings (#428).
Support check: Make support check case-insensitive; rephrase model not found (#438).
GPU memory: Improve error messages when model doesn't fit in GPU memory (#445).
Disagg OOM: Raise OOM in disagg get_worker_candidates() instead of failing later with an IndexError (#461).
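
As an illustration of the case-insensitive support check (#438), the sketch below matches a requested model against a list of supported names without regard to case. The function name and the list-based lookup are hypothetical; the actual check runs against the support matrix and may be implemented differently.

```python
from typing import Optional, Sequence


def find_supported_model(requested: str, supported: Sequence[str]) -> Optional[str]:
    """Return the canonical supported name matching `requested`, ignoring case.

    Returns None when the model is not found (hypothetical helper).
    """
    # Map case-folded names back to their canonical spelling.
    lookup = {name.casefold(): name for name in supported}
    return lookup.get(requested.casefold())


if __name__ == "__main__":
    supported = ["Qwen/Qwen3-32B", "nvidia/Qwen3-32B-NVFP4"]
    print(find_supported_model("qwen/qwen3-32b", supported))
    print(find_supported_model("unknown/model", supported))
```

Returning the canonical spelling, rather than a boolean, lets the caller echo the exact supported name in a "model not found" message or in later lookups.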

Documentation

End-to-end workflow: Add end-to-end workflow, document benchmark artifacts, and fix webapp visibility (#440).

CI/CD and testing

validate_database.ipynb: Add test for validate_database.ipynb (#268).

Other changes

Support matrix: Automated support matrix updates (#254, #298); update support matrix in README (#324).

New contributors

Thanks to everyone who contributed to this release:

@github-actions[bot] made their first contribution in #254
@yingxuanl-dot made their first contribution in #261

Full changelog

Full Changelog: v0.6.0...v0.7.0
