Model Name/ModeL size for verify configs #772

srivatsankrishnan · 2026-01-08T22:22:23Z

Summary

Currently we don't do strict pydantic validation of all the fields by design to allow users to add/remove new flags as and when they are available. This is particularly needed feature for fast moving frameworks such as Nemo/AIDyanno and M-bridge now.

However, Model familiy and Model name is required and user needs to provide this field. Currently the additional check for passing empty strings (or removing from toml file as done by this RM) was happening during run time causing a stack trace. This PR ensure this is caught during parsing using verify-configs or using run or dry-run model.

Reference: RM 4829207

Test Plan

CI/CD

Additional Notes

coderabbitai · 2026-01-08T22:22:29Z

Warning

Rate limit exceeded

@srivatsankrishnan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 1 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between c005e41 and e3c4f6f.

📒 Files selected for processing (1)

tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py

📝 Walkthrough

Walkthrough

Enforces non-empty model_name and model_size in MegatronBridgeCmdArgs via field validators and min_length=1, and updates tests to supply these fields during model validation.

Changes

Cohort / File(s)	Summary
Model arg validation `src/cloudai/workloads/megatron_bridge/megatron_bridge.py`	Added `ValidationInfo` import, changed `model_name` and `model_size` to `Field(min_length=1)`, and added `@field_validator("model_name", "model_size", mode="after")` `validate_model_fields` to trim and reject empty values.
Test updates `tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py`	Updated test validation inputs to include `"model_name": "qwen3"` and `"model_size": "30b_a3b"` alongside existing `hf_token` when calling `model_validate`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I nudge the fields with care,
No empty names shall hide in there.
I trim the spaces, make them right,
Tests now pass with names in sight,
A hops-and-validate delight! 🥕

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title mentions 'Model Name/Model size' and 'verify configs', which directly relate to the PR's primary objective of adding validation for required model fields during configuration parsing.
Description check	✅ Passed	The description clearly explains the rationale for the changes: required model fields need validation during parsing to catch errors early instead of at runtime, with specific reference to the RM issue.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @src/cloudai/workloads/megatron_bridge/megatron_bridge.py:
- Around line 43-44: The PR made model_name and model_size required with
Field(min_length=1) but still allows whitespace-only values; add a field
validator like the existing hf_token validator to strip and reject
empty/whitespace-only strings for "model_name" and "model_size" (use
@field_validator("model_name","model_size", mode="after") in the same Pydantic
model, name it e.g. validate_model_fields, strip the value, raise
ValueError(f"cmd_args.{info.field_name} cannot be empty.") if the result is
empty, and return the stripped string) so parsing will fail on whitespace-only
input.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 507c141 and 1159eb7.

📒 Files selected for processing (2)

src/cloudai/workloads/megatron_bridge/megatron_bridge.py
tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2026-01-05T22:23:56.998Z

Learnt from: srivatsankrishnan
Repo: NVIDIA/cloudai PR: 767
File: conf/experimental/megatron_bridge/test/b200/megatron_bridge_qwen_30b.toml:38-38
Timestamp: 2026-01-05T22:23:56.998Z
Learning: For the Megatron Bridge workload, the hf_token field in test TOML configurations is intentionally left empty (hf_token = "") to fail validation if not explicitly set by the user. CloudAI does not distribute secret keys, so users must provide their own Hugging Face token, and the validation failure is by design to enforce this.

Applied to files:

src/cloudai/workloads/megatron_bridge/megatron_bridge.py
tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py

📚 Learning: 2025-12-16T19:47:41.994Z

Learnt from: amaslenn
Repo: NVIDIA/cloudai PR: 754
File: src/cloudai/_core/registry.py:226-234
Timestamp: 2025-12-16T19:47:41.994Z
Learning: In this repository, prefer expressing behavioral documentation through tests rather than docstrings. Tests act as living, verified documentation. Reserve docstrings for interfaces or high-level descriptions, and avoid duplicating behavior that is already covered by tests.

Applied to files:

src/cloudai/workloads/megatron_bridge/megatron_bridge.py
tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py

🧬 Code graph analysis (1)

tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py (1)

src/cloudai/workloads/megatron_bridge/megatron_bridge.py (1)

MegatronBridgeCmdArgs (26-89)

🔇 Additional comments (1)

tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py (1)

86-88: LGTM! Test correctly updated for new required fields.

The test now provides model_name and model_size to satisfy the new validation requirements while maintaining its focus on verifying that empty hf_token is rejected. This change is necessary and consistent with other test methods in the file that already provide these fields.

src/cloudai/workloads/megatron_bridge/megatron_bridge.py

greptile-apps

Greptile Overview

Greptile Summary

Added Pydantic validation for model_name and model_size fields to catch empty/whitespace-only values during config parsing instead of at runtime.

Added min_length=1 constraint to model_name and model_size Field definitions
Implemented validate_model_fields custom validator that strips whitespace and rejects empty strings
Updated test to provide required fields, but missing dedicated test cases for the new validation logic
Validation follows the same pattern as existing hf_token validator (strips and returns cleaned value)
Successfully moves error detection from runtime to parse-time as intended

Confidence Score: 4/5

This PR is safe to merge with minimal risk
The validation logic is straightforward and follows existing patterns in the codebase. The changes successfully move validation from runtime to parse-time, improving error detection. Minor deduction for missing test coverage of the new validation logic.
Consider adding test cases in tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py to verify the new validation behavior

Important Files Changed

File Analysis

Filename	Score	Overview
src/cloudai/workloads/megatron_bridge/megatron_bridge.py	4/5	Added `min_length=1` constraint and custom validator for `model_name` and `model_size` to ensure they cannot be empty or whitespace-only, catching configuration errors earlier during parsing
tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py	3/5	Updated existing test to provide required `model_name` and `model_size` fields, but missing test cases for the new validation logic

Sequence Diagram

sequenceDiagram
    participant User
    participant ConfigParser
    participant Pydantic
    participant MegatronBridgeCmdArgs
    participant FieldValidator

    User->>ConfigParser: Load TOML config
    ConfigParser->>Pydantic: model_validate(config_data)
    Pydantic->>MegatronBridgeCmdArgs: Create instance
    
    Note over Pydantic,MegatronBridgeCmdArgs: Field validation phase
    Pydantic->>MegatronBridgeCmdArgs: Check min_length=1 on model_name
    alt Empty string
        MegatronBridgeCmdArgs-->>Pydantic: ValidationError
        Pydantic-->>User: Error caught during parsing
    else Non-empty string
        MegatronBridgeCmdArgs->>FieldValidator: validate_model_fields(model_name)
        FieldValidator->>FieldValidator: Strip whitespace
        alt Whitespace-only
            FieldValidator-->>Pydantic: ValueError
            Pydantic-->>User: Error caught during parsing
        else Valid value
            FieldValidator->>MegatronBridgeCmdArgs: Return stripped value
        end
    end
    
    Pydantic->>MegatronBridgeCmdArgs: Check min_length=1 on model_size
    MegatronBridgeCmdArgs->>FieldValidator: validate_model_fields(model_size)
    FieldValidator->>MegatronBridgeCmdArgs: Return stripped value
    
    MegatronBridgeCmdArgs-->>Pydantic: Validated instance
    Pydantic-->>User: Success (or ValidationError)

tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py

greptile-apps

Greptile Overview

Greptile Summary

This PR adds Pydantic validation for model_name and model_size fields to catch empty/whitespace values at parse time rather than runtime.

Key Changes:

Changed field definitions from Field(default="") to Field(min_length=1) making them required
Added custom validator that strips whitespace and rejects empty strings after stripping
Updated test to include required fields when validating hf_token
Added comprehensive tests for empty string and whitespace-only validation

Impact:

Validation errors now occur during config parsing (verify-configs, run, dry-run) instead of during command generation
Provides better user experience by catching errors earlier
Values with leading/trailing whitespace are automatically stripped (consistent with hf_token validator)
Runtime checks in slurm_command_gen_strategy.py (lines 237-240) are now unreachable but harmless

Confidence Score: 4/5

This PR is safe to merge with minimal risk - it moves validation earlier in the pipeline and improves error handling
Score reflects thorough testing, clear implementation that achieves stated goals, and only minor stylistic suggestions around error message consistency. No functional bugs found.
No files require special attention - both changed files have appropriate test coverage and follow existing patterns

Important Files Changed

File Analysis

Filename	Score	Overview
src/cloudai/workloads/megatron_bridge/megatron_bridge.py	4/5	Added Pydantic validation for model_name and model_size fields to ensure they are non-empty strings, moving validation from runtime to parse time. Changes are sound and well-tested.
tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py	5/5	Added comprehensive test coverage for the new validation logic, including tests for empty strings and whitespace-only values. Tests correctly verify both Pydantic's min_length validation and the custom validator.

Sequence Diagram

sequenceDiagram
    participant User
    participant TOML as TOML Config File
    participant Pydantic as Pydantic Validator
    participant CmdArgs as MegatronBridgeCmdArgs
    participant MinLength as min_length Validator
    participant Custom as Custom Validator
    participant CmdGen as Command Generator

    User->>TOML: Define model_name & model_size
    TOML->>Pydantic: Parse config (verify-configs/run/dry-run)
    
    alt Field is missing
        Pydantic-->>User: Error: Field required
    else Field is empty string ""
        Pydantic->>MinLength: Check min_length=1
        MinLength-->>User: Error: String should have at least 1 character
    else Field is whitespace only "   "
        Pydantic->>MinLength: Check min_length=1
        MinLength->>Custom: Pass (length >= 1)
        Custom->>Custom: Strip whitespace
        Custom->>Custom: Check if empty
        Custom-->>User: Error: cmd_args.model_name cannot be empty
    else Field is valid "  qwen3  "
        Pydantic->>MinLength: Check min_length=1
        MinLength->>Custom: Pass
        Custom->>Custom: Strip to "qwen3"
        Custom->>CmdArgs: Return stripped value
        CmdArgs->>CmdGen: Generate command
        Note over CmdGen: Runtime checks (lines 237-240)<br/>now unreachable
    end

greptile-apps · 2026-01-08T22:52:00Z

src/cloudai/workloads/megatron_bridge/megatron_bridge.py

+    model_name: str = Field(min_length=1)
+    model_size: str = Field(min_length=1)


The min_length=1 constraint combined with the custom validator that strips whitespace creates inconsistent error messages:

Empty string "": Pydantic's generic "String should have at least 1 character"

Whitespace " ": Custom message "cmd_args.model_name cannot be empty."

Consider removing min_length=1 to provide consistent error messages for all invalid inputs. The custom validator alone is sufficient to catch both cases.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

fix verify-configs

1159eb7

srivatsankrishnan marked this pull request as ready for review January 8, 2026 22:23

srivatsankrishnan requested review from alexmanle, amaslenn and jeffnvidia as code owners January 8, 2026 22:23

coderabbitai bot reviewed Jan 8, 2026

View reviewed changes

src/cloudai/workloads/megatron_bridge/megatron_bridge.py Show resolved Hide resolved

ensure no empty whitespace

c005e41

greptile-apps bot reviewed Jan 8, 2026

View reviewed changes

tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py Show resolved Hide resolved

greptile suggested unit test

e3c4f6f

alexmanle approved these changes Jan 8, 2026

View reviewed changes

srivatsankrishnan merged commit 69cba47 into NVIDIA:main Jan 8, 2026
4 checks passed

greptile-apps bot reviewed Jan 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model Name/ModeL size for verify configs #772

Model Name/ModeL size for verify configs #772

Uh oh!

srivatsankrishnan commented Jan 8, 2026

Uh oh!

coderabbitai bot commented Jan 8, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Jan 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		model_name: str = Field(min_length=1)
		model_size: str = Field(min_length=1)

Model Name/ModeL size for verify configs #772

Model Name/ModeL size for verify configs #772

Uh oh!

Conversation

srivatsankrishnan commented Jan 8, 2026

Summary

Test Plan

Additional Notes

Uh oh!

coderabbitai bot commented Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai bot commented Jan 8, 2026 •

edited

Loading