Skip to content

Conversation

@srivatsankrishnan
Copy link
Contributor

Summary

Currently we don't do strict pydantic validation of all the fields by design to allow users to add/remove new flags as and when they are available. This is particularly needed feature for fast moving frameworks such as Nemo/AIDyanno and M-bridge now.

However, Model familiy and Model name is required and user needs to provide this field. Currently the additional check for passing empty strings (or removing from toml file as done by this RM) was happening during run time causing a stack trace. This PR ensure this is caught during parsing using verify-configs or using run or dry-run model.

Reference: RM 4829207

Test Plan

CI/CD

Additional Notes

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 8, 2026

Warning

Rate limit exceeded

@srivatsankrishnan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 1 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between c005e41 and e3c4f6f.

📒 Files selected for processing (1)
  • tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py
📝 Walkthrough

Walkthrough

Enforces non-empty model_name and model_size in MegatronBridgeCmdArgs via field validators and min_length=1, and updates tests to supply these fields during model validation.

Changes

Cohort / File(s) Summary
Model arg validation
src/cloudai/workloads/megatron_bridge/megatron_bridge.py
Added ValidationInfo import, changed model_name and model_size to Field(min_length=1), and added @field_validator("model_name", "model_size", mode="after") validate_model_fields to trim and reject empty values.
Test updates
tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py
Updated test validation inputs to include "model_name": "qwen3" and "model_size": "30b_a3b" alongside existing hf_token when calling model_validate.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I nudge the fields with care,
No empty names shall hide in there.
I trim the spaces, make them right,
Tests now pass with names in sight,
A hops-and-validate delight! 🥕

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title mentions 'Model Name/Model size' and 'verify configs', which directly relate to the PR's primary objective of adding validation for required model fields during configuration parsing.
Description check ✅ Passed The description clearly explains the rationale for the changes: required model fields need validation during parsing to catch errors early instead of at runtime, with specific reference to the RM issue.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@srivatsankrishnan srivatsankrishnan marked this pull request as ready for review January 8, 2026 22:23
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @src/cloudai/workloads/megatron_bridge/megatron_bridge.py:
- Around line 43-44: The PR made model_name and model_size required with
Field(min_length=1) but still allows whitespace-only values; add a field
validator like the existing hf_token validator to strip and reject
empty/whitespace-only strings for "model_name" and "model_size" (use
@field_validator("model_name","model_size", mode="after") in the same Pydantic
model, name it e.g. validate_model_fields, strip the value, raise
ValueError(f"cmd_args.{info.field_name} cannot be empty.") if the result is
empty, and return the stripped string) so parsing will fail on whitespace-only
input.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 507c141 and 1159eb7.

📒 Files selected for processing (2)
  • src/cloudai/workloads/megatron_bridge/megatron_bridge.py
  • tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-05T22:23:56.998Z
Learnt from: srivatsankrishnan
Repo: NVIDIA/cloudai PR: 767
File: conf/experimental/megatron_bridge/test/b200/megatron_bridge_qwen_30b.toml:38-38
Timestamp: 2026-01-05T22:23:56.998Z
Learning: For the Megatron Bridge workload, the hf_token field in test TOML configurations is intentionally left empty (hf_token = "") to fail validation if not explicitly set by the user. CloudAI does not distribute secret keys, so users must provide their own Hugging Face token, and the validation failure is by design to enforce this.

Applied to files:

  • src/cloudai/workloads/megatron_bridge/megatron_bridge.py
  • tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py
📚 Learning: 2025-12-16T19:47:41.994Z
Learnt from: amaslenn
Repo: NVIDIA/cloudai PR: 754
File: src/cloudai/_core/registry.py:226-234
Timestamp: 2025-12-16T19:47:41.994Z
Learning: In this repository, prefer expressing behavioral documentation through tests rather than docstrings. Tests act as living, verified documentation. Reserve docstrings for interfaces or high-level descriptions, and avoid duplicating behavior that is already covered by tests.

Applied to files:

  • src/cloudai/workloads/megatron_bridge/megatron_bridge.py
  • tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py
🧬 Code graph analysis (1)
tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py (1)
src/cloudai/workloads/megatron_bridge/megatron_bridge.py (1)
  • MegatronBridgeCmdArgs (26-89)
🔇 Additional comments (1)
tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py (1)

86-88: LGTM! Test correctly updated for new required fields.

The test now provides model_name and model_size to satisfy the new validation requirements while maintaining its focus on verifying that empty hf_token is rejected. This change is necessary and consistent with other test methods in the file that already provide these fields.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Added Pydantic validation for model_name and model_size fields to catch empty/whitespace-only values during config parsing instead of at runtime.

  • Added min_length=1 constraint to model_name and model_size Field definitions
  • Implemented validate_model_fields custom validator that strips whitespace and rejects empty strings
  • Updated test to provide required fields, but missing dedicated test cases for the new validation logic
  • Validation follows the same pattern as existing hf_token validator (strips and returns cleaned value)
  • Successfully moves error detection from runtime to parse-time as intended

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk
  • The validation logic is straightforward and follows existing patterns in the codebase. The changes successfully move validation from runtime to parse-time, improving error detection. Minor deduction for missing test coverage of the new validation logic.
  • Consider adding test cases in tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py to verify the new validation behavior

Important Files Changed

File Analysis

Filename Score Overview
src/cloudai/workloads/megatron_bridge/megatron_bridge.py 4/5 Added min_length=1 constraint and custom validator for model_name and model_size to ensure they cannot be empty or whitespace-only, catching configuration errors earlier during parsing
tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py 3/5 Updated existing test to provide required model_name and model_size fields, but missing test cases for the new validation logic

Sequence Diagram

sequenceDiagram
    participant User
    participant ConfigParser
    participant Pydantic
    participant MegatronBridgeCmdArgs
    participant FieldValidator

    User->>ConfigParser: Load TOML config
    ConfigParser->>Pydantic: model_validate(config_data)
    Pydantic->>MegatronBridgeCmdArgs: Create instance
    
    Note over Pydantic,MegatronBridgeCmdArgs: Field validation phase
    Pydantic->>MegatronBridgeCmdArgs: Check min_length=1 on model_name
    alt Empty string
        MegatronBridgeCmdArgs-->>Pydantic: ValidationError
        Pydantic-->>User: Error caught during parsing
    else Non-empty string
        MegatronBridgeCmdArgs->>FieldValidator: validate_model_fields(model_name)
        FieldValidator->>FieldValidator: Strip whitespace
        alt Whitespace-only
            FieldValidator-->>Pydantic: ValueError
            Pydantic-->>User: Error caught during parsing
        else Valid value
            FieldValidator->>MegatronBridgeCmdArgs: Return stripped value
        end
    end
    
    Pydantic->>MegatronBridgeCmdArgs: Check min_length=1 on model_size
    MegatronBridgeCmdArgs->>FieldValidator: validate_model_fields(model_size)
    FieldValidator->>MegatronBridgeCmdArgs: Return stripped value
    
    MegatronBridgeCmdArgs-->>Pydantic: Validated instance
    Pydantic-->>User: Success (or ValidationError)
Loading

@srivatsankrishnan srivatsankrishnan merged commit 69cba47 into NVIDIA:main Jan 8, 2026
4 checks passed
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds Pydantic validation for model_name and model_size fields to catch empty/whitespace values at parse time rather than runtime.

Key Changes:

  • Changed field definitions from Field(default="") to Field(min_length=1) making them required
  • Added custom validator that strips whitespace and rejects empty strings after stripping
  • Updated test to include required fields when validating hf_token
  • Added comprehensive tests for empty string and whitespace-only validation

Impact:

  • Validation errors now occur during config parsing (verify-configs, run, dry-run) instead of during command generation
  • Provides better user experience by catching errors earlier
  • Values with leading/trailing whitespace are automatically stripped (consistent with hf_token validator)
  • Runtime checks in slurm_command_gen_strategy.py (lines 237-240) are now unreachable but harmless

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk - it moves validation earlier in the pipeline and improves error handling
  • Score reflects thorough testing, clear implementation that achieves stated goals, and only minor stylistic suggestions around error message consistency. No functional bugs found.
  • No files require special attention - both changed files have appropriate test coverage and follow existing patterns

Important Files Changed

File Analysis

Filename Score Overview
src/cloudai/workloads/megatron_bridge/megatron_bridge.py 4/5 Added Pydantic validation for model_name and model_size fields to ensure they are non-empty strings, moving validation from runtime to parse time. Changes are sound and well-tested.
tests/slurm_command_gen_strategy/test_megatron_bridge_slurm_command_gen_strategy.py 5/5 Added comprehensive test coverage for the new validation logic, including tests for empty strings and whitespace-only values. Tests correctly verify both Pydantic's min_length validation and the custom validator.

Sequence Diagram

sequenceDiagram
    participant User
    participant TOML as TOML Config File
    participant Pydantic as Pydantic Validator
    participant CmdArgs as MegatronBridgeCmdArgs
    participant MinLength as min_length Validator
    participant Custom as Custom Validator
    participant CmdGen as Command Generator

    User->>TOML: Define model_name & model_size
    TOML->>Pydantic: Parse config (verify-configs/run/dry-run)
    
    alt Field is missing
        Pydantic-->>User: Error: Field required
    else Field is empty string ""
        Pydantic->>MinLength: Check min_length=1
        MinLength-->>User: Error: String should have at least 1 character
    else Field is whitespace only "   "
        Pydantic->>MinLength: Check min_length=1
        MinLength->>Custom: Pass (length >= 1)
        Custom->>Custom: Strip whitespace
        Custom->>Custom: Check if empty
        Custom-->>User: Error: cmd_args.model_name cannot be empty
    else Field is valid "  qwen3  "
        Pydantic->>MinLength: Check min_length=1
        MinLength->>Custom: Pass
        Custom->>Custom: Strip to "qwen3"
        Custom->>CmdArgs: Return stripped value
        CmdArgs->>CmdGen: Generate command
        Note over CmdGen: Runtime checks (lines 237-240)<br/>now unreachable
    end
Loading

Comment on lines +43 to +44
model_name: str = Field(min_length=1)
model_size: str = Field(min_length=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The min_length=1 constraint combined with the custom validator that strips whitespace creates inconsistent error messages:

  • Empty string "": Pydantic's generic "String should have at least 1 character"
  • Whitespace " ": Custom message "cmd_args.model_name cannot be empty."

Consider removing min_length=1 to provide consistent error messages for all invalid inputs. The custom validator alone is sufficient to catch both cases.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants