
Fix validation for model config override#3649

Merged

copybara-service[bot] merged 1 commit into main from shuningjin-override on Apr 14, 2026
Conversation

@shuningjin
Collaborator

@shuningjin shuningjin commented Apr 12, 2026

Description

Fix validation for flag override_model_config, b/502052235

Issue

Flag: override_model_config

Previous behavior:

  • (expected) overriding via CLI/kwargs/env raises an error when override_model_config=False
  • overriding via CLI/kwargs/env succeeds when override_model_config=True

Current behavior:

  • (unexpected) overriding via CLI/kwargs/env succeeds when override_model_config=False
  • overriding via CLI/kwargs/env succeeds when override_model_config=True

Cause: We previously validated the model config against CLI/kwargs/env overrides in pyconfig_deprecated.py. That check is bypassed by the new pyconfig.py and types.py path.

Change

pyconfig.py: add validation comparing model config keys against CLI/kwargs/env overrides; conflicts raise a ValueError unless override_model_config=True

tests/unit: add override_model_config=True where needed to fix validation errors in existing unit tests

tests/unit/pyconfig_test

  • (existing) test_overriding_model: checks successful update when override_model_config=True
  • (newly added) test_overriding_model_raises_error: checks ValueError when override_model_config=False
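The added check can be sketched roughly as follows; the function name and signature are illustrative, not the actual pyconfig.py code:

```python
# Rough sketch of the validation described above. The name
# `validate_model_config_override` and its arguments are hypothetical;
# the real check lives in pyconfig.py.

def validate_model_config_override(model_keys, user_keys, override_model_config):
  """Raise if keys set by the model config are also set via CLI/kwargs/env."""
  clashing = sorted(set(model_keys) & set(user_keys))
  if clashing and not override_model_config:
    raise ValueError(
        f"Keys {clashing} are overridden by both model config and CLI/kwargs. "
        "This is not allowed, unless setting `override_model_config=True`."
    )
```

With override_model_config=True the clash is allowed through unchanged; with the default False, any clashing key aborts config construction.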

Tests

Before

1. CLI, override_model_config=false -> override succeeds, which is wrong

git rev-parse HEAD
# base_emb_dim: 2048 in deepseek2-16b.yml
OVERRIDE="base_emb_dim=1024"
python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=gs://runner-maxtext-logs run_name=megablox_pre_training model_name=deepseek2-16b tokenizer_type=huggingface tokenizer_path=deepseek-ai/DeepSeek-V2-Lite dataset_type=synthetic enable_checkpointing=false attention=flash sparse_matmul=True use_tokamax_gmm=False dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=1 steps=5 max_target_length=1024 ici_fsdp_parallelism=-1 ${OVERRIDE}

https://paste.googleplex.com/5135739043053568

Config param base_emb_dim: 1024
completed step: 4, seconds: 0.488, TFLOP/s/device: 16.695, Tokens/s/device: 2100.384, total_weights: 4096, loss: 11.135

2. environment variable, override_model_config=false -> override succeeds, which is wrong

git rev-parse HEAD
# base_emb_dim: 2048 in deepseek2-16b.yml
OVERRIDE=""
M_BASE_EMB_DIM=1024 \
python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=gs://runner-maxtext-logs run_name=megablox_pre_training model_name=deepseek2-16b tokenizer_type=huggingface tokenizer_path=deepseek-ai/DeepSeek-V2-Lite dataset_type=synthetic enable_checkpointing=false attention=flash sparse_matmul=True use_tokamax_gmm=False dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=1 steps=5 max_target_length=1024 ici_fsdp_parallelism=-1 ${OVERRIDE}

https://paste.googleplex.com/5933592896208896

After

1. CLI, override_model_config=false -> override prevented, as expected

# base_emb_dim: 2048 in deepseek2-16b.yml
OVERRIDE="base_emb_dim=1024"
python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=gs://runner-maxtext-logs run_name=megablox_pre_training model_name=deepseek2-16b tokenizer_type=huggingface tokenizer_path=deepseek-ai/DeepSeek-V2-Lite dataset_type=synthetic enable_checkpointing=false attention=flash sparse_matmul=True use_tokamax_gmm=False dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=1 steps=5 max_target_length=1024 ici_fsdp_parallelism=-1 ${OVERRIDE}

https://paste.googleplex.com/5025033308209152

ValueError: Keys ['base_emb_dim'] are overridden by both model config and CLI/kwargs.This is not allowed, unless setting `override_model_config=True`.

2. CLI, override_model_config=true -> override succeeds, as expected

# base_emb_dim: 2048 in deepseek2-16b.yml
OVERRIDE="base_emb_dim=1024 override_model_config=true"
python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=gs://runner-maxtext-logs run_name=megablox_pre_training model_name=deepseek2-16b tokenizer_type=huggingface tokenizer_path=deepseek-ai/DeepSeek-V2-Lite dataset_type=synthetic enable_checkpointing=false attention=flash sparse_matmul=True use_tokamax_gmm=False dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=1 steps=5 max_target_length=1024 ici_fsdp_parallelism=-1 ${OVERRIDE}

https://paste.googleplex.com/5895284967211008

Config param base_emb_dim: 1024
completed step: 4, seconds: 0.486, TFLOP/s/device: 16.735, Tokens/s/device: 2105.397, total_weights: 4096, loss: 11.135

3. environment variable, override_model_config=false -> override prevented, as expected

# base_emb_dim: 2048 in deepseek2-16b.yml
OVERRIDE=""
M_BASE_EMB_DIM=1024 \
python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=gs://runner-maxtext-logs run_name=megablox_pre_training model_name=deepseek2-16b tokenizer_type=huggingface tokenizer_path=deepseek-ai/DeepSeek-V2-Lite dataset_type=synthetic enable_checkpointing=false attention=flash sparse_matmul=True use_tokamax_gmm=False dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=1 steps=5 max_target_length=1024 ici_fsdp_parallelism=-1 ${OVERRIDE}

https://paste.googleplex.com/6540298005118976

ValueError: Key 'base_emb_dim' is overridden by both model config and environment variable 'M_BASE_EMB_DIM'.This is not allowed, unless setting `override_model_config=True`.

4. environment variable, override_model_config=true -> override succeeds, as expected

# base_emb_dim: 2048 in deepseek2-16b.yml
OVERRIDE="override_model_config=true"
M_BASE_EMB_DIM=1024 \
python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=gs://runner-maxtext-logs run_name=megablox_pre_training model_name=deepseek2-16b tokenizer_type=huggingface tokenizer_path=deepseek-ai/DeepSeek-V2-Lite dataset_type=synthetic enable_checkpointing=false attention=flash sparse_matmul=True use_tokamax_gmm=False dtype=bfloat16 weight_dtype=bfloat16 per_device_batch_size=1 steps=5 max_target_length=1024 ici_fsdp_parallelism=-1 ${OVERRIDE}

https://paste.googleplex.com/5443039960104960

Config param base_emb_dim: 1024
completed step: 4, seconds: 0.488, TFLOP/s/device: 16.685, Tokens/s/device: 2099.148, total_weights: 4096, loss: 11.135
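The environment-variable path above relies on the M_ prefix convention (M_BASE_EMB_DIM maps to base_emb_dim). A minimal sketch of that mapping, with the function name invented for illustration:

```python
import os

# Illustrative only: collect config overrides from M_-prefixed environment
# variables, e.g. M_BASE_EMB_DIM -> base_emb_dim. The real parsing in
# pyconfig.py may differ (type coercion, validation, etc.).
def collect_env_overrides(prefix="M_"):
  return {
      name[len(prefix):].lower(): value
      for name, value in os.environ.items()
      if name.startswith(prefix)
  }
```

Any key collected this way participates in the same clash check against the model config as CLI/kwargs overrides.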

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov Bot commented Apr 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Collaborator

@NuojCheng NuojCheng left a comment


Thank you for the fix Shuning!

Collaborator

@gobbleturk gobbleturk left a comment


Are there tests for both of these?

  1. When override_model_config=False and a user tries to override, do we throw an error?
  2. When override_model_config=True, can we run a simple model? (It looks like a bunch of tests rely on this, so it's sort of indirectly tested.)

I realize I should have added these tests when I created the override_model_config feature, but I also only tested case 2). I am not sure if case 1) is tested somewhere. 2ed45b6

@shuningjin shuningjin force-pushed the shuningjin-override branch from 9662a79 to ce92bcc Compare April 13, 2026 21:20
@shuningjin shuningjin force-pushed the shuningjin-override branch from ce92bcc to e23d7e3 Compare April 13, 2026 21:36
@shuningjin
Collaborator Author

shuningjin commented Apr 13, 2026

Thanks for the suggestion. I updated tests/unit/pyconfig_test

  • (existing) test_overriding_model: checks successful update when override_model_config=True -- for case 2
  • (newly added) test_overriding_model_raises_error: checks ValueError when override_model_config=False -- for case 1
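The shape of those two tests, sketched with a stand-in for the real config loader (the `initialize_stub` below is invented for illustration; the actual tests in tests/unit/pyconfig_test.py call the real pyconfig entry point):

```python
import unittest

# Hypothetical stand-in for the config loader: the model config pins
# base_emb_dim, and kwargs may try to override it.
def initialize_stub(**kwargs):
  model_keys = {"base_emb_dim"}
  clashing = sorted(model_keys & set(kwargs))
  if clashing and not kwargs.get("override_model_config", False):
    raise ValueError(
        f"Keys {clashing} are overridden by both model config and CLI/kwargs."
    )
  return kwargs

class PyconfigOverrideTest(unittest.TestCase):

  def test_overriding_model(self):
    # Case 2: explicit opt-in succeeds.
    cfg = initialize_stub(base_emb_dim=1024, override_model_config=True)
    self.assertEqual(cfg["base_emb_dim"], 1024)

  def test_overriding_model_raises_error(self):
    # Case 1: overriding without opt-in raises.
    with self.assertRaises(ValueError):
      initialize_stub(base_emb_dim=1024)
```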

@copybara-service copybara-service Bot merged commit d26d2d7 into main Apr 14, 2026
49 of 50 checks passed
@copybara-service copybara-service Bot deleted the shuningjin-override branch April 14, 2026 17:34