dkorzekwa/any_model_other_models #1007
Conversation
- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged)
…zzletron (nemotron-3-nano-30b-a3b-base-bf16)
… if now test_puzzletron.py will be repeatable.
Why do we call this nemotron_h and not nemotron_h_v3? Do we know if this will be the same for v4 as well?
The names are changing so fast; I added a TODO to unify them.
Why is this qwen3_8b and not qwen3? All the other models have a generic converter, not one specific to a single variant.
A model descriptor can be variant-specific: within the same model family, different sizes can differ, e.g., in how model weights are named or structured. This one was only tested on Qwen3 8B, hence the name for now.
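The rationale above can be illustrated with a hypothetical descriptor registry (the names and fields here are made up for illustration; the actual modelopt/puzzletron registry is structured differently):

```python
# Hypothetical sketch: descriptors are keyed per variant when the weight
# layout has only been verified for that one variant, and family-wide
# when all sizes are known to share the same layout.
MODEL_DESCRIPTORS = {
    "llama": {"tested_on": "family-wide"},       # generic: layout shared across sizes
    "qwen3_8b": {"tested_on": "Qwen/Qwen3-8B"},  # scoped: verified on the 8B checkpoint only
}

def get_descriptor(name):
    # Look up the descriptor for a model (family or variant) by name.
    return MODEL_DESCRIPTORS[name]
```

Once other Qwen3 sizes are verified, the scoped entry could be generalized to a family-wide `qwen3` key.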
Same comment: why qwen3_vl_30b and not qwen3_vl?
Because it was tested only on this particular model.
modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
Force-pushed from 3866125 to 27866de
# This prevents NaN values in uninitialized parameters (e.g., backbone.layers.1.mixer.gate.weight
# in nemotron-3-nano-30b-a3b-base-bf16) that can occur with from_config on RTX GPU cards (not on H100)
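The fix the code comment above describes can be sketched schematically (a hypothetical, framework-free illustration of the idea; the actual `init_child_from_parent.py` implementation differs):

```python
# Hypothetical sketch: build a child model's state dict by copying every
# parameter that exists in the parent, and explicitly initializing the rest,
# so no parameter is left as uninitialized memory (the source of the
# device-dependent NaNs seen with from_config on some GPUs).
def init_child_from_parent(child_keys, parent_state, default=0.0):
    child_state = {}
    for name in child_keys:
        if name in parent_state:
            child_state[name] = parent_state[name]  # inherit parent weight
        else:
            child_state[name] = default  # explicit init instead of garbage memory
    return child_state

parent = {"layers.0.w": 1.5}
child = init_child_from_parent(["layers.0.w", "layers.1.mixer.gate.weight"], parent)
print(child)  # {'layers.0.w': 1.5, 'layers.1.mixer.gate.weight': 0.0}
```

In a real PyTorch setting the fallback would be a proper initializer (e.g. a fresh tensor) rather than a scalar; the point is that every key gets a defined value.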
…reproducible on CI)
…YAMLs (#1039)

### What does this PR do?

**Type of change:** New tests / Refactoring

Simplifies the puzzletron test infrastructure by:

1. **Removing the `hf_configs/` folder** — HuggingFace configs are now loaded on-the-fly via `AutoConfig.from_pretrained(hf_model_name)` instead of from cached static files.
2. **Removing the `HF_MODEL_CARD_NAMES` mapping** — HF model names (e.g. `meta-llama/Llama-3.1-8B-Instruct`) are passed directly as test parameters.
3. **Replacing the hardcoded VL model check** with `hasattr(config, "text_config") and hasattr(config, "vision_config")` for generic detection.
4. **Unifying ~6k lines of near-identical YAML** into shared base configs with per-model overrides:
   - `validate_model_defaults.yaml`, `validate_solutions_defaults.yaml` — shared validation params
   - `pruning/pruning_defaults.yaml`, `pruning/ffn_pruning_base.yaml`, `pruning/attn_pruning.yaml`, `pruning/hidden_dim_pruning.yaml` — shared pruning bases
   - Per-model dirs now follow HF model card paths (`meta-llama/Llama-3.1-8B-Instruct/`) and contain only model-specific overrides (e.g. just the `layer_descriptor._target_` class)
5. **Removing the `hydra_config_subdir` parameter** from test parametrize — the config path is derived from `hf_model_name` directly.
6. **Removing unused `bypass:` entries** from all per-model main YAMLs.

### Usage

```python
# Test parametrize now uses HF model names directly:
("meta-llama/Llama-3.1-8B-Instruct", "llama", None, False),
```

### Testing

All 8 parametrized test cases in `test_puzzletron.py` pass:
- meta-llama/Llama-3.1-8B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- Qwen/Qwen2.5-7B-Instruct
- Qwen/Qwen3-8B
- Qwen/Qwen3-VL-30B-A3B-Instruct
- mistralai/Mistral-Small-24B-Instruct-2501
- nvidia/NVIDIA-Nemotron-Nano-12B-v2
- nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16

CI Job: https://github.com/NVIDIA/Model-Optimizer/actions/runs/23087216443/job/67065820836

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: N/A (test-only changes)
- If you copied code from any other source, did you follow IP policy in [CONTRIBUTING.md](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md#-copying-code-from-other-sources)?: N/A
- Did you write any new necessary tests?: N/A (refactoring existing tests)
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A

### Additional Information

Hydra packaging notes (non-obvious fixes required):
- Added `# @package _global_` to all per-model main YAMLs — needed when `config_name` contains path separators, otherwise Hydra nests all keys under the org/model package
- Added `@_here_` to sub-defaults inside `pruning/` configs — prevents Hydra from compounding the `pruning` package at each inheritance level (`pruning` → `pruning.pruning` → `pruning.pruning.pruning`)
- Moved `hydra/hydra_logging=disabled` from the YAML defaults list to `overrides=` in `puzzletron.py` — the YAML override syntax broke with nested config paths

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Co-authored-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
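The generic VL detection from point 3 can be sketched as follows (a minimal illustration; `SimpleNamespace` stands in for the real `AutoConfig` objects, which expose `text_config`/`vision_config` attributes on multimodal models):

```python
from types import SimpleNamespace

def is_vision_language(config) -> bool:
    # Generic VL detection: a multimodal HF config exposes both a
    # text sub-config and a vision sub-config, while a text-only
    # config has neither. No hardcoded model-name list needed.
    return hasattr(config, "text_config") and hasattr(config, "vision_config")

vl_config = SimpleNamespace(text_config=object(), vision_config=object())
text_config = SimpleNamespace()  # plain LLM config: no sub-configs
print(is_vision_language(vl_config), is_vision_language(text_config))  # True False
```

This duck-typed check is what lets new VL models work without touching the test infrastructure.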
Force-pushed from 5b3d97d to 73eb9a8
What does this PR do?
Merges dkorzekwa/any_model_other_models into dkorzekwa/mip_and_realize_models; this PR is for review only. Ultimately, dkorzekwa/any_model_other_models should be merged into feature/puzzletron once dkorzekwa/mip_and_realize_models has been merged there.