gpt-oss 20b support #889
Conversation
Review threads on:

- `modelopt/torch/puzzletron/anymodel/models/gpt_oss_20b/gpt_oss_pruned_to_mxfp4.py` (resolved)
- `examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b.yaml` (outdated, resolved)
- `...es/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml` (outdated, resolved)
- `modelopt/torch/puzzletron/anymodel/models/gpt_oss_20b/gpt_oss_pruned_to_mxfp4.py` (outdated, resolved)
Signed-off-by: mchochowski <mchochowski@nvidia.com>
…uator (NVIDIA#894)

This PR adds Nemo Evaluator support to the AnyModel branch. It includes documentation and a deployment script that allow for evaluation of AnyModel Puzzletron checkpoints with Nemo Evaluator. We assume development on a GPU node, following the current tutorial style, so we don't rely on Slurm-based deployment/evaluation, but instead use direct evaluation via `eval-factory run_eval`.

Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: mchochowski <mchochowski@nvidia.com>
## What does this PR do?

**Overview:**

- Update the AnyModel Puzzletron tutorial to use lm-eval. We add a script that monkey-patches lm-eval to use the patched AnyModel model loading.
- No need for running Ray deployments or replacing the NeMo Export-Deploy deployment script with a patched version.
- Moved instructions for using NeMo Evaluator to an alternative README file.

Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: mchochowski <mchochowski@nvidia.com>
## What does this PR do?

**Overview:** Updated the license of `examples/puzzletron/evaluation/lm_eval_anymodel.py` to match that of the reference `examples/llm_eval/lm_eval_hf.py`.

Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: mchochowski <mchochowski@nvidia.com>
Signed-off-by: mchochowski <mchochowski@nvidia.com>
…ml config Signed-off-by: mchochowski <mchochowski@nvidia.com>
Signed-off-by: mchochowski <mchochowski@nvidia.com>
Force-pushed from e07dbaa to ee182b5 (compare)
Signed-off-by: mchochowski <mchochowski@nvidia.com>
kevalmorabia97
left a comment
Left some comments. Also seeing that pre-commit formatting was not applied. Please run `pre-commit run --all-files`.
Why do we need this env variable and to add the workdir to sys.path?
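For context on the question above, this is the general pattern being asked about: reading a working directory from an environment variable and prepending it to `sys.path` so that local modules become importable. The variable name `PUZZLETRON_WORKDIR` here is purely illustrative, not the one used in the PR:

```python
# Hypothetical sketch of env-var-driven sys.path manipulation.
# Scripts launched from outside the workdir can't import its local
# modules unless the directory is added to the import search path.
import os
import sys

workdir = os.environ.get("PUZZLETRON_WORKDIR", os.getcwd())
if workdir not in sys.path:
    sys.path.insert(0, workdir)  # prepend so local modules win lookup

print(workdir in sys.path)  # True
```

The reviewer's point stands: if the script is always launched from the workdir, or the package is properly installed, this mutation is unnecessary.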
Why do we need to broadcast_list twice instead of reusing output of first call for 2nd one?
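The suggestion above can be sketched abstractly. A stub stands in for a distributed collective such as `torch.distributed.broadcast_object_list` so the example runs single-process; the point is simply that one broadcast result can be shared by both consumers, halving the collective calls:

```python
# Sketch of the review suggestion: broadcast once and reuse the result,
# instead of issuing the same broadcast twice. The stub counts calls so
# the saving is observable.
calls = {"n": 0}

def broadcast_list(objs):
    calls["n"] += 1      # count how many collective calls we issue
    return list(objs)    # stub: rank 0 just returns its own objects

payload = ["prompt-a", "prompt-b"]

# Before: two separate broadcasts of the same payload.
#   first = broadcast_list(payload)
#   second = broadcast_list(payload)

# After: broadcast once, reuse the output for both consumers.
shared = broadcast_list(payload)
first, second = shared, shared

print(calls["n"], first is second)  # 1 True
```

In a real distributed setting each avoided broadcast also saves a synchronization point across ranks, not just a function call.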
This is actually nemo-deploy code with a patch; we didn't want to touch the internals, only update the model loading.
Signed-off-by: mchochowski <mchochowski@nvidia.com>
Signed-off-by: chochowski <Marcin.Chochowski@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Signed-off-by: chochowski <Marcin.Chochowski@gmail.com>
## What does this PR do?

Adds gpt-oss-20b support for puzzle any-model pruning.

**Type of change:** new feature

**Overview:** Adds descriptor, converter, and YAML configuration files for expert removal. Introduces slight changes on conversion to account for the mxfp4-quantized checkpoint of gpt-oss.

## Usage

```python
# Add a code snippet demonstrating how to use this
```

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No

## Additional Information

---------

Signed-off-by: mchochowski <mchochowski@nvidia.com>
Signed-off-by: jrausch <jrausch@nvidia.com>
Signed-off-by: chochowski <Marcin.Chochowski@gmail.com>
Co-authored-by: J Rausch <38429553+j-rausch@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
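To make the "expert removal" idea in the overview concrete, here is a minimal sketch of pruning experts from a mixture-of-experts layer in plain PyTorch. The `TinyMoE` module and `remove_experts` helper are illustrative only; they are not the Puzzletron descriptor/converter this PR adds, and real expert removal for an mxfp4-quantized checkpoint also has to rewrite the quantized weight blocks:

```python
# Illustrative expert removal: keep a subset of experts and shrink the
# router so its output dimension matches the surviving experts.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, num_experts=4, dim=8):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)  # one logit per expert

def remove_experts(moe, keep):
    # Drop pruned experts; order of `keep` defines the new expert indices.
    moe.experts = nn.ModuleList(moe.experts[i] for i in keep)
    # Rebuild the router with fewer outputs, copying the kept logit rows.
    new_router = nn.Linear(moe.router.in_features, len(keep))
    with torch.no_grad():
        new_router.weight.copy_(moe.router.weight[list(keep)])
        new_router.bias.copy_(moe.router.bias[list(keep)])
    moe.router = new_router
    return moe

moe = remove_experts(TinyMoE(), keep=[0, 2])
print(len(moe.experts), moe.router.out_features)  # 2 2
```

Slicing the router rows (rather than reinitializing them) preserves the relative routing preferences among the surviving experts, which is typically what a memory-motivated pruning pass wants.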