cp: ci: add NMP customizer contract test configs (1712) into r0.4.0 #1858

Merged
akoumpa merged 1 commit into r0.4.0 from cherry-pick-1712-r0.4.0
Apr 15, 2026

Conversation

@svcnvidia-nemo-ci
Contributor

beep boop [🤖]: Hi @adil-a 👋,

we've cherry picked #1712 into r0.4.0 for you! 🚀

Please review and approve this cherry pick at your convenience!

* ci: add NMP customizer contract test configs

Add 13 recipe YAMLs from the NMP customizer service's
compile_automodel_config() output. These serve as contract tests —
if any stop working with finetune.py, it means a breaking change
was introduced that affects the customizer integration.

Configs cover 4 model families across SFT, PEFT, chat template,
and sequence packing axes:
- GPT-OSS 20B (MoE): full_sft, chat, peft, peft+packing
- Llama 3.1 8B: full_sft with TP=2
- Llama 3.2 1B: full_sft, chat, peft, peft+packing
- Nemotron Nano V3 (MoE): full_sft, chat, peft, peft+packing

Sample datasets will be placed on the CI cluster; data paths
overridden via CLI args at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>
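
The coverage described in the commit above can be written down as a small matrix. This is only an illustrative sketch: the variant labels follow the commit message and the directory names follow the later reorganization commit, but the `customizer_<variant>.yaml` file names are hypothetical and may not match the real recipes.

```python
# Coverage matrix taken from the commit message:
# model-family directory -> training variants covered by a contract config.
CUSTOMIZER_MATRIX = {
    "gpt_oss": ["full_sft", "chat", "peft", "peft_packing"],
    "llama3_1": ["full_sft_tp2"],
    "llama3_2": ["full_sft", "chat", "peft", "peft_packing"],
    "nemotron": ["full_sft", "chat", "peft", "peft_packing"],
}


def expected_configs(matrix):
    """Return hypothetical relative paths, one per (family, variant) pair."""
    return [
        f"{family}/customizer_{variant}.yaml"  # naming scheme is an assumption
        for family, variants in matrix.items()
        for variant in variants
    ]


configs = expected_configs(CUSTOMIZER_MATRIX)
# 4 + 1 + 4 + 4 variants, matching the 13 recipes named in the commit.
assert len(configs) == 13
```

A contract test would then run each listed recipe through finetune.py and fail if any of them stops working.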

* ci: enable checkpoint robustness testing for customizer configs

- Add ci.checkpoint_robustness sections to all 13 customizer YAMLs
  with model-specific KL thresholds matching existing configs
- Update finetune_launcher.sh to detect customizer/ configs and
  override dataset paths for both finetune and robustness phases
- Register dataset.path_or_dataset_id in conftest.py so pytest
  accepts the CLI override without aborting collection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>
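
Registering a CLI override with pytest is normally done through the `pytest_addoption` hook. The sketch below shows the general shape under stated assumptions: the flag spelling `--dataset.path_or_dataset_id` and the help text are guesses based on the commit message, not a copy of the repository's conftest.py.

```python
# conftest.py (sketch): register the dotted override so pytest does not
# reject it as an unknown argument before tests are collected.
def pytest_addoption(parser):
    parser.addoption(
        "--dataset.path_or_dataset_id",  # assumed flag spelling
        action="store",
        default=None,
        help="Override the dataset path used by customizer recipe tests.",
    )
```

Without a registration like this, pytest treats the extra argument as unrecognized and exits with a usage error, which is the collection abort the commit message refers to.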

* ci: remove redundant tp_size override from llama 3.1 robustness config

The base config already sets tp_size: 2 in the distributed section,
so the checkpoint_robustness override was redundant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>

* ci: use NEMO_CI_PATH for customizer dataset paths

NEMO_CI_PATH is the correct env var on eos CI
(/lustre/fsw/coreai_dlalgo_ci/automodel_ci), not TEST_DATA_DIR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>

* ci: update customizer dataset path to datasets/customizer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>

* ci: fix customizer dataset path to include sample-datasets subdir

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>

* ci: reorganize customizer configs into model directories

Move customizer configs from flat examples/llm_finetune/customizer/
into their respective model-family directories with customizer_ prefix,
matching the established llm_finetune directory pattern.

- gpt_oss: 4 configs
- llama3_1: 1 config
- llama3_2: 4 configs
- nemotron: 4 configs

Update nightly_recipes.yml to integrate customizer entries into existing
model sections. Update finetune_launcher.sh glob from *customizer/* to
*customizer_* for filename-based detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>
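
The glob change in the commit above matters because the reorganized files no longer sit under a `customizer/` directory. The sketch below illustrates the difference with Python's `fnmatch`, which, like a shell `case` pattern, lets `*` match `/`. The two example paths are hypothetical, and finetune_launcher.sh itself is a shell script, so this is an approximation of its check rather than its actual code.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "*customizer/*"  # directory-based detection (before the move)
NEW_PATTERN = "*customizer_*"  # filename-prefix detection (after the move)


def is_customizer_config(path, pattern=NEW_PATTERN):
    """Return True if the config path matches the given detection pattern."""
    return fnmatchcase(path, pattern)


# Hypothetical paths illustrating the flat layout and the per-model layout.
old_layout = "examples/llm_finetune/customizer/llama3_2_1b_peft.yaml"
new_layout = "examples/llm_finetune/llama3_2/customizer_llama3_2_1b_peft.yaml"

assert is_customizer_config(old_layout, OLD_PATTERN)
assert not is_customizer_config(new_layout, OLD_PATTERN)  # old glob misses moved files
assert is_customizer_config(new_layout, NEW_PATTERN)
```

Had the launcher kept the old pattern, the moved configs would have run without the dataset path override.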

---------

Signed-off-by: adil-a <adil.asif2000@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@svcnvidia-nemo-ci
Contributor Author

/ok to test f9762d3

@copy-pr-bot

copy-pr-bot Bot commented Apr 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@akoumpa akoumpa merged commit 1baf6c2 into r0.4.0 Apr 15, 2026
35 checks passed
@akoumpa akoumpa deleted the cherry-pick-1712-r0.4.0 branch April 15, 2026 06:41

Labels

cherry-pick Run CICD Trigger Testing CICD

3 participants