test: add vLLM deployment tests for checkpoint robustness #1656
Merged
vLLM deployment verification tests that load consolidated checkpoints and compare greedy output token-for-token against HuggingFace. Supports both full comparison and smoke test mode. Depends on checkpoint robustness PR #1606.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adil-a <adil.asif2000@hotmail.com>
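The token-for-token comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `first_mismatch` and `check_generation` are hypothetical names, and the smoke-mode behavior (only requiring that vLLM produced some tokens) is an assumption about what a smoke test would check. The token ID lists are assumed to come from greedy decoding on both sides, for example HuggingFace `generate(..., do_sample=False)` and vLLM `SamplingParams(temperature=0)`.

```python
from typing import Optional, Sequence


def first_mismatch(hf_ids: Sequence[int], vllm_ids: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token, or None if identical."""
    for i, (a, b) in enumerate(zip(hf_ids, vllm_ids)):
        if a != b:
            return i
    # Same shared prefix but different lengths: they diverge where the shorter one ends.
    if len(hf_ids) != len(vllm_ids):
        return min(len(hf_ids), len(vllm_ids))
    return None


def check_generation(
    hf_ids: Sequence[int], vllm_ids: Sequence[int], smoke_test: bool = False
) -> None:
    """Hypothetical check. Smoke mode (assumed semantics): vLLM must produce
    at least one token. Full mode: outputs must match token-for-token."""
    if smoke_test:
        assert len(vllm_ids) > 0, "vLLM produced no tokens"
        return
    idx = first_mismatch(hf_ids, vllm_ids)
    assert idx is None, (
        f"outputs diverge at token {idx}: "
        f"HF={list(hf_ids[idx:idx + 1])} vLLM={list(vllm_ids[idx:idx + 1])}"
    )
```

Reporting the first divergent index, rather than only pass or fail, makes it easier to tell a checkpoint-loading problem (mismatch at or near token 0) from numerical drift late in a long generation.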
Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
@adil-a I refactored this to be similar to what we have for checkpoint robustness; have a look when you get a chance. Ministral-3-3B-Instruct-2512, Llama-3.1-Nemotron-Nano-8B-v1, and Qwen3-30B-A3B aren't recipes we have in examples. Are these needed?
thomasdhc reviewed Apr 3, 2026
TEST_FOLDER = "checkpoint_robustness"


class TestVLLMDeploy:
I've updated the checkpoints to be the ones from the finetune job. Should they come from the robustness job instead?
After we make that update, I think we can delete this file.
@adil-a can we merge this?
@akoumpa Not yet; this pathway has not been tested.
thomasdhc approved these changes Apr 8, 2026
/ok to test 26054f2
akoumpa approved these changes Apr 9, 2026
svcnvidia-nemo-ci pushed a commit that referenced this pull request Apr 9, 2026
* test: add vLLM deployment tests for checkpoint robustness

  vLLM deployment verification tests that load consolidated checkpoints and compare greedy output token-for-token against HuggingFace. Supports both full comparison and smoke test mode. Depends on checkpoint robustness PR #1606.

* Create deploy-test dependency group
* Revert deploy test group
* Move configs to recipes and create vllm_launcher
* Setup deploy environment
* Remove duplicate keys
* Add scope to vllm deploy test
* Drop needs dependency
* Use finetune test name for ckpt dir
* Make ckpt checking more robust
* Pass arguments correctly
* Update arguments
* Remove unused file

---------

Signed-off-by: adil-a <adil.asif2000@hotmail.com>
Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
akoumpa added a commit that referenced this pull request Apr 9, 2026
test: add vLLM deployment tests for checkpoint robustness (#1656)

* test: add vLLM deployment tests for checkpoint robustness

  vLLM deployment verification tests that load consolidated checkpoints and compare greedy output token-for-token against HuggingFace. Supports both full comparison and smoke test mode. Depends on checkpoint robustness PR #1606.

* Create deploy-test dependency group
* Revert deploy test group
* Move configs to recipes and create vllm_launcher
* Setup deploy environment
* Remove duplicate keys
* Add scope to vllm deploy test
* Drop needs dependency
* Use finetune test name for ckpt dir
* Make ckpt checking more robust
* Pass arguments correctly
* Update arguments
* Remove unused file

---------

Signed-off-by: adil-a <adil.asif2000@hotmail.com>
Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Adil <47084919+adil-a@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
akoumpa added a commit that referenced this pull request Apr 10, 2026
* test: add vLLM deployment tests for checkpoint robustness

  vLLM deployment verification tests that load consolidated checkpoints and compare greedy output token-for-token against HuggingFace. Supports both full comparison and smoke test mode. Depends on checkpoint robustness PR #1606.

* Create deploy-test dependency group
* Revert deploy test group
* Move configs to recipes and create vllm_launcher
* Setup deploy environment
* Remove duplicate keys
* Add scope to vllm deploy test
* Drop needs dependency
* Use finetune test name for ckpt dir
* Make ckpt checking more robust
* Pass arguments correctly
* Update arguments
* Remove unused file

---------

Signed-off-by: adil-a <adil.asif2000@hotmail.com>
Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Summary
--vllm_smoke_test)

Test plan
L2_vLLM_Deploy_*.sh scripts after checkpoint robustness tests complete

🤖 Generated with Claude Code