
Fix spec dec example tests#1183

Merged
kevalmorabia97 merged 4 commits into main from kmorabia/spec-dec-tests
Apr 7, 2026

Conversation

Collaborator

@kevalmorabia97 kevalmorabia97 commented Apr 6, 2026

What does this PR do?

Type of change: Test fix

  • Fix tests/examples/speculative_decoding, which were previously silently skipped
  • Avoid pulling nemotron-post-training-dataset-v2 in tests to reduce the chance of HF loading timeouts in CI/CD
  • Make slow and redundant tests manual-only to speed up CI/CD

Testing

  • Tests passing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • Chores

    • Removed git-LFS install step from CI and deleted an automated branch-cleanup workflow
    • Trimmed example environment dependencies and relaxed transformers compatibility; added an optional tokenization dependency
  • Tests

    • Switched tests to generate datasets dynamically and improved fixture handling
    • Standardized PTQ test parameters (explicit calibration dataset) and refined GPU/test selection
  • Bug Fixes

    • Improved device-awareness and numeric handling in speculative decoding attention paths

@coderabbitai
Contributor

coderabbitai Bot commented Apr 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 52a90650-75eb-418c-ac8b-96b9e3113c48

📥 Commits

Reviewing files that changed from the base of the PR and between 361b4a4 and 0c7f0ed.

📒 Files selected for processing (1)
  • modelopt/torch/speculative/plugins/transformers.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/torch/speculative/plugins/transformers.py

📝 Walkthrough


Removed git-lfs from CI, adjusted example dependency lists, relaxed a transformers constraint, added tiktoken to hf extras, updated speculative/transformers plugin rope/cache/dtype handling, refactored PTQ test utilities and tests to use centralized command + explicit calibration dataset, and made speculative-decoding tests generate datasets at runtime.

Changes

  • CI workflow change: .github/workflows/_example_tests_runner.yml
    Removed installation of git-lfs from the example tests runner.
  • Workflow removal: .github/workflows/delete_outdated_pr_branches.yml
    Deleted the workflow that pruned remote pull-request/<num> branches.
  • Example requirements: examples/llm_eval/requirements.txt, examples/llm_ptq/requirements.txt, examples/speculative_decoding/requirements.txt
    Removed tiktoken from llm_eval and llm_ptq, removed torchvision from llm_eval, and changed transformers==5.0.0rc1 to transformers<5.4 in speculative_decoding.
  • Project extras: pyproject.toml
    Added tiktoken to the hf optional-dependencies extra.
  • Speculative/Transformers plugin: modelopt/torch/speculative/plugins/transformers.py
    Replaced the helper cache factory with direct DynamicCache(config=...); extended _maybe_init_rope(device=...) and updated callers to pass the device; rebuilt eagle_config from a mutable arch config and injected a top-level rope_theta into rope_scaling when missing; computes the TTT attention mask using an effective dtype resolved from the base config or layer weights.
  • PTQ test utilities: tests/_test_utils/examples/llm_ptq_utils.py
    Replaced the local subprocess invocation with run_llm_ptq_command(...); removed trust_remote_code flag handling; added calib_dataset: str = "cnn_dailymail" to PTQCommand; stopped forwarding max_sm; extracts quant separately; removed unreachable returns after pytest skips.
  • PTQ test update: tests/examples/llm_ptq/test_llm_ptq.py
    Now passes calib_dataset="peoples_speech" when constructing the Whisper PTQCommand.
  • Speculative decoding tests: tests/examples/speculative_decoding/conftest.py, tests/examples/speculative_decoding/test_eagle.py
    The fixture now generates a temporary dataset via YAML + make_dataset.py instead of using a static daring-anteater.jsonl; test_eagle.py imports AutoConfig directly, adds a num_gpus param to test_llama_eagle3 and uses it for GPU-skip logic, marks two remote-model params as manual, and uses cfg.text_config when present.
  • Export test tweak: tests/gpu/torch/export/test_unified_hf_export_and_check_safetensors.py
    Added an explicit dataset="cnn_dailymail" argument to the hf_ptq.py invocation.
  • Misc. tests: tests/...
    Various test parameter forwarding and dataset-handling adjustments to align with the centralized command and dataset-generation changes.
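The calib_dataset addition to PTQCommand might look like the sketch below; the fields other than calib_dataset and the argv layout are illustrative, not the actual test utility:

```python
from dataclasses import dataclass, field


@dataclass
class PTQCommand:
    model: str
    quant: str
    # Explicit calibration dataset, defaulting to cnn_dailymail so tests
    # no longer pull nemotron-post-training-dataset-v2 from the Hub.
    calib_dataset: str = "cnn_dailymail"
    extra_args: list[str] = field(default_factory=list)

    def to_argv(self) -> list[str]:
        """Build an hf_ptq.py-style argument list (flag names are assumed)."""
        return [
            "--model", self.model,
            "--quant", self.quant,
            "--dataset", self.calib_dataset,
            *self.extra_args,
        ]
```

A caller that needs audio calibration data, like the Whisper test, would then override the default with calib_dataset="peoples_speech".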

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 71.43%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (3)
  • Description Check ✅: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅: The title 'Fix spec dec example tests' directly aligns with the main change objective: fixing tests in tests/examples/speculative_decoding that were previously skipped.
  • Security Anti-Patterns ✅: The PR complies with all security coding practices outlined in SECURITY.md. Security-sensitive patterns appear only in test files, which are exempt from these rules. The new tiktoken dependency is MIT-licensed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/spec-dec-tests branch from 2a82a07 to c3526bb on April 6, 2026 19:31
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@github-actions
Contributor

github-actions Bot commented Apr 6, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-07 05:24 UTC

Comment thread modelopt/torch/speculative/plugins/transformers.py
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Comment thread modelopt/torch/speculative/plugins/transformers.py Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
pyproject.toml (1)

85-85: Consider adding a version constraint for tiktoken.

While several dependencies in the hf group are unpinned (e.g., nltk, wonderwords), many critical ones have version constraints (e.g., transformers>=4.56,<5.0, peft>=0.17.0, sentencepiece>=0.2.1). Adding a minimum version constraint for tiktoken would help ensure compatibility and prevent potential issues with older versions.

📌 Example version constraint
-    "tiktoken",
+    "tiktoken>=0.5.0",

Note: The specific version should be chosen based on the minimum version required by your codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 85, The dependency entry "tiktoken" in the hf extras
of pyproject.toml is unpinned; update the hf extras to include a minimum version
constraint for tiktoken (for example "tiktoken>=X.Y.Z" or a range like
"tiktoken>=X.Y.Z,<NextMajor") so your code is protected from incompatible older
releases; modify the "tiktoken" entry in the hf extras list to the chosen
constraint string and run dependency resolution to verify compatibility with
existing constraints like transformers and peft.
tests/examples/speculative_decoding/test_eagle.py (2)

284-284: Same trust_remote_code=True pattern - consider documenting.

Same observation as line 234. An inline comment would clarify why this is needed for the Kimi checkpoint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/speculative_decoding/test_eagle.py` at line 284, The call to
AutoConfig.from_pretrained(checkpoint_dir, trust_remote_code=True) needs an
inline comment explaining why trust_remote_code=True is required for the Kimi
checkpoint; update the invocation site (the AutoConfig.from_pretrained call
where checkpoint_dir is used) to add a short comment like “# required for Kimi
checkpoint because model code is provided in the checkpoint and must be trusted”
(or similar) so readers understand the reason.

234-236: Hardcoded trust_remote_code=True - acceptable for test code but consider documenting.

Per coding guidelines, trust_remote_code=True should not be hardcoded. However, this is test code (excluded from Bandit checks) testing specific remote models (Kimi, MiniMax) that require remote code execution.

Consider adding an inline comment explaining why this is necessary:

Suggested documentation
+    # trust_remote_code=True required for moonshotai/MiniMaxAI models that use custom modeling code
     cfg = AutoConfig.from_pretrained(model_path, trust_remote_code=True)

As per coding guidelines: "Do not hardcode trust_remote_code=True when loading Hugging Face Transformers models."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/speculative_decoding/test_eagle.py` around lines 234 - 236,
The test currently hardcodes trust_remote_code=True when calling
AutoConfig.from_pretrained(model_path, trust_remote_code=True) (assigning to
cfg) which violates the general guideline; update the test to keep
trust_remote_code=True but add a clear inline comment adjacent to the
AutoConfig.from_pretrained call explaining that this is test-only, that the
tests exercise remote-models (e.g., Kimi, MiniMax) which require remote code
execution, and that this file is excluded from Bandit checks—so leave the flag
as-is for these specific models and do not change runtime behavior elsewhere.
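One way to keep trust_remote_code=True both documented and contained, in the spirit of the comment above, is to gate it behind an explicit allowlist. The helper and the model IDs below are hypothetical illustrations, not code from this PR:

```python
# Models whose checkpoints ship custom modeling code that the tests must
# execute; anything not listed here loads with the safe default (False).
# These IDs are hypothetical examples, not the PR's actual test parameters.
REMOTE_CODE_ALLOWLIST = {
    "example-org/kimi-style-model",
    "example-org/minimax-style-model",
}


def needs_remote_code(model_id: str) -> bool:
    """Return True only for vetted checkpoints that require remote code."""
    return model_id in REMOTE_CODE_ALLOWLIST


def load_config(model_id: str):
    """Load an AutoConfig, enabling trust_remote_code only when vetted."""
    from transformers import AutoConfig  # deferred so the helper imports without transformers

    return AutoConfig.from_pretrained(
        model_id, trust_remote_code=needs_remote_code(model_id)
    )
```

This keeps the justification in one place instead of scattering bare trust_remote_code=True calls across test files.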
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/speculative/plugins/transformers.py`:
- Around line 750-753: The code assumes self._base_llm_config has attribute
dtype but standard HF configs use torch_dtype; update the dtype selection logic
used before computing dtypemin so it first checks getattr(self._base_llm_config,
"dtype", None) then getattr(self._base_llm_config, "torch_dtype", None) and only
then falls back to self.eagle_module.layers[0].input_layernorm.weight.dtype;
ensure the variable named dtype is set from that prioritized lookup so
torch.finfo(dtype).min remains valid for dtypemin computation.

In `@pyproject.toml`:
- Line 85: Remove the "tiktoken" dependency entry from pyproject.toml (the
current string "tiktoken") and ensure the dependency remains declared only in
the example-specific requirements files (e.g.,
examples/specdec_bench/requirements.txt and examples/llm_eval/requirements.txt);
update those requirements files if missing and then regenerate any
lock/installed environment artifacts as needed (e.g., update poetry lock or CI
deps) so core package modelopt has no direct tiktoken dependency.
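The dtype-resolution fix described in the first inline comment above can be sketched as a small helper. The function name is an assumption, and real HF configs may store torch_dtype as a string, which the actual plugin would need to normalize:

```python
import torch


def resolve_effective_dtype(base_config, fallback_weight: torch.Tensor) -> torch.dtype:
    """Prefer config.dtype, then config.torch_dtype, then a real weight's dtype."""
    dtype = getattr(base_config, "dtype", None)
    if dtype is None:
        dtype = getattr(base_config, "torch_dtype", None)
    if dtype is None:
        # e.g. self.eagle_module.layers[0].input_layernorm.weight.dtype
        dtype = fallback_weight.dtype
    # torch.finfo(dtype).min then stays valid for building the TTT attention mask
    return dtype
```

The mask fill value would then be computed as torch.finfo(resolve_effective_dtype(cfg, weight)).min rather than assuming a dtype attribute exists.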


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1e1939be-54e5-460a-a4cd-e7b373cc56ad

📥 Commits

Reviewing files that changed from the base of the PR and between 4a5ef01 and c3526bb.

📒 Files selected for processing (11)
  • .github/workflows/_example_tests_runner.yml
  • examples/llm_eval/requirements.txt
  • examples/llm_ptq/requirements.txt
  • examples/speculative_decoding/requirements.txt
  • modelopt/torch/speculative/plugins/transformers.py
  • pyproject.toml
  • tests/_test_utils/examples/llm_ptq_utils.py
  • tests/examples/llm_ptq/test_llm_ptq.py
  • tests/examples/speculative_decoding/conftest.py
  • tests/examples/speculative_decoding/test_eagle.py
  • tests/gpu/torch/export/test_unified_hf_export_and_check_safetensors.py
💤 Files with no reviewable changes (3)
  • examples/llm_ptq/requirements.txt
  • examples/llm_eval/requirements.txt
  • .github/workflows/_example_tests_runner.yml

Comment thread modelopt/torch/speculative/plugins/transformers.py
Comment thread pyproject.toml
Comment thread modelopt/torch/speculative/plugins/transformers.py
@kevalmorabia97 kevalmorabia97 requested a review from h-guo18 April 6, 2026 19:49
Collaborator

@shengliangxu shengliangxu left a comment


LGTM

@codecov

codecov Bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.25%. Comparing base (df80a0f) to head (0c7f0ed).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1183      +/-   ##
==========================================
+ Coverage   74.77%   76.25%   +1.47%     
==========================================
  Files         351      351              
  Lines       40072    41891    +1819     
==========================================
+ Hits        29964    31943    +1979     
+ Misses      10108     9948     -160     
Flag       Coverage Δ
examples   45.20% <100.00%> (+4.98%) ⬆️
gpu        56.93% <6.25%> (-0.17%) ⬇️
unit       54.83% <75.00%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown.


Comment thread modelopt/torch/speculative/plugins/transformers.py Outdated
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/spec-dec-tests branch from 361b4a4 to 0c7f0ed on April 7, 2026 03:22
@kevalmorabia97 kevalmorabia97 requested a review from h-guo18 April 7, 2026 03:22
@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) April 7, 2026 04:31
@kevalmorabia97 kevalmorabia97 merged commit 80d2f02 into main Apr 7, 2026
45 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/spec-dec-tests branch April 7, 2026 05:23