[None][chore] Update flashinfer-python from 0.6.9 to 0.6.10 by yihwang-nv · Pull Request #13746 · NVIDIA/TensorRT-LLM

yihwang-nv · 2026-05-05T02:50:56Z

Summary

Bump flashinfer-python from 0.6.9 to 0.6.10 (latest stable, released 2026-05-04).
Update version pins in requirements.txt, security_scanning/pyproject.toml, security_scanning/poetry.lock, and ATTRIBUTIONS-Python.md.
Replace string version comparison with packaging.version.Version in tensorrt_llm/_torch/speculative/interface.py so the >= 0.6.4 gate evaluates correctly. A lexicographic comparison evaluates "0.6.10" >= "0.6.4" as False, which would have silently disabled flashinfer in SpecWorkerBase after the bump.
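
The ordering problem in the last item can be reproduced in isolation. The snippet below is an illustrative sketch, not the code in interface.py; the variable names are made up for the example, and it assumes the `packaging` package is installed.

```python
from packaging.version import Version

installed = "0.6.10"  # version string after this bump
minimum = "0.6.4"     # threshold used by the gate

# Plain string comparison goes character by character. The strings first
# differ at "1" vs "4", and "1" < "4", so "0.6.10" sorts before "0.6.4".
string_result = installed >= minimum

# Version parses each release component as an integer, so 10 > 4.
version_result = Version(installed) >= Version(minimum)

# The previous pin happened to pass the string check ("9" > "4"),
# which is why the problem only surfaces with a two-digit patch number.
previous_pin_string_result = "0.6.9" >= minimum

print(string_result, version_result, previous_pin_string_result)
```

Comparing parsed versions rather than raw strings keeps the gate correct for any future release whose components grow past one digit.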

Test plan

pip install -r requirements.txt installs successfully
pytest tests/unittest/_torch/flashinfer/ -v
pytest tests/unittest/_torch/attention/test_flashinfer_attention.py -v
CI pre-merge passes

Summary by CodeRabbit

Chores
- Updated flashinfer-python dependency to version 0.6.10 across all project configurations.
Bug Fixes
- Improved version compatibility checking to use proper semantic versioning comparison for better accuracy.

yihwang-nv · 2026-05-05T02:51:32Z

/bot run --disable-fail-fast

coderabbitai · 2026-05-05T02:53:15Z

Walkthrough

This PR bumps flashinfer-python from version 0.6.9 to 0.6.10 across dependency manifests and updates the FlashInfer availability check to use semantic version comparison via packaging.version.Version instead of lexicographic string comparison.

Changes

FlashInfer Dependency Update

| Layer / File(s) | Summary |
| --- | --- |
| Dependency Manifest Updates: `requirements.txt`, `security_scanning/pyproject.toml` | FlashInfer pinned version incremented from 0.6.9 to 0.6.10 in project dependency specifications. |
| Version Comparison Logic: `tensorrt_llm/_torch/speculative/interface.py` | Import `Version` from `packaging.version` and update the FlashInfer availability check to use semantic version comparison (`Version(flashinfer.__version__) >= Version("0.6.4")`) instead of string comparison. |
| Documentation: `ATTRIBUTIONS-Python.md` | Attribution entry for `flashinfer-python` updated from 0.6.9 to 0.6.10. |
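
An availability gate of the kind described in the table can be sketched as a small helper. This is a hedged illustration under stated assumptions: `meets_minimum` and `MIN_FLASHINFER` are hypothetical names, not identifiers from interface.py, and treating a missing or unparseable version as "unavailable" is a design choice of this sketch rather than something the PR states.

```python
from typing import Optional

from packaging.version import InvalidVersion, Version

# Threshold quoted in the PR; the constant name is hypothetical.
MIN_FLASHINFER = Version("0.6.4")


def meets_minimum(installed: Optional[str],
                  minimum: Version = MIN_FLASHINFER) -> bool:
    """Hypothetical helper: True if `installed` parses and is >= `minimum`."""
    if installed is None:
        # Assumption: None stands for "package could not be imported".
        return False
    try:
        return Version(installed) >= minimum
    except InvalidVersion:
        # Assumption: an unparseable version string disables the feature.
        return False
```

In this sketch, `meets_minimum("0.6.10")` and `meets_minimum("0.6.10+cu128")` both pass the gate, while `meets_minimum("0.6.3")`, `meets_minimum(None)`, and `meets_minimum("not-a-version")` do not.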

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#13631: Both PRs bump flashinfer-python and update the same dependency and attribution files.

Suggested reviewers

wenmingw
hyukn

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: updating flashinfer-python from 0.6.9 to 0.6.10, and follows the repository's naming convention with [None][chore] prefix. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description provides a clear summary of changes, specific version bump details, explanation of the critical bug fix (lexicographic string comparison issue), and a comprehensive test plan with checkmarks indicating verification. |


tensorrt-cicd · 2026-05-05T02:57:32Z

PR_Github #46727 [ run ] triggered by Bot. Commit: d199a79 Link to invocation

Bump flashinfer-python dependency to the latest stable release.
- Update version pins in requirements.txt, security_scanning/pyproject.toml, security_scanning/poetry.lock, and ATTRIBUTIONS-Python.md.
- Replace string version comparison with packaging.version.Version in speculative/interface.py so the >= 0.6.4 gate evaluates correctly for versions like 0.6.10 (lexicographic compare would otherwise return False).

Signed-off-by: Yihan Wang <yihwang@nvidia.com>

yihwang-nv · 2026-05-05T07:30:01Z

/bot run

yihwang-nv · 2026-05-05T07:30:46Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-05T07:36:52Z

PR_Github #46769 [ run ] triggered by Bot. Commit: da4929c Link to invocation

tensorrt-cicd · 2026-05-05T07:37:14Z

PR_Github #46770 [ run ] triggered by Bot. Commit: da4929c Link to invocation

tensorrt-cicd · 2026-05-05T07:37:16Z

PR_Github #46769 [ run ] completed with state ABORTED. Commit: da4929c

Link to invocation

tensorrt-cicd · 2026-05-05T07:37:58Z

PR_Github #46727 [ run ] completed with state ABORTED. Commit: d199a79

Link to invocation

tensorrt-cicd · 2026-05-05T12:36:04Z

PR_Github #46770 [ run ] completed with state SUCCESS. Commit: da4929c
/LLM/main/L0_MergeRequest_PR pipeline #36793 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

juney-nvidia

Approved from the perspective of spec decoding related changes.

yihwang-nv requested review from a team as code owners May 5, 2026 02:50

yihwang-nv requested a review from cascade812 May 5, 2026 02:50

github-actions Bot assigned yihwang-nv May 5, 2026

yihwang-nv requested a review from wenmingw May 5, 2026 02:51

wenmingw approved these changes May 5, 2026


yihwang-nv force-pushed the yihwang-nv/update_flashinfer_0.6.10 branch from d199a79 to da4929c Compare May 5, 2026 07:29

MartinMarciniszyn approved these changes May 5, 2026


yihwang-nv enabled auto-merge (squash) May 5, 2026 16:15

juney-nvidia approved these changes May 6, 2026


yihwang-nv merged commit 0a64622 into NVIDIA:main May 6, 2026
8 of 9 checks passed

5 participants
