release: v0.10 by AlpinDale · Pull Request #1549 · dphnAI/sonar

AlpinDale · 2025-11-01T00:30:52Z

:)

gemini-code-assist · 2025-11-01T00:33:05Z

Summary of Changes

Hello @AlpinDale, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request, titled 'release: v0.10', represents a substantial update to the project's core attention mechanisms and build infrastructure. The changes aim to streamline the codebase, enhance performance across various hardware platforms, and improve support for advanced features like KV cache quantization. The refactoring efforts consolidate existing attention backends, introduce new specialized attention layers, and ensure compatibility with the latest compiler and hardware capabilities, all while maintaining a focus on code quality and future extensibility.

Highlights

Attention Backend Refactoring: Significant refactoring and consolidation of attention backend implementations, removing several older backend files and introducing a new registry for managing attention backends across different hardware platforms (CUDA, ROCm, XPU).
CMake and Build System Updates: Updated CMake configurations to use C++17, removed Python 3.9 support, added new AMD GPU architectures, and refined CUDA architecture support for newer versions (13.0+), including specific flags for compression and include paths.
KV Cache Quantization Enhancements: Improved handling of KV cache quantization, particularly for FP8, with updates to scaling factor calculations, custom operations for IPEX, and new methods in the attention layer to support query quantization and output scaling.
New Attention Layer Features: Introduced a new MLAAttention class for Multi-Head Latent Attention, along with new custom Triton operations for unified MLA attention. Also added new utility functions and wrappers for ViT attention compatibility with torch.compile.
Code Quality and Compatibility: Added _bc_linter.py for linter suppression, updated Python import paths for better modularity, and included warnings/fallbacks for unsupported FlashAttention versions or features, indicating a focus on code quality and compatibility.
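
To make the "registry for managing attention backends" highlight concrete, here is a minimal sketch of what a per-platform backend registry can look like. It is not taken from the PR diff: every name below (`Platform`, `AttentionBackend`, `register_backend`, `get_backend`, and the two example backend classes) is hypothetical and chosen only for illustration; the project's actual registry may be structured quite differently.

```python
from enum import Enum
from typing import Dict, Type


class Platform(Enum):
    """Hardware platforms named in the summary (hypothetical enum)."""
    CUDA = "cuda"
    ROCM = "rocm"
    XPU = "xpu"


class AttentionBackend:
    """Hypothetical base class; the real interface may differ."""
    name = "base"


# Maps platform -> backend name -> backend class.
_REGISTRY: Dict[Platform, Dict[str, Type[AttentionBackend]]] = {}


def register_backend(platform: Platform, name: str):
    """Class decorator that records a backend under (platform, name)."""
    def decorator(cls: Type[AttentionBackend]) -> Type[AttentionBackend]:
        _REGISTRY.setdefault(platform, {})[name] = cls
        cls.name = name
        return cls
    return decorator


def get_backend(platform: Platform, name: str) -> Type[AttentionBackend]:
    """Look up a backend class, listing the alternatives on failure."""
    try:
        return _REGISTRY[platform][name]
    except KeyError:
        available = sorted(_REGISTRY.get(platform, {}))
        raise ValueError(
            f"No attention backend {name!r} for {platform.value}; "
            f"available: {available}"
        ) from None


# Illustrative registrations only; these are not the project's classes.
@register_backend(Platform.CUDA, "flash_attn")
class FlashAttentionBackend(AttentionBackend):
    pass


@register_backend(Platform.ROCM, "triton")
class TritonAttentionBackend(AttentionBackend):
    pass
```

With a layout like this, callers select a backend by platform and name (for example `get_backend(Platform.CUDA, "flash_attn")`), and supporting a new platform means registering classes rather than editing a central dispatch function.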

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request appears to be a major release, introducing significant refactoring, especially in the attention backend and CMake build system. It also includes updates to dependencies like PyTorch and CUTLASS, and adds support for new hardware and CUDA versions. While most of the changes look solid, I've identified a critical issue in CMakeLists.txt that could affect the build, and I have one further suggestion for following CMake best practices.

Nero10578 · 2025-11-13T00:09:35Z

YEA!

release: v0.10

0d4946d

gemini-code-assist Bot reviewed Nov 1, 2025

Comment thread CMakeLists.txt Outdated

Comment thread CMakeLists.txt

AlpinDale added 20 commits November 1, 2025 00:38

[build] fix: correct source list variable

7416d43

fix hadacore path

47f4e05

fixes

420a77f

chat completions fixes

b9da4fb

token throttling

1cb3a3c

fix single_user_mode

0a60e95

log cleanup + fix gloo logs

e5de8d0

progress bars

f00113b

color logs

9fc178a

make the logger look better

4dea16b

more log cleanup

d89a00c

add verbose log

92b5b7b

remove loguru from requirements

cec9c3a

disagg fixes

d6f64c3

nccl p2p example

a673103

overhaul p2p nccl script

7168b63

update all tests

09d0753

cuda_device_count_stateless import fix

ef4ff13

fix scheduler preemption

5b3727e

shared connector multi-test is flaky

9c7f52e

AlpinDale mentioned this pull request Nov 2, 2025

allow disable fused_moe to use lora on moe #1547

Closed

AlpinDale added 6 commits November 2, 2025 13:41

update dockerfile

778c141

fix terratorch geospatial models

cea92c5

enhance scheduler token limit checks to include request's max_tokens

1feca49

ada support for marlin

5af2b23

don't use flashinfer for nvfp4 GEMM if not on hopper or above

7c68ccc

enable req-level metrics by default, align uvicorn logging with aphrodite

8e7da5d

AlpinDale added 2 commits November 2, 2025 18:59

fix profile debug log

ec49861

refactor Dockerfile to remove unnecessary conditional in wheel build command

252a9d9

AlpinDale merged commit f5f651f into main Nov 2, 2025
0 of 4 checks passed

AlpinDale merged 29 commits into main from sync-60f76ba

2 participants
