release: v0.10 #1549

Merged: AlpinDale merged 29 commits into main from sync-60f76ba on Nov 2, 2025
Conversation

@AlpinDale

Member

:)

@gemini-code-assist

Contributor

Summary of Changes

Hello @AlpinDale, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request, titled 'release: v0.10', represents a substantial update to the project's core attention mechanisms and build infrastructure. The changes aim to streamline the codebase, enhance performance across various hardware platforms, and improve support for advanced features like KV cache quantization. The refactoring efforts consolidate existing attention backends, introduce new specialized attention layers, and ensure compatibility with the latest compiler and hardware capabilities, all while maintaining a focus on code quality and future extensibility.

Highlights

  • Attention Backend Refactoring: Significant refactoring and consolidation of attention backend implementations, removing several older backend files and introducing a new registry for managing attention backends across different hardware platforms (CUDA, ROCm, XPU).
  • CMake and Build System Updates: Updated CMake configurations to use C++17, removed Python 3.9 support, added new AMD GPU architectures, and refined CUDA architecture support for newer versions (13.0+), including specific flags for compression and include paths.
  • KV Cache Quantization Enhancements: Improved handling of KV cache quantization, particularly for FP8, with updates to scaling factor calculations, custom operations for IPEX, and new methods in the attention layer to support query quantization and output scaling.
  • New Attention Layer Features: Introduced a new MLAAttention class for Multi-Head Latent Attention, along with new custom Triton operations for unified MLA attention. Also added new utility functions and wrappers for ViT attention compatibility with torch.compile.
  • Code Quality and Compatibility: Added _bc_linter.py for linter suppression, updated Python import paths for better modularity, and included warnings/fallbacks for unsupported FlashAttention versions or features, indicating a focus on code quality and compatibility.
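
To make the first highlight concrete, here is a minimal sketch of what a per-platform attention backend registry with a warn-and-fall-back path could look like. It is not the code from this pull request: `register_backend`, `select_backend`, and the backend names (`flash_attn`, `triton`, `ipex`) are hypothetical and chosen only for illustration. Only the platform names (CUDA, ROCm, XPU) come from the summary above.

```python
import warnings
from typing import Dict, List, Optional

# Hypothetical registry: platform name -> backend names in preference order.
# None of these identifiers are taken from the project itself.
_BACKEND_REGISTRY: Dict[str, List[str]] = {}


def register_backend(platform: str, name: str) -> None:
    """Record a backend for a platform; earlier registrations are preferred."""
    backends = _BACKEND_REGISTRY.setdefault(platform, [])
    if name not in backends:
        backends.append(name)


def select_backend(platform: str, requested: Optional[str] = None) -> str:
    """Return the requested backend if registered, else the platform default."""
    backends = _BACKEND_REGISTRY.get(platform)
    if not backends:
        raise ValueError(f"no attention backend registered for {platform!r}")
    if requested is None or requested in backends:
        return requested or backends[0]
    # Unsupported request: warn and fall back instead of failing outright.
    warnings.warn(
        f"backend {requested!r} is not available on {platform!r}; "
        f"falling back to {backends[0]!r}"
    )
    return backends[0]


# Illustrative registrations (assumed names, not the real backend list).
register_backend("cuda", "flash_attn")
register_backend("cuda", "triton")
register_backend("rocm", "triton")
register_backend("xpu", "ipex")
```

With these registrations, `select_backend("cuda")` returns the first CUDA entry, while `select_backend("rocm", "flash_attn")` emits a warning and returns the ROCm default. Keeping the lookup in one place is one way to retire per-backend selection logic scattered across files, which is the kind of consolidation the summary describes.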

@gemini-code-assist (Bot) left a comment

Contributor

Code Review

This pull request appears to be a major release, introducing significant refactoring, especially in the attention backend and CMake build system. It also includes updates to dependencies like PyTorch and CUTLASS, and adds support for new hardware and CUDA versions. While most of the changes look solid, I've identified a critical issue in the CMakeLists.txt that could affect the build, and another suggestion to improve CMake best practices.

Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt
@AlpinDale AlpinDale merged commit f5f651f into main Nov 2, 2025
0 of 4 checks passed
@Nero10578

Contributor

YEA!
