
Vllm + FlexAttention Work Tracking #19765

Open
@drisspg

Description

FlexAttention Performance & Feature Tracking

Overview

FlexAttention currently has significant performance bottlenecks and missing features that limit its adoption. This tracking issue provides an overview of the main categories of work needed.

🚨 Critical Performance Issues

Primary bottleneck: the custom-op wrapper prevents CUDA graph capture, causing a ~10x throughput regression. Additional issues include unnecessary recompilations and redundant metadata operations.

🔧 Missing Features

FlexAttention currently supports only basic causal attention. Many common attention patterns are not yet implemented:

  • ALiBi slopes
  • Sliding window attention
  • Block sparse attention
  • Quantized KV cache
  • Encoder/cross-attention support
  • Speculative decoding
  • And more...
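Several of the patterns above map naturally onto FlexAttention's `score_mod`/`mask_mod` callables. As a rough sketch (not the vLLM implementation): the signatures below match the FlexAttention API, but the bodies use plain Python scalars so the logic is easy to check; in real use they receive tensor indices and are passed to `torch.nn.attention.flex_attention.flex_attention`. The slope and window values are placeholders.

```python
ALIBI_SLOPES = [0.5, 0.25]   # hypothetical per-head ALiBi slopes
WINDOW_SIZE = 4              # hypothetical sliding-window width

def causal_mask(b, h, q_idx, kv_idx):
    # Standard causal masking: a query attends only to itself and earlier positions.
    return q_idx >= kv_idx

def sliding_window_mask(b, h, q_idx, kv_idx):
    # Causal attention restricted to the most recent WINDOW_SIZE tokens.
    return (q_idx >= kv_idx) and (q_idx - kv_idx < WINDOW_SIZE)

def alibi_score_mod(score, b, h, q_idx, kv_idx):
    # ALiBi: add a linear, head-specific distance penalty to the raw score.
    return score + ALIBI_SLOPES[h] * (kv_idx - q_idx)
```

Mask mods compose with FlexAttention's block-mask machinery, which is why sliding-window and block-sparse variants fall out of the same mechanism.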

📋 Detailed Work Items & Contributing

All specific issues, performance optimizations, and feature implementations are tracked in the project board:

👉 [FlexAttention Project Board] 👈

The project board contains:

  • Individual issues for each performance bottleneck
  • Feature implementation tasks with detailed specifications
  • Priority labels and status tracking
  • Technical implementation notes

📊 Current Status

  • Performance: ~10x slower than optimal because the custom op blocks CUDA graph capture
  • Features: Basic causal attention only, many common patterns missing
  • Priority: Focus on performance fixes first, then high-impact features

For technical details and implementation notes, see the full breakdown in the project board issues.
