
Conversation

@benraha (Contributor) commented Sep 7, 2025

Motivation and Context

This PR has two changes:

  1. Replace the costly einsum with a simple matrix multiplication - this version is faster in both PyTorch and ONNX. In my opinion we lose some clarity, but since this runs in every forward pass, it is worth it.
  2. Add a fast path to the broadcast_kv_across_heads function - since the case where nothing needs to change occurs about 50% of the time, this simple check yields a solid performance benefit (see the sketch below).
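
As a rough illustration of both points, here is a minimal sketch in PyTorch. The function names, signatures, and tensor shapes below are assumptions for the example and do not mirror the repository's actual code:

```python
import torch


def project_einsum(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # einsum formulation: readable, but produces a more complex ONNX graph.
    # x: (batch, seq, d_model), w: (d_model, d_out)
    return torch.einsum("bsd,do->bso", x, w)


def project_matmul(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Equivalent matrix multiplication: flatten the leading dims, run one mm,
    # then restore the shape. Faster in PyTorch and exports to fewer ONNX nodes.
    batch, seq, d_model = x.shape
    return torch.mm(x.reshape(-1, d_model), w).reshape(batch, seq, -1)


def broadcast_kv_across_heads(kv: torch.Tensor, share_ratio: int) -> torch.Tensor:
    # Hypothetical signature. Fast path: when each query head already has its
    # own KV head (share_ratio == 1), nothing needs to be broadcast, so the
    # input is returned untouched instead of building an expand/reshape graph.
    if share_ratio == 1:
        return kv
    # kv: (batch, seq, n_kv_heads, d_head)
    batch, seq, n_kv_heads, d_head = kv.shape
    return (
        kv.unsqueeze(3)
        .expand(batch, seq, n_kv_heads, share_ratio, d_head)
        .reshape(batch, seq, n_kv_heads * share_ratio, d_head)
    )
```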

Public API Changes

  • No Public API changes
  • Yes, Public API changes (Details below)

How Has This Been Tested?

Locally.


Checklist

  • The changes have been tested locally.
  • Documentation has been updated (if the public API or usage changes).
  • An entry has been added to CHANGELOG.md (if relevant for users).
  • The code follows the project's style guidelines.
  • I have considered the impact of these changes on the public API.

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces two performance optimizations to the attention mechanism. The first replaces a costly einsum operation with a more efficient matrix multiplication using torch.mm, and the second adds a fast-path to broadcast_kv_across_heads for a common use case. Both changes are logical and aim to improve performance. My review includes a suggestion to further refine the matrix multiplication implementation for better readability and to use a more idiomatic PyTorch function.

@benraha (Contributor, Author) commented Sep 10, 2025

@priorphil can you please review? :)

@priorphil (Contributor) commented:

Thanks! Could you share some of the benchmarks (including hardware, dataset shapes, dtype, and timings) you ran for each of these changes, so I can get a feeling for the magnitude of the speedup? :)

@benraha (Contributor, Author) commented Sep 10, 2025

Sure! I export the model to ONNX, and I measure a 15% reduction in the number of nodes in the graph and roughly 10% faster inference.

I work with CPU inference, fitting on 1000 rows and running inference on a few rows at a time.
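
For a rough sense of how the einsum vs. matmul comparison can be measured on CPU, a micro-benchmark along these lines works; the shapes, dtype, and iteration count are illustrative assumptions, not the exact setup behind the numbers above:

```python
import time

import torch

# Hypothetical sizes for illustration: (batch, rows, d_model) input and a
# fused QKV projection weight.
x = torch.randn(8, 1000, 192)
w = torch.randn(192, 576)


def bench(fn, iters=200):
    fn()  # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


t_einsum = bench(lambda: torch.einsum("bsd,do->bso", x, w))
t_mm = bench(lambda: torch.mm(x.reshape(-1, 192), w).reshape(8, 1000, -1))
print(f"einsum: {t_einsum * 1e3:.2f} ms/iter, mm: {t_mm * 1e3:.2f} ms/iter")
```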

@priorphil (Contributor) left a comment

LGTM, thanks for the changes.

@priorphil merged commit f0ad402 into PriorLabs:main on Sep 10, 2025
10 checks passed
oscarkey pushed a commit that referenced this pull request on Nov 12, 2025: QKV calculation improvements in attention mechanism (#488) (#141)

* Record copied public PR 488

* QKV calculation improvements in attention mechanism (#488)

(cherry picked from commit f0ad402)

---------

Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com>
Co-authored-by: benraha <benraha@gmail.com>