FanhaiLu1 commented Sep 6, 2024

Background:
PR #167 added the paged attention manager and KV cache manager. This PR adds end-to-end paged attention support in JetStream.
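
For context, the core bookkeeping a paged KV cache manager does can be sketched as follows. This is a conceptual sketch, not the code from PR #167; the class name `PagedKVManager` and its fields are illustrative assumptions:

```python
from dataclasses import dataclass, field

# Rough sketch of a paged KV cache manager: physical pages come from a
# free list, and each sequence keeps a page table of the pages it owns.
@dataclass
class PagedKVManager:
    num_pages: int
    page_size: int
    free_pages: list = field(default_factory=list)
    page_tables: dict = field(default_factory=dict)  # seq_id -> [page ids]

    def __post_init__(self):
        self.free_pages = list(range(self.num_pages))

    def insert(self, seq_id, num_tokens):
        # Allocate enough pages to hold the prompt's KV entries.
        needed = -(-num_tokens // self.page_size)  # ceiling division
        self.page_tables[seq_id] = [self.free_pages.pop() for _ in range(needed)]

    def decode_step(self, seq_id, cur_len):
        # Grab one more page whenever the sequence crosses a page boundary.
        if cur_len % self.page_size == 0:
            self.page_tables[seq_id].append(self.free_pages.pop())

    def release(self, seq_id):
        # Return a finished sequence's pages to the free list.
        self.free_pages.extend(self.page_tables.pop(seq_id))
```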

Main Changes in this PR

  1. Added a paged attention kernel in attention_kernel.py (see the conceptual decode sketch after this list)
  2. Supported paged attention insert and decode in Engine.py
  3. Refactored the PageAttention manager
  4. Added unit tests for the kernel and manager
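
To make the decode path concrete, here is a minimal conceptual sketch of paged attention at decode time: each sequence's KV cache lives in fixed-size physical pages, and a per-sequence page table maps logical pages to physical page slots. This is not the PR's actual kernel; the signature and names (`k_pages`, `page_table`, `seq_len`) are illustrative assumptions:

```python
import jax
import jax.numpy as jnp

def paged_attention_decode(q, k_pages, v_pages, page_table, seq_len):
    # q: [num_heads, head_dim] -- the single decode-step query.
    # k_pages, v_pages: [num_physical_pages, page_size, num_heads, head_dim].
    # page_table: [max_pages_per_seq] int32 physical page ids for this sequence.
    # seq_len: number of valid tokens cached so far.
    k = k_pages[page_table]           # gather -> [max_pages, page_size, heads, dim]
    v = v_pages[page_table]
    k = k.reshape(-1, *k.shape[2:])   # flatten pages -> [tokens, heads, dim]
    v = v.reshape(-1, *v.shape[2:])
    scores = jnp.einsum("hd,thd->ht", q, k) / jnp.sqrt(q.shape[-1])
    # Mask out slots past the current sequence length (unused page tail).
    mask = jnp.arange(k.shape[0]) < seq_len
    scores = jnp.where(mask[None, :], scores, -jnp.inf)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("ht,thd->hd", weights, v)
```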

Next Steps

  1. Tune accuracy: the current implementation generates human-readable output, but accuracy is low; only the first few tokens match the dense attention tokens
  2. Improve performance: the kernel itself performs almost the same as dense attention; applying bf16 computation could boost performance (a hypothetical bf16 sketch follows this list)
  3. Optimize the out-of-jit compute
  4. Find an elegant way to collect resources
  5. Support quantization
  6. Lazy cache update
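
On the bf16 point in step 2, a hypothetical sketch of the kind of change meant: store the KV pages in bfloat16 and let the attention matmuls accumulate in float32. This is not code from the PR:

```python
import jax
import jax.numpy as jnp

# Hypothetical: keeping KV pages in bfloat16 halves cache memory traffic
# versus float32, while float32 accumulation preserves softmax numerics.
def attend_bf16(q, k_bf16, v_bf16):
    scores = jnp.einsum("hd,thd->ht", q.astype(jnp.bfloat16), k_bf16,
                        preferred_element_type=jnp.float32)
    weights = jax.nn.softmax(scores / jnp.sqrt(q.shape[-1]), axis=-1)
    return jnp.einsum("ht,thd->hd", weights.astype(jnp.bfloat16), v_bf16,
                      preferred_element_type=jnp.float32)
```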

Example output tokens from this PR:

[304, 1284, 292, 367, 1476, 304, 1284, 714, 309, 310, 367, 29875, 18834, 29895, 29906, 29889, 29923, 310, 278, 367, 278, 1284, 4614, 17970, 2880, 310, 367, 579, 8024, 297, 25891, 29889, 306, 1311, 310, 1284, 12, 1284, 4634, 338, 304, 292, 6593, 310, 1196, 10379, 306, 29949, 6593, 1341, 263, 2834, 353, 29903, 12623, 29924, 278, 304, 350, 29889, 590, 1339, 29918, 2834, 6593, 310, 278, 310, 445, 338, 306, 508, 4658, 393, 338, 304, 437, 29973, 2023, 6593, 310, 306, 29915, 29879, 1914, 2834, 338, 263, 2462, 1383, 306, 4658, 592, 310, 10239, 306, 306, 273, 297, 306, 1774, 2834, 30010, 29871, 967, 6593, 310, 26093, 29880, 29889, 29914, 29968, 304, 4892, 29936, 29889, 338, 304, 306, 505, 2715, 29889, 372, 2191, 338, 278, 6593, 310, 29914, 2834, 306, 508, 29892, 322, 278, 29899, 306, 505, 29918, 278, 6593, 310, 306, 505, 6593, 29892, 306, 508, 367, 2834, 338, 304, 306, 723, 338, 16316, 310, 2834, 306, 505, 29889, 2794, 310, 306, 505, 304, 306, 306, 505, 263, 716, 29889, 29871, 29896, 29900, 29896, 29900, 30488, 29876, 29871, 29896, 29900, 306, 505, 263, 2462, 306, 505, 1063, 263, 29889, 29871, 29896, 29900, 29900, 526, 366, 508, 367, 29914, 29879, 2834, 338, 263, 29889, 29871, 29896, 29929, 29889, 29871, 29896, 29929, 29929, 29929, 29929, 29929, 29929, 29889, 306, 505, 263, 716, 15483, 322, 278, 1900, 310, 278, 6593, 310, 278, 6593, 310, 278, 6593, 310, 278, 1556, 310, 278, 1900, 29918, 278, 1900, 310, 278, 1556, 310, 278, 1900, 29899, 6707, 373, 278, 6593, 310, 12, 306, 626, 263, 306, 437, 29889, 29889, 306, 367, 592, 306, 626, 6593, 310, 304, 306, 4658, 306, 505, 263, 2462, 310, 278, 6593, 310, 278, 1900, 306, 505, 1063, 263, 716, 3088, 310, 278, 2446, 1629, 29899, 29900, 29889]

FanhaiLu1 requested review from qihqi and wang2yn84 on September 6, 2024, 18:12
FanhaiLu1 merged commit 33348d2 into AI-Hypercomputer:main Sep 10, 2024