Adding cached KVs #51

gkroiz · 2023-05-11T00:16:19Z

This PR adds the option to cache keys and values (KVs) during inference. It ports the changes from https://github.com/Liyang90/lit-llama/tree/cached_kvs.

Main changes:

Added KV caching.
Moved the rope cache from each attention layer to the whole model.
Added a mask cache for attention masking with cached KVs.
Fixed ops and tensors that would cause recompilation.
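
To make the idea behind the changes listed above concrete, here is a minimal sketch of KV caching with a precomputed mask, written in plain Python rather than PyTorch so it runs anywhere. It is not the code from this PR: the `KVCache` class, the `attend` helper, and the toy inputs are all hypothetical names chosen for illustration. The sketch assumes a single attention head and fixed-size buffers, so that every decoding step sees the same shapes (the kind of property that helps avoid recompilation on XLA).

```python
import math


def softmax(scores):
    # Numerically stable softmax; exp(-inf) evaluates to 0.0 for masked slots.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values, visible):
    """One query attends over fixed-size key/value buffers; `visible` masks slots."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [
        scale * sum(a * b for a, b in zip(q, k)) if ok else float("-inf")
        for k, ok in zip(keys, visible)
    ]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def full_attention(qs, ks, vs):
    """Baseline without a cache: causal attention over the full sequence."""
    n = len(qs)
    return [attend(qs[t], ks, vs, [i <= t for i in range(n)]) for t in range(n)]


class KVCache:
    """Hypothetical cache: preallocated, zero-initialized key/value slots."""

    def __init__(self, max_seq_length, dim):
        self.k = [[0.0] * dim for _ in range(max_seq_length)]
        self.v = [[0.0] * dim for _ in range(max_seq_length)]
        # Mask cache: row t marks slots 0..t as visible, built once up front.
        self.mask = [
            [i <= t for i in range(max_seq_length)] for t in range(max_seq_length)
        ]

    def step(self, pos, q, k, v):
        # Store only the new token's key/value; earlier slots are reused as-is.
        self.k[pos], self.v[pos] = k, v
        return attend(q, self.k, self.v, self.mask[pos])


# Toy inputs (made up for the example): 4 tokens, head dimension 2.
qs = [[0.1 * (t + 1), -0.2 * t] for t in range(4)]
ks = [[0.3 * t, 0.5 - 0.1 * t] for t in range(4)]
vs = [[float(t), 1.0 - t] for t in range(4)]

cache = KVCache(max_seq_length=6, dim=2)
cached = [cache.step(t, qs[t], ks[t], vs[t]) for t in range(4)]
baseline = full_attention(qs, ks, vs)
```

Here `cached` and `baseline` agree position by position, while the cached path only computes the key and value for the newest token at each step.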

Fixes #21

gkroiz · 2023-05-11T00:17:03Z

cc @Liyang90 @carmocca

carmocca

This is looking great!

generate.py

lit_parrot/model.py

generate.py

carmocca · 2023-05-11T02:23:14Z

I'm getting a 30% speedup 🐎

Also minor code cleanup

lit_parrot/model.py

carmocca

We're done here! Awesome

lantiga

Looks amazing! Good to merge (I just proposed a change in capitalization of two types, just if we feel like it)

lit_parrot/model.py

generate.py

chat.py

Adding cached KVs

1841838

gkroiz requested review from awaelchli, carmocca and lantiga as code owners May 11, 2023 00:16

just formatting

ed198e9

carmocca reviewed May 11, 2023

generate.py (outdated)

carmocca and others added 11 commits May 11, 2023 04:39

Fix test_model

1bb98af

Put forwards together

9cf6fcb

Typing for the caches

8b4042a

kv_cache for consistency with rope_cache

3fd02bc

Uppercase T as everywhere else

8d18888

Don't need the mypy skip anymore as we don't run it in CI

c75eb7c

cache_shape

c47e340

Use torch.empty to initialize the kv_cache

172d921

Separate loops in the kv_cache vs no kv_cache case

03dd864

Add test for the cache. Some cases are failing

06c4c11

Added trimming based on max_seq_length to cached KVs

88a5b97

Also minor code cleanup

gkroiz commented May 11, 2023

lit_parrot/model.py (outdated)

carmocca and others added 5 commits May 11, 2023 19:07

Revert empty usage. It creates nans

1ec84f8

Fixed variable name typo

56c4328

max_seq_length default

03bcd92

Suggestion

199f28e

minor fix

4b7f9ea
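
The "Revert empty usage. It creates nans" commit above does not spell out the mechanism. `torch.empty` returns uninitialized memory, so one plausible explanation (an assumption, not something stated in this PR) is that a masked cache slot holding a stray NaN still poisons the weighted sum, because a zero attention weight times NaN is NaN. A minimal plain-Python sketch of that arithmetic, with made-up values:

```python
import math


def weighted_sum(weights, values):
    # Attention output for one feature: sum of weight * value over all slots.
    return sum(w * v for w, v in zip(weights, values))


# The last slot is masked out, so softmax assigned it a weight of exactly 0.
weights = [0.5, 0.5, 0.0]

zero_init = [1.0, 3.0, 0.0]           # unused slot initialized to zero
garbage = [1.0, 3.0, float("nan")]    # stand-in for an uninitialized slot

clean = weighted_sum(weights, zero_init)
poisoned = weighted_sum(weights, garbage)

print(clean)                  # 2.0
print(math.isnan(poisoned))   # True
```

Under this assumption, initializing the cache with zeros keeps masked slots harmless.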

Liyang90 reviewed May 11, 2023

lit_parrot/model.py (outdated)

carmocca added 3 commits May 11, 2023 22:40

Merge branch 'main' into parrot_cached_kvs

1562f06

Small generate refactor. This still gives me the XLA speedup

e07127b

Update chat.py to use the kv_cache. Performance is still untested

2867603

carmocca and others added 5 commits May 12, 2023 03:17

Add tokens/sec to chat.py

9091431

Accidental torch.cuda call

ce55c19

Define separate dimensions for key and value caches

b2ab0c3

Wrong annotation

65e5a4d

formatting

7e7d853

carmocca approved these changes May 12, 2023


carmocca added 4 commits May 12, 2023 15:02

Update tpu howto numbers

7bf3cdb

typo

bd89299

Fix KvCache annotation

d4e2c40

Remove incorrect statement

0fab407

lantiga approved these changes May 12, 2023

lit_parrot/model.py (outdated)

fixed capitalization

f26e1b3

Liyang90 reviewed May 12, 2023

lit_parrot/model.py (outdated)

Liyang90 reviewed May 12, 2023

generate.py

Liyang90 reviewed May 12, 2023

chat.py

gkroiz added 3 commits May 12, 2023 17:51

optimized xla logic

0135d6e

optimize xla for chat.py + minor fix in model.py

96297b7

formatting fix

771c532

carmocca merged commit 0b5620d into Lightning-AI:main May 12, 2023

gkroiz deleted the parrot_cached_kvs branch May 12, 2023 20:39

This was referenced May 13, 2023

Adding cached KVs Lightning-AI/lit-llama#266

Merged

Adding caching for adapter KVs #66

Merged

carmocca mentioned this pull request May 19, 2023

Caches should not persist across multiple generate. #77

Closed

carmocca mentioned this pull request Jun 19, 2023

Restore flash attention support #171

Merged
