
[Unity] Add an API to create multiple kv caches with single allocation #15064

Merged
tqchen merged 2 commits into apache:unity from yelite:kv-cache-batch-create
Jun 10, 2023
Conversation

@yelite (Contributor) commented Jun 8, 2023

This is useful when creating multiple kv caches with the same shape. On an A10G, compared to creating 64 kv caches separately for LLaMA in mlc-llm, a single allocation saves about 35 ms.

@tqchen @junrushao

@tvm-bot (Collaborator) commented Jun 8, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@yelite yelite changed the title [Unity] Add a batch API to create multiple kv caches with single allocation [Unity] Add an API to create multiple kv caches with single allocation Jun 8, 2023
```cpp
Array<AttentionKVCache> result;
for (int i = 0; i < num_caches; ++i) {
  // Use DLManagedTensor to prevent underlying memory from being freed
  DLManagedTensor* data_view = block_view.ToDLPack();
```
Member:
Likely we can reuse the memory allocator (storage interface) without having to go through DLPack.

Contributor (Author):

Thanks! I updated the code to use the storage interface and it looks cleaner. But now it may print a warning message if the requested allocator type mismatches the allocator created at VM initialization.

@tqchen (Member) commented Jun 9, 2023

cc @yzh119 @Hzfengsy

@yzh119 (Member) left a comment:

LGTM, and thanks for doing this. I just have a few minor comments.

```cpp
                                 int init_fill_count, int num_caches) {
  DLDataType dtype = init_data->dtype;

  int64_t cache_size = (dtype.bits * dtype.lanes + 7) / 8;
```
Member:

So currently, if the dtype is smaller than one byte, we pad it to one byte, is that correct?
FYI: FlexGen uses a 4-bit KV cache; we can support it later.

Member:

I think it is fine for now, since sub-byte values are usually packed manually (the dtype is i32).

Member:

Thanks for the clarification, makes sense to me.

@tqchen tqchen merged commit e9ddd47 into apache:unity Jun 10, 2023

4 participants