Why is the TVM implementation memory efficient? #244

jlidw · 2022-10-14T16:40:19Z

Thanks for your excellent work!

I would like to discuss the memory reduction. The TVM implementation does not seem to store fewer matrices (such as the Query, Key, and Value matrices). The number of Q-K pairs is smaller than in full attention, which explains the faster computation, but why does the memory reduction follow a similar trend to the time reduction? The TVM kernel does not appear to use any technique to save memory, and the padding 0 values are also int32, yet in practice the TVM implementation is memory efficient.
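To make the question concrete, here is a rough sketch of the arithmetic, not a description of what the TVM kernel actually does. It assumes that the attention scores are the dominant intermediate, and that a windowed kernel could store them in a banded layout of shape (seq_len, 2 * window + 1) instead of (seq_len, seq_len). The function name, the window size, and the 4-byte element size are all hypothetical choices for illustration.

```python
def attention_score_bytes(seq_len, window=None, bytes_per_elem=4):
    """Bytes for one head's attention-score matrix (hypothetical accounting).

    window=None models full attention: a (seq_len, seq_len) matrix.
    Otherwise models a banded layout: (seq_len, 2 * window + 1).
    """
    if window is None:
        cols = seq_len
    else:
        # Each query attends to `window` positions on each side plus itself.
        cols = min(seq_len, 2 * window + 1)
    return seq_len * cols * bytes_per_elem


full = attention_score_bytes(4096)
banded = attention_score_bytes(4096, window=256)
print(full, banded, round(full / banded, 2))
```

Under these assumptions the Q, K, and V matrices are unchanged, but the score matrix shrinks by the same factor as the number of Q-K pairs, which would make memory and time scale together.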

Looking forward to your reply.
