GPT-NeoX allocating full-length KV cache (octoml#179)
This PR changes the GPT-NeoX KV cache creation function to allocate the
cache at full size up front, so no further memory allocation is required
at run time.
MasterJH5574 committed May 18, 2023
1 parent de7b5ab commit a181bd5
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion mlc_llm/relax_model/gpt_neox.py
@@ -593,7 +593,7 @@ def create_kv_cache_func(
 ) -> None:
     init_shape = relax.ShapeExpr(
         (
-            1,
+            config.max_sequence_length,
             config.num_attention_heads,
             config.hidden_size // config.num_attention_heads,
         )
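
To make the effect of the change concrete, here is a minimal sketch in plain Python (no TVM required) that computes the initial shape of one cache tensor before and after this commit, along with its size in bytes. The `NeoXConfig` class, the helper names, the configuration values, and the 2-byte element size are all assumptions for illustration; none of them come from the commit.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class NeoXConfig:
    """Hypothetical stand-in for the model config; only the fields used in the diff."""
    max_sequence_length: int
    num_attention_heads: int
    hidden_size: int


def kv_cache_shape(config: NeoXConfig, full_length: bool) -> tuple:
    """Initial cache shape: sequence length 1 (old) or max_sequence_length (new)."""
    seq_len = config.max_sequence_length if full_length else 1
    head_dim = config.hidden_size // config.num_attention_heads
    return (seq_len, config.num_attention_heads, head_dim)


def cache_bytes(shape: tuple, bytes_per_element: int = 2) -> int:
    """Size of one cache tensor, assuming 2-byte (float16) elements by default."""
    return prod(shape) * bytes_per_element


# Illustrative values only, not taken from the commit.
config = NeoXConfig(max_sequence_length=2048, num_attention_heads=32, hidden_size=4096)

old_shape = kv_cache_shape(config, full_length=False)
new_shape = kv_cache_shape(config, full_length=True)
print(old_shape, cache_bytes(old_shape))
print(new_shape, cache_bytes(new_shape))
```

Under these assumed values, the initial allocation grows by a factor of `max_sequence_length`. That is the trade-off the commit accepts: more memory reserved at startup in exchange for no allocation while tokens are generated.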