[Transform][Redo] Apply split_rotary optimization on prefill
Prior to this commit, the `transform.fuse_split_rotary_embedding`
function was only applicable to the `decode` function of a Llama-type
model.  This was because the sequence length was restricted to one,
both in the pattern-match rule and in the `split_rotary` function, and
because the transform was restricted to operate only on the `decode`
function.

This commit updates the `transform.fuse_split_rotary_embedding` pass
to be a `tvm.ir.transform.Pass`, operating on all applicable functions
matched in the `IRModule`.  The `split_rotary` function is now
produced as a fully-generic function, with static parameters
substituted in afterwards.  At this stage, the sequence length is
retained as a dynamic parameter, so that it can be used by the
`prefill` function, as sketched below.
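As a rough sketch of the new structure (assuming TVM's standard pass
infrastructure; the names and signatures here are illustrative, not
the actual implementation), the transform can be exposed as a factory
that binds the static model parameters and returns a
`tvm.ir.transform.Pass`, applied as `pass_instance(mod)`:

```python
import tvm
from tvm import IRModule


def fuse_split_rotary_embedding(
    num_query_heads: int,
    num_kv_heads: int,
    hidden_size: int,
    position_embedding_base: float,
) -> tvm.ir.transform.Pass:
    # Static parameters are bound when the pass is constructed...
    @tvm.ir.transform.module_pass(opt_level=0, name="FuseSplitRotaryEmbedding")
    def transform(mod: IRModule, _ctx: tvm.transform.PassContext) -> IRModule:
        # ...while the sequence length stays a dynamic parameter, so the
        # rewrite can apply to every matching function (`prefill` and
        # `decode` alike) in the IRModule.
        return mod  # pattern match and rewrite elided in this sketch

    return transform
```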

This commit reapplies the reverted commit
mlc-ai#1033.  The error in the
previous implementation was in the definition of
`rotary_embedding_offset`, which provided the `query_sequence_length`
instead of the `kv_sequence_length`.  The error nevertheless passed
the validity tests described
[here](mlc-ai#1058 (comment)),
as these two sequence lengths are identical for the first call.
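A hypothetical illustration of the fix (the names mirror the commit
message, but the formula is an assumption, not the actual kernel
code): the rotary position of a new token must be derived from the
total KV-cache length, and the two choices only coincide on the first
call:

```python
def rotary_embedding_offset(kv_sequence_length: int, query_sequence_length: int) -> int:
    # Positions of the new tokens start where the existing cache ends.
    return kv_sequence_length - query_sequence_length


# First (prefill) call: the cache is filled from empty, both lengths are
# equal, and the offset is 0 -- so mistakenly using query_sequence_length
# gives the same result and passes first-call validity tests.
assert rotary_embedding_offset(16, 16) == 0

# Subsequent decode step: one new token against a 16-token cache, where
# the two choices diverge.
assert rotary_embedding_offset(17, 1) == 16
```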
Lunderberg committed Oct 24, 2023
1 parent 9cb8e8e commit de117e3
Showing 2 changed files with 247 additions and 210 deletions.
11 changes: 5 additions & 6 deletions mlc_llm/core.py
```diff
@@ -440,12 +440,11 @@ def mod_transform_before_build(
     if max_seq_len:
         num_key_value_heads = config.get_num_key_value_heads()
         mod = fuse_split_rotary_embedding(
-            mod,
-            config.num_attention_heads // args.num_shards,
-            num_key_value_heads // args.num_shards,
-            config.hidden_size // args.num_shards,
-            config.position_embedding_base,
-        )
+            config.num_attention_heads // args.num_shards,
+            num_key_value_heads // args.num_shards,
+            config.hidden_size // args.num_shards,
+            config.position_embedding_base,
+        )(mod)
 
     if args.target_kind == "cuda":
         patterns = []
```
