Fix cache util logic (#186)
casper-hansen committed Nov 11, 2023
1 parent 7c97675 commit 299c460
Showing 1 changed file with 1 addition and 1 deletion: awq/utils/fused_utils.py
@@ -7,7 +7,7 @@ def prepare_cache(blocks, seqlen: int) -> int:
         will_cache_be_exceeded = start_pos + seqlen > block.attn.max_seq_len
 
         # Reset and avoid retaining state when processing context
-        if seqlen > 1 and (will_cache_be_exceeded or seqlen > 1):
+        if seqlen > 1 and (will_cache_be_exceeded or start_pos > 0):
             block.attn.start_pos = block.attn.cache.roll_kv_n_steps(start_pos, n=start_pos)
 
         # Slowly roll out old tokens without performance hit if exceeded during decoding
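The removed condition was redundant: once `seqlen > 1` holds, `will_cache_be_exceeded or seqlen > 1` is always true, so the reset branch ran for every multi-token prompt. The minimal sketch below restates both conditions as standalone functions and enumerates where they disagree. The helpers `old_should_reset`, `new_should_reset` and `differing_cases`, and the `max_seq_len` of 8, are hypothetical illustrations and are not part of `awq/utils/fused_utils.py`.

```python
# Hypothetical helpers for illustration only: they restate the old and new
# conditions from the diff as plain functions of integers.


def old_should_reset(start_pos: int, seqlen: int, max_seq_len: int) -> bool:
    will_cache_be_exceeded = start_pos + seqlen > max_seq_len
    # Once "seqlen > 1" holds, the inner "or seqlen > 1" is always true,
    # so this expression reduces to "seqlen > 1".
    return seqlen > 1 and (will_cache_be_exceeded or seqlen > 1)


def new_should_reset(start_pos: int, seqlen: int, max_seq_len: int) -> bool:
    will_cache_be_exceeded = start_pos + seqlen > max_seq_len
    # Reset only if the prompt would overflow the cache or earlier
    # tokens are still held in it (start_pos > 0).
    return seqlen > 1 and (will_cache_be_exceeded or start_pos > 0)


def differing_cases(max_seq_len: int = 8) -> list[tuple[int, int]]:
    """Return every (start_pos, seqlen) pair where the two predicates disagree."""
    cases = []
    for start_pos in range(max_seq_len + 1):
        for seqlen in range(1, max_seq_len + 2):
            old = old_should_reset(start_pos, seqlen, max_seq_len)
            new = new_should_reset(start_pos, seqlen, max_seq_len)
            if old != new:
                cases.append((start_pos, seqlen))
    return cases


if __name__ == "__main__":
    # The old predicate matches plain "seqlen > 1" on every pair checked.
    assert all(
        old_should_reset(s, n, 8) == (n > 1)
        for s in range(9)
        for n in range(1, 10)
    )
    print(differing_cases())
    # prints [(0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8)]
```

The only disagreements have `start_pos == 0` and a prompt that fits in the cache. In that case the old code still entered the reset branch and called `roll_kv_n_steps(0, n=0)`, while the fixed condition skips it. A multi-token prompt that arrives after earlier tokens (`start_pos > 0`) or that would overflow the cache still triggers the reset under both versions.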
