
Fix repetition penalty for long context #1037

Merged: 6 commits into InternLM:main on Jan 29, 2024
Conversation

irexyc (Collaborator) commented on Jan 24, 2024

Motivation

Addresses #1009 and #898.

Remove the shared-memory usage so that longer sessions are supported.
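For reference, a minimal PyTorch sketch of the computation involved, assuming the CTRL-style rule used by FasterTransformer-derived sampling kernels (divide positive logits of already-generated tokens by the penalty, multiply negative ones). This illustrates the math only, not the TurboMind CUDA kernel this PR modifies; the old kernel buffered per-token state in shared memory, whose fixed size is what capped the usable session length.

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             output_ids: torch.Tensor,
                             penalty: float) -> torch.Tensor:
    """CTRL-style repetition penalty.

    logits:     [batch, vocab] raw next-token scores
    output_ids: [batch, seq]   token ids already in the session
    """
    scores = logits.gather(-1, output_ids)
    # Positive logits shrink, negative logits grow more negative,
    # so previously generated tokens become less likely.
    scores = torch.where(scores > 0, scores / penalty, scores * penalty)
    return logits.scatter(-1, output_ids, scores)
```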

irexyc (Collaborator, Author) commented on Jan 26, 2024

[image attachment]

irexyc (Collaborator, Author) commented on Jan 26, 2024

before & after (llama-7b)

before:
concurrency: 64
elapsed_time: 179.799s

first token latency(s)(min, max, ave): 0.031, 1.292, 0.106
per-token latency(s) percentile(50, 75, 95, 99): [0.024, 0.035, 0.053, 0.074]

number of prompt tokens: 490772
number of completion tokens: 472162
token throughput (completion token): 2626.051 token/s
token throughput (prompt + completion token): 5355.606 token/s
RPS (request per second): 11.124 req/s
RPM (request per minute): 667.411 req/min
--------------------------------------------------


after:
concurrency: 64
elapsed_time: 179.440s

first token latency(s)(min, max, ave): 0.037, 1.361, 0.104
per-token latency(s) percentile(50, 75, 95, 99): [0.024, 0.033, 0.05, 0.073]

number of prompt tokens: 490772
number of completion tokens: 472162
token throughput (completion token): 2631.309 token/s
token throughput (prompt + completion token): 5366.330 token/s
RPS (request per second): 11.146 req/s
RPM (request per minute): 668.747 req/min
--------------------------------------------------


lzhangzz (Collaborator) commented:

> before & after (llama-7b)

It seems that llama-7b does not use repetition penalty by default.

irexyc (Collaborator, Author) commented on Jan 26, 2024

> It seems that llama-7b does not use repetition penalty by default.

I added repetition_penalty=1.005 in stream_infer manually:
https://github.com/InternLM/lmdeploy/blob/main/benchmark/profile_throughput.py#L88-L97
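To see how mild that setting is, here is a standalone sketch with hypothetical logit values, applying the same CTRL-style rule with a penalty of 1.005 to tokens that have already been generated:

```python
penalty = 1.005
# Hypothetical logits for three token ids that already appeared in the output.
logits = {42: 3.2, 7: -1.1, 13: 0.8}
penalized = {tok: (l / penalty if l > 0 else l * penalty)
             for tok, l in logits.items()}
print(penalized)  # {42: 3.1841, 7: -1.1055, 13: 0.7960} (rounded)
```

A value this close to 1.0 leaves the sampled outputs essentially unchanged while still exercising the penalty code path in both benchmark runs.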

lvhan028 merged commit 4d9a8b0 into InternLM:main on Jan 29, 2024. 8 checks passed.
lvhan028 changed the title from "repetition penalty for long context" to "Fix repetition penalty for long context" on Jan 29, 2024.