
Fix repetition penalty for long context #1037

Merged: 6 commits into InternLM:main on Jan 29, 2024
Conversation

irexyc (Collaborator) commented on Jan 24, 2024

Motivation

Addresses #1009 and #898.

Remove the shared-memory usage so that longer sessions are supported.
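For reference, a minimal PyTorch sketch of the computation involved, assuming the CTRL-style rule used by FasterTransformer-derived sampling kernels (divide positive logits of already-generated tokens by the penalty, multiply negative ones). This illustrates the math only, not the TurboMind CUDA kernel this PR modifies; the old kernel buffered per-token state in shared memory, whose fixed size is what capped the usable session length.

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             output_ids: torch.Tensor,
                             penalty: float) -> torch.Tensor:
    """CTRL-style repetition penalty.

    logits:     [batch, vocab] raw next-token scores
    output_ids: [batch, seq]   token ids already in the session
    """
    scores = logits.gather(-1, output_ids)
    # Positive logits shrink, negative logits grow more negative,
    # so previously generated tokens become less likely.
    scores = torch.where(scores > 0, scores / penalty, scores * penalty)
    return logits.scatter(-1, output_ids, scores)
```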

irexyc (Collaborator, Author) commented on Jan 26, 2024

[image attachment]

irexyc (Collaborator, Author) commented on Jan 26, 2024

before & after (llama-7b)

before:
concurrency: 64
elapsed_time: 179.799s

first token latency(s)(min, max, ave): 0.031, 1.292, 0.106
per-token latency(s) percentile(50, 75, 95, 99): [0.024, 0.035, 0.053, 0.074]

number of prompt tokens: 490772
number of completion tokens: 472162
token throughput (completion token): 2626.051 token/s
token throughput (prompt + completion token): 5355.606 token/s
RPS (request per second): 11.124 req/s
RPM (request per minute): 667.411 req/min
--------------------------------------------------


after:
concurrency: 64
elapsed_time: 179.440s

first token latency(s)(min, max, ave): 0.037, 1.361, 0.104
per-token latency(s) percentile(50, 75, 95, 99): [0.024, 0.033, 0.05, 0.073]

number of prompt tokens: 490772
number of completion tokens: 472162
token throughput (completion token): 2631.309 token/s
token throughput (prompt + completion token): 5366.330 token/s
RPS (request per second): 11.146 req/s
RPM (request per minute): 668.747 req/min
--------------------------------------------------


lzhangzz (Collaborator) commented:

> before & after (llama-7b)

It seems that llama-7b does not use repetition penalty by default.

irexyc (Collaborator, Author) commented on Jan 26, 2024

> It seems that llama-7b does not use repetition penalty by default.

I added repetition_penalty=1.005 in stream_infer manually:
https://github.com/InternLM/lmdeploy/blob/main/benchmark/profile_throughput.py#L88-L97
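To see how mild that setting is, here is a standalone sketch with hypothetical logit values, applying the same CTRL-style rule with a penalty of 1.005 to tokens that have already been generated:

```python
penalty = 1.005
# Hypothetical logits for three token ids that already appeared in the output.
logits = {42: 3.2, 7: -1.1, 13: 0.8}
penalized = {tok: (l / penalty if l > 0 else l * penalty)
             for tok, l in logits.items()}
print(penalized)  # {42: 3.1841, 7: -1.1055, 13: 0.7960} (rounded)
```

A value this close to 1.0 leaves the sampled outputs essentially unchanged while still exercising the penalty code path in both benchmark runs.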

lvhan028 merged commit 4d9a8b0 into InternLM:main on Jan 29, 2024. 8 checks passed.
lvhan028 changed the title from "repetition penalty for long context" to "Fix repetition penalty for long context" on Jan 29, 2024.