
optmize baichuan in pytorch engine #1223

Merged
merged 1 commit into InternLM:main on Mar 1, 2024

Conversation

@grimoire (Collaborator) commented Mar 1, 2024

Tested on 1000 prompts

main

7B batch-size 256

concurrency: 256
elapsed_time: 136.351s

first token latency(s)(min, max, ave): 0.268, 14.395, 4.825
per-token latency(s) percentile(50, 75, 95, 99): [0.082, 0.092, 0.186, 0.52]

number of prompt tokens: 251952
number of completion tokens: 227002
token throughput (completion token): 1664.835 token/s
token throughput (prompt + completion token): 3512.653 token/s
RPS (request per second): 7.334 req/s
RPM (request per minute): 440.041 req/min
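The derived metrics in each run follow directly from the raw measurements: elapsed time, token counts, and the 1000-request count. A minimal sketch (using the 7B main run above; the benchmark script itself is not part of this conversation) reproduces the reported figures:

```python
# Raw measurements from the 7B batch-size-256 run on main.
elapsed_time = 136.351       # seconds
num_requests = 1000          # "Tested on 1000 prompts"
prompt_tokens = 251952
completion_tokens = 227002

# Derived metrics, matching the reported values.
completion_tps = completion_tokens / elapsed_time                # ~1664.8 token/s
total_tps = (prompt_tokens + completion_tokens) / elapsed_time   # ~3512.7 token/s
rps = num_requests / elapsed_time                                # ~7.334 req/s
rpm = rps * 60                                                   # ~440.0 req/min

print(f"{completion_tps:.3f} token/s, {total_tps:.3f} token/s, "
      f"{rps:.3f} req/s, {rpm:.3f} req/min")
```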

13B batch-size 128

concurrency: 128
elapsed_time: 185.828s

first token latency(s)(min, max, ave): 0.231, 11.174, 2.385
per-token latency(s) percentile(50, 75, 95, 99): [0.062, 0.064, 0.191, 0.52]

number of prompt tokens: 251952
number of completion tokens: 227002
token throughput (completion token): 1221.571 token/s
token throughput (prompt + completion token): 2577.406 token/s
RPS (request per second): 5.381 req/s
RPM (request per minute): 322.879 req/min

this repo

7B batch-size 256

concurrency: 256
elapsed_time: 131.733s

first token latency(s)(min, max, ave): 0.267, 15.520, 5.049
per-token latency(s) percentile(50, 75, 95, 99): [0.081, 0.092, 0.17, 0.469]

number of prompt tokens: 251952
number of completion tokens: 227002
token throughput (completion token): 1723.197 token/s
token throughput (prompt + completion token): 3635.793 token/s
RPS (request per second): 7.591 req/s
RPM (request per minute): 455.467 req/min

13B batch-size 128

concurrency: 128
elapsed_time: 171.667s

first token latency(s)(min, max, ave): 0.218, 8.568, 1.984
per-token latency(s) percentile(50, 75, 95, 99): [0.058, 0.061, 0.177, 0.475]

number of prompt tokens: 251952
number of completion tokens: 227002
token throughput (completion token): 1322.339 token/s
token throughput (prompt + completion token): 2790.017 token/s
RPS (request per second): 5.825 req/s
RPM (request per minute): 349.514 req/min
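Comparing the completion-token throughput of this repo against main, the numbers above imply roughly a 3.5% gain for 7B and an 8.2% gain for 13B; a quick check:

```python
# Relative completion-token throughput gain of this repo over main.
gain_7b = 1723.197 / 1664.835 - 1    # ~0.035
gain_13b = 1322.339 / 1221.571 - 1   # ~0.082
print(f"7B: +{gain_7b:.1%}, 13B: +{gain_13b:.1%}")
```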

@RunningLeon (Collaborator) left a comment:
LGTM

@lvhan028 changed the title from "optmize baichuan" to "optmize baichuan in pytorch engine" on Mar 1, 2024
@lvhan028 lvhan028 merged commit e549424 into InternLM:main Mar 1, 2024
4 checks passed
3 participants