
Remove the manual model conversion during benchmark #953

Merged (32 commits into InternLM:main, Feb 21, 2024)

Conversation

@lvhan028 (Collaborator) commented Jan 15, 2024

Motivation

As titled: remove the manual model conversion step from the benchmark workflow.

Modification

  1. update documentation: docs/en/benchmark, docs/zh_cn/benchmark
  2. update benchmark scripts and remove the configs: benchmark/ (see the sketch after this list)
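
With the conversion step gone, a benchmark can point the turbomind backend at a HuggingFace model directly. Below is a minimal sketch of that direct-load workflow through lmdeploy's pipeline API; the model id, tp value, and prompt are illustrative and not taken from this PR:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Load a HuggingFace checkpoint straight into the turbomind backend;
# no prior offline conversion step is required.
pipe = pipeline(
    'meta-llama/Llama-2-7b-chat-hf',             # illustrative model id
    backend_config=TurbomindEngineConfig(tp=1),  # single-GPU tensor parallel
)
print(pipe(['Hello!']))
```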

Tests

profile request throughput

  • llama-2-7b-chat (huggingface) / turbomind / tp1
  • internlm-chat-20b (huggingface) / turbomind / tp2
  • llama-2-7b-chat (convert turbomind) / turbomind / tp1
  • internlm-chat-20b (convert turbomind) / turbomind / tp2
  • llama-2-7b-chat (quantization) / turbomind / tp1
  • internlm-chat-20b (quantization) / turbomind / tp2
  • llama-2-7b-chat (huggingface) / pytorch / tp1
  • internlm-chat-20b (huggingface) / pytorch / tp2

profile api_server

  • llama-2-7b-chat (huggingface) / turbomind / tp1
  • internlm-chat-20b (huggingface) / turbomind / tp2
  • llama-2-7b-chat (huggingface) / pytorch / tp1
  • internlm-chat-20b (huggingface) / pytorch / tp2

profile static inference

  • llama-2-7b-chat (huggingface) / turbomind / tp1
  • internlm-chat-20b (huggingface) / turbomind / tp2
  • llama-2-7b-chat (huggingface) / pytorch / tp1
  • internlm-chat-20b (huggingface) / pytorch / tp2

@lvhan028 removed the WIP label on Jan 16, 2024
@grimoire (Collaborator) commented Feb 5, 2024

The PyTorch benchmark might lead to oversized memory usage.

One solution is to clear caches after gathering the logits:

```python
# Free the temporary prefill output once its logits have been gathered,
# then return the cached blocks to the CUDA allocator.
logits_gather.gather(tmp_out)

del tmp_out
torch.cuda.empty_cache()
```

Another solution is to use a smaller max_prefill_token_num:

```python
# engine config field that bounds how many tokens one prefill step processes
max_prefill_token_num: int = 16384
```
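
For reference, a minimal sketch of applying this second workaround from user code, assuming the installed lmdeploy exposes max_prefill_token_num on PytorchEngineConfig (the model id and the 4096 value are illustrative, not suggestions from this thread):

```python
from lmdeploy import pipeline, PytorchEngineConfig

# Cap the number of tokens processed per prefill iteration so the
# temporary logits buffers stay small; 4096 is an illustrative value
# below the 16384 shown above.
backend_config = PytorchEngineConfig(
    tp=1,
    max_prefill_token_num=4096,
)
pipe = pipeline('meta-llama/Llama-2-7b-chat-hf', backend_config=backend_config)
print(pipe(['Hello!']))
```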

@RunningLeon (Collaborator) left a comment:

LGTM

@zhulinJulia24 (Collaborator) left a comment:

lgtm

@lvhan028 merged commit 24beeb6 into InternLM:main on Feb 21, 2024 (4 checks passed)
grimoire pushed a commit to grimoire/lmdeploy referencing this pull request on Feb 22, 2024:
* tmp

* update tuning gemm

* modify profile_throughput

* update profile_generation

* fix profile generation

* fix benchmark bash script

* update

* update

* update

* fix conflicts

* fix error

* pass cache count param

* remove benchmark config

* fix according to reviewer comments

* update

* update

* update

* update

* update

* fix profile_generation

* update user guide

* change title

* fix profile_generation

* fix gemm_tune according to reviewer comments

* pass session_len and cache_max_entry_count to engines

---------

Co-authored-by: RunningLeon <mnsheng@yeah.net>