
Remove the manual model conversion during benchmark #953

Merged (32 commits into InternLM:main, Feb 21, 2024)

Conversation

@lvhan028 (Collaborator) commented Jan 15, 2024

Motivation

As titled: remove the manual model conversion step from the benchmark workflow.

Modification

  1. update documentation: docs/en/benchmark, docs/zh_cn/benchmark
  2. update benchmark scripts and remove the configs: benchmark/ (see the sketch after this list)
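
With the conversion step gone, a benchmark can point the turbomind backend at a HuggingFace model directly. Below is a minimal sketch of that direct-load workflow through lmdeploy's pipeline API; the model id, tp value, and prompt are illustrative and not taken from this PR:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Load a HuggingFace checkpoint straight into the turbomind backend;
# no prior offline conversion step is required.
pipe = pipeline(
    'meta-llama/Llama-2-7b-chat-hf',             # illustrative model id
    backend_config=TurbomindEngineConfig(tp=1),  # single-GPU tensor parallel
)
print(pipe(['Hello!']))
```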

Tests

profile request throughput

  • llama-2-7b-chat (huggingface) / turbomind / tp1
  • internlm-chat-20b (huggingface) / turbomind / tp2
  • llama-2-7b-chat (convert turbomind) / turbomind / tp1
  • internlm-chat-20b (convert turbomind) / turbomind / tp2
  • llama-2-7b-chat (quantization) / turbomind / tp1
  • internlm-chat-20b (quantization) / turbomind / tp2
  • llama-2-7b-chat (huggingface) / pytorch / tp1
  • internlm-chat-20b (huggingface) / pytorch / tp2

profile api_server

  • llama-2-7b-chat (huggingface) / turbomind / tp1
  • internlm-chat-20b (huggingface) / turbomind / tp2
  • llama-2-7b-chat (huggingface) / pytorch / tp1
  • internlm-chat-20b (huggingface) / pytorch / tp2

profile static inference

  • llama-2-7b-chat (huggingface) / turbomind / tp1
  • internlm-chat-20b (huggingface) / turbomind / tp2
  • llama-2-7b-chat (huggingface) / pytorch / tp1
  • internlm-chat-20b (huggingface) / pytorch / tp2

@lvhan028 removed the WIP label on Jan 16, 2024
@grimoire (Collaborator) commented Feb 5, 2024

The PyTorch benchmark might lead to oversized memory usage.

One solution is to clear caches after gathering the logits:

```python
# Free the temporary prefill output once its logits have been gathered,
# then return the cached blocks to the CUDA allocator.
logits_gather.gather(tmp_out)

del tmp_out
torch.cuda.empty_cache()
```

Another solution is to use a smaller max_prefill_token_num:

```python
# engine config field that bounds how many tokens one prefill step processes
max_prefill_token_num: int = 16384
```
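
For reference, a minimal sketch of applying this second workaround from user code, assuming the installed lmdeploy exposes max_prefill_token_num on PytorchEngineConfig (the model id and the 4096 value are illustrative, not suggestions from this thread):

```python
from lmdeploy import pipeline, PytorchEngineConfig

# Cap the number of tokens processed per prefill iteration so the
# temporary logits buffers stay small; 4096 is an illustrative value
# below the 16384 shown above.
backend_config = PytorchEngineConfig(
    tp=1,
    max_prefill_token_num=4096,
)
pipe = pipeline('meta-llama/Llama-2-7b-chat-hf', backend_config=backend_config)
print(pipe(['Hello!']))
```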

@RunningLeon (Collaborator) left a comment:

LGTM

@zhulinJulia24 (Collaborator) left a comment:

lgtm

@lvhan028 merged commit 24beeb6 into InternLM:main on Feb 21, 2024 (4 checks passed)
grimoire pushed a commit to grimoire/lmdeploy referencing this pull request on Feb 22, 2024:
* tmp

* update tuning gemm

* modify profile_throughput

* update profile_generation

* fix profile generation

* fix benchmark bash script

* update

* update

* update

* fix conflicts

* fix error

* pass cache count param

* remove benchmark config

* fix according to reviewer comments

* update

* update

* update

* update

* update

* fix profile_generation

* update user guide

* change title

* fix profile_generation

* fix gemm_tune according to reviewer comments

* pass session_len and cache_max_entry_count to engines

---------

Co-authored-by: RunningLeon <mnsheng@yeah.net>