Remove the manual model conversion during benchmark #953
Conversation
Update benchmark to v0.2.0 cli style
The PyTorch benchmark might lead to oversized memory usage. One solution is clearing caches after gathering, in lmdeploy/lmdeploy/pytorch/engine/engine.py (line 600 in c332efa):

del tmp_out
torch.cuda.empty_cache()

Another solution is using a small (line 170 in c332efa)
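The cache-clearing suggestion above can be sketched as follows. This is a minimal illustration of the pattern, not the actual engine code: `gather_outputs` and its `chunks` argument are hypothetical names, standing in for the gathering step around line 600 of `lmdeploy/pytorch/engine/engine.py`.

```python
import torch

def gather_outputs(chunks):
    """Concatenate per-step output tensors, then release the temporary.

    Sketch of the fix suggested in review: once the gathered result has
    been moved off-device, drop the temporary and return its cached
    blocks to the CUDA allocator so peak GPU memory stays bounded.
    """
    tmp_out = torch.cat(chunks, dim=0)
    result = tmp_out.cpu()        # copy off-device before freeing
    del tmp_out                   # drop the last reference to the temporary
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand unused cached memory back to the driver
    return result
```

Note that `torch.cuda.empty_cache()` only releases memory the caching allocator is no longer using, which is why the `del` must come first.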
LGTM
lgtm
* tmp
* update tuning gemm
* modify profile_throughput
* update profile_generation
* fix profile generation
* fix benchmark bash script
* update
* fix conflicts
* fix error
* pass cache count param
* remove benchmark config
* fix according to reviewer comments
* fix profile_generation
* update user guide
* change title
* fix gemm_tune according to reviewer comments
* pass session_len and cache_max_entry_count to engines

Co-authored-by: RunningLeon <mnsheng@yeah.net>
Motivation
As titled.
Modification
Tests
profile request throughput
profile api_server
profile static inference