
fix benchmark generation #1349

Merged — 2 commits merged into InternLM:main on Mar 28, 2024
Conversation

grimoire (Collaborator)

zhulinJulia24 (Collaborator) commented Mar 27, 2024

The issue from the bug report has been fixed and `--tp 2` now runs, but at the very end it fails with `assert output_seqlen <= n_token <= output_seqlen + 1`. Should we merge this first, or fix that together here? @grimoire

```
profiling ... concurrency: 1, n_prompt_token: 128, n_completion_token: 2048, test_round: 3, warmup_round: 1
2024-03-27 00:02:30,096 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-03-27 00:02:31,235 - lmdeploy - INFO - Checking model.
Loading checkpoint shards: 100%|██████████| 21/21 [00:01<00:00, 14.54it/s]
2024-03-27 00:02:37,233 - lmdeploy - INFO - distribute model parameters.
2024-03-27 00:02:55,797 - lmdeploy - INFO - build CacheEngine with config: CacheConfig(block_size=64, num_cpu_blocks=682, num_gpu_blocks=7401, window_size=-1, cache_max_entry_count=0.8, max_prefill_token_num=4096)
start to warmup ...
end warmup, elapsed time: 91.2s
 33%|███▎      | 1/3 [01:33<03:07, 93.66s/it]
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/__w/lmdeploy/lmdeploy/benchmark/profile_generation.py", line 64, in infer
    assert output_seqlen <= n_token <= output_seqlen + 1,
AssertionError: Error. session_id(1) request 2048 tokens, but generate 2047 tokens
Traceback (most recent call last):
  File "benchmark/profile_generation.py", line 472, in <module>
    main()
  File "benchmark/profile_generation.py", line 425, in main
    output = _process_map(profile_target, (args.model_path, ))
  File "benchmark/profile_generation.py", line 372, in _process_map
    raise ret
ValueError: need at least one array to stack
/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
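The two errors in the log are related: the per-session assertion fires because the engine generated one token fewer than requested, the benchmark thread dies, and the main process then tries to stack an empty result list, which is what produces the secondary `ValueError`. A minimal sketch reproducing both failure modes (only the assertion expression and its values come from the traceback; `check_generation` is a hypothetical stand-in for the check in `profile_generation.py`):

```python
import numpy as np


def check_generation(session_id: int, output_seqlen: int, n_token: int):
    # The +1 slack presumably allows for a trailing EOS token; anything
    # below the requested length is treated as a failed generation.
    assert output_seqlen <= n_token <= output_seqlen + 1, (
        f'Error. session_id({session_id}) request {output_seqlen} tokens, '
        f'but generate {n_token} tokens')


# The run above requested 2048 completion tokens but got 2047,
# so the assertion fires inside the worker thread:
try:
    check_generation(1, 2048, 2047)
except AssertionError as e:
    print(e)

# Because the thread died before recording any results, the main process
# ends up stacking an empty list, which raises the secondary error:
try:
    np.stack([])
except ValueError as e:
    print(e)  # need at least one array to stack
```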

zhulinJulia24 (Collaborator) left a comment:

LGTM

RunningLeon (Collaborator) left a comment:

LGTM

@lvhan028 lvhan028 merged commit 385e9bb into InternLM:main Mar 28, 2024
5 checks passed
4 participants