Encounter error when serving with vicuna-13b-v1.3. #23

@shanshanpt

Use docker environment:
docker build -t image_name .
sudo docker run -it --runtime=nvidia --name=test --net=host --gpus all --privileged --shm-size 20G --cap-add=CAP_SYS_ADMIN --cap-add=SYS_PTRACE llm bash

GPU V100
CUDA Version: 11.8
Python 3.9.16
pip uninstall torch
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118

server:
python -m lightllm.server.api_server --model_dir /data/workspace/vicuna-13b-v1.3 --tp 2 --max_total_token_num 121060 --tokenizer_mode auto
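A rough sizing check for the `--max_total_token_num 121060` value above, offered as a sketch only. It assumes the standard LLaMA-13B shape for vicuna-13b-v1.3 (40 layers, hidden size 5120), an fp16 KV cache, and an even split across the two `--tp` ranks; none of these are stated in the report, and whether memory pressure is related to the error below is not established here.

```python
# Back-of-the-envelope KV cache sizing (all model numbers are assumptions,
# not taken from the report): LLaMA-13B shape, fp16 cache, even TP split.
NUM_LAYERS = 40            # assumed transformer layer count
HIDDEN_SIZE = 5120         # assumed hidden size
BYTES_PER_VALUE = 2        # fp16
MAX_TOTAL_TOKEN_NUM = 121060  # from the server command
TP = 2                        # from the server command

# Each token stores one key and one value vector per layer.
kv_bytes_per_token = 2 * NUM_LAYERS * HIDDEN_SIZE * BYTES_PER_VALUE
kv_bytes_total = MAX_TOTAL_TOKEN_NUM * kv_bytes_per_token
kv_bytes_per_gpu = kv_bytes_total // TP

GIB = 1024 ** 3
print(f"KV cache per token: {kv_bytes_per_token} bytes")
print(f"KV cache per GPU:   {kv_bytes_per_gpu / GIB:.2f} GiB")
```

Under these assumptions the KV cache alone comes to roughly 46 GiB per GPU, before model weights, while a V100 has 16 GB or 32 GB of memory. A smaller `--max_total_token_num` may be worth trying when reproducing.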

client:
python ./test/benchmark_serving.py --tokenizer /data/workspace/vicuna-13b-v1.3 --dataset /data/workspace/ShareGPT_V3_unfiltered_cleaned_split.json --num-prompts 100

Error message:
Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
INFO: Started server process [628]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

Task exception was never retrieved
future: <Task finished name='Task-6' coro=<RouterManager.loop_for_fwd() done, defined at /data/workspace/lightllm/lightllm/server/router/manager.py:84> exception='97859f0c0d6242588bb78c8e4a29aed0'

========= Remote Traceback (1) =========
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/rpyc/core/protocol.py", line 359, in _dispatch_request
res = self._HANDLERS[handler](self, *args)
File "/opt/conda/lib/python3.9/site-packages/rpyc/core/protocol.py", line 837, in _handle_call
return obj(*args, **dict(kwargs))
File "/data/workspace/lightllm/lightllm/utils/infer_utils.py", line 49, in inner_func
result = func(*args, **kwargs)
File "/data/workspace/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 67, in exposed_prefill_batch
return self.forward(batch_id, is_prefill=True)
File "/data/workspace/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 104, in forward
batch: InferBatch = self.cache.pop(batch_id)
KeyError: '97859f0c0d6242588bb78c8e4a29aed0'

Traceback (most recent call last):
File "/data/workspace/lightllm/lightllm/server/router/manager.py", line 87, in loop_for_fwd
await self._step()
File "/data/workspace/lightllm/lightllm/server/router/manager.py", line 106, in _step
await self._prefill_batch(self.running_batch)
File "/data/workspace/lightllm/lightllm/server/router/manager.py", line 139, in _prefill_batch
ans = await asyncio.gather(*rets)
File "/data/workspace/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 185, in prefill_batch
return ans.value
File "/opt/conda/lib/python3.9/site-packages/rpyc/core/async_.py", line 108, in value
raise self._obj
_get_exception_class.<locals>.Derived: '97859f0c0d6242588bb78c8e4a29aed0'

========= Remote Traceback (1) =========
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/rpyc/core/protocol.py", line 359, in _dispatch_request
res = self._HANDLERS[handler](self, *args)
File "/opt/conda/lib/python3.9/site-packages/rpyc/core/protocol.py", line 837, in _handle_call
return obj(*args, **dict(kwargs))
File "/data/workspace/lightllm/lightllm/utils/infer_utils.py", line 49, in inner_func
result = func(*args, **kwargs)
File "/data/workspace/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 67, in exposed_prefill_batch
return self.forward(batch_id, is_prefill=True)
File "/data/workspace/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 104, in forward
batch: InferBatch = self.cache.pop(batch_id)
KeyError: '97859f0c0d6242588bb78c8e4a29aed0'
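
For readers of the traceback: the final frame is a plain `dict.pop` on a batch id that is not in the worker's cache, and the quoted hex string in the log is simply that missing key. The sketch below reproduces the same failure shape in isolation. `ModelWorkerSketch` and `add_batch` are hypothetical names, and generating ids with `uuid.uuid4().hex` is an assumption based on the 32-character hex id in the log; this is not lightllm's actual code and does not explain why the batch was missing.

```python
import uuid


class ModelWorkerSketch:
    """Hypothetical stand-in for a worker that caches batches by id."""

    def __init__(self):
        self.cache = {}

    def add_batch(self, batch_id, batch):
        # A batch must be registered here before forward() can find it.
        self.cache[batch_id] = batch

    def forward(self, batch_id):
        # dict.pop without a default raises KeyError when the id is absent.
        return self.cache.pop(batch_id)


worker = ModelWorkerSketch()
batch_id = uuid.uuid4().hex  # assumed id format: 32 hex characters

try:
    # The batch was never added (or was already removed), so this fails.
    worker.forward(batch_id)
except KeyError as exc:
    missing_key = exc.args[0]
    message = str(exc)  # the quoted key, as it appears in the log
    print(f"KeyError: {message}")
```

In other words, the prefill request reached a worker whose cache had no entry for that batch, either because the entry was never stored on that rank or because it had already been popped.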
