[BUG] failed to serve a Qwen1.5-72B-chat model #350

@pluiez

Description

Issue description:
Launching a server for a 7B model succeeded, but serving the 72B model failed: the launcher took about half an hour to initialize and then reported EOFError: connection closed by peer.

Steps to reproduce:

  1. Run the container ghcr.io/modeltc/lightllm:main
  2. Start the server:
python -m lightllm.server.api_server --model_dir ~/resources/huggingface/models/Qwen/Qwen1.5-72B-chat/     \
                                     --host 0.0.0.0                 \
                                     --port 8080                    \
                                     --tp 8                         \
                                     --eos_id 151645 \
                                     --trust_remote_code \
                                     --max_total_token_num 120000
  3. Wait about half an hour; the server then fails with the EOFError shown below (a rough memory-budget sketch follows this list).
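
Before digging into the logs, a rough memory budget helps frame the failure. Below is a back-of-the-envelope sketch in Python; the model dimensions (80 layers, 64 KV heads of head_dim 128, full multi-head attention, fp16) are assumptions taken from the Qwen1.5-72B family's published configs, not from this issue, so check the model's config.json before trusting them:

layers = 80            # assumed from Qwen1.5-72B config.json
kv_heads = 64          # assumed: full multi-head attention, no GQA
head_dim = 128         # assumed: hidden_size 8192 / 64 heads
bytes_per_elem = 2     # fp16
tokens = 120_000       # --max_total_token_num
tp = 8                 # --tp
gpu_mem_gib = 80       # H800-80G

# K and V caches: one [kv_heads, head_dim] entry each, per layer per token.
kv_gib = 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 2**30
weights_gib = 72e9 * bytes_per_elem / 2**30  # ~72B params in fp16

print(f"KV cache: {kv_gib:.0f} GiB total, {kv_gib / tp:.0f} GiB per GPU")
print(f"Weights:  {weights_gib:.0f} GiB total, {weights_gib / tp:.0f} GiB per GPU")
print(f"Per-GPU:  {(kv_gib + weights_gib) / tp:.0f} GiB of {gpu_mem_gib} GiB")

Under those assumptions this lands around 53 GiB per GPU, which should fit in 80 GB, so the crash below is more likely a worker-side error during initialization than a straightforward KV-cache mis-sizing; still, lowering --max_total_token_num is a cheap first experiment.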

Expected behavior:

The server should finish initializing and begin serving the 72B model, just as it does for the 7B model.

Error log:

python -m lightllm.server.api_server --model_dir ~/resources/huggingface/models/Qwen/Qwen1.5-72B-chat/     \
                                     --host 0.0.0.0                 \
                                     --port 8080                    \
                                     --tp 8                         \
                                     --eos_id 151645 \
                                     --trust_remote_code \
                                     --max_total_token_num 120000
INFO 03-09 16:38:17 [tokenizer.py:79] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
INFO 03-09 16:38:21 [tokenizer.py:79] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.

INFO 03-09 17:07:54 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:54 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:56 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:56 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:56 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:56 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:56 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:56 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:58 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:58 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:58 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:58 [mem_utils.py:18] Model kv cache using mode normal
ERROR 03-09 17:07:58 [start_utils.py:24] init func start_router_process : Traceback (most recent call last):
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/manager.py", line 379, in start_router_process
ERROR 03-09 17:07:58 [start_utils.py:24]     asyncio.run(router.wait_to_model_ready())
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
ERROR 03-09 17:07:58 [start_utils.py:24]     return loop.run_until_complete(main)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/manager.py", line 83, in wait_to_model_ready
ERROR 03-09 17:07:58 [start_utils.py:24]     await asyncio.gather(*init_model_ret)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 455, in init_model
ERROR 03-09 17:07:58 [start_utils.py:24]     await ans
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 427, in _func
ERROR 03-09 17:07:58 [start_utils.py:24]     await asyncio.to_thread(ans.wait)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/asyncio/threads.py", line 25, in to_thread
ERROR 03-09 17:07:58 [start_utils.py:24]     return await loop.run_in_executor(None, func_call)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/concurrent/futures/thread.py", line 58, in run
ERROR 03-09 17:07:58 [start_utils.py:24]     result = self.fn(*self.args, **self.kwargs)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/async_.py", line 51, in wait
ERROR 03-09 17:07:58 [start_utils.py:24]     self._conn.serve(self._ttl)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/protocol.py", line 438, in serve
ERROR 03-09 17:07:58 [start_utils.py:24]     data = self._channel.poll(timeout) and self._channel.recv()
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/channel.py", line 55, in recv
ERROR 03-09 17:07:58 [start_utils.py:24]     header = self.stream.read(self.FRAME_HEADER.size)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/stream.py", line 280, in read
ERROR 03-09 17:07:58 [start_utils.py:24]     raise EOFError("connection closed by peer")
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24] EOFError: connection closed by peer
ERROR 03-09 17:07:58 [start_utils.py:24]
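
For context on the traceback: the EOFError is raised on the router side by rpyc when the TCP connection to a tensor-parallel worker closes, which usually means the worker process died (for example, OOM-killed or crashing during weight load) rather than that the router itself failed, so the root cause should be in the worker's own output. A minimal sketch of this failure mode using only rpyc; the service, method name, and port here are hypothetical, not LightLLM's:

import multiprocessing
import time

import rpyc
from rpyc.utils.server import ThreadedServer


class SlowInitService(rpyc.Service):
    # Hypothetical stand-in for a model worker that takes a long time
    # to initialize (weight loading, KV-cache allocation, ...).
    def exposed_init_model(self):
        time.sleep(60)
        return "ready"


def run_worker():
    ThreadedServer(SlowInitService, port=18861).start()


if __name__ == "__main__":
    worker = multiprocessing.Process(target=run_worker, daemon=True)
    worker.start()
    time.sleep(1)  # let the server bind

    conn = rpyc.connect("localhost", 18861)
    pending = rpyc.async_(conn.root.init_model)()  # returns immediately

    worker.terminate()  # simulate the worker dying mid-initialization
    pending.wait()      # raises EOFError: connection closed by peer

The wait() call here follows the same path as the traceback above (async_.wait -> protocol.serve -> stream.read), so the message shows where the death was observed, not where it happened; checking each worker's log and dmesg for OOM kills is the natural next step.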

Environment:

  • Deployment: Docker container (ghcr.io/modeltc/lightllm:main)

  • OS: Ubuntu 20.04.6

  • GPU info:

    • nvidia-smi: NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2
    • Graphics cards: 8 × NVIDIA H800 80 GB
  • Python: Python 3.9.18

  • LightLLM: commit 486f647

  • openai-triton: pip show triton

Name: triton
Version: 2.1.0
Summary: A language and compiler for custom Deep Learning operations
Home-page: https://github.com/openai/triton/
Author: Philippe Tillet
Author-email: phil@openai.com
License: 
Location: /opt/conda/lib/python3.9/site-packages
Requires: filelock
Required-by: lightllm
