[BUG] failed to serve a Qwen1.5-72B-chat model #350

@pluiez

Description

Issue description:
Launching a server for a 7B model succeeded, but serving the 72B model failed: the launcher took about half an hour to initialize and then reported EOFError: connection closed by peer.

Steps to reproduce:

  1. Run the container ghcr.io/modeltc/lightllm:main
  2. Start the server:
python -m lightllm.server.api_server --model_dir ~/resources/huggingface/models/Qwen/Qwen1.5-72B-chat/     \
                                     --host 0.0.0.0                 \
                                     --port 8080                    \
                                     --tp 8                         \
                                     --eos_id 151645 \
                                     --trust_remote_code \
                                     --max_total_token_num 120000
  3. Wait about half an hour; the server then fails with the EOFError shown below (a rough memory-budget sketch follows this list).
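
Before digging into the logs, a rough memory budget helps frame the failure. Below is a back-of-the-envelope sketch in Python; the model dimensions (80 layers, 64 KV heads of head_dim 128, full multi-head attention, fp16) are assumptions taken from the Qwen1.5-72B family's published configs, not from this issue, so check the model's config.json before trusting them:

layers = 80            # assumed from Qwen1.5-72B config.json
kv_heads = 64          # assumed: full multi-head attention, no GQA
head_dim = 128         # assumed: hidden_size 8192 / 64 heads
bytes_per_elem = 2     # fp16
tokens = 120_000       # --max_total_token_num
tp = 8                 # --tp
gpu_mem_gib = 80       # H800-80G

# K and V caches: one [kv_heads, head_dim] entry each, per layer per token.
kv_gib = 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 2**30
weights_gib = 72e9 * bytes_per_elem / 2**30  # ~72B params in fp16

print(f"KV cache: {kv_gib:.0f} GiB total, {kv_gib / tp:.0f} GiB per GPU")
print(f"Weights:  {weights_gib:.0f} GiB total, {weights_gib / tp:.0f} GiB per GPU")
print(f"Per-GPU:  {(kv_gib + weights_gib) / tp:.0f} GiB of {gpu_mem_gib} GiB")

Under those assumptions this lands around 53 GiB per GPU, which should fit in 80 GB, so the crash below is more likely a worker-side error during initialization than a straightforward KV-cache mis-sizing; still, lowering --max_total_token_num is a cheap first experiment.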

Expected behavior:

The server should finish initializing and begin serving the 72B model, just as it does for the 7B model.

Error log:

python -m lightllm.server.api_server --model_dir ~/resources/huggingface/models/Qwen/Qwen1.5-72B-chat/     \
                                     --host 0.0.0.0                 \
                                     --port 8080                    \
                                     --tp 8                         \
                                     --eos_id 151645 \
                                     --trust_remote_code \
                                     --max_total_token_num 120000
INFO 03-09 16:38:17 [tokenizer.py:79] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
INFO 03-09 16:38:21 [tokenizer.py:79] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.

INFO 03-09 17:07:54 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:54 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:56 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:56 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:56 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:56 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:56 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:56 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:58 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:58 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:58 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:58 [mem_utils.py:18] Model kv cache using mode normal
ERROR 03-09 17:07:58 [start_utils.py:24] init func start_router_process : Traceback (most recent call last):
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/manager.py", line 379, in start_router_process
ERROR 03-09 17:07:58 [start_utils.py:24]     asyncio.run(router.wait_to_model_ready())
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
ERROR 03-09 17:07:58 [start_utils.py:24]     return loop.run_until_complete(main)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/manager.py", line 83, in wait_to_model_ready
ERROR 03-09 17:07:58 [start_utils.py:24]     await asyncio.gather(*init_model_ret)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 455, in init_model
ERROR 03-09 17:07:58 [start_utils.py:24]     await ans
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 427, in _func
ERROR 03-09 17:07:58 [start_utils.py:24]     await asyncio.to_thread(ans.wait)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/asyncio/threads.py", line 25, in to_thread
ERROR 03-09 17:07:58 [start_utils.py:24]     return await loop.run_in_executor(None, func_call)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/concurrent/futures/thread.py", line 58, in run
ERROR 03-09 17:07:58 [start_utils.py:24]     result = self.fn(*self.args, **self.kwargs)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/async_.py", line 51, in wait
ERROR 03-09 17:07:58 [start_utils.py:24]     self._conn.serve(self._ttl)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/protocol.py", line 438, in serve
ERROR 03-09 17:07:58 [start_utils.py:24]     data = self._channel.poll(timeout) and self._channel.recv()
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/channel.py", line 55, in recv
ERROR 03-09 17:07:58 [start_utils.py:24]     header = self.stream.read(self.FRAME_HEADER.size)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/stream.py", line 280, in read
ERROR 03-09 17:07:58 [start_utils.py:24]     raise EOFError("connection closed by peer")
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24] EOFError: connection closed by peer
ERROR 03-09 17:07:58 [start_utils.py:24]
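
For context on the traceback: the EOFError is raised on the router side by rpyc when the TCP connection to a tensor-parallel worker closes, which usually means the worker process died (for example, OOM-killed or crashing during weight load) rather than that the router itself failed, so the root cause should be in the worker's own output. A minimal sketch of this failure mode using only rpyc; the service, method name, and port here are hypothetical, not LightLLM's:

import multiprocessing
import time

import rpyc
from rpyc.utils.server import ThreadedServer


class SlowInitService(rpyc.Service):
    # Hypothetical stand-in for a model worker that takes a long time
    # to initialize (weight loading, KV-cache allocation, ...).
    def exposed_init_model(self):
        time.sleep(60)
        return "ready"


def run_worker():
    ThreadedServer(SlowInitService, port=18861).start()


if __name__ == "__main__":
    worker = multiprocessing.Process(target=run_worker, daemon=True)
    worker.start()
    time.sleep(1)  # let the server bind

    conn = rpyc.connect("localhost", 18861)
    pending = rpyc.async_(conn.root.init_model)()  # returns immediately

    worker.terminate()  # simulate the worker dying mid-initialization
    pending.wait()      # raises EOFError: connection closed by peer

The wait() call here follows the same path as the traceback above (async_.wait -> protocol.serve -> stream.read), so the message shows where the death was observed, not where it happened; checking each worker's log and dmesg for OOM kills is the natural next step.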

Environment:

  • Deployment: Docker container (ghcr.io/modeltc/lightllm:main)

  • OS: Ubuntu 20.04.6

  • GPU info:

    • nvidia-smi: NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2
    • Graphics cards: 8 × NVIDIA H800 80 GB
  • Python: Python 3.9.18

  • LightLLM: commit 486f647

  • openai-triton: pip show triton

Name: triton
Version: 2.1.0
Summary: A language and compiler for custom Deep Learning operations
Home-page: https://github.com/openai/triton/
Author: Philippe Tillet
Author-email: phil@openai.com
License: 
Location: /opt/conda/lib/python3.9/site-packages
Requires: filelock
Required-by: lightllm
