
[BUG] Model loading hangs during startup on Linux #2191

Closed
WangXBruc opened this issue Nov 27, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@WangXBruc

Problem Description
Startup on Linux hangs at the model-loading stage.
When starting the API service with python3 startup.py --all-api, it stays stuck at "Loading the model ['chatglm3-6b'] on worker 855b18b5 ..".
The same code starts fine on macOS.
Please take a look.

Environment Information
OS: Linux-5.10.84-004.centos7.x86_64-x86_64-with-glibc2.32
Python version: 3.8.10 (default, Nov 6 2023, 19:45:31)
Project version: v0.2.7
langchain version: 0.0.340, fastchat version: 0.2.32

Text splitter in use: ChineseRecursiveTextSplitter
LLM models being started: ['chatglm3-6b', 'chatglm2-6b', 'zhipu-api', 'openai-api'] @ cuda
{'device': 'cuda',
'host': '0.0.0.0',
'infer_turbo': False,
'model_path': '/home/admin/chatglm3-6b',
'port': 20002}
{'device': 'cuda',
'host': '0.0.0.0',
'infer_turbo': False,
'model_path': '/home/admin/chatglm2-6b',
'port': 20002}
{'api_key': '',
'device': 'auto',
'host': '0.0.0.0',
'infer_turbo': False,
'online_api': True,
'port': 21001,
'provider': 'ChatGLMWorker',
'version': 'chatglm_turbo',
'worker_class': <class 'server.model_workers.zhipu.ChatGLMWorker'>}
{'api_base_url': 'https://api.openai.com/v1',
'api_key': '',
'device': 'auto',
'host': '0.0.0.0',
'infer_turbo': False,
'model_name': 'gpt-35-turbo',
'online_api': True,
'openai_proxy': '',
'port': 20002}
Embeddings model in use: m3e-base @ cuda
==============================Langchain-Chatchat Configuration==============================

2023-11-27 18:58:44,830 - startup.py[line:647] - INFO: Starting services:
2023-11-27 18:58:44,830 - startup.py[line:648] - INFO: To view llm_api logs, go to /home/admin/LangChain-Chatchat/logs
2023-11-27 18:58:53 | INFO | model_worker | Register to controller
2023-11-27 18:58:53 | ERROR | stderr | INFO: Started server process [107018]
2023-11-27 18:58:53 | ERROR | stderr | INFO: Waiting for application startup.
2023-11-27 18:58:53 | ERROR | stderr | INFO: Application startup complete.
2023-11-27 18:58:53 | ERROR | stderr | INFO: Uvicorn running on http://0.0.0.0:20000 (Press CTRL+C to quit)
2023-11-27 18:58:55 | INFO | model_worker | Loading the model ['chatglm3-6b'] on worker 855b18b5 ...
2023-11-27 18:58:55 | INFO | model_worker | Loading the model ['chatglm2-6b'] on worker 7f4a31da ...
Loading checkpoint shards: 0%| | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards: 0%| | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards: 14%|███████████████████ | 1/7 [00:25<02:33, 25.53s/it]
Loading checkpoint shards: 14%|███████████████████ | 1/7 [00:25<02:34, 25.67s/it]
Loading checkpoint shards: 29%|██████████████████████████████████████ | 2/7 [00:52<02:12, 26.46s/it]
Loading checkpoint shards: 29%|██████████████████████████████████████ | 2/7 [00:53<02:13, 26.67s/it]
Loading checkpoint shards: 43%|█████████████████████████████████████████████████████████ | 3/7 [01:19<01:45, 26.44s/it]
Loading checkpoint shards: 43%|█████████████████████████████████████████████████████████ | 3/7 [01:19<01:46, 26.59s/it]
Loading checkpoint shards: 57%|████████████████████████████████████████████████████████████████████████████ | 4/7 [01:43<01:17, 25.74s/it]
Loading checkpoint shards: 57%|████████████████████████████████████████████████████████████████████████████ | 4/7 [01:44<01:17, 25.93s/it]
Loading checkpoint shards: 71%|███████████████████████████████████████████████████████████████████████████████████████████████ | 5/7 [02:10<00:52, 26.09s/it]
Loading checkpoint shards: 71%|███████████████████████████████████████████████████████████████████████████████████████████████ | 5/7 [02:11<00:52, 26.29s/it]
Loading checkpoint shards: 86%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 6/7 [02:36<00:26, 26.12s/it]
Loading checkpoint shards: 86%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 6/7 [02:38<00:26, 26.44s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [02:53<00:00, 22.80s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [02:53<00:00, 24.78s/it]
2023-11-27 19:01:48 | ERROR | stderr |
2023-11-27 19:01:53 | INFO | model_worker | Register to controller
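
In the configuration dump above, both local workers (chatglm3-6b and chatglm2-6b) are printed with 'port': 20002. If the running configuration really assigns the same port to both local model workers, the second one cannot bind it, which would match a startup that stalls after the checkpoints finish loading. Below is a minimal sketch of per-model port overrides in configs/server_config.py; the FSCHAT_MODEL_WORKERS key follows Langchain-Chatchat 0.2.x conventions and the values are assumptions, not taken from this issue:

```python
# configs/server_config.py -- sketch only, assuming the 0.2.x layout
FSCHAT_MODEL_WORKERS = {
    "default": {
        "host": "0.0.0.0",
        "port": 20002,        # default port used by local model workers
        "device": "cuda",
        "infer_turbo": False,
    },
    # Hypothetical override so the second local model does not collide
    # with chatglm3-6b on the same port.
    "chatglm2-6b": {
        "port": 20003,
    },
}
```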

WangXBruc added the bug (Something isn't working) label on Nov 27, 2023
@WangXBruc
Author

pytorch version: 2.1.0+cu121
cuda version: 12.0

Still stuck at the model-loading stage. Any help would be appreciated!
I have already switched the Python version to 3.10.0 and it still doesn't work.
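
A quick way to confirm that this PyTorch build actually sees the GPU before starting the workers (plain torch calls, not specific to this project):

```python
import torch

# Verify the CUDA build of PyTorch and GPU visibility.
print(torch.__version__)             # e.g. 2.1.0+cu121
print(torch.cuda.is_available())     # should print True
print(torch.cuda.device_count())     # number of visible GPUs
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```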

@zRzRzRzRzRzRzR
Collaborator

Try updating to the latest dev branch.
Then check whether GPU memory and system RAM are maxed out.
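
One way to run that check from Python while the workers are loading (equivalent to watching nvidia-smi and free -h; plain torch calls, nothing project-specific):

```python
import torch

# Report free vs. total GPU memory; if "free" collapses while two local
# models load, the hang is most likely memory pressure.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        free_b, total_b = torch.cuda.mem_get_info(i)   # bytes
        print(f"GPU {i}: {free_b / 1e9:.1f} GB free of {total_b / 1e9:.1f} GB")
```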

@xiaoxiao9992

> Try updating to the latest dev branch. Then check whether GPU memory and system RAM are maxed out.

GPU memory and RAM are not maxed out. It was probably because other services were already running at the same time (or had been started earlier); after stopping those services it runs. But... is there really no way to run them together?
(three screenshots attached)

@WangXBruc
Author

Starting two local models at the same time would not work; switching to starting just one does.
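
A sketch of what that workaround looks like in configs/model_config.py; the LLM_MODELS name follows Langchain-Chatchat 0.2.7 and the exact list here is an assumption, not copied from this thread:

```python
# configs/model_config.py -- sketch only
# Keep a single local model; the online API workers load no local weights,
# so they can stay in the list.
LLM_MODELS = ["chatglm3-6b", "zhipu-api", "openai-api"]
```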
