
[BUG] Model fine-tuned with LoRA does not take effect after loading #1130

Closed
jackaihfia2334 opened this issue Aug 16, 2023 · 42 comments
Assignees
Labels
bug Something isn't working

Comments

@jackaihfia2334

jackaihfia2334 commented Aug 16, 2023

I fine-tuned chatglm2-6b with the ChatGLM-Efficient-Tuning project and exported the merged model, chapi, using that project's export feature.
After updating the corresponding entries in the config and loading the model, the warning below is printed. The model loads successfully, but the fine-tuning has no effect. Calling the fine-tuned model through ChatGLM-Efficient-Tuning itself does take effect.

————————————————————————————————————————
Some weights of the model checkpoint at /data2/model/chapi were not used when initializing ChatGLMForConditionalGeneration: ['lm_head.weight']

  • This IS expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    2023-08-16 14:24:55 | INFO | model_worker | Register to controller
    2023-08-16 14:24:56 | INFO | controller | Register a new worker: http://127.0.0.1:20002
    2023-08-16 14:24:56 | INFO | controller | Register done: http://127.0.0.1:20002, {'model_names': ['chapi'], 'speed': 1, 'queue_length': 0}
    2023-08-16 14:24:56 | INFO | stdout | INFO: 127.0.0.1:52242 - "POST /register_worker HTTP/1.1" 200 OK
    2023-08-16 14:24:56 | ERROR | stderr | INFO: Started server process [1426]
    2023-08-16 14:24:56 | ERROR | stderr | INFO: Waiting for application startup.

————————————————————————————————————————————
My fine-tuning was the self_cognition task, teaching the model to answer that its name is 查派 (Chapi).
Calling the model directly via AutoModel prints the same warning, yet produces the expected fine-tuned result:
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("/data2/model/chapi", trust_remote_code=True)
model = AutoModel.from_pretrained("/data2/model/chapi", trust_remote_code=True).half().cuda()
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)

Output: 您好!我是 查派,由 xxx开发,旨在为用户提供智能化的回答和支持。 (Hello! I am Chapi, developed by xxx, here to provide intelligent answers and support.)

@jackaihfia2334 jackaihfia2334 added the bug Something isn't working label Aug 16, 2023
@hzg0601
Collaborator

hzg0601 commented Aug 17, 2023

Please share your model_config settings and how you start the services.

@jackaihfia2334
Author

jackaihfia2334 commented Aug 17, 2023

Please share your model_config settings and how you start the services.

The configuration is as follows:

llm_model_dict = {

    "chatglm2-6b": {
        "local_model_path": "/data2/model/chatglm2-6b",
        "api_base_url": "http://localhost:8888/v1",  # "name" should be changed to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    },

    "chapi": {
        "local_model_path": "/data2/model/chapi",
        "api_base_url": "http://localhost:8888/v1",  # "name" should be changed to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    },

    "vicuna-7b-v1.5-16k": {
        "local_model_path": "/data2/model/vicuna-7b-v1.5-16k",
        "api_base_url": "http://localhost:8888/v1",  # "name" should be changed to the "api_base_url" of the fastchat service
        "api_key": "EMPTY"
    },

    # If calling chatgpt raises: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.openai.com', port=443):
    #   Max retries exceeded with url: /v1/chat/completions
    # then downgrade urllib3 to 1.25.11.
    # If urllib3.exceptions.MaxRetryError: HTTPSConnectionPool is still raised, change https to http.
    # See https://zhuanlan.zhihu.com/p/350015032

    # If it raises: raise NewConnectionError(
    # urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x000001FE4BDB85E0>:
    # Failed to establish a new connection: [WinError 10060]
    # then mainland China and Hong Kong IPs are blocked by OpenAI; switch to Japan, Singapore, etc.
    "openai-chatgpt-3.5": {
        "local_model_path": "gpt-3.5-turbo",
        "api_base_url": "https://api.openapi.com/v1",
        "api_key": os.environ.get("OPENAI_API_KEY")
    },

}

LLM_MODEL = "chapi"

LLM_DEVICE = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"

————————————————————————
The startup procedure follows the README: first python server/llm_api.py, then python server/api.py, and finally streamlit run webui.py, as sketched below.
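For reference, a sketch of the sequence just described (each command typically runs in its own terminal):

python server/llm_api.py      # start the fastchat-based LLM service
python server/api.py          # start the Langchain-Chatchat API server
streamlit run webui.py        # start the web UI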
——————————————————————————————————————
I also tried using fastchat's own API + webui directly; the fine-tuning has no effect there either. Every other deployment method does take effect.

@hzg0601
Collaborator

hzg0601 commented Aug 17, 2023

Based on your description, the fine-tuned checkpoint is probably being loaded with ChatGLM's modeling.py, but because the lm_head differs, only the old model ends up being loaded. Please see section 5.1.3 of the README documentation on loading a PEFT checkpoint.

@jackaihfia2334
Author

Based on your description, the fine-tuned checkpoint is probably being loaded with ChatGLM's modeling.py, but because the lm_head differs, only the old model ends up being loaded. Please see section 5.1.3 of the README documentation on loading a PEFT checkpoint.

Thanks for the reply.
1. I am loading the already merged model, not base model + LoRA.
2. Calling it directly via AutoModel does take effect.
3. Loading LoRA as described in README section 5.1.3 also runs into problems, which I reported in another issue (#1110).

@hzg0601
Collaborator

hzg0601 commented Aug 17, 2023

If convenient, could you email me the LoRA weights and the merged model? This looks like a general problem, and I'd like to debug it.

@jackaihfia2334
Author

jackaihfia2334 commented Aug 17, 2023 via email

@jackaihfia2334
Author

jackaihfia2334 commented Aug 17, 2023 via email

@hzg0601
Collaborator

hzg0601 commented Aug 17, 2023

Thanks.

@jackaihfia2334
Author

jackaihfia2334 commented Aug 18, 2023

Thanks.

In practice, loading the LoRA model via peft (rather than loading the merged model) works, but starting the api and webui fails; there seems to be a compatibility issue.
Running python3 -m fastchat.serve.cli --model-path /data2/project/peft-model and chatting on the command line gives the expected result.

However, after starting the LLM service with python3 -m fastchat.serve.model_worker --model-path /data2/project/peft-model, then starting the API service with python server/api.py and the web UI with streamlit run webui.py,
the API service reports the following error as soon as I type a question (it looks like a port problem):
——————————————————————
root@docker-desktop:/data1/llm/code/Langchain-Chatchat# python server/api.py
2023-08-18 08:49:10,100 - utils.py[line:148] - INFO: Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2023-08-18 08:49:10,100 - utils.py[line:160] - INFO: NumExpr defaulting to 8 threads.
INFO: Started server process [32283]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7861 (Press CTRL+C to quit)
INFO: 127.0.0.1:46718 - "POST /chat/chat HTTP/1.1" 200 OK
2023-08-18 08:49:35,508 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI.
2023-08-18 08:49:35,508 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:35,508 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:39,512 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI.
2023-08-18 08:49:39,513 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:39,513 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:43,515 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI.
2023-08-18 08:49:43,515 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:43,515 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:47,522 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 8.0 seconds as it raised APIConnectionError: Error communicating with OpenAI.
2023-08-18 08:49:47,523 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:47,523 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:55,530 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 10.0 seconds as it raised APIConnectionError: Error communicating with OpenAI.
2023-08-18 08:49:55,531 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:55,531 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry'
Caught exception: Error communicating with OpenAI

This problem was also raised in issue #1110.

@jackaihfia2334
Author

jackaihfia2334 commented Aug 18, 2023

Thanks.

In addition, I tried fastchat's native webui. It fails with the error below, apparently because adapter_config is not loaded from the local path but is instead fetched from huggingface. See
lm-sys/FastChat#2262

———————————————————————
root@docker-desktop:/# python3 -m fastchat.serve.gradio_web_server
2023-08-18 16:24:00 | INFO | gradio_web_server | args: Namespace(host='0.0.0.0', port=None, share=False, controller_url='http://localhost:21001', concurrency_count=10, model_list_mode='once', moderate=False, add_chatgpt=False, add_claude=False, add_palm=False, gradio_auth_path=None)
2023-08-18 16:24:00 | INFO | gradio_web_server | Models: ['peft-model']
2023-08-18 16:24:00 | INFO | stdout | Running on local URL: http://0.0.0.0:7860
2023-08-18 16:24:00 | INFO | stdout |
2023-08-18 16:24:00 | INFO | stdout | To create a public link, set share=True in launch().
2023-08-18 16:24:05 | INFO | gradio_web_server | load_demo. ip: 127.0.0.1. params: {}
2023-08-18 16:24:05 | INFO | httpx | HTTP Request: POST http://localhost:7860/api/predict "HTTP/1.1 200 OK"
2023-08-18 16:24:05 | INFO | httpx | HTTP Request: POST http://localhost:7860/reset "HTTP/1.1 200 OK"
2023-08-18 16:24:10 | INFO | gradio_web_server | add_text. ip: 127.0.0.1. len: 2
2023-08-18 16:24:10 | INFO | stdout | peft-model
2023-08-18 16:24:10 | INFO | stdout | peft-model
2023-08-18 16:24:10 | INFO | stdout | peft-model
2023-08-18 16:24:10 | INFO | stdout | peft-model
2023-08-18 16:24:10 | INFO | stdout | find 'adapter_config.json' at 'peft-model'
'(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 6b2753fa-c523-43c3-80bf-70ebedbb1769)')' thrown while requesting HEAD https://huggingface.co/peft-model/resolve/main/adapter_config.json
2023-08-18 16:24:20 | WARNING | huggingface_hub.utils._http | '(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 6b2753fa-c523-43c3-80bf-70ebedbb1769)')' thrown while requesting HEAD https://huggingface.co/peft-model/resolve/main/adapter_config.json
2023-08-18 16:24:20 | ERROR | stderr | Traceback (most recent call last):
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/peft/utils/config.py", line 119, in from_pretrained
2023-08-18 16:24:20 | ERROR | stderr | config_file = hf_hub_download(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
2023-08-18 16:24:20 | ERROR | stderr | return fn(*args, **kwargs)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1291, in hf_hub_download
2023-08-18 16:24:20 | ERROR | stderr | raise LocalEntryNotFoundError(
2023-08-18 16:24:20 | ERROR | stderr | huggingface_hub.utils._errors.LocalEntryNotFoundError: Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on.
2023-08-18 16:24:20 | ERROR | stderr |
2023-08-18 16:24:20 | ERROR | stderr | During handling of the above exception, another exception occurred:
2023-08-18 16:24:20 | ERROR | stderr |
2023-08-18 16:24:20 | ERROR | stderr | Traceback (most recent call last):
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 442, in run_predict
2023-08-18 16:24:20 | ERROR | stderr | output = await app.get_blocks().process_api(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1392, in process_api
2023-08-18 16:24:20 | ERROR | stderr | result = await self.call_function(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1097, in call_function
2023-08-18 16:24:20 | ERROR | stderr | prediction = await anyio.to_thread.run_sync(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
2023-08-18 16:24:20 | ERROR | stderr | return await get_asynclib().run_sync_in_worker_thread(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
2023-08-18 16:24:20 | ERROR | stderr | return await future
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
2023-08-18 16:24:20 | ERROR | stderr | result = context.run(func, *args)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 703, in wrapper
2023-08-18 16:24:20 | ERROR | stderr | response = f(*args, **kwargs)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/gradio_web_server.py", line 210, in add_text
2023-08-18 16:24:20 | ERROR | stderr | state = State(model_selector)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/gradio_web_server.py", line 68, in init
2023-08-18 16:24:20 | ERROR | stderr | self.conv = get_conversation_template(model_name)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/fastchat/model/model_adapter.py", line 291, in get_conversation_template
2023-08-18 16:24:20 | ERROR | stderr | return adapter.get_default_conv_template(model_path)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/fastchat/model/model_adapter.py", line 498, in get_default_conv_template
2023-08-18 16:24:20 | ERROR | stderr | config = PeftConfig.from_pretrained(model_path)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/peft/utils/config.py", line 123, in from_pretrained
2023-08-18 16:24:20 | ERROR | stderr | raise ValueError(f"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'")
2023-08-18 16:24:20 | ERROR | stderr | ValueError: Can't find 'adapter_config.json' at 'peft-model'
2023-08-18 16:24:20 | INFO | httpx | HTTP Request: POST http://localhost:7860/api/predict "HTTP/1.1 500 Internal Server Error"
2023-08-18 16:24:20 | INFO | httpx | HTTP Request: POST http://localhost:7860/reset "HTTP/1.1 200 OK"
2023-08-18 16:24:20 | INFO | gradio_web_server | bot_response. ip: 127.0.0.1
2023-08-18 16:24:20 | ERROR | stderr | Traceback (most recent call last):
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 442, in run_predict
2023-08-18 16:24:20 | ERROR | stderr | output = await app.get_blocks().process_api(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1392, in process_api
2023-08-18 16:24:20 | ERROR | stderr | result = await self.call_function(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1111, in call_function
2023-08-18 16:24:20 | ERROR | stderr | prediction = await utils.async_iteration(iterator)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 346, in async_iteration
2023-08-18 16:24:20 | ERROR | stderr | return await iterator.anext()
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 339, in anext
2023-08-18 16:24:20 | ERROR | stderr | return await anyio.to_thread.run_sync(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
2023-08-18 16:24:20 | ERROR | stderr | return await get_asynclib().run_sync_in_worker_thread(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
2023-08-18 16:24:20 | ERROR | stderr | return await future
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
2023-08-18 16:24:20 | ERROR | stderr | result = context.run(func, *args)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 322, in run_sync_iterator_async
2023-08-18 16:24:20 | ERROR | stderr | return next(iterator)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 691, in gen_wrapper
2023-08-18 16:24:20 | ERROR | stderr | yield from f(*args, **kwargs)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/gradio_web_server.py", line 300, in bot_response
2023-08-18 16:24:20 | ERROR | stderr | if state.skip_next:
2023-08-18 16:24:20 | ERROR | stderr | AttributeError: 'NoneType' object has no attribute 'skip_next'
2023-08-18 16:24:20 | INFO | httpx | HTTP Request: POST http://localhost:7860/api/predict "HTTP/1.1 500 Internal Server Error"
2023-08-18 16:24:20 | INFO | httpx | HTTP Request: POST http://localhost:7860/reset "HTTP/1.1 200 OK"

@wu-xiaohua

If convenient, could you email me the LoRA weights and the merged model? This looks like a general problem, and I'd like to debug it.

I ran into the same problem with the merged model.

I also cannot use my fine-tuned model. It worked fine on version 0.1, but with the current 0.2 version it no longer loads:
2023-08-18 23:26:56 | INFO | stdout | Loading /home/model requires to execute some code in that repo, you can inspect the content of the repository at https://hf.co//home/model. You can dismiss this prompt by passing trust_remote_code=True.
2023-08-18 23:26:56 | INFO | stdout | Do you accept? [y/N]
2023-08-18 23:26:56 | ERROR | stderr | Process model_worker(73212):
2023-08-18 23:26:56 | ERROR | stderr | Traceback (most recent call last):
2023-08-18 23:26:56 | ERROR | stderr | File "/root/miniconda3/envs/langchain/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
2023-08-18 23:26:56 | ERROR | stderr | self.run()
2023-08-18 23:26:56 | ERROR | stderr | File "/root/miniconda3/envs/langchain/lib/python3.10/multiprocessing/process.py", line 108, in run
2023-08-18 23:26:56 | ERROR | stderr | self._target(*self._args, **self._kwargs)
2023-08-18 23:26:56 | ERROR | stderr | File "/home/ai/Langchain-Chatchat/server/llm_api.py", line 194, in run_model_worker
2023-08-18 23:26:56 | ERROR | stderr | app = create_model_worker_app(*args, **kwargs)
2023-08-18 23:26:56 | ERROR | stderr | File "/home/ai/Langchain-Chatchat/server/llm_api.py", line 128, in create_model_worker_app
2023-08-18 23:26:56 | ERROR | stderr | worker = ModelWorker(
2023-08-18 23:26:56 | ERROR | stderr | File "/root/miniconda3/envs/langchain/lib/python3.10/site-packages/fastchat/serve/model_worker.py", line 207, in init
2023-08-18 23:26:56 | ERROR | stderr | self.model, self.tokenizer = load_model(
2023-08-18 23:26:56 | ERROR | stderr | File "/root/miniconda3/envs/langchain/lib/python3.10/site-packages/fastchat/model/model_adapter.py", line 268, in load_model
2023-08-18 23:26:56 | ERROR | stderr | model, tokenizer = adapter.load_model(model_path, kwargs)
2023-08-18 23:26:56 | ERROR | stderr | File "/root/miniconda3/envs/langchain/lib/python3.10/site-packages/fastchat/model/model_adapter.py", line 72, in load_model
2023-08-18 23:26:56 | ERROR | stderr | model = AutoModelForCausalLM.from_pretrained(
2023-08-18 23:26:56 | ERROR | stderr | File "/root/miniconda3/envs/langchain/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 461, in from_pretrained
2023-08-18 23:26:56 | ERROR | stderr | config, kwargs = AutoConfig.from_pretrained(
2023-08-18 23:26:56 | ERROR | stderr | File "/root/miniconda3/envs/langchain/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 986, in from_pretrained
2023-08-18 23:26:56 | ERROR | stderr | trust_remote_code = resolve_trust_remote_code(
2023-08-18 23:26:56 | ERROR | stderr | File "/root/miniconda3/envs/langchain/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 538, in resolve_trust_remote_code
2023-08-18 23:26:56 | ERROR | stderr | answer = input(
2023-08-18 23:26:56 | ERROR | stderr | EOFError: EOF when reading a line

@chenkaiC4

(quoting @jackaihfia2334's comment above about loading the peft model and the resulting API error)

@jackaihfia2334 A question:

I am also on version 0.2.0. I trained with chatglm2's ptuning, and the trained model is in ptuning/output/whoami-pt-128-2e-2/checkpoint-300. When loading it with python3 -m fastchat.serve.cli, how should I set the parameters? Does the original chatglm2-6b model need to be passed in as well?

@jackaihfia2334
Author

jackaihfia2334 commented Aug 19, 2023

(quoting @chenkaiC4's question above)

The ptuning model path needs to contain "peft", e.g. rename it to ptuning/output/whoami-pt-128-2e-2/peft-model.
The original chatglm2-6b model does not need to be passed in manually; just make sure the base model (chatglm2-6b) path is set correctly in your peft-model's adapter_config; it is normally generated automatically, but it is worth checking. Then run python3 -m fastchat.serve.cli --model-path XXXX, where XXXX is your peft model path (see the sketch below).
See lm-sys/FastChat#2219
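A shell sketch of the sequence described above, using the paths from this thread as placeholders (note that, as the follow-up replies show, this works for LoRA/peft checkpoints, while P-Tuning v2 checkpoints lack adapter_config.json):

# rename the checkpoint directory so that the path contains "peft"
mv ptuning/output/whoami-pt-128-2e-2/checkpoint-300 ptuning/output/whoami-pt-128-2e-2/peft-model

# confirm that adapter_config.json points at the local base model
grep base_model_name_or_path ptuning/output/whoami-pt-128-2e-2/peft-model/adapter_config.json

# chat with the adapter on the command line
python3 -m fastchat.serve.cli --model-path ptuning/output/whoami-pt-128-2e-2/peft-model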

@chenkaiC4

@jackaihfia2334 Thanks for the reply. I followed your hint, but after P-Tuning v2 the generated output has no adapter_config.json, only a config.json. Here is my directory structure:

(screenshot)

Command:
python3 -m fastchat.serve.cli --model-path /home/ubuntu/code/ChatGLM2-6B/ptuning/output/whoami-pt-128-2e-2/peft-model

Error:

ValueError: Can't find 'adapter_config.json' at '/home/ubuntu/code/ChatGLM2-6B/ptuning/output/whoami-pt-128-2e-2/peft-model'

I then added an adapter_config.json manually:

{
    "base_model_name_or_path": "/home/ubuntu/code/ChatGLM2-6B/chatglm2-6b"
}

Error:
(screenshot)

Is this because the P-Tuning v2 data format doesn't match?

@jackaihfia2334
Author

(quoting @chenkaiC4's P-Tuning v2 question above)

Maybe fastchat doesn't support p-tuning, or supports it in some other way; you may need to check the fastchat project and open an issue there.

@chenkaiC4

Thanks @jackaihfia2334 🤝

@jackaihfia2334
Author

jackaihfia2334 commented Aug 19, 2023

(quoting @chenkaiC4's P-Tuning v2 question above)

fastchat's source always looks for adapter_config.json when matching a peft-model, so you could try renaming your config.json to adapter_config.json and adjusting it accordingly (there will probably be other bugs). Judging from your error, the peft_type field is missing.

The adapter_config.json produced by my LoRA fine-tuning is as follows, for reference:
——————————————————————————————————
{
  "auto_mapping": null,
  "base_model_name_or_path": "/data2/model/chatglm2-6b",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "lora_alpha": 32.0,
  "lora_dropout": 0,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "revision": null,
  "target_modules": [
    "query_key_value"
  ],
  "task_type": "CAUSAL_LM"
}
——————————————————

@chenkaiC4

@jackaihfia2334 Thanks a lot. I tried it and still hit problems, so I'm about to give up on this path and switch to the ChatGLM-Efficient-Tuning you recommended. That project is no longer updated, though; it is now https://github.com/hiyouga/LLaMA-Efficient-Tuning. Should I use the old ChatGLM-Efficient-Tuning or the new LLaMA-Efficient-Tuning? I also want to run the self_cognition training. Have you managed to integrate it into Langchain-Chatchat and call it through the API yet?

@jackaihfia2334
Author

(quoting @chenkaiC4's question above)

I haven't used the new LLaMA-Efficient-Tuning either; I plan to give it a try. Integrating into Langchain-Chatchat is exactly what I raised in this issue: the llm service starts, but gradio_web_server is not compatible, so it can't be deployed as a web page. Using fastchat's native webui also fails; the bug is pasted above, and the maintainers will fix it later. I also opened an issue in fastchat: lm-sys/FastChat#2262

In practice, the fastchat command line works (python3 -m fastchat.serve.cli). My own hacked version of fastchat's native webui (fastchat.serve.gradio_web_server) also works, but the approach is clumsy, so I'm waiting for the official fix.

@chenkaiC4

@jackaihfia2334 Is the API endpoint working? For now I only need the API to be accessible.

@jackaihfia2334
Author

@jackaihfia2334 Is the API endpoint working? For now I only need the API to be accessible.

Scroll up; I posted the details there.

@jackaihfia2334
Author

@jackaihfia2334 Is the API endpoint working? For now I only need the API to be accessible.

python -m fastchat.serve.model_worker succeeds, but python server/api.py then errors out.
After starting the LLM service with python3 -m fastchat.serve.model_worker --model-path /data2/project/peft-model, then starting the API service with python server/api.py and the web UI with streamlit run webui.py,
the API service reports the error above as soon as I type a question (it looks like a port problem).

@chenkaiC4

chenkaiC4 commented Aug 19, 2023

I see; that is strange. In principle, once the model is loaded, the frontend/backend logic goes through the HTTP interface, which was not changed, so it should not error.

@chenkaiC4

#1130 (comment)
It really does look like the llm service was not found, so the requests time out and retry.

@chenkaiC4

chenkaiC4 commented Aug 20, 2023

@jackaihfia2334 I trained with https://github.com/hiyouga/LLaMA-Efficient-Tuning, and testing within that project meets expectations.

Then I tried the command line: python3 -m fastchat.serve.cli --model-path /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_checkpoints. The model loads, but only the original chatglm2 model is picked up, and the answers don't match expectations.
Below is the adapter_config.json that was generated:

{
  "auto_mapping": null,
  "base_model_name_or_path": "/home/ubuntu/code/glmchain/chatglm2-6b",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "lora_alpha": 32.0,
  "lora_dropout": 0.1,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "revision": null,
  "target_modules": [
    "query_key_value"
  ],
  "task_type": "CAUSAL_LM"
}

Here, the base_model_name_or_path /home/ubuntu/code/glmchain/chatglm2-6b is where the original chatglm2-6b model is downloaded locally.

Below is the checkpoint directory structure after LoRA training:
(screenshot)

It looks no different from yours, but question answering in the cli does not give the expected results.

Could you show the file layout inside your LoRA checkpoint directory?

[Update]
After training with the old code base, https://github.com/hiyouga/ChatGLM-Efficient-Tuning, running python3 -m fastchat.serve.cli --model /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_checkpoints gives the expected answers. There must be some incompatibility between the new version's LoRA output and fastchat.

@chenkaiC4

chenkaiC4 commented Aug 20, 2023

@jackaihfia2334 Solved 😸 . Steps:

  1. Train with the old code base, https://github.com/hiyouga/ChatGLM-Efficient-Tuning, for LoRA training.
  2. The trained checkpoint directory name must contain "peft"; mine is peft_chatglm2 (it contains adapter_config.json).
  3. Run the following commands in order:

1. Start fastchat's system-level http service (heartbeat and model management endpoints):

python3 -m fastchat.serve.controller

2. Run the LLM model service; the model-names set here determine the llm_model_dict setting in step 4:

PEFT_SHARE_BASE_WEIGHTS=true python3 -m fastchat.serve.multi_model_worker \
    --model-path /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_chatglm2 \
    --model-names peft_lora_chatglm2 \
    --num-gpus 1

3. Start the openai-style interface:

python3 -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8000

4. The key step: set configs/model_config.py, then run python server/api.py. Settings:

llm_model_dict = {
    "peft_lora_chatglm2": {
        "local_model_path": "/home/ubuntu/LLaMA-Efficient-Tuning/new_model/chatgml2",
        "api_base_url": "http://localhost:8000/v1",
        "api_key": "EMPTY"
    },
}

LLM_MODEL = "peft_lora_chatglm2"

The core of it is really understanding FastChat's logic. The trap was skipping step 3: the openai-style interface was never started, and since everything in FastChat goes through openai-style APIs, the requests just kept retrying over http.

(screenshot)
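As a quick sanity check after step 3, the openai-style endpoint can be queried directly; a minimal sketch, assuming the port and model name from the steps above:

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "peft_lora_chatglm2", "messages": [{"role": "user", "content": "你好"}]}'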

@jackaihfia2334
Author

Thanks for the answer! It runs successfully now. Two problems remain:
1. Deploying the webui with a LoRA-fine-tuned-and-merged model still seems problematic.
2. I hope LLaMA-Efficient-Tuning will become compatible with fastchat.

@XiaHuGXB

You can try starting the llm API service from within LLaMA-Efficient-Tuning/ChatGLM-Efficient-Tuning; I tried it and this bypasses fastchat.

@jackaihfia2334
Author

You can try starting the llm API service from within LLaMA-Efficient-Tuning/ChatGLM-Efficient-Tuning; I tried it and this bypasses fastchat.

That works. I'm just curious why loading the fine-tuned-and-merged model through fastchat doesn't give the expected results.

@dijkstra-mose

dijkstra-mose commented Aug 22, 2023

I hit the same problem. Debugging shows it is caused by a bug in gradio_web_server.py:

class State:
    def __init__(self, model_name):
        self.conv = get_conversation_template(model_name)   

At this point PeftModelAdapter.get_default_conv_template() is called,
but because PeftModelAdapter resolves the conv_template dynamically, the conv read here is wrong.
The correct logic should instead call the model_worker's API to fetch the conv dynamically:

class State:
    def __init__(self, model_name):
        ret = requests.post(
            controller_url + "/get_worker_address", json={"model": model_name}
        )
        worker_addr = ret.json()["address"]
        ret = requests.post(worker_addr + "/worker_get_conv_template")
        conv = ret.json()["conv"]
        self.conv = Conversation(
            name=conv["name"],
            system_template=conv["system_template"],
            system_message=conv["system_message"],
            roles=conv["roles"],
            messages=conv["messages"],
            offset=conv["offset"],
            sep_style=conv["sep_style"],
            sep=conv["sep"],
            sep2=conv["sep2"],
            stop_str=conv["stop_str"],
            stop_token_ids=conv["stop_token_ids"],
        )
        logger.info(f"model_name: {model_name}, worker_addr: {worker_addr}, worker_get_conv_template")

This way the correct conv is read, and it should resolve the problem of deploying a lora checkpoint.
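Note: this fix assumes the model worker exposes a /worker_get_conv_template endpoint; recent FastChat model_worker versions provide it, so older installs may need an upgrade first.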

@MyGitHubPigStar

Thanks for the answer! It runs successfully now. Two problems remain: 1. Deploying the webui with a LoRA-fine-tuned-and-merged model still seems problematic. 2. I hope LLaMA-Efficient-Tuning will become compatible with fastchat.

As of now, can a model merged from LoRA be started via the API?

@jackaihfia2334
Author

As of now, can a model merged from LoRA be started via the API?

It can be started, but the answers don't reflect the fine-tuning; the API started via chatglm-efficient-tuning does reflect it.
The problem is probably in fastchat itself.

@MyGitHubPigStar

MyGitHubPigStar commented Aug 22, 2023

(quoting the exchange above)

Thanks! Very strange. A model I generated in early August (with chatglm-efficient-tuning) loads fine through the latest langchain, but a model trained again now fails to load correctly when put into langchain.

@BC-0521

BC-0521 commented Aug 22, 2023

(quoting @chenkaiC4's solution above)

The link https://github.com/hiyouga/ChatGLM-Efficient-Tuning has 5 versions; which one should be used for training? Also, running python3 -m fastchat.serve.controller fails at the very first step:
(screenshot)
How do I fix this?

@hzg0601
Collaborator

hzg0601 commented Aug 22, 2023

Already answered; see #1130 (comment)

@Gzj369

Gzj369 commented Sep 8, 2023

(quoting @chenkaiC4's solution and @BC-0521's question above)

@BC-0521 This can probably be fixed by passing --host 127.0.0.1 when running python, or check whether another running process is occupying the port.

@Gzj369

Gzj369 commented Sep 8, 2023

(quoting @chenkaiC4's solution above)

@chenkaiC4 @jackaihfia2334 Hi both. I believe I followed the 3 steps plus the model_config.py setup exactly as described, but asking a question in web_ui still fails:

CUDA_VISIBLE_DEVICES=0 python -m fastchat.serve.controller --host 127.0.0.1 --port 21001

CUDA_VISIBLE_DEVICES=0 PEFT_SHARE_BASE_WEIGHTS=true python -m fastchat.serve.multi_model_worker --model-path /home/Baichuan2-13B-Chat/lora_checkpoint_60_baichuan2/peft_checkpoint-2216 --model-names peft_lora_baichuan2_13b_chat --num-gpus 1 --host 127.0.0.1 --port 21002

CUDA_VISIBLE_DEVICES=0 python -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8888

(screenshot)

CUDA_VISIBLE_DEVICES=0 python server/api.py

CUDA_VISIBLE_DEVICES=0 streamlit run webui.py --server.port 8081

When I ask a question in web_ui and check the backend, it shows the following error:
(screenshot)

Could you take a look? Many thanks.

@Gzj369

Gzj369 commented Sep 11, 2023

(quoting my previous comment above)

After re-running the following 2 commands, web_ui.py is accessible again, but testing shows the answers to questions tend to be repetitive:
CUDA_VISIBLE_DEVICES=0 python server/api.py

CUDA_VISIBLE_DEVICES=0 streamlit run webui.py --server.port 8081

@pursure-Hy

(quoting @chenkaiC4's solution above)

But I see your paths still point at LLaMA-Efficient-Tuning; isn't that the new version?

@nailuonice

Thanks for the answer! It runs successfully now. Two problems remain: 1. Deploying the webui with a LoRA-fine-tuned-and-merged model still seems problematic. 2. I hope LLaMA-Efficient-Tuning will become compatible with fastchat.

As of now, can a model merged from LoRA be started via the API?

In practice a LoRA-merged model does work; it's just that the merge step itself is a hassle: export, then import again. What really matters is for the chatchat project itself to support loading LoRA and have it take effect. My current problem is that with an unmerged LoRA, following the steps above everything runs, but at inference time the fine-tuning has no effect!

@Gzj369

Gzj369 commented Dec 21, 2023 via email

@nailuonice

nailuonice commented Dec 21, 2023

Here is how to load LoRA!!!

model_config.py side:
(screenshot)

Check that the llm_model_dict field is present:
(screenshot)

Port 7777 just has to match server_config.py; change it freely:
(screenshot)

LoRA side: in the corresponding adapter_config.json, just make sure base_model_name_or_path is set correctly:
(screenshot)
