
chatglm-api

🐥 API deployment of chatglm-6b, chatglm2-6b, and chatglm3-6b, with calling from Python scripts.

Requirements

python==3.9.19
torch==2.3.0
transformers==4.40.2
accelerate==0.30.0
fastapi
argparse
loguru
python-dotenv
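Assuming pip is your package manager, the pinned versions above can be installed in one command (argparse is omitted because it ships with the Python standard library; uvicorn is an assumption on my part, added because FastAPI needs an ASGI server to run):

    pip install torch==2.3.0 transformers==4.40.2 accelerate==0.30.0 fastapi uvicorn loguru python-dotenv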

Setup

Before running the scripts, set up the environment file: in envs/api.env, set LOCAL_MODELS to the local root path under which your models are stored. If you want to pull the models from the Hugging Face Hub instead, keep LOCAL_MODELS="".
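For example, envs/api.env might look like the following (the path is a placeholder, and the folder layout under LOCAL_MODELS is an assumption; substitute your own):

    # envs/api.env
    LOCAL_MODELS="/data/models"    # e.g. weights stored under /data/models/THUDM/chatglm-6b
    # LOCAL_MODELS=""              # empty: download models from the Hugging Face Hub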

Serve Models

Serve one model

CUDA_VISIBLE_DEVICES=1 python api.py --model_name THUDM/chatglm-6b --port 8000

This loads the model THUDM/chatglm-6b on GPU 1 (via CUDA_VISIBLE_DEVICES=1) and serves it on port 8000.
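api.py itself is not reproduced in this README. The sketch below shows one plausible shape for it, assuming a FastAPI app that wraps the ChatGLM-family chat() method and is served by uvicorn; the endpoint path, payload keys, and loading details are assumptions, not the repo's exact code:

    # Hypothetical sketch of api.py; endpoint path and payload keys are assumptions.
    import argparse

    import uvicorn
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import AutoModel, AutoTokenizer

    app = FastAPI()

    class ChatRequest(BaseModel):
        query: str
        history: list = []  # [query, response] pairs for chatglm(2)-6b, role/content dicts for chatglm3-6b

    @app.post("/")
    def chat(req: ChatRequest):
        # ChatGLM-family models expose a chat() method returning (response, updated_history)
        response, history = model.chat(tokenizer, req.query, history=req.history)
        return {"response": response, "history": history}

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--model_name", required=True)
        parser.add_argument("--port", type=int, default=8000)
        args = parser.parse_args()

        tokenizer = AutoTokenizer.from_pretrained(args.model_name, trust_remote_code=True)
        # .cuda() targets the single GPU exposed via CUDA_VISIBLE_DEVICES
        model = AutoModel.from_pretrained(args.model_name, trust_remote_code=True).half().cuda()
        model.eval()
        uvicorn.run(app, host="0.0.0.0", port=args.port)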

Serve multiple models

CUDA_VISIBLE_DEVICES=1 python api.py --model_name THUDM/chatglm-6b --port 8000

CUDA_VISIBLE_DEVICES=2 python api.py --model_name THUDM/chatglm2-6b --port 8001

CUDA_VISIBLE_DEVICES=3 python api.py --model_name THUDM/chatglm3-6b --port 8002

The script api.sh works for all of chatglm-6b, chatglm2-6b, and chatglm3-6b; one plausible shape for it is sketched below.
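This sketch assumes api.sh simply backgrounds one server per GPU/port pair; the pairing mirrors the commands above, so adjust it to your hardware:

    #!/bin/bash
    # api.sh (hypothetical sketch): one model per GPU, one port per model
    CUDA_VISIBLE_DEVICES=1 python api.py --model_name THUDM/chatglm-6b  --port 8000 &
    CUDA_VISIBLE_DEVICES=2 python api.py --model_name THUDM/chatglm2-6b --port 8001 &
    CUDA_VISIBLE_DEVICES=3 python api.py --model_name THUDM/chatglm3-6b --port 8002 &
    wait    # keep the script alive while the servers run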

Calling

inputs = {
    'query': "123 + 123 + 123 + 123 = ?",
    # 'history': [['123 * 4 = ?', '123 * 4 = 492']],  # history form for chatglm-6b / chatglm2-6b
    # 'history': [{'role': 'user', 'content': '123 * 4 = ?'}, {'role': 'assistant', 'metadata': '', 'content': '123 * 4 = 492'}],  # history form for chatglm3-6b
    'port': 8000,   # or 8001, 8002, depending on which model you are targeting
}
response, history = gen_response(**inputs)  # returns the model's reply and the updated history
print(response, history)

🚨 Attention: the three models expect different history forms when called. chatglm-6b and chatglm2-6b take a list of [query, response] pairs, while chatglm3-6b takes a list of role/content dicts, as in the commented examples above.
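gen_response itself lives in the repo's calling script and is not shown here. A minimal client sketch, assuming the server accepts a JSON body with query and history and answers with response and history (matching the api.py sketch above):

    import requests

    def gen_response(query, history=None, port=8000):
        # POST the query to the locally served model and unpack its JSON reply
        payload = {'query': query, 'history': history or []}
        resp = requests.post(f'http://127.0.0.1:{port}', json=payload, timeout=300)
        resp.raise_for_status()
        data = resp.json()
        return data['response'], data['history']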
