## Private deployment of ChatGLM2-6B on a MacBook
**ChatGLM2-6B project repository:**
   - https://github.com/THUDM/ChatGLM2-6B

**chatglm.cpp project repository:**
   - https://github.com/li-plus/chatglm.cpp
    *Because the goal is real-time conversation on a Mac laptop, I chose chatglm.cpp, a quantized CPU inference solution similar to llama.cpp.*

**Model: chatglm2-6b-int4**
   - https://huggingface.co/THUDM/chatglm2-6b-int4/tree/main

For the full procedure, see the chatglm.cpp project README. The points below are the ones that are easy to get stuck on.
- 1. Clone chatglm.cpp. When fetching the third-party submodules (`third_party`), you can also go into the `third_party` directory and download them one at a time.
```shell
        git clone --recursive https://github.com/li-plus/chatglm.cpp.git && cd chatglm.cpp
        git submodule update --init --recursive
```
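If the recursive clone is interrupted, some submodule directories can be left empty. The following is a minimal sketch, not part of chatglm.cpp, that lists the submodule paths declared in `.gitmodules` whose directories are missing or empty, so you know which ones still need to be fetched by hand. The function name `uninitialized_submodules` is hypothetical, and the check assumes that an empty directory means the submodule was not downloaded.

```python
import re
from pathlib import Path


def uninitialized_submodules(repo_root):
    """Return submodule paths from .gitmodules whose directory is missing or empty.

    Assumption: an empty (or absent) directory means the submodule
    has not been downloaded yet.
    """
    root = Path(repo_root)
    gitmodules = root / ".gitmodules"
    if not gitmodules.is_file():
        return []
    # Each submodule entry contains a line of the form "path = third_party/xyz".
    text = gitmodules.read_text(encoding="utf-8")
    paths = re.findall(r"^\s*path\s*=\s*(.+?)\s*$", text, flags=re.MULTILINE)
    missing = []
    for rel in paths:
        target = root / rel
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Run from the directory that contains the chatglm.cpp checkout.
    for rel in uninitialized_submodules("chatglm.cpp"):
        print("not downloaded yet:", rel)
```

Each path it prints can then be fetched individually, for example by cloning that submodule's URL into the listed directory.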

- 2. The automatic download of the THUDM/chatglm-6b model is very slow, so you can download the model manually from Hugging Face instead (I chose chatglm2-6b-int4 here).
```shell
        # If you downloaded the model manually, replace THUDM/chatglm-6b with the local path of the downloaded model
        python3 chatglm_cpp/convert.py -i THUDM/chatglm-6b -t q4_0 -o chatglm-ggml.bin

        cmake -B build
        cmake --build build -j --config Release
```
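The `-t q4_0` option asks the converter for a 4-bit quantized file. As a rough illustration of what block-wise 4-bit quantization means, here is a simplified sketch in pure Python: each block of weights shares one scale, and every weight is stored as a 4-bit code. It is loosely modelled on ggml's q4_0 scheme and is not the actual chatglm.cpp implementation; the function names, the block size, and the rounding details are assumptions made for illustration.

```python
BLOCK_SIZE = 32  # assumed block size: weights are grouped into small fixed-size blocks


def quantize_block(values):
    """Map one block of floats to (scale, codes), with each code in 0..15."""
    # Use the element with the largest magnitude (sign kept) to set the scale,
    # so that this element maps to code 0.
    extreme = max(values, key=abs)
    scale = extreme / -8.0
    if scale == 0.0:
        # All-zero block: code 8 decodes to exactly 0.
        return 0.0, [8] * len(values)
    codes = [min(15, max(0, int(v / scale + 8.5))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from a scale and 4-bit codes."""
    return [(c - 8) * scale for c in codes]


def quantize(weights):
    """Quantize a flat list of weights block by block."""
    return [
        quantize_block(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]


if __name__ == "__main__":
    weights = [(i - 16) / 16 for i in range(32)]
    scale, codes = quantize(weights)[0]
    restored = dequantize_block(scale, codes)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("scale:", scale, "largest round-trip error:", worst)
```

The round trip is lossy: every restored weight is a multiple of the block's scale, which is the price paid for storing each weight in 4 bits.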
- 3. Start the API server for LangChain:
```shell
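        cd chatglm.cpp/chatglm_cpp
        # Serve the converted model over HTTP on 127.0.0.1:8000
        MODEL=../chatglm-ggml.bin uvicorn chatglm_cpp.langchain_api:app --host 127.0.0.1 --port 8000

        # From another terminal, send a test prompt to check that the server responds
        curl http://127.0.0.1:8000 -H 'Content-Type: application/json' -d '{"prompt": "你好"}'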
```
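The same test request can be sent from Python. This is a minimal sketch using only the standard library; it assumes the server from the command above is listening on `http://127.0.0.1:8000`. `build_request` and `ask` are hypothetical helper names, and the shape of the JSON reply is not specified here, so `ask` simply returns whatever dictionary the server sends back.

```python
import json
import urllib.request

ENDPOINT = "http://127.0.0.1:8000"  # same address as in the uvicorn command


def build_request(prompt, endpoint=ENDPOINT):
    """Build the POST request that mirrors the curl call."""
    body = json.dumps({"prompt": prompt}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, endpoint=ENDPOINT, timeout=60):
    """Send the prompt and return the decoded JSON reply as a dict."""
    request = build_request(prompt, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `ask("你好")` returns the parsed reply; inspecting its keys is a quick way to see which field holds the generated text.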



In [3]:
from langchain.llms import ChatGLM
from langchain.schema import SystemMessage, HumanMessage

# Address of the local chatglm.cpp API server
endpoint_url = "http://127.0.0.1:8000"
history = []
chatglm_llm = ChatGLM(
    endpoint_url=endpoint_url,
    history=history
)

In [4]:

messages = [
    SystemMessage(content="you are my translator assisstant and your name is XJ-T, translate into chinese. "),
    HumanMessage(content="I am a lovely girl")
]

In [5]:
result = chatglm_llm.predict_messages(messages)
messages.append(result)
result

AIMessage(content='我是一个可爱的女孩', additional_kwargs={}, example=False)

In [6]:
messages.append(HumanMessage(content="My favorite star is JC-T"))
result = chatglm_llm.predict_messages(messages)
messages.append(result)
result

AIMessage(content='AI: 我最喜欢的中国明星是周杰伦', additional_kwargs={}, example=False)

In [7]:
print(messages)

[SystemMessage(content='you are my translator assisstant and your name is XJ-T, translate into chinese. ', additional_kwargs={}), HumanMessage(content='I am a lovely girl', additional_kwargs={}, example=False), AIMessage(content='我是一个可爱的女孩', additional_kwargs={}, example=False), HumanMessage(content='My favorite star is JC-T', additional_kwargs={}, example=False), AIMessage(content='AI: 我最喜欢的中国明星是周杰伦', additional_kwargs={}, example=False)]


In [10]:
messages.append(HumanMessage(content="你的名字是什么"))  # "What is your name?"
result_m = chatglm_llm.predict_messages(messages)
result_m

AIMessage(content='AI: 我的名字是XJ-T。', additional_kwargs={}, example=False)
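The `AI: ` prefix in the last two replies has a likely explanation: `ChatGLM` is used here as a plain text-completion LLM, so `predict_messages` has to turn the message list into a single prompt string before sending it. The sketch below imitates that flattening with role prefixes in plain Python. It is an approximation of LangChain's behaviour in the version used here, not its actual code, and `flatten_messages` is a hypothetical name.

```python
ROLE_PREFIX = {"system": "System", "human": "Human", "ai": "AI"}


def flatten_messages(messages):
    """Join (role, content) pairs into one prompt string, one line per message."""
    lines = []
    for role, content in messages:
        lines.append(f"{ROLE_PREFIX[role]}: {content}")
    return "\n".join(lines)


if __name__ == "__main__":
    conversation = [
        ("system", "you are my translator assistant and your name is XJ-T, translate into chinese."),
        ("human", "I am a lovely girl"),
        ("ai", "我是一个可爱的女孩"),
        ("human", "My favorite star is JC-T"),
    ]
    # The model receives this single string; seeing an earlier "AI: ..." line,
    # it may continue the pattern and start its own reply with "AI: ".
    print(flatten_messages(conversation))
```

If the prefix is unwanted, stripping a leading `AI: ` from the returned content before appending it to `messages` is one simple workaround.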

### LLMChain backed by the ChatGLM large language model

In [15]:
from langchain import LLMChain, PromptTemplate

prompt_template = "Tell me a {adjective} joke"
prompt = PromptTemplate(
    input_variables=["adjective"], template=prompt_template
)
# ChatGLM() is created without arguments, so it relies on the class's default endpoint_url
llm = LLMChain(llm=ChatGLM(), prompt=prompt, verbose=True)
llm.run({"adjective": "cold"})

> Entering new LLMChain chain...
Prompt after formatting:
Tell me a cold joke

> Finished chain.


"Sure, here's a cold joke:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"

In [16]:
llm.run({"adjective": "math"})

> Entering new LLMChain chain...
Prompt after formatting:
Tell me a math joke

> Finished chain.


"Sure, here's one:\n\nWhy did the math book look so sad?\n\nBecause it had too many problems."