From a2065a4f2ee350e59891bdbd89b20af49f7b70ff Mon Sep 17 00:00:00 2001
From: wangshuai09 <391746016@qq.com>
Date: Tue, 30 Jan 2024 16:14:51 +0800
Subject: [PATCH] Update README for NPU inference

---
 README.md    | 14 ++++++++++++++
 README_EN.md | 13 +++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/README.md b/README.md
index 4aab786..0fb42f5 100644
--- a/README.md
+++ b/README.md
@@ -322,6 +322,20 @@ model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to(
 
 Inference on Mac can also use [ChatGLM.cpp](https://github.com/li-plus/chatglm.cpp)
 
+### NPU Deployment
+
+If you have Huawei Ascend hardware, you can run ChatGLM2-6B on the NPU backend. Install the dependencies as follows:
+
+```shell
+pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+pip install torch_npu==2.1.0
+```
+
+Then switch the backend when loading the model:
+```python
+model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu')
+```
+
 ### Multi-GPU Deployment
 If you have multiple GPUs but no single GPU has enough memory to hold the full model, you can split the model across GPUs. First install accelerate: `pip install accelerate`, then load the model as follows:
 ```python
diff --git a/README_EN.md b/README_EN.md
index 7a54334..bc4be7d 100644
--- a/README_EN.md
+++ b/README_EN.md
@@ -241,6 +241,19 @@ model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to(
 
 Loading a FP16 ChatGLM-6B model requires about 13GB of memory. Machines with less memory (such as a MacBook Pro with 16GB of memory) will use the virtual memory on the hard disk when there is insufficient free memory, resulting in a serious slowdown in inference speed.
 
+### NPU Deployment
+
+If you have a Huawei Ascend device, you can run ChatGLM2-6B on the NPU backend. First, install torch and torch_npu:
+```shell
+pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+pip install torch_npu==2.1.0
+```
+
+Then change the model-loading code to use the NPU backend:
+```python
+model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu')
+```
+
 ## License
 
 The code of this repository is licensed under [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0). The use of the ChatGLM2-6B model weights is subject to the [Model License](MODEL_LICENSE). ChatGLM2-6B weights are **completely open** for academic research, and **free commercial use** is also allowed after completing the [questionnaire](https://open.bigmodel.cn/mla/form).
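Putting the patch's two snippets together, a minimal end-to-end inference sketch might look like this. It is an illustrative outline, not part of the patch: it assumes torch and torch_npu are installed as shown above, that an Ascend NPU is visible to the process, and that the `device='npu'` argument and the `model.chat` API behave as documented in the ChatGLM2-6B README.

```python
# Illustrative sketch only (not part of the patch): assumes torch_npu is
# installed and an Ascend NPU is available on this machine.
import torch
import torch_npu  # importing torch_npu registers the 'npu' device with PyTorch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
# device='npu' is the backend switch this patch documents
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu')
model = model.eval()

# model.chat is the conversational API shown in the existing README examples
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```

Note that torch_npu must be imported before the model is loaded, since the import is what makes the 'npu' device available to PyTorch.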