From a2065a4f2ee350e59891bdbd89b20af49f7b70ff Mon Sep 17 00:00:00 2001
From: wangshuai09 <391746016@qq.com>
Date: Tue, 30 Jan 2024 16:14:51 +0800
Subject: [PATCH] Update README for NPU inference

---
 README.md    | 14 ++++++++++++++
 README_EN.md | 13 +++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/README.md b/README.md
index 4aab786..0fb42f5 100644
--- a/README.md
+++ b/README.md
@@ -322,6 +322,20 @@ model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to(
 
 Inference on Mac can also use [ChatGLM.cpp](https://github.com/li-plus/chatglm.cpp)
 
+### NPU Deployment
+
+If you have Huawei Ascend hardware, you can run ChatGLM2-6B on the NPU backend. Install the dependencies as follows:
+
+```shell
+pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+pip install torch_npu==2.1.0
+```
+
+Then switch the backend when loading the model:
+```python
+model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu')
+```
+
 ### Multi-GPU Deployment
 If you have multiple GPUs but no single GPU has enough memory to hold the full model, you can split the model across GPUs. First install accelerate: `pip install accelerate`, then load the model as follows:
 ```python
diff --git a/README_EN.md b/README_EN.md
index 7a54334..bc4be7d 100644
--- a/README_EN.md
+++ b/README_EN.md
@@ -241,6 +241,19 @@ model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to(
 
 Loading a FP16 ChatGLM-6B model requires about 13GB of memory. Machines with less memory (such as a MacBook Pro with 16GB of memory) will use the virtual memory on the hard disk when there is insufficient free memory, resulting in a serious slowdown in inference speed.
 
+### NPU Deployment
+
+If you have a Huawei Ascend device, you can run ChatGLM2-6B on the NPU backend. First, install torch and torch_npu:
+```shell
+pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+pip install torch_npu==2.1.0
+```
+
+Then change the model-loading code to use the NPU backend:
+```python
+model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu')
+```
+
 ## License
 
 The code of this repository is licensed under [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0). The use of the ChatGLM2-6B model weights is subject to the [Model License](MODEL_LICENSE). ChatGLM2-6B weights are **completely open** for academic research, and **free commercial use** is also allowed after completing the [questionnaire](https://open.bigmodel.cn/mla/form).
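Putting the patch's two snippets together, a minimal end-to-end inference sketch might look like this. It is an illustrative outline, not part of the patch: it assumes torch and torch_npu are installed as shown above, that an Ascend NPU is visible to the process, and that the `device='npu'` argument and the `model.chat` API behave as documented in the ChatGLM2-6B README.

```python
# Illustrative sketch only (not part of the patch): assumes torch_npu is
# installed and an Ascend NPU is available on this machine.
import torch
import torch_npu  # importing torch_npu registers the 'npu' device with PyTorch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
# device='npu' is the backend switch this patch documents
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='npu')
model = model.eval()

# model.chat is the conversational API shown in the existing README examples
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```

Note that torch_npu must be imported before the model is loaded, since the import is what makes the 'npu' device available to PyTorch.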