##### Copyright 2024 Google LLC.


In [1]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemma - Finetune with XTuner

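This notebook demonstrates how to finetune Gemma with XTuner. [XTuner](https://github.com/InternLM/xtuner) is an efficient, flexible, and full-featured toolkit for fine-tuning LLMs. It wraps the Hugging Face fine-tuning functionality and exposes a simple interface on top of it, so finetuning Gemma takes only a few commands.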

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/doggy8088/gemma-cookbook/blob/zh-tw/Gemma/Finetune_with_XTuner.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>


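## Setup

### Select the Colab runtime
To complete this guide, you need a Colab runtime with enough resources to run the Gemma model. In this case, you can use a T4 GPU:

1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.
2. Select **Change runtime type**.
3. Under **Hardware accelerator**, select **T4 GPU**.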
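
After switching the runtime, it can be useful to confirm that a GPU is actually attached. The following is a minimal sketch, assuming the `nvidia-smi` tool is on the `PATH` (as it normally is on GPU runtimes). The helper name `gpu_names` is hypothetical and not part of Colab or XTuner.

```python
import shutil
import subprocess


def gpu_names():
    """Return the names of visible NVIDIA GPUs, or an empty list if none are found."""
    # nvidia-smi is absent on CPU-only machines, so treat that as "no GPU".
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


names = gpu_names()
print(names if names else "No NVIDIA GPU detected; check the runtime type.")
```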

### Set up Gemma on Hugging Face
XTuner uses Hugging Face under the hood, so you need to:

* Get access to Gemma on [huggingface.co](https://huggingface.co) by accepting the Gemma license on the Hugging Face page of the specific model, i.e., [Gemma 2B](https://huggingface.co/google/gemma-2b).
* Generate a [Hugging Face access token](https://huggingface.co/docs/hub/en/security-tokens) and configure it as a Colab secret 'HF_TOKEN'.


In [2]:
import os
from google.colab import userdata
# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
# vars as appropriate for your system.
os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN")
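
Outside Colab, `google.colab.userdata` is not available. The following is a minimal sketch of one way to handle both cases: read `HF_TOKEN` from the environment first and fall back to the Colab secret only if the import succeeds. The helper name `resolve_hf_token` is hypothetical.

```python
import os


def resolve_hf_token(env=os.environ):
    """Return the Hugging Face token from `env`, falling back to Colab secrets."""
    token = env.get("HF_TOKEN")
    if token:
        return token
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError as err:
        raise RuntimeError(
            "HF_TOKEN is not set and Colab secrets are unavailable."
        ) from err
    return userdata.get("HF_TOKEN")


# Example with an explicit mapping so no real credentials are needed.
print(resolve_hf_token({"HF_TOKEN": "hf_example_token"}))
```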

### Install XTuner


In [3]:
!pip install -U 'xtuner'

Collecting xtuner
  Downloading xtuner-0.1.19-py3-none-any.whl (1.4 MB)
Collecting bitsandbytes>=0.40.0.post4 (from xtuner)
  Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl (119.8 MB)
Collecting datasets>=2.16.0 (from xtuner)
  Downloading datasets-2.19.1-py3-none-any.whl (542 kB)
Collecting einops (from xtuner)
  Downloading einops-0.8.0-py3-none-any.whl (43 kB)
Collecting lagent>=0.1.2 (from xtuner)
  Downloading lagent-0.2.2-py3-none-any.whl (69 kB)

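## Finetune Gemma

XTuner ships with many built-in configs for fine-tuning various LLMs. List the ones related to Gemma with the command below. If you are curious what they look like or want to tweak them, see these [files](https://github.com/InternLM/xtuner/tree/main/xtuner/configs/gemma).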


In [4]:
!xtuner list-cfg | grep gemma

gemma_2b_full_alpaca_e3
gemma_2b_it_full_alpaca_e3
gemma_2b_it_qlora_alpaca_e3
gemma_2b_qlora_alpaca_e3
gemma_7b_full_alpaca_e3
gemma_7b_it_full_alpaca_e3
gemma_7b_it_qlora_alpaca_e3
gemma_7b_qlora_alpaca_e3
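
The config names listed above appear to follow a regular pattern: model size, an optional `it` marker for the instruction-tuned variant, the tuning method, the dataset, and an epoch count. The following is a minimal sketch that splits a name into those parts. The pattern is inferred from the eight names shown here, the reading of `e3` as three epochs is an assumption, and the helper `describe_config` is hypothetical.

```python
import re

# Pattern inferred from the Gemma config names printed by `xtuner list-cfg`.
CONFIG_PATTERN = re.compile(
    r"^gemma_(?P<size>\d+b)_(?P<it>it_)?(?P<method>full|qlora)"
    r"_(?P<dataset>[a-z0-9]+)_e(?P<epochs>\d+)$"
)


def describe_config(name):
    """Split a Gemma config name into its (assumed) components."""
    match = CONFIG_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized config name: {name}")
    return {
        "size": match["size"],
        "instruction_tuned": match["it"] is not None,
        "method": match["method"],
        "dataset": match["dataset"],
        "epochs": int(match["epochs"]),
    }


print(describe_config("gemma_2b_it_qlora_alpaca_e3"))
```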


For demonstration purposes, this notebook finetunes the instruction-tuned Gemma 2B model using [QLoRA](https://arxiv.org/abs/2305.14314) on the [Alpaca dataset](https://huggingface.co/datasets/tatsu-lab/alpaca). You can also optionally enable DeepSpeed.


In [11]:
!xtuner train gemma_2b_it_qlora_alpaca_e3

06/02 03:40:31 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
    CUDA available: True
    MUSA available: False
    numpy_random_seed: 204097869
    GPU 0: NVIDIA A100-SXM4-40GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 12.2, V12.2.140
    GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
    PyTorch: 2.3.0+cu121
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -genc
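
The training command above runs without DeepSpeed. The following is a minimal sketch of how the same command could be assembled with the optional `--deepspeed` flag. The strategy name `deepspeed_zero2` follows the example in the XTuner README and is an assumption here, so check `xtuner train -h` for the options your installed version accepts. The helper `build_train_command` is hypothetical.

```python
import shlex


def build_train_command(config, deepspeed=None):
    """Build the argument list for `xtuner train`, optionally with DeepSpeed."""
    command = ["xtuner", "train", config]
    if deepspeed is not None:
        # Assumed strategy name, e.g. "deepspeed_zero2".
        command += ["--deepspeed", deepspeed]
    return command


command = build_train_command("gemma_2b_it_qlora_alpaca_e3", deepspeed="deepspeed_zero2")
# Print the command for copying into a notebook cell (prefix it with `!`).
print(shlex.join(command))
```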

### Convert to Hugging Face


Create a folder to hold the converted HF model.


In [12]:
!mkdir -p work_dirs/gemma_2b_it_qlora_alpaca_e3_hf

Convert the LoRA adapter to HF format.


In [13]:
!xtuner convert pth_to_hf gemma_2b_it_qlora_alpaca_e3 work_dirs/gemma_2b_it_qlora_alpaca_e3/iter_6500.pth work_dirs/gemma_2b_it_qlora_alpaca_e3_hf

quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
`low_cpu_mem_usage` was None, now set to True since model is quantized.
`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.
Loading checkpoint shards: 100% 2/2 [00:03<00:00,  1.57s/it]
Load PTH model from work_dirs/gemma_2b_it_qlora_alpaca_e3/iter_6500.pth
Saving adapter to work_dirs/gemma_2b_it_qlora_alpaca_e3_hf
Convert LLM to float16
All done!
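
The conversion command above hardcodes `iter_6500.pth`, and the iteration number will differ if the training length changes. The following is a minimal sketch for picking the newest checkpoint, assuming checkpoints in the work directory are named `iter_<N>.pth` as in that command. The helper `latest_checkpoint` is hypothetical.

```python
import re
from pathlib import Path


def latest_checkpoint(work_dir):
    """Return the path of the `iter_<N>.pth` entry with the largest N."""
    pattern = re.compile(r"^iter_(\d+)\.pth$")
    candidates = []
    for path in Path(work_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        raise FileNotFoundError(f"No iter_<N>.pth checkpoint found in {work_dir}")
    # Compare on the iteration number only.
    return max(candidates, key=lambda item: item[0])[1]
```

For example, `latest_checkpoint("work_dirs/gemma_2b_it_qlora_alpaca_e3")` could supply the checkpoint path passed to `xtuner convert pth_to_hf`.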


### Merge the LoRA adapter


Create a folder to hold the merged model.


In [14]:
!mkdir -p work_dirs/gemma_2b_it_qlora_alpaca_e3_merged

Merge the model and the LoRA adapter.


In [15]:
!xtuner convert merge google/gemma-2b-it work_dirs/gemma_2b_it_qlora_alpaca_e3_hf work_dirs/gemma_2b_it_qlora_alpaca_e3_merged --max-shard-size 2GB

2024-06-02 04:42:02.310283: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-02 04:42:02.367089: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-02 04:42:02.367144: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-02 04:42:02.369043: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-02 04:42:02.377440: I tensorflow/core/platform/cpu_feature_guar
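
Conceptually, merging folds the low-rank update into each adapted weight matrix: W' = W + (alpha / r) · B · A, as described in the LoRA paper. The following is a toy sketch of that arithmetic in plain Python. It illustrates the formula only and is not XTuner's implementation; the function names and the tiny matrices are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) for one weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 base weight with a rank-1 adapter.
base = [[1, 0], [0, 1]]
lora_b = [[1], [2]]   # shape (2, rank)
lora_a = [[3, 4]]     # shape (rank, 2)
print(merge_lora(base, lora_a, lora_b, alpha=2, rank=1))
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged folder can be loaded as an ordinary model.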

### Upload the model to Hugging Face


Load the model from disk.


In [16]:
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "work_dirs/gemma_2b_it_qlora_alpaca_e3_merged", local_files_only=True
)

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
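
The shards being loaded are the weight files in the merged folder. The following is a minimal sketch for listing those files and their sizes before uploading. It assumes the weights are stored as `*.safetensors` or `*.bin` files, and the helper `weight_file_sizes` is hypothetical.

```python
from pathlib import Path


def weight_file_sizes(model_dir):
    """Map each weight file name in `model_dir` to its size in bytes."""
    sizes = {}
    for pattern in ("*.safetensors", "*.bin"):
        for path in sorted(Path(model_dir).glob(pattern)):
            sizes[path.name] = path.stat().st_size
    return sizes


def total_gigabytes(sizes):
    """Sum the file sizes and convert to gigabytes (10**9 bytes)."""
    return sum(sizes.values()) / 1e9
```

For example, `total_gigabytes(weight_file_sizes("work_dirs/gemma_2b_it_qlora_alpaca_e3_merged"))` gives a rough idea of how much data is stored in the merged folder.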

Push the model to the HF Hub.


In [17]:
model.push_to_hub("gemma-2-finetuned-model-xtuner")

model-00001-of-00003.safetensors:   0%|          | 0.00/4.91G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/134M [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

Upload 3 LFS files:   0%|          | 0/3 [00:00<?, ?it/s]

CommitInfo(commit_url='https://huggingface.co/windmaple/gemma-2-finetuned-model-xtuner/commit/173d08858a594ed07939f68abb9050f6ceccdf61', commit_message='Upload model', commit_description='', oid='173d08858a594ed07939f68abb9050f6ceccdf61', pr_url=None, pr_revision=None, pr_num=None)

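## Conclusion

This notebook demonstrated how to perform instruction tuning on the Gemma 2B IT model using XTuner. To finetune with a different dataset, see the XTuner documentation on how to [prepare your own dataset](https://github.com/InternLM/xtuner/blob/main/docs/en/user_guides/dataset_prepare.md).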
