## Fine-tuning a large model for Vietnamese poem generation using low-rank adapters

*   Install requirements
*   Model loading
*   Post-processing on the model
*   Apply LoRA
*   Load data
*   Training

In this notebook, we fine-tune a large model using `8-bit` quantization and `low-rank` adaptation for resource efficiency; without them, the model would not fit on a Colab GPU. For evaluation, we use a custom function that scores the generated poems on their quality and their conformity to the rigid rules of Vietnamese poetry.
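
To make the resource argument concrete, here is a rough, weights-only estimate of the memory the base model needs at different precisions. It is a sketch under stated assumptions: the parameter counts are taken from the output of `print_trainable_parameters` further down, and activations, gradients, optimizer state, and any layers that bitsandbytes keeps in higher precision are all ignored.

```python
# Parameter counts reported later in this notebook by print_trainable_parameters.
all_params = 7_076_880_384   # base model + LoRA adapters
lora_params = 7_864_320      # LoRA adapters only
base_params = all_params - lora_params

# Bytes needed to store one weight at each precision.
bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1}

# Weights-only footprint in GB (1 GB = 1e9 bytes); a lower bound on real usage.
weights_gb = {
    name: base_params * size / 1e9 for name, size in bytes_per_param.items()
}

for name, gb in weights_gb.items():
    print(f"{name}: {gb:.2f} GB")
```

Under these assumptions, 8-bit loading halves the footprint of the fp16 weights, which is what leaves room for activations and adapter training on a single Colab GPU.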


### Install requirements

WARNING: The model checkpoint is about 13 GB in size.
Either upload the project to Google Drive, name the folder `Trainer_file`, and run this notebook on Colab,
or skip this section entirely.

In [None]:
!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git

In [None]:
from google.colab import drive
drive.mount('/content/drive/')
%cd /content/drive/My Drive/Trainer_file/

### Model loading

In [None]:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model
import pandas as pd
import json

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1", 
    load_in_8bit=True, 
    device_map='auto'
)

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")

from utils.check_rule import *


Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues




CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.9/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...


Downloading (…)lve/main/config.json:   0%|          | 0.00/739 [00:00<?, ?B/s]



Downloading (…)model.bin.index.json:   0%|          | 0.00/27.5k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)l-00001-of-00002.bin:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

Downloading (…)l-00002-of-00002.bin:   0%|          | 0.00/4.16G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

### Post-processing on the model

Next, we need to apply some post-processing to the 8-bit model to enable training. We freeze all layers and cast the layer norms to `float32` for stability. We also cast the output of the last layer to `float32` for the same reason.

In [None]:
for param in model.parameters():
  param.requires_grad = False  # freeze the model - train adapters later
  if param.ndim == 1:
    # cast the small parameters (e.g. layernorm) to fp32 for stability
    param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
  # Wrap the LM head so that its logits are returned in fp32
  def forward(self, x):
    return super().forward(x).to(torch.float32)

model.lm_head = CastOutputToFloat(model.lm_head)

### Apply LoRA

Here comes the magic with `peft`! We wrap the model in a `PeftModel` and specify that we are going to use low-rank adapters (LoRA) via the `get_peft_model` utility function from `peft`.
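
For reference, the update that LoRA trains can be written as below. This is the standard LoRA formulation rather than anything specific to this notebook, and the symbols $W_0$, $A$, $B$, $d_{\text{in}}$, and $d_{\text{out}}$ are notation introduced here for illustration.

$$
W = W_0 + \frac{\alpha}{r}\, B A, \qquad B \in \mathbb{R}^{d_{\text{out}} \times r}, \quad A \in \mathbb{R}^{r \times d_{\text{in}}}
$$

The frozen weight $W_0$ is left untouched; only $A$ and $B$ receive gradients. With `r=16` and `lora_alpha=32`, as in the configuration in this section, the scaling factor $\alpha / r$ equals 2.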

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
config = LoraConfig(
    r=16,
    lora_alpha=32,
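    # target_modules is left unset, so peft uses its default for this architecture
    # (BLOOM has a fused query_key_value projection rather than q_proj / v_proj)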
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 7864320 || all params: 7076880384 || trainable%: 0.11112693126452029
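
The trainable-parameter count above can be reproduced by hand. The sketch below assumes values that the notebook does not state: that BLOOM-7b1 has a hidden size of 4096 and 30 transformer layers, and that `peft` attaches LoRA only to the fused `query_key_value` projection of each layer when `target_modules` is unset.

```python
# Assumed BLOOM-7b1 architecture values (not stated in the notebook).
hidden_size = 4096
num_layers = 30

# LoRA rank from the LoraConfig above.
r = 16

# Assumed LoRA target: the fused query/key/value projection,
# which maps hidden_size -> 3 * hidden_size.
d_in = hidden_size
d_out = 3 * hidden_size

# Each adapted layer adds A (r x d_in) and B (d_out x r).
params_per_layer = r * d_in + d_out * r
lora_params = num_layers * params_per_layer

# Total parameter count as printed by print_trainable_parameters.
all_params = 7076880384
trainable_percent = 100 * lora_params / all_params

print(lora_params)
print(trainable_percent)
```

Under these assumptions the computed count matches the printed `trainable params` value, which suggests the adapters sit on the attention projection only.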


### Load data

In [None]:
from datasets import Dataset
import pandas as pd
data_train = pd.read_csv('resource/dataset/dataset.csv')
data_train = data_train[data_train['genre']=='luc bat'].reset_index(drop=True)[:20000]
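# Build one training string per poem: the prompt, a '###' separator, the first
# three stanzas (each closed with a period), and the '@@@' end marker.
first_stanzas = data_train['completion'].apply(lambda x: '.\n\n'.join(x.split('\n\n')[:3]))
data_train['prompt'] = data_train['prompt'] + '\n###\n ' + first_stanzas + '.' + '@@@'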
data_train = data_train[['prompt']]

data_train = Dataset.from_pandas(data_train)
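
To show what a single training string looks like, here is a pure-Python sketch of the same formatting applied to one row. The helper name `build_training_text` and the sample prompt and stanzas are hypothetical placeholders, not rows from the dataset.

```python
def build_training_text(prompt: str, completion: str, max_stanzas: int = 3) -> str:
    """Mirror the pandas expression above for a single (prompt, completion) pair."""
    # Stanzas are separated by a blank line; keep only the first few.
    stanzas = completion.split("\n\n")[:max_stanzas]
    # Close each stanza with a period, then append the end-of-poem marker.
    return prompt + "\n###\n " + ".\n\n".join(stanzas) + "." + "@@@"


# Hypothetical sample: four two-line stanzas, of which only three are kept.
sample_prompt = "Viết một bài thơ lục bát"
sample_completion = "a1\na2\n\nb1\nb2\n\nc1\nc2\n\nd1\nd2"

text = build_training_text(sample_prompt, sample_completion)
print(text)
```

The `###` separator tells the model where the prompt ends, and `@@@` gives it an explicit point at which to stop.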

In [None]:
def tokenize_function(examples):
    return tokenizer(examples["prompt"])
data_train = data_train.map(tokenize_function, batched=True, num_proc=4, remove_columns=["prompt"])

Map (num_proc=4):   0%|          | 0/20000 [00:00<?, ? examples/s]

In [None]:
# block_size = tokenizer.model_max_length
block_size = 128
def group_texts(examples):
    # Concatenate all texts.
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # We drop the small remainder. We could pad instead if the model supported it;
    # customize this part to your needs.
    total_length = (total_length // block_size) * block_size
    # Split by chunks of max_len.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result

data_train = data_train.map(
    group_texts,
    batched=True,
    batch_size=1000,
    num_proc=4,
)
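
To see what the grouping step does, the sketch below runs the same logic on a toy batch, with `block_size` passed as a parameter and set to 4 so that the result is easy to read. The token IDs are made up for illustration.

```python
def group_texts(examples, block_size):
    # Concatenate all tokenized examples into one long list per key.
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])
    # Drop the remainder that does not fill a whole block.
    total_length = (total_length // block_size) * block_size
    # Split into fixed-size chunks.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # For causal language modeling, the labels are the inputs themselves.
    result["labels"] = result["input_ids"].copy()
    return result


# Toy batch: three "poems" of 3, 5, and 2 made-up token IDs.
toy_batch = {
    "input_ids": [[1, 2, 3], [4, 5, 6, 7, 8], [9, 10]],
    "attention_mask": [[1, 1, 1], [1, 1, 1, 1, 1], [1, 1]],
}

grouped = group_texts(toy_batch, block_size=4)
print(grouped["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Note that blocks can span poem boundaries (the first block ends with the start of the second poem) and that the last two tokens are dropped. The end marker appended to every training string is what lets the model tell where one poem stops inside a block.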

Map (num_proc=4):   0%|          | 0/20000 [00:00<?, ? examples/s]

### Training

In [None]:
trainer = Trainer(
    model=model, 
    train_dataset=data_train,
    args=TrainingArguments(
        output_dir='finetune',
        num_train_epochs=1,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        save_steps=10000, # large interval so no intermediate checkpoint is saved (checkpoints take up a lot of space)
        save_total_limit=1,
        learning_rate=2e-5,
        fp16=True,
        logging_steps=10,
        
    ),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
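
As a quick sanity check on these arguments, the sketch below estimates the effective batch size and the number of optimizer steps in one epoch. The block count `num_blocks` is a hypothetical placeholder, since the real number depends on how many 128-token blocks the grouping step produces, and `Trainer`'s exact rounding may differ by a step.

```python
import math

# Values from the TrainingArguments above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
save_steps = 10000

# One GPU is visible (CUDA_VISIBLE_DEVICES="0").
num_devices = 1

# Hypothetical number of 128-token blocks in the training set.
num_blocks = 20000

# Number of blocks that contribute to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Approximate optimizer steps for a single epoch.
steps_per_epoch = math.ceil(num_blocks / effective_batch_size)

print(effective_batch_size)
print(steps_per_epoch)
print(steps_per_epoch < save_steps)
```

With an effective batch size of 16, a one-epoch run stays below `save_steps=10000` for any dataset of fewer than roughly 160,000 blocks, so no intermediate checkpoint is written.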

In [None]:
trainer.train()
model.save_pretrained('modeling/poem_generator_bloom')
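
Once the adapter is trained, generated text follows the same layout as the training strings: prompt, `###` separator, poem, `@@@` end marker. The helper below is a minimal sketch of how the poem could be cut out of a decoded generation. The name `extract_poem` and the sample string are hypothetical, and this is not necessarily how the project's own utilities do it.

```python
SEPARATOR = "\n###\n"  # separates the prompt from the poem in the training strings
END_MARKER = "@@@"     # marks the end of a poem in the training strings


def extract_poem(generated_text: str) -> str:
    """Return the text between the separator and the end marker."""
    before, sep, after = generated_text.partition(SEPARATOR)
    # If the separator is missing, fall back to the whole text.
    body = after if sep else before
    # Keep everything up to the first end marker (or all of it if none is found).
    poem, _, _ = body.partition(END_MARKER)
    return poem.strip()


# Hypothetical decoded model output, including text generated past the end marker.
sample_output = "Viết một bài thơ lục bát\n###\n dòng một\ndòng hai.@@@ phần thừa"
print(extract_poem(sample_output))
```

Cutting at the first `@@@` discards anything the model produces after it has finished the poem.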