## LLaMa2 Fine-tuning with QLoRA

Fine-tuning LLaMa2 with QLoRA.<br>
The approach follows the one used to build KoLLaMa2. <br>
https://colab.research.google.com/drive/19AFEOrCI6-bc7h9RTso_NndRwXJRaJ25?usp=sharing#scrollTo=OSHlAbqzDFDq
<br>
The prompt format is also adopted from Alpaca-LoRA.

### 1. Install required packages
Because of a bug, the latest PEFT did not apply LoRA correctly, so PEFT is reinstalled from a pinned commit.

In [None]:
!pip install accelerate appdirs loralib black "black[jupyter]" datasets fire \
    git+https://github.com/huggingface/peft.git "transformers>=4.28.0" sentencepiece py7zr scipy gradio \
    guardrail-ml==0.0.12 tensorboard trl==0.4.7 bitsandbytes==0.40.2 \
    "unstructured[local-inference]==0.7.4" pillow==9.0.0 protobuf==3.20.0 wandb

  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git /tmp/pip-req-build-duo_xf2p


In [None]:
!pip uninstall peft -y -q
!pip install -q git+https://github.com/huggingface/peft.git@e536616888d51b453ed354a6f1e243fecb02ea08

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for peft (pyproject.toml) ... done


In [None]:
%cd /content/drive/MyDrive/LLaMaprac/llama

/content/drive/MyDrive/LLaMaprac/llama


### 2. Hugging Face login
You must log in to Hugging Face before you can use LLaMa2 (the Llama 2 weights are gated behind Meta's license).

In [None]:
!huggingface-cli login --token $HF_TOKEN  # HF_TOKEN: your own access token; never commit a real token

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


### 3. Load the model

In [None]:
import os
import sys
from typing import List

import fire
import torch
from trl import SFTTrainer
from datasets import load_dataset

from guardrail.client import (
    run_metrics,
    run_simple_metrics,
    create_dataset)

from peft import (
    LoraConfig,
    get_peft_model,
    get_peft_model_state_dict,
    set_peft_model_state_dict,
    prepare_model_for_int8_training
)

import transformers
from transformers import (
    LlamaForCausalLM,
    LlamaTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
    Trainer
)

In [None]:
base_model: str = 'meta-llama/Llama-2-7b-hf' # the only required argument
data_path: str = "/content/drive/MyDrive/Judgement_dataset/law_dicts_train.json"
output_dir: str = "/content/drive/MyDrive/LLaMaprac/output"
# training hyperparams
batch_size: int = 512
micro_batch_size: int = 16
num_epochs: int = 3
learning_rate: float = 3e-4
cutoff_len: int = 256
val_set_size: int = 2000
# lora hyperparams
lora_r: int = 8
lora_alpha: int = 16
lora_dropout: float = 0.05
lora_target_modules: List[str] = [
    "q_proj",
    "v_proj",
]
# llm hyperparams
train_on_inputs: bool = True  # if False, masks out inputs in loss
add_eos_token: bool = False
group_by_length: bool = False  # faster, but produces an odd training loss curve

gradient_accumulation_steps = batch_size // micro_batch_size
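With `batch_size` treated as the effective batch and `micro_batch_size` as the per-device batch, the optimizer only steps once every `gradient_accumulation_steps` micro-batches. A rough estimate of the resulting step count, assuming a single GPU and the 10,375-row train split printed later in this notebook:

```python
import math

batch_size = 512        # effective batch size from the cell above
micro_batch_size = 16   # per-device batch size
num_epochs = 3
train_rows = 10375      # size of the train split printed later in this notebook

gradient_accumulation_steps = batch_size // micro_batch_size
micro_batches_per_epoch = math.ceil(train_rows / micro_batch_size)
# Full accumulation cycles per epoch; a trailing partial cycle may add one more step,
# depending on the transformers version
optimizer_steps_per_epoch = micro_batches_per_epoch // gradient_accumulation_steps
total_optimizer_steps = optimizer_steps_per_epoch * num_epochs

print(gradient_accumulation_steps, micro_batches_per_epoch,
      optimizer_steps_per_epoch, total_optimizer_steps)
```

If this estimate holds, a run of roughly 60 optimizer steps would never reach a step-based trigger set at 200 (such as `save_steps=200` or `eval_steps=200`), and a 100-step warmup would be longer than the whole run.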

In [None]:
# Settings for 4-bit QLoRA training
bnb_4bit_compute_dtype = "float16" # use "float16" when running on the free Colab tier
bnb_4bit_quant_type = "nf4"
use_4bit = True
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)

peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=lora_target_modules,  # apply LoRA to q_proj and v_proj, as set above
    inference_mode=False,
    bias="none",
    task_type="CAUSAL_LM",
)

device_map = "auto"
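To get a feel for how much the 4-bit `bnb_config` saves, here is a rough weight-memory estimate. It is a sketch under assumptions: Llama-2-7b layer sizes from its public config, linear layers stored as 4-bit NF4 with one fp32 absmax per 64-weight block (double quantization is off above), and embeddings, `lm_head` and norms kept in 16-bit. Activations, LoRA weights, optimizer state and CUDA overhead are not counted.

```python
# Llama-2-7b shapes (assumed from its public config)
HIDDEN, INTERMEDIATE, LAYERS, VOCAB = 4096, 11008, 32, 32000

# Linear layers per decoder block: q, k, v, o (4096x4096) + gate, up, down (4096x11008)
linear_params = LAYERS * (4 * HIDDEN * HIDDEN + 3 * HIDDEN * INTERMEDIATE)
# Not quantized: embed_tokens, lm_head, two RMSNorms per layer plus the final norm
other_params = 2 * VOCAB * HIDDEN + (2 * LAYERS + 1) * HIDDEN

fp16_bytes = (linear_params + other_params) * 2
# NF4: half a byte per weight, plus a 4-byte absmax per 64-weight block
nf4_bytes = linear_params // 2 + (linear_params // 64) * 4 + other_params * 2

GIB = 1024 ** 3
print(f"fp16 weights:  {fp16_bytes / GIB:.2f} GiB")
print(f"4-bit weights: {nf4_bytes / GIB:.2f} GiB")
```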

In [None]:
from huggingface_hub import login

# Load Llama-2-7b. This requires a Hugging Face login.
# Read the token from the environment instead of hard-coding it in the notebook.
login(token=os.environ["HF_TOKEN"])
model = LlamaForCausalLM.from_pretrained(
    base_model,
    device_map=device_map,
    quantization_config=bnb_config,
)

# Load the tokenizer. LLaMa has no dedicated padding token, so the EOS token is reused for padding.
tokenizer = LlamaTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
# Pad on the right; used to fix the loss spiking and then dropping to 0.0 during training
tokenizer.padding_side = "right"

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful



In [None]:
model = get_peft_model(model, peft_config)
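`get_peft_model` wraps each target projection so that its output becomes the frozen base output plus a scaled low-rank update, `W x + (lora_alpha / r) * B A x`. A minimal NumPy sketch of one such layer, not PEFT's implementation: dropout is omitted, the weight is full precision rather than 4-bit, and a small normal init stands in for PEFT's own initialisation of `A`. `B` starts at zero, so the wrapped layer initially behaves exactly like the base layer.

```python
import numpy as np


class LoRALinearSketch:
    """Frozen weight W plus a trainable low-rank update B @ A, scaled by lora_alpha / r."""

    def __init__(self, weight, r=8, lora_alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight                                         # frozen base weight
        self.lora_A = rng.normal(0.0, 0.01, size=(r, in_features))   # trainable, small random init
        self.lora_B = np.zeros((out_features, r))                    # trainable, zero init
        self.scaling = lora_alpha / r                                # 16 / 8 = 2.0 with this notebook's values

    def __call__(self, x):
        base = x @ self.weight.T
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return base + self.scaling * update


rng = np.random.default_rng(1)
W = rng.normal(size=(32, 64))
x = rng.normal(size=(4, 64))
layer = LoRALinearSketch(W, r=8, lora_alpha=16)
print(layer.scaling, np.allclose(layer(x), x @ W.T))
```

Only `lora_A` and `lora_B` would receive gradients, which is why the trainable fraction reported below is so small.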

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
print_trainable_parameters(model)

trainable params: 4194304 || all params: 3504607232 || trainable%: 0.11967971650867153
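Both printed numbers can be reproduced from the Llama-2-7b shapes (assumed from its public config). The trainable count is LoRA with r=8 on `q_proj` and `v_proj` in all 32 layers. The "all params" figure is about half of the usual 6.7B; it is consistent with bitsandbytes packing two 4-bit weights into each stored uint8 element, so `numel()` counts every quantized linear layer at half size.

```python
HIDDEN, INTERMEDIATE, LAYERS, VOCAB = 4096, 11008, 32, 32000
r = 8

# LoRA on one 4096x4096 projection: A is r x in, B is out x r
lora_per_module = r * HIDDEN + HIDDEN * r
trainable = lora_per_module * 2 * LAYERS          # q_proj and v_proj in every layer

linear_total = LAYERS * (4 * HIDDEN * HIDDEN + 3 * HIDDEN * INTERMEDIATE)
unquantized = 2 * VOCAB * HIDDEN + (2 * LAYERS + 1) * HIDDEN   # embeddings, lm_head, norms

# Packed 4-bit storage makes numel() report half of each quantized weight
reported_all = linear_total // 2 + unquantized + trainable
print(trainable, reported_all, 100 * trainable / reported_all)
```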


### 4. Load the dataset

In [None]:
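def loading_dataset(data_path):
    # Local JSON / JSON Lines file: load it as a single "train" split
    if data_path.endswith(".json") or data_path.endswith(".jsonl"):
        data = load_dataset("json", data_files=data_path, split="train")
    else:
        # Hub dataset: also take the "train" split so a Dataset (not a DatasetDict) is returned
        data = load_dataset(data_path, split="train")
    return data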

In [None]:
dataset = loading_dataset(data_path)

In [None]:
train_val = dataset.train_test_split(
    test_size=val_set_size, shuffle=True, seed=42
)

In [None]:
train_val

DatasetDict({
    train: Dataset({
        features: ['input', 'output', 'instruction'],
        num_rows: 10375
    })
    test: Dataset({
        features: ['input', 'output', 'instruction'],
        num_rows: 2000
    })
})

In [None]:
from utils.prompter import Prompter

prompter = Prompter('custom_template')
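`utils.prompter.Prompter` comes from Alpaca-LoRA, which reads a JSON template and fills it with the instruction, the optional input and, during training, the target output. The contents of `custom_template` are not shown here, so the hypothetical sketch below uses the standard Alpaca template text as a stand-in; only the call pattern `generate_prompt(instruction, input, label)` mirrors how the notebook uses it.

```python
from typing import Optional

# Assumption: standard Alpaca template text, not the notebook's 'custom_template'
TEMPLATE = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that provides "
        "further context. Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task. Write a response that appropriately "
        "completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"
    ),
    "response_split": "### Response:",
}


class PrompterSketch:
    """Hypothetical stand-in for utils.prompter.Prompter."""

    def __init__(self, template=TEMPLATE):
        self.template = template

    def generate_prompt(self, instruction: str, input: Optional[str] = None,
                        label: Optional[str] = None) -> str:
        # Pick the template variant depending on whether an input is present
        if input:
            prompt = self.template["prompt_input"].format(instruction=instruction, input=input)
        else:
            prompt = self.template["prompt_no_input"].format(instruction=instruction)
        # During training the target output is appended after "### Response:"
        if label:
            prompt += label
        return prompt

    def get_response(self, output: str) -> str:
        # At inference time, keep only the text generated after the response marker
        return output.split(self.template["response_split"])[1].strip()


p = PrompterSketch()
print(p.generate_prompt("Summarize the ruling.", "The court held ...", "The appeal is dismissed."))
```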

In [None]:
def tokenize(prompt, add_eos_token=True):
    # there's probably a way to do this with the tokenizer settings
    # but again, gotta move fast
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=cutoff_len,
        padding=False,
        return_tensors=None,
    )
    # Append EOS if the tokenizer did not add one and there is still room
    if (
        result["input_ids"][-1] != tokenizer.eos_token_id
        and len(result["input_ids"]) < cutoff_len
        and add_eos_token
    ):
        result["input_ids"].append(tokenizer.eos_token_id)
        result["attention_mask"].append(1)

    result["labels"] = result["input_ids"].copy()

    return result


def generate_and_tokenize_prompt(data_point):
    full_prompt = prompter.generate_prompt(
        data_point["instruction"],
        data_point["input"],
        data_point["output"],
    )
    tokenized_full_prompt = tokenize(full_prompt)
    if not train_on_inputs:
        # Mask the instruction/input tokens so the loss only covers the response
        user_prompt = prompter.generate_prompt(
            data_point["instruction"], data_point["input"]
        )
        tokenized_user_prompt = tokenize(user_prompt, add_eos_token=add_eos_token)
        user_prompt_len = len(tokenized_user_prompt["input_ids"])

        if add_eos_token:
            user_prompt_len -= 1

        tokenized_full_prompt["labels"] = (
            [-100] * user_prompt_len
            + tokenized_full_prompt["labels"][user_prompt_len:]
        )  # could be sped up, probably
    return tokenized_full_prompt
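When `train_on_inputs` is False, the prompt positions of `labels` are overwritten with -100, the default `ignore_index` of PyTorch's cross-entropy loss, so only response tokens contribute to the loss. A toy illustration with hypothetical token ids (the user prompt is assumed to be tokenized without a trailing EOS, which is what the `user_prompt_len -= 1` correction arranges when `add_eos_token` is True):

```python
def mask_prompt_labels(full_ids, user_prompt_len):
    """Return labels where the first user_prompt_len positions are ignored (-100)."""
    return [-100] * user_prompt_len + full_ids[user_prompt_len:]


# Hypothetical ids: 1 = BOS, prompt tokens 10..13, response tokens 20..22, 2 = EOS
full_ids = [1, 10, 11, 12, 13, 20, 21, 22, 2]
user_ids = [1, 10, 11, 12, 13]   # the prompt alone, without EOS

labels = mask_prompt_labels(full_ids, len(user_ids))
print(labels)  # [-100, -100, -100, -100, -100, 20, 21, 22, 2]
```

With `train_on_inputs = True`, as configured above, this masking is skipped and the model is trained on the prompt tokens as well.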

In [114]:
train_data = train_val["train"].shuffle().map(generate_and_tokenize_prompt)
val_data = train_val["test"].shuffle().map(generate_and_tokenize_prompt)


### 5. Training

In [117]:
# Build TrainingArguments
training_arguments = TrainingArguments(
    output_dir=output_dir,
    save_strategy="steps",
    evaluation_strategy="steps" if val_set_size > 0 else "no",  # without it eval_steps is ignored (eval_strategy in newer transformers)
    eval_steps=200 if val_set_size > 0 else None,
    save_steps=200,
    save_total_limit=3,  # keep only the 3 most recent checkpoints
    logging_steps=10,
    #max_grad_norm=0.3,
    num_train_epochs=num_epochs,  # max_steps can be used instead of epochs
    per_device_train_batch_size=micro_batch_size,
    learning_rate=learning_rate,
    warmup_steps=100,  # takes precedence over warmup_ratio when both are set
    warmup_ratio=0.03,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim="paged_adamw_8bit",  # saves more memory, but the loss can drop to 0 with it
    group_by_length=group_by_length,
    fp16=True,  # use True when running on the free Colab tier
    bf16=False,  # use False when running on the free Colab tier
    lr_scheduler_type="constant",
)
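`lr_scheduler_type="constant"` maps to a schedule with no warmup, so `warmup_steps` and `warmup_ratio` have no effect in this configuration; `"constant_with_warmup"` would ramp the learning rate up linearly first. A plain-Python sketch of the two learning-rate multipliers, modelled on how transformers' constant schedules behave (this is not the library code itself):

```python
def constant_multiplier(step, warmup_steps):
    # "constant": the learning rate never changes; warmup_steps is not used
    return 1.0


def constant_with_warmup_multiplier(step, warmup_steps):
    # "constant_with_warmup": linear ramp from 0 to 1 over warmup_steps, then flat
    if step < warmup_steps:
        return step / max(1.0, warmup_steps)
    return 1.0


base_lr = 3e-4
for step in (0, 50, 100, 150):
    print(step,
          base_lr * constant_multiplier(step, 100),
          base_lr * constant_with_warmup_multiplier(step, 100))
```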

In [118]:
trainer = Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=training_arguments,
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
)
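`DataCollatorForSeq2Seq` right-pads each batch to the longest example, rounded up to a multiple of 8 here, padding `input_ids` with the pad id, `attention_mask` with 0 and `labels` with -100. A simplified pure-Python sketch of that behaviour (the hypothetical `pad_batch` helper ignores the collator's other options):

```python
import math


def pad_batch(features, pad_token_id, label_pad_token_id=-100, pad_to_multiple_of=8):
    """Right-pad tokenized examples to a common length rounded up to pad_to_multiple_of."""
    longest = max(len(f["input_ids"]) for f in features)
    target = math.ceil(longest / pad_to_multiple_of) * pad_to_multiple_of
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = target - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        batch["attention_mask"].append(f["attention_mask"] + [0] * n_pad)
        batch["labels"].append(f["labels"] + [label_pad_token_id] * n_pad)
    return batch


features = [
    {"input_ids": [1, 5, 6], "attention_mask": [1, 1, 1], "labels": [1, 5, 6]},
    {"input_ids": [1, 7, 8, 9, 2], "attention_mask": [1] * 5, "labels": [1, 7, 8, 9, 2]},
]
batch = pad_batch(features, pad_token_id=2)   # 2 = EOS, reused as the pad token above
print(batch["labels"])
```

Because labels are padded with -100 independently of `input_ids`, reusing EOS as the pad token does not hide the real EOS at the end of each example from the loss.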

In [119]:
model.config.use_cache = False  # disable the KV cache during training; re-enable it for inference
trainer.train()

RuntimeError: ignored

### Extra. Load the data and start training with the prepared Training.py script

In [None]:
!python Training.py \
    --base_model 'meta-llama/Llama-2-7b-hf' \
    --data_path '/content/drive/MyDrive/Judgement_dataset/law_dicts_train.json' \
    --output_dir '/content/drive/MyDrive/LLaMaprac/output' \
    --num_epochs 10 \
    --learning_rate 5e-5 \
    --batch_size 512 \
    --micro_batch_size 16 \
    --prompt_template_name 'custom_template'

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
Training Llama2-QLoRA model with params:
base_model: meta-llama/Llama-2-7b-hf
data_path: /content/drive/MyDrive/Judgement_dataset/law_dicts_train.json
output_dir: /content/drive/MyDrive/LLaMaprac/output
batch_size: 512
micro_batch_size: 16
num_epochs: 10
learning_rate: 5e-05
cutoff_len: 256
val_set_size: 2000
lora_r: 64
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: False
wandb_project: 
wandb_run_name: 
wandb_watch: 
wandb_log_model: 
resume_from_checkpoint: False
prompt template: custom_template

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token i